GO enrichment analysis is a widely used method for characterizing the functional profile of a gene set.

What does it mean?

GO is the Gene Ontology, a structured, controlled vocabulary for classifying gene function at the molecular and cellular levels. It is divided into three separate sub-ontologies: biological process (e.g., signal transduction), molecular function (e.g., ATPase activity), and cellular component (e.g., ribosome)[^1].

How are GO term scores calculated?

Each GO term is scored with a p-value from a hypergeometric test.

Suppose the gene set of interest contains n genes, k of which are annotated to one specific GO term. The background (all genes that carry a GO annotation) contains N genes, K of which are annotated to that same GO term.
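As a minimal sketch of how the four counts could be obtained, the snippet below uses plain Python sets. The gene identifiers (`g1` … `g10`) and the helper name `enrichment_counts` are hypothetical and exist only for illustration; real analyses would load the background and annotations from a GO annotation file.

```python
# Hypothetical toy data: identifiers are made up for illustration only.
background = {f"g{i}" for i in range(1, 11)}   # all annotated genes
term_genes = {"g1", "g2", "g3", "g4"}          # genes annotated to one GO term
gene_set = {"g1", "g2", "g5"}                  # gene set of interest


def enrichment_counts(gene_set, term_genes, background):
    """Return (N, K, n, k) for one GO term.

    Genes outside the background are dropped first, so that all four
    counts refer to the same population.
    """
    background = set(background)
    gene_set = set(gene_set) & background
    term_genes = set(term_genes) & background
    N = len(background)              # population size
    K = len(term_genes)              # annotated genes in the population
    n = len(gene_set)                # size of the gene set (number of draws)
    k = len(gene_set & term_genes)   # annotated genes in the gene set
    return N, K, n, k


N, K, n, k = enrichment_counts(gene_set, term_genes, background)
print(N, K, n, k)
```

Restricting both sets to the background is a design choice assumed here; it keeps k ≤ min(n, K) and n ≤ N, which the hypergeometric model requires.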

The hypergeometric distribution gives the probability of exactly k successes in n draws, without replacement, from a finite population of size N that contains exactly K successful objects. Here, a "success" is a gene annotated to the GO term:

\(\mathrm{P}(X=k)=\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\)

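The enrichment p-value is the probability of observing k or more annotated genes in the gene set by chance:

\(\text{p-value}=\mathrm{P}(X \geq k)=1-\sum_{i=0}^{k-1}\mathrm{P}(X=i)\)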
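The calculation can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not a reference implementation: the function names `hypergeom_pmf` and `enrichment_p_value` are made up for this example, and the p-value is taken as the upper tail P(X ≥ k), the usual convention for over-representation tests, with the standard hypergeometric probability mass function.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k): exactly k annotated genes among n drawn from N, K annotated."""
    # Values of k outside the support have probability zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment_p_value(k, N, K, n):
    """Upper-tail probability P(X >= k), summed directly over the tail."""
    upper = min(n, K)  # largest possible number of annotated genes drawn
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, upper + 1))


# Toy numbers: 10 background genes, 4 annotated to the term,
# a gene set of 3 genes, 2 of which are annotated.
print(hypergeom_pmf(2, N=10, K=4, n=3))
print(enrichment_p_value(2, N=10, K=4, n=3))
```

Summing the tail directly, rather than computing one minus the lower tail, avoids loss of precision when the p-value is very small. In practice many GO terms are tested at once, so the resulting p-values would normally be corrected for multiple testing.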

Reference

[^1]: https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/goenrichment/tutorial.html