GO enrichment
GO enrichment analysis is a widely used method for characterizing the functional profile of a gene set.
What does it mean?
GO is the Gene Ontology, a structured, controlled vocabulary for classifying gene function at the molecular and cellular levels. It is divided into three separate sub-ontologies: biological process (e.g., signal transduction), molecular function (e.g., ATPase activity), and cellular component (e.g., ribosome)[^1].
How to calculate GO term scores?
P-values are calculated with a hypergeometric test.
Suppose our gene set contains n genes, k of which are annotated to one specific GO term. The background, i.e. all genes annotated to any GO term, contains N genes, K of which are annotated to that particular GO term.
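As a minimal sketch of how the four counts might be obtained from sets of gene identifiers: the gene names, the toy data, and the helper `enrichment_counts` below are all made up for illustration and do not come from any particular GO tool.

```python
# Hypothetical toy data; the gene identifiers are invented for illustration.
background = {f"g{i}" for i in range(1, 11)}   # all genes annotated to any GO term
term_genes = {"g1", "g2", "g3", "g4"}          # genes annotated to one GO term
gene_set = {"g1", "g2", "g5"}                  # the gene set of interest


def enrichment_counts(gene_set, term_genes, background):
    """Return (N, K, n, k) for one GO term.

    Genes outside the background are dropped first, so that every
    count refers to the same population.
    """
    gene_set = gene_set & background
    term_genes = term_genes & background
    N = len(background)              # size of the background
    K = len(term_genes)              # background genes in the term
    n = len(gene_set)                # size of the gene set
    k = len(gene_set & term_genes)   # gene-set genes in the term
    return N, K, n, k


print(enrichment_counts(gene_set, term_genes, background))  # (10, 4, 3, 2)
```

Restricting both sets to the background is a design choice in this sketch: it guarantees that k is at most both n and K.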
The hypergeometric distribution measures precisely the probability of k successes in n draws, without replacement, from a finite population of size N that contains exactly K successful objects:
\(\mathrm{P}(\mathrm{X}=k)=\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\)
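The enrichment p-value is the probability of finding k or more genes from the GO term in the gene set by chance, i.e. the upper tail of the distribution:
\(\text{p-value}=\mathrm{P}(\mathrm{X} \geq k)=\sum_{i=k}^{\min(n, K)} \mathrm{P}(\mathrm{X}=i)=1-\mathrm{P}(\mathrm{X} \leq k-1)\)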
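The calculation can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not the implementation of any specific tool. It assumes the standard hypergeometric mass function, C(K, k) C(N-K, n-k) / C(N, n), and takes the upper tail P(X ≥ k) as the enrichment p-value. The function names `hypergeom_pmf` and `enrichment_p_value` and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k): exactly k term genes among n genes drawn from N,
    where K of the N background genes belong to the term."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment_p_value(k, N, K, n):
    """Upper-tail probability P(X >= k), assuming 0 <= k <= min(n, K)."""
    upper = min(n, K)  # X cannot exceed the draws or the term size
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, upper + 1))


# Toy example (made-up numbers): N = 10 background genes, K = 4 in the
# term, n = 3 genes in the gene set, k = 2 of them in the term.
p = enrichment_p_value(2, N=10, K=4, n=3)
print(round(p, 4))  # 0.3333
```

If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` should give the same upper-tail probability; note that SciPy names its parameters differently (population size, number of successes in the population, number of draws).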
Reference
[^1]: https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/goenrichment/tutorial.html