Anndata
Anndata is a Python package for handling annotated data matrices in memory and on disk (github.com/theislab/anndata), positioned between pandas and xarray.
Anndata object
What's its structure?
The AnnData object is a collection of arrays aligned to the common dimensions of observations (obs) and variables (var).
As we can see, color is used to denote elements of the object, with "orange" colors selected for elements aligned to the observations and "blue" colors for elements aligned to variables. The object is centered around the main data matrix X, whose two dimensions correspond to observations and variables respectively. Primary labels for each of these dimensions are stored as obs_names and var_names. If needed, layers stores matrices of the exact same shape as X. One-dimensional annotations for each dimension are stored in pandas DataFrames obs and var. Multi-dimensional annotations are stored in obsm and varm. Pairwise relationships are stored in obsp and varp. Unstructured data which doesn’t fit this model, but should stay associated to the dataset are stored in
How to convert a DataFrame to an AnnData object
Let's see an example.
This is a gene expression matrix, and I want to convert it to an AnnData object for analysis.
The first step is to read the file into a pandas DataFrame.
import pandas as pd
df = pd.read_csv("./Rep11_MOB_count_matrix-1.tsv", sep="\t", index_col=0)
In an AnnData object, obs and var are pandas DataFrames. So, in the second step, we construct these elements.
obs = pd.DataFrame()
spa = pd.DataFrame(index=df.columns)