I recently read the paper "Accurate inference of genome-wide spatial expression with iSpatial" and learned a lot from it.

iSpatial infers the expression patterns of all genes at high spatial resolution by integrating scRNA-seq and spatial transcriptomics (ST) data. Single-cell RNA sequencing captures as many genes as possible, whereas the ST data cover fewer genes. In this paper, the authors use the scRNA-seq data to recover the spatial information of all genes in the ST data.

How do they infer spatial information?

In the first step, iSpatial integrates the scRNA-seq and spatial transcriptomics datasets. The core of the method is a weighted KNN graph. For each cell t in the spatial transcriptomics data, iSpatial searches for its k nearest neighbors:

\(\mathrm{KNN}_{t, k}:\left\{\mathrm{KNN}_{t, 1}, \mathrm{KNN}_{t, 2}, \ldots, \mathrm{KNN}_{t, k}\right\}\)

The expression of each gene in cell t is then calculated from these neighbors with the formulas below.

β is a parameter that balances the expression from the spatial transcriptomics data against the expression from the scRNA-seq data. ω is the weight factor, which is derived from the distance between cell t and each of its KNN neighbor cells. From the formulas below, we can see that the more similar a neighbor cell is, the smaller its weight factor.

\(\widehat{E_t}=(1-\beta) E_t+\beta\left(\sum_k \omega_{t, k} E_{\mathrm{KNN}_{t, k}}\right)\)

\(\operatorname{dist}\left(E_t, E_{\mathrm{KNN}_{t, k}}\right)=1-\operatorname{cor}\left(E_t, E_{\mathrm{KNN}_{t, k}}\right)\)

\(d_{t, k}=\operatorname{dist}\left(E_t, E_{\mathrm{KNN}_{t, k}}\right)\)

\(\omega_{t, k}=\frac{d_{t, k}^2}{\sum_k d_{t, k}^2}\)

However, I think this method requires the scRNA-seq and spatial transcriptomics datasets to match each other well.
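To make the weighting step concrete, here is a minimal NumPy sketch of the formulas above. It is not the iSpatial implementation. The function names `correlation_distance` and `infer_expression` are made up for illustration, and the sketch assumes that cell t and its neighbors are already expressed over the same set of features and that the k neighbors have already been found.

```python
import numpy as np

def correlation_distance(x, Y):
    """Return 1 - Pearson correlation between vector x and each row of Y."""
    xc = x - x.mean()
    Yc = Y - Y.mean(axis=1, keepdims=True)
    # Assumption: no vector is constant, so the denominator is non-zero.
    cor = (Yc @ xc) / (np.linalg.norm(xc) * np.linalg.norm(Yc, axis=1))
    return 1.0 - cor

def infer_expression(E_t, E_neighbors, beta=0.5):
    """Blend cell t's expression with a weighted sum over its k neighbors.

    E_t:         shape (features,), expression of cell t
    E_neighbors: shape (k, features), expression of the k nearest neighbors
    beta:        balance between cell t (1 - beta) and its neighbors (beta)
    """
    d = correlation_distance(E_t, E_neighbors)   # d_{t,k}
    w = d**2 / np.sum(d**2)                      # omega_{t,k} as written in the text
    E_hat = (1.0 - beta) * E_t + beta * (w @ E_neighbors)
    return E_hat, w

# Toy usage with three hypothetical neighbors
E_t = np.array([1.0, 2.0, 3.0, 4.0])
E_neighbors = np.array([[2.0, 4.0, 6.0, 8.0],
                        [4.0, 3.0, 2.0, 1.0],
                        [1.0, 3.0, 2.0, 4.0]])
E_hat, weights = infer_expression(E_t, E_neighbors, beta=0.5)
print(weights)
```

With β = 0 the function returns cell t's own expression unchanged, and the weights always sum to 1, which is an easy sanity check on the formula.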

The paper also develops a way to identify spatially variable genes (SVGs). First, it divides the two-dimensional space into n x n grids and calculates the mean expression level of the gene in each grid. Second, it randomly samples the spatial locations and calculates the mean gene expression levels once more. Finally, it tests whether there is a significant difference between the two sets of values.
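The three steps can be sketched roughly as follows for a single gene. This is only an illustration under my own assumptions. The helper names `grid_means` and `svg_permutation_pvalue` are hypothetical, and the comparison used here (variance of the grid means, with a permutation p-value) is a stand-in I chose. The paper's actual statistical test may differ.

```python
import numpy as np

def grid_means(xy, expr, n):
    """Mean expression of one gene in each occupied cell of an n x n grid."""
    mins = xy.min(axis=0)
    spans = xy.max(axis=0) - mins
    spans[spans == 0] = 1.0                      # guard against a flat axis
    # Map each coordinate to an integer bin index in [0, n - 1]
    bins = np.minimum((n * (xy - mins) / spans).astype(int), n - 1)
    flat = bins[:, 0] * n + bins[:, 1]
    sums = np.bincount(flat, weights=expr, minlength=n * n)
    counts = np.bincount(flat, minlength=n * n)
    occupied = counts > 0
    return sums[occupied] / counts[occupied]

def svg_permutation_pvalue(xy, expr, n=10, n_perm=200, seed=0):
    """Compare observed grid means with grid means after shuffling locations.

    Assumption: the test statistic is the variance of the grid means.
    """
    rng = np.random.default_rng(seed)
    observed = grid_means(xy, expr, n).var()
    # Shuffling expression values across cells is equivalent to
    # randomly reassigning the spatial locations.
    null = np.array([grid_means(xy, rng.permutation(expr), n).var()
                     for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (1 + n_perm)

# Toy usage: one gene that is high on the right half, one with no pattern
rng = np.random.default_rng(1)
xy = rng.uniform(size=(2000, 2))
patterned = (xy[:, 0] > 0.5).astype(float) + rng.normal(scale=0.1, size=2000)
unpatterned = rng.normal(size=2000)
print(svg_permutation_pvalue(xy, patterned), svg_permutation_pvalue(xy, unpatterned))
```

A gene with real spatial structure has grid means that vary far more than after shuffling, so its p-value is small. A gene without structure looks the same before and after shuffling.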