Test statistic 5: Do genes a and b interact when expressed by cell types t and u, respectively?

To go further and identify which cell types exchange the most relevant metabolites (or ligand-receptor pathways) in the metabolic regions of interest, a cell type-aware approach has also been developed. Its test statistic, which quantifies the communication strength between a given pair of cell types, is defined as follows:

\[ H_{ab}^{t,u} = \sum_{i \in C_t} \sum_{j \in C_u} w_{ij} X_{ai} X_{bj} \]

where i and j are two different cells that belong to different cell types, t and u, respectively, and \(C_t\) and \(C_u\) denote the sets of cells of cell types t and u. Genes a and b are expressed by cells i and j, respectively, and X is the gene expression matrix of dimension genes × cells. Genes a and b need not be distinct: the case of the same gene (\(a = b\)) expressed by two different cells must also be considered, e.g., when inferring metabolic crosstalk between cells that express the same transporter.

The weight \(w_{ij}\) represents communication strength between neighboring cells, and it is defined in the same way as in Test statistic 1.
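The double sum can be evaluated directly by restricting the weight matrix to rows in \(C_t\) and columns in \(C_u\). The sketch below is a minimal illustration, assuming a dense NumPy weight matrix with zero diagonal and an array of cell-type labels; the function name `h_dense` and the toy data are hypothetical and not part of any package API.

```python
import numpy as np

def h_dense(X, W, labels, a, b, t, u):
    """H_ab^{t,u} = sum_{i in C_t} sum_{j in C_u} w_ij * X[a, i] * X[b, j].

    X: (genes, cells) expression matrix; W: (cells, cells) weights, zero diagonal;
    labels: (cells,) array of cell-type labels. Hypothetical helper for illustration.
    """
    in_t = np.asarray(labels) == t   # boolean mask of C_t
    in_u = np.asarray(labels) == u   # boolean mask of C_u
    W_tu = W[np.ix_(in_t, in_u)]     # weights from cells of type t to cells of type u
    # Sender side uses gene a in C_t, receiver side uses gene b in C_u
    return float(X[a, in_t] @ W_tu @ X[b, in_u])

# Toy example: two cells of type "T" and two of type "U"
X = np.array([[1., 2., 0., 0.],    # gene a (index 0), expressed by type T
              [0., 0., 3., 4.]])   # gene b (index 1), expressed by type U
labels = np.array(["T", "T", "U", "U"])
W = np.zeros((4, 4))
W[0, 2] = W[2, 0] = 1.0
W[0, 3] = W[3, 0] = 0.2
W[1, 3] = W[3, 1] = 0.5
print(h_dense(X, W, labels, 0, 1, "T", "U"))  # 1*1*3 + 0.2*1*4 + 0.5*2*4 = 7.8, up to float rounding
```

Note that the statistic is directional: swapping t and u (here, gene a in type U) gives 0 because gene a is not expressed by the type-U cells.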

Weights are assigned using a spatial proximity graph, such that \(w_{ij}\) is non-zero only if cells i and j are neighbors, and there are no self-edges. This last condition does not strictly hold for deconvolved spot-based spatial data, where interactions between different cell types present in the same spot may be considered. In that case, each cell type inferred in each spot by spatial deconvolution is treated as a separate node in the graph, and instead of a distance of 0, cell types within the same spot are assigned a distance of \(\frac{d}{2}\), where d is the spot diameter. DestVI (Lopez et al., Nature Biotechnology, 2022) or cell2location (Kleshchevnikov et al., Nature Biotechnology, 2022) can be used to estimate both the cell-type abundance in each spot and the cell-type-specific gene expression values.

Because \(w_{ij}\) is zero for non-neighboring nodes, the double summation can be re-expressed as a sum over the edges E of this sparse graph:

\[ H_{ab}^{t,u} = \sum_{\substack{(i, j) \in E \\ i \in C_t,\, j \in C_u}} w_{ij} X_{ai} X_{bj} \]
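The sketch below illustrates both ideas: building a (spot, cell type) node graph in which cell types sharing a spot sit \(\frac{d}{2}\) apart, and evaluating \(H_{ab}^{t,u}\) as a sum over the stored edges of a sparse matrix. It is a minimal sketch under stated assumptions: the names `spot_celltype_graph` and `h_edges`, the `min_abundance` cutoff, the neighborhood `radius` and the Gaussian kernel are all hypothetical choices for illustration (the actual weights follow Test statistic 1, which is not reproduced here).

```python
import numpy as np
from scipy import sparse

def spot_celltype_graph(coords, abundance, spot_diameter, radius, min_abundance=0.05):
    """Hypothetical: one node per (spot, cell type) whose abundance exceeds min_abundance.

    coords: (n_spots, 2) spot centres; abundance: (n_spots, n_types) deconvolved proportions.
    Returns node spot indices, node cell-type indices and a sparse CSR weight matrix.
    """
    spot_idx, type_idx = np.nonzero(abundance > min_abundance)
    xy = coords[spot_idx]
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # distance between the hosting spots
    # Cell types sharing a spot are placed d/2 apart instead of at distance 0
    dist[spot_idx[:, None] == spot_idx[None, :]] = spot_diameter / 2
    adj = dist <= radius
    np.fill_diagonal(adj, False)  # no self-edges: a node never links to itself
    # ASSUMPTION: Gaussian kernel; the real weights are those of Test statistic 1
    sigma = radius / 2
    W = np.where(adj, np.exp(-dist ** 2 / (2 * sigma ** 2)), 0.0)
    return spot_idx, type_idx, sparse.csr_matrix(W)

def h_edges(X, W, labels, a, b, t, u):
    """H_ab^{t,u} as a sum over stored edges (i, j) of sparse W with i in C_t, j in C_u."""
    E = W.tocoo()
    i, j, w = E.row, E.col, E.data
    keep = (labels[i] == t) & (labels[j] == u)  # edges from type t to type u only
    return float(np.sum(w[keep] * X[a, i[keep]] * X[b, j[keep]]))
```

For deconvolved data, `labels` would be the node cell-type indices (`type_idx`) and X the cell-type-specific expression with one column per node. Only the stored edges are visited, so the cost scales with the number of neighbor pairs rather than with the square of the number of nodes.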

To test significance and evaluate the expected value of H, a null model is needed. For this, an empirical permutation test has been implemented, in which the shuffling procedure depends on which of three null models is used:

  • (1) Given the spatial co-localization of cell types t and u, which gene pairs are significantly co-expressed by cell types t and u, respectively? The null hypothesis is as follows: the observed co-expression of gene pair \((a,b)\) across cell types t and u is no stronger than expected by chance, given the spatial co-localization of cell types t and u. Therefore, the expression values of each gene are shuffled among the cells of their respective cell type, which preserves the cell type labels and their spatial arrangement.

  • (2) Given the spatial co-expression of a gene pair \((a, b)\) regardless of cell type, which cell type pairs explain it? The null hypothesis is as follows: the observed co-expression of genes a and b is not enriched in any specific cell type pair, that is, it is random with respect to which cell types express them. In this case, cell type labels are shuffled.

  • (3) Given a fixed cell type t of interest (e.g., stem cells), which of its interactions with other cell types are significant? The null hypothesis is as follows: the observed spatial co-expression between gene a in cell type t and gene b in another cell type u is no stronger than expected if gene b’s expression were random in cell type u. Here, we fix the expression of gene a in cell type t and shuffle the expression of gene b in cell type u.
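The three shuffling schemes differ only in what is permuted. The following is a minimal sketch, assuming a NumPy expression matrix of shape genes × cells and a NumPy array of cell-type labels; the function names are hypothetical, and for null model (1) it assumes each gene is permuted independently within each cell type, which is one reading of the description above.

```python
import numpy as np

def shuffle_within_types(X, labels, genes, rng):
    """Null model (1): permute each gene's values among cells of the same cell type."""
    Xp = X.copy()
    for ct in np.unique(labels):
        idx = np.flatnonzero(labels == ct)  # cells of this type
        for g in genes:
            Xp[g, idx] = X[g, rng.permutation(idx)]
    return Xp

def shuffle_labels(labels, rng):
    """Null model (2): permute cell-type labels; expression and positions stay fixed."""
    return rng.permutation(labels)

def shuffle_gene_in_type(X, labels, b, u, rng):
    """Null model (3): keep gene a fixed; permute gene b only among cells of type u."""
    Xp = X.copy()
    idx = np.flatnonzero(labels == u)
    Xp[b, idx] = X[b, rng.permutation(idx)]
    return Xp
```

Each shuffled input (expression matrix or label vector) would then be passed to a function computing \(H_{ab}^{t,u}\), for instance one built on `rng = np.random.default_rng(seed)`, to obtain one draw from the null distribution per iteration.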

Then, in each iteration, \(H_{ab}^{t,u}\) is recomputed on the shuffled data for each cell type pair and gene pair, and the p-value is calculated as \(p = \frac{x+1}{M+1}\), where M is the number of permutations and x is the number of permutations whose statistic is greater than or equal to the observed value. P-values are finally adjusted for multiple testing using the Benjamini-Hochberg procedure.
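The p-value formula and the Benjamini-Hochberg adjustment can be sketched in a few lines of NumPy. This is an illustrative sketch, not the package's own implementation; the function names are hypothetical, and it assumes x counts null statistics greater than or equal to the observed one (a one-sided test for stronger-than-expected interaction).

```python
import numpy as np

def empirical_pvalue(h_obs, h_null):
    """p = (x + 1) / (M + 1), with x = number of null statistics >= the observed one."""
    h_null = np.asarray(h_null, dtype=float)
    x = np.sum(h_null >= h_obs)
    return (x + 1) / (h_null.size + 1)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up, monotone, capped at 1)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)       # p_(k) * m / k
    # Enforce monotonicity by taking the running minimum from the largest p-value down
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adj = np.empty(m)
    adj[order] = np.minimum(ranked, 1.0)              # restore the original order
    return adj
```

The +1 in numerator and denominator counts the observed statistic as one of the permutations, so the smallest attainable p-value is \(\frac{1}{M+1}\) rather than 0.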