Mapping the Intestinal Social Network using Data
Download the full research paper (PDF)
This is a friendly walkthrough of how linear algebra, random walks and consensus statistics uncover who talks to whom inside the mouse gut. Thanks to all of Group 19 for the joint effort on this!
1 Understanding the Bio 
Stem cells live at the bottom of each intestinal crypt, crank out transient‑amplifying daughters, and those daughters sprint up the villus to become one of several mature cell types (enterocytes, goblets, Paneth, Tuft, …).
Biologists know that story; here we’ll focus on how the data science works behind the scenes.
Our raw material: a single‑cell RNA‑seq cohort (≈ 5 000 cells, ~24 000 genes) capturing the full crypt‑to‑villus journey.
2 From a 24 000‑dimensional gene space to a colourful UMAP 
2.1 k‑NN + Leiden ≥ k‑means
- Dimensionality crunch. PCA keeps the top d = 50 principal components.
- Graph build. For each cell i, find its k=15 nearest neighbours under Euclidean distance in PC‑space.
- Edge weights. Use the Jaccard index so shared neighbourhoods matter more than raw distance.
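- Clustering. Leiden partitions the graph by maximising modularity \[Q=\frac{1}{2m}\sum_{i,j}\left(A_{ij}-\frac{w_i w_j}{2m}\right)\delta(c_i,c_j)\] where $A_{ij}$ is the edge weight between cells $i$ and $j$, $w_i=\sum_j A_{ij}$ is the weighted degree of cell $i$, $m$ is the total edge weight and $c_i$ is the community of cell $i$, with an extra “refinement” phase that guarantees well‑connected clusters.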
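The graph-building steps above can be sketched in plain NumPy. This is a minimal illustration on synthetic data, not the pipeline used for the paper: the function names, the toy blob data and the small values of d and k are all assumptions, and the Leiden optimisation itself is not implemented here. The sketch only evaluates the modularity of a given partition, which is the quantity Leiden tries to maximise.

```python
import numpy as np

def pca_scores(X, d):
    """Project the rows of X onto the top-d principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :d] * S[:d]

def knn_indices(Z, k):
    """Indices of the k nearest neighbours (Euclidean) of each row, self excluded."""
    sq = (Z ** 2).sum(axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    np.fill_diagonal(dist2, np.inf)
    return np.argsort(dist2, axis=1)[:, :k]

def jaccard_graph(nbrs):
    """Symmetric weights: Jaccard overlap of the neighbour sets on each k-NN edge.

    Assumption: each cell counts as a member of its own neighbourhood.
    """
    n = nbrs.shape[0]
    sets = [set(row.tolist()) | {i} for i, row in enumerate(nbrs)]
    W = np.zeros((n, n))
    for i in range(n):
        for j in nbrs[i]:
            w = len(sets[i] & sets[j]) / len(sets[i] | sets[j])
            W[i, j] = W[j, i] = w
    return W

def modularity(W, labels):
    """Modularity of a partition on a weighted, undirected graph."""
    two_m = W.sum()                      # every edge is counted twice in W
    degree = W.sum(axis=1)
    same = labels[:, None] == labels[None, :]
    return ((W - np.outer(degree, degree) / two_m) * same).sum() / two_m

# Toy data: two well-separated blobs standing in for two cell populations.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 20)), rng.normal(8, 1, (30, 20))])
Z = pca_scores(X, d=5)                   # the post uses d = 50
W = jaccard_graph(knn_indices(Z, k=5))   # the post uses k = 15
true_labels = np.repeat([0, 1], 30)
shuffled_labels = rng.permutation(true_labels)
q_true = modularity(W, true_labels)
q_shuffled = modularity(W, shuffled_labels)
```

On this toy example the partition that matches the two blobs scores a much higher modularity than a shuffled one, which is the signal a modularity optimiser follows.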
Result: nine clearly separated communities.
2.2 First‑pass labels 
Marker genes (e.g. Lgr5, Muc2, Defa) give us a draft taxonomy: ISC, TA, Paneth, Goblet, Enteroendocrine (EEC), Tuft, …
2.3 Tightening the taxonomy 
A second Leiden round on each broad group + differential‑expression tests cleans up stragglers — leaving a high‑confidence cast of seven major players.
3 A summary of ligand–receptor maths 
Instead of betting on a single L‑R scoring scheme, LIANA pipes our clustered counts into six independent algorithms (CellPhoneDB, NATMI, logFC, …) and spits out a consensus z‑score per sender/receiver pair.
Mathematically it’s just:
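\[z_{sr}=\frac{1}{M}\sum_{m=1}^{M}\frac{s_{sr}^{(m)}-\mu_m}{\sigma_m}\]
where $s_{sr}^{(m)}$ is the score method $m$ assigns to sender $s$ and receiver $r$, and $\mu_m$, $\sigma_m$ are the mean and standard deviation of that method's scores. In other words, average the method‑specific z‑scores so no method can single‑handedly skew the ranking.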
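As a quick sanity check of that formula, here is a minimal NumPy sketch. It implements the averaging exactly as written above and makes no claim about LIANA's internals; the function name and the toy scores are made up for illustration.

```python
import numpy as np

def consensus_z(scores):
    """Average of per-method z-scores.

    scores: array of shape (M, n_pairs), one row per scoring method.
    """
    scores = np.asarray(scores, dtype=float)
    mu = scores.mean(axis=1, keepdims=True)
    sigma = scores.std(axis=1, keepdims=True)
    return ((scores - mu) / sigma).mean(axis=0)

# Three hypothetical methods scoring four sender/receiver pairs on very
# different scales; the third method ranks the pairs in reverse order.
toy_scores = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
    [0.4, 0.3, 0.2, 0.1],
]
z = consensus_z(toy_scores)
```

Because every method is standardised before averaging, the method with the largest raw numbers gets no extra say: two methods agree, one disagrees, and the consensus follows the majority with a reduced magnitude.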
4 Static interaction analysis 
Rows = senders, columns = receivers, colour = mean LIANA z‑score (blue → positive, yellow → negative).
Observations worth flagging:
- EECs broadcast like crazy — they top almost every column.
- Paneth cells are selective but punchy (strong blue into ISCs, neutral elsewhere).
- Self‑loops (diagonal) aren’t always dominant; Goblets talk more to Paneth than to themselves.
How is the heatmap built? Stack all individual L‑R pairs, average their z‑scores within each sender/receiver combination, then visualise the resulting matrix.
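A minimal sketch of that aggregation, assuming the L‑R results arrive as (sender, receiver, z‑score) records. The record layout, the function name and the numbers are hypothetical; only the averaging step comes from the description above.

```python
import numpy as np

def interaction_matrix(records, cell_types):
    """Mean z-score per (sender, receiver); rows are senders, columns receivers.

    Combinations without any L-R pair are left as NaN.
    """
    index = {name: i for i, name in enumerate(cell_types)}
    n = len(cell_types)
    total = np.zeros((n, n))
    count = np.zeros((n, n))
    for sender, receiver, z in records:
        total[index[sender], index[receiver]] += z
        count[index[sender], index[receiver]] += 1
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)

# Made-up records, one per ligand-receptor pair.
cell_types = ["ISC", "Paneth", "Goblet", "EEC"]
records = [
    ("Paneth", "ISC", 2.0),
    ("Paneth", "ISC", 1.0),
    ("EEC", "Goblet", 0.5),
]
heat = interaction_matrix(records, cell_types)
```

The resulting array can be handed to any heatmap routine; the colour scale is a plotting choice and is not part of this sketch.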
5 Random walks through pseudotime
5.1 Diffusion pseudotime summary
- Construct the diffusion kernel $K=\exp(-D^2/\sigma)$ from the PC‑space distances $D$.
- Normalise rows → Markov matrix $P$.
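- Pseudotime of cell $i$ is its diffusive distance to the root cell $r$: \[t_i=\sqrt{\sum_{l\ge 2}\left(\frac{\lambda_l}{1-\lambda_l}\right)^{2}\bigl(\psi_l(i)-\psi_l(r)\bigr)^{2}}\]
where $\lambda_l,\psi_l$ are eigen‑pairs of $P$ and $u$ is the stationary vector, i.e. the (left) eigenvector belonging to the trivial eigenvalue $\lambda_1=1$, which the sum leaves out.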
That score is then binned into ten equal‑width intervals, Bin 0 to Bin 9.
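The recipe above can be sketched in NumPy under a few assumptions. The kernel follows the form given in the text, but the sketch diagonalises the symmetrically normalised kernel instead of the row‑normalised $P$ (the two share eigenvalues, and the symmetric version has orthonormal eigenvectors, which keeps the code short). The function names, the toy data and the value of sigma are made up for illustration.

```python
import numpy as np

def diffusion_pseudotime(Z, root, sigma):
    """Diffusion-style pseudotime of every row of Z relative to the root row."""
    sq = (Z ** 2).sum(axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    K = np.exp(-D2 / sigma)                    # kernel K = exp(-D^2 / sigma)
    deg = K.sum(axis=1)
    # Assumption: symmetric normalisation, same eigenvalues as row-normalised P.
    S = K / np.sqrt(np.outer(deg, deg))
    lam, V = np.linalg.eigh(S)                 # eigenvalues in ascending order
    lam, V = lam[::-1][1:], V[:, ::-1][:, 1:]  # drop the trivial eigenvalue 1
    weight = lam / (1.0 - lam)                 # sums the walk over all step counts
    diff = V - V[root]
    return np.sqrt(((weight * diff) ** 2).sum(axis=1))

def equal_width_bins(t, n_bins=10):
    """Assign each pseudotime value to one of n_bins equal-width intervals."""
    scaled = (t - t.min()) / (t.max() - t.min())
    return np.minimum((scaled * n_bins).astype(int), n_bins - 1)

# Toy trajectory: 20 "cells" laid out along a line, root at one end.
Z_line = np.arange(20, dtype=float)[:, None]
t = diffusion_pseudotime(Z_line, root=0, sigma=2.0)
bins = equal_width_bins(t)
```

The sketch assumes a connected graph; if the kernel splits the cells into disconnected pieces, several eigenvalues equal 1 and the weights blow up.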
5.2 Heatmaps per bin 
Each 3 × 3 panel speaks a short sentence:
“Who’s signalling right now?”
Early bins are almost silent. Starting at Bin 6, the Paneth → Paneth and Goblet → Goblet squares glow blue, hinting at lineage‑restricted chatter as cells mature.
5.3 Quantifying the trends 
For every sender/receiver combo we regress raw LIANA score against bin index and keep the ten steepest slopes.
Take‑away maths:
- Slope $>0$ ⇒ interaction ramps up along differentiation;
- Slope $<0$ ⇒ ramps down;
- $R^2$ values hover between 0.6 and 0.9, so the linear fit is not shabby.
Paneth → Paneth tops the chart (slope ≈ 8 × 10⁻³), reinforcing the idea of a self‑reinforcing niche late in crypt life.
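The trend analysis boils down to one least‑squares line per sender/receiver combo. Here is a minimal sketch with made‑up, noise‑free scores; the function names are hypothetical, and ranking by absolute slope is an assumption about what "steepest" means.

```python
import numpy as np

def trend(scores_by_bin):
    """Least-squares slope and R^2 of score against bin index 0, 1, 2, ..."""
    y = np.asarray(scores_by_bin, dtype=float)
    x = np.arange(len(y))
    slope, intercept = np.polyfit(x, y, 1)
    residual = y - (slope * x + intercept)
    ss_tot = ((y - y.mean()) ** 2).sum()
    if np.isclose(ss_tot, 0.0):
        return slope, float("nan")   # flat series: R^2 is undefined
    return slope, 1.0 - (residual ** 2).sum() / ss_tot

def steepest(pair_scores, top=10):
    """Rank sender/receiver pairs by absolute slope (assumed meaning of 'steepest')."""
    fits = {pair: trend(y) for pair, y in pair_scores.items()}
    return sorted(fits.items(), key=lambda kv: abs(kv[1][0]), reverse=True)[:top]

# Hypothetical per-bin scores for three pairs across Bin 0 to Bin 9.
bin_index = np.arange(10)
pair_scores = {
    ("Paneth", "Paneth"): 0.008 * bin_index,
    ("Goblet", "ISC"): 0.5 - 0.002 * bin_index,
    ("EEC", "Tuft"): 0.1 + 0.001 * bin_index,
}
ranking = steepest(pair_scores, top=2)
```

With real scores the fits will be noisy, so it is worth reading slope and $R^2$ together: a steep slope with a poor fit says less than a modest slope with a tight one.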
6 Wet‑lab validation
7 Conclusion — the maths is portable
- Graph theory (Leiden, k‑NN) finds the cast.
- Spectral kernels (Diffusion map) supply a clock.
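- Markov chains (random walks on the interaction graph) estimate fate probabilities. We skipped the CellRank details here, but it essentially comes down to computing $(I-Q)^{-1}R$, where $Q$ holds the transition probabilities among transient states and $R$ those from transient to terminal states.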
- Ensemble statistics (LIANA consensus) damp idiosyncratic bias.
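For the Markov‑chain bullet, the expression $(I-Q)^{-1}R$ is the textbook absorption‑probability formula for an absorbing Markov chain. The sketch below shows only that linear‑algebra step on a four‑state toy chain; it does not reproduce CellRank, and the state layout and function name are assumptions.

```python
import numpy as np

def absorption_probabilities(Q, R):
    """Solve (I - Q) B = R.

    Q: transient -> transient transition probabilities.
    R: transient -> terminal transition probabilities.
    B[i, j]: probability that a walk started in transient state i
             ends in terminal state j.
    """
    Q = np.asarray(Q, dtype=float)
    R = np.asarray(R, dtype=float)
    return np.linalg.solve(np.eye(Q.shape[0]) - Q, R)

# Toy chain: terminal A <- t1 <-> t2 -> terminal B, each step taken with
# probability 0.5 in either direction.
Q_toy = [[0.0, 0.5],
         [0.5, 0.0]]
R_toy = [[0.5, 0.0],
         [0.0, 0.5]]
fate = absorption_probabilities(Q_toy, R_toy)
```

Using a linear solve instead of forming the inverse explicitly is the usual numerical choice. In the toy chain the state next to terminal A ends up there with probability 2/3 and in B with probability 1/3, and each row sums to 1.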
Swap out mouse gut for tumour micro‑environment or developing retina and the pipeline survives intact.
For further details, see the attached PDF or contact the author for discussion.