Graphical models
1 - Gene regulatory networks
Classic papers
J Zhu et al. An integrative genomics approach to the reconstruction of gene networks in segregating populations. Cytogenet Genome Res 105:363–374 (2004).
J Faith et al. Large-Scale Mapping and Validation of E. coli Transcriptional Regulation from a Compendium of Expression Profiles. PLOS Biol 5:e8 (2007).
V Huynh-Thu et al. Inferring Regulatory Networks from Expression Data Using Tree-Based Methods. PLOS ONE 5:e12776 (2010).
See also a recent perspective on network inference in Nature: Smart software untangles gene regulation in cells
An integrative genomics approach to the reconstruction of gene networks in segregating populations
Figure 1
Hierarchical clustering of the data set in the gene expression and eQTL dimensions
Figure obtained from full text on EuropePMC.
Figure 3
Sub-networks associated with Hsd11b1
Figure obtained from full text on EuropePMC.
Large-Scale Mapping and Validation of E. coli Transcriptional Regulation from a Compendium of Expression Profiles
Software
Figure 1
Overview of the approach
Figure obtained from full text on EuropePMC.
Figure 2
The CLR algorithm
Figure obtained from full text on EuropePMC.
Figure 5
Experimental Validation of Inferred Regulatory Interactions
Figure obtained from full text on EuropePMC.
Inferring Regulatory Networks from Expression Data Using Tree-Based Methods
Software
Figure 1
GENIE3 procedure
Figure obtained from full text on EuropePMC.
Figure 4
Precision-Recall curves for the E. coli network
Figure obtained from full text on EuropePMC.
2 - Bayesian networks
Reference
Friedman, Inferring Cellular Networks Using Probabilistic Graphical Models
Christopher Bishop. Pattern Recognition and Machine Learning (2006). Chapter 8
Inferring Cellular Networks Using Probabilistic Graphical Models
Figure 1
Bayesian networks vs Markov networks
Figure obtained from full text on EuropePMC.
Figure 3
Different regulatory network architectures
Figure obtained from full text on EuropePMC.
A crash course in Bayesian networks
In Bayesian networks, the joint distribution over a set $\{X_1,\dots,X_p\}$ of random variables is represented by:
- a directed acyclic graph (DAG) $\mathcal{G}$ with the variables as vertices,
- a set of conditional probability distributions, $$\begin{aligned} P\bigl(X_i \mid \{X_j : j\in \mathrm{Pa}_i\}\bigr), \end{aligned}$$ where $\mathrm{Pa}_i$ is the set of parents of vertex $i$ in $\mathcal{G}$,
such that $$\begin{aligned} P(X_1,\dots,X_p) = \prod_{i=1}^p P\bigl(X_i \mid \{X_j : j\in \mathrm{Pa}_i\}\bigr). \end{aligned}$$
In GRN reconstruction:
- $X_i$ represents the expression level of gene $i$,
- $P(X_1,\dots,X_p)$ represents the joint distribution from which (independent) experimental samples are drawn,
- $\mathcal{G}$ represents the unknown GRN.
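As a concrete (purely hypothetical) instance of this factorization, consider a three-gene chain $X_1 \to X_2 \to X_3$ with binary on/off expression states; the joint distribution is the product of one conditional distribution per vertex:

```python
# Hypothetical chain DAG X1 -> X2 -> X3 with binary states (0 = off, 1 = on).
# The joint factorizes as P(X1) * P(X2 | X1) * P(X3 | X2).
# All probability values below are illustrative, not taken from the papers.

p_x1 = {0: 0.6, 1: 0.4}                    # P(X1); X1 has no parents
p_x2_given_x1 = {0: {0: 0.9, 1: 0.1},      # P(X2 | X1); Pa_2 = {1}
                 1: {0: 0.2, 1: 0.8}}
p_x3_given_x2 = {0: {0: 0.7, 1: 0.3},      # P(X3 | X2); Pa_3 = {2}
                 1: {0: 0.1, 1: 0.9}}

def joint(x1, x2, x3):
    """Joint probability from the DAG factorization."""
    return p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x2[x2][x3]

# Sanity check: the factorized joint is a valid distribution, i.e. it sums
# to 1 over all 2^3 states (up to floating-point rounding).
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)
```

The same factorization holds for continuous expression levels; only the form of the conditional distributions changes.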
Assume we have a set of $N$ observations $x_1,\dots,x_N\in \mathbb{R}^p$ of $p$ variables, collected in the $N\times p$ matrix $\mathbf{X}$.
Assume we know $\mathcal{G}$ and the conditional distributions. Then the likelihood of observing the data is:
$$\begin{aligned} P(\mathbf{X}\mid \mathcal{G}) &= \prod_{k=1}^N P(x_{k1},\dots,x_{kp}\mid \mathcal{G})\\ &= \prod_{i=1}^p \prod_{k=1}^N P\bigl(x_{ki} \mid \{x_{kj} : j\in \mathrm{Pa}_i\}\bigr) \end{aligned}$$
We can now use an iterative algorithm to optimize $\mathcal{G}$ and the conditional distributions:
- Start with a random graph $\mathcal{G}$.
- Given $\mathcal{G}$, the likelihood decomposes in a product of independent likelihoods, one for each gene, and the conditional distributions can be optimized by standard regression analysis.
- In subsequent iterations, randomly add, delete, or reverse edges in $\mathcal{G}$, keeping a change only if the score improves. (In practice a penalized score such as BIC, or the posterior score below, is used rather than the raw likelihood, since the maximum likelihood never decreases when edges are added.)
A more formal approach to optimizing $\mathcal{G}$ uses Bayes’ theorem: $$\begin{aligned} P(\mathcal{G}\mid \mathbf{X}) = \frac{P(\mathbf{X}\mid \mathcal{G}) P(\mathcal{G})}{P(\mathbf{X})} \end{aligned}$$
- $P(\mathcal{G})$ represents the prior distribution: even without seeing any data, not all graphs need to be equally likely a priori.
- $P(\mathbf{X})$ represents the marginal likelihood (evidence): $P(\mathbf{X}) = \sum_{\mathcal{G}'} P(\mathbf{X}\mid \mathcal{G}') P(\mathcal{G}')$. It does not depend on $\mathcal{G}$ and can be ignored when comparing graphs.
We can use $P(\mathcal{G})$ to encode evidence for causal interactions from integrating genomics and transcriptomics data. This is the main idea from Zhu et al. (2004).
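A sketch of why such a prior helps: for linear-Gaussian models, Markov-equivalent graphs (e.g. a chain and its reversal) have identical maximum likelihood, so expression data alone cannot orient those edges, whereas a genetics-informed prior can. The per-edge prior weights below are purely illustrative:

```python
import math

# Hypothetical prior encoding genetic evidence: prior_weight[(i, j)] is an
# assumed prior probability that edge i -> j exists. Edges supported by
# genetics (e.g. an eQTL for gene j at gene i's locus) get a higher weight.
prior_weight = {(0, 1): 0.8, (1, 0): 0.1, (1, 2): 0.6}
default_w = 0.05   # prior probability for edges with no genetic support

def log_prior(edges, n_genes):
    """log P(G) under an independent Bernoulli prior over directed edges."""
    lp = 0.0
    for i in range(n_genes):
        for j in range(n_genes):
            if i == j:
                continue
            w = prior_weight.get((i, j), default_w)
            lp += math.log(w) if (i, j) in edges else math.log(1 - w)
    return lp

# Since log P(G | X) = log P(X | G) + log P(G) - log P(X), graph search
# maximizes log P(X | G) + log P(G). Two Markov-equivalent graphs with
# equal likelihood are then separated by the prior alone:
g_supported = {(0, 1), (1, 2)}   # orientations favoured by the genetic prior
g_reversed = {(1, 0), (2, 1)}    # same skeleton, opposite directions
print(log_prior(g_supported, 3), log_prior(g_reversed, 3))
```

Here the prior assigns a much higher score to the causal orientations supported by the genetic evidence, breaking the tie that the likelihood cannot.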
Assignment
3 - Tutorial
Tutorials are available as Pluto notebooks. To run the reactive notebooks, first follow the instructions in the “Code” section of the repository’s readme file, then follow the instructions at the bottom of the Pluto homepage.
- E. coli GRN reconstruction: static html file or reactive notebook.