Graphical models

Gene regulatory networks. Bayesian networks. Other network inference methods.

1 - Gene regulatory networks

Gene regulatory networks.

An integrative genomics approach to the reconstruction of gene networks in segregating populations

Figures 1 and 3

Large-Scale Mapping and Validation of E. coli Transcriptional Regulation from a Compendium of Expression Profiles

Software

Figures 1, 2, and 5

Inferring Regulatory Networks from Expression Data Using Tree-Based Methods

Software

Genie3 (Python, Matlab, R)

Figures 1 and 4
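GENIE3 ranks candidate regulators of each target gene by the feature importances of a tree ensemble trained to predict that gene's expression from all other genes. Below is a minimal sketch of this idea using scikit-learn random forests on hypothetical toy data; all variable names and parameter values are illustrative, not GENIE3's own implementation.

```python
# GENIE3-style sketch (toy data): for each target gene, fit a random forest
# predicting its expression from all other genes, and use the resulting
# feature importances as putative regulatory edge weights.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_samples, n_genes = 50, 4
X = rng.normal(size=(n_samples, n_genes))
# Hypothetical ground truth: gene 0 regulates gene 3.
X[:, 3] = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=n_samples)

scores = np.zeros((n_genes, n_genes))  # scores[i, j]: evidence that gene i regulates gene j
for j in range(n_genes):
    inputs = [i for i in range(n_genes) if i != j]
    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    rf.fit(X[:, inputs], X[:, j])
    scores[inputs, j] = rf.feature_importances_

# The edge 0 -> 3 should receive a high weight.
print(scores[0, 3])
```

Thresholding or ranking the entries of `scores` then yields a directed network; the actual GENIE3 package (Python, Matlab, R) adds normalization and ranking details omitted here.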

2 - Bayesian networks

Bayesian networks.

Inferring Cellular Networks Using Probabilistic Graphical Models

Figures 1 and 3

A crash course in Bayesian networks

In Bayesian networks, the joint distribution over a set $\{X_1,\dots, X_p\}$ of random variables is represented by:

  • a directed acyclic graph (DAG) $\mathcal{G}$ with the variables as vertices,
  • a set of conditional probability distributions, $$\begin{aligned} P\bigl(X_i \mid \{X_j \mid j\in \mathrm{Pa}_i\}\bigr), \end{aligned}$$ where $\mathrm{Pa}_i$ is the set of parents of vertex $i$ in $\mathcal{G}$,

such that $$\begin{aligned} P(X_1,\dots,X_p) = \prod_{i=1}^p P\bigl(X_i \mid \{X_j \mid j\in \mathrm{Pa}_i\}\bigr). \end{aligned}$$
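To make the factorization concrete, consider a hypothetical three-gene chain $X_1 \to X_2 \to X_3$ with binary expression states (this toy example is not from the text):

```python
# Toy Bayesian network X1 -> X2 -> X3 with binary states: the joint
# distribution is the product of each variable's conditional given its parents.
p_x1 = {0: 0.6, 1: 0.4}                        # P(X1); X1 has no parents
p_x2_given_x1 = {0: {0: 0.9, 1: 0.1},          # P(X2 | X1)
                 1: {0: 0.2, 1: 0.8}}
p_x3_given_x2 = {0: {0: 0.7, 1: 0.3},          # P(X3 | X2)
                 1: {0: 0.1, 1: 0.9}}

def joint(x1, x2, x3):
    """Joint probability via the DAG factorization."""
    return p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x2[x2][x3]

# Sanity check: the factorized joint sums to 1 over all 2^3 states.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)
```

Specifying the full joint directly would need $2^3 - 1 = 7$ free parameters; the factorization needs only $1 + 2 + 2 = 5$, and the saving grows rapidly with the number of variables.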

In GRN reconstruction:

  • $X_i$ represents the expression level of gene $i$,
  • $P(X_1,\dots,X_p)$ represents the joint distribution from which (independent) experimental samples are drawn,
  • $\mathcal{G}$ represents the unknown GRN.

Assume we have a set of $N$ observations $x_1,\dots,x_N\in \mathbb{R}^p$ of $p$ variables, collected in the $N\times p$ matrix $\mathbf{X}$.

Assume we know $\mathcal{G}$ and the conditional distributions. Then the likelihood of observing the data is:

$$\begin{aligned} P(\mathbf{X}\mid \mathcal{G}) &= \prod_{k=1}^N P(X_{k1},\dots,X_{kp}\mid \mathcal{G})\\ &= \prod_{i=1}^p \prod_{k=1}^N P\bigl(X_{ki} \mid \{X_{kj} \mid j\in \mathrm{Pa}_i\}\bigr) \end{aligned}$$
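The per-gene decomposition can be checked numerically. The sketch below assumes a hypothetical two-gene network (gene 0 with no parents, gene 1 with parent $\{0\}$) and linear-Gaussian conditionals with known parameters; none of these specifics come from the text.

```python
# Log-likelihood of a data matrix under a known DAG decomposes into one term
# per gene, each involving only that gene and its parents.
import numpy as np

def gauss_logpdf(x, mu, sigma):
    # log density of N(mu, sigma^2), evaluated elementwise
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

rng = np.random.default_rng(1)
N = 200
x0 = rng.normal(size=N)                          # gene 0: no parents
x1 = 0.8 * x0 + rng.normal(scale=0.5, size=N)    # gene 1: parent is gene 0

ll_gene0 = gauss_logpdf(x0, 0.0, 1.0).sum()      # sum over samples of log P(X_k0)
ll_gene1 = gauss_logpdf(x1, 0.8 * x0, 0.5).sum() # sum over samples of log P(X_k1 | X_k0)
log_likelihood = ll_gene0 + ll_gene1             # product over genes -> sum of logs

# The correct parent set fits gene 1 better than ignoring its parent:
ll_gene1_noparent = gauss_logpdf(x1, x1.mean(), x1.std()).sum()
print(ll_gene1 > ll_gene1_noparent)
```

Because each term depends only on one gene and its parents, the conditional distributions can be fit gene by gene, which is what makes the regression step in the algorithm below tractable.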

We can now use an iterative algorithm to optimize $\mathcal{G}$ and the conditional distributions:

  • Start with a random graph $\mathcal{G}$.
  • Given $\mathcal{G}$, the likelihood decomposes into a product of independent likelihoods, one per gene, and the conditional distributions can be optimized by standard regression analysis.
  • In subsequent iterations, randomly add, delete, or reverse edges in $\mathcal{G}$, accepting a change only if it improves the likelihood.
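The steps above can be sketched as a greedy hill-climbing search. The sketch makes several simplifying assumptions not in the text: linear-Gaussian conditionals scored by BIC, add-edge moves only (the full algorithm also deletes and reverses edges), and toy data in which gene 0 drives gene 1.

```python
# Greedy structure search sketch: repeatedly add the edges that improve a
# per-gene BIC score, while keeping the graph acyclic.
import numpy as np

rng = np.random.default_rng(2)
N, p = 300, 3
X = rng.normal(size=(N, p))
X[:, 1] = 0.9 * X[:, 0] + rng.normal(scale=0.3, size=N)  # toy truth: 0 -> 1

def gene_score(j, parent_set):
    """BIC-style score of gene j given its parent set (least-squares fit)."""
    y = X[:, j]
    A = np.column_stack([X[:, list(parent_set)], np.ones(N)]) if parent_set else np.ones((N, 1))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / N
    loglik = -0.5 * N * (np.log(2 * np.pi * sigma2) + 1)
    return loglik - 0.5 * A.shape[1] * np.log(N)  # penalize parameters

def creates_cycle(parents, i, j):
    """Would adding edge i -> j close a directed cycle (i reachable from j)?"""
    stack, seen = [j], set()
    while stack:
        v = stack.pop()
        if v == i:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(k for k in range(p) if v in parents[k])
    return False

parents = {j: set() for j in range(p)}
improved = True
while improved:
    improved = False
    for i in range(p):
        for j in range(p):
            if i == j or i in parents[j] or creates_cycle(parents, i, j):
                continue
            if gene_score(j, parents[j] | {i}) > gene_score(j, parents[j]):
                parents[j].add(i)
                improved = True

print(parents[1])  # gene 0 should appear among the parents of gene 1
```

The BIC penalty plays the role of a complexity control; without it, adding any edge would never decrease the maximized likelihood and the search would always return a fully connected DAG.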

A more formal approach to optimizing $\mathcal{G}$ uses Bayes’ theorem: $$\begin{aligned} P(\mathcal{G}\mid \mathbf{X}) = \frac{P(\mathbf{X}\mid \mathcal{G}) P(\mathcal{G})}{P(\mathbf{X})} \end{aligned}$$

  • $P(\mathcal{G})$ represents the prior distribution: even without seeing any data, not all graphs need to be equally likely a priori.
  • $P(\mathbf{X})$ represents the marginal distribution: $P(\mathbf{X}) = \sum_{\mathcal{G}'} P(\mathbf{X}\mid \mathcal{G}') P(\mathcal{G}')$. It is independent of $\mathcal{G}$ and can be ignored.
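Since $P(\mathbf{X})$ drops out, candidate graphs can be compared by $\log P(\mathbf{X}\mid \mathcal{G}) + \log P(\mathcal{G})$. The sketch below assumes a hypothetical sparsity prior $P(\mathcal{G}) \propto e^{-\lambda\,(\text{number of edges})}$ and the same linear-Gaussian likelihood as above; both choices are illustrative.

```python
# Comparing two graphs by unnormalized log-posterior:
# log P(X | G) + log P(G), with a sparsity prior penalizing each edge.
import numpy as np

rng = np.random.default_rng(3)
N = 100
x0 = rng.normal(size=N)
x1 = 0.7 * x0 + rng.normal(scale=0.4, size=N)  # toy truth: edge x0 -> x1

def loglik_gene(y, A):
    """Maximized log-likelihood of one gene's linear-Gaussian conditional."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / len(y)
    return -0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1)

lam = 1.0  # prior strength: each edge costs lam in log-prior
ones = np.ones((N, 1))

# Graph A: no edges.  Graph B: single edge x0 -> x1.
score_empty = loglik_gene(x0, ones) + loglik_gene(x1, ones) - lam * 0
score_edge = loglik_gene(x0, ones) + loglik_gene(x1, np.column_stack([x0, ones])) - lam * 1

print(score_edge > score_empty)
```

Raising $\lambda$ makes the prior favor sparser graphs; in the Zhu et al. setting, the prior would instead be shaped by external evidence such as eQTL data rather than by a uniform edge penalty.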

We can use $P(\mathcal{G})$ to encode evidence for causal interactions obtained by integrating genomics and transcriptomics data. This is the main idea of Zhu et al. (2004).

Assignment

3 - Tutorial

Illustration of gene regulatory network reconstruction and evaluation using data from E. coli.

Tutorials are available as Pluto notebooks. To run the reactive notebooks, first follow the instructions in the “Code” section of the repository’s readme file, then follow the instructions at the bottom of the Pluto homepage.