Reproduce the heatmap from inferCNV


This document demonstrates how to reproduce the example heatmap from the original R inferCNV implementation. It is based on a small, 183-cell example dataset of malignant and non-malignant cells from oligodendroglioma, derived from Tirosh et al. (2016).

import infercnvpy as cnv
import scanpy as sc

import warnings

warnings.simplefilter("ignore")

Prepare and inspect dataset

The example dataset is available in the datasets module. It is already TPM-normalized, but not log-transformed.

adata = cnv.datasets.oligodendroglioma()
sc.pp.log1p(adata)
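
As a rough illustration of what the log transform does to TPM values, here is a minimal NumPy sketch. The input values are made up for illustration and are not taken from the dataset; the sketch only assumes that the transform is the natural logarithm of (1 + x), which is the default behavior of sc.pp.log1p.

```python
import numpy as np

# Hypothetical TPM values for three genes in one cell (illustrative only).
tpm = np.array([0.0, 9.0, 999.0])

# log1p computes the natural logarithm of (1 + x), so zeros stay zero
# and large values are compressed.
log_tpm = np.log1p(tpm)

# expm1 inverts the transform and recovers the original values.
recovered = np.expm1(log_tpm)

print(log_tpm)
print(recovered)
```

Working in log space matters for the later steps: a difference between two log-transformed values corresponds to a log fold change.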

It also already has the genomic positions annotated in adata.var:

adata.var.head()
            chromosome    start      end     n_counts  n_cells
gene_symbol
WASH7P            chr1    14363    29806   534.271301      116
LINC00115         chr1   761586   762902   203.649094       18
NOC2L             chr1   879584   894689   880.449707       63
SDF4              chr1  1152288  1167411  1013.251404       70
UBE2J2            chr1  1189289  1209265   411.488953       33

The dataset contains four types of malignant cells and two clusters of non-malignant cells.

sc.tl.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.pl.umap(adata, color="cell_type")
[Figure: UMAP embedding of the cells, colored by cell_type]

Run infercnvpy

In this case we know which cells are non-malignant. For best results, it is recommended to use the non-malignant cells as a background. We can provide this information using reference_key and reference_cat.
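
To make the idea of a reference background more concrete, here is a simplified NumPy sketch of how expression can be expressed relative to one or more reference groups. It is an illustration under stated assumptions, not infercnvpy's implementation: the function name subtract_reference and the toy data are hypothetical, and the rule used (values within the range spanned by the per-group reference means count as no change, values outside are measured from the nearest bound) is one reasonable way to handle several reference categories.

```python
import numpy as np


def subtract_reference(expr, labels, reference_cats):
    """Sketch: log fold change of each cell relative to reference groups.

    expr is a (cells x genes) array of log-transformed expression,
    labels holds one group label per cell, and reference_cats lists
    the labels to treat as the non-malignant background.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    # Mean log-expression per gene for each reference category.
    ref_means = np.vstack(
        [expr[labels == cat].mean(axis=0) for cat in reference_cats]
    )
    lower = ref_means.min(axis=0)
    upper = ref_means.max(axis=0)
    # Inside the reference range: no change (0).
    # Outside: distance from the nearest bound.
    return np.where(
        expr > upper, expr - upper, np.where(expr < lower, expr - lower, 0.0)
    )


# Toy data: two reference cells (one per category) and one tumor cell.
toy_expr = np.array([[1.0, 2.0], [3.0, 2.0], [5.0, 0.0]])
toy_labels = ["ref_a", "ref_b", "tumor"]
toy_lfc = subtract_reference(toy_expr, toy_labels, ["ref_a", "ref_b"])
print(toy_lfc)
```

In the toy example the two reference cells end up with zeros everywhere, while the tumor cell shows a gain for the first gene and a loss for the second.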

In order to reproduce the results as exactly as possible, we use a window_size of 100 and a step of 1.

%%time
cnv.tl.infercnv(
    adata,
    reference_key="cell_type",
    reference_cat=["Oligodendrocytes (non-malignant)", "Microglia/Macrophage"],
    window_size=100,
    step=1,
)
CPU times: user 85 ms, sys: 169 ms, total: 254 ms
Wall time: 462 ms
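
The roles of window_size and step can be sketched with a plain running mean over genes ordered by genomic position. This is only an illustration of the two parameters, assuming a simple unweighted average; the helper running_mean and the toy values are hypothetical and do not reproduce infercnvpy's internals.

```python
import numpy as np


def running_mean(values, window_size, step):
    """Average values over sliding windows of window_size genes,
    advancing the window start by step genes each time."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window_size) / window_size
    # mode="valid" keeps only windows that fit entirely inside the array.
    smoothed = np.convolve(values, kernel, mode="valid")
    # Keep every step-th window.
    return smoothed[::step]


# Toy per-gene values along one chromosome (illustrative only).
toy_values = [0.0, 0.0, 3.0, 3.0, 3.0, 0.0]

# step=1 evaluates every possible window; a larger step skips windows.
dense = running_mean(toy_values, window_size=3, step=1)
sparse = running_mean(toy_values, window_size=3, step=2)
print(dense)
print(sparse)
```

With a step of 1, every window position is evaluated, which gives the highest resolution at the cost of more computation; larger steps trade resolution for speed.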
%%time
cnv.pl.chromosome_heatmap(adata, groupby="cell_type", dendrogram=True)
WARNING: dendrogram data not found (using key=dendrogram_cell_type). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.
WARNING: You’re trying to run this on 9013 dimensions of `.X`, if you really want this, set `use_rep='X'`.
         Falling back to preprocessing with `sc.pp.pca` and default params.
WARNING: Groups are not reordered because the `groupby` categories and the `var_group_labels` are different.
categories: Microglia/Macrophage, Oligodendrocytes (non-malignant), malignant_93, etc.
var_group_labels: chr1, chr2, chr3, etc.
CPU times: user 3.1 s, sys: 6.1 s, total: 9.2 s
Wall time: 1.52 s
[Figure: chromosome heatmap of the inferred CNV profiles, grouped by cell_type, with dendrogram]

Note that running the same analysis in R (inferCNV v1.6.0 from Bioconductor) takes about 1:30 min.