Reproduce the heatmap from inferCNV#
This document demonstrates how to reproduce the example heatmap from the original R inferCNV implementation. It is based on a small, 183-cell example dataset of malignant and non-malignant cells from oligodendroglioma, derived from Tirosh et al. (2016).
import infercnvpy as cnv
import scanpy as sc
import warnings
warnings.simplefilter("ignore")
Prepare and inspect dataset#
The example dataset is available in the datasets module. It is already TPM-normalized, but not log-transformed.
adata = cnv.datasets.oligodendroglioma()
sc.pp.log1p(adata)
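As a small illustration of what this transformation does to the values, the sketch below applies log(1 + x) (the natural logarithm, which is the default for sc.pp.log1p) to a few made-up TPM values using only the standard library. The numbers are invented for illustration and are not taken from the dataset.

```python
import math

# Hypothetical TPM values for one gene across four cells (not from the dataset).
tpm_values = [0.0, 1.0, 99.0, 9999.0]

# log1p(x) = ln(1 + x): zeros stay at zero and large values are compressed.
log_values = [math.log1p(v) for v in tpm_values]

for tpm, logged in zip(tpm_values, log_values):
    print(f"{tpm:>8.1f} -> {logged:.3f}")
```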
It also already has the genomic positions annotated in adata.var:
adata.var.head()
gene_symbol | chromosome | start | end | n_counts | n_cells |
---|---|---|---|---|---|
WASH7P | chr1 | 14363 | 29806 | 534.271301 | 116 |
LINC00115 | chr1 | 761586 | 762902 | 203.649094 | 18 |
NOC2L | chr1 | 879584 | 894689 | 880.449707 | 63 |
SDF4 | chr1 | 1152288 | 1167411 | 1013.251404 | 70 |
UBE2J2 | chr1 | 1189289 | 1209265 | 411.488953 | 33 |
It contains four types of malignant cells, and two clusters of non-malignant cells.
sc.tl.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.pl.umap(adata, color="cell_type")
Run infercnvpy#
In this case we know which cells are non-malignant. For best results, it is recommended to use the non-malignant cells as a background. We can provide this information using reference_key (the column in adata.obs that holds the annotation) and reference_cat (the categories in that column to use as reference).
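To make the role of the reference concrete, here is a deliberately simplified sketch in plain Python: it computes the per-gene mean over the reference cells and subtracts it from every cell, so that the reference cells center around zero. The toy matrix, the labels, and the helper name subtract_reference are hypothetical, and infercnvpy's actual procedure may differ in its details (for example in how several reference categories are combined).

```python
from statistics import mean


def subtract_reference(expr, labels, reference_cat):
    """Subtract the per-gene mean of the reference cells from every cell.

    expr: list of rows (cells), each a list of log-expression values (genes).
    labels: one cell-type label per row.
    reference_cat: set of labels that count as non-malignant reference.
    """
    ref_rows = [row for row, lab in zip(expr, labels) if lab in reference_cat]
    if not ref_rows:
        raise ValueError("no reference cells found")
    # zip(*ref_rows) iterates over the genes (columns) of the reference cells.
    ref_mean = [mean(col) for col in zip(*ref_rows)]
    return [[v - m for v, m in zip(row, ref_mean)] for row in expr]


# Toy data: 4 cells x 3 genes (values are made up).
expr = [
    [1.0, 2.0, 3.0],  # reference cell
    [3.0, 2.0, 1.0],  # reference cell
    [4.0, 2.0, 0.0],  # malignant cell
    [2.0, 5.0, 2.0],  # malignant cell
]
labels = ["normal", "normal", "tumor", "tumor"]

centered = subtract_reference(expr, labels, reference_cat={"normal"})
print(centered[2])
```

After centering, values near zero suggest expression similar to the reference, while consistently positive or negative stretches along a chromosome hint at gains or losses.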
In order to reproduce the results as exactly as possible, we use a window_size of 100 and a step of 1.
%%time
cnv.tl.infercnv(
adata,
reference_key="cell_type",
reference_cat=["Oligodendrocytes (non-malignant)", "Microglia/Macrophage"],
window_size=100,
step=1,
)
CPU times: user 85 ms, sys: 169 ms, total: 254 ms
Wall time: 462 ms
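The effect of window_size and step can be pictured as a running average over genes ordered by genomic position. The sketch below is a simplified plain-Python illustration under that assumption, not infercnvpy's implementation; the helper name running_mean and the toy values are hypothetical.

```python
def running_mean(values, window_size, step):
    """Average `values` over windows of `window_size` items, advancing `step` items at a time."""
    if window_size < 1 or step < 1:
        raise ValueError("window_size and step must be positive")
    if len(values) < window_size:
        raise ValueError("need at least window_size values")
    return [
        sum(values[i : i + window_size]) / window_size
        for i in range(0, len(values) - window_size + 1, step)
    ]


# Toy per-gene values along one chromosome (made up); the bump mimics a gain.
values = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]

dense = running_mean(values, window_size=3, step=1)   # every possible window
sparse = running_mean(values, window_size=3, step=2)  # every second window

print(dense)
print(sparse)
```

In this sketch, a step of 1 evaluates every possible window and gives the densest profile, while a larger step skips windows and trades resolution for less computation.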
%%time
cnv.pl.chromosome_heatmap(adata, groupby="cell_type", dendrogram=True)
WARNING: dendrogram data not found (using key=dendrogram_cell_type). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.
WARNING: You’re trying to run this on 9013 dimensions of `.X`, if you really want this, set `use_rep='X'`.
Falling back to preprocessing with `sc.pp.pca` and default params.
WARNING: Groups are not reordered because the `groupby` categories and the `var_group_labels` are different.
categories: Microglia/Macrophage, Oligodendrocytes (non-malignant), malignant_93, etc.
var_group_labels: chr1, chr2, chr3, etc.
CPU times: user 3.1 s, sys: 6.1 s, total: 9.2 s
Wall time: 1.52 s
Note that running the same analysis in R (infercnv v1.6.0 from Bioconductor) takes about 1:30 min.