Infer CNV on lung cancer dataset#

import scanpy as sc
import infercnvpy as cnv
import matplotlib.pyplot as plt
import warnings

warnings.simplefilter("ignore")

sc.settings.set_figure_params(figsize=(5, 5))
sc.logging.print_header()
scanpy==1.9.5 anndata==0.9.2 umap==0.5.4 numpy==1.25.2 scipy==1.11.2 pandas==2.1.1 scikit-learn==1.3.1 statsmodels==0.14.0 igraph==0.10.8 pynndescent==0.5.10

Loading the example dataset#

adata = cnv.datasets.maynard2020_3k()
adata.var.loc[:, ["ensg", "chromosome", "start", "end"]].head()
ensg chromosome start end
symbol
AL645933.5 ENSG00000288587.1 chr6 31400702 31463705
AC010184.1 ENSG00000288585.1 chr3 141449745 141456434
AC023296.1 ENSG00000288580.1 chr8 2923568 2926689
AL117334.2 ENSG00000288577.1 chr20 3406380 3410036
AC107294.4 ENSG00000288576.1 chr3 184778723 184780720
ensg chromosome start end
symbol
AL645933.5 ENSG00000288587.1 chr6 31400702 31463705
AC010184.1 ENSG00000288585.1 chr3 141449745 141456434
AC023296.1 ENSG00000288580.1 chr8 2923568 2926689
AL117334.2 ENSG00000288577.1 chr20 3406380 3410036
AC107294.4 ENSG00000288576.1 chr3 184778723 184780720

Let’s first inspect the UMAP plot based on the transcriptomics data:

sc.pl.umap(adata, color="cell_type")
../_images/99c1336510d7d890c2df7ceb0f5c90039da1ea11788fbad8dbbdf5b096640e41.png

Running infercnv#

# We provide all immune cell types as "normal cells".
cnv.tl.infercnv(
    adata,
    reference_key="cell_type",
    reference_cat=[
        "B cell",
        "Macrophage",
        "Mast cell",
        "Monocyte",
        "NK cell",
        "Plasma cell",
        "T cell CD4",
        "T cell CD8",
        "T cell regulatory",
        "mDC",
        "pDC",
    ],
    window_size=250,
)

Now, we can plot smoothed gene expression by cell-type and chromosome. We can observe that the Epithelial cell cluster, which consists largely of tumor cells, appears to be subject to copy number variation.

cnv.pl.chromosome_heatmap(adata, groupby="cell_type")
../_images/a2be59abafa9a508bbcdd2b5ee71079bf0bb82674ae973a88224331d155dc7f3.png

Clustering by CNV profiles and identifying tumor cells#

cnv.tl.pca(adata)
cnv.pp.neighbors(adata)
cnv.tl.leiden(adata)

After running leiden clustering, we can plot the chromosome heatmap by CNV clusters. We can observe that, as opposted to the clusters at the bottom, the clusters at the top have essentially no differentially expressed genomic regions. The differentially expressed regions are likely due to copy number variation and the respective clusters likely represent tumor cells.

cnv.pl.chromosome_heatmap(adata, groupby="cnv_leiden", dendrogram=True)
WARNING: dendrogram data not found (using key=dendrogram_cnv_leiden). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.
WARNING: You’re trying to run this on 5314 dimensions of `.X`, if you really want this, set `use_rep='X'`.
         Falling back to preprocessing with `sc.pp.pca` and default params.
WARNING: Groups are not reordered because the `groupby` categories and the `var_group_labels` are different.
categories: 0, 1, 2, etc.
var_group_labels: chr1, chr2, chr3, etc.
../_images/d1f6329ffbafd7f8130c94740e67421fb56c883381f5c86ff3a776f6de08d4cf.png

UMAP plot of CNV profiles#

cnv.tl.umap(adata)
cnv.tl.cnv_score(adata)

The UMAP plot consists of a large blob of “normal” cells and several smaller clusters with distinct CNV profiles. Except for cluster “12”, which consists of ciliated cells, the isolated clusters are all epithelial cells. These are likely tumor cells and each cluster represents an individual sub-clone.

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(11, 11))
ax4.axis("off")
cnv.pl.umap(
    adata,
    color="cnv_leiden",
    legend_loc="on data",
    legend_fontoutline=2,
    ax=ax1,
    show=False,
)
cnv.pl.umap(adata, color="cnv_score", ax=ax2, show=False)
cnv.pl.umap(adata, color="cell_type", ax=ax3)
../_images/567cf9814077834ea129b2599e3793459ed9143b4643eb1ee79861d4b82abf20.png

We can also visualize the CNV score and clusters on the transcriptomics-based UMAP plot. Again, we can see that there are subclusters of epithelial cells that belong to a distinct CNV cluster, and that these clusters tend to have the highest CNV score.

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(12, 11), gridspec_kw={"wspace": 0.5})
ax4.axis("off")
sc.pl.umap(adata, color="cnv_leiden", ax=ax1, show=False)
sc.pl.umap(adata, color="cnv_score", ax=ax2, show=False)
sc.pl.umap(adata, color="cell_type", ax=ax3)
../_images/e9fc3a4dde38661f14d00165d3eabb758109469e6582cc143e6155f668c98f8a.png

Classifying tumor cells#

Based on these observations, we can now assign cell to either “tumor” or “normal”. To this end, we add a new column cnv_status to adata.obs.

adata.obs["cnv_status"] = "normal"
adata.obs.loc[adata.obs["cnv_leiden"].isin(["10", "16", "13", "8", "12", "17", "1", "14", "11"]), "cnv_status"] = (
    "tumor"
)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5), gridspec_kw={"wspace": 0.5})
cnv.pl.umap(adata, color="cnv_status", ax=ax1, show=False)
sc.pl.umap(adata, color="cnv_status", ax=ax2)
../_images/1f0df611039a84d96918cdc15af999c60b5b654446225f2181efefc20758192f.png

Now, we can plot the CNV heatmap for tumor and normal cells separately:

cnv.pl.chromosome_heatmap(adata[adata.obs["cnv_status"] == "tumor", :])
../_images/09c074a362c63bd9643d6605747a402860128548954dcea443506b13cbeb98aa.png
cnv.pl.chromosome_heatmap(adata[adata.obs["cnv_status"] == "normal", :])
../_images/f91383afaa404ac3560411016898ac652b03460ace4b91c9bd2cb36a1350ac73.png