Infer CNV on lung cancer dataset#
import scanpy as sc
import infercnvpy as cnv
import matplotlib.pyplot as plt
import warnings
warnings.simplefilter("ignore")
sc.settings.set_figure_params(figsize=(5, 5))
sc.logging.print_header()
scanpy==1.9.5 anndata==0.9.2 umap==0.5.4 numpy==1.25.2 scipy==1.11.2 pandas==2.1.1 scikit-learn==1.3.1 statsmodels==0.14.0 igraph==0.10.8 pynndescent==0.5.10
Loading the example dataset#
adata = cnv.datasets.maynard2020_3k()
adata.var.loc[:, ["ensg", "chromosome", "start", "end"]].head()
ensg | chromosome | start | end | |
---|---|---|---|---|
symbol | ||||
AL645933.5 | ENSG00000288587.1 | chr6 | 31400702 | 31463705 |
AC010184.1 | ENSG00000288585.1 | chr3 | 141449745 | 141456434 |
AC023296.1 | ENSG00000288580.1 | chr8 | 2923568 | 2926689 |
AL117334.2 | ENSG00000288577.1 | chr20 | 3406380 | 3410036 |
AC107294.4 | ENSG00000288576.1 | chr3 | 184778723 | 184780720 |
ensg | chromosome | start | end | |
---|---|---|---|---|
symbol | ||||
AL645933.5 | ENSG00000288587.1 | chr6 | 31400702 | 31463705 |
AC010184.1 | ENSG00000288585.1 | chr3 | 141449745 | 141456434 |
AC023296.1 | ENSG00000288580.1 | chr8 | 2923568 | 2926689 |
AL117334.2 | ENSG00000288577.1 | chr20 | 3406380 | 3410036 |
AC107294.4 | ENSG00000288576.1 | chr3 | 184778723 | 184780720 |
Let’s first inspect the UMAP plot based on the transcriptomics data:
sc.pl.umap(adata, color="cell_type")
Running infercnv#
# We provide all immune cell types as "normal cells".
cnv.tl.infercnv(
adata,
reference_key="cell_type",
reference_cat=[
"B cell",
"Macrophage",
"Mast cell",
"Monocyte",
"NK cell",
"Plasma cell",
"T cell CD4",
"T cell CD8",
"T cell regulatory",
"mDC",
"pDC",
],
window_size=250,
)
Now, we can plot smoothed gene expression by cell-type and chromosome. We can observe that the Epithelial cell cluster, which consists largely of tumor cells, appears to be subject to copy number variation.
cnv.pl.chromosome_heatmap(adata, groupby="cell_type")
Clustering by CNV profiles and identifying tumor cells#
cnv.tl.pca(adata)
cnv.pp.neighbors(adata)
cnv.tl.leiden(adata)
After running leiden clustering, we can plot the chromosome heatmap by CNV clusters. We can observe that, as opposted to the clusters at the bottom, the clusters at the top have essentially no differentially expressed genomic regions. The differentially expressed regions are likely due to copy number variation and the respective clusters likely represent tumor cells.
cnv.pl.chromosome_heatmap(adata, groupby="cnv_leiden", dendrogram=True)
WARNING: dendrogram data not found (using key=dendrogram_cnv_leiden). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.
WARNING: You’re trying to run this on 5314 dimensions of `.X`, if you really want this, set `use_rep='X'`.
Falling back to preprocessing with `sc.pp.pca` and default params.
WARNING: Groups are not reordered because the `groupby` categories and the `var_group_labels` are different.
categories: 0, 1, 2, etc.
var_group_labels: chr1, chr2, chr3, etc.
UMAP plot of CNV profiles#
cnv.tl.umap(adata)
cnv.tl.cnv_score(adata)
The UMAP plot consists of a large blob of “normal” cells and several smaller clusters with distinct CNV profiles. Except for cluster “12”, which consists of ciliated cells, the isolated clusters are all epithelial cells. These are likely tumor cells and each cluster represents an individual sub-clone.
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(11, 11))
ax4.axis("off")
cnv.pl.umap(
adata,
color="cnv_leiden",
legend_loc="on data",
legend_fontoutline=2,
ax=ax1,
show=False,
)
cnv.pl.umap(adata, color="cnv_score", ax=ax2, show=False)
cnv.pl.umap(adata, color="cell_type", ax=ax3)
We can also visualize the CNV score and clusters on the transcriptomics-based UMAP plot. Again, we can see that there are subclusters of epithelial cells that belong to a distinct CNV cluster, and that these clusters tend to have the highest CNV score.
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(12, 11), gridspec_kw={"wspace": 0.5})
ax4.axis("off")
sc.pl.umap(adata, color="cnv_leiden", ax=ax1, show=False)
sc.pl.umap(adata, color="cnv_score", ax=ax2, show=False)
sc.pl.umap(adata, color="cell_type", ax=ax3)
Classifying tumor cells#
Based on these observations, we can now assign cell to either “tumor” or “normal”.
To this end, we add a new column cnv_status
to adata.obs
.
adata.obs["cnv_status"] = "normal"
adata.obs.loc[adata.obs["cnv_leiden"].isin(["10", "16", "13", "8", "12", "17", "1", "14", "11"]), "cnv_status"] = (
"tumor"
)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5), gridspec_kw={"wspace": 0.5})
cnv.pl.umap(adata, color="cnv_status", ax=ax1, show=False)
sc.pl.umap(adata, color="cnv_status", ax=ax2)
Now, we can plot the CNV heatmap for tumor and normal cells separately:
cnv.pl.chromosome_heatmap(adata[adata.obs["cnv_status"] == "tumor", :])
cnv.pl.chromosome_heatmap(adata[adata.obs["cnv_status"] == "normal", :])