infercnvpy.tl.copykat

infercnvpy.tl.copykat(adata, gene_ids='S', organism='human', segmentation_cut=0.1, distance='euclidean', s_name='copykat_result', min_genes_chr=5, key_added='cnv', inplace=True, layer=None, n_jobs=None, norm_cell_names='', cell_line='no', window_size=25)

Inference of genomic copy number and subclonal structure.

Runs CopyKAT (Copynumber Karyotyping of Tumors) [RSY+21], which uses integrative Bayesian approaches to identify genome-wide aneuploidy at 5 Mb resolution in single cells. This separates tumor cells from normal cells and distinguishes tumor subclones in high-throughput scRNA-seq data.

Note on input data from the original authors:

The matrix values are often the count of unique molecular identifier (UMI) from nowadays high throughput single cell RNAseq data. The early generation of scRNAseq data may be summarized as TPM values or total read counts, which should also work.

This means that, unlike for infercnvpy.tl.infercnv(), the input data should not be log-transformed.

CopyKAT also does NOT require running infercnvpy.io.genomic_position_from_gtf(); it infers the genomic positions from the gene symbols in adata.var_names.
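The input-data note above can be turned into a quick sanity check before calling the function. The helper below is a hypothetical sketch, not part of infercnvpy: it assumes a dense NumPy array (for example a layer converted with `.toarray()` if it is sparse) and treats "all sampled values are non-negative whole numbers" as a sign of raw UMI or read counts. TPM input is also acceptable to CopyKAT according to the authors but will not pass this check, so treat a `False` result as a prompt to inspect the data rather than as an error.

```python
import numpy as np


def looks_like_raw_counts(matrix, n_check=10_000):
    """Heuristic (hypothetical helper): True if the first `n_check` values
    are non-negative whole numbers, as expected for raw UMI/read counts."""
    values = np.asarray(matrix, dtype=float).ravel()[:n_check]
    if values.size == 0:
        return False
    non_negative = bool(np.all(values >= 0))
    whole_numbers = bool(np.all(values == np.round(values)))
    return non_negative and whole_numbers


# Toy data: a small count matrix and its log1p-transformed version.
counts = np.array([[0, 3, 1], [2, 0, 5]])
logged = np.log1p(counts)

print(looks_like_raw_counts(counts))  # raw counts pass the check
print(looks_like_raw_counts(logged))  # log-transformed values do not
```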

You can find more info on GitHub: navinlabcode/copykat

Parameters:
  • adata (AnnData) – annotated data matrix

  • key_added (str (default: 'cnv')) – Key under which the CopyKAT scores will be stored in adata.obsm and adata.uns.

  • inplace (bool (default: True)) – If True, store the result in adata, otherwise return it.

  • layer (Optional[str] (default: None)) – AnnData layer to use for running CopyKAT.

  • gene_ids (str (default: 'S')) – Gene ID type: symbol (“S”) or Ensembl (“E”).

  • segmentation_cut (float (default: 0.1)) – Segmentation parameter between 0 and 1; larger values give looser criteria.

  • distance (str (default: 'euclidean')) – Distance method: “euclidean”, or a correlation-converted distance, “pearson” or “spearman”.

  • s_name (str (default: 'copykat_result')) – Sample (output file) name.

  • min_genes_chr (int (default: 5)) – Minimum number of genes per chromosome, used for cell filtering.

  • norm_cell_names (str (default: '')) – Cell barcodes (from adata.obs.index) that indicate known normal cells.

  • n_jobs (Optional[int] (default: None)) – Number of cores to use for the CopyKAT analysis. By default, uses all cores available on the system. Multithreading does not work on Windows, where this value is ignored.

  • organism (str (default: 'human')) – Organism the scRNA-seq data come from, which selects the method for calculating copy numbers: “human” or “mouse”.

  • cell_line (default: 'no') – If the data are from a pure cell line (i.e. not a mixture of tumor and normal cells), set to “yes” to use a synthetic baseline.

  • window_size (default: 25) – Minimal window size for segmentation.

Return type:

(DataFrame, Series)

Returns:

Depending on the value of inplace, returns either None or a tuple (CNV matrix, CopyKAT prediction).
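As an illustration of the return behaviour, the sketch below wraps the call with inplace=False and unpacks the documented tuple. The wrapper `run_copykat` and its `copykat_fn` argument are hypothetical and exist only so the example can be exercised without a full installation. The real call assumes that infercnvpy and its CopyKAT dependencies are available in the environment. The two validation checks only restate the allowed values listed in the parameter descriptions above.

```python
def run_copykat(adata, layer=None, distance="euclidean",
                segmentation_cut=0.1, copykat_fn=None):
    """Hypothetical wrapper: check two documented options, then run CopyKAT
    with inplace=False so that `adata` itself is not modified."""
    allowed_distances = {"euclidean", "pearson", "spearman"}
    if distance not in allowed_distances:
        raise ValueError(f"distance must be one of {sorted(allowed_distances)}")
    if not 0 <= segmentation_cut <= 1:
        raise ValueError("segmentation_cut must lie between 0 and 1")

    if copykat_fn is None:
        # Imported lazily: assumes infercnvpy and its CopyKAT
        # dependencies are installed.
        import infercnvpy as cnv
        copykat_fn = cnv.tl.copykat

    # With inplace=False the documented return value is a tuple
    # (CNV matrix, CopyKAT prediction).
    cnv_matrix, prediction = copykat_fn(
        adata,
        layer=layer,
        distance=distance,
        segmentation_cut=segmentation_cut,
        inplace=False,
    )
    return cnv_matrix, prediction
```

A call such as `cnv_matrix, prediction = run_copykat(adata, layer="counts")` would then use a raw-count layer; the layer name "counts" is an assumption about how the AnnData object was prepared.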