infercnvpy.tl.copykat
- infercnvpy.tl.copykat(adata, gene_ids='S', organism='human', segmentation_cut=0.1, distance='euclidean', s_name='copykat_result', min_genes_chr=5, key_added='cnv', inplace=True, layer=None, n_jobs=None, norm_cell_names='', cell_line='no', window_size=25)
Inference of genomic copy number and subclonal structure.
Runs CopyKAT (Copynumber Karyotyping of Tumors) [RSY+21] based on integrative Bayesian approaches to identify genome-wide aneuploidy at 5MB resolution in single cells to separate tumor cells from normal cells, and tumor subclones using high-throughput sc-RNAseq data.
Note on input data from the original authors:
The matrix values are often the count of unique molecular identifier (UMI) from nowadays high throughput single cell RNAseq data. The early generation of scRNAseq data may be summarized as TPM values or total read counts, which should also work.
This means that, unlike for infercnvpy.tl.infercnv(), the input data should not be log-transformed. CopyKAT also does NOT require running infercnvpy.io.genomic_position_from_gtf(); it infers the genomic position from the gene symbols in adata.var_names.
You can find more info on GitHub: navinlabcode/copykat
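Because the input must not be log-transformed, a quick sanity check before calling the function can catch a common mistake. The following is a minimal sketch, not part of infercnvpy: the helper names `looks_log_transformed` and `run_copykat_checked` and the threshold of 30 are assumptions made for illustration, and the check is only a heuristic (small non-integer values suggest a log scale, while UMI counts are integers and TPM values can be large).

```python
import numpy as np


def looks_log_transformed(X, max_threshold=30.0):
    """Heuristic guess: True if X looks log-transformed rather than counts/TPM.

    Assumption (not part of infercnvpy): log-transformed values are usually
    non-integer and small, UMI counts are integers, TPM values can be large.
    """
    values = np.asarray(X, dtype=float)
    if values.size == 0:
        return False
    has_fractions = not np.allclose(values, np.round(values))
    return bool(has_fractions and values.max() < max_threshold)


def run_copykat_checked(adata, **kwargs):
    """Hypothetical wrapper: reject obviously log-transformed input, then run copykat."""
    # Imported lazily so the heuristic above can be used without infercnvpy installed.
    import infercnvpy as cnv

    # Check the same matrix that copykat will read (a layer, if one is requested).
    layer = kwargs.get("layer")
    X = adata.layers[layer] if layer is not None else adata.X
    # Sparse matrices expose .toarray(); dense arrays are used as-is.
    X = X.toarray() if hasattr(X, "toarray") else X
    if looks_log_transformed(X):
        raise ValueError("Input looks log-transformed; CopyKAT expects counts or TPM.")
    return cnv.tl.copykat(adata, **kwargs)
```

A low-depth TPM matrix could be flagged incorrectly, so treat a failure as a prompt to inspect the data rather than as proof.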
- Parameters:
adata (AnnData) – Annotated data matrix.
key_added (str (default: 'cnv')) – Key under which the CopyKAT scores will be stored in adata.obsm and adata.uns.
inplace (bool (default: True)) – If True, store the result in adata, otherwise return it.
layer (Optional[str] (default: None)) – AnnData layer to use for running CopyKAT.
gene_ids (str (default: 'S')) – Gene ID type: Symbol (“S”) or Ensembl (“E”).
segmentation_cut (float (default: 0.1)) – Segmentation parameter, from 0 to 1; larger values give looser criteria.
distance (str (default: 'euclidean')) – Distance method: “euclidean”, or a correlation-converted distance, “pearson” or “spearman”.
s_name (str (default: 'copykat_result')) – Sample (output file) name.
min_genes_chr (int (default: 5)) – Minimal number of genes per chromosome for cell filtering.
norm_cell_names (str (default: '')) – Cell barcodes (adata.obs.index) that indicate normal cells.
n_jobs (Optional[int] (default: None)) – Number of cores to use for the CopyKAT analysis. By default, uses all cores available on the system. Multithreading does not work on Windows, where this value is ignored.
organism (str (default: 'human')) – Organism the scRNA-seq data come from: “human” or “mouse”.
cell_line (default: 'no') – If the data are from a pure cell line (i.e. not a mixture of tumor and normal cells), set to “yes” to use a synthetic baseline.
window_size (default: 25) – Sets a minimal window size for segmentation.
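Several of these parameters accept only a small set of values, and a typo would otherwise surface only once the analysis is already running. As a minimal sketch, the hypothetical helper below (`validate_copykat_options` is not part of infercnvpy) checks the documented choices up front and returns the options as a dictionary that can be passed on as keyword arguments. It assumes the 0 to 1 range for segmentation_cut is inclusive.

```python
def validate_copykat_options(gene_ids="S", organism="human", segmentation_cut=0.1,
                             distance="euclidean", cell_line="no"):
    """Hypothetical pre-flight check of the documented choices for copykat options."""
    # Allowed values as listed in the parameter descriptions.
    choices = {
        "gene_ids": ({"S", "E"}, gene_ids),
        "organism": ({"human", "mouse"}, organism),
        "distance": ({"euclidean", "pearson", "spearman"}, distance),
        "cell_line": ({"yes", "no"}, cell_line),
    }
    errors = []
    for name, (allowed, value) in choices.items():
        if value not in allowed:
            errors.append(f"{name}={value!r} must be one of {sorted(allowed)}")
    # Assumption: the documented range "0 to 1" includes both endpoints.
    if not 0 <= segmentation_cut <= 1:
        errors.append(f"segmentation_cut={segmentation_cut!r} must be between 0 and 1")
    if errors:
        raise ValueError("; ".join(errors))
    return {
        "gene_ids": gene_ids,
        "organism": organism,
        "segmentation_cut": segmentation_cut,
        "distance": distance,
        "cell_line": cell_line,
    }
```

The returned dictionary could then be unpacked into the call, for example `infercnvpy.tl.copykat(adata, **validate_copykat_options(organism="mouse"))`.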
- Return type:
- Returns:
Depending on the value of inplace, either returns None or a tuple (CNV Matrix, CopyKat prediction).
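Since the return value is either None or a two-element tuple, calling code has to handle both cases. The sketch below shows one way to do so; `summarize_copykat_result` is a hypothetical helper, and it assumes the prediction element is an iterable of per-cell labels (the exact container type is not stated here).

```python
from collections import Counter


def summarize_copykat_result(result):
    """Hypothetical helper: count prediction labels, or return None for inplace=True."""
    if result is None:
        # inplace=True: results were written to adata, nothing was returned.
        return None
    # inplace=False: a tuple (CNV matrix, CopyKAT prediction).
    cnv_matrix, prediction = result
    # Assumption: iterating over `prediction` yields one label per cell.
    return Counter(prediction)
```

With `inplace=False`, a call such as `summarize_copykat_result(infercnvpy.tl.copykat(adata, inplace=False))` would give a per-label cell count.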