infercnvpy.tl.infercnv
- infercnvpy.tl.infercnv(adata, *, reference_key=None, reference_cat=None, reference=None, lfc_clip=3, window_size=100, step=10, dynamic_threshold=1.5, exclude_chromosomes=('chrX', 'chrY'), chunksize=5000, n_jobs=None, inplace=True, layer=None, key_added='cnv', calculate_gene_values=False)
Infer Copy Number Variation (CNV) by averaging gene expression over genomic regions.
This method is heavily inspired by infercnv but is more computationally efficient. It is described in more detail on the inferCNV method page.
There, you can also find instructions on how to prepare input data.
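The core idea, averaging expression over runs of neighbouring genes, can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the helper names `log_fold_change` and `running_mean` are hypothetical, and the sketch assumes that expression values are already log-transformed and that genes are already sorted by genomic position.

```python
def log_fold_change(expr, reference):
    """Per-gene difference between one cell and a reference profile.

    Assumes both inputs are already log-transformed, so the difference
    is a log fold change (an assumption of this sketch).
    """
    return [e - r for e, r in zip(expr, reference)]


def running_mean(values, window_size):
    """Average every full window of `window_size` consecutive genes."""
    # Shrink the window if there are fewer genes than requested.
    window_size = min(window_size, len(values))
    return [
        sum(values[i:i + window_size]) / window_size
        for i in range(len(values) - window_size + 1)
    ]


# Six genes in genomic order (toy numbers); genes 3 to 5 are elevated.
reference = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
cell = [1.0, 1.0, 2.0, 2.0, 2.0, 1.0]

lfc = log_fold_change(cell, reference)
smoothed = running_mean(lfc, window_size=3)
```

In this toy example the window that covers exactly the three elevated genes has the highest average, which is the signal that is interpreted as a copy number gain.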
- Parameters:
adata (AnnData) – Annotated data matrix.
reference_key (Optional[str] (default: None)) – Column name in adata.obs that contains tumor/normal annotations. If this is set to None, the average of all cells is used as reference.
reference_cat (Union[None, str, Sequence[str]] (default: None)) – One or multiple values in adata.obs[reference_key] that annotate normal cells.
reference (Optional[ndarray] (default: None)) – Directly supply an array of average normal gene expression. Overrides reference_key and reference_cat.
lfc_clip (float (default: 3)) – Clip log fold changes at this value.
window_size (int (default: 100)) – Size of the running window (number of genes to include in the window).
step (int (default: 10)) – Only compute every nth running window, where n = step. Set to 1 to compute all windows.
dynamic_threshold (float | None (default: 1.5)) – Values < dynamic_threshold * STDDEV will be set to 0, where STDDEV is the standard deviation of the smoothed gene expression. Set to None to disable this step.
exclude_chromosomes (Sequence[str] | None (default: ('chrX', 'chrY'))) – List of chromosomes to exclude. The default is to exclude gonosomes (sex chromosomes).
chunksize (int (default: 5000)) – Process the dataset in chunks of cells. This makes it possible to run infercnv on datasets with many cells, where the dense matrix would not fit into memory.
n_jobs (Optional[int] (default: None)) – Number of jobs for parallel processing. Default: use all cores. Data will be submitted to workers in chunks, see chunksize.
inplace (bool (default: True)) – If True, save the results in adata.obsm, otherwise return the CNV matrix.
layer (Optional[str] (default: None)) – Layer from adata to use. If None, use X.
key_added (str (default: 'cnv')) – Key under which the CNV matrix will be stored in adata if inplace=True. The matrix is stored in adata.obsm["X_{key_added}"] and additional information in adata.uns[key_added].
calculate_gene_values (bool (default: False)) – If True, per-gene CNVs will be calculated and stored in adata.layers["gene_values_{key_added}"]. As many genes are included in each segment, the resulting per-gene value is an average over the genes in that segment. Additionally, not all genes will be included in the per-gene CNV, because the window size and step size are not always a multiple of the number of genes. Any genes not included in the per-gene CNV will be filled with NaN. Note that this significantly increases memory use and computation time; it is recommended to decrease chunksize to ~100 if this is set to True.
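To make lfc_clip, step and dynamic_threshold more concrete, the following sketch applies each of them to a small list of numbers. It is an illustration under stated assumptions, not the library's code: the helpers `clip_lfc`, `keep_every_nth` and `denoise` are hypothetical, the standard deviation is taken over a single list rather than the full matrix, and the threshold is compared against absolute values.

```python
from statistics import pstdev


def clip_lfc(values, lfc_clip=3.0):
    """Limit every log fold change to the range [-lfc_clip, lfc_clip]."""
    return [max(-lfc_clip, min(lfc_clip, v)) for v in values]


def keep_every_nth(windows, step=10):
    """Keep only every `step`-th running window; step=1 keeps all of them."""
    return windows[::step]


def denoise(values, dynamic_threshold=1.5):
    """Set values below dynamic_threshold * STDDEV to 0.

    Assumption of this sketch: the comparison uses absolute values,
    so that losses (negative values) are treated like gains.
    """
    if dynamic_threshold is None:
        return list(values)  # denoising disabled
    cutoff = dynamic_threshold * pstdev(values)
    return [v if abs(v) >= cutoff else 0.0 for v in values]


clipped = clip_lfc([5.0, -4.0, 0.5, 0.0])               # [3.0, -3.0, 0.5, 0.0]
thinned = keep_every_nth(list(range(25)), step=10)      # [0, 10, 20]

noisy = [0.1, -0.1, 0.0, 0.0, 3.0, -3.0, 0.0, 0.0]
denoised = denoise(noisy, dynamic_threshold=1.5)
unchanged = denoise(noisy, dynamic_threshold=None)
```

With these toy values the small fluctuations of ±0.1 fall below the cutoff and are zeroed, while the two strong signals survive.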
- Return type:
- Returns:
Depending on inplace, either return the smoothed and denoised gene expression matrix sorted by genomic position, or add it to adata.
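As a rough illustration of what "sorted by genomic position" and exclude_chromosomes mean for the gene order, the sketch below filters and orders a handful of toy gene records. The record layout (keys name, chromosome and start), the helper names and the natural-sort rule for chromosome names are assumptions made for this example; the library may order chromosomes differently.

```python
import re


def chromosome_key(name):
    """Natural sort key so that 'chr2' sorts before 'chr10'."""
    # "chr10" -> ["chr", 10, ""]; text and numbers always alternate.
    return [int(p) if p.isdigit() else p for p in re.split(r"(\d+)", name)]


def order_genes(genes, exclude_chromosomes=("chrX", "chrY")):
    """Drop excluded chromosomes, then sort by chromosome and start."""
    excluded = set(exclude_chromosomes or ())  # None means "exclude nothing"
    kept = [g for g in genes if g["chromosome"] not in excluded]
    return sorted(
        kept, key=lambda g: (chromosome_key(g["chromosome"]), g["start"])
    )


# Toy gene annotations in arbitrary order (hypothetical values).
genes = [
    {"name": "G1", "chromosome": "chr2", "start": 500},
    {"name": "G2", "chromosome": "chr1", "start": 900},
    {"name": "G3", "chromosome": "chr10", "start": 100},
    {"name": "G4", "chromosome": "chrX", "start": 50},
    {"name": "G5", "chromosome": "chr1", "start": 100},
]

ordered_names = [g["name"] for g in order_genes(genes)]
all_names = [g["name"] for g in order_genes(genes, exclude_chromosomes=None)]
```

Here G4 is dropped because it lies on chrX, and the remaining genes follow chr1, chr2, chr10 in order of their start coordinate.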