infercnvpy.tl.infercnv

infercnvpy.tl.infercnv(adata, *, reference_key=None, reference_cat=None, reference=None, lfc_clip=3, window_size=100, step=10, dynamic_threshold=1.5, exclude_chromosomes=('chrX', 'chrY'), chunksize=5000, n_jobs=None, inplace=True, layer=None, key_added='cnv', calculate_gene_values=False)

Infer Copy Number Variation (CNV) by averaging gene expression over genomic regions.

This method is heavily inspired by infercnv but is more computationally efficient. The method is described in more detail on The inferCNV method page.

There, you can also find instructions on how to prepare input data.
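
A minimal NumPy sketch of the first steps, building a reference and computing clipped log fold changes, may help make this concrete. It is an illustration under stated assumptions, not the infercnvpy implementation: it assumes a dense, log-transformed cells × genes matrix, and the helpers compute_reference and log_fold_change are hypothetical. Pooling several reference categories into a single mean is a simplification made only for this sketch.

```python
import numpy as np

def compute_reference(x, labels=None, reference_cat=None):
    """Hypothetical helper: mean expression of the cells annotated as normal.

    With no labels, the mean over all cells is used (as with reference_key=None).
    Several categories are pooled into one mean here, which is a simplification.
    """
    if labels is None:
        return x.mean(axis=0)
    cats = [reference_cat] if isinstance(reference_cat, str) else list(reference_cat)
    mask = np.isin(labels, cats)
    if not mask.any():
        raise ValueError("no cells match reference_cat")
    return x[mask].mean(axis=0)

def log_fold_change(x, reference, lfc_clip=3.0):
    """Hypothetical helper: per-cell, per-gene log fold change clipped at +/- lfc_clip.

    Assumes x is log-transformed, so a difference of logs is a log fold change.
    """
    return np.clip(x - reference, -lfc_clip, lfc_clip)

# Toy data: 4 cells x 3 genes on a log scale; the first two cells are annotated as normal.
x = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 1.0],
              [6.0, 1.0, 0.5],
              [1.5, 1.0, 1.0]])
labels = np.array(["normal", "normal", "tumor", "tumor"])
ref = compute_reference(x, labels, reference_cat="normal")
lfc = log_fold_change(x, ref, lfc_clip=3.0)
```

In the toy data, the third cell's 5-unit increase in the first gene is clipped to lfc_clip = 3.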

Parameters:

adata (AnnData) – annotated data matrix
reference_key (Optional[str] (default: None)) – Column name in adata.obs that contains tumor/normal annotations. If this is set to None, the average of all cells is used as reference.
reference_cat (Union[None, str, Sequence[str]] (default: None)) – One or multiple values in adata.obs[reference_key] that annotate normal cells.
reference (Optional[ndarray] (default: None)) – Directly supply an array of average normal gene expression. Overrides reference_key and reference_cat.
lfc_clip (float (default: 3)) – Clip log fold changes at this value.
window_size (int (default: 100)) – Size of the running window (number of genes to include in the window).
step (int (default: 10)) – Only compute every nth running window, where n = step. Set to 1 to compute all windows.
dynamic_threshold (float | None (default: 1.5)) – Values < dynamic_threshold * STDDEV will be set to 0, where STDDEV is the standard deviation of the smoothed gene expression. Set to None to disable this step.
exclude_chromosomes (Sequence[str] | None (default: ('chrX', 'chrY'))) – List of chromosomes to exclude. The default is to exclude the sex chromosomes (gonosomes).
chunksize (int (default: 5000)) – Process the dataset in chunks of cells. This allows running infercnv on datasets with many cells, where the dense matrix would not fit into memory.
n_jobs (Optional[int] (default: None)) – Number of jobs for parallel processing. Default: use all cores. Data will be submitted to workers in chunks, see chunksize.
inplace (bool (default: True)) – If True, save the results in adata.obsm, otherwise return the CNV matrix.
layer (Optional[str] (default: None)) – Layer from adata to use. If None, use X.
key_added (str (default: 'cnv')) – Key under which the CNV matrix will be stored in adata if inplace=True. The matrix is stored in adata.obsm["X_{key_added}"] and additional information in adata.uns[key_added].
calculate_gene_values (bool (default: False)) – If True, per-gene CNVs will be calculated and stored in adata.layers["gene_values_{key_added}"]. Because each segment includes many genes, the resulting per-gene value is an average over the genes in its segment. Additionally, not all genes are included in the per-gene CNV, because the windows defined by window_size and step do not always cover the genes exactly; any genes not included are filled with NaN. Note that this significantly increases memory usage and computation time; it is recommended to decrease chunksize to ~100 if this is set to True.
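
The per-gene averaging described for calculate_gene_values can be sketched as follows. This is a hypothetical illustration, not the library's code: it assumes that, on a single chromosome, a window of window_size consecutive genes starts every step genes and is kept only if it fits entirely, and that each gene receives the mean of all windows covering it. The helper names running_windows and per_gene_values are made up for this example.

```python
import numpy as np

def running_windows(n_genes, window_size, step):
    """(start, end) gene indices of each window, end exclusive; a window starts every `step` genes."""
    return [(s, s + window_size) for s in range(0, n_genes - window_size + 1, step)]

def per_gene_values(window_values, windows, n_genes):
    """Hypothetical helper: give each gene the mean of all windows that cover it.

    window_values: cells x n_windows array of smoothed values.
    Genes covered by no window are left as NaN.
    """
    n_cells = window_values.shape[0]
    total = np.zeros((n_cells, n_genes))
    count = np.zeros(n_genes)
    for j, (start, end) in enumerate(windows):
        total[:, start:end] += window_values[:, [j]]  # broadcast the window value over its genes
        count[start:end] += 1
    out = np.full((n_cells, n_genes), np.nan)
    covered = count > 0
    out[:, covered] = total[:, covered] / count[covered]
    return out

# 11 genes, windows of 4 genes every 3 genes: (0, 4), (3, 7), (6, 10); gene 10 is not covered.
windows = running_windows(n_genes=11, window_size=4, step=3)
window_values = np.array([[1.0, 3.0, 5.0]])  # one cell, one smoothed value per window
genes = per_gene_values(window_values, windows, n_genes=11)
```

Here gene 10 stays NaN, while a gene in two overlapping windows, such as gene 3, gets the mean of both window values.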
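
The idea behind chunksize and n_jobs, densifying only one block of cells at a time and handing the blocks to workers, can be sketched like this. It is an illustration built on assumptions: process_chunk and run_in_chunks are hypothetical, threads are used for simplicity (the library's parallel backend may differ), and the results are stacked densely, unlike the sparse matrix named in the return type.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def process_chunk(chunk, reference, lfc_clip=3.0):
    """Hypothetical per-chunk work: only this block of cells is held as a dense array."""
    return np.clip(chunk - reference, -lfc_clip, lfc_clip)

def run_in_chunks(x, reference, chunksize=5000, n_jobs=None, lfc_clip=3.0):
    """Split cells (rows) into chunks of `chunksize` and process them in parallel."""
    chunks = [x[i:i + chunksize] for i in range(0, x.shape[0], chunksize)]
    workers = n_jobs or os.cpu_count()  # n_jobs=None -> one worker per core
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in submission order, so rows stay aligned with cells
        results = list(pool.map(lambda c: process_chunk(c, reference, lfc_clip), chunks))
    return np.vstack(results)

x = np.arange(12, dtype=float).reshape(6, 2)  # 6 cells x 2 genes
reference = np.array([1.0, 1.0])
result = run_in_chunks(x, reference, chunksize=4, n_jobs=2)
```

Because the chunks are processed independently and reassembled in order, the result does not depend on chunksize; only peak memory does.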

Return type:

None | tuple[dict, csr_matrix, ndarray | None]

Returns:

Depending on inplace, either return the smoothed and denoised gene expression matrix sorted by genomic position, or add it to adata.
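
The smoothing and denoising that produce this matrix can be sketched in NumPy. This is a simplified illustration, not infercnvpy's implementation: it assumes a uniform running mean over the clipped log fold changes of one chromosome, with genes already sorted by position, and it treats the dynamic threshold as a cutoff on absolute values. smooth_windows and denoise are hypothetical names.

```python
import numpy as np

def smooth_windows(lfc, window_size=100, step=10):
    """Uniform running mean over `window_size` consecutive genes, keeping every `step`-th window.

    Assumes columns are the genes of one chromosome sorted by genomic position,
    and window_size <= number of genes.
    """
    kernel = np.ones(window_size) / window_size
    # mode="valid" keeps only windows that lie entirely within the gene range
    smoothed = np.array([np.convolve(row, kernel, mode="valid") for row in lfc])
    return smoothed[:, ::step]

def denoise(smoothed, dynamic_threshold=1.5):
    """Zero out values whose magnitude is below dynamic_threshold * std(smoothed)."""
    if dynamic_threshold is None:  # mirrors dynamic_threshold=None: skip this step
        return smoothed
    out = smoothed.copy()
    out[np.abs(out) < dynamic_threshold * np.std(smoothed)] = 0.0
    return out

# One cell, 8 genes: a gain in the second half of the chromosome.
lfc = np.array([[0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0]])
smoothed = smooth_windows(lfc, window_size=2, step=2)

# Denoising a toy matrix with small noise and two large deviations.
noisy = np.array([[0.1, -0.1, 2.0, -2.0]])
clean = denoise(noisy, dynamic_threshold=0.5)
```

With dynamic_threshold=0.5, the ±0.1 values fall below 0.5 × std and are set to 0, while the ±2 values are kept.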
