infercnvpy.tl.infercnv


infercnvpy.tl.infercnv(adata, *, reference_key=None, reference_cat=None, reference=None, lfc_clip=3, window_size=100, step=10, dynamic_threshold=1.5, exclude_chromosomes=('chrX', 'chrY'), chunksize=5000, n_jobs=None, inplace=True, layer=None, key_added='cnv')

Infer Copy Number Variation (CNV) by averaging gene expression over genomic regions.

This method is heavily inspired by infercnv but is more computationally efficient. It is described in more detail on the "The inferCNV method" page.

There, you can also find instructions on how to prepare input data.

Parameters:
  • adata (AnnData) – annotated data matrix

  • reference_key (Optional[str] (default: None)) – Column name in adata.obs that contains tumor/normal annotations. If this is set to None, the average of all cells is used as reference.

  • reference_cat (Union[None, str, Sequence[str]] (default: None)) – One or multiple values in adata.obs[reference_key] that annotate normal cells.

  • reference (Optional[ndarray] (default: None)) – Directly supply an array of average normal gene expression. Overrides reference_key and reference_cat.

  • lfc_clip (float (default: 3)) – Clip log fold changes at this value

  • window_size (int (default: 100)) – Size of the running window (number of genes to include in the window).

  • step (int (default: 10)) – only compute every nth running window where n = step. Set to 1 to compute all windows.

  • dynamic_threshold (Optional[float] (default: 1.5)) – Values < dynamic_threshold * STDDEV will be set to 0, where STDDEV is the standard deviation of the smoothed gene expression. Set to None to disable this step.

  • exclude_chromosomes (Optional[Sequence[str]] (default: ('chrX', 'chrY'))) – List of chromosomes to exclude. The default is to exclude the sex chromosomes (gonosomes).

  • chunksize (int (default: 5000)) – Process the dataset in chunks of cells. This allows running infercnv on datasets with many cells, where the dense matrix would not fit into memory.

  • n_jobs (Optional[int] (default: None)) – Number of jobs for parallel processing. Default: use all cores. Data will be submitted to workers in chunks, see chunksize.

  • inplace (bool (default: True)) – If True, save the results in adata.obsm, otherwise return the CNV matrix.

  • layer (Optional[str] (default: None)) – Layer from adata to use. If None, use X.

  • key_added (str (default: 'cnv')) – Key under which the CNV matrix will be stored in adata if inplace=True. The matrix is stored in adata.obsm["X_{key_added}"] and additional information in adata.uns[key_added].
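
To make the interplay of reference, lfc_clip, window_size, step and dynamic_threshold more concrete, here is a minimal pure-Python sketch for a single cell. It is an illustration under stated assumptions, not the library's implementation: the helper names (reference_profile, log_fold_change, running_mean, apply_dynamic_threshold) are hypothetical, the input is assumed to be log-transformed so that subtracting the reference yields a log fold change, clipping is assumed to be symmetric around zero, and the threshold is assumed to compare absolute values against the standard deviation of this one toy cell.

```python
from statistics import pstdev


def reference_profile(cells):
    """Per-gene mean over the given cells (hypothetical helper)."""
    return [sum(column) / len(column) for column in zip(*cells)]


def log_fold_change(cell, reference, lfc_clip=3.0):
    """Subtract the reference and clip to [-lfc_clip, lfc_clip] (assumed symmetric)."""
    return [max(-lfc_clip, min(lfc_clip, x - r)) for x, r in zip(cell, reference)]


def running_mean(values, window_size=100, step=10):
    """Average over windows of `window_size` genes, keeping every `step`-th window."""
    n_windows = len(values) - window_size + 1
    return [
        sum(values[i : i + window_size]) / window_size
        for i in range(0, n_windows, step)
    ]


def apply_dynamic_threshold(smoothed, dynamic_threshold=1.5):
    """Zero out values whose magnitude is below dynamic_threshold * STDDEV."""
    if dynamic_threshold is None:
        return list(smoothed)  # step disabled
    cutoff = dynamic_threshold * pstdev(smoothed)
    return [0.0 if abs(v) < cutoff else v for v in smoothed]


# Toy data: 8 genes, already ordered by genomic position.
normal_cells = [[1.0] * 8, [1.0] * 8]
tumor_cell = [1.25, 1.25, 6.0, 3.0, 3.0, 3.0, 1.0, 1.0]

reference = reference_profile(normal_cells)
lfc = log_fold_change(tumor_cell, reference, lfc_clip=3.0)
smoothed = running_mean(lfc, window_size=2, step=2)
denoised = apply_dynamic_threshold(smoothed, dynamic_threshold=1.5)
```

In this toy run the third gene's fold change of 5.0 is clipped to 3.0, and the weak signal in the first window falls below the cutoff and is set to 0. With step=1 every window would be kept, giving 7 values instead of 4.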

Return type:

Optional[tuple[dict, csr_matrix]]

Returns:

Depending on inplace, either return the smoothed and denoised gene expression matrix sorted by genomic position, or add it to adata.
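
As a rough illustration of what "sorted by genomic position" and exclude_chromosomes mean, the sketch below filters and orders a small gene list. It is an assumption-laden toy, not the library's code: the tuple layout (name, chromosome, start), the helper names and the natural chromosome ordering (chr2 before chr10) are all hypothetical choices made for this example.

```python
def chromosome_sort_key(chromosome):
    """Order numbered chromosomes numerically, then any others alphabetically."""
    suffix = chromosome[3:] if chromosome.startswith("chr") else chromosome
    if suffix.isdigit():
        return (0, int(suffix), "")
    return (1, 0, suffix)


def order_genes(genes, exclude_chromosomes=("chrX", "chrY")):
    """Drop genes on excluded chromosomes, then sort by chromosome and start."""
    excluded = set(exclude_chromosomes or ())
    kept = [gene for gene in genes if gene[1] not in excluded]
    return sorted(kept, key=lambda gene: (chromosome_sort_key(gene[1]), gene[2]))


# Hypothetical genes as (name, chromosome, start position).
genes = [
    ("GENE_D", "chr10", 500),
    ("GENE_A", "chr2", 9000),
    ("GENE_X", "chrX", 100),
    ("GENE_B", "chr2", 100),
    ("GENE_C", "chr1", 7000),
]

ordered_names = [name for name, _, _ in order_genes(genes)]
all_names = [name for name, _, _ in order_genes(genes, exclude_chromosomes=None)]
```

Passing exclude_chromosomes=None in this sketch keeps the chrX gene, which then sorts after the numbered chromosomes.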