infercnvpy.datasets.maynard2020_3k

infercnvpy.datasets.maynard2020_3k()

Return the dataset from [MMR+20] as an AnnData object, downsampled to 3000 cells.

Return type:

AnnData
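
A minimal usage sketch follows. It assumes infercnvpy is installed and that the data can be fetched if it is not already available locally. The helper names `load_maynard2020_3k` and `summarize` are hypothetical and only serve the illustration; the import is placed inside the function so the snippet can be defined without the library present.

```python
def load_maynard2020_3k():
    """Load the downsampled dataset and return it as an AnnData object."""
    # Imported lazily so this sketch can be defined without infercnvpy installed.
    import infercnvpy as cnv

    # The function takes no arguments, as in the signature above.
    return cnv.datasets.maynard2020_3k()


def summarize(adata):
    """Collect a few basic facts about an AnnData-like object."""
    return {
        "n_cells": adata.n_obs,
        "n_genes": adata.n_vars,
        # Key name taken from the description of this dataset.
        "has_scvi": "X_scVI" in adata.obsm,
    }
```

Calling `summarize(load_maynard2020_3k())` should report 3000 cells if the dataset matches the description on this page.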

In brief, this data set was processed as follows:
  • raw data downloaded from ENA

  • gene expression quantified using Salmon and the nf-core/rnaseq pipeline

  • basic quality control (min_counts=20k, max_counts=5M, min_genes=1k, max_mitochondrial_fraction=0.2)

  • filtered to 6000 highly variable genes (HVG) using sc.pp.highly_variable_genes(..., flavor="seurat_v3")

  • raw counts processed using scVI, providing sample information as batch key

  • cell types manually annotated based on marker genes and Leiden clustering and subclustering

  • downsampled to 3000 cells
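
To make the quality-control thresholds concrete, the sketch below applies them to a few made-up cells in plain Python. It is not the code used to build the dataset. The function name `passes_qc`, the example values, and the choice of inclusive bounds are all assumptions, since the source only lists the threshold values.

```python
# Thresholds taken from the quality-control step listed above.
MIN_COUNTS = 20_000
MAX_COUNTS = 5_000_000
MIN_GENES = 1_000
MAX_MITO_FRACTION = 0.2


def passes_qc(total_counts, n_genes, mito_fraction):
    """Return True if a cell satisfies all four thresholds.

    Assumption: bounds are inclusive; the source does not say.
    """
    return (
        MIN_COUNTS <= total_counts <= MAX_COUNTS
        and n_genes >= MIN_GENES
        and mito_fraction <= MAX_MITO_FRACTION
    )


# Hypothetical per-cell summaries:
# (total counts, detected genes, mitochondrial fraction)
cells = [
    (150_000, 3_200, 0.05),  # within all thresholds
    (12_000, 1_500, 0.03),   # too few counts
    (400_000, 800, 0.10),    # too few detected genes
    (900_000, 5_000, 0.35),  # mitochondrial fraction too high
]

kept = [cell for cell in cells if passes_qc(*cell)]
```

Only the first hypothetical cell survives the filter; each of the others violates exactly one threshold.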

adata.X contains the log1p-transformed, CPM-normalized raw counts. The scVI latent representation is stored in adata.obsm["X_scVI"]. A UMAP embedding of the 3000 cells is precomputed.
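
As an illustration of what the values in adata.X represent, the sketch below computes a log1p of counts per million (CPM) for a single cell in plain Python. It assumes the usual order of operations (scale each cell to one million total counts, then apply log(1 + x)); the function name `log1p_cpm` and the example counts are hypothetical, and this is not the code that produced the dataset.

```python
import math


def log1p_cpm(counts):
    """Scale one cell's counts to counts per million, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        # An empty cell has no signal to normalize; return zeros.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * 1_000_000) for c in counts]


# Hypothetical raw counts for one cell across three genes.
cell = [0, 10, 90]
normalized = log1p_cpm(cell)
```

Zero counts stay at zero, and genes with higher counts keep higher values after the transformation, which compresses the dynamic range without changing the ranking within a cell.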