infercnvpy.datasets.maynard2020_3k

infercnvpy.datasets.maynard2020_3k()

Return the dataset from [MMR+20] as an AnnData object, downsampled to 3000 cells.

Return type: AnnData
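
A minimal usage sketch follows. The helper name `load_maynard2020_3k_summary` is hypothetical and only wraps the documented call. It assumes `infercnvpy` is installed and that the dataset file can be fetched or is already cached locally.

```python
def load_maynard2020_3k_summary():
    """Load the dataset and return a few basic facts about it.

    Hypothetical helper for illustration; not part of infercnvpy.
    """
    # Imported lazily so that defining this helper does not require the package.
    import infercnvpy as cnv

    # The documented call; it returns an AnnData object.
    adata = cnv.datasets.maynard2020_3k()

    return {
        "n_cells": adata.n_obs,
        "n_genes": adata.n_vars,
        # List whichever embeddings are stored instead of assuming key names.
        "obsm_keys": sorted(adata.obsm.keys()),
        # The scVI latent representation is documented under this key.
        "has_scvi_latent": "X_scVI" in adata.obsm,
    }
```

Calling `load_maynard2020_3k_summary()` returns a plain dictionary, which is convenient for a quick sanity check before running any analysis.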

In brief, this dataset was processed as follows:

- Raw data were downloaded from ENA.
- Gene expression was quantified using Salmon and the nf-core/rnaseq pipeline.
- Basic quality control was applied (min_counts=20k, max_counts=5M, min_genes=1k, max_mitochondrial_fraction=0.2).
- Genes were filtered to 6000 highly variable genes (HVGs) using sc.pp.highly_variable_genes(..., flavor="seurat_v3").
- Raw counts were processed using scVI, with sample information provided as the batch key.
- Cell types were manually annotated based on marker genes and Leiden clustering and subclustering.
- The result was downsampled to 3000 cells.
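
To make the quality-control thresholds listed above concrete, here is a small pure-Python sketch. It is not the code used to build the dataset: the `CellStats` type and `passes_qc` function are hypothetical, and treating every bound as inclusive is an assumption.

```python
from dataclasses import dataclass

# Thresholds quoted in the processing summary.
MIN_COUNTS = 20_000
MAX_COUNTS = 5_000_000
MIN_GENES = 1_000
MAX_MITO_FRACTION = 0.2


@dataclass
class CellStats:
    """Hypothetical per-cell QC summary (not an infercnvpy or scanpy type)."""

    total_counts: float   # total counts in the cell
    n_genes: int          # number of genes detected in the cell
    mito_fraction: float  # fraction of counts from mitochondrial genes


def passes_qc(cell: CellStats) -> bool:
    """Return True if the cell satisfies all four thresholds (bounds inclusive)."""
    return (
        MIN_COUNTS <= cell.total_counts <= MAX_COUNTS
        and cell.n_genes >= MIN_GENES
        and cell.mito_fraction <= MAX_MITO_FRACTION
    )


# Made-up example cells, one per failure mode plus one that passes.
cells = [
    CellStats(150_000, 3_200, 0.05),  # within all thresholds
    CellStats(12_000, 1_500, 0.03),   # too few counts
    CellStats(80_000, 700, 0.10),     # too few detected genes
    CellStats(300_000, 4_000, 0.35),  # mitochondrial fraction too high
]
kept = [cell for cell in cells if passes_qc(cell)]
```

The comparatively high count thresholds are those given in the summary; the sketch only shows how the four criteria combine into a single per-cell filter.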

adata.X contains the CPM-normalized, log1p-transformed counts. The scVI latent representation is stored in adata.obsm["X_scVI"]. A UMAP embedding of the 3000 cells is precomputed.
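
As an illustration of what CPM normalization followed by a log1p transform means for a single cell, the following sketch works on a plain list of counts. The function name `log1p_cpm` is hypothetical, and the use of the natural logarithm is an assumption; the dataset itself was presumably normalized with library routines rather than with this code.

```python
import math


def log1p_cpm(counts):
    """Scale one cell's counts to counts per million, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        # An empty cell has no meaningful scaling; return zeros.
        return [0.0 for _ in counts]
    return [math.log1p(count / total * 1_000_000) for count in counts]


# A toy cell with three genes and 100 counts in total.
values = log1p_cpm([0, 10, 90])
```

Zero counts stay at zero after the transform, which keeps the matrix sparse, and the scaling makes cells with different sequencing depths comparable.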
