infercnvpy.datasets.maynard2020_3k
- infercnvpy.datasets.maynard2020_3k()
Return the dataset from [MMR+20] as an AnnData object, downsampled to 3000 cells.
- Return type: AnnData
In brief, this data set was processed as follows:
- Raw data were downloaded from ENA.
- Gene expression was quantified using Salmon and the nf-core/rnaseq pipeline.
- Basic quality control was applied (min_counts=20k, max_counts=5M, min_genes=1k, max_mitochondrial_fraction=0.2).
- Genes were filtered to 6000 HVG using sc.pp.highly_variable_genes(..., flavor="seurat_v3").
- Raw counts were processed using scVI, providing sample information as batch key.
- Cell types were manually annotated based on marker genes and Leiden clustering and subclustering.
- The data were downsampled to 3000 cells.
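The quality-control thresholds listed above can be read as a per-cell filter. The following is a minimal sketch of that reading in plain Python, not the code used to build the dataset: the function name `passes_qc`, the dictionary layout, and the choice of inclusive bounds are assumptions made for illustration.

```python
# Thresholds quoted in the processing summary. Treating the bounds as
# inclusive is an assumption of this sketch.
MIN_COUNTS = 20_000
MAX_COUNTS = 5_000_000
MIN_GENES = 1_000
MAX_MITO_FRACTION = 0.2


def passes_qc(total_counts, n_genes, mito_fraction):
    """Return True if a cell satisfies all four quality-control thresholds."""
    return (
        MIN_COUNTS <= total_counts <= MAX_COUNTS
        and n_genes >= MIN_GENES
        and mito_fraction <= MAX_MITO_FRACTION
    )


# Hypothetical per-cell summaries, invented for illustration only.
cells = [
    {"total_counts": 150_000, "n_genes": 3_200, "mito_fraction": 0.05},
    {"total_counts": 12_000, "n_genes": 2_500, "mito_fraction": 0.03},
    {"total_counts": 800_000, "n_genes": 4_100, "mito_fraction": 0.35},
]

# Keep only the cells that pass every threshold.
kept = [cell for cell in cells if passes_qc(**cell)]
```

In this made-up example, the second cell has too few counts and the third has too high a mitochondrial fraction, so only the first cell is retained.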
adata.X contains the log1p-transformed, cpm-normalized raw counts. The scVI latent representation is stored in adata.obsm["X_scVI"]. A UMAP for the 3000 cells is precomputed.
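To make the contents of adata.X concrete, here is a rough sketch of a log1p transform applied to counts-per-million values for a single cell. It assumes the natural logarithm and normalization by the cell's own total count; the helper name `log1p_cpm` is hypothetical and this is not the code that produced the dataset.

```python
import math


def log1p_cpm(counts):
    """Scale one cell's raw counts to counts per million, then apply log(1 + x).

    Assumes natural log and per-cell normalization (assumptions of this sketch).
    """
    total = sum(counts)
    if total == 0:
        # An empty cell has no signal to normalize; return zeros.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * 1_000_000) for c in counts]


# Hypothetical raw counts for one cell across three genes.
raw_counts = [1, 0, 3]
normalized = log1p_cpm(raw_counts)
```

A gene with zero raw counts stays at zero after the transform, which keeps the sparsity pattern of the count matrix intact.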