harreman.tools.apply_gene_filtering

harreman.tools.apply_gene_filtering(adata, layer_key=None, cell_type_key=None, model=None, feature_elimination=False, threshold=0.2, autocorrelation_filt=False, expression_filt=False, de_filt=False, umi_counts_obs_key=None, device=device(type='cpu'), verbose=False)

Applies multi-step gene filtering to an AnnData object.

Parameters:
  • adata (AnnData) – Annotated data object to filter.

  • layer_key (str, optional) – Key of the layer in adata.layers to use, or "use_raw" to use adata.raw.X.

  • cell_type_key (str, optional) – Key in adata.obs containing cell type annotations.

  • model (str, optional) – Model name for autocorrelation computation.

  • feature_elimination (bool, optional (default: False)) – If True, filters genes based on sparsity across all cells.

  • threshold (float, optional (default: 0.2)) – Minimum fraction of cells in which the gene must be expressed.

  • autocorrelation_filt (bool, optional (default: False)) – If True, filters genes based on spatial autocorrelation significance.

  • expression_filt (bool, optional (default: False)) – If True, filters genes based on expression in each cell type.

  • de_filt (bool, optional (default: False)) – If True, filters genes based on differential expression between each cell type and the rest.

  • umi_counts_obs_key (str, optional) – Key in adata.obs with total UMI counts per cell. If None, inferred from the expression matrix.

  • device (torch.device, optional) – Device to use for computation (e.g., CUDA or CPU). Defaults to the GPU if one is available, otherwise the CPU.

  • verbose (bool, optional (default: False)) – Whether to print progress and status messages.

Return type:

None
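
Example:

The feature_elimination and threshold parameters describe a sparsity filter: a gene is kept only if it is expressed in at least the given fraction of cells. The following is a minimal pure-Python sketch of that idea, not harreman's implementation. The helper name genes_passing_sparsity_filter, the toy count matrix, the reading of "expressed" as a count greater than zero, and the inclusive comparison against the threshold are all assumptions made for illustration.

```python
def genes_passing_sparsity_filter(counts, gene_names, threshold=0.2):
    """Return the names of genes expressed in at least `threshold` of cells.

    Hypothetical helper for illustration only.
    counts: list of rows, one per cell, each a list of per-gene counts.
    """
    n_cells = len(counts)
    kept = []
    for j, gene in enumerate(gene_names):
        # Assumption: a gene is "expressed" in a cell when its count is > 0.
        n_expressing = sum(1 for row in counts if row[j] > 0)
        # Assumption: the threshold is inclusive (>=).
        if n_expressing / n_cells >= threshold:
            kept.append(gene)
    return kept


# Toy matrix: 5 cells (rows) x 3 genes (columns).
counts = [
    [3, 0, 0],
    [1, 0, 2],
    [0, 0, 5],
    [4, 0, 0],
    [2, 0, 0],
]
gene_names = ["geneA", "geneB", "geneC"]

# geneA is expressed in 4/5 cells, geneB in 0/5, geneC in 2/5.
kept = genes_passing_sparsity_filter(counts, gene_names, threshold=0.2)
print(kept)
```

Raising the threshold to 0.5 in this sketch would also drop geneC, which is expressed in only 2 of 5 cells. Based on the signature above, the corresponding call on a real dataset would presumably be harreman.tools.apply_gene_filtering(adata, feature_elimination=True, threshold=0.2).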