mmcontext.eval.query_annotate.OmicsQueryAnnotator

class mmcontext.eval.query_annotate.OmicsQueryAnnotator(model, is_cosine=True)

Bases: object

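Class to annotate and query omics data using text labels.

The class supports two modes of operation:

  • Annotating omics data with text labels.

  • Querying omics data from textual descriptions.

Similarities are computed by matrix multiplication of the omics and text embedding matrices. When is_cosine is True, the embeddings are L2-normalized first, so each entry of the product is a cosine similarity.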
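The similarity computation can be sketched as follows. This is a minimal illustration in plain NumPy, not the mmcontext source; the function name similarity_matrix is hypothetical, and it assumes both embedding matrices share the same embed_dim.

```python
import numpy as np


def similarity_matrix(a: np.ndarray, b: np.ndarray, is_cosine: bool = True) -> np.ndarray:
    """Score every row of `a` against every row of `b`; result has shape (len(a), len(b))."""
    if is_cosine:
        # L2-normalize each row so the dot product becomes a cosine similarity.
        # The floor of 1e-12 avoids division by zero for all-zero rows.
        a = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
        b = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    # A single matrix product computes all pairwise scores at once.
    return a @ b.T


omics = np.array([[1.0, 0.0], [0.0, 2.0]])  # made-up sample embeddings
text = np.array([[3.0, 0.0], [1.0, 1.0]])   # made-up text embeddings
cos = similarity_matrix(omics, text)                    # cosine similarities
dot = similarity_matrix(omics, text, is_cosine=False)   # raw dot products
```

Here dot is [[3, 1], [0, 2]], so vector magnitude changes the ranking, while cos is [[1, 0.707...], [0, 0.707...]] and depends only on direction. This is the practical effect of the is_cosine flag.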

Variables:
  • model (SentenceTransformer or similar) – A model providing encode(list_of_strings) -> np.ndarray.

  • embeddings (np.ndarray) – Embeddings currently stored in the class, either label embeddings or sample embeddings, depending on the mode.

  • labels (List[str]) – A list storing the labels that were used for annotation.

  • sample_ids (List[str]) – A list storing sample IDs for the omics data.

  • is_cosine (bool) – Whether to perform L2 normalization and treat the similarity as cosine similarity.
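The model only needs an encode method that maps a list of strings to an array (a SentenceTransformer's encode returns a NumPy array by default). The sketch below states that requirement as a typing.Protocol and adds a toy encoder for offline tests; both TextEncoder and ToyHashEncoder are hypothetical names, not part of mmcontext, and the toy produces random-looking vectors with no semantic meaning.

```python
from typing import List, Protocol

import numpy as np


class TextEncoder(Protocol):
    """Hypothetical protocol describing what `model` must provide."""

    def encode(self, sentences: List[str]) -> np.ndarray:
        """Return an array of shape (len(sentences), embed_dim)."""
        ...


class ToyHashEncoder:
    """Hypothetical toy encoder for tests; not a real language model."""

    def __init__(self, embed_dim: int = 8) -> None:
        self.embed_dim = embed_dim

    def encode(self, sentences: List[str]) -> np.ndarray:
        rows = []
        for s in sentences:
            # Seed from the characters so results are reproducible across runs
            # (the built-in hash() of a str is randomized per process).
            rng = np.random.default_rng(sum(ord(c) for c in s))
            rows.append(rng.standard_normal(self.embed_dim))
        # reshape keeps the (0, embed_dim) shape for an empty input list.
        return np.array(rows, dtype=float).reshape(len(sentences), self.embed_dim)


model: TextEncoder = ToyHashEncoder(embed_dim=4)
emb = model.encode(["T cell", "B cell"])  # shape (2, 4)
```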

Methods table

annotate_omics_data(adata, labels[, ...])

Annotate omics data by finding top-matching labels for each sample.

query_with_text(adata, queries[, emb_key])

Query omics data with textual queries.

Methods

OmicsQueryAnnotator.annotate_omics_data(adata, labels, emb_key='mmcontext_emb', n_top=5, text_template='{}')

Annotate omics data by finding top-matching labels for each sample.

The n_top best-matching labels and their scores are stored in adata.obs["inferred_labels"], and the single best label per sample is stored in adata.obs["best_label"].

Parameters:
  • adata (anndata.AnnData) – AnnData object containing omics data. adata.obsm[emb_key] is expected to hold the omics embeddings, with shape (n_samples, embed_dim).

  • labels (List[str]) – A list of text labels to use for annotation.

  • emb_key (str, optional) – Key in adata.obsm containing the embeddings to use, by default “mmcontext_emb”.

  • n_top (int, optional) – Number of top-matching labels to retrieve per sample, by default 5.

  • text_template (str, optional) – Template string used to format each label before it is embedded, by default “{}” (the label is used as-is).
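The annotation step can be sketched as follows: format the labels with the template, score every sample against every label, then keep the n_top columns per row. This is a hypothetical NumPy re-implementation for illustration only; annotate_sketch is not an mmcontext function, the embeddings are made up instead of being produced by a model, and the sketch returns plain Python lists because the exact format written to adata.obs["inferred_labels"] is not specified here.

```python
from typing import List, Tuple

import numpy as np


def _l2_normalize(x: np.ndarray) -> np.ndarray:
    # Row-wise L2 normalization with a small floor against zero-norm rows.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)


def annotate_sketch(
    omics_emb: np.ndarray,
    label_emb: np.ndarray,
    labels: List[str],
    n_top: int = 5,
    is_cosine: bool = True,
) -> Tuple[List[List[Tuple[str, float]]], List[str]]:
    """Hypothetical: top-n (label, score) pairs and the best label per sample.

    Assumes `labels` is non-empty and n_top >= 1.
    """
    if is_cosine:
        omics_emb, label_emb = _l2_normalize(omics_emb), _l2_normalize(label_emb)
    scores = omics_emb @ label_emb.T  # shape (n_samples, n_labels)
    n_top = min(n_top, len(labels))  # cannot return more labels than exist
    # Sort each row by descending score and keep the first n_top label indices.
    top_idx = np.argsort(-scores, axis=1)[:, :n_top]
    inferred = [
        [(labels[j], float(scores[i, j])) for j in row]
        for i, row in enumerate(top_idx)
    ]
    best = [pairs[0][0] for pairs in inferred]  # highest-scoring label per sample
    return inferred, best


labels = ["T cell", "B cell", "monocyte"]
text_template = "A cell of type {}."  # stands in for the `text_template` argument
prompts = [text_template.format(lbl) for lbl in labels]  # strings a model would encode
label_emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])  # made-up prompt embeddings
omics_emb = np.array([[2.0, 0.1], [0.1, 3.0]])              # made-up sample embeddings
inferred, best = annotate_sketch(omics_emb, label_emb, labels, n_top=2)
```

With these inputs, best is ["T cell", "B cell"], and "monocyte" is the runner-up for both samples because its embedding lies between the other two.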

OmicsQueryAnnotator.query_with_text(adata, queries, emb_key='mmcontext_emb')

Query omics data with textual queries.

Compare the embeddings of the text queries with the omics embeddings in adata.obsm[emb_key] using matrix multiplication, and store the resulting similarities in adata.obs["query_scores"].

Parameters:
  • adata (anndata.AnnData) – The AnnData object whose samples we want to score against the queries.

  • queries (List[str]) – A list of textual queries.

  • emb_key (str, optional) – Key in adata.obsm containing the embeddings to use, by default “mmcontext_emb”.
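Querying is the transpose of annotation: instead of picking the best label for each sample, the samples are ranked for each query. The sketch below is a hypothetical NumPy illustration (rank_samples_for_queries is not an mmcontext function, and the embeddings are made up). How query_with_text folds several queries into the single adata.obs["query_scores"] column is not described here, so the sketch returns the full score matrix.

```python
import numpy as np


def rank_samples_for_queries(omics_emb, query_emb, is_cosine=True):
    """Hypothetical: score samples against queries and rank samples per query.

    Returns (scores, ranking): scores has shape (n_samples, n_queries), and
    ranking[:, q] lists sample indices from best to worst match for query q.
    """
    omics_emb = np.asarray(omics_emb, dtype=float)
    query_emb = np.asarray(query_emb, dtype=float)
    if is_cosine:
        # Row-wise L2 normalization so the product holds cosine similarities.
        omics_emb = omics_emb / np.maximum(np.linalg.norm(omics_emb, axis=1, keepdims=True), 1e-12)
        query_emb = query_emb / np.maximum(np.linalg.norm(query_emb, axis=1, keepdims=True), 1e-12)
    scores = omics_emb @ query_emb.T
    # Sort down the sample axis (axis=0), the transpose of the annotation case.
    ranking = np.argsort(-scores, axis=0)
    return scores, ranking


omics_emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])  # made-up sample embeddings
query_emb = np.array([[0.0, 2.0], [1.0, 0.0]])            # two made-up query embeddings
scores, ranking = rank_samples_for_queries(omics_emb, query_emb)
```

For the first query, sample 1 matches best, then sample 2, then sample 0; the second query reverses the preference between samples 0 and 1.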