mmcontext.eval.query_annotate.OmicsQueryAnnotator

class mmcontext.eval.query_annotate.OmicsQueryAnnotator(model, is_cosine=True)

Annotate and query omics data using text labels.

The class supports two modes of operation: (1) annotating omics data with text labels, and (2) querying omics data with textual descriptions.

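Similarities are computed by matrix multiplication between text embeddings and omics embeddings. When is_cosine is True, the embeddings are L2-normalized first, so the resulting scores are cosine similarities.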

Variables:

model (SentenceTransformer or similar) – A model providing encode(list_of_strings) -> np.ndarray.
embeddings (np.ndarray) – Embeddings currently stored in the class, either label embeddings or sample embeddings, depending on the mode.
labels (List[str]) – A list storing the labels that were used for annotation.
sample_ids (List[str]) – A list storing sample IDs for the omics data.
is_cosine (bool) – Whether to perform L2 normalization and treat the similarity as cosine similarity.
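
The role of is_cosine can be illustrated with a minimal NumPy sketch. It is not the class's actual implementation: the helper name similarity_matrix and the toy arrays are assumptions made for illustration. It only shows that a matrix product of L2-normalized rows yields cosine similarities.

```python
import numpy as np


def similarity_matrix(a, b, is_cosine=True):
    """Pairwise similarities between the rows of a and the rows of b.

    Hypothetical helper, not part of mmcontext.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if is_cosine:
        # L2-normalize each row; the clip avoids division by zero.
        a = a / np.clip(np.linalg.norm(a, axis=1, keepdims=True), 1e-12, None)
        b = b / np.clip(np.linalg.norm(b, axis=1, keepdims=True), 1e-12, None)
    # Result has shape (n_rows_a, n_rows_b).
    return a @ b.T


# Toy stand-ins for sample embeddings and label embeddings.
sample_emb = np.array([[1.0, 0.0], [1.0, 1.0]])
label_emb = np.array([[2.0, 0.0], [0.0, 3.0]])

cosine_scores = similarity_matrix(sample_emb, label_emb, is_cosine=True)
dot_scores = similarity_matrix(sample_emb, label_emb, is_cosine=False)
```

With is_cosine=False the same product returns raw dot products, which depend on the embedding norms as well as on direction.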

annotate_omics_data(adata, labels, emb_key='mmcontext_emb', n_top=5, text_template='{}')

Annotate omics data by finding top-matching labels for each sample.

The scores and the labels are stored in adata.obs["inferred_labels"]. The single best label per sample is stored in adata.obs["best_label"].

Parameters:

adata (anndata.AnnData) – AnnData object containing the omics data. adata.obsm[emb_key] is expected to hold the omics embeddings with shape (n_samples, embed_dim).
labels (List[str]) – A list of text labels to use for annotation.
emb_key (str, optional) – Key in adata.obsm containing the embeddings to use. Defaults to "mmcontext_emb".
n_top (int, optional) – Number of top-matching labels to retrieve per sample. Defaults to 5.
text_template (str, optional) – Template string used to format each label. Defaults to "{}".
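
The selection of the n_top labels can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helper top_labels, the example template, and the hand-written score matrix are hypothetical, and the exact structure stored in adata.obs["inferred_labels"] is not specified here. In practice the scores would come from comparing the embeddings in adata.obsm[emb_key] with the encoded, template-formatted labels.

```python
import numpy as np


def top_labels(scores, labels, n_top=5):
    """Return the n_top (label, score) pairs per sample, best first.

    Hypothetical helper; scores has shape (n_samples, n_labels).
    """
    scores = np.asarray(scores, dtype=float)
    n_top = min(n_top, len(labels))
    # Sort label indices by descending score for every sample.
    order = np.argsort(-scores, axis=1, kind="stable")[:, :n_top]
    return [
        [(labels[j], float(scores[i, j])) for j in row]
        for i, row in enumerate(order)
    ]


labels = ["T cell", "B cell", "monocyte"]

# text_template wraps each label before it is encoded by the model.
text_template = "A sample of {}"  # illustrative template, not a default
prompts = [text_template.format(label) for label in labels]

# Hand-written similarity scores standing in for real model output.
scores = np.array([[0.1, 0.9, 0.5],
                   [0.8, 0.2, 0.3]])

ranked = top_labels(scores, labels, n_top=2)
best_label = [pairs[0][0] for pairs in ranked]
```

Here ranked plays the role of the per-sample top matches and best_label the role of the single best label per sample.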

query_with_text(adata, queries, emb_key='mmcontext_emb')

Query omics data with textual queries.

Compare the text query embeddings with the omics embeddings in adata.obsm[emb_key] by matrix multiplication and store the resulting similarities in adata.obs["query_scores"].

Parameters:

adata (anndata.AnnData) – The AnnData object whose samples we want to score against the queries.
queries (List[str]) – A list of textual queries.
emb_key (str, optional) – Key in adata.obsm containing the embeddings to use. Defaults to "mmcontext_emb".
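
Query mode can be pictured as the reverse lookup: each query is scored against every sample, and the samples are then ranked per query. The sketch below assumes cosine similarity and uses plain NumPy arrays in place of adata.obsm[emb_key] and the encoded queries. The helper rank_samples and the sample IDs are hypothetical, and the layout the class uses for adata.obs["query_scores"] is not shown.

```python
import numpy as np


def rank_samples(sample_emb, query_emb, sample_ids):
    """Score every sample against every query and rank samples per query.

    Hypothetical helper. Returns the score matrix with shape
    (n_samples, n_queries) and one ranked list of sample IDs per query.
    """
    s = np.asarray(sample_emb, dtype=float)
    q = np.asarray(query_emb, dtype=float)
    # L2-normalize rows so the matrix product gives cosine similarities.
    s = s / np.linalg.norm(s, axis=1, keepdims=True)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    scores = s @ q.T
    # Sort sample indices by descending score within each query column.
    order = np.argsort(-scores, axis=0, kind="stable")
    ranking = [[sample_ids[i] for i in order[:, k]] for k in range(scores.shape[1])]
    return scores, ranking


# Toy stand-ins for sample embeddings and one encoded text query.
sample_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query_emb = np.array([[1.0, 0.1]])
sample_ids = ["s1", "s2", "s3"]

query_scores, ranking = rank_samples(sample_emb, query_emb, sample_ids)
```

Each column of query_scores holds the per-sample similarities for one query, which matches the one-score-per-sample shape that a column of adata.obs would need.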

Methods table

`annotate_omics_data`(adata, labels[, ...])	Annotate omics data by finding top-matching labels for each sample.
`query_with_text`(adata, queries[, emb_key])	Query omics data with textual queries.
