mmcontext.eval.query_annotate.OmicsQueryAnnotator
- class mmcontext.eval.query_annotate.OmicsQueryAnnotator(model, is_cosine=True)
Bases: object

Class to annotate and query omics data using text labels.
The class supports two modes of operation: 1) annotating omics data with text labels, and 2) querying omics data from textual descriptions.
This class uses matrix multiplication for similarity computations.
- Variables:
model (SentenceTransformer or similar) – A model providing encode(list_of_strings) -> np.ndarray.
embeddings (np.ndarray) – Embeddings currently stored in the class: either label embeddings or sample embeddings, depending on the mode.
labels (List[str]) – A list storing the labels that were used for annotation.
sample_ids (List[str]) – A list storing sample IDs for the omics data.
is_cosine (bool) – Whether to perform L2 normalization and treat the similarity as cosine similarity.
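The class description states that similarities are computed with matrix multiplication, optionally after L2 normalization when is_cosine=True. The sketch below illustrates that computation in plain NumPy under those assumptions; the helper name `similarity_matrix` is hypothetical and is not part of mmcontext.

```python
import numpy as np


def similarity_matrix(a: np.ndarray, b: np.ndarray, is_cosine: bool = True) -> np.ndarray:
    """Hypothetical helper: pairwise similarities between the rows of a and b.

    a has shape (n_a, d) and b has shape (n_b, d); the result has shape (n_a, n_b).
    """
    if is_cosine:
        # L2-normalize each row so that the dot product equals cosine similarity.
        # The clip guards against division by zero for all-zero rows.
        a = a / np.clip(np.linalg.norm(a, axis=1, keepdims=True), 1e-12, None)
        b = b / np.clip(np.linalg.norm(b, axis=1, keepdims=True), 1e-12, None)
    # A single matrix multiplication scores every row of a against every row of b.
    return a @ b.T


samples = np.array([[3.0, 4.0], [1.0, 0.0]])  # toy "omics" embeddings
texts = np.array([[1.0, 0.0], [0.0, 2.0]])    # toy "text" embeddings
print(similarity_matrix(samples, texts))                   # cosine similarities
print(similarity_matrix(samples, texts, is_cosine=False))  # raw dot products
```

With is_cosine=False the scores are raw dot products, so embeddings with larger norms dominate; normalizing first makes scores comparable across samples.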
- annotate_omics_data(adata, labels, emb_key='mmcontext_emb', n_top=5, text_template='{}')
Annotate omics data by finding the top-matching labels for each sample.
The scores and the labels are stored in adata.obs["inferred_labels"]. The single best label per sample is stored in adata.obs["best_label"].
- Parameters:
adata (anndata.AnnData) – AnnData object containing omics data. adata.obsm[emb_key] is expected to hold the omics embeddings (shape: n_samples x embed_dim).
labels (List[str]) – A list of text labels to use for annotation.
emb_key (str, optional) – Key in adata.obsm containing the embeddings to use, by default “mmcontext_emb”.
n_top (int, optional) – Number of top-matching labels to retrieve per sample, by default 5.
text_template (str, optional) – Template string used to format each label, by default “{}”.
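As a rough sketch of the annotation mode, assuming it formats each label with text_template, encodes the formatted labels, scores them against the sample embeddings, and keeps the n_top highest-scoring labels, the NumPy code below mimics that logic. `ToyEncoder` and `annotate` are hypothetical stand-ins: a real model (for example a SentenceTransformer) would replace the encoder, and the exact structure mmcontext writes to adata.obs["inferred_labels"] may differ from the list of (label, score) pairs returned here.

```python
import numpy as np


class ToyEncoder:
    """Hypothetical stand-in for a model with encode(list_of_strings) -> np.ndarray."""

    def __init__(self, vocab: dict):
        self.vocab = vocab

    def encode(self, texts: list) -> np.ndarray:
        # Look up a fixed vector per string; a real model would embed free text.
        return np.stack([self.vocab[t] for t in texts])


def annotate(sample_emb, labels, model, n_top=5, text_template="{}", is_cosine=True):
    """Hypothetical sketch: top-n (label, score) pairs and best label per sample."""
    # Format each label with the template before encoding it.
    texts = [text_template.format(label) for label in labels]
    label_emb = model.encode(texts)
    if is_cosine:
        sample_emb = sample_emb / np.linalg.norm(sample_emb, axis=1, keepdims=True)
        label_emb = label_emb / np.linalg.norm(label_emb, axis=1, keepdims=True)
    scores = sample_emb @ label_emb.T  # shape (n_samples, n_labels)
    n_top = min(n_top, len(labels))
    # Column indices of the n_top highest scores in each row, best first.
    top_idx = np.argsort(-scores, axis=1)[:, :n_top]
    inferred = [[(labels[j], float(scores[i, j])) for j in row]
                for i, row in enumerate(top_idx)]
    best = [labels[row[0]] for row in top_idx]
    return inferred, best


model = ToyEncoder({
    "a cell of type T cell": np.array([1.0, 0.0]),
    "a cell of type B cell": np.array([0.0, 1.0]),
})
samples = np.array([[0.9, 0.1], [0.2, 0.8]])
inferred, best = annotate(samples, ["T cell", "B cell"], model, n_top=2,
                          text_template="a cell of type {}")
print(best)
```

A descriptive template such as "a cell of type {}" lets short label names be embedded in a sentence-like context, which sentence-embedding models generally handle better than bare words.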
- query_with_text(adata, queries, emb_key='mmcontext_emb')
Query omics data with textual queries.
Compare the text query embeddings with the omics embeddings using matrix multiplication and store the similarities in adata.obs["query_scores"].
- Parameters:
adata (anndata.AnnData) – The AnnData object whose samples are scored against the queries.
queries (List[str]) – A list of textual queries.
emb_key (str, optional) – Key in adata.obsm containing the embeddings to use, by default “mmcontext_emb”.
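The query mode reverses the direction of annotation: the queries are encoded and every sample is scored against every query. A minimal sketch under the same assumptions (optional L2 normalization followed by one matrix multiplication) follows; the function name `query_scores` and its (n_samples, n_queries) return layout are hypothetical, and the structure mmcontext actually stores in adata.obs["query_scores"] may differ.

```python
import numpy as np


def query_scores(sample_emb: np.ndarray, query_emb: np.ndarray,
                 is_cosine: bool = True) -> np.ndarray:
    """Hypothetical sketch: score every sample against every query.

    sample_emb: (n_samples, d) omics embeddings, e.g. taken from adata.obsm[emb_key].
    query_emb: (n_queries, d) text embeddings, e.g. model.encode(queries).
    Returns an (n_samples, n_queries) array of similarities.
    """
    if is_cosine:
        sample_emb = sample_emb / np.linalg.norm(sample_emb, axis=1, keepdims=True)
        query_emb = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    return sample_emb @ query_emb.T


samples = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
queries = np.array([[0.0, 2.0]])  # a single toy query embedding
scores = query_scores(samples, queries)
# Rank samples for the first query, most similar first.
ranking = np.argsort(-scores[:, 0])
print(ranking)
```

Sorting one score column gives a ranked list of samples for that query, which is the typical way to retrieve the samples that best match a textual description.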
Methods table
| annotate_omics_data | Annotate omics data by finding top-matching labels for each sample. |
| query_with_text | Query omics data with textual queries. |