dreams package
Subpackages
- dreams.algorithms package
- dreams.models package
- dreams.training package
- dreams.utils package
- Submodules
- dreams.utils.annotation module
- dreams.utils.data module
  - AnnotatedSpectraDataset
  - AttentionEntropyValidation
  - CSRKNN
  - CVDataModule
  - ContrastiveSpectraDataset
  - ContrastiveValidation
  - CorrelationValidation
  - ImplExplValidation
  - KNNValidation
  - LabeledSpectraDataset
  - MSData
    - MSData.add_column()
    - MSData.at()
    - MSData.columns()
    - MSData.extend_column()
    - MSData.form_subset()
    - MSData.from_hdf5()
    - MSData.from_hdf5_chunks()
    - MSData.from_mgf()
    - MSData.from_msp()
    - MSData.from_mzml()
    - MSData.from_mzxml()
    - MSData.from_pandas()
    - MSData.from_pickle()
    - MSData.get_adducts()
    - MSData.get_charges()
    - MSData.get_prec_mzs()
    - MSData.get_smiles()
    - MSData.get_spectra()
    - MSData.get_values()
    - MSData.load()
    - MSData.load_col_in_mem()
    - MSData.load_hdf5_in_mem()
    - MSData.merge()
    - MSData.remove_column()
    - MSData.rename_column()
    - MSData.spec_to_matchms()
    - MSData.to_matchms()
    - MSData.to_mgf()
    - MSData.to_pandas()
    - MSData.to_pynndescent()
    - MSData.to_torch_dataset()
    - MSData.use_col_as_index()
  - ManualValidation
  - MaskedSpectraDataset
    - MaskedSpectraDataset.data
    - MaskedSpectraDataset.dformat
    - MaskedSpectraDataset.ssl_objective
    - MaskedSpectraDataset.spec_preproc
    - MaskedSpectraDataset.frac_masks
    - MaskedSpectraDataset.min_n_masks
    - MaskedSpectraDataset.n_samples
    - MaskedSpectraDataset.mask_val
    - MaskedSpectraDataset.min_mask_intens
    - MaskedSpectraDataset.mask_prec
    - MaskedSpectraDataset.deterministic_mask
    - MaskedSpectraDataset.mask_peaks
    - MaskedSpectraDataset.mask_intens_strategy
    - MaskedSpectraDataset.ret_order_pairs
    - MaskedSpectraDataset.return_charge
    - MaskedSpectraDataset.acc_est_weight
    - MaskedSpectraDataset.lsh_weight
    - MaskedSpectraDataset.bert801010_masking
    - MaskedSpectraDataset.__len__()
    - MaskedSpectraDataset.__getitem__()
    - MaskedSpectraDataset.get_spec()
  - MatchmsSpectraDataset
  - RandomSplitDataModule
  - RawSpectraDataset
  - SSLProbingValidation
  - SpecRetrievalValidation
  - SpectrumPreprocessor
    - SpectrumPreprocessor.dformat
    - SpectrumPreprocessor.prec_intens
    - SpectrumPreprocessor.n_highest_peaks
    - SpectrumPreprocessor.spec_entropy_cleaning
    - SpectrumPreprocessor.normalize_mzs
    - SpectrumPreprocessor.to_relative_intensities
    - SpectrumPreprocessor.precision
    - SpectrumPreprocessor.mz_shift_aug_p
    - SpectrumPreprocessor.mz_shift_aug_max
    - SpectrumPreprocessor.__call__()
  - SplittedDataModule
  - condense_dreams_knn()
  - evaluate_split()
  - load_hdf5_in_mem()
  - subset_lsh()
- dreams.utils.dformats module
  - DataFormat
    - DataFormat.high_intensity_thld
    - DataFormat.lsh_bin_size
    - DataFormat.lsh_n_hplanes
    - DataFormat.max_charge
    - DataFormat.max_ms_level
    - DataFormat.max_mz
    - DataFormat.max_peaks_n
    - DataFormat.max_prec_mz
    - DataFormat.max_tbxic_stdev
    - DataFormat.min_charge
    - DataFormat.min_file_spectra
    - DataFormat.min_intensity_ampl
    - DataFormat.min_peaks_n
    - DataFormat.val_spec()
  - DataFormatA
  - DataFormatA1
  - DataFormatA2
  - DataFormatA3
  - DataFormatB
  - DataFormatBuilder
  - DataFormatC
  - assign_dformat()
  - to_A_format()
  - to_format()
- dreams.utils.io module
  - ChunkedDatasetAccessor
  - ChunkedHDF5File
  - TqdmToLogger
  - append_to_stem()
  - bytes_to_human_str()
  - bytes_to_units()
  - cache_pkl()
  - clean_ftps()
  - compress_hdf()
  - downloadpublicdata_to_hdf5s()
  - ftp_to_msv_id()
  - lcmsms_to_hdf5()
  - list_from_txt()
  - list_to_txt()
  - lsh_subset()
  - merge_lcmsms_hdf5s()
  - merge_ms_hdfs()
  - parse_sirius_ms()
  - parsed_lcmsms_to_hdf()
  - prepend_to_stem()
  - read_json()
  - read_json_spec()
  - read_lcmsms()
  - read_mgf()
  - read_ms()
  - read_msp()
  - read_mzml()
  - read_pickle()
  - read_textual_ms_format()
  - sample_hdf()
  - save_nist_like_df_to_mgf()
  - savefig()
  - setup_logger()
  - suppress_output()
  - wandb_import()
  - write_json()
  - write_pickle()
- dreams.utils.lcms module
- dreams.utils.misc module
  - all_close_pairwise()
  - calc_attention_entropy()
  - chunk_list()
  - chunk_list_eq_sum()
  - complete_permutation()
  - contains_similar()
  - download_pretrained_model()
  - gems_hf_download()
  - get_closest_values()
  - hf_download()
  - interpolate_interval()
  - is_float()
  - is_sorted()
  - lists_to_legends()
  - merge_stats()
  - networkx_to_dataframe()
- dreams.utils.mols module
  - MolPropertyCalculator
  - closest_mz_frags()
  - disable_rdkit_log()
  - formula_is_carbohydrate()
  - formula_is_halogenated()
  - formula_to_dict()
  - formula_type()
  - fp_func_from_str()
  - generate_fragments()
  - generate_spectrum()
  - get_mol_mass()
  - maccs_fp()
  - mol_to_formula()
  - mol_to_img_str()
  - mol_to_inchi14()
  - morgan_fp()
  - morgan_mol_sim()
  - morgan_smiles_sim()
  - np_classify()
  - np_to_rdkit_fp()
  - rdkit_fp()
  - rdkit_fp_to_np()
  - rdkit_mol_sim()
  - rdkit_smiles_sim()
  - show_mols()
  - smiles_to_formula()
  - smiles_to_inchi14()
  - tanimoto_sim()
- dreams.utils.plots module
- dreams.utils.spectra module
  - MSnSpectrum
    - MSnSpectrum.get_collision_energy()
    - MSnSpectrum.get_intensities()
    - MSnSpectrum.get_ionization_mode()
    - MSnSpectrum.get_mzs()
    - MSnSpectrum.get_peak_list()
    - MSnSpectrum.get_peaks_n()
    - MSnSpectrum.get_precursor_charge()
    - MSnSpectrum.get_precursor_formula()
    - MSnSpectrum.get_precursor_mass()
    - MSnSpectrum.get_precursor_mol()
    - MSnSpectrum.get_precursor_mz()
  - PeakListModifiedCosine
  - bin_peak_list()
  - bin_peak_lists()
  - df_to_MSnSpectra()
  - from_hot()
  - from_hot_logits()
  - get_base_peak()
  - get_closest_mz_peak()
  - get_closest_mz_peaks()
  - get_highest_peaks()
  - get_num_peaks()
  - get_peak_intens_nbhd()
  - has_peak_at()
  - intens_amplitude()
  - is_valid_peak_list()
  - max_mz()
  - merge_peak_lists()
  - normalize_mzs()
  - num_high_peaks()
  - num_hot_classes()
  - pad_peak_list()
  - parse_raw_peak_list()
  - plot_spectrum()
  - prepend_precursor_peak()
  - process_peak_list()
  - to_classes()
  - to_hot()
  - to_rel_intensity()
  - trim_peak_list()
  - unpad_peak_list()
- Module contents
Submodules
dreams.api module
- class dreams.api.DreaMSAtlas(local_dir: str | Path | None = None)
Bases: object
Initialize a DreaMSAtlas object enabling access to the DreaMS Atlas k-NN graph and associated data for individual nodes in the graph.
- Parameters:
local_dir (Union[str, Path], optional) – Local directory to download and cache data. Defaults to ~/.cache/huggingface/hub.
- decode_knn_i(knn_i)
- encode_knn_i(node_repr_i)
- get_data(idx: int | Iterable[int], vals=None, plot=False, return_spec=True, msv_metadata=False)
- get_lib_idx()
- get_lsh_cluster(i, as_dataframe=False, vals=None, msv_metadata=False)
- get_neighbors(i, n_hops=1, inv_neighbors=False, sim_thld=-inf, as_dataframe=False, data_vals=None, msv_metadata=False, return_spec=True)
- get_node_cluster(i, data=True, lsh=False, msv_metadata=False)
- get_node_repr(i)
- is_library_i(i)
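As a rough sketch of how the class above might be used, the snippet below wraps the constructor and `get_neighbors` using only the parameters shown in the signatures. The wrapper name `fetch_neighbors` is hypothetical, and the snippet assumes that creating or querying the atlas may download data into `local_dir` on first use; the structure of the returned object is not described here, so it is passed through unchanged.

```python
def fetch_neighbors(node_idx: int, local_dir=None, n_hops: int = 1):
    """Hypothetical helper: query the neighbors of one atlas node as a dataframe."""
    # Imported lazily so that this sketch can be loaded without the dreams package installed.
    from dreams.api import DreaMSAtlas

    # local_dir=None keeps the documented default cache location.
    atlas = DreaMSAtlas(local_dir=local_dir)

    # Only keyword arguments that appear in the get_neighbors signature are used.
    return atlas.get_neighbors(node_idx, n_hops=n_hops, as_dataframe=True)
```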
- class dreams.api.DreaMSSearch(ref_spectra: Path | str | MSData, verbose: bool = True, store_embs: bool = True)
Bases: object
- class dreams.api.PreTrainedModel(model: DreaMS | FineTuningHead, n_highest_peaks: int = 100)
Bases: object
- static available_models()
- classmethod from_ckpt(ckpt_path: Path, ckpt_cls: Type[DreaMS] | Type[FineTuningHead], n_highest_peaks: int, remove_unused_backbone_parameters: bool = True, dreams_args: dict | None = None)
- classmethod from_name(name: str)
- remove_unused_backbone_parameters()
Helper function to remove unused heads from the pre-trained DreaMS backbone model.
- dreams.api.dreams_attn_scores(model: Path | str | DreaMS, msdata: Path | str, layers_idx=None, precursor_only=True, batch_size=32, progress_bar=True, spec_col='spectrum', prec_mz_col='precursor_mz', n_highest_peaks=None, spec_preproc: SpectrumPreprocessor = None)
- dreams.api.dreams_embeddings(pth, batch_size=32, progress_bar=True, logger_pth=None, store_embs=False, **msdata_kwargs)
- dreams.api.dreams_intermediates(model: Path | str | PreTrainedModel, msdata: Path | str, layers_idx=None, precursor_only=True, batch_size=32, progress_bar=True, spec_col='spectrum', prec_mz_col='precursor_mz', n_highest_peaks=60, compute_attn_matrices=True, compute_embeddings=False, spec_preproc: SpectrumPreprocessor = None)
Extracts intermediate representations (embeddings and attention matrices) from individual layers of a DreaMS model.
Embeddings, attention matrices, or both can be extracted from the specified layers. The model can be passed as an instance or loaded from a checkpoint path. The batch size, the number of highest peaks to consider, and the choice of outputs to compute are all configurable.
- Parameters:
model (Union[Path, str, PreTrainedModel]) – The model instance or the path to the model checkpoint file.
msdata (Union[Path, str]) – The mass spectrometry data or the path to the data file.
layers_idx (list, optional) – A list of layer indices from which to extract embeddings. If not provided, defaults to the last layer.
precursor_only (bool, optional) – If True, only extract embeddings for the precursor ion. Defaults to True.
batch_size (int, optional) – The number of samples to process in each batch. Defaults to 32.
progress_bar (bool, optional) – If True, display a progress bar during processing and print a log. Defaults to True.
spec_col (str, optional) – The column name in the data that contains the spectra. Defaults to SPECTRUM.
prec_mz_col (str, optional) – The column name in the data that contains the precursor m/z values. Defaults to PRECURSOR_MZ.
n_highest_peaks (int, optional) – The number of highest intensity peaks to consider in each spectrum. Defaults to 60.
compute_attn_matrices (bool, optional) – If True, compute and return attention matrices from the model. Defaults to True.
compute_embeddings (bool, optional) – If True, compute and return embeddings from the model. Defaults to False.
spec_preproc (du.SpectrumPreprocessor, optional) – An instance of SpectrumPreprocessor for preprocessing the spectra. Defaults to None.
- Returns:
A dictionary containing the extracted embeddings and/or attention matrices. The keys are the layer indices, and the values are the corresponding embeddings or attention matrices.
- Return type:
dict
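The call and its return value could be combined roughly as follows. This is a minimal sketch built only from the parameters documented above: the names `extract_attention` and `summarize_by_layer` are hypothetical, and summarizing each value by a `shape` attribute is an assumption, since the exact type of the dictionary values is not specified here.

```python
from pathlib import Path
from typing import Any, Dict, Union


def extract_attention(model: Union[str, Path], msdata: Union[str, Path]) -> dict:
    """Hypothetical wrapper: request attention matrices with the documented defaults."""
    # Imported lazily so that this sketch can be loaded without the dreams package installed.
    from dreams.api import dreams_intermediates

    return dreams_intermediates(
        model=model,
        msdata=msdata,
        layers_idx=None,              # documented default: the last layer
        precursor_only=True,
        batch_size=32,
        n_highest_peaks=60,
        compute_attn_matrices=True,
        compute_embeddings=False,
    )


def summarize_by_layer(intermediates: Dict[Any, Any]) -> Dict[Any, Any]:
    """Map each layer index to its value's shape, or to the type name if it has no shape."""
    return {
        layer: getattr(value, "shape", type(value).__name__)
        for layer, value in intermediates.items()
    }
```

Passing the result of `extract_attention` to `summarize_by_layer` gives a quick overview of what was returned for each layer without assuming a particular array library.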
- dreams.api.dreams_predictions(model_ckpt: PreTrainedModel | FineTuningHead | DreaMS | Path | str, spectra: Path | str | MSData, model_cls=None, batch_size=32, progress_bar=True, n_highest_peaks=None, title='', logger_pth=None, store_preds=False, **msdata_kwargs)
- dreams.api.predict_fluorine(in_dir: Path | str, verbose: bool = True)
Predict fluorine probabilities for a directory of spectra.
- Parameters:
in_dir (Union[Path, str]) – Path to the directory containing the spectra.
verbose (bool, optional) – Whether to print verbose output. Defaults to True.
- Returns:
DataFrame with the predictions stored as dreams_fluorine_predictions.csv in the input directory.
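A possible way to run the prediction and locate its output file is sketched below. The helper names `predictions_csv_path` and `run_fluorine_prediction` are hypothetical; only the `predict_fluorine` signature and the output file name come from the documentation above.

```python
from pathlib import Path
from typing import Union

# File name stated in the documentation of predict_fluorine.
PREDICTIONS_FILENAME = "dreams_fluorine_predictions.csv"


def predictions_csv_path(in_dir: Union[str, Path]) -> Path:
    """Return the path where the predictions CSV is expected inside the input directory."""
    return Path(in_dir) / PREDICTIONS_FILENAME


def run_fluorine_prediction(in_dir: Union[str, Path]):
    """Hypothetical wrapper: run the prediction and return (predictions, expected CSV path)."""
    # Imported lazily so that this sketch can be loaded without the dreams package installed.
    from dreams.api import predict_fluorine

    predictions = predict_fluorine(in_dir, verbose=True)
    return predictions, predictions_csv_path(in_dir)
```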
dreams.cli module
- dreams.cli.serialize(x)
dreams.definitions module
- dreams.definitions.export()