dreams.algorithms.murcko_hist package

Submodules

dreams.algorithms.murcko_hist.murcko_hist module

dreams.algorithms.murcko_hist.murcko_hist.are_sub_hists(h1: Dict[str, int], h2: Dict[str, int], k: int = 3, d: int = 4) → bool

Determines if two Murcko histograms are considered sub-histograms of each other.

When the histogram sums are small (as controlled by k), the histograms must be equal. For larger histograms, their distance must be within the threshold d.

Parameters:
  • h1 (Dict[str, int]) – The first Murcko histogram dictionary.

  • h2 (Dict[str, int]) – The second Murcko histogram dictionary.

  • k (int, optional) – The sum threshold that determines whether histograms are treated as small. Defaults to 3.

  • d (int, optional) – The maximum allowed distance for larger histograms. Defaults to 4.

Returns:

True if the histograms are considered sub-histograms, False otherwise.

Return type:

bool
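The two-branch rule described above can be sketched in plain Python. This is an illustration under stated assumptions, not the package's implementation: the name are_sub_hists_sketch is hypothetical, "small" is assumed to mean that the smaller of the two histogram sums is at most k, the distance is assumed to be the sum of absolute count differences, and the key strings such as "0_1" are made-up placeholders.

```python
from typing import Dict


def are_sub_hists_sketch(h1: Dict[str, int], h2: Dict[str, int], k: int = 3, d: int = 4) -> bool:
    """Hypothetical re-statement of the documented rule, for illustration only."""
    # Assumption: histograms count as "small" when the smaller sum is at most k.
    if min(sum(h1.values()), sum(h2.values())) <= k:
        # Small histograms must match exactly.
        return h1 == h2
    # Larger histograms: sum of absolute count differences over the union of keys.
    dist = sum(abs(h1.get(key, 0) - h2.get(key, 0)) for key in set(h1) | set(h2))
    return dist <= d


# Placeholder histograms; the key format is an assumption.
small_a = {"0_1": 2}
small_b = {"0_1": 2, "1_0": 1}
large_a = {"0_1": 3, "1_1": 2}
large_b = {"0_1": 2, "1_1": 2, "2_0": 1}

print(are_sub_hists_sketch(small_a, small_b))  # False: small and not equal
print(are_sub_hists_sketch(large_a, large_b))  # True: distance 2 is within d = 4
```

The exact comparison used for k (strict or inclusive, minimum or maximum of the sums) is not stated in this reference, so check the source before relying on boundary cases.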

dreams.algorithms.murcko_hist.murcko_hist.break_rings(mol: Mol, rings_size: int = 3) → Mol

Breaks every ring of the specified size in a molecule by removing one bond from it, chosen as a bond with minimal degree with respect to rings.

This function is intended to be used prior to computing Murcko scaffolds. NOTE: It is not extensively tested and may not be useful for rings_size != 3.

Parameters:
  • mol (Mol) – The input RDKit molecule.

  • rings_size (int, optional) – The size of rings to break. Defaults to 3.

Returns:

The modified RDKit molecule with specified rings broken.

Return type:

Mol

dreams.algorithms.murcko_hist.murcko_hist.multirings(mol: Mol) → List[Set[int]]

Returns a list of sets of atom indices, one set for each “multiring” in a molecule.

Each “multiring” is a generalized ring, where ordinary fused rings are considered to be a single ring/set (hence the “multiring” name).

Parameters:

mol (Mol) – The input RDKit molecule.

Returns:

A list of sets, where each set contains atom indices belonging to a multiring.

Return type:

List[Set[int]]
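The idea of collapsing fused rings into one set can be sketched without RDKit, starting from ring atom indices given as plain tuples. This is a minimal sketch, not the package's algorithm: the helper merge_fused_rings is hypothetical, and it assumes that two rings are fused when they share at least two atoms (a proxy for sharing a bond). How the real function treats spiro or bridged systems is not stated here.

```python
from typing import Iterable, List, Set


def merge_fused_rings(rings: Iterable[Iterable[int]], min_shared: int = 2) -> List[Set[int]]:
    """Hypothetical helper: merge atom-index sets that overlap in >= min_shared atoms."""
    groups: List[Set[int]] = [set(ring) for ring in rings]
    changed = True
    # Repeat until no pair of groups can be merged any more.
    while changed:
        changed = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if len(groups[i] & groups[j]) >= min_shared:
                    # Absorb group j into group i and restart the scan.
                    groups[i] |= groups.pop(j)
                    changed = True
                    break
            if changed:
                break
    return groups


# Made-up indices: two six-membered rings sharing atoms 4 and 5, plus one isolated ring.
example_rings = [
    (0, 1, 2, 3, 4, 5),
    (4, 5, 6, 7, 8, 9),
    (11, 12, 13, 14, 15, 16),
]
merged = merge_fused_rings(example_rings)
print([sorted(group) for group in merged])
# [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [11, 12, 13, 14, 15, 16]]
```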

dreams.algorithms.murcko_hist.murcko_hist.murcko_hist(mol: Mol, as_dict: bool = True, show_mol_scaffold: bool = False, no_residue_atom_as_linker: bool = True, break_three_membered_rings: bool = True) → Dict[str, int] | Tuple[ndarray, ndarray]

Computes the Murcko scaffold histogram for a given molecule.

This function calculates a histogram of rings in the Murcko scaffold of the input molecule, with respect to the number of adjacent rings and linkers.

Parameters:
  • mol (Mol) – The input RDKit molecule.

  • as_dict (bool, optional) – If True, return the histogram as a dictionary. Otherwise, return as numpy arrays. Defaults to True.

  • show_mol_scaffold (bool, optional) – If True, display the original molecule and its Murcko scaffold. Defaults to False.

  • no_residue_atom_as_linker (bool, optional) – If True, do not consider residue atoms as linkers. Defaults to True.

  • break_three_membered_rings (bool, optional) – If True, break all three-membered rings before processing. Defaults to True.

Returns:

If as_dict is True, returns a dictionary where keys are string representations of (adjacent rings, adjacent linkers) and values are counts. If as_dict is False, returns a tuple of two numpy arrays: unique (adjacent rings, adjacent linkers) pairs and their counts.

Return type:

Union[Dict[str, int], Tuple[np.ndarray, np.ndarray]]
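To make the two return shapes concrete, the sketch below builds both from a made-up list of per-ring (adjacent rings, adjacent linkers) pairs. It does not call the package or RDKit. The input values are invented, the "rings_linkers" key format is an assumption rather than the documented format, and plain lists stand in for the numpy arrays.

```python
from collections import Counter

# Hypothetical per-ring descriptors: (number of adjacent rings, number of adjacent linkers).
ring_pairs = [(1, 0), (1, 0), (0, 1), (0, 1), (0, 1)]

counts = Counter(ring_pairs)

# Dictionary form: one string key per distinct pair (key format is an assumption).
hist_dict = {f"{rings}_{linkers}": n for (rings, linkers), n in sorted(counts.items())}

# Array-like form: the distinct pairs and their counts as two parallel sequences.
pairs = sorted(counts)
values = [counts[pair] for pair in pairs]

print(hist_dict)  # {'0_1': 3, '1_0': 2}
print(pairs)      # [(0, 1), (1, 0)]
print(values)     # [3, 2]
```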

dreams.algorithms.murcko_hist.murcko_hist.murcko_hists_dist(h1: Dict[str, int], h2: Dict[str, int]) → int

Computes the distance between two Murcko histogram dictionaries.

The distance is calculated as the sum of absolute differences between corresponding histogram values, including keys present in only one histogram.

Parameters:
  • h1 (Dict[str, int]) – The first Murcko histogram dictionary.

  • h2 (Dict[str, int]) – The second Murcko histogram dictionary.

Returns:

The distance between the two histograms.

Return type:

int
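The distance described above amounts to an L1 distance in which a key missing from one histogram counts as zero. A minimal sketch of that formula follows; hist_dist_sketch is a hypothetical name, the key strings are placeholders, and the package's own code may differ in detail.

```python
from typing import Dict


def hist_dist_sketch(h1: Dict[str, int], h2: Dict[str, int]) -> int:
    """Sum of absolute count differences over the union of keys (illustrative only)."""
    return sum(abs(h1.get(key, 0) - h2.get(key, 0)) for key in h1.keys() | h2.keys())


h1 = {"0_1": 2, "1_1": 1}
h2 = {"0_1": 1, "2_0": 3}

# Shared key "0_1" contributes 1, "1_1" (only in h1) contributes 1, "2_0" (only in h2) contributes 3.
print(hist_dist_sketch(h1, h2))  # 5
```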

Module contents

dreams.algorithms.murcko_hist.are_sub_hists(h1: Dict[str, int], h2: Dict[str, int], k: int = 3, d: int = 4) → bool

Determines if two Murcko histograms are considered sub-histograms of each other.

When the histogram sums are small (as controlled by k), the histograms must be equal. For larger histograms, their distance must be within the threshold d.

Parameters:
  • h1 (Dict[str, int]) – The first Murcko histogram dictionary.

  • h2 (Dict[str, int]) – The second Murcko histogram dictionary.

  • k (int, optional) – The sum threshold that determines whether histograms are treated as small. Defaults to 3.

  • d (int, optional) – The maximum allowed distance for larger histograms. Defaults to 4.

Returns:

True if the histograms are considered sub-histograms, False otherwise.

Return type:

bool

dreams.algorithms.murcko_hist.murcko_hist(mol: Mol, as_dict: bool = True, show_mol_scaffold: bool = False, no_residue_atom_as_linker: bool = True, break_three_membered_rings: bool = True) → Dict[str, int] | Tuple[ndarray, ndarray]

Computes the Murcko scaffold histogram for a given molecule.

This function calculates a histogram of rings in the Murcko scaffold of the input molecule, with respect to the number of adjacent rings and linkers.

Parameters:
  • mol (Mol) – The input RDKit molecule.

  • as_dict (bool, optional) – If True, return the histogram as a dictionary. Otherwise, return as numpy arrays. Defaults to True.

  • show_mol_scaffold (bool, optional) – If True, display the original molecule and its Murcko scaffold. Defaults to False.

  • no_residue_atom_as_linker (bool, optional) – If True, do not consider residue atoms as linkers. Defaults to True.

  • break_three_membered_rings (bool, optional) – If True, break all three-membered rings before processing. Defaults to True.

Returns:

If as_dict is True, returns a dictionary where keys are string representations of (adjacent rings, adjacent linkers) and values are counts. If as_dict is False, returns a tuple of two numpy arrays: unique (adjacent rings, adjacent linkers) pairs and their counts.

Return type:

Union[Dict[str, int], Tuple[np.ndarray, np.ndarray]]

dreams.algorithms.murcko_hist.murcko_hists_dist(h1: Dict[str, int], h2: Dict[str, int]) → int

Computes the distance between two Murcko histogram dictionaries.

The distance is calculated as the sum of absolute differences between corresponding histogram values, including keys present in only one histogram.

Parameters:
  • h1 (Dict[str, int]) – The first Murcko histogram dictionary.

  • h2 (Dict[str, int]) – The second Murcko histogram dictionary.

Returns:

The distance between the two histograms.

Return type:

int