CoREMOF package

Subpackages

Submodules

CoREMOF.curate module

CoREMOF.get_mofid module

Process your CIF to get mofid v1 and v2.

CoREMOF.get_mofid.are_identical_smiles(smiles1, smiles2)[source]

CoREMOF.get_mofid.convert_ase_pymat(ase_objects)[source]

Convert ASE atoms to pymatgen atoms.

Parameters:

ase_objects (ase.Atoms) – ase-type atoms.

Returns:

  • pymatgen-type atoms.

Return type:

ase.Atoms

CoREMOF.get_mofid.dict2str(dct)[source]

Convert symbol-to-number dict to str.
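
The output format of dict2str is not stated here. As a minimal sketch, assuming the dict maps element symbols to atom counts and the result is a compact formula-like string with symbols sorted alphabetically, a helper could look like the following. The name symbol_counts_to_str is hypothetical and is not the package's code.

```python
def symbol_counts_to_str(counts):
    """Join a symbol-to-count mapping into one string.

    Assumptions (not confirmed by the package docs): symbols are sorted
    alphabetically and every count is written out, including 1.
    """
    return "".join(f"{symbol}{counts[symbol]}" for symbol in sorted(counts))


print(symbol_counts_to_str({"H": 6, "C": 2}))
```

Sorting the keys makes the string independent of dict insertion order, which matters if the string is later used to compare compositions.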

CoREMOF.get_mofid.get_node_linker_files(structure, prefix='Output')[source]

Split a MOF into SBU (node) and linker files.

Parameters:
  • structure (str) – path to your CIF.

  • prefix (str) – the path to save processed XYZ.

Returns:

  • structure, node, linker, …

Return type:

files

CoREMOF.get_mofid.remove_pbc_cuts(atoms)[source]

Remove building block cuts due to periodic boundary conditions. After the removal, the atoms object is centered at the center of the unit cell.

Parameters:

atoms (ase.Atoms) – an ase.Atoms object.

Returns:

  • The processed atoms object.

Return type:

ase.Atoms

CoREMOF.get_mofid.run_v1(structure)[source]

Convert a CIF to mofid-v1. See https://snurr-group.github.io/mofid/ for the additional installation steps. Tip: check that CMake, Java, etc. are available before installing.

Parameters:

structure (str) – path to your CIF.

Returns:

  • mofid-v1.

Return type:

str

CoREMOF.get_mofid.run_v2(structure, nodes_dataset, refname)[source]

Run mofid-v2 from a CIF.

Parameters:
Returns:

  • mofid-v2.

Return type:

str

CoREMOF.get_mofid.split_linkers_from_cif(structure, prefix='Output')[source]

Split the linkers of a CIF into individual XYZ files.

Parameters:
  • structure (str) – path to your CIF.

  • prefix (str) – the path to save processed XYZ.

Returns:

  • 0: success; 1: fail.

Return type:

int

CoREMOF.get_mofid.split_nodes_from_cif(structure, prefix='Output')[source]

Split the nodes of a CIF into individual XYZ files.

Parameters:
  • structure (str) – path to your CIF.

  • prefix (str) – the path to save processed XYZ.

Returns:

  • 0: success; 1: fail.

Return type:

int

CoREMOF.get_mofid.xyz2fomula(xyzpath)[source]

Convert an XYZ file to a chemical formula with elements ordered A-Z.

Parameters:

xyzpath (str) – path to your XYZ.

Returns:

  • formula.

Return type:

str
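
As an illustration of the XYZ-to-formula step, here is a minimal sketch. It assumes a standard XYZ layout (atom count on line 1, a comment on line 2, then one "symbol x y z" line per atom) and reads "based on A-Z" as alphabetical ordering of element symbols. The function name and the choice to always write the count are assumptions, not the package's implementation.

```python
from collections import Counter


def formula_from_xyz_text(xyz_text):
    """Build an alphabetically ordered formula string from XYZ file contents."""
    lines = xyz_text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])       # line 1: number of atoms
    atom_lines = lines[2:2 + n_atoms]        # line 2 is a comment; atoms follow
    counts = Counter(line.split()[0] for line in atom_lines)
    # Assumption: counts are always written out, including 1.
    return "".join(f"{el}{counts[el]}" for el in sorted(counts))


water = "3\nwater\nO 0.0 0.0 0.0\nH 0.0 0.0 1.0\nH 0.0 1.0 0.0\n"
print(formula_from_xyz_text(water))
```

To work from a path, as xyz2fomula does, read the file contents first and pass them to the helper.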

CoREMOF.mosaec module

CoREMOF.mosaec.HighestKnownONs()[source]

Determines the highest known oxidation states for each metal element.

Returns:

dictionary with metal element symbols as keys and their highest known oxidation state as values.

Return type:

HKONs (dict[str, int])

CoREMOF.mosaec.IonizationEnergies()[source]

Reads in the reported ionization energies for each metal element.

Returns:

dictionary with metal element symbols as keys and a list of their reported ionization energies as values.

Return type:

KIEs (dict[str, list[float]])

CoREMOF.mosaec.KnownONs()[source]

Reads in the known oxidation states for each metal element.

Returns:

dictionary with metal element symbols as keys and a list of their known oxidation states as values.

Return type:

KONs (dict[str, list[int]])
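
The two lookups, KnownONs and HighestKnownONs, have related shapes. As a sketch only, assuming the highest known oxidation state is simply the maximum of each metal's known list (the package may instead read it from its own data), the relationship could be expressed as below. The helper name and the sample values are illustrative and are not the package's data.

```python
def highest_from_known(known_ons):
    """Reduce dict[str, list[int]] to dict[str, int] by taking each list's maximum."""
    return {metal: max(states) for metal, states in known_ons.items()}


# Illustrative values only, not taken from the package's data files.
sample_known = {"Cu": [1, 2, 3], "Zn": [2]}
print(highest_from_known(sample_known))
```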

CoREMOF.mosaec.ONmostprob(iONP)[source]

Determines the highest probability oxidation state for each metal element. These values are utilized during charge distribution routines.

Parameters:

iONP (dict[str, list[float]]) – dictionary with metal element symbols as keys and a list of the probability at the relevant oxidation states as values.

Returns:

dictionary with metal element symbols as keys and their oxidation state with the highest probability as values.

Return type:

MPOS (dict[str, int])
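
The selection described for ONmostprob amounts to an argmax over each metal's probability list. The sketch below assumes that position i in the list corresponds to oxidation state i, starting at 0. That index convention is an assumption, as are the helper name and the sample probabilities.

```python
def most_probable_on(on_probabilities):
    """Pick, for each metal, the list index holding the largest probability."""
    most_probable = {}
    for metal, probs in on_probabilities.items():
        # max() returns the first maximal index, so ties go to the lower state.
        most_probable[metal] = max(range(len(probs)), key=probs.__getitem__)
    return most_probable


# Illustrative probabilities only.
print(most_probable_on({"Fe": [0.0, 0.05, 0.55, 0.40]}))
```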

CoREMOF.mosaec.ONprobabilities()[source]

Reads in the probability of each oxidation state for all metal elements. Approximate probabilities are assessed by their relative frequency of occurrence in the CSD metadata.

Returns:

dictionary with metal element symbols as keys and a list of the probability at the relevant oxidation states as values.

Return type:

ONP (dict[str, list[float]])

CoREMOF.mosaec.assign_VBS(atom, rVBO, dVBO)[source]

Assigns a Valence-Bond-Sum (VBS) to an atom.

Parameters:
  • atom (ccdc.molecule.Atom) – Atom object.

  • rVBO (dict[int, int]) – dictionary with each atom’s index in mole.atoms as keys and VBO (valence bond order) as values.

  • dVBO (dict[int, float]) – dictionary with delocalized bond-possessing atom’s index in mole.atoms as keys and their corresponding (delocalized-only) VBS.

Returns:

valence bond sum value.

Return type:

VBO (int)

CoREMOF.mosaec.binding_contrib(binding_sphere, binding_sites, AON)[source]

Redistributes oxidation state contributions within a binding domain. Equal distribution is assumed across connected binding sites in each domain.

Parameters:
  • binding_sphere (dict[ccdc.molecule.Atom, list[ccdc.molecule.Atom]]) – dictionary with Atom object as keys and a list of Atoms connected through bonding that form a binding domain as values.

  • binding_sites (list[ccdc.molecule.Atom]) – list of binding sites connecting metal atoms and ligands.

  • AON (dict[ccdc.molecule.Atom, float]) – dictionary with Atom object as keys and their oxidation state contribution as values for unique Atoms.

Returns:

dictionary with Atom object as keys and their updated oxidation state contribution as values accounting for distribution within the binding domain.

Return type:

site_contrib (dict[ccdc.molecule.Atom, float])
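
To make the equal-distribution idea concrete, the sketch below uses plain string labels in place of ccdc Atom objects. It assumes that the contributions of the binding sites in one domain are summed and shared equally among those sites. The function name and the carboxylate-like sample are hypothetical, and the real routine may treat non-binding atoms in the domain differently.

```python
def equalize_within_domains(binding_sphere, binding_sites, aon):
    """Share each domain's binding-site contributions equally among its sites."""
    site_contrib = dict(aon)
    for site in binding_sites:
        # A site's domain is the site itself plus the atoms connected to it.
        domain = [site] + [a for a in binding_sphere.get(site, []) if a != site]
        members = [a for a in domain if a in binding_sites]
        total = sum(aon[a] for a in members)
        site_contrib[site] = total / len(members)
    return site_contrib


# Two oxygens of one carboxylate-like group, both bound to metals (made-up values).
sphere = {"O1": ["C1", "O2"], "O2": ["C1", "O1"]}
print(equalize_within_domains(sphere, ["O1", "O2"], {"O1": -1.0, "O2": 0.0, "C1": 0.0}))
```

In this toy case the -1 charge drawn on O1 is split so that each oxygen carries -0.5, regardless of which oxygen the input structure assigned the charge to.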

CoREMOF.mosaec.binding_domain(binding_sites, AON, molecule, usites)[source]

Builds bonding domains within the crystal structure to determine which metal binding sites (Atom objects directly bonded to a metal) are connected via conjugation. The function accounts for the inconsistent assignment of delocalized bonds by using the bonding domains (see methodology section for details on the implementation and validation).

Parameters:
  • binding_sites (list[ccdc.molecule.Atom]) – list of binding sites connecting metal atoms and ligands.

  • AON (dict[ccdc.molecule.Atom, float]) – dictionary with Atom object as keys and their oxidation state contribution as values for unique Atom objects.

  • molecule (ccdc.molecule.Molecule) – Molecule object.

  • usites (list[ccdc.molecule.Atom]) – list of unique atoms.

Returns:

dictionary with Atom object as keys and a list of Atoms connected through bonding that form a binding domain as values.

Return type:

sitedomain (dict[ccdc.molecule.Atom, list[ccdc.molecule.Atom]])

CoREMOF.mosaec.bridging(atom)[source]

Determines how many metal atoms the input atom binds, in order to identify bridging sites.

Parameters:

atom (ccdc.molecule.Atom) – binding site Atom object.

Returns:

number of metal atoms bound to the atom.

Return type:

bridge (int)

CoREMOF.mosaec.carbene_type(atom)[source]

Distinguishes between singlet and triplet carbenes.

Parameters:

atom (ccdc.molecule.Atom) – Atom object(s) suspected of belonging to a carbene (2-coordinate carbon II).

Returns:

carbene type at input atom.

Return type:

Literal["singlet", "triplet"]

CoREMOF.mosaec.carbocation_check(atom)[source]

Check carbocation/carbanion geometry according to bond angles.

Parameters:

atom (ccdc.molecule.Atom) – Atom object.

Returns:

geometry at input atom.

Return type:

Literal["tetrahedral", "trigonal"]

CoREMOF.mosaec.check(file_path)[source]

Process a single .cif or .mol2 file and return a dict mapping each metal site label to the list of flag-names that are not ‘GOOD’.

Return type:

dict[str, list[str]]
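
The shape of the value returned by check can be illustrated with a small filter. This sketch assumes each metal site carries a mapping of flag name to status and keeps only the names whose status is not 'GOOD'. The flag names, the 'BAD' status, and the helper name are placeholders, and whether sites with no failing flags are kept in the real output is not stated.

```python
def non_good_flags(site_flags):
    """Map each site label to the flag names whose status is not 'GOOD'."""
    return {
        site: [name for name, status in flags.items() if status != "GOOD"]
        for site, flags in site_flags.items()
    }


# Placeholder flag names and statuses.
flags = {"Cu1": {"flag_a": "GOOD", "flag_b": "BAD"}, "Zn1": {"flag_a": "GOOD"}}
print(non_good_flags(flags))
```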

CoREMOF.mosaec.delocalisedLBO(molecule)[source]

Writes a dictionary of all atoms in the molecule with delocalized bonds and their (delocalized-only) valence bond sum (VBS).

Parameters:

molecule (ccdc.molecule.Molecule) – Molecule object.

Returns:

dictionary with delocalized bond-possessing atom’s index in mole.atoms as keys and their corresponding (delocalized-only) VBS as values.

Return type:

delocal_dict (dict[int, float])

CoREMOF.mosaec.distribute_ONEC(sONEC, metal_networks, IEs, ONP, highest_known_ON, metal_CN, most_probable_ON)[source]

Redistributes the oxidation state contributions across all metal atoms in the structure according to their metal networks (fully local distribution) & calculates their associated electron counts. Features utilizing electron counts are minimally implemented at this time.

Parameters:
  • sONEC (dict[ccdc.molecule.Atom, list[float, float]]) – dictionary with metal Atom object as keys and lists containing the initial oxidation state and electron count implied by only the equal splitting of binding domain charges as values.

  • metal_networks (dict[ccdc.molecule.Atom, list[ccdc.molecule.Atom]]) – dictionary with metal Atom objects as keys and a list of other metal Atom objects connected through binding domains/charged ligands as values. Ignores neutral ligand connections.

  • IEs (dict[str, list[float]]) – dictionary with metal element symbols as keys and a list of their reported ionization energies as values.

  • ONP (dict[str, list[float]]) – dictionary with metal element symbols as keys and a list of the probability at the relevant oxidation states as values.

  • highest_known_ON (dict[str, int]) – dictionary with metal element symbols as keys and their highest known oxidation state as values.

  • metal_CN (dict[ccdc.molecule.Molecule, int]) – dictionary with metal Atom objects as keys and their effective coordination number as values.

  • most_probable_ON (dict[str, int]) – dictionary with metal element symbols as keys and their oxidation state with the highest probability as values.

Returns:

dictionary with metal Atom object as keys and lists containing their redistributed oxidation state and electron count as values.

Return type:

distributed_ONEC (dict[ccdc.molecule.Atom, list[float, float]])

CoREMOF.mosaec.distribute_OuterSphere(sONEC, outer_sphere_charge, IEs, ONP, highest_known_ON, metal_CN)[source]

Redistributes the oxidation state contributions across all metal atoms in the structure according to the outer sphere charge contribution (partially local distribution) & calculates their associated electron counts. Features utilizing electron counts are minimally implemented at this time.

Parameters:
  • sONEC (dict[ccdc.molecule.Atom, list[float, float]]) – dictionary with metal Atom object as keys and lists containing the initial oxidation state and electron count implied by only the equal splitting of binding domain charges as values.

  • outer_sphere_charge (int) – sum of outer sphere charge contributions.

  • IEs (dict[str, list[float]]) – dictionary with metal element symbols as keys and a list of their reported ionization energies as values.

  • ONP (dict[str, list[float]]) – dictionary with metal element symbols as keys and a list of the probability at the relevant oxidation states as values.

  • highest_known_ON (dict[str, int]) – dictionary with metal element symbols as keys and their highest known oxidation state as values.

  • metal_CN (dict[ccdc.molecule.Molecule, int]) – dictionary with metal Atom objects as keys and their effective coordination number as values.

Returns:

dictionary with metal Atom object as keys and lists containing their redistributed oxidation state and electron count as values.

Return type:

distributed_ONEC (dict[ccdc.molecule.Atom, list[float, float]])

CoREMOF.mosaec.getCN(lsites)[source]

Determines the effective coordination number of each metal atom from the ligand atoms bound to it.

Parameters:

lsites (dict[Atom, list[Atom]]) – dictionary with metal Atom object as keys and the list of ligand atoms which bind them as values.

Returns:

dictionary with metal Atom objects as keys and effective coordination number as values.

Return type:

CNdict (dict[Molecule, int])

CoREMOF.mosaec.get_CN(atom)[source]

Determines the coordination number of the input atom.

Parameters:

atom (ccdc.molecule.Atom) – Atom object.

Returns:

Atom’s coordination number.

Return type:

coord_number (int)

CoREMOF.mosaec.get_binding_sites(metalsites, uniquesites)[source]

Get the binding sites in a structure, given the list of unique metal atoms and all unique atoms.

Parameters:
  • metalsites (list[ccdc.molecule.Atom]) – list of unique metal atoms.

  • uniquesites (list[ccdc.molecule.Atom]) – list of unique atoms.

Returns:

list of binding sites connecting metal atoms and ligands.

Return type:

binding_sites (list[ccdc.molecule.Atom])

CoREMOF.mosaec.get_ligand_sites(metalsites, sites)[source]

Get the ligand sites binding each metal atom in a structure.

Parameters:
  • metalsites (list[ccdc.molecule.Atom]) – list of metal sites in the structure that belong to the asymmetric unit.

  • sites (list[ccdc.molecule.Atom]) – list of unique atoms in the structure that belong to the asymmetric unit.

Returns:

dictionary with metal Atom object as keys and the list of ligand atoms which bind them as values.

Return type:

metal_sphere (dict[ccdc.molecule.Atom, list[ccdc.molecule.Atom]])

CoREMOF.mosaec.get_metal_networks(ligand_sites, binding_sphere, bindingAON)[source]

Determines the metal atoms that are connected through binding domains and charged ligands. Any connections through neutral ligands are ignored as they do not contribute to the charge accounting.

Parameters:
  • ligand_sites (dict[ccdc.molecule.Atom, list[ccdc.molecule.Atom]]) – dictionary with metal Atom object as keys and the list of ligand atoms which bind them as values.

  • binding_sphere (dict[ccdc.molecule.Atom, list[ccdc.molecule.Atom]]) – dictionary with Atom object as keys and a list of Atoms connected through bonding that form a binding domain as values.

  • bindingAON (dict[ccdc.molecule.Atom, float]) – dictionary with Atom object as keys and their updated oxidation state contribution as values accounting for distribution within the binding domain.

Returns:

dictionary with metal Atom objects as keys and a list of other metal Atom objects connected through binding domains/charged ligands as values. Ignores neutral ligand connections.

Return type:

network_dict (dict[ccdc.molecule.Atom, list[ccdc.molecule.Atom]])

CoREMOF.mosaec.get_metal_sites(sites)[source]

Get the metal sites in a structure belonging to the asymmetric unit.

Parameters:

sites (list[ccdc.molecule.Atom]) – list of unique atoms in the structure that belong to the asymmetric unit.

Returns:

list of metal sites in the structure that belong to the asymmetric unit.

Return type:

metalsites (list[ccdc.molecule.Atom])

CoREMOF.mosaec.get_no_metal_molecule(inputmolecule)[source]

Remove metal atoms from the input Molecule object.

Parameters:

inputmolecule (ccdc.molecule.Molecule) – original Molecule object.

Returns:

Molecule object with all metal atoms removed.

Return type:

workingmol (ccdc.molecule.Molecule)

CoREMOF.mosaec.get_unique_sites(mole, asymmole)[source]

Get the unique atoms in a structure belonging to the asymmetric unit.

Parameters:
  • mole (ccdc.molecule.Molecule) – original structure Molecule object.

  • asymmole (ccdc.molecule.Molecule) – asymmetric unit of the structure.

Returns:

list of unique atoms in the structure that belong to the asymmetric unit.

Return type:

uniquesites (list[ccdc.molecule.Atom])

CoREMOF.mosaec.global_charge_distribution(metalONdict, IEs, ONP, highest_known_ON, metal_CN, most_probable_ON)[source]

Redistributes the oxidation state contributions across all metal atoms in the structure according to full/global sharing (fully delocalized distribution) & calculates their associated electron counts. Features utilizing electron counts are minimally implemented at this time.

Parameters:
  • metalONdict (dict[ccdc.molecule.Atom, list[float, float]]) – dictionary with metal Atom object as keys and lists containing the initial oxidation state and electron count implied by only the equal splitting of binding domain charges as values.

  • IEs (dict[str, list[float]]) – dictionary with metal element symbols as keys and a list of their reported ionization energies as values.

  • ONP (dict[str, list[float]]) – dictionary with metal element symbols as keys and a list of the probability at the relevant oxidation states as values.

  • highest_known_ON (dict[str, int]) – dictionary with metal element symbols as keys and their highest known oxidation state as values.

  • metal_CN (dict[ccdc.molecule.Molecule, int]) – dictionary with metal Atom objects as keys and their effective coordination number as values.

  • most_probable_ON (dict[str, int]) – dictionary with metal element symbols as keys and their oxidation state with the highest probability as values.

Returns:

dictionary with metal Atom object as keys and lists containing their redistributed oxidation state and electron count as values.

Return type:

global_ONEC (dict[ccdc.molecule.Atom, list[float, float]])

CoREMOF.mosaec.hapticity(atom, metalsite)[source]

Determines if a ligand binding site possesses hapticity (any n-hapto).

Parameters:
  • atom (ccdc.molecule.Atom) – Atom object.

  • metalsite (list[ccdc.molecule.Atom]) – list of metal sites in the structure that belong to the asymmetric unit.

Returns:

whether the input ligand is hapto-.

Return type:

bool

CoREMOF.mosaec.iVBS_FormalCharge(atom)[source]

Determines the formal charge of an atom NOT involved in any aromatic or delocalized bonding system.

Parameters:

atom (ccdc.molecule.Atom) – Atom object.

Returns:

formal charge of the input atom.

Return type:

charge (int)

CoREMOF.mosaec.iVBS_Oxidation_Contrib(unique_atoms, rVBO, dVBO)[source]

Determines the oxidation state contribution of all unique atoms.

Parameters:
  • unique_atoms (list[ccdc.molecule.Atom]) – unique atoms belonging to the asymmetric unit.

  • rVBO (dict[int, int]) – dictionary with each atom’s index in mole.atoms as keys and VBO (valence bond order) as values.

  • dVBO (dict[int, float]) – dictionary with delocalized bond-possessing atom’s index in mole.atoms as keys and their corresponding (delocalized-only) VBS.

Returns:

dictionary with Atom object as keys and their oxidation state contribution as values.

Return type:

oxi_contrib (dict[ccdc.molecule.Atom, float])

CoREMOF.mosaec.outer_sphere_contrib(outer_sphere, AON)[source]

Calculates the total oxidation state contribution of the outer sphere atoms as the sum of their formal charge/contributions.

Parameters:
  • outer_sphere (list[ccdc.molecule.Atom]) – list of unique, non-metal atoms outside of binding domains.

  • AON (dict[ccdc.molecule.Atom, float]) – dictionary with Atom object as keys and their oxidation state contribution as values for unique Atoms.

Returns:

sum of outer sphere charge contributions.

Return type:

contrib (int)
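
The summation described for outer_sphere_contrib is straightforward. The sketch below uses string labels in place of Atom objects and made-up formal charges for a nitrate-like counter-ion. The helper name is hypothetical.

```python
def sum_outer_sphere(outer_sphere, aon):
    """Add up the oxidation state contributions of the outer sphere atoms."""
    return sum(aon[atom] for atom in outer_sphere)


# Made-up contributions: N +1, two O at -1, one O at 0, giving a net -1.
aon = {"N1": 1.0, "O1": -1.0, "O2": -1.0, "O3": 0.0, "Cu1": 0.0}
print(sum_outer_sphere(["N1", "O1", "O2", "O3"], aon))
```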

CoREMOF.mosaec.outer_sphere_domain(uniquesites, binding_domains)[source]

Identifies sites outside of the binding domains which must be checked for outer sphere charge contributions.

Parameters:
  • uniquesites (list[ccdc.molecule.Atom]) – list of unique atoms in the structure belonging to the asymmetric unit.

  • binding_domains (dict[ccdc.molecule.Atom, list[ccdc.molecule.Atom]]) – dictionary with Atom object as keys and a list of Atoms connected through bonding that form a binding domain as values.

Returns:

list of unique, non-metal atoms outside of binding domains.

Return type:

outer_sphere (list[ccdc.molecule.Atom])

CoREMOF.mosaec.readSBU(input_mol2)[source]

Reads a MOL2 file containing SBU/metal complex structural data and converts it to a standard atom labeling convention using the ccdc.crystal module.

Parameters:

input_mol2 (str) – filename (.mol2) containing SBU/metal complex structural data.

Returns:

Crystal object containing structural data in the standard atom labeling convention.

Return type:

mol (ccdc.crystal.Crystal)

CoREMOF.mosaec.read_CSD_entry(input_refcode)[source]

Read entries directly from the CSD CrystalReader according to CSD refcode.

Parameters:

input_refcode (str) – string used to identify materials in the CSD.

Returns:

Crystal object containing structural data in the standard atom labeling convention.

Return type:

cif (ccdc.crystal.Crystal)

CoREMOF.mosaec.readentry(input_cif)[source]

Reads a CIF file containing structure data and converts it to a standard atom labeling convention using the ccdc.crystal module.

Parameters:

input_cif (str) – filename (.CIF) containing crystal structure data.

Returns:

Crystal object containing structural data in the standard atom labeling convention.

Return type:

newcif (ccdc.crystal.Crystal)

CoREMOF.mosaec.redundantAON(AON, molecule)[source]

Maps the oxidation contributions of unique atom sites to the redundant atom sites according to their shared atom labels.

Parameters:
  • AON (dict[ccdc.molecule.Atom, float]) – dictionary with Atom object as keys and their oxidation state contribution as values for unique Atom objects.

  • molecule (ccdc.molecule.Molecule) – Molecule object.

Returns:

dictionary with Atom object as keys and their oxidation state contribution as values for all (including redundant) Atom objects.

Return type:

redAON (dict[ccdc.molecule.Atom, float])
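
The label-based mapping in redundantAON can be sketched without the ccdc API by modelling each atom as an (index, label) tuple. This is an assumption made for illustration: symmetry-equivalent atoms share a label, so each one inherits the contribution of the unique atom with that label. The helper name is hypothetical.

```python
def map_by_label(unique_aon, all_atoms):
    """Copy each unique atom's contribution to every atom sharing its label."""
    by_label = {label: value for (_, label), value in unique_aon.items()}
    # Atoms whose label has no unique counterpart are skipped in this sketch.
    return {atom: by_label[atom[1]] for atom in all_atoms if atom[1] in by_label}


unique = {(0, "O1"): -0.5}
atoms = [(0, "O1"), (5, "O1"), (6, "C1")]
print(map_by_label(unique, atoms))
```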

CoREMOF.mosaec.ringVBOs(mole)[source]

Calculates the VBO (valence bond order) for each atom in the structure.

Parameters:

mole (ccdc.molecule.Molecule) – Molecule object representing the structure.

Returns:

dictionary with each atom’s index in mole.atoms as keys and VBO (valence bond order) as values.

Return type:

ringVBO (dict[int, int])

CoREMOF.mosaec.run(cif_folder, save_path='./', max_workers=64)[source]

CoREMOF.mosaec.valence_e(elmnt)[source]

Determines the number of valence electrons of an atom/element.

Parameters:

elmnt (ccdc.molecule.Atom) – Atom object.

Returns:

Atom’s valence electron count.

Return type:

valence (int)

CoREMOF.mosaec.worker(cif_path)[source]
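
Neither run nor worker is documented here. Their signatures suggest a fan-out over the CIF files in a folder, with one call per file. That reading is an assumption. The sketch below shows the general pattern with the standard library only. process_one is a stand-in that reports file size, not the MOSAEC analysis, and a thread pool is used for simplicity where the package may use processes.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def process_one(cif_path):
    """Stand-in for the per-file work; it only reports the file size."""
    return os.path.basename(cif_path), os.path.getsize(cif_path)


def process_folder(cif_folder, max_workers=4):
    """Apply process_one to every .cif file in a folder, in parallel."""
    cif_paths = sorted(
        os.path.join(cif_folder, name)
        for name in os.listdir(cif_folder)
        if name.lower().endswith(".cif")
    )
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(process_one, cif_paths))
```

Calling process_folder on a directory returns a dict keyed by file name. Saving results under a path such as save_path is left out of the sketch.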

CoREMOF.prediction module

CoREMOF.structure module

Download structures and query information of CoRE MOF Database.

CoREMOF.structure.download_from_CSD(refcode, output_folder='./CoREMOF2024DB')[source]

Download structures from the CSD. This requires the [CSD python API](https://downloads.ccdc.cam.ac.uk/documentation/API/installation_notes.html) and a licence.

Parameters:
  • refcode (str) – CSD refcode.

  • output_folder (str) – path to save structures.

Returns:

the downloaded CIF.

Return type:

cif

class CoREMOF.structure.download_from_SI(output_folder='./CoREMOF2024DB')[source]

Bases: object

Download the structures that were collected from supporting information.

Parameters:

output_folder (str) – path to save structures.

Returns:

CoRE MOF SI dataset.

Return type:

cif

get_from_SI(zip_path, entry, output_folder)[source]

Extract files from a ZIP.

Parameters:
  • zip_path (str) – path to ZIP.

  • entry (str) – name of structure.

  • output_folder (str) – path to save structures.

list_zip(zip_path)[source]

List the files in a ZIP.

Parameters:

zip_path (str) – path to ZIP.

Returns:

list of file names in the ZIP.

Return type:

list
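
The listing and extraction steps map naturally onto Python's standard zipfile module. The following is a sketch of that pattern, not the class's actual code. The function names are hypothetical, and it assumes entry is the exact member name inside the archive.

```python
import os
import zipfile


def list_zip_names(zip_path):
    """Return the member names stored in a ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()


def extract_entry(zip_path, entry, output_folder):
    """Extract one member into output_folder and return the path written."""
    os.makedirs(output_folder, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: entry matches a member name exactly.
        return archive.extract(entry, path=output_folder)
```

Listing the names first is a cheap way to confirm an entry exists before extracting it.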

run()[source]

Start the download.

CoREMOF.structure.information(dataset, entry)[source]

Get information about an entry in the CoRE MOF database.

Parameters:
  • dataset (str) – name of subset.

  • entry (str) – name of structure.

Returns:

properties, DOI, issues and so on.

Return type:

dict

CoREMOF.structure.read_aif(GEMC_data)[source]

Get the water adsorption amount from GEMC data.

Parameters:

GEMC_data (list) – from detail_of_CR.json, for example, information("CR-ASR", "2020[Cu][sql]2[ASR]1")["GEMC"].

Returns:

  • information, under ["info"]; always "('_units_loading', 'Molecules/Supercell')".

  • pressure, under ["pressure"].

  • uptake, under ["uptake"].

Return type:

dict
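
Once the dictionary is available, the "pressure" and "uptake" entries can be paired into isotherm points. The sketch below assumes both entries are equal-length sequences of numbers. The helper name and the sample values are illustrative and are not database content.

```python
def isotherm_points(aif):
    """Pair pressures with uptakes and sort the points by pressure."""
    pairs = zip(aif["pressure"], aif["uptake"])
    return sorted((float(p), float(u)) for p, u in pairs)


# Made-up numbers in the documented dictionary layout.
sample = {
    "info": "('_units_loading', 'Molecules/Supercell')",
    "pressure": [1000, 100],
    "uptake": [12.5, 3.0],
}
print(isotherm_points(sample))
```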

Module contents