API documentation¶
The GaudiMM package is comprised of several core modules that establish the base architecture to build an extensible platform of molecular design.
The main module is gaudi.base
, which defines the gaudi.base.Individual
,
whose instances represent the potential solutions to the proposed problem. Two plugin
packages allow easy customization of how individuals are defined (gaudi.genes
) and
how they are evaluated (gaudi.objectives
). Additionally:
gaudi.algorithms
is the place to look for the actual GA implementationgaudi.box
is a placeholder for several small functions that are used across GaudiMM.gaudi.exceptions
defines custom exceptions.gaudi.parse
contains parsing utilities to retrieve the configuration files.gaudi.parallel
contains helpers to deal with parallel GaudiMM jobs.gaudi.plugin
holds some magic to make the plugin system work.gaudi.similarity
defines the diversity enhancers.
gaudi.algorithms¶
This module implements evolutionary algorithms as seen in DEAP, and extends their functionality to make use of GAUDI goodies.
Todo
- Genealogy
-
gaudi.algorithms.
dump_population
(population, cfg, subdir=None)¶
-
gaudi.algorithms.
ea_mu_plus_lambda
(population, toolbox, mu, lambda_, cxpb, mutpb, ngen, cfg, stats=None, halloffame=None, verbose=True, prompt_on_exception=True)¶ This is the \((\mu + \lambda)\) evolutionary algorithm.
Parameters: - population – A list of individuals.
- toolbox – A
Toolbox
that contains the evolution operators. - mu – The number of individuals to select for the next generation.
- lambda_ – The number of children to produce at each generation.
- cxpb – The probability that an offspring is produced by crossover.
- mutpb – The probability that an offspring is produced by mutation.
- ngen – The number of generation.
- stats – A
Statistics
object that is updated inplace, optional. - halloffame – A
HallOfFame
object that will contain the best individuals, optional. - verbose – Whether or not to log the statistics.
Returns: The final population.
First, the individuals having an invalid fitness are evaluated. Then, the evolutionary loop begins by producing lambda_ offspring from the population, the offspring are generated by a crossover, a mutation or a reproduction proportionally to the probabilities cxpb, mutpb and 1 - (cxpb + mutpb). The offspring are then evaluated and the next generation population is selected from both the offspring and the population. Briefly, the operators are applied as following
evaluate(population) for i in range(ngen): offspring = varOr(population, toolbox, lambda_, cxpb, mutpb) evaluate(offspring) population = select(population + offspring, mu)
This function expects
toolbox.mate()
,toolbox.mutate()
,toolbox.select()
andtoolbox.evaluate()
aliases to be registered in the toolbox. This algorithm uses thevarOr()
variation.
gaudi.base¶
Contains the core classes we use to build individuals (potential solutions of the optimization process).
-
class
gaudi.base.
BaseIndividual
(cfg=None, cache=None, **kwargs)¶ Bases:
object
Base class for individual objects that are evaluated by DEAP.
Each individual is a potential solution. It contains all that is needed for an evaluation. With multiprocessing in mind, individuals should be self-contained so it can be passed between threads.
The defined methods are only wrapper calls to the respective methods of each gene.
Parameters: - cfg (gaudi.parse.Settings) – The full parsed object from the configuration YAML file.
- cache (dict or dict-like) – A mutable object that can be used to store values across instances.
- dummy (bool) – If True, create an uninitialized Individual, only containing the cfg attribute. If false, call __ready__ and complete initialization.
-
__CACHE
¶ Class attribute that caches gene data across instances
Type: dict
-
__CACHE_OBJ
¶ Class attribute that caches objectives data across instances
Type: dict
Todo
write()
should use Pickle and just save the whole object, but Chimera’s inmutable objects (Atoms, Residues, etc) get in the way. A workaround may be found if we take a look a the session saving code.-
clear_cache
()¶
-
evaluate
(environment)¶ Express individual, evaluate it and unexpress it.
Parameters: environment (Environment) – Objectives that will evaluate the individual
-
express
()¶ Express genes in this environment. Very much like ‘compiling’ the individual to a chimera.Molecule.
-
mate
(other)¶ Recombine genes of self with other. It simply calls mate on each gene instance
Parameters: other (Individual) – Another individual to mate with.
-
mutate
(indpb)¶ Trigger a round of possible mutations across all genes
Parameters: indpb (float) – Probability of suffering a mutation
-
post_express
()¶
-
post_unexpress
()¶
-
pre_express
()¶
-
pre_unexpress
()¶
-
similar
(other)¶ Compare self and other with a similarity function.
Returns: Return type: bool
-
unexpress
()¶ Undo .express()
-
write
(i, path=None)¶ Export the individual to a mol2 file
Parameters: - i (int) – Individual identificator in current generation or hall of fame
- note :: (.) – Maybe someday we can pickle it all :/ >>> filename = os.path.join(path, ‘{}_{}.pickle.gz’.format(name,i)) >>> with gzip.GzipFile(filename, ‘wb’) as f: >>> cPickle.dump(self, f, 0) >>> return filename
-
class
gaudi.base.
Environment
(cfg=None, *args, **kwargs)¶ Bases:
object
Objective container and helper to evaluate an individual. It must be instantiated with a gaudi.parse.Settings object.
Parameters: cfg (gaudi.parse.Settings) – The parsed configuration YAML file that contains objectives information -
clear_cache
()¶
-
evaluate
(individual)¶ individual : Individual
-
-
class
gaudi.base.
MolecularIndividual
(*args, **kwargs)¶ Bases:
gaudi.base.BaseIndividual
-
find_molecule
(name)¶
-
post_express
()¶
-
xyz
(gene=None)¶
-
-
gaudi.base.
expressed
(*args, **kwds)¶
gaudi.box¶
This module is a messy collection of useful functions used all along GAUDI.
Todo
Some of these functions are hardly used, so maybe we should clean it a little in the future…
-
gaudi.box.
atoms_between
(atom1, atom2)¶ Finds all connected atoms between two given atoms
-
gaudi.box.
atoms_by_serial
(*serials, **kw)¶ Find atoms in kw[‘atoms’] with serialNumber = serials.
Parameters: - serials (int) – List of serial numbers to match
- atoms (list of chimera.Atom, optional) – List of atoms to be traversed while looking for serial numbers
Returns: Return type: list of chimera.Atom
-
gaudi.box.
create_single_individual
(path)¶ Create an individual within Chimera. Convenience method for Chimera IDLE.
-
gaudi.box.
do_cprofile
(func)¶ Decorator to cProfile a certain function and output the results to cprofile.out
-
gaudi.box.
draw_interactions
(interactions, startCol='FF0000', endCol='FFFF00', key=None, name='Custom pseudobonds')¶ Draw pseudobonds depicting atoms relationships.
Parameters: - interactions (list of tuples) – Each tuple contains an interaction, defined, at least, by the two atoms involved.
- startCol (str, optional) – Hex code for the initial color of the pseudobond (closer to the first atom of the pair).
- endCol (str, optional) – Hex code for the final color of the pseudobond. (closer to the second atom of the pair)
- key (int, optional) – The index of an interaction tuple that represent the alpha channel in the color used to depict the interaction.
- name (str, optional) – Name of the pseudobond group created.
Returns: Return type: chimera.pseudoBondGroup
-
gaudi.box.
files_in
(path, ext=None)¶ Returns all the files in a given directory, filtered by extension if desired.
Parameters: - path (str) –
- ext (list of str, optional) – File extension(s) to filter on.
Returns: Return type: List of absolute paths
-
gaudi.box.
find_nearest
(anchor, atoms)¶ Find the atom of atoms that is closer to anchor, in terms of number of atoms in between.
-
gaudi.box.
highest_atom_indices
(r)¶ Returns a dictionary with highest atom indices in given residue Key: value -> element.name: highest index in residue
-
gaudi.box.
incremental_existing_path
(path, separator='__')¶
-
gaudi.box.
open_models_and_close
(*args, **kwds)¶
-
gaudi.box.
pseudobond_to_bond
(molecule, remove=False)¶ Transforms every pseudobond in molecule to a covalent bond
Parameters: - molecule (chimera.Molecule) –
- remove (bool) – If True, remove original pseudobonds after actual bonds have been created.
-
gaudi.box.
rmsd
(a, b)¶
-
gaudi.box.
sequential_bonds
(atoms, s)¶ Returns bonds in atoms in sequential order, beginning at atom s
-
gaudi.box.
silent_stdout
(*args, **kwds)¶
-
gaudi.box.
stdout_to_file
(workspace, stderr=True)¶
-
gaudi.box.
suppress_ksdssp
(trig_name, my_data, molecules)¶ Monkey-patch Chimera triggers to disable KSDSSP computation
-
gaudi.box.
write_individuals
(inds, outpath, name, evalfn, remove=True)¶ Write an individual to disk.
Note
Deprecated since an Individual object is able to write itself to disk.
gaudi.exceptions¶
This module collects more meaningful exceptions than builtins.
-
exception
gaudi.exceptions.
AtomsNotFound
¶ Bases:
exceptions.Exception
-
exception
gaudi.exceptions.
MoleculesNotFound
¶ Bases:
exceptions.Exception
-
exception
gaudi.exceptions.
ResiduesNotFound
¶ Bases:
exceptions.Exception
-
exception
gaudi.exceptions.
TooManyAtoms
¶ Bases:
exceptions.Exception
-
exception
gaudi.exceptions.
TooManyResidues
¶ Bases:
exceptions.Exception
gaudi.parallel¶
Helper functions to deal with parallel execution of GaudiMM jobs.abs
Useful for benchmarks.
-
gaudi.parallel.
run_parallel
(fn, args=(), processes=None, initializer=None, initargs=(), maxtasksperchild=1, map_timeout=9999999, map_chunksize=1, map_callback=None)¶ Create a pool instance with built-in exception handling.
gaudi.parse¶
This module parses and validates YAML input files into convenient objects that allow per-attribute access to configuration parameters.
-
gaudi.parse.
AssertList
(*validators, **kwargs)¶ Make sure the value is contained in a list
-
gaudi.parse.
Coordinates
(v)¶
-
gaudi.parse.
Degrees
(v)¶
-
gaudi.parse.
ExpandUserPathExists
(v)¶
-
gaudi.parse.
Importable
(v)¶
-
gaudi.parse.
MakeDir
(validator)¶
-
class
gaudi.parse.
MoleculeAtom
(molecule, atom)¶ Bases:
tuple
-
atom
¶ Alias for field number 1
-
molecule
¶ Alias for field number 0
-
-
class
gaudi.parse.
MoleculeResidue
(molecule, residue)¶ Bases:
tuple
-
molecule
¶ Alias for field number 0
-
residue
¶ Alias for field number 1
-
-
gaudi.parse.
Molecule_name
(v)¶ Ideal implementation:
def fn(v): valid = [i['name'] for i in items if i['module'] == 'gaudi.genes.molecule'] if v not in valid: raise Invalid("{} is not a valid Molecule name".format(v)) return v return fn
However, I must figure a way to get the gene list beforehand
-
gaudi.parse.
Named_spec
(*names)¶ Assert that str is formatted like “Molecule/123”, with Molecule being a valid name of a Molecule gene and 123 a positive int or *
-
gaudi.parse.
RelPathToInputFile
(inputpath=None)¶
-
gaudi.parse.
ResidueThreeLetterCode
(v)¶
-
class
gaudi.parse.
Settings
(path=None, validation=True)¶ Bases:
munch.Munch
Parses a YAML input file with PyYAML, validates it with voluptuous and builds a attribute-accessible dict with Munch.
Hence, all the attributes in this class are generated automatically from the default values, and the updated with the contents of the YAML file.
Parameters: path (str) – Path to YAML file -
output
¶ Contains the parameters to determine how to write and report results.
Type: dict
-
output.
path
¶ Directory that will contain the result files. If it does not exist, it will be created. If it does, the contents could be overwritten. Better change this between different attempts.
Type: str, optional, defaults to . (current dir)
-
output.
name
¶ A small identifier for your calculation. If not set, it will use five random characters.
Type: str, optional
-
output.
precision
¶ How many decimals should be used along the simulation. This won’t only affect reports, but also the reported scores by the objectives during the selection process.
Type: int, optional, defaults to 3
-
output.
compress
¶ Whether to apply compression to the individual ZIP files or not.
Type: bool, optional, defaults to True
-
output.
history
¶ Whether to save all the genealogy of the individuals created along the simulation or not. Only for advanced users.
Type: bool, optional, defaults to False
-
output.
pareto
¶ If True, the elite population will be the Pareto front of the population. If False, the elite population will be the best solutions, according to the lexicographic sorting of the fitness values.
Type: bool, optional, defaults to True
-
output.
verbose
¶ Whether to realtime report the progress of the simulation or not.
Type: bool, optional, defaults to True
-
output.
check_every
¶ Dump the elite population every n generations. Switched off if set to 0.
Type: int, optional, defaults to 10
-
output.
prompt_on_exception
¶ When an exception is raised, GaudiMM tries to dump the current population to disk as an emergency rescue plan. This includes pressing Ctrl+C. If this happens, it prompts the user whether to dump it or not. For interactive sessions this is desirable, but no so much for unsupervised cluster jobs. If set to False, this behaviour will be disabled.
Type: bool, optional, defaults to True
-
ga
¶ Contains the genetic algorithm parameters.
Type: dict
-
ga.
population
¶ Size of the starting population, in number of individuals.
Type: int
-
ga.
generations
¶ Number of generations to simulate.
Type: float
-
ga.
mu
¶ The number of children to select at each generation, expressed as a multiplier of
ga.population
.Type: float, optional, defaults to 1.0
-
ga.
lambda_
¶ The number of children to produce at each generation, expressed as a multiplier of
ga.population
.Type: float, optional, defaults to 3.0
-
ga.
mut_eta
¶ Crowding degree of the mutation. A high eta will produce a mutant resembling its parent, while a small eta will produce a solution much more different.
Type: float, optional, defaults to 5
-
ga.
mut_pb
¶ The probability that an offspring is produced by mutation.
Type: float, optional, defaults to 0.5
-
ga.
mut_indpb
¶ Independent probability for each gene to be mutated.
Type: float, optional, defaults to 0.75
-
ga.
cx_eta
¶ Crowding degree of the crossover. A high eta will produce children resembling to their parents, while a small eta will produce solutions much more different.
Type: float, optional, defaults to 5
-
ga.
cx_pb
¶ The probability that an offspring is produced by crossover.
Type: float, optional, defaults to 0.5
-
similarity
¶ Contains the parameters to the similarity operator, which, given two individuals with the same fitness, whether they can be considered the same solution or not.
Type: dict
-
similarity.
module
¶ The function to call when a fitness draw happens. It should be expressed as Python importable path; ie, separated by dots:
gaudi.similarity.rmsd
.Type: str
-
similarity.
args
¶ Positional arguments to the similarity function.
Type: list
-
similarity.
kwargs
¶ Optional arguments to the similarity function.
Type: dict
-
genes
¶ Contains the list of genes that each Individual will have.
Type: list of dict
-
objectives
¶ Contains the list of objectives that will make the Environment object to evaluate the Individuals.
Type: list of dict
-
default_values
= {'ga': {'cx_eta': 5, 'cx_pb': 0.5, 'generations': 3, 'lambda_': 3, 'mu': 1, 'mut_eta': 5, 'mut_indpb': 0.75, 'mut_pb': 0.5, 'population': 10}, 'genes': [{}], 'objectives': [{}], 'output': {'check_every': 10, 'compress': True, 'history': False, 'name': 'dLKBW', 'pareto': True, 'path': '.', 'precision': 3, 'prompt_on_exception': True, 'verbose': True}, 'similarity': {'args': [['Ligand'], 2.5], 'kwargs': {}, 'module': 'gaudi.similarity.rmsd'}}¶
-
name_objectives
¶
-
schema
= {'genes': All(Length(min=1, max=None), [<type 'dict'>], msg=None), 'objectives': All(Length(min=1, max=None), [<type 'dict'>], msg=None), 'output': {'check_every': All(Coerce(int, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'compress': Coerce(bool, msg=None), 'history': Coerce(bool, msg=None), 'name': All(<type 'basestring'>, Length(min=1, max=255), msg=None), 'pareto': Coerce(bool, msg=None), 'path': <function fn>, 'precision': All(Coerce(int, msg=None), Range(min=-3, max=6, min_included=True, max_included=True, msg=None), msg=None), 'prompt_on_exception': Coerce(bool, msg=None), 'verbose': Coerce(bool, msg=None)}, 'similarity': {'args': <type 'list'>, 'kwargs': <type 'dict'>, 'module': <type 'basestring'>}, '_path': <function ExpandUserPathExists>, 'ga': {'cx_eta': All(Coerce(int, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'cx_pb': All(Coerce(float, msg=None), Range(min=0, max=1, min_included=True, max_included=True, msg=None), msg=None), 'generations': All(Coerce(int, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'lambda_': All(Coerce(float, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'mu': All(Coerce(float, msg=None), Range(min=0, max=1, min_included=True, max_included=True, msg=None), msg=None), 'mut_eta': All(Coerce(int, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'mut_indpb': All(Coerce(float, msg=None), Range(min=0, max=1, min_included=True, max_included=True, msg=None), msg=None), 'mut_pb': All(Coerce(float, msg=None), Range(min=0, max=1, min_included=True, max_included=True, msg=None), msg=None), 'population': All(Coerce(int, msg=None), Range(min=2, max=None, min_included=True, max_included=True, msg=None), msg=None)}}¶
-
validate
(data=None)¶
-
weights
¶
-
-
gaudi.parse.
deep_update
(source, overrides)¶ Update a nested dictionary or similar mapping.
Modify
source
in place.
-
gaudi.parse.
parse_rawstring
(s)¶ It parses reference strings contained in some fields.
These strings contain references to
genes.molecule
instances, and one of its atoms
-
gaudi.parse.
validate
(schema, data)¶
gaudi.plugin¶
This module provides the basic functionality for the plugin system of genes and objectives.
-
class
gaudi.plugin.
PluginMount
(name, bases, attrs)¶ Bases:
type
Base class for plugin mount points.
Metaclass trickery obtained from Marty Alchin’s blog Each mount point (ie,
genes
andobjectives
), MUST inherit this one.
-
gaudi.plugin.
import_plugins
(*pluginlist)¶ Import requested modules, only once, when launch.py is called and the configuration is parsed successfully.
Parameters: pluginlist (list of gaudi.parse.Param) – Usually, the genes or objectives list resulting from the configuration parsing.
-
gaudi.plugin.
load_plugins
(plugins, container=None, **kwargs)¶ Requests an instance of the class that resides in each plugin. For genes, each individual has its own instance, but objectives are treated like a singleton. So, they are only instantiated once. That’s the reason behind usen a mutable container.
Parameters: - plugins (list of gaudi.parse.Param) – Modules to load. Each Param must have a module attr with a full import path.
- container (dict or dict-like) – If provided, use this container to retain instances across individuals.
- kwargs – Everything else will be passed to the requested plugin instances.
gaudi.similarity¶
This module contains the similarity functions that are used to discard individuals that are not different enough.
-
gaudi.similarity.
rmsd
(ind1, ind2, subjects, threshold, *args, **kwargs)¶ Returns the RMSD between two individuals
Parameters: - ind1 (gaudi.base.Individual) –
- ind2 (gaudi.base.Individual) –
- subjects (list of str) – Name of gaudi.genes.molecule instances to measure
- threshold (float) – Maximum RMSD value to consider two individuals as similar.
If
rmsd > threshold
, they are considered different.
Returns: True if
rmsd
is within threshold, False otherwise. It will always return False if number of atoms is not equal in the two Individuals.Return type: bool
gaudi._cpdrift¶
Coherent Point Drift (affine and rigid) Python2/3 implementation, adapted from kwohlfahrt’s.
Only 3D points are supported in this version.
Depends on:
- Python 2.7, 3.4+
- Numpy
- Matplotlib (plotting only)
-
class
gaudi._cpdrift.
Quaternion
(s, i, j, k)¶ Bases:
object
-
axis_angle
¶
-
conjugate
(other=None)¶
-
classmethod
fromAxisAngle
(v, theta)¶
-
matrix
()¶
-
norm
()¶
-
reciprocal
()¶
-
unit
()¶
-
vector
¶
-
-
gaudi._cpdrift.
RMSD
(X, Y)¶
-
gaudi._cpdrift.
affine_cpd
(X, Y, w=0.0, B=None)¶
-
gaudi._cpdrift.
affine_xform
(X, B=<Mock name='mock.array()' id='140309713689552'>, t=0)¶
-
gaudi._cpdrift.
coherent_point_drift
(X, Y, w=0.0, B=None, guess_steps=5, max_iterations=20, method='affine')¶
-
gaudi._cpdrift.
common_steps
(X, Y, Y_, w, sigma_sq)¶
-
class
gaudi._cpdrift.
frange
(start, stop, step)¶ Bases:
object
-
gaudi._cpdrift.
last
(sequence)¶
-
gaudi._cpdrift.
pairwise_sqdist
(X, Y)¶
-
gaudi._cpdrift.
plot
(x, y, t)¶ Plot the initial datasets and registration results.
Parameters: - x (ndarray) – The static shape that y will be registered to. Expected array shape is [n_points_x, n_dims]
- y (ndarray) – The moving shape. Expected array shape is [n_points_y, n_dims]. Note that n_dims should be equal for x and y, but n_points does not need to match.
- t (ndarray) – The transformed version of y. Output shape is [n_points_y, n_dims].
-
gaudi._cpdrift.
rigid_cpd
(X, Y, w=0.0, R=None)¶
-
gaudi._cpdrift.
rigid_xform
(X, R=<Mock name='mock.array()' id='140309713689552'>, t=0.0, s=1.0)¶
-
gaudi._cpdrift.
rotation_matrix
(*angles)¶
-
gaudi._cpdrift.
spaced_rotations
(N)¶
-
gaudi._cpdrift.
std
(x)¶