API documentation¶

The GaudiMM package consists of several core modules that establish the base architecture for building an extensible molecular design platform.

The main module is gaudi.base, which defines the gaudi.base.Individual, whose instances represent the potential solutions to the proposed problem. Two plugin packages allow easy customization of how individuals are defined (gaudi.genes) and how they are evaluated (gaudi.objectives). Additionally:

gaudi.algorithms is the place to look for the actual GA implementation
gaudi.box is a placeholder for several small functions that are used across GaudiMM.
gaudi.exceptions defines custom exceptions.
gaudi.parse contains parsing utilities to retrieve the configuration files.
gaudi.parallel contains helpers to deal with parallel GaudiMM jobs.
gaudi.plugin holds some magic to make the plugin system work.
gaudi.similarity defines the diversity enhancers.

gaudi.cli
- gaudi.cli.gaudi_cli
- gaudi.cli.gaudi_run

gaudi.genes

gaudi.objectives

gaudi.algorithms¶

This module implements evolutionary algorithms as seen in DEAP, and extends their functionality to make use of GAUDI goodies.

Todo

Genealogy

gaudi.algorithms.dump_population(population, cfg, subdir=None)¶

gaudi.algorithms.ea_mu_plus_lambda(population, toolbox, mu, lambda_, cxpb, mutpb, ngen, cfg, stats=None, halloffame=None, verbose=True, prompt_on_exception=True)¶

This is the \((\mu + \lambda)\) evolutionary algorithm.

Parameters:

population – A list of individuals.
toolbox – A Toolbox that contains the evolution operators.
mu – The number of individuals to select for the next generation.
lambda_ – The number of children to produce at each generation.
cxpb – The probability that an offspring is produced by crossover.
mutpb – The probability that an offspring is produced by mutation.
ngen – The number of generations.
stats – A Statistics object that is updated in place, optional.
halloffame – A HallOfFame object that will contain the best individuals, optional.
verbose – Whether or not to log the statistics.

Returns:

The final population.

First, the individuals having an invalid fitness are evaluated. Then the evolutionary loop begins by producing lambda_ offspring from the population; each offspring is generated by crossover, mutation or reproduction, with probabilities cxpb, mutpb and 1 - (cxpb + mutpb) respectively. The offspring are then evaluated and the next-generation population is selected from both the offspring and the population. Briefly, the operators are applied as follows:

evaluate(population)
for i in range(ngen):
    offspring = varOr(population, toolbox, lambda_, cxpb, mutpb)
    evaluate(offspring)
    population = select(population + offspring, mu)

This function expects toolbox.mate(), toolbox.mutate(), toolbox.select() and toolbox.evaluate() aliases to be registered in the toolbox. This algorithm uses the varOr() variation.
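As a rough illustration of the loop described above, the following self-contained sketch re-implements the same scheme in plain Python on a toy problem (maximizing the number of ones in a bit list). It uses neither DEAP nor GaudiMM: `var_or`, `evaluate`, `mate`, `mutate`, `select` and `ea_mu_plus_lambda_sketch` are hypothetical stand-ins for the toolbox aliases, and fitness is recomputed on demand instead of being cached on the individual.

```python
import random

def evaluate(ind):
    """Toy fitness: number of ones in the bit list (higher is better)."""
    return sum(ind)

def mate(a, b, rng):
    """One-point crossover; returns a single child."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(ind, rng, indpb=0.1):
    """Flip each bit independently with probability indpb."""
    return [1 - bit if rng.random() < indpb else bit for bit in ind]

def var_or(population, lambda_, cxpb, mutpb, rng):
    """Produce lambda_ children by crossover OR mutation OR plain copy."""
    assert cxpb + mutpb <= 1.0
    offspring = []
    for _ in range(lambda_):
        choice = rng.random()
        if choice < cxpb:
            a, b = rng.sample(population, 2)
            offspring.append(mate(a, b, rng))
        elif choice < cxpb + mutpb:
            offspring.append(mutate(rng.choice(population), rng))
        else:
            offspring.append(list(rng.choice(population)))
    return offspring

def select(individuals, mu):
    """Keep the mu fittest individuals."""
    return sorted(individuals, key=evaluate, reverse=True)[:mu]

def ea_mu_plus_lambda_sketch(population, mu, lambda_, cxpb, mutpb, ngen, seed=0):
    rng = random.Random(seed)
    for _ in range(ngen):
        offspring = var_or(population, lambda_, cxpb, mutpb, rng)
        # Parents compete with their children: this is the "plus" in mu + lambda.
        population = select(population + offspring, mu)
    return population

rng = random.Random(42)
start = [[rng.randint(0, 1) for _ in range(20)] for _ in range(10)]
final = ea_mu_plus_lambda_sketch(start, mu=10, lambda_=30, cxpb=0.5, mutpb=0.5, ngen=15)
```

Because selection draws from parents and offspring together, the best fitness in the population can never decrease from one generation to the next.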

gaudi.base¶

Contains the core classes we use to build individuals (potential solutions of the optimization process).

class gaudi.base.BaseIndividual(cfg=None, cache=None, **kwargs)¶

Bases: object

Base class for individual objects that are evaluated by DEAP.

Each individual is a potential solution. It contains all that is needed for an evaluation. With multiprocessing in mind, individuals should be self-contained so that they can be passed between threads.

The defined methods are only wrapper calls to the respective methods of each gene.

Parameters:

cfg (gaudi.parse.Settings) – The full parsed object from the configuration YAML file.
cache (dict or dict-like) – A mutable object that can be used to store values across instances.
dummy (bool) – If True, create an uninitialized Individual, only containing the cfg attribute. If False, call __ready__ and complete initialization.

__CACHE¶

Class attribute that caches gene data across instances

Type:	dict

__CACHE_OBJ¶

Class attribute that caches objectives data across instances

Type:	dict

Todo

write() should use Pickle and just save the whole object, but Chimera’s immutable objects (Atoms, Residues, etc.) get in the way. A workaround may be found if we take a look at the session saving code.

clear_cache()¶

evaluate(environment)¶

Express individual, evaluate it and unexpress it.

Parameters:	environment (Environment) – Objectives that will evaluate the individual
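The express, evaluate, unexpress cycle can be pictured with a context manager that guarantees the cleanup step even if an objective raises. This is only a minimal sketch in Python 3 syntax: `ToyIndividual`, `expressed` and `evaluate` below are hypothetical stand-ins written for illustration and are not the GaudiMM implementations.

```python
from contextlib import contextmanager

class ToyIndividual:
    """Hypothetical stand-in for an individual; not the GaudiMM class."""
    def __init__(self, genes):
        self.genes = genes
        self.expressed = False
        self.log = []

    def express(self):
        self.expressed = True
        self.log.append("express")

    def unexpress(self):
        self.expressed = False
        self.log.append("unexpress")

@contextmanager
def expressed(*individuals):
    """Express the individuals on entry and always unexpress them on exit."""
    for ind in individuals:
        ind.express()
    try:
        yield individuals
    finally:
        for ind in individuals:
            ind.unexpress()

def evaluate(individual, objectives):
    """Score an individual with every objective while it is expressed."""
    with expressed(individual):
        individual.log.append("evaluate")
        return tuple(objective(individual) for objective in objectives)

ind = ToyIndividual(genes=[1.0, 2.0, 3.0])
scores = evaluate(ind, objectives=[lambda i: sum(i.genes), lambda i: len(i.genes)])
```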

express()¶: Express genes in this environment. Very much like ‘compiling’ the individual to a chimera.Molecule.

mate(other)¶

Recombine genes of self with other. It simply calls mate on each gene instance.

Parameters:	other (Individual) – Another individual to mate with.

mutate(indpb)¶

Trigger a round of possible mutations across all genes

Parameters:	indpb (float) – Probability of suffering a mutation

post_express()¶

post_unexpress()¶

pre_express()¶

pre_unexpress()¶

similar(other)¶

Compare self and other with a similarity function.

Return type:	bool

unexpress()¶: Undo .express()

write(i, path=None)¶

Export the individual to a mol2 file

Parameters:	i (int) – Individual identifier in the current generation or hall of fame

Note

Maybe someday we can pickle it all:

>>> filename = os.path.join(path, '{}_{}.pickle.gz'.format(name, i))
>>> with gzip.GzipFile(filename, 'wb') as f:
...     cPickle.dump(self, f, 0)
>>> return filename

class gaudi.base.Environment(cfg=None, *args, **kwargs)¶

Bases: object

Objective container and helper to evaluate an individual. It must be instantiated with a gaudi.parse.Settings object.

Parameters:	cfg (gaudi.parse.Settings) – The parsed configuration YAML file that contains objectives information

clear_cache()¶

evaluate(individual)¶

Parameters:	individual (Individual)

class gaudi.base.Fitness(weights)¶

Bases: deap.base.Fitness

wvalues = ()¶

class gaudi.base.MolecularIndividual(*args, **kwargs)¶

Bases: gaudi.base.BaseIndividual

find_molecule(name)¶

post_express()¶

xyz(gene=None)¶

gaudi.base.expressed(*args, **kwds)¶

gaudi.box¶

This module is a messy collection of useful functions used throughout GAUDI.

Todo

Some of these functions are hardly used, so maybe we should clean it a little in the future…

gaudi.box.atoms_between(atom1, atom2)¶: Finds all connected atoms between two given atoms

gaudi.box.atoms_by_serial(*serials, **kw)¶

Find atoms in kw[‘atoms’] with serialNumber = serials.

Parameters:

serials (int) – List of serial numbers to match.
atoms (list of chimera.Atom, optional) – List of atoms to be traversed while looking for serial numbers.

Return type:	list of chimera.Atom

gaudi.box.create_single_individual(path)¶: Create an individual within Chimera. Convenience method for Chimera IDLE.

gaudi.box.do_cprofile(func)¶: Decorator to cProfile a certain function and output the results to cprofile.out

gaudi.box.draw_interactions(interactions, startCol='FF0000', endCol='FFFF00', key=None, name='Custom pseudobonds')¶

Draw pseudobonds depicting relationships between atoms.

Parameters:

interactions (list of tuples) – Each tuple contains an interaction, defined, at least, by the two atoms involved.
startCol (str, optional) – Hex code for the initial color of the pseudobond (closer to the first atom of the pair).
endCol (str, optional) – Hex code for the final color of the pseudobond (closer to the second atom of the pair).
key (int, optional) – The index of the element in each interaction tuple that represents the alpha channel of the color used to depict the interaction.
name (str, optional) – Name of the pseudobond group created.

Return type:	chimera.pseudoBondGroup

gaudi.box.files_in(path, ext=None)¶

Returns all the files in a given directory, filtered by extension if desired.

Parameters:

path (str) – Directory to search.
ext (list of str, optional) – File extension(s) to filter on.

Return type:	List of absolute paths
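The behaviour described for files_in can be approximated with the standard library alone. The sketch below is not the GaudiMM code: `files_in_sketch` is a hypothetical name, and it assumes extensions are given without the leading dot and that subdirectories are skipped.

```python
import os
import tempfile

def files_in_sketch(path, ext=None):
    """Return absolute paths of the regular files in `path`.

    If `ext` is given (list of extensions without the dot), keep only
    files whose extension is in that list.
    """
    found = []
    for name in sorted(os.listdir(path)):
        full = os.path.abspath(os.path.join(path, name))
        if not os.path.isfile(full):
            continue
        if ext is None or os.path.splitext(name)[1].lstrip(".") in ext:
            found.append(full)
    return found

# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.mol2", "b.pdb", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    mol2_only = [os.path.basename(p) for p in files_in_sketch(tmp, ext=["mol2"])]
    everything = [os.path.basename(p) for p in files_in_sketch(tmp)]
```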

gaudi.box.find_nearest(anchor, atoms)¶: Find the atom in atoms that is closest to anchor, in terms of number of atoms in between.

gaudi.box.highest_atom_indices(r)¶: Returns a dictionary with the highest atom indices in the given residue, mapping element.name to the highest index in the residue.

gaudi.box.incremental_existing_path(path, separator='__')¶

gaudi.box.open_models_and_close(*args, **kwds)¶

gaudi.box.pseudobond_to_bond(molecule, remove=False)¶

Transforms every pseudobond in molecule to a covalent bond

Parameters:

molecule (chimera.Molecule)
remove (bool) – If True, remove original pseudobonds after actual bonds have been created.

gaudi.box.rmsd(a, b)¶

gaudi.box.sequential_bonds(atoms, s)¶: Returns bonds in atoms in sequential order, beginning at atom s

gaudi.box.silent_stdout(*args, **kwds)¶

gaudi.box.stdout_to_file(workspace, stderr=True)¶

gaudi.box.suppress_ksdssp(trig_name, my_data, molecules)¶: Monkey-patch Chimera triggers to disable KSDSSP computation

gaudi.box.write_individuals(inds, outpath, name, evalfn, remove=True)¶: Write an individual to disk.

Note

Deprecated since an Individual object is able to write itself to disk.

gaudi.exceptions¶

This module collects more meaningful exceptions than builtins.

exception gaudi.exceptions.AtomsNotFound¶: Bases: exceptions.Exception

exception gaudi.exceptions.MoleculesNotFound¶: Bases: exceptions.Exception

exception gaudi.exceptions.ResiduesNotFound¶: Bases: exceptions.Exception

exception gaudi.exceptions.TooManyAtoms¶: Bases: exceptions.Exception

exception gaudi.exceptions.TooManyResidues¶: Bases: exceptions.Exception

gaudi.parallel¶

Helper functions to deal with parallel execution of GaudiMM jobs.

Useful for benchmarks.

gaudi.parallel.run_parallel(fn, args=(), processes=None, initializer=None, initargs=(), maxtasksperchild=1, map_timeout=9999999, map_chunksize=1, map_callback=None)¶: Create a pool instance with built-in exception handling.

gaudi.parse¶

This module parses and validates YAML input files into convenient objects that allow per-attribute access to configuration parameters.

gaudi.parse.AssertList(*validators, **kwargs)¶: Make sure the value is contained in a list

gaudi.parse.Coordinates(v)¶

gaudi.parse.Degrees(v)¶

gaudi.parse.ExpandUserPathExists(v)¶

gaudi.parse.Importable(v)¶

gaudi.parse.MakeDir(validator)¶

class gaudi.parse.MoleculeAtom(molecule, atom)¶

Bases: tuple

atom¶: Alias for field number 1

molecule¶: Alias for field number 0

class gaudi.parse.MoleculeResidue(molecule, residue)¶

Bases: tuple

molecule¶: Alias for field number 0

residue¶: Alias for field number 1

gaudi.parse.Molecule_name(v)¶

Ideal implementation:

def fn(v):
    valid = [i['name'] for i in items if i['module'] == 'gaudi.genes.molecule']
    if v not in valid:
        raise Invalid("{} is not a valid Molecule name".format(v))
    return v
return fn

However, I must figure out a way to get the gene list beforehand.

gaudi.parse.Named_spec(*names)¶: Assert that str is formatted like “Molecule/123”, with Molecule being a valid name of a Molecule gene and 123 a positive int or *
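A validator of this kind can be sketched as a closure that splits the string and checks both halves. The code below is illustrative only: `named_spec_sketch` is a hypothetical name, it raises ValueError rather than the validation library's own exception, and it returns a plain tuple.

```python
def named_spec_sketch(*names):
    """Build a validator for strings like 'Molecule/123' or 'Molecule/*'."""
    def validator(value):
        try:
            name, index = value.split("/")
        except (AttributeError, ValueError):
            raise ValueError("{!r} is not formatted as 'Name/index'".format(value))
        # An empty `names` means any molecule name is accepted.
        if names and name not in names:
            raise ValueError("{!r} is not one of {}".format(name, names))
        if index == "*":
            return name, index
        if not index.isdigit() or int(index) <= 0:
            raise ValueError("{!r} is not a positive integer or '*'".format(index))
        return name, int(index)
    return validator

check = named_spec_sketch("Ligand", "Protein")
```

Calling `check("Ligand/123")` returns the name and the index as an int, while a string such as `"Water/1"` is rejected because `Water` is not among the accepted names.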

gaudi.parse.RelPathToInputFile(inputpath=None)¶

gaudi.parse.ResidueThreeLetterCode(v)¶

class gaudi.parse.Settings(path=None, validation=True)¶

Bases: munch.Munch

Parses a YAML input file with PyYAML, validates it with voluptuous and builds an attribute-accessible dict with Munch.

Hence, all the attributes in this class are generated automatically from the default values, and then updated with the contents of the YAML file.

Parameters:	path (str) – Path to YAML file

output¶

Contains the parameters to determine how to write and report results.

Type:	dict

output.path¶

Directory that will contain the result files. If it does not exist, it will be created. If it does, the contents could be overwritten. Better change this between different attempts.

Type:	str, optional, defaults to . (current dir)

output.name¶

A small identifier for your calculation. If not set, it will use five random characters.

Type:	str, optional

output.precision¶

How many decimals should be used throughout the simulation. This affects not only the reports, but also the scores reported by the objectives during the selection process.

Type:	int, optional, defaults to 3

output.compress¶

Whether to apply compression to the individual ZIP files or not.

Type:	bool, optional, defaults to True

output.history¶

Whether to save all the genealogy of the individuals created along the simulation or not. Only for advanced users.

Type:	bool, optional, defaults to False

output.pareto¶

If True, the elite population will be the Pareto front of the population. If False, the elite population will be the best solutions, according to the lexicographic sorting of the fitness values.

Type:	bool, optional, defaults to True
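To make the difference between the two elite-selection modes concrete, here is a minimal sketch that compares a Pareto front with a lexicographic ranking on a handful of fitness tuples. It assumes every objective is maximized; `dominates`, `pareto_front` and `lexicographic_best` are hypothetical helpers, not GaudiMM or DEAP functions.

```python
def dominates(a, b):
    """True if fitness a is at least as good as b everywhere and better somewhere.

    Assumes every objective is maximized.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(fitnesses):
    """Keep the fitness tuples that no other tuple dominates."""
    return [f for f in fitnesses if not any(dominates(g, f) for g in fitnesses)]

def lexicographic_best(fitnesses, n):
    """Keep the n best tuples, comparing the first objective first, then the next."""
    return sorted(fitnesses, reverse=True)[:n]

scores = [(3, 1), (2, 2), (1, 3), (2, 1), (1, 1)]
front = pareto_front(scores)              # trade-off solutions
best_two = lexicographic_best(scores, 2)  # first objective takes priority
```

Here the front keeps `(1, 3)` because nothing beats it on the second objective, whereas the lexicographic ranking ignores it in favour of tuples with a higher first objective.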

output.verbose¶

Whether to report the progress of the simulation in real time or not.

Type:	bool, optional, defaults to True

output.check_every¶

Dump the elite population every n generations. Switched off if set to 0.

Type:	int, optional, defaults to 10

output.prompt_on_exception¶

When an exception is raised, GaudiMM tries to dump the current population to disk as an emergency rescue plan. This includes pressing Ctrl+C. If this happens, it prompts the user whether to dump it or not. For interactive sessions this is desirable, but not so much for unsupervised cluster jobs. If set to False, this behaviour will be disabled.

Type:	bool, optional, defaults to True

ga¶

Contains the genetic algorithm parameters.

Type:	dict

ga.population¶

Size of the starting population, in number of individuals.

Type:	int

ga.generations¶

Number of generations to simulate.

Type:	int

ga.mu¶

The number of individuals to select for the next generation, expressed as a multiplier of ga.population.

Type:	float, optional, defaults to 1.0

ga.lambda_¶

The number of children to produce at each generation, expressed as a multiplier of ga.population.

Type:	float, optional, defaults to 3.0
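Since ga.mu and ga.lambda_ are multipliers rather than absolute counts, the actual numbers depend on ga.population. The arithmetic below uses the default values quoted on this page; truncating with `int()` is an assumption made for illustration, not a documented behaviour.

```python
ga = {"population": 10, "mu": 1.0, "lambda_": 3.0}  # defaults quoted on this page

# Assumption: multipliers are converted to counts by truncation with int().
mu_count = int(ga["mu"] * ga["population"])           # individuals kept per generation
lambda_count = int(ga["lambda_"] * ga["population"])  # children produced per generation
```

With these defaults, 30 children are produced per generation and 10 individuals survive.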

ga.mut_eta¶

Crowding degree of the mutation. A high eta will produce a mutant resembling its parent, while a small eta will produce a solution that differs much more.

Type:	float, optional, defaults to 5

ga.mut_pb¶

The probability that an offspring is produced by mutation.

Type:	float, optional, defaults to 0.5

ga.mut_indpb¶

Independent probability for each gene to be mutated.

Type:	float, optional, defaults to 0.75

ga.cx_eta¶

Crowding degree of the crossover. A high eta will produce children resembling their parents, while a small eta will produce solutions that differ much more.

Type:	float, optional, defaults to 5

ga.cx_pb¶

The probability that an offspring is produced by crossover.

Type:	float, optional, defaults to 0.5

similarity¶

Contains the parameters for the similarity operator, which decides, given two individuals with the same fitness, whether they can be considered the same solution or not.

Type:	dict

similarity.module¶

The function to call when a fitness draw happens. It should be expressed as a Python importable path, i.e., separated by dots: gaudi.similarity.rmsd.

Type:	str
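A dotted path such as this one can be turned into a callable with `importlib`. The helper below is a hypothetical sketch of that idea (it is not the GaudiMM loader) and is demonstrated on a standard-library function so that it runs without GaudiMM installed.

```python
import importlib

def resolve_dotted_path(path):
    """Import 'package.module.function' and return the function object."""
    module_name, _, attr = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)

# Demonstrated with a stdlib function; a GaudiMM path would follow the same pattern.
sqrt = resolve_dotted_path("math.sqrt")
```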

similarity.args¶

Positional arguments to the similarity function.

Type:	list

similarity.kwargs¶

Optional arguments to the similarity function.

Type:	dict

genes¶

Contains the list of genes that each Individual will have.

Type:	list of dict

objectives¶

Contains the list of objectives that the Environment object will use to evaluate the Individuals.

Type:	list of dict

default_values = {'ga': {'cx_eta': 5, 'cx_pb': 0.5, 'generations': 3, 'lambda_': 3, 'mu': 1, 'mut_eta': 5, 'mut_indpb': 0.75, 'mut_pb': 0.5, 'population': 10}, 'genes': [{}], 'objectives': [{}], 'output': {'check_every': 10, 'compress': True, 'history': False, 'name': 'dLKBW', 'pareto': True, 'path': '.', 'precision': 3, 'prompt_on_exception': True, 'verbose': True}, 'similarity': {'args': [['Ligand'], 2.5], 'kwargs': {}, 'module': 'gaudi.similarity.rmsd'}}¶

name_objectives¶

schema = {'genes': All(Length(min=1, max=None), [<type 'dict'>], msg=None), 'objectives': All(Length(min=1, max=None), [<type 'dict'>], msg=None), 'output': {'check_every': All(Coerce(int, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'compress': Coerce(bool, msg=None), 'history': Coerce(bool, msg=None), 'name': All(<type 'basestring'>, Length(min=1, max=255), msg=None), 'pareto': Coerce(bool, msg=None), 'path': <function fn>, 'precision': All(Coerce(int, msg=None), Range(min=-3, max=6, min_included=True, max_included=True, msg=None), msg=None), 'prompt_on_exception': Coerce(bool, msg=None), 'verbose': Coerce(bool, msg=None)}, 'similarity': {'args': <type 'list'>, 'kwargs': <type 'dict'>, 'module': <type 'basestring'>}, '_path': <function ExpandUserPathExists>, 'ga': {'cx_eta': All(Coerce(int, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'cx_pb': All(Coerce(float, msg=None), Range(min=0, max=1, min_included=True, max_included=True, msg=None), msg=None), 'generations': All(Coerce(int, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'lambda_': All(Coerce(float, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'mu': All(Coerce(float, msg=None), Range(min=0, max=1, min_included=True, max_included=True, msg=None), msg=None), 'mut_eta': All(Coerce(int, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'mut_indpb': All(Coerce(float, msg=None), Range(min=0, max=1, min_included=True, max_included=True, msg=None), msg=None), 'mut_pb': All(Coerce(float, msg=None), Range(min=0, max=1, min_included=True, max_included=True, msg=None), msg=None), 'population': All(Coerce(int, msg=None), Range(min=2, max=None, min_included=True, max_included=True, msg=None), msg=None)}}¶

validate(data=None)¶

weights¶

gaudi.parse.deep_update(source, overrides)¶

Update a nested dictionary or similar mapping.

Modify source in place.
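A recursive in-place merge of this kind is commonly written as below. This is a sketch under the assumption that nested mappings are merged key by key while any other value is overwritten; `deep_update_sketch` is a hypothetical name and the code uses Python 3 imports.

```python
from collections.abc import Mapping

def deep_update_sketch(source, overrides):
    """Recursively merge `overrides` into `source`, modifying `source` in place."""
    for key, value in overrides.items():
        if isinstance(value, Mapping) and isinstance(source.get(key), Mapping):
            # Both sides are mappings: descend instead of replacing the whole branch.
            deep_update_sketch(source[key], value)
        else:
            source[key] = value
    return source

defaults = {"ga": {"population": 10, "generations": 3}, "output": {"path": "."}}
user = {"ga": {"population": 50}, "output": {"name": "test"}}
merged = deep_update_sketch(defaults, user)
```

Note that `ga.generations` survives the merge even though the user only supplied `ga.population`, which a plain `dict.update` would not preserve.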

gaudi.parse.parse_rawstring(s)¶

It parses reference strings contained in some fields.

These strings contain references to genes.molecule instances and to one of their atoms.

gaudi.parse.validate(schema, data)¶

gaudi.plugin¶

This module provides the basic functionality for the plugin system of genes and objectives.

class gaudi.plugin.PluginMount(name, bases, attrs)¶

Bases: type

Base class for plugin mount points.

Metaclass trickery obtained from Marty Alchin’s blog. Each mount point (i.e., genes and objectives) MUST inherit from this one.
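The general mount-point pattern can be sketched as a metaclass that records every subclass of the class it is attached to. The version below is a generic illustration in Python 3 syntax, not the GaudiMM source: `PluginMountSketch`, `GeneMount`, `Rotation` and `Torsion` are hypothetical names, and registering plugins in a dict keyed by class name is an assumption.

```python
class PluginMountSketch(type):
    """Metaclass that registers every subclass of a mount point."""
    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        if not hasattr(cls, "plugins"):
            # First class using the metaclass: this is the mount point itself.
            cls.plugins = {}
        else:
            # Any later subclass is a plugin; register it by name.
            cls.plugins[name] = cls

class GeneMount(metaclass=PluginMountSketch):
    """Hypothetical mount point; plugins subclass it."""

class Rotation(GeneMount):
    pass

class Torsion(GeneMount):
    pass
```

Simply defining a subclass is enough to make it discoverable through `GeneMount.plugins`, which is what lets plugins live in separate modules.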

gaudi.plugin.import_plugins(*pluginlist)¶

Import requested modules, only once, when launch.py is called and the configuration is parsed successfully.

Parameters:	pluginlist (list of gaudi.parse.Param) – Usually, the genes or objectives list resulting from the configuration parsing.

gaudi.plugin.load_plugins(plugins, container=None, **kwargs)¶

Requests an instance of the class that resides in each plugin. For genes, each individual has its own instance, but objectives are treated like singletons, so they are only instantiated once. That’s the reason behind using a mutable container.

Parameters:

plugins (list of gaudi.parse.Param) – Modules to load. Each Param must have a module attr with a full import path.
container (dict or dict-like) – If provided, use this container to retain instances across individuals.
kwargs – Everything else will be passed to the requested plugin instances.

gaudi.similarity¶

This module contains the similarity functions that are used to discard individuals that are not different enough.

gaudi.similarity.rmsd(ind1, ind2, subjects, threshold, *args, **kwargs)¶

Returns the RMSD between two individuals

Parameters:

ind1 (gaudi.base.Individual)
ind2 (gaudi.base.Individual)
subjects (list of str) – Name of gaudi.genes.molecule instances to measure.
threshold (float) – Maximum RMSD value to consider two individuals as similar. If `rmsd > threshold`, they are considered different.

Returns:	True if `rmsd` is within threshold, False otherwise. It will always return False if number of atoms is not equal in the two Individuals.
Return type:	bool
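The decision rule described here can be sketched on raw coordinates. The code below is illustrative only: it works on plain lists of (x, y, z) tuples instead of Individuals, performs no superposition, and `similar_by_rmsd` is a hypothetical name.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    squared = sum((a - b) ** 2 for p, q in zip(coords_a, coords_b) for a, b in zip(p, q))
    return math.sqrt(squared / len(coords_a))

def similar_by_rmsd(coords_a, coords_b, threshold):
    """True if both sets have the same (non-zero) atom count and RMSD <= threshold."""
    if len(coords_a) != len(coords_b) or not coords_a:
        return False
    return rmsd(coords_a, coords_b) <= threshold

pose_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose_2 = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]  # pose_1 shifted 3 units along z
```

With these two poses the RMSD is 3.0, so they count as different at a threshold of 2.5 and as similar at 3.5.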

gaudi._cpdrift¶

Coherent Point Drift (affine and rigid) Python2/3 implementation, adapted from kwohlfahrt’s.

Only 3D points are supported in this version.

Depends on:

Python 2.7, 3.4+
Numpy
Matplotlib (plotting only)

class gaudi._cpdrift.Quaternion(s, i, j, k)¶

Bases: object

axis_angle¶

conjugate(other=None)¶

classmethod fromAxisAngle(v, theta)¶

matrix()¶

norm()¶

reciprocal()¶

unit()¶

vector¶

gaudi._cpdrift.RMSD(X, Y)¶

gaudi._cpdrift.affine_cpd(X, Y, w=0.0, B=None)¶

gaudi._cpdrift.affine_xform(X, B=<Mock name='mock.array()' id='140309713689552'>, t=0)¶

gaudi._cpdrift.coherent_point_drift(X, Y, w=0.0, B=None, guess_steps=5, max_iterations=20, method='affine')¶

gaudi._cpdrift.common_steps(X, Y, Y_, w, sigma_sq)¶

class gaudi._cpdrift.frange(start, stop, step)¶: Bases: object

gaudi._cpdrift.last(sequence)¶

gaudi._cpdrift.pairwise_sqdist(X, Y)¶

gaudi._cpdrift.plot(x, y, t)¶

Plot the initial datasets and registration results.

Parameters:

x (ndarray) – The static shape that y will be registered to. Expected array shape is [n_points_x, n_dims].
y (ndarray) – The moving shape. Expected array shape is [n_points_y, n_dims]. Note that n_dims should be equal for x and y, but n_points does not need to match.
t (ndarray) – The transformed version of y. Output shape is [n_points_y, n_dims].

gaudi._cpdrift.rigid_cpd(X, Y, w=0.0, R=None)¶

gaudi._cpdrift.rigid_xform(X, R=<Mock name='mock.array()' id='140309713689552'>, t=0.0, s=1.0)¶

gaudi._cpdrift.rotation_matrix(*angles)¶

gaudi._cpdrift.spaced_rotations(N)¶

gaudi._cpdrift.std(x)¶