API documentation

The GaudiMM package is comprised of several core modules that establish the base architecture to build an extensible platform of molecular design.

The main module is gaudi.base, which defines the gaudi.base.Individual, whose instances represent the potential solutions to the proposed problem. Two plugin packages allow easy customization of how individuals are defined (gaudi.genes) and how they are evaluated (gaudi.objectives). Additionally:

gaudi.algorithms

This module implements evolutionary algorithms as seen in DEAP, and extends their functionality to make use of GAUDI goodies.

Todo

  • Genealogy
gaudi.algorithms.dump_population(population, cfg, subdir=None)
gaudi.algorithms.ea_mu_plus_lambda(population, toolbox, mu, lambda_, cxpb, mutpb, ngen, cfg, stats=None, halloffame=None, verbose=True, prompt_on_exception=True)

This is the \((\mu + \lambda)\) evolutionary algorithm.

Parameters:
  • population – A list of individuals.
  • toolbox – A Toolbox that contains the evolution operators.
  • mu – The number of individuals to select for the next generation.
  • lambda_ – The number of children to produce at each generation.
  • cxpb – The probability that an offspring is produced by crossover.
  • mutpb – The probability that an offspring is produced by mutation.
  • ngen – The number of generation.
  • stats – A Statistics object that is updated inplace, optional.
  • halloffame – A HallOfFame object that will contain the best individuals, optional.
  • verbose – Whether or not to log the statistics.
Returns:

The final population.

First, the individuals having an invalid fitness are evaluated. Then, the evolutionary loop begins by producing lambda_ offspring from the population, the offspring are generated by a crossover, a mutation or a reproduction proportionally to the probabilities cxpb, mutpb and 1 - (cxpb + mutpb). The offspring are then evaluated and the next generation population is selected from both the offspring and the population. Briefly, the operators are applied as following

evaluate(population)
for i in range(ngen):
    offspring = varOr(population, toolbox, lambda_, cxpb, mutpb)
    evaluate(offspring)
    population = select(population + offspring, mu)

This function expects toolbox.mate(), toolbox.mutate(), toolbox.select() and toolbox.evaluate() aliases to be registered in the toolbox. This algorithm uses the varOr() variation.

gaudi.base

Contains the core classes we use to build individuals (potential solutions of the optimization process).

class gaudi.base.BaseIndividual(cfg=None, cache=None, **kwargs)

Bases: object

Base class for individual objects that are evaluated by DEAP.

Each individual is a potential solution. It contains all that is needed for an evaluation. With multiprocessing in mind, individuals should be self-contained so it can be passed between threads.

The defined methods are only wrapper calls to the respective methods of each gene.

Parameters:
  • cfg (gaudi.parse.Settings) – The full parsed object from the configuration YAML file.
  • cache (dict or dict-like) – A mutable object that can be used to store values across instances.
  • dummy (bool) – If True, create an uninitialized Individual, only containing the cfg attribute. If false, call __ready__ and complete initialization.
__CACHE

Class attribute that caches gene data across instances

Type:dict
__CACHE_OBJ

Class attribute that caches objectives data across instances

Type:dict

Todo

write() should use Pickle and just save the whole object, but Chimera’s inmutable objects (Atoms, Residues, etc) get in the way. A workaround may be found if we take a look a the session saving code.

clear_cache()
evaluate(environment)

Express individual, evaluate it and unexpress it.

Parameters:environment (Environment) – Objectives that will evaluate the individual
express()

Express genes in this environment. Very much like ‘compiling’ the individual to a chimera.Molecule.

mate(other)

Recombine genes of self with other. It simply calls mate on each gene instance

Parameters:other (Individual) – Another individual to mate with.
mutate(indpb)

Trigger a round of possible mutations across all genes

Parameters:indpb (float) – Probability of suffering a mutation
post_express()
post_unexpress()
pre_express()
pre_unexpress()
similar(other)

Compare self and other with a similarity function.

Returns:
Return type:bool
unexpress()

Undo .express()

write(i, path=None)

Export the individual to a mol2 file

Parameters:i (int) – Individual identificator in current generation or hall of fame
:param .. note ::: Maybe someday we can pickle it all :/
>>> filename = os.path.join(path, '{}_{}.pickle.gz'.format(name,i))
>>> with gzip.GzipFile(filename, 'wb') as f:
>>>     cPickle.dump(self, f, 0)
>>> return filename
class gaudi.base.Environment(cfg=None, *args, **kwargs)

Bases: object

Objective container and helper to evaluate an individual. It must be instantiated with a gaudi.parse.Settings object.

Parameters:cfg (gaudi.parse.Settings) – The parsed configuration YAML file that contains objectives information
clear_cache()
evaluate(individual)

individual : Individual

class gaudi.base.Fitness(weights)

Bases: deap.base.Fitness

wvalues = ()
class gaudi.base.MolecularIndividual(*args, **kwargs)

Bases: gaudi.base.BaseIndividual

find_molecule(name)
post_express()
xyz(gene=None)
gaudi.base.expressed(*args, **kwds)

gaudi.box

This module is a messy collection of useful functions used all along GAUDI.

Todo

Some of these functions are hardly used, so maybe we should clean it a little in the future…

gaudi.box.atoms_between(atom1, atom2)

Finds all connected atoms between two given atoms

gaudi.box.atoms_by_serial(*serials, **kw)

Find atoms in kw[‘atoms’] with serialNumber = serials.

Parameters:
  • serials (int) – List of serial numbers to match
  • atoms (list of chimera.Atom, optional) – List of atoms to be traversed while looking for serial numbers
Returns:

Return type:

list of chimera.Atom

gaudi.box.create_single_individual(path)

Create an individual within Chimera. Convenience method for Chimera IDLE.

gaudi.box.do_cprofile(func)

Decorator to cProfile a certain function and output the results to cprofile.out

gaudi.box.draw_interactions(interactions, startCol='FF0000', endCol='FFFF00', key=None, name='Custom pseudobonds')

Draw pseudobonds depicting atoms relationships.

Parameters:
  • interactions (list of tuples) – Each tuple contains an interaction, defined, at least, by the two atoms involved.
  • startCol (str, optional) – Hex code for the initial color of the pseudobond (closer to the first atom of the pair).
  • endCol (str, optional) – Hex code for the final color of the pseudobond. (closer to the second atom of the pair)
  • key (int, optional) – The index of an interaction tuple that represent the alpha channel in the color used to depict the interaction.
  • name (str, optional) – Name of the pseudobond group created.
Returns:

Return type:

chimera.pseudoBondGroup

gaudi.box.files_in(path, ext=None)

Returns all the files in a given directory, filtered by extension if desired.

Parameters:
  • path (str) –
  • ext (list of str, optional) – File extension(s) to filter on.
Returns:

Return type:

List of absolute paths

gaudi.box.find_nearest(anchor, atoms)

Find the atom of atoms that is closer to anchor, in terms of number of atoms in between.

gaudi.box.highest_atom_indices(r)

Returns a dictionary with highest atom indices in given residue Key: value -> element.name: highest index in residue

gaudi.box.incremental_existing_path(path, separator='__')
gaudi.box.open_models_and_close(*args, **kwds)
gaudi.box.pseudobond_to_bond(molecule, remove=False)

Transforms every pseudobond in molecule to a covalent bond

Parameters:
  • molecule (chimera.Molecule) –
  • remove (bool) – If True, remove original pseudobonds after actual bonds have been created.
gaudi.box.rmsd(a, b)
gaudi.box.sequential_bonds(atoms, s)

Returns bonds in atoms in sequential order, beginning at atom s

gaudi.box.silent_stdout(*args, **kwds)
gaudi.box.stdout_to_file(workspace, stderr=True)
gaudi.box.suppress_ksdssp(trig_name, my_data, molecules)

Monkey-patch Chimera triggers to disable KSDSSP computation

gaudi.box.write_individuals(inds, outpath, name, evalfn, remove=True)

Write an individual to disk.

Note

Deprecated since an Individual object is able to write itself to disk.

gaudi.exceptions

This module collects more meaningful exceptions than builtins.

exception gaudi.exceptions.AtomsNotFound

Bases: exceptions.Exception

exception gaudi.exceptions.MoleculesNotFound

Bases: exceptions.Exception

exception gaudi.exceptions.ResiduesNotFound

Bases: exceptions.Exception

exception gaudi.exceptions.TooManyAtoms

Bases: exceptions.Exception

exception gaudi.exceptions.TooManyResidues

Bases: exceptions.Exception

gaudi.parallel

Helper functions to deal with parallel execution of GaudiMM jobs.abs

Useful for benchmarks.

gaudi.parallel.run_parallel(fn, args=(), processes=None, initializer=None, initargs=(), maxtasksperchild=1, map_timeout=9999999, map_chunksize=1, map_callback=None)

Create a pool instance with built-in exception handling.

gaudi.parse

This module parses and validates YAML input files into convenient objects that allow per-attribute access to configuration parameters.

gaudi.parse.AssertList(*validators, **kwargs)

Make sure the value is contained in a list

gaudi.parse.Coordinates(v)
gaudi.parse.Degrees(v)
gaudi.parse.ExpandUserPathExists(v)
gaudi.parse.Importable(v)
gaudi.parse.MakeDir(validator)
class gaudi.parse.MoleculeAtom(molecule, atom)

Bases: tuple

atom

Alias for field number 1

molecule

Alias for field number 0

class gaudi.parse.MoleculeResidue(molecule, residue)

Bases: tuple

molecule

Alias for field number 0

residue

Alias for field number 1

gaudi.parse.Molecule_name(v)

Ideal implementation:

def fn(v):
    valid = [i['name'] for i in items if i['module'] == 'gaudi.genes.molecule']
    if v not in valid:
        raise Invalid("{} is not a valid Molecule name".format(v))
    return v
return fn

However, I must figure a way to get the gene list beforehand

gaudi.parse.Named_spec(*names)

Assert that str is formatted like “Molecule/123”, with Molecule being a valid name of a Molecule gene and 123 a positive int or *

gaudi.parse.RelPathToInputFile(inputpath=None)
gaudi.parse.ResidueThreeLetterCode(v)
class gaudi.parse.Settings(path=None, validation=True)

Bases: munch.Munch

Parses a YAML input file with PyYAML, validates it with voluptuous and builds a attribute-accessible dict with Munch.

Hence, all the attributes in this class are generated automatically from the default values, and the updated with the contents of the YAML file.

Parameters:path (str) – Path to YAML file
output

Contains the parameters to determine how to write and report results.

Type:dict
output.path

Directory that will contain the result files. If it does not exist, it will be created. If it does, the contents could be overwritten. Better change this between different attempts.

Type:str, optional, defaults to . (current dir)
output.name

A small identifier for your calculation. If not set, it will use five random characters.

Type:str, optional
output.precision

How many decimals should be used along the simulation. This won’t only affect reports, but also the reported scores by the objectives during the selection process.

Type:int, optional, defaults to 3
output.compress

Whether to apply compression to the individual ZIP files or not.

Type:bool, optional, defaults to True
output.history

Whether to save all the genealogy of the individuals created along the simulation or not. Only for advanced users.

Type:bool, optional, defaults to False
output.pareto

If True, the elite population will be the Pareto front of the population. If False, the elite population will be the best solutions, according to the lexicographic sorting of the fitness values.

Type:bool, optional, defaults to True
output.verbose

Whether to realtime report the progress of the simulation or not.

Type:bool, optional, defaults to True
output.check_every

Dump the elite population every n generations. Switched off if set to 0.

Type:int, optional, defaults to 10
output.prompt_on_exception

When an exception is raised, GaudiMM tries to dump the current population to disk as an emergency rescue plan. This includes pressing Ctrl+C. If this happens, it prompts the user whether to dump it or not. For interactive sessions this is desirable, but no so much for unsupervised cluster jobs. If set to False, this behaviour will be disabled.

Type:bool, optional, defaults to True
ga

Contains the genetic algorithm parameters.

Type:dict
ga.population

Size of the starting population, in number of individuals.

Type:int
ga.generations

Number of generations to simulate.

Type:float
ga.mu

The number of children to select at each generation, expressed as a multiplier of ga.population.

Type:float, optional, defaults to 1.0
ga.lambda_

The number of children to produce at each generation, expressed as a multiplier of ga.population.

Type:float, optional, defaults to 3.0
ga.mut_eta

Crowding degree of the mutation. A high eta will produce a mutant resembling its parent, while a small eta will produce a solution much more different.

Type:float, optional, defaults to 5
ga.mut_pb

The probability that an offspring is produced by mutation.

Type:float, optional, defaults to 0.5
ga.mut_indpb

Independent probability for each gene to be mutated.

Type:float, optional, defaults to 0.75
ga.cx_eta

Crowding degree of the crossover. A high eta will produce children resembling to their parents, while a small eta will produce solutions much more different.

Type:float, optional, defaults to 5
ga.cx_pb

The probability that an offspring is produced by crossover.

Type:float, optional, defaults to 0.5
similarity

Contains the parameters to the similarity operator, which, given two individuals with the same fitness, whether they can be considered the same solution or not.

Type:dict
similarity.module

The function to call when a fitness draw happens. It should be expressed as Python importable path; ie, separated by dots: gaudi.similarity.rmsd.

Type:str
similarity.args

Positional arguments to the similarity function.

Type:list
similarity.kwargs

Optional arguments to the similarity function.

Type:dict
genes

Contains the list of genes that each Individual will have.

Type:list of dict
objectives

Contains the list of objectives that will make the Environment object to evaluate the Individuals.

Type:list of dict
default_values = {'ga': {'cx_eta': 5, 'cx_pb': 0.5, 'generations': 3, 'lambda_': 3, 'mu': 1, 'mut_eta': 5, 'mut_indpb': 0.75, 'mut_pb': 0.5, 'population': 10}, 'genes': [{}], 'objectives': [{}], 'output': {'check_every': 10, 'compress': True, 'history': False, 'name': 'tkomH', 'pareto': True, 'path': '.', 'precision': 3, 'prompt_on_exception': True, 'verbose': True}, 'similarity': {'args': [['Ligand'], 2.5], 'kwargs': {}, 'module': 'gaudi.similarity.rmsd'}}
name_objectives
schema = {'genes': All(Length(min=1, max=None), [<type 'dict'>], msg=None), 'objectives': All(Length(min=1, max=None), [<type 'dict'>], msg=None), 'output': {'check_every': All(Coerce(int, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'compress': Coerce(bool, msg=None), 'history': Coerce(bool, msg=None), 'name': All(<type 'basestring'>, Length(min=1, max=255), msg=None), 'pareto': Coerce(bool, msg=None), 'path': <function fn>, 'precision': All(Coerce(int, msg=None), Range(min=-3, max=6, min_included=True, max_included=True, msg=None), msg=None), 'prompt_on_exception': Coerce(bool, msg=None), 'verbose': Coerce(bool, msg=None)}, 'similarity': {'args': <type 'list'>, 'kwargs': <type 'dict'>, 'module': <type 'basestring'>}, '_path': <function ExpandUserPathExists>, 'ga': {'cx_eta': All(Coerce(int, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'cx_pb': All(Coerce(float, msg=None), Range(min=0, max=1, min_included=True, max_included=True, msg=None), msg=None), 'generations': All(Coerce(int, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'lambda_': All(Coerce(float, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'mu': All(Coerce(float, msg=None), Range(min=0, max=1, min_included=True, max_included=True, msg=None), msg=None), 'mut_eta': All(Coerce(int, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'mut_indpb': All(Coerce(float, msg=None), Range(min=0, max=1, min_included=True, max_included=True, msg=None), msg=None), 'mut_pb': All(Coerce(float, msg=None), Range(min=0, max=1, min_included=True, max_included=True, msg=None), msg=None), 'population': All(Coerce(int, msg=None), Range(min=2, max=None, min_included=True, max_included=True, msg=None), msg=None)}}
validate(data=None)
weights
gaudi.parse.deep_update(source, overrides)

Update a nested dictionary or similar mapping.

Modify source in place.

gaudi.parse.parse_rawstring(s)

It parses reference strings contained in some fields.

These strings contain references to genes.molecule instances, and one of its atoms

gaudi.parse.validate(schema, data)

gaudi.plugin

This module provides the basic functionality for the plugin system of genes and objectives.

class gaudi.plugin.PluginMount(name, bases, attrs)

Bases: type

Base class for plugin mount points.

Metaclass trickery obtained from Marty Alchin’s blog Each mount point (ie, genes and objectives), MUST inherit this one.

gaudi.plugin.import_plugins(*pluginlist)

Import requested modules, only once, when launch.py is called and the configuration is parsed successfully.

Parameters:pluginlist (list of gaudi.parse.Param) – Usually, the genes or objectives list resulting from the configuration parsing.
gaudi.plugin.load_plugins(plugins, container=None, **kwargs)

Requests an instance of the class that resides in each plugin. For genes, each individual has its own instance, but objectives are treated like a singleton. So, they are only instantiated once. That’s the reason behind usen a mutable container.

Parameters:
  • plugins (list of gaudi.parse.Param) – Modules to load. Each Param must have a module attr with a full import path.
  • container (dict or dict-like) – If provided, use this container to retain instances across individuals.
  • kwargs – Everything else will be passed to the requested plugin instances.

gaudi.similarity

This module contains the similarity functions that are used to discard individuals that are not different enough.

gaudi.similarity.rmsd(ind1, ind2, subjects, threshold, *args, **kwargs)

Returns the RMSD between two individuals

Parameters:
  • ind1 (gaudi.base.Individual) –
  • ind2 (gaudi.base.Individual) –
  • subjects (list of str) – Name of gaudi.genes.molecule instances to measure
  • threshold (float) – Maximum RMSD value to consider two individuals as similar. If rmsd > threshold, they are considered different.
Returns:

True if rmsd is within threshold, False otherwise. It will always return False if number of atoms is not equal in the two Individuals.

Return type:

bool

gaudi._cpdrift

Coherent Point Drift (affine and rigid) Python2/3 implementation, adapted from kwohlfahrt’s.

Only 3D points are supported in this version.

Depends on:

  • Python 2.7, 3.4+
  • Numpy
  • Matplotlib (plotting only)
class gaudi._cpdrift.Quaternion(s, i, j, k)

Bases: object

axis_angle
conjugate(other=None)
classmethod fromAxisAngle(v, theta)
matrix()
norm()
reciprocal()
unit()
vector
gaudi._cpdrift.RMSD(X, Y)
gaudi._cpdrift.affine_cpd(X, Y, w=0.0, B=None)
gaudi._cpdrift.affine_xform(X, B=<Mock name='mock.array()' id='140575305619088'>, t=0)
gaudi._cpdrift.coherent_point_drift(X, Y, w=0.0, B=None, guess_steps=5, max_iterations=20, method='affine')
gaudi._cpdrift.common_steps(X, Y, Y_, w, sigma_sq)
class gaudi._cpdrift.frange(start, stop, step)

Bases: object

gaudi._cpdrift.last(sequence)
gaudi._cpdrift.pairwise_sqdist(X, Y)
gaudi._cpdrift.plot(x, y, t)

Plot the initial datasets and registration results.

Parameters:
  • x (ndarray) – The static shape that y will be registered to. Expected array shape is [n_points_x, n_dims]
  • y (ndarray) – The moving shape. Expected array shape is [n_points_y, n_dims]. Note that n_dims should be equal for x and y, but n_points does not need to match.
  • t (ndarray) – The transformed version of y. Output shape is [n_points_y, n_dims].
gaudi._cpdrift.rigid_cpd(X, Y, w=0.0, R=None)
gaudi._cpdrift.rigid_xform(X, R=<Mock name='mock.array()' id='140575305619088'>, t=0.0, s=1.0)
gaudi._cpdrift.rotation_matrix(*angles)
gaudi._cpdrift.spaced_rotations(N)
gaudi._cpdrift.std(x)