GaudiMM

https://travis-ci.org/insilichem/gaudi.svg?branch=master https://readthedocs.org/projects/gaudi/badge/?version=latest https://anaconda.org/insilichem/gaudi/badges/installer/conda.svg https://img.shields.io/badge/doi-10.1002%2Fjcc.24847-blue.svg

GaudiMM, for Genetic Algorithms with Unrestricted Descriptors for Intuitive Molecular Modeling, helps to sketch new molecular designs that require complex interactions.

GaudiMM logo

How to install

Quick steps:

1 - Download the latest stable copy of UCSF Chimera and install it with:

chmod +x chimera-*.bin && sudo ./chimera-*.bin

2 - Install Miniconda Python 2.7 Distribution for your platform and install it with:

bash Miniconda2*.sh

3 - Install gaudi with conda in a new environment called insilichem (or whatever name you prefer after the -n flag), using these custom channels (-c flags):

conda create -n insilichem -c omnia -c salilab -c insilichem -c conda-forge -c bioconda -c tpeulen gaudi

4 - Activate the new environment as proposed:

conda activate insilichem

or

source activate insilichem

5 - Run it!

gaudi

Check everything is OK

If everything went OK, you will get the usage screen:

Usage: gaudi [OPTIONS] COMMAND [ARGS]...

  GaudiMM: Genetic Algorithms with Unrestricted Descriptors for Intuitive
  Molecular Modeling

  (C) 2017, InsiliChem
  https://github.com/insilichem/gaudi

Options:
  --version   Show the version and exit.
  -h, --help  Show this message and exit.

Commands:
  prepare  Create or edit a GAUDI input file.
  run      Launch a GAUDI input file.
  view     Analyze the results in a GAUDI output file.

Quick usage

Running GAUDI jobs is quite easy with gaudi.cli.gaudi_run:

gaudi run /path/to/some_file.gaudi-input

To learn how to create input files, go to Input files, and make sure to check the tutorials!

After the job is completed, you can check the results with gaudi.cli.gaudi_view:

gaudi view /path/to/some_file.gaudi-output

You can choose between two molecular visualization tools: Chimera itself (using GaudiView extension), or our in-house GUI, GAUDInspect.

GAUDInspect GUI under Ubuntu 14.04 GAUDInspect GUI under Windows 10

Both tools support lazy-load, filtering and sorting, so choose whichever you prefer. A quick tutorial on GaudiView is available here:

Input files

GaudiMM uses YAML-formatted files for both input and output files. YAML is a human-readable serialization format, already implemented in a broad range of languages. Input files must contain these five sections:

  • output. Project options. Configure it to your liking
  • ga. Genetic algorithm configuration. Normally, you don’t have to touch this, except maybe the number of generations and population size.
  • similarity. The similarity function to compare potentially redundant solutions.
  • genes. List of descriptors used to define an individual
  • objectives. The list of functions that will evaluate your individuals.

You can check some sample input files in the examples directory.

How to create GaudiMM input files

If you don’t mind installing GAUDInspect, it provides a full GUI to create GAUDI input files, step by step. Add genes and objectives, configure the paths, number of generations and population size, and run it. Simple and easy.

Note

The development of GAUDInspect is currently stalled.

However, you can also edit them manually, since they are just plain text files. Create a copy of one of the examples and edit them to your convenience. While you can check the API documentation of gaudi.genes and gaudi.objectives to find a list of available components, we really recommend checking the beginners tutorial:

Output files

GAUDI output files are also YAML-formatted, so you can just open the file and read it in your favourite editor. However, that’s not probably what you were expecting to do.

For the time being, your best bet to understand how to analyze the results is to read the visualization tutorial:

Todo

  • Multi-objective solutions: Pareto front vs lexicographically sorted elite

FAQ & Known issues

Known issues

Most problems we have faced have to do with the installation and Chimera-Conda incompatibilities. Those are covered in the pychimera documentation, so please check that list before raising an issue in this repository!

How can I cite GaudiMM? What licenses apply?

GaudiMM is scientific software, funded by public research grants. If you make use of GaudiMM in scientific publications, please cite it. It will help measure the impact of our research and future funding!

@preamble{ " \newcommand{\noop}[1]{} " }
@article{GaudiMM2017,
   title = {GaudiMM: A Modular Multi-Objective Platform for Molecular Modeling},
   author = {Rodr{\'i}guez-Guerra Pedregal, Jaime and Sciortino, Giuseppe and Guasp, Jordi and Municoy, Mart{\'i} and Mar{\'e}chal, Jean-Didier},
   year = {\noop{2017}submitted},
}

GaudiMM itself is licensed under Apache License 2.0, but includes work from other developers, whose licenses apply. Please check the LICENSE file in the root directory for further details.

How many generations / Which population size should I pick?

This depends on the complexity of your search space (related to the number and type of genes in use), and the evaluation power of the chosen objectives. A simple job would work OK with 50 generations and populations of 100 individuals, while more complex ones would require 500 generations for populations of 1000 indivuals. This also depends on the values of mu and lambda parameters.

The output produces a lot of similar results and I want to focus on diversity

You should play with the cutoff value of the RMSD similarity section. If the RMSD of two potentially similar solutions is under the cutoff, one of them is discarded. However, this is only applied if the score of two solutions are the same; ie, if there is a draw.

To force draws, one can reduce the number of decimal positions returned by every objective, which is controlled by the precision parameter. It is setup globally in the output section, but can also be overriden by any objective. By default, precision is 3. Also, if you are using the gaudi.genes.search.Search gene, make sure its precision parameter is small enough so you don’t lose exploration efforts in too similar positions.

The output produces very different results and I am interested in clustered solutions

In this case, one should increase the precision value (both globally and in the gaudi.genes.search.Search gene, if it applies) and reduce the RMSD similarity cutoff.

What is the difference between atom_names and atom_types in some objectives or genes?

atom_names refers to the name attribute of chimera.Atom objects, while atom_types is applied with idatmType attributes. Normally, the name attribute is picked directly from the molecule file (PDB, mol2), while the idatmType is assigned algorithmically by UCSF Chimera.

Tip

Any further questions? Feel free to submit your inquiries to our issues page!

GaudiMM basics (start here!)

The main strength of GaudiMM is, possibly, its flexible approach to solve molecular modeling problems. However, flexibility does not come without a price: the learning curve can be somewhat steep for beginners. In this tutorial we will try to smooth that curve out.

GaudiMM makes a big effort to clearly separate exploration from evaluation. As such, both stages are configured in different sections of the input file: genes and objectives.

Exploration and genes

Most molecular modeling problems can be explained using search space analogies. This is, given a problem, one can identify the dimensions it must explore to find adequate solutions for that problem. That n-dimensional space can be comprised of one or more (bio)chemical structures and their respective spatial variability. Modifying the topology and/or composition of such structures would be equivalent to moving along the (bio)chemical axes of such search space, while modifying their atomic positions would be equivalent to moving along the cartesian and/or internal coordinates axes. Thus, visiting different regions of the whole search space means creating new candidate solutions to the problem… We will call this stage exploration, and it is detailed in the genes section of the input file. If you want to know why they are called genes, check the Genetic Algorithms Primer in the Developers guide.

The more important gene is, almost always, the gaudi.genes.molecule.Molecule gene. This is used to load (bio)chemical structures, such as proteins, peptides and small cofactors. Normally they are straight PDB or Mol2 files, but can also be more complex constructs. As of now, it is the only one capable of providing topologies. The other genes are responsible of applying changes to the coordinates of that initial structure. Some examples include torsion angles of rotatable bonds (gaudi.genes.torsion.Torsion gene), translating and rotating a Molecule in the 3D space (gaudi.genes.search.Search gene), or importing the atomic positions of a molecular dynamics trajectory frame (gaudi.genes.trajectory.Trajectory gene).

Evaluation and objectives

However, there is no guarantee that, given a new position on such search space, that hypothetical solution is actually good enough to solve the problem. To assess the quality of a random solution, it must be evaluated with some criteria or objectives. All objectives have something in common: they take a candidate solution (ie, a point of the search space) and measure some property to return a numeric score. Some examples include the distance between two given atoms (gaudi.objectives.distance.Distance objective), the steric clashes of the molecules (gaudi.objectives.contacts.Contacts objective) or the forcefield energy of the system (gaudi.objectives.energy.Energy objective).

Examples of application

Now we got the basics covered, we can start to think of how we could solve some standard (and not so standard) molecular modeling problems with GaudiMM.

Docking

Docking problems devote to finding the correct orientation and position of a small molecule (the ligand) within the cavity of a bigger one (normally, a protein). For the exploration stage, at least three genes must be defined:

This is all you need to move a rigid ligand around a protein. However, most of the time you want to implement some kind of internal flexibility, such as torsions in the ligand or some residues sidechains. To do that, add these to your genes section:

  • A gaudi.genes.torsion.Torsion gene with the ligand as target to implement dihedral angle torsions along the rotatable bonds of the ligand.
  • A gaudi.genes.rotamers.Rotamers gene with some residues of the protein as target. This will randomly modify the dihedral torsions of the sidechains of the specified residues according to the angles provided by the rotamer library (defaults to Dunbrack’s).

There is also a gaudi.genes.mutamers.Mutamers gene that allows you to mutate residues in addition to rotameric exploration. This will provide some traversal along the biochemical axis of your protein, but careful, because the current implementation is RAM hungry!

The evaluation stage can be comprised of several objectives, but normally you’d want to:

Covalent docking

Not all docking studies consider a free ligand. Sometimes the ligand is anchored to some part of the protein. While there’s no specific gene to implement a covalent bond (for now, at least), you can mimick it with a Search gene with radius=0 and rotate=True. The origin of this null search sphere should be set to the atom that is part of the covalent bond, so you would have add an extra atom in one of the molecules. Also, we recommend setting an gaudi.objectives.angle.Angle objective between the involved atoms in the covalent bond so that the resulting rotation matches the expected geometry of the new bond (109.5º for sp3, 120º for sp2, 180º for sp1).

Naked metal ions docking

If instead a small organic molecule you choose to load a naked metal ion… would it work? Well, in most docking programs probably not, but with GaudiMM the answer is… it depends!

It depends on the objectives you choose. Since we ship a gaudi.objectives.coordination.Coordination objective that can deal with coordination geometries, you can dock naked metal ions in any protein or peptide.

Competitive docking

Since you can instantiate as many gaudi.genes.molecule.Molecule genes as you want, nothing prevents you from adding more ligand molecules at the same time. They will compete to find its place in the protein(s). Just remember to replicate any dependent genes (gaudi.genes.search.Search, gaudi.genes.torsion.Torsion and so on) accordingly. Also, why not two proteins which the ligand should choose from (but it’s true there are finer ways to assess that)?

Hacking Molecule genes for complex studies

In addition to loading molecular structure files, the gaudi.genes.molecule.Molecule does a couple of extra things. The path parameter can be set to two different values:

  • The path to a PDB or Mol2 file (or any format that Chimera can open). This is the standard behaviour. It will load the structure and that’s it.

  • The path to a directory, whose contents determine the final behaviour:

    1. If the directory contains molecule files, GaudiMM will choose one of them randomly for each case.
    2. If the directory contains subdirectories which, in turn, contain molecules files, GaudiMM will sort those subdirectories by name and then pick one molecule from each, in that order. The chosen molecules will be chained linearly as specified in the accompanying *.attr files. Perfect for drug design, since you can fill a couple of directories with different alkanes to link a cofactor

Case A allows GaudiMM to test a library of compounds against certain criteria: ie, virtual screening!

Case B makes drug design studies possible. If you want to test different linker molecules to anchor a cofactor to a protein, you can use a two-directories scheme: one directory would contain the aforementioned linkers, and the other one just the cofactor (it’s fine it there’s only one molecule in one directory).

Peptide folding & Conformational analysis

Loading an unfolded peptide with gaudi.genes.molecule.Molecule is pretty easy. Just specify the path. Then, if you apply a gaudi.genes.torsion.Torsion gene on such peptide, but considering only the bonds involving an alpha-carbon, you can effectively explore the conformational space of its folding. Use gaudi.genes.rotamers.Rotamers to implement sidechain flexibility.

For the evaluation, you can use the gaudi.objectives.energy.Energy objective to assess the forcefield energy as provided by OpenMM GPU calculations. This type of study could be considered some sort of highly explosive metadynamics.

Homology modeling

In the same fashion, given an unfolded peptided or protein segment, you could apply the same Torsion scheme to explore different conformations. A hypothetical objective could be devised to calculate the RMS deviation of the new fragment and a reference one, which then the GA would minimize to obtain a similar structure.

Conformational analysis

Another possible variation of this scheme is to impose geometric guides as additional objectives, like gaudi.objectives.distance.Distance, gaudi.objectives.angle.Angle or gaudi.objectives.coordination.Coordination. The resulting calculation could be regarded as a restrained conformational analysis, very useful for finding initial structures of unparametrized small molecules you want to study with higher levels of theory, such as QM.

Trajectory analysis

GaudiMM features a gaudi.genes.trajectory.Trajectory gene capable of importing frames of MD movies and apply them to the corresponding gaudi.genes.molecule.Molecule instance. This way, all the objectives are available as trajectory analysis tools, maybe not present in other specific software. For example, finding coordination geometries of a given metal ion along a molecular mechanics simulation.

Tutorial: Your first GaudiMM calculation

Running a GaudiMM calculation is easy. All you need is a simple command:

gaudi run input_file.yaml

The question is… how do I create that input_file.yaml.

How to create an input file from scratch

Input files in GaudiMM are formatted with YAML, which follows readable conventions that are still parseable by computers. To easily edit YAML files, we recommend using a text editor that supports syntax highlighting, such as Sublime Text 3, Atom or Visual Studio Code. If you use .yaml as the extension of the input filename, the editor will colorize the text automatically. Else, you can always configure it manually. Check the docs of your editor to do so. If you don’t want syntax highlighting it’s fine, GaudiMM will work the same.

Input files must contain the sections mentioned above: output, ga, similarity, genes, and objectives. Each of this sections it’s defined with a colon and a newline, and its contents will be inside an indented block. For readability, I usually insert a blank line between sections.

output:
    contents of output

ga:
    contents of ga

similarity:
    contents of similarity

genes:
    contents of genes

objectives:
    contents of objectives

Note

The API documentation of gaudi.parse.Settings contains the full list of parameters for output and ga sections. Programmatically defined default values are always defined in gaudi.parse.Settings.default_values. The appropriate types and whether they are required or not are defined in gaudi.parse.Settings.schema. GaudiMM will check if the submitted values conform to these rules and report any possible mistakes.

The output section

This section governs how the results and reports will be created. Everything is optional, since each key has a default value, but it is preferrable to at least specify the name of the job (otherwise, it will be set to five random characters), and the path where the result files will be written. Like this:

output:
    name: some_example
    path: results

Note

All the relative paths in GaudiMM are relative to the location of the input file, not the working directory. To ease this difference, we recommend running the jobs from the same folder where the input file is located. Of course, you can always use absolute paths.

The ga section

This section hosts the parameters of the Genetic Algorithm GaudiMM uses. Unless you know what you are doing, the only values you should modify are population and generations. To see the appropriate values, refer to FAQ & Known issues. For example:

ga:
    population: 200
    generations: 100

The similarity section

This section contains the parameters to the similarity operator, which, given two individuals with the same fitness, whether they can be considered the same solution or not. This section is deliberatedly loose: you define the Python function to call, together with its positional and keyword arguments.

For the time being, the only similarity function we ship is based on the RMSD of the two structures: gaudi.similarity.rmsd(). The arguments are which Molecule genes should be compared and the RMSD threshold to consider whether they are equivalent or not.

similarity:
    module: gaudi.similarity.rmsd
    args: [[Ligand], 1.0]
    kwargs: {}

The genes section

This section describes the components of the exploration stage of the algorithm; ie, the features of each Individual in the population. While the previous sections were dictionaries (this is, a collection key-value pairs), the genes and objectives section is actually a list of dictionaries. As a result, you need to specify them like this:

genes:
    -   name: Protein
        module: gaudi.genes.molecule
        path: /path/to/protein.mol2

    -   name: Torsion
        module: gaudi.genes.torsion
        target: Ligand
        flexibility: 360

Notice the dash - next to name. This, and the extra indentation, define a list. Each element of this list is a new gene. Each gene must include two compulsory values:

  • name. A unique identifier for this gene. If you add two genes with the same name, GaudiMM will complain.
  • module. The Python import path to the module that contains the gene. All GaudiMM builtin genes are located at gaudi.genes.

All other parameters are determined by the chosen gene. Check the corresponding documentation for each one!

Note

How do I know which genes to use?
Unless you code a gene of your own to replace it, you will always need one or more gaudi.genes.molecule.Molecule genes. Then, choose the flexibility models you want to implement on top of such molecule. Several examples are provided in GaudiMM basics (start here!).

The objectives section

Like the genes section, the objectives section is also a list of dictionaries, so they follow the same syntax:

-   name: Clashes
    module: gaudi.objectives.contacts
    which: clashes
    weight: -1.0
    probes: [Ligand]
    radius: 5.0

-   name: LigScore
    module: gaudi.objectives.ligscore
    weight: -1.0
    proteins: [Protein]
    ligands: [Ligand]
    method: pose

In addition to the required name and module parameters, each objective needs a weight parameter. If set to 1.0, the algorithm will maximize the score returned by the objective; if set to -1.0, it will be minimized. Theoretically, any other positive or negative float will work, but stick to the convention of using 1.0 or -1.0.

Any other parameters present in an objective are responsibility of that objective, and are specified in its corresponding documentation.

That’s it! Now save it with a memorable filename and run it!

How to run your input file

Let’s get back to the beginning of the tutorial: all you need to do is typing:

gaudi run input_file.yaml

If everything is fine, you’ll see the following output in the console:

$> gaudi run input_file.yaml

  .g8"""bgd       db   `7MMF'   `7MF'`7MM"""Yb. `7MMF'
.dP'     `M      ;MM:    MM       M    MM    `Yb. MM
dM'       `     ,V^MM.   MM       M    MM     `Mb MM  `7MMpMMMb.pMMMb.  `7MMpMMMb.pMMMb.
MM             ,M  `MM   MM       M    MM      MM MM    MM    MM    MM    MM    MM    MM
MM.    `7MMF'  AbmmmqMA  MM       M    MM     ,MP MM    MM    MM    MM    MM    MM    MM
`Mb.     MM   A'     VML YM.     ,M    MM    ,dP' MM    MM    MM    MM    MM    MM    MM
  `"bmmmdPY .AMA.   .AMMA.`bmmmmd"'  .JMMmmmdP' .JMML..JMML  JMML  JMML..JMML  JMML  JMML.
------------------------------------------------------------------------------------------
GaudiMM: Genetic Algorithms with Unrestricted Descriptors for Intuitive Molecular Modeling
2017, InsiliChem · v0.0.2+251.g122cdf0.dirty

Loaded input input_file.yaml
Launching job with...
  Genes: Protein, Ligand, Rotamers, Torsion, Search
  Objectives: Clashes, Contacts, HBonds, LigScore

After the first iteration is complete, the realtime report data will kick in:

gen progress    nevals  speed       eta     avg                                                 std                                                 min                                     max
0   4.76%       20      1.25 ev/s   0:16:34 [  3.080e+03  -2.690e+02   7.500e-01   1.179e+03]   [  1.027e+03   7.200e+01   8.292e-01   4.964e+02]   [ 584.881 -398.517    0.     144.78 ]   [  4.753e+03  -1.053e+02   3.000e+00   2.066e+03]
1   9.52%       60      1.28 ev/s   0:16:32 [  2.787e+03  -2.659e+02   1.400e+00   9.142e+02]   [ 1198.699    98.675     1.393   484.309]           [ 415.912 -398.517    0.      16.45 ]   [ 4538.766   -71.562     5.     1854.63 ]

The first time you see this it might result too confusing, especially if the terminal wraps long lines. Let’s describe each tab-separated column:

  • gen. The current generation.
  • progress. Percentage of completion of the job. This is estimated with the expected number of operations: \((generations + 1) * lambda\_ * (cxpb + mutpb)\)
  • nevals. Number of evaluations performed in current generation.
  • speed. Estimated number of evaluations per second. This does not take into account the time spent in the variation stage.
  • eta. Estimated time left.
  • avg. Average of all the fitness values reported by each objective in the current generation. They are listed in the order given in the input file, and also reflected above, after the Launching job with… line.
  • std. Same as avg, but for the standard deviation.
  • max. The maximum fitness value reported by each objective in the current generation.
  • min. Same as above, but for the minimum value.

If the setting check_every in the output section is greater than zero, GaudiMM will dump the current population every check_every generations. That way, you can assess the progress visually along the simulation.

Also, if you feel that the algorithm has progressed enough to satisfy your needs, you can cancel it prematurely with Ctrl+C. GaudiMM will detect the interruption and offer to dump the current state of the simulation:

^C[!]

Interruption detected. Write results so far? (y/N):

Answer y and wait a couple of seconds while GaudiMM writes the results. To analyze them, check the following tutorial:

Tutorial: Visualizing your first results

When a GaudiMM job finishes correctly, you will find the results in the directory specified at output.path. There you will find:

  • *.gaudi-input. An (updated) copy of the input file used in this job, for reproducibility purposes.
  • *.gaudi-log. A detailed log of the simulation, for debugging purposes.
  • *.gaudi-output. A YAML file with the scores and paths of the solutions.
  • (Potentially lots of) *.zip. As many as the elite population size. Each zip contains an Individual, which means that you will find molecule files (normally, mol2 files), as well as the allele metadata of the rest of the genes.

To analyze the results, currently the best tool is our own extension for UCSF Chimera, GaudiView. After installing it, you can type the following:

gaudi view *.gaudi-output

, and hopefully it will open Chimera and GaudiView to load the output file. If it doesn’t work (please, if that’s the case, report it in the issues page!), you can always UCSF Chimera, fire up GaudiView from Tools> InsiliChem> GAUDIView and browse the file manually.

Browsing GaudiView output files

One way or another, you will end up seeing the following dialog:

GaudiView results GUI

Actions

The following actions are available:

  • Depiction

    • Click on a row to show that solution as is. Double-click may trigger additional post-processing as detailes in the genes metadata, depending on the chosen genes.
    • Ctrl-click or Shift-click to select more than one solution at the same time.
    • Use up and down arrows to move along the list.
  • Sorting

    • Click on the column headers to sort by that key.
  • Filtering

    • Press +Add to load a new filter. Select the filter key, specify the cutoff and press Filter. Repeat to add more filters with the selected boolean.
    • Remove filters by pressing on -.
    • Clear all the filters with Reset.
  • Commands

    • Every time you change the selected row, the command specified in the command input panel will be run. You can force it by pressing Enter key or click on Run.
    • Combine the command line with the dynamic selection panel to run commands like show sel zr < 5 to depict the surrounding residues of docking poses.
    • Multiple commands can be chained with ;.
  • Clustering

    • First, choose the column to use as energy criteria.
    • Select the RMSD cutoff. If two solutions have RMSD bigger than this value, they won’t be considered similar and thus not clustered together.
    • Press Cluster!. Since this method requires to load all solutions at once, it can take some time if you have a lot of solutions.
    • When it’s done, a new column Cluster will be available at the end of the table. Sort by that column to group them together and Shift-click to depict the full cluster.

When you are done with the analysis, you can exit GaudiView in different ways:

  • Click OK. The depicted solutions at that moment won’t be closed, and thus available for further post-processing in Chimera. Every other solution previously loaded will be removed from the canvas and closed.
  • Click Close or upper-corner X. All solutions, visible or not, will be removed from canvas and closed.

Error

There’s a known bug in GaudiView that may trigger a non-critical exception if you filter the solutions and then try to sort them again without selecting a new row beforehand. You can dismiss the dialog and simply click in any row to avoid this bug. It will be fixed soon, promise!

Developers guide

Introduction to Genetic Algorithms

The GA in GaudiMM stands for Genetic Algorithm, a search heuristic inspired by natural selection that is used for optimization processes.

Genetic Algorithms use a biologicist terminology. Each candidate solution to the problem is considered an individual, which is part of the so-called population (the set of all candidate solutions).

The initial population is generated from scratch, almost always randomly. These individuals also comprise the first generation of the evolution process. As in nature, only the fittest survive. The survival process is simulated with an evaluation function, that tests them against the optimization variable(s). This is called selection.

  • How do we create an individual? With one or more genes, naturally. Genes describe how an individual should look like, of course!
  • How do we evaluate that individual? With one or more objectives.

Also, as in nature, the fittest individuals are allowed to mate (exchange their defining values), and mutate (spontaneously modify their own defining values). This adds some more variability to the process, and that is key to survival.

After a number of generations repeating the process of selection, choosing the fittest over the weakest, we will obtain better and better solutions to our problem. Since it’s all heuristics, it’s up to us to stop at some point. We won’t probably get the solution, but we can live with pretty good ones, right?

The One Max Problem

Ok, that was a lot of biology and we are trying to code, I get it. Let’s explain a classical GA example, the trivial One Max Problem, adapted from the original deap documentation.

In this problem, we have a list of integers that can be either 0 or 1, and we want to obtain a list full of 1s. So, in this example, we have individuals defined by a single gene and evaluated with a single objective.

We build individuals with the gene onemax:

import random.randint
def onemax(size=5):
    return [random.randint(0, 1) for i in range(size)]
adam = onemax(5) # returns [0, 0, 0, 1, 0]
eve = onemax(5)  # returns [0, 1, 1, 0, 0]

The objective is also trivial. We have to maximize the sum of the numbers inside a given individual:

def evaluate(individual):
    return sum(individual)

So… which is one is fittest, adam or eve? Obviously, eve:

evaluate(adam)  # returns 1
evaluate(eve)   # returns 2

Of course, an initial population is usually larger! At least, a hundred individuals. With such a trivial case, given a big enough population, we may obtain the solution in the first generation by pure change. However, we must not rely on the initial population as the only diversity source.

Additional diversity is achieved with the mutation and mating operations, implemented as additional functions:

def mate(a, b):
    """ Let a and b mate, in hope of fitter children """
    i = crossover_point = random.random() * min(len(a), len(b))
    c, d = a[:], b[:]
    c[:i], d[i:] = d[:i], c[i:]
    return c, d

def mutate(individual, probability):
    """ Spontaneous mutation at random places can result in a fitter individual """
    return [random.randint(0, 1) for i in individual if random.random() < probability]

Let’s see how this is useful:

cain, abel = mate(adam, eve)
# cain = [ 0, 1, 1, 1, 0 ]
# abel = [ 0, 0, 0, 0, 0 ]
evaluate(cain) # returns 3
evaluate(abel) # returns 0

See? adam and eve gave birth to cain and abel. cain had luck and inherited the good parts, while abel… Well, he was not that lucky. In the next selection process, cain will be selected over abel, and probably over its own father adam. Now, the population (cain and eve) as a whole is fitter, with an average fitness of 2.5. That’s higher than the average in the previous generation (1.5). Evolution!

Mutation works similarly:

enoch = mutate(cain)
# enoch = [ 1, 1, 1, 1, 0]
seth = mutate(eve)
# seth = [ 0, 0, 1, 0, 0]

Take into account that mutations can be beneficial, like in the case of enoch, but also detrimental, as in the case of seth. Some of them will contribute to evolution, and some of them not. Lucky ones will be selected, the others, discarded.

By the way, deap already defines some mutation and mating operators for you that will work in most cases. So, hopefully, this part will be trivial.

And that’s it! Deap does the rest! So, to sum up, you only need to worry about:

  • How to define your individuals.
  • How to evaluate them.
  • How to implement mutation and mating (normally, with deap built-in operators).

If you want to know more about Deap and Genetic Algorithms, go check their documentation. It’s great!

Our implementation

GaudiMM is built as an extensible and highly modular Python platform. Although the main focus is Chemistry and molecular design, you can use your own genes and objectives. You can think of GaudiMM as a new API for deap that provides an object-oriented interface to easily create new individuals and objectives.

In deap an individual can be any Python object (check their overview and GA examples), which is a very versatile approach, but it tends to be very limited when your individual gets complex. For example, if an individual needs to be defined by several genes with different mutation strategies.

In GaudiMM, each individual is a gaudi.base.Individual, which is a very (bio)fancy name for a list of genes. To create a gene, you just subclass gaudi.genes.GeneProvider and define the needed methods: express, unexpress, mutate, and mate. The gaudi.base.Individual class then provides some wrapper methods that call the respective counterparts in each gene.

To evaluate the fitness of an individual, you must first define the set of evaluation functions. Each function is called objective, and you keep them inside a gaudi.base.Environment.

To create a new objective, you have to subclass gaudi.objectives.ObjectiveProvider, which provides a very simple interface: evaluate. Define your function there, and that’s it!

Todo

  • Tutorial: How to create your own gene
  • Tutorial: How to create your own objective

API documentation

The GaudiMM package is comprised of several core modules that establish the base architecture to build an extensible platform of molecular design.

The main module is gaudi.base, which defines the gaudi.base.Individual, whose instances represent the potential solutions to the proposed problem. Two plugin packages allow easy customization of how individuals are defined (gaudi.genes) and how they are evaluated (gaudi.objectives). Additionally:

gaudi.cli

This package holds all the CLI scripts and wrappers provided by GAUDI.

gaudi.cli.gaudi_cli

gaudi.cli.gaudi_cli is the CLI entry point for all GaudiMM scripts.

Available commands:

  • run
  • view
  • prepare
gaudi.cli.gaudi_cli.echo_banner()
gaudi.cli.gaudi_cli.load_chimera(nogui=True)
gaudi.cli.gaudi_cli.test_import(command, module)
gaudi.cli.gaudi_cli.timeit(func, *args, **kwargs)

gaudi.cli.gaudi_run

gaudi.cli.gaudi_run is the main hub for launching GAUDI jobs.

It sets up the configuration environment needed by DEAP (responsible for the GA) and ties it up to the GAUDI custom classes that shape up the invididuals and objectives. All in a loosely-coupled approach based on Python modules called on-demand.

Usage. Simply, type:

gaudi run /path/to/job.gaudi-input
gaudi.cli.gaudi_run.enable_logging(path=None, name=None, debug=False)

Register loggers and handlers for both stdout and file

gaudi.cli.gaudi_run.launch(cfg)

Runs a GAUDI job

Parameters:cfg (gaudi.parse.Settings) – Parsed YAML dict with attribute-like access
gaudi.cli.gaudi_run.main(cfg, debug=False)

Starts a GAUDI job

Parameters:
  • cfg (str or gaudi.parse.Settings) – Path to YAML input file or an already parsed YAML file via gaudi.parse.Settings class
  • debug (bool, optional, default=False) – Whether to enable verbose logging or not.
gaudi.cli.gaudi_run.unbuffer_stdout()

gaudi.genes

These are the built-in genes in GAUDI. You can also build your own, but these are ready to use.

Molecule gene

This gene implements a wrapper around Chimera.molecule objects to expand its original features, such as appending new molecules.

This allows to build new structures with a couple of building blocks as a starting point, as well as keeping several ligands as different potential solutions to the essay (think about multi-molecule alternative docking). The user can also request more gaudi.genes.molecule instances for the genome of the individual, resulting in a competitive multi-docking essay.

To handle all this diversity, each construction is cached the first time is built.

This class is a dependency of most of the other genes (and even objectives), so it will be requested almost always.

class gaudi.genes.molecule.Compound(molecule=None, origin=None, seed=0.0, **kwargs)

Bases: object

Wraps chimera.Molecule instances and allows to perform copies, appending new fragments, free placement and extended attributes.

Parameters:
  • molecule (chimera.Molecule or str, optional) –

    The chimera.Molecule object to wrap, or

    ’dummy’ (for a empty molecule), or

    The path to a mol2 file that will be parsed by Chimera to return a valid chimera.Molecule object.

  • origin (3-tuple of float, optional) – Coordinates to the place where the molecule should be placed
  • seed (float, optional) – Random seed, used by the vertex chooser on appending.
  • optional (kwargs,) – Additional parameters that should be injected as attributes
mol
Type:chimera.Molecule
donor

If self were to be bonded to another Compound, this atom would be the one involved in the bond. By default, first element in chimera.Molecule.atoms

Type:chimera.Atom
acceptor

If another Compound wanted to be bonded to self, this atom would be the one involved in the bond. By default, last element in chimera.Molecule.atoms

Type:chimera.Atom
origin

Default location of mol in the 3D canvas.

Type:3-tuple of float
seed

Randomness seed for vertex computation.

Type:float
rotatable_bonds

Bonds that can be torsioned

Type:list of chimera.Bond
nonrotatable

Bonds that, although possible, should not be torsiond. Ie, fixed bonds.

Type:list of chimera.Bond
built_atoms

Memo of already built_atoms

Type:list of chimera.Atom

Notes

Todo

Instead of a molecule parameter overload, think of using equivalent classmethods.

add_dummy_atom(where, name='dum', element=None, residue=None, bonded_to=None, serial=None)

Adds a placeholder atom at the coordinates specified by where

Parameters:
  • where (chimera.Atom or 3-tuple of float) – Coordinates of target location. A chimera.Atom can be supplied, in which case its coordinates will be used (via .coord())
  • name (str, optional) – Name for the new atom
  • element (chimera.Element, optional) – Element of the new atom
  • residue (chimera.Residue, optional) – Residue that will incorporate the new atom
  • bonded_to (chimera.Atom, optional) – Atom that will form a bond with new atom
  • serial (int) – Serial number that will be assigned to atom
add_hydrogens()

Add missing hydrogens to current molecule

append(molecule)

Wrapper around attach to add a new molecule to self, using self.acceptor as bonding atom.

Parameters:molecule (Compound) –
apply_pdbfix(pH=7.0)

Run PDBFixer and replace original molecule with new one

attach(molecule, acceptor, donor)

Call join to bond molecule to self.mol and updates attributes.

Parameters:
  • molecule (Compound) – The molecule that will be attached to self
  • acceptor (chimera.Atom) – Atom of self that will participate in the new bond
  • donor (chimera.Atom) – Atom of molecule that will participate in the new bond

Notes

Note

After joining the two molecules together, we have to update the attributes of self to match the new molecular reality. For example, the atom participating in the bond will not be available for new bonds (we are forcing linear joining for now), so the new donor will be inherited from molecule, before deleting the object.

destroy()

Removes mol from canvas, deletes it and then, deletes itself

join(molecule, acceptor, donor, newres=False)

Take a molecule and bond it to self.mol, copying the atoms in the process so they’re contained in the same chimera.Molecule object.

Parameters:
  • molecule (Compound) – The molecule that will be attached to self
  • acceptor (chimera.Atom) – Atom of self that will participate in the new bond
  • donor (chimera.Atom) – Atom of molecule that will participate in the new bond
  • newres (bool) – If True, don’t reuse acceptor.residue for the new molecule, and create a new blank one instead.
Returns:

built_atoms – Maps original atoms in molecules to their new counterparts in self.mol.

Return type:

dict

Notes

Note

Chimera does not allow bonds between different chimera.Molecule objects, so firstly, we have to copy the atoms of molecule to self.mol and, only then, make the joining bond.

It traverses the atoms of molecule and adds a copy of each of them to self.mol using chimera.molEdit.addAtom in the same spot of space. All the bonds are preserved and, finally, bond the two molecules.

The algorithm starts by adding the bonding atom of molecule (donor), to the sprouts list. Then, the loop starts:

while sprouts contains atoms:
    sprout = sprouts.pop(0)
    copy sprout to self.mol
    for each neighbor of sprout
        copy neighbor to self.mol
        if neighbor itself has more than one neighbor (ie, sprout)
            add neighbor to sprouts

Along the way, we have to take care of already processed sprouts, so we don’t repeat ourselves. That’s what the built_atoms dict is for.

Also, instead of letting addAtom guess new serial numbers, we calculate them beforehand by computing the highest serial number in self.mol prior to the additions and then incremeting one by one on a per-element basis.

Todo

This code is UGLY. Find a better way!

parse_attr()
place(where, anchor=None)

Convenience wrapper around translate, supplying default

Parameters:
  • where (3-tuple of float, or chimera.Atom) – Coordinates of destination. If chimera.Atom, use supplied coord()
  • anchor (chimera.Atom) – The atom that will guide the translation.

Notes

Note

Anchor atoms are called that way in the API because I picture them as the one we pick with our hands to drag the molecule to the desired place.

place_for_bonding(target, anchor=None)

Translate self.mol to a covalent distance of target atom, with an adequate orientation.

Parameters:
  • target (chimera.Atom) – The atom we are willing to bond later on.
  • anchor (chimera.Atom, optional) –
prepend(molecule)

Wrapper around attach to add a new molecule to self, using self.donor as bonding atom.

Parameters:molecule (Compound) –
set_vdw_radii(vdw_radii)
update_attr(d)
class gaudi.genes.molecule.Molecule(path=None, symmetry=None, hydrogens=False, pdbfix=False, vdw_radii=None, **kwargs)

Bases: gaudi.genes.GeneProvider

Interface around the gaudi.genes.molecule.Compound to handle the GAUDI protocol and caching features.

Parameters:
  • path (str, optional) – Path to a molecule file or a directory containing dirs of molecule files.
  • symmetry (str, optional) – If path is a directory, list of pairs of directories whose chosen mocule must be the same, thus enabling symmetry.
  • hydrogens (bool, optional) – Add hydrogens to Molecule (True) or not (False).
  • pdbfix (bool, optional) – Only for testing and debugging. Better run pdbfixer prior to GAUDI. Fix potential issues that may cause troubles with OpenMM forcefields.
  • vdw_radii (dict {str: float} , optional) –

    Set a specific vdw_radius for a particular element (instead of standard Chimera VdW table). It can be useful in particular cases together with a contacts objective. Example of use in the .yaml file: vdw_radii: {

    Fe: 2.00, Cu: 2.16}

    Defaults to None

allele

Paths to every fragment that composes a given Compound. It will consist of a single value tuple if there’s no dynamic building involved; ie, a mol2 or pdb file as is.

Type:tuple of str
_CATALOG

Class attribute (shared among all Molecule instances) that holds all the possible molecules GAUDI can build given current path.

If path is a single molecule file, that’s the only possibility, but if it’s set to a directory, the engine can potentially build all the combinations of molecule blocks found in those subdirectories.

Normally, it is accessed via the catalog property.

Type:dict

Notes

Use of `_cache`

Molecule class uses _cache to store already built molecules. Its entry in the _cache directory is a boltons.cacheutils.LRU cache (least recently used) set to a maximum size of 300 entries.

Todo

The LRU cache size should be proportional to job size, depending on the population size and number of generations, but also taking available memory into account (?).

SUPPORTED_FILETYPES = ('mol2', 'pdb')
build(key, where=None)

Builds a Compound following the recipe contained in key through Compound.append methods.

Parameters:key (tuple of str) – Paths to the molecule blocks that comprise the final molecule. A single molecule is just a one-block recipe.
Returns:The final molecule result of the sequential appending.
Return type:Compound
classmethod clear_cache()
compound

Get expressed allele on-demand (read-only attribute)

express()

Adds Chimera molecule object to the viewer canvas.

It also converts pseudobonds (used by Chimera to depict coordinated ligands to metals) to regular bonds.

find_atom(serial)
find_atoms(serial, only_one=False)
find_residue(position)
find_residues(position, only_one=False)
get(key)

Looks for the compound corresponding to key in _cache. If found, return it. Else, build it on demand, store it on cache and return it.

Parameters:key (str) – Path (or combination of) to the requested molecule. It should be extracted from catalog.
Returns:The result of building the requested molecule.
Return type:gaudi.genes.molecule.Compound
mate(mate)

Todo

Allow mating while preserving symmetry

mutate(indpb)

VERY primitive. It only gets another compound.

unexpress()

Removes the Chimera molecule from the viewer canvas (without deleting the object).

write(path=None, name=None, absolute=None, combined_with=None, filetype='mol2')

Writes full mol2 to disk.

Todo

It’d be preferable to get a string instead of a file

xyz(transformed=True)
gaudi.genes.molecule.enable(**kwargs)

Mutamers gene

This modules allows to explore side chains flexibility in proteins, as well as mutation.

It needs that at least a gaudi.genes.rotamers.molecule.Molecule has been requested in the input file. Residues of those are referenced in the residues argument.

It also allows mutations in the selected residues. However, the resulting structure keeps the same backbone, which may not be representative of the in-vivo behaviour. Use with caution.

class gaudi.genes.mutamers.Mutamers(residues=None, library='Dunbrack', avoid_replacement=False, mutations=[], ligation=False, hydrogens=False, **kwargs)

Bases: gaudi.genes.GeneProvider

Mutamers class

Parameters:
  • residues (list of str) –

    Residues that can mutate. This has to be in the form:

    [ Protein/233, Protein/109 ]

    where the first element (before slash) is the gaudi.genes.molecule name and the second element (after slash) is the residue position number in that molecule.

    This list of str is later parsed to the proper chimera.Residue objects

  • library ({'Dunbrack', 'Dynameomics'}) – The rotamer library to use.
  • mutations (list of str, required) – Aminoacids (in 3-letter codes) residues can mutate to.
  • ligation (bool, optional) – If True, all residues will mutate to the same type of aminoacid.
  • hydrogens (bool, optional) – If True, add hydrogens to replacing residues (buggy)
allele

For i residues, it contains i tuples with two values each: residue type and a float within [0, 1), which will be used to pick one of the rotamers for that residue type.

Type:list of 2-tuple (str, float)
static add_hydrogens_to_isolated_rotamer(rotamers)
choice(l)

Overrides random.choice with custom one so we can reuse a previously obtained random number. This helps dealing with the ligation parameter, which forces all the requested residues to mutate to the same type

express()

Compile the gene to an evaluable object.

get_rotamers(mol, pos, restype)

Gets the requested rotamers out of cache and if not found, creates the library and stores it in the cache.

Parameters:
  • mol (str) – gaudi.genes.molecule name that contains the residue
  • pos – Residue position in mol
  • restype – Get rotamers of selected position with this type of residue. It does not need to be the original type, so this allows mutations
Returns:

Return type:

List of rotamers returned by Rotamers.getRotamers.

mate(mate)

Perform a crossover with another gene of the same kind.

mutate(indpb)

Perform a mutation on the gene.

unexpress()

Revert expression.

static update_rotamer_coords(residue, rotamer)
gaudi.genes.mutamers.enable(**kwargs)

NormalModes gene

This module allows to explore molecular folding through normal modes analysis.

It works by calculating normal modes for the input molecule and moving along a combination of normal modes.

It needs at least a gaudi.genes.rotamers.molecule.Molecule.

class gaudi.genes.normalmodes.NormalModes(method='prody', target=None, modes=None, n_samples=10000, rmsd=1.0, group_by=None, group_lambda=None, path=None, write_modes=False, **kwargs)

Bases: gaudi.genes.GeneProvider

NormalModes class

Parameters:
  • method (str, optional, default=prody) – Either: - prody : calculate normal modes using prody algorithms - gaussian : read normal modes from a gaussian output file
  • target (str) – Name of the Gene containing the actual molecule
  • modes (list, optional, default=range(12)) – Modes to be used to move the molecule
  • group_by (str or callable, optional, default=None) – Available str names: residues, mass. This is, consider only pseudoatoms that group a whole residue, groups of contiguous atoms that amount for certain mass, or only alpha carbons. If residues or mass, the group_lambda parameter can customize the behaviour.
  • group_lambda (int, optional) – Either: number of residues per group (default=7), or number of groups with same mass (default=100).
  • path (str) – Gaussian or prody modes output path. Required if method is gaussian.
  • write_modes (bool, optional) – write a molecule_modes.nmd file with the ProDy modes
  • n_samples (int, optional, default=10000) – number of conformations to generate
  • rmsd (float, optional, default=1.0) – RMSD (in angstrom) that the conformations will have with respect to the initial conformation. In some cases, it can be slightly higher than the requested threshold (e.g. 1.07A instead of 1A)
allele

Randomly picked coordinates from NORMAL_MODE_SAMPLES

Type:slice of prody.ensemble
NORMAL_MODES

normal modes calculated for the molecule or readed from the gaussian frequencies output file stored in a prody modes class (ANM or RTB)

Type:prody.modes
NORMAL_MODE_SAMPLES

configurations applying modes to molecule

Type:prody.ensemble
_original_coords

Parent coordinates

Type:numpy.array
_chimera2prody

_chimera2prody[chimera_index] = prody_index

Type:dict
NORMAL_MODES
NORMAL_MODES_SAMPLES
calculate_prody_normal_modes()

calculate normal modes, creates a diccionary between chimera and prody indices and calculate n_confs number of configurations using this modes

express()

Apply new coords as provided by current normal mode

mate(mate)

Todo

Combine coords between two samples in NORMAL_MODES_SAMPLES? Or two samples between diferent NORMAL_MODES_SAMPLES? Or combine samples between two NORMAL_MODES_SAMPLES?

For now : pass

molecule
mutate(indpb)

(mutate to/get) another SAMPLE with probability = indpb

read_gaussian_normal_modes()

read normal modes, creates a diccionary between chimera and prody indices and calculate n_confs number of configurations using this modes

read_prody_normal_modes()
unexpress()

Undo coordinates change

gaudi.genes.normalmodes.alg3(moldy, max_bonds=3, **kwargs)

TESTS PENDING!

Coarse Grain Algorithm 3: Graph algorithm.
New group when a vertice: have more than n,
have 0 edges new chain
Parameters:
  • moldy (prody.AtomGroup) –
  • n (int, optional, default=2) – maximum bonds number
Returns:

moldy – New Betas added

Return type:

prody.AtomGroup

gaudi.genes.normalmodes.chimeracoords2numpy(molecule)
Parameters:molecule (chimera.molecule) –
Returns:
Return type:numpy.array with molecule.atoms coordinates
gaudi.genes.normalmodes.chunker(end, n)

divide end integers in closed groups of n

gaudi.genes.normalmodes.convert_chimera_molecule_to_prody(molecule)

Function that transforms a chimera molecule into a prody atom group

Parameters:molecule (chimera.Molecule) –
Returns:
  • prody_molecule (prody.AtomGroup())
  • chimera2prody (dict) – dictionary: chimera2prody[chimera_atom.coordIndex] = i-thm element prody getCoords() array
gaudi.genes.normalmodes.enable(**kwargs)
gaudi.genes.normalmodes.gaussian_modes(path)

Read the modes Create a prody.modes instance

Parameters:path (str) – gaussian frequencies output path
Returns:modes
Return type:ProDy modes ANM or RTB
gaudi.genes.normalmodes.group_by_mass(molecule, n=100)

Coarse Grain Algorithm 2: groups per mass percentage

Parameters:
  • molecule (prody.AtomGroup) –
  • n (int, optional, default=100) – Intended number of groups. The mass of the system will be divided by this number, and each group will have the corresponding proportional mass. However, the final number of groups can be slightly different.
Returns:

molecule – New Betas added

Return type:

prody.AtomGroup

gaudi.genes.normalmodes.group_by_residues(molecule, n=7)

Coarse Grain Algorithm 1: groups per residues

Parameters:
  • molecule (prody.AtomGroup) –
  • n (int, optional, default=7) – number of residues per group
Returns:

molecule – New betas added

Return type:

prody.AtomGroup

gaudi.genes.normalmodes.prody_modes(molecule, max_modes, algorithm=None, **options)
Parameters:
  • molecule (prody.AtomGroup) –
  • nax_modes (int) – number of modes to calculate
  • algorithm (callable, optional, default=None) – coarseGrain(prm) wich make molecule.select().setBetas(i) where i is the index Coarse Grain group Where prm is prody AtomGroup
  • options (dict, optional) – Parameters for algorithm callable
Returns:

modes

Return type:

ProDy modes ANM or RTB

Rotamers gene

This modules allows to explore side chains flexibility in proteins, as well as mutation.

It needs that at least a gaudi.genes.rotamers.molecule.Molecule has been requested in the input file. Residues of those are referenced in the residues argument.

It also allows mutations in the selected residues. However, the resulting structure keeps the same backbone, which may not be representative of the in-vivo behaviour. Use with caution.

class gaudi.genes.rotamers.Rotamers(residues=None, library='Dunbrack', with_original=True, **kwargs)

Bases: gaudi.genes.GeneProvider

Rotamers class

Parameters:
  • residues (list of str) –

    Residues that should be analyzed. This has to be in the form:

    [ Protein/233, Protein/109 ]

    where the first element (before slash) is the gaudi.genes.molecule name and the second element (after slash) is the residue position number in that molecule.

    This list of str is later parsed to the proper chimera.Residue objects

  • library ({'Dunbrack', 'Dynameomics'}) – The rotamer library to use.
  • with_original (bool, defaults to True) – Whether to include the original set of chi angles as part of the rotamer library.
allele

For i residues, it contains i floats within [0, 1), that will point to the selected rotamer torsions for each residue.

Type:list of float
static all_chis(residue)
express()

Compile the gene to an evaluable object.

mate(mate)

Perform a crossover with another gene of the same kind.

mutate(indpb)

Perform a mutation on the gene.

static patch_residue(residue)
retrieve_rotamers(molecule, position, residue, library='Dunbrack')
unexpress()

Revert expression.

static update_rotamer(residue, chis)
gaudi.genes.rotamers.enable(**kwargs)

Search gene

This module provides spatial exploration of the environment.

It works by creating a sphere with radius self.radius and origin at self.origin. The movement is achieved with three matrices that contain a translation, a rotation, and a reference position.

It depends on gaudi.genes.molecule.Molecule, since these are the ones that will be moved around. Combined with the adequate objectives, this module can be used to implement docking experiments.

class gaudi.genes.search.Search(target=None, center=None, radius=None, rotate=True, precision=0, interpolation=0.5, **kwargs)

Bases: gaudi.genes.GeneProvider

Parameters:
  • target (namedtuple or Name of gaudi.genes.molecule) – Can be either: - The anchor atom of the molecule we want to move, with syntax <molecule_name>/<index>. For example, if we want to move Ligand using atom with serial number = 1 as pivot, we would specify Ligand/1. It’s parsed to the actual chimera.Atom later on. - A name of gaudi.genes.molecule instance. In this case, the anchor atom for the movement of the molecule will be set to its nearest atom to the geometric center of the molecule.
  • center (3-item list or tuple of float, optional) – Coordinates to the center of the desired search sphere
  • radius (float) – Maximum distance from center that the molecule can move
  • rotate (bool, bool) – If False, don’t rotatate the molecule - only translation
  • precision (int, bool) – Rounds the decimal part of the 3D search matrix to get a coarser model of space. Ie, less points can be accessed, the search is less exhaustive, more variability in less runs.
allele

A 4x3 matrix of float, as explained in Notes.

Type:3-tuple of 4-tuple of floats
origin

The initial position of the requested target molecule. If we don’t take this into account, we can’t move the molecule around was not originally in the center of the sphere.

Type:3-tuple of float

Notes

How matricial translation and rotation takes place

A single movement is summed up in a 4x3 matrix:

( (R1, R2, R3, T1), (R4, R5, R6, T2), (R7, R8, R9, T3) )

R-elements contain the rotation information, while T elements account for the translation movement.

That matrix can be obtained from multipying three different matrices with this expression:

multiply_matrices(translation, rotation, to_zero)

To understand the operation, it must be read from the right:

  1. First, translate the molecule the origin of coordinates 0,0,0
  2. In that position, the rotation can take place.
  3. Then, translate to the final coordinates from zero. There’s no need to get back to the original position.

How do we get the needed matrices?

  • to_zero. Record the original position (origin) of the molecule and multiply it by -1. Done with method to_zero().
  • rotation. Obtained directly from FitMap.search.random_rotation
  • translation. Check docstring of random_translation() in this module.
center
express()

Multiply all the matrices, convert the result to a chimera.CoordFrame and set that as the xform for the target molecule. If precision is set, round them.

mate(mate)

Interpolate the matrices and assign them to each individual. Ind1 gets the rotated interpolation, while Ind2 gets the translation.

molecule
mutate(indpb)

Perform a mutation on the gene.

origin
random_transform()

Wrapper function to provide translation and rotation in a single call

to_zero

Return a translation matrix that takes the molecule from its original position to the origin of coordinates (0,0,0).

Needed for rotations.

unexpress()

Reset xform to unity matrix.

gaudi.genes.search.center(mol)
gaudi.genes.search.distance(a, b)
gaudi.genes.search.enable(**kwargs)
gaudi.genes.search.nearest_atom(mol, position)
gaudi.genes.search.parse_origin(origin, individual=None)

The center of the sphere can be given as an Atom, or directly as a list of three floats (x,y,z). If it’s an Atom, find it and return the xyz coords. If not, just turn the list into a tuple.

Parameters:
  • origin (3-item list of coordinates, or chimera.Atom) –
  • genes (gaudi.parse.Settings.genes) – List of gene-dicts to look for molecules that may contain the referred atom in origin
Returns:

The x,y,z coordinates

Return type:

Tuple of float

gaudi.genes.search.rand_xform(origin, destination, r, rotate=True)
gaudi.genes.search.random_translation(center, r)

Get a random point from the cube built with l=r and test if it’s within the sphere. Most of the points will be, but not all of them, so get another one until that criteria is met.

Parameters:
  • center (3-tuple of float) – Coordinates of the center of the search sphere
  • r (float) – Radius of the search sphere
Returns:

Return type:

A translation matrix to a random point in the search sphere, with no rotation.

gaudi.genes.search.rotate(molecule, at, alpha)
gaudi.genes.search.translate(molecule, anchor, target)

Torsions gene

This module helps explore small molecules flexibility.

It does so by performing bond rotations in the selected gaudi.genes.molecule.Molecule objects, if they exhibit free bond rotations.

class gaudi.genes.torsion.Torsion(target=None, flexibility=360.0, max_bonds=None, anchor=None, rotatable_atom_types=('C3', 'N3', 'C2', 'N2', 'P'), rotatable_atom_names=(), rotatable_elements=(), non_rotatable_bonds=(), rotatable_bonds=(), precision=1, **kwargs)

Bases: gaudi.genes.GeneProvider

Parameters:
  • target (str) – Name of gaudi.genes.molecule instance to perform rotation on
  • flexibility (int or float) – Maximum number of degrees a bond can rotate
  • max_bonds – Expected number of free rotations in molecule. Needed to store arbitrary rotations.
  • anchor (str) – Molecule/atom_serial_number of reference atom for torsions
  • rotatable_atom_types (list of str) – Which type of atom types (as in chimera.Atom.idatmType) should rotate. Defaults to (‘C3’, ‘N3’, ‘C2’, ‘N2’, ‘P’).
  • rotatable_atom_names (list of str) – Which type of atom names (as in chimera.Atom.name) should rotate. Defaults to ().
  • rotatable_bonds (list of [SerialNumberAtom1, SerialNumberAtom2, SerialNumberAnchor]) – Concrete bonds that are allowed to rotate. Atoms have to be designated using their chimera serial number. IMPORTANT: if set, these will be the ONLY bonds allowed to rotate, ignoring other possible conditions (e.g. rotatable_atom_types, rotatable_atom_names…).
allele

For i rotatable bonds in molecule, it contains i floats which correspond to each torsion angle. As such, each falls within [-180.0, 180.0).

Type:tuple of float

Notes

Todo

max_bonds is now automatically computed, but probably won’t deal nicely with block-built ligands.

BONDS_ROTS = {}
anchor

Get the molecular anchor. Ie, the root of the rotations, the fixed atom of the molecule.

Usually, this is the target atom in the Search gene, but if we can’t find it, get the nearest atom to the geometric center of the molecule, and if it’s not possible, the donor atom of the molecule.

clear_cache()
express()

Apply rotations to rotatable bonds

mate(mate)

Perform a crossover with another gene of the same kind.

molecule
mutate(indpb)

Perform a mutation on the gene.

random_angle()

Returns a random angle within flexibility limits

rotatable_bonds

Gets potentially rotatable bonds in molecule

First, it retrieves all the atoms. Then, the bonds are filtered, discarding coordination (pseudo)bonds and sort them by atom serial.

For each bond, try to retrieve it from the cache. If unavailable, request a bond rotation object to chimera.BondRot.

In this step, we have to discard non rotatable atoms (as requested by the user), and make sure the involved atoms are of compatible type. Namely, one of them must be either C3, N3, C2 or N2, and both of them, non-terminal (more than one neighbor).

If the bond is valid, get the BondRot object. Chimera will complain if we already have requested that bond previously, or if the bond is in a cycle. Handle those exceptions silently, and get the next bond in that case.

If no exceptions are raised, then store the rotation anchor in the BondRot object (that’s the nearest atom in the bond to the molecular anchor), and store the BondRot object in the rotations cache.

unexpress()

Undo the rotations

gaudi.genes.torsion.center(mol)
gaudi.genes.torsion.distance(a, b)
gaudi.genes.torsion.enable(**kwargs)
gaudi.genes.torsion.nearest_atom(mol, position)

Trajectory gene

This module provides spatial exploration of the environment through a MD trajectory file.

class gaudi.genes.trajectory.Trajectory(target=None, path=None, max_frame=None, stride=1, preload=False, **kwargs)

Bases: gaudi.genes.GeneProvider

Parameters:
  • target (str) – The Molecule that contains the topology of the trajectory.
  • path (str) – Path to a MD trajectory file, as supported by mdtraj.
  • max_frame (int) – Last frame of the trajectory that can be loaded.
  • stride (int, optional) – Only load one in every stride frames
  • preload (bool, optional) – Load the full trajectory in memory to accelerate expression. Not recommended for large files!
allele

The index of a frame in the MD trajectory.

Type:int
_traj

Alias to the frames cache

Type:dict
express()

Load the frame requested by the current allele into a new CoordSet object (always at index 1) and set that as the active one.

load_frame(n)
mate(mate)

Simply exchange alleles. Can’t try to interpolate an intermediate structure because the result wouldn’t probably belong to the original trajectory!

molecule

The target Molecule gene

mutate(indpb)

Perform a mutation on the gene.

random_frame_number()
topology

Returns the equivalent mdtraj Topology object of the currently expressed Chimera molecule

unexpress()

Set the original coordinates (stored at mol.coordSets[0]) as the active ones.

gaudi.genes.trajectory.enable(**kwargs)

Base class for all genes

Genes are modules that reside in the gaudi.genes package, and have a certain class structure.

class gaudi.genes.GeneProvider(parent=None, name=None, cx_eta=5.0, mut_eta=5.0, mut_indpb=0.75, **kwargs)

Bases: object

Base class that every genes plugin MUST inherit.

The methods listed here are compulsory for all subclasses, since the individual will be using them anyway. If it’s not relevant in your plugin, just define them with a single pass statement. Also, don’t forget to call GeneProvider.__init__ in your overriden __init__ function; it registers compulsory

— From (M.A. itself)[http://martyalchin.com/2008/jan/10/simple-plugin-framework/]: Now that we have a mount point, we can start stacking plugins onto it. As mentioned above, individual plugins will subclass the mount point. Because that also means inheriting the metaclass, the act of subclassing alone will suffice as plugin registration. Of course, the goal is to have plugins actually do something, so there would be more to it than just defining a base class, but the point is that the entire contents of the class declaration can be specific to the plugin being written. The plugin framework itself has absolutely no expectation for how you build the class, allowing maximum flexibility. Duck typing at its finest.

classmethod clear_cache()
express()

Compile the gene to an evaluable object.

mate(gene)

Perform a crossover with another gene of the same kind.

mutate()

Perform a mutation on the gene.

plugins = [<class 'gaudi.genes.search.Search'>, <class 'gaudi.genes.molecule.Molecule'>, <class 'gaudi.genes.mutamers.Mutamers'>, <class 'gaudi.genes.normalmodes.NormalModes'>, <class 'gaudi.genes.rotamers.Rotamers'>, <class 'gaudi.genes.torsion.Torsion'>, <class 'gaudi.genes.trajectory.Trajectory'>]
unexpress()

Revert expression.

classmethod validate(data, schema=None)
classmethod with_validation(**kwargs)
write(path, name, *args, **kwargs)

Write results of expression to a file representation.

gaudi.objectives

These are the built-in objectives in GaudiMM. You can also build your own, but these are ready to use.

Angle objective

This objective calculates the angle formed by three given atoms (or the dihedral, if four atoms are given) and returns the absolute difference of that angle and the target value.

class gaudi.objectives.angle.Angle(threshold=None, probes=None, *args, **kwargs)

Bases: gaudi.objectives.ObjectiveProvider

Angle class

Parameters:
  • threshold (float) – Optimum angle
  • probes (list of str) – Atoms that make the angle, expressed as a series of <molecule_name>/<serial_number> strings
Returns:

Deviation from threshold angle, in degrees

Return type:

float

evaluate(ind)

Return the score of the individual under the current conditions.

probes(ind)
gaudi.objectives.angle.enable(**kwargs)

Contacts objective

This objective provides a wrapper around Chimera’s DetectClash that detects clashes and contacts. Clashes are understood as steric conflicts that increases the energy of the system. They are evaluated as the sum of volumetric overlapping of the Van der Waals’ spheres of the implied atoms. Contacts are considered as stabilizing, and they are evaluated with a Lennard-Jones 12-6 like function.

class gaudi.objectives.contacts.Contacts(probes=None, radius=5.0, which='hydrophobic', clash_threshold=0.6, hydrophobic_threshold=-0.4, cutoff=0.0, hydrophobic_elements=('C', 'S'), bond_separation=4, same_residue=True, only_internal=False, *args, **kwargs)

Bases: gaudi.objectives.ObjectiveProvider

Contacts class :param probes: Name of molecule gene that is object of contacts analysis :type probes: str :param radius: Maximum distance from any point of probes that is searched

for possible interactions
Parameters:
  • which ({'hydrophobic', 'clashes'}) – Type of interactions to measure
  • clash_threshold (float, optional) – Maximum overlap of van-der-Waals spheres that is considered as a contact (attractive). If the overlap is greater, it’s considered a clash (repulsive)
  • hydrophobic_threshold (float, optional) – Maximum overlap for hydrophobic patches.
  • hydrophobic_elements (list of str, optional, defaults to [C, S]) – Which elements are allowed to interact in hydrophobic patches
  • cutoff (float, optional) – If the overlap volume is greater than this, a penalty is applied. Useful to filter bad solutions.
  • bond_separation (int, optional) – Ignore clashes or contacts between atoms within n bonds.
  • only_internal (bool, optional) – If set to True, take into account only intramolecular interactions, defaults to False
Returns:

Lennard-Jones-like energy when which`=`hydrophobic, and volumetric overlap of VdW spheres in A³ if which`=`clashes.

Return type:

float

evaluate_clashes(ind)
evaluate_hydrophobic(ind)
find_interactions(ind)
molecules(ind)
probes(ind)
gaudi.objectives.contacts.enable(**kwargs)

Coordination objective

This objective performs rough estimations of good orientations of ligating residues in a protein to coordinate a given metal or small molecule. The geometry is approximated by computing average distances from ligating atoms the metal centre (self.probe) as well as the angles formed by the probe, the ligating atom and its immediate neighbor. Good planarity is assured by a dihedral check.

class gaudi.objectives.coordination.Coordination(probe=None, radius=3.0, atom_types=(), atom_elements=(), atom_names=(), residues=(), geometry='tetrahedral', distance=0, min_atoms=1, prevent_intruders=True, enforce_all_residues=False, only_one_ligand_per_residue=False, center_of_mass_correction=False, distance_correction=False, *args, **kwargs)

Bases: gaudi.objectives.ObjectiveProvider

Coordination class

Parameters:
  • probe (tuple) – The atom that acts as the metal center, expressed as <molecule_name>/<atom serial>. This will be parsed later on.
  • residues (list of str) – Residues that must coordinate to probe, expressed as <molecule_name>/<residue position>. Position can be *.
  • radius (float, optional, default=3.0) – Distance from probe where ligating atoms must be found
  • atom_types (list of str, optional) – Types of atoms that are considered ligands to probe
  • atom_names (list of str, optional) – Names of atoms that are considered ligands to probe
  • atom_elements (list of str, optional) – Elements of atoms that are considered ligands to probe
  • distance (float, optional) – Perfect distance a ligand atom should be from target.
  • geometry (str or list of 3-tuple floats, optional) – Which geometry should be fitted. Choose from GEOMETRIES dict or specify a set of vectors.
  • enforce_all_residues (bool, optional) – Whether to force or not if all specified residues should coordinate.
  • only_one_ligand_per_residue (bool, optional) – Enforce that only one ligand for each residue should coordinate.
  • prevent_intruders (bool, optional) – Don’t let non-ligand atoms to be closer to the target than the selected ligand atoms.
  • center_of_mass_correction (bool, optional) – If True, calculate the distance between the metal center and the center of mass of the ligand atoms, and sum that to the final score.
  • distance_correction (bool, optional) – If True, report the deviation of the experimental coordination bond length and the ideal one, as tabulated by chimera.Element, and sum that to the final score.
Returns:

Sum of RMSD of vertices from ideal RMSD and average cosine of angle deviation from ideal orientation of ligand neighbors. A perfect match should report 0.0.

Return type:

float

coordination_sphere(ind)

1. Get atoms and residues found within self.radius angstroms from self.probe. Found residues MUST include self.residues. Otherwise, apply penalty.

2. Sort atoms by absolute difference of self.distance and distance to self.probe. That way, nearest atoms are computed first. If found atoms do not include some of the requested types, apply penalty.

evaluate(ind)
  1. Get requested atoms sorted by distance
  2. If they meet the minimum quantity, return the rmsd for that geometry
  3. If that’s not possible of they are not enough, return penalty
molecules(ind)
probe(ind)
residues(ind)
gaudi.objectives.coordination.enable(**kwargs)
gaudi.objectives.coordination.ideal_bond_deviation(metal, ligand, other_ligands=())

Assess if the current bond vector is well oriented with respect to the ideal bond vector.

Parameters:
  • metal (chimera.Atom) – The ion ligands are coordinating to
  • ligand (chimera.Atom) – Potential ligand atoms to metal
Returns:

Absolute sine of the angle between the ideal vector and the ligand-metal one.

Return type:

float

gaudi.objectives.coordination.ideal_bonded_positions(atom, element, geometry=None)

Distance objective

This objective calculates the distance between two given atoms. It returns the absolute difference between the calculated distance and the target value.

class gaudi.objectives.distance.Distance(threshold=None, tolerance=None, target=None, probes=None, center_of_mass=False, *args, **kwargs)

Bases: gaudi.objectives.ObjectiveProvider

Distance class

Parameters:
  • threshold (float) – Optimum distance to meet
  • tolerance (float) – Maximum deviation from threshold that is not penalized
  • target (str) – The atom to measure the distance to, expressed as <molecule name>/<atom serial>
  • probes (list of str) – The atoms whose distance to target is being measured, expressed as <molecule name>/<atom serial>. If more than one is provided, the average of all of them is returned
  • center_of_mass (bool) –
Returns:

(Mean of) absolute deviation from threshold distance, in A.

Return type:

float

atoms(ind, *targets)
evaluate_center_of_mass(ind)
evaluate_distances(ind)

Measure the distance

gaudi.objectives.distance.enable(**kwargs)

DrugScoreX objective

This objective is a wrapper around the binaries provided by Neudert and Klebe and calculates the score of the current pose.

The lower, the better, so usually you will use a -1.0 weight.

class gaudi.objectives.dsx.DSX(binary=None, potentials=None, proteins=('Protein', ), ligands=('Ligand', ), terms=None, sorting=1, cofactor_mode=0, with_covalent=False, with_metals=True, *args, **kwargs)

Bases: gaudi.objectives.ObjectiveProvider

DSX class

Parameters:
  • protein (str) – The molecule name that is acting as a protein
  • ligand (str) – The molecule name that is acting as a ligand
  • binary (str, optional) – Path to the DSX binary. Only needed if drugscorex is not in PATH.
  • potentials (str, optional) – Path to DSX potentials. Only needed if DSX_POTENTIALS env var has not been set by the installation process (conda install -c insilichem drugscorex normally takes care of that).
  • terms (list of bool, optional) – Enable (True) or disable (False) certain terms in the score function in this order: distance-dependent pair potentials, torsion potentials, intramolecular clashes, sas potentials, hbond potentials
  • sorting (int, defaults to 1) –

    Sorting mode. An int between 0-6, read binary help for -S:

    -S int :  Here you can specify the mode that affects how the results
            will be sorted. The default mode is '-S 1', which sorts the
            ligands in the same order as they are found in the lig_file.
            The following modes are possible::
    
                0: Same order as in the ligand file
                1: Ordered by increasing total score
                2: Ordered by increasing per-atom-score
                3: Ordered by increasing per-contact-score
                4: Ordered by increasing rmsd
                5: Ordered by increasing torsion score
                6: Ordered by increasing per-torsion-score
    
  • cofactor_mode (int, defaults to 0) –

    Cofactor handling mode. An int between 0-7, read binary help for -I:

    -I int :  Here you can specify the mode that affects how cofactors,
            waters and metals will be handeled.
            The default mode is '-I 1', which means, that all molecules
            are treated as part of the protein. If a structure should
            not be treated as part of the protein you have supply a
            seperate file with seperate MOLECULE entries corresponding
            to each MOLECULE entry in the ligand_file (It is assumed
            that the structure, e.g. a cofactor, was kept flexible in
            docking, so that there should be a different geometry
            corresponding to each solution. Otherwise it won't make
            sense not to treat it as part of the protein.).
            The following modes are possible:
                0: cofactors, waters and metals interact with protein,
                ligand and each other
                1: cofactors, waters and metals are treated as part of
                the protein
                2: cofactors and metals are treated as part of the protein
                (waters as in mode 0)
                3: cofactors and waters are treated as part of the protein
                4: cofactors are treated as part of the protein
                5: metals and waters are treated as part of the protein
                6: metals are treated as part of the protein
                7: waters are treated as part of the protein
            Please note: Only those structures can be treated
            individually, which are supplied in seperate files.
    
  • with_covalent (bool, defaults to False) – Whether to deal with covalently bonded atoms as normal atoms (False) or not (True)
  • with_metals (bool, defaults to True) – Whether to deal with metal atoms as normal atoms (False) or not (True)
Returns:

Interaction energy as reported by DSX output logs.

Return type:

float

clean()
evaluate(ind)

Run a subprocess calling DSX binary with provided options, and parse the results. Clean tmp files at exit.

get_molecule_by_name(ind, *names)

Get a molecule gene instance of individual by its name

parse_output(stream)
prepare_command()
prepare_ligands(ligands)
prepare_proteins(proteins)
gaudi.objectives.dsx.enable(**kwargs)

Energy (OpenMM) objective

This objective is a wrapper around OpenMM, providing a GPU-accelerated energy calculation of the system with a simple forcefield evaluation.

class gaudi.objectives.energy.Energy(targets=None, forcefields=('amber99sbildn.xml', ), auto_parametrize=None, parameters=None, platform=None, *args, **kwargs)

Bases: gaudi.objectives.ObjectiveProvider

Calculate the energy of a system

Parameters:
  • targets (list of str, default=None) – If set, which molecules should be evaluated. Else, all will be evaluated.
  • forcefields (list of str, default=('amber99sbildn.xml',)) – Which forcefields to use
  • auto_parametrize (list of str, default=None) – List of Molecule instances GAUDI should try to auto parametrize with antechamber.
  • parameters (list of 2-item list of str) – List of (gaff.mol2, .frcmod) files to use as parametrization source.
  • platform (str) – Which platform to use for calculations. Choose between CPU, CUDA, OpenCL.
Returns:

The estimated potential energy, in kJ/mol

Return type:

float

calculate_energy(coordinates)

Set up an OpenMM simulation with default parameters and return the potential energy of the initial state

Parameters:coordinates (simtk.unit.Quantity) – Positions of the atoms in the system
Returns:potential_energy – Potential energy of the system, in kJ/mol
Return type:float
static chimera_molecule_to_openmm_positions(*molecules)
static chimera_molecule_to_openmm_topology(*molecules)

Convert a Chimera Molecule object to OpenMM structure, providing topology and coordinates.

Parameters:molecule (chimera.Molecule) –
Returns:
  • topology (simtk.openmm.app.topology.Topology)
  • coordinates (simtk.unit.Quantity)
evaluate(individual)

Calculates the energy of current individual

Notes

For static calculations, where molecules are essentially always the same, but with different coordinates, we only need to generate topologies once. However, for dynamic jobs, with potentially different molecules involved each time, we cannot guarantee having the same topology. As a result, we generate it again for each evaluation.

molecules(individual)
simulation

Build a new OpenMM simulation if not yet defined and return it

Notes

self.topology must be defined previously! Use self.chimera_molecule_to_openmm_topology to set it.

gaudi.objectives.energy.calculate_energy(filename, forcefields=None)

Calculate energy from PDB file with desired forcefields. If not specified, amber99sbildn will be used. Returns potential energy in kJ/mol.

gaudi.objectives.energy.enable(**kwargs)

GOLD objective

This objective is a wrapper around the scoring functions provided by CCDC’s GOLD.

It will use the rescoring abilities in GOLD to extract the fitness corresponding to any of the available scoring functions:

  • GoldScore (goldscore)
  • ChemScore (chemscore)
  • Astex Statistical Potential (asp)
  • CHEMPLP (chemplp)

Since GOLD is commercial software, you will need to install it separately and provide a valid license! This is just a wrapper. Make sure to set all the needed environment variables, such as CCDC_LICENSE_FILE, and that ‘gold_auto’ is in $PATH. Check tests/test_objectives_gold.py for an example; make sure to have GOLDXX/bin before GOLDXX/GOLD/bin!

class gaudi.objectives.gold.Gold(protein='Protein', ligand='Ligand', scoring='chemscore', score_component='Score', radius=10, *args, **kwargs)

Bases: gaudi.objectives.ObjectiveProvider

Gold class

Parameters:
  • protein (str) – The name of molecule acting as protein
  • ligand (str) – The name of molecule acting as ligand
  • scoring (str, optional, defaults to chemscore) – Fitness function to use. Choose between chemscore, chemplp, goldscore and asp.
  • score_component (str, optional, defaults to 'Score') – Scoring fields to parse out of the rescore.log file, such as Score, DG, S(metal), etc.
  • radius (float, optional, defaults to 10.0) – Radius (in A) of binding site sphere, the origin of which is automatically centered at the ligand’s center of mass.
Returns:

Interaction energy as reported by GOLD’s chosen scoring function

Return type:

float

clean()
evaluate(ind)

Run a subprocess calling LigScore binary with provided options, and parse the results. Clean tmp files at exit.

get_molecule_by_name(ind, *names)

Get a molecule gene instance of individual by its name

origin(molecule)
parse_output(filename)

Get last word of first line (and unique) and parse it into float

prepare_command(protein_path, ligand_path, origin)
prepare_ligands(ligands)
prepare_proteins(proteins)
gaudi.objectives.gold.enable(**kwargs)

Hydrogen bonds objective

This objective is a wrapper around Chimera’s FindHBond. It returns the number of hydrogen bonds that can be formed between the target molecule and its environment. .. todo:

Evaluate the possible HBonds with some kind of function that
gives a rough idea of the strength (energy) of each of them.
class gaudi.objectives.hbonds.Hbonds(probes=None, radius=5.0, distance_tolerance=0.4, angle_tolerance=20.0, only_intermolecular=True, only_probes=False, *args, **kwargs)

Bases: gaudi.objectives.ObjectiveProvider

Hbonds class :param probes: Names of molecules being object of analysis :type probes: list of str :param radius: Maximum distance from any point of probe that is searched

for a possible interaction
Parameters:
  • distance_tolerance (float, optional) – Allowed deviation from ideal distance to consider a valid H bond.
  • angle_tolerance (float, optional) – Allowed deviation from ideal angle to consider a valid H bond.
  • only_intermolecular (boolean, optional) – Only intermolecular interactions are considered (defaults to True)
  • only_probes (boolean, optional) – Only interactions between probe molecules are considered, excluding other molecule genes. (defaults to False)
Returns:

Number of detected Hydrogen bonds.

Return type:

int

display(bonds)

Mock method to show a graphical depiction of the found H Bonds.

evaluate(ind)

Find H bonds within self.radius angstroms from self.probes, and return only those that interact with probe. Ie, discard those hbonds in that search space whose none of their atoms involved are not part of self.probe.

molecules(ind)
probes(ind)
gaudi.objectives.hbonds.enable(**kwargs)

Inertia objective

This objective calculates the alignment between the axes of inertia of the given molecules.

class gaudi.objectives.inertia.AxesOfInertia(reference=None, targets=None, only_primaries=False, threshold=0.84, *args, **kwargs)

Bases: gaudi.objectives.ObjectiveProvider

Calculates the axes of inertia of given molecules and returns their alignment deviation.

Parameters:
  • reference (str) – Molecule name targets should align to.
  • targets (list of str) – Names of molecules to be aligned to reference
  • threshold (float) – Target average of cosine of angle of alignment between targets and reference.
  • only_primaries (bool) – Consider only the largest inertia vectors.
Returns:

Mean absolute difference of threshold alignment and mean of all the cosines involved for each axis.

Return type:

float

evaluate(individual)

Return the score of the individual under the current conditions.

reference(individual)

The reference molecule. Usually, the biggest in size

targets(individual)
gaudi.objectives.inertia.calculate_alignment(reference_axis, *probes_axes)
gaudi.objectives.inertia.calculate_axes_of_inertia(molecule)
gaudi.objectives.inertia.calculate_inertial_matrix(coordinates, masses)
gaudi.objectives.inertia.centroid(coordinates, masses)
gaudi.objectives.inertia.enable(**kwargs)

LigScore objective

This objective is a wrapper around the scoring fuction provided by IMP’s ligand_score.

The lower, the better, so usually you will use a -1.0 weight.

class gaudi.objectives.ligscore.LigScore(proteins=('Protein', ), ligands=('Ligand', ), method='pose', binary=None, library=None, *args, **kwargs)

Bases: gaudi.objectives.ObjectiveProvider

LigScore class

Parameters:
  • proteins (list of str) – The name of molecules that are acting as proteins
  • ligands (list of str) – The name of molecules that are acting as ligands
  • binary (str, optional) – Path to ligand_score executable
  • library (str, optional) – Path to LigScore lib file
Returns:

Interaction energy as reported by IMP’s ligand_score.

Return type:

float

clean()
evaluate(ind)

Run a subprocess calling LigScore binary with provided options, and parse the results. Clean tmp files at exit.

get_molecule_by_name(ind, *names)

Get a molecule gene instance of individual by its name

parse_output(stream)

Get last word of first line (and unique) and parse it into float

prepare_command(protein_path, ligand_path)
prepare_ligands(ligands)
prepare_proteins(proteins)
gaudi.objectives.ligscore.enable(**kwargs)

NWChem objective

This objective is a wrapper around NWChem.

It expects an additional input template with the keyword $MOLECULE, which will be replaced by the currently expressed molecule(s). See TEMPLATE for an example, which works as a default template if none is provided.

A ~/.nwchemrc file should be present. If you installed NWChem with our conda recipe, you will find the file in $CONDA_PREFIX/etc/default.nwchemrc. Copy it to your $HOME.

class gaudi.objectives.nwchem.NWChem(template=None, targets=('Ligand', ), parser=None, title=None, executable=None, basis_library=None, processors=None, *args, **kwargs)

Bases: gaudi.objectives.ObjectiveProvider

NWChem class

Parameters:
  • targets (list of str) – Molecule name(s) to be processed with NWChem. Small ones!
  • template (str, optional) – NWChem input template (or path to a file with such contents) containing a $MOLECULE placeholder to be replaced by the currently expressed molecule(s) requested in targets, and optionally, a $TITLE placeholder to be replaced by the job name. If not provided, it will default to the TEMPLATE example (single-point dft energy).
  • parser (str, optional) – Path to a Python script containing a top-level function called parse_output which will parse the NWChem output and return a float. This replaces the default parser, which looks for the last ‘Total <whatever> energy’ value.
  • processors (int, optional=None) – Number of physical processors to use with openmpi
Returns:

Any numeric value as reported by the parser routines. By default, last ‘Total <whatever> energy’ value.

Return type:

float

clean()
evaluate(ind)

Run a subprocess calling DSX binary with provided options, and parse the results. Clean tmp files at exit.

get_molecule_by_name(ind, *names)

Get a molecule gene instance of individual by its name

get_xyz(*molecules)
parse_output(stream)
prepare_nwfile(*molecules)
gaudi.objectives.nwchem.enable(**kwargs)

Solvation objective

This objective calculates SASA for the given system (or region).

class gaudi.objectives.solvation.Solvation(targets=None, threshold=0.0, radius=5.0, method='area', *args, **kwargs)

Bases: gaudi.objectives.ObjectiveProvider

Solvation class

Parameters:
  • targets ([str]) – Names of the molecule genes being analyzed
  • threshold (float, optional, default=0) – Optimize the difference to this value
  • radius (float, optional, default=5.0) – Max distance to search for neighbor atoms from targets.
  • method (str, optional, default=area) – Which method should be used. Both methods compute the surface of the solvated molecule. area returns the surface area of such surface, while volume returns the volume occuppied by the model.
Returns:

Surface area of solvated shell, in A² (if method=area), or volume of solvated shell, in A³ (if method=volume).

Return type:

float

evaluate_area(ind)
evaluate_volume(ind)
molecules(ind)
surface(ind)
targets(ind)
zone_atoms(probes, molecules)
gaudi.objectives.solvation.enable(**kwargs)
gaudi.objectives.solvation.grid_sas_surface(atoms, probe_radius=1.4, grid_spacing=0.5)

Stripped from Chimera’s Surface.gridsurf

Vina objective

This objective is a wrapper around the scoring functions provided by AutoDock Vina.

class gaudi.objectives.vina.Vina(receptor='Protein', ligand='Ligand', prepare_each=False, *args, **kwargs)

Bases: gaudi.objectives.ObjectiveProvider

Vina class

Parameters:
  • receptor (str) – Key of the gene containing the molecule acting as receptor (protein)
  • ligand (str) – Key of the gene containing the molecule acting as ligand
  • prepare_each (bool) – Whether to prepare receptors and ligands in every evaluation or try to cache the results for faster performance.
Returns:

Interaction energy in kcal/mol, as reported by AutoDock Vina –score-only.

Return type:

float

Notes

  • AutoDock scripts prepare_ligand4.py and prepare_receptor4.py are

used to prepare the corresponding .pdqt files that will be used as input for AutoDock Vina scorer. - No repairs nor cleanups will be performed on ligand/receptor molecules, so the user has to take into account that provided .mol2 or .pdb files have correct atom types and correct structure (including Hydrogen atoms that will be taken into account in the docking evaluation). Otherwise, AutoDock errors/warnings could appear (e.g. ValueError: Could not find atomic number for Lp Lp) - Gasteiger charges will be added during the preparation of the .pdbqt files. - All torsions of the ligand will be marked as inactive for AutoDock, because torsion changes are part of GaudiMM genes.

clean()
evaluate(ind)

Run a subprocess calling Vina binary with provided options, and parse the results. Clean tmp files at exit.

parse_output(stream)
prepare_ligand(molecule)
prepare_receptor(molecule)
tmpfile
gaudi.objectives.vina.enable(**kwargs)

Volume objective

This objective calculates the volume occupied by the requested Molecule gene instance.

Note

Volume is calculated from the surfacePiece created by a new experimental method found in Chimera’s Surface.gridsurf. This could be used as an objective for SES, instead of Solvation.

class gaudi.objectives.volume.Volume(threshold=0.0, target=None, cavities=False, *args, **kwargs)

Bases: gaudi.objectives.ObjectiveProvider

Volume class

Parameters:
  • threshold (float or 'auto') – Final volume to target. If ‘auto’, it will calculate the sum of VdW volumes of all requested atoms in probes. (Unimplemented!)
  • target (list of str) – Molecule gene name to calculate volume over
  • cavities (boolean, optional, default=False) – If True, evaluate cavities volume creating a convex hull and calculating the difference between convex hull volume and molecule volume
Returns:

Calculated volume in A³

Return type:

float

evaluate_convexhull(ind)
evaluate_volume(ind)
target(ind)
gaudi.objectives.volume.convexhull_volume(surface)

This function gets a surface, creates the convex hull and calculates its volume

Parameters:surface (Surface.gridsurf.ses_surface(molecule.atoms)) –
Returns:volume – Convex hull volume
Return type:float

Notes

Some systems may produce small volume blobs, resulting in a number of different surface pieces. This should be discussed in the future. # points = surface.surfacePieces[0].geometry[0] # convexhull = scipy.spatial.ConvexHull(points)

gaudi.objectives.volume.enable(**kwargs)

Base class for all objectives

Objectives are modules that reside in the gaudi.objectives package, and have a certain class structure

class gaudi.objectives.ObjectiveProvider(environment=None, name=None, weight=None, zone=None, precision=3, **kwargs)

Bases: object

Base class that every objectives plugin MUST inherit.

Mount point for plugins implementing new objectives to be evaluated by DEAP. The objective resides within the Fitness attribute of the individual. Do whatever you want, but use an evaluate() function to return the results. Apart from that, there’s no requirements.

The base class includes some useful attributes, so don’t forget to call ObjectiveProvider.__init__ in your overriden __init__. For example, self.zone is a Chimera.selection.ItemizedSelection object which is shared among all objectives. Use that to get atoms in the surrounding of the target gene, and remember to self.zone.clear() it before use.

— From (M.A. itself)[http://martyalchin.com/2008/jan/10/simple-plugin-framework/]: Now that we have a mount point, we can start stacking plugins onto it. As mentioned above, individual plugins will subclass the mount point. Because that also means inheriting the metaclass, the act of subclassing alone will suffice as plugin registration. Of course, the goal is to have plugins actually do something, so there would be more to it than just defining a base class, but the point is that the entire contents of the class declaration can be specific to the plugin being written. The plugin framework itself has absolutely no expectation for how you build the class, allowing maximum flexibility. Duck typing at its finest.

classmethod clear_cache()
evaluate(individual)

Return the score of the individual under the current conditions.

plugins = [<class 'gaudi.objectives.angle.Angle'>, <class 'gaudi.objectives.contacts.Contacts'>, <class 'gaudi.objectives.coordination.Coordination'>, <class 'gaudi.objectives.distance.Distance'>, <class 'gaudi.objectives.dsx.DSX'>, <class 'gaudi.objectives.energy.Energy'>, <class 'gaudi.objectives.gold.Gold'>, <class 'gaudi.objectives.hbonds.Hbonds'>, <class 'gaudi.objectives.inertia.AxesOfInertia'>, <class 'gaudi.objectives.ligscore.LigScore'>, <class 'gaudi.objectives.nwchem.NWChem'>, <class 'gaudi.objectives.solvation.Solvation'>, <class 'gaudi.objectives.vina.Vina'>, <class 'gaudi.objectives.volume.Volume'>]
classmethod validate(data, schema=None)
classmethod with_validation(**kwargs)

gaudi.algorithms

This module implements evolutionary algorithms as seen in DEAP, and extends their functionality to make use of GAUDI goodies.

Todo

  • Genealogy
gaudi.algorithms.dump_population(population, cfg, subdir=None)
gaudi.algorithms.ea_mu_plus_lambda(population, toolbox, mu, lambda_, cxpb, mutpb, ngen, cfg, stats=None, halloffame=None, verbose=True, prompt_on_exception=True)

This is the \((\mu + \lambda)\) evolutionary algorithm.

Parameters:
  • population – A list of individuals.
  • toolbox – A Toolbox that contains the evolution operators.
  • mu – The number of individuals to select for the next generation.
  • lambda_ – The number of children to produce at each generation.
  • cxpb – The probability that an offspring is produced by crossover.
  • mutpb – The probability that an offspring is produced by mutation.
  • ngen – The number of generation.
  • stats – A Statistics object that is updated inplace, optional.
  • halloffame – A HallOfFame object that will contain the best individuals, optional.
  • verbose – Whether or not to log the statistics.
Returns:

The final population.

First, the individuals having an invalid fitness are evaluated. Then, the evolutionary loop begins by producing lambda_ offspring from the population, the offspring are generated by a crossover, a mutation or a reproduction proportionally to the probabilities cxpb, mutpb and 1 - (cxpb + mutpb). The offspring are then evaluated and the next generation population is selected from both the offspring and the population. Briefly, the operators are applied as following

evaluate(population)
for i in range(ngen):
    offspring = varOr(population, toolbox, lambda_, cxpb, mutpb)
    evaluate(offspring)
    population = select(population + offspring, mu)

This function expects toolbox.mate(), toolbox.mutate(), toolbox.select() and toolbox.evaluate() aliases to be registered in the toolbox. This algorithm uses the varOr() variation.

gaudi.base

Contains the core classes we use to build individuals (potential solutions of the optimization process).

class gaudi.base.BaseIndividual(cfg=None, cache=None, **kwargs)

Bases: object

Base class for individual objects that are evaluated by DEAP.

Each individual is a potential solution. It contains all that is needed for an evaluation. With multiprocessing in mind, individuals should be self-contained so it can be passed between threads.

The defined methods are only wrapper calls to the respective methods of each gene.

Parameters:
  • cfg (gaudi.parse.Settings) – The full parsed object from the configuration YAML file.
  • cache (dict or dict-like) – A mutable object that can be used to store values across instances.
  • dummy (bool) – If True, create an uninitialized Individual, only containing the cfg attribute. If false, call __ready__ and complete initialization.
__CACHE

Class attribute that caches gene data across instances

Type:dict
__CACHE_OBJ

Class attribute that caches objectives data across instances

Type:dict

Todo

write() should use Pickle and just save the whole object, but Chimera’s inmutable objects (Atoms, Residues, etc) get in the way. A workaround may be found if we take a look a the session saving code.

clear_cache()
evaluate(environment)

Express individual, evaluate it and unexpress it.

Parameters:environment (Environment) – Objectives that will evaluate the individual
express()

Express genes in this environment. Very much like ‘compiling’ the individual to a chimera.Molecule.

mate(other)

Recombine genes of self with other. It simply calls mate on each gene instance

Parameters:other (Individual) – Another individual to mate with.
mutate(indpb)

Trigger a round of possible mutations across all genes

Parameters:indpb (float) – Probability of suffering a mutation
post_express()
post_unexpress()
pre_express()
pre_unexpress()
similar(other)

Compare self and other with a similarity function.

Returns:
Return type:bool
unexpress()

Undo .express()

write(i, path=None)

Export the individual to a mol2 file

Parameters:i (int) – Individual identificator in current generation or hall of fame
:param .. note ::: Maybe someday we can pickle it all :/
>>> filename = os.path.join(path, '{}_{}.pickle.gz'.format(name,i))
>>> with gzip.GzipFile(filename, 'wb') as f:
>>>     cPickle.dump(self, f, 0)
>>> return filename
class gaudi.base.Environment(cfg=None, *args, **kwargs)

Bases: object

Objective container and helper to evaluate an individual. It must be instantiated with a gaudi.parse.Settings object.

Parameters:cfg (gaudi.parse.Settings) – The parsed configuration YAML file that contains objectives information
clear_cache()
evaluate(individual)

individual : Individual

class gaudi.base.Fitness(weights)

Bases: deap.base.Fitness

wvalues = ()
class gaudi.base.MolecularIndividual(*args, **kwargs)

Bases: gaudi.base.BaseIndividual

find_molecule(name)
post_express()
xyz(gene=None)
gaudi.base.expressed(*args, **kwds)

gaudi.box

This module is a messy collection of useful functions used all along GAUDI.

Todo

Some of these functions are hardly used, so maybe we should clean it a little in the future…

gaudi.box.atoms_between(atom1, atom2)

Finds all connected atoms between two given atoms

gaudi.box.atoms_by_serial(*serials, **kw)

Find atoms in kw[‘atoms’] with serialNumber = serials.

Parameters:
  • serials (int) – List of serial numbers to match
  • atoms (list of chimera.Atom, optional) – List of atoms to be traversed while looking for serial numbers
Returns:

Return type:

list of chimera.Atom

gaudi.box.create_single_individual(path)

Create an individual within Chimera. Convenience method for Chimera IDLE.

gaudi.box.do_cprofile(func)

Decorator to cProfile a certain function and output the results to cprofile.out

gaudi.box.draw_interactions(interactions, startCol='FF0000', endCol='FFFF00', key=None, name='Custom pseudobonds')

Draw pseudobonds depicting atoms relationships.

Parameters:
  • interactions (list of tuples) – Each tuple contains an interaction, defined, at least, by the two atoms involved.
  • startCol (str, optional) – Hex code for the initial color of the pseudobond (closer to the first atom of the pair).
  • endCol (str, optional) – Hex code for the final color of the pseudobond. (closer to the second atom of the pair)
  • key (int, optional) – The index of an interaction tuple that represent the alpha channel in the color used to depict the interaction.
  • name (str, optional) – Name of the pseudobond group created.
Returns:

Return type:

chimera.pseudoBondGroup

gaudi.box.files_in(path, ext=None)

Returns all the files in a given directory, filtered by extension if desired.

Parameters:
  • path (str) –
  • ext (list of str, optional) – File extension(s) to filter on.
Returns:

Return type:

List of absolute paths

gaudi.box.find_nearest(anchor, atoms)

Find the atom of atoms that is closer to anchor, in terms of number of atoms in between.

gaudi.box.highest_atom_indices(r)

Returns a dictionary with highest atom indices in given residue Key: value -> element.name: highest index in residue

gaudi.box.incremental_existing_path(path, separator='__')
gaudi.box.open_models_and_close(*args, **kwds)
gaudi.box.pseudobond_to_bond(molecule, remove=False)

Transforms every pseudobond in molecule to a covalent bond

Parameters:
  • molecule (chimera.Molecule) –
  • remove (bool) – If True, remove original pseudobonds after actual bonds have been created.
gaudi.box.rmsd(a, b)
gaudi.box.sequential_bonds(atoms, s)

Returns bonds in atoms in sequential order, beginning at atom s

gaudi.box.silent_stdout(*args, **kwds)
gaudi.box.stdout_to_file(workspace, stderr=True)
gaudi.box.suppress_ksdssp(trig_name, my_data, molecules)

Monkey-patch Chimera triggers to disable KSDSSP computation

gaudi.box.write_individuals(inds, outpath, name, evalfn, remove=True)

Write an individual to disk.

Note

Deprecated since an Individual object is able to write itself to disk.

gaudi.exceptions

This module collects more meaningful exceptions than builtins.

exception gaudi.exceptions.AtomsNotFound

Bases: exceptions.Exception

exception gaudi.exceptions.MoleculesNotFound

Bases: exceptions.Exception

exception gaudi.exceptions.ResiduesNotFound

Bases: exceptions.Exception

exception gaudi.exceptions.TooManyAtoms

Bases: exceptions.Exception

exception gaudi.exceptions.TooManyResidues

Bases: exceptions.Exception

gaudi.parallel

Helper functions to deal with parallel execution of GaudiMM jobs.abs

Useful for benchmarks.

gaudi.parallel.run_parallel(fn, args=(), processes=None, initializer=None, initargs=(), maxtasksperchild=1, map_timeout=9999999, map_chunksize=1, map_callback=None)

Create a pool instance with built-in exception handling.

gaudi.parse

This module parses and validates YAML input files into convenient objects that allow per-attribute access to configuration parameters.

gaudi.parse.AssertList(*validators, **kwargs)

Make sure the value is contained in a list

gaudi.parse.Coordinates(v)
gaudi.parse.Degrees(v)
gaudi.parse.ExpandUserPathExists(v)
gaudi.parse.Importable(v)
gaudi.parse.MakeDir(validator)
class gaudi.parse.MoleculeAtom(molecule, atom)

Bases: tuple

atom

Alias for field number 1

molecule

Alias for field number 0

class gaudi.parse.MoleculeResidue(molecule, residue)

Bases: tuple

molecule

Alias for field number 0

residue

Alias for field number 1

gaudi.parse.Molecule_name(v)

Ideal implementation:

def fn(v):
    valid = [i['name'] for i in items if i['module'] == 'gaudi.genes.molecule']
    if v not in valid:
        raise Invalid("{} is not a valid Molecule name".format(v))
    return v
return fn

However, I must figure a way to get the gene list beforehand

gaudi.parse.Named_spec(*names)

Assert that str is formatted like “Molecule/123”, with Molecule being a valid name of a Molecule gene and 123 a positive int or *

gaudi.parse.RelPathToInputFile(inputpath=None)
gaudi.parse.ResidueThreeLetterCode(v)
class gaudi.parse.Settings(path=None, validation=True)

Bases: munch.Munch

Parses a YAML input file with PyYAML, validates it with voluptuous and builds a attribute-accessible dict with Munch.

Hence, all the attributes in this class are generated automatically from the default values, and the updated with the contents of the YAML file.

Parameters:path (str) – Path to YAML file
output

Contains the parameters to determine how to write and report results.

Type:dict
output.path

Directory that will contain the result files. If it does not exist, it will be created. If it does, the contents could be overwritten. Better change this between different attempts.

Type:str, optional, defaults to . (current dir)
output.name

A small identifier for your calculation. If not set, it will use five random characters.

Type:str, optional
output.precision

How many decimals should be used along the simulation. This won’t only affect reports, but also the reported scores by the objectives during the selection process.

Type:int, optional, defaults to 3
output.compress

Whether to apply compression to the individual ZIP files or not.

Type:bool, optional, defaults to True
output.history

Whether to save all the genealogy of the individuals created along the simulation or not. Only for advanced users.

Type:bool, optional, defaults to False
output.pareto

If True, the elite population will be the Pareto front of the population. If False, the elite population will be the best solutions, according to the lexicographic sorting of the fitness values.

Type:bool, optional, defaults to True
output.verbose

Whether to realtime report the progress of the simulation or not.

Type:bool, optional, defaults to True
output.check_every

Dump the elite population every n generations. Switched off if set to 0.

Type:int, optional, defaults to 10
output.prompt_on_exception

When an exception is raised, GaudiMM tries to dump the current population to disk as an emergency rescue plan. This includes pressing Ctrl+C. If this happens, it prompts the user whether to dump it or not. For interactive sessions this is desirable, but no so much for unsupervised cluster jobs. If set to False, this behaviour will be disabled.

Type:bool, optional, defaults to True
ga

Contains the genetic algorithm parameters.

Type:dict
ga.population

Size of the starting population, in number of individuals.

Type:int
ga.generations

Number of generations to simulate.

Type:float
ga.mu

The number of children to select at each generation, expressed as a multiplier of ga.population.

Type:float, optional, defaults to 1.0
ga.lambda_

The number of children to produce at each generation, expressed as a multiplier of ga.population.

Type:float, optional, defaults to 3.0
ga.mut_eta

Crowding degree of the mutation. A high eta will produce a mutant resembling its parent, while a small eta will produce a solution much more different.

Type:float, optional, defaults to 5
ga.mut_pb

The probability that an offspring is produced by mutation.

Type:float, optional, defaults to 0.5
ga.mut_indpb

Independent probability for each gene to be mutated.

Type:float, optional, defaults to 0.75
ga.cx_eta

Crowding degree of the crossover. A high eta will produce children resembling to their parents, while a small eta will produce solutions much more different.

Type:float, optional, defaults to 5
ga.cx_pb

The probability that an offspring is produced by crossover.

Type:float, optional, defaults to 0.5
similarity

Contains the parameters to the similarity operator, which, given two individuals with the same fitness, whether they can be considered the same solution or not.

Type:dict
similarity.module

The function to call when a fitness draw happens. It should be expressed as Python importable path; ie, separated by dots: gaudi.similarity.rmsd.

Type:str
similarity.args

Positional arguments to the similarity function.

Type:list
similarity.kwargs

Optional arguments to the similarity function.

Type:dict
genes

Contains the list of genes that each Individual will have.

Type:list of dict
objectives

Contains the list of objectives that will make the Environment object to evaluate the Individuals.

Type:list of dict
default_values = {'ga': {'cx_eta': 5, 'cx_pb': 0.5, 'generations': 3, 'lambda_': 3, 'mu': 1, 'mut_eta': 5, 'mut_indpb': 0.75, 'mut_pb': 0.5, 'population': 10}, 'genes': [{}], 'objectives': [{}], 'output': {'check_every': 10, 'compress': True, 'history': False, 'name': 'PjBYT', 'pareto': True, 'path': '.', 'precision': 3, 'prompt_on_exception': True, 'verbose': True}, 'similarity': {'args': [['Ligand'], 2.5], 'kwargs': {}, 'module': 'gaudi.similarity.rmsd'}}
name_objectives
schema = {'genes': All(Length(min=1, max=None), [<type 'dict'>], msg=None), 'objectives': All(Length(min=1, max=None), [<type 'dict'>], msg=None), 'output': {'check_every': All(Coerce(int, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'compress': Coerce(bool, msg=None), 'history': Coerce(bool, msg=None), 'name': All(<type 'basestring'>, Length(min=1, max=255), msg=None), 'pareto': Coerce(bool, msg=None), 'path': <function fn>, 'precision': All(Coerce(int, msg=None), Range(min=-3, max=6, min_included=True, max_included=True, msg=None), msg=None), 'prompt_on_exception': Coerce(bool, msg=None), 'verbose': Coerce(bool, msg=None)}, 'similarity': {'args': <type 'list'>, 'kwargs': <type 'dict'>, 'module': <type 'basestring'>}, '_path': <function ExpandUserPathExists>, 'ga': {'cx_eta': All(Coerce(int, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'cx_pb': All(Coerce(float, msg=None), Range(min=0, max=1, min_included=True, max_included=True, msg=None), msg=None), 'generations': All(Coerce(int, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'lambda_': All(Coerce(float, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'mu': All(Coerce(float, msg=None), Range(min=0, max=1, min_included=True, max_included=True, msg=None), msg=None), 'mut_eta': All(Coerce(int, msg=None), Range(min=0, max=None, min_included=True, max_included=True, msg=None), msg=None), 'mut_indpb': All(Coerce(float, msg=None), Range(min=0, max=1, min_included=True, max_included=True, msg=None), msg=None), 'mut_pb': All(Coerce(float, msg=None), Range(min=0, max=1, min_included=True, max_included=True, msg=None), msg=None), 'population': All(Coerce(int, msg=None), Range(min=2, max=None, min_included=True, max_included=True, msg=None), msg=None)}}
validate(data=None)
weights
gaudi.parse.deep_update(source, overrides)

Update a nested dictionary or similar mapping.

Modify source in place.

gaudi.parse.parse_rawstring(s)

It parses reference strings contained in some fields.

These strings contain references to genes.molecule instances, and one of its atoms

gaudi.parse.validate(schema, data)

gaudi.plugin

This module provides the basic functionality for the plugin system of genes and objectives.

class gaudi.plugin.PluginMount(name, bases, attrs)

Bases: type

Base class for plugin mount points.

Metaclass trickery obtained from Marty Alchin’s blog Each mount point (ie, genes and objectives), MUST inherit this one.

gaudi.plugin.import_plugins(*pluginlist)

Import requested modules, only once, when launch.py is called and the configuration is parsed successfully.

Parameters:pluginlist (list of gaudi.parse.Param) – Usually, the genes or objectives list resulting from the configuration parsing.
gaudi.plugin.load_plugins(plugins, container=None, **kwargs)

Requests an instance of the class that resides in each plugin. For genes, each individual has its own instance, but objectives are treated like a singleton. So, they are only instantiated once. That’s the reason behind usen a mutable container.

Parameters:
  • plugins (list of gaudi.parse.Param) – Modules to load. Each Param must have a module attr with a full import path.
  • container (dict or dict-like) – If provided, use this container to retain instances across individuals.
  • kwargs – Everything else will be passed to the requested plugin instances.

gaudi.similarity

This module contains the similarity functions that are used to discard individuals that are not different enough.

gaudi.similarity.rmsd(ind1, ind2, subjects, threshold, *args, **kwargs)

Returns the RMSD between two individuals

Parameters:
  • ind1 (gaudi.base.Individual) –
  • ind2 (gaudi.base.Individual) –
  • subjects (list of str) – Name of gaudi.genes.molecule instances to measure
  • threshold (float) – Maximum RMSD value to consider two individuals as similar. If rmsd > threshold, they are considered different.
Returns:

True if rmsd is within threshold, False otherwise. It will always return False if number of atoms is not equal in the two Individuals.

Return type:

bool

gaudi._cpdrift

Coherent Point Drift (affine and rigid) Python2/3 implementation, adapted from kwohlfahrt’s.

Only 3D points are supported in this version.

Depends on:

  • Python 2.7, 3.4+
  • Numpy
  • Matplotlib (plotting only)
class gaudi._cpdrift.Quaternion(s, i, j, k)

Bases: object

axis_angle
conjugate(other=None)
classmethod fromAxisAngle(v, theta)
matrix()
norm()
reciprocal()
unit()
vector
gaudi._cpdrift.RMSD(X, Y)
gaudi._cpdrift.affine_cpd(X, Y, w=0.0, B=None)
gaudi._cpdrift.affine_xform(X, B=<Mock name='mock.array()' id='140682130454096'>, t=0)
gaudi._cpdrift.coherent_point_drift(X, Y, w=0.0, B=None, guess_steps=5, max_iterations=20, method='affine')
gaudi._cpdrift.common_steps(X, Y, Y_, w, sigma_sq)
class gaudi._cpdrift.frange(start, stop, step)

Bases: object

gaudi._cpdrift.last(sequence)
gaudi._cpdrift.pairwise_sqdist(X, Y)
gaudi._cpdrift.plot(x, y, t)

Plot the initial datasets and registration results.

Parameters:
  • x (ndarray) – The static shape that y will be registered to. Expected array shape is [n_points_x, n_dims]
  • y (ndarray) – The moving shape. Expected array shape is [n_points_y, n_dims]. Note that n_dims should be equal for x and y, but n_points does not need to match.
  • t (ndarray) – The transformed version of y. Output shape is [n_points_y, n_dims].
gaudi._cpdrift.rigid_cpd(X, Y, w=0.0, R=None)
gaudi._cpdrift.rigid_xform(X, R=<Mock name='mock.array()' id='140682130454096'>, t=0.0, s=1.0)
gaudi._cpdrift.rotation_matrix(*angles)
gaudi._cpdrift.spaced_rotations(N)
gaudi._cpdrift.std(x)

Indices and tables