OpenForcefield 0.4.0 parametrization tests on XChem Data
A while ago I worked together with Anthony Bradley, Maciej Majewski and Xavier Barril on porting our dynamic undocking approach (Nat Chem 2016) from a MOE - Amber based version to a full open-source version. In that work we used MOE to essentially prepare and parametrize chunks of protein binding sites with their ligands in them. Amber was used to run steered molecular dynamics. The whole source code for running these simulations is available on the Barril lab website.
I love MOE, it’s a great piece of software, but unfortunately it’s not free and now that I’m nor in academia, nor in a pharma company I find myself unable to further develop an approach I implemented in the past which is kind of frustrating to put it politely. The same goes for Amber.
So I started to look for alternatives, also under a gentle demand from Anthony (poking me regularly) who wanted to apply the whole approach in his fragment screening pipeline. That’s in the scope of the crazy impressive XChem project. If you haven’t heard / read about that check it out - it gives you a bit of an insight on what will be possible very near future in xray crystallography!
Back at the time the openforcefield project just started and an alpha version for an rdkit integration was sort of available. After a lot of ugly hacking we were able to prepare and run our dynamic undocking runs using openforcefield and openmm. The code is still available here . The problem was though that this rdkit integration and the forcefield itself didn’t allow to cover the large variety of chemotypes in the fragment library at hand (only 60% were covered back at the time) and we had a lot of issues with parametrizing nitrogens in aryles. Unfortunately that’s rather common in druglike molecules ;)
Recently a few updates of the openforcefield toolkit came out … a game changer, as you’ll see.
The openforcefield initiative
If you have to follow and support something during the years to come, then this initiative. Why? It is an effort to bring an open source small molecule force field to the community. Where before you had to handle MMFF in MOE, OPLS in the Schrödinger tools etc here the idea is to enable molecular mechanics on small and macromolecules jointly using open and freely available software. That’s really awesome in my opinion (given the reasons I mentioned before + the tons of public money that already went into the development of some of the previous forcefields).
It’s lead by a consortium of academic and industrial partners and advisors
It’s all about open science / data
It’ll allow people to develop methods and apply them where before you needed expert molecular modelling tools that integrated and maintained for most of them their own small molecule force fields
It’s openmm native and amber compatible
They apply an orthogonal approach to parametrize the ligands using smirks patterns
Recently version 0.2 was released with this time around an official support for rdkit. Things are moving fast, today 0.4 was released. This post is describing the first tests on preparing ligand topologies with the version 0.4. More to come on more detailed tests in upcoming posts.
My test run here
In my first test run, source code available here, I wanted to see whether the 768 fragments from the XChem fragment library can be parametrized with the new version and how they behave after a simple minimization.
Again a bit of code, but just the relevant parts. The fully functional source code is on github.
My first naive approach intended to read in smiles and use the handy Molecule.from_smiles function to create a new molecule instance with the openforcefield toolkit.
with open('data/fragment_library.smi', 'r') as reader:
for smiles in reader.readlines():
smiles=smiles.strip()
try:
mol = Molecule.from_smiles(smiles)
except Exception as err:
cntErrors1.append(type(err).__name__)
pass
reader.close()
That works actually for a lot of the fragments, but I got 112 errors complaining about stereo-centers not being defined. That’s a known issue and there are ways around that. Here’s how.
As we are reading smiles strings, when creating the molecule we have no 3D conformation ready to be used. That’s the issue here. Let’s use RDKit to generate such a single initial 3D conformation, assign stereocenters from that conformation and then try again to parametrize the ligand. Again, that’s only necessary if you start from plain 2D structures which should not happen very often. If necessary though, you could also use RDKit to create all stereoisomers of your molecules. Not needed here!
with open('data/fragment_library.smi', 'r') as reader:
n=0
for smiles in reader.readlines():
n+=1
smiles=smiles.strip()
print(str(n)+" : "+smiles)
rdkit_mol=Chem.MolFromSmiles(smiles)
rdkit_mol_h = Chem.AddHs(rdkit_mol)
AllChem.EmbedMolecule(rdkit_mol_h) #generate a 3D conformation to avoid the errors before
Chem.rdmolops.AssignAtomChiralTagsFromStructure(rdkit_mol_h) #very important, else you'll repeat the errors from before
AllChem.ComputeGasteigerCharges(rdkit_mol_h) #thats not at all mandatory nore recommended. It's just to go faster here. Don't do that in your code !!!
try:
#here is where the magic happens
mol = Molecule.from_rdkit(rdkit_mol_h) #create a openff molecule from the rdkit one
top=mol.to_topology() #extract the topology
ligand_system = force_field.create_openmm_system(top,charge_from_molecules=[mol]) #create our openmm system (that's the simplest version here)
In the end we have prepared our ligand system that we can then use to either combine that with a protein (I’ll cover in another post) or run directly a simulation in vacuum.
In the example code I did a short minimization in vacuum and wrote back out the RMS, energy difference and final conformation for visual inspection.
Good news
All fragments technically pass the parametrization and minimization step. That’s great news compared to version 0.1!
Visual inspection
If you are interested in seeing the outcome of that test run, just download output sdf from the github respository and look at it in your favourite visualization tool. Usually I’d use MOE for that, but that’s not possible anymore, so I used 3decision here, dropped it in the ligand browser which allowed me to quickly browse through all conformations (you can rate them if you want as well everything with the keyboard ;)).
That’s very qualitative for now, but compared to the previous version it gives a quite good view on the improvements on this project so far and this is really exciting!
I have a few doubts about some of the out of plane arrangements between aryles and attached amides, but that’s really minor for now and I’d have to check that a bit further. I’ll try to do a more thorough analysis on commonly difficult to handle moieties a bit later.
Bottomline
It’s a really good basis and I’m really excited about this project as it gives a lot of freedom to develop new approaches (I already have a few in mind … just need time …). You’ll very likely read more about the usage of this toolkit in near future.
Acknowledgments
Big thanks to Anthony Bradley for the dataset and pushing me on the duck integration for openmm -> thus making me discover the openforcefield initiative.
Kudos to the openforcefield initiative people (David Mobley, John Chodera & Jeff Wager + a lot more I haven’t heard of for sure)….it’s just so cool!
Further reading
The website of the initiative: https://openforcefield.org/
Official examples are heere: https://github.com/openforcefield/openforcefield/tree/master/examples
Current version Openforcefield supports rdkit - Blogpost by @iwatobipen: https://iwatobipen.wordpress.com/2019/05/21/current-version-openforcefield-supports-rdkit-rdkit-openforcefield-chemoinformatics/