MISATO: machine learning dataset of protein–ligand complexes for structure-based drug discovery

https://www.nature.com/articles/s43588-024-00627-2
