Preparing a database of corrected protein structures important in cell signaling pathways

Samaneh Hatami, Hajar Sirous, Karim Mahnam, Aylar Najafipour, Afshin Fassihi


Background and purpose: Precise macromolecular structures are important for structure-based drug design. Because some structures obtained by X-ray crystallography have limited resolution, it can be difficult to distinguish NH from O atoms. In some cases, amino acid residues are missing from the protein structure. Here we introduce a small database that we prepared to provide corrected 3D structure files of proteins frequently used in structure-based drug design protocols.

Experimental approach: A total of 3454 soluble proteins belonging to cancer signaling pathways were collected from the PDB, from which a dataset of 1001 structures was obtained. All were subjected to corrections in the protein preparation step. Of the 1001 structures, 896 were corrected successfully; of the remaining 105, twelve were selected for homology modeling to correct their missing residues. Three of these were subjected to molecular dynamics simulation for 30 ns.
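As an illustration of how structures with missing backbone residues might be flagged before preparation, the following minimal sketch scans the C-alpha ATOM records of a PDB-format file and reports breaks in residue numbering per chain. It is not the authors' procedure; the function names `make_ca_record` and `find_numbering_gaps` are hypothetical, and the sketch assumes the standard fixed-column PDB layout. A numbering gap is only a hint: some entries use non-contiguous numbering by design, so flagged gaps would still need to be checked against the deposited sequence.

```python
from collections import defaultdict


def make_ca_record(serial, res_name, chain_id, res_seq):
    """Hypothetical helper: build a minimal fixed-column C-alpha ATOM record."""
    return "ATOM  %5d  CA  %3s %1s%4d" % (serial, res_name, chain_id, res_seq)


def find_numbering_gaps(pdb_text):
    """Return {chain_id: [(last_present, next_present), ...]} for numbering gaps.

    Only C-alpha ATOM records are read, so each modeled residue counts once.
    """
    residues = defaultdict(list)
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name, column 22 the chain ID,
        # and columns 23-26 the residue sequence number.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain_id = line[21]
            res_seq = int(line[22:26])
            # Skip consecutive repeats (e.g. alternate locations).
            if not residues[chain_id] or residues[chain_id][-1] != res_seq:
                residues[chain_id].append(res_seq)

    gaps = {}
    for chain_id, numbers in residues.items():
        chain_gaps = [(a, b) for a, b in zip(numbers, numbers[1:]) if b - a > 1]
        if chain_gaps:
            gaps[chain_id] = chain_gaps
    return gaps


# Toy input: chain A is modeled for residues 10-12 and 20-21 only.
toy_pdb = "\n".join(
    make_ca_record(i + 1, "ALA", "A", n) for i, n in enumerate([10, 11, 12, 20, 21])
)
print(find_numbering_gaps(toy_pdb))  # {'A': [(12, 20)]}
```

In this toy case the residues between 12 and 20 would be candidates for rebuilding, for example by homology modeling as described above.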

Findings / Results: The 896 protein structures were corrected successfully, and homology modeling of the 12 proteins with missing backbone residues gave acceptable models according to Ramachandran, z-score, and DOPE energy plots. RMSD, RMSF, and Rg values confirmed the stability of the models over the 30 ns molecular dynamics simulations.
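For readers unfamiliar with the three stability measures named above, the sketch below shows their basic definitions in plain Python. It is a simplified illustration under stated assumptions, not the analysis used in this work: frames are assumed to be already superimposed on the reference, all atoms are weighted equally (no mass weighting in Rg), and the function names are hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Unweighted Rg: RMS distance of atoms from their geometric center."""
    n = len(coords)
    center = [sum(atom[k] for atom in coords) / n for k in range(3)]
    sq = sum(sum((atom[k] - center[k]) ** 2 for k in range(3)) for atom in coords)
    return math.sqrt(sq / n)


def rmsd(coords, reference):
    """RMSD between two frames with matching atom order (no superposition)."""
    sq = sum(
        sum((a[k] - b[k]) ** 2 for k in range(3))
        for a, b in zip(coords, reference)
    )
    return math.sqrt(sq / len(reference))


def rmsf(frames):
    """Per-atom RMS fluctuation around each atom's mean position."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        sq = sum(sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames)
        result.append(math.sqrt(sq / n_frames))
    return result


# Toy two-atom "trajectory" with two frames (coordinates in arbitrary units).
frame_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frame_b = [(0.0, 0.0, 0.0), (2.0, 0.0, 2.0)]
print(radius_of_gyration(frame_a))  # 1.0
print(rmsf([frame_a, frame_b]))     # [0.0, 1.0]
```

A model is usually considered stable when RMSD and Rg plateau over the simulation and RMSF stays low outside flexible loops; the thresholds applied depend on the protein and are not specified here.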

Conclusion and implication: A collection of 1001 protein structures was corrected for defects, including adjustment of bond orders and formal charges and addition of missing residue side chains. Homology modeling restored the missing backbone amino acid residues. The database will be extended to many more water-soluble proteins and made available online.


Keywords: PDB; Homology modeling; Molecular dynamics simulation; Protein database; Protein structure.







This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, which allows users to read, copy, distribute, and make derivative works from the material for non-commercial purposes, as long as the author of the original work is cited properly.