Ex Parte Genge et alDownload PDFPatent Trial and Appeal BoardApr 28, 201612266837 (P.T.A.B. Apr. 28, 2016) Copy Citation UNITED STATES PATENT AND TRADEMARK OFFICE UNITED STATES DEPARTMENT OF COMMERCE United States Patent and Trademark Office Address: COMMISSIONER FOR PATENTS P.O. Box 1450 Alexandria, Virginia 22313-1450 www.uspto.gov APPLICATION NO. FILING DATE FIRST NAMED INVENTOR ATTORNEY DOCKET NO. CONFIRMATION NO. 12/266,837 11/07/2008 Brian Genge USC-138 4925 22827 7590 04/29/2016 DORITY & MANNING, P.A. POST OFFICE BOX 1449 GREENVILLE, SC 29602-1449 EXAMINER BORIN, MICHAEL L ART UNIT PAPER NUMBER 1631 MAIL DATE DELIVERY MODE 04/29/2016 PAPER Please find below and/or attached an Office communication concerning this application or proceeding. The time period for reply, if any, is set in the attached communication. PTOL-90A (Rev. 04/07) UNITED STATES PATENT AND TRADEMARK OFFICE ____________ BEFORE THE PATENT TRIAL AND APPEAL BOARD ____________ Ex parte BRIAN GENGE, LICIA WU and ROY WUTHIER ____________ Appeal 2013-007917 Application 12/266,8371 Technology Center 1600 ____________ Before RICHARD M. LEBOVITZ, JEFFREY N. FREDMAN, and ROBERT A. POLLOCK, Administrative Patent Judges. LEBOVITZ, Administrative Patent Judge. DECISION ON APPEAL This appeal involves claims directed to a “method for identifying a test compound that blocks formation of a complex including calcium ion, inorganic phosphate, and a phospholipid.” Appellants appeal from the Examiner’s final rejection of claims 1, 4-6, 40, 41, 43-46, 48-50, and 52 as obvious under 35 U.S.C. § 103. We have jurisdiction under 35 U.S.C. § 134. The Examiner’s rejections are reversed. We set forth a new ground of rejection under 37 C.F.R. § 41.50(b). 1 “The ’837 application.” Appeal 2013-007917 Application 12/266,837 2 STATEMENT OF CASE The Examiner finally rejected claims 1, 4-6, 40, 41, 43-46, 48-50, and 52. Appellant appeals from the Examiner’s decision in the final rejection. The claims stand rejected by the Examiner as follows: 1. Claims 1, 4-6, 40, 44-46, 48-50, and 52 under 35 U.S.C. § 103 as obvious in view of Wu (1996),2 Wu (2003),3 Sourina,4 and Moy.5 Final Rej’n 3. 2. Claim 41 under 35 U.S.C. § 103 as obvious in view of Wu (1996), Wu, Sourina, Moy, and Concha. Final Rej’n 6. 3. Claim 43 under 35 U.S.C. § 103 as obvious in view of Wu (1996), Wu (2003), Sourina, Moy, and Kirsh. Final Rej’n 3. Claim 1 is representative and is reproduced below. 1. A method for identifying a test compound that blocks formation of a complex including calcium ion, inorganic phosphate, and a phospholipid comprising: integrating physical parameters of a calcium ion, an inorganic phosphate molecule, a phospholipid molecule and a test compound molecule into an in silico system, said physical parameters comprising the partial charges of the atoms of the 2 Wu et al., Characterization and Reconstitution of the Nucleational Complex Responsible for Mineral Formation by Growth Plate Cartilage Matrix Vesicles, Connective Tissue Research 35, 309-315 (1996). 3 Wu et al., Effects of analogues of inorganic phosphate and sodium ion on mineralization of matrix vesicles isolated from growth plate cartilage of normal rapidly growing chickens, J. Inorganic Biochemistry 94, 221-235 (2003). 4 Sourina et al., Visual Mining and Spatio-Temporal Querying in Molecular Dynamics, J. Comput. Theor. Nanosci. 2, 1-7 (2005). 5 Moy et al., Intermolecular Forces and Energies Between Ligands and Receptors, Science 266, 257-259 (1994). Appeal 2013-007917 Application 12/266,837 3 molecules, the bond length between the atoms of the molecules, the bond angles between the atoms of the molecules, and dihedrals of the molecules; bringing the calcium ion, the inorganic phosphate molecule, the phospholipid molecule and the test compound molecule within an interactive distance of one another during an in silico simulation, the interactive distance being between about 5 and about 15 Angstroms; and determining via the in silico simulation that the test compound blocks formation of a complex comprising the calcium ion, the inorganic phosphate molecule, and the phospholipid molecule. REJECTIONS 1-3 Claim 1 is directed to an in silico computer modeling method “for identifying a test compound that blocks formation of a complex including calcium ion, inorganic phosphate, and a phospholipid.” The method comprises “integrating physical parameters of a calcium ion, an inorganic phosphate molecule, a phospholipid molecule and a test compound molecule into an in silica system.” The recited physical parameters comprise “the partial charges of the atoms of the molecules, the bond length between the atoms of the molecules, the bond angles between the atoms of the molecules, and dihedrals of the molecules.” The Examiner cited Sourina for its teaching of a computer program for modeling dynamics between molecules. Final Rej’n 3. The Examiner acknowledged that Sourina does not describe modeling a calcium- phosphate-phospholipid complex, but found that Wu (1996) and Wu (2003) describe the biological importance of the recited calcium complex formation. Id. at 3-4. Wu (2003) also describes an inhibitor of the complex. Id. at 4. The Examiner reasoned that the skilled worker would have been led Appeal 2013-007917 Application 12/266,837 4 to utilize a computer model to identify inhibitors of calcium complex formation as taught by Wu (2003) because computer modeling is a “common tool of the trade” to solve such problems. Id. at 4-5. (“The nature of the problem to be solved - identifying inhibitors of formation of calcium complexes leading to mineralization (neocalcification) - would lead inventors to look at references relating to methods of identifying inhibitors by methods other than experimental in vitro techniques.”) Id. at 4. Appellants contend that Sourina does not teach or suggest an algorithm which incorporates “the bond length between the atoms of the molecules, the bond angles between the atoms of the molecules, or the dihedrals of the molecules as required by claim 1 Appeal Br. 5. Appellants also contend that there would have been no reason to identify inhibitors of the calcium complex formation using Sourina’s in silico system. Id. at 8. We agree with Appellants that Sourina does not appear to describe all the physical parameters recited in rejected claim 1. Sourina teaches The molecular dynamics algorithm applies classical Newtonian equations of motion and with self-consistent approximation for inter- and intramolecular forces (empirical potentials for chemical bonds, angles and torsions, electrostatic potential for ions and partially charged atoms, Lennard-Jones potentials for disperse interactions). Sourina 4. Specifically, as argued by Appellants, the Examiner did not identify disclosure in Sourina of bond length and dihedrals, parameters required by claim 1. Consequently, we reverse all the rejections. Appeal 2013-007917 Application 12/266,837 5 NEW GROUND OF REJECTION The ’837 application discloses that NAMD (NAno-scale Molecular Dynamics) software can be used to model interactions between molecules. ’837 application, ¶ 65. The ’837 application also describes using NAMD to perform the computer modeling of interactions between calcium, phosphate, and phospholipid as in claim 1. Id. at ¶¶ 75, 78. NAMD is a computer system for simulating biomolecular systems. Phillips (2005).6 NAMD uses the same physical parameters recited in claim 1, including the charge of an atom in the molecule (id. at 1783-84 (formula (6), (7), “Full Electrostatic Computation”)), the bond length between atoms of the molecules (id. at 1783, 1792 (formula (3), “bond lengths, can be constrained”)), the bond angles between the atoms of the molecules (id. at 1783 (formula (4), Fig. 2)), and dihedrals of the molecules (id. at 1783 (formula (5), Fig. 2)). NAMD has been used to simulate the activity of inhibitors on molecular interactions. See Phillips (2005), p. 1797 (“Multiscale lac Repressor-DNA Complex”); Mitra (2006),7 p. 5226 (“Furthermore, to investigate how dinitroanilines affect tubulin dynamics, molecular dynamics simulations of Leishmania α-tubulin with and without a bound dinitroaniline are performed.”); p. 5227 (“To determine the binding site of the dinitroanilines, ensembles of conformations for Toxoplasma, Plasmodium, and Leishmania α-tubulin were required. Using the molecular dynamics (MD) program NAMD v2.5” simulations were performed.). Because 6 Phillips et al., Scalable Molecular Dynamics with NAMD, J. Comput. Chem. 26, 1781-1802 (2005). 7 Mitra et al., Blinding and Interaction of Dinitroanilines with Apicomplexan and Kinetoplastid α-Tubulin, J. Med. Chem. 49, 5226-5231 (2006). Appeal 2013-007917 Application 12/266,837 6 NAMD is known tool to simulate molecular interactions, and as it has been used to study inhibitors of molecular interactions, one of ordinary skill in the art would have had reason to use NAMD to identify inhibitors of the calcium-phosphate-phospholipid complex described in Wu (1996) and Wu (2003). As stated in KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 417 (2007): [I]f a technique has been used to improve one device, and a person of ordinary skill in the art would recognize that it would improve similar devices in the same way, using the technique is obvious unless its actual application is beyond his or her skill. . . . [A] court must ask whether the improvement is more than the predictable use of prior art elements according to their established functions. In this case, the use of NAMD to study the molecular interactions described in Wu (1996) and Wu (2003), including interactions with inhibitors as taught by Wu (2003) (see, e.g., p. 225 describing effects of analogues on MV mineralization; p. 229, “Discussion”), is the predictable use of a computer simulation program for its established function. The ’837 application acknowledges: Brief reviews on potential energy functions used in MD [molecular dynamic] simulations are known in the art, any of which can be incorporated into disclosed modeling systems. For example, properties that can be incorporated can include, without limitation, lipid head area, stability, diffusion, radial distribution functions, structures and atomic distances throughout the simulation trajectory. ’837 application, ¶ 65. Thus, contrary to the arguments made in the Appeal Brief, it does not appear that the physical parameters utilized in the computer model are new. See also, id. at ¶¶ 67-68. The application also states that certain parameters utilized in the model “can be obtained or calculated from Appeal 2013-007917 Application 12/266,837 7 other sources as are generally known to one of skill in the art,” indicating that the values to accomplish the computer simulation were also known in the art. Id. at ¶ 69. Consequently, we conclude that claim 1 would have been obvious to one of ordinary skill in the art in view of Wu (1996), Wu (2003), Philips (2005), and Mitra (2006). This is a new ground of rejection under 37 C.F.R. § 41.50(b). With respect to the dependent claims 4-6, 40, 41, 43, 44-46, and 48- 50, we leave it to the Examiner to determine whether their subject matter is obvious in view of the above-cited publications alone or in combination with other prior art. Claim 4, 40, 41, and 43 further involve modeling with the annexin-5 molecule using physical parameters of the molecule. We leave to the Examiner to determine whether the recited physical parameters of annexin-5 were available at the time of the invention. TIME PERIOD This decision contains new grounds of rejection pursuant to 37 C.F.R. § 41.50(b) (effective September 13, 2004, 69 Fed. Reg. 49960 (August 12, 2004), 1286 Off. Gaz. Pat. Office 21 (September 7, 2004)). 37 C.F.R. § 41.50(b) provides that “[a] new ground of rejection pursuant to this paragraph shall not be considered final for judicial review.” 37 C.F.R. § 41.50(b) also provides that the Appellants, WITHIN TWO MONTHS FROM THE DATE OF THE DECISION, must exercise one of the following two options with respect to the new ground of rejection to avoid termination of the appeal as to the rejected claims: (1) Reopen prosecution. Submit an appropriate amendment of the claims so rejected or new evidence relating to the claims so rejected, Appeal 2013-007917 Application 12/266,837 8 or both, and have the matter reconsidered by the Examiner, in which event the proceeding will be remanded to the Examiner. . . . (2) Request rehearing. Request that the proceeding be reheard under § 41.52 by the Board upon the same record. . . REVERSED; 41.77(B) Notice of References Cited Application/Control No. Applicant(s)/Patent Under Patent Appeal No. Examiner Art Unit Page 1 of 1 U.S. PATENT DOCUMENTS * Document Number Country Code-Number-Kind Code Date MM-YYYY Name Classification A US- B US- C US- D US- E US- F US- G US- H US- I US- J US- K US- L US- M US- FOREIGN PATENT DOCUMENTS * Document Number Country Code-Number-Kind Code Date MM-YYYY Country Name Classification N O P Q R S T NON-PATENT DOCUMENTS * Include as applicable: Author, Title Date, Publisher, Edition or Volume, Pertinent Pages) U V W X *A copy of this reference is not being furnished with this Office action. (See MPEP § 707.05(a).) Dates in MM-YYYY format are publication dates. Classifications may be US or foreign. U.S. Patent and Trademark Office PTO-892 (Rev. 01-2001) Notice of References Cited Part of Paper No. Binding and Interaction of Dinitroanilines with Apicomplexan and Kinetoplastid r-Tubulin Arpita Mitra†,‡ and David Sept*,‡,§ Department of Chemical Engineering, Center for Computational Biology, and Department of Biomedical Engineering, Washington UniVersity, St. Louis, Missouri 63130-4899 ReceiVed April 21, 2006 Despite years of use as commercial herbicides, it is still unclear how dinitroanilines interact with tubulin, how they cause microtubule disassembly, and why they are selectively active against plant and protozoan tubulin. In this work, through a series of computational studies, a common binding site of oryzalin, trifluralin, and GB-II-5 on apicomplexan and kinetoplastidR-tubulin is proposed. Furthermore, to investigate how dinitroanilines affect tubulin dynamics, molecular dynamics simulations ofLeishmaniaR-tubulin with and without a bound dinitroaniline are performed. The results obtained provide insight into the molecular mechanism by which these compounds interact with tubulin and function to prevent microtubule assembly. Finally, to aid in the design of effective parasitic microtubule inhibitors, several novel dinitroaniline analogues are evaluated. The location of the binding site and the relative binding affinities of the dinitroanilines all agree well with experimental data. Introduction Microtubules (MTsa) are cylindrical polymers formed by the polymerization ofRâ-tubulin heterodimers. MTs are essential in a diverse array of eukaryotic cell functions, such as mitosis, cell motility, and intracellular organelle transport. Many of these cellular processes are associated with the dynamic reorganization of MT arrays, which is often regulated by the interaction of various small molecules and accessory proteins with the tubulin dimer or microtubules. Dinitroanilines are widely used in herbicide formulations because they bind with high affinity to plant tubulin, inhibit tubulin assembly, and disrupt MTs of plants.1-3 Interestingly, these compounds also inhibit parasite growth and differentiation as well as depolymerize MTs of parasitic protozoa, such asTrypanosomaspp.,4,5 Leishmania spp.,6-8 Plasmodium falciparum,9 andToxoplasma gondii.10 In contrast, dinitroanilines are ineffective against mammalian and fungal tubulin.7,11 This selective activity of the dinitroanilines against parasitic tubulin gives them the potential of being developed as anti-parasitic agents. For this reason, molecular characterization and structure-activity relationship data is required to design dinitroaniline analogues for use as effective therapeutic agents against life-threatening infectious diseases, such as leishmaniasis, trypanosomiasis, toxoplasmosis, and malaria. The flagellate protozoan parasiteL ishmaniaspp. belongs to the phylum Kinetoplastida and causes visceral and cutaneous leishmaniasis in humans. The dinitroaniline1 (oryzalin; see Figure 1) inhibits the polymerization of partially purified Leishmaniatubulin resulting in abnormal MTs, whereas rat brain tubulin polymerizes normally.11 Compound2 (trifluralin), an analogue of1, binds selectively to partially purifiedLeishmania spp. tubulin as compared to rat brain tubulin.7 More recently,3 (GB-II-5), which differs from1 in having a phenyl ring at the N1 nitrogen of the sulfonamide, has been shown to be more potent and likewise have selective activity againstLeishmania tubulin polymerization.12 3 binds 11-fold stronger toLeishmania tubulin than1, and at concentrations below where3 completely inhibits the polymerization ofLeishmaniatubulin, porcine brain tubulin is inhibited by only 17%.13 The obligate intracellular parasites such asToxoplasma gondii andPlasmodium falciparumbelong to the phylum Apicomplexa. Infection byT. gondii can result in miscarriage, birth defects in newborns, or the loss of vision in immunocompetent hosts as well as toxoplasmic encephalitis in HIV-infected individuals. P. falciparumparasites cause the most lethal form of human malaria, which is responsible for over a million deaths a year. 2 binds selectively toPlasmodiumtubulin and induces the disassembly of subpellicular MTs inP. falciparumgametocytes and merozoites.14,15 Interestingly, 1 displays 10-fold higher activity than3 againstP. falciparum(Werbovetz, K. Personal communication), whereasT. gondiitachyzoites are more sensi- tive to 1 than3 (Morrissette, N. Personal communication). Recently, 23ToxoplasmaR-tubulin mutations associated with resistance to1 were identified and characterized.16 These results provided strong evidence that1 binds toR-tubulin; however, many of these mutations were located in the core ofR-tubulin, * To whom correspondence should be addressed. Tel: 314-935-8837. Fax: 314-935-7448. E-mail: dsept@biomed.wustl.edu. † Department of Chemical Engineering. ‡ Center for Computational Biology. § Department of Biomedical Engineering. a Abbreviations: MT, microtubule; MD, molecular dynamics;Rg, radius of gyration; rmsf, root-mean-square fluctuations. Figure 1. Structures of all dinitroaniline analogues used in docking and simulation studies. 5226 J. Med. Chem.2006,49, 5226-5231 10.1021/jm060472+ CCC: $33.50 © 2006 American Chemical Society Published on Web 07/28/2006 indicating that their effect was probably allosteric. Solely on the basis of these genetic studies, a binding location for1 n ToxoplasmaR-tubulin could not be identified. However, by combining this work with computational studies, we were able to localize the binding to a specific site onR-tubulin. In the current work, we have expanded upon our previous results16 and examined several dinitroanilines and their interactions with Leishmania, Toxoplasma,andPlasmodiumR-tubulin. We have also performed molecular dynamics simulations of theLeish- maniaR-tubulin-3 complex to determine the molecular function of these compounds. Finally, to aid in the design of effective parasitic MT inhibitors, analogues of3 with substitutions in the aryl ring at theN1 nitrogen, as well as modifications at theN4 nitrogen and at the nitro groups present on the sulfanilamide ring were evaluated. Materials and Methods Preparation of Ligands and Protein Structures.Ligands 1-10 (Figure 1) were constructed as all-atom entities, energy minimized using the Tripos force field, and assigned partial atomic charges based on the Gasteiger-Marsili method using Sybyl v6.8 (Tripos Inc., St. Louis, MO). The electron crystallographic structure of the bovine tubulin heterodimer (pdb entry: 1JFF) was completed as described previously.16 This structure was then used as a homology model template to create the structure ofT. gondiiR-tubulin (accession code: AAA30145). Because the twoR-tubulin sequences are 84% identical, theToxoplasmasequence was threaded on to the 1JFF structure, and the differing amino acid side chains were modeled in using WHAT-IF.17 Similarly, P. falciparumandL. major R-tubulin share 93% and 86% sequence identity, respec- tively, with T. gondii R-tubulin. Hence, theToxoplasma R-tubulin structure was used as a homology model template for P. falciparum(accession code: CAA34101) andL. major (accession code: CAJ02501)R-tubulin structures, using the same procedure described above. To determine the binding site of the dinitroanilines, ensembles of conformations forToxoplasma, Plasmodium,andLeishmania R-tubulin were required. Using the molecular dynamics (MD) program NAMD v2.5,18 simulations of the parasiticR-tubulin- GTP complexes were performed using the isothermal-adiabatic (NPT) ensemble with CHARMM27 force field and TIP3P model for water. Interactions were evaluated on the basis of a multiple- time stepping algorithm, where the bonded interactions (using SHAKE algorithm) were computed every 2 fs, the short-range nonbonded electrostatic and van der Waals interactions (10 Å cutoff with a smooth switching function beginning at 8.5 Å, and pairlist distance of 11.5 Å) every 2 fs, and the long-range electrostatic interactions every 4 fs. Particle-Mesh Ewald was used to compute the long-range electrostatics with grid points at least 1 Å in all directions. NPT simulations were performed at 1 atm using a Nose´-Hoover Langevin piston with a decay period of 200 fs and a damping time scale of 100 fs (for heating and equilibration phases) and 500 fs (for production phase), coupled with temperature control involving Langevin dynamics. The steps of the MD simulations included energy minimization down to a gradient of<0.1 kcal/mol, heating with CR restrained to 300 K at intervals of 75 K, equilibration with CR restrained for 100 ps, equilibration with no restraints for 600 ps, followed by 32 ns of the production run, in which the first∼2 ns was the time required for the potential energy to stabilize and is not included in the analysis. From the resulting 30 ns MD trajectories, representative conformations ofT xoplasma, Plas- modium, and LeishmaniaR-tubulin were extracted every 2.5 ns for use in the AutoDock 3.0 docking simulations.19 Docking of Ligands. Flexible docking of all ligands was carried out over the entireR-tubulin structure using the adaptive Lamarckian genetic algorithm in AutoDock 3.0.19 During the molecular docking runs, the macromolecule structures were kept rigid; therefore, using multiple structures of the macromolecule allows sampling of the side chain and backbone conformations and, thereby, captures some flexibility of the protein.20 Fifty docking runs were performed for eachR-tubulin conformation, using a grid spacing of 0.2 Å, a starting population size of 200, 25 million energy evaluations, a mutation rate of 0.07, a crossover rate of 0.8, and 600 iterations of Solis and Wets local search. The 600 bound ligand predictions resulting from the docking simulations were clustered on the basis of a root-mean- square deviation of 2.5 Å from the lowest docked energy conformation of the ligand. The site of the largest cluster with the lowest energy was selected as the active site of the ligand on R-tubulin, with the binding mode of the dinitroaniline being that of the lowest energy complex from this largest top cluster. Parametrization of Compound 3.The molecular geometry of 3 was optimized with ab initio Hartree-Fock calculations, and the Merz-Singh-Kollman CHELPG style atom-centered point charges were derived with the HF/6-31G* basis set using Gaussian 98 (Gaussian Inc., Pittsburgh, PA). The bond- stretching, angle-bending, and torsional force field parameters for the sulfanilamide core and the nitro groups of3 were taken from the MM3 force field,21 whereas the Lennard-Jones force field parameters were adapted from the CHARMM22 force field parameters for proteins.22 All bonded and Lennard-Jones force field parameters for the aryl ring at theN1 nitrogen of the sulfanilamide and the alkane tail parameters were directly taken from the CHARMM22 force field parameters for proteins and the CHARMM27 force field parameters for lipids.22,23 Having the force field parameters of the ligand,LeishmaniaR-tubulin in complex with3 was subsequently simulated for 30 ns under the identical conditions used for theL ishmaniaapo structure. Alignment of R-Tubulin Sequences.The amino acid resi- dues that are conserved between plant and protozoanR-tubulin but are different from those found in mammalian and fungal R-tubulin were determined by performing multiple sequence alignment using ClustalW.24 The R-tubulin sequences of the various species used in the alignment with accession codes in parentheses wereZea mays(CAD20822), Eleusine indica (CAA06619), Toxoplasma gondii(AAA30145), Plasmodium falciparum (CAA34101),Leishmaniamajor (CAJ02501),Try- panosoma brucei(CAJ16366),Sus scrofa(P02550),Homo sapiens(NP_006073),Saccharomyces cereVisiae(NP_013625), andEmericella nidulans(P24634). The full alignment is shown in the Supporting Information. Results Docking of Dinitroanilines to Parasitic R-Tubulin. From the docking simulations of ligands1-3 to multiple conforma- tions of T. gondii, P. falciparum, and L. major R-tubulin, a consensus binding site for the dinitroanilines on parasitic R-tubulin was identified. This active site is located beneath the H1-S2 loop and is formed by the residues (based on the Leishmaniasequence) of strand S1 (Arg2, Glu3, Ala4, Ile5, Cys6), helix H1 (Cys20, Trp21, Phe24), the H1-S2 loop (His28, Met36, Asp39, Lys40, Cys41, Ile42, Asp47, Pro63, Arg64), strand S2 (Cys65), strand S4 (Met136), helix H7 (Val235, Ser236, Thr239, Ala240, Ser 241), and the T7 loop (Arg243, Phe244, Asp245). Figure 2 shows representative structures of 1-3 bound to Toxoplasma, Plasmodium, and Leishmania R-tubulin in the binding conformation predicted by the docking Dinitroanilines Interact with ParasiticR-Tubulin Journal of Medicinal Chemistry, 2006, Vol. 49, No. 175227 simulations. In this bound conformation, the hydrophobic alkane tails of the dinitroanilines orient toward the core of the protein, resulting in favorable interactions with residues Ala4, Ile5, Cys6, Cys20, Trp21, Phe24, Arg64, and Val235 (Figure 2). The sulfonamide headgroup of the dinitroanilines makes contact with the residues at the entrance to the binding site. More details and analysis of these ligand-protein interactions follow in the Discussion. The docking results are summarized in Table 1. The top two clusters for each drug/tubulin combination were located at the binding site, although with slightly different conformations. The binding free energy predictions from AutoDock 3.0 can have errors as high as 2 kcal/mol; however, as observed in previous studies,20 the docking of ligands to multiple protein structures taken from the MD trajectory leads to much more accurate free energy estimates. Likewise, here, we obtain the correct relative ordering of the inhibitors corresponding to their anti-parasitic activity, with 3 being the most potent againstLeishmania, whereas1 and3 exhibit similar binding affinities for apicom- plexanR-tubulin. Effect on Tubulin Dynamics. Having identified the binding site of the dinitroanilines, MD simulations of theLeishmania R-tubulin alone and theLeishmaniaR-tubulin-3 complex were performed in order to compare their dynamics and determine the molecular effect of the drug. To distinguish how protein flexibility is altered by the presence of the bound dinitroaniline, the rmsf of the CR residues for the apo and holo structures were calculated (Figure 3A). To quantify any variation in dynamics, we found the difference between these rmsf plots, calculated the mean and standard deviation (σ) of this rmsf difference over the entire protein, and plotted the result in terms of the standard deviation from the mean (Figure 3B). The residues for which the CR rmsf difference was at least 2σ from the mean were considered to be significant. These calculations reveal that the binding of 3 beneath the H1-S2 loop of R-tubulin severely restricts the flexibility of this loop, particularly affecting residues Asp39-Gly43 and Ser54-Ala58. Interestingly, long-range allosteric effects are also observed, resulting in a more than 50- fold enhancement in the flexibility of the M-loop residues Ser277-Tyr282 in the drug-bound protein complex. The radius of gyration (Rg) of the R-tubulin binding site residues (those within 5 Å of thedrug) were also examined for Figure 2. Representative structures of1, 2, and 3 bound toToxoplasma gondii, Plasmodium falciparumand Leishmania majorR-tubulin as predicted by docking simulations. Key residues within 4 Å of theligand are labeled, and the H1-S2 loop is colored pink. This and other molecular figures were created using VMD.28 Table 1. Two Largest Clusters from 600 Docking Runs with Toxoplasma, Plasmodium, andLeishmaniaR-Tubulin compd first cluster second cluster predicted∆G (kcal/mol) Toxoplasma gondii 1 53 42 -10.48 2 50 36 -9.96 3 16 9 -10.24 Plasmodium falciparum 1 23 14 -10.49 2 20 12 -9.88 3 24 15 -11.59 Leishmania major 1 53 32 -10.67 2 86 33 -9.88 3 58 47 -13.43 5228 Journal of Medicinal Chemistry, 2006, Vol. 49, No. 17 Mitra and Sept both the apo and holo structures (Figure 4). In the apo form, the protein exhibits breathing motions, and the binding pocket of the protein changes between an open conformation (first half of the trajectory) and a closed conformation as a result of the high flexibility of the H1-S2 loop. This movement of the H1- S2 loop helps to maintain interprotofilament contacts and also enables easy access of the dinitroanilines into their binding site via the luminal surface of the MT. However, when3 is bound at the active site, the interactions between the drug and the H1- S2 loop stabilize it in its closed conformation, giving a lower overall Rg. The waters in the binding pocket form a solvent network that links the nitro group oxygen atoms of3 with Asp47 and Arg64 ofR-tubulin. Hydrogen bonding is also observed between the sulfonamide oxygen of3 and Cys41. In addition, the electrostatic and van der Waals interactions between the aromatic ring at theN1nitrogen of3 and theR-tubulin residues His28, Met36, and Asp39-Ile42 keep the H1-S2 loop closer to the body of the drug-bound protein (Figure 5). Design of Parasitic Microtubule Inhibitors. The analogues of 3 with hydroxymethyl (4; ∆G ) -11.52 kcal/mol), hydroxy- ethyl (5; ∆G ) -12.34 kcal/mol), or aminomethyl (6; ∆G ) -12.10 kcal/mol) moieties at theN4 position exhibited weaker binding affinities compared to that of3 (∆G ) -13.43 kcal/ mol), indicating that the presence of polar end groups and alkyl chains shorter than three carbon atoms at theN4 nitrogen were not favored in the binding interactions. This is most likely due to the loss of favorable hydrophobic contacts between the dialkyl chains and the adjacent hydrophobic residues. This result agrees with the experimental observation that an increase in the length of the dialkyl chain atN4 nitrogen improves anti-Leishmania activity.12 The aromatic nitro group of the sulfonamide has been implicated with host toxicity.25 Our docking studies indicate that the replacement of both nitro groups by either carboxyl (7; ∆G ) -13.97 kcal/mol) or methyl ketone (8; ∆G ) -14.54 kcal/mol) moieties leads to stronger binding toLeishmania R-tubulin compared to that of3. The bound conformation of the drug indicates that a methyl ketone group would enhance hydrophobic contacts withR-tubulin residues Trp21, Phe24, and Phe244. In contrast, the dicyano analogue (9; ∆G ) -13.47 kcal/mol) has a similar binding affinity to3 and has been found experimentally to be as equally active as3 against both Leishmaniatubulin assembly and parasite growth (Werbovetz, K. Personal communication). On the basis of free energy perturbation calculations, dinitroaniline10has a 1 kcal/mol more favorable binding free energy compared to that of3, indicating that a hydroxyl group is favored at the meta position of the phenyl ring at theN1 nitrogen of the sulfonamide (see Supporting Information for details). Discussion In this work, MD simulations and molecular docking studies have been used to identify a single high-affinity binding site for 1, 2, and3 on Toxoplasma gondii, Plasmodium falciparum and Leishmania majorR-tubulin. The binding site is located beneath the H1-S2 loop and includes residues on helix H7 and on the T7 loop. Residues 52-62 in the H1-S2 loop of bothR- andâ-tubulin play an important role in stabilizing the MT. As observed in MD simulations of a MT fragment, these residues participate in noncovalent interactions with the M-loop residues 275-287 of the monomers in the neighboring protofilament (Mitra, A.; Sept, D. Unpublished data; Figure 6). The confor- mational stability of helix H7 is also important because it contacts both the GTPase domain and the activation domain.26 Figure 3. (A) Root-mean-square fluctuations of CR residues of Leishmaniaapo and holo (bound with3) structures. (B) Difference between the apo and holo rmsf plots. Residues having CR rmsf difference of at least 2σ from the mean (dotted line) are labeled. Figure 4. Radius of gyration ofR-tubulin binding site residues within 5 Å of the drug for apo and holo (bound with3) structures. Figure 5. Movement of the H1-S2 loop (Ser38-Cys65) ofR-tubu- lin-3 complex (in blue) in the closed conformation relative to that in the open conformation observed for the apo structure (in pink). The binding site residues within 5 Å of thedrug are labeled. The data shown is at 8.5 ns of the MD trajectory. Dinitroanilines Interact with ParasiticR-Tubulin Journal of Medicinal Chemistry, 2006, Vol. 49, No. 175229 Helix H7 is also connected to loop T7, which, along with helix H8, is involved in interdimer longitudinal contacts. Our results demonstrate that dinitroanilines lower the flexibility of the H1- S2 loop, and through its direct interaction with the ligand, the H1-S2 loop is pulled closer to the body of theR-tubulin, displacing it from the M-loop of theR-tubulin subunit in the neighboring protofilament. This result may in fact be com- pounded by the change in the flexibility of the M-loop that we observed. The combination of these two effects could lead to the disruption of lateral protofilament contacts and eventually cause MT disassembly. Recent studies indicate that a fewToxoplasmaR-tubulin mutations associated with resistance to1 are located on helix 7 and its immediate proximity, on the H1-S2 loop and on the â-sheet formed by strands S1, S4, and S5.16 Six of these 23 mutations (His28Gln, Leu136Phe, Ile235Val, Thr239Ile, Arg243Cys, and Arg243Ser) are of residues that have direct interactions with the ligand in the identified binding site. Furthermore, the Thr239Ile point mutation, which confers the highest resistance to1 (5 µM), is located within 3 Å of the nitro group oxygen. The effect of the other mutants appears to be primarily allosteric in nature. This premise is supported by the fact that although the Ser165Ala mutation resulted in 2µM resistance to1 and the Ser165Ala/Thr239Ile double mutant exhibited>90 µM resistance,16 Ser165 does not form a part of the dinitroaniline binding pocket and is located on strand S5 more than 10 Å away from Thr239. Interestingly, although dinitroanilines disrupt plant and protozoan MTs, they are ineffective against mammalian and fungal tubulin. It would seem logical that some insight into this selectivity could be obtained by simple sequence alignment (Supporting Information); however, mapping the conserved plant and protozoan residues on theR-tubulin structure shows that these residues mainly cluster around the interior of the tubulin molecule and do not distinctly define a binding site. Most notably, the binding site residues identified in this study are not specific to plant and protozoan tubulin, but many are in fact conserved in fungi and mammals as well. Just as we observe long-range allosteric effects due to dinitroaniline binding, it seems evident that these nonlocal sequence differences must likewise affect the conformation, dynamics, and/or access to the H1-S2 binding site. This hypothesis is supported by the fact that similar dynamics and docking studies using bovine R-tubulin resulted in a 50-fold weaker binding affinity and no consensus binding site, in agreement with experimental find- ings.16 Recently, Blume and co-workers27 analyzed the change in the surface electrostatic potential between wild-typeEl usine indica R-tubulin and the corresponding resistant Thr239Ile R-tubulin point mutant and proposed a binding site including residues Arg239, Arg243, Asp251, Val252, and Asn253 (and partly Cys4, His8, Leu136, and Phe138). Although some residues, namely, Cys4, Leu136, Thr239, and Arg243, of their active site overlap with the those determined in this study and the binding sites are in close proximity, the two predictions are distinct. In their study, the dinitroanilines bind at the dimer interface, whereas in this work, the active site is directly behind the H1-S2 loop and primarily destabilizes the lateral contacts between protofilaments, along with disrupting longitudinal interdimer contacts, leading to microtubule disassembly. One shortfall of current docking methodologies is that the response of the macromolecule to the bound ligand is not taken into account. In this case, we have overcome this difficulty by performing MD simulations of theR-tubulin-3 complex, allowing the protein and the drug to respond to each other. This results in a snug fit of the aromatic ring at theN1 nitrogen of the compound with theR-tubulin residues Phe24, His28, Met36, Asp39-Ile42, and Ser241-Asp245 that form the entrance of the binding pocket. The aromatic ring at theN1 nitrogen and the His28 imidazole ring also interact through ring-stacking interactions. Hydrogen bonding is also observed between the N1nitrogen of the sulfonamide and Ala240, as well as between the sulfonamide oxygen and Cys41 (on the H1-S2 loop). The nitro group oxygen atoms have electrostatic interactions with Glu3 (on strand S1), Thr239 (on helix H7), and Arg243 (on T7 loop), whereas hydrogen bonding is observed between the nitro group oxygen and Ala4. Interestingly, the second nitro group oxygen has no direct interactions withR-tubulin, but is linked by the solvent network to the salt-bridge between Asp47 and Arg64. Finally, unfavorable steric interactions between the ligand and the protein may occur if the hydrogen atoms of the phenyl ring are substituted either by a polar group larger than a hydroxyl group or by alkyl side chains, consistent with experimental observations.12 It is hopeful that these findings provide some insight for future SAR based design of dinitro- aniline analogues. Acknowledgment. We thank Naomi Morrissette and Karl Werbovetz for experimental data, many useful discussions, and comments on the manuscript. We acknowledge financial support from the National Institutes of Health Grant R01-AI-062021. Supporting Information Available: CompleteR-tubulin se- quence alignment and details of the free energy perturbation calculations. This material is available free of charge via the Internet at http://pubs.acs.org. References (1) Hugdahl, J. D.; Morejohn, L. C. Rapid and reversible high-affinity binding of the dinitroaniline herbicide oryzalin to tubulin from Zea mays L.Plant Physiol.1993, 102, 725-740. (2) Morejohn, L. C.; Bureau, T. E.; Mole-Bajer, J.; Bajer, A. S.; Fosket, D. E. Oryzalin, a dinitroaniline herbicide, binds to plant tubulin and inhibits microtubule polymerization in vitro.Planta1987, 172, 252- 264. Figure 6. Location of the dinitroaniline binding site within the microtubule lattice. This is a view from the inside of the microtubule and suggests that dinitroanilines may disrupt the interaction of the H1- S2 and M loops, leading to microtubule instability. 5230 Journal of Medicinal Chemistry, 2006, Vol. 49, No. 17 Mitra and Sept (3) Murthy, J. V.; Kim, H. H.; Hanesworth, V. R.; Hugdahl, J. D.; Morejohn, L. C. Competitive-inhibition of high-affinity oryzalin binding to plant tubulin by the phosphoric amide herbicide amipro- phos-methyl.Plant Physiol.1994, 105, 309-320. (4) Bogitsh, B. J.; Middleton, O. L.; Ribeiro-Rodrigues, R. Effects of the antitubulin drug trifluralin on the proliferation and metacyclo- genesis ofTrypanosoma cruziepimastigotes.Parasitol. Res.1999, 85, 475-480. (5) Traub-Cseko, Y. M.; Ramalho-Ortigao, J. M.; Dantas, A. P.; de Castro, S. L.; Barbosa, H. S.; Downing, K. H. Dinitroaniline herbicides against protozoan parasites: the case ofTrypanosoma cruzi. Trends Parasitol.2001, 17, 136-141. (6) Bhattacharya, G.; Salem, M. M.; Werbovetz, K. A. Antileishmanial dinitroaniline sulfonamides with activity against parasite tubulin. Bioorg Med. Chem. Lett.2002, 12, 2395-2398. (7) Chan, M. M.; Fong, D. Inhibition of leishmanias but not host macrophages by the antitubulin herbicide trifluralin.Science1990, 249, 924-926. (8) Chan, M. M.; Tzeng, J.; Emge, T. J.; Ho, C. T.; Fong, D. Structure- function analysis of antimicrotubule dinitroanilines against promas- tigotes of the parasitic protozoan Leishmania mexicana.A timicrob. Agents Chemother.1993, 37, 1909-1913. (9) Nath, J.; Schneider, I. Antimalarial effects of the antitubulin herbicide trifluralin: studies in Plasmodium-falciparum.Clin. Res.1992, 40, A331-A331. (10) Stokkermans, T. J.; Schwartzman, J. D.; Keenan, K.; Morrissette, N. S.; Tilney, L. G.; Roos, D. S. Inhibition of Toxoplasma gondii replication by dinitroaniline herbicides.Exp. Parasitol.1996, 84, 355-370. (11) Chan, M. M.; Triemer, R. E.; Fong, D. Effect of the anti-microtubule drug oryzalin on growth and differentiation of the parasitic protozoan Leishmania mexicana.Differentiation1991, 46, 15-21. (12) Bhattacharya, G.; Herman, J.; Delfin, D.; Salem, M. M.; Barszcz, T.; Mollet, M.; Riccio, G.; Brun, R.; Werbovetz, K. A. Synthesis and antitubulin activity of N1- and N4-substituted 3,5-dinitro sulfanilamides against African trypanosomes and Leishmania.J. Med. Chem.2004, 47, 1823-1832. (13) Werbovetz, K. A.; Sackett, D. L.; Delfin, D.; Bhattacharya, G.; Salem, M.; Obrzut, T.; Rattendi, D.; Bacchi, C. Selective antimicrotubule activity of N1-phenyl-3,5-dinitro-N4,N4-di-n-propylsulfanilamide (GB-II-5) against kinetoplastid parasites.Mol. Pharmacol.2003, 64, 1325-1333. (14) Fowler, R. E.; Fookes, R. E.; Lavin, F.; Bannister, L. H.; Mitchell, G. H. Microtubules in Plasmodium falciparum merozoites and their importance for invasion of erythrocytes.Parasitology1998, 117, 425-433. (15) Kaidoh, T.; Nath, J.; Fujioka, H.; Okoye, V.; Aikawa, M. Effect and localization of trifluralin in Plasmodium falciparum gametocytes: an electron microscopic study.J. Eukaryot. Microbiol.1995, 42, 61- 64. (16) Morrissette, N. S.; Mitra, A.; Sept, D.; Sibley, L. D. Dinitroanilines bind alpha-tubulin to disrupt microtubules.Mol. Biol. Cell2004, 15, 1960-1968. (17) Vriend, G. WHAT IF: a molecular modeling and drug design program.J. Mol. Graph.1990, 8, 52-56. (18) Kale, L.; Skeel, R.; Bhandarkar, M.; Brunner, R.; Gursoy, A.; Krawetz, N.; Phillips, J.; Shinozaki, A.; Varadarajan, K.; Schulten, K. NAMD2: Greater scalability for parallel molecular dynamics.J. Comput. Phys.1999, 151, 283-312. (19) Morris, G. M.; Goodsell, D. S.; Halliday, R. S.; Huey, R.; Hart, W. E.; Belew, R. K.; Olson, A. J. Automated docking using a Lamarckian genetic algorithm and empirical binding free energy function.J. Comput. Chem.1998, 19, 1639-1662. (20) Mitra, A.; Sept, D. Localization of the antimitotic peptide and depsipeptide binding site on beta-tubulin.Biochemistry2004, 43, 13955-13962. (21) Allinger, N. L.; Yuh, Y. H.; Lii, J. H. Molecular mechanics: the Mm3 force-field for hydrocarbons 0.1.J Am. Chem. Soc.1989, 111, 8551-8566. (22) MacKerell, A. D.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M. All-atom empirical potential for molecular modeling and dynamics studies of proteins.J. Phys. Chem. B1998, 102, 3586-3616. (23) Feller, S. E.; MacKerell, A. D. An improved empirical potential energy function for molecular simulations of phospholipids.J. Phys. Chem. B2000, 104, 7510-7515. (24) Pearson, W. R. Rapid and sensitive sequence comparison with Fastp and Fasta.Methods Enzymol.1990, 183, 63-98. (25) Berman, J. D. Human leishmaniasis: clinical, diagnostic, and chemotherapeutic developments in the last 10 years.Clin. Infect. Dis. 1997, 24, 684-703. (26) Amos, L. A. Microtubule structure and its stabilisation.Org. Biomol. Chem.2004, 2, 2153-2160. (27) Blume, Y. B.; Nyporko, A. Y.; Yemets, A. I.; Baird, W. V. Structural modeling of the interaction of plant alpha-tubulin with dinitroaniline and phosphoroamidate herbicides.Cell. Biol. Int. 2003, 27, 171- 174. (28) Humphrey, W.; Dalke, A.; Schulten, K. VMD: Visual molecular dynamics.J. Mol. Graphics1996, 14, 33-38. JM060472+ Dinitroanilines Interact with ParasiticR-Tubulin Journal of Medicinal Chemistry, 2006, Vol. 49, No. 175231 Scalable Molecular Dynamics with NAMD JAMES C. PHILLIPS,1 ROSEMARY BRAUN,1 WEI WANG,1 JAMES GUMBART,1 EMAD TAJKHORSHID,1 ELIZABETH VILLA,1 CHRISTOPHE CHIPOT,2 ROBERT D. SKEEL,3 LAXMIKANT KALÉ,3 KLAUS SCHULTEN1 1Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801 2UMR CNRS/UHP 7565, Université Henri Poincaré, 54506 Vandœuvre-lès-Nancy, Cedex, France 3Department of Computer Science and Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801 Received 14 December 2004; Accepted 26 May 2005 DOI 10.1002/jcc.20289 Published online in Wiley InterScience (www.interscience.wiley.com). Abstract: NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomo- lecular systems. NAMD scales to hundreds of processors on high-end parallel platforms, as well as tens of processors on low-cost commodity clusters, and also runs on individual desktop and laptop computers. NAMD works with AMBER and CHARMM potential functions, parameters, and file formats. This article, directed to novices as well as experts, first introduces concepts and methods used in the NAMD program, describing the classical molecular dynamics force field, equations of motion, and integration methods along with the efficient electrostatics evaluation algorithms employed and temperature and pressure controls used. Features for steering the simulation across barriers and for calculating both alchemical and conformational free energy differences are presented. The motivations for and a roadmap to the internal design of NAMD, implemented in C and based on Charm parallel objects, are outlined. The factors affecting the serial and parallel performance of a simulation are discussed. Finally, typical NAMD use is illustrated with representative applications to a small, a medium, and a large biomolecular system, highlighting particular features of NAMD, for example, the Tcl scripting language. The article also provides a list of the key features of NAMD and discusses the benefits of combining NAMD with the molecular graphics/sequence analysis software VMD and the grid computing/collaboratory software BioCoRE. NAMD is distributed free of charge with source code at www.ks.uiuc.edu. © 2005 Wiley Periodicals, Inc. J Comput Chem 26: 1781-1802, 2005 Key words: biomolecular simulation; molecular dynamics; parallel computing Introduction The life sciences enjoy today an ever-increasing wealth of infor- mation on the molecular foundation of living cells as new se- quences and atomic resolution structures are deposited in data- bases. Although originally the domain of experts, sequence data can now be mined with relative ease through advanced bioinfor- matics and analysis tools. Likewise, the molecular dynamics pro- gram NAMD,1 together with its sister molecular graphics program VMD,2 seeks to bring easy-to-use tools that mine structure infor- mation to all who may benefit, from the computational expert to the laboratory experimentalist. The increase in our knowledge of structures is not as dramatic as that of sequences, yet the number of newly deposited structures reached a record 5000 last year. For example, the structures of pharmacologically crucial membrane proteins, which were essen- tially unknown 10 years ago, are being resolved today at a rapid pace. Structures yield static information that can be viewed with molecular graphics software such as VMD. However, structures also hold the key to dynamic information that leads to understand- ing function and mechanism, intellectual guideposts for basic medical research. With NAMD we want to simplify access to dynamic information extrapolated from structures and provide a molecular modeling tool that can be used productively by a wide Correspondence to: K. Schulten; e-mail: kschulte@ks.uiuc.edu Contract/grant sponsor: National Institutes of Health; contract/grant number: NIH P41 RR05969 Contract/grant sponsor: Pittsburgh Supercomputer Center and the National Center for Supercomputing Applications (for supercomputer time); contract/grant number: National Resources Allocation Committee MCA93S028. © 2005 Wiley Periodicals, Inc. group of biomedical researchers, including particularly experimen- talists. On a molecular scale, the fundamental processing units in the living cell are often huge in size and function in an even larger complex environment. Striking progress has been achieved in characterizing the immense machines of the cell, such as the ribosome, at the atomic level. Advances in biomed- icine demand tools to model these machines to understand their function and their role in maintaining the health of cells. Ac- cordingly, the purpose of NAMD is to enable high-performance classical simulation of biomolecules in realistic environments of 100,000 atoms or more. The progress made in this regard is illustrated in Figure 1. A decade ago in its first release,3,4 NAMD permitted simulation of a protein-DNA complex en- compassing 36,000 atoms,5 one of the largest simulations car- ried out at the time. The most recent release permitted the simulation of a protein-DNA complex of 314,000 atoms.6 To probe the behavior of this 10-fold larger system, the simulated period actually increased 100-fold as well. The following article on NAMD is directed to novices and experts alike. It explains the concepts and algorithms underlying modern molecular dynamics (MD) simulations as realized in NAMD, for example, the efficient numerical integration of the Newtonian equations of motion, the use of statistical mechanics methods for simulations that control temperature and pressure, the efficient evaluation of electrostatic forces through the particle- mesh Ewald method, the use of steered and interactive MD to manipulate and probe biomolecular systems and to speed up reac- tion processes, and the calculation of alchemical and conforma- tional free energy differences. The article describes the design of NAMD and the motivation behind the design, that is, to permit continuous software develop- ment in view of ever-changing technologies, to utilize parallel computers of any size effectively via message driven computation, to run well on platforms from laptops and desktops-where NAMD is actually used most-to parallel computers with thou- sands of processors. The article also demonstrates how the user can readily extend NAMD through the Tcl scripting language and elaborates on the strengths of NAMD in steered and interactive MD. To demonstrate the broad utility of NAMD, the article intro- duces three typical applications as they are encountered in modern research, involving a small, an intermediate, and a large biomo- lecular system. We emphasize which features of NAMD were used and which were most helpful in completing the three modeling projects expeditiously. We also refer readers to tutorial material (available at http://www.ks.uiuc.edu/Training/Tutorials/) that has been proven helpful in NAMD training workshops and university courses. In fact, the first application presented below ubiquitin stems directly from the introductory NAMD tutorial. Other tuto- rials-for which a laptop provides sufficient computational power-introduce steered MD and simulations of membrane chan- nels, as well as the use of VMD in trajectory analysis. Finally, a conclusion section summarizes the features of NAMD that have proven to be most relevant to nonexpert as well as expert users and describes the great benefits that NAMD gains from its close link to the widely used molecular graphics program VMD, which permits model building as well as trajectory analysis. The integration of NAMD into the grid software BioCoRE is also mentioned. Molecular Dynamics Concepts and Algorithms In this section we outline concepts and algorithms of classical MD simulations. In these simulations the atoms of a biopolymer move according to the Newtonian equations of motion mr̈ r Utotalr1, r2, . . . , rN, 1, 2 . . . N, (1) where m is the mass of atom , r is its position, and Utotal is the total potential energy that depends on all atomic positions and, thereby, couples the motion of atoms. The potential energy, rep- resented through the MD “force field,” is the most crucial part of the simulation because it must faithfully represent the interaction between atoms, yet be cast in the form of a simple mathematical function that can be calculated quickly. Below we introduce first the functional form of the force field utilized in NAMD. We then comment on the special problem of calculating the Coulomb potential and forces efficiently. The nu- merical integration of (1) is then explained, followed by an outline of simulation strategies for controlling temperature and pressure. In the case of such simulations, frictional and fluctuating forces are added to (1) following the principles of nonequilibrium statistical mechanics. Finally, we describe how external, user-defined forces are added to simulations to manipulate and probe biomolecular systems. Force Field Functions For an all-atom MD simulation, one assumes that every atom experiences a force specified by a model force field accounting for Figure 1. Growth in practical simulation size illustrated by compar- ison of estrogen receptor DNA-binding domain simulation of ref. 5, left, 36,000 atoms simulated over 100 ps, published in 1997, with multiscale LacI-DNA complex simulation of ref. 6, right, 314,000 atoms simulated over 10 ns, published in 2005 and described in the exemplary applications section. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.] 1782 Phillips et al. • Vol. 26, No. 16 • Journal of Computational Chemistry the interaction of that atom with the rest of the system. Today, such force fields present a good compromise between accuracy and computational efficiency. NAMD employs a common potential energy function that has the following contributions: Utotal Ubond Uangle Udihedral UvdW UCoulomb. (2) The first three terms describe the stretching, bending, and torsional bonded interactions, Ubond bonds i ki bondri r0i 2, (3) Uangle angles i ki anglei 0i 2, (4) Udihedral dihedral i kidihe1 cosnii i, ni 0kidihe0i i2 n 0, (5) where bonds counts each covalent bond in the system, angles are the angles between each pair of covalent bonds sharing a single atom at the vertex, and dihedral describes atom pairs separated by exactly three covalent bonds with the central bond subject to the torsion angle (Fig. 2). An “improper” dihedral term governing the geometry of four planar, covalently bonded atoms is also included as depicted in Figure 2. The final two terms in eq. (2) describe interactions between nonbonded atom pairs: UvdW i ji 4 ij ijrij 12 ijrij 6, (6) UCoulomb i ji qiqj 4 0rij , (7) which correspond to the van der Waal’s forces (approximated by a Lennard-Jones 6-12 potential) and electrostatic interactions, respectively. In addition to the intrinsic potential described by the force field, NAMD also provides the user with the ability to apply external forces to the system. These forces may be used to guide the system into configurations of interest, as in steered and interactive MD (described below). For every particle in a given context of bonds, the parameters ki bond, r0i, etc., for the interactions given in eqs. (3)-(5) are laid out in force field parameter files. The determination of these parame- ters is a significant undertaking generally accomplished through a combination of empirical techniques and quantum mechanical calculations;7-9 the force field is then tested for fidelity in repro- ducing the structural, dynamic, and thermodynamic properties of small molecules that have been well-characterized experimentally, as well as for fidelity in reproducing bulk properties. NAMD is able to use the parameterizations from both CHARMM7 and AMBER10 force field specifications. To avoid surface effects at the boundary of the simulated system, periodic boundary conditions are often used in MD sim- ulations; the particles are enclosed in a cell that is replicated to infinity by periodic translations. A particle that leaves the cell on one side is replaced by a copy entering the cell on the opposite side, and each particle is subject to the potential from all other particles in the system including images in the surrounding cells, thus entirely eliminating surface effects (but not finite-size effects). Because every cell is an identical copy of all the others, all the image particles move together and need only be represented once inside the molecular dynamics code. However, because the van der Waals and electrostatic interac- tions [eqs. (6) and (7)] exist between every nonbonded pair of atoms in the system (including those in neighboring cells) com- puting the long-range interaction exactly is unfeasible. To perform this computation in NAMD, the van der Waals interaction is spatially truncated at a user-specified cutoff distance. For a simu- lation using periodic boundary conditions, the system periodicity is exploited to compute the full (nontruncated) electrostatic interac- tion with minimal additional cost using the particle-mesh Ewald (PME) method described in the next section. Full Electrostatic Computation Ewald summation11 is a description of the long-range electrostatic interactions for a spatially limited system with periodic boundary Figure 2. Internal coordinates for bonded interactions: r governs bond stretching; represents the bond angle term; gives the dihedral angle; the small out-of-plane angle is governed by the so-called “improper” dihedral angle . Scalable Molecular Dynamics with NAMD 1783 conditions. The infinite sum of charge-charge interactions for a charge-neutral system is conditionally convergent, meaning that the result of the summation depends on the order in which it is taken. Ewald summation specifies the order as follows:12 sum over each box first, then sum over spheres of boxes of increasingly larger radii. Ewald summation is considered more reliable than a cutoff scheme,13-15 although it is noted that the artificial period- icity can lead to bias in free energy,16,17 and can artificially stabilize a protein that should have unfolded quickly.18 Dropping the prefactor 1/4 , the Ewald sum involves the following terms: EEwald Edir Erec Eself Esurface, (8) Edir 1 2 i, j1 N qiqj n r erfcri rj nr ri rj nr i, jExcluded qiqj ri rj ij , (9) Erec 1 2 V m 0 exp 2m 2/2 m 2 i1 N qiexp(2 im ri)2, (10) Eself i1 N qi 2, (11) Esurface 2 2 s 1V i1 N qiri2. (12) Here, qi and ri are the charge and position of atom i, respectively, and n r is the lattice vector. For an arbitrary simulation box defined by three independent base vectors a1, a2, a3, one defines n r n1a1 n2a2 n3a3, where n1, n2, and n3 are integers. ¥ denotes a summation over n r that excludes the n r 0 term in the case i j; “excluded” denotes the set of atom pairs whose direct electrostatic interaction should be excluded. ij denotes the lattice vector for the (i, j) pair that minimizes ri rj ij. is a parameter adjusting the workload distribution for direct and recip- rocal sums. s is the dielectric constant of the surroundings of the simulation box, which is water for most biomolecular systems. V is the volume of the simulation box. m is the reciprocal vector defined as m m1b1 m2b2 m3b3, where m1, m2, m3 are integers, and the reciprocal base vectors b1, b2, b3 are defined so that a b , , 1, 2, 3. (13) The complementary error function erfc( x) in eq. (9) is erfcx 2 x et 2 dt. (14) The Ewald sum in eq. (8) has four terms: direct sum Edir, reciprocal sum Erec, self-energy Eself, and surface energy Esurface. The self-energy term is a trivial constant, while the surface term is usually neglected by assuming the “tin-foil” boundary condition s , which is partly due to the dielectric constant of water ( s 80) being much larger than 1. The Ewald sum has a freely chosen parameter , which can shift the computational load be- tween the direct sum and the reciprocal sum. For a given accuracy, the computationally optimal choice of the parameter leads to a cost proportional to N3/ 2 in the standard computation, where N is the number of charges in the system.19 The particle-mesh Ewald (PME)20 method is a fast numerical method to compute the Ewald sum. NAMD uses the smooth PME (SPME)21 method for full electrostatic computations. The cost of PME is proportional to N log N and the time reduction is signif- icant even for a small system of several hundred atoms. In PME, the parameter is chosen so that the major work load is put into the reciprocal sum while the direct sum is computed by a cost proportional to N only. The reciprocal sum is then computed via fast Fourier transform (FFT) after an approximation is made to delegate the computation to a grid scheme. For this purpose, PME uses an interpolation scheme to distribute the charges, sitting anywhere in real space, to the nodes of a uniform grid as illustrated in Figure 3. SPME uses B-spline functions as the basis functions for the interpolation. The continuity of B-spline functions and their derivatives makes the analytical expression of forces easy to obtain, and reduces the number of FFTs by half compared to the original PME method. In SPME, approximations are made to the potential only; the force is still the exact derivative of the approx- imated potential. The strict conservation of energy resulting from the computed force is crucial and is strongly assisted by maintain- ing the symplecticness of the integrator,22 as discussed further below. The PME method can be adopted to compute the electrostatic potential in real space and has been implemented in VMD (see Fig. 4). The feature has been used recently in a ground-breaking sim- ulation to compute transmembrane electrostatic potentials aver- aged over an MD trajectory.23 This feature makes it possible today to compute average electrostatic potentials from first principles and to replace previously used heuristic potentials like those de- rived from Poisson-Boltzmann theory. The PME method does not conserve energy and momentum simultaneously, but neither does the particle-particle/particle-mesh method24 or the multilevel summation method.25,26 Momentum conservation can be enforced by subtracting the net force from the reciprocal sum computation, albeit at the cost of a small long-time energy drift. Numerical Integration Biomolecular simulations often require millions of time steps. Furthermore, biological systems are chaotic; trajectories start- ing from slightly different initial conditions diverge exponen- tially fast and after a few picoseconds are completely uncorre- lated. However, highly accurate trajectories is not normally a goal for biomolecular simulations; more important is a proper sampling of phase space. Therefore, for constant energy (NVE ensemble) simulations, the key features of an integrator are not 1784 Phillips et al. • Vol. 26, No. 16 • Journal of Computational Chemistry only how accurate it is locally, but also how efficient it is, and how well it preserves the fundamental dynamical properties, such as energy, momentum, time-reversibility, and symplectic- ness. The time evolution of a strict Hamiltonian system is symplec- tic.22 A consequence of this is the conservation of phase space volume along the trajectory, that is, the enforcement of the Li- ouville theorem (ref. 27, p. 69). To a large extent, the trajectories computed by numerical integrators observing symplecticness rep- resent the solution of a closely related problem that is still Ham- iltonian.28 Because of this, the errors, unavoidably generated by an integrator at each time step, accumulate imperceptibly slowly, resulting in a very small long-time energy drift, if there is any at all (ref. 22, theorem IX.8.1). Artificial measures to conserve en- ergy, for example, scaling the velocity at each time step so that the total energy is constant, lead to biased phase space sampling of the constant energy surface;29 in contrast, there has been no evidence that symplectic integrators have this problem. A simple example demonstrates the merit of a symplectic integrator. For this purpose, the one-dimensional harmonic oscil- lator problem has been numerically integrated, the resulting tra- jectory being shown in Figure 5. We note that the comparison is “unfair” to the symplectic method with respect to both accuracy ( t2 local error for the symplectic method vs. t5 for the Runge- Kutta method, where t is the time step), and computational effort (single force evaluation per time step for the symplectic method vs. four force evaluations for the Runge-Kutta method). Nevertheless, the symplectic method shows superior long-time stability. NAMD uses the Verlet method (ref. 31, section 4.2.3) for NVE ensemble simulations. The “velocity-Verlet” method obtains the position and velocity at the next time step (rn1, vn1) from the current one (rn, vn), assuming the force Fn F(rn) is already computed, in the following way: “half-kick” vn1/ 2 vn M1Fn t/2, “drift” rn1 rn vn1/ 2 t, “compute force” Fn1 Frn1, “half-kick” vn1 vn1/ 2 M1Fn1 t/2. Here, M is the mass. The Verlet method is symplectic and time reversible, conserves linear and angular momentum, and requires only one force evaluation for each time step. For a fixed time period, the method exhibits a (global) error proportional to t2. More accurate (higher order) methods are desirable if they can increase the time step per force evaluation. Higher order Runge- Kutta type methods, symplectic or not, are not suitable for biomo- lecular simulations because they require several force evaluations for each time step and force evaluation is by far the most time- consuming task in molecular dynamics simulations. Gear type predictor-corrector methods,32 or linear multistep methods in gen- eral, are not symplectic (ref. 22, theorem XIV.3.1). No symplectic method has been found as yet that is both more accurate than the Verlet method and as practical for biomolecular simulations. NAMD employs a multiple-time-stepping13,33,34 method to improve integration efficiency. Because the biomolecular interac- tions collected in eq. (2) generally give rise to several different time scales characteristic for biomolecular dynamics, it is natural to compute the slower-varying forces less frequently than faster varying ones in molecular dynamics simulations. This idea is implemented in NAMD by three levels of integration loops. The inner loop uses only bonded forces to advance the system, the middle loop uses Lennard-Jones and short-range electrostatic forces, and the outer loop uses long-range electrostatic forces. We note that the method implemented in NAMD is symplectic and time reversible. The longest time step for the multiple time-stepping method is limited by resonance.35 When good energy conservation is needed for NVE ensemble simulations we recommend choosing 2 fs, 2 fs, and 4 fs as the inner, middle, and outer time steps if rigid bonds to hydrogen atoms are used; or 1 fs, 1 fs, and 3 fs if bonds to Figure 3. In PME, a charge (denoted by an empty circle with label “q” in the figure) is distributed over grid (here a mesh in two dimen- sions) points with weighting functions chosen according to the dis- tance of the respective grid points to the location of the charge. Positioning all charges on a grid enables the application of the FFT method and significantly reduces the computation time. In real appli- cations, the grid is three-dimensional. Figure 4. Smoothed electrostatic potential of decalanine in vacuum as calculated with the PME plugin of VMD. Atoms are colored by charge (blue is positive, red is negative). The helix dipole is clearly visible from the two potential isosurfaces 20kBT/e (blue, left lobe) and 20kBT/e (red, right lobe). [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.] Scalable Molecular Dynamics with NAMD 1785 hydrogen are flexible.36 More aggressive time steps may be used for NVT or NPT ensemble simulations, for example, 2 fs, 2 fs, and 6 fs with rigid bonds and 1 fs, 2 fs, and 4 fs without. Using multiple time stepping can increase computational effi- ciency by a factor of 2. NVT and NPT Ensemble Simulations A fundamental requirement for an integrator is to generate the correct ensemble distribution for the specified temperature and pressure in an appropriate way. For this purpose the Newtonian equations of motion (1) should be modified “mildly” so that the computed short-time trajectory can still be interpreted in a con- ventional way. To generate the correct ensemble distribution, the system is coupled to a reservoir, with the coupling being either deterministic or stochastic. Deterministic couplings generally have some conserved quantities (similar to total energy), the monitoring of which can provide some confidence in the simulation. NAMD uses a stochastic coupling approach because it is easier to imple- ment and the friction terms tend to enhance the dynamical stability. The (stochastic) Langevin equation37 is used in NAMD to generate the Boltzmann distribution for canonical (NVT) ensemble simulations. The generic Langevin equation is Mv̇ Fr v 2kBTM Rt, (15) where M is the mass, v ṙ is the velocity, F is the force, r is the position, is the friction coefficient, kB is the Boltzmann constant, T is the temperature, and R(t) is a univariate Gaussian random process. Coupling to the reservoir is modeled by adding the fluc- tuating (the last term) and dissipative (v term) forces to the Newtonian equations of motion (1). To integrate the Langevin equation, NAMD uses the Brünger-Brooks-Karplus (BBK) meth- od,38 a natural extension of the Verlet method for the Langevin equation. The position recurrence relation of the BBK method is rn1 rn 1 t/ 2 1 t/ 2 rn rn1 1 1 t/ 2 t2M1F(rn) 2kBT M Zn , (16) where Zn is a set of Gaussian random variables of zero mean and variance 1. The BBK integrator requires only one random number for each degree of freedom. The steady-state distribution generated by the BBK method has an error proportional to t2,39 although the error in the time correlation function can have an error pro- portional to t.40 For stochastic equations of motion, position and velocity be- come random variables while the time evolution of the correspond- ing probability distribution function is described by the Fokker- Planck equation (ref. 37, section 2.4), a deterministic partial differential equation. The stochastic equations of motion are de- signed so that the time-independent solution to the Fokker-Planck equation is the targeted ensemble distribution. The relationship between the Langevin equation and the associated Fokker-Planck equation has been invoked, for example, in ref. 41. The theory behind deterministic coupling methods is similar, with the Li- ouville equation playing the pivotal role.42 For NPT ensemble simulations, one of the authors (J.P.) pro- posed a new set of equations of motion and implemented a nu- merical integrator in NAMD (ref. 43, pressure control section). It was inspired by the Langevin-piston method44 and Hoover’s meth- od45-47 for constant pressure simulations. A recent work proposed essentially the same set of equations [the “Langevin-Hoover” method given by eqs. (5a)-(5d) in ref. 48], and proved that they generate the correct ensemble distribution. The only difference between the two is the term 1 d/Nf in eq. (5b) and (5d) of ref. 48, where d is the dimension (generally 3), and Nf is the number of degrees of freedom. The term d/Nf is negligible for most purposes. Steered and Interactive Molecular Dynamics Biologically important events often involve transitions from one equilibrium state to another, such as the binding or dissociation of a ligand. However, processes involving the rare event of barrier crossing are difficult to reproduce on MD time scales, which today are still confined to the order of tens of nanoseconds. To address this issue, the application of external forces may be used to guide the system from one state to another, enhancing sampling along the pathway of interest. Recent applications of single-molecule exper- imental techniques (such as atomic force microscopy and optical tweezers) have enhanced our understanding of the mechanical properties of biopolymers. Steered molecular dynamics (SMD) is the in silico complement of such studies, in which external forces are applied to molecules in a simulation to probe their mechanical properties as well as to accelerate processes that are otherwise too slow to model. This method has been reviewed in refs. 49-51. Figure 5. A symplectic method demonstrates superior long-time sta- bility: the symplectic method used here is the symplectic Euler method ([22], Theorem 3.3), whose local error is proportional to t2; the nonsymplectic method used is the Runge-Kutta method ([30], sect. 8.3.3) whose local error is proportional to t5. The system described is a one-dimension harmonic oscillator. The equation of motion is mẍ m2x, with m 1, and initial conditions x 1, v 0. The exact trajectory is the unit circle. The chosen time step is t 0.5 and 10,000 steps were integrated. The trajectory of the Runge- Kutta method collapses toward the center while the symplectic Euler method maintains a stable orbit even though its trajectory is deformed into an ellipse by a larger local error. 1786 Phillips et al. • Vol. 26, No. 16 • Journal of Computational Chemistry With advances in available computer power, steering forces can also be applied interactively, instead of in batch mode; we call this technique Interactive Molecular Dynamics (IMD).52,53 External forces have been applied using NAMD in a variety of ways to a diverse set of systems, yielding new information about the me- chanics of proteins,54 for instance in refs. 6, 55-67 and other studies reviewed in ref. 51. We expect that most molecular dy- namics simulations in the future will be of the steered type. This expectation stems from an analogy to experimental biophysics: although many experiments provide unaltered images of biological systems, more experiments investigate systems through well-de- signed perturbations by physical or chemical means. SMD Steered MD may be carried out with either a constant force applied to an atom (or set of atoms) or by attaching a harmonic (spring- like) restraint to one or more atoms in the system and then varying either the stiffness of the restraint67 or the position of the re- straint68-70 to pull the atoms along. Other external forces or potentials can also be used, such as constant forces or torques applied to parts of the system to induce rotational motion of its domains. NAMD provides built-in facilities for applying a variety of external forces, including the automated application of moving constraints. In SMD, the direction of the applied force is chosen in advance, specified through a few simple lines in an NAMD con- figuration file. More flexible force schemes can be realized within NAMD through scripting. As a computational technique, SMD bears similarities to the method of umbrella sampling,71,72 which also seeks to improve the sampling of a particular degree of freedom in a biomolecular system; however, while umbrella sampling requires a series of equilibrium simulations, SMD simulations apply a constant or time-varying force that results in significant deviations from equi- librium. In consequence, the results of the SMD dynamics have to be analyzed from an explicitly nonequilibrium viewpoint.54 SMD also permits new types of simulations that are more naturally performed and understood as out-of-equilibrium processes. In constant-force SMD, the atoms to which the force is applied are subject to a fixed, constant force in addition to the force field potential. The affected atoms are specified through a flag in the molecular coordinates (PDB) file, and the force vector is specified in the NAMD configuration. Intermediates found through con- stant-force SMD simulations may be modeled using the theory of mean first passage times for a barrier-crossing event.73,74 Typical applied forces range from tens to a thousand picoNewtons (pN).75 Constant velocity SMD simulates the action of a moving AFM cantilever on a protein. An atom of the protein, or the center of mass of a group of atoms, is harmonically restrained to a point in space that is then shifted in a chosen direction at a predetermined constant velocity, forcing the restrained atoms to follow (Fig. 6). By default, the SMD harmonic restraint in NAMD only applies along the direction of motion of the restraint point, such that the atoms are unrestrained along orthogonal vectors; it is possible, however, to apply additional restraints. As with constant force SMD, the affected atoms are specified through a flag in the PDB file; the force constant of the restraint and the velocity of the restraint point are specified in the NAMD configuration file. For a group of atoms harmonically restrained with a force constant k moving with velocity v in the direction n , the additional potential Ur1, r2, . . . , t 1 2 kvt R t R 0 n 2 (17) is applied, where R (t) is the current center of mass of the SMD atoms and R 0 is their initial center of mass. n is a unit vector. In AFM experiments, the spring constants k of the cantilevers are typically of the order of 1 pN/Å, so that thermal fluctuations in the position of an attached ligand, (kBT/k) 1/ 2, are large on the atomic scale, for example, 6 Å. However, in SMD simulations stiffer springs (k 70 pN/Å) are employed, leading to more detailed information about interaction energies as well as finer spatial resolution. However, due to limitations in attainable computational speeds, simulations cover time scales that are typically 105 times shorter than those of AFM experiments, necessitating high pulling velocities on the order of 1 Å/ps. As a result, a large amount of irreversible work is performed during SMD simulations, which needs to be discounted to obtain equilibrium information. A proof that equilibrium properties of a system can be deduced from nonequilibrium simulations was given by Jarzynski.76,77 The second law of thermodynamics states that the average work W done through a nonequilibrium process that changes a parameter of a system from 0 at time zero to t at time t is greater than or equal to the equilibrium free energy difference between the two states specified through the final and initial values of : F Ft F0 W, (18) where the equality holds only if the process is quasi-static. Jar- zynski76 discovered an equality that holds regardless of the speed of the process: e F eW, (19) where (kBT) 1. The Jarzynski equality provides a way to extract equilibrium information, such as free energy differences, from averaging over nonequilibrium processes,76 a method that has been tested against computer simulations77 and experiments.78 A major difficulty that arises with the application of eq. (19) is that the average of exponential work appearing in Jarzynski’s equality is dominated by trajectories corresponding to small work values that arise only rarely. Hence, an accurate estimate of free energy requires suitable sampling of such rare trajectories, and thus the accuracy of the method is limited by statistical error. Cumulant expansions41,62,76,79 are an effective approximation for the exponential average; because the lower order terms of the expansion are less influenced by statistical error, the systematic error introduced by truncating the higher order terms may be considerably smaller than the statistical error that which would be introduced by including them. It can be shown41 that in the relevant regime of steering by means of stiff springs, the work on the system is Gaussian-distributed regardless of the speed of the process simulated and the cumulant expansion of Jarzynski’s equality can safely be terminated at second order.80 Scalable Molecular Dynamics with NAMD 1787 Application of the Jarzynski identity is comparable in effi- ciency to the umbrella sampling method.81 However, the analysis involved in the SMD method is simpler than that involved in umbrella sampling in which one needs to solve coupled nonlinear equations for the weighted histogram analysis method.31,54,82,83 In addition, the application of the Jarzynski identity has the advantage of uniform sampling of a reaction coordinate. Whereas in umbrella sampling a reaction coordinate is locally sampled nonuniformly proportional to the Boltzmann weight, in SMD a reaction coordi- nate follows a guiding potential that moves with a constant veloc- ity and, hence, is sampled almost uniformly (computing time is uniformly distributed over the given region of the reaction coor- dinate). This is particularly beneficial when the potential of mean force (essentially, the free energy profile along the reaction coor- dinate) contains narrow barrier regions as in ref. 62. In such cases, a successful application of umbrella sampling depends on an optimal choice of biasing potentials, whereas nonequilibrium ther- modynamic integration appears to be more robust.31 However, umbrella sampling is a general method that can be applied to a variety of reaction coordinates, including complex ones like those involved in large conformational changes in proteins.84 NAMD also provides the facility for the user to apply other types of external forces to a system. In a technique related to SMD, torques may be applied to induce the rotation of a protein domain. As with SMD, this is implemented in NAMD through a simple specification in the configuration file and does not require addi- tional programming on the part of the researcher. This technique has already been successfully used to study the rotation of the Fo domain of ATPase.85 The application of more sophisticated exter- nal forces are readily implemented through the NAMD “Tcl forces” interface, which allows the user to specify position- and time-dependent forces to be computed at each time step. This technique has recently been used to mimic the effect of membrane surface tension on a mechanosensitive channel59 and to model the interaction of the lac repressor protein (modeled in atomic-level detail) with DNA described by an elastic rod that exerts forces on the protein.6,86 IMD By using NAMD in conjunction with the molecular graphics software VMD, steering forces can be applied in an interactive manner, rather than only in batch mode.52 For this purpose, VMD is linked to a NAMD simulation running on the same machine or a remote cluster. This arrangement permits an investigator to view a running simulation and to apply forces based on real-time infor- mation about the progress of the simulation (such as visual infor- mation or force feedback through a haptic device). The researcher is then able to explore the mechanical properties of the system in a direct, hands-on manner, using her scientific intuition to guide the simulation via a mouse or haptic device. This method has already been used in biomolecular research, for instance, to study the selectivity and regulation of the membrane channel protein GlpF and the enzyme glycerol kinase.53 Setting up an IMD sim- ulation is a straight-forward process that can be done on any computer. The IMD haptic interface52 consists of three primary compo- nents: a haptic device to provide translational and orientational input as well as force feedback to the user’s hand; NAMD to calculate the effect of applied forces via molecular dynamics; and VMD to display results (Fig. 7). VMD communicates with the haptic device via a server87 that controls the haptic environment experienced by the user, as described in ref. 52. The scheme of splitting the haptic, visualization, and simulation components into three communicating, asynchronous processes has been employed successfully,52,88 and permits all three components to run at top speed, maximizing the responsiveness of the system. IMD requires Figure 6. Constant velocity pulling in a one-dimensional case. The dummy atom is colored red, and the SMD atom blue. As the dummy atom moves at constant velocity, the SMD atom experiences a force that depends linearly on the distance between both atoms. [Color figure can be viewed in the online issue, which is available at www. interscience.wiley.com.] Figure 7. In IMD, the user applies forces to atoms in the simulation via a force-feedback haptic device. 1788 Phillips et al. • Vol. 26, No. 16 • Journal of Computational Chemistry efficient network communication between the visualization front- end and the MD back-end. Although the network bandwidth re- quirements for performing IMD are quite low relative to the computational demands, latency is a major concern as it has a direct impact on the responsiveness of the system. IMD uses custom networking code in NAMD and VMD to transfer atomic coordinates and steering forces efficiently. To make molecular motion as described by MD perceptible to the IMD user through the haptic device, the quantities arising in the generic equation of motion governing the molecular response (represented below by Roman characters) and the haptic response (denoted by Greek characters), m d2x dt2 f, d2 d2 (20) must be related through suitable scaling factors. Molecular motion probed is typically extended over distances of x 1 Å, molecular time scales covered are typically t 1 ps, and external forces acting on molecular moieties should not exceed f 1 nN so as not to overwhelm inherent molecular forces. By contrast, the haptic device is characterized by length resolution of 1 cm and can generate a force of 1 N or more; t 1 ps of dynamics requires 1 s or more to compute. The interface between the haptic device and NAMD thus introduces the scaling factors Sx /x 10 8, St /t 10 12, Sf /f 10 9, (21) Multiplying the molecular equation of motion in eq. (20) by the factor SfSx/St 2 gives SfSx St 2 m d2x dt2 Sfm d2 d2 SfSx St 2 f Sx St 2 . (22) From this we can conclude SfSt 2 Sx m d2 d2 . (23) Comparison with the haptic equation of motion in eq. (20) suggests that the effective mass felt by the haptic device, and hence, by the user is SfSt 2 Sx m. (24) The molecular moieties to be moved through external forces have typical masses of (e.g., for glycerol moved through a membrane channel53) of m 180 amu or m 3 1025 kg. From eq. (21) we conclude then that the effective mass felt by the user through the haptic device is 3 kg; the user does not sense strong inertial effects, and can readily manipulate the biomolecular system. IMD can also be carried out without force feedback, using a standard mouse to steer the simulated system. To assist users of NAMD with IMD, AutoIMD89 has been developed. AutoIMD permits the researcher to use the graphical interface provided by VMD to run an MD simulation based on a selection of atoms. The simulation can then be visualized in real time in the VMD graphics window. Forces may be applied with either a mouse or a haptic device by the user (as described above), or statically as in traditional SMD. Rather than carrying out a simulation of the entire molecule, AutoIMD performs a rapid MD simulation by dividing the system into three parts: a “molten zone,” where the atoms are allowed to move freely; a surrounding “fixed zone,” in which the atoms are included in the simulation (and exert forces on the molten zone), but are held fixed; and an “excluded zone,” which is entirely disregarded in the AutoIMD simulation. In this way, AutoIMD may be used to inspect and perform energy minimizations on parts of the system that have been manipulated (e.g., through mutations or IMD), giving the researcher real-time feedback on the simulation. SMD and IMD simulations differ in fundamental ways, and may be fruitfully combined. In SMD, the specification of the external forces is developed based on the researcher’s understand- ing of the biological and structural properties of the system. The SMD simulation is carried out with the weakest force possible to induce the necessary change in an affordable length of simulation time, and the analysis of the simulation data relates the force applied to the progress of the system along the chosen reaction path. In contrast, IMD simulations are unplanned, allowing the researcher to toy with the system, exploring the degrees of free- dom. Because the simulations need to be rapid-completed in minutes rather than days or weeks-the applied forces are ex- tremely large, and the simulations are too rough to produce data suitable for an accurate analysis of molecular properties. The techniques can be combined: in the first stage, the researcher uses IMD to generate and test hypotheses for reaction mechanisms or to accelerate substrate transport, docking, and other mechanisms that are difficult to cast into numerical descriptions; in the second stage, the researcher carries out further MD or SMD simulations building on the hypothesized mechanisms or configurations from the IMD investigation. Free Energy Calculations In addition to propagating the motion of atoms in time, MD can also be used to generate an ensemble of configurations, from which thermodynamic quantities like free energy differences, F, can be computed. In a nutshell, there are three possible routes for the calculation of F: (1) estimate the relevant probability distribu- tion, [U(r1, . . . , rN)], from which a free energy difference may be inferred via 1/ ln [U(r1, . . . , rN)]/0, where 0 denotes a normalization term: (2) compute the free energy difference di- rectly; and (3) calculate the free energy derivative, d F()/d, along some order parameter (collective coordinate), , consistent with an average force,90 and integrate the latter to obtain F. The popular umbrella sampling method,71,72 whereby the prob- ability to find the system along a given reaction coordinate is sought, falls evidently into the first category. One blatant short- coming of this scheme, however, lies in the need to guess the external potential or bias that is necessary to overcome the barriers of the free energy landscape-an issue that may rapidly become intricate in the case of qualitatively new problems. In this section, Scalable Molecular Dynamics with NAMD 1789 we shall focus on the second and the third classes of approaches for determining free energy differences. The first approach, available in NAMD since version 2.4, is free energy perturbation (FEP),91 an exact method for the direct computation of relative free energies. FEP offers a convenient framework for performing “alchemical transformations,” or in silico site-directed mutagenesis of one chemical species into an- other. Description of intermolecular association or intramolecular de- formation in complex molecular assemblies requires an efficient computational tool, capable of rapidly providing precise free en- ergy profiles along some ordering parameter, , in particular when little is known about the underlying free energy behavior of the process. A fast and accurate scheme, pertaining to the third cate- gory of methods, is introduced in NAMD version 2.6 to determine such free energy profiles, F(). This scheme relies upon the evaluation of the average force acting along the chosen order parameter, , in such a way that no apparent free energy barrier impedes the progress of the system along the latter.92,93 The efficiency of the free energy algorithm represents only one facet of the overall performance of the free energy calculation, which to a large extent, relies on the ability of the core MD program to supply configurations and forces in rigorous thermo- dynamic ensembles and in a time-bound fashion. The methodology described hereafter has been implemented in NAMD and operates with nearly no extra cost compared to a standard MD simulation. Alchemical Transformations Contrary to the worthless piece of lead in the hands of the pro- verbial alchemist, the potential energy function of the computa- tional chemist is sufficiently malleable to be altered seamlessly, thereby allowing the thermodynamic properties of a system to be related to those of a slightly modified one, such as a chemically modified protein or ligand, through the difference in the corre- sponding intermolecular potentials. The free energy difference between a reference state, a, and a target state, b, can be expressed in terms of the ratio of their corresponding partition functions. Using the well-known relation- ship between partition function Z and free energy F, Z exp[F/ k8T], along with the property Z ZkinZpot where Zkin and Zpot are the partition functions for kinetic and potential energy, respec- tively, one can express F a3b Fb Fa as: Fa3b 1 ln expUbr1, . . . , rNdr1 . . . drN expUar1, . . . , rNdr1 . . . drN . (25) Here Ua and Ub are the potential energy functions for states a and b, respectively. One can write eq. (25) Fa3b (1/)ln{ exp[(Ub()Ua())]exp[Ua()]dx/exp[Ua()]dx} where r1, . . . , rN . Defining the average, Boltzmann-weighted relative to the potential Ua, that is, weighted over configurations representa- tive of the reference state a, fa f()exp[Ua()]dx/ exp[Ua()]dx, one can state: Fa3b 1 lnexpUbr1, . . . , rN Uar1, . . . , rNa. (26) This is the celebrated FEP equation.91 In principle, eq. (26) is exact in the limit of infinite sampling. In practice, however, on the basis of finite-length simulations, it only provides accurate esti- mates for small changes between a and b. This condition does not imply that the free energies characteristic of a and b be sufficiently close, but rather that the corresponding configurational ensembles overlap to a large degree to guarantee the desired accuracy. For example, although the hydration free energy of benzene is only 0.4 kcal/mol, insertion of a benzene molecule in bulk water constitutes too large a perturbation to fulfill the latter requirement in a single-step transformation. If such is not the case, the pathway connecting state a to state b is broken down into N intermediate, not necessarily physical states, k (a 1 0 and b N 1), so that the Helmholtz free energy difference reads: Fa3b 1 k1 N1 lnexpUr1, . . . , rN; k1 Ur1, . . . , rN; kk. (27) Here the potential energy is not only a function of the spatial coordinates, but also of the parameter that connects the reference and the target states. Perturbation of the chemical system by means of k may be achieved by scaling the relevant nonbonded force field parameters of appearing, vanishing, or changing atoms, in the spirit of turning lead into gold. In NAMD, the topologies characteristic of the initial state, a, and the final state, b, coexist, yet without interacting. This implies that, as a preamble to the free energy calculation, a hybrid topol- ogy has to be defined with an appropriate exclusion list to prevent interactions between those atoms unique to state a and those unique to state b. In lieu of altering the nonbonded parameters, the interaction of the perturbed molecular fragments with their envi- ronment is scaled as a function of k: Ur1, . . . , rN; k kUbr1, . . . , rN 1 kUar1, . . . , rN. (28) This scheme is referred to as the dual-topology paradigm.94 In a number of MD programs, FEP is implemented as an extra layer, implying that free energy differences are computed a pos- teriori by looping over a previously generated trajectory. In NAMD, the potential energies representative of the reference state, k, and the target state, k1, are evaluated concurrently “on the fly” at little additional cost and the ensemble average of eq. (27) is updated continuously. “Alchemical transformations” may be applied to a variety of chemically and biologically relevant systems, offering, in ad- dition to a free energy difference, atomic-level insight into the structural modifications entailed by the perturbation. In Figure 8, in silico site-directed mutagenesis experiments are proposed for the transmembrane domain of glycophorin A (GpA) in an attempt to pinpoint those residues responsible for -helix dimerization. Leu75 and Ile76 are perturbed into alanine follow- ing the depicted thermodynamic cycle. The first point mutation, L75A, yields a free energy change of 13.9 0.3 and 28.8 1790 Phillips et al. • Vol. 26, No. 16 • Journal of Computational Chemistry 0.5 kcal/mol in the free and in the bound state, respectively, which, put together, corresponds to a net free energy change of 1.0 0.6 kcal/mol (experimental estimate:95 1.1 0.1 kcal/mol). The second point mutation, I76A, led to a free energy change of 4.9 0.3 and 8.4 0.4 kcal/mol, in the single helix and in the dimer, respectively, that is, a net change of 1.4 0.5 kcal/mol (experimental estimate:95 1.7 0.1 kcal/mol). Aside from the remarkable agreement between the- ory and experiment, these free energy calculations confirm that replacement of bulky nonpolar side chains like leucine or isoleucine by alanine disrupts the -helical dimer through a loss of van der Waals interactions.96 Overcoming Free Energy Barriers Using an Adaptive Biasing Force The sizeable number of degrees of freedom described explicitly in statistical simulations of large molecular assemblies, in particular those of both chemical and biological interest, rationalizes the need for a compact description of thermodynamic properties. Free en- ergy profiles offer a suitable framework that fulfills this require- ment by providing the dependence of the free energy on the chosen degrees of freedom . Determination of such free energy profiles, under the sine qua non condition that some key degree of freedom , for example, a reaction coordinate, can be defined, however, remains a daunting task from the perspective of numerical simu- lations. In the context of Boltzmann sampling of the phase space, overcoming the high free energy barriers that separate thermody- namic states of interest is a rare event that is unlikely to occur on the time scales amenable to MD simulation. An important step forward on the road towards an optimal sampling of the phase space along a chosen collective coordinate, , has been made recently. In a nutshell, this new method relies on the continuous application of a dynamically adapted biasing force that compensates the current estimate of the free energy, thus virtually erasing the roughness of the free energy landscape as the system progresses along .92 To reach this goal, the average force Figure 8. Homodimerization of the transmembrane (TM) domain of glycophorin A (GpA): (a) Contact surface of the two -helices forming the TM domain of GpA. The strongest contacts are observed in the heptad of amino acids, Leu75, Ile76, Gly79, Val80, Gly83, Val84, and Thr87. Residue Leu75, which participates in the association of the two -helices through dispersive interactions, is featured as transparent van der Waals spheres. (b) Free energy profile delineating the reversible dissociation of the two -helices, obtained using an adaptive biasing force. The ordering parameter, , corresponds to the distance separating the center of mass of the TM segments. The entire pathway was broken down into 10 windows, in which up to 15 ns of MD sampling was performed, corresponding to a total simulation time of 125 ns. (c) Thermodynamic cycle utilized to perform the “alchemical transformation” of residues Leu75 and Ile76 into alanine, demonstrating the participation of these amino acids in the homodimerization of the two -helices. The left vertical leg of the cycle represents the transformation in a single -helix from the wild-type (WT) to the mutant form. The right vertical leg denotes a simultaneous point mutation in the two interacting -helices. Using the notation of the figure, the free energy difference arising from this perturbation is equal to G2 mutation 2 G1 mutation. Each leg of the thermodynamic cycle consists of 50 intermediate states and a total MD sampling of 6 ns. For clarity, the environment of the -helical dimer, formed by a dodecane slab in equilibrium between two lamellae of water, is not shown. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.] Scalable Molecular Dynamics with NAMD 1791 acting on , F, is evaluated from an unconstrained MD simu- lation:93 dA d U(r1, . . . , rN) 1 lnJ F, (29) where J denotes the determinant of the Jacobian for the trans- formation from Cartesian to generalized coordinates, which is a necessary modification of metric, given that {r1, . . . , rN} and are not independent variables. The specific form of J is an inherent function of the coordinate, , chosen to advance the system. In the course of the simulation in NAMD, the force, F, acting along the ordering parameter, , is accrued in small bins, thereby supplying an estimate of the derivative dA()/d. The so called adaptive biasing force (ABF), F ABF Fr, is determined in such a way that, when applied to the system, it yields a Hamiltonian in which no average force is exerted along . As a result, all values of are sampled with an equal probability, which in turn, dramatically improves the accuracy of the calculated free energies. The approach further entails that progression of the system along is fully reversible and limited solely by its self- diffusion properties. At this stage, it should be clearly understood that whereas the ABF method enhances sampling along , its ability to supply a perfectly uniform probability distribution of the system over the entire range of values may be impeded by possible orthogonal degrees of freedom. We have chosen to introduce the average force method in NAMD within the convenient framework of unconstrained MD,93 in which the coordinate, is unconstrained, but other degrees of freedom, such as bond lengths, can be constrained. Either constraint forces must be taken into account in F, as they are in NAMD version 2.6, or it will be crucial to ascertain that no Cartesian coordinate appears simulta- neously in a constrained degree of freedom and in the derivative U(r1, . . . , rN)/ of eq. (29). The implementation of the ABF scheme in NAMD provides reaction coordinates such as a distance between subgroups of atoms or length along a specified direction in cartesian space. Previous appli- cations include, the intramolecular folding of a short peptide, the partitioning of small solutes across an aqueous interface, and the intermolecular association of neutral and ionic species.92,93 The -helical dimerization of GpA represents an interesting application of the method, whereby the reversible dissociation- rather than the association, for obvious cost-effectiveness rea- sons-is carried out, using the distance separating the center of mass of the trans-membrane segments as the reaction coordinate. The free energy profile characterizing this process is shown in Figure 8, and features a deep minimum at 8.2 Å, which corre- sponds to the native, close packing of the -helices. As the trans-membrane segments move away from each other, helix- helix interactions are progressively disrupted, in particular in the crossing region, thus causing an abrupt in- crease of the free energy, accompanied by a tilt of the two -helices towards an upright orientation. As the distance be- tween the two trans-membrane segments further increases, the free energy profile levels off at 21 Å, a separation beyond which the dimer is fully dissociated. A valuable feature offered by NAMD lies in the possibility to evaluate a posteriori electrostatic and van der Waals forces from an ensemble of configurations. Projection of these forces onto the coordinate , and subsequent integration of the former provides a deconvolution of the complete free energy profile in terms of helix- helix and helix-solvent contributions. Analysis of these contributions reveals two regimes in the association process, driven at large separation by the solvent, which pushes the -helices together, and at short separation by van der Waals interactions that favor native contacts.96 NAMD Software Design Just as the intelligent car buyer looks under the hood to understand the performance and longevity of a particular vehicle, we now direct the attention of the reader and potential NAMD user to a few design and implementation details that contribute to the flexibility and performance of NAMD. Goals, Design, and Implementation NAMD was developed to enable ambitious MD simulations of biomolecular systems, employing the best technology available to provide the maximum performance possible to researchers. In the past decade simulation size and duration have increased dramati- cally. Ten years ago a simulation of 36,000 atoms over 100 ps as reported in ref. 5 was considered very advanced. Today, this status is reserved for simulations of systems with more than 300,000 atoms for up to 100 ns as reported in refs. 6, 23, and 55. The progress made is illustrated in Figure 1, comparing the sizes of systems reported in refs. 5 and 6. This 1000-fold increase in capability (10-fold in atom count and over 100-fold in simulation length) has been partially enabled by advances in processor per- formance, with clock rate increases leading the way. However, substantial progress has also resulted from exploiting the factor of 100 or more in performance available through the use of massively parallel computing, coordinating the efforts of numerous proces- sors to address a single computation. Looking forward, scientific ambition remains unchecked while increases in processor clock rates are constrained by limits on power consumption and heat dissipation. This stagnation in CPU speed has inspired system vendors such as Cray and SGI to incorporate field- programmable gate arrays (FPGAs) into their offerings, promising great performance increases, but only for suitable algorithms that are subjected to heroic porting efforts (as have been initiated for the force evaluations used by NAMD). Advances in semiconductor technology will surpass the limits encountered today, but in the meantime, indus- try has turned to offering greater concurrency to performance-hungry applications, scaling systems to more processors, and processors to more cores, rather than to higher clock rates. Indeed, the highest- performance component of a modern desktop is often the 3D graphics accelerator, which inexpensively provides an order of magnitude greater floating point capability than the main processor at a fraction of the clock rate by automatically distributing independent calcula- tions to tens of pipelines, and even across multiple boards. Therefore, high performance and parallel computing will become even more synonymous than today, with greater industry acceptance and support. 1792 Phillips et al. • Vol. 26, No. 16 • Journal of Computational Chemistry Scientific computing is also facing the perennial “software crisis” that has stalked business information systems for decades. The seminal book by Allen and Tildesley97 could dedicate a few pages of an appendix to writing efficient FORTRAN-77 code and consider the reader adequately informed. Developing modern high-performance software, however, requires knowledge of ev- erything from parallel decomposition and coordination libraries to the relative cost of accessing different levels of cache memory. The design of more complex algorithms and numerical methods has ensured that any useful and successful program is likely to outlive the machine for which it is originally written, making portability a necessity. Software is also likely to be used and extended by persons other than the original author, making code readability and modifiability vital. Software maintenance activities such as porting and modifica- tion account for the majority of the cost and effort associated with a program during its lifetime. These issues have been addressed in the development of object-oriented software design, in which the programmer breaks the program into “objects” comprising closely- related data (such as the x, y, and z components of a vector) and the operations that act on it (addition, dot product, cross product, etc.). The objects may be arranged into hierarchies of classes, and an object may contain or refer to other objects of the same or different classes. In this manner, large and complex programs can be broken down into smaller components with defined interfaces that can be implemented independently. NAMD is implemented in C, the most popular and widely supported programming lan- guage providing efficient support for these methods. Methodology for the development of parallel programs is far from mature, with automatically parallelizing compilers and lan- guages still quite limited and most programmers using the Mes- sage Passing Interface (MPI) libraries in combination with C, C, or Fortran. Although the acceptance of MPI as a crossplat- form standard for parallel software has been of great benefit, the burden on the programmer remains. The first task is to decompose the problem, which is often simplified by assuming that the pro- cessor count is a power of two or has factors corresponding to the dimensions of a large three-dimensional array. MPI programming then requires the explicit sending and receiving of arrays between processors, much like directing a large and complex game of catch. NAMD is based on the Charm parallel programming sys- tem and runtime library.98 In Charm, the computation is de- composed into objects that interact by sending messages to other objects on either the same or remote processors. These messages are asynchronous and one sided, that is, a particular method is invoked on an object whenever a message arrives for it rather than having the object waste resources while waiting for incoming data. This message-driven object programming style effectively hides communication latency and is naturally tolerant of the system noise that is found on workstation clusters. Charm also sup- ports processor virtualization,99 allowing each algorithm to be written for an ideal, maximum number of parallel objects that are then dynamically distributed among the actual number of proces- sors on which the program is run. Charm provides these benefits even when it is implemented on top of MPI, an option that allows NAMD to be easily ported to new platforms. The parallel decomposition strategy used by NAMD is to treat the simulation cell (the volume of space containing the atoms) as a three-dimensional patchwork quilt, with each patch of sufficient size that only the 26 nearest-neighboring patches are involved in bonded, van der Waals, and short-range electrostatic interactions. More precisely, the patches fill the simulation space in a regular grid and atoms in any pair of non-neighboring patches are sepa- rated by at least the cutoff distance at all times during the simu- lation. Each hydrogen atom is stored on the same patch as the atom to which it is bonded, and atoms are reassigned to patches at regular intervals. The number of patches varies from one to several hundred and is determined by the size of the simulation indepen- dently of the number of processors. Additional parallelism may be generated through options that double the number (and have the size) of patches in one or more dimensions. When NAMD is run, patches are distributed as evenly as possible, keeping nearby patches on the same processor when there are more patches than processors, or spreading them across the machine when there are not. Then, a (roughly 14 times) larger number of compute objects responsible for calculating atomic interactions either within a single patch or between neighboring patches is distributed across the processors, minimizing commu- nication by grouping compute objects responsible for the same patch together on the same processors. At the beginning of the simulation, the actual processor time consumed by each compute object is measured, and this data is used to redistribute compute objects to balance the workload between processors. This mea- surement-based load balancing100 contributes greatly to the paral- lel efficiency of NAMD. Using forces calculated by compute objects, each patch is responsible for integrating the equations of motion for the atoms it contains. This can be done independently of other patches, but occasionally requires global data affecting all atoms, such as a change in the size of the periodic cell due to a constant pressure algorithm. Although the integration algorithm is the clearly visible “outer loop” in a serial program, NAMD’s message-driven design could have resulted in much obfuscation (as was experienced even in the simpler NAMD 1.X4). This was averted by using Charm threads to write a top-level function that is called once for each patch at program start and does not complete until the end of the simulation.10 This design has allowed pressure and temperature control methods and even a conjugate gradient minimizer to be implemented in NAMD without writing any new code for parallel communication. Tcl Scripting Interface NAMD has been designed to be extensible by the end user in those areas that have required the most modification in the past and that are least likely to affect performance or harbor hard-to-detect programming errors. The critical force evaluation and integration routines that are the core of any molecular dynamics simulation have remained consistent in implementation as NAMD has evolved, occasionally introducing improved methods for pressure and temperature control. The greatest demand for modification has been in high-level protocols, such as equilibration and simulated annealing schedules. An additional demand has been for the im- plementation of experimental and often highly specialized steering and analysis methods, such as those used to study the rotational motion of ATP synthase.85 Scalable Molecular Dynamics with NAMD 1793 Flexibility requirements could, in theory, be met by having the end user modify NAMD’s C source code, but this is undesir- able for several reasons. Any users with special needs would have to maintain their own “hacked” version of the source code, which would need to be updated whenever a new version of NAMD was released. These unfortunate users would also need to maintain a full compiler environment and any external libraries for every platform on which they wanted to run NAMD. Finally, an under- standing of both C and NAMD’s internal structure would be required for any modification, and bugs introduced by the end user would be difficult for a novice programmer to fix. These problems can be eliminated by providing a scripting interface that is inter- preted at run time by NAMD binaries that can be compiled once for each platform and installed centrally for all users on a machine. Tcl (Tool Command Language)102 is a popular interpreted language intended to provide a ready-made scripting interface to high-performance code written in a “systems programming lan- guage” such as C or C. The Tcl programmer is supported by online documentation (at www.tcl.tk) and by books targeting all levels of experience. Tcl is used extensively in the popular mo- lecular graphics program VMD, and therefore even novice users are likely to have experience with the language. In NAMD, Tcl is used to parse the simulation configuration file, allowing variables and expressions to be used in initially defining options, and then to change certain options during a running simulation. Tcl scripting has been used to implement the replica exchange method103 using NAMD, without modifying or adding a single line of C. A master program, running in a Tcl shell outside of NAMD, is used to spawn a separate NAMD slave process for each replica needed. Each NAMD instance is started with a special configuration file that uses the standard networking capabilities of Tcl to connect to the master program via a TCP socket. At this point, the master program sends commands to each slave to load a molecule, simulate dynamics for a few hundred steps, report average energies, and change temperatures based on the relative temperatures of the other replicas. This work would have been much more difficult without a standard and fully featured scripting language such as Tcl. NAMD provides a variety of standard steering forces and protocols, but an additional Tcl force interface provides the ulti- mate in flexibility. The user specifies the atoms to which steering forces will be applied; the coordinates of these atoms are then passed to a Tcl procedure, also provided by the user, at every time step. The Tcl steering procedure may use these atomic coordinates to calculate steering forces, possibly modifying them based on elapsed time or progress of the atoms along a chosen path. This minimal but complete interface has been used to implement com- plex features such as the adaptive biasing force method93 de- scribed above. Serial and Parallel Performance Although the NAMD developers have strived to make the software as fast as possible,104 decisions made by the user can greatly influence the serial performance and parallel efficiency of a par- ticular simulation. For example, the computational effort required for a simulation is dominated by the nonbonded force evaluation, which scales as NRcut 3 , where N is the number of simulated atoms, Rcut is the cutoff distance, and N/V is the number of atoms per unit volume. From this formula one can see that, for example, a five-point water model compared to a three-point water model increases both the number and density of simulated points (real and dummy atoms) and would therefore run up to (5/3)2 2.8 times as long. The processor type and clock rate of the machine on which NAMD is run, of course, will affect performance. Unlike many scientific codes, NAMD is usually limited by processor speed rather than memory size, bandwidth, or latency; the patch structure described above leads to naturally cache-friendly data layout and force routines. However, the nonbonded pair lists used by NAMD, while distributed across nodes in a parallel run, can result in paging to disk for large simulations on small memory machines, for example, 100,000 atoms on a single 128 MB workstation; in this case disabling pair lists will greatly improve performance. The limits of NAMD’s parallel scalability are mainly deter- mined by atom count, with one processor per 1000 atoms being a conservative estimate for good efficiency on recent platforms. Increased cutoff distances will result in additional work to distrib- ute, but also in fewer patches, and hence, one is confronted with a hard-to-predict effect on scaling. Finally, dynamics features that require global communication or even synchronization, such as minimization, constant pressure, or steering, may adversely affect parallel efficiency. When procuring a parallel computer, great attention is normally paid to individual node performance, network bandwidth, and network latency. Although NAMD transfers relatively little data and is designed to be latency tolerant, scaling beyond a few tens of processors will benefit from the use of more expensive technolo- gies than commodity gigabit ethernet. A major drain on parallel performance will come from interference by other processes run- ning on the machine. A cluster should never be time shared, with two parallel jobs running on the same nodes at the same time, and care should be taken to ensure that orphaned processes from previous runs do not remain. On larger machines, even occasional interruptions for normal operating system functions have been shown to degrade the performance of tightly coordinated parallel applications running on hundreds of processors.105 The particle-mesh Ewald (PME) method for full electrostatics evaluation, neglected in the performance discussion thus far, de- serves special attention. The most expensive parts of the PME calculation are the gridding of each atomic charge onto (typically) 4 4 4 points of a regular mesh and the corresponding extraction of atomic forces from the grid; both scale linearly with atom count. The actual calculation of the (approximately) 100 100 100 element fast Fourier transform (FFT) is negligible, but requires many messages for its parallel implementation. While this presents a serious impediment to other programs, NAMD’s mes- sage-driven architecture is able to automatically interleave the latency-sensitive FFT algorithm with the dominant and latency- tolerant short-range nonbonded calculation.106 In conclusion, a simulation with PME will run slightly slower than a non-PME simulation using the same cutoff, but PME is the clear winner because it provides physically correct electrostatics (without arti- facts due to truncation) and allows a smaller short-range cutoff to be used. 1794 Phillips et al. • Vol. 26, No. 16 • Journal of Computational Chemistry When running on a parallel computer, it is important to mea- sure the parallel efficiency of NAMD for the specific combination of hardware, molecular system, and algorithms of interest. This is correctly done by measuring the asymptotic cost of the simulation, that is, the additional runtime of a 2000 step run vs. a 1000 step run, thereby discounting startup and shutdown times. By measur- ing performance at a variety of processor counts, a decision can be made between, for example, running two jobs on 64 processors each vs. one job 60% faster (and 20% less efficiently) on 128 processors. Dynamic load balancing makes NAMD’s startup sequence longer than that of other parallel programs, requiring special attention to benchmarking. Initial load balancing in an NAMD run begins by default after five atom migration “cycles” of simulation. Compute object execution times are measured for five cycles, followed by a major load balancing in which work is reassigned to processors from scratch. Next is five cycles of measurement followed by a load balance refinement in which only minor changes are attempted. This is repeated immedi- ately, and then periodically between 200-cycle stretches of simulation with measurement disabled (for maximum perfor- mance). After the second refinement, NAMD will print several explicitly labeled “benchmark” timings, which are a good esti- mate of performance for a long production simulation. The results of such benchmarking on a variety of platforms for a typical simulation are presented in Figure 9. Although every researcher’s situation is different, the compu- tational resource decisions made by the authors’ group provide a useful example. The main local resources are six independent, 24-node, 48-processor Linux clusters using commodity gigabit ethernet cards and switches. Each cluster is used for at most one simulation at a time, and therefore, only the six head nodes are connected to the building network, which provides access to cen- tral file and administrative servers. A queueing system manages a single queue, dispatching jobs to the next free cluster for up to 24 h, yielding full utilization as long as jobs are available to run. In addition, the Clustermatic Linux distribution allows a running simulation to be easily suspended by the queueing system, allow- ing short “express” jobs to run, which are used for simulation setup and testing, as well as for on-demand IMD. These cost-efficient local resources are supplemented by grants of computer time from NSF-funded centers, which provide access to larger, more scalable machines for large, intensive simulations and other special cases. In conclusion, a basic understanding of NAMD’s performance characteristics combined with specific benchmarks provides more science in less time. Conversely, the most common ways to waste computer time with NAMD are to include too much water in the simulation cell, to use an unnecessarily large cutoff with PME, to run on more processors than the simulation efficiency scales to, to blindly continue a simulation without checking the output for anomalies, and finally, to simulate a molecule without a clear understanding of what scientific questions are to be answered based on the results. Three Exemplary Applications In the following, we describe three exemplary applications of NAMD. The applications are chosen to illustrate the use of NAMD in studies of small (ubiquitin), intermediate (aquaporin channel), and large (lac repressor) systems. We emphasize in particular the features of NAMD that were most useful and how they allowed us to surmount methodological hurdles quickly. Ubiquitin One first application of the NAMD program involves a rather routine modeling project on a small protein system that grew into a more extensive one over time. The reader will learn from this first brief case study mainly how NAMD is used under common circumstances in which any other modeling program would have performed just as well, although we will emphasize the particular features that made NAMD a very suitable choice. The protein investigated is ubiquitin, which recently acquired considerable notice as the main subject of study by the 2004 Nobel laureates in Chemistry.107,108 Ubiquitin is a small globular protein of about 80 amino acids that is highly conserved throughout the eukaryotes. The proteins, usually in the form of multimers, function as tags for the destination of cellular proteins in cell trafficking, for example, for protein degradation.109 The modeling task in the present case was to explain atomic force microscopy experiments that mea- sured forces and extensions that arise when monomeric or tet- rameric ubiquitin is stretched with pN forces.110,111 The project was vastly simplified through the availability of tutorials that explained how VMD and NAMD are used to set up a simulation of the solvated protein, to stretch it, and to analyze the results (http://www.ks.uiuc.edu/Training/Tutorials). Indeed, VMD and NAMD are accessible to novice researchers due to the extensive introductory material prepared by the developers. The goal of these simulations was to stretch monomeric ubiq- uitin using the steered molecular dynamics method introduced above. After equilibration of the protein in a spherical water bath (a system of altogether 26,000 atoms) under NVT and NVE ensemble conditions with a 12 Å cutoff of nonbonded forces, the protein was stretched with the N-terminus constrained and the C-terminus pulled with a spring (spring constant of about 500 pN/Å), the end of which moved at a speed of 0.5 Å/ps [c.f., eq. (17) and Fig. 6]. Snapshots during the stretching process are shown in Figure 10. One can recognize that the protein can be stretched readily from the folded to the completely extended conformation. The simulations lasted about a nanosecond and, with a 2 fs time step, required 4 days on a desktop computer with an AMD Opteron processor. Next, the stretching of a tandem of four ubiquitins was also attempted. In this case, the simulated protein-water system com- prised 62,000 atoms; snapshots during the 3.5 ns stretching process are also shown in Figure 10. The simulations revealed the unfold- ing pathways of stretched ubiquitin, information that cannot be gleaned from an experiment that provides solely data of extension and applied forces vs. time. This application of steered molecular dynamics demonstrates well the value of molecular modeling in complementing experimental observation. The ready accessibility and ease of use of NAMD makes such modeling projects possible for experimentalists. In this project several key features of NAMD and VMD were employed. NAMD ran on a common desktop system and was even used on a laptop, for example, for setting up and testing the Scalable Molecular Dynamics with NAMD 1795 simulation. NAMD simulations can then easily be migrated to multiprocessor machines, although in the present case that was not absolutely necessary. The “Psfgen” plugin of VMD was very useful in setting up the system, namely in defining the chemical topology, that is, the LYS48-C-terminus bonds connecting the four tandem repeats, and generating so-called protein structure and parameter files. Also beneficial was the “Solvate” plugin of VMD, which greatly expedited the task of placing ubiquitin in a water bath. NAMD is ideally suited to carry out SMD simulations; as explained above, the needed functionality can be activated at the input file level and other features can be added through scripting. In the present case, the scripting capabilities of VMD allowed us to calculate the extension of the protein reached at any moment during the stretching process. This easy provision of all types of data from an MD run as well as the interplay with VMD are invaluable features of NAMD. Aquaporin The second example of using NAMD to simulate and investi- gate a biomolecular process concerns a family of membrane proteins, known as aquaporins (AQPs), which are responsible for facilitating water transfer across cellular membranes. Sim- ulation of the dynamics and function of these channels is a good example of systems that require inclusion of more than 100,000 atoms in the molecular system,112-115 as one needs to explicitly include a lipid bilayer and water to provide a natural environ- ment to the protein and to be able to investigate transmembrane diffusion of molecules. Conceptual and methodological issues of simulating membrane proteins have been reviewed in refs. 116 and 117. The size of the system and the computational demand are usually prohibitive factors in simulation studies of membrane proteins, but NAMD’s ability to exploit efficiently hundreds of processors makes it possible to tackle such prob- lems with full atomic detail in the most faithful way possible. A side view of an AQP model embedded in a lipid bilayer is shown in Figure 11. AQPs are passive membrane channels that increase (over that of pure membranes) the rate of water ex- change between the cell and its environment in a highly selec- tive manner. Some members of the family have dual functions in cellular metabolism, as they also allow other small mole- cules, such as glycerol, to pass. Charged species, including protons, however, are excluded. Simulation of water permeation through AQPs and the study of selectivity mechanisms employed by the channel require simula- tions of large molecular aggregates on the order of several tens of nanoseconds.58,114,115,118 NAMD proved to be very efficient in simulating such systems at ambient pressure and physiological temperature with full account of electrostatic forces. These condi- Figure 9. NAMD performance on modern platforms. The cost in CPU seconds of a single time step for a representative system of 92,000 atoms with a 12 Å short-range cutoff and full electrostatics evaluated via PME every four steps is plotted on the vertical axis, while the wait for a 1-ns simulation (one million time steps of 1 fs each) may be read relative to the broken diagonal lines. Perfect parallel scaling is a horizontal line. The older Pittsburgh Lemieux Alpha cluster has lower serial performance but scales very well. The HPCx IBM POWER 4 cluster at Edinburgh scales similarly, with comparable serial performance to the Xeon, Opteron, and Apple Xserve G5 clusters at UIUC. The two NCSA Itanium platforms have excellent serial performance but lose efficiency beyond 128 processors. 1796 Phillips et al. • Vol. 26, No. 16 • Journal of Computational Chemistry tions are critical for a reliable description of the molecular system at hand. NAMD provides a variety of pressure control schemes that are effective and useful in the initial setup and equilibration phases of simulations. For example, NAMD allows exclusion of atoms from pressure calculation, which enables one to fix the protein during the initial simulation of its environment under constant-pressure conditions. Multi-nanosecond equilibrium sim- ulations performed by NAMD successfully described diffusive permeation of water through the channel, in close agreement with the natural time scale (nanoseconds) of the event.114,118 A key finding of the simulations was a unique configuration of water enforced by the channel during the permeation. This configuration was found to be an effective, novel selection mechanism that prevents proton transfer in these channels.114,118 To investigate the permeation of larger substrates, that is, glycerol and longer linear sugar molecules, we took advantage of the SMD and IMD meth- odologies that are easily accessible through NAMD. SMD simu- lations of glycerol permeation through GlpF allowed us to recon- struct the potential of mean force of the process.62 IMD simulations that permit one to rapidly sample various configura- tions of a complex between a macromolecule and small molecules were applied to study how sugar molecules interact with the narrowest part of the channel, the selectivity filter, and how these interactions furnish the stereoselectivity of the channel.53 The Tcl scripting capability of NAMD and its atom selection algorithms allowed us to apply external forces of different mag- nitude and direction to dynamically changing groups of water molecules, and thereby generate hydrostatic pressure gradients across the membrane.58,115,119 Applying this method, and taking advantage of the speed of the simulations provided by NAMD and its high performance on various platforms (Fig. 9), we were able to calculate the water permeation rate through the channel using several simulations performed at various hydrostatic pressure gra- dients in either direction, and determine the osmotic permeability of AQPs, a property that can be directly compared to experimen- tally measured values for these channels.58,115,119 In these simu- lations external forces were applied to a large group of molecules (water molecules in the bulk) whose number and positions are constantly changing as the simulations proceed. The modularity of the NAMD code allowed us to easily add the force application part of the calculations into the code, thus avoiding any compromise in the performance and scalability of the program. A very important advantage of NAMD in this respect is that the user does not need to know parallel programming to implement new features and add modules to the program. Multiscale lac Repressor-DNA Complex Our third example for the use of NAMD concerns a protein-DNA complex involving the lac repressor (LacI) that is a paradigm for gene control. We choose this example because it leads to a simu- lation of very large size (314,000 atoms) that also requires great flexibility in the simulation protocol in combining all-atom mo- lecular dynamics with continuum mechanics. LacI regulates the function of the lac operon, a set of genes responsible for lactose digestion in Escherichia coli; when lactose is not present as a metabolite in the environment, LacI inhibits the expression of these genes by binding to two nearby sites at the beginning of the operon and folding the intervening DNA into a loop. The detailed mechanics of the LacI-DNA interaction remain unknown even though the crystallographic structure of LacI bound to its cognate DNA segments is available120,121 along with exten- Figure 10. Pulling ubiquitin and tetra-ubiquitin. (A)-(D) mono-ubiquitin is shown at various stages of a constant velocity SMD simulation with the N- and C-termini highlighted in green. The different structures are seen to unfold at different moments in the simulation. (A )-(C ) Tetra-ubiquitin is shown being pulled with constant force. Each color represents a different monomer and the transparent portion is the surface of the protein with the first and fourth segments not completely shown. Again, the different subunits can be seen to come apart at different times. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.] Scalable Molecular Dynamics with NAMD 1797 sive biochemical and genetic data.122,123 In particular, the structure in ref. 120 does not include the DNA loop connecting the two bound DNA segments nor does it provide information on how LacI wrestles the DNA, forcing it to maintain a loop form. A key obstacle to the simulation of the LacI-DNA dynamics is that LacI alone is a very large protein that results, together with a water bath, in a system of 300,000 atoms; including the DNA loop would increase the size to 700,000 atoms, thus incurring an even greater computational cost. To overcome this difficulty one can follow a multiscale strategy and describe the DNA loop mathe- matically employing the Kirchhoff theory of continuum elastic rods as developed in ref. 124, coupling the calculation to an all-atom MD representation of LacI. The method has been outlined in ref. 86 and applied in ref. 6. The LacI-DNA structure con- structed in ref. 125 and employed in ref. 6 is available in the Protein Data Bank (1Z04).126 In the simulation, illustrated in Figure 12, LacI was solvated in a box of TIP3P water and 100 mM NaCl and simulated under NPT ensemble conditions with periodic boundary conditions and PME electrostatics for an overall simu- lation time of 40 ns using a 1-fs time step. At 10-ps intervals, position and orientation of the LacI head groups were communi- cated to the mathematical description of the DNA loop, which determined the appropriate new loop geometry and communicated back the forces and torques with which the DNA resisted the protein’s action; forces and torques were included in the MD simulations much like forces are incorporated in SMD runs. Sim- ulation results are shown in Figure 12. One can recognize the large scale motion of the DNA loop over the course of the simulation. A key finding resulting from the simulation is an extreme flexibility of the two LacI head groups absorbing most of the strain arising from the DNA loop. The simulations reported in ref. 6 also provided an insightful new interpretation of prior FRET data.127 NAMD proved to be most valuable due to its ability to effi- ciently simulate very large systems. In the present case NAMD ran a 314,000 atom system on 256 Itanium processors of the NCSA TeraGrid system with an average production speed of 2.5 ns per day. The multiscale strategy linking MD with a continuum me- chanics DNA model was implemented through the Tcl scripting interface of NAMD. This feature permitted us not only to obtain information from the simulation, for example, head group geom- etries, on the fly, but also to control from the NAMD side the calculations for the DNA model, eventually retrieving the forces applied in the LacI simulation. For the sake of clarity we empha- size that the entire multiscale simulation, including the continuum Figure 11. Side view of a simulated aquaporin tetramer in the cell membrane. The E. coli water/glycerol channel GlpF is embedded in a patch of POPE lipid bilayer and fully hydrated by water on both sides. Lipid head groups are shown in CPK and the hydrophobic tail region is drawn using licorice represen- tation. The four AQP monomers, each forming an independent water pore, are shown in different colors. The single file of water formed inside the pores is shown in one of the monomers. The characteristic conformational inversion of water at the center of the channel that contributes to the barrier against proton transfer is discernible. 1798 Phillips et al. • Vol. 26, No. 16 • Journal of Computational Chemistry mechanics description of DNA, was controlled from the NAMD program and its Tcl scripting feature. Because NAMD readily permits the definition of all types of external forces, the imple- mentation of the multiscale strategy was straightforward. We note here also that the ability of NAMD to write and read any type of files during its execution proved to be invaluable for the ultimate analysis of the multiscale simulation data. Conclusion We conclude our overview of NAMD with a list of the program’s key features in Table 1. Surveys indicate consistently that users greatly appreciate that NAMD is distributed free of charge. NAMD is well known for its performance on large parallel com- puters, for which it has received a Gordon Bell award,106 but the program is actually used on many platforms, including laptops. This versatility is a great benefit for initiating and testing modeling projects. NAMD permits a novice to carry out standard simulations of most types readily, but NAMD also supports more advanced uses. The most essential feature in this regard is NAMD’s Tcl scripting capability, which has been used to implement, for exam- ple, replica exchange dynamics, adaptive biasing forces, and ad- vanced phase space sampling protocols. NAMD is complemented by the molecular graphics program VMD to offer a complete modeling environment. In addition to its other functions, VMD provides tools tailored to NAMD. This is a natural development because not only do the programs share a large common user base, but they are also being developed to- gether and use the same scripting language. VMD provides assis- tance at all stages of a simulation including preparing the initial system, monitoring the progress of the calculation, and analyzing the results. Initiating a modeling project is simplified and acceler- ated through convenient features that aid in the typical tasks of building a model, for example, solvating a protein or placing it in a lipid bilayer, and providing the needed protein structure files. VMD’s AutoIMD feature allows one to modify and/or interact with the system “on the fly” with both immediate visual and mechanical feedback, thus providing an intuitive method of ex- ploring the system in ways not available through scripting alone. Finally, VMD provides a unique analysis environment for the results of NAMD simulations. For example, one can use VMD for routine calculations such as protein stability or intra- and intermo- lecular distances, as well as advanced ones such as electrostatic potential maps. Furthermore, the analysis capabilities of VMD may be extended through Tcl scripting, allowing the user to design any project-specific tools needed. Clearly, although NAMD and VMD function effectively as stand-alone programs, their highest purpose is achieved when used together. NAMD users also benefit from the program BioCoRE,128 a Web-based collaborative research environment that is linked to both NAMD and VMD. For example, BioCoRE greatly simplifies submission of NAMD simulations to supercomputer centers and local machines alike. BioCoRE also provides a graphical user interface to prepare simulation input files. The authors’ Urbana group focuses much effort on training researchers in the use of NAMD through a series of tutorials available on the Web. The NAMD Web site is a continuously updated source of information for novice and advanced users (http://www.ks.uiuc.edu/Research/namd/). Figure 12. Snapshots taken over the course of the LacI-DNA complex multiscale simulation: (a) The evolution of the structure of the DNA loop; (b) The structure of LacI remains unchanged, with the exception of the rotation of the head groups, which allows the DNA loop to adopt a more relaxed configuration. Scalable Molecular Dynamics with NAMD 1799 Acknowledgments The NAMD source code has been written, revised, and extended by many talented individuals, including (in alphabetical order) Milind Bhandarkar, Robert Brunner, Christophe Chipot, Andrew Dalke, Sur- jit Dixit, Paul Grayson, Justin Gullingsrud, Attila Gursoy, David Hardy, Jérôme Hénin, Bill Humphrey, David Hurwitz, Neal Krawetz, Sameer Kumar, David Kunzman, Che Wai Lee, Mark Nelson, James Phillips, Aritomo Shinozaki, Gengbin Zheng, Fangqiang Zhu, and most certainly others for whose omission from this list we apologize. The authors would also like to acknowledge Fatemeh Khalili-Araghi and Marcos Sotomayor for preparing the ubiquitin tetramer and ubiquitin simulations. Free energy calculation development with C. Chipot was aided by a CNRS/UIUC collaborative grant. The authors thank Greg Farber of the NIH National Center for Research Re- sources for many years of assistance. References 1. Kalé, L.; Skeel, R.; Bhandarkar, M.; Brunner, R.; Gursoy, A.; Krawetz, N.; Phillips, J.; Shinozaki, A.; Varadarajan, K.; Schulten, K. J Comp Phys 1999, 151, 283. 2. Humphrey, W.; Dalke, A.; Schulten, K. J Mol Graphics 1996, 14, 33. 3. Nelson, M.; Humphrey, W.; Gursoy, A.; Dalke, A.; Kalé, L.; Skeel, R.; Schulten, K.; Kufrin, R. Comput Phys Commun 1995, 91, 111. 4. Nelson, M.; Humphrey, W.; Gursoy, A.; Dalke, A.; Kalé, L.; Skeel, R. D.; Schulten, K. Int J Supercomp Appl High Perform Comp 1996, 10, 251. 5. Kosztin, D.; Bishop, T. C.; Schulten, K. Biophys J 1997, 73, 557. 6. Villa, E.; Balaeff, A.; Schulten, K. Proc Natl Acad Sci USA 2005, 102, 6783. 7. MacKerell, A. D., Jr.; Bashford, D.; Bellott, M.; Dunbrack, R. L., Jr.; Evanseck, J.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, I. W. E.; Roux, B.; Schlenkrich, M.; Smith, J.; Stote, R.; Straub, J.; Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M. J Phys Chem B 1998, 102, 3586. 8. Weiner, S. P.; Kollman, P. A.; Case, D. A.; Singh, U. C.; Ghio, C.; Alagona, G.; Profeta, J.; Weiner, P. J Am Chem Soc 1984, 106, 765. 9. Berendsen, H. J. C.; van der Spoel., D.; van Drunen, R. Comput Phys Commun 1995, 91, 43. 10. Weiner, P. K.; Kollman, P. A. J Comp Chem 1981, 2, 287. 11. Ewald, P. Ann Phys 1921, 64, 253. 12. de Leeuw, S. W.; Perram, J. W.; Smith, E. R. Proc R Soc Lond A 1980, 373, 27. 13. Grubmüller, H.; Heller, H.; Windemuth, A.; Schulten, K. Mol Simu- lat 1991, 6, 121. 14. Loncharich, R. J.; Brooks, B. R. Proteins Struct Funct Genet 1989, 6, 32. 15. Feller, S. E.; Pastor, R. W.; Rojnuckarin, A.; Bogusz, S.; Brooks, B. R. J Phys Chem 1996, 100, 17011. 16. Bergdorf, M.; Peter, C.; Hünenberger, P. H. J Chem Phys 2003, 119, 9129. 17. Kastenholz, M. A.; Hünenberger, P. H. J Phys Chem B 2004, 108, 774. 18. Weber, W.; Hünenberger, P. H.; McCammon, J. A. J Phys Chem B 2000, 104, 3668. 19. Perram, J. W.; Petersen, H. G.; de Leeuw, S. W. Mol Phys 1988, 65, 875. 20. Darden, T. A.; York, D. M.; Pedersen, L. G. J Chem Phys 1993, 98, 10089. 21. Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pederson, L. J Chem Phys 1995, 103, 8577. 22. Hairer, E.; Lubich, C.; Wanner, G. Geometric Numerical Integration, vol. 31 in Springer Series in Computational Mathematics; Springer- Verlag: Berlin, 2002. 23. Aksimentiev, A.; Schulten, K. Biophys J 2005, 88, 3745. 24. Hockney, R. W.; Eastwood, J. W. Computer Simulation Using Par- ticles; McGraw-Hall: New York, 1981. 25. Skeel, R. D.; Tezcan, I.; Hardy, D. J. J Comput Chem 2002, 23, 673. 26. Brandt, A.; Lubrecht, A. A. J Comput Phys 1990, 90, 348. 27. Arnol’d, V. I. Mathematical Methods of Classical Mechanics; Springer-Verlag: New York, 1989, 2nd ed. 28. Reich, S. SIAM J Numer Anal 1999, 36, 1549. 29. Harvey, S. C. Proteins Struct Funct Genet 1989, 5, 78. 30. Dahlquist, G.; Björck, Å. Numerical Methods; Prentice Hall: Engle- wood Cliffs, NJ, 1974. 31. Frenkel, D.; Smit, B. Understanding Molecular Simulation from Algorithms to Applications; Academic Press: San Diego, CA, 2002. 32. Gear, C. W. Numerical Initial Value Problems in Ordinary Differen- tial Equations; Prentice-Hall: Englewood Cliffs, NJ, 1971. 33. Tuckerman, M.; Berne, B. J.; Martyna, G. J. J Chem Phys 1992, 97, 1990. 34. Grubmüller, H. Master’s thesis, Physik-Dept. der Tech. Univ., Mu- nich, Germany, 1989. Table 1. Key Features of NAMD. Ease of use Free to download and use. Precompiled binaries provided for 12 popular platforms. Installed at major NSF supercomputer sites. Portable to virtually any desktop, cluster, or other parallel computer. C source code and CVS access for modification. Molecule building Reads X-PLOR, CHARMM, AMBER, and GROMACS input files. Psfgen tool generates structures and coordinate files for CHARMM force field. Efficient conjugate gradient minimization. Fixed atoms and harmonic restraints. Thermal equilibration via periodic rescaling, reinitialization, or Langevin dynamics. Basic simulation Constant temperature via rescaling, coupling, or Langevin dynamics. Constant pressure via Berendsen or Langevin-Hoover methods. Particle-mesh Ewald full electrostatics for periodic systems. Symplectic multiple time step integration. Rigid water molecules and bonds to hydrogen atoms. Advanced simulation Alchemical and conformational free energy calculations. Automated pair-interaction calculations. Automated pressure profile calculations. Locally enhanced sampling via multiple images. Tcl based scripting and steering forces. Interactive visual steering interface to VMD. 1800 Phillips et al. • Vol. 26, No. 16 • Journal of Computational Chemistry 35. Bishop, T. C.; Skeel, R. D.; Schulten, K. J Comp Chem 1997, 18, 1785. 36. Ma, Q.; Izaguirre, J. A.; Skeel, R. D. SIAM J Sci Comput 2003, 24, 1951. 37. Kubo, R.; Toda, M.; Hashitsume, N. Statistical Physics II: Nonequi- librium Statistical Mechanics; Springer: New York, 1991, 2nd ed. 38. Brünger, A.; Brooks, C. B.; Karplus, M. Chem Phys Lett 1984, 105, 495. 39. Mishra, B.; Schlick, T. J Chem Phys 1996, 105, 299. 40. Wang, W.; Skeel, R. D. Mol Phys 2003, 101, 2149. 41. Park, S.; Schulten, K. J Chem Phys 2004, 120, 5946. 42. Tuckerman, M.; Liu, Y.; Ciccotti, G.; Martyna, G. J. J Chem Phys 2001, 115, 1678. 43. Bhandarkar, M.; Brunner, R.; Chipot, C.; Dalke, A.; Dixit, S.; Gray- son, P.; Gullingsrud, J.; Gursoy, A.; Hardy, D.; Humphrey, W.; Hurwitz, D.; Krawetz, N.; Nelson, M.; Phillips, J.; Shinozaki, A.; Zheng, G.; Zhu, F. NAMD user’s guide version 2.5. http://www. ks.uiuc.edu/Research/namd/2.5/ug/ug.html. 44. Feller, S. E.; Zhang, Y.; Pastor, R. W.; Brooks, B. R. J Chem Phys 1995, 103, 4613. 45. Hoover, W. G. Phys Rev A 1985, 31, 1695. 46. Hoover, W. G. Phys Rev A 1986, 34, 2499. 47. Hoover, W. G. Computational Statistical Mechanics; Elsevier: Am- sterdam, 1991. 48. Quigley, D.; Probert, M. I. J. J Chem Phys 2004, 120, 11432. 49. Izrailev, S.; Stepaniants, S.; Isralewitz, B.; Kosztin, D.; Lu, H.; Molnar, F.; Wriggers, W.; Schulten, K. Computational Molecular Dynamics: Challenges, Methods, Ideas, vol. 4 in Lecture Notes in Computational Science and Engineering; Deuflhard, P.; Hermans, J.; Leimkuhler, B.; Mark, A. E.; Reich, S.; Skeel, R. D., Eds.; Springer- Verlag: Berlin, 1998, p. 39. 50. Isralewitz, B.; Baudry, J.; Gullingsrud, J.; Kosztin, D.; Schulten, K. J Mol Graph Model 2001, 19, 13; Also in Protein Flexibility and Folding, Kuhn, L. A.; Thorpe, M. F., Eds.; Elsevier: New York. 51. Isralewitz, B.; Gao, M.; Schulten, K. Curr Opin Struct Biol 2001, 11, 224. 52. Stone, J.; Gullingsrud, J.; Grayson, P.; Schulten, K. In 2001 ACM Symposium on Interactive 3D Graphics; Hughes, J. F.; Séquin, C. H., Eds.; ACM SIGGRAPH: New York, 2001, p. 191. 53. Grayson, P.; Tajkhorshid, E.; Schulten, K. Biophys J 2003, 85, 36. 54. Gullingsrud, J.; Braun, R.; Schulten, K. J Comp Phys 1999, 151, 190. 55. Sotomayor, M.; Corey, D. P.; Schulten, K. Structure 2005, 13, 669. 56. Craig, D.; Gao, M.; Schulten, K.; Vogel, V. Structure 2004, 12, 2049. 57. Aksimentiev, A.; Balabin, I. A.; Fillingame, R. H.; Schulten, K. Biophys J 2004, 86, 1332. 58. Zhu, F.; Tajkhorshid, E.; Schulten, K. Biophys J 2004, 86, 50. 59. Gullingsrud, J.; Schulten, K. Biophys J 2003, 85, 2087. 60. Bayas, M. V.; Schulten, K.; Leckband, D. Biophys J 2003, 84, 2223. 61. Gao, M.; Wilmanns, M.; Schulten, K. Biophys J 2002, 83, 3435. 62. Jensen, M. Ø.; Park, S.; Tajkhorshid, E.; Schulten, K. Proc Natl Acad Sci USA 2002, 99, 6731. 63. Kosztin, D.; Izrailev, S.; Schulten, K. Biophys J 1999, 76, 188. 64. Lu, H.; Schulten, K. Proteins Struct Funct Genet 1999, 35, 453. 65. Stepaniants, S.; Izrailev, S.; Schulten, K. J Mol Mod 1997, 3, 473. 66. Izrailev, S.; Crofts, A. R.; Berry, E. A.; Schulten, K. Biophys J 1999, 77, 1753. 67. Izrailev, S.; Stepaniants, S.; Balsera, M.; Oono, Y.; Schulten, K. Biophys J 1997, 72, 1568. 68. Isralewitz, B.; Izrailev, S.; Schulten, K. Biophys J 1997, 73, 2972. 69. Grubmüller, H.; Heymann, B.; Tavan, P. Science 1996, 271, 997. 70. Kosztin, D.; Izrailev, S.; Schulten, K. Biophys J 1999, 76, 188. 71. Torrie, G. M.; Valleau, J. P. J Comput Phys 1977, 23, 187. 72. Roux, B. Comput Phys Commun 1995, 91, 275. 73. Schulten, K.; Schulten, Z.; Szabo, A. J Chem Phys 1981, 74, 4426. 74. Lu, H.; Krammer, A.; Isralewitz, B.; Vogel, V.; Schulten, K. In Elastic Filaments of the Cell; Pollack, J.; Granzier, H., Eds.; Kluwer Academic/Plenum Publishers: New York, 2000, p. 143, chap 1. 75. Lu, H.; Schulten, K. Biophys J 2000, 79, 51. 76. Jarzynski, C. Phys Rev Lett 1997, 78, 2690. 77. Jarzynski, C. Phys Rev E 1997, 56, 5018. 78. Liphardt, J.; Dumont, S.; Smith, S.; Tinoco, I.; Bustamante, C. Science 2002, 296, 1832. 79. Hummer, G.; Rasaiah, J. C.; Noworyta, J. P. Nature 2001, 414, 188. 80. Marcinkiewicz, J. Math Z 1939, 44, 612. 81. Park, S.; Khalili-Araghi, F.; Tajkhorshid, E.; Schulten, K. J Chem Phys 2003, 119, 3559. 82. Kumar, S.; Bouzida, D.; Swendsen, R. H.; Kollman, P. A.; Rosen- berg, J. M. J Comp Chem 1992, 13, 1011. 83. Boczko, E. M.; Brooks, C. L., III. J Phys Chem 1993, 97, 4509. 84. Shea, J. E.; Brooks, C. L., III. Annu Rev Phys Chem 2001, 52, 499. 85. Aksimentiev, A.; Balabin, I. A.; Fillingame, R. H.; Schulten, K. Biophys J 2004, 86, 1332. 86. Villa, E.; Balaeff, A.; Mahadevan, L.; Schulten, K. Multiscale Model Simul 2004, 2, 527. 87. Taylor, R. M., II. VRPN: Virtual Reality Peripheral Network, 1998. http://www.cs.unc.edu/Research/vrpn. 88. Dachille, F.; Qin, H.; Kaufman, A.; El-Sana, J. In 1999 Symposium on Interactive 3D Graphics, 1999, p. 103. 89. Cohen, J.; Grayson, P. AutoIMD User’s Guide, 2003. http://www. ks.uiuc.edu/Research/vmd/plugins/autoimd. 90. den Otter, W. K.; Briels, W. J. J Chem Phys 1998, 109, 4139. 91. Zwanzig, R. W. J Chem Phys 1954, 22, 1420. 92. Darve, E.; Pohorille, A. J Chem Phys 2001, 115, 9169. 93. Hénin, J.; Chipot, C. J Chem Phys 2004, 121, 2904. 94. Gao, J.; Kuczera, K.; Tidor, B.; Karplus, M. Science 1989, 244, 1069. 95. Fleming, K. G.; Ackerman, A. L.; Engelman, D. M. J Mol Biol 1997, 272, 266. 96. Hénin, J.; Pohorille, A.; Chipot, C. J Am Chem Soc 2005, 127, 8478. 97. Allen, M. P.; Tildesley, D. J. Computer Simulation of Liquids; Oxford University Press: New York, 1987. 98. Kalé, L. V.; Krishnan, S. Parallel Programming using C; Wilson, G. V.; Lu, P., Eds.; MIT Press: Cambridge, MA, 1996, p. 175. 99. Kalé, L. V. In LACSI 2002; Albuquerque, 2002. 100. Brunner, R. K.; Kalé, L. V. In Parallel and Distributed Computing for Symbolic and Irregular Applications; World Scientific Publishing: Singapore, 2000, p. 167. 101. Phillips, J. C.; Brunner, R.; Shinozaki, A.; Bhandarkar, M.; Krawetz, N.; Kalé, L.; Skeel, R. D.; Schulten, K. In Computational Molecular Dynamics: Challenges, Methods, Ideas; vol. 4 in Lecture Notes in Computational Science and Engineering; Deuflhard, P.; Hermans, J.; Leimkuhler, B.; Mark, A.; Reich, S.; Skeel, R. D., Eds.; Springer- Verlag: Berlin, 1998, p. 472. 102. Ousterhout, J. Tcl and the Tk Toolkit; Addison-Wesley: Reading, MA; 1994. 103. Sugita, Y.; Okamoto, Y. Chem Phys Lett 1999, 314, 141. 104. Kale, L. V.; Zheng, G.; Lee, C. W.; Kumar, S. In Future Generation Computer Systems Special Issue on: Large-Scale System Perfor- mance Modeling and Analysis; in press. 105. Petrini, F.; Kerbyson, D. J.; Pakin, S. In Proceedings of the IEEE/ACM SC2003 Conference, Technical Paper 301; IEEE Press: Piscataway, NJ, 2003. 106. Phillips, J.; Zheng, G.; Kumar, S.; Kale, L. In Proceedings of the IEEE/ACM SC2002 Conference, Technical Paper 277; IEEE Press: Piscataway, NJ, 2002. 107. Hershko, A.; Heller, H.; Elias, S.; Ciechanover, A. J Biol Chem 1983, 258, 8206. 108. Haas, A. L.; Rose, I. A. J Biol Chem 1982, 257, 10329. Scalable Molecular Dynamics with NAMD 1801 109. Hershko, A.; Ciechanover, A. Annu Rev Biochem 1998, 67, 425. 110. Carrion-Vazquez, M.; Li, H.; Lu, H.; Marszalek, P. E.; Oberhauser, A. F.; Fernandez, J. M. Nat Struct Biol 2003, 10, 738. 111. Fernandez, J. M.; Li, H. Science 2004, 303, 1674. 112. Zhu, F.; Tajkhorshid, E.; Schulten, K. FEBS Lett 2001, 504, 212. 113. Jensen, M. Ø; Tajkhorshid, E.; Schulten, K. Structure 2001, 9, 1083. 114. Tajkhorshid, E.; Nollert, P.; Jensen, M. Ø; Miercke, L. J. W.; O’Connell, J.; Stroud, R. M.; Schulten, K. Science 2002, 296, 525. 115. Zhu, F.; Tajkhorshid, E.; Schulten, K. Biophys J 2002, 83, 154. 116. Roux, B.; Schulten, K. Structure 2004, 12, 1343. 117. Tajkhorshid, E.; Cohen, J.; Aksimentiev, A.; Sotomayor, M.; Schulten, K. In Bacterial Ion Channels and Their Eukaryotic Homologues; Mar- tinac, B.; Kubalski, A., Eds.; ASM Press: Washington, DC, 2005, p. 153. 118. Jensen, M. Ø; Tajkhorshid, E.; Schulten, K. Biophys J 2003, 85, 2884. 119. Zhu, F.; Tajkhorshid, E.; Schulten, K. Phys Rev Lett 2004, 93, 224501. 120. Lewis, M.; Chang, G.; Horton, N. C.; Kercher, M. A.; Pace, H. C.; Schumacher, M. A.; Brennan, R. G.; Lu, P. Science 1996, 271, 1247. 121. Friedman, A. M.; Fischmann, T. O.; Steitz, T. A. Science 1995, 268, 1721. 122. Ptashne, M. A Genetic Switch; Cell Press & Blackwell Scientific Publications: Cambridge, MA, 1992, 2nd ed. 123. Müller-Hill, B. The lac Operon; Walter de Gruyter: New York, 1996. 124. Balaeff, A.; Koudella, C. R.; Mahadevan, L.; Schulten, K. Philos Trans R Soc Lond A 2004, 362, 1355. 125. Balaeff, A.; Mahadevan, L.; Schulten, K. Structure 2004, 12, 123. 126. The RCSB Protein Data Bank. http://www.rcsb.org/pdb. 127. Edelman, L. M.; Cheong, R.; Kahn, J. D. Biophys J 2003, 84, 1131. 128. Bhandarkar, M.; Budescu, G.; Humphrey, W.; Izaguirre, J. A.; Izrai- lev, S.; Kalé, L. V.; Kosztin, D.; Molnar, F.; Phillips, J. C.; Schulten, K. In Proceedings of the SCS International Conference on Web- Based Modeling and Simulation; Bruzzone, A. G.; Uchrmacher, A.; Page, E. H., Eds.; SCS International, San Francisco: San Francisco, CA, 1999, p. 242. 1802 Phillips et al. • Vol. 26, No. 16 • Journal of Computational Chemistry Copy with citationCopy as parenthetical citation