# How SOLVATE works

` SOLVATE` creates the solute/solvent/ion simulation system in a number of steps. For optimal use of ` SOLVATE`, knowledge of these steps is advantageous. As an example, the accompanying pictures show the generation of a solvent shell around a protein complex (immunoglobulin/lysozyme), 6Å thick, including ions. Starting from the coordinate- and structure-file of the complex (`glob.pdb` and ` glob.psf`), the shell was created with the command

` solvate -t 6.0 -n 8 -ion glob globsol`

yielding the file ` globsol.pdb`.

#### STEP 1: Read in solute

` SOLVATE` reads the atom coordinates and atom names of the solute (Figure) from the pdb-file specified on the command line. From the atom names ` SOLVATE` derives approximate van der Waals parameters (radii and interaction strengths ). If ions are to be placed (as in our case), ` SOLVATE` must know about the electrostatics of the solute, which it derives from the atomic partial charges read from the psf-file. If no pdb-file is given, or if the pdb-file contains no atoms, one »dummy-atom« (with zero radius) is created at the cartesian origin as the »solute«. By that means a »pure« spherical water droplet can be created.

#### STEP 2: Create minimal convex volume

* (a) Bounding sphere*

On the basis of the atomic positions of the solute the smallest convex volume containing the solute is computed and represented by a regular set of »grid points« with Å spacing. To that end, in a first step center and radius of the solute's »bounding sphere«, i.e., the smallest sphere containing the solute, are computed. This bounding sphere is then filled with grid points (see Figure; the solute seems not to completely fill the bounding sphere; however it does, since the sphere and the solute are three-dimensional objects).

* (b) Slicing*

From that spherical volume the minimal convex volume is subsequently constructed by »slicing away« parts of the spherical volume in many different directions by a »knife« (solid lines) that just touches the solute. To vary the flatness of the surface of the minimal convex volume, a »bended knife« (dashed lines) can be used by specifying a maximum boundary curvature radius. The grid points which survive the slicing procedure span the desired minimal convex volume. An ideal solute-adapted boundary would be given by a surface enclosing the minimal convex volume at a given constant distance **d**.

To a good approximation, such an ideal surface can be defined as an iso-surface of a density function ,

by requiring with suitably chosen . The figure shows a cut through this density function, , for our example solute.

This boundary certainly fulfills the geometric requirements, but its computational treatment is highly inefficient, since the number of exponentials to be computed in Eq. 1 (the number of grid points spanning the convex volume) is typically of the order .

#### STEP 3: Compute an approximate density function

Note, however, that the density function defined above is quite smooth, since the distance between grid points (Å) is much smaller than the width of the (univariate) gaussians used (typically Å). Therefore, can be approximated to sufficient accuracy by a sum of much fewer () * multivariate* gaussians,

where are the heights and are the centers of the gaussians. The matrices specify the shapes of the gaussians; their overall extension in space can be varied by a scale-factor **s**. Experience shows that usually very few (less than 10) gaussians are sufficient, so that the computational cost for the necessary distance computations in MD simulations becomes negligible. To find optimal heights, centers and shape matrices, ` SOLVATE` uses a recently proposed maximum likelihood density estimation method [8]. After having computed these parameters, they are written to the file

` gaussians.lis`.

The Figure shows a set of dots, the density of which obeys the density function **f**.

#### STEP 4: Adjust boundary distance from solute

` SOLVATE` uses a fixed isosurface level (where is the average height of the gaussians). Obviously, the distance of the solvent surface from the solute, as defined by , is not known at this point. To ensure that the smallest distance equals a given distance **d**,` SOLVATE` iteratively varies the scale-factor **s** (i.e., the widths of the gaussians) until the minimum distance between solute and solvent surface approaches the desired value.

All parameters necessary to define (,, and ) are now written to the file ` boundary.lis` for later use by an MD-program, e.g., by EGO.

#### STEP 5: Create solvent volume

In a similar way as in STEP 2, the boundary surface is filled with a number of grid points (see figure). For every grid point the minimum distance to the solute, , is determined and stored; the grid points are then sorted by increasing minimum distance, which will be useful to efficiently place the solute (water) molecules. Those grid points which are located very close to the boundary can be used to visualize that boundary and are therefore written to the file` surface_stat.lis` if desired.

#### STEP 6: Perform distance approximation statistics

As will be described in Sec. 6, the distance of a given point of the solvent volume to the boundary can be efficiently estimated from (shown in the figure, colour-coded) to a sufficient accuracy (Å). To check the accuracy of the distance computation, for each grid point the efficiently estimated distance is compared with the accurate distance, and, if the ` -s` option is set, the resulting error statistics is appended to the file ` surface_stat.lis`. In our example, this statistics reads

[MINIMUM INVALID DENSITY]

0.935194

[DISTANCE ERROR STATISTICS (ABS. ERR / DENSITY / DISTANCE)]

0.01 0.401408 3.245818

0.02 0.451980 4.120950

0.05 0.538697 5.512379

0.10 0.631438 6.651468

0.20 0.753683 8.271388

0.50 0.985918 10.718131

1.00 100000000000000000000.000000 62.000000

which means that all distances smaller than 3.245818 Å (for these, **f<**0.401408), are computed within an error of 0.01 Å; all distances smaller than 4.120950 Å (**f<**0.451980) within an error of 0.02 Å and so on. No error of 1.0 Å or larger occurred. Distance computations are valid for all locations within the boundary where **f** is below the minimum invalid density (0.935194).

#### STEP 7: Place water molecules

Using the sorted grid points , and starting at points closest to the solute, the solvent volume is filled with water molecules, one molecule after the other. In this process, for each grid point ` SOLVATE` checks whether the distances of to all solute atoms as well as to all water molecules already placed is larger or equal to the appropriate van der Waals distance. If not, the respective grid point is discarded; if yes, a water molecule is placed at location and, by steepest descent, subsequently moved to a nearby energetically favorable position (only van der Waals energies are considered here). By this procedure, water molecules close to the solute are placed first (drawn in blue in the figure), followed by water molecules further apart (the ones placed last are drawn in red).

#### STEP 8: Group water molecules

Water molecules closest to the solute are likely ones placed in »caves« * inside* the solute (such caves exist, e.g., inside proteins). To distinguish such »buried« water molecules (drawn as balls in the figure) from bulk water (small angles), all water molecules are grouped according to their connectivity. Typically a few dozen »groups« consisting of just one isolated molecule, of a pair or a triplet of molecules (depending on the size of the »cave«) will be identified as well as the bulk group containing all the other water molecules. The groups are consecutively numbered, starting with #1 for the bulk group.

Note that ` solvate` places buried water molecules only according to steric criteria, ** not** according to energetic criteria. If buried water molecules found by` solvate` are to be included within a subsequent MD-simulations, it has to be checked whether their free energy is low enough so that they are likely to really be there. A good estimate is provided by the program` Dowser` (1 2) or similar software.

#### STEP 9: Place ions(optional)

Sodium (light blue) and chloride ions (green) are placed in the solvent volume at isotonic (physiological) concentration (mol/l) obeying the Debye-Hückel distribution, which depends on the locations of charged atoms of the solute (red, blue): on average, each charged atom at the surface of the solute is surrounded by a »cloud« of socalled counter-ions. The size of this cloud is given by the Debye-Hückel length ,

where** e** is the elementary charge, is the dielectric constant, is the Boltzmann constant, and **T=300K** the temperature.

The density (**i=**Na,Cl) of an ion cloud caused by a solute atom with partial charge is a function of the distance **r** from the charged atom, approaches the isotonic concentration for large** r**, and is computed in linear approximation,

with and . Due to the linear approximation, may become negative, in which case it is set to zero. The total ionic density is the determined from the linear superposition of all the ion clouds around the charged solute atoms. In a first step, all ions (the number of which is determined from the charge density integral over the solvent volume) are placed at random according to that Debye-Hückel total ionic density. Since the Debye-Hückel approximation is a mean field description, at this point ion-ion correlations are not yet described. To find ionic positions which obey also these higher order correlations, and to avoid artifacts due to the linear approximation, the ions are subsequently subjected to a large number (2,000,000) of Monte-Carlo moves, where the Coulomb field of the solute, and now also the inter-ionic Coulomb field is considered.

The current version of ` SOLVATE` does not yet allow to use a salt concentration different from the isotonic concentration, neither is it possible to include ions other tan sodium and chloride or to set a temperature different from K.

**STEP 10: Place Hydrogens**

Since up to now, the water molecules have been treated as simple van der Waals spheres (one per molecule) representing only the oxygen positions, two hydrogen atoms per water molecule have to be added. These are oriented at random; a realistic short range order of these dipoles can be created within a short MD run (10 picoseconds is long enough).

#### STEP 11: Write pdb-file

Finally, a pdb-file is written, containing the solute, the positions of the water molecules, and, if present, the positions of the ions. In the pdb-file each group of water molecules is assigned a unique segment-identifier, starting with ` W100` for the innermost group, ` W101` for the second and so on. Bulk water is stored last. If present, ions are appended at the end of the pdb-file; sodium ions first (atom name ` NA`, »molecule« ` INA`, segment-identifier` NA`), then the chloride ions (atom name ` CL`, »molecule« ` ICL`, segment-identifier ` CL`)

Optionally, ` SOLVATE` generates an X-PLOR-script to create a (psf-) structure file for the final solute/solvent-system. Appropriate topology-files and parameter-files to describe the water molecules from the X-PLOR distribution (e.g., ` toph19.sol` and ` param19.sol`) are required to generate a psf-file.