The DNA conformational energy landscape: Sequence-dependent conformational equilibria of duplex DNA

The free energy surfaces of duplex dinucleotide steps were mapped in a principal conformational subspace derived from crystal structure data on DNA duplex oligomers. The three-dimensional subspace, spanned by collective degrees of freedom representing linear combinations of the Cartesian coordinates of the backbone and sugar atoms of both strands, accounted for 77% of the total variance of the observed structural distribution. The features of the subspace free energy surface correspond well to the distribution of observed structures, exhibiting a clear separation of A- and B-family classes. The sequence dependence of the relative A/B-form conformational equilibria was derived from the corresponding subspace free energy surfaces at physiological conditions. A B-philicity scale representing the mole fraction of the BI-form vs the A-family for the 10 unique dinucleotide steps revealed three classes of sequences: highly B-philic (GC/GC & CG/CG), B-philic (AC/GT, AA/TT, AT/AT, CA/TG, AG/CT & GG/CC) and A-philic (GA/TC & TA/TA). The high propensity of the TA/TA step to adopt the A-form conformation is in accord with single crystal X-ray diffraction data and has biological significance in view of the frequent presence of the TATA sequence motif in transcriptional promoter regions.


Introduction
There have been several attempts to explore the conformation and energetics of DNA, using a variety of both backbone torsion angles and helicoidal parameters as effective degrees of freedom [1][2][3][4][5][6]. Several approximations were made to overcome the problem of dealing with a large number of degrees of freedom. For example, in a study of the energetics of base stacking [7], the backbone was replaced with methyl groups at the C1′ atoms. Using this model, the values of base-step parameters were predicted accurately, provided that slide, shift and propeller were assigned their observed values. In a subsequent study [8], a C1′-C1′ virtual bond model was combined with two helical degrees of freedom, slide and shift, to compute the potential energy surface of an isolated dinucleotide step. In this model, the positions of the low energy regions agreed well with the geometry of observed crystal structures, but for only some of the base steps. For other base steps, the lack of agreement between the energy landscape and the observations was ascribed to the neglect of conformational coupling with neighbouring steps (context dependence).
Therefore, in many studies of DNA conformation, the backbone is considered as a passive element that delineates the boundaries of the dinucleotide step [9] such that it acts as no more than a constraint on the range of the conformational space accessible to the bases [10]. However, the growing evidence of the biological significance of DNA conformational states other than the B-form [11,12] with regard to DNA packaging [13], transcription [14][15][16], spore UV resistance [17] and protein recognition [11,12][18][19][20] signals the importance of backbone conformational flexibility. We reaffirmed the importance of backbone conformation to DNA structure by revealing that steric interactions within the backbone of the single strand are key determinants of the conformational preferences of duplex DNA [21].

Moreover, ions are known to play an important role in the conformation of oligomeric DNA structures by shielding the phosphate charges and affecting water activity around DNA [22][23][24]. In this respect, the conformational equilibria of DNA are highly sensitive to the salt concentration, such that A-form DNA is found in solutions with salt concentrations of 1 M and above, whereas at lower concentrations B-DNA is usually prevalent [25,26]. This conformational change depends not only on the environment but also strongly on the sequence of the base pairs [27]. Given the importance of the different conformational states of DNA to its biological function, it is of interest to examine their characteristics at physiological conditions.
In this study, we extend the methodology established in our previous works [21,28]. The derived B-philicity scale is then compared to known experimental and theoretical trends.

Conformational descriptors
The Cartesian coordinates of the atoms comprising the backbone and sugar torsions of both strands were used to describe the conformational space of duplex dinucleotide steps. This atom subset is common to the ten unique dinucleotide steps [9]. For the subset of 42 atoms, a total of 126 Cartesian (X, Y, Z) coordinates are used to describe each dinucleotide step (Figure 1).
We note that the use of a Cartesian coordinate representation in this work, rather than a torsion angle representation (which was used in our previous work [21]), accounts for both the inter-strand and intra-strand conformational variation and naturally incorporates both the backbone conformation and the inter-base pair orientation. Accounting for these degrees of freedom in a torsion angle space is difficult as it requires a large number of torsion angles along with a definition of the inter-strand distance. Using a mixture of backbone torsions and helical parameters is less convenient as it incorporates different units (Å and degrees) into the basis set, which necessitates standardization of variables [29,30] and complicates the methodology. Use of a Cartesian space representation also circumvents the problems of periodicity in the torsion angle representation [31].

Data acquisition
Structural data were obtained from the Nucleic Acid Database [32] (http://ndbserver.rutgers.edu). Crystal structures of duplex A-DNA and B-DNA were selected that met the following criteria: high resolution (d_min < 2.5 Å); no base mismatches or chemical modifications; and not in complex with drugs or proteins. Z-DNA was not included due to the paucity of structural data meeting the selection criteria. As in other studies [33], we regard different crystalline environments for the same sequence as distinct observations. From the sample of structures meeting these criteria (Supplementary Material, Table 1), the conformational descriptors (described above) were assembled row-wise in an n × p data matrix (M), where n is the number of dinucleotide steps in the sample (n = 675) and p is the number of coordinates per step (p = 126).
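As a minimal sketch of how such a data matrix might be assembled, the Python fragment below flattens the 42 atom positions of each step into one 126-element row and stacks the rows. The names `flatten_step` and `build_data_matrix`, and the synthetic coordinates in the usage lines, are illustrative assumptions and are not taken from the original workflow.

```python
N_ATOMS = 42             # backbone and sugar atoms per duplex dinucleotide step
N_COORDS = 3 * N_ATOMS   # 126 Cartesian coordinates per step


def flatten_step(atoms):
    """Turn a list of 42 (x, y, z) tuples into one row of 126 floats."""
    if len(atoms) != N_ATOMS:
        raise ValueError(f"expected {N_ATOMS} atoms, got {len(atoms)}")
    row = []
    for x, y, z in atoms:
        row.extend((float(x), float(y), float(z)))
    return row


def build_data_matrix(steps):
    """Stack flattened steps row-wise into an n x p matrix (a list of rows)."""
    return [flatten_step(atoms) for atoms in steps]


# Usage with synthetic placeholder coordinates (not real structural data).
fake_step = [(i, i + 0.5, -i) for i in range(N_ATOMS)]
M = build_data_matrix([fake_step, fake_step])
print(len(M), len(M[0]))
```

With the full sample described above, the same layout would give a matrix of 675 rows and 126 columns.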

Data preparation
Analysing sets of different structures described by Cartesian coordinates requires establishing a common spatial reference frame, which was achieved by superposition onto a single template structure. The template structure used was the unweighted geometric mean structure: a distance matrix (D) was generated for each row of the matrix M such that D_ij corresponds to the Euclidean distance between atom i and atom j within the same dinucleotide step. An average distance matrix (DD) was then calculated such that DD_ij corresponds to the average distance between atom i and atom j calculated over all corresponding entries in the D matrices. The matrix DD was subjected to principal coordinate analysis [30] to provide a three-dimensional embedding corresponding to the (Cartesian coordinates of the) template structure. It is noted that generating the template structure in this way has the advantage that it is reference frame independent [34]. All of the

Cluster analysis: The scores along the first three principal components were split into two classes following the NDB classification of the A-form and B-form structures from which the data were derived. Each of these classes was subjected separately to hierarchical clustering using the Ward linkage method [37]. This resulted in 3 clusters for the A-form category and 5 clusters for the B-form category. The whole dataset was then subjected to non-hierarchical k-means clustering [38], seeded using the centroids of the 8 clusters obtained from the previous step. A Euclidean metric was used throughout [39].
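The seeded k-means step can be illustrated with a short, self-contained sketch. The code below is a plain Lloyd-style k-means that starts from user-supplied centroids under a Euclidean metric. It does not reproduce the Ward clustering that supplied the seeds in the study, and the function name `seeded_kmeans` and the toy numbers are assumptions made for illustration only.

```python
import math


def seeded_kmeans(points, seeds, max_iter=100):
    """Lloyd-style k-means started from user-supplied centroids.

    points: list of coordinate tuples (e.g. scores on PC1-PC3)
    seeds:  list of initial centroids (e.g. from a prior hierarchical clustering)
    Returns (labels, centroids).
    """
    centroids = [tuple(map(float, c)) for c in seeds]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the Euclidean metric.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its previous centroid
        if updated == centroids:
            break  # centroids no longer move
        centroids = updated
    return labels, centroids


# Toy usage: two well-separated groups and two seeds (synthetic numbers).
scores = [(0.0, 0.0, 0.0), (0.2, 0.1, 0.0), (5.0, 5.0, 5.0), (5.2, 4.9, 5.1)]
labels, centroids = seeded_kmeans(scores, seeds=[(1.0, 1.0, 1.0), (4.0, 4.0, 4.0)])
print(labels)
```

Seeding from pre-computed centroids removes the dependence on random initialisation, so repeated runs on the same scores give the same partition.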

Mapping the Potential Energy Surface (PES) (in vacuum):
The vacuum potential energy was calculated at each grid point in a manner very similar to that discussed in our previous work [21], apart from minor changes. For the sake of completeness and clarity we describe it again. The potential energy surface of the ten unique (duplex) dinucleotide steps within the principal conformational subspace (PCS) was mapped via systematic energy evaluation on a grid defined by discrete points along the first three PCs. The subspace coordinates of a grid point correspond to a set of Cartesian coordinates, x_i. Therefore, the coordinates x_i correspond to a structure reconstructed in the 3D subspace whose projections along higher principal components are set to zero. Thiulated in vacuum (salt concentration = 0.0 M, relative dielectric permittivity = 1.0) at 298 K. The relative dielectric permittivity of the solute was set to 1, consistent with the use of a non-polarizable force field and a fixed solute conformation [45]. The solute-solvent boundary was constructed from the solvent accessible surface using a solvent probe radius of 1.4 Å.
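The mapping from a subspace grid point to the Cartesian coordinates x_i can be written compactly. The expression below is a sketch under the usual conventions of principal component analysis. The symbols $\bar{\mathbf{x}}$ (mean structure after superposition), $\mathbf{v}_k$ (the k-th principal component) and $s_{ik}$ (the score of grid point i along the k-th PC) are notation assumed here rather than taken from the original text.

```latex
\mathbf{x}_i
  = \bar{\mathbf{x}} + \sum_{k=1}^{126} s_{ik}\,\mathbf{v}_k
  = \bar{\mathbf{x}} + \sum_{k=1}^{3} s_{ik}\,\mathbf{v}_k ,
\qquad \text{since } s_{ik} = 0 \text{ for } k > 3 .
```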
To improve the accuracy of the FDPB calculations, the 'focusing' technique [46] was used. The FDPB calculations were designed so as to immerse all the subspace structures in a final grid of 0.4 Å spacing that extends 6 Å from the edge of the molecule in each direction (x, y and z). This was conducted as follows. Each subspace grid structure was reoriented to its principal axes; an initial grid extending 12 Å from the edge of the molecule in each direction was then constructed with an initial grid spacing of 0.8 Å. The atomic partial charges were distributed over the nearest eight grid points using trilinear interpolation, and an initial solution of the PB equation was obtained using this charge distribution. The grid spacing was then decreased by 0.066 Å and the distance from the edge of the solute in each direction (x, y and z) was decreased by 1 Å, with the boundary potential set using the known potential from the previous step. This process was repeated 6 times until a final grid spacing of 0.4 Å was reached.
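The sequence of grids implied by this focusing procedure can be tabulated with a few lines of Python. This is only a bookkeeping sketch of the grid spacing and margin at each level; it does not solve the PB equation. The function name `focusing_schedule` is hypothetical, and the per-step decrement is computed as (0.8 − 0.4)/6 ≈ 0.067 Å, which is assumed to correspond to the rounded value of 0.066 Å quoted above.

```python
def focusing_schedule(start_spacing=0.8, final_spacing=0.4,
                      start_margin=12.0, final_margin=6.0, n_steps=6):
    """Return (grid spacing, margin) pairs in angstroms for each focusing level."""
    d_spacing = (start_spacing - final_spacing) / n_steps  # about 0.067 A per step
    d_margin = (start_margin - final_margin) / n_steps     # 1 A per step
    return [
        (start_spacing - i * d_spacing, start_margin - i * d_margin)
        for i in range(n_steps + 1)
    ]


# Level 0 is the initial coarse grid; levels 1-6 are the focusing steps.
for level, (spacing, margin) in enumerate(focusing_schedule()):
    print(f"level {level}: spacing = {spacing:.3f} A, margin = {margin:.1f} A")
```

Six focusing steps after the initial solution take the grid from 0.8 Å spacing with a 12 Å margin to 0.4 Å spacing with a 6 Å margin, in agreement with the final grid described in the text.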

The Free Energy Surface (FES)
Calculation of the total solvation free energy can be visualized as a two-step process: first, formation of an uncharged (non-polar) solvent cavity and, second, charging of the solute in solution [47]. This divides the total free energy of solvation into two contributions: non-polar and electrostatic.
For charged systems like DNA the second of these contributions predominates. Also, the entropic contributions to the total free

Conformational preferences of dinucleotide steps (a B-philicity scale)
The

The Free Energy Surface within the PCS
The FESs of all ten possible sequences for dinucleotide steps were computed in the PCS; here we present that for a CA/TG step (Figure 2, left panel) as an example. Three low energy regions on the FES are readily apparent from visual examination. A quantitative partitioning of the FES [21] confirmed the presence of three distinct energy valleys, which were then analysed individually in terms of their structural and energetic characteristics.

Structural characterization of the low energy valleys of the FES
Profiles of the backbone structural parameters for both strands of the dinucleotide step (DS) in each of the three energy valleys on the FES are shown in Figure 3. We adopted a binary-state classification scheme for the DS, embracing the fact that the conformational states of the individual strands need not be the same [49]; e.g. BI.BII denotes a DS structure in which one of the strands is in the BI form while the other is in the BII form (NB: we consider state X.Y as equivalent to Y.X). In accord with the character of the underlying collective coordinates (see above), we found that PC1 serves to discriminate the valleys corresponding to the A- and B-family conformers. The geometric characteristics of both strands (with regard to sugar puckers and ε/ζ torsion ranges) of the structures within Valleys B and C (see

Relative energetics of the valleys on the FES
The location and relative energetics of the lowest local energy minima within the energy valleys for the ten unique dinucleotide steps [9] are given in Table 1 [50,51].  Table 2). It is notable that substates of both A and B families are not found in the same dinucleotide step (see Table 2 (59). However, these studies do not include

A-philic: TA/TA step
The TA/TA step is the only dinucleotide step which showed a pronounced A-philic character in our computed B-philicity scale (ΔA_BI-A = +0.73 kcal/mol). This is in good agreement with the results of single crystal X-ray diffraction studies on DNA double helices containing the TATA sequence (e.g. d(GGTATACC)), which were observed to adopt the A-form conformation [65,66]. Further, the preference for the TA/TA step to adopt the A-form may be of biological relevance given that the TATA sequence is a motif frequently associated with transcription promoter regions [67,68].
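To connect a free energy difference of this size to a mole fraction, a simple two-state Boltzmann estimate can be used. The sketch below makes two assumptions that are not confirmed by the text: that ΔA_BI-A denotes A(BI) − A(A-form), so a positive value favours the A-form, and that only these two states are populated. The function name `bi_mole_fraction` is hypothetical, and the authors' actual scale may have been obtained differently, for example by integrating over the valleys of the FES.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def bi_mole_fraction(delta_a_bi_minus_a, temperature=298.0):
    """Two-state mole fraction of BI, given dA = A(BI) - A(A-form) in kcal/mol.

    Assumed sign convention: a positive dA means BI lies higher in free energy.
    """
    rt = R_KCAL * temperature
    return 1.0 / (1.0 + math.exp(delta_a_bi_minus_a / rt))


x_bi = bi_mole_fraction(0.73)  # value quoted above for the TA/TA step
print(f"x_BI = {x_bi:.3f}, x_A = {1 - x_bi:.3f}")
```

Under these assumptions, +0.73 kcal/mol at 298 K corresponds to a BI mole fraction of about 0.23, i.e. a clear but not exclusive preference for the A-form.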

Conclusions
The conformational space of DNA dinucleotide steps was   [50,51].
Based on the subspace FES representation, we computed a B-philicity scale, which represents the propensity for a given dinucleotide step to be in the B-form vs. the A-family conformation at near physiological conditions. Variation in the B-philicity of the ten dinucleotide steps was observed.
The ten dinucleotide steps were, therefore, grouped into three classes: highly B-philic, B-philic and A-philic. This physical approach to deriving conformational equilibria in DNA may be contrasted with statistical approaches, e.g. [55].
A detailed comparison of the results of the two approaches is ongoing. Further work on our physical model is currently under way in a number of directions, including the importance of the system representation, both in terms of segment length and better approximations of the free energy surface.