The DNA conformational energy landscape: sequence-dependent conformational equilibria of duplex DNA

Karim M ElSawy; Leo SD Caves

ISSN: 2994-4171

Open Journal of Cell and Protein Science

Research Article Open Access Peer-Reviewed

Karim M ElSawy¹,²* and Leo SD Caves³

Author and article information

¹York Centre for Complex Systems Analysis (YCCSA), University of York, York, UK
²Department of Chemistry, College of Science, Qassim University, Saudi Arabia
³Independent Researcher and Associate of the York Cross-Disciplinary Centre for Systems Analysis, University of York, United Kingdom

*Corresponding author: Karim M. ElSawy, York Centre for Complex Systems Analysis (YCCSA), University of York, UK, E-mail: km.elsawy@qu.edu.sa ; karim.elsawy@uniofyorkspace.net

DOI: 10.17352/ojcps.000002

ORCID: https://orcid.org/0000-0002-9356-6657

Received: 10 February, 2020 | Accepted: 26 May, 2020 | Published: 27 May, 2020

Keywords: Nucleic acid conformation; Principal conformational subspace; Free energy surface; B-philicity scale

Cite this as

ElSawy KM, Caves LSD (2020) The DNA conformational energy landscape: sequence-dependent conformational equilibria of duplex DNA. Open J Cell Protein Sci 3(1): 001-010. DOI: 10.17352/ojcps.000002

Abstract

The free energy surfaces of duplex dinucleotide steps were mapped in a principal conformational subspace derived from crystal structure data on DNA duplex oligomers. The three-dimensional subspace, spanned by collective degrees of freedom representing linear combinations of the Cartesian coordinates of the backbone and sugar atoms of both strands, accounted for 77% of the total variance of the observed structural distribution. The features of the subspace free energy surface correspond well to the distribution of observed structures, exhibiting a clear separation of A- and B-family classes. The sequence dependence of the relative A/B-form conformational equilibria was derived from the corresponding subspace free energy surfaces at physiological conditions. A B-philicity scale representing the mole fraction of the BI-form vs the A-family for the 10 unique dinucleotide steps revealed three classes of sequences: highly B-philic (GC/GC & CG/CG), B-philic (AC/GT, AA/TT, AT/AT, CA/TG, AG/CT & GG/CC) and A-philic (GA/TC & TA/TA). The high propensity of the TA/TA step to adopt the A-form conformation is in accord with single crystal X-ray diffraction data and has biological significance in view of the frequent presence of the TATA sequence motif in transcriptional promoter regions.

Introduction

There have been several attempts to explore the conformation and energetics of DNA, using a variety of both backbone torsion angles and helicoidal parameters as effective degrees of freedom [1-6]. Several approximations were made to overcome the problem of dealing with a large number of degrees of freedom. For example, in a study of the energetics of base stacking [7], the backbone was replaced with methyl groups at the C1´ atoms. Using this model, the values of base-step parameters were predicted accurately provided that the values of slide, shift and propeller were assigned to observed values. In a subsequent study [8], a C1´- C1´ virtual bond model was combined with two helical degrees of freedom, slide and shift, to compute the potential energy surface of an isolated dinucleotide step. In this model, the position of the low energy regions agreed well with the geometry of observed crystal structures, but for only some of the base steps. For other base steps, the lack of agreement of energy landscape with the observations was ascribed to the neglect of the effects of conformational coupling with neighbouring steps (context dependence).

Therefore, in many studies of DNA conformation, the backbone is considered as a passive element that delineates the boundaries of the dinucleotide step [9] such that it acts as no more than a constraint on the range of the conformational space accessible to the bases [10]. However, the growing evidence of the biological significance of DNA conformational states other than the B-form [11,12] with regard to DNA packaging [13], transcription [14-16], spore UV resistance [17] and protein recognition [11,12,18-20], signals the importance of backbone conformational flexibility. We reaffirmed the importance of backbone conformation to DNA structure by revealing that steric interactions within the backbone of the single strand are key determinants of the conformational preferences of duplex DNA [21].

Moreover, ions are known to play an important role in the conformation of oligomeric DNA structures by shielding the phosphate charges and affecting water activity around DNA [22-24]. In this respect, the conformational equilibria of DNA are highly sensitive to the salt concentration, such that A‑form DNA is found in solutions with 1 M salt concentrations and above, whereas at lower concentrations, B-DNA is usually prevalent [25,26]. This conformational change is not only dependent on the environment, but it is also strongly dependent on the sequence of the base pairs [27]. Given the importance of the different conformational states of DNA to its biological function, it is of interest to examine their characteristics at physiological conditions.

In this study, we extend the methodology established in our previous works [21,28] to map and characterise the free energy surface (FES) of duplex dinucleotide steps (DS) within a principal conformational subspace (PCS) derived from dinucleotide steps observed in duplex crystal structures. A key feature of this work, which focuses on the duplex FES, is the use of collective degrees of freedom based on linear combinations of the Cartesian coordinates of the backbone and sugar atoms, in order to account for the inter-strand interactions that were absent from the single strand PES [21]. In addition, a more sophisticated model of the electrostatic effects is employed at near physiological conditions, i.e. in water with a salt concentration of 0.15 M and a temperature of 298 K.

The structural, energetic and thermodynamic characteristics of the duplex FES are discussed and compared in terms of the location and relative stability of different local energy minima. Validation of the fidelity of the FES is presented by considering the relationship of its features with experimentally observed crystal structure distributions within the PCS. The utility of the FES is illustrated via calculation of a B-philicity scale for the ten dinucleotide steps based on the corresponding free energy surfaces within the PCS at physiological conditions. The derived B-philicity scale is then compared to known experimental and theoretical trends.

Data and methods

Conformational descriptors

The Cartesian coordinates corresponding to the atoms comprising the backbone and sugar torsions of both strands were used to describe the conformational space of duplex dinucleotide steps. This atom subset is common for the ten unique dinucleotide steps [9]. For the subset of 42 atoms, a total of 126 Cartesian (X,Y,Z) coordinates is used to describe each dinucleotide step (Figure 1).

We note that the use of a Cartesian coordinate representation in this work, rather than a torsion angle representation (which was used in our previous work [21]), accounts for both the inter-strand and intra-strand conformational variation and naturally incorporates both the backbone conformation and the inter-base pair orientation. Accounting for these degrees of freedom in a torsion angle space is difficult as it requires a large number of torsion angles along with a definition of the inter-strand distance. Using a mixture of backbone torsions and helical parameters is less convenient as it incorporates different units (Å and degrees) into the basis set, which necessitates standardization of variables [29,30] and complicates the methodology. Use of a Cartesian space representation also circumvents the problems of periodicity in the torsion angle representation [31].

Data acquisition

Structural data were obtained from the Nucleic Acid Database [32] (http://ndbserver.rutgers.edu). Crystal structures of duplex A-DNA and B-DNA were selected which met the following criteria: high resolution (dmin < 2.5 Å), no base mismatches or chemical modifications, and not in complex with drugs or proteins. Z-DNA was not included due to the paucity of structural data meeting the selection criteria. As in other studies [33], we regard different crystalline environments for the same sequence as distinct observations. From the sample of structures meeting these criteria (Supplementary Material, Table 1), the conformational descriptors (described above) were assembled row-wise in an n × p data matrix (M), where n is the number of dinucleotide steps in the sample (n=675) and p is the number of coordinates per step (p=126).

Data preparation

Analysing sets of different structures described by Cartesian coordinates requires establishing a common spatial reference frame, which was achieved by superposition onto a single template structure. The template structure used was the unweighted geometric mean structure, constructed as follows. A distance matrix (D) was generated for each row of the matrix M such that Dij corresponds to the Euclidean distance between atom i and atom j within the same dinucleotide step. An average distance matrix (DD) was then calculated such that DDij corresponds to the average distance between atom i and atom j calculated over all corresponding entries in the D matrices. The matrix DD was subjected to principal coordinate analysis [30] to provide a three-dimensional embedding corresponding to the (Cartesian coordinates of the) template structure. Generating the template structure in this way has the advantage that it is independent of the reference frame [34]. All of the dinucleotide structures (rows in the data matrix M) were then superposed on the template structure using the least squares algorithm of Kabsch [35,36]. Coordinate superposition was performed on all 42 atoms using equal weights. All subsequent analysis was performed on the superposed data matrix (R).
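The template construction and superposition described above can be sketched as follows. This is a minimal illustration using NumPy, not the code used in this work: the function names are hypothetical, and the handedness check in `build_template` is an added assumption (an embedding derived from distances alone is defined only up to a mirror image, so the sketch keeps whichever enantiomer fits the first structure better).

```python
import numpy as np

def mean_distance_matrix(structures):
    """Average the per-structure distance matrices (the DD matrix).

    structures: array of shape (n, m, 3), n structures of m atoms each.
    """
    diffs = structures[:, :, None, :] - structures[:, None, :, :]
    return np.linalg.norm(diffs, axis=-1).mean(axis=0)

def classical_mds(dist, dim=3):
    """Principal coordinate analysis: embed a distance matrix in `dim` dimensions."""
    m = dist.shape[0]
    centering = np.eye(m) - np.ones((m, m)) / m
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centred Gram matrix
    evals, evecs = np.linalg.eigh(gram)
    top = np.argsort(evals)[::-1][:dim]
    return evecs[:, top] * np.sqrt(np.clip(evals[top], 0.0, None))

def kabsch_superpose(mobile, target):
    """Least-squares superposition of `mobile` onto `target` with equal weights."""
    mob_c = mobile - mobile.mean(axis=0)
    tgt_c = target - target.mean(axis=0)
    u, _, vt = np.linalg.svd(mob_c.T @ tgt_c)
    d = np.sign(np.linalg.det(u @ vt))  # exclude improper rotations (reflections)
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    return mob_c @ rot + target.mean(axis=0)

def rmsd(a, b):
    """Root mean square deviation between two (m, 3) coordinate sets."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))

def build_template(structures):
    """Embed the mean distance matrix and pick the handedness (assumption, see lead-in)."""
    template = classical_mds(mean_distance_matrix(structures))
    mirrored = template * np.array([1.0, 1.0, -1.0])
    ref = structures[0]
    fit = rmsd(kabsch_superpose(ref, template), template)
    fit_mirror = rmsd(kabsch_superpose(ref, mirrored), mirrored)
    return template if fit <= fit_mirror else mirrored
```

For the data set described here, `structures` would have shape (675, 42, 3), and flattening each superposed structure to 126 values would give one row of the superposed data matrix R.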

Multivariate analysis

Principal component analysis: Principal component analysis (PCA) was performed in order to exploit the intrinsic covariance structure of the data matrix to effect a dimensionality reduction [30]. The eigenvectors (principal components or PCs) can be considered as collective degrees of freedom that form an orthogonal basis set for describing the dinucleotide step (DS) conformational distribution via concerted Cartesian coordinate displacements from the mean structure. The corresponding eigenvalues represent the variance of the conformational distribution along each of the collective degrees of freedom. The extent to which a given observation $x_{i}$ lies along the kth principal component $a_{k}$ is given by the projection $z_{i, k} = a_{k}^{T} (x_{i} - \bar{x})$, where $\bar{x}$ is the mean structure. These projections (or scores) may be used for visualising the distribution of observations in the principal conformational subspace of the dinucleotide steps.
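As an illustration of the analysis just described, the sketch below diagonalises the covariance matrix of a superposed data matrix, computes scores along the leading components and reconstructs coordinates from them. It assumes NumPy; the function names are hypothetical and are not taken from the software used in this work.

```python
import numpy as np

def pca(data):
    """PCA of an (n, p) data matrix with one superposed structure per row.

    Returns the mean row, the eigenvalues (descending) and the matching
    eigenvectors as columns.
    """
    mean = data.mean(axis=0)
    centred = data - mean
    cov = centred.T @ centred / (data.shape[0] - 1)
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]
    return mean, evals[order], evecs[:, order]

def scores(data, mean, evecs, n_components=3):
    """Project each observation onto the leading principal components."""
    return (data - mean) @ evecs[:, :n_components]

def reconstruct(z, mean, evecs):
    """Rebuild coordinates from subspace scores; higher components are set to zero."""
    return mean + z @ evecs[:, : z.shape[-1]].T

def explained_fraction(evals, n_components=3):
    """Fraction of the total variance captured by the leading components."""
    return float(evals[:n_components].sum() / evals.sum())
```

With the 675 × 126 matrix R as input, `explained_fraction(evals, 3)` would correspond to the fraction of variance captured by the three-dimensional subspace.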

Cluster analysis: The scores along the first three principal components were split into two classes following the NDB classification of A‑form and the B-form structures from which the data were derived. Each of these classes was subjected separately to hierarchical clustering using the Ward linkage method [37]. This resulted in 3 clusters for the A-form category and 5 clusters for the B-form category. The whole dataset was then subjected to non-hierarchical k-means clustering [38], seeded using the centroids of the 8 clusters obtained from the previous step. A Euclidean metric was used throughout [39].
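The second, non-hierarchical stage of this procedure can be sketched as a k-means refinement started from supplied centroids. The code below is a plain NumPy implementation of Lloyd's algorithm, given only as an illustration; the Ward clustering that would provide the eight seed centroids is not reproduced, and the function name and the empty-cluster handling are assumptions.

```python
import numpy as np

def seeded_kmeans(points, seeds, max_iter=100):
    """k-means with a Euclidean metric, initialised from given seed centroids.

    points: (n, d) array of scores; seeds: (k, d) array of initial centroids.
    Returns the cluster label of each point and the final centroids.
    """
    centroids = np.array(seeds, dtype=float)
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        for k in range(len(centroids)):
            members = points[labels == k]
            if len(members):  # assumption: an empty cluster keeps its seed
                centroids[k] = members.mean(axis=0)
    return labels, centroids
```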

Empirical energy calculations

Mapping the potential energy surface (PES) in vacuum: The vacuum potential energy was calculated at each grid point in a manner very similar to that discussed in our previous work [21], apart from minor changes. For the sake of completeness and clarity we describe it again. The potential energy surface of the ten unique (duplex) dinucleotide steps within the PCS was mapped via systematic energy evaluation on a grid defined by discrete points along the first three PCs. The subspace coordinates of a grid point correspond to a set of Cartesian coordinates, $x_{i} = \bar{x} + \sum_{k = 1}^{3} z_{i, k} a_{k}$. Therefore, the coordinates $x_{i}$ correspond to a structure reconstructed in the 3D subspace whose projections along higher principal components are set to zero. The potential energy was calculated using the AMBER force field [40], using the Cornell parameterisation [41], implemented in the CHARMM program [42]. No truncation of non-bonded interactions was performed. A grid resolution of 0.5 Å in the (PC1, PC2, PC3) subspace was used over the range -12 to 12 Å (corresponding to a range of conventional RMSD of -1.85 to 1.85 Å) with respect to an origin corresponding to the mean structure. These search limits were chosen so as to encompass the range of scores spanned by the superposed data matrix (R). The positions of the atoms corresponding to the PCA basis were restrained to maintain the desired PCS projection using a harmonic penalty function with a force constant of 50 kcal mol⁻¹ Å⁻², and the system was energy minimised using 200 steps of steepest descent followed by 2000 steps of the adopted basis Newton-Raphson (ABNR) method of CHARMM [42]. Characterisation of the features of the FES and calculation of the thermodynamic equilibria between different forms were conducted via estimation of the local partition function of each energy valley as described previously [21].
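The grid dimensions quoted above can be checked with a few lines of arithmetic. The sketch below assumes that the principal components are unit-norm vectors in the 126-dimensional coordinate space, so that a displacement with subspace score vector z from the mean structure corresponds to an RMSD of |z|/√42 over the 42 atoms; this reading reproduces the quoted 1.85 Å, but the conversion is our inference rather than a formula stated in the text. The names are illustrative.

```python
import math

GRID_MIN, GRID_MAX, GRID_STEP = -12.0, 12.0, 0.5  # Å, as quoted in the text
N_ATOMS = 42

# Number of grid points along each PC and in the full cubic grid.
points_per_axis = round((GRID_MAX - GRID_MIN) / GRID_STEP) + 1
total_points = points_per_axis ** 3

def subspace_rmsd(z, n_atoms=N_ATOMS):
    """RMSD from the mean structure for a score vector z (orthonormal PCs assumed)."""
    return math.sqrt(sum(component ** 2 for component in z)) / math.sqrt(n_atoms)

# RMSD at the edge of the search range along a single PC.
edge_rmsd = subspace_rmsd([GRID_MAX, 0.0, 0.0])
```

Under these assumptions, each axis carries 49 points, the full cubic grid contains 117,649 points per sequence, and `edge_rmsd` evaluates to about 1.85 Å.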

Calculation of the electrostatic component of the solvation free energy (∆Aelec)

The electrostatic component of the solvation free energy was calculated by solving the linearized Poisson-Boltzmann (PB) equation for the structures corresponding to each of the subspace grid points after minimization, using a finite difference (FDPB) algorithm [43] implemented in the PBEQ module [44] of CHARMM. The electrostatic component of the solvation free energy ∆Aelec was obtained from the difference between the electrostatic potential computed in bulk solvent (salt concentration = 0.15 M, relative dielectric permittivity = 80) and the electrostatic potential calculated in vacuum (salt concentration = 0.0 M, relative dielectric permittivity = 1.0) at 298 K. The relative dielectric permittivity of the solute was set to 1, consistent with the use of a non-polarizable force field and a fixed solute conformation [45]. The solute-solvent boundary was constructed from the solvent accessible surface using a solvent probe radius of 1.4 Å. To improve the accuracy of the FDPB calculations the ‘focusing’ technique [46] was used. The FDPB calculations were designed so as to immerse all the subspace structures in a final grid of 0.4 Å spacing which extends 6 Å from the edge of the molecule in each direction (x, y and z). This was conducted as follows. Each subspace grid structure was reoriented to its principal axes; an initial grid which extends 12 Å from the edge of the molecule in each direction was then constructed with an initial grid spacing of 0.8 Å. The atomic partial charges were distributed over the nearest eight grid points using trilinear interpolation and an initial solution of the PB equation was obtained using this charge distribution. The grid spacing was then decreased by 0.066 Å and the distance from the edge of the solute in each direction (x, y and z) was decreased by 1 Å, with the boundary potential set using the known potential from the previous step. This process was repeated 6 times until a final grid spacing of 0.4 Å was reached.
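The focusing schedule described above amounts to six equal refinement steps from the initial to the final grid. The following sketch tabulates the grid spacing and solute margin at each stage; it only illustrates the arithmetic and does not call CHARMM or PBEQ. The function name and defaults are hypothetical, and the per-step decrement is computed as 0.4/6 ≈ 0.067 Å, which the text quotes as 0.066 Å.

```python
def focusing_schedule(start_spacing=0.8, final_spacing=0.4,
                      start_margin=12.0, final_margin=6.0, n_refinements=6):
    """Return (grid spacing, margin) pairs in Å for each focusing stage."""
    spacing_step = (start_spacing - final_spacing) / n_refinements
    margin_step = (start_margin - final_margin) / n_refinements
    return [
        (round(start_spacing - i * spacing_step, 3),
         round(start_margin - i * margin_step, 3))
        for i in range(n_refinements + 1)
    ]

schedule = focusing_schedule()
```

The first entry of `schedule` is the coarse grid (0.8 Å spacing, 12 Å margin) and the last is the final grid (0.4 Å spacing, 6 Å margin), with seven stages in total.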

The Free Energy Surface (FES)

Calculation of the total solvation free energy can be visualized as a two-step process: first, formation of an uncharged (non-polar) solvent cavity and, second, charging of the solute in solution [47]. This divides the total free energy of solvation into two contributions, non-polar and electrostatic. For charged systems like DNA the second of these contributions predominates. Also, the entropic contribution to the total free energy resulting from fluctuations of the free degrees of freedom of the dinucleotide steps at each grid point is expected to be minimal, since we restrain most of the system’s degrees of freedom. Therefore, only the electrostatic component was considered in this work. The electrostatic solvation free energy component (∆Aelec) of each grid point was added to the vacuum potential energy surface in order to obtain a free energy surface (FES) which incorporates the salt and solvation effects at near physiological conditions.
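In symbols, and under the approximations just stated, the surface at a grid point $\mathbf{z}$ of the PCS can be written as below. The symbols $E_{\text{vac}}$ (the restrained-minimised vacuum potential energy) and $G_{\text{elec}}$ (the electrostatic energy obtained from the PB solution in the indicated medium) are introduced here only for illustration and do not appear in the original formulation.

$$A(\mathbf{z}) \approx E_{\text{vac}}(\mathbf{z}) + \Delta A_{\text{elec}}(\mathbf{z}), \qquad \Delta A_{\text{elec}}(\mathbf{z}) = G_{\text{elec}}^{\,\varepsilon = 80,\ 0.15\ \text{M}}(\mathbf{z}) - G_{\text{elec}}^{\,\varepsilon = 1,\ 0\ \text{M}}(\mathbf{z})$$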

Conformational preferences of dinucleotide steps (a B-philicity scale)

The B-philicity of each dinucleotide step was represented by the equilibrium concentration of the BI-form vs. the A-family at physiological conditions in terms of the mole fraction of the BI-form. The calculation of the mole fraction was based on estimation of the local partition functions within the energy valleys of the A and BI-forms ($Q_{A}$ and $Q_{BI}$ respectively) as described previously [21]. We note that our consideration of local energy valleys (rather than just energy minima) naturally incorporates the contribution of conformational entropy to their relative free energy differences. The BI-form mole fraction ($\phi_{BI}$) can be expressed in terms of the relative free energy difference of the BI and A forms ($\Delta A_{BI-A}$) as follows: $\phi_{BI} = \frac{[BI]}{[A] + [BI]} = \frac{1}{1 + (Q_{A} / Q_{BI})} = \frac{1}{1 + \exp (\Delta A_{BI-A} / RT)}$
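The relation between the valley partition functions, the free energy difference and the mole fraction can be illustrated with the short sketch below. Treating the local partition function as a Boltzmann sum over the grid points assigned to a valley is an assumption made for illustration (the actual procedure is that of [21]); the function names are hypothetical, and energies are taken to be in kcal mol⁻¹.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal mol^-1 K^-1

def local_partition_function(valley_energies, temperature=298.0):
    """Boltzmann sum over the grid-point energies of one valley (assumed form)."""
    rt = R_KCAL * temperature
    return sum(math.exp(-energy / rt) for energy in valley_energies)

def free_energy_difference(q_bi, q_a, temperature=298.0):
    """Delta A(BI-A) = -RT ln(Q_BI / Q_A)."""
    return -R_KCAL * temperature * math.log(q_bi / q_a)

def mole_fraction_bi(delta_a_bi_minus_a, temperature=298.0):
    """phi_BI = 1 / (1 + exp(Delta A(BI-A) / RT))."""
    return 1.0 / (1.0 + math.exp(delta_a_bi_minus_a / (R_KCAL * temperature)))
```

As a check against Table 1, a free energy difference of -1.97 kcal mol⁻¹ at 298 K gives a BI mole fraction of about 0.97, and +0.73 kcal mol⁻¹ gives about 0.23.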

Results and discussion

The principal conformational subspace (PCS)

Characterisation of the PCS: The eigenvalues resulting from diagonalization of the covariance matrix of the mean-centred superposed data matrix represent the amount of variance captured along each of the corresponding eigenvectors (or principal components) [30]. The relative contribution of the (sorted) eigenvalues to the total variance levels off after the third or fourth principal component, and the distributions of scores along the fourth, fifth and sixth principal components are unimodal and therefore do not contribute significantly to additional clustering of the data (results not shown). The first principal component accounts for 54.4% of the variance, the second for 14.7% and the third for almost 8%. Therefore, the first three principal components, which collectively describe almost 77% of the total variance, were used as the basis set spanning the PCS of the sample of dinucleotide steps in the subsequent mapping of the free energy surface. For details of the geometric character of the principal components (eigenvectors) and their interpretation in terms of concerted atomic displacements [48], see Supplementary Material.

The Free Energy Surface within the PCS

The FESs of all ten possible sequences for dinucleotide steps were computed in the PCS; here we present that of the CA/TG step (Figure 2, left panel) as an example. Three low energy regions on the FES are readily apparent from visual examination. A quantitative partitioning of the FES [21] confirmed the presence of three distinct energy valleys, which were then analysed individually in terms of their structural and energetic characteristics.

Structural characterization of the low energy valleys of the FES

Profiles of the backbone structural parameters for both strands of the DS in each of the three energy valleys on the FES are shown in Figure 3. We adopted a binary-state classification scheme for the DS, embracing the fact that the conformational states of the individual strands need not be the same [49]; e.g. BI.BII denotes a DS structure in which one of the strands is in the BI form while the other is in the BII form (NB: we consider state X.Y as equivalent to Y.X). In accord with the character of the underlying collective coordinates (see above), we found that PC1 serves to discriminate the valleys corresponding to the A and B-family conformers. The geometric characteristics of both strands (with regard to sugar puckers and ε/ζ torsion ranges) of the structures within Valleys B and C (see Figure 2a) are similar and are found to correspond to the ranges of the BI-form and BII-form respectively (Figure 3), leading to a classification as the BI.BI and BII.BII states. The geometric characteristics of Valley A (e.g. sugar puckers in the C3´-endo range, see Figure 3) reveal that it corresponds to structures within the A-family. However, both the α and γ torsions show broad distributions for both strands which tail towards regions corresponding to the crankshaft A-form (CrA). Thus structures within Valley A show a fairly continuous spectrum of conformational states ranging from the A.A to the CrA.CrA states.

The presence of the A-family conformational states within a single energy valley (Valley A) suggests a loss of fine topographical details within this region of the FES. The compact nature of the A-family structures relative to their counterparts in the B-family militates against their resolution in the 3D PCS representation. These fine details can be recovered by extending the representation to higher dimensions, or alternatively by performing PCA separately on the A-form and B-form structures. An example of the latter approach, an A-form PCS which shows better separation of the A-form substates and distinct minima on the corresponding PES, is provided in Supplementary Material.

Relative energetics of the valleys on the FES

The location and relative energetics of the lowest local energy minima within the energy valleys for the ten unique dinucleotide steps [9] are given in Table 1. The relative energies of the valley A and B minima (ΔEBI-A) depend sensitively on the sequence of the dinucleotide step. Except for the GG/CC, GA/TC and TA/TA steps, valley B (BI.BI states) contains the global energy minimum. The variation of the relative free energy difference between the two valleys (ΔABI-A) does not always follow the trend of the ΔEBI-A values due to the varying entropic contribution (see TΔSBI-A values), which is almost 66% of the ΔEBI-A value in the case of the GA/TC step. The positive values of TΔSBI-A are consistent with the known flexibility of the BI-form relative to the A-form and reflect the importance of entropic factors in stabilizing the BI-form relative to the A-form. This is consistent with our previous results [21] (based on the single strand dinucleotide PES) that the bias towards the BI-form relative to the A-form is due not only to enthalpic factors but also to entropic factors.

Table 1: Mole fraction and relative energetics of the BI form relative to the A-form for the ten dinucleotide steps computed at 0.15 M salt concentration and 298 K*. Calculations are based on partitioning the subspace free energy surface into energy valleys as outlined in the Methods section.
Dinucleotide step	Valley B (BI-form; BI.BI states)					Valley A (A-form; A-family states)					ΔABI-A (kcal mol⁻¹)	ΔEBI-A (kcal mol⁻¹)	TΔSBI-A (kcal mol⁻¹)	Mole fraction of BI-form ($\phi_{BI}$)
	Location of minimum, PC1	Location of minimum, PC2	Location of minimum, PC3	Energy of minimum (kcal mol⁻¹)	Valley volume (Å³)	Location of minimum, PC1	Location of minimum, PC2	Location of minimum, PC3	Energy of minimum (kcal mol⁻¹)	Valley volume (Å³)				
GC/GC	-3	-3	-2	0.00	322.44	4	-1	0	1.66	233.15	-1.97	-1.66	0.30	0.97
CG/CG	-3	-3	-2	0.00	349.66	4	-1	1	1.57	240.76	-1.74	-1.57	0.16	0.95
AC/GT	-3	-3	-2	0.00	274.72	4	0	0	0.92	233.27	-1.23	-0.92	0.31	0.89
AA/TT	-3	-3	-2	0.00	288.99	4	0	0	0.97	236.44	-1.19	-0.97	0.22	0.88
AT/AT	-2	-3	-3	0.00	308.42	4	-1	0	0.71	224.67	-0.94	-0.71	0.23	0.83
CA/TG	-2	-3	-3	0.00	223.40	3	0	1	0.77	243.91	-0.86	-0.77	0.09	0.81
AG/CT	-3	-3	-2	0.00	253.14	4	0	0	0.17	229.47	-0.44	-0.17	0.27	0.68
GG/CC	-2	-4	-2	0.05	221.96	4	0	0	0.00	228.99	-0.29	0.05	0.35	0.62
GA/TC	-3	-3	-2	0.60	252.85	4	0	0	0.00	231.52	0.19	0.60	0.41	0.42
TA/TA	-2	-3	-3	0.80	248.29	4	0	0	0.00	235.81	0.73	0.80	0.07	0.23
* The mole fraction of the BII form (valley C, if present) is almost zero and therefore the mole fractions of the A-form (valley A) and BI-form sum to 1.00 in all cases.
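The internal consistency of Table 1 can be checked with the sketch below, which tests two relations for each row: ΔABI-A ≈ ΔEBI-A − TΔSBI-A, and the mole fraction expression given in the Methods. The values are transcribed from Table 1; the tolerances (which allow for rounding to two decimal places) and the function name are assumptions made for this illustration.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal mol^-1 K^-1
TEMPERATURE = 298.0  # K

# (step, dA_BI-A, dE_BI-A, TdS_BI-A, mole fraction of BI-form), from Table 1
TABLE_1 = [
    ("GC/GC", -1.97, -1.66, 0.30, 0.97),
    ("CG/CG", -1.74, -1.57, 0.16, 0.95),
    ("AC/GT", -1.23, -0.92, 0.31, 0.89),
    ("AA/TT", -1.19, -0.97, 0.22, 0.88),
    ("AT/AT", -0.94, -0.71, 0.23, 0.83),
    ("CA/TG", -0.86, -0.77, 0.09, 0.81),
    ("AG/CT", -0.44, -0.17, 0.27, 0.68),
    ("GG/CC", -0.29, 0.05, 0.35, 0.62),
    ("GA/TC", 0.19, 0.60, 0.41, 0.42),
    ("TA/TA", 0.73, 0.80, 0.07, 0.23),
]

def row_is_consistent(d_a, d_e, t_d_s, phi, tol_energy=0.02, tol_phi=0.01):
    """Check dA = dE - TdS and phi = 1 / (1 + exp(dA / RT)) within rounding."""
    energy_ok = abs(d_a - (d_e - t_d_s)) <= tol_energy
    phi_calc = 1.0 / (1.0 + math.exp(d_a / (R_KCAL * TEMPERATURE)))
    return energy_ok and abs(phi_calc - phi) <= tol_phi

inconsistent_steps = [row[0] for row in TABLE_1 if not row_is_consistent(*row[1:])]
```

With these tolerances, `inconsistent_steps` is empty, i.e. every row of Table 1 satisfies both relations to within rounding.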

Valley C (BII.BII states), if present, corresponds to the highest energy minimum and the shallowest valley with the smallest volume (data not shown). Accordingly, in this context, given the lack of enthalpic or entropic stabilisation, the BII.BII states may be termed metastable. A similar view of the BII conformation emerged in our previous work on the PES of single strand dinucleotide monophosphate models [21]. In this context we note the suggestions by other workers that the observation of the BII conformation in crystal structures may be a result of packing effects [50,51].

Validation of the FES

An important step in validating the computed FES is to assess how well it reproduces observed structural properties of DNA.

Distribution of observed structures on the subspace FES

Each observation in the data matrix (containing sample DS from crystal structures of the A-DNA and B-DNA families) was pre-classified into known DNA conformational forms A, CrA (Crank-A), BI, and BII using relevant backbone torsion angle ranges [52-54]. Projection of the observations into the PCS results in pronounced clustering into the low energy regions on the FES (Figure 2b). Only six binary substates were identified in our data namely A.A, A.CrA, CrA.CrA, BI.BI, BI.BII and BII.BII, representing a mixing of substates within the A or B families (Table 2). It is notable that substates of both A and B families are not found in the same dinucleotide step (see Table 2) i.e. the different families appear to be mutually exclusive within the context of an individual DS.

Projection of the data onto the FES (Figure 2b) reveals that the A-family substates are well separated from those of the B-family. However, the three binary states of the A-family appear to exist within the same energy valley (Valley A) i.e. we do not observe energy barriers between these states on the FES in the 3D PCS (see Figure 3); these regions do appear in separate energy valleys in a principal conformational subspace which comprises only the A-form family (see Supplementary Material).

On the other hand, the separation of the B-family substates is more pronounced than within the A-family. In the PC1-PC3 plane, complete separation of BII.BII structures into Valley C (BII.BII states) is observed (Figure 2b). The BI.BI and BI.BII structures are slightly overlapped, however the BI.BI structures project into the bottom of Valley B (BI.BI states) while the density of the BI.BII structures increases in the direction of Valley C. Furthermore, it is noted that Valleys B and C are parts of a larger energetic basin which encompasses the B-family substates and separates them from those of the A family. We note that the position of the BI.BII substate in relation to the BI.BI, and BII.BII substates is highly suggestive that interconversion between the BI.BI and the BII.BII states proceeds on a strand-wise basis.

Utility of the FES

The utility of the FES is illustrated by the derivation of a realistic B-philicity scale which reflects the conformational behaviour of the dinucleotide steps at near physiological conditions. The B-philicity scale is based on a statistical thermodynamic estimation of the local partition functions of the local energy valleys within the subspace free energy surface (see Methods). This physical approach is distinct from existing statistical approaches, which infer A- and B-form propensities from databases of structures determined under a variety of experimental conditions, e.g. [55].

A relative B-philicity scale for dinucleotide steps

The B-philicity (tendency to be in the BI-form vs the A-family) of isolated (i.e. non-oligomeric) DNA dinucleotide steps (as indicated by the ∆ABI-A values and the mole fraction of BI-form structures in Table 1) reveals that, as expected, for most of the dinucleotide steps, the B-form is predominant at physiological conditions [56]. However, under these conditions (conventionally associated with the B-form) we find that the GA/TC and TA/TA steps favour the A-form.

In terms of the total free energy difference between the BI-form and A-form (∆ABI-A), the ten dinucleotide steps can be divided into three categories; highly B-philic (GC/GC & CG/CG), B-philic (AC/GT, AA/TT, AT/AT, CA/TG, AG/CT & GG/CC) and A-philic (GA/TC & TA/TA). In the following we compare this trend with available theoretical and experimental data. In doing this, we need to carefully consider the nature of the systems and environments in the experiments, against the nature of our theoretical model which represents isolated dinucleotide steps (i.e. without oligomeric neighbour/context effects) in low salt (0.15 M) aqueous solution at 298 K.

Highly B-philic: GC/GC & CG/CG steps

GC/GC & CG/CG steps show the largest free energy differences ∆ABI-A, indicating that these steps are highly B-philic. The high preference for the B-form by the CG step is in accord with the crystallographic observation that the single dinucleotide duplex structure of d(CG)2 exists in the B-form in aqueous solutions ranging from 0.1 to 1.0 M NaCl in a temperature range from 273 to 298 K [57]; the B-form conformation was found even in high salt solutions (5.0-6.0 M NaCl). Thus our computed high B-philicity of the CG/CG step (∆ABI-A = -1.74 kcal/mol, see Table 1) is in accord with a study where the length of the duplex structure and the experimental conditions are a good general match with our model.

Solution circular dichroism studies suggest only moderate B-philicity of the GC/GC & CG/CG steps [27,58]. However, it must be noted that these observed trends were derived by inducing the B to A transition in DNA oligomers via decreasing the relative humidity of the medium by adding ethanol to water-ethanol mixtures (and at a low temperature of 253 K [58]). Therefore, these observed B-philicity trends do not necessarily represent the behaviour of individual dinucleotide steps at physiological conditions and, as such, our calculations are not directly comparable with these experimental results. Many theoretical studies comment that the GC/GC steps are A-philic; for example, based on MP2/6-31G* ab initio calculations it was concluded that GC-rich sequences favour the A-form over the B-form [59]. However, these studies do not include the solvation, salt concentration and configurational entropic effects (via consideration of the valley volumes) incorporated in this work.

B-philic: AC/GT, AA/TT, AT/AT steps

The AC/GT, AA/TT and AT/AT steps show very similar B-philicity characteristics (with ∆ABI-A ranging from -1.23 to -0.94 kcal/mol). The B-philicity of these steps is in accord with Hunter’s suggestion [59] that AX/XT steps (X = A, C, G or T) prefer the B-form due to steric clashes between the thymine methyl group and the 5’-neighbouring base, which block the A-DNA conformations. However, the order of these steps in terms of their relative B-philicity (Table 1) is not in accord with the observed high B-philicity of A-tract DNA oligomers derived from crystallographic data [24]. The high B-philicity of the A-tract oligomers is attributed to stabilization of the B-form by the formation of cross-strand hydrogen bonds [60,61]; for shorter DNA segments we expect this effect to be less significant.

A step with bistable characteristics: the GG/CC step

Our calculations suggest that the GG/CC step has only a small preference for the B-form over the A-form (∆ABI-A = -0.29 kcal/mol), which is suggestive of a bistable character. Raman spectroscopy studies have shown that poly(dG).poly(dC) shows structural variability between the A and B-form in solution [62]. However, other spectroscopic studies using NMR and circular dichroism have shown that poly(dG).poly(dC) exists only in the A-form in solution [63,64]. This conflict in experimental results is difficult to resolve. However, it is noted that the NMR experiments [64] were conducted over a range of temperatures, 30-60 °C. Such a rise in temperature could offset the small free energy difference (∆ABI-A = -0.29 kcal/mol), switching the GG/CC step into the A-form.

A-philic: TA/TA step

The TA/TA step is the only dinucleotide step which showed a pronounced A-philic character in our computed B-philicity scale (∆ABI-A= +0.73 kcal/mol). This is in good agreement with the results of single crystal X-ray diffraction studies on DNA double helices containing the TATA sequence (e.g. d(GGTATACC)) which were observed to adopt the A-form conformation [65, 66]. Further, the preference for the TA/TA step to adopt the A-form may be of biological relevance given that the TATA sequence is a motif frequently associated with transcription promoter regions [67,68].

Conclusion

The conformational space of DNA dinucleotide steps was described by three collective degrees of freedom derived from a principal component analysis of the Cartesian coordinates of the atoms defining the backbone and sugar torsions (using data from crystal structures in the NDB). The principal conformational subspace spanned by the first three principal components was found to capture ~77% of the total variance, while the projections along the higher principal components are essentially unimodal.

The free energy surfaces of all 10 possible dinucleotide steps were mapped in the 3D principal component subspace. The topography of an illustrative FES, CA/TG, shows three energy valleys: two of them correspond to the BI.BI and BII.BII states, while the third corresponds to all of the A-family states. The relative energetics of the A-family and BI.BI states depend strongly on the sequence of the dinucleotide step. Except for the GG/CC, GA/TC and TA/TA steps, the BI.BI state corresponds to the global energy minimum. The BII.BII state corresponds to a relatively high-energy, yet shallow, valley on the FES, suggesting a metastable character. This is consistent with the suggestion that the observation of the BII conformation in X-ray crystal structures may be a result of crystal packing effects [50,51].

Based on the subspace FES representation, we computed a B-philicity scale, which represents the propensity for a given dinucleotide step to be in the B-form vs. the A-family conformation at near physiological conditions. The B-philicity was found to vary across the ten dinucleotide steps, which were therefore grouped into three categories: highly B-philic (GC/GC & CG/CG), B-philic (AC/GT, AA/TT, AT/AT, CA/TG, AG/CT & GG/CC) and A-philic (GA/TC & TA/TA). The computed B-philicity scale agrees well with experimental data on duplex DNA structures in comparable conditions. The high A-philicity of the TA/TA step has important biological significance in view of its structural relevance to the transcriptional TATA promoter regions [67,68].

The free energy surface provides a coherent physical framework for studying the conformational preferences of DNA at a given temperature and set of environmental conditions. This physical approach to deriving conformational equilibria in DNA may be contrasted with statistical approaches, e.g. [55]. A detailed comparison of the results of the two approaches is ongoing. Further work on our physical model is currently underway in a number of directions, including the importance of the system representation, both in terms of segment length and better approximations of the free energy surface.

Acknowledgement

We thank Drs Seishi Shimizu & Chandra Verma for their support and insightful discussions.

References

Copyright

© 2020 ElSawy KM, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

This work is licensed under a Creative Commons Attribution 4.0 International License.

© Peertechz Publications Inc., 10880 Wilshire Blvd., Suite 1101, Los Angeles, California, 90024, USA

Table 2: Occupation numbers of binary states within the data set.
Class	A	CrA	BI	BII
A	295
CrA	33	23
BI	0	0	231
BII	0	0	71	22
