Cite this as: Béké DE, Koné M, Diarrasouba F (2021) Quantitative Structure-Activity Relationship (QSAR) study of a series of 2-thioarylalkyl benzimidazole derivatives by The Density Functional Theory (DFT). Open J Bioinform Biostat 5(1): 001-007. DOI: 10.17352/ojbb.000009
In this work, we used density functional theory (DFT) at the B3LYP/6-311G(d,p) level to establish a QSAR (Quantitative Structure-Activity Relationship) model for a series of molecules derived from 2-thioarylalkyl-1H-Benzimidazole. The model is built from molecular descriptors and anthelmintic activities against Haemonchus contortus. Its statistical indicators are the coefficient of determination R2, the standard deviation S, the Fisher coefficient F and the cross-validation coefficient Q2cv. The values obtained (R2 = 0.917, S = 0.606, F = 29.354, Q2cv = 0.916) indicate a statistically sound model.
The quantum descriptors responsible for the anthelmintic activity of the 2-thioarylalkyl-1H-Benzimidazole derivatives are the dipole moment (μ), the energy of the highest occupied molecular orbital (EHOMO) and the smallest negative charge of the molecule (q-).
The acceptance criterion of Eriksson et al., applied to the test set, is satisfied. In the external validation, the ratio of theoretical to experimental activity tends to unity.
Livestock is an important source of income in developing countries and contributes to food security. In Africa, it often accounts for 10% to 20% of the gross domestic product. Livestock farming in most of the African tropics is exposed to a number of factors that slow its development, including animal diseases.
Among these diseases, gastrointestinal strongyliasis in cattle is one of the main pathologies and causes enormous economic losses for farmers.
The fight against infectious diseases remains a public health problem because of the high mortality and morbidity these diseases cause.
There are three main families of anthelmintics available on the market. Unfortunately, the frequent use of these molecules has led to the emergence of resistance to these drugs. In this context, it is imperative to design and prepare new drugs with enhanced anthelmintic activity.
Consequently, the pharmaceutical industry is moving towards new research methods that predict the properties and activities of molecules before they are even synthesized. In recent years, technologies that make it possible to synthesize a very large number of molecules simultaneously and to test their action on therapeutic targets have given very attractive results. Predicting activity before synthesis is the main objective of QSAR (Quantitative Structure-Activity Relationship) studies. These studies search for similarities between molecules in large databases of existing molecules whose activities are known. The discovery of a relationship linking activities to molecular descriptors makes it possible to predict the activities of new compounds, and therefore to guide the synthesis of new molecules.
This QSAR study concerns a series of sixteen molecules derived from 2-thioarylalkyl-1H-Benzimidazole, with twelve molecules (75% of the database) used for the training set and four molecules (25% of the database) for the test set. These compounds were synthesized and tested for their nematocidal activities by Akpa et al. (Table 1).
All sixteen molecules used in our study have larvicidal concentrations ranging from 0.005 to 424 μg/mL. This range, which spans about five orders of magnitude, is too wide for a quantitative relationship to be defined directly between the raw concentrations and the theoretical descriptors.
Biological activities are generally expressed as the negative of the base-10 logarithm of the active concentration, so that a more effective molecule has a higher numerical value. The anthelmintic activity is therefore expressed by the anthelmintic potential pCL100, defined by the relationship:
$$pCL_{100} = -\log_{10}\left(\frac{CL_{100}}{M} \times 10^{-3}\right)$$
where M is the molecular mass (g/mol) and CL100 is the larvicidal concentration (μg/mL), that is, the concentration necessary to eliminate 100% of the larvae of Haemonchus contortus.
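As an illustration, the conversion from a larvicidal concentration to the anthelmintic potential can be sketched as follows. The sketch assumes that pCL100 is the negative base-10 logarithm of the molar concentration, with CL100 given in μg/mL and M in g/mol; the function name and the numerical values are hypothetical and are not taken from the paper's tables.

```python
import math


def p_cl100(cl100_ug_per_ml: float, molar_mass_g_per_mol: float) -> float:
    """Anthelmintic potential as -log10 of the molar larvicidal concentration.

    Assumption: CL100 is in ug/mL and M in g/mol.
    """
    # 1 ug/mL equals 1e-3 g/L, so dividing by M (g/mol) gives mol/L.
    molar_concentration = cl100_ug_per_ml * 1e-3 / molar_mass_g_per_mol
    return -math.log10(molar_concentration)


# Hypothetical molecule: CL100 = 0.005 ug/mL, M = 300 g/mol (illustrative only).
potent = p_cl100(0.005, 300.0)
# Hypothetical weakly active molecule at the other end of the reported range.
weak = p_cl100(424.0, 300.0)
print(round(potent, 3), round(weak, 3))
```

On this scale a lower larvicidal concentration gives a higher potential, which is the behaviour the logarithmic transformation is meant to provide.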
The relationship between the biological activity of the molecules studied and their molecular structures was established through theoretical chemistry calculations using the Gaussian 09 software. Density functional theory (DFT) was used with the B3LYP functional and the 6-311G(d,p) basis set in order to determine the molecular descriptors.
Indeed, DFT is known to generate a variety of molecular properties in a QSAR study [7, 8]. This method reduces computation time, increases predictability and lowers the cost of drug design [9,10]. The model is obtained by multiple linear regression (MLR) using the XLSTAT and Excel software.
For the development of the QSAR model, several theoretical descriptors derived from conceptual DFT were determined. These descriptors are the dipole moment (μ), the energy of the highest occupied molecular orbital (EHOMO) and the smallest negative charge (q-) of the molecule. They were all determined after optimization of the molecular geometry followed by a frequency calculation.
The partial correlation coefficient aij between each pair of descriptors must be less than 0.70, which indicates that the descriptors are independent of each other.
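The independence check can be sketched as below. This is a minimal illustration that uses the ordinary Pearson correlation coefficient as a stand-in for the coefficient aij (an assumption, since the exact formula is not given here) and compares its absolute value with the 0.70 threshold. The descriptor values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlated_pairs(descriptors, threshold=0.70):
    """Return the descriptor pairs whose |correlation| reaches the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(descriptors.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical descriptor values for five molecules (not the paper's data).
descriptors = {
    "mu": [1.0, 2.0, 3.0, 4.0, 5.0],
    "E_HOMO": [-6.0, -5.8, -6.3, -5.9, -6.1],
    "q_minus": [-0.52, -0.55, -0.50, -0.58, -0.51],
}
flagged = correlated_pairs(descriptors)
print(flagged)
```

Any pair returned by `correlated_pairs` would call for dropping or replacing one of the two descriptors before the regression is fitted.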
The quality of a QSAR model is determined based on the analysis of certain statistical criteria including the coefficient of determination R2, the standard deviation S, the Fisher coefficient F and the cross-validation coefficient Q2cv.
The statistical parameters R2, F and S describe the quality of the fit between the experimental values and the calculated values. The cross-validation coefficient measures the accuracy of the model’s prediction on the data from the training set.
The coefficient of determination R2 measures the share of the experimental variance explained by the model relative to the total variance. Its value lies between 0 and 1. The closer its value is to 1, the better the observed and predicted values are correlated [15,16].
$$R^2 = 1 - \frac{\sum_i (y_{i,exp} - \hat{y}_{i,theo})^2}{\sum_i (y_{i,exp} - \bar{y}_{exp})^2}$$
where:
$y_{i,exp}$: experimental value of anthelmintic activity
$\hat{y}_{i,theo}$: theoretical value of anthelmintic activity
$\bar{y}_{exp}$: average of the experimental values of anthelmintic activity
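A minimal sketch of this coefficient, assuming the usual form 1 - SSE/SST (residual sum of squares over total sum of squares), is given below. The activity values are hypothetical and serve only to show the arithmetic.

```python
def r_squared(y_exp, y_theo):
    """Coefficient of determination computed as 1 - SSE / SST."""
    mean_exp = sum(y_exp) / len(y_exp)
    # Residual sum of squares: experimental minus theoretical values.
    sse = sum((e - t) ** 2 for e, t in zip(y_exp, y_theo))
    # Total sum of squares: experimental values around their mean.
    sst = sum((e - mean_exp) ** 2 for e in y_exp)
    return 1.0 - sse / sst


# Hypothetical pCL100 values (not taken from the paper's tables).
y_exp = [4.0, 5.0, 6.0, 7.0]
y_theo = [4.2, 4.9, 6.1, 6.8]
r2 = r_squared(y_exp, y_theo)
print(round(r2, 3))
```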
The variance $S^2$ is determined by the following relation:
$$S^2 = \frac{\sum_i (y_{i,exp} - \hat{y}_{i,theo})^2}{n - k - 1}$$
where k is the number of independent variables (descriptors) in the model equation, n is the number of molecules in the training set and n-k-1 is the number of degrees of freedom.
The standard deviation S [18,19] is another statistical parameter. It is the square root of the variance and indicates how widely the values are dispersed:
$$S = \sqrt{\frac{\sum_i (y_{i,exp} - \hat{y}_{i,theo})^2}{n - k - 1}}$$
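The calculation of S can be sketched as follows, assuming it is the square root of the residual sum of squares divided by n-k-1. The data are hypothetical; with the twelve training molecules and three descriptors of this study, the number of degrees of freedom would be 12 - 3 - 1 = 8.

```python
from math import sqrt


def regression_std(y_exp, y_theo, k):
    """Standard deviation S of the residuals with n - k - 1 degrees of freedom."""
    n = len(y_exp)
    dof = n - k - 1
    if dof <= 0:
        raise ValueError("need more molecules than fitted parameters")
    sse = sum((e - t) ** 2 for e, t in zip(y_exp, y_theo))
    return sqrt(sse / dof)


# Hypothetical activities for six molecules and a three-descriptor model.
y_exp = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y_theo = [4.2, 4.9, 6.1, 6.8, 8.1, 8.9]
s = regression_std(y_exp, y_theo, k=3)
print(round(s, 4))
```

The guard on the degrees of freedom reflects the fact that the statistic is undefined when the number of molecules does not exceed k + 1.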
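The Fisher coefficient F [19,20] tests the overall significance of the linear regression:
$$F = \frac{\sum_i (\hat{y}_{i,theo} - \bar{y}_{exp})^2 / k}{\sum_i (y_{i,exp} - \hat{y}_{i,theo})^2 / (n - k - 1)}$$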
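For a least-squares regression with an intercept, F can also be written in terms of R2 as F = (R2/k) / ((1 - R2)/(n - k - 1)). The sketch below applies this identity to the values reported in this study (R2 = 0.917, n = 12 training molecules, k = 3 descriptors) as a consistency check. The function name is hypothetical.

```python
def fisher_f(r2, n, k):
    """F statistic of a linear regression from R2, n observations and k descriptors."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Reported values: R2 = 0.917, twelve training molecules, three descriptors.
f_value = fisher_f(0.917, 12, 3)
print(round(f_value, 2))
```

The result is about 29.5, close to the reported F = 29.354. The small gap is what one would expect from using R2 rounded to three decimals.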
The cross-validation coefficient Q2cv measures the accuracy of the prediction on the data from the training set. It is calculated using the following relation:
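One common way to obtain a cross-validated coefficient is leave-one-out cross-validation: each molecule is removed in turn, the regression is refitted on the remaining molecules, and the removed molecule is predicted. The sketch below computes Q2 = 1 - PRESS/SST in this way. Whether the authors used exactly this form is an assumption; the small least-squares solver, the function names and the data are all hypothetical.

```python
def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    a = [
        [sum(row[i] * row[j] for row in rows) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for idx in range(p):
            if idx != col:
                factor = a[idx][col] / a[col][col]
                a[idx] = [v - factor * w for v, w in zip(a[idx], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def predict(coef, x):
    """Predicted activity for one molecule with descriptor vector x."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def q2_loo(X, y):
    """Leave-one-out cross-validated Q2 = 1 - PRESS / SST."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    mean_y = sum(y) / len(y)
    sst = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / sst


# Hypothetical data: six molecules, two descriptors, slightly noisy activities.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
y = [1.1, 2.9, 0.1, 2.0, 4.1, 0.9]
q2 = q2_loo(X, y)
print(round(q2, 3))
```

Because every prediction is made for a molecule that was excluded from the fit, Q2 is normally lower than R2 for the same data.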
The performance of the model according to the criterion of Eriksson et al. [22,23] is characterized by Q2cv > 0.5 for a satisfactory model; for an excellent model, Q2cv must be close to 0.9. The training set of the model is acceptable if the criterion R2 - Q2cv < 0.3 is met.
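These thresholds can be expressed as a small check, sketched below. Reading "close to 0.9" as Q2cv >= 0.9 is an assumption made for the illustration, and the function name is hypothetical. The values passed in are those reported in this study.

```python
def assess_model(r2, q2cv):
    """Apply the thresholds quoted in the text to R2 and Q2cv."""
    return {
        "satisfactory": q2cv > 0.5,
        # Assumption: "close to 0.9" is read here as at least 0.9.
        "excellent": q2cv >= 0.9,
        "training_set_acceptable": (r2 - q2cv) < 0.3,
    }


# Values reported for the model: R2 = 0.917 and Q2cv = 0.916.
result = assess_model(0.917, 0.916)
print(result)
```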
The predictive power of the model can also be assessed on the test set through the ratio of theoretical to experimental activity, pCL100(theo)/pCL100(exp). The model is acceptable when this ratio tends towards unity for the validation set.
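The external validation can be sketched as a ratio computed molecule by molecule. The text gives no numerical tolerance for "tends towards unity", so the 10% tolerance below is an assumption, as are the function names and the activity values.

```python
def activity_ratios(y_theo, y_exp):
    """Ratio of theoretical to experimental activity for each molecule."""
    return [t / e for t, e in zip(y_theo, y_exp)]


def close_to_unity(ratios, tolerance=0.10):
    """True when every ratio lies within the chosen tolerance of 1."""
    return all(abs(r - 1.0) <= tolerance for r in ratios)


# Hypothetical test set of four molecules (not the paper's values).
y_exp_test = [5.0, 6.0, 7.0, 8.0]
y_theo_test = [5.2, 5.7, 7.1, 8.4]
ratios = activity_ratios(y_theo_test, y_exp_test)
print([round(r, 3) for r in ratios], close_to_unity(ratios))
```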
In this QSAR work, three (03) relevant molecular descriptors were calculated. These descriptors are the dipole moment (μ), the energy of the highest occupied molecular orbital (EHOMO) and the smallest negative charge (q-) of the molecule. Table 2 reports the values of these molecular descriptors.
The partial correlation coefficients aij between the descriptor pairs are all less than 0.70, which demonstrates the independence of the descriptors used to develop the model.
The sign of each descriptor's coefficient in the model equation indicates whether anthelmintic activity increases (positive sign) or decreases (negative sign) as that descriptor increases. The best QSAR model obtained for anthelmintic activity against Haemonchus contortus is as follows:
with the following statistical indicators: n = 12; R2 = 0.917; S = 0.606; F = 29.354; Q2cv = 0.916.
The negative sign of the dipole moment coefficient indicates that the anthelmintic activity is improved for low values of the dipole moment (μ). The energy of the highest occupied molecular orbital (EHOMO) and the smallest negative charge (q-) of the molecule are both negative. Under these conditions, low values of these two descriptors lead to an increase in anthelmintic activity. The coefficient of determination R2 = 0.917 shows that 91.7% of the experimental variance of anthelmintic activity is explained by the descriptors of the established QSAR model. The Fisher coefficient, F = 29.354, shows that the QSAR model is globally significant. The cross-validation coefficient, Q2cv = 0.916, is greater than 0.90. Likewise, the difference R2 - Q2cv = 0.001 is less than 0.3. The QSAR model developed is therefore excellent at predicting anthelmintic activity.
The external validation of the model, obtained from the ratio of theoretical to experimental activity, is presented in Tables 3 and 4.
All the values of the ratio tend towards unity. This indicates a good correlation between the experimental and theoretical values of the anthelmintic potential of 2-thioalkylaryl-1H-Benzimidazole derivatives. This model is therefore acceptable for predicting anthelmintic activity against Haemonchus contortus in the series of 2-thioalkylaryl-1H-Benzimidazole derivatives.
The regression line between the theoretical and experimental anthelmintic activities for the training set and the test set is shown in Figure 1.
The relative contribution of the descriptors in predicting the anthelmintic activity of the compounds is presented in Figure 2.
The energy of the highest occupied molecular orbital has the largest contribution followed by the dipole moment and the smallest negative charge in the molecule.
In this work, Quantitative Structure-Activity Relationship (QSAR) methodology and theoretical chemistry methods were used to establish a predictive model of the anthelmintic activity of a series of 2-Thioalkyl Aryl Benzimidazole derivatives coded TAB and TBZ against Haemonchus contortus. We determined the molecular descriptors at the B3LYP/6-311G(d,p) level of theory. The developed model depends on three (03) parameters (descriptors), namely the dipole moment (μ), the energy of the highest occupied molecular orbital (EHOMO) and the smallest negative charge (q-) of the molecule. This model displays very satisfactory statistical indicators: R2 = 0.917; Q2cv = 0.916; S = 0.606; F = 29.354. The Q2cv value greater than 0.90 indicates that the established QSAR model has excellent predictive power. The high value of the Fisher coefficient shows that the established model is significant in predicting the anthelmintic activity of the series of molecules studied; the model contains at least one descriptor relevant to predicting the biological activity of this family of molecules. The study of the contribution of the descriptors shows that the energy of the highest occupied molecular orbital (EHOMO) makes the strongest contribution to the prediction of the anthelmintic activity of this series of molecules. It is therefore the priority descriptor in the prediction of anthelmintic activity.
This study should help explain anthelmintic activity and provide guidance for the design of new molecules with improved anthelmintic activity. For the design of such molecules, one can simply adjust the three descriptors of the QSAR model developed here.