Multiple regression analysis examines the functional relationship between a dependent variable and two or more independent variables. Its purpose is to build the best model for predicting the dependent variable from the independent variables. The most common way to fit such a model is the ordinary least squares (OLS) method, which estimates the model parameters by minimizing the sum of squared errors.
Several assumptions must hold for multiple regression analysis to be valid: there must be no multicollinearity among the independent variables, the variance of the error term must be constant across all independent variables, and the covariance between the error term and the independent variables must be zero.
One of the major problems in multiple regression analysis is multicollinearity. When there is an exact or nearly exact linear relationship among the independent variables, this situation is called multicollinearity. Multicollinearity has important effects on the OLS estimates of the regression coefficients: in its presence, the OLS estimates have large variances, the regression coefficients may be estimated incorrectly, and their standard errors are inflated. Incorrectly estimated coefficients in turn lead to incorrect statistical conclusions.
Ridge regression is therefore used to obtain stable estimates of the regression coefficients; that is, ridge regression was proposed to overcome the multicollinearity problem.
In the literature, it is commonly accepted that variance inflation factor (VIF) values greater than 10 indicate a multicollinearity problem. This is a rule of thumb rather than an exact criterion. Similarly, the condition number can be used to detect multicollinearity through a rule of thumb. Consequently, multicollinearity is diagnosed using criteria of this kind.
The two methods most commonly used to measure the effects of multicollinearity are the VIF and the condition number. The diagonal elements of $(X'X)^{-1}$ (with the regressors in standardized form) are called VIFs and are given by Equation 1:

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2} \qquad (1)$$

In this equation, $R_j^2$ is the coefficient of determination obtained from the multiple regression of $x_j$ on the remaining $p-1$ regressor variables in the model.
If these VIF values are large (VIF ≥ 10), there is said to be a multicollinearity problem among the corresponding independent variables, and the degree of multicollinearity grows as the VIF values increase.
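As a concrete illustration (not part of the original study), the VIF diagnostic of Equation 1 can be computed directly from a design matrix; the data below are invented purely to show the VIF ≥ 10 rule of thumb firing.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j of X on the remaining columns (plus an intercept)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        ss_res = np.sum((y - A @ beta) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        out[j] = 1.0 / (1.0 - r2)
    return out

# example: x3 is a near-exact linear combination of x1 and x2,
# which drives every VIF far above the rule-of-thumb threshold of 10
rng = np.random.default_rng(42)
x1, x2 = rng.standard_normal(200), rng.standard_normal(200)
x3 = x1 + x2 + 0.01 * rng.standard_normal(200)
print(vif(np.column_stack([x1, x2, x3])))
```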
The condition number is another method for detecting multicollinearity, based on the eigenvalues of the $X'X$ matrix. The formula of the condition number (CN) is given in Equation 2:

$$\mathrm{CN} = \frac{\lambda_{\max}}{\lambda_{\min}} \qquad (2)$$

In this equation, $\lambda_{\max}$ and $\lambda_{\min}$ are the largest and smallest eigenvalues of $X'X$. The relationship between the condition number and multicollinearity is given in Table 1.
In summary, multicollinearity can be diagnosed by two rules of thumb. The first is that VIF values greater than 10 indicate high multicollinearity; the second is to check the condition number against the thresholds given in Table 1.
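The condition-number check of Equation 2 can likewise be sketched in a few lines; this is an illustrative computation on invented data, not the paper's code.

```python
import numpy as np

def condition_number(X):
    """Ratio of the largest to the smallest eigenvalue of X'X (Equation 2)."""
    eig = np.linalg.eigvalsh(X.T @ X)  # eigenvalues in ascending order
    return eig[-1] / eig[0]

# contrast a well-conditioned design with a nearly collinear one
rng = np.random.default_rng(1)
Z = rng.standard_normal((100, 3))
C = Z.copy()
C[:, 2] = C[:, 0] + 1e-3 * rng.standard_normal(100)  # column 2 ~ column 0
print(condition_number(Z))  # moderate
print(condition_number(C))  # very large: multicollinearity
```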
In addition, another problem in ridge regression is finding the optimal biasing parameter (k). This k is a very small constant determined by the researcher. Several methods for choosing it have been proposed in the literature [2-22].
There are also many other ridge regression methods in the literature [23-29], including new methods that take account of the skewed eigenvalues of the matrix of explanatory variables, an iterative approach that minimizes the mean squared error in ridge regression, new ridge parameters, an optimal estimation of the ridge regression parameter, and some new estimators of the ridge parameter [34, 35].
In almost all of these studies, k is a single value. In this study, by contrast, we find a different k value corresponding to each diagonal element of the variance-covariance matrix, instead of a single value of k, by using a new algorithm based on particle swarm optimization.
The rest of the paper is organized as follows. Section 2 reviews ridge regression; Section 3 presents the methodology; Section 4 describes the implementation of the proposed method; Section 5 reports two simulation studies; and finally, Section 6 presents the discussion.
Ridge regression is a remedy used in the presence of multicollinearity, first proposed by Hoerl and Kennard. It has two important advantages over OLS: it addresses the multicollinearity problem and it decreases the mean squared error (MSE). The solution technique of ridge regression is similar to that of OLS; the difference is the k value. This k, also called the biasing parameter or shrinkage parameter, takes values between 0 and 1. It is added to the diagonal elements of the correlation matrix, which yields biased estimates of the regression coefficients.
The OLS and ridge estimates of the regression coefficients are shown in Equations 3 and 4, respectively:

$$\hat{\beta} = (X'X)^{-1}X'y \qquad (3)$$

$$\hat{\beta}_R = (X'X + kI)^{-1}X'y \qquad (4)$$
As noted above, ridge regression is a biased regression method. The proof is shown in Equation 5:

$$E[\hat{\beta}_R] = (X'X + kI)^{-1}X'X\,\beta \neq \beta \quad \text{for } k > 0 \qquad (5)$$
It is clear from Equation 5 that the ridge estimates of the regression coefficients are biased. One of the most important issues in ridge regression is the choice of the k value, and many methods have been proposed in the literature for finding its optimal value. The ridge trace is one of these methods: it is a plot of the elements of the ridge estimator versus k, usually over the interval (0, 1).
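Equations 3 and 4 differ only in the kI term added to X'X. A minimal sketch (on invented collinear data, not the paper's data sets) shows how the ridge estimator stabilizes and shrinks the coefficients relative to OLS:

```python
import numpy as np

def ols(X, y):
    # Equation 3: beta_hat = (X'X)^(-1) X'y
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, k):
    # Equation 4: beta_hat(k) = (X'X + kI)^(-1) X'y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

rng = np.random.default_rng(7)
Z = rng.standard_normal((50, 2))
X = np.column_stack([Z[:, 0], Z[:, 0] + 0.01 * Z[:, 1]])  # collinear columns
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.standard_normal(50)

b_ols, b_ridge = ols(X, y), ridge(X, y, k=0.5)
# the ridge coefficient vector always has a norm no larger than the
# (unstable) OLS one; at k = 0 the two estimators coincide
print(np.linalg.norm(b_ols), np.linalg.norm(b_ridge))
```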
The other methods in the literature used to find the optimal k value are given in Equations 6-14, respectively.
In this paper, for the purpose of comparing results, we consider only the methods briefly introduced below.
Another method suggested in the literature for finding the k value is given in Equation 15, in which the quantities are the OLS estimates. This method is called the fixed point ridge regression method (FPRRM).
An iterative method has also been introduced for finding the optimal k value. In this method, k is calculated as in Equation 16, in which the quantities are the residual mean square and the estimated vector of regression coefficients at the (t-1)th iteration, respectively. This method is called the iterative ridge regression method (IRRM).
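Equations 15 and 16 are not reproduced above. A common form of the fixed-point and iterative choices in the ridge literature is k = p·σ̂²/(β̂'β̂); the sketch below assumes this form for both FPRRM and IRRM and is illustrative only, not the paper's exact formulas.

```python
import numpy as np

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

def k_fixed_point(X, y):
    """Assumed FPRRM form: k = p * sigma_hat^2 / (beta' beta), from OLS."""
    n, p = X.shape
    b = ols(X, y)
    sigma2 = np.sum((y - X @ b) ** 2) / (n - p)  # residual mean square
    return p * sigma2 / (b @ b)

def k_iterative(X, y, tol=1e-8, max_iter=100):
    """Assumed IRRM form: recompute k from the ridge fit of the
    previous iteration until the value of k stabilizes."""
    n, p = X.shape
    k = k_fixed_point(X, y)  # start from the fixed-point value
    for _ in range(max_iter):
        b = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
        sigma2 = np.sum((y - X @ b) ** 2) / (n - p)
        k_new = p * sigma2 / (b @ b)
        if abs(k_new - k) < tol:
            break
        k = k_new
    return k

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 3))
y = X @ np.ones(3) + 0.5 * rng.standard_normal(100)
print(k_fixed_point(X, y), k_iterative(X, y))
```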
The generalized ridge regression estimator of Hoerl and Kennard [1, 40] is given by Equations 17-20.
Let $\Lambda$ and $Q$ be the matrices of eigenvalues and eigenvectors of $X'X$, respectively. In the orthogonal version of the classical linear regression model, the generalized ridge estimator of $\alpha = Q'\beta$ is

$$\hat{\alpha}(K) = (\Lambda + K)^{-1}Q'X'y, \qquad K = \mathrm{diag}(k_1,\ldots,k_p). \qquad (17)$$

Hoerl and Kennard [1, 40] have shown that the values of $k_i$ which minimize the MSE of the regression coefficients are given by

$$k_i = \frac{\sigma^2}{\alpha_i^2}, \qquad (18)$$

and estimates of these $k_i$ values can be obtained by using Equation 19:

$$\hat{k}_i = \frac{\hat{\sigma}^2}{\hat{\alpha}_i^2}. \qquad (19)$$
Other estimation formulas for the optimal shrinkage parameters in the literature are given below.
Finding the optimal k value is an important problem in ridge regression. The k values recommended in the literature were given in the previous section. In addition, heuristic methods such as genetic algorithms have been proposed for finding the optimal k value [18, 21], and the k value has also been found using particle swarm optimization (PSO). In all of these methods, k is a single value. In this study, by contrast, we find a different k value corresponding to each explanatory variable, instead of a single value of k, using an algorithm based on particle swarm optimization; this paper is thus an improved form of the earlier PSO-based study.
The objective function of the paper considers both the mean absolute percentage error (MAPE) criterion and the VIF values at the same time. Its aim is to find the optimal k values that make all VIF values less than 10 while keeping the SSE (sum of squared errors) minimal. We also add a parameter to the second part of the objective function, which can be called a penalty parameter: if the VIF value of any explanatory variable exceeds 10, the value of the objective function increases. This penalizing effect marks such a solution as undesirable.
The optimization problem of the proposed method is given in Equation 21, subject to the constraints defined in Equations 22 and 23, respectively, where p is the number of explanatory variables.
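Equations 21-23 are not reproduced above. As an illustration of the idea only (a prediction-error term plus a penalty that fires whenever any VIF exceeds 10), one plausible form of such a fitness function is sketched below; the ridge-VIF definition used and the penalty weight mu are assumptions, not the paper's exact formulation.

```python
import numpy as np

def ridge_vif(Xs, K):
    """One common ridge VIF: diagonal of (X'X+K)^-1 X'X (X'X+K)^-1,
    with Xs in standardized (unit-column-length) form."""
    A = np.linalg.inv(Xs.T @ Xs + K)
    return np.diag(A @ Xs.T @ Xs @ A)

def fitness(k_vec, Xs, y, mu=1e3):
    """Illustrative objective: MAPE of the ridge fit plus mu times the
    total amount by which the VIFs exceed 10. k_vec holds one shrinkage
    value per explanatory variable (K = diag(k_vec))."""
    K = np.diag(k_vec)
    yc = y - y.mean()
    b = np.linalg.solve(Xs.T @ Xs + K, Xs.T @ yc)
    pred = Xs @ b + y.mean()
    mape = np.mean(np.abs((y - pred) / y)) * 100.0
    penalty = mu * np.sum(np.maximum(0.0, ridge_vif(Xs, K) - 10.0))
    return mape + penalty

# invented collinear data: k = 0 leaves huge VIFs (penalty dominates),
# while a positive k vector removes the penalty entirely
rng = np.random.default_rng(5)
Z = rng.standard_normal((100, 3))
Z[:, 2] = Z[:, 0] + 0.01 * rng.standard_normal(100)
Xs = Z / np.linalg.norm(Z, axis=0)  # unit-length columns
y = 50 + Z @ np.ones(3) + 0.5 * rng.standard_normal(100)
print(fitness(np.zeros(3), Xs, y), fitness(np.ones(3), Xs, y))
```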
The optimization problem defined in (21) is solved using PSO in the proposed method. PSO is a popular artificial intelligence technique, first proposed by Kennedy and Eberhart. The algorithm of the proposed method is given below.
Step 1. The parameters pn, c1, c2, maxt and w are determined. These parameters are as follows:
pn: particle number of the swarm
c1: cognitive coefficient
c2: social coefficient
maxt: maximum iteration number
w: inertia weight
Step 2. Generate random initial positions and velocities.
The initial positions and velocities are generated from the uniform distribution on (0, 1). Each particle has as many velocity components and position components as there are explanatory variables, and the positions represent the candidate k values; each particle's position and velocity at iteration t are updated throughout the search.
Step 3. The fitness function is defined as in (21) and the fitness values of the particles are calculated.
Step 4. Pbest and Gbest particles given in (24) and (25), respectively, are determined according to fitness values.
Pbest is constructed by the best results obtained in the related positions at iteration t. Gbest is the best result in the swarm at iteration t.
Step 5. New velocities and positions of the particles are calculated using Equations (26) and (27), in which the random coefficients are generated from U(0, 1).
Step 6. Steps 3 to 5 are repeated until t reaches maxt.
Step 7. The optimal values are obtained as Gbest.
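The steps above can be sketched as a generic PSO minimizer. The parameter values below (swarm size, w, c1, c2, maxt) are illustrative choices, not the paper's settings, and the demo minimizes a simple quadratic rather than the actual fitness of Equation (21).

```python
import numpy as np

def pso_minimize(fit, dim, pn=30, maxt=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: random initial positions and velocities from U(0, 1)
    pos = rng.random((pn, dim))
    vel = rng.random((pn, dim))
    # Steps 3-4: initial fitness values, Pbest and Gbest
    pbest, pbest_val = pos.copy(), np.array([fit(x) for x in pos])
    g = pbest_val.argmin()
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    for _ in range(maxt):  # Step 6: repeat until maxt iterations
        r1 = rng.random((pn, dim))
        r2 = rng.random((pn, dim))
        # Step 5: velocity and position updates (Equations 26-27)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([fit(x) for x in pos])  # Step 3: fitness values
        better = vals < pbest_val               # Step 4: update Pbest/Gbest
        pbest[better], pbest_val[better] = pos[better], vals[better]
        g = pbest_val.argmin()
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val  # Step 7: Gbest is the optimum found

# demo: the minimum of sum((x - 0.3)^2) lies at x = [0.3, 0.3, 0.3]
x_opt, f_opt = pso_minimize(lambda x: np.sum((x - 0.3) ** 2), dim=3)
print(x_opt, f_opt)
```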
The proposed algorithm was applied to two different, well-known data sets, "Import Data" and "Longley Data", in order to evaluate its performance. The variables of the Import data are imports (IMPORT-Y), domestic production (DOPROD-X1), stock formation (STOCK-X2) and domestic consumption (CONSUM-X3), all measured in billions of French francs for the years 1949 through 1959. Both data sets were solved using the fixed point method (FPRRM), the iterative method (IRRM), a third method from the literature, and the algorithm proposed in this paper. The PSO parameters of the proposed algorithm and the stopping criterion of the iterative ridge method were fixed in advance. The results of each method are presented in Tables 2 and 3, respectively.
As Table 2 shows, the proposed method has the minimum SSE and MAPE values, and there is no multicollinearity problem when the Import data are solved with the proposed method. In contrast, a multicollinearity problem remains when the Import data are solved with FPRRM and IRRM, because some VIF values of these methods are greater than 10. Even when other methods give smaller SSE and MAPE values, they still do not solve the multicollinearity problem, since some of their VIF values clearly exceed 10.
As Table 3 shows, the proposed method has the minimum MAPE value compared with the other methods, but its SSE value is not the smallest: the SSE of OLS is smaller. However, OLS clearly suffers from a multicollinearity problem when the Longley data are solved with it, whereas the proposed method has no multicollinearity problem.
As a result, finding a k value for each explanatory variable gives better results than finding a single k value, and the proposed method exhibits no multicollinearity problem.
Two simulation studies are performed in this section to show the performance of the proposed method under different levels of multicollinearity and different standard deviations of the error term, and to demonstrate its superiority over the other methods.
The First Simulation Study: In this simulation study, the proposed method was compared with the ridge regression methods given in [2, 22, 39]. The number of observations (n) was taken as 100, 500 and 1000; the standard deviation of the error term was taken as 0.01 and 1; comparisons were made for 6 cases in total. For each case, 1000 data sets containing a multicollinearity problem were created.
The first three independent variables were generated from the standard normal distribution as given in Equation 28. The last two independent variables were generated using Equation 29, which creates a multicollinearity problem by inducing high correlation among the independent variables. The observations of the dependent variable were obtained using Equation 30, with all coefficients of the regression model taken as 1.
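Equations 28-30 are not reproduced above; the sketch below assumes one plausible form of Equation 29 (each of the last two regressors built as a noisy sum of some of the first three) purely for illustration, while Equations 28 and 30 follow the description in the text.

```python
import numpy as np

def make_collinear_data(n, sigma, seed=0):
    rng = np.random.default_rng(seed)
    # Equation 28: the first three regressors are standard normal
    x123 = rng.standard_normal((n, 3))
    # Equation 29 (assumed form): the last two regressors are near-linear
    # combinations of the first three, inducing multicollinearity
    x4 = x123.sum(axis=1) + 0.01 * rng.standard_normal(n)
    x5 = x123[:, 0] + x123[:, 1] + 0.01 * rng.standard_normal(n)
    X = np.column_stack([x123, x4, x5])
    # Equation 30: all regression coefficients are equal to 1
    y = X.sum(axis=1) + sigma * rng.standard_normal(n)
    return X, y

X, y = make_collinear_data(n=100, sigma=0.01)
```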
For each data set generated in each case, the VIF, SSE, MAPE and CN values were calculated using the proposed method and the methods of [2, 22, 39]. The formula of SSE is given in Equation 31.
The most important indicator for comparing the methods is that the VIF and CN values should be small. Two of the compared methods do not guarantee a solution to the multicollinearity problem, as seen in the numerical examples. The remaining method and the proposed method guarantee that all VIF values are smaller than 10; therefore, it is appropriate to compare the proposed method with that method in terms of the SSE and MAPE criteria.
The median and interquartile range (IQR) values of the results are given in Tables 4-9.
When all the tables are examined, it is clearly seen that the VIF and CN values of the proposed method are lower than those of the other methods in all cases.
However, the proposed method produces lower MAPE values than the others despite producing higher SSE values; this is because the objective function of the proposed method depends on the MAPE.
The Second Simulation Study: A second simulation study was performed according to different levels of multicollinearity and different standard deviations of the error term. The regressors were generated using Equations 32-36, in which independent standard normal pseudorandom numbers are combined so that a specified theoretical correlation holds between any two explanatory variables.
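Equations 32-36 are not shown above. A standard construction of this type, used widely in ridge simulation studies, generates each regressor as x_ij = sqrt(1 - γ²)·z_ij + γ·z_{i,p+1}, so that any two regressors have theoretical correlation γ² (some papers denote this common correlation directly). The code below assumes that form as an illustration.

```python
import numpy as np

def make_regressors(n, p, gamma, seed=0):
    """Generate p regressors with common theoretical correlation gamma**2."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, p + 1))  # independent standard normals
    # the shared component z[:, p] induces correlation gamma**2
    # between every pair of columns; each column has unit variance
    return np.sqrt(1.0 - gamma**2) * z[:, :p] + gamma * z[:, [p]]

# high gamma -> highly multicollinear design
X = make_regressors(n=5000, p=4, gamma=0.99)
print(np.corrcoef(X.T)[0, 1])  # close to 0.99**2
```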
The simulation study was conducted for a total of 8 cases, with a sample size of 100 and different combinations of the standard deviation of the error term and the degree of multicollinearity.
It is clearly seen in the tables of the second simulation study that the VIF and CN values of the proposed method do not change significantly when the standard deviation of the error term changes, whereas the CN values increase dramatically as multicollinearity increases. Moreover, the MAPE values of the proposed method hardly change for reasonable values of the standard deviation of the error term, and even decrease when multicollinearity increases.
Different levels of the standard deviation of the error term were also employed in this simulation study. The results show that when the standard deviation of the error term is greater than 1, the model deviates strongly from the linear regression model, since MAPE values of about 60 are obtained, which is not acceptable. The tables of the second simulation study also show clearly that the prediction performance of the proposed method is affected quite negatively as the standard deviation of the error term increases.
Several assumptions must hold to create a model in multiple regression analysis; one of them is that there should be no multicollinearity among the independent variables. Ridge regression is often used in the literature when multicollinearity is present among the independent variables.
However, ridge regression has problems of its own. One of the most important is deciding the value of the shrinkage parameter (k). There are many studies in the literature on finding the optimal k value, and in all of them k is a single value. In this study, we instead found a different k value corresponding to each explanatory variable, using a new algorithm based on particle swarm optimization. The proposed method was supported by two simulation studies and constitutes an important novelty for the ridge regression literature.
In future studies, different artificial intelligence optimization techniques can be used to find these per-variable k values.