ISSN: 2641-3086

Research Article
Open Access Peer-Reviewed


Multiple regression analysis examines the functional relation between a dependent variable and more than one independent variable. Its purpose is to create the best model for predicting the dependent variable from the independent variables, and the most common method for this purpose is ordinary least squares (OLS) estimation. In this method, the parameters of the model are estimated by minimizing the sum of squared errors.

Some assumptions must hold for multiple regression analysis to be valid: there must be no multicollinearity among the independent variables, the variance of the error term must be constant across all independent variables, and the covariance between the error term and the independent variables must be zero.

One of the major problems in multiple regression analysis is multicollinearity. If there is an exact or high-degree linear relationship among the independent variables, this situation is called multicollinearity. Multicollinearity has important effects on the OLS estimates of the regression coefficients: in its presence, the OLS estimates have large variances, the regression coefficients can be estimated incorrectly, and their standard errors can be exaggerated. If the regression coefficients are estimated incorrectly, statistically incorrect conclusions may be drawn.

Therefore, the ridge regression method is used to obtain stable estimates of the regression coefficients; that is, ridge regression has been suggested to overcome the multicollinearity problem.

In the literature, it is commonly accepted that there is a multicollinearity problem if the variance inflation factor (VIF) values are greater than 10. This is a rule of thumb rather than an exact criterion. Similarly, the condition number can be used to detect multicollinearity through a rule of thumb. As a result, the multicollinearity problem can be diagnosed using such criteria.

The two methods most commonly used to determine the effects of multicollinearity are the VIF and the condition number. The diagonal elements of $\mathrm{Var}(\widehat{\beta})$ are called the VIFs and are given in Equation 1.

$VI{F}_{j}=\frac{1}{1-{R}_{j}^{2}},\quad j=1,\dots ,p\quad \text{(1)}$

In this equation, ${R}_{j}^{2}$ is the coefficient of determination obtained from the regression of ${X}_{j}$ on the remaining $\left(p-1\right)$ regressor variables in the model.

If these VIF values are large (VIF ≥ 10), it can be said that there is a multicollinearity problem among the relevant independent variables, and the degree of multicollinearity grows as the VIF values increase.

The condition number method is another way to detect multicollinearity, based on the eigenvalues of the $X'X$ matrix. The formula of the condition number (CN) is given in Equation 2.

$CN=\frac{{\lambda}_{\text{max}}}{{\lambda}_{\text{min}}}\quad \text{(2)}$

In this equation, $\lambda$ denotes the eigenvalues of ${X}'X$. The relationship between the condition number and multicollinearity is given in Table 1.

In summary, the multicollinearity problem can be diagnosed by two rules of thumb: first, if the VIF values are greater than 10, multicollinearity is high; second, the condition number can be checked as shown in Table 1.
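The two diagnostics above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the paper; the helper name and the standardization of the columns before computing $X'X$ are our assumptions.

```python
import numpy as np

def vif_and_condition_number(X):
    """VIF for each column of X (Eq. 1) and the condition number of X'X (Eq. 2).

    A sketch: X is an (n, p) design matrix without an intercept column;
    columns are standardized before the eigenvalue computation.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        # Regress X_j on the remaining p-1 columns (plus intercept), record R^2
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        vifs.append(1.0 / (1.0 - r2))            # Eq. 1
    # Condition number: ratio of extreme eigenvalues of the standardized X'X
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    eig = np.linalg.eigvalsh(Xs.T @ Xs)
    return np.array(vifs), eig.max() / eig.min()  # Eq. 2
```

For independent regressors all VIFs stay near 1; adding a near-duplicate column pushes the corresponding VIFs and the condition number up sharply.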

In addition, another problem in ridge regression is finding the optimal biasing parameter (k). This k value is a very small constant determined by the researcher [1]. Several methods for finding it have been proposed in the literature, in the studies [2-22].

There are also many other ridge regression methods in the literature [23-29]. [30] proposed new methods that take the skewed eigenvalues of the matrix of explanatory variables into account, [31] proposed an iterative approach to minimize the mean squared error in ridge regression, [32] proposed new ridge parameters, [33] proposed an optimal estimation of the ridge regression parameter, and [34,35] proposed new estimators for the ridge parameter.

In almost all of these studies, k was taken to be a single value. In this study, by contrast, we find different k values, one for each diagonal element of the shrinkage matrix (i.e. one per explanatory variable), instead of a single k, using a new algorithm based on particle swarm optimization.

The rest of the paper is organized as follows: Section 2 reviews ridge regression; Section 3 presents the methodology; Section 4 gives the implementation of the proposed method; Section 5 reports two different simulation studies; and finally, discussions are presented in Section 6.

Ridge regression is a remedy used in the presence of multicollinearity and was first proposed by [1]. It has two important advantages over the OLS method: it solves the multicollinearity problem and it decreases the mean squared error (MSE). The solution technique of ridge regression is similar to that of OLS; the difference is the k value. This k value, also called the biasing parameter or shrinkage parameter, takes values between 0 and 1. It is added to the diagonal elements of the correlation matrix, and thus biased regression coefficients are obtained.

The OLS and ridge estimates of the regression coefficients are given in Equations 3 and 4, respectively.

$\widehat{\beta}={\left(X'X\right)}^{-1}X'Y\quad \text{(3)}$

${\widehat{\beta}}_{R}={\left(X'X+kI\right)}^{-1}X'Y\quad \text{(4)}$

As noted above, ridge regression is a biased regression method. The proof of this situation is shown in Equation 5.

$\begin{array}{l}{\widehat{\beta}}_{R}={\left(X'X+kI\right)}^{-1}X'Y\\ \phantom{{\widehat{\beta}}_{R}}={\left(X'X+kI\right)}^{-1}\left(X'X\right)\widehat{\beta}=Z\widehat{\beta}\quad \text{(5)}\end{array}$

$E\left({\widehat{\beta}}_{R}\right)=E\left(Z\widehat{\beta}\right)=Z\beta $

It is clearly seen that the ridge estimates of the regression coefficients $\left({\widehat{\beta}}_{R}\right)$ are biased. One of the most important points to be considered in ridge regression is the k value. Many methods have been proposed in the literature to find the optimal k value; the ridge trace is one of them. The ridge trace is a plot of the elements of the ridge estimator versus k, usually in the interval (0, 1) [1].
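Equations 3 and 4 can be sketched directly as closed-form solves. This is a minimal illustration under the assumption of a full-rank $(n, p)$ design matrix; the function names are ours, not the paper's.

```python
import numpy as np

def ols(X, y):
    # beta_hat = (X'X)^{-1} X'Y  (Eq. 3)
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, k):
    # beta_R = (X'X + kI)^{-1} X'Y  (Eq. 4); k is the shrinkage parameter
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
```

With k = 0 the ridge estimate reduces to OLS, and for k > 0 the norm of the coefficient vector shrinks, which is the bias discussed around Equation 5.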

Other methods from the literature for finding the optimal k value are given in Equations 6-14, respectively.

$k=\frac{p{\hat{\sigma}}^{2}}{\hat{\beta}\text{'}\hat{\beta}}\quad \text{(6)}$

$k=\frac{p{\hat{\sigma}}^{2}}{{{\displaystyle \sum}}_{i=1}^{p}{\lambda}_{i}{\hat{\beta}}_{i}{}^{2}}\quad \text{(7)}$

$k=\frac{p{\hat{\sigma}}^{2}}{{{\displaystyle \sum}}_{i=1}^{p}\left\{{\hat{\beta}}_{i}{}^{2}\left[1+\left(1+{\lambda}_{i}{\left({\hat{\beta}}_{i}{}^{2}/{\hat{\sigma}}^{2}\right)}^{1/2}\right)\right]\right\}}\quad \text{(8)}$

$k=\frac{\left({\lambda}_{max}{\hat{\sigma}}^{2}\right)}{\left(\left(n-p-1\right){\hat{\sigma}}^{2}+{\lambda}_{max}{\hat{\beta}}^{2}{}_{max}\right)}\text{(9)}$

$k=max\left(0,\frac{p{\hat{\sigma}}^{2}}{\hat{\beta}\text{'}\hat{\beta}}-\frac{1}{n{\left(VI{F}_{j}\right)}_{max}}\right)\text{(10)}$

$k=\frac{{\hat{\sigma}}^{2}{{\displaystyle \sum}}_{i=1}^{p}\left({\lambda}_{i}{\hat{\beta}}_{i}{}^{2}\right)}{{\left[s{{\displaystyle \sum}}_{i=1}^{p}\left({\lambda}_{i}{\hat{\beta}}_{i}{}^{2}\right)\right]}^{2}}\text{(11)}$

$k=\frac{\left\{{\hat{\sigma}}^{2}{\lambda}_{max}{{\displaystyle \sum}}_{i=1}^{p}\left({\lambda}_{i}{\hat{\beta}}_{i}{}^{2}\right)+{\left[{{\displaystyle \sum}}_{i=1}^{p}\left({\lambda}_{i}{\hat{\beta}}_{i}{}^{2}\right)\right]}^{2}\right\}}{{\lambda}_{max}{{\displaystyle \sum}}_{i=1}^{p}\left({\lambda}_{i}{\hat{\beta}}_{i}{}^{2}\right)}\text{(12)}$

$k=max\left(\frac{{\hat{\sigma}}^{2}}{{\hat{\beta}}_{i}{}^{2}}+\frac{1}{{\lambda}_{i}}\right),i=1,2,\cdots ,p\text{(13)}$

$k=\frac{p{\hat{\sigma}}^{2}}{{{\displaystyle \sum}}_{i=1}^{p}\left\{{\hat{\beta}}_{i}{}^{2}/\left[{\left[\left({\hat{\beta}}_{i}{}^{4}{\lambda}_{i}{}^{2}/4{\hat{\sigma}}^{2}\right)+\left(6{\hat{\beta}}_{i}{}^{4}{\lambda}_{i}/{\hat{\sigma}}^{2}\right)\right]}^{1/2}-\left({\hat{\beta}}_{i}{}^{2}{\lambda}_{i}/2{\hat{\sigma}}^{2}\right)\right]\right\}}\text{(14)}$

In this paper, for the purpose of comparing the results, we consider only the methods briefly introduced below.

[2] suggested a method for finding the k value, given in Equation 15.

$k=\frac{p{\widehat{\sigma}}^{2}}{{\widehat{\beta}}^{\text{'}}\widehat{\beta}}\text{(15)}$

In this equation, ${\widehat{\sigma}}^{2}$ and $\widehat{\beta}$ are the OLS estimates. This method is called the fixed point ridge regression method (FPRRM).
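The FPRRM parameter of Equation 15 can be sketched as below. This is a minimal sketch assuming an $(n, p)$ design matrix and the residual mean square with $n-p$ degrees of freedom; the function name is hypothetical.

```python
import numpy as np

def fixed_point_k(X, y):
    """FPRRM shrinkage parameter (Eq. 15): k = p * sigma2_hat / (beta_hat' beta_hat)."""
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimate
    resid = y - X @ beta
    sigma2 = (resid @ resid) / (n - p)          # residual mean square
    return p * sigma2 / (beta @ beta)
```

Note that k grows with the noise level and shrinks as the coefficient vector gets larger, so a nearly noise-free fit yields a k close to zero.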

[39] introduced an iterative method for finding the optimal k value, in which k is calculated as in Equation 16:

$k=\frac{p{\widehat{\sigma}}^{2}\left(t-1\right)}{\widehat{\beta}{\left(t-1\right)}^{\text{'}}\widehat{\beta}\left(t-1\right)}\text{(16)}$

In this equation, ${\widehat{\sigma}}^{2}\left(t-1\right)$ and $\widehat{\beta}\left(t-1\right)$ are the residual mean square and the vector of estimated regression coefficients at the (t-1)th iteration, respectively. This method is called the iterative ridge regression method (IRRM).
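The IRRM of Equation 16 can be sketched as follows. This is a sketch under stated assumptions: the iteration starts from the OLS fit, and the stopping rule $|k_t - k_{t-1}| < \text{tol}$ is our choice (the paper only reports using a tolerance of $10^{-6}$).

```python
import numpy as np

def iterative_k(X, y, tol=1e-6, max_iter=100):
    """IRRM (Eq. 16): re-estimate k from the ridge fit of the previous
    iteration until k stabilizes."""
    n, p = X.shape
    I = np.eye(p)
    beta = np.linalg.solve(X.T @ X, X.T @ y)   # t = 0: start from OLS
    k_old = 0.0
    for _ in range(max_iter):
        resid = y - X @ beta
        sigma2 = (resid @ resid) / (n - p)      # sigma2_hat(t-1)
        k = p * sigma2 / (beta @ beta)          # Eq. 16
        if abs(k - k_old) < tol:
            break
        beta = np.linalg.solve(X.T @ X + k * I, X.T @ y)  # ridge fit with new k
        k_old = k
    return k
```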

The generalized ridge regression estimator of Hoerl and Kennard [1,40] is given in [41] by the following Equations 17-20.

Let $\Lambda$ and $Q$ be the matrices of eigenvalues and eigenvectors of $\left(X'X\right)$, respectively. In the orthogonal version of the classical linear regression model, $Z=XQ$, $\alpha =Q'\beta$, $\widehat{\alpha}={\Lambda}^{-1}Z'y$ and $K=diag\left({k}_{1},{k}_{2},\cdots ,{k}_{p}\right)$, ${k}_{i}\ge 0$; then

$\tilde{\beta}=Q{\left(\Lambda +K\right)}^{-1}\Lambda \widehat{\alpha}\text{(17)}$

$\tilde{\beta}$ is the generalized ridge estimator of $\beta$. Hoerl and Kennard [1,40] have shown that the values of ${k}_{i}$ which minimize the MSE of the regression coefficients are given by

${k}_{i}=\frac{{\sigma}^{2}}{{\alpha}_{i}^{2}}\text{(18)}$

The estimates of the ${k}_{i}$ values can be obtained by using Equation 19.

${\widehat{k}}_{i}=\frac{{\widehat{\sigma}}^{2}}{{\widehat{\alpha}}_{i}^{2}}\text{(19)}$

In [41], another estimation formula for the optimum shrinkage parameters is given in Equation 20.

${\widehat{k}}_{i}=\frac{{\lambda}_{i}{\widehat{\sigma}}^{2}}{\left(n-k\right){\widehat{\sigma}}^{2}+{\lambda}_{i}{\widehat{\alpha}}_{i}^{2}}\text{(20)}$
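Equations 17-19 can be sketched together. This is a sketch under stated assumptions: the eigendecomposition of $X'X$ is computed with `numpy.linalg.eigh`, and ${\widehat{\sigma}}^{2}$ uses $n-p$ degrees of freedom; the function name is hypothetical.

```python
import numpy as np

def generalized_ridge(X, y):
    """Generalized ridge: one k_i per orthogonal component (Eqs. 17-19)."""
    n, p = X.shape
    lam, Q = np.linalg.eigh(X.T @ X)            # Lambda, Q of X'X
    Z = X @ Q                                    # orthogonal regressors, Z'Z = Lambda
    alpha = (Z.T @ y) / lam                      # alpha_hat = Lambda^{-1} Z'y
    beta_ols = Q @ alpha                         # back-transformed OLS estimate
    resid = y - X @ beta_ols
    sigma2 = (resid @ resid) / (n - p)
    k = sigma2 / alpha**2                        # Eq. 19: k_i_hat = sigma2 / alpha_i^2
    beta_gr = Q @ (lam / (lam + k) * alpha)      # Eq. 17: Q (Lambda + K)^{-1} Lambda alpha_hat
    return k, beta_gr
```

Since each component is multiplied by $\lambda_i/(\lambda_i+k_i) < 1$, the generalized ridge estimate always has a smaller norm than the OLS estimate.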

Finding the optimal k value is an important problem in ridge regression. The k values recommended in the literature were given in the previous section. There are also heuristic methods for finding the optimal k value, such as the genetic algorithms proposed by [18,21], and [22] found the k value by using particle swarm optimization (PSO). In all of these methods, k was found as a single value. In this study, by contrast, we find a different k value for each explanatory variable instead of a single k, using an algorithm based on particle swarm optimization. This paper is thus an improved form of the study [22].

The objective function of the paper was created by considering both the mean absolute percentage error (MAPE) criterion and the VIF values at the same time. Its aim is to find the optimal k values that make all VIF values less than 10 while keeping the error minimal. We also add a term $\varnothing (k)$ to the objective function, which can be called a penalty parameter: if the VIF value corresponding to any explanatory variable is greater than 10, which is an undesirable result, the penalty increases the value of the objective function.

The optimization problem in the proposed method can be given in Equation 21.

Objective function:

$\underset{{k}_{1},{k}_{2},\cdots ,{k}_{p}}{\mathrm{min}}MAPE\left({k}_{1},{k}_{2},\cdots ,{k}_{p}\right)+\varnothing ({k}_{1},{k}_{2},\cdots ,{k}_{p})\text{(21)}$

subject to: $0\le {k}_{j}\le 1,\quad j=1,2,\cdots ,p$

where MAPE $\left({k}_{1},{k}_{2},\cdots ,{k}_{p}\right)$ and $\varnothing \left({k}_{1},{k}_{2},\cdots ,{k}_{p}\right)$ can be defined in Equations 22 and 23 respectively.

$MAPE\left({k}_{1},{k}_{2},\cdots ,{k}_{p}\right)=\frac{1}{n}{\displaystyle \sum}_{i=1}^{n}\left|\frac{{y}_{i}-{\hat{y}}_{i}}{{y}_{i}}\right|\quad \text{(22)}$

$\varnothing \left({k}_{1},{k}_{2},\cdots ,{k}_{p}\right)=\begin{cases}0, & VI{F}_{j}<10\text{ for all }j=1,2,\dots ,p\\ {\displaystyle \sum}_{j=1}^{p}VI{F}_{j}, & \text{otherwise}\end{cases}\quad \text{(23)}$

(p denotes the number of explanatory variables.)

The optimization problem defined in (21) is solved by PSO in the proposed method. PSO is a popular artificial intelligence technique first proposed by [42]. The algorithm of the proposed method is given below.

**Step 1.** The parameters such as *pn*, ${c}_{1}$, ${c}_{2}$, etc. are determined. These parameters are as follows:

*pn:* Particle number of the swarm

${c}_{1}$ : Cognitive coefficient

${c}_{2}$ : Social coefficient

*maxt:* Maximum iteration number

*w:* Inertia weight

**Step 2.** Generate random initial positions and velocities.

The initial positions and velocities are generated from the uniform distribution with parameters (0,1). Each particle has as many velocities and positions as the number of explanatory variables, and the positions represent the $\left({k}_{1},{k}_{2},\cdots ,{k}_{p}\right)$ values. ${x}_{m}^{t}$ represents the position and ${v}_{m}^{t}$ the velocity of particle *m* at iteration *t*.

**Step 3.** The fitness function is defined as in (21) and the fitness values of the particles are calculated.

**Step 4.** Pbest and Gbest particles given in (24) and (25), respectively, are determined according to fitness values.

$Pbes{t}_{m}^{t}=(pm),\text{}m=\text{}1,2,\text{}\dots ,\text{}pn\text{(24)}$

$Gbes{t}^{t}=\text{}(pg)\text{(25)}$

*Pbest* is constructed from the best results obtained at the corresponding positions up to iteration t. *Gbest* is the best result in the swarm at iteration t.

**Step 5.** New velocities and positions of the particles are calculated by using Equations (26) and (27).

${v}_{m}^{t+1}=\left[\begin{array}{l}w\times {v}_{m}^{t}+{c}_{1}\times ran{d}_{1}\times \\ \left(Pbes{t}_{m}^{t}-{x}_{m}^{t}\right)+{c}_{2}\times ran{d}_{2}\times \left(Gbes{t}^{t}-{x}_{m}^{t}\right)\end{array}\right]\text{(26)}$

${x}_{m}^{t+1}={x}_{m}^{t}+{v}_{m}^{t+1}\text{(27)}$

where $ran{d}_{1}$ and $ran{d}_{2}$ are random numbers generated from U(0,1).

**Step 6.** Steps 3 to 5 are repeated until *t* reaches *maxt*.

**Step 7.** The optimal values are obtained as *Gbest*.
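The steps above can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: the regressors are put into correlation form with the response centered, the ridge VIFs are taken as the diagonal of $(R+K)^{-1}R(R+K)^{-1}$ (an assumed formula), positions leaving $[0,1]$ are clipped, and all function names are hypothetical. MAPE requires the response values to be nonzero.

```python
import numpy as np

def ridge_vifs(R, K):
    # Ridge VIFs: diagonal of (R + diag(K))^{-1} R (R + diag(K))^{-1},
    # with R the correlation matrix of the regressors (assumed formula).
    A = np.linalg.inv(R + np.diag(K))
    return np.diag(A @ R @ A)

def objective(K, X, y):
    # Eq. 21: MAPE of the ridge fit (Eq. 22) plus the penalty of Eq. 23.
    n = len(y)
    Xs = (X - X.mean(0)) / (X.std(0, ddof=1) * np.sqrt(n - 1))  # X'X -> correlation form
    R = Xs.T @ Xs
    beta = np.linalg.solve(R + np.diag(K), Xs.T @ (y - y.mean()))
    yhat = Xs @ beta + y.mean()
    mape = np.mean(np.abs((y - yhat) / y))
    vifs = ridge_vifs(R, K)
    return mape + (0.0 if np.all(vifs < 10) else vifs.sum())

def pso_ridge(X, y, pn=30, w=0.9, c1=2.0, c2=2.0, maxt=100, seed=0):
    # Steps 1-7: PSO over (k_1, ..., k_p) in [0, 1]^p.
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    pos = rng.uniform(0, 1, (pn, p))                 # Step 2: initial positions
    vel = rng.uniform(0, 1, (pn, p))                 # and velocities
    fit = np.array([objective(k, X, y) for k in pos])  # Step 3
    pbest, pbest_fit = pos.copy(), fit.copy()        # Step 4
    g, g_fit = pos[fit.argmin()].copy(), fit.min()
    for _ in range(maxt):                            # Step 6: iterate
        r1, r2 = rng.uniform(size=(pn, p)), rng.uniform(size=(pn, p))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)  # Eq. 26
        pos = np.clip(pos + vel, 0.0, 1.0)           # Eq. 27, kept inside [0, 1]
        fit = np.array([objective(k, X, y) for k in pos])
        better = fit < pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        if fit.min() < g_fit:
            g, g_fit = pos[fit.argmin()].copy(), fit.min()
    return g, g_fit                                  # Step 7: Gbest
```

With collinear regressors the penalty term pushes the swarm toward vectors $(k_1,\dots,k_p)$ whose ridge VIFs all fall below 10, while the MAPE term keeps the shrinkage as small as the constraint allows.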

The proposed algorithm was applied to two well-known data sets in order to investigate the proposed method. These two data sets, named “Import Data” and “Longley Data”, were used to evaluate its performance. The Import data were analyzed by [43]; the variables are imports (IMPORT, Y), domestic production (DOPROD, X1), stock formation (STOCK, X2) and domestic consumption (CONSUM, X3), all measured in billions of French francs for the years 1949 through 1959. Both data sets were solved using the fixed point method [2], the iterative method [39], the method of [22] and the algorithm proposed in this paper. In the proposed algorithm, the PSO parameters were chosen as $pn=30$, $w=0.9$, ${c}_{1}={c}_{2}=2$ and $maxt=100$. In the iterative ridge method, the stopping tolerance was chosen as ${10}^{-6}$. The results of each method are presented in Tables 2 and 3, respectively.

As can be seen from Table 2, our proposed method has the minimum SSE and MAPE values, and there is no multicollinearity problem when the Import Data are solved by it. In contrast, there is a multicollinearity problem when the Import Data are solved by the FPRRM and IRRM methods, because their VIF values are greater than 10. Even where other methods give small SSE and MAPE values, they still do not solve the multicollinearity problem, since some of their VIF values are clearly greater than 10.

As can be seen from Table 3, our proposed method has the minimum MAPE value compared with the other methods, but its SSE value is not the smallest: the SSE value of OLS is smaller. However, it is clearly seen that the OLS method suffers from the multicollinearity problem when the Longley Data are solved by it, whereas our proposed method does not.

As a result, finding a k value for each explanatory variable gives better results than finding a single k value, and our proposed method has no multicollinearity problem.

Two simulation studies are performed in this section in order to show the performance of the proposed method at different levels of multicollinearity and of the standard deviation of the error term, and its superiority when compared with the other methods.

**The First Simulation Study:** In this simulation study, the proposed method was compared with the ridge regression methods given in [2,22,39]. The number of observations (n) was taken as 100, 500 and 1000 and the standard deviation of the error term as 0.01 and 1, giving a total of 6 cases. For each case, 1000 data sets containing a multicollinearity problem were created.

The first three independent variables were generated from standard normal distribution as given in Equation 28.

${X}_{i}\sim N\left(0,1\right),\quad i=1,2,3\quad \text{(28)}$

The last two independent variables were generated by using Equation 29. This creates a multicollinearity problem in the data set by inducing a high correlation between ${X}_{1}$ and ${X}_{4}$, and between ${X}_{1}$ and ${X}_{5}$.

${X}_{i}=U\left(10,20\right)+U\left(5,20\right){X}_{1}+N\left(0,7\right),\quad i=4,5\quad \text{(29)}$

The observations of the dependent variable were obtained using Equation 30, with all regression coefficients in the model taken as 1.

$Y={{\displaystyle \sum}}^{\text{}}{X}_{i}+N(0,\sigma )\quad \text{(30)}$

For each data set generated in each case, the ${{\displaystyle \sum}}^{\text{}}VI{F}^{2}$, SSE, MAPE and CN values are calculated using the proposed method and the methods of [2,22,39]. The formula of SSE is given in Equation 31.

$SSE={\displaystyle \sum}_{i=1}^{n}{({y}_{i}-{\hat{y}}_{i})}^{2}\text{(31)}$
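The data-generating scheme of Equations 28-31 can be sketched as below. One detail is an assumption on our part: the paper does not state whether the $U(10,20)$ and $U(5,20)$ terms in Equation 29 are drawn once per variable or per observation; here they are drawn once per variable.

```python
import numpy as np

def generate_case(n, sigma, rng):
    """One simulated data set of the first study (Eqs. 28-30)."""
    X = rng.normal(size=(n, 3))                       # Eq. 28: X1..X3 ~ N(0,1)
    cols = []
    for _ in range(2):                                # Eq. 29: X4 and X5, tied to X1
        col = (rng.uniform(10, 20) + rng.uniform(5, 20) * X[:, 0]
               + rng.normal(0, 7, size=n))
        cols.append(col)
    X = np.column_stack([X] + cols)
    y = X.sum(axis=1) + rng.normal(0, sigma, size=n)  # Eq. 30, all betas = 1
    return X, y

def sse(y, yhat):
    return np.sum((y - yhat) ** 2)                    # Eq. 31
```

Because $X_4$ and $X_5$ are linear in $X_1$ plus noise, each generated data set carries the intended multicollinearity.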

The most important indicator for comparing the methods is that the VIF and CN values should be small. The methods of [2] and [39] do not guarantee a solution of the multicollinearity problem, as seen in the numerical examples, whereas the method of [22] and the proposed method guarantee that all VIF values are smaller than 10. Therefore, it is appropriate to compare the proposed method with the method of [22] in terms of the SSE and MAPE criteria.

The median and interquartile range (IQR) values of the results are given in Tables 4-9.

When all tables are examined, it is clearly seen that the ${{\displaystyle \sum}}^{\text{}}VI{F}^{2}$ and CN values of the proposed method are lower than those of the other methods in all cases.

However, the proposed method produces lower MAPE values than the others despite producing higher SSE values. This is because the objective function of the proposed method depends on the MAPE.

**The Second Simulation Study:** A second simulation study was performed according to different levels of the multicollinearity problem and of the standard deviation of the error term. The regressors were generated by using Equations 32-36, given by [44].

${w}_{ij}\sim N\left(0,1\right);\;i=1,2,\dots ,n;\;j=1,2,\dots ,6\quad \text{(32)}$

${x}_{ij}={\left(1-{\rho}^{2}\right)}^{1/2}{w}_{ij}+\rho {w}_{i,6};\;i=1,2,\dots ,n;\;j=1,2,3\quad \text{(33)}$

${x}_{ij}={w}_{ij},i=1,2,\dots ,n;j=4,5\text{(34)}$

${e}_{i}\sim N\left(0,\sigma \right);\;i=1,2,\dots ,n\quad \text{(35)}$

${y}_{i}={\displaystyle \sum}_{j=1}^{5}{\beta}_{j}{x}_{i,j}+{e}_{i};i=1,2,\dots ,n\text{(36)}$

where the ${w}_{ij}$ are independent standard normal pseudorandom numbers and ${\rho}^{2}$ is the theoretical correlation between any two of the first three explanatory variables.
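The scheme of Equations 32-36 can be sketched as below (a minimal sketch; the function name is ours). The first three regressors share the common component $w_{i,6}$, which gives them pairwise correlation $\rho^2$, while $x_4$ and $x_5$ remain independent.

```python
import numpy as np

def generate_regressors(n, rho, sigma, beta, rng):
    """Regressors and response of the second study (Eqs. 32-36)."""
    w = rng.normal(size=(n, 6))                                   # Eq. 32
    X = np.empty((n, 5))
    X[:, :3] = np.sqrt(1 - rho**2) * w[:, :3] + rho * w[:, [5]]   # Eq. 33
    X[:, 3:] = w[:, 3:5]                                          # Eq. 34
    e = rng.normal(0, sigma, size=n)                              # Eq. 35
    y = X @ beta + e                                              # Eq. 36
    return X, y
```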

The simulation study was conducted for a total of 8 cases, with sample size $n=100$, standard deviations of the error term $\sigma =0.01,0.1,1,5$ and degrees of multicollinearity $\rho =0.99,0.999$ (Tables 10-17).

It is clearly seen from the tables of the second simulation study that the ${{\displaystyle \sum}}^{\text{}}VI{F}^{2}$ and CN values of the proposed method do not change significantly when the standard deviation of the error term changes, whereas they increase dramatically when the multicollinearity increases. Moreover, there is hardly any change in the MAPE values of the proposed method for reasonable standard deviations of the error term $\left(\sigma =0.01,0.1\right)$, and the MAPE values even decrease when the multicollinearity increases.

Different levels of the standard deviation of the error term were also employed in this simulation study. The results clearly show that when the standard deviation of the error term is greater than 1, the model deviates strongly from the linear regression model, because MAPE values of about 60 are obtained, which is not acceptable. It is also clearly seen from the tables of the second simulation study that the prediction performance of the proposed method is affected quite negatively when the standard deviation of the error term is increased.

Some assumptions must hold to create a model in multiple regression analysis. One of them is that there should be no multicollinearity problem among the independent variables. The ridge regression method is often used in the literature when there is such a problem.

However, ridge regression also has some problems, one of the most important being the choice of the shrinkage parameter (k). There are many studies in the literature on finding the optimal k value, and in these studies k was taken to be a single value. In this study, by contrast, we found different k values corresponding to each explanatory variable instead of a single k, using a new algorithm based on particle swarm optimization, and the proposed method was supported by two simulation studies. This is an important novelty for the ridge regression literature.

In future studies, different artificial intelligence optimization techniques can be used to find these k values for each explanatory variable.

- Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12: 55–67. Link: https://goo.gl/5ZV56T
- Hoerl AE, Kennard RW, Baldwin KF (1975) Ridge regression: some simulations. Communications in Statistics 4: 105–123. Link: https://goo.gl/QGgP3L
- McDonald GC, Galarneau DI (1975) A Monte Carlo evaluation of some ridge-type estimators. Journal of the American Statistical Association 70: 407–412. Link: https://goo.gl/7ZN2co
- Lawless JF, Wang P (1976) A simulation study of ridge and other regression estimators. Communications in Statistics – Theory and Methods 14: 1589–1604. Link: https://goo.gl/WfUz0p
- Hocking RR, Speed FM, Lynn MJ (1976) A class of biased estimators in linear regression. Technometrics 18: 425–437. Link: https://goo.gl/NEjsRY
- Gunst RF, Mason RL (1977) Biased estimation in regression: an evaluation using mean squared error. Journal of the American Statistical Association 72: 616–628. Link: https://goo.gl/HfxIin
- Wichern D, Curchill G (1978) A comparison of ridge estimators. Technometrics 20: 301-311. Link: https://goo.gl/U6OiUQ
- Lawless JF (1978) Ridge and related estimation procedure Theory and Methods. Communications in Statistics 7: 139–164. Link: https://goo.gl/KceYME
- Nordberg L (1982) A procedure for determination of a good ridge parameter in linear regression, Communications in Statistics 11: 285–309. Link: https://goo.gl/pNqtc2
- Saleh AK, Kibria BM (1993) Performances of some new preliminary test ridge regression estimators and their properties. Communications in Statistics – Theory and Methods 22: 2747–2764. Link: https://goo.gl/4XqQNd
- Haq MS, Kibria BMG (1996) A shrinkage estimator for the restricted linear regression model: ridge regression approach. Journal of Applied Statistical Science 3: 301–316. Link: https://goo.gl/sjCZrw
- Kibria BM (2003) Performance of some new ridge regression estimators. Communications in Statistics – Simulation and Computation 32: 419–435. Link: https://goo.gl/3OJp6a
- Pasha GR, Shah MA (2004) Application of ridge regression to multicollinear data. Journal of Research Science 15: 97– 106. Link: https://goo.gl/5eP2I5
- Khalaf G, Shukur G (2005) Choosing ridge parameter for regression problem. Communications in Statistics – Theory and Methods 34: 1177–1182. Link: https://goo.gl/Nu1Xs4
- Norliza A, Maizah HA, Robin A (2006) A comparative study on some methods for handling multicollinearity problems. Mathematika 22: 109–119. Link: https://goo.gl/Tlyqej
- Alkhamisi MA, Shukur G (2007) A Monte Carlo study of recent ridge parameters. Communications in Statistics – Simulation and Computation 36: 535–547. Link: https://goo.gl/Mv2FMY
- Mardikyan S, Cetin E (2008) Efficient choice of biasing constant for ridge regression. Int. J. Contemp. Math. Sciences, 3: 527–536. Link: https://goo.gl/oOgsiH
- Prago-Alejo RJ, Torre-Trevino LM, Pina-Monarrez MR (2008) Optimal determination of k constant of ridge regression using a simple genetic algorithm. Electronics, Robotics and Automotive Mechanics Conference. Link: https://goo.gl/uPVi0B
- Dorugade AV, Kashid DN (2010) Alternative method for choosing ridge parameter for regression. Applied Mathematical Sciences 4: 447–456. Link: https://goo.gl/E7MYJ5
- Al-Hassan Y (2010) Performance of new ridge regression estimators. Journal of the Association of Arab Universities for Basic and Applied Science 9: 23–26. Link: https://goo.gl/zTjoEe
- Ahn JJ, Byun HW, Oh KJ, Kim TY (2012) Using ridge regression with genetic algorithm to enhance real estate appraisal forecasting. Expert Systems with Applications 39: 8369–8379. Link: https://goo.gl/TM0Udi
- Uslu VR, Egrioglu E, Bas E (2014) Finding optimal value for the shrinkage parameter in ridge regression via particle swarm optimization. American Journal of Intelligent Systems 4: 142-147. Link: https://goo.gl/U06GuG
- Chitsaz S, Ahmed SE (2012) Shrinkage estimation for the regression parameter matrix in multivariate regression model. Journal of Statistical Computation and Simulation 82: 309-323. Link: https://goo.gl/lDIZzU
- Firinguetti L (1997) Ridge regression in the context of a system of seemingly unrelated regression equations. Journal of Statistical Computation and Simulation 56: 145-162. Link: https://goo.gl/wWROue
- Halawa AM, El Bassiouni MY (2000) Tests of regression coefficients under ridge regression models. Journal of Statistical Computation and Simulation 65: 341-356. Link: https://goo.gl/LUQbtW
- Dorugade AV, Kashid DN (2010) Variable selection in linear regression based on ridge estimator. Journal of Statistical Computation and Simulation 80: 1211-1224. Link: https://goo.gl/A0WJm7
- Golam Kibria BM (2004) Performance of the shrinkage preliminary test ridge regression estimators based on the conflicting of W, LR and LM tests, Journal of Statistical Computation and Simulation 74: 793-810. Link: https://goo.gl/TMvLYe
- Roozbeh M, Arashi M, Niroumand HA (2011) Ridge regression methodology in partial linear models with correlated errors. Journal of Statistical Computation and Simulation 81: 517-528. Link: https://goo.gl/Y2r4Nz
- Simpsona JR, Montgomery DC (1996) A biased-robust regression technique for the combined outlier-multicollinearity problem. Journal of Statistical Computation and Simulation 56: 1-22. Link: https://goo.gl/qgK7Fz
- Uzuke CA, Mbegbu JI, Nwosu CR (2015) Performance of Kibria, Khalaf and Shukur's methods when the eigenvalues are skewed. Communications in Statistics – Simulation and Computation. Link: https://goo.gl/VdvgIo
- Wong KY, Chiu SN (2015) An iterative approach to minimize the mean squared error in ridge regression. Computational Statistics 30: 625-639. Link: https://goo.gl/rdHK1p
- Dorugade AV (2014) New ridge parameters for ridge regression. Journal of the Association of Arab Universities for Basic and Applied Sciences 15: 94-99. Link: https://goo.gl/wpNPWf
- Khalaf G (2013) An optimal estimation for the ridge regression parameter. Journal of Fundamental and Applied Statistics 5: 11-19.Link: https://goo.gl/bo4bde
- Muniz G, Golam Kibria BM, Månsson K, Ghazi S (2012) On developing ridge regression parameters: a graphical investigation. Sort-Statistics and Operations Research Transactions 36: 115-138. Link: https://goo.gl/o13EFW
- Muniz G, Golam Kibria BM (2009) On some ridge regression estimators: an empirical comparisons. Communications in Statistics - Simulation and Computation 38: 621-630. Link: https://goo.gl/wqCKbh
- Nomura M (1988) On the almost unbiased ridge regression estimation. Communications in Statistics – Simulation and Computation 17: 729–743. Link: https://goo.gl/5kX0MM
- Montgomery DC, Peck EA, Vining GG (2006) Introduction to Linear Regression Analysis. John Wiley and Sons. Link: https://goo.gl/M3tgXY
- Batah FS, Ramnathan T, Gore SD (2008) The efficiency of modified jackknife and ridge type regression estimators: a comparison. Surveys in Mathematics and its Applications 3: 111–122. Link: https://goo.gl/bw8Xcf
- Hoerl AE, Kennard RW (1976) Ridge regression: iterative estimation of the biasing parameter. Communications in Statistics – Theory and Methods 5: 77-88. Link: https://goo.gl/VxoSpF
- Hoerl AE, Kennard RW (1970) Ridge regression: applications to nonorthogonal problems. Technometrics 12: 69-82. Link: https://goo.gl/HKnemY
- Firinguetti L (1999) A generalized ridge regression estimator and its finite sample properties. Communications in Statistics-Theory and Methods 28: 1217-1229. Link: https://goo.gl/VTzhbj
- Kennedy J, Eberhart R (1995) Particle swarm optimization. In Proceedings of IEEE International Conference on Neural Networks, Piscataway, NJ, USA, IEEE Press. 1942–1948
- Chatterjee S, Hadi AS (2006) Regression Analysis by Example. John Wiley and Sons. Link: https://goo.gl/Dx6iqn
- Gibbons DG (1981) A simulation study of some ridge estimators. Journal of the American Statistical Association 76:131–139. Link: https://goo.gl/XMzqJs

© 2017 Bas E, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
