Stepwise with log of non-positive values
Posted: Fri Oct 11, 2013 8:02 am
I am using stepwise regression on a data set with missing values.
Eviews 8 Sept 20 2013 build.
In the list of search regressors, I accidentally entered the logs of two binary variables. In EViews 7, this produces the error message "Log of a non-positive value". In EViews 8, I get results, but I would not classify them as correct.
Below, log(ivy) and log(priv) are logs of binary variables.
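To make the problem concrete, here is a minimal sketch in Python (not EViews). It assumes that the log of a non-positive value is treated as a missing value instead of raising an error; that is only a guess about what EViews 8 might be doing, not confirmed behaviour. The toy data and the helper name `safe_log` are hypothetical.

```python
import math

def safe_log(x):
    """Return log(x), or None (standing in for NA) if x is missing or non-positive."""
    if x is None or x <= 0:
        return None
    return math.log(x)

# Hypothetical toy data: a 0/1 dummy and a strictly positive regressor.
ivy = [0, 1, 0, 0, 1]
accept = [0.25, 0.10, 0.60, 0.45, 0.08]

log_ivy = [safe_log(v) for v in ivy]
log_accept = [safe_log(v) for v in accept]

# Rows that survive if every candidate regressor must be non-missing.
common_rows = [
    i for i, (a, b) in enumerate(zip(log_ivy, log_accept))
    if a is not None and b is not None
]

# On the surviving rows the logged dummy takes a single value (0.0),
# so it is collinear with the intercept.
distinct_values = {log_ivy[i] for i in common_rows}

print(log_ivy)
print(common_rows)
print(distinct_values)
```

Under that assumption, a logged dummy is either missing (where the dummy is 0) or exactly zero (where it is 1), so it can shrink the search sample and carries no information on the rows that remain.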
Forward selection with log(binary) in the search regressors:
STEPLS(FTOL=0.99,BTOL=0.99) LOG(SCORE) C @ LOG(ACCEPT) LOG(ACT) LOG(ALUM) LOG(ENDOW) LOG(ENDPERSTUD) LOG(ENDPERUND) LOG(GRAD) LOG(IVY) LOG(PRIV) LOG(RET) LOG(S_F) LOG(SAT) LOG(TOPTEN) LOG(TOTENROLL) LOG(UNENROLL)
Dependent Variable: LOG(SCORE)
Method: Stepwise Regression
Date: 10/11/13 Time: 09:11
Sample: 1 202
Included observations: 199
Number of always included regressors: 1
Number of search regressors: 15
Selection method: Stepwise forwards
Stopping criterion: p-value forwards/backwards = 0.9/0.9
Note: final equation sample is larger than stepwise sample (rejected
regressors contain missing values)
LOG(SCORE) = C(1) + C(2)*LOG(ENDPERUND) + C(3)*LOG(TOTENROLL) + C(4)*LOG(ALUM) + C(5)*LOG(S_F) + C(6)*LOG(GRAD) + C(7)*LOG(ACCEPT)
Variable Prob.*
C 0.3815
LOG(ENDPERUND) 0
LOG(TOTENROLL) 0
LOG(ALUM) 0
LOG(S_F) 0.0175
LOG(GRAD) 0
LOG(ACCEPT) 0
R-squared 0.916617
The results are identical whether FTOL and BTOL are set to 0.99 or to 0.9.
Combinatorial with the same search regressors:
STEPLS(METHOD=COMB,FTOL=0.9,BTOL=0.9,NVARS=7) LOG(SCORE) C @ LOG(ACCEPT) LOG(ACT) LOG(ALUM) LOG(ENDOW) LOG(ENDPERSTUD) LOG(ENDPERUND) LOG(GRAD) LOG(IVY) LOG(PRIV) LOG(RET) LOG(S_F) LOG(SAT) LOG(TOPTEN) LOG(TOTENROLL) LOG(UNENROLL)
Dependent Variable: LOG(SCORE)
Method: Stepwise Regression
Date: 10/11/13 Time: 09:18
Sample: 1 202
Included observations: 199
Number of always included regressors: 1
Number of search regressors: 15
Selection method: Combinatorial
Number of search regressors: 7
Note: final equation sample is larger than stepwise sample (rejected
regressors contain missing values)
LOG(SCORE) = C(1) + C(2)*LOG(ENDPERUND) + C(3)*LOG(TOTENROLL) + C(4)*LOG(ALUM) + C(5)*LOG(S_F) + C(6)*LOG(GRAD) + C(7)*LOG(ACCEPT) + C(8)*LOG(RET)
Variable Prob.*
C 0
LOG(ENDPERUND) 0.0003
LOG(TOTENROLL) 0.0001
LOG(ALUM) 0
LOG(S_F) 0.0007
LOG(GRAD) 0
LOG(ACCEPT) 0.003
LOG(RET) 0
R-squared 0.926661
LOG(RET) enters the combinatorial equation with a p-value well below 0.9, and yet the forward stepwise procedure did not select it. The reported sample and the number of included observations are the same in both regressions, and I checked that the 3 missing values occur at the same observations in each.
At least as interesting,
STEPLS(METHOD=COMB,FTOL=0.9,BTOL=0.9,NVARS=8) LOG(SCORE) C @ LOG(ACCEPT) LOG(ACT) LOG(ALUM) LOG(ENDOW) LOG(ENDPERSTUD) LOG(ENDPERUND) LOG(GRAD) LOG(IVY) LOG(PRIV) LOG(RET) LOG(S_F) LOG(SAT) LOG(TOPTEN) LOG(TOTENROLL) LOG(UNENROLL)
produces
Dependent Variable: LOG(SCORE)
Method: Stepwise Regression
Date: 10/11/13 Time: 08:59
Sample: 1 202
Included observations: 202
Number of always included regressors: 1
Number of search regressors: 15
Selection method: Combinatorial
Number of search regressors: 8
Note: final equation sample is larger than stepwise sample (rejected
regressors contain missing values)
Variable Coefficient Std. Error t-Statistic Prob.*
C 3.887912 0.023380 166.2932 0.0000
R-squared 0.000000 Mean dependent var 3.887912
Number of combinations compared: 6435
Somehow the number of included observations is 202, so nothing was excluded, and no search regressors were selected.
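As a sanity check on the reported "Number of combinations compared", the figure matches the number of ways to choose 8 regressors out of the 15 search regressors, which suggests the search did enumerate every subset. A short Python sketch of that arithmetic (the variable names are mine, not anything from EViews):

```python
from math import comb

n_search = 15  # number of search regressors reported in the output

# Number of distinct subsets of size NVARS for the two runs above.
counts = {nvars: comb(n_search, nvars) for nvars in (7, 8)}

print(counts)  # {7: 6435, 8: 6435}
```

Because C(15,7) equals C(15,8), the NVARS=7 and NVARS=8 runs compare the same number of subsets, so the count alone does not explain why one run returns a model and the other returns only the intercept.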
LS LOG(SCORE) C LOG(ENDPERUND) LOG(TOTENROLL) LOG(ALUM) LOG(S_F) LOG(GRAD) LOG(ACCEPT) LOG(RET) LOG(UNENROLL)
Produces
Dependent Variable: LOG(SCORE)
Method: Least Squares
Date: 10/11/13 Time: 09:30
Sample: 1 202
Included observations: 199
LOG(SCORE) = C(1) + C(2)*LOG(ENDPERUND) + C(3)*LOG(TOTENROLL) + C(4)*LOG(ALUM) + C(5)*LOG(S_F) + C(6)*LOG(GRAD) + C(7)*LOG(ACCEPT) + C(8)*LOG(RET) + C(9)*LOG(UNENROLL)
C 0
LOG(ENDPERUND) 0.0001
LOG(TOTENROLL) 0.6985
LOG(ALUM) 0
LOG(S_F) 0.0002
LOG(GRAD) 0
LOG(ACCEPT) 0.002
LOG(RET) 0
LOG(UNENROLL) 0.1138
R-squared 0.927622
This is an 8-variable-plus-intercept model with 199 observations and a higher R^2 than the 7-variable-plus-intercept model above. Why didn't the combinatorial search at least see that model? Why did it produce NO ANSWER?
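Since R^2 cannot fall when a regressor is added on the same sample, a fairer comparison is adjusted R^2. The sketch below, in Python, applies the standard formula 1 - (1 - R^2)(n - 1)/(n - k - 1) to the two R^2 values reported above with n = 199. The function name is hypothetical, and this is only a back-of-the-envelope check, not a statement of the criterion EViews uses.

```python
def adjusted_r2(r2, n_obs, n_regressors):
    """Adjusted R^2 for a model with an intercept plus n_regressors slopes."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)

n_obs = 199

# R^2 values copied from the output above.
adj7 = adjusted_r2(0.926661, n_obs, 7)  # combinatorial, NVARS=7
adj8 = adjusted_r2(0.927622, n_obs, 8)  # plain LS with LOG(UNENROLL) added

print(round(adj7, 5), round(adj8, 5))
```

On these numbers the 8-variable model still comes out slightly ahead after the adjustment, so it is a legitimate candidate that an exhaustive search over 8-variable subsets should have been able to return.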
In EViews 7, LOG(PRIV) and LOG(IVY) produce the "Log of a non-positive value" error. In EViews 8, we get, well, I don't know.
Data set available at http://econ413.wustl.edu/pearlmank-v8.wf1
Regressions of interest at T* what WHAT*
See table RESULTS22 if you like.
Bob

