
Stepwise regression and HAC error estimates

Posted: Mon Jan 10, 2011 7:37 am
by fboehlandt
Hi everyone,
I noticed that EViews 6 offers stepwise regression. Is it possible to use a stepwise regression algorithm with HAC-consistent error estimates (Newey-West)? I would like to avoid collinearity between the regressors whilst accounting for autocorrelation and heteroskedasticity in the error estimates. The dialogue for stepwise regression does not include specifications for the error estimates. Can anybody help? Thanks
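[Editor's note: for readers outside EViews, the Newey-West HAC covariance being asked about can be sketched in a few lines of Python/numpy. This is a minimal illustration, not an EViews routine; the function name `newey_west_se` is made up here.]

```python
import numpy as np

def newey_west_se(X, y, lags):
    """OLS coefficients with Newey-West HAC standard errors.

    Sketch only: X is an (n, k) design matrix (include a constant
    column yourself), y an (n,) response, lags the truncation lag.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta                      # OLS residuals
    Xu = X * u[:, None]                   # rows are u_t * x_t
    S = Xu.T @ Xu                         # lag-0 (White) term
    for l in range(1, lags + 1):
        w = 1.0 - l / (lags + 1.0)        # Bartlett kernel weight
        G = Xu[l:].T @ Xu[:-l]            # lag-l autocovariance term
        S += w * (G + G.T)
    cov = XtX_inv @ S @ XtX_inv           # sandwich estimator
    return beta, np.sqrt(np.diag(cov))
```

With `lags=0` this collapses to White's heteroskedasticity-consistent standard errors; positive lags add the Bartlett-weighted autocovariance terms.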

Re: Stepwise regression and HAC error estimates

Posted: Mon Jan 10, 2011 9:35 am
by EViews Gareth
HAC is not available in the built-in stepwise routines (since HAC means you can't use many of the "tricks" we use in the internal stepwise code). However, you could always program one yourself. The program shown here:
http://forums.eviews.com/viewtopic.php? ... wise#p1379
is a good starting point.

Re: Stepwise regression and HAC error estimates

Posted: Tue Jan 11, 2011 5:38 am
by fboehlandt
Okay, thanks. I thought as much. I have a VBA code that does forward stepwise regression as well as another snippet for HAC error estimates according to White and Newey-West. I shall try to implement the code in EViews when I have the time and post it again once it is in working order.

Re: Stepwise regression and HAC error estimates

Posted: Mon Jan 24, 2011 4:27 am
by fboehlandt
PLEASE IGNORE THIS POST. REFER TO POST BELOW...

Hi Gareth
I have built on your suggested code snippet and translated an old script from R. As I don't know the command references well yet, there may be some errors or superfluous loops. The idea is to incorporate variables from a pool of potential regressors if they contribute more in terms of explanatory power than they 'cost' in terms of the reduction in degrees of freedom. Similarly, regressors are removed if the benefit of increasing the degrees of freedom outweighs the loss in explanatory power. The F statistic (F-to-enter and F-to-leave) determines which variables enter or leave. Lastly, the model removes a regressor if one or more of the other regressors explain a significant proportion of the variation in the regressor being tested (multicollinearity). In a nutshell, at every iteration:

1. variable enters based on F-to-enter
2. variable removed based on collinearity
3. variable leaves based on F-to-leave

Assume all potential regressors are grouped in 'xs' and the regressand is named 'y'. I will post amendments as I go along. Your input is greatly appreciated.
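[Editor's note: the three-step loop above can also be sketched outside EViews. The following Python/numpy version implements steps 1 and 3 via partial-F statistics from nested OLS fits; the collinearity screen of step 2 is omitted for brevity, and the name `stepwise_f` is hypothetical.]

```python
import numpy as np

def stepwise_f(y, candidates, f_enter=3.84, f_leave=2.71, max_iter=100):
    """Forward stepwise selection with F-to-enter and F-to-leave.

    candidates maps names to 1-d regressor arrays; returns selected names.
    """
    n = len(y)

    def ssr_of(names):
        # SSR and parameter count of an OLS fit with intercept
        X = np.column_stack([np.ones(n)] + [candidates[s] for s in names])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        return float(r @ r), X.shape[1]

    selected = []
    for _ in range(max_iter):
        changed = False
        # step 1: enter the candidate with the largest partial F above f_enter
        ssr0, _ = ssr_of(selected)
        best, best_f = None, f_enter
        for name in candidates:
            if name in selected:
                continue
            ssr1, k1 = ssr_of(selected + [name])
            f = (ssr0 - ssr1) / (ssr1 / (n - k1))
            if f > best_f:
                best, best_f = name, f
        if best is not None:
            selected = selected + [best]
            changed = True
        # step 3: drop the selected variable with the smallest partial F below f_leave
        if len(selected) > 1:
            ssr_full, k_full = ssr_of(selected)
            worst, worst_f = None, f_leave
            for name in selected:
                ssr_r, _ = ssr_of([s for s in selected if s != name])
                f = (ssr_r - ssr_full) / (ssr_full / (n - k_full))
                if f < worst_f:
                    worst, worst_f = name, f
            if worst is not None:
                selected = [s for s in selected if s != worst]
                changed = True
        if not changed:
            break
    return selected
```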

Code:

!Fcrit1 = 3.84
!Fcrit2 = 2.71
!tolerance = 0.99
!idx = 1
!k = xs.@count
!n = xs.@minobs
table(!k, 10) stepreg
group xsa
group xsd
group xsi
for !i = 1 to xs.@count
  %n = xs.@seriesname(!i)
  xsd.add {%n}
next
!cnt = 0
while xsa.@count < !k and !cnt < 1000 'max iterations
  'ENTER
  !cnt = !cnt + 1
  !maxF = !Fcrit1
  !minF = !Fcrit2
  vector(!cnt) ssr
  for !i = 1 to xsd.@count
    %n = xsd.@seriesname(!i)
    xsa.add {%n}
    equation e1.ls y c xsa
    !currentssr = e1.@ssr
    !ncoef = e1.@ncoef
    !currentmsr = !currentssr/(!n-!ncoef)
    if !cnt = 1 then
      !currentF = e1.@f
    else
      !currentF = (ssr(!cnt-1)-!currentssr)/!currentmsr
    endif
    if !currentF > !maxF then
      !maxF = !currentF
      !msr = !currentmsr
      !idx = !i
      ssr(!cnt) = !currentssr
    endif
    d e1
    xsa.drop {%n}
  next
  if !maxF > !Fcrit1 then
    !enter = 1
    %n = xsd.@seriesname(!idx) 'variable enters
    xsa.add {%n}
    xsd.drop {%n}
  else
    !enter = 0
  endif
  stepreg(!cnt, 1) = !maxF
  'COLLINEARITY (regress each x on all other xs)
  !maxR = !tolerance
  for !i = 1 to xsa.@count
    %n = xsa.@seriesname(!i)
    xsa.drop {%n}
    equation e1.ls {%n} c xsa
    !currentR = e1.@r2
    if !currentR > !maxR then
      !maxR = !currentR
      !idxs = !i
    endif
    d e1
    xsa.add {%n}
  next
  'remove collinear regressors
  if !maxR > !tolerance then
    %n = xsa.@seriesname(!idxs)
    xsa.drop {%n}
    equation e1.ls y c xsa
    ssr(!cnt) = e1.@ssr
    xsi.add {%n}
  endif
  'LEAVE
  for !i = 1 to xsa.@count - 1
    %n = xsa.@seriesname(!i)
    xsa.drop {%n}
    equation e1.ls y c xsa
    !currentssr = e1.@ssr
    !ncoef = e1.@ncoef
    !currentmsr = !currentssr/(!n-!ncoef)
    !currentF = (!currentssr-ssr(!cnt))/!msr
    if !currentF < !minF then
      !minF = !currentF
      !idx = !i
    endif
    d e1
    xsa.add {%n}
  next
  if !minF < !Fcrit2 then
    %n = xsa.@seriesname(!idx) 'variable leaves
    xsa.drop {%n}
    equation e1.ls y c xsa
    ssr(!cnt) = e1.@ssr
    xsd.add {%n}
    for !i = 1 to xsi.@count
      %n = xsi.@seriesname(!i)
      xsd.add {%n}
    next
  else
    if !enter = 0 then
      exitloop
    endif
  endif
wend
P.S. It should be fairly easy to amend the code to include HAC error estimates at every step.

Re: Stepwise regression and HAC error estimates

Posted: Tue Jan 25, 2011 4:38 am
by fboehlandt
Okay,
I have streamlined the code, removed superfluous loops, and corrected errors. This is what I have come up with:

Code:

!Fcrit1 = 3.84
!Fcrit2 = 2.71
!tolerance = 0.01
table(xs.@count, 10) stepreg
!idx = 1
!k = xs.@count
!n = xs.@minobs
group xsa
group xsd
for !i = 1 to xs.@count
  %n = xs.@seriesname(!i)
  xsd.add {%n}
next
!cnt = 0
while !cnt < !k
  !cnt = !cnt + 1
  if !cnt > 1 then
    equation e1.ls y c xsa
    !ssrr = e1.@ssr
  endif
  !maxF = !Fcrit1
  !minF = !Fcrit2
  for !i = 1 to xsd.@count
    %n = xsd.@seriesname(!i)
    xsa.add {%n}
    equation e1.ls y c xsa
    !currentssr = e1.@ssr
    !ncoef = e1.@ncoef
    !currentmsr = !currentssr/(!n-!ncoef)
    if !cnt = 1 then
      !currentF = e1.@f
    else
      !currentF = (!ssrr-!currentssr)/!currentmsr
    endif
    d e1
    xsa.drop {%n}
    equation e1.ls {%n} c xsa 'tolerance
    !currentr2 = 1 - e1.@r2
    if !currentF > !maxF and !currentr2 > !tolerance then
      !enter = 1
      !maxF = !currentF
      !msr = !currentmsr
      !idx = !i
      !ssr = !currentssr
    endif
    d e1
  next
  if !maxF > !Fcrit1 then
    %n = xsd.@seriesname(!idx) 'variable enters
    xsa.add {%n}
    xsd.drop {%n}
  else
    exitloop
  endif
  if !cnt > 1 then
    !cnt2 = 0
    while !cnt2 < xsa.@count
      !cnt2 = !cnt2 + 1
      %n = xsa.@seriesname(1)
      xsa.drop {%n}
      equation e1.ls y c xsa
      !currentssr = e1.@ssr
      !currentF = (!currentssr-!ssr)/!msr
      if !currentF < !minF then
        !minF = !currentF
        !idx = !cnt2
      endif
      xsa.add {%n}
    wend
    if !minF < !Fcrit2 then
      %n = xsa.@seriesname(!idx) 'variable leaves
      xsa.drop {%n}
    endif
  endif
wend
As before, all regressors should be grouped and named 'xs', and the regressand is 'y'. This is the forward stepwise regression algorithm from Neter (1996), Applied Linear Models, pp. 348-352. You will find the chosen regressors in group 'xsa'. I shall post a HAC version shortly. Comments welcome!

Re: Stepwise regression and HAC error estimates

Posted: Tue Jan 25, 2011 8:40 am
by fboehlandt
This time the F-to-enter and F-to-leave calculations are based on the coefficient estimates and their standard errors. Consequently, the F statistics benefit from the Newey-West HAC adjustments at every step.
Please check the small changes added 1/27/2011 (marked with '! in the code). I will keep posting adjustments until it works flawlessly. P.S. Be patient.

Code:

!Fcrit1 = 3.84
!Fcrit2 = 2.71
!tolerance = 0.01
!idx = 1
!k = xs.@count
!n = xs.@minobs
group xsa
group xsd
for !i = 1 to xs.@count
  %n = xs.@seriesname(!i)
  xsd.add {%n}
next
!cnt = 0
!enter = 1 '! line added
while !cnt < !k
  !cnt = !cnt + 1
  !maxF = !Fcrit1
  !minF = !Fcrit2
  !rowcounter = 0 '! vector t matrix F
  for !i = 1 to xsd.@count
    %n = xsd.@seriesname(!i)
    xsa.add {%n}
    equation e1.ls(n) y c xsa
    vector(xsa.@count) t
    matrix(xsa.@count, xsd.@count) F
    for !j = 1 to xsa.@count
      t(!j) = e1.@tstats(1+!j)
      F(!j, !i) = t(!j)^2
    next
    d e1
    xsa.drop {%n}
    equation e1.ls(n) {%n} c xsa 'tolerance
    !r2 = 1 - e1.@r2
    if F(!enter, !i) > !maxF and !r2 > !tolerance then '! F(!cnt, !i) to F(!enter, !i); removed: !enter = 1
      !maxF = F(!enter, !i) '! F(!cnt, !i) to F(!enter, !i)
      !idx = !i
    endif
    d e1
  next
  if !maxF > !Fcrit1 then
    %n = xsd.@seriesname(!idx) 'variable enters
    !enter = !enter + 1 '! line added
    xsa.add {%n}
    xsd.drop {%n}
  else
    exitloop
  endif
  if !cnt > 1 then
    for !i = 1 to xsa.@count
      if F(!i, !idx) < !minF then
        !minF = F(!i, !idx)
        !jdx = !i
      endif
    next
    if !minF < !Fcrit2 then
      %n = xsa.@seriesname(!jdx) 'variable leaves
      !enter = !enter - 1 '! line added
      xsa.drop {%n}
    endif
  endif
wend
Note that the results of the routine above and those of the previous routine will almost certainly differ in the presence of heteroskedasticity and autocorrelation of the error terms. However, if HAC is not a major concern, the selected variables are likely to be the same. In many instances one could run the stepwise regression routine without HAC estimates first and then estimate HAC errors for the final group of regressors; the approach above, by contrast, is consistent throughout. I recommend using the model in this post and simply removing (n) from the e1 equations if HAC estimates are not desired.
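[Editor's note: the reason squared t-statistics can serve as F-to-enter/F-to-leave values is that, for a single coefficient, the partial F statistic equals the squared t-statistic exactly, so whatever covariance estimator feeds the t-statistics (classical OLS or Newey-West HAC) also determines the F tests. A quick Python/numpy check of the identity under classical OLS, with made-up data:]

```python
import numpy as np

np.random.seed(1)
n = 100
x1, x2 = np.random.randn(n), np.random.randn(n)
y = 1 + 2 * x1 + 0.5 * x2 + np.random.randn(n)

def fit(X, y):
    # OLS fit returning coefficients and sum of squared residuals
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return beta, float(r @ r)

X_small = np.column_stack([np.ones(n), x1])       # restricted model
X_full = np.column_stack([np.ones(n), x1, x2])    # adds candidate x2
_, ssr0 = fit(X_small, y)
beta, ssr1 = fit(X_full, y)

# partial F for entering x2 (one restriction)
partial_f = (ssr0 - ssr1) / (ssr1 / (n - 3))

# squared t-statistic on x2 under the classical OLS covariance
s2 = ssr1 / (n - 3)
cov = s2 * np.linalg.inv(X_full.T @ X_full)
t2 = beta[2] ** 2 / cov[2, 2]

assert abs(partial_f - t2) < 1e-6                 # the two coincide
```

Replacing `cov` with a HAC sandwich covariance turns `t2` into the HAC-based F-to-enter used by the routine above.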

Re: Stepwise regression and HAC error estimates

Posted: Tue Jan 25, 2011 8:52 am
by EViews Gareth
Nice job.

Re: Stepwise regression and HAC error estimates

Posted: Fri Sep 23, 2011 9:19 am
by Mila
Hi, just seeking a clarification. Would the following syntax, for instance, be sufficient to make sure that the stepwise regression process is actually performed using HAC standard errors?

STEPLS(method=UNI,BACK,BTOL=0.1,COV=HAC,COVBW=NEWEYWEST)

Re: Stepwise regression and HAC error estimates

Posted: Fri Sep 23, 2011 9:26 am
by EViews Gareth
The whole point of this thread was that the built-in stepwise routines don't support HAC covariances, so you have to program it yourself (which fboehlandt did masterfully). Thus your syntax will not work.

Re: Stepwise regression and HAC error estimates

Posted: Sat Sep 24, 2011 1:43 am
by Mila
Thanks for the clarification. Wishful thinking on my part!

Re: Stepwise regression and HAC error estimates

Posted: Mon Sep 26, 2011 12:25 pm
by fboehlandt
Hope the comments I have included help. Please don't hesitate to contact me should you have any further questions.
P.S. Make sure I didn't accidentally delete any lines of code above. I don't have EViews on this computer, so I had to view the code in a text editor.

Code:

!Fcrit1 = 3.84 'sets the F-to-enter critical value
!Fcrit2 = 2.71 'sets the F-to-leave critical value
!tolerance = 0.01 'tolerance for (1 - R^2): variables may be highly, but not perfectly, correlated in OLS
!idx = 1
!k = xs.@count 'the number of regressors available
!n = xs.@minobs 'the minimum number of observations (for time series, the length of the shortest series)
group xsa 'group of regressors entered; starts out empty
group xsd 'group of regressors not (yet) entered; starts out empty
'enter all regressors grouped under xs into group xsd
for !i = 1 to xs.@count
  %n = xs.@seriesname(!i)
  xsd.add {%n}
next
!cnt = 0
!enter = 1 'counts the number of regressors entered
while !cnt < !k 'loop as long as there are regressors left to enter
  !cnt = !cnt + 1
  !maxF = !Fcrit1
  !minF = !Fcrit2
  !rowcounter = 0 '! vector t matrix F
  'enter one regressor at a time; the regressor with the maximum F statistic
  'enters first, provided the F statistic exceeds Fcrit
  for !i = 1 to xsd.@count
    %n = xsd.@seriesname(!i)
    xsa.add {%n}
    equation e1.ls(n) y c xsa 'simple OLS of y on the current regressors plus candidate xi
    vector(xsa.@count) t 'all t-values stored in a vector for reference
    matrix(xsa.@count, xsd.@count) F 'all F-values stored in a matrix for reference
    for !j = 1 to xsa.@count
      t(!j) = e1.@tstats(1+!j)
      F(!j, !i) = t(!j)^2
    next
    d e1
    xsa.drop {%n}
    equation e1.ls(n) {%n} c xsa 'tolerance regression
    !r2 = 1 - e1.@r2 'additional restriction imposed to avoid perfect collinearity
    'F-to-enter is tested against an F critical value of 3.84; for large samples
    'this should be a good enough approximation, but you may want to change it manually
    if F(!enter, !i) > !maxF and !r2 > !tolerance then
      !maxF = F(!enter, !i)
      !idx = !i
    endif
    d e1
  next
  if !maxF > !Fcrit1 then
    %n = xsd.@seriesname(!idx) 'variable enters
    !enter = !enter + 1
    xsa.add {%n}
    xsd.drop {%n}
  else
    'it is possible that none of the regressors adds significant explanatory
    'power, in which case the code stops and exits without entering a variable
    exitloop
  endif
  if !cnt > 1 then
    'stepwise drop one variable already entered; remove it if little or no
    'explanatory power is lost
    for !i = 1 to xsa.@count
      if F(!i, !idx) < !minF then
        !minF = F(!i, !idx)
        !jdx = !i
      endif
    next
    if !minF < !Fcrit2 then
      %n = xsa.@seriesname(!jdx) 'variable leaves
      !enter = !enter - 1
      xsa.drop {%n}
    endif
  endif
wend
'Comment: because variables move back and forth between the xsd and xsa groups,
'a variable may exit at one stage, re-enter, and exit again. This is rarely a
'problem with a limited number of regressors, but the loop could run for quite
'a long time; if so, implement a manual counter limiting the number of
'iterations to, for instance, k = 5000