Hi everyone,
I noticed that EViews 6 offers stepwise regression. Is it possible to use a stepwise regression algorithm that uses HAC-consistent error estimates (Newey-West)? I would like to avoid collinearity between the regressors whilst accounting for autocorrelation and heteroskedasticity in the error estimates. The dialog for stepwise regression does not include specifications for the error estimates. Can anybody help? Thanks
Stepwise regression and HAC error estimates
Re: Stepwise regression and HAC error estimates
HAC is not available in the built-in stepwise routines (since HAC means you can't use many of the "tricks" we use in the internal stepwise code). However, you could always program one yourself. The program shown here:
viewtopic.php?f=15&t=383&p=1379&hilit=stepwise#p1379
is a good starting point.
Re: Stepwise regression and HAC error estimates
Okay, thanks. I thought as much. I have a VBA code that does forward stepwise regression as well as another snippet for HAC error estimates according to White and Newey-West. I shall try to implement the code in EViews when I have the time and post it again once it is in working order.
Re: Stepwise regression and HAC error estimates
PLEASE IGNORE THIS POST. REFER TO POST BELOW...
Hi Gareth
I have built on your suggested code snippet and translated an old script from R. As I don't know the command references well yet, there may be some errors or superfluous loops. The idea is to incorporate variables from a pool of potential regressors if they contribute more in terms of explanatory power than they 'cost' in terms of degrees-of-freedom reduction. Similarly, regressors are removed if the benefits from increasing the degrees of freedom outweigh the loss in explanatory power. The F statistic (F-to-enter and F-to-leave) is used to determine which variables enter/leave. Lastly, the model removes regressors if one or more other regressors explain a significant proportion of the variation in the regressor being tested (multicollinearity). In a nutshell, at every iteration:
1. variable enters based on F-to-enter
2. variable removed based on collinearity
3. variable leaves based on F-to-leave
Assume all potential regressors are grouped in 'xs' and the regressand is named 'y'. I will post amendments as I go along. Your input is greatly appreciated....
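For readers who want the logic outside EViews: the three steps above can be sketched in a language-neutral way. The following Python/NumPy sketch is my own illustration, not the poster's code; the function names `ols_ssr` and `stepwise` are invented, the default critical values mirror the 3.84/2.71 used in the EViews program below, and plain (non-HAC) OLS errors are assumed:

```python
import numpy as np

def ols_ssr(y, X):
    """SSR and coefficient count for OLS of y on a constant plus the columns of X."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    return float(resid @ resid), Xc.shape[1]

def stepwise(y, X, f_enter=3.84, f_leave=2.71, max_iter=1000):
    """Forward stepwise selection via partial F tests (F-to-enter / F-to-leave)."""
    n, k = X.shape
    active, inactive = [], list(range(k))
    for _ in range(max_iter):                  # manual iteration cap
        # ENTER: add the candidate with the largest partial F, if above f_enter
        ssr_base, _ = ols_ssr(y, X[:, active])
        best_f, best_j = f_enter, None
        for j in inactive:
            ssr_new, ncoef = ols_ssr(y, X[:, active + [j]])
            f = (ssr_base - ssr_new) / (ssr_new / (n - ncoef))
            if f > best_f:
                best_f, best_j = f, j
        if best_j is None:                     # nothing significant left to enter
            break
        active.append(best_j)
        inactive.remove(best_j)
        # LEAVE: drop the weakest previously entered regressor, if below f_leave
        if len(active) > 1:
            ssr_full, ncoef = ols_ssr(y, X[:, active])
            msr = ssr_full / (n - ncoef)
            worst_f, worst_j = f_leave, None
            for j in active[:-1]:              # the variable that just entered is exempt
                rest = [c for c in active if c != j]
                ssr_red, _ = ols_ssr(y, X[:, rest])
                f = (ssr_red - ssr_full) / msr
                if f < worst_f:
                    worst_f, worst_j = f, j
            if worst_j is not None:
                active.remove(worst_j)
                inactive.append(worst_j)
        if not inactive:
            break
    return active
```

Note the usual stepwise convention of f_leave < f_enter, which prevents a variable from endlessly entering and leaving.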
p.s. it should be fairly easy to amend the code to include HAC error estimates at every step
Code:
!Fcrit1=3.84
!Fcrit2=2.71
!tolerance=0.99
!idx = 1
!k = xs.@count
!n = xs.@minobs
group xsa
group xsd
group xsi
for !i=1 to xs.@count
%n=xs.@seriesname(!i)
xsd.add {%n}
next
!cnt = 0
While xsa.@count < !k and !cnt < 1000 'max iterations
'ENTER
!cnt = !cnt + 1
!maxF = !Fcrit1
!minF = !Fcrit2
vector (!cnt) ssr
for !i=1 to xsd.@count
%n = xsd.@seriesname(!i)
xsa.add {%n}
equation e1.ls y c xsa
!currentssr= e1.@ssr
!ncoef=e1.@ncoef
!currentmsr=!currentssr/(!n-!ncoef)
if !cnt = 1 Then
!currentF = e1.@f
else
!currentF=(ssr(!cnt-1)-!currentssr)/!currentmsr
endif
if !currentF > !maxF then
!maxF = !currentF
!msr=!currentmsr
!idx = !i
ssr(!cnt) = !currentssr
endif
d e1
xsa.drop {%n}
next
If !maxF> !Fcrit1 then
!enter = 1
%n = xsd.@seriesname(!idx)
'variable enter
xsa.add {%n}
xsd.drop {%n}
else
!enter = 0
endif
stepreg(!cnt, 1) = !maxF
'COLLINEARITY (regress all x on all other xs)
!maxR = !tolerance
for !i=1 to xsa.@count
%n=xsa.@seriesname(!i)
xsa.drop {%n}
equation e1.ls {%n} c xsa
!currentR = e1.@r2
If !currentR>!maxR then
!maxR = !currentR
!idxs = !i
endif
d e1
xsa.add {%n}
next
'remove collinear regressors
if !maxR > !tolerance then
%n=xsa.@seriesname(!idxs)
xsa.drop {%n}
equation e1.ls y c xsa
ssr(!cnt)=e1.@ssr
xsi.add {%n}
endif
'LEAVE
for !i=1 to xsa.@count - 1
%n=xsa.@seriesname(!i)
xsa.drop {%n}
equation e1.ls y c xsa
!currentssr=e1.@ssr
!ncoef=e1.@ncoef
!currentmsr=!currentssr/(!n-!ncoef)
!currentF=(!currentssr-ssr(!cnt))/!msr
if !currentF < !minF then
!minF = !currentF
!idx = !i
endif
d e1
xsa.add {%n}
next
If !minF< !Fcrit2 then
%n = xsa.@seriesname(!idx)
'variable leave
xsa.drop {%n}
equation e1.ls y c xsa
ssr(!cnt)=e1.@ssr
xsd.add {%n}
for !i=1 to xsi.@count
%n = xsi.@seriesname(!i)
xsd.add {%n}
next
else
if !enter = 0 Then
exitloop
endif
endif
wend
Last edited by fboehlandt on Tue Jan 25, 2011 4:41 am, edited 2 times in total.
Re: Stepwise regression and HAC error estimates
Okay,
streamlined code, removed superfluous loops and corrected errors. This is what I have come up with:
Code:
!Fcrit1=3.84
!Fcrit2=2.71
!tolerance=0.01
table(xs.@count, 10) stepreg
!idx = 1
!k = xs.@count
!n = xs.@minobs
group xsa
group xsd
for !i=1 to xs.@count
%n=xs.@seriesname(!i)
xsd.add {%n}
next
!cnt = 0
While !cnt < !k
!cnt = !cnt + 1
if !cnt > 1 then
equation e1.ls y c xsa
!ssrr = e1.@ssr
endif
!maxF = !Fcrit1
!minF = !Fcrit2
for !i=1 to xsd.@count
%n = xsd.@seriesname(!i)
xsa.add {%n}
equation e1.ls y c xsa
!currentssr= e1.@ssr
!ncoef=e1.@ncoef
!currentmsr=!currentssr/(!n-!ncoef)
if !cnt = 1 Then
!currentF = e1.@f
else
!currentF=(!ssrr-!currentssr)/!currentmsr
endif
d e1
xsa.drop {%n}
equation e1.ls {%n} c xsa
'tolerance
!currentr2=1-e1.@r2
if !currentF > !maxF and !currentr2 > !tolerance then
!enter = 1
!maxF = !currentF
!msr=!currentmsr
!idx = !i
!ssr = !currentssr
endif
d e1
next
If !maxF> !Fcrit1 then
%n = xsd.@seriesname(!idx)
'variable enter
xsa.add {%n}
xsd.drop {%n}
else
exitloop
endif
If !cnt > 1 then
!cnt2 = 0
While !cnt2 < xsa.@count
!cnt2 = !cnt2 + 1
%n=xsa.@seriesname(1)
xsa.drop {%n}
equation e1.ls y c xsa
!currentssr= e1.@ssr
!currentF=(!currentssr-!ssr)/!msr
If !currentF < !minF then
!minF = !currentF
!idx = !cnt2
endif
xsa.add {%n}
wend
If !minF < !Fcrit2 then
%n = xsa.@seriesname(!idx)
'variable leave
xsa.drop {%n}
endif
endif
wend
As before, all regressors should be grouped and named 'xs' whereas the regressand is 'y'. This is the forward stepwise regression algorithm from Neter (1996) Applied Linear Models: pp. 348-352. You find the chosen regressors in group 'xsa'. I shall post a HAC-version shortly. Comments welcome!
Last edited by fboehlandt on Thu Jan 27, 2011 4:57 am, edited 1 time in total.
Re: Stepwise regression and HAC error estimates
This time the F-to-enter and F-to-leave calculations are based on the coefficient estimates and their standard errors. Consequently, the F statistics benefit from the Newey-West HAC adjustment at every step:
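The squared HAC t-statistics used as F values here can be illustrated outside EViews. Below is a minimal NumPy sketch of Newey-West (Bartlett-kernel) t-statistics; the function name `newey_west_tstats` and the default lag length are my own choices, and no small-sample correction is applied:

```python
import numpy as np

def newey_west_tstats(y, X, lags=4):
    """OLS coefficients and Newey-West (HAC) t-statistics, Bartlett kernel."""
    Xc = np.column_stack([np.ones(len(y)), X])   # constant plus regressors
    XtX_inv = np.linalg.inv(Xc.T @ Xc)
    beta = XtX_inv @ Xc.T @ y
    u = y - Xc @ beta
    g = Xc * u[:, None]                          # score contributions x_t * u_t
    S = g.T @ g                                  # lag-0 (White) term
    for l in range(1, lags + 1):
        w = 1.0 - l / (lags + 1.0)               # Bartlett weight
        G = g[l:].T @ g[:-l]
        S += w * (G + G.T)                       # symmetrized autocovariance term
    cov = XtX_inv @ S @ XtX_inv                  # sandwich estimator
    se = np.sqrt(np.diag(cov))
    return beta, beta / se
```

An F-to-enter for a single coefficient is then simply its squared HAC t-statistic, as in the EViews code below.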
Guys, please check the small changes added 1/27/2011 (marked with '! in the code). I will keep posting adjustments until it works flawlessly. P.S. be patient.
Code:
!Fcrit1=3.84
!Fcrit2=2.71
!tolerance=0.01
!idx = 1
!k = xs.@count
!n = xs.@minobs
group xsa
group xsd
for !i=1 to xs.@count
%n=xs.@seriesname(!i)
xsd.add {%n}
next
!cnt = 0
!enter = 1 '! line added
While !cnt < !k
!cnt = !cnt + 1
!maxF = !Fcrit1
!minF = !Fcrit2
!rowcounter = 0 '!
vector t
matrix F
for !i=1 to xsd.@count
%n = xsd.@seriesname(!i)
xsa.add {%n}
equation e1.ls(n) y c xsa
vector (xsa.@count) t
matrix (xsa.@count, xsd.@count) F
For !j = 1 to xsa.@count
t(!j)= e1.@tstats(1+!j)
F(!j, !i) = t(!j)^2
next
d e1
xsa.drop {%n}
equation e1.ls(n) {%n} c xsa
'tolerance
!r2=1-e1.@r2
if F(!enter, !i) > !maxF and !r2 > !tolerance then '! F(!cnt, !i) to !F(!enter, !i)
'! removed: !enter = 1
!maxF = F(!enter, !i) '! F(!cnt, !i) to !F(!enter, !i)
!idx = !i
endif
d e1
next
If !maxF> !Fcrit1 then
%n = xsd.@seriesname(!idx)
'variable enter
!enter = !enter + 1 '! line added
xsa.add {%n}
xsd.drop {%n}
else
exitloop
endif
If !cnt > 1 then
For !i=1 to xsa.@count
if F(!i, !idx) < !minF then
!minF=F(!i, !idx)
!jdx = !i
endif
next
If !minF < !Fcrit2 then
%n = xsa.@seriesname(!jdx)
'variable leave
!enter = !enter - 1 '! line added
xsa.drop {%n}
endif
endif
wend
Note that the results of the routine above and the results of the previous routine will almost certainly differ in the presence of heteroskedasticity and autocorrelation of the error terms. However, if HAC is not a major concern, the selected variables are likely to be the same. In many instances one could run the stepwise regression routine without HAC estimates first and then estimate HAC errors for the final group of regressors. The above approach is consistent throughout. I recommend using the model in this post and simply removing (n) from the e1 equations if HAC estimates are not desired.
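The two-stage shortcut mentioned here, selecting without HAC and then re-estimating HAC errors for the final regressors, can be sketched as follows. This is my own Python/NumPy illustration, not EViews code; `select_then_hac` and its defaults are hypothetical, and only forward selection is shown:

```python
import numpy as np

def select_then_hac(y, X, lags=4, f_enter=3.84):
    """Stage 1: forward selection with ordinary (non-HAC) partial F tests.
    Stage 2: Newey-West (Bartlett) standard errors for the final model only."""
    n, k = X.shape

    def ssr(cols):
        Xc = np.column_stack([np.ones(n), X[:, cols]])
        b, *_ = np.linalg.lstsq(Xc, y, rcond=None)
        r = y - Xc @ b
        return float(r @ r), Xc.shape[1]

    active, inactive = [], list(range(k))
    while inactive:
        base, _ = ssr(active)
        best_f, best_j = f_enter, None
        for j in inactive:
            s, p = ssr(active + [j])
            f = (base - s) / (s / (n - p))
            if f > best_f:
                best_f, best_j = f, j
        if best_j is None:
            break
        active.append(best_j)
        inactive.remove(best_j)

    # Stage 2: HAC covariance on the selected regressors only
    Xc = np.column_stack([np.ones(n), X[:, active]])
    XtX_inv = np.linalg.inv(Xc.T @ Xc)
    beta = XtX_inv @ Xc.T @ y
    u = y - Xc @ beta
    g = Xc * u[:, None]                      # score contributions x_t * u_t
    S = g.T @ g                              # lag-0 (White) term
    for l in range(1, lags + 1):
        w = 1.0 - l / (lags + 1.0)           # Bartlett weight
        G = g[l:].T @ g[:-l]
        S += w * (G + G.T)
    se = np.sqrt(np.diag(XtX_inv @ S @ XtX_inv))
    return active, beta, se
```

As the post says, the selected set will often match the fully HAC-consistent routine when heteroskedasticity and autocorrelation are mild, but only the routine above is consistent throughout.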
Last edited by fboehlandt on Thu Jan 27, 2011 8:41 am, edited 3 times in total.
Re: Stepwise regression and HAC error estimates
Hi, just seeking a clarification. Would the following syntax, for instance, be sufficient to make sure that the stepwise regression process is actually performed using HAC standard errors?
STEPLS(method=UNI,BACK,BTOL=0.1,COV=HAC,COVBW=NEWEYWEST)
Re: Stepwise regression and HAC error estimates
The whole point of this thread was that the built-in stepwise routines don't support HAC covariances, so you have to program it yourself (which fboehlandt did masterfully). Thus your syntax will not work.
Re: Stepwise regression and HAC error estimates
Thanks for the clarification. Wishful thinking on my part!
Re: Stepwise regression and HAC error estimates
Hope the comments I have included help. Please don't hesitate to contact me should you have any further questions.
p.s. make sure I didn't accidentally delete any lines of code above. I don't have EViews on this computer, so I had to view the code in a text editor.
Code:
!Fcrit1=3.84 'this line sets the F-to-enter variable
!Fcrit2=2.71 'this line sets the F-to-leave variable
!tolerance=0.01 'the tolerance allowed for (1 - R^2): variables may be highly correlated, but they may not be perfectly correlated in OLS
!idx = 1
!k = xs.@count 'the list of regressors available
!n = xs.@minobs 'the minimum number of observations (i.e. in timeseries the number of observations for the shortest series)
group xsa 'a group containing all regressors entered. Start out with 0 series
group xsd 'a group containing all regressors not entered (yet). Starts out with 0 series
'This loops enters all regressors grouped under xs into group xsd
for !i=1 to xs.@count
%n=xs.@seriesname(!i)
xsd.add {%n}
next
!cnt = 0
!enter = 1 'this counts the number of regressors entered
While !cnt < !k 'loops as long as there are regressors left to enter
!cnt = !cnt + 1
!maxF = !Fcrit1
!minF = !Fcrit2
!rowcounter = 0 '!
vector t
matrix F
'this loop enters one regressor at a time. The regressor resulting in the maximum Fstat is the first variable to enter (provided the Fstat is in excess of Fcrit).
for !i=1 to xsd.@count
%n = xsd.@seriesname(!i)
xsa.add {%n}
equation e1.ls(n) y c xsa 'this is a simple OLS estimate for xi regressed against y
vector (xsa.@count) t ' all t-values stored in vector for reference
matrix (xsa.@count, xsd.@count) F ' all F-values stored in matrix for reference
For !j = 1 to xsa.@count
t(!j)= e1.@tstats(1+!j)
F(!j, !i) = t(!j)^2
next
d e1
xsa.drop {%n}
equation e1.ls(n) {%n} c xsa
'tolerance
!r2=1-e1.@r2 'to avoid perfect collinearity, this additional restriction is imposed
if F(!enter, !i) > !maxF and !r2 > !tolerance then 'note that F-to-enter is tested against Fcritical of 3.84. For large samples, this should be a good enough approximation but you may want to change this manually
!maxF = F(!enter, !i)
!idx = !i
endif
d e1
next
If !maxF> !Fcrit1 then
%n = xsd.@seriesname(!idx)
'variable enter
!enter = !enter + 1
xsa.add {%n}
xsd.drop {%n}
else
exitloop 'it is possible that none of the regressors add any significant explanatory power, in which case the code stops and exits without entering a variable.
endif
If !cnt > 1 then
'This loop stepwise drops one variable already entered and removes variables if no/little explanatory power is lost.
For !i=1 to xsa.@count
if F(!i, !idx) < !minF then
!minF=F(!i, !idx)
!jdx = !i
endif
next
If !minF < !Fcrit2 then
%n = xsa.@seriesname(!jdx)
'variable leave
!enter = !enter - 1
xsa.drop {%n}
endif
endif
wend
'Comment: due to the outside loop and variables being moved back and forth between the xsd and xsa groups, some variables may exit at one stage, re-enter, and exit again. Although this is rarely the case with a limited number of regressors, there is a chance that the loop continues for quite a long time. In that case, one may want to implement a manual counter limiting the number of iterations to, for instance, 5000