EViews 10



CharlieEVIEWS
Posts: 202
Joined: Tue Jul 17, 2012 9:47 am

EViews 10

Postby CharlieEVIEWS » Tue May 24, 2016 4:06 pm

Suggestions thread for EViews 10, to hopefully collect ideas over time as people think of them.

My motivation for this is a desire for:

1.) Machine Learning: this is the age of machine (and deep) learning. Given that EViews is a go-to choice for time series econometric forecasting, why not use the strong reputation it enjoys to implement some of the burgeoning developments in ML prediction as well?

2.) 3D plots (and other visualisation tools useful in the 'Big Data' era).

Grand optimism maybe, and feel free to disregard both!

Charlie

diggetybo
Posts: 152
Joined: Mon Jun 23, 2014 12:04 am

Re: EViews 10

Postby diggetybo » Sat Jul 16, 2016 9:33 am

I would love to have a less cluttered way of doing a random split. Whether it's a simple .8/.2 set-aside split or k-fold CV, EViews currently needs the split stored as series objects in order to keep the randomly split data. If you perform a large number of splits, the workfile can become crowded with series objects, and giving those series intuitive names becomes more difficult. Not to mention that k-fold is only available via a plug-in subroutine and is not very user-friendly (although I appreciate the author's effort).

My humble idea is to couple random splits with the equation object, so that no extra series objects are needed and the equation itself reads a random-split argument. This might require a new feature in the equation's sample box. It probably isn't the only way to handle it, but maybe it can jostle our imaginations awhile.

With regard to ML, a more refined random-split approach would make it much smoother for all of us to follow the elementary machine learning block diagram: getting data, splitting data, setting aside a validation set and a test set, then of course the quality metric, minimizing true error, etc.

Keep up the great work; EViews has really served me well. I'm just trying to give some feedback.

Thank you for reading

EViews Gareth
Fe ddaethom, fe welon, fe amcangyfrifon
Posts: 11369
Joined: Tue Sep 16, 2008 5:38 pm

Re: EViews 10

Postby EViews Gareth » Sat Jul 16, 2016 10:02 am

TBH, this is completely outside my field of knowledge. Could you describe what a random-split is? Thank you.
Follow us on Twitter @IHSEViews

diggetybo
Posts: 152
Joined: Mon Jun 23, 2014 12:04 am

Re: EViews 10

Postby diggetybo » Sat Jul 16, 2016 5:39 pm

It probably goes by other names; I just know it by this particular convention. A random split randomly partitions the data into groups. Customarily, a .8/.2 or .9/.1 split is used, although it would make sense for the partition fraction to be user-configurable, so we can split the data however we need. After the random split, the 80% portion of the original data set is used for estimation as normal, and the remaining 20% is kept aside (or sometimes split further). Random splitting is used for several reasons, including avoiding overfitting and making sure the model generalizes, i.e. forecasts well on data it was not estimated on. On the held-out 20%, you can compare the sum of squared errors and other metrics to assess out-of-sample performance, and some iterative algorithms use performance on the different partitions to find the right coefficients.
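A minimal Python/NumPy sketch of the .8/.2 split described above, offered only as an illustration (it is not EViews code, and `random_split` is a hypothetical name). It draws exactly 80% of the observations without replacement, estimates a regression on them by least squares, and computes the RMSE on the held-out 20%.

```python
import numpy as np

def random_split(n_obs, train_frac=0.8, seed=None):
    """Return a boolean mask: True = estimation (training) set, False = held-out set.

    Draws exactly round(train_frac * n_obs) training observations without
    replacement, so the split fraction is hit exactly rather than on average.
    """
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * n_obs))
    train_idx = rng.choice(n_obs, size=n_train, replace=False)
    mask = np.zeros(n_obs, dtype=bool)
    mask[train_idx] = True
    return mask

# Simulated data: y = 5 + 2*x1 + noise
rng = np.random.default_rng(0)
n = 100
x1 = rng.standard_normal(n)
y = 5 + 2 * x1 + rng.standard_normal(n)

train = random_split(n, train_frac=0.8, seed=1)
X = np.column_stack([np.ones(n), x1])                         # constant + regressor
beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)    # estimate on the 80%
resid = y[~train] - X[~train] @ beta                          # evaluate on the 20%
holdout_rmse = np.sqrt(np.mean(resid ** 2))
print(beta, holdout_rmse)
```

The split fraction is a parameter, which is the user-configurable behaviour asked for above.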

For large data sets, a static .8/.2 split could work, whereas for smaller data sets k-fold cross-validation is useful, because all observations 'take turns' being part of the different partitions of the data.
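The 'take turns' property of k-fold cross-validation can be sketched in Python/NumPy (a hypothetical helper, not an EViews feature): shuffle the observation indices once, then cut them into k nearly equal pieces. Each piece serves once as the test fold while the remaining observations form the training set, so every observation is tested exactly once.

```python
import numpy as np

def kfold_indices(n_obs, k, seed=None):
    """Split observation indices 0..n_obs-1 into k disjoint test folds.

    Indices are shuffled once, then cut into k nearly equal pieces, so every
    observation appears in exactly one test fold (sampling without replacement).
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_obs)
    return np.array_split(perm, k)

folds = kfold_indices(10, k=3, seed=42)
for j, test_idx in enumerate(folds):
    train_idx = np.setdiff1d(np.arange(10), test_idx)   # everything not in this fold
    print(f"fold {j}: test={sorted(test_idx.tolist())}, train size={train_idx.size}")
```

With 10 observations and k=3, `np.array_split` produces folds of sizes 4, 3 and 3.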

I realize that this is already feasible in EViews 9, but it's not very intuitive and doesn't scale up well. With the advent of the big data craze, it would be great to have smoother, more scalable random-split functionality. I hope it's a bit clearer now, but let me know if you'd like to discuss it further or want more specifics on any area. I'm excited for EViews 10!

EViews Gareth
Fe ddaethom, fe welon, fe amcangyfrifon
Posts: 11369
Joined: Tue Sep 16, 2008 5:38 pm

Re: EViews 10

Postby EViews Gareth » Sat Jul 16, 2016 7:37 pm

Why do you need to make copies of the series then? Aren't you just creating a random sample? Or is there sampling with replacement?

diggetybo
Posts: 152
Joined: Mon Jun 23, 2014 12:04 am

Re: EViews 10

Postby diggetybo » Sun Jul 17, 2016 9:12 am

Oh right, I didn't make that very clear. I was referring to the sample approach based on the condition "if keep=1", which gives the same set of randomly split data every time. For that we need an actual series of 1's and 0's in the workfile, unless I'm mistaken. Currently, an alternative that doesn't need a series object is the sample condition "if rnd<.8". However, this results in a new random split every time the equation object is estimated. I think EViews 10 could have the best of both worlds: a persistent random split that doesn't need an object in the workfile. As I said earlier, maybe this could be done with a user-configurable option in the estimate equation interface/sample box.

So, in keeping with the ML scenario, we need a persistent sample, because the different partitions may take on different roles (validation set, test set, etc.), and it wouldn't make sense to keep reshuffling them once they have been randomly split (as currently happens with "if rnd<.8"). That is why the "if keep=1" approach is needed, and also why it might not scale well. Consider large data sets where multiple models are estimated, each using multiple splits (e.g. .8/.2, then splitting the .2 into .1 and .1 for test and validation). The 'keep' series objects would start to pile up. While we could use "if keep=0" to reference the .2 partition, it would become increasingly difficult to distinguish between the series objects from the different splits/models, and it would increase the overall risk of user error. Again, I'm not sure how feasible this is from the development side, but it would help things scale if random splits could be moved into the background, so to speak.
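One way to avoid a pile of 'keep' series, sketched here in Python/NumPy under the assumption that a single label per observation is acceptable, is to store the whole .8/.1/.1 partition in one label vector. `partition_labels` is a hypothetical name; the seed makes the partition persistent, so re-running the code gives the same training, validation and test sets.

```python
import numpy as np

def partition_labels(n_obs, fractions, names, seed=None):
    """Assign each observation exactly one partition label.

    fractions must sum to 1; counts are rounded and the last partition absorbs
    any rounding remainder so every observation is labeled.
    """
    if len(fractions) != len(names) or not np.isclose(sum(fractions), 1.0):
        raise ValueError("need one name per fraction, and fractions summing to 1")
    rng = np.random.default_rng(seed)
    counts = [int(round(f * n_obs)) for f in fractions[:-1]]
    counts.append(n_obs - sum(counts))
    labels = np.repeat(np.array(names), counts)  # e.g. 80 'train', 10 'valid', 10 'test'
    return rng.permutation(labels)               # shuffle to randomize membership

labels = partition_labels(100, [0.8, 0.1, 0.1], ["train", "valid", "test"], seed=7)
print({name: int((labels == name).sum()) for name in ["train", "valid", "test"]})
```

Because the labels are drawn without replacement, the three partitions never overlap, and a new model or split only needs a new label vector rather than several dummy series.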

To reiterate briefly, a more user-friendly, scalable approach to random splits would benefit the community, because partitioning the data the way we want has to take place before any estimation. One look at EViews' drop-down menu in the estimate equation interface tells me all the bases are covered: we have everything from TSLS to GARCH to ARFIMA, and these complicated models are easy for the user to implement. Partitioning data, however, although equally important, seems substantially more challenging. For instance, look at how involved the EViews code needed to run k-fold cross-validation is (it's a subroutine, it's a third-party plug-in, and it needs about 20 arguments). So basically, that's my feedback on random splits.

It'd be great to have several built-in partitioning methods: a user-configurable random split in addition to popular alternatives like k-fold. I've seen some packages that have a 'seed' argument, so that researchers collaborating with each other can reproduce one another's results while still using random splits (e.g. everyone types in seed=1).
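The effect of a 'seed' argument can be illustrated in Python/NumPy, where `numpy.random.default_rng(seed)` plays that role (an EViews equivalent is only a suggestion in this thread). The hypothetical helper below mimics a condition like `if rnd<.8`: with a fixed seed, two researchers typing seed=1 get identical splits, while a different seed gives a different split. NumPy does not guarantee identical random streams across releases, so this reproducibility assumes everyone uses the same NumPy version.

```python
import numpy as np

def rnd_split_mask(n_obs, threshold=0.8, seed=None):
    """Mimic a sample condition like `if rnd<.8`: True where a uniform draw < threshold.

    With seed=None each call draws fresh numbers (a new split every time);
    with a fixed seed the same split is reproduced on every call.
    """
    rng = np.random.default_rng(seed)
    return rng.random(n_obs) < threshold

a = rnd_split_mask(1000, seed=1)   # researcher A types seed=1
b = rnd_split_mask(1000, seed=1)   # researcher B types seed=1
c = rnd_split_mask(1000, seed=2)   # a different seed
print(np.array_equal(a, b), np.array_equal(a, c))
```

Note that a threshold rule like this keeps about 80% of the observations on average, not exactly 80%.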

It's probably pretty boring to most people, but I'm enthusiastic about it, and although I tried to keep it short, this post still ended up pretty long. Nonetheless, please feel free to continue the dialogue if you are interested.

trubador
Did you use forum search?
Posts: 1518
Joined: Thu Nov 20, 2008 12:04 pm

Re: EViews 10

Postby trubador » Sun Jul 17, 2016 9:31 am

I think OP refers to me as the "author" who put the effort in writing the k-fold CV subroutine: viewtopic.php?f=23&t=12261

I already shared the code behind this routine with him, and I totally agree that such an approach (i.e. cross-validation) should be made available as a built-in feature. But I disagree that the procedure is not very user-friendly. I believe the difficulty in reading and understanding the code arises from some workarounds and tricks that I had to use in order to make the add-in easy to use for its main purpose.

Please keep in mind that cross-validation is a model selection approach and is therefore more complicated than mere data partitioning. That's why the add-in requires so many input arguments.

As for Gareth's question, please see the following code to understand the logic behind the CV approach:

Code:

'Generate some data: y depends on x1 only; x2 is irrelevant noise
wfcreate u 100
series x1 = nrnd
series x2 = nrnd
series y = 5 + 2*x1 + nrnd

!k = 5 'number of folds
matrix(!k,2) rmse 'matrix to hold the evaluation criterion (one row per fold, one column per model)
series trainset 'dummy series marking the randomly selected training observations

'Carry out (a rough form of) cross validation
for !i=1 to !k
 'randomly select roughly (1-1/!k) of the observations for the training set;
 'the draw is redone on each pass, so test sets may overlap (not true k-fold)
 trainset = @runif(0,1)>=(1/!k)
 smpl if trainset=1 'training set
 equation eq1!i.ls y c x1
 equation eq2!i.ls y c x1 x2
 smpl if trainset=0 'test set
 eq1!i.fit yf1
 eq2!i.fit yf2
 rmse(!i,1) = @rmse(y,yf1)
 rmse(!i,2) = @rmse(y,yf2)
 smpl @all
next

show rmse 'choose the model with the lower RMSE across folds (typically the first here)

One of the main difficulties here is making sure that all folds have (nearly) the same number of observations, each with a unique set, and handling the cases where the number of observations is not exactly divisible by the chosen number of folds. Not to mention situations where the sample range differs from the workfile range. This is where workarounds/tricks and additional coding are needed, which unnecessarily complicates the procedure and populates the workfile.
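The fold-size and sample-range issues described above can be handled by dealing out fold labels rather than drawing them independently. A Python/NumPy sketch, assuming a hypothetical `fold_labels` helper and a boolean `in_sample` array that stands in for an EViews sample narrower than the workfile range: labels 0, 1, ..., k-1 are repeated over the in-sample observations and shuffled, so fold sizes differ by at most one, and out-of-sample observations get NaN. The result is a single fold-id series, from which a test fold could be selected with something like `smpl if fold=0` in EViews terms.

```python
import numpy as np

def fold_labels(in_sample, k, seed=None):
    """Return a float array of fold ids 0..k-1 for in-sample observations, NaN elsewhere.

    Labels 0,1,...,k-1,0,1,... are dealt out and then shuffled, so fold sizes
    differ by at most one when the in-sample count is not divisible by k, and
    each observation belongs to exactly one fold.
    """
    rng = np.random.default_rng(seed)
    in_sample = np.asarray(in_sample, dtype=bool)
    n_in = int(in_sample.sum())
    labels = np.full(in_sample.size, np.nan)
    labels[in_sample] = rng.permutation(np.arange(n_in) % k)
    return labels

# 100-observation "workfile" whose estimation sample is only observations 11..97 (87 obs)
smpl = np.zeros(100, dtype=bool)
smpl[10:97] = True
fold = fold_labels(smpl, k=5, seed=3)
sizes = [int(np.sum(fold == j)) for j in range(5)]
print(sizes)  # → [18, 18, 17, 17, 17]
```

The fold sizes do not depend on the seed, only on which observations land in which fold.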

Adjusting the sample prior to each estimation might also slow things down: viewtopic.php?f=8&t=13288
Last edited by trubador on Sun Jul 17, 2016 3:28 pm, edited 1 time in total.

EViews Gareth
Fe ddaethom, fe welon, fe amcangyfrifon
Posts: 11369
Joined: Tue Sep 16, 2008 5:38 pm

Re: EViews 10

Postby EViews Gareth » Sun Jul 17, 2016 9:42 am

So a possible solution would be to have a seed option on the random number generators (built into the function rather than as a separate command). That way you can have a set of random numbers that doesn't change.

trubador
Did you use forum search?
Posts: 1518
Joined: Thu Nov 20, 2008 12:04 pm

Re: EViews 10

Postby trubador » Sun Jul 17, 2016 3:44 pm

I am not sure the seed option alone would be enough to produce random subsets for k-fold CV, as each subset should hold unique observations from the total sample (i.e. drawn without replacement). Note that the code above does not ensure this property.

diggetybo
Posts: 152
Joined: Mon Jun 23, 2014 12:04 am

Re: EViews 10

Postby diggetybo » Sun Jul 17, 2016 6:37 pm

My post did come off as lumping them into the same category; sorry about that. As trubador said, k-fold has its own intricacies, so different approaches might be needed for each procedure. However, I believe a seed option for a simple random split would be great.

jfgeli
Posts: 55
Joined: Fri Jan 30, 2009 6:29 pm

Re: EViews 10

Postby jfgeli » Fri Aug 05, 2016 2:16 am

Another item for the wishlist: get the line number when showing errors.
That would save A LOT of time for those debugging code, especially when it is very long and you are not well acquainted with it. I know there are some workarounds, like including log and statusbar messages, but I am talking about pieces of code I did not write myself.

Alternatively, you could improve the current error message box in cases where local strings or scalars are used. Here is an example.
Suppose I run this code:

%aCtry = "US DE ES"
for %Ctry {%aCtry}
series data2_{%Ctry} = (data1_{%Ctry})*2
next

but the series data1_ES does not exist. EViews will then give me an error saying that the series data1_ES does not exist. If the code is very long and I don't know it well, it might take me quite a while to find the line where the error occurs. However, if EViews also showed me the offending line as it is written in the code (i.e. series data2_{%Ctry} = (data1_{%Ctry})*2), I could simply find that line with the search function.
For subroutines or functions stored in other .prg files, could the file name be reported as well?
I believe these features are a compromise between providing the line number (which, according to previous posts, I understand is really difficult to do) and the current status quo.
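The compromise suggested above can be sketched in Python (purely illustrative; this is not how EViews works internally, and `run_program` and `fake_execute` are hypothetical). Each program line is kept as written alongside its expanded form, and when execution fails, the line number, the template and the expanded command are reported together. The toy executor only checks whether a `data1_` series exists.

```python
import re

def run_program(lines, variables, execute):
    """Run template lines, substituting {%name} placeholders from `variables`.

    If `execute` raises, re-raise with the 1-based source line number, the line
    as written (template) and the line as expanded.
    """
    for lineno, template in enumerate(lines, start=1):
        expanded = re.sub(r"\{(%\w+)\}", lambda m: variables[m.group(1)], template)
        try:
            execute(expanded)
        except Exception as err:
            raise RuntimeError(
                f"line {lineno}: {err}\n  as written:  {template}\n  as expanded: {expanded}"
            ) from err

existing = {"data1_US", "data1_DE"}  # data1_ES is missing, as in the post

def fake_execute(cmd):
    # hypothetical stand-in for the real interpreter: fail if a data1_ series is missing
    for name in re.findall(r"data1_\w+", cmd):
        if name not in existing:
            raise NameError(f"{name} does not exist")

program = [
    "series ok_{%Ctry} = 1",
    "series data2_{%Ctry} = (data1_{%Ctry})*2",
]
messages = []
for ctry in "US DE ES".split():
    try:
        run_program(program, {"%Ctry": ctry}, fake_execute)
    except RuntimeError as e:
        messages.append(str(e))
print("\n".join(messages))
```

The "as written" text is exactly what one would paste into the editor's search box.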

marend
Posts: 8
Joined: Fri Jul 19, 2013 12:20 pm

Re: EViews 10

Postby marend » Tue Apr 25, 2017 5:47 am

Seasonal-and-calendar adjustment procedure:

In the seasonal adjustment methods, I would make available a dialog box for adding calendar series. This should be easy to incorporate and is extremely useful, even necessary, for analyzing the data. In fact, I created an add-in program to seasonally and calendar adjust time series for Chile.

EViews Gareth
Fe ddaethom, fe welon, fe amcangyfrifon
Posts: 11369
Joined: Tue Sep 16, 2008 5:38 pm

Re: EViews 10

Postby EViews Gareth » Tue Apr 25, 2017 8:38 pm

marend wrote: Seasonal-and-calendar adjustment procedure:

In the seasonal adjustment methods, I would make available a dialog box for adding calendar series. This should be easy to incorporate and is extremely useful, even necessary, for analyzing the data. In fact, I created an add-in program to seasonally and calendar adjust time series for Chile.

Could you provide more details on this?

diggetybo
Posts: 152
Joined: Mon Jun 23, 2014 12:04 am

Re: EViews 10

Postby diggetybo » Mon May 29, 2017 6:42 am

I would say that updating the built-in imputation methods for missing values would be helpful for EViews 10 users. Missing values can cause an assortment of complications, such as a singular value decomposition being impossible to carry out, etc.

If we had a command like knnimpute(data,k), it could replace the NaNs in data with a weighted mean of the k nearest neighbors within the EViews series.

A full implementation can be found here:

http://www.mathworks.com/help/bioinfo/r ... hworks.com

Even if it were not as comprehensive as the implementation linked above, a simple kNN imputation would save a lot of time, as it's very common to have NaN values in a dataset; for many of us it's a daily occurrence.
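A simple kNN imputation can be sketched in Python/NumPy. This is a hypothetical `knn_impute`: it is not a reproduction of the MathWorks function linked above and not an EViews command. The data are a 2-D array with observations in rows and series in columns. Each NaN is filled with the inverse-distance-weighted mean of that column's values in the k nearest rows, where distance is the root-mean-square difference over the columns both rows have observed. Rows at zero distance are averaged directly to avoid division by zero.

```python
import numpy as np

def knn_impute(data, k):
    """Fill NaNs in a 2-D array (rows = observations, columns = series) with the
    inverse-distance-weighted mean of the k nearest rows that observe that column."""
    data = np.asarray(data, dtype=float)
    out = data.copy()
    for i, j in zip(*np.where(np.isnan(data))):
        cands = []
        for r in range(data.shape[0]):
            if r == i or np.isnan(data[r, j]):
                continue  # a neighbour must have the missing column observed
            shared = ~np.isnan(data[i]) & ~np.isnan(data[r])
            if shared.any():
                d = np.sqrt(np.mean((data[i, shared] - data[r, shared]) ** 2))
                cands.append((d, data[r, j]))
        if not cands:
            continue  # nothing to borrow from; leave NaN
        cands.sort(key=lambda t: t[0])
        nearest = cands[:k]
        dists = np.array([d for d, _ in nearest])
        vals = np.array([v for _, v in nearest])
        if np.any(dists == 0):
            out[i, j] = vals[dists == 0].mean()  # exact matches take precedence
        else:
            w = 1.0 / dists
            out[i, j] = np.sum(w * vals) / np.sum(w)
    return out

X = np.array([[1.0, 2.0, 3.0],
              [1.1, np.nan, 3.1],
              [5.0, 6.0, 7.0],
              [1.2, 2.2, 3.2]])
filled = knn_impute(X, k=2)
print(filled)
```

Here the missing value is taken from the two close rows (2.0 and 2.2) and ignores the distant third row. Inverse-distance weighting is one common choice; an unweighted mean of the k neighbours is simpler but lets distant neighbours count as much as close ones.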

Thank you for asking for feedback.

EViews Gareth
Fe ddaethom, fe welon, fe amcangyfrifon
Posts: 11369
Joined: Tue Sep 16, 2008 5:38 pm

Re: EViews 10

Postby EViews Gareth » Mon May 29, 2017 7:08 am

I presume you mean for matrices?

