Random Split Data

Posted: Mon Mar 21, 2016 8:34 am
by diggetybo
Hey everyone,

I'm aware that certain sample selection methods can be used to do in-sample predictions and measure test error, but I would like to know if EViews has a built-in way to do a random split on the data in the workfile. A browse through the command reference suggests it is not supported. To be clear, I'm imagining a command that takes a configurable fraction between 0 and 1. In my imaginary function:

Code: Select all

sample sample_data =@randomsplit(.8)
80% of the workfile size would be selected at random and saved as a sample called "sample_data" and I guess the remaining 20% should be stored somewhere too.

I don't think there is a way to do this with the if clause in the sample GUI, but that just might be my user error.

In any event, please share some of the typical ways this is achieved in EViews.

Re: Random Split Data

Posted: Mon Mar 21, 2016 8:39 am
by EViews Gareth

Code: Select all

smpl if rnd<0.8

Re: Random Split Data

Posted: Mon Mar 21, 2016 8:40 am
by startz
smpl if rnd<.8

or

series keep = rnd<.8
smpl if keep
wfsave
smpl if not keep
wfsave

Re: Random Split Data

Posted: Mon Mar 21, 2016 8:42 am
by diggetybo
Ok great, thanks again.

If I wanted to run some tests in the remaining 20% of the data in isolation would that be possible in the way you suggested?

Would subtracting the 80% sample from @all give me the 20%? or do samples not work like that?

Re: Random Split Data

Posted: Mon Mar 21, 2016 8:44 am
by startz
No problem. Just use
smpl if not keep
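
A minimal sketch of how the two complementary samples might be used together, assuming the keep series from above already exists. The series names y and x, and the object names eq_train, yhat and rmse_test, are hypothetical placeholders rather than anything from this thread:

```
' estimate on the randomly selected ~80% of observations
smpl if keep
equation eq_train.ls y c x

' switch to the remaining ~20% and compute fitted values there
smpl if not keep
eq_train.fit yhat

' root mean squared error over the hold-out observations only
scalar rmse_test = @sqrt(@mean((y - yhat)^2))

' restore the full workfile sample
smpl @all
```

Because keep is a stored series, both smpl statements pick out the same observations every time they are run. Note that rnd<.8 keeps each observation with probability 0.8, so the split is only approximately 80/20.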

Re: Random Split Data

Posted: Mon Mar 21, 2016 8:47 am
by diggetybo
Oh I understand now, thanks!

Re: Random Split Data

Posted: Mon Mar 21, 2016 5:42 pm
by diggetybo
Hey, I'm back after taking the sample selections you guys suggested for a test drive. I have some final questions when you have the chance.

1. The rnd < .8 if-clause sample seemed very elegant in terms of coding, so I liked it. However, even after I created a sample object in my workfile, this method drew a different random .8 sample every time I clicked Estimate. I tried creating a separate sample named "fixed_data" from my existing .8 random sample called "random_data":

Code: Select all

sample fixed_data = random_data
However, it said: Error, illegal date "=". So it seems assigning one sample to another is not allowed? If that's the case, is there some other way to fix the sample after it has been randomly drawn the first time?

2. The series "keep" approach does work and doesn't recalculate each time, which is great if I need to go back and add/remove things from the estimation. The only drawback is taking up object space if you have a large data set and need many different randomly drawn samples. Also, I'm not sure why we have to call wfsave after.

So I'm using the keep way for now, but ideally I'd like to find some way to get the first method to work. Let me know if there is something more I can do.

I appreciate all the help!

Re: Random Split Data

Posted: Mon Mar 21, 2016 5:48 pm
by startz
The series keep is just a bunch of ones and zeros. It takes the same amount of space as any other series, which is to say that the space in memory is negligible. You could make a sample with

Code: Select all

sample s if keep
but that doesn't save any space. In fact, it adds an object.

Re: Random Split Data

Posted: Mon Mar 21, 2016 6:29 pm
by diggetybo
Hey startz,

Yeah, you're right, the memory is negligible. I think it's mostly curiosity or stubbornness that has me still thinking about the rnd way, even though the series 'keep' has already proved to work for me. I would suspect, though, that if you needed many distinct, randomly drawn samples from the data, clutter would become an issue at some point. You'd have to name them very intuitively. If the keep series names get longer as a result, they will be hard to refer to without auto-complete, and if you go with keep1, keep2, etc. you'd need a legend/key. Anyway, it might complicate things if taken to the extreme.

Another thought that occurred to me is writing a program to run the tests and then delete the keep series as necessary, if you are OCD about objects in the workfile.

As it stands though, a static random sample using only the GUI if clause is not feasible?
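
The program idea mentioned above could look something like the following sketch. It assumes hypothetical series y and x in the workfile; the number of repetitions and all object names (rmse_out, keep_tmp, eq_tmp, yhat_tmp) are made up for illustration:

```
' number of random splits to try (illustrative value)
!reps = 5
vector(!reps) rmse_out

for !i = 1 to !reps
  ' draw a fresh 1/0 indicator over the whole workfile
  smpl @all
  series keep_tmp = rnd < 0.8

  ' estimate on the ~80% sample
  smpl if keep_tmp
  equation eq_tmp.ls y c x

  ' evaluate on the remaining ~20%
  smpl if not keep_tmp
  eq_tmp.fit yhat_tmp
  rmse_out(!i) = @sqrt(@mean((y - yhat_tmp)^2))

  ' remove the temporary objects so nothing accumulates
  delete keep_tmp yhat_tmp eq_tmp
next

smpl @all
```

Only the vector of results is left behind, so there are no keep1, keep2, ... series to name or track. The trade-off is that an individual split cannot be revisited later unless its indicator series is kept.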

Re: Random Split Data

Posted: Mon Mar 21, 2016 6:36 pm
by startz
Well, you could do what you suggested above: do a random sample and then save the observations in that sample. Then just use the saved workfile without worrying about the sample.

Re: Random Split Data

Posted: Mon Mar 21, 2016 7:34 pm
by EViews Gareth
You want a random sample that is persistent (i.e. the same observations are used every time). The only way to have it persistent is to keep it around (makes sense, right?!).

You don't need to keep the sample objects around though. Just keep the series of 1/0s.

Sample objects are nothing more than little text strings - literally just the text "if keep=1".
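
To make that concrete, here is a minimal sketch of a persistent split. The seed value and the sample names s_train and s_test are arbitrary choices for illustration; setting the seed with rndseed is an optional addition that lets the same draw be regenerated later:

```
' optional: fix the random number seed so the draw can be reproduced
rndseed 12345

' draw once and store the 1/0 indicator as a series
smpl @all
series keep = rnd < 0.8

' optional named samples; each stores only its text definition,
' so both depend on the keep series continuing to exist
sample s_train @all if keep = 1
sample s_test @all if keep = 0

' switch between the two halves by name
smpl s_train
smpl s_test

' restore the full workfile sample
smpl @all
```

A sample defined directly as "if rnd<0.8" stores that text and re-evaluates rnd each time it is applied, which is why it changes from one estimation to the next. Pointing the sample at a stored series avoids that.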