Large data task execution time

Post by **fi99ggb** » Wed Jul 10, 2019 6:41 am

Hello

I have c. 10,000 series that I want to model (standard regression) and forecast, and I have written a program to automate the task. Which approach is faster?

1. Split the data into two (or more) files and execute the program for the two files simultaneously?
or
2. Keep all data in one file and run the program.

I believe (1) is faster, but I am not sure why.

Any insights would be really appreciated.

Thanks
George

Post by **EViews Gareth** » Wed Jul 10, 2019 7:46 am

EViews does not have multi-threading built in to its programming language. Although a number of procedures are internally multi-threaded, you cannot, within the programming language, instruct EViews to perform more than one task at a time.

The internal multi-threading generally only kicks in with large numbers of observations (it is more efficient to use one thread on most regressions, for example, than to split the work up among many threads since the effort of distributing the work is itself costly). Thus if you have a smallish data set, only one thread will be used during the running of your program.

That's a very long-winded way of saying that artificially introducing simultaneous processing, by launching multiple instances of EViews and splitting the 10,000 regressions among those instances, will almost certainly be more time-efficient than using one instance.

Of course this won't be true if you have a somewhat old PC that can support only one thread.

And of course it does require some manual labour to split the job up between multiple instances.

And finally, I just ran 10,000 multivariable regressions in my copy of EViews and it took 3 seconds. It may be that all this is moot!
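As a rough illustration of splitting the work between instances and timing each part, here is a minimal sketch. It assumes a hypothetical program file named slice.prg that receives the first and last series index as program arguments (%0 and %1), and it generates stand-in random data in place of a real workfile. Each instance would be started with a different index range.

```
' slice.prg (hypothetical name): run one slice of the regressions and time it.
' Assumed call in the first instance:   run slice.prg 1 5000
' Assumed call in the second instance:  run slice.prg 5001 10000

wfcreate u 100              ' stand-in workfile with 100 observations;
                            ' a real run would open its own data file instead

!first = @val(%0)           ' first series index handled by this instance
!last = @val(%1)            ' last series index handled by this instance

tic                         ' reset the timer
for !i = !first to !last
  series y!i = nrnd         ' stand-in data, as in a quick benchmark
  series x!i = nrnd
  equation eq.ls y!i c x!i  ' one equation object, overwritten on each pass
next
scalar elapsed = @toc       ' seconds elapsed since tic, kept in the workfile
```

Comparing the elapsed value from a single full run with the values from the split runs is one way to check whether the manual splitting is worth the effort on a given machine.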

Post by **fi99ggb** » Wed Jul 10, 2019 9:35 am

Thanks Gareth. It makes sense.

I have 10,000 variables, but the regressions I run number a few million because I use a recursive scheme to generate forecasts and evaluate them out of sample. I also use multiple models for each variable. It takes me about one and a half hours to run 3.5 million regressions and generate a few variables and statistics, which I still think is fast.

Are there any tips written somewhere on how to speed up large tasks, such as using quiet mode, using @recode instead of smpl if (?), or anything else?

Thanks again.

George

Post by **EViews Gareth** » Wed Jul 10, 2019 9:44 am

Probably the best/only tip I have is to be careful of object proliferation. There is a tendency, as a human, to write this:

Code:

for !i=1 to 1000
series y!i=nrnd
series x!i=nrnd
equation eq!i.ls y!i c x!i
'do something with eq!i
next

This creates 1000 equation objects, which is nice as a human because you can open each one up and look at it. But those 1000 equation objects cause the workfile to balloon, which makes working with the workfile much more time-intensive. Better to reuse the same equation:

Code:

for !i=1 to 1000
series y!i=nrnd
series x!i=nrnd
equation eq.ls y!i c x!i
'do something with eq
next
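One way to fill in the "do something" step without keeping every equation is to copy only the statistics you need into a single vector before the equation is overwritten. The sketch below is an assumption about what a user might want (the R-squared of each regression). The vector name rsq and the stand-in workfile are hypothetical. It also switches the program to quiet mode, which was asked about earlier in the thread.

```
mode quiet                  ' suppress status-line updates while the program runs
wfcreate u 100              ' stand-in workfile with 100 observations

vector(1000) rsq            ' hypothetical results vector, one slot per regression

for !i = 1 to 1000
  series y!i = nrnd
  series x!i = nrnd
  equation eq.ls y!i c x!i  ' the same equation object is reused each time
  rsq(!i) = eq.@r2          ' keep only the statistic of interest
next
```

The workfile then holds one equation and one vector of results rather than 1000 equation objects.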
