Huge Bond Data File - Filtering out Bonds I don't need

JimForest · Postby **JimForest** » Sat Dec 10, 2016 7:02 pm

Hi,

I have an enormous file of bond data. It is about 250 MB after deleting columns of data I don't need. Originally it was 650 MB.

While I am impressed that EViews can handle the data import. I have a ways to go in getting things in the format I need.

I have four separate tables, that I can compile into one, that store CUSIP identifiers for the bonds I want to look at for my research. There are around 800 bonds. This would further reduce the file size significantly.

How do I create a worksheet that stores only the data for the bonds in my list. I tried using smpl filtering but there is a limit to the number of IF conditions. Is there an easier way of bringing this file down to size???

Any help would be greatly appreciated.

EViews Gareth · Postby **EViews Gareth** » Sat Dec 10, 2016 7:29 pm

Question is too generic to be of much help. But for the smpl if statements, you might find the "@inlist" function useful.

JimForest · Postby **JimForest** » Sat Dec 10, 2016 7:55 pm

That looks helpful. Thanks Gareth!! :D

JimForest · Postby **JimForest** » Sat Dec 10, 2016 9:02 pm

@inlist is a cool feature, but it won't handle a list of 861 objects.

Can the list be a vector?

startz · Postby **startz** » Sun Dec 11, 2016 8:20 am

You might do something like
sample s1 if cusip=1 or cusip=2
sample s2 if cusip= 4
smpl if s1 or s2

EViews Gareth · Postby **EViews Gareth** » Sun Dec 11, 2016 12:52 pm

Or

Smpl if @inlist(cusip, "1 2 3") or @inlist(cusip, "4 5 6")

JimForest · Postby **JimForest** » Sun Dec 11, 2016 8:22 pm

Still not resolving my problem as there are 861 different cusips. How can I store them in a vector and select the entire contents of the vector in the smpl command? As with my other question in this forum, I can't import non-numerical data to a matrix.

I created a workfile that is small in size and of the same structure. In it I have begun to draft a program, so you can see what I am trying to do. In this manner, I could set up a loop to perform the copying I need, one CUSIP at a time. The program is saved in a text object within the workfile.

Can you please take a look and offer some advice? Not sure why I can't store the needed data in a matrix.

EViews Glenn · Postby **EViews Glenn** » Mon Dec 12, 2016 12:10 pm

Not sure how to interpret your example, but if I am understanding the setup correctly, I think the easiest way is to perform the subsampling is to create a selection series and then to use a program to loop through the values you want to keep, setting the keep indicator to 1 whenever it encounters a value

Code: Select all

smpl @all
series keep = 0

for %cuid 2 4 6 8
keep = @recode(cusip={%cuid}, 1, keep)
next

(the for loop above needs to be in a program file). When you are do, you'll have KEEP with values of 1 for all of the matching CUSIPs and 0,s elsewhere.

Then

Code: Select all


smpl if keep=1

will filter the sample to only use those observations with CUSIP that matched the values in the loop.

JimForest · Postby **JimForest** » Mon Dec 12, 2016 1:08 pm

Thank you very much. I see the logic of the code above. Thanks Glenn.

as long as the line "for %cuid 2 4 6 8" would handle 861 cusips, but I don't think it will based on my experimentation.

So I believe filtering the data is not as important.

What I want to do now, as in the program, is to do the algorithm that filters the huge data set down to one single cusip for the desired date range based on data from a table (as in text file in workfile).

My program needs to take my large unstructured workfile, filter it down to the specified date ranges for a single cusip number - one at a time in a loop - and copy the data to another workfile. Then, in the loop, go back to sample @all and then resample down to the next cusip date range combination and copy that data to the new workfile, and keep doing until all the data is sent to the new workfile.

The program would be pulling the cusip and dates from the cells of a table (as dates/charachters in a matrix are problematic).

If I could store dates in a matrix, the first CUSIP I want is in the first row and first column. I was calling the matrix 'dates' - so in the program

CUSIP Ann_Date Issue_Date
8DIGITS1 1983-02-13 1983-02-15
8DIGITS2 1983-02-18 1983-02-20
8DIGITS3 1983-03-01 1983-03-03
8DIGITS4 1983-03-10 1983-03-12

Code: Select all

%startdate="1983-01-01" 'set prior to first auction (for future WI data) purposes
%enddate="1983-06-01"
wfcreate d5 %startdate %enddate

'Matrix called dates holds arrays of information needed
'First column of matrix holds CUSIP identifiers
'Third column of matrix holds Issue_Dates
'copying data from test_dates to new workfile in specified date range

'for loop (need help here) 
for !i=1 to @nrows(dates)  'see what I am trying to do? 4 rows in the matrix
smpl dates(i,3) dates(i+1,3) IF tcusip = dates(i,1) 'so sample from 1983-02-15 to 1983-02-20 IF tcusip="8DIGITS"

'below I need help referencing the new workfile - do my code might not make sense (please advise on how to specify workfiles)

series tcusip (in new workfile) = tcusip (test_dates)
series tdbid (in new workfile) = tdbid (test_dates)
series tdask (in new workfile) = task (test_dates) 'etc...
'here I can add more series once it gets working - but this is the gist

smpl @all 'set back to full sample before returning to repeat in loop for next CUSIP

'*** at this point it should loop back to do the same for the following bond 8DIGITS2
'when it gets to the last cusip, I would then have it fill data to @last

So because I have to use a table, rather than a matrix. The data for the first CUSIP is shifted down one row - no big deal. But how do I change the code to reference a table cell?

Also, as you can see, I am using two workfiles in one program. Source file is undated and irregular (but has date columns), target files is a dated daily 5-day workfile.

Source file looks like this (notice, data for first CUSIP extends beyond issue date of second CUSIP - this is why I need specific ranges for each CUSIP):
8DIGITS1 1983-02-13 99.5 99.4375
8DIGITS1 1983-02-14 99.59375 99.53125
8DIGITS1 1983-02-15 99.71875 99.65625
8DIGITS1 1983-02-16 99.5625 99.5
8DIGITS1 1983-02-17 99.8125 99.75
8DIGITS1 1983-02-18 99.6875 99.625
8DIGITS1 1983-02-19 99.34375 99.28125
8DIGITS1 1983-02-20 99.28125 99.21875
8DIGITS1 1983-02-21 99.28125 99.21875
8DIGITS1 1983-02-22 99.28125 99.21875
8DIGITS1 1983-02-23 99.3125 99.25
8DIGITS2 1983-02-18 99.15625 99.09375
8DIGITS2 1983-02-19 99.0625 99
8DIGITS2 1983-02-20 99.21875 99.15625
8DIGITS2 1983-02-21 98.875 98.8125
8DIGITS2 1983-02-22 98.09375 98.03125
8DIGITS2 1983-02-23 98.3125 98.25
8DIGITS2 1983-02-24 97.90625 97.84375
8DIGITS2 1983-02-25 97.90625 97.84375
8DIGITS2 1983-02-26 97.96875 97.90625
8DIGITS2 1983-02-27 98.15625 98.09375
8DIGITS2 1983-02-28 97.8125 97.75
8DIGITS2 1983-03-01 97.84375 97.78125

I think I am pretty close to making a usable program. I just need advice on how to adapt it.

Please help. I very much appreciate this...

Cheers,

Jim

JimForest · Postby **JimForest** » Mon Dec 12, 2016 1:58 pm

See, commands like this don't seem to work:

smpl dates_table(2,3) dates(3,3) IF tcusip=dates_table(2,1)

I thought it might be because the cell is stored as a string, so I tried this:

smpl @strdate(dates_table(2,3)) @strdate(dates_table(3,3)) IF tcusip=dates_table(2,1)

And this string to date function?

smpl @stod(dates_table(2,3)) @stod(dates_table(3,3)) IF tcusip=dates_table(2,1)

JimForest · Postby **JimForest** » Mon Dec 12, 2016 2:04 pm

I tried this too:

smpl @dateval(dates_table(2,3)) @dateval(dates_table(3,3)) IF tcusip=dates_table(2,1)

Everything gives me an 'illegal date' -

How do I get a program to tell EViews to take a table cell that contains a date and use it in a sample statement?

EViews Gareth · Postby **EViews Gareth** » Mon Dec 12, 2016 2:13 pm

Code: Select all


%start = dates_table(2,3)
%end = dates_table(3,3)
smpl {%start} {%end}

I'll be honest, between this thread, your other thread and the emails you've sent, I have absolutely no idea what you're trying to achieve. But there is something somewhere that tells me you're making it far more complicated than it needs to be.

JimForest · Postby **JimForest** » Mon Dec 12, 2016 2:41 pm

Thank you. Ok, I see that with the braces I can specify the contents in a smpl statement.

The cusip and sample in the source file (undated irregular format) is going to change at every iteration in the loop - there are many bonds and many dates for each bond. Each bond starts on a different date.

My first question was "far too generic" as you said - the rest should have been more clear, sorry if I am confusing you. Trying to give you as much detail as possible. I have been a client for many years but I don't do a lot of data manipulation programming in EViews and I am not a professional programmer.

I took your code above and pasted into a program file and clicked run. This is the result.

Error in Sample: Illegal date 2/15/1983 in "SMPL 1983-02-15
1983-02-20".

Not trying to frustrate you, believe me. Just baffled by some of these issues. I posted my workfile, and will attach here again.

Again: what I am trying to do is create a daily workfile of bond data for "on-the-run" bonds - the most recently issued bond. Source data contains all bonds, which are issued frequently but irregularly. I want to track just the most recently issued bond at a give maturity. So I need to look up an issue date and the CUSIP identifier for that bond and copy the needed data from the giant source workfile to a new daily workfile for ONLY the dates that that bond was the MOST RECENTLY issued. Then, starting on the date a new bond is issued, I want to repeat the process and continue doing so until I have a nice set of time series with data for ONLY the "on-the-run" bonds. Basically, write bond data for bond x from date first issued until bond y is issued, then write data for bond y until bond z is issued, etc...

But I need to learn how to deal with this issue of getting EViews to read these dates. I'm sure it is a simple thing that I am making more difficult - but that is why I am asking.

EViews Gareth · Postby **EViews Gareth** » Mon Dec 12, 2016 2:49 pm

Still no idea what you're doing.

But that file isn't dated, so you can't specify a sample with dates.

JimForest · Postby **JimForest** » Mon Dec 12, 2016 3:02 pm

Okay, that at least makes sense. Even though there are dates in the file, because it is unstructured it can't be sampled using smpl.

So what can be done? Can it be sent to a database object and then queried one CUSIP at a time based on dates in a table?

I have no clue how to do this...

EViews.com

Huge Bond Data File - Filtering out Bonds I don't need

Huge Bond Data File - Filtering out Bonds I don't need

Re: Huge Bond Data File - Filtering out Bonds I don't need

Re: Huge Bond Data File - Filtering out Bonds I don't need

Re: Huge Bond Data File - Filtering out Bonds I don't need

Re: Huge Bond Data File - Filtering out Bonds I don't need

Re: Huge Bond Data File - Filtering out Bonds I don't need

Re: Huge Bond Data File - Filtering out Bonds I don't need

Re: Huge Bond Data File - Filtering out Bonds I don't need

Re: Huge Bond Data File - Filtering out Bonds I don't need

Re: Huge Bond Data File - Filtering out Bonds I don't need

Re: Huge Bond Data File - Filtering out Bonds I don't need

Re: Huge Bond Data File - Filtering out Bonds I don't need

Re: Huge Bond Data File - Filtering out Bonds I don't need

Re: Huge Bond Data File - Filtering out Bonds I don't need

Re: Huge Bond Data File - Filtering out Bonds I don't need

Who is online