Importing/joining data without headers

For questions regarding the import, export and manipulation of data in EViews, including graphing and basic statistics.

Moderators: EViews Gareth, EViews Steve, EViews Jason, EViews Moderator, EViews Pamela

indigo.bunting
Posts: 3
Joined: Fri Mar 02, 2018 3:01 pm

Importing/joining data without headers

Postby indigo.bunting » Fri Mar 02, 2018 4:10 pm

I have two data sets I'm working with. Each has 70 files (quarterly data from 2000-present), and each file has several hundred thousand to a million observations. Each data set is vertical bar delimited and does not have headers. The headers are given in a PDF, so I'll have to manually enter those in for each series name (unless there's an easy copy/paste method you know of).

The website I'm getting my information from is the Fannie Mae Acquisition and Performance data, in case you've run across this before. They give me SAS and R code to upload and manipulate the data, but if it's possible to upload this data directly to EViews without learning two languages I'm not familiar with (I primarily work in SQL and Excel), then I'd like to cut out the middleman.

The unique identifier in both of these sets is the Loan Identifier (1st column in both data sets). I found out how to enter the 25 headers for the Acquisition data (set 1) and the 31 headers for the Performance data (set 2):

[Attachment: Capture.PNG]


1) Is there any way I could somehow concatenate the data so that all 70 files for sets 1 and 2 have the same headers without manually entering them? That is, is there any way to upload one file and add new data to each field, creating more rows but having the same number of columns?

2) Is there any way I could combine Set 1 and Set 2 by doing a join on the Loan Identifier? Or if there's some equivalent to a VLOOKUP in EViews, I could use something like that as well.

EViews Steve
EViews Developer
Posts: 517
Joined: Tue Sep 16, 2008 3:00 pm
Location: Irvine, CA

Re: Importing/joining data without headers

Postby EViews Steve » Sat Mar 03, 2018 7:39 am

I downloaded the sample acquisition file from the Fannie Mae site and tried to import it the way you wanted. I made the following assumptions:

1. You are importing your data into an undated workfile.
2. Each of your files has the same number of columns.

First, here's a program I wrote to import the same sample file three times, but shifted so I get the data repeated, one after the other. Obviously, you'll have to alter it to use your specific file names:

Code:

import(type=txt,mode=u,resize) C:\files\acquisition-sample-file.txt ftype=ascii rectype=crlf skip=0 fieldtype=delimited custom="|" colhead=0 names=(LOAN_IDENTIFIER,ORIGINATION_CHANNEL,SELLER_NAME,,,,,,,,,,,,,,,,,,,,,,) eoltype=pad badfield=NA @freq U 1 @smpl @all
import(type=txt,mode=u,resize) C:\files\acquisition-sample-file.txt ftype=ascii rectype=crlf skip=0 fieldtype=delimited custom="|" colhead=0 names=(LOAN_IDENTIFIER,ORIGINATION_CHANNEL,SELLER_NAME,,,,,,,,,,,,,,,,,,,,,,) eoltype=pad badfield=NA @freq U 9 @smpl @all
import(type=txt,mode=u,resize) C:\files\acquisition-sample-file.txt ftype=ascii rectype=crlf skip=0 fieldtype=delimited custom="|" colhead=0 names=(LOAN_IDENTIFIER,ORIGINATION_CHANNEL,SELLER_NAME,,,,,,,,,,,,,,,,,,,,,,) eoltype=pad badfield=NA @freq U 17 @smpl @all

Some things to point out here:
1. The "names=" argument lets you specify the names for each column. Since this doesn't change, you can specify the same value here for each import command.
2. The "@freq U 1" frequency argument changes for each import so that subsequent imports don't go back and overwrite the previous data; this requires that you know how many rows are in each file. It is coupled with the "mode=u" option, which tells EViews to keep the previously imported data with each new import, and the "resize" option, which tells EViews to enlarge the workfile so that new observations are added for all of the incoming data.

You could probably ask EViews after each import how many obs there are by looking at a series' @last value. Just add 1 to this in a program variable, then use that value in the next import command.
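A rough sketch of that idea as an EViews program loop (the file names, the shortened names= list, and the use of @ilast here are my assumptions, not tested against your files; the real commands need the full 25-column name list):

Code:

' Sketch: append each quarterly file below the last one imported.
' File names are placeholders; all files must share the same columns.
!next = 1
for %f acq_2000q1.txt acq_2000q2.txt acq_2000q3.txt
  import(type=txt,mode=u,resize) C:\files\{%f} ftype=ascii rectype=crlf skip=0 fieldtype=delimited custom="|" colhead=0 names=(LOAN_IDENTIFIER,ORIGINATION_CHANNEL,SELLER_NAME) eoltype=pad badfield=NA @freq U {!next} @smpl @all
  ' start the next import one row after the last filled observation
  !next = @ilast(loan_identifier) + 1
next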

Finally, you might skip all of these extra import steps by simply concatenating all 70 source files into a single text file beforehand. You can do this by opening a DOS prompt, changing to the directory that contains your text files, and running:

Code:

copy *.txt all.txt

This will concatenate all txt files and save the result as "all.txt". Of course this will only work if each of your text files has a carriage return at the end of the last row so that concatenation doesn't end up with occasional rows that are actually two rows.

Once you have all.txt, just import it using our import wizard and specify your column names (or use a single import command like the ones above).



Regarding your second question, joining sets 1 and 2 would be easiest if both sets had the same loan_ids in the same order. That way you can do parallel imports, with the second import (for the second set) specifying different column names so that different series objects get created.

Or, if you can guarantee that LOAN_IDENTIFIER is unique within each dataset (each ID appears only once), you could alter your import command to use that series as the ID series in your workfile (so you get an undated-with-ID workfile instead of a plain undated one).

Code:

import(type=txt) C:\files\acquisition-sample-file.txt ftype=ascii rectype=crlf skip=0 fieldtype=delimited custom="|" colhead=0 names=(LOAN_IDENTIFIER,ORIGINATION_CHANNEL,SELLER_NAME,,,,,,,,,,,,,,,,,,,,,,) eoltype=pad badfield=NA @id LOAN_IDENTIFIER @smpl @all

This command is different from above because it replaces the "@freq U 1" argument with "@id LOAN_IDENTIFIER".

If you use this method, you'll have to manually concatenate all of your first dataset's text files first so you can import them in a single shot (I don't think you can do multiple appending imports into an undated-with-ID workfile). Then do the same with your second dataset (concatenate first, then import) using the same command, but with different column names.
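For what it's worth, the pair of matched imports might look something like this (file names and the shortened names= lists are placeholders, so treat this as a sketch rather than a ready-to-run command; the real commands need the full 25- and 31-column name lists):

Code:

' Sketch: both imports keyed on LOAN_IDENTIFIER, so rows from the two
' datasets are matched by ID rather than by position.
import(type=txt,mode=u,resize) C:\files\acquisition-all.txt ftype=ascii rectype=crlf skip=0 fieldtype=delimited custom="|" colhead=0 names=(LOAN_IDENTIFIER,ORIGINATION_CHANNEL,SELLER_NAME) eoltype=pad badfield=NA @id LOAN_IDENTIFIER @smpl @all
import(type=txt,mode=u,resize) C:\files\performance-all.txt ftype=ascii rectype=crlf skip=0 fieldtype=delimited custom="|" colhead=0 names=(LOAN_IDENTIFIER,MONTHLY_REPORTING_PERIOD,CURRENT_SERVICER_NAME) eoltype=pad badfield=NA @id LOAN_IDENTIFIER @smpl @all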

indigo.bunting
Posts: 3
Joined: Fri Mar 02, 2018 3:01 pm

Re: Importing/joining data without headers

Postby indigo.bunting » Mon Mar 05, 2018 9:44 am

Steve,

Thank you for your response!

I like your suggestion of using Command Prompt to concatenate my files. Regarding the carriage return at the end of the last line: do I simply open each of the text files, add a \r (highlighted in yellow below), and save before running the cmd command?

[Attachment: Capture.PNG]

EViews Steve
EViews Developer
Posts: 517
Joined: Tue Sep 16, 2008 3:00 pm
Location: Irvine, CA

Re: Importing/joining data without headers

Postby EViews Steve » Mon Mar 05, 2018 9:45 am

Yes, but only if it's not already there.



indigo.bunting
Posts: 3
Joined: Fri Mar 02, 2018 3:01 pm

Re: Importing/joining data without headers

Postby indigo.bunting » Mon Mar 05, 2018 4:08 pm

Ok, great. If my files are larger than 2GB, Notepad won't open them. Do you know of a text editor that can handle files larger than 2GB?

startz
Non-normality and collinearity are NOT problems!
Posts: 3337
Joined: Wed Sep 17, 2008 2:25 pm

Re: Importing/joining data without headers

Postby startz » Mon Mar 05, 2018 4:17 pm

Wordpad may work.

