I have two data sets I'm working with. Each has 70 files (quarterly data from 2000 to present), and each file has several hundred thousand to a million observations. Each data set is vertical-bar delimited and has no headers. The headers are given in a PDF, so I'll have to enter them manually for each series name (unless there's an easy copy/paste method you know of).
The website I'm getting my information from is the Fannie Mae Acquisition and Performance data, in case you've run across this before. They provide SAS and R code to upload and manipulate the data, but if it's possible to upload this data directly to EViews without learning two languages I'm not familiar with (I primarily work in SQL and Excel), then I'd like to cut out the middleman.
The unique identifier in both of these sets is the Loan Identifier (1st column in both data sets). I found out how to enter the 25 headers for the Acquisition data (set 1) and the 31 headers for the Performance data (set 2):
1) Is there any way I could concatenate the data so that all 70 files for sets 1 and 2 have the same headers without entering them manually? That is, is there any way to upload one file and then append new data to each field, creating more rows but keeping the same number of columns?
2) Is there any way I could combine set 1 and set 2 with a join on the Loan Identifier? Or if there's some equivalent to a VLOOKUP in EViews, I could use something like that as well.
Importing/joining data without headers
Moderators: EViews Gareth, EViews Jason, EViews Steve, EViews Moderator
- EViews Developer
- Posts: 795
- Joined: Tue Sep 16, 2008 3:00 pm
- Location: Irvine, CA
Re: Importing/joining data without headers
I downloaded the sample acquisition file from the Fannie Mae site and tried to import it the way you wanted. I made the following assumptions:
1. You are importing your data into an undated workfile.
2. Each of your files has the same number of columns.
First, here's a program I wrote to import the same sample file three times, shifted each time so the data are repeated one after the other. Obviously, you'll have to alter it to use your specific file names:
Code:
import(type=txt,mode=u,resize) C:\files\acquisition-sample-file.txt ftype=ascii rectype=crlf skip=0 fieldtype=delimited custom="|" colhead=0 names=(LOAN_IDENTIFIER,ORIGINATION_CHANNEL,SELLER_NAME,,,,,,,,,,,,,,,,,,,,,,) eoltype=pad badfield=NA @freq U 1 @smpl @all
import(type=txt,mode=u,resize) C:\files\acquisition-sample-file.txt ftype=ascii rectype=crlf skip=0 fieldtype=delimited custom="|" colhead=0 names=(LOAN_IDENTIFIER,ORIGINATION_CHANNEL,SELLER_NAME,,,,,,,,,,,,,,,,,,,,,,) eoltype=pad badfield=NA @freq U 9 @smpl @all
import(type=txt,mode=u,resize) C:\files\acquisition-sample-file.txt ftype=ascii rectype=crlf skip=0 fieldtype=delimited custom="|" colhead=0 names=(LOAN_IDENTIFIER,ORIGINATION_CHANNEL,SELLER_NAME,,,,,,,,,,,,,,,,,,,,,,) eoltype=pad badfield=NA @freq U 17 @smpl @all
Some things to point out here:
1. The "names=" argument lets you specify the names for each column. Since this doesn't change, you can specify the same value here for each import command.
2. The "@freq U 1" frequency argument changes for each import so that subsequent imports don't go back and overwrite the previous data; this requires that you know how many rows are in each file. It is coupled with the "mode=u" option, so the workfile retains the previously imported data with each new import, and the "resize" option, which tells EViews to resize the workfile on each import so that new observations are added for all incoming data.
You could ask EViews after each import how many observations there are by looking at a series' @last value. Store @last + 1 in a program variable, then use that value as the start observation in the next import command.
Finally, you might skip all of these extra import steps by simply concatenating all of your 70 source files together into a single text file beforehand. You can do this by opening a DOS prompt, going to the directory that has your text files, then running the command:
Code:
copy *.txt all.txt
This will concatenate all txt files and save the result as "all.txt". Of course this will only work if each of your text files has a carriage return at the end of the last row so that concatenation doesn't end up with occasional rows that are actually two rows.
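If you're not certain that every file ends with a line break, a short script can do the concatenation and supply any missing newline as it goes; this isn't an EViews feature, just a workaround outside it. A minimal sketch in Python (the pattern and output name are placeholders):

```python
import glob

CHUNK = 1024 * 1024  # copy 1 MB at a time so very large files never sit in memory

def concatenate(pattern, out_path):
    """Concatenate every file matching `pattern` into `out_path`,
    appending a line break to any source file that lacks one."""
    with open(out_path, "wb") as out:
        for name in sorted(glob.glob(pattern)):
            last = b""
            with open(name, "rb") as src:
                while True:
                    chunk = src.read(CHUNK)
                    if not chunk:
                        break
                    out.write(chunk)
                    last = chunk[-1:]  # remember the final byte written
            if last and last != b"\n":
                out.write(b"\r\n")  # file ended mid-row; close the line

# Example: concatenate("*.txt", "all.txt")
```

Because it streams in chunks, this also sidesteps any editor's file-size limits.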
Once you have all.txt, just import it using our import wizard and specify your column names (or use a single import command like the ones above).
As for your second question, about joining sets 1 and 2: it would be easiest if both sets had the same loan IDs in the same order. That way you can do parallel imports, where the import for the second set specifies different column names so that different series objects get created.
Or, if you can guarantee that LOAN_IDENTIFIER is unique within each dataset (each ID appears only once), you could alter your import command to use that series as the ID series in your workfile (so you get an undated with ID workfile instead of just undated).
Code:
import(type=txt) C:\files\acquisition-sample-file.txt ftype=ascii rectype=crlf skip=0 fieldtype=delimited custom="|" colhead=0 names=(LOAN_IDENTIFIER,ORIGINATION_CHANNEL,SELLER_NAME,,,,,,,,,,,,,,,,,,,,,,) eoltype=pad badfield=NA @id LOAN_IDENTIFIER @smpl @all
This command is different from above because it replaces the "@freq U 1" argument with "@id LOAN_IDENTIFIER".
If you use this method, you'll have to first manually concatenate all of your dataset text files so you can import them all in a single shot (I don't think you can do multiple appending imports into an undated with ID workfile). Then do the same with your second dataset (concatenate first, then import) using the same command, but specify different column names.
- Posts: 3
- Joined: Fri Mar 02, 2018 3:01 pm
Re: Importing/joining data without headers
Steve,
Thank you for your response!
I like your suggestion of using Command Prompt to concatenate my files. Regarding the carriage return at the end of the last line, do I simply open each of the text files, put a \r at the end, and save before running the cmd command?
- EViews Developer
- Posts: 795
- Joined: Tue Sep 16, 2008 3:00 pm
- Location: Irvine, CA
Re: Importing/joining data without headers
Yes, but only if it's not already there.
Sent from my iPhone using Tapatalk
- Posts: 3
- Joined: Fri Mar 02, 2018 3:01 pm
Re: Importing/joining data without headers
OK, great. If my files are larger than 2GB, Notepad won't open them. Do you know of a text editor that can handle files larger than 2GB?
- Non-normality and collinearity are NOT problems!
- Posts: 3775
- Joined: Wed Sep 17, 2008 2:25 pm
Re: Importing/joining data without headers
Wordpad may work.
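For what it's worth, if the only edit needed is that final line break, no editor has to open the file at all: a small script can check the last byte and append a newline in place. A minimal sketch in Python (the file pattern is a placeholder):

```python
import glob
import os

def ensure_trailing_newline(pattern):
    """Append a CR/LF to any file matching `pattern` whose last byte
    isn't a newline, without reading the file into memory."""
    for name in glob.glob(pattern):
        with open(name, "rb+") as f:
            f.seek(0, os.SEEK_END)
            if f.tell() == 0:
                continue  # empty file, nothing to do
            f.seek(-1, os.SEEK_END)  # inspect only the final byte
            if f.read(1) != b"\n":
                f.write(b"\r\n")  # position is at EOF, so this appends

# Example: ensure_trailing_newline("*.txt") before running the copy command
```

Because it only seeks to the end of each file, it works the same on a 2KB file and a 20GB one.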