Reading fixed width data



jthodge


Post by jthodge » Thu Aug 11, 2011 7:13 am

I'm trying to figure out how to read in some data that is in fixed-width format.

Here is a short extract from the dataset (note that the data for ST, CD, and REG should be right-aligned on columns 2, 4, and 8, respectively, but I can't seem to get it to show that way here):

Code:

STCD REG  DIVISION NAME                        POPULATION    POP WT
 1 1   6  NORTHERN VALLEY                          657744    0.1479
 2 5   8  SOUTHWEST                                179964    0.0351
 3 8   7  SOUTH CENTRAL                            172053    0.0644
 4 1   9  NORTH COAST                             1261635    0.0372
 5 3   8  NORTH DRAINAGE BASIN                      29783    0.0069
 6 2   1  CENTRAL                                 2029652    0.5960
 7 1   5  NORTHERN                                 465853    0.5945
 8 6   5  LOWER EAST COAST                        4903143    0.3068
 9 4   5  WEST CENTRAL                            1248322    0.1525
1010   8  EASTERN HIGHLANDS                        131750    0.1018
11 1   3  NORTHWEST                                765271    0.0616
12 6   3  EAST CENTRAL                             342095    0.0562

I tried using the rformat option in the wfopen command (with options type=raw and fieldtype=fixed) but received the syntax error message "unexpected character '(' in rformat specification". According to the documentation, isn't the rformat specification supposed to be enclosed in parentheses?

Anyway, how would you suggest that I phrase my wfopen command to read this raw data into a workfile?

EViews Glenn
EViews Developer

Re: Reading fixed width data

Post by EViews Glenn » Thu Aug 11, 2011 9:52 am

The following should do it.

Code:

wfopen(type=raw, rectype=crlf) "file.txt" skip=1, fformat=(i2,i2,i3,2x,a39,i7,4x,f6), names=("stcd", "reg", "division", "name", "population", "popweight")

Replace "file.txt" with the name of your file.

Note that this was a bit more difficult because you didn't provide the full fixed format (I used EViews to identify the positions and had to compute the offsets; more on this in a minute), and because the column headings sometimes use more than one word to describe a variable and sometimes cross the fixed-format column boundaries ("Division" and "Name" in particular).

FYI, what I did was drop your example into a text file and then drop the text file onto EViews. I selected "Fixed width fields" in the Column specification and used the interactive interface to place the variable separators (column breaks) where I wanted them. I then performed the somewhat ugly task of translating the column positions back into the equivalent column format. If you were doing this interactively, you wouldn't need that last step: generally you could just click Finish. In your case you'd have to do a bit more work to specify the variable names, since the headings don't follow the column format.
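
The last step described above, turning column breaks back into a format string, is mechanical enough to sketch in code. The snippet below is a Python illustration, not EViews code; the helper name `positions_to_fformat` and its (first column, last column, type letter) input are assumptions made for this example. Any columns between one field's end and the next field's start become an `nx` skip item.

```python
from typing import List, Tuple

# (first_col, last_col, type_letter): a 1-based, inclusive column range as read
# off a fixed-width preview; type_letter is "a" (text), "i" (integer) or "f" (decimal).
Field = Tuple[int, int, str]


def positions_to_fformat(fields: List[Field]) -> str:
    """Build a Fortran-style format string such as "(i2,i2,1x,i3)".

    Hypothetical helper for illustration only; columns not covered by any
    field are emitted as "nx" (skip n columns) items.
    """
    parts: List[str] = []
    next_col = 1  # first column not yet accounted for
    for first, last, kind in sorted(fields):
        if first < next_col:
            raise ValueError(f"field starting at column {first} overlaps the previous field")
        gap = first - next_col
        if gap > 0:
            parts.append(f"{gap}x")  # skip the unread columns
        parts.append(f"{kind}{last - first + 1}")  # width = last - first + 1
        next_col = last + 1
    return "(" + ",".join(parts) + ")"


if __name__ == "__main__":
    # Column ranges implied by jthodge's final command for data.txt.
    layout = [(1, 2, "i"), (3, 4, "i"), (6, 8, "i"),
              (11, 48, "a"), (49, 57, "i"), (62, 67, "f")]
    print(positions_to_fformat(layout))  # (i2,i2,1x,i3,2x,a38,i9,4x,f6)
```

With the column ranges used in jthodge's data.txt (st 1-2, cd 3-4, reg 6-8, division_name 11-48, population 49-57, pop_wt 62-67), the sketch reproduces the format string from his command, including the `1x`, `2x`, and `4x` skips.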

jthodge

Re: Reading fixed width data

Post by jthodge » Thu Aug 11, 2011 10:50 am

Thanks for the reply. The sample data table in my original post probably made things more complicated than they needed to be. It would have been better to attach the data as a text file, which I do here. Note that I've added a "ruler" to the top line of this file to help delineate the columns.

The first problem with the table was that I intended "ST" and "CD" to be separate series. Second, the fourth column was intended to be a single series, "Division_Name". I can see how interpreting that column as two series would make reading the data much more complicated; in fact, I don't think the data would technically be in fixed-width format anymore.

Anyway, based on the attached data.txt file, I was able to import the data using the following command:

Code:

wfopen(type=raw,rectype=crlf) "data.txt" skip=2, fformat=(i2,i2,1x,i3,2x,a38,i9,4x,f6), names=("st","cd","reg","division_name","population","pop_wt")

Your suggestion pointed out two things I was unaware of: 1) I didn't realize the names had to be listed in quotation marks, and 2) I didn't know that the term "2x" in the fformat specification means to skip two columns.
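
To see how a format such as `(i2,i2,1x,i3,2x,a38,i9,4x,f6)` carves up each record, here is a small Python sketch (again an illustration, not EViews code). The function names `parse_fformat` and `read_record` are hypothetical, and only the items used in this thread are handled: `a` (text), `i` (integer), `f` (decimal number), and `nx` (skip n columns). It does not model other Fortran details such as implied decimal places, and it assumes the header lines have already been skipped, as `skip=2` does in the command above.

```python
import re
from typing import List, Optional, Tuple

# One parsed item: (type_letter, width); type_letter is None for an "nx" skip.
Item = Tuple[Optional[str], int]

# Either "<n>x" (skip) or "<a|i|f><width>", case-insensitive.
_TOKEN = re.compile(r"^(\d+)x$|^([aif])(\d+)$", re.IGNORECASE)


def parse_fformat(spec: str) -> List[Item]:
    """Split a format string like "(i2,1x,a38)" into (type, width) items."""
    items: List[Item] = []
    for token in spec.strip().strip("()").split(","):
        token = token.strip()
        m = _TOKEN.match(token)
        if m is None:
            raise ValueError(f"unsupported format item: {token!r}")
        if m.group(1):
            items.append((None, int(m.group(1))))
        else:
            items.append((m.group(2).lower(), int(m.group(3))))
    return items


def read_record(line: str, spec: str) -> List[object]:
    """Slice one fixed-width line by column position and convert each field."""
    values: List[object] = []
    pos = 0
    for kind, width in parse_fformat(spec):
        chunk = line[pos:pos + width]
        pos += width
        if kind is None:
            continue  # "nx": these columns are skipped entirely
        text = chunk.strip()
        if kind == "a":
            values.append(text)
        elif kind == "i":
            values.append(int(text) if text else None)
        else:  # "f"
            values.append(float(text) if text else None)
    return values


if __name__ == "__main__":
    spec = "(i2,i2,1x,i3,2x,a38,i9,4x,f6)"
    # ST and CD abut here ("1010"); only the column positions separate them.
    row = f"{10:>2}{10:>2} {8:>3}  {'EASTERN HIGHLANDS':<38}{131750:>9}    {0.1018:.4f}"
    print(read_record(row, spec))  # [10, 10, 8, 'EASTERN HIGHLANDS', 131750, 0.1018]
```

The example row shows why the widths matter: "1010" contains no separator at all, yet it splits cleanly into ST = 10 and CD = 10 because each field is read from fixed columns.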

Thanks again for your help.
Attachments
data.txt (967 Bytes)

