sorting panel data variables based on averages

BradleyMorris · Postby **BradleyMorris** » Mon Jan 03, 2011 8:46 am

Hello,

I'm trying to find a way in which I can sort my panel data into groups according to criteria "having an average value >1"

for instance: I have panel data on 150 firms over a period of 10 years. I would like to sort them into three groups based on average profitability over these ten years. The problem which I run into is that if I sort them purely on profitability, the whole sample will be messed up, because the firms don't stick together. I would like them to stick together, so I can subsequently take the groups I made and compare their return with each other.

What happens when i simply sort is (descending): Profitability

Firm A: 0.10
Firm B:0.09
Firm E: 0.09
Firm A: 0.09
Firm C:0.07
Firm B: 0.06

This way I cannot see which firm is on average on top and so then I cannot use the outcome to form the groups such as "most profitable companies" versus "least profitable companies"

I could do the sorting in excel, no problem, but then I am still left of how to create groups from panel data based on firm names or codes.

If someone could help me with either question, that would be great!

BradleyMorris · Postby **BradleyMorris** » Tue Jan 04, 2011 6:55 am

A third issue: sorting panel data based on a specific point in time.

I'm trying to sort the firms according to the income on 01/01/2000. In the sample selection window is there a possibility to do this?

For instance something like:

income >100 @2000m01 - obviously this doesn't work, maybe somebody knows what does?

Thanks in advance!

EViews Glenn · Postby **EViews Glenn** » Tue Jan 04, 2011 11:19 am

To take the first question: it really depends on what you want to do with the information. What do you want to do with the designations "most profitable" and "least profitable". Do you simply want a indicator for each type associated with every observation in the panel. Or do you want the "contracted" form where you simply have the firms with an indicator for type?

BradleyMorris · Postby **BradleyMorris** » Tue Jan 04, 2011 5:02 pm

I want to use these designations in order to command eviews to select all of the "profitable firms" and test whether their returns are more influenced by a certain factor than unprofitable firms' return.

So ideally I could tell eviews to select firm A,E&F (categorized as profitable) run the regression and subsequently select B,C,D&G (unprofitable) and do the same, after which I can do some comparing..

Once again, the problem I'm having now is if i tell eviews: take all firms making more than 10% profit, it in some years takes A,B and C and then doesn't because obviously, these things change, hence my initial question on categories based on averages.

EViews Glenn · Postby **EViews Glenn** » Tue Jan 04, 2011 6:07 pm

Thanks Bradley, that helps...

The first principle here is that you don't want to be sorting data if at all possible (especially in a panel context). If I'm understanding correctly what you want to do, there are a number of ways to proceed. In most cases, the following simple steps should do the trick.

First, create a new series containing the average income for each of the firms. The simplest way to do this is to use the @meansby function to generate a new series that matches the firm average to each of the observations in the firm.

Code: Select all

series avginc = @meansby(inc, @crossid)

This gets you the basic data on which you want to do your categorizing.

Next, we want to get the ranks for these observations. We could do this on the entire sample, but it turns out to be convenient to do this just for the first observations in each firm. (The following assumes that the first observation for each firm is available--if not, we can do something like this but it would be a bit harder). Set the sample to the first observation in each cross section and obtain ranks for the mean values (across firms). Then use these ranks for populate all of the remaining observations using the lag operator:

Code: Select all

smpl if @cellid=1
series rankinc = @ranks(avginc, a, f)
smpl if @cellid>1
rankinc = rankinc(-1)

Now you are home-free, the new RANKINC series is an indicator for the ranking of the mean income (from low-to-high with tied-ranks assigned the rank of the first of the tie-group).

It may be desirable to scale RANKINC by the number of groups to get the scaled ranks...

Code: Select all

smpl @all
rankinc = rankinc / @max(@groupid(@crossid))

You can use this RANKINC series to set samples that will allow you to restrict your analysis to subsets of firms. For example, to set the sample to the lowest 1/3 of firms, you can use

Code: Select all

smpl if rankinc<=1/3

To set to the middle third

Code: Select all

smpl if rankinc>1/3 and rankinc<=2/3

and so forth.

You could also use your variable to create dummy variables

Code: Select all

series lowfirm = rankinc<=1/3
series medfirm = rankinc>1/3 and rankinc<=2/3
series highfirm = 1 - lowfirm - medfirm

and use that in, say, regression analysis. For more (or more sophisticated) categories, you could use the classify series proc to recode your series into a different category series. I would do that on AVGINC using the first observations in each cross-section as above, then use the lag operator to populate the remainder.

As to your third issue...just replace the @meansby line above with one that computes the "mean" for the single period:

Code: Select all

series avginc = @meansby(inc, @crossid, "2000m01 2000m01")

BradleyMorris · Postby **BradleyMorris** » Thu Jan 06, 2011 3:54 am

Thanks a lot Glenn!

Unfortunately already in the first step i'm running into problems with "syntax errors"

I almost literally copied your formula, and after that I tried what seemed to me to be similar: series meanlstat = @meansby(lstat, @crossid, "@all") as taken from the eviews handbook.

I have some missing values, could this be a cause?

Thanks, Bradley

BradleyMorris · Postby **BradleyMorris** » Thu Jan 06, 2011 3:56 am

Using version 7.1 should that matter..

BradleyMorris · Postby **BradleyMorris** » Thu Jan 06, 2011 4:39 am

Okay, figured this one out myself, yes finally haha, anyways I should have left out "series" in series averageinc.! -I Bet that's valuable information right there.

BradleyMorris · Postby **BradleyMorris** » Thu Jan 06, 2011 6:00 am

Alright, basically I got everything to work, resulting in the end in three series which I created, lowincfirms, medincfirms and highincfirms.

When I then open these series I get the long list of firms, with either a 0 or a 1 in front of the first firm listing, according to its categorization.

So far, so good, and thanks Glenn for helping me this far, however I do have a next issue I would like to solve.

I would like to look at the lowincfirms' returns over the years and do some tests on that. Is there a way to tell eviews to select all the firms (and their return stats!) which have the dummy variable 1 in for instance lowincfirms?

What I tried doing first is setting parameters in the general sample. I have 20 000 observations, and then when I selected rankinc<= 1/3 the result was that my sample shrank from 20 000 to 45 (1/3 of the amount of firms), however, what I would like is that is results into appr. 6666 observations (=20 000/3).

Thanks in advance!

EViews Glenn · Postby **EViews Glenn** » Mon Jan 10, 2011 10:42 am

I'm a little confused. If you do a frequency (or even easier, an average) on your LOWFIRM, MEDFIRM, and HIGHFIRM dummies, what do you get?

BradleyMorris · Postby **BradleyMorris** » Fri Feb 11, 2011 5:25 am

Hey Glenn,

sorry for the long break, I had my an exam, for which I took a study break, now that I passed that I'm back and I'll be actively posting again until I finish this damned project. To get back to the issue:

If i take the average of the low/med/high firms without adjusting the sample (to if lowfirm = 1 etc), I get a mean of 0.32 in my lowfirms max of 1 min of 0 and 152 observations. The others groups (med/highfirms) show similar figures.

When I then adjust the sample to lowfirm=1 All the discriptives show 0.0000 of med/high firms (50/51 observations) and the lowfirm shows all descriptives at 1.000 with 51 observations. This was what I expected, and it makes sense.

My point that I am trying to get at is the following: after the categorization I would indeed like to run regressions with the 3 groups. So what I would like to do is select all the lowfirms and run the regressions with my independent/dependent variables. I'm measuring the influence of Sentiment on stock return. What I would like to see is whether small firms are differently influence by sentiment than large firms, and therefore I made the categorization.

I hope it is clear to you what I want to do, if not just let me know what you're missing, I'll try to clarify.

Thanks again for your earlier help,

Bradley

EViews Glenn · Postby **EViews Glenn** » Fri Feb 11, 2011 10:10 am

I skimmed through the earlier discussion so I may not have gotten all of the details right, but here goes.

My understanding is that you have about 20,000 observations in a panel data structure.
We've created a set of dummy variables for all 20,000 observations where each firm is categorized as low-medium-high.
Roughly 1/3 of the firms are in each category.

Now your last descriptive statistic should have had 20,000 observations in the sample...It appears that you only have 152. That's what we have to figure out. If you set the sample to @all and look at the dummies, are there missings? Are there the same missings in RANKINC?

BradleyMorris · Postby **BradleyMorris** » Tue Feb 15, 2011 3:23 am

The problem I am getting is that the ranking shows very many NA values if I do it the way you mentioned earlier. To be exact, of the 132 observations only 1 shows a value, namely the cellid = 1. So then if i select the large companies, the rest of the observations just fall away. What I've tried to do now is do the ranking step without the cellid =1 step, so that all values will be filled in. The problem with this is that they don't show 1-152, but much higher numbers: from appr. 855-20000, and per company the same value is shown 132 times. What I am therefore guessing is that Eviews is showing me the ranking relative to the total amount of observations. Do you know how to solve this and turn it into the classical ranking of 1-152 or relative 0.0 - 1.0? I'll be trying too!

BradleyMorris · Postby **BradleyMorris** » Tue Feb 15, 2011 3:47 am

Quick practical solution (although not perfect!) = just leaving it the way I had said and in the sample sort for the lowest 6666 observations, i.e. lowest 1/3, then for the middle 6666 and so on. However if you know how to rearrange it into normal rankings from 1-152, I'd appreciate it if you tell me how.

EViews Glenn · Postby **EViews Glenn** » Tue Feb 15, 2011 10:29 am

Did you make certain to do the part of the commands I listed above...

Code: Select all

smpl if @cellid>1
rankinc = rankinc(-1)

This repeats the ranks for the firm for non-first-cell observations.

EViews.com

sorting panel data variables based on averages

sorting panel data variables based on averages

Re: sorting panel data variables based on averages

Re: sorting panel data variables based on averages

Re: sorting panel data variables based on averages

Re: sorting panel data variables based on averages

Re: sorting panel data variables based on averages

Re: sorting panel data variables based on averages

Re: sorting panel data variables based on averages

Re: sorting panel data variables based on averages

Re: sorting panel data variables based on averages

Re: sorting panel data variables based on averages

Re: sorting panel data variables based on averages

Re: sorting panel data variables based on averages

Re: sorting panel data variables based on averages

Re: sorting panel data variables based on averages

Who is online