Assigning values based on probability of occurrence

Post by **jason_ll** » Tue Apr 23, 2013 1:54 pm

Hello all,
I'm looking for a more efficient way to do something, and I hope someone can help out.

What I'm trying to do is have EViews assign a series a certain value, based on a list of probabilities.

For example, let's assume that a new product can come in three colors:

1. Blue: 10%
2. White: 50%
3. Green: 40%

and the % figure indicates the probability that a randomly chosen chair will fall into that category. Is there an easy way for this to be done in EViews? For such a simple example, I can easily do the following:

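Code:

' Draw a uniform random number between 0 and 1 for each observation
Series b = rnd
' Cut points are cumulative: 0.1 (Blue), 0.1 + 0.5 = 0.6 (White), remainder (Green)
Series a = @recode(b<0.1,1,@recode(b<0.6,2,3))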

which will give me the correct distribution.
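A nested comparison like this works on cumulative probabilities: with shares of 10%, 50%, and 40%, the cut points are 0.1 and 0.6. The sketch below is Python, not EViews, and is only meant to show how the cut points follow from the list of probabilities; the function name `assign` is made up for illustration.

```python
from itertools import accumulate

# Shares from the post: Blue 10%, White 50%, Green 40% (codes 1, 2, 3).
probs = [0.10, 0.50, 0.40]

# Running totals give the upper cut point of each category.
cuts = list(accumulate(probs))

def assign(u, cuts):
    """Return the 1-based category whose interval contains u (0 <= u < 1)."""
    for code, upper in enumerate(cuts, start=1):
        if u < upper:
            return code
    return len(cuts)  # guards against rounding in the last cut point

print(assign(0.05, cuts), assign(0.30, cuts), assign(0.70, cuts))
```

The same loop works unchanged for 200 categories, because the cut points are computed rather than typed by hand.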

But imagine I had several hundred rows like this. What would be an efficient way to do this?

Thanks a lot,

Post by **EViews Glenn** » Tue Apr 23, 2013 5:15 pm

From your brief description, I'm not sure why there would be several hundred rows required, but to simplify I suspect you could use the classify proc of a series: Proc/Generate by classification... from the menu, and series_name.classify using the command form.

Post by **jason_ll** » Wed Apr 24, 2013 6:24 am

EViews Glenn wrote: "From your brief description, I'm not sure why there would be several hundred rows required, but to simplify I suspect you could use the classify proc of a series: Proc/Generate by classification... from the menu, and series_name.classify using the command form."

Thank you, but it's not a matter of displaying the data in a table. I am performing microsimulations, so I don't just want to show the data, I want my EViews program to generate it randomly. And I think I should have pointed out that I'm using panel data.

I am trying to populate a series in a large panel dataset with some values, with each possible value having a unique probability.

The reason I could have hundreds of rows is that I could have 200 possible colors for each chair, for example. (As another example, imagine I'm populating the series with people's heights and I know the probability of each height occurring: 6% of the population will be exactly 6 feet tall, 15% will be exactly 5 feet tall, etc.)

Now say I was using panel data with 10,000 cross-sections (i.e. 10,000 different chairs) and I knew the exact probability of a chair having a certain color. I would like a program that would populate my series "chair_color" using the probabilities I have.

Again, the way I'm doing it now is like this:

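Code:

' Draw a uniform random number between 0 and 1 for each observation
Series b = rnd
' Cut points are cumulative: 0.1 (Blue), 0.1 + 0.5 = 0.6 (White), remainder (Green)
Series a = @recode(b<0.1,1,@recode(b<0.6,2,3))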

But if there were 200 possibilities (instead of just 3), it would take forever to write.

Please let me know if anything is not clear and thanks again!

Post by **EViews Glenn** » Wed Apr 24, 2013 9:57 am

Clearer, but not entirely clear.

Let's take the non-panel case to start. Suppose that you have 6 categories that you want to assign observations into randomly with some set of probabilities. You can create a vector of 5 cumulative probability limits (5 cut points define 6 categories) and use this vector to "cut up" a randomly generated uniform. For example, you could do...

Code:

vector probs = @fill(.05, .25, .5, .75, .8)
series r = rnd
r.classify(method=limits) probs

This will create a new series R_CT containing value-mapped integers, obtained by assigning each observation to a category using the defined grid. Turn off the map to see the integers. (Note: it looks like there's no way to not provide the map, which I'll look into correcting.)

So 200 will take time to create the vector, but beyond that it's a single command.
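To make the "cut up a uniform" idea concrete, here is a small Python sketch (not EViews code) that bins a uniform draw using the same five limits and works out the probability each bin receives. How EViews treats a value that falls exactly on a limit is not stated above, so the left-closed bins used here are an assumption, and the function name `classify` is only illustrative.

```python
from bisect import bisect_right

# Cut points copied from the reply; five limits split [0, 1) into six bins.
limits = [0.05, 0.25, 0.5, 0.75, 0.8]

def classify(u, limits):
    """Return the 1-based bin of u; bin 1 is u < limits[0], and so on.

    Assumption: a value equal to a limit goes to the higher bin.
    """
    return bisect_right(limits, u) + 1

# Probability mass each bin receives when u is uniform on [0, 1).
edges = [0.0] + limits + [1.0]
implied = [round(hi - lo, 10) for lo, hi in zip(edges, edges[1:])]

print(implied)
print(classify(0.02, limits), classify(0.30, limits), classify(0.90, limits))
```

Differencing the limits shows which category probabilities a given grid implies, which is a quick way to check a long vector of cut points before using it.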

I'm not sure from your description how the panel fits into this. You say you have 10,000 chairs, but how is this information used?

Post by **jason_ll** » Wed Apr 24, 2013 10:28 am

Ok, I think this is getting a little closer to what I'm looking for, but not quite. I'll try to describe in greater detail what I'm attempting to do.

I have a panel of 10,000 cross-sections and 100 years.
Each cross-section represents a person.

Each series represents one of the person's attributes.
One of the attributes is height.

So I have a series in this panel called "height".

This series contains the height of each person, in centimeters.

So height is discrete, but can go all the way from 140 cm to 220 cm.

Now here's the thing. There is some turnover in the series: every year, each person in my sample has a chance of dropping out and being replaced by a new entrant. So if person 1 (crossid = 1) had a height of 185 cm and then left the sample, he needs to be replaced, and the replacement has a different height. Think of it as a company, where new employees are occasionally entering.

When a new employee enters, I have to assign a height to him or her. If I have 1,000 new employees, they all have to be assigned a height.

So I want my series "height" to generate a height for each new entrant. I want this height to be randomly generated, and based on the probability distribution of heights in the country.

I can easily create a vector (or a matrix) as you suggested, let's call it "probs", which contains the probability of being assigned a given height. It looks like this (the first column is the height, the second is the probability of having this height):

140 1%
141 1%
.
.
.
150 2%
151 2%
.
.
180 5%
181 3%
.
.
.
220 0.01%

But then where do you go from there? I'm trying to look for a way for these heights to be assigned (generated) in an EViews program.
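One way to picture the step from a (value, probability) table to generated values is a weighted draw. The Python sketch below is not EViews code, and the three-row table in it is made up purely for illustration; it is not the height distribution described above.

```python
import random
from collections import Counter

# Hypothetical three-row table (height in cm, probability); NOT the poster's data.
table = [(150, 0.2), (170, 0.5), (190, 0.3)]

heights = [h for h, _ in table]
weights = [p for _, p in table]

rng = random.Random(12345)  # fixed seed so the run is repeatable
draws = rng.choices(heights, weights=weights, k=10_000)

# Compare the simulated shares with the probabilities in the table.
counts = Counter(draws)
shares = {h: counts[h] / len(draws) for h in heights}
print(shares)
```

With a full table, only the `table` list changes; the draw itself stays a single call.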

Once again, thanks a lot.

Post by **EViews Glenn** » Wed Apr 24, 2013 3:29 pm

Two questions. If I am understanding correctly, after getting the new individuals, the workfile now has 11,000 observations, the first 10,000 of which have heights that are given? Where are you getting the probs--from the first 10,000?

Post by **jason_ll** » Thu Apr 25, 2013 7:27 am

EViews Glenn wrote: "Two questions. If I am understanding correctly, after getting the new individuals, the workfile now has 11,000 observations, the first 10,000 of which have heights that are given? Where are you getting the probs--from the first 10,000?"

Well, from real data. I have a starting sample with attributes that I'm directly importing into EViews.

And yes, the sample is supposed to increase with time (10k to 11k for example). But there are also times where an existing cross-section will require a change in attributes (think of it as an employee leaving a company and needing to be replaced by another with different characteristics).
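The replacement case amounts to drawing new values only for the observations flagged as new entrants and leaving everyone else untouched. The following is a minimal Python sketch of that idea, not EViews code; the flag `is_new`, the function `fill_new_entrants`, and all the numbers are hypothetical.

```python
import random

def fill_new_entrants(height, is_new, values, weights, rng):
    """Return a copy of `height` with a fresh draw wherever `is_new` is True."""
    out = list(height)
    idx = [i for i, flag in enumerate(is_new) if flag]
    # One weighted draw per new entrant; existing people keep their height.
    new_vals = rng.choices(values, weights=weights, k=len(idx))
    for i, v in zip(idx, new_vals):
        out[i] = v
    return out

# Hypothetical data: four people, two of whom were replaced this year.
height = [185, 172, 160, 178]
is_new = [True, False, False, True]
rng = random.Random(0)
updated = fill_new_entrants(height, is_new, [150, 170, 190], [0.2, 0.5, 0.3], rng)
print(updated)
```

In a panel setting the same pattern would be applied year by year, with the flag marking whoever entered in that year.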
