Sorting Efficiency Question

Post by **CharlieEVIEWS** » Sun Oct 19, 2014 7:57 pm

Dear all,

I am seeking help with a large sorting problem. I have a vector, call it VECTOR1, which is large: e.g. (100,000x1). Every element is a different real number. I want to be able to easily return the positions of the elements holding, say, the highest (or lowest) X values (X<100,000). That is, I do not want the values themselves; what is of interest is the element number (the location) at which each occurs in VECTOR1. For example, if element 28,282 of VECTOR1 contains the lowest value in the whole vector, I want element 1 of the new vector (VECTOR3 in the example below) to contain the number 28,282.

The way I am currently going about this (where I want the lowest elements of VECTOR1) is something like:

Code: Select all


vector VECTOR2 = @sort(VECTOR1) 'Sort the original vector of values
vector(X) VECTOR3 'Define our output vector
for !k = 1 to X
  for !p = 1 to 100000
    if VECTOR1(!p) = VECTOR2(!k) then
      VECTOR3(!k) = !p 'Record where the k-th sorted value sits in VECTOR1
      !p = 100000 'End the inner loop early as the condition has been satisfied
    endif
  next
next

However, as you may imagine (with large !p and !k loops), this is extremely computationally burdensome. Does anyone know of a simpler way to obtain VECTOR3, i.e. the positions of the top or bottom elements of VECTOR1 rather than the actual values? Any help would be very useful, as this issue is causing an otherwise extremely promising project to drag its feet considerably, taking many hours/days to estimate, since it is nested within an even larger set of loops.

Best wishes, and again - eternally grateful for your help,

Charlie

Post by **EViews Gareth** » Mon Oct 20, 2014 8:23 am

Make a vector running from 1 to X (i.e., has 1 in the first row, 2 in the second row, etc...). Then use the capplyranks function to sort that vector by the ranks of vector1.
Code: Select all


create u 10
vector(100) vector1
rnd(vector1) 'Fill vector1 with random numbers
!x = 100

'Original approach: search for each sorted value in turn
vector VECTOR2 = @sort(VECTOR1) 'Sort the original vector of values
vector(!X) VECTOR3 'Define our output vector
for !k = 1 to !X
  for !p = 1 to 100
    if VECTOR1(!p) = VECTOR2(!k) then
      VECTOR3(!k) = !p
      !p = 100 'End the loop early as the condition has been satisfied
    endif
  next
next

'Rank-based approach: reorder a 1..!x counter by the ranks of vector1
vector(!x) test

for !i=1 to !x
  test(!i) = !i
next
test=@capplyranks(test,@ranks(vector1))

'The two results should match
show test
show vector3
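For readers who want to see why reordering a 1..N counter by ranks gives the positions of the sorted values, here is a small language-neutral sketch in Python (not EViews code). The helpers `ascending_ranks` and `apply_ranks` are hypothetical stand-ins for the roles that `@ranks` and `@capplyranks` play in the program above, as inferred from this thread only; the sketch assumes all values are distinct.

```python
def ascending_ranks(values):
    """Rank of each element (1 = smallest). Assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def apply_ranks(items, ranks):
    """Move items[i] into slot ranks[i] (slots are 1-based)."""
    out = [None] * len(items)
    for i, r in enumerate(ranks):
        out[r - 1] = items[i]
    return out


def positions_by_rank(values):
    """Rank-based approach: reorder the counter 1..N by the ranks of values."""
    counter = list(range(1, len(values) + 1))
    return apply_ranks(counter, ascending_ranks(values))


def positions_by_search(values, x):
    """Nested-loop approach from the question, for comparison."""
    sorted_values = sorted(values)
    result = []
    for k in range(x):
        for p, v in enumerate(values, start=1):
            if v == sorted_values[k]:
                result.append(p)
                break  # stop searching once the value is found
    return result


values = [0.42, 0.07, 0.93, 0.15, 0.68]
positions = positions_by_rank(values)  # 1-based positions, smallest value first
lowest_two = positions[:2]             # positions of the two lowest values
```

Taking the first X entries of the result gives the positions of the X lowest values, and the last X entries give the positions of the X highest. The rank-based version needs one sort, whereas the nested search scans the vector once per requested value.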

Post by **CharlieEVIEWS** » Mon Oct 20, 2014 12:44 pm

Upon applying it to the problem of interest (using the command vector3 = @capplyranks(test,vector1)), I get the 'Invalid permutation index vector' error mentioned in another thread.

Attached is a workfile with test, vector1 and vector3, as in Gareth's code above. I can get it to work with a smaller test where X=15, but when I increase this to 62,220, I get the error mentioned.

Is this to do with repeated entries in vector1, and if so, is there a work-around to this?

Best wishes, and thanks again for continued help.

Charlie

Post by **EViews Gareth** » Mon Oct 20, 2014 1:14 pm

Yep, you need to set the tie handling to something other than averaging (since averaging the ranks of tied values will result in non-integers).

Code: Select all


vector vector3 = @capplyranks(test, @ranks(vector1,"a","r"))
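To illustrate why averaged ranks cannot serve as a permutation, here is a small Python sketch (not EViews code). The function names are hypothetical, and the tie-breaking rule used in `distinct_ranks` (earlier position gets the lower rank) is an assumption chosen to keep the example deterministic; it is not claimed to be what the "r" option above does.

```python
def average_ranks(values):
    """Ascending ranks where tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = ((start + 1) + (end + 1)) / 2  # average of the 1-based ranks
        for j in range(start, end + 1):
            ranks[order[j]] = shared
        start = end + 1
    return ranks


def distinct_ranks(values):
    """Ascending ranks with ties broken by position (assumed rule)."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # stable sort
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def is_permutation(ranks):
    """True if ranks contains each of 1..N exactly once."""
    return sorted(ranks) == list(range(1, len(ranks) + 1))


values = [0.5, 0.2, 0.5, 0.9]    # the two 0.5 entries are tied
averaged = average_ranks(values)  # tied entries share rank 2.5
broken = distinct_ranks(values)   # every entry gets its own integer rank
```

With averaging, the tied entries both receive 2.5, so ranks 2 and 3 are never assigned and the rank vector is not a valid set of slots. Any rule that gives tied values distinct integer ranks restores a valid permutation.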
