memory use in recursive tree algorithm

Question 1

I have code that builds a map whose keys are doubles (the F-test value between two clusters, which requires the residual sum of squares) and whose mapped values are cluspair, a pair of objects of the Cluster class that I created. The map stores the F-test values between all clusters so that I don't have to repeat the calculation at every step. By the way, a cluster is a tree structure: every cluster contains two subclusters, and the stored values are 70-dimensional vectors.

The problem is that, to calculate the RSS, I need a recursive function that finds the distance between every element of the cluster and the cluster mean, and this seems to consume an enormous amount of memory. When I build the same map with the plain distance between the means of two clusters as the key, the program uses minimal memory, so I think the increase is caused by the calls to the recursive function rss(). What should I do to manage the memory use in the code below? In its current implementation the system runs out of memory, and Windows closes the application, reporting that the system ran out of virtual memory.

The main code:

map<double,cluspair> createRSSMap( list<Cluster*> cluslist )
{
    list<Cluster*>::iterator it1;
    list<Cluster*>::iterator it2;
    map<double,cluspair> rtrnmap;
    // Stop one before the end: the last cluster has no partner after it.
    for(it1=cluslist.begin(); it1!= --cluslist.end() ;it1++)
    {
        it2=it1;
        ++it2;
        cout << ".";
        list<Cluster*>::iterator itc;
        double cFvalue=10000000000000000000;
        double rIt1 = (*it1)->rss();
        for(int kk=0 ; it2!=cluslist.end(); it2++)
        {
            // Temporary merged cluster; its constructor allocates a node with new.
            Cluster tclustr ((*it1) , (*it2));
            double r1 = tclustr.rss();
            double r2= rIt1 + (*it2)->rss();
            int df2 = tclustr.getNumOfVecs() - 2;
            double fvalue = (r1 - r2) / (r2 / df2);
            if(fvalue<cFvalue)
            {
                cFvalue=fvalue;
                itc=it2;
            }
        }
        cluspair clp;
        clp.c1 = *it1;
        clp.c2 = *itc;
        bool doesexists = (rtrnmap.find(cFvalue) != rtrnmap.end());
        // As posted: the next lines use rtrnmap where createDistMap uses doesexists.
        while(rtrnmap)
        {
            cFvalue+= 0.000000001;
            rtrnmap= (rtrnmap.find(cFvalue) != rtrnmap.end());
        }
        rtrnmap[cFvalue] = clp;
    }
    return rtrnmap;
}

and the implementation of the function rss():

double Cluster::rss()
{
    return rss(cnode->mean);
}

double Cluster::rss(vector<double> &cmean)
{
    if(cnode->numOfVecs==1)
    {
        return vectorDist(cmean,cnode->mean);
    }
    else
    {
        return ( ec1->rss(cmean) + ec2->rss(cmean) );
    }
}

Much thanks in advance. I really don't know what to do at this point.

Below is the code that I use to create a map whose keys are the simple Euclidean distance between two cluster means. As I said above, it is quite similar and uses minimal memory. It differs only in how the key is calculated: instead of the recursive calculation, it takes the simple distance between the means of two clusters. I hope it helps to identify the problem.

map<double,cluspair> createDistMap( list<Cluster*> cluslist )
{
    list<Cluster*>::iterator it1;
    list<Cluster*>::iterator it2;
    map<double,cluspair> rtrnmap;
    for(it1=cluslist.begin(); it1!= --cluslist.end() ;it1++)
    {
        it2=it1;
        ++it2;
        cout << ".";
        list<Cluster*>::iterator itc;
        double cDist=1000000000000000;
        for(int kk=0 ; it2!=cluslist.end(); it2++)
        {
            double nDist = vectorDist( (*it1)->getMean(),(*it2)->getMean());
            if (nDist<cDist)
            {
                cDist = nDist;
                itc=it2;
            }
        }
        cluspair clp;
        clp.c1 = *it1;
        clp.c2 = *itc;
        bool doesexists = (rtrnmap.find(cDist) != rtrnmap.end());
        while(doesexists)
        {
            cDist+= 0.000000001;
            doesexists = (rtrnmap.find(cDist) != rtrnmap.end());
        }
        rtrnmap[cDist] = clp;
    }
    return rtrnmap;
}

Cluster constructor:

Cluster::Cluster (Cluster *C1, Cluster *C2)
{
    ec1=C1;
    ec2=C2;
    node* cn = new node;
    cn->numOfVecs = C1->cnode->numOfVecs + C2->cnode->numOfVecs;
    double nov = cn->numOfVecs;
    double div = (1 / nov);
    cn->mean = scalarMultVect(div,
        vectAdd(scalarMultVect(C1->cnode->numOfVecs,C1->cnode->mean),
                scalarMultVect(C2->cnode->numOfVecs,C2->cnode->mean)));
    mvect tmv;
    tmv.stock="";
    cn->v1 = tmv;
    cnode = cn;
}

Question 2

You haven't given us enough code to reproduce the problem, but it looks as if you could save a lot of calculation (and maybe memory) by storing these RSS values and not recomputing them so many times.

Question 3

The loop finds the F-values between cluster pairs. When the loop reaches the last element, there is no cluster left after it to pair it with.

Question 4

Does the Cluster::Cluster(const Cluster* pc1, const Cluster* pc2) constructor do a deep copy of the entire tree nodes of the two input clusters? Please explain its memory usage, and/or post the constructor's code.

Question 5

@Beta vectorDist is the Euclidean distance between two mean vectors, ec1 and ec2 are the two subclusters, and cnode->mean gives the mean of the current cluster. If you want to know anything else I would gladly post more code; I just didn't want to fill the page with unnecessary code. As for storing the RSS values: this is a tree-building process, so the RSS values would not stay the same; they would change with the means of the new parent clusters.

Question 6

@Beta and @aristos: the RSS values of a single cluster (the r2) can be precomputed. But I think the RSS values of a combination of clusters will have to be computed recursively.
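The recursion for a combined cluster may be avoidable. A minimal sketch, assuming RSS here means the sum of squared Euclidean distances from each vector to the cluster mean (the posted vectorDist is not shown, so this is an assumption): keep per cluster the count, the component-wise sum, and the sum of squared norms. The RSS then follows from those three values, and merging two clusters only adds them. ClusterStats, makeLeaf, mergeStats and rss below are hypothetical names, not part of the posted code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-cluster summary; not part of the posted Cluster class.
struct ClusterStats {
    std::size_t n = 0;          // number of vectors in the cluster
    std::vector<double> sum;    // component-wise sum of the vectors
    double sumSq = 0.0;         // sum of squared norms of the vectors
};

// Summary for a leaf cluster holding a single vector.
ClusterStats makeLeaf(const std::vector<double>& v) {
    ClusterStats s;
    s.n = 1;
    s.sum = v;
    for (double x : v) s.sumSq += x * x;
    return s;
}

// Summary for the union of two disjoint clusters: O(d) work, no tree walk.
ClusterStats mergeStats(const ClusterStats& a, const ClusterStats& b) {
    assert(a.sum.size() == b.sum.size());
    ClusterStats s;
    s.n = a.n + b.n;
    s.sum.resize(a.sum.size());
    for (std::size_t i = 0; i < a.sum.size(); ++i) s.sum[i] = a.sum[i] + b.sum[i];
    s.sumSq = a.sumSq + b.sumSq;
    return s;
}

// RSS = sum ||x - mean||^2 = sumSq - ||sum||^2 / n  (assumes squared distances).
double rss(const ClusterStats& s) {
    assert(s.n > 0);
    double normSq = 0.0;
    for (double x : s.sum) normSq += x * x;
    return s.sumSq - normSq / static_cast<double>(s.n);
}
```

With this layout, the r1 term for a candidate pair would be rss(mergeStats(a, b)) with no temporary tree node. The subtraction can lose precision when the vectors lie far from the origin, so centring the data first may be worth considering.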

Question 7

You've asked exactly the same question before: Enormous Increase In the Use Of Memory

rtrnmap= (rtrnmap.find(cFvalue) != rtrnmap.end()); does not make sense.
You were told to pass data through references
You were told to add logging information and see how many iterations are performed.

A few comments:

It is a bad idea to have a map with a double as key as you may find yourself unable to retrieve an element due to a tiny difference in the double.
Add only a few elements in the collection and manually go through all the functions in the debugger. You'll get to "see" what gets executed and can immediately see if the actual execution flow matches your expectations
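On the double-as-key point, one alternative to nudging the key until it is unique is a container that accepts equal keys. A minimal sketch, assuming the map is only used to keep candidate pairs ordered by score; ClusPair, ScoreMap, addPair and bestPair are hypothetical names, and plain ints stand in for the Cluster pointers.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Placeholder for the posted cluspair; ints stand in for Cluster* here.
struct ClusPair {
    int c1;
    int c2;
};

using ScoreMap = std::multimap<double, ClusPair>;

// Insert without altering the score; entries with equal scores coexist.
void addPair(ScoreMap& scores, double score, int c1, int c2) {
    scores.insert(std::make_pair(score, ClusPair{c1, c2}));
}

// The pair with the smallest score is the first element of the ordered container.
ClusPair bestPair(const ScoreMap& scores) {
    assert(!scores.empty());
    return scores.begin()->second;
}
```

Because the scores are stored unmodified, nothing ever has to be looked up again by an adjusted floating-point key.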

And please don't double post your questions (even if you use different users).

EDIT:

We all assumed a proper destructor. Make sure you deallocate any memory you explicitly allocate with new or new[] with delete or delete[] as appropriate.
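To illustrate the destructor point: the posted constructor allocates a node with new, and createRSSMap builds one temporary Cluster per pair, so without a matching delete every pair would leak one node. A minimal sketch, assuming the cluster owns its node and only borrows its two children; Node and OwningCluster are hypothetical reduced types, and the live counter exists only to make the effect visible.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical reduced node; the counter tracks how many are alive.
struct Node {
    static int live;
    std::size_t numOfVecs;
    Node() : numOfVecs(1) { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

// Hypothetical reduced cluster: children are borrowed, the node is owned.
class OwningCluster {
public:
    // Leaf cluster holding a single vector.
    OwningCluster() : ec1(nullptr), ec2(nullptr), cnode(new Node) {}

    // Merged cluster over two existing clusters.
    OwningCluster(const OwningCluster* c1, const OwningCluster* c2)
        : ec1(c1), ec2(c2), cnode(new Node) {
        assert(c1 && c2);
        cnode->numOfVecs = c1->size() + c2->size();
    }

    std::size_t size() const { return cnode->numOfVecs; }
    bool isLeaf() const { return !ec1 && !ec2; }

private:
    const OwningCluster* ec1;
    const OwningCluster* ec2;
    std::unique_ptr<Node> cnode;  // released automatically when the cluster is destroyed
};
```

A hand-written destructor containing delete cnode would serve the same purpose; std::unique_ptr is used here only because it also prevents accidental copies from freeing the node twice.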

Question 8

Sorry for that, I just thought the old post would go unseen. doesexists = (rtrnmap.find(cDist) != rtrnmap.end()) is true if the map already contains an element with the key that is about to be added. In that case a tiny amount of 0.00000001 is added to the key, so no info is lost. The counter would give a value with a maximum of 2 to the power 14000. I'm aware that this is a huge number, but that is the nature of the data I have to deal with. The tree is quite large.

Question 9

@aristos : You have a limited amount of stack space for the recursion (and your function is not tail call optimisable).
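If stack depth does turn out to be the limit, the same traversal can be written with an explicit stack so that pending nodes live on the heap. A minimal sketch; TreeNode, sqDist and rssIterative are hypothetical stand-ins for the posted types, and squared Euclidean distance is an assumption because vectorDist is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <vector>

// Hypothetical stand-in for the posted Cluster: a leaf has no children.
struct TreeNode {
    std::vector<double> mean;
    const TreeNode* left = nullptr;
    const TreeNode* right = nullptr;
};

// Squared Euclidean distance (an assumption; the posted vectorDist is not shown).
double sqDist(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
    return d;
}

// Sums the distance from every leaf to `centre` without recursive calls.
double rssIterative(const TreeNode& root, const std::vector<double>& centre) {
    double total = 0.0;
    std::stack<const TreeNode*> pending;  // heap-backed, so depth is not bound by the call stack
    pending.push(&root);
    while (!pending.empty()) {
        const TreeNode* n = pending.top();
        pending.pop();
        if (!n->left && !n->right) {
            total += sqDist(centre, n->mean);  // leaf: contributes one term
        } else {
            if (n->left) pending.push(n->left);
            if (n->right) pending.push(n->right);
        }
    }
    return total;
}
```

The amount of work is unchanged; only where the bookkeeping is stored differs.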

Question 10

@aristos : are you sure you know what 2 to the power 14000 means?

Question 11

And what do you mean "no info will be lost"? You do realize that "rtrnmap[cDist] = clp;" will add a new element in the rtrnmap every time it is called, right? (Because of the loop above which finds a cDist not in the map)

Question 12

@andrei Well, I have about 14k vectors to build a cluster tree from. 2 to the power 14000 minus one would be equal to the number of recursions. Is there no way to optimise that kind of problem? By "no info will be lost" I meant that there would be no overwriting. If the given key is already in the map, the while loop would find a double value close to cDist which is not yet a key of the map.
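For scale, a binary tree in which every internal node has exactly two children and which has n leaves contains 2n - 1 nodes, so one full pass over 14,000 leaves visits 27,999 nodes, and the recursion is never more than n levels deep. A minimal sketch that checks the count on a small tree; BinNode, countNodes and nodesForLeaves are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal binary tree node: a leaf has two null children.
struct BinNode {
    const BinNode* left;
    const BinNode* right;
};

// Number of nodes a full traversal touches, counted by walking the tree.
std::size_t countNodes(const BinNode* n) {
    if (n == nullptr) return 0;
    return 1 + countNodes(n->left) + countNodes(n->right);
}

// Closed form for a tree in which every internal node has exactly two children.
std::size_t nodesForLeaves(std::size_t leaves) {
    assert(leaves > 0);
    return 2 * leaves - 1;
}
```

For example, a tree built from three leaves has five nodes by either count.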

Andrei · Accepted Answer · 2011-06-19 14:08:01Z
