I have code that needs to build a map whose keys are doubles (the F-test value between two clusters, which requires calculating the residual sum of squares) and whose mapped values are of type cluspair, a pair of clusters of the class Cluster that I created. The map is meant to store the F-test values between all clusters so that I don't have to repeat the calculation at every step. BTW, a cluster is a tree structure where every cluster contains two subclusters, and the stored values are 70-dimensional vectors.
The problem is that, in order to calculate the RSS, I need recursive code that finds the distance of every element of the cluster from the mean of the cluster, and this seems to consume an enormous amount of memory. When I build the same map with the keys being the simple distance between the means of two clusters, the program uses minimal memory, so I think the increase in memory use is caused by the call to the recursive function rss. What should I do to manage the memory use in the code below? In its current implementation the system runs out of memory, and Windows closes the application saying that the system ran out of virtual memory.
The main code:
map<double,cluspair> createRSSMap( list<Cluster*> cluslist )
{
    list<Cluster*>::iterator it1;
    list<Cluster*>::iterator it2;
    map<double,cluspair> rtrnmap;
    for(it1=cluslist.begin(); it1!= --cluslist.end() ;it1++)
    {
        it2=it1;
        ++it2;
        cout << ".";
        list<Cluster*>::iterator itc;
        double cFvalue=10000000000000000000;
        double rIt1 = (*it1)->rss();
        for(int kk=0 ; it2!=cluslist.end(); it2++)
        {
            Cluster tclustr ((*it1) , (*it2));
            double r1 = tclustr.rss();
            double r2= rIt1 + (*it2)->rss();
            int df2 = tclustr.getNumOfVecs() - 2;
            double fvalue = (r1 - r2) / (r2 / df2);
            if(fvalue<cFvalue)
            {
                cFvalue=fvalue;
                itc=it2;
            }
        }
        cluspair clp;
        clp.c1 = *it1;
        clp.c2 = *itc;
        bool doesexists = (rtrnmap.find(cFvalue) != rtrnmap.end());
        while(rtrnmap)
        {
            cFvalue+= 0.000000001;
            rtrnmap= (rtrnmap.find(cFvalue) != rtrnmap.end());
        }
        rtrnmap[cFvalue] = clp;
    }
    return rtrnmap;
}
and the implementation of the function rss:
double Cluster::rss()
{
    return rss(cnode->mean);
}
double Cluster::rss(vector<double> &cmean)
{
    if(cnode->numOfVecs==1)
    {
        return vectorDist(cmean,cnode->mean);
    }
    else
    {
        return ( ec1->rss(cmean) + ec2->rss(cmean) );
    }
}
Much thanks in advance. I really don't know what to do at this point.
Below is the code that I use to create a map whose keys are the simple Euclidean distance between two cluster means. As I said above, it is quite similar and uses minimal memory. It differs only in the calculation of the fvalue: instead of the recursive calculation, it computes the simple distance between the means of two clusters. I hope it helps to identify the problem.
map<double,cluspair> createDistMap( list<Cluster*> cluslist )
{
    list<Cluster*>::iterator it1;
    list<Cluster*>::iterator it2;
    map<double,cluspair> rtrnmap;
    for(it1=cluslist.begin(); it1!= --cluslist.end() ;it1++)
    {
        it2=it1;
        ++it2;
        cout << ".";
        list<Cluster*>::iterator itc;
        double cDist=1000000000000000;
        for(int kk=0 ; it2!=cluslist.end(); it2++)
        {
            double nDist = vectorDist( (*it1)->getMean(),(*it2)->getMean());
            if (nDist<cDist)
            {
                cDist = nDist;
                itc=it2;
            }
        }
        cluspair clp;
        clp.c1 = *it1;
        clp.c2 = *itc;
        bool doesexists = (rtrnmap.find(cDist) != rtrnmap.end());
        while(doesexists)
        {
            cDist+= 0.000000001;
            doesexists = (rtrnmap.find(cDist) != rtrnmap.end());
        }
        rtrnmap[cDist] = clp;
    }
    return rtrnmap;
}
Cluster constructor:
Cluster::Cluster (Cluster *C1, Cluster *C2)
{
    ec1=C1;
    ec2=C2;
    node* cn = new node;
    cn->numOfVecs = C1->cnode->numOfVecs + C2->cnode->numOfVecs;
    double nov = cn->numOfVecs;
    double div = (1 / nov);
    cn->mean = scalarMultVect(div,vectAdd(scalarMultVect(C1->cnode->numOfVecs,C1->cnode->mean),scalarMultVect(C2->cnode->numOfVecs,C2->cnode->mean)));
    mvect tmv;
    tmv.stock="";
    cn->v1 = tmv;
    cnode = cn;
}
You've asked exactly the same question before: Enormous Increase In the Use Of Memory
- rtrnmap= (rtrnmap.find(cFvalue) != rtrnmap.end()); does not make sense.
- You were told to pass data through references
- You were told to add logging information and see how many iterations are performed.
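To illustrate the last two points, here is a minimal sketch, assuming a simplified stand-in for Cluster and a hypothetical helper named countPairIterations (neither is from the posted code). It takes the list by const reference, so the list is not copied on every call, and it logs how many times the inner loop body runs.

```cpp
#include <cstddef>
#include <iostream>
#include <iterator>
#include <list>

// Hypothetical stand-in for the Cluster class from the question.
struct Cluster { int id; };

// The list is taken by const reference, so calling this does not copy it.
// Returns (and logs) how many cluster pairs the nested loops visit.
std::size_t countPairIterations(const std::list<Cluster*>& cluslist)
{
    std::size_t iterations = 0;
    for (auto it1 = cluslist.begin(); it1 != cluslist.end(); ++it1) {
        // Start the inner loop one element past it1, as in the question.
        for (auto it2 = std::next(it1); it2 != cluslist.end(); ++it2) {
            ++iterations;
        }
    }
    std::clog << "inner loop body ran " << iterations << " times\n";
    return iterations;
}
```

For n clusters the count should come out as n(n-1)/2; if the logged number is far larger than that, the loops are not doing what you expect.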
A few comments:
- It is a bad idea to have a map with a double as key as you may find yourself unable to retrieve an element due to a tiny difference in the double.
- Add only a few elements in the collection and manually go through all the functions in the debugger. You'll get to "see" what gets executed and can immediately see if the actual execution flow matches your expectations
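One possible way around the double-key issue, sketched here under assumptions: if the map is only needed to keep pairs ordered by score, a std::multimap accepts duplicate keys, so there is no need to nudge the key by a small constant until it becomes unique. The names ScoreMap, addScore and bestPair are hypothetical, and Cluster and cluspair are simplified stand-ins for the types in the question.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical, simplified stand-ins for the types in the question.
struct Cluster { int id; };
struct cluspair { Cluster* c1; Cluster* c2; };

// A multimap keeps entries sorted by key and allows equal keys.
typedef std::multimap<double, cluspair> ScoreMap;

// Store a pair under its score; equal scores simply coexist.
void addScore(ScoreMap& scores, double fvalue, Cluster* a, Cluster* b)
{
    scores.insert(std::make_pair(fvalue, cluspair{a, b}));
}

// The pair with the smallest score is the first element.
const cluspair& bestPair(const ScoreMap& scores)
{
    assert(!scores.empty());
    return scores.begin()->second;
}
```

This never looks an element up by an exact double value; it only relies on ordering, which is the part of a floating-point key that is safe to depend on.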
And please don't double post your questions (even if you use different users).
EDIT:
We all assumed a proper destructor. Make sure you deallocate any memory you explicitly allocate with new or new[] with delete or delete[] as appropriate.
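The posted constructor allocates a node with new, and createRSSMap builds a temporary Cluster in every inner-loop iteration, so if no destructor deletes that node, each iteration would leak one node together with its 70-element mean vector. Below is a minimal sketch of a matching destructor. It assumes simplified versions of node and Cluster (the mean computation is omitted) and assumes that a merged cluster owns only its own node, not its two children; the static counter node::live exists only to make the effect visible.

```cpp
#include <vector>

// Hypothetical, simplified stand-in for the node type in the question.
struct node {
    int numOfVecs;
    std::vector<double> mean;
    static int live;                  // demo-only count of nodes currently allocated
    node() : numOfVecs(0) { ++live; }
    ~node() { --live; }
};
int node::live = 0;

class Cluster {
public:
    // Leaf cluster holding a single vector.
    explicit Cluster(const std::vector<double>& v)
        : ec1(nullptr), ec2(nullptr), cnode(new node)
    {
        cnode->numOfVecs = 1;
        cnode->mean = v;
    }

    // Merged cluster: borrows the two children and allocates its own node.
    // (Computing the merged mean is left out of this sketch.)
    Cluster(Cluster* c1, Cluster* c2)
        : ec1(c1), ec2(c2), cnode(new node)
    {
        cnode->numOfVecs = c1->cnode->numOfVecs + c2->cnode->numOfVecs;
    }

    // Free what this object allocated; the children are assumed not to be owned.
    ~Cluster() { delete cnode; }

    // Copying would make two objects delete the same node, so forbid it.
    Cluster(const Cluster&) = delete;
    Cluster& operator=(const Cluster&) = delete;

    int getNumOfVecs() const { return cnode->numOfVecs; }

private:
    Cluster* ec1;
    Cluster* ec2;
    node* cnode;
};
```

With this in place, a temporary such as tclustr releases its node when it goes out of scope at the end of each loop iteration. Holding the node in a std::unique_ptr, or storing it by value, would give the same result without a hand-written destructor.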
Does the Cluster::Cluster(const Cluster* pc1, const Cluster* pc2) constructor do a deep copy of the entire tree nodes of the two input clusters? Please explain its memory usage, and/or post the constructor's code.
[...] r2) can be precomputed. But I think the RSS values of a combination of clusters will have to be computed recursively.
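As a rough sketch of the precomputation idea in the last comment: each cluster's own RSS can be computed once up front and looked up inside the loops, so only the RSS of the merged cluster still needs the recursive walk. The Cluster below is a hypothetical stand-in whose rss() just returns a stored number and counts its calls; precomputeRss and pairRss are illustrative names, not part of the posted code.

```cpp
#include <list>
#include <unordered_map>

// Hypothetical stand-in: rss() returns a stored value and counts how often it is called.
struct Cluster {
    double ownRss;
    mutable int rssCalls;
    double rss() const { ++rssCalls; return ownRss; }
};

typedef std::unordered_map<const Cluster*, double> RssCache;

// Compute every cluster's own RSS exactly once.
RssCache precomputeRss(const std::list<Cluster*>& cluslist)
{
    RssCache cache;
    for (const Cluster* c : cluslist) {
        cache[c] = c->rss();
    }
    return cache;
}

// The r2 term for a pair: the sum of the two cached per-cluster values.
double pairRss(const RssCache& cache, const Cluster* a, const Cluster* b)
{
    return cache.at(a) + cache.at(b);
}
```

This reduces repeated work rather than memory use, so it complements the destructor fix instead of replacing it.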