Calculating effective shop inventories from CSV files

Question 1

Program purpose

The purpose for the program is to create a file containing the effective inventory for each stocked item stored in a collection of shops.

Input and algorithm details

For clarity, in the descriptions below, variables in the C++ program use this style: nShopID, while fields in CSV files are shown like this: shop_id.

Shops and groups

The problem domain is a collection of shops (identified by an int shop ID number) each of which is assigned to a group (also identified by an int). The association between nShopID and nGroupID is contained in a text file shopgroups.csv, which is a tab-delimited file with a header with two fields: shop_id and shopgroup_id.

Simple inventory

The simple inventory is in a file named shopstock_all.csv. That file is also a tab-delimited file with header containing three fields: item_idx, shop_id, and amount. The item_idx field is a string containing the item identifier, and the shop_id represents the shop that has the item and amount is an integer representing the quantity of the item at that shop.

Effective inventory

The simple inventory is simply the number of items at each shop, as described above, but the desired effective inventory takes into account the fact that the inventory of nearby shops could, if needed, be quickly moved to one particular shop. Thus, the effective inventory of a particular shop represents the simple inventory of that shop plus that of nearby stores. Nearby stores and the direction of possible movement of inventory is represented in a third tab-delimited file called sgm.csv which contains a header and two fields: shop_group_from and shop_group_to. For example, a line in the file might say 8 15 which means that any inventory in group 8 can be counted in the effective inventory of all shops in group 15. These associations are many-to-many (that is group 8 might feed multiple groups and group 15 might be fed from multiple other groups) and not necessarily reciprocal. That is, group 8 inventory may be moved to group 15 shops but not necessarily the other direction unless explicitly stated in the file.

Additionally, all shops within the same group also share their effective inventory without needing a special line in the sgm.csv file.

Finally, some warehouse shops (those with nShopID > 10000) have extra storage available and goods are only transferred from and never to such shops. To differentiate, the program actually calculates two effective inventory quantities for each store; nTotalAmount which is the effective inventory from all stores including these warehouse shops and nTotalAmount10k which is the effective inventory excluding these warehouse shops.

Output file

The output file is a tab-delimited file with no header with each record (line) containing four fields: itemID, shopID, totalAmount, totalAmount10k each of which has been described above. The order of the records within the file does not matter.

The code

The code and sample files are all on on GitLab

Questions

Since I rarely write C++ code, I've got a mix of C++ and C code. Could it be fixed without hurting performance?

What should be fixed to improve readability, naming of variables and code styling?

Could makefile be less "bloated"? (This is the 2nd one I've ever written.)

P.S. I'm using vim, so there are

// {{{ manual-style folding comments
// ...
// }}}

and I would prefer to leave those as is.

shop_stock_total.h

// {{{ Structures and typedefs
// {{{ ShopGroupMove
typedef list<int> GroupsFromList;
struct ShopGroupMoveStruct {
 int nGroupTo;
 GroupsFromList *lstGroupsFrom;
};
typedef unordered_map<int, struct ShopGroupMoveStruct> ShopGroupMovesMap;
// }}}
// {{{ ShopGroup
typedef list<int> ShopsList;
struct ShopGroupStruct {
 int nGroupID;
 ShopsList *lstShops; // all shops with the same group
};
typedef unordered_map<int, struct ShopGroupStruct> ShopGroupsMap;
// }}}
// {{{ Shop
struct ShopStruct {
 int nShopID;
 int nGroupID;
 ShopsList *lstFromAll;
 ShopsList *lstFrom10k;
};
typedef unordered_map<int, struct ShopStruct> ShopsMap;
// }}}
// {{{ Availability
struct AvailabilityStruct {
 int nShopID;
 int nAmount;
 int nTotalAmount;
 int nTotalAmount10k;
};
typedef unordered_map<int, struct AvailabilityStruct> AvailabilitiesMap;
// }}}
// {{{ ShopStock 
struct ShopStockStruct {
 char *sItemID;
 AvailabilitiesMap *mapAv;
};
typedef unordered_map<std::string, struct ShopStockStruct> ShopStocksMap;
// }}}
// }}}
// {{{ Functions
// {{{ APP specific
void ProcessShopGroupMoves(ShopGroupMovesMap *mapSGM);
void DebugPrintSGM(ShopGroupMovesMap *mapSGM);
void ProcessShopGroups(ShopGroupsMap *mapSG, ShopsMap *mapS);
void DebugPrintSG(ShopGroupsMap *mapSG, ShopsMap *mapS);
void Fill_FromShopAll(ShopGroupMovesMap *mapSGM, ShopGroupsMap *mapSG, ShopsMap *mapS); 
void DebugPrintS(ShopsMap *mapS);
void ProcessShopStocks(ShopStocksMap *mapSS, ShopsMap *mapS, FILE *out);
void PostProcessItem(ShopStockStruct *pstSS, ShopsMap *mapS, FILE *out); 
void DebugPrintSS(ShopStockStruct *pstSS); 
void WriteResult(ShopStockStruct *pstSS, FILE *out);
// }}}
// {{{ generic
std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems); 
std::vector<std::string> split(const std::string &s, char delim); 
// }}}
// }}}

shop_stock_total.cpp

// {{{ standard and other includes
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <locale>
#include <fstream>
#include <string>
#include <string.h>
#include <sstream>
#include <assert.h>
#include <ctype.h>
#include <algorithm>
#include <unordered_map>
#include <vector>
#include <list>
// }}}
// {{{ filenames 
const char *SHOP_GROUP_MOVES_FILE = "sgm.csv";
const char *SHOP_GROUPS_FILE = "shopgroups.csv";
const char *SHOP_STOCKS_FILE = "shopstock_all.csv";
const char *SHOP_STOCK_TOTALS_FILE = "shop_stock_total.csv";
// }}}
using namespace std;
#include "shop-stock-total.h"
// {{{ main
int main() {
 // ShopGroupMoves
 ShopGroupMovesMap *mapSGM;
 mapSGM = new ShopGroupMovesMap();
 ProcessShopGroupMoves(mapSGM);
 DebugPrintSGM(mapSGM);
 // ShopGroups and Shops
 ShopGroupsMap *mapSG;
 mapSG = new ShopGroupsMap();
 ShopsMap *mapS;
 mapS = new ShopsMap();
 ProcessShopGroups(mapSG, mapS);
 DebugPrintSG(mapSG, mapS);
 Fill_FromShopAll(mapSGM, mapSG, mapS);
 DebugPrintS(mapS);
 // ShopStock
 ShopStocksMap *mapSS;
 mapSS = new ShopStocksMap();
 FILE* fileSST = fopen (SHOP_STOCK_TOTALS_FILE, "w+");
 ProcessShopStocks(mapSS, mapS, fileSST);
 fclose(fileSST);
 return 0;
}
// }}}
// {{{ ProcessShopGroupMoves
void ProcessShopGroupMoves(ShopGroupMovesMap *mapSGM) {
 std::ifstream file(SHOP_GROUP_MOVES_FILE);
 std::string l;
 ShopGroupMoveStruct *pstSGM;
 int nKey = -1;
 int nSGMs = 0;
 int nGroupFrom;
 int nGroupTo;
 std::vector<std::string> vsItems;
 while (std::getline(file, l)) {
 // skip first line
 if (nKey < 0) {
 nKey = 0;
 continue;
 }
 vsItems = split(l, (char) '\t');
 if (vsItems.size() < 2) 
 continue;
 nGroupFrom = atoi(vsItems[0].c_str());
 nGroupTo = atoi(vsItems[1].c_str());
 auto sgm = mapSGM->find(nGroupTo);
 if (sgm == mapSGM->end()) {
 // not found, so we need to create a new one
 pstSGM = new ShopGroupMoveStruct();
 pstSGM->nGroupTo = nGroupTo;
 pstSGM->lstGroupsFrom = new GroupsFromList();
 // and store it, we'll add from value a bit later
 (* mapSGM)[nGroupTo] = *pstSGM;
 }
 else {
 // found group already, get a pointer to it for adding nGroupFrom
 // value later
 pstSGM = &(sgm->second);
 }
 // in any case (found or not), we've to add GroupFrom value
 pstSGM->lstGroupsFrom->push_back(nGroupFrom);
 nSGMs++;
 }
 printf("Total %d ShopGroupMoves loaded.\n", nSGMs);
}
// }}}
// {{{ DebugPrintSGM
void DebugPrintSGM(ShopGroupMovesMap *mapSGM) {
 for (auto sgm=mapSGM->begin(); sgm != mapSGM->end(); sgm++) {
 printf("%d:\t", sgm->second.nGroupTo);
 for (auto f=sgm->second.lstGroupsFrom->begin();
 f!=sgm->second.lstGroupsFrom->end(); 
 f++) {
 printf("%d ", *f);
 }
 printf("\n");
 }
}
// }}}
// {{{ ProcessShopGroups
void ProcessShopGroups(ShopGroupsMap *mapSG, ShopsMap *mapS) {
 std::ifstream file(SHOP_GROUPS_FILE);
 std::string l;
 int nSGs = 0;
 int nKey = -1;
 int nShopID;
 int nGroupID;
 ShopStruct *pstShop;
 ShopGroupStruct *pstSG;
 std::vector<std::string> vsItems;
 while (std::getline(file, l)) {
 // skip first line
 if (nKey < 0) {
 nKey = 0;
 continue;
 }
 vsItems = split(l, (char) '\t');
 if (vsItems.size() < 2) 
 continue;
 nShopID = atoi(vsItems[0].c_str());
 nGroupID = atoi(vsItems[1].c_str());
 // skip zero ID shop
 if (nShopID == 0)
 continue;
 // create new Shop and save it
 pstShop = new ShopStruct();
 pstShop->nShopID = nShopID;
 pstShop->nGroupID = nGroupID;
 pstShop->lstFromAll = new ShopsList(); 
 pstShop->lstFrom10k = new ShopsList();
 (* mapS)[nShopID] = *pstShop;
 // do we need to save ShopGroup
 if (nGroupID == 0)
 continue;
 auto sg = mapSG->find(nGroupID);
 if (sg == mapSG->end()) {
 // not found, so we need to create a new one
 pstSG = new ShopGroupStruct();
 pstSG->nGroupID = nGroupID;
 pstSG->lstShops = new ShopsList();
 // and store it, we'll add from value a bit later
 (* mapSG)[nGroupID] = *pstSG;
 }
 else {
 // found group already, get a pointer to it for adding shop 
 // value later
 pstSG = &(sg->second);
 }
 // in any case (found or not), we've to add GroupFrom value
 pstSG->lstShops->push_back(nShopID);
 nSGs++;
 }
 printf("Total %d ShopGroups loaded.\n", nSGs);
}
// }}}
// {{{ Fill_FromShopAll
void Fill_FromShopAll(ShopGroupMovesMap *mapSGM, ShopGroupsMap *mapSG, ShopsMap *mapS) {
 for (auto s = mapS->begin(); s != mapS->end(); s++) {
 // fill shops from the same group
 auto sg = mapSG->find(s->second.nGroupID);
 if (sg != mapSG->end()) {
 for (auto s_ = sg->second.lstShops->begin();
 s_ != sg->second.lstShops->end();
 s_++) {
 s->second.lstFromAll->push_back(*s_);
 }
 }
 // fill shops from movement groups
 auto sgm = mapSGM->find(s->second.nGroupID);
 if (sgm != mapSGM->end()) {
 for (auto sg = sgm->second.lstGroupsFrom->begin();
 sg != sgm->second.lstGroupsFrom->end();
 sg++) {
 auto sg_ = mapSG->find(*sg);
 if (sg_ != mapSG->end()) {
 for (auto s_ = sg_->second.lstShops->begin();
 s_ != sg_->second.lstShops->end();
 s_++) {
 s->second.lstFromAll->push_back(*s_);
 }
 }
 }
 }
 // de duping
 s->second.lstFromAll->sort();
 s->second.lstFromAll->unique();
 // remove shop itself
 s->second.lstFromAll->remove(s->second.nShopID);
 // copy to 10k
 /* auto n = std::copy_if( */
 /* s->second.lstFromAll->begin(), */
 /* s->second.lstFromAll->end(), */
 /* s->second.lstFrom10k->begin(), */
 /* [](int i){return (i < 10000);}); */
 /* s->second.lstFrom10k->size( */
 /* auto n = std::copy( */
 /* s->second.lstFromAll->begin(), */
 /* s->second.lstFromAll->end(), */
 /* s->second.lstFrom10k->begin() */
 /* ); */
 /* s->second.lstFrom10k->resize( */
 /* std::distance(s->second.lstFrom10k->begin(), n)); */
 for (auto s_ = s->second.lstFromAll->begin();
 s_ != s->second.lstFromAll->end();
 s_++) {
 if (*s_ < 10000) {
 s->second.lstFrom10k->push_back(*s_);
 }
 }
 }
}
// }}}
// {{{ DebugPrintS
void DebugPrintS(ShopsMap *mapS) {
 for (auto s = mapS->begin(); s != mapS->end(); s++) {
 printf("%d:\t", s->second.nShopID);
 for (auto s_ = s->second.lstFrom10k->begin(); 
 s_ != s->second.lstFrom10k->end();
 s_++) {
 printf("%d, ", *s_);
 }
 printf("\n");
 }
}
// }}}
// {{{ DebugPrintSG
void DebugPrintSG(ShopGroupsMap *mapSG, ShopsMap *mapS) {
 // print shop groups
 for (auto sg=mapSG->begin(); sg != mapSG->end(); sg++) {
 printf("%d:\t", sg->second.nGroupID);
 for (auto s = sg->second.lstShops->begin();
 s != sg->second.lstShops->end();
 s++) {
 printf("%d ", *s);
 }
 printf("\n");
 }
 // print shops
 for ( auto s = mapS->begin(); s != mapS->end(); s++) {
 printf("%d : %d\n", s->second.nShopID, s->second.nGroupID);
 }
}
// }}}
// {{{ PostProcessItem
void PostProcessItem(ShopStockStruct *pstSS, ShopsMap *mapS, FILE *out) {
 AvailabilityStruct *pstAV;
 for (auto s = mapS->begin();
 s != mapS->end();
 s++) {
 // if shop is not present in current item Availability
 // add it with 0 amounts, 
 // coz we could still move goods there from others shops
 auto s_ = pstSS->mapAv->find(s->second.nShopID);
 if (s_ == pstSS->mapAv->end()) {
 pstAV = new AvailabilityStruct();
 pstAV->nShopID = s->second.nShopID;
 pstAV->nAmount = 0;
 pstAV->nTotalAmount = 0;
 pstAV->nTotalAmount10k = 0;
 (*(pstSS->mapAv))[s->second.nShopID] = *pstAV;
 }
 }
 // summing up total availability
 for (auto av = pstSS->mapAv->begin();
 av != pstSS->mapAv->end();
 av++) {
 // add amounts from shop itself 
 av->second.nTotalAmount = av->second.nAmount;
 av->second.nTotalAmount10k = av->second.nAmount;
 // add amount from other shops which could move goods into this one
 auto s_ = mapS->find(av->second.nShopID);
 if (s_ == mapS->end()) continue;
 for (auto s__ = s_->second.lstFromAll->begin();
 s__ != s_->second.lstFromAll->end();
 s__++) {
 auto av_ = pstSS->mapAv->find(*s__);
 if (av_ == pstSS->mapAv->end()) continue;
 av->second.nTotalAmount += av_->second.nAmount;
 if (*s__ < 10000) {
 av->second.nTotalAmount10k += av_->second.nAmount;
 }
 }
 }
 //DebugPrintSS(pstSS);
 WriteResult(pstSS, out);
}
// }}}
// {{{ DebugPrintSS
void DebugPrintSS(ShopStockStruct *pstSS) {
 printf("%s:\t", pstSS->sItemID);
 for (auto av = pstSS->mapAv->begin();
 av != pstSS->mapAv->end();
 av++) {
 printf("%d - %d - %d;", av->second.nShopID, av->second.nTotalAmount,
 av->second.nTotalAmount10k);
 }
 printf("\n");
}
// }}}
// {{{ WriteResult
void WriteResult(ShopStockStruct *pstSS, FILE *out) {
 for (auto av = pstSS->mapAv->begin();
 av != pstSS->mapAv->end();
 av++) {
 if (av->second.nTotalAmount + av->second.nTotalAmount10k == 0) {
 continue;
 }
 fprintf(out, "%s\t%d\t%d\t%d\n",
 pstSS->sItemID,
 av->second.nShopID,
 av->second.nTotalAmount,
 av->second.nTotalAmount10k
 );
 }
}
// }}}
// {{{ ProcessShopStocks
void ProcessShopStocks(ShopStocksMap *mapSS, ShopsMap *mapS, FILE *out) {
 std::ifstream file(SHOP_STOCKS_FILE);
 std::string l;
 ShopStockStruct *pstSS = NULL;
 AvailabilityStruct *pstAV;
 int nKey = -1;
 int nSSs = 0;
 char *sItemID;
 char *sItemIDprev = (char *)"";
 int nShopID;
 int nAmount;
 std::vector<std::string> vsItems;
 while (std::getline(file, l)) {
 // skip first line
 if (nKey < 0) {
 nKey = 0;
 continue;
 }
 vsItems = split(l, (char) '\t');
 if (vsItems.size() < 3) 
 continue;
 sItemID = strdup(vsItems[0].c_str());
 if (strcmp(sItemIDprev, sItemID) != 0) {
 // item changed - post process previous item
 if (pstSS != NULL)
 PostProcessItem(pstSS, mapS, out);
 }
 nShopID = atoi(vsItems[1].c_str());
 nAmount = atoi(vsItems[2].c_str());
 auto ss = mapSS->find(sItemID);
 if (ss == mapSS->end()) {
 // not found, so we need to create a new one
 pstSS = new ShopStockStruct();
 pstSS->sItemID = sItemID;
 pstSS->mapAv = new AvailabilitiesMap();
 // and store it, we'll add from value a bit later
 (* mapSS)[sItemID] = *pstSS;
 }
 else {
 // found group already, get a pointer to it for adding nGroupFrom
 // value later
 pstSS = &(ss->second);
 }
 // in any case (found or not), we've to add GroupFrom value
 // to do: create struct, and store it in map
 pstAV = new AvailabilityStruct();
 pstAV->nShopID = nShopID;
 pstAV->nAmount = nAmount;
 pstAV->nTotalAmount = 0;
 pstAV->nTotalAmount10k = 0;
 (*(pstSS->mapAv))[nShopID] = *pstAV;
 nSSs++;
 }
 // process the last item if exists 
 if (pstSS != NULL)
 PostProcessItem(pstSS, mapS, out);
 printf("Total %d ShopStocks loaded.\n", nSSs);
}
// }}}
// {{{ Generic split to vector function 
std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) {
 std::stringstream ss(s);
 std::string item;
 while (std::getline(ss, item, delim)) {
 elems.push_back(item);
// printf("\t%s\n", item.c_str());
 }
 return elems;
}
std::vector<std::string> split(const std::string &s, char delim) {
 std::vector<std::string> elems;
 split(s, delim, elems);
 return elems;
}
// }}}

Makefile:

PROGRAM = shop-stock-total
CXX=g++
CXXFLAGS := -g -std=c++11 -Wall -O3 -march=native -mtune=native
LDFLAGS := -g -L /usr/lib:/usr/lib/x86_64-linux-gnu
all: $(PROGRAM)
$(PROGRAM): $(PROGRAM).gcda $(PROGRAM).o 
 $(CXX) $(CXXFLAGS) -fprofile-use -o $@ $(PROGRAM).o $(LDFLAGS) 
$(PROGRAM).gcda: profile/$(PROGRAM)
 ./profile/$(PROGRAM) >/dev/null
 cp profile/$(PROGRAM).gcda ./
profile/$(PROGRAM): $(PROGRAM).cpp $(PROGRAM).h
 [[ -d profile ]] || mkdir profile
 cp $(PROGRAM).cpp profile/
 cp $(PROGRAM).h profile/
 cd profile && $(CXX) -c $(CXXFLAGS) -fprofile-generate $(PROGRAM).cpp
 cd profile && $(CXX) $(CXXFLAFS) -fprofile-generate -o $(PROGRAM) $(PROGRAM).o $(LDFLAGS)
shop-stock-total.o: $(PROGRAM).cpp $(PROGRAM).h
 $(CXX) -c $(CXXFLAGS) -fprofile-use $(PROGRAM).cpp
clean:
 @rm -rf *.o $(PROGRAM) ./profile *.gcda

Question 2

For better performance, try to 1) replace all uses of std::list with std::vector 2) reduce pointer usage the items pointed to can often be stored by value instead.

Question 3

Make better meassuments. How much of those 19s are from reading the file? how much from writing to file?

Question 4

juvian, without writing result file cpp version runs 14.3-14.5 seconds.

Question 5

I count 18 uses of new, but 0 uses of delete. I don't think this code is working correctly. Not only that, AFAICT all of those uses of new aren't necessary if one were to make minor modifications to the datastructures used.

Question 6

The header misses #includes. Please go over your code again and make sure it actually works!

Question 7

I see a number of things that may help you improve your program.

Don't abuse `using namespace std`

Putting using namespace std at the top of every program is a bad habit that you'd do well to avoid.

Use consistent formatting

The code as posted has inconsistent indentation which makes it hard to read and understand. Pick a style and apply it consistently.

Don't use Hungarian notation

Prefixing every variable with an abbreviation of its type is usually called "Hungarian notation" and it was once popular. Even then it was a bad idea. Don't clutter up your source code with that; instead, concentrate on defining meaningful names for each variable and choose types appropriately.

Use better naming

The name mapSGM is not terrible, because once we figure out that SGM stands for "Shop Group Moves" we have, at least, some idea as to what it means. However many of the names are very poor. For instance within PostProcessItem(), we have s, s_, s__, av and av_. In addition to those being very short cryptic names, having deceptively similar names all in the same function is diabolically confusing.

Understand `new` and `delete`

When you declare a variable such as std::string item; it is called an automatic variable and needs neither new nor delete. It comes into existence and is destroyed as needed and is used rationally within split(). However, many places the code looks more like this:

ShopGroupMovesMap *mapSGM;
mapSGM = new ShopGroupMovesMap();
ProcessShopGroupMoves(mapSGM);

This is really not a good way to go about things because what ProcessShopGroupMoves actually does is to populate an empty container that it's handed. Better would be to have it return such a structure instead as:

auto mapSGM{ProcessShopGroupMoves()};

Note that I am using auto here and the uniform initialization syntax of C++11.

Don't leak memory

The memory that's allocated via new should be freed via delete but no such deletions occur within this program. That's the recipe for a memory leak and really should be addressed. The best way with modern C++ is to simply avoid explicitly calling new, and I'll demonstrate below.

Consider using range-for syntax

The code for DebugPrintSGM currently contains this code in the inner loop:

for (auto f=sgm->second.lstGroupsFrom->begin();
 f!=sgm->second.lstGroupsFrom->end(); 
 f++) {
 printf("%d ", *f);
}

In addition to being really hard to read and using the antique printf (more on that in a moment), it uses iterators when it could be greatly simplified using range-for syntax:

for (const auto &fromStore : sgm->second.lstGroupsFrom) {
 std::cout << fromStore << ' ';
}

Much neater, cleaner and easier to read, understand and maintain. Using similar syntax for the outer loop will also clean up the syntax for this loop.

Allow the user to specify input and output files

The file names are currently hardcoded which certainly greatly restricts the usefulness of the program. Consider using argc and argv to allow the user to specify file names on the command line.

Use objects better

The code uses unaugmented data structures such as std::list, but uses typedef and forgoes adding application-specific functions. This makes the code much, much more complex than it needs to be. For example, in a rewrite I did, here is what the Shops class looks like:

class Shops {
public:
 Shops(const std::string &groupfilename, const std::string &groupmovesfilename);
 Shops(std::istream &groupsfile, std::istream &movesfile);
 const std::vector<int> &getDestinationShops(int shop) const;
 std::ostream &show(int shopID, std::ostream &out = std::cout ) const;
 std::ostream &showGroups(std::ostream &out = std::cout ) const;
 const std::size_t shopCount() const { return shopgroup.size(); }
 const std::size_t groupCount() const { return groupshops.size(); }
private:
 std::unordered_map<int, int> shopgroup;
 std::unordered_map<int, std::vector<int>> groupshops;
};

The constructor receives either the file names or the two std::istream corresponding to two of the original files. show is a convenience function similar to your DebugPrintxx functions, except that the knowledge of how to print out the information is captured within the object, so that it may be easily re-used if we have collections of such objects.

Use a better algorithm

The current algorithm is very complex and difficult to follow. It also produces incorrect results, such as listing different inventories for the same store. Better would be to think carefully about both inputs and desired outputs and map out an algorithm for getting from one to the other. As inspiration, here's what my rewritten main looks like:

int main(int argc, char *argv[]) {
 if (argc != 4) {
 std::cout << "Usage: shops groupsfile movesfile stockfile\n";
 return 1;
 }
 Shops shops{argv[1], argv[2]};
 Inventory inv{argv[3]};
 inv.printAll(shops);
}

Much of the algorithm has now been distilled into printAll at the end, which may be redirected from std::cout to a file if desired. Here's the algorithm I employed in code form. First the two different style of constructor:

Shops::Shops(const std::string &groupfilename, const std::string &groupmovesfilename) {
 std::ifstream groupsfile{groupfilename};
 std::ifstream movesfile{groupmovesfilename};
 Shops temp{groupsfile, movesfile};
 std::swap(*this, temp);
}
Shops::Shops(std::istream &groupsfile, std::istream &movesfile) {
 skipline(groupsfile);
 int shop;
 int group;
 while (groupsfile >> shop >> group) {
 // skip zero ID shop
 if (shop != 0) {
 shopgroup[shop] = group;
 if (group != 0) {
 groupshops[group].push_back(shop);
 }
 }
 }
 // temporarily store the unaugmented groups
 auto basicgroupshops{groupshops};
 // now created the final augmented groups
 int srcGroup;
 int dstGroup;
 skipline(movesfile);
 while (movesfile >> srcGroup >> dstGroup) {
 const auto &basic{basicgroupshops[dstGroup]};
 groupshops[srcGroup].insert(groupshops[srcGroup].end(), basic.begin(), basic.end());
 }
 // for each group in augmented groups, 
 // sort and remove duplicates
 for (auto &item : groupshops)
 { 
 std::sort(item.second.begin(), item.second.end());
 item.second.erase(std::unique(item.second.begin(), item.second.end()), item.second.end());
 }
}

Note that shopgroup simply maps from a single shop number to a single shop group. The groupshops maps in the opposite direction from a single shop group to a collection of shops in that group. We then augment that using data from the shop group moves file. That is, if we have a shop group move line that says 8 15, we add all of the shops in group 15 to the list of destination for group 8. Eventually, for each group have an augmented list of shops, possibly including duplicates, which are then sorted and duplicates removed. Once that's complete, we no longer need the unaugmented groups, so that is a local variable that goes out of scope and is destroyed at the end of the constructor.

The Inventory class is very simple and looks like this:

class Inventory {
public:
 Inventory(const std::string &filename);
 std::ostream &printBasic(std::ostream &out = std::cout) const;
 std::ostream &printAll(const Shops &shops, std::ostream &out = std::cout) const;
private:
 struct ItemInventory {
 int shop;
 int qty;
 };
 std::unordered_map<std::string, std::vector<ItemInventory>> inv;
};

The constructor just reads in the file in the most straightforward way possible, using a skipline convenience function:

std::istream &skipline(std::istream &in) { 
 return in.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); 
}
Inventory::Inventory(const std::string &filename) {
 std::ifstream infile{filename};
 skipline(infile);
 std::string item;
 int shop;
 int qty;
 while (infile >> item >> shop >> qty) {
 inv[item].push_back({shop, qty});
 }
};

The printAll function is then relatively simple:

std::ostream &Inventory::printAll(const Shops &shops, std::ostream &out) const {
 for (const auto &item: inv) {
 struct Totals {
 int total;
 int total10k;
 };
 std::unordered_map<int, Totals> itemTotals{};
 // create a map for totals for all recipient shops
 for (const auto &oneShop : item.second) {
 // count the stock from this shop
 itemTotals[oneShop.shop].total += oneShop.qty;
 itemTotals[oneShop.shop].total10k += oneShop.qty;
 // add stock to each other destination shop
 for (const auto &destShop : shops.getDestinationShops(oneShop.shop)) {
 if (oneShop.shop != destShop) {
 itemTotals[destShop].total += oneShop.qty;
 if (oneShop.shop < 10000) {
 itemTotals[destShop].total10k += oneShop.qty;
 }
 }
 }
 }
 for (const auto &total : itemTotals) {
 out << item.first << '\t' << total.first << '\t' 
 << total.second.total << '\t' << total.second.total10k << '\n';
 }
 }
 return out;
}

Results

The original, with a few errors fixed, takes around 1.915 seconds on my machine with the data files from your GitLab project. It also leaks about 80M of memory. By contrast, the rewritten version leaks no memory and runs in 0.194 seconds, or about 10 times faster. Also, because the structures it keeps in memory are simpler and smaller, it has a smaller overall footprint. Finally, it only produces a single line of output for each stockID, shopID pair, unlike the original, which, as I've mentioned, sometimes emits multiple lines with different totals.

Alternative version

For complete clarity, here's the complete, working, rewritten version which omits some convenience functions mentioned above. It's 140 lines long.

#include <iostream>
#include <fstream>
#include <vector>
#include <unordered_map>
#include <string>
#include <iterator>
#include <algorithm>
#include <limits>
class Shops {
public:
 Shops(const std::string &groupfilename, const std::string &groupmovesfilename);
 Shops(std::istream &groupsfile, std::istream &movesfile);
 const std::vector<int> &getDestinationShops(int shop) const;
 const std::size_t shopCount() const { return shopgroup.size(); }
 const std::size_t groupCount() const { return groupshops.size(); }
private:
 std::unordered_map<int, int> shopgroup;
 std::unordered_map<int, std::vector<int>> groupshops;
};
class Inventory {
public:
 Inventory(const std::string &filename);
 std::ostream &printAll(const Shops &shops, std::ostream &out = std::cout) const;
private:
 struct ItemInventory {
 int shop;
 int qty;
 };
 std::unordered_map<std::string, std::vector<ItemInventory>> inv;
};
std::istream &skipline(std::istream &in) { 
 return in.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); 
}
Inventory::Inventory(const std::string &filename) {
 std::ifstream infile{filename};
 skipline(infile);
 std::string item;
 int shop;
 int qty;
 while (infile >> item >> shop >> qty) {
 inv[item].push_back({shop, qty});
 }
};
std::ostream &Inventory::printAll(const Shops &shops, std::ostream &out) const {
 for (const auto &item: inv) {
 struct Totals {
 int total;
 int total10k;
 };
 std::unordered_map<int, Totals> itemTotals{};
 // create a map for totals for all recipient shops
 for (const auto &oneShop : item.second) {
 // count the stock from this shop
 itemTotals[oneShop.shop].total += oneShop.qty;
 itemTotals[oneShop.shop].total10k += oneShop.qty;
 // add stock to each destination shop
 for (const auto &destShop : shops.getDestinationShops(oneShop.shop)) {
 if (oneShop.shop != destShop) {
 itemTotals[destShop].total += oneShop.qty;
 if (oneShop.shop < 10000) {
 itemTotals[destShop].total10k += oneShop.qty;
 }
 }
 }
 }
 for (const auto &total : itemTotals) {
 out << item.first << '\t' << total.first << '\t' 
 << total.second.total << '\t' << total.second.total10k << '\n';
 }
 }
 return out;
}
Shops::Shops(const std::string &groupfilename, const std::string &groupmovesfilename) {
 std::ifstream groupsfile{groupfilename};
 std::ifstream movesfile{groupmovesfilename};
 Shops temp{groupsfile, movesfile};
 std::swap(*this, temp);
}
Shops::Shops(std::istream &groupsfile, std::istream &movesfile) {
 skipline(groupsfile);
 int shop;
 int group;
 while (groupsfile >> shop >> group) {
 // skip zero ID shop
 if (shop != 0) {
 shopgroup[shop] = group;
 if (group != 0) {
 groupshops[group].push_back(shop);
 }
 }
 }
 // temporarily store the unaugmented groups
 auto basicgroupshops{groupshops};
 // now created the final augmented groups
 int srcGroup;
 int dstGroup;
 skipline(movesfile);
 while (movesfile >> srcGroup >> dstGroup) {
 const auto &basic{basicgroupshops[dstGroup]};
 groupshops[srcGroup].insert(groupshops[srcGroup].end(), basic.begin(), basic.end());
 }
 // for each group in augmented groups, 
 // sort and remove duplicates
 for (auto &item : groupshops)
 { 
 std::sort(item.second.begin(), item.second.end());
 item.second.erase(std::unique(item.second.begin(), item.second.end()), item.second.end());
 }
}
/*
 * return a vector of destination shops for this shop
 * or throw an out_of_range error
 */
const std::vector<int> &Shops::getDestinationShops(int shop) const {
 static const std::vector<int> mt{};
 auto srcGroup{shopgroup.at(shop)};
 auto grp{groupshops.find(srcGroup)};
 if (grp == groupshops.end()) {
 return mt;
 } 
 return grp->second;
}
int main(int argc, char *argv[]) {
 if (argc != 4) {
 std::cout << "Usage: shops groupsfile movesfile stockfile\n";
 return 1;
 }
 Shops shops{argv[1], argv[2]};
 Inventory inv{argv[3]};
 inv.printAll(shops);
}

Question 8

Edwards, thanks for a detailed answer. I'm still doing corrections to my code based on your feedback. However, I've some issues still. 1) I still have to use pointer when using unordered_map.find() method. And it's not clear how I could get rid of it. 2) Converting from WriteResult function from fprintf to << style increased total time from 1.0 seconds to 1.3 seconds. Might be, I did something wrong. 3) In ProcessShopStocks item, I'm creating AvailabilityStruct using new because of I'm adding it to unordered_map. Without using new I suspect it will be freed when out of scope. Can I avoid it?

Question 9

btw, how did you measure my memory leak? Isn't memory reclaimed by OS after process quit?

Question 10

My rewrite uses no pointers, so I'm quite sure it's possible. Also, it's unlikely that operator<< takes more time than fprintf. The cause is likely something else. For adding things to an unordered_map, I used operator[] in lines like this: groupshops[group].push_back(shop); No pointers or new are required. Note that std::vector::push_back creates a copy of the pushed element unless it can be moved as with std::move.

Question 11

The tool I used to measure memory usage and leaks is valgrind which I find to be an extremely valuable set of tools. Although the OS will typically reclaim memory after the program exits, the problem is that by not freeing memory as you go, the program's memory usage will grow and grow and get slower and slower until memory is exhausted. And then you'll be left trying to figure out how to debug a program that only crashes with certain very long input and exactly when depends on what other software is concurrently running. Leaky software is unreliable!

Question 12

I've added my complete alternative code to the answer to make it simpler to see what's been done and how. Hope that helps!

Question 13

Bug

The output, shop_stock_total.csv, contains the following lines, each of which is a tuple of (item, shop, quantity available, quantity available excluding transfers from warehouse stores):

100971 1 2 2
100971 1 4 4
100971 1 6 6

Obviously, there should be at most one row representing item 100971 at store 1. These results were not aggregated correctly. There are many more occurrences of this bug.

Also note that one of your input files, sgm.csv, contains some duplicate entries, such as these:

shop_group_from shop_group_to
38 39
38 39

Assuming that this doesn't mean that goods magically double in quantity when moved from group 38 to 39, you'll have to be extra careful in the calculations. Perhaps you have another bug in the part of your system that generates sgm.csv.

Insanity

In my opinion, trying to accomplish this query in C++ is crazy. Your C++ program consists of a huge amount of custom code, which is hard to understand (as evidenced by the bug above), and will likely be unmaintainable by your successor if you ever move on to another job. Furthermore, the program does one task only, and would be a lot of work to modify it to perform any other queries.

I think that you are not using the right tool for the job. Running queries on relational data is traditionally done using SQL, and that's what you should use.

What about performance?

Is the solution above faster than your C++ program? Admittedly, no: it takes about twice as long. However, consider what steps you can easily take to improve its performance:

Switch to a more capable database system than SQLite (such as PostgreSQL)
Keep the inventory data in the SQL database, already in memory, so that you don't have to spend time loading CSV files
Use materialized views and triggers to automatically regenerate the query results whenever inventory levels change
Take the money that you saved by not spending so much time on maintaining the code, and use it to buy faster hardware

Question 14

200_success, thanks for pointing to the bug. I'll need to verify it. The .csv files are retrieved from database (sgm is filled manually to the database, that's why it could contains duplicates.) Of course, items are not moved twice from shop to shop in that case. We had initial implementation using SQL query and it was very slow (minutes to run) on our mysql server. The server itself is powerful enough. That's why we've rewritten the solution using csv offload, python processing and than .csv loading. This way it's much faster than doing this in SQL. I could post SQL and tables, so you'll see.

Question 15

Minutes? This SQLite query ran in a few seconds on an old laptop. Perhaps you should have your SQL schema and query reviewed instead of this gigantic workaround.

Question 16

In any case, I agree that the SQL-based version would be an excellent candidate for another question here.

Question 17

200_success, yep, you was right. There are bugs. I did 'cat shop_stock_total.csv | sort |less' and there are lots of duplicates. All with the same values as I see but this is wrong any way. I'll need to find out how it's happening.

Question 18

posted an sql version

Question 19

Most of the issues are answered by Edward in the first answer. However, the suggested by Edward solution is slower than my current. So I've decided to post mine.

Rationale for changes

There're no ShopGroup structures anymore. Since group is not used any further except inside shops. I've decided to use group only internally inside Shops class.
During processing of shops. The shop item itself is included as target shop for itself. This helps to avoid processing shop for itself separately. Instead, it happens during for-cycle like all others destinations.
Using at() method of unordered_map helps to pass argument with const declaration and it does not require to create static empty vector.
Shops with group_id equal to 0 have special meaning - "no group specified". Thus, in spite of all such shops have the same group_id - there are no movements between them. I had to add a special magic_number there to avoid moving between those shops. This is bad - but I don't see how to fix this without lots of complications.
Both classes could be static. Since there's only single object exists during lifetime of application. I've decided not to use static. Guess, it could be better in case of reusing those objects later. And due to this, I still have a separate header file.

P.S. I've decided not to use parameters for filenames since this is not required P.P.S when line is wrapped - my editor add 2 tabs instead of 1 (guess, this is easier to read. I didn't find any issues with the formatting of original code). And yes, manual code folding

// {{{ comment

// code goes here

// }}}

are looking nice. And it allows better grouping of code blocks than all IDEs code folding I've seen, IMHO.

shop-stock-total.h

// {{{ Structures, typedefs and classes
// {{{ Shops
class Shops {
public:
 Shops(const std::string& shop_groups_filename, 
 const std::string& groups_moves_filename);
 const std::vector<int>& getToShops(int shop_id) const;
private:
 std::unordered_map<int, std::vector<int>> shops_to_shops;
};
// }}}
// {{{
struct ItemStockStruct {
 int totalAmount;
 int totalAmount10k;
};
// }}}
// {{{ ShopsStocksProcessor
class ShopsStocksProcessor {
public:
 ShopsStocksProcessor(const Shops& shops, const std::string& stock_filename,
 const std::string& out_filename);
private:
 void save_item_stock(std::ofstream& out, const std::string& item_id,
 const std::unordered_map<int, ItemStockStruct>& item_stock);
};
// }}}
// }}}
// {{{ Functions
// {{{ generic
// {{{ skipline
std::istream &skipline(std::istream &in); 
// }}}
// }}}
// }}}

shop-stock-total.cpp

// {{{ standard and other includes
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <locale>
#include <fstream>
#include <string>
#include <string.h>
#include <sstream>
#include <assert.h>
#include <ctype.h>
#include <algorithm>
#include <unordered_map>
#include <vector>
#include <list>
#include <set>
// }}}
// {{{ filenames 
const char *SHOP_GROUP_MOVES_FILE = "sgm.csv";
const char *SHOP_GROUPS_FILE = "shopgroups.csv";
const char *SHOP_STOCKS_FILE = "shopstock_all.csv";
const char *SHOP_STOCK_TOTALS_FILE = "shop_stock_total.csv";
// }}}
#include "shop-stock-total.h"
// {{{ Shops
// {{{ Shops::Shops
Shops::Shops(const std::string& shop_groups_filename, 
 const std::string& groups_moves_filename) {
 int shop_id, group_id;
 // temporary storage for shops and groups
 std::unordered_map<int, int> shops;
 std::unordered_map<int, std::vector<int>> groups;
 // read file and store shops and groups
 std::ifstream shop_groups_file{shop_groups_filename};
 skipline(shop_groups_file);
 int magic_group_id = 100000;
 while (shop_groups_file >> shop_id >> group_id) {
 if (shop_id == 0) continue;
 if (group_id == 0) group_id = magic_group_id++;
 shops[shop_id] = group_id; 
 groups[group_id].push_back(shop_id);
 }
 // add shops from the same group
 for (const auto& shop: shops) {
 shops_to_shops[shop.first].insert(shops_to_shops[shop.first].end(),
 groups[shops[shop.first]].begin(),
 groups[shops[shop.first]].end());
 }
 // read group moves file and add toShops from the toGroup into
 // all corresponding fromGroup shops
 int groupFrom, groupTo;
 std::ifstream group_moves_file{groups_moves_filename};
 skipline(group_moves_file);
 while (group_moves_file >> groupFrom >> groupTo) {
 for (auto shop_id : groups[groupFrom]) {
 shops_to_shops[shop_id].insert(shops_to_shops[shop_id].end(),
 groups[groupTo].begin(),
 groups[groupTo].end());
 }
 }
 // erase shop itself from it's own list,
 // sort and remove dupes
 for (auto& shop: shops_to_shops) {
 // de duping via set and back to vector
 std::set<int> s(
 shop.second.begin(),
 shop.second.end()
 );
 //assign back to vector
 shop.second.assign(s.begin(), s.end());
 }
}
// }}}
// {{{ Shops::getToShops
const std::vector<int>& Shops::getToShops(int shop_id) const {
 return shops_to_shops.at(shop_id);
}
// }}}
// }}}
// {{{ ShopsStocksProcessor
// {{{ ProcessShopStocks
ShopsStocksProcessor::ShopsStocksProcessor(const Shops& shops,
 const std::string& stock_filename,
 const std::string& out_filename) {
 std::string item_id, item_prev_id = "";
 int shop_id, amount;
 std::ifstream stock_file{stock_filename};
 std::ofstream out{out_filename};
 std::unordered_map<int, ItemStockStruct> item_stock;
 // reading file
 skipline(stock_file);
 while (stock_file >> item_id >> shop_id >> amount) {
 // if item has changed than output the previous one
 if (item_id.compare(item_prev_id) != 0) {
 save_item_stock(out, item_prev_id, item_stock);
 item_stock.clear();
 item_prev_id = item_id;
 }
 for (const auto s: shops.getToShops(shop_id)) {
 item_stock[s].totalAmount += amount;
 if (shop_id < 10000) item_stock[s].totalAmount10k += amount;
 }
 }
 if (item_id.length() > 0) {
 save_item_stock(out, item_id, item_stock);
 }
}
// }}}
// {{{ save_item_stock
void ShopsStocksProcessor::save_item_stock(std::ofstream& out, const std::string& item_id,
 const std::unordered_map<int, ItemStockStruct>& item_stock) {
 for (const auto& item : item_stock) {
 out << item_id << "\t" << item.first << "\t"
 << item.second.totalAmount << "\t"
 << item.second.totalAmount10k << "\n";
 }
}
// }}}
// }}}
// {{{ skipline (thx Edward from CodeReview @ StackExchange
std::istream &skipline(std::istream &in) { 
 return in.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); 
}
// }}}
// {{{ main
int main() {
 ShopsStocksProcessor(
 Shops(SHOP_GROUPS_FILE, SHOP_GROUP_MOVES_FILE),
 SHOP_STOCKS_FILE, SHOP_STOCK_TOTALS_FILE); 
 return 0;
}
// }}}

An original Makefile is used.

Performance

Well, this version is slightly (5-10%) faster than Edward's solution. And it keeps only inventory of single item in memory based on assumption that shopstock_all.csv is sorted by item_idx field. Code is updated on Gitlab.

Edward Edward 67.2k4 gold badges120 silver badges284 bronze badges · Accepted Answer · 2018-07-07 00:05:52Z

I see a number of things that may help you improve your program.

Don't abuse `using namespace std`

Putting using namespace std at the top of every program is a bad habit that you'd do well to avoid.

Use consistent formatting

The code as posted has inconsistent indentation which makes it hard to read and understand. Pick a style and apply it consistently.

Don't use Hungarian notation

Prefixing every variable with an abbreviation of its type is usually called "Hungarian notation" and it was once popular. Even then it was a bad idea. Don't clutter up your source code with that; instead, concentrate on defining meaningful names for each variable and choose types appropriately.

Use better naming

The name mapSGM is not terrible, because once we figure out that SGM stands for "Shop Group Moves" we have, at least, some idea as to what it means. However many of the names are very poor. For instance within PostProcessItem(), we have s, s_, s__, av and av_. In addition to those being very short cryptic names, having deceptively similar names all in the same function is diabolically confusing.

Understand `new` and `delete`

When you declare a variable such as std::string item; it is called an automatic variable and needs neither new nor delete. It comes into existence and is destroyed as needed and is used rationally within split(). However, many places the code looks more like this:

ShopGroupMovesMap *mapSGM;
mapSGM = new ShopGroupMovesMap();
ProcessShopGroupMoves(mapSGM);

This is really not a good way to go about things because what ProcessShopGroupMoves actually does is to populate an empty container that it's handed. Better would be to have it return such a structure instead as:

auto mapSGM{ProcessShopGroupMoves()};

Note that I am using auto here and the uniform initialization syntax of C++11.

Don't leak memory

The memory that's allocated via new should be freed via delete but no such deletions occur within this program. That's the recipe for a memory leak and really should be addressed. The best way with modern C++ is to simply avoid explicitly calling new, and I'll demonstrate below.

Consider using range-for syntax

The code for DebugPrintSGM currently contains this code in the inner loop:

for (auto f=sgm->second.lstGroupsFrom->begin();
 f!=sgm->second.lstGroupsFrom->end(); 
 f++) {
 printf("%d ", *f);
}

In addition to being really hard to read and using the antique printf (more on that in a moment), it uses iterators when it could be greatly simplified using range-for syntax:

for (const auto &fromStore : sgm->second.lstGroupsFrom) {
 std::cout << fromStore << ' ';
}

Much neater, cleaner and easier to read, understand and maintain. Using similar syntax for the outer loop will also clean up the syntax for this loop.

Allow the user to specify input and output files

The file names are currently hardcoded which certainly greatly restricts the usefulness of the program. Consider using argc and argv to allow the user to specify file names on the command line.

Use objects better

The code uses unaugmented data structures such as std::list, but uses typedef and forgoes adding application-specific functions. This makes the code much, much more complex than it needs to be. For example, in a rewrite I did, here is what the Shops class looks like:

class Shops {
public:
 Shops(const std::string &groupfilename, const std::string &groupmovesfilename);
 Shops(std::istream &groupsfile, std::istream &movesfile);
 const std::vector<int> &getDestinationShops(int shop) const;
 std::ostream &show(int shopID, std::ostream &out = std::cout ) const;
 std::ostream &showGroups(std::ostream &out = std::cout ) const;
 const std::size_t shopCount() const { return shopgroup.size(); }
 const std::size_t groupCount() const { return groupshops.size(); }
private:
 std::unordered_map<int, int> shopgroup;
 std::unordered_map<int, std::vector<int>> groupshops;
};

The constructor receives either the file names or the two std::istream corresponding to two of the original files. show is a convenience function similar to your DebugPrintxx functions, except that the knowledge of how to print out the information is captured within the object, so that it may be easily re-used if we have collections of such objects.

Use a better algorithm

The current algorithm is very complex and difficult to follow. It also produces incorrect results, such as listing different inventories for the same store. Better would be to think carefully about both inputs and desired outputs and map out an algorithm for getting from one to the other. As inspiration, here's what my rewritten main looks like:

int main(int argc, char *argv[]) {
 if (argc != 4) {
 std::cout << "Usage: shops groupsfile movesfile stockfile\n";
 return 1;
 }
 Shops shops{argv[1], argv[2]};
 Inventory inv{argv[3]};
 inv.printAll(shops);
}

Much of the algorithm has now been distilled into printAll at the end, which may be redirected from std::cout to a file if desired. Here's the algorithm I employed in code form. First the two different style of constructor:

Shops::Shops(const std::string &groupfilename, const std::string &groupmovesfilename) {
 std::ifstream groupsfile{groupfilename};
 std::ifstream movesfile{groupmovesfilename};
 Shops temp{groupsfile, movesfile};
 std::swap(*this, temp);
}
Shops::Shops(std::istream &groupsfile, std::istream &movesfile) {
 skipline(groupsfile);
 int shop;
 int group;
 while (groupsfile >> shop >> group) {
 // skip zero ID shop
 if (shop != 0) {
 shopgroup[shop] = group;
 if (group != 0) {
 groupshops[group].push_back(shop);
 }
 }
 }
 // temporarily store the unaugmented groups
 auto basicgroupshops{groupshops};
 // now created the final augmented groups
 int srcGroup;
 int dstGroup;
 skipline(movesfile);
 while (movesfile >> srcGroup >> dstGroup) {
 const auto &basic{basicgroupshops[dstGroup]};
 groupshops[srcGroup].insert(groupshops[srcGroup].end(), basic.begin(), basic.end());
 }
 // for each group in augmented groups, 
 // sort and remove duplicates
 for (auto &item : groupshops)
 { 
 std::sort(item.second.begin(), item.second.end());
 item.second.erase(std::unique(item.second.begin(), item.second.end()), item.second.end());
 }
}

Note that shopgroup simply maps from a single shop number to a single shop group. The groupshops maps in the opposite direction from a single shop group to a collection of shops in that group. We then augment that using data from the shop group moves file. That is, if we have a shop group move line that says 8 15, we add all of the shops in group 15 to the list of destination for group 8. Eventually, for each group have an augmented list of shops, possibly including duplicates, which are then sorted and duplicates removed. Once that's complete, we no longer need the unaugmented groups, so that is a local variable that goes out of scope and is destroyed at the end of the constructor.

The Inventory class is very simple and looks like this:

class Inventory {
public:
 Inventory(const std::string &filename);
 std::ostream &printBasic(std::ostream &out = std::cout) const;
 std::ostream &printAll(const Shops &shops, std::ostream &out = std::cout) const;
private:
 struct ItemInventory {
 int shop;
 int qty;
 };
 std::unordered_map<std::string, std::vector<ItemInventory>> inv;
};

The constructor just reads in the file in the most straightforward way possible, using a skipline convenience function:

std::istream &skipline(std::istream &in) { 
 return in.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); 
}
Inventory::Inventory(const std::string &filename) {
 std::ifstream infile{filename};
 skipline(infile);
 std::string item;
 int shop;
 int qty;
 while (infile >> item >> shop >> qty) {
 inv[item].push_back({shop, qty});
 }
};

The printAll function is then relatively simple:

std::ostream &Inventory::printAll(const Shops &shops, std::ostream &out) const {
 for (const auto &item: inv) {
 struct Totals {
 int total;
 int total10k;
 };
 std::unordered_map<int, Totals> itemTotals{};
 // create a map for totals for all recipient shops
 for (const auto &oneShop : item.second) {
 // count the stock from this shop
 itemTotals[oneShop.shop].total += oneShop.qty;
 itemTotals[oneShop.shop].total10k += oneShop.qty;
 // add stock to each other destination shop
 for (const auto &destShop : shops.getDestinationShops(oneShop.shop)) {
 if (oneShop.shop != destShop) {
 itemTotals[destShop].total += oneShop.qty;
 if (oneShop.shop < 10000) {
 itemTotals[destShop].total10k += oneShop.qty;
 }
 }
 }
 }
 for (const auto &total : itemTotals) {
 out << item.first << '\t' << total.first << '\t' 
 << total.second.total << '\t' << total.second.total10k << '\n';
 }
 }
 return out;
}

Results

The original, with a few errors fixed, takes around 1.915 seconds on my machine with the data files from your GitLab project. It also leaks about 80M of memory. By contrast, the rewritten version leaks no memory and runs in 0.194 seconds, or about 10 times faster. Also, because the structures it keeps in memory are simpler and smaller, it has a smaller overall footprint. Finally, it only produces a single line of output for each stockID, shopID pair, unlike the original, which, as I've mentioned, sometimes emits multiple lines with different totals.

Alternative version

For complete clarity, here's the complete, working, rewritten version which omits some convenience functions mentioned above. It's 140 lines long.

#include <iostream>
#include <fstream>
#include <vector>
#include <unordered_map>
#include <string>
#include <iterator>
#include <algorithm>
#include <limits>
class Shops {
public:
 Shops(const std::string &groupfilename, const std::string &groupmovesfilename);
 Shops(std::istream &groupsfile, std::istream &movesfile);
 const std::vector<int> &getDestinationShops(int shop) const;
 const std::size_t shopCount() const { return shopgroup.size(); }
 const std::size_t groupCount() const { return groupshops.size(); }
private:
 std::unordered_map<int, int> shopgroup;
 std::unordered_map<int, std::vector<int>> groupshops;
};
class Inventory {
public:
 Inventory(const std::string &filename);
 std::ostream &printAll(const Shops &shops, std::ostream &out = std::cout) const;
private:
 struct ItemInventory {
 int shop;
 int qty;
 };
 std::unordered_map<std::string, std::vector<ItemInventory>> inv;
};
std::istream &skipline(std::istream &in) { 
 return in.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); 
}
Inventory::Inventory(const std::string &filename) {
 std::ifstream infile{filename};
 skipline(infile);
 std::string item;
 int shop;
 int qty;
 while (infile >> item >> shop >> qty) {
 inv[item].push_back({shop, qty});
 }
};
std::ostream &Inventory::printAll(const Shops &shops, std::ostream &out) const {
 for (const auto &item: inv) {
 struct Totals {
 int total;
 int total10k;
 };
 std::unordered_map<int, Totals> itemTotals{};
 // create a map for totals for all recipient shops
 for (const auto &oneShop : item.second) {
 // count the stock from this shop
 itemTotals[oneShop.shop].total += oneShop.qty;
 itemTotals[oneShop.shop].total10k += oneShop.qty;
 // add stock to each destination shop
 for (const auto &destShop : shops.getDestinationShops(oneShop.shop)) {
 if (oneShop.shop != destShop) {
 itemTotals[destShop].total += oneShop.qty;
 if (oneShop.shop < 10000) {
 itemTotals[destShop].total10k += oneShop.qty;
 }
 }
 }
 }
 for (const auto &total : itemTotals) {
 out << item.first << '\t' << total.first << '\t' 
 << total.second.total << '\t' << total.second.total10k << '\n';
 }
 }
 return out;
}
Shops::Shops(const std::string &groupfilename, const std::string &groupmovesfilename) {
 std::ifstream groupsfile{groupfilename};
 std::ifstream movesfile{groupmovesfilename};
 Shops temp{groupsfile, movesfile};
 std::swap(*this, temp);
}
Shops::Shops(std::istream &groupsfile, std::istream &movesfile) {
 skipline(groupsfile);
 int shop;
 int group;
 while (groupsfile >> shop >> group) {
 // skip zero ID shop
 if (shop != 0) {
 shopgroup[shop] = group;
 if (group != 0) {
 groupshops[group].push_back(shop);
 }
 }
 }
 // temporarily store the unaugmented groups
 auto basicgroupshops{groupshops};
 // now created the final augmented groups
 int srcGroup;
 int dstGroup;
 skipline(movesfile);
 while (movesfile >> srcGroup >> dstGroup) {
 const auto &basic{basicgroupshops[dstGroup]};
 groupshops[srcGroup].insert(groupshops[srcGroup].end(), basic.begin(), basic.end());
 }
 // for each group in augmented groups, 
 // sort and remove duplicates
 for (auto &item : groupshops)
 { 
 std::sort(item.second.begin(), item.second.end());
 item.second.erase(std::unique(item.second.begin(), item.second.end()), item.second.end());
 }
}
/*
 * return a vector of destination shops for this shop
 * or throw an out_of_range error
 */
const std::vector<int> &Shops::getDestinationShops(int shop) const {
 static const std::vector<int> mt{};
 auto srcGroup{shopgroup.at(shop)};
 auto grp{groupshops.find(srcGroup)};
 if (grp == groupshops.end()) {
 return mt;
 } 
 return grp->second;
}
int main(int argc, char *argv[]) {
 if (argc != 4) {
 std::cout << "Usage: shops groupsfile movesfile stockfile\n";
 return 1;
 }
 Shops shops{argv[1], argv[2]};
 Inventory inv{argv[3]};
 inv.printAll(shops);
}

Edwards, thanks for a detailed answer. I'm still doing corrections to my code based on your feedback. However, I've some issues still. 1) I still have to use pointer when using unordered_map.find() method. And it's not clear how I could get rid of it. 2) Converting from WriteResult function from fprintf to << style increased total time from 1.0 seconds to 1.3 seconds. Might be, I did something wrong. 3) In ProcessShopStocks item, I'm creating AvailabilityStruct using new because of I'm adding it to unordered_map. Without using new I suspect it will be freed when out of scope. Can I avoid it?
btw, how did you measure my memory leak? Isn't memory reclaimed by OS after process quit?
My rewrite uses no pointers, so I'm quite sure it's possible. Also, it's unlikely that operator<< takes more time than fprintf. The cause is likely something else. For adding things to an unordered_map, I used operator[] in lines like this: groupshops[group].push_back(shop); No pointers or new are required. Note that std::vector::push_back creates a copy of the pushed element unless it can be moved as with std::move.
The tool I used to measure memory usage and leaks is valgrind which I find to be an extremely valuable set of tools. Although the OS will typically reclaim memory after the program exits, the problem is that by not freeing memory as you go, the program's memory usage will grow and grow and get slower and slower until memory is exhausted. And then you'll be left trying to figure out how to debug a program that only crashes with certain very long input and exactly when depends on what other software is concurrently running. Leaky software is unreliable!
I've added my complete alternative code to the answer to make it simpler to see what's been done and how. Hope that helps!

Stack Exchange Network

Calculating effective shop inventories from CSV files

Program purpose

Input and algorithm details

Shops and groups

Simple inventory

Effective inventory

Output file

The code

Questions

shop_stock_total.h

shop_stock_total.cpp

Makefile:

3 Answers 3

Don't abuse `using namespace std`

Use consistent formatting

Don't use Hungarian notation

Use better naming

Understand `new` and `delete`

Don't leak memory

Consider using range-for syntax

Allow the user to specify input and output files

Use objects better

Use a better algorithm

Results

Alternative version

Bug

Insanity

Suggested solution

What about performance?

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Calculating effective shop inventories from CSV files

Program purpose

Input and algorithm details

Shops and groups

Simple inventory

Effective inventory

Output file

The code

Questions

shop_stock_total.h

shop_stock_total.cpp

Makefile:

3 Answers 3

Don't abuse using namespace std

Use consistent formatting

Don't use Hungarian notation

Use better naming

Understand new and delete

Don't leak memory

Consider using range-for syntax

Allow the user to specify input and output files

Use objects better

Use a better algorithm

Results

Alternative version

Bug

Insanity

Suggested solution

What about performance?

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions

Don't abuse `using namespace std`

Understand `new` and `delete`