Program purpose
The purpose for the program is to create a file containing the effective inventory for each stocked item stored in a collection of shops.
Input and algorithm details
For clarity, in the descriptions below, variables in the C++ program use this style: nShopID
, while fields in CSV files are shown like this: shop_id.
Shops and groups
The problem domain is a collection of shops (identified by an int
shop ID number) each of which is assigned to a group (also identified by an int
). The association between nShopID
and nGroupID
is contained in a text file shopgroups.csv
, which is a tab-delimited file with a header with two fields: shop_id and shopgroup_id.
Simple inventory
The simple inventory is in a file named shopstock_all.csv
. That file is also a tab-delimited file with header containing three fields: item_idx, shop_id, and amount. The item_idx field is a string containing the item identifier, and the shop_id represents the shop that has the item and amount is an integer representing the quantity of the item at that shop.
Effective inventory
The simple inventory is simply the number of items at each shop, as described above, but the desired effective inventory takes into account the fact that the inventory of nearby shops could, if needed, be quickly moved to one particular shop. Thus, the effective inventory of a particular shop represents the simple inventory of that shop plus that of nearby stores. Nearby stores and the direction of possible movement of inventory is represented in a third tab-delimited file called sgm.csv
which contains a header and two fields: shop_group_from and shop_group_to. For example, a line in the file might say 8 15
which means that any inventory in group 8 can be counted in the effective inventory of all shops in group 15. These associations are many-to-many (that is group 8 might feed multiple groups and group 15 might be fed from multiple other groups) and not necessarily reciprocal. That is, group 8 inventory may be moved to group 15 shops but not necessarily the other direction unless explicitly stated in the file.
Additionally, all shops within the same group also share their effective inventory without needing a special line in the sgm.csv
file.
Finally, some warehouse shops (those with nShopID > 10000
) have extra storage available and goods are only transferred from and never to such shops. To differentiate, the program actually calculates two effective inventory quantities for each store; nTotalAmount
which is the effective inventory from all stores including these warehouse shops and nTotalAmount10k
which is the effective inventory excluding these warehouse shops.
Output file
The output file is a tab-delimited file with no header with each record (line) containing four fields: itemID, shopID, totalAmount, totalAmount10k each of which has been described above. The order of the records within the file does not matter.
The code
The code and sample files are all on on GitLab
Questions
Since I rarely write C++ code, I've got a mix of C++ and C code. Could it be fixed without hurting performance?
What should be fixed to improve readability, naming of variables and code styling?
Could makefile be less "bloated"? (This is the 2nd one I've ever written.)
P.S. I'm using vim, so there are
// {{{ manual-style folding comments
// ...
// }}}
and I would prefer to leave those as is.
shop_stock_total.h
// {{{ Structures and typedefs
// {{{ ShopGroupMove
typedef list<int> GroupsFromList;
struct ShopGroupMoveStruct {
int nGroupTo;
GroupsFromList *lstGroupsFrom;
};
typedef unordered_map<int, struct ShopGroupMoveStruct> ShopGroupMovesMap;
// }}}
// {{{ ShopGroup
typedef list<int> ShopsList;
struct ShopGroupStruct {
int nGroupID;
ShopsList *lstShops; // all shops with the same group
};
typedef unordered_map<int, struct ShopGroupStruct> ShopGroupsMap;
// }}}
// {{{ Shop
struct ShopStruct {
int nShopID;
int nGroupID;
ShopsList *lstFromAll;
ShopsList *lstFrom10k;
};
typedef unordered_map<int, struct ShopStruct> ShopsMap;
// }}}
// {{{ Availability
struct AvailabilityStruct {
int nShopID;
int nAmount;
int nTotalAmount;
int nTotalAmount10k;
};
typedef unordered_map<int, struct AvailabilityStruct> AvailabilitiesMap;
// }}}
// {{{ ShopStock
struct ShopStockStruct {
char *sItemID;
AvailabilitiesMap *mapAv;
};
typedef unordered_map<std::string, struct ShopStockStruct> ShopStocksMap;
// }}}
// }}}
// {{{ Functions
// {{{ APP specific
void ProcessShopGroupMoves(ShopGroupMovesMap *mapSGM);
void DebugPrintSGM(ShopGroupMovesMap *mapSGM);
void ProcessShopGroups(ShopGroupsMap *mapSG, ShopsMap *mapS);
void DebugPrintSG(ShopGroupsMap *mapSG, ShopsMap *mapS);
void Fill_FromShopAll(ShopGroupMovesMap *mapSGM, ShopGroupsMap *mapSG, ShopsMap *mapS);
void DebugPrintS(ShopsMap *mapS);
void ProcessShopStocks(ShopStocksMap *mapSS, ShopsMap *mapS, FILE *out);
void PostProcessItem(ShopStockStruct *pstSS, ShopsMap *mapS, FILE *out);
void DebugPrintSS(ShopStockStruct *pstSS);
void WriteResult(ShopStockStruct *pstSS, FILE *out);
// }}}
// {{{ generic
std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems);
std::vector<std::string> split(const std::string &s, char delim);
// }}}
// }}}
shop_stock_total.cpp
// {{{ standard and other includes
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <locale>
#include <fstream>
#include <string>
#include <string.h>
#include <sstream>
#include <assert.h>
#include <ctype.h>
#include <algorithm>
#include <unordered_map>
#include <vector>
#include <list>
// }}}
// {{{ filenames
const char *SHOP_GROUP_MOVES_FILE = "sgm.csv";
const char *SHOP_GROUPS_FILE = "shopgroups.csv";
const char *SHOP_STOCKS_FILE = "shopstock_all.csv";
const char *SHOP_STOCK_TOTALS_FILE = "shop_stock_total.csv";
// }}}
using namespace std;
#include "shop-stock-total.h"
// {{{ main
int main() {
// ShopGroupMoves
ShopGroupMovesMap *mapSGM;
mapSGM = new ShopGroupMovesMap();
ProcessShopGroupMoves(mapSGM);
DebugPrintSGM(mapSGM);
// ShopGroups and Shops
ShopGroupsMap *mapSG;
mapSG = new ShopGroupsMap();
ShopsMap *mapS;
mapS = new ShopsMap();
ProcessShopGroups(mapSG, mapS);
DebugPrintSG(mapSG, mapS);
Fill_FromShopAll(mapSGM, mapSG, mapS);
DebugPrintS(mapS);
// ShopStock
ShopStocksMap *mapSS;
mapSS = new ShopStocksMap();
FILE* fileSST = fopen (SHOP_STOCK_TOTALS_FILE, "w+");
ProcessShopStocks(mapSS, mapS, fileSST);
fclose(fileSST);
return 0;
}
// }}}
// {{{ ProcessShopGroupMoves
void ProcessShopGroupMoves(ShopGroupMovesMap *mapSGM) {
std::ifstream file(SHOP_GROUP_MOVES_FILE);
std::string l;
ShopGroupMoveStruct *pstSGM;
int nKey = -1;
int nSGMs = 0;
int nGroupFrom;
int nGroupTo;
std::vector<std::string> vsItems;
while (std::getline(file, l)) {
// skip first line
if (nKey < 0) {
nKey = 0;
continue;
}
vsItems = split(l, (char) '\t');
if (vsItems.size() < 2)
continue;
nGroupFrom = atoi(vsItems[0].c_str());
nGroupTo = atoi(vsItems[1].c_str());
auto sgm = mapSGM->find(nGroupTo);
if (sgm == mapSGM->end()) {
// not found, so we need to create a new one
pstSGM = new ShopGroupMoveStruct();
pstSGM->nGroupTo = nGroupTo;
pstSGM->lstGroupsFrom = new GroupsFromList();
// and store it, we'll add from value a bit later
(* mapSGM)[nGroupTo] = *pstSGM;
}
else {
// found group already, get a pointer to it for adding nGroupFrom
// value later
pstSGM = &(sgm->second);
}
// in any case (found or not), we've to add GroupFrom value
pstSGM->lstGroupsFrom->push_back(nGroupFrom);
nSGMs++;
}
printf("Total %d ShopGroupMoves loaded.\n", nSGMs);
}
// }}}
// {{{ DebugPrintSGM
void DebugPrintSGM(ShopGroupMovesMap *mapSGM) {
for (auto sgm=mapSGM->begin(); sgm != mapSGM->end(); sgm++) {
printf("%d:\t", sgm->second.nGroupTo);
for (auto f=sgm->second.lstGroupsFrom->begin();
f!=sgm->second.lstGroupsFrom->end();
f++) {
printf("%d ", *f);
}
printf("\n");
}
}
// }}}
// {{{ ProcessShopGroups
void ProcessShopGroups(ShopGroupsMap *mapSG, ShopsMap *mapS) {
std::ifstream file(SHOP_GROUPS_FILE);
std::string l;
int nSGs = 0;
int nKey = -1;
int nShopID;
int nGroupID;
ShopStruct *pstShop;
ShopGroupStruct *pstSG;
std::vector<std::string> vsItems;
while (std::getline(file, l)) {
// skip first line
if (nKey < 0) {
nKey = 0;
continue;
}
vsItems = split(l, (char) '\t');
if (vsItems.size() < 2)
continue;
nShopID = atoi(vsItems[0].c_str());
nGroupID = atoi(vsItems[1].c_str());
// skip zero ID shop
if (nShopID == 0)
continue;
// create new Shop and save it
pstShop = new ShopStruct();
pstShop->nShopID = nShopID;
pstShop->nGroupID = nGroupID;
pstShop->lstFromAll = new ShopsList();
pstShop->lstFrom10k = new ShopsList();
(* mapS)[nShopID] = *pstShop;
// do we need to save ShopGroup
if (nGroupID == 0)
continue;
auto sg = mapSG->find(nGroupID);
if (sg == mapSG->end()) {
// not found, so we need to create a new one
pstSG = new ShopGroupStruct();
pstSG->nGroupID = nGroupID;
pstSG->lstShops = new ShopsList();
// and store it, we'll add from value a bit later
(* mapSG)[nGroupID] = *pstSG;
}
else {
// found group already, get a pointer to it for adding shop
// value later
pstSG = &(sg->second);
}
// in any case (found or not), we've to add GroupFrom value
pstSG->lstShops->push_back(nShopID);
nSGs++;
}
printf("Total %d ShopGroups loaded.\n", nSGs);
}
// }}}
// {{{ Fill_FromShopAll
void Fill_FromShopAll(ShopGroupMovesMap *mapSGM, ShopGroupsMap *mapSG, ShopsMap *mapS) {
for (auto s = mapS->begin(); s != mapS->end(); s++) {
// fill shops from the same group
auto sg = mapSG->find(s->second.nGroupID);
if (sg != mapSG->end()) {
for (auto s_ = sg->second.lstShops->begin();
s_ != sg->second.lstShops->end();
s_++) {
s->second.lstFromAll->push_back(*s_);
}
}
// fill shops from movement groups
auto sgm = mapSGM->find(s->second.nGroupID);
if (sgm != mapSGM->end()) {
for (auto sg = sgm->second.lstGroupsFrom->begin();
sg != sgm->second.lstGroupsFrom->end();
sg++) {
auto sg_ = mapSG->find(*sg);
if (sg_ != mapSG->end()) {
for (auto s_ = sg_->second.lstShops->begin();
s_ != sg_->second.lstShops->end();
s_++) {
s->second.lstFromAll->push_back(*s_);
}
}
}
}
// de duping
s->second.lstFromAll->sort();
s->second.lstFromAll->unique();
// remove shop itself
s->second.lstFromAll->remove(s->second.nShopID);
// copy to 10k
/* auto n = std::copy_if( */
/* s->second.lstFromAll->begin(), */
/* s->second.lstFromAll->end(), */
/* s->second.lstFrom10k->begin(), */
/* [](int i){return (i < 10000);}); */
/* s->second.lstFrom10k->size( */
/* auto n = std::copy( */
/* s->second.lstFromAll->begin(), */
/* s->second.lstFromAll->end(), */
/* s->second.lstFrom10k->begin() */
/* ); */
/* s->second.lstFrom10k->resize( */
/* std::distance(s->second.lstFrom10k->begin(), n)); */
for (auto s_ = s->second.lstFromAll->begin();
s_ != s->second.lstFromAll->end();
s_++) {
if (*s_ < 10000) {
s->second.lstFrom10k->push_back(*s_);
}
}
}
}
// }}}
// {{{ DebugPrintS
void DebugPrintS(ShopsMap *mapS) {
for (auto s = mapS->begin(); s != mapS->end(); s++) {
printf("%d:\t", s->second.nShopID);
for (auto s_ = s->second.lstFrom10k->begin();
s_ != s->second.lstFrom10k->end();
s_++) {
printf("%d, ", *s_);
}
printf("\n");
}
}
// }}}
// {{{ DebugPrintSG
void DebugPrintSG(ShopGroupsMap *mapSG, ShopsMap *mapS) {
// print shop groups
for (auto sg=mapSG->begin(); sg != mapSG->end(); sg++) {
printf("%d:\t", sg->second.nGroupID);
for (auto s = sg->second.lstShops->begin();
s != sg->second.lstShops->end();
s++) {
printf("%d ", *s);
}
printf("\n");
}
// print shops
for ( auto s = mapS->begin(); s != mapS->end(); s++) {
printf("%d : %d\n", s->second.nShopID, s->second.nGroupID);
}
}
// }}}
// {{{ PostProcessItem
void PostProcessItem(ShopStockStruct *pstSS, ShopsMap *mapS, FILE *out) {
AvailabilityStruct *pstAV;
for (auto s = mapS->begin();
s != mapS->end();
s++) {
// if shop is not present in current item Availability
// add it with 0 amounts,
// coz we could still move goods there from others shops
auto s_ = pstSS->mapAv->find(s->second.nShopID);
if (s_ == pstSS->mapAv->end()) {
pstAV = new AvailabilityStruct();
pstAV->nShopID = s->second.nShopID;
pstAV->nAmount = 0;
pstAV->nTotalAmount = 0;
pstAV->nTotalAmount10k = 0;
(*(pstSS->mapAv))[s->second.nShopID] = *pstAV;
}
}
// summing up total availability
for (auto av = pstSS->mapAv->begin();
av != pstSS->mapAv->end();
av++) {
// add amounts from shop itself
av->second.nTotalAmount = av->second.nAmount;
av->second.nTotalAmount10k = av->second.nAmount;
// add amount from other shops which could move goods into this one
auto s_ = mapS->find(av->second.nShopID);
if (s_ == mapS->end()) continue;
for (auto s__ = s_->second.lstFromAll->begin();
s__ != s_->second.lstFromAll->end();
s__++) {
auto av_ = pstSS->mapAv->find(*s__);
if (av_ == pstSS->mapAv->end()) continue;
av->second.nTotalAmount += av_->second.nAmount;
if (*s__ < 10000) {
av->second.nTotalAmount10k += av_->second.nAmount;
}
}
}
//DebugPrintSS(pstSS);
WriteResult(pstSS, out);
}
// }}}
// {{{ DebugPrintSS
void DebugPrintSS(ShopStockStruct *pstSS) {
printf("%s:\t", pstSS->sItemID);
for (auto av = pstSS->mapAv->begin();
av != pstSS->mapAv->end();
av++) {
printf("%d - %d - %d;", av->second.nShopID, av->second.nTotalAmount,
av->second.nTotalAmount10k);
}
printf("\n");
}
// }}}
// {{{ WriteResult
void WriteResult(ShopStockStruct *pstSS, FILE *out) {
for (auto av = pstSS->mapAv->begin();
av != pstSS->mapAv->end();
av++) {
if (av->second.nTotalAmount + av->second.nTotalAmount10k == 0) {
continue;
}
fprintf(out, "%s\t%d\t%d\t%d\n",
pstSS->sItemID,
av->second.nShopID,
av->second.nTotalAmount,
av->second.nTotalAmount10k
);
}
}
// }}}
// {{{ ProcessShopStocks
void ProcessShopStocks(ShopStocksMap *mapSS, ShopsMap *mapS, FILE *out) {
std::ifstream file(SHOP_STOCKS_FILE);
std::string l;
ShopStockStruct *pstSS = NULL;
AvailabilityStruct *pstAV;
int nKey = -1;
int nSSs = 0;
char *sItemID;
char *sItemIDprev = (char *)"";
int nShopID;
int nAmount;
std::vector<std::string> vsItems;
while (std::getline(file, l)) {
// skip first line
if (nKey < 0) {
nKey = 0;
continue;
}
vsItems = split(l, (char) '\t');
if (vsItems.size() < 3)
continue;
sItemID = strdup(vsItems[0].c_str());
if (strcmp(sItemIDprev, sItemID) != 0) {
// item changed - post process previous item
if (pstSS != NULL)
PostProcessItem(pstSS, mapS, out);
}
nShopID = atoi(vsItems[1].c_str());
nAmount = atoi(vsItems[2].c_str());
auto ss = mapSS->find(sItemID);
if (ss == mapSS->end()) {
// not found, so we need to create a new one
pstSS = new ShopStockStruct();
pstSS->sItemID = sItemID;
pstSS->mapAv = new AvailabilitiesMap();
// and store it, we'll add from value a bit later
(* mapSS)[sItemID] = *pstSS;
}
else {
// found group already, get a pointer to it for adding nGroupFrom
// value later
pstSS = &(ss->second);
}
// in any case (found or not), we've to add GroupFrom value
// to do: create struct, and store it in map
pstAV = new AvailabilityStruct();
pstAV->nShopID = nShopID;
pstAV->nAmount = nAmount;
pstAV->nTotalAmount = 0;
pstAV->nTotalAmount10k = 0;
(*(pstSS->mapAv))[nShopID] = *pstAV;
nSSs++;
}
// process the last item if exists
if (pstSS != NULL)
PostProcessItem(pstSS, mapS, out);
printf("Total %d ShopStocks loaded.\n", nSSs);
}
// }}}
// {{{ Generic split to vector function
std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) {
std::stringstream ss(s);
std::string item;
while (std::getline(ss, item, delim)) {
elems.push_back(item);
// printf("\t%s\n", item.c_str());
}
return elems;
}
std::vector<std::string> split(const std::string &s, char delim) {
std::vector<std::string> elems;
split(s, delim, elems);
return elems;
}
// }}}
Makefile:
PROGRAM = shop-stock-total
CXX=g++
CXXFLAGS := -g -std=c++11 -Wall -O3 -march=native -mtune=native
LDFLAGS := -g -L /usr/lib:/usr/lib/x86_64-linux-gnu
all: $(PROGRAM)
$(PROGRAM): $(PROGRAM).gcda $(PROGRAM).o
$(CXX) $(CXXFLAGS) -fprofile-use -o $@ $(PROGRAM).o $(LDFLAGS)
$(PROGRAM).gcda: profile/$(PROGRAM)
./profile/$(PROGRAM) >/dev/null
cp profile/$(PROGRAM).gcda ./
profile/$(PROGRAM): $(PROGRAM).cpp $(PROGRAM).h
[[ -d profile ]] || mkdir profile
cp $(PROGRAM).cpp profile/
cp $(PROGRAM).h profile/
cd profile && $(CXX) -c $(CXXFLAGS) -fprofile-generate $(PROGRAM).cpp
cd profile && $(CXX) $(CXXFLAFS) -fprofile-generate -o $(PROGRAM) $(PROGRAM).o $(LDFLAGS)
shop-stock-total.o: $(PROGRAM).cpp $(PROGRAM).h
$(CXX) -c $(CXXFLAGS) -fprofile-use $(PROGRAM).cpp
clean:
@rm -rf *.o $(PROGRAM) ./profile *.gcda
3 Answers 3
I see a number of things that may help you improve your program.
Don't abuse using namespace std
Putting using namespace std
at the top of every program is a bad habit that you'd do well to avoid.
Use consistent formatting
The code as posted has inconsistent indentation which makes it hard to read and understand. Pick a style and apply it consistently.
Don't use Hungarian notation
Prefixing every variable with an abbreviation of its type is usually called "Hungarian notation" and it was once popular. Even then it was a bad idea. Don't clutter up your source code with that; instead, concentrate on defining meaningful names for each variable and choose types appropriately.
Use better naming
The name mapSGM
is not terrible, because once we figure out that SGM
stands for "Shop Group Moves" we have, at least, some idea as to what it means. However many of the names are very poor. For instance within PostProcessItem()
, we have s
, s_
, s__
, av
and av_
. In addition to those being very short cryptic names, having deceptively similar names all in the same function is diabolically confusing.
Understand new
and delete
When you declare a variable such as std::string item;
it is called an automatic
variable and needs neither new
nor delete
. It comes into existence and is destroyed as needed and is used rationally within split()
. However, many places the code looks more like this:
ShopGroupMovesMap *mapSGM;
mapSGM = new ShopGroupMovesMap();
ProcessShopGroupMoves(mapSGM);
This is really not a good way to go about things because what ProcessShopGroupMoves
actually does is to populate an empty container that it's handed. Better would be to have it return such a structure instead as:
auto mapSGM{ProcessShopGroupMoves()};
Note that I am using auto
here and the uniform initialization syntax of C++11.
Don't leak memory
The memory that's allocated via new
should be freed via delete
but no such deletions occur within this program. That's the recipe for a memory leak and really should be addressed. The best way with modern C++ is to simply avoid explicitly calling new
, and I'll demonstrate below.
Consider using range-for syntax
The code for DebugPrintSGM
currently contains this code in the inner loop:
for (auto f=sgm->second.lstGroupsFrom->begin();
f!=sgm->second.lstGroupsFrom->end();
f++) {
printf("%d ", *f);
}
In addition to being really hard to read and using the antique printf
(more on that in a moment), it uses iterators when it could be greatly simplified using range-for
syntax:
for (const auto &fromStore : sgm->second.lstGroupsFrom) {
std::cout << fromStore << ' ';
}
Much neater, cleaner and easier to read, understand and maintain. Using similar syntax for the outer loop will also clean up the syntax for this loop.
Allow the user to specify input and output files
The file names are currently hardcoded which certainly greatly restricts the usefulness of the program. Consider using argc
and argv
to allow the user to specify file names on the command line.
Use objects better
The code uses unaugmented data structures such as std::list
, but uses typedef
and forgoes adding application-specific functions. This makes the code much, much more complex than it needs to be. For example, in a rewrite I did, here is what the Shops
class looks like:
class Shops {
public:
Shops(const std::string &groupfilename, const std::string &groupmovesfilename);
Shops(std::istream &groupsfile, std::istream &movesfile);
const std::vector<int> &getDestinationShops(int shop) const;
std::ostream &show(int shopID, std::ostream &out = std::cout ) const;
std::ostream &showGroups(std::ostream &out = std::cout ) const;
const std::size_t shopCount() const { return shopgroup.size(); }
const std::size_t groupCount() const { return groupshops.size(); }
private:
std::unordered_map<int, int> shopgroup;
std::unordered_map<int, std::vector<int>> groupshops;
};
The constructor receives either the file names or the two std::istream
corresponding to two of the original files. show
is a convenience function similar to your DebugPrintxx
functions, except that the knowledge of how to print out the information is captured within the object, so that it may be easily re-used if we have collections of such objects.
Use a better algorithm
The current algorithm is very complex and difficult to follow. It also produces incorrect results, such as listing different inventories for the same store. Better would be to think carefully about both inputs and desired outputs and map out an algorithm for getting from one to the other. As inspiration, here's what my rewritten main
looks like:
int main(int argc, char *argv[]) {
if (argc != 4) {
std::cout << "Usage: shops groupsfile movesfile stockfile\n";
return 1;
}
Shops shops{argv[1], argv[2]};
Inventory inv{argv[3]};
inv.printAll(shops);
}
Much of the algorithm has now been distilled into printAll
at the end, which may be redirected from std::cout
to a file if desired. Here's the algorithm I employed in code form. First the two different style of constructor:
Shops::Shops(const std::string &groupfilename, const std::string &groupmovesfilename) {
std::ifstream groupsfile{groupfilename};
std::ifstream movesfile{groupmovesfilename};
Shops temp{groupsfile, movesfile};
std::swap(*this, temp);
}
Shops::Shops(std::istream &groupsfile, std::istream &movesfile) {
skipline(groupsfile);
int shop;
int group;
while (groupsfile >> shop >> group) {
// skip zero ID shop
if (shop != 0) {
shopgroup[shop] = group;
if (group != 0) {
groupshops[group].push_back(shop);
}
}
}
// temporarily store the unaugmented groups
auto basicgroupshops{groupshops};
// now created the final augmented groups
int srcGroup;
int dstGroup;
skipline(movesfile);
while (movesfile >> srcGroup >> dstGroup) {
const auto &basic{basicgroupshops[dstGroup]};
groupshops[srcGroup].insert(groupshops[srcGroup].end(), basic.begin(), basic.end());
}
// for each group in augmented groups,
// sort and remove duplicates
for (auto &item : groupshops)
{
std::sort(item.second.begin(), item.second.end());
item.second.erase(std::unique(item.second.begin(), item.second.end()), item.second.end());
}
}
Note that shopgroup
simply maps from a single shop number to a single shop group. The groupshops
maps in the opposite direction from a single shop group to a collection of shops in that group. We then augment that using data from the shop group moves file. That is, if we have a shop group move line that says 8 15
, we add all of the shops in group 15 to the list of destination for group 8. Eventually, for each group have an augmented list of shops, possibly including duplicates, which are then sorted and duplicates removed. Once that's complete, we no longer need the unaugmented groups, so that is a local variable that goes out of scope and is destroyed at the end of the constructor.
The Inventory
class is very simple and looks like this:
class Inventory {
public:
Inventory(const std::string &filename);
std::ostream &printBasic(std::ostream &out = std::cout) const;
std::ostream &printAll(const Shops &shops, std::ostream &out = std::cout) const;
private:
struct ItemInventory {
int shop;
int qty;
};
std::unordered_map<std::string, std::vector<ItemInventory>> inv;
};
The constructor just reads in the file in the most straightforward way possible, using a skipline
convenience function:
std::istream &skipline(std::istream &in) {
return in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
Inventory::Inventory(const std::string &filename) {
std::ifstream infile{filename};
skipline(infile);
std::string item;
int shop;
int qty;
while (infile >> item >> shop >> qty) {
inv[item].push_back({shop, qty});
}
};
The printAll
function is then relatively simple:
std::ostream &Inventory::printAll(const Shops &shops, std::ostream &out) const {
for (const auto &item: inv) {
struct Totals {
int total;
int total10k;
};
std::unordered_map<int, Totals> itemTotals{};
// create a map for totals for all recipient shops
for (const auto &oneShop : item.second) {
// count the stock from this shop
itemTotals[oneShop.shop].total += oneShop.qty;
itemTotals[oneShop.shop].total10k += oneShop.qty;
// add stock to each other destination shop
for (const auto &destShop : shops.getDestinationShops(oneShop.shop)) {
if (oneShop.shop != destShop) {
itemTotals[destShop].total += oneShop.qty;
if (oneShop.shop < 10000) {
itemTotals[destShop].total10k += oneShop.qty;
}
}
}
}
for (const auto &total : itemTotals) {
out << item.first << '\t' << total.first << '\t'
<< total.second.total << '\t' << total.second.total10k << '\n';
}
}
return out;
}
Results
The original, with a few errors fixed, takes around 1.915 seconds on my machine with the data files from your GitLab project. It also leaks about 80M of memory. By contrast, the rewritten version leaks no memory and runs in 0.194 seconds, or about 10 times faster. Also, because the structures it keeps in memory are simpler and smaller, it has a smaller overall footprint. Finally, it only produces a single line of output for each stockID, shopID
pair, unlike the original, which, as I've mentioned, sometimes emits multiple lines with different totals.
Alternative version
For complete clarity, here's the complete, working, rewritten version which omits some convenience functions mentioned above. It's 140 lines long.
#include <iostream>
#include <fstream>
#include <vector>
#include <unordered_map>
#include <string>
#include <iterator>
#include <algorithm>
#include <limits>
class Shops {
public:
Shops(const std::string &groupfilename, const std::string &groupmovesfilename);
Shops(std::istream &groupsfile, std::istream &movesfile);
const std::vector<int> &getDestinationShops(int shop) const;
const std::size_t shopCount() const { return shopgroup.size(); }
const std::size_t groupCount() const { return groupshops.size(); }
private:
std::unordered_map<int, int> shopgroup;
std::unordered_map<int, std::vector<int>> groupshops;
};
class Inventory {
public:
Inventory(const std::string &filename);
std::ostream &printAll(const Shops &shops, std::ostream &out = std::cout) const;
private:
struct ItemInventory {
int shop;
int qty;
};
std::unordered_map<std::string, std::vector<ItemInventory>> inv;
};
std::istream &skipline(std::istream &in) {
return in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
Inventory::Inventory(const std::string &filename) {
std::ifstream infile{filename};
skipline(infile);
std::string item;
int shop;
int qty;
while (infile >> item >> shop >> qty) {
inv[item].push_back({shop, qty});
}
};
std::ostream &Inventory::printAll(const Shops &shops, std::ostream &out) const {
for (const auto &item: inv) {
struct Totals {
int total;
int total10k;
};
std::unordered_map<int, Totals> itemTotals{};
// create a map for totals for all recipient shops
for (const auto &oneShop : item.second) {
// count the stock from this shop
itemTotals[oneShop.shop].total += oneShop.qty;
itemTotals[oneShop.shop].total10k += oneShop.qty;
// add stock to each destination shop
for (const auto &destShop : shops.getDestinationShops(oneShop.shop)) {
if (oneShop.shop != destShop) {
itemTotals[destShop].total += oneShop.qty;
if (oneShop.shop < 10000) {
itemTotals[destShop].total10k += oneShop.qty;
}
}
}
}
for (const auto &total : itemTotals) {
out << item.first << '\t' << total.first << '\t'
<< total.second.total << '\t' << total.second.total10k << '\n';
}
}
return out;
}
Shops::Shops(const std::string &groupfilename, const std::string &groupmovesfilename) {
std::ifstream groupsfile{groupfilename};
std::ifstream movesfile{groupmovesfilename};
Shops temp{groupsfile, movesfile};
std::swap(*this, temp);
}
Shops::Shops(std::istream &groupsfile, std::istream &movesfile) {
skipline(groupsfile);
int shop;
int group;
while (groupsfile >> shop >> group) {
// skip zero ID shop
if (shop != 0) {
shopgroup[shop] = group;
if (group != 0) {
groupshops[group].push_back(shop);
}
}
}
// temporarily store the unaugmented groups
auto basicgroupshops{groupshops};
// now created the final augmented groups
int srcGroup;
int dstGroup;
skipline(movesfile);
while (movesfile >> srcGroup >> dstGroup) {
const auto &basic{basicgroupshops[dstGroup]};
groupshops[srcGroup].insert(groupshops[srcGroup].end(), basic.begin(), basic.end());
}
// for each group in augmented groups,
// sort and remove duplicates
for (auto &item : groupshops)
{
std::sort(item.second.begin(), item.second.end());
item.second.erase(std::unique(item.second.begin(), item.second.end()), item.second.end());
}
}
/*
* return a vector of destination shops for this shop
* or throw an out_of_range error
*/
const std::vector<int> &Shops::getDestinationShops(int shop) const {
static const std::vector<int> mt{};
auto srcGroup{shopgroup.at(shop)};
auto grp{groupshops.find(srcGroup)};
if (grp == groupshops.end()) {
return mt;
}
return grp->second;
}
int main(int argc, char *argv[]) {
if (argc != 4) {
std::cout << "Usage: shops groupsfile movesfile stockfile\n";
return 1;
}
Shops shops{argv[1], argv[2]};
Inventory inv{argv[3]};
inv.printAll(shops);
}
-
\$\begingroup\$ Edwards, thanks for a detailed answer. I'm still doing corrections to my code based on your feedback. However, I've some issues still. 1) I still have to use pointer when using unordered_map.find() method. And it's not clear how I could get rid of it. 2) Converting from WriteResult function from fprintf to << style increased total time from 1.0 seconds to 1.3 seconds. Might be, I did something wrong. 3) In ProcessShopStocks item, I'm creating AvailabilityStruct using new because of I'm adding it to unordered_map. Without using new I suspect it will be freed when out of scope. Can I avoid it? \$\endgroup\$pk72– pk722018年07月09日 22:19:39 +00:00Commented Jul 9, 2018 at 22:19
-
\$\begingroup\$ btw, how did you measure my memory leak? Isn't memory reclaimed by OS after process quit? \$\endgroup\$pk72– pk722018年07月09日 22:34:48 +00:00Commented Jul 9, 2018 at 22:34
-
\$\begingroup\$ My rewrite uses no pointers, so I'm quite sure it's possible. Also, it's unlikely that
operator<<
takes more time thanfprintf
. The cause is likely something else. For adding things to anunordered_map
, I usedoperator[]
in lines like this:groupshops[group].push_back(shop);
No pointers ornew
are required. Note thatstd::vector::push_back
creates a copy of the pushed element unless it can be moved as withstd::move
. \$\endgroup\$Edward– Edward2018年07月09日 23:34:28 +00:00Commented Jul 9, 2018 at 23:34 -
\$\begingroup\$ The tool I used to measure memory usage and leaks is
valgrind
which I find to be an extremely valuable set of tools. Although the OS will typically reclaim memory after the program exits, the problem is that by not freeing memory as you go, the program's memory usage will grow and grow and get slower and slower until memory is exhausted. And then you'll be left trying to figure out how to debug a program that only crashes with certain very long input and exactly when depends on what other software is concurrently running. Leaky software is unreliable! \$\endgroup\$Edward– Edward2018年07月09日 23:40:27 +00:00Commented Jul 9, 2018 at 23:40 -
\$\begingroup\$ I've added my complete alternative code to the answer to make it simpler to see what's been done and how. Hope that helps! \$\endgroup\$Edward– Edward2018年07月09日 23:54:12 +00:00Commented Jul 9, 2018 at 23:54
Bug
The output, shop_stock_total.csv
, contains the following lines, each of which is a tuple of (item, shop, quantity available, quantity available excluding transfers from warehouse stores):
100971 1 2 2
100971 1 4 4
100971 1 6 6
Obviously, there should be at most one row representing item 100971 at store 1. These results were not aggregated correctly. There are many more occurrences of this bug.
Also note that one of your input files, sgm.csv
, contains some duplicate entries, such as these:
shop_group_from shop_group_to
38 39
38 39
Assuming that this doesn't mean that goods magically double in quantity when moved from group 38 to 39, you'll have to be extra careful in the calculations. Perhaps you have another bug in the part of your system that generates sgm.csv
.
Insanity
In my opinion, trying to accomplish this query in C++ is crazy. Your C++ program consists of a huge amount of custom code, which is hard to understand (as evidenced by the bug above), and will likely be unmaintainable by your successor if you ever move on to another job. Furthermore, the program does one task only, and would be a lot of work to modify it to perform any other queries.
I think that you are not using the right tool for the job. Running queries on relational data is traditionally done using SQL, and that's what you should use.
Suggested solution
Once you switch to an SQL-based solution, the choice of programming language becomes much less of an issue. You could even switch to Python, like the other implementation you have in your GitLab repository. Python even has a built-in SQLite3 engine and library.
This solution is quite simple, readable, and maintainable. It consists of:
- a function to import a TSV file into a database table
- a function to write the result of a SELECT query in tab-separated format
- a main function that contains the SQL query
When all of the joining and aggregation logic is expressed in one SQL query rather than spread out over several C++ functions, it's a lot easier to verify its correctness!
import csv
import sqlite3
def import_table(db_conn, name, *indexes, **types):
"""
Create a table named name based on the data in name.csv. Columns are
assumed to be INTEGER unless specified in types.
"""
cursor = db_conn.cursor()
with open(name + '.csv') as f:
reader = csv.DictReader(f, delimiter='\t')
create_stmt = 'CREATE TABLE {name} ( {col_defs} );'.format(
name=name,
col_defs=', '.join(
'{f} {type}'.format(f=f, type=types.get(f, 'INTEGER'))
for f in reader.fieldnames
)
)
cursor.execute(create_stmt)
for i, col_group in enumerate(indexes):
idx_stmt = 'CREATE INDEX {name}{i} ON {name} ({col_list});'.format(
name=name,
i=i,
col_list=', '.join(col_group)
)
cursor.execute(idx_stmt)
insert_stmt = 'INSERT INTO {name} VALUES ({placeholders});'.format(
name=name,
placeholders=', '.join('?' for _ in reader.fieldnames)
)
cursor.executemany(insert_stmt, (list(row.values()) for row in reader))
def query(out, db_conn, select_stmt, *params):
"""
Execute a SELECT statement and write the results in tab-separated format
to a file-like object.
"""
cursor = db_conn.cursor()
cursor.execute(select_stmt, *params)
fieldnames = [d[0] for d in cursor.description]
writer = csv.DictWriter(out, fieldnames, delimiter='\t')
writer.writeheader()
writer.writerows(dict(zip(fieldnames, row)) for row in cursor.fetchall())
def main():
db_conn = sqlite3.connect(':memory:')
import_table(db_conn, 'sgm', ['shop_group_to'])
import_table(db_conn, 'shopgroups', ['shopgroup_id'])
import_table(db_conn, 'shopstock_all', ['shop_id', 'item_idx'],
item_idx='TEXT'
)
db_conn.commit()
with open('shop_stock_total.csv', 'w') as out:
query(out, db_conn, '''
WITH shop_sources AS (
SELECT shop_id -- Each shop has access to
, shop_id AS source_shop_id -- its own stuff,
, 1 AS not_from_warehouse -- even if it's a warehouse
FROM shopgroups
UNION
SELECT shop.shop_id -- A shop may also access
, source.shop_id -- stuff from its group,
, source.shop_id < 10000 -- noting if from warehouse
FROM shopgroups AS shop
INNER JOIN sgm
ON shop.shopgroup_id = sgm.shop_group_to
INNER JOIN shopgroups AS source
ON sgm.shop_group_from = source.shopgroup_id
WHERE shop.shopgroup_id <> 0
AND shop.shop_id <> source.shop_id
)
SELECT inv.item_idx AS item_idx
, ss.shop_id AS shop_id
, SUM(inv.amount) AS amount
, SUM(inv.amount * not_from_warehouse) AS amount_excl_warehouses
FROM shop_sources AS ss
INNER JOIN shopstock_all AS inv
ON ss.source_shop_id = inv.shop_id
GROUP BY inv.item_idx, ss.shop_id
ORDER BY inv.item_idx, ss.shop_id;
''')
if __name__ == '__main__':
main()
What about performance?
Is the solution above faster than your C++ program? Admittedly, no: it takes about twice as long. However, consider what steps you can easily take to improve its performance:
- Switch to a more capable database system than SQLite (such as PostgreSQL)
- Keep the inventory data in the SQL database, already in memory, so that you don't have to spend time loading CSV files
- Use materialized views and triggers to automatically regenerate the query results whenever inventory levels change
- Take the money that you saved by not spending so much time on maintaining the code, and use it to buy faster hardware
-
\$\begingroup\$ 200_success, thanks for pointing to the bug. I'll need to verify it. The .csv files are retrieved from database (sgm is filled manually to the database, that's why it could contains duplicates.) Of course, items are not moved twice from shop to shop in that case. We had initial implementation using SQL query and it was very slow (minutes to run) on our mysql server. The server itself is powerful enough. That's why we've rewritten the solution using csv offload, python processing and than .csv loading. This way it's much faster than doing this in SQL. I could post SQL and tables, so you'll see. \$\endgroup\$pk72– pk722018年07月09日 22:25:01 +00:00Commented Jul 9, 2018 at 22:25
-
\$\begingroup\$ Minutes? This SQLite query ran in a few seconds on an old laptop. Perhaps you should have your SQL schema and query reviewed instead of this gigantic workaround. \$\endgroup\$200_success– 200_success2018年07月09日 23:03:06 +00:00Commented Jul 9, 2018 at 23:03
-
1\$\begingroup\$ In any case, I agree that the SQL-based version would be an excellent candidate for another question here. \$\endgroup\$Edward– Edward2018年07月09日 23:23:15 +00:00Commented Jul 9, 2018 at 23:23
-
\$\begingroup\$ 200_success, yep, you was right. There are bugs. I did 'cat shop_stock_total.csv | sort |less' and there are lots of duplicates. All with the same values as I see but this is wrong any way. I'll need to find out how it's happening. \$\endgroup\$pk72– pk722018年07月10日 20:45:06 +00:00Commented Jul 10, 2018 at 20:45
-
\$\begingroup\$ posted an sql version \$\endgroup\$pk72– pk722018年07月12日 13:29:44 +00:00Commented Jul 12, 2018 at 13:29
Most of the issues are answered by Edward in the first answer. However, the suggested by Edward solution is slower than my current. So I've decided to post mine.
Rationale for changes
- There're no
ShopGroup
structures anymore. Sincegroup
is not used any further except insideshops
. I've decided to usegroup
only internally insideShops
class. - During processing of shops. The shop item itself is included as target shop for itself. This helps to avoid processing shop for itself separately. Instead, it happens during for-cycle like all others destinations.
- Using
at()
method ofunordered_map
helps to pass argument withconst
declaration and it does not require to create static empty vector. Shops
withgroup_id
equal to0
have special meaning - "no group specified". Thus, in spite of all suchshops
have the samegroup_id
- there are no movements between them. I had to add a special magic_number there to avoid moving between those shops. This is bad - but I don't see how to fix this without lots of complications.Both classes could be static. Since there's only single object exists during lifetime of application. I've decided not to use static. Guess, it could be better in case of reusing those objects later. And due to this, I still have a separate header file.
P.S. I've decided not to use parameters for filenames since this is not required P.P.S when line is wrapped - my editor add 2 tabs instead of 1 (guess, this is easier to read. I didn't find any issues with the formatting of original code). And yes, manual code folding
// {{{ comment
// code goes here
// }}}
are looking nice. And it allows better grouping of code blocks than all IDEs code folding I've seen, IMHO.
shop-stock-total.h
// {{{ Structures, typedefs and classes
// {{{ Shops
class Shops {
public:
Shops(const std::string& shop_groups_filename,
const std::string& groups_moves_filename);
const std::vector<int>& getToShops(int shop_id) const;
private:
std::unordered_map<int, std::vector<int>> shops_to_shops;
};
// }}}
// {{{
struct ItemStockStruct {
int totalAmount;
int totalAmount10k;
};
// }}}
// {{{ ShopsStocksProcessor
class ShopsStocksProcessor {
public:
ShopsStocksProcessor(const Shops& shops, const std::string& stock_filename,
const std::string& out_filename);
private:
void save_item_stock(std::ofstream& out, const std::string& item_id,
const std::unordered_map<int, ItemStockStruct>& item_stock);
};
// }}}
// }}}
// {{{ Functions
// {{{ generic
// {{{ skipline
std::istream &skipline(std::istream &in);
// }}}
// }}}
// }}}
shop-stock-total.cpp
// {{{ standard and other includes
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <locale>
#include <fstream>
#include <string>
#include <string.h>
#include <sstream>
#include <assert.h>
#include <ctype.h>
#include <algorithm>
#include <unordered_map>
#include <vector>
#include <list>
#include <set>
// }}}
// {{{ filenames
const char *SHOP_GROUP_MOVES_FILE = "sgm.csv";
const char *SHOP_GROUPS_FILE = "shopgroups.csv";
const char *SHOP_STOCKS_FILE = "shopstock_all.csv";
const char *SHOP_STOCK_TOTALS_FILE = "shop_stock_total.csv";
// }}}
#include "shop-stock-total.h"
// {{{ Shops
// {{{ Shops::Shops
Shops::Shops(const std::string& shop_groups_filename,
const std::string& groups_moves_filename) {
int shop_id, group_id;
// temporary storage for shops and groups
std::unordered_map<int, int> shops;
std::unordered_map<int, std::vector<int>> groups;
// read file and store shops and groups
std::ifstream shop_groups_file{shop_groups_filename};
skipline(shop_groups_file);
int magic_group_id = 100000;
while (shop_groups_file >> shop_id >> group_id) {
if (shop_id == 0) continue;
if (group_id == 0) group_id = magic_group_id++;
shops[shop_id] = group_id;
groups[group_id].push_back(shop_id);
}
// add shops from the same group
for (const auto& shop: shops) {
shops_to_shops[shop.first].insert(shops_to_shops[shop.first].end(),
groups[shops[shop.first]].begin(),
groups[shops[shop.first]].end());
}
// read group moves file and add toShops from the toGroup into
// all corresponding fromGroup shops
int groupFrom, groupTo;
std::ifstream group_moves_file{groups_moves_filename};
skipline(group_moves_file);
while (group_moves_file >> groupFrom >> groupTo) {
for (auto shop_id : groups[groupFrom]) {
shops_to_shops[shop_id].insert(shops_to_shops[shop_id].end(),
groups[groupTo].begin(),
groups[groupTo].end());
}
}
// erase shop itself from it's own list,
// sort and remove dupes
for (auto& shop: shops_to_shops) {
// de duping via set and back to vector
std::set<int> s(
shop.second.begin(),
shop.second.end()
);
//assign back to vector
shop.second.assign(s.begin(), s.end());
}
}
// }}}
// {{{ Shops::getToShops
const std::vector<int>& Shops::getToShops(int shop_id) const {
return shops_to_shops.at(shop_id);
}
// }}}
// }}}
// {{{ ShopsStocksProcessor
// {{{ ProcessShopStocks
ShopsStocksProcessor::ShopsStocksProcessor(const Shops& shops,
const std::string& stock_filename,
const std::string& out_filename) {
std::string item_id, item_prev_id = "";
int shop_id, amount;
std::ifstream stock_file{stock_filename};
std::ofstream out{out_filename};
std::unordered_map<int, ItemStockStruct> item_stock;
// reading file
skipline(stock_file);
while (stock_file >> item_id >> shop_id >> amount) {
// if item has changed than output the previous one
if (item_id.compare(item_prev_id) != 0) {
save_item_stock(out, item_prev_id, item_stock);
item_stock.clear();
item_prev_id = item_id;
}
for (const auto s: shops.getToShops(shop_id)) {
item_stock[s].totalAmount += amount;
if (shop_id < 10000) item_stock[s].totalAmount10k += amount;
}
}
if (item_id.length() > 0) {
save_item_stock(out, item_id, item_stock);
}
}
// }}}
// {{{ save_item_stock
void ShopsStocksProcessor::save_item_stock(std::ofstream& out, const std::string& item_id,
const std::unordered_map<int, ItemStockStruct>& item_stock) {
for (const auto& item : item_stock) {
out << item_id << "\t" << item.first << "\t"
<< item.second.totalAmount << "\t"
<< item.second.totalAmount10k << "\n";
}
}
// }}}
// }}}
// {{{ skipline (thx Edward from CodeReview @ StackExchange
std::istream &skipline(std::istream &in) {
return in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
// }}}
// {{{ main
int main() {
ShopsStocksProcessor(
Shops(SHOP_GROUPS_FILE, SHOP_GROUP_MOVES_FILE),
SHOP_STOCKS_FILE, SHOP_STOCK_TOTALS_FILE);
return 0;
}
// }}}
An original Makefile is used.
Performance
Well, this version is slightly (5-10%) faster than Edward's solution. And it keeps only inventory of single item in memory based on assumption that shopstock_all.csv
is sorted by item_idx
field. Code is updated on Gitlab.
std::list
withstd::vector
2) reduce pointer usage the items pointed to can often be stored by value instead. \$\endgroup\$new
, but 0 uses ofdelete
. I don't think this code is working correctly. Not only that, AFAICT all of those uses ofnew
aren't necessary if one were to make minor modifications to the datastructures used. \$\endgroup\$#include
s. Please go over your code again and make sure it actually works! \$\endgroup\$