I am looking for people's opinions on my use of a set of non standard "standard classes". Basically I am working on a 'modern' C++ Lexical Analyser and Parser Generator duo to replace flex and GNU Bison which will work as a library and tool combination.
I have a number of non standard extensions of std
library classes. The reason I am needing these nonstandard classes is to be able to express algorithms more succinctly and nearer the mathematical equivalents.
Here's a vector class that allows extension via operator[]
, which also has a generic Base
parameter, allowing the vector to be based at 1 rather than zero, for example. And I return indices as size_t
's rather than iterators as indexes are more useful in a traditional mathematical sense.
//
// vector.h - auto sizing 'vector' class
//
// Copyright (c) 2015-2016 Aaron Nathaniel Gray under the MIT License
//
#include <vector>
#include <iostream>
template<typename T, size_t Base = 0>
class vector : public std::vector<T> {
public:
vector() : std::vector<T>() {}
vector(const std::initializer_list<T> il) : std::vector<T>(il) {}
~vector() {}
iterator last() { return end() - 1; }
const_iterator clast() const { return end - 1; }
size_t size() const { return std::vector<T>::size(); }
void resize(size_t _size) { std::vector<T>::resize(_size); }
T& operator [] (size_t index) {
if (index + 1 - Base > size())
resize(index + 1 - Base);
return std::vector<T>::operator[](index - Base);
}
const T operator [] (size_t index) const {
if (index + 1 - Base <= size())
return std::vector<T>::operator[](index - Base);
else
return T();
}
signed find(const T& value) {
auto i = std::find(cbegin(), cend(), value);
if (i != cend())
return i - cbegin() + Base;
else
return -1;
}
size_t add(const T value = T()) {
push_back(value);
return size() - 1 + Base;
}
std::string toString() const;
};
template<typename T, size_t Base = 0>
std::string vector<T, Base>::toString() const {
std::stringstream ss;
for (size_t element = Base; element < size() + Base; ++element) {
ss << (*this)[element];
if (element < size() + Base)
ss << ",";
ss << std::endl;
}
return ss.str();
}
I am wondering whether to use the namespace nonstd::vector
.
I also have non-standard set, deque
, and a very extensive bitset
class.
Example:
nonstd::vector<size_t> accept;
for(auto state : states) {
.... accept = ....
this->accept[state->index] = accept;
}
Here, state->index
may not be sequential or contiguous and maximum value not known, so push_back
will not work.
And a find example:
nonstd::vector<Production*> productions;
...
size_t productionIndex = production.find(aProduction);
3 Answers 3
Standard classes and derivation
If a class does not have a virtual destructor it is not supposed to be derived from. Before we got the finally key word this was the indicator you were supposed to use before deriving from a class.
But its safe unless you start taking pointers to your class. So to make sure it is not used incorrectly you should put it in a private context (so nobody else can use it apart from your library).
Inheriting functions.
You don't need to copy functions if they have the same name (that's the point of inheriting).
size_t size() const { return std::vector<T>::size(); }
void resize(size_t _size) { std::vector<T>::resize(_size); }
Don't need these. You automatically inherit them.
Inheriting Constructors.
You can use the using clause to inherit all the constructors.
vector() : std::vector<T>() {}
vector(const std::initializer_list<T> il) : std::vector<T>(il) {}
~vector() {}
Can be replaced with:
using std::vector::vector;
Identifiers with leading underscore.
Don't use them.
void resize(size_t _size)
^^^^^
DO you know the rules about a leading underscore. I do (you got it correct). But can you assume that all the people reading your code know (you can't and most get it wrong). So prefer not to use a leading underscore on your identifiers.
Operator[] is supposed to be fast
The operator[]
is an unchecked access into the data. You use this when you know that your index is guaranteed correct. We also have a checked access at()
which is used when you can't guarantee that your index is in range.
T& operator [] (size_t index) {
if (index + 1 - Base > size())
resize(index + 1 - Base);
return std::vector<T>::operator[](index - Base);
}
const T operator [] (size_t index) const {
if (index + 1 - Base <= size())
return std::vector<T>::operator[](index - Base);
else
return T();
}
Your operator[]
is severely restricting the performance of your code for people that know that their access is good.
for(int loop=0;loop < v.size(); ++loop)
{
doStuff(v[loop]); // Why do I need to check loop is in bounds
// each time? It is guaranteed to be in bounds
// by the context.
}
When people use operator[]
they have already checked (or by context guranteed) that the index is in bounds. You should not be checking. You should provide an alternative function that checks and resizes.
find does not return an iterator
It seems strange that find does not return an iterator.
signed find(const T& value) {
auto i = std::find(cbegin(), cend(), value);
if (i != cend())
return i - cbegin() + Base;
else
return -1;
}
What's all this pointer maths.
return i - cbegin() + Base;
Use the standard functions.
return std::distance(cbegin(), i) + Base;
Constructing a string is easy
Your toString()
function is bad as all that work can be done using the std::string constructors (also the standard library uses to_string()
you should follow the same convention).
std::string to_string() const
{
return std::string(cbegin(), cend());
}
Note: this does nothing.
ss << std::endl;
You should probably never use std::endl
. It is actually very rare that you want to flush a stream (they flush themselves at the best times anyway being a human you will just make the code in-efficient by trying to overcompensate).
Namespace
You should definitely put your code into its own namespace. The name vector
is just too common.
Sure nonstd
seems like a fine name.
Personally I use an initial uppercase letter for all user defined types and namespaces. This allows me to quickly spot an object from type.
namespace ThorsAnvil
{
class BlaBla
{
public:
BlaBla(int x);
};
int bla(int y);
}
int main()
{
namespace TA = ThorsAnvil;
int data1 = 15 + TA::BlaBla(15); // I constructed an object here.
int data2 = 16 + TA::bla(15); // I called a function here.
}
So I would call your stuff.
namespace NonStd
{
template<typename T>
class Vector { /* STUFF */};
};
template<typename T>
#if SomeNormalSituation
using Vec = NonStd::Vector<T>;
#else
using Vec = std::vector<T>;
#endif;
int main()
{
Vec<char> data;
}
Idea
Overall find.
A vector class that allows extension via operator [] which also has a generic Base parameter allowing the vector to be based at 1 rather than zero for example.
Your crazy this is going to cause you so many issues. CS people are so used to their arrays being zero based that any other number is just going to cause errors.
Thoughts
Basically I am working on a 'modern' C++ Lexical Analyser and Parser Generator duo to replace flex and GNU Bison
Hmm. I love flex/bison. Use them all the time. It will be hard to replace something that is already near perfection. There is a reason why nobody has tried to improve them in years (and all attempts to replace them have failed).
-
\$\begingroup\$ you should probably remove the code about
std::string(cbegin(), cend());
because that's definitely incorrect.string
's constructors all take ranges of characters, not data values to be stringified; and even if they took the latter, they still don't join them with comma characters. Everything else here looks good at first glance. \$\endgroup\$Quuxplusone– Quuxplusone2016年07月09日 18:06:37 +00:00Commented Jul 9, 2016 at 18:06 -
\$\begingroup\$ @Quuxplusone: en.cppreference.com/w/cpp/string/basic_string/basic_string Constructor number 6 It takes two iterators (like most containers). \$\endgroup\$Loki Astari– Loki Astari2016年07月09日 21:54:33 +00:00Commented Jul 9, 2016 at 21:54
-
\$\begingroup\$ Flex and GNU Bison are far from perfect and based on 30 year old YACC and lex. Bison is under the GPL also. And neither can be used as a library tools at runtime ;| \$\endgroup\$AaronNGray– AaronNGray2016年07月10日 00:59:01 +00:00Commented Jul 10, 2016 at 0:59
-
\$\begingroup\$
And neither can be used as a library tools at runtime
When do you need to define a grammer at runtime? Are you dynamically trying to create a language? \$\endgroup\$Loki Astari– Loki Astari2016年07月10日 01:15:17 +00:00Commented Jul 10, 2016 at 1:15 -
\$\begingroup\$ Bison is GPL (like G++). But the parser code is under a special exception allowing the generated code (including the boilerplate copied verbatim from Bison). To be used anywhere without the viral copying. gnu.org/software/bison/manual/html_node/Conditions.html \$\endgroup\$Loki Astari– Loki Astari2016年07月10日 01:20:46 +00:00Commented Jul 10, 2016 at 1:20
Your code doesn't compile at all, for many different reasons. For example, you try to use functions cbegin
and cend
that don't exist in the current context (maybe you meant this->cbegin()
and this->cend()
?); you refer to both end() - 1
and end - 1
; you try to use std::initializer_list
without including the appropriate header; and so on and so forth.
Next time, you should fix all compilation errors before posting to CodeReview; you'll get better (well, more targeted) answers that way. And certainly the answerers will enjoy themselves more.
In general, don't inherit from std::vector<T>
or any other container class. If you want to add functionality to a standard container, add it via standalone algorithms, such as those found in the standard <algorithm>
header. The problem with inheritance is that either you have to abandon value semantics and start allocating everything on the heap and taking references to them (as Java and C# do), or else you have to abandon most of the advantages of inheritance. For example:
std::vector<int> transform_and_return(std::vector<int> v)
{
std::sort(v.begin(), v.end()); // or whatever
return std::move(v);
}
This function works with a normal vector, but with your nonstd::vector
it's going to silently do the wrong thing, because of slicing.
nonstd::vector<int> x {1,2,3};
auto y = transform_and_return(x);
static_assert(std::is_same_v<std::vector<int>, decltype(y)>);
And as Loki said above: don't index vectors from 1. That's incorrect, and will bite you many times if you try to maintain your code for a year or more. (For throwaway code, you could probably get away with it, because there's no maintainer to confuse.)
-
\$\begingroup\$
cbegin()
andcend()
are part of std::vector. The class here inherits fromstd::vector<>
so they are available in this context. The problem is the templitiztion. Usingthis->cbegin()
is a bad idea and generally recommended against becausethis->X
will cause problems of accidental shadowing of X. Alternatives are:std::vector<T>::cbeign()
to make it exploit. Or bringing it into local scope by doingusing std::vector<T>::cbegin;
inside the class declaration. \$\endgroup\$Loki Astari– Loki Astari2016年07月09日 22:08:05 +00:00Commented Jul 9, 2016 at 22:08 -
\$\begingroup\$ @LokiAstari: All correct, but my point was simply that the code as written doesn't compile (because
cbegin
andcend
don't exist in the OP's context). Sure it's possible to change the code so that the changed code will compile; but that's cheating. Code posted to CR ought to be code that already compiles and works. \$\endgroup\$Quuxplusone– Quuxplusone2016年07月10日 05:29:54 +00:00Commented Jul 10, 2016 at 5:29
#include <cstdlib>
#include <vector>
#include <string>
#include <algorithm>
#include <initializer_list>
namespace nonstd {
template<typename T, size_t BASE = 0>
class vector : public std::vector<T> {
public:
using std::vector<T>::vector;
using typename std::vector<T>::iterator;
using typename std::vector<T>::const_iterator;
using std::vector<T>::begin;
using std::vector<T>::end;
using std::vector<T>::cbegin;
using std::vector<T>::cend;
iterator last() { return end() - 1; }
const_iterator clast() const { return end - 1; }
using std::vector<T>::size;
using std::vector<T>::capacity;
#ifndef SHOW_REALLACATION
using std::vector<T>::resize;
#else
void resize(size_t _size) {
size_t old_capacity = capacity();
std::vector<T>::resize(_size);
if (old_capacity != capacity())
std::cerr << "Growing vector from " << old_capacity << " to " << capacity() << std::endl;
}
#endif
const T operator [] (size_t index) const {
if (index + 1 - BASE <= size())
return std::vector<T>::operator[](index - BASE);
else
return T();
}
signed find(const T& value) {
auto i = std::find(cbegin(), cend(), value);
if (i != cend())
return std::distance(cbegin(), i) + BASE;
else
return -1;
}
size_t add(const T value = T()) {
push_back(value);
return size() - 1 + BASE;
}
std::string to_string() const;
};
} // end namespace nonstd
Okay I have compiling code now. The main problem was I took the code from my project which was compiled on MS Visual C++ which is far too permissive compared to Clang and GCC.
I have have modified the code to follow advice where applicable :-
- I have sorted the headers.
- extensive use of using now.
- I have put in a SHOW_REALLACATION conditional
- Used std::difference
Regarding the idiosyncrasies of my nonstd library thats why I have called it nonstd as it diverges from standard C++ and its restrictions which make some algorithms very unclear.
I do intend to follow the NL10 rule from the Core Guidelines https://isocpp.org/blog/2015/09/bjarne-stroustrup-announces-cpp-core-guidelines https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rl-camel hence not using CamelCase or Capitalization.
Also heres the code in wandbox :- http://melpon.org/wandbox/permlink/xtQoUyC3cf0fiiGl
std::vector
would not work. \$\endgroup\$Vector
with the capitalV
to further differentiate fromstd::vector
. \$\endgroup\$