I'm evaluating my options to structure an in-memory database and I have a few ideas of how to implement it. I would like to know your opinion of what the best design choice is.

I have a column class which is parametrized to represent different column types.

template<typename T>
class Column {
public:
    std::string name();
    T sum();
    T avg();
    ...
private:
    std::string m_name;
    std::vector<T> m_vec;
    ...
};

I'm not sure what the best way is to store a vector of Columns with different type parameters. For example, a 3-column table might have one integer column, one float column and one string column.

I know there is boost::variant but I'm not allowed to use boost.

I was thinking of using one of the following:

  1. Tagged Union
  2. Pure OO: Extend the column like IntColumn : Column, etc.

What are your thoughts? Got a better idea?

asked Feb 29, 2016 at 16:33
  • Are you sure you want to use columns as the primary data storage mechanism in your database? Tables might be defined based on columns, but nearly all data access in a database is row based. Commented Feb 29, 2016 at 16:47
  • Good question! This is a column-oriented database similar to kx's kdb database, so it intentionally has no row-based table: aggregating whole columns is faster when a column's data sits in one contiguous memory block. Commented Feb 29, 2016 at 16:56
  • It seems that you have some arbitrary restrictions here (such as not using Boost). Could you detail them in the question? If this is for production, why would you re-invent the wheel at all and not use an existing in-memory database? If you want something really simple, note that SQLite can be used as a pure in-memory database when given a special file name. Commented Feb 29, 2016 at 17:28
  • I would go the OO route: Just derive your column template from the generic column class, store pointers/references to that column class, and be done with it. Commented Feb 29, 2016 at 17:40

2 Answers

Because the type of a column is a template parameter, you are modelling the column type within the C++ type system. This is good. A Column<int> and a Column<std::string> are different types. If some properties are common to all column types (e.g. that a column has a name), you could extract these into a base class so that the common operations can be accessed via a common type. However, type-specific operations like get() or sum() cannot exist in this base; they must be part of the templated Column<T>.

If you have a table type that has columns of different types, it is clearly not sensible to force these to have the same type since you would necessarily lose access to the template parameter ("type erasure"). Instead, embrace the different types and make your Table strongly typed as well. A container like std::tuple<T...> can help here.

If you need access to the column-type independent parts, you can always take a pointer to the column and treat it as a pointer to the base class.

A sketch using C++14 (C++11 would require you to implement a couple of convenience functions yourself, but has std::tuple and template parameter packs):

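#include <cstddef>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Operations that do not depend on the element type live in the base class.
class ColumnBase {
    ...
public:
    std::string name() { ... }
};

// Type-specific storage and operations live in the template.
template<class T>
class Column : public ColumnBase {
    std::vector<T> m_items;
    ...
};

template<class... T>
class Table {
    std::tuple<Column<T>...> m_columns;

    // Expands the index pack to take the address of every tuple element.
    template<std::size_t... index>
    std::vector<ColumnBase*> columns_vec_helper(std::index_sequence<index...>) {
        return { (&std::get<index>(m_columns))... };
    }
public:
    // Type-erased view of all columns, for column-type independent code.
    std::vector<ColumnBase*> columns_vec() {
        return columns_vec_helper(std::make_index_sequence<sizeof...(T)>{});
    }
};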

We could then print out the name of all columns:

for (const auto& colBase : table.columns_vec())
    std::cout << "column " << colBase->name() << "\n";

without having to handle each column type separately.

(runnable demo on ideone)

Only templates give you the static guarantee that an integer column yields an int. In contrast, unions/variant types require the using code to remember all possible types (with templates, the type checker enforces that we handle everything). With subtyping, we can't have column-type specific operations that share an implementation. For example, a method int IntColumn::get(std::size_t i) and a related method const std::string& StringColumn::get(std::size_t i) might look like they have a common interface, but that would be only accidental and cannot be enforced. In particular, any combination of virtual methods and templates in C++ gets very ugly, very fast.
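To make the type-safety point concrete, here is a minimal C++14 sketch, not the code from the demo above. It uses simplified versions of Column and Table; the column<I>() accessor, push_back(), get() and the sum_of_ids() function are names assumed for illustration only.

```cpp
#include <cstddef>
#include <numeric>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical, simplified column: a name plus contiguous typed storage.
template <class T>
class Column {
    std::string m_name;
    std::vector<T> m_items;
public:
    explicit Column(std::string name) : m_name(std::move(name)) {}
    const std::string& name() const { return m_name; }
    void push_back(T value) { m_items.push_back(std::move(value)); }
    const T& get(std::size_t i) const { return m_items.at(i); }
    // Requires T to support operator+; as a template member it is only
    // instantiated for column types on which it is actually called.
    T sum() const { return std::accumulate(m_items.begin(), m_items.end(), T{}); }
};

template <class... T>
class Table {
    std::tuple<Column<T>...> m_columns;
public:
    explicit Table(Column<T>... columns) : m_columns(std::move(columns)...) {}
    // Hypothetical accessor: the index is a compile-time constant, so the
    // returned reference keeps its exact Column<T> type.
    template <std::size_t I>
    auto& column() { return std::get<I>(m_columns); }
};

// Builds a two-column table and returns the sum of the integer column.
int sum_of_ids() {
    Table<int, std::string> table{Column<int>{"id"}, Column<std::string>{"label"}};
    table.column<0>().push_back(1);
    table.column<0>().push_back(2);
    table.column<1>().push_back("a");
    const std::string& first_label = table.column<1>().get(0);  // statically a std::string
    (void)first_label;
    return table.column<0>().sum();  // statically an int, no cast needed
}
```

Assigning table.column<1>().get(0) to an int would be rejected at compile time, which is the guarantee the union and subtyping approaches cannot give.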

The disadvantage of templates is that you will have to write generic code carefully and do some template metaprogramming. When done correctly the results can have amazing usability, but the implementation would be advanced C++. If your design is intended to be maintained by less advanced programmers (who will be as baffled as I will be when I look back at this code in a couple of months), then it might be more sensible to avoid such a "clever" solution despite its benefits and use more traditional OOP patterns that give you a similar structure, but might require a couple of static_casts to work.

answered Feb 29, 2016 at 20:31
  • As nice as this might look from a C++ template standpoint, it loses the ability to create tables with a runtime-defined sequence of columns. Ultimately, the sequence, or a couple of supported sequences, is frozen into the source code, constraining runtime configuration to what has been provided explicitly by the programmer. Commented Feb 29, 2016 at 21:45

While I'd strongly favor the approach @amon presented, there are situations where you couldn't follow that route, for example table configurations that aren't known until runtime.

In that case, and since you already mentioned it, functionality like boost::variant or boost::any might provide a good solution.

Since you seem to be constrained in that you're not allowed to use Boost, why not roll your own? The two basic approaches are using a tagged union or exploiting C++'s dynamic type system by using a polymorphic base class (and either a well-defined interface or dynamic_casts, probably hidden behind an acyclic visitor).
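As a rough illustration of the tagged-union route, the following C++11 sketch stores one of three vector types behind a runtime tag. The names AnyColumn, Type, ints(), doubles(), strings() and make_table() are assumptions made for this example, and the set of supported types is arbitrary.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tagged-union column: m_type records which union member is alive.
class AnyColumn {
public:
    enum class Type { Int, Double, String };
    using IntVec = std::vector<int>;
    using DoubleVec = std::vector<double>;
    using StringVec = std::vector<std::string>;

    AnyColumn(std::string name, Type type) : m_name(std::move(name)), m_type(type) {
        // Union members with non-trivial types must be constructed explicitly.
        switch (m_type) {
            case Type::Int:    new (&m_data.ints) IntVec(); break;
            case Type::Double: new (&m_data.doubles) DoubleVec(); break;
            case Type::String: new (&m_data.strings) StringVec(); break;
        }
    }
    AnyColumn(AnyColumn&& other) noexcept
        : m_name(std::move(other.m_name)), m_type(other.m_type) {
        switch (m_type) {
            case Type::Int:    new (&m_data.ints) IntVec(std::move(other.m_data.ints)); break;
            case Type::Double: new (&m_data.doubles) DoubleVec(std::move(other.m_data.doubles)); break;
            case Type::String: new (&m_data.strings) StringVec(std::move(other.m_data.strings)); break;
        }
    }
    ~AnyColumn() {
        // ...and destroyed explicitly, again by inspecting the tag.
        switch (m_type) {
            case Type::Int:    m_data.ints.~IntVec(); break;
            case Type::Double: m_data.doubles.~DoubleVec(); break;
            case Type::String: m_data.strings.~StringVec(); break;
        }
    }

    const std::string& name() const { return m_name; }
    Type type() const { return m_type; }
    // Every type-independent operation needs a switch over all tags.
    std::size_t size() const {
        switch (m_type) {
            case Type::Int:    return m_data.ints.size();
            case Type::Double: return m_data.doubles.size();
            case Type::String: return m_data.strings.size();
        }
        return 0;  // not reached for valid tags
    }
    // Typed accessors check the tag at run time and throw on a mismatch.
    IntVec& ints() { require(Type::Int); return m_data.ints; }
    DoubleVec& doubles() { require(Type::Double); return m_data.doubles; }
    StringVec& strings() { require(Type::String); return m_data.strings; }

private:
    void require(Type expected) const {
        if (m_type != expected) throw std::logic_error("wrong column type");
    }
    union Data {
        IntVec ints;
        DoubleVec doubles;
        StringVec strings;
        Data() {}   // members are constructed by AnyColumn
        ~Data() {}  // members are destroyed by AnyColumn
    };
    std::string m_name;
    Type m_type;
    Data m_data;
};

// The column sequence is chosen at run time, e.g. from a schema description.
std::vector<AnyColumn> make_table() {
    std::vector<AnyColumn> table;
    table.emplace_back("id", AnyColumn::Type::Int);
    table.emplace_back("label", AnyColumn::Type::String);
    table[0].ints().push_back(1);
    table[0].ints().push_back(2);
    table[1].strings().push_back("a");
    return table;
}
```

The price of this flexibility is visible in the repeated switch statements: adding a fourth column type means touching every one of them, and a wrong accessor is only caught at run time.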

I'm referencing an answer of mine on SO showing a basic sketch of both approaches, with a link to a more complete boost::any-like implementation.
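Independently of that linked answer, one possible shape of the polymorphic-base route in C++11 is sketched below. It uses plain dynamic_cast rather than a visitor; column_cast() and make_column(), as well as the type-name strings "int", "double" and "string", are hypothetical choices for this example.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Type-independent interface shared by all columns.
class ColumnBase {
    std::string m_name;
public:
    explicit ColumnBase(std::string name) : m_name(std::move(name)) {}
    virtual ~ColumnBase() = default;
    const std::string& name() const { return m_name; }
    virtual std::size_t size() const = 0;
};

// Typed storage; only reachable after a successful downcast.
template <class T>
class Column : public ColumnBase {
    std::vector<T> m_items;
public:
    using ColumnBase::ColumnBase;  // inherit the name-taking constructor
    std::size_t size() const override { return m_items.size(); }
    void push_back(T value) { m_items.push_back(std::move(value)); }
    const T& get(std::size_t i) const { return m_items.at(i); }
};

// Hypothetical helper: yields nullptr when the column does not hold T.
template <class T>
Column<T>* column_cast(ColumnBase* base) {
    return dynamic_cast<Column<T>*>(base);
}

// Hypothetical factory: maps a type name known only at run time to a column.
// Returns nullptr for an unsupported type name.
std::unique_ptr<ColumnBase> make_column(const std::string& type, std::string name) {
    if (type == "int")
        return std::unique_ptr<ColumnBase>(new Column<int>(std::move(name)));
    if (type == "double")
        return std::unique_ptr<ColumnBase>(new Column<double>(std::move(name)));
    if (type == "string")
        return std::unique_ptr<ColumnBase>(new Column<std::string>(std::move(name)));
    return nullptr;
}
```

A table is then a std::vector<std::unique_ptr<ColumnBase>> filled from a schema at run time. Code that needs the values calls column_cast<int>(ptr.get()) and must handle the nullptr case itself.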

answered Mar 8, 2016 at 9:07
