So I've written a URL router which also allows for wildcards (path parameters).
A URL like /users/{uuid}
can be added and then, when a user sends a request to the server with the following target /users/955b2a88-ae80-11ea-b3de-0242ac130004
, the uuid will then equal 955b2a88-ae80-11ea-b3de-0242ac130004
.
Example:
PathMatcher<int> matcher(-1);
matcher.add_path("/users/{uuid}", 1);
Parameters parameters;
assert(matcher.match("/users/9c4ceec8-f929-434e-8ff1-837dd54b7b56", parameters) == 1);
assert(parameters.find("uuid", "") == "9c4ceec8-f929-434e-8ff1-837dd54b7b56");
I have learned C++ through Stack Overflow, therefore I don't really have an idea how production ready code should look like.
I'd like to know what should be written differently and would could be improved.
If you'd like to run the code, it is hosted on GitHub with an example main.cpp
.
typedef std::string string_t;
class Parameters {
std::vector<std::pair<string_t, string_t>> parameters_;
public:
void add(string_t& name, string_t& value) {
parameters_.emplace_back(name, value);
}
string_t find(string_t name, string_t default_="") {
for(auto & parameter : parameters_) {
if(parameter.first == name) return parameter.second;
}
return default_;
}
void clear() {
parameters_.clear();
}
};
template<typename Result>
class PathMatcher {
struct Path {
char character;
std::vector<Path *> children;
std::optional<Result> result;
Path(char character, std::optional<Result> result) : character(character), result(result) {}
~Path() {
for(size_t i = 0; i < children.size(); i++) {
delete children[i];
}
}
};
Path *
find_child(char character, std::vector<Path *> &parent, bool findWildcard = false, Path **wildcard = nullptr) {
Path *child = nullptr;
for (size_t i = 0; i < parent.size(); i++) {
if (parent[i]->character == character) {
child = parent[i];
// If wildcards are not wanted, there is no need to continue
if (!findWildcard) break;
} else if (findWildcard && parent[i]->character == wildcard_open_) {
(*wildcard) = parent[i];
// If child has already been found, there is no need to continue
if (child != nullptr) break;
}
}
return child;
}
Path *create_path(std::vector<Path *> &parent, char character, std::optional<Result> value) {
Path *route = new Path(character, value);
parent.push_back(route);
return route;
}
void insert_path(string_t &path, Result &result, Path *step = nullptr, int path_pos = 0) {
/*
* Recursively creates path. A path contains a vector with child paths.
* These linked paths create a chain which can be walked down.
* If we input /users/a and /users/b the following chain would be created.
* / -> u -> s -> e -> r -> s -> / -> a
* -> b
*
* Now if you want to match against an input, this chain can be walked down until a path no longer contains the matching character.
* If the input would equal /users/c, the chain would be walked until the last '/' and then failing because it only contains the children a & b, not c.
*
* The last two paths (a & b) will contains a value. This value states that the path has reached its end.
*/
assert(path.size() > 1);
if (path_pos == path.size() - 1) {
// last insertion accompanied by ending value
Path *child = find_child(path[path_pos], step->children);
if (child != nullptr) {
assert(!child->result); // Cant already have a value
child->result = result;
} else {
create_path(step->children, path[path_pos], result);
}
} else {
Path *child;
if (path_pos == 0) {
child = find_child(path[path_pos], paths_);
} else {
child = find_child(path[path_pos], step->children);
}
if (child == nullptr && path_pos == 0) {
child = create_path(paths_, path[path_pos], std::nullopt);
} else if (child == nullptr) {
child = create_path(step->children, path[path_pos], std::nullopt);
}
return insert_path(path, result, child, path_pos + 1);
}
}
void get_wildcard_name(Path **wildcard, string_t &wildcard_name) {
/*
* /users/{uuid} and users/{uuid}/friends is allowed
* /users/{uuid} and users/{id}/friends is not allowed, because wildcards at the same positions must match
*
* This method walks down the chain until the wildcard_close_ character has been found. Everything between start and end is appended to the value.
*/
assert((*wildcard)->children.size() == 1);
if ((*wildcard)->children[0]->character != wildcard_close_) {
wildcard_name.append(1, (*wildcard)->children[0]->character);
*wildcard = (*wildcard)->children[0];
get_wildcard_name(wildcard, wildcard_name);
} else {
*wildcard = (*wildcard)->children[0];
}
}
string_t get_wildcard_value(string_t &path, size_t &pos) {
// Walks down the input until the trailing_wildcard_ is found or the end is reached, everything between equals the wildcard value
int begin = pos;
for (; pos < path.size() - 1; pos++) {
if (path[pos + 1] == trailing_wildcard_) {
return path.substr(begin, pos - begin + 1);
}
}
return path.substr(begin);
}
std::vector<Path *> paths_;
Result default_;
char wildcard_open_;
char wildcard_close_;
char trailing_wildcard_;
public:
PathMatcher(Result default_, char wildcard_open='{', char wildcard_close='}', char trailing_wildcard='/')
: default_(default_), wildcard_open_(wildcard_open), wildcard_close_(wildcard_close), trailing_wildcard_(
trailing_wildcard) {}
virtual ~PathMatcher() {
for(size_t i = 0; i < paths_.size(); i++) {
delete paths_[i];
}
}
void add_path(string_t path, Result value) {
insert_path(path, value);
}
Result match(string_t path, Parameters &variables) {
/*
* Starts at paths_ and continues trying to find children matching the next characters in input path.
* If there is no child which matches the next character, but there was a wildcard_open_ as a child,
* the code jumps back to it and sets a Parameters value for the wildcard with its value and then continues normally.
*/
Path *step = find_child(path[0], paths_);
if (step == nullptr) return default_;
Path *lastWildcard = nullptr;
size_t lastWildcardPos;
size_t i = 1;
for (; i < path.size() - 1 && step != nullptr; i++) {
Path *nextWildcard = nullptr;
step = find_child(path[i], step->children, true, &nextWildcard);
if (nextWildcard != nullptr && nextWildcard != lastWildcard) {
lastWildcardPos = i;
lastWildcard = nextWildcard;
}
if (path[i] == trailing_wildcard_) {
lastWildcard = nullptr;
}
if (step == nullptr && lastWildcard != nullptr) {
i = lastWildcardPos;
string_t wildcard_name;
get_wildcard_name(&lastWildcard, wildcard_name);
string_t wildcard_value = get_wildcard_value(path, i);
variables.add(wildcard_name, wildcard_value);
if (i == path.size() - 1) {
// Wildcard value reaches end
if (!lastWildcard->result) return default_;
return lastWildcard->result.value();
} else {
step = lastWildcard;
}
}
}
if (step == nullptr) return default_;
Path *wildcard = nullptr;
Path *result = find_child(path[path.size() - 1], step->children, true, &wildcard);
if(result != nullptr && result->result) return result->result.value();
else if(wildcard != nullptr) {
// find wildcard ending and check if it contains a value
string_t wildcardName;
get_wildcard_name(&wildcard, wildcardName);
if(!wildcard->result) return default_;
string_t value = path.substr(path.size() - 1);
variables.add(wildcardName, value);
return wildcard->result.value();
}
return default_;
}
};
1 Answer 1
I see some things that may help you improve your program.
Separate interface from implementation
It makes the code somewhat longer for a code review, but it's often very useful to separate the interface from the implementation. In C++, this is usually done by putting the interface into separate .h
files and the corresponding implementation into .cpp
files. It helps users (or reviewers) of the code see and understand the interface and hides implementation details. The other important reason is that you might have multiple source files including the .h
file but only one instance of the corresponding .cpp
file. In other words, split your existing .h
file into a .h
file and a .cpp
file. The templated class, by contrast, can stay in a .h
file.
Make sure you have all required #include
s
The code uses std::pair
but doesn't #include <utility>
. Also, carefully consider which #include
s are part of the interface (and belong in the .h
file) and which are part of the implementation per the above advice.
Reconsider the interface
It doesn't seem very useful to me to have the PathMatcher
return an int
. What would be more useful, I think, would be for it to return a Parameters
object. That would also probably eliminate the need for Parameters::clear()
.
Use standard libraries more effectively
The Parameters
object currently iterates through its parameters to find a match. I would suggest that instead of this existing line:
std::vector<std::pair<string_t, string_t>> parameters_;
a more appropriate data structure would be this:
std::unordered_map<std::string, std::string> parameters_;
That would make find
very simple:
std::string Parameters::find(std::string key, std::string notfound) const {
auto result = parameters_.find(key);
return result == parameters_.end() ? notfound : result->second;
}
Here again, the interface is strange. I don't see much use in passing back a passed string if the key is not found. I'd expect to have an exception thrown instead. That renders the function even simpler:
std::string Parameters::find(std::string key) const {
return parameters_.at(key);
}
Use <regex>
to greatly simplify the code
This code could be much simpler, even without altering the interface, by using std::regex
. Here's what the include file would look like:
parameter.h
#ifndef PARAMETER_H
#define PARAMETER_H
#include <regex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>
class Parameters {
public:
std::string find(std::string name, std::string notfound = "") const;
void clear();
template <class T>
friend class PathMatcher;
private:
std::unordered_map<std::string, std::string> m;
};
struct Pattern {
Pattern(std::string path);
std::regex re;
std::vector<std::string> names;
};
template <class T>
class PathMatcher {
public:
PathMatcher(T init) : nomatch{init} {}
void add_path(std::string pattern, T value) {
patterns.emplace_back(pattern, value);
}
T match(std::string input, Parameters& p) {
T answer{nomatch};
for (const auto& patpair : patterns) {
std::smatch m;
if (std::regex_match(input, m, patpair.first.re)) {
answer = patpair.second;
for (unsigned i{1}; i < m.size(); ++i) {
p.m[patpair.first.names[i-1]] = m[i].str();
}
}
}
return answer;
}
private:
T nomatch;
std::vector<std::pair<Pattern, T>> patterns;
};
#endif // PARAMETER_H
This is the implementation file:
parameter.cpp
#include "parameter.h"
std::string Parameters::find(std::string name, std::string notfound) const {
auto result = m.find(name);
return result == m.end() ? notfound : result->second;
}
void Parameters::clear() {
m.clear();
}
Pattern::Pattern(std::string path) {
static const std::regex vble{R"(\{[^\}]*\})"};
auto start = std::sregex_iterator{path.begin(), path.end(), vble};
auto finish = std::sregex_iterator{};
for (auto it{start}; it != finish; ++it) {
auto str = it->str();
str.erase(0, 1); // remove {
str.pop_back(); // remove }
names.push_back(str);
}
re = std::regex_replace(path, vble, "(.*)");
}
When I tested it, it's exactly as fast as the original version (both ran in 5ms on my machine).
-
\$\begingroup\$ I don't see much use in passing back a passed string if the key is not found I assume that parameters are optional (possibly passed on the command line), so when you ask for the value of one of these having the default value returned (instead of having to handle a not found exception to set that) can simplify the code. \$\endgroup\$1201ProgramAlarm– 1201ProgramAlarm2020年06月15日 21:24:17 +00:00Commented Jun 15, 2020 at 21:24
-
\$\begingroup\$ It might also be worth mentioning that the original implementation of
Parameters
does not handle duplicate parameters properly. It will return the original (first) value added if an "updated" value is added. Using some form of map will replace the existing value with the new value. \$\endgroup\$1201ProgramAlarm– 1201ProgramAlarm2020年06月15日 21:26:47 +00:00Commented Jun 15, 2020 at 21:26 -
1\$\begingroup\$ Passing
std::string
without the&
looks unusual to me. Did I miss the latest developments? \$\endgroup\$Roland Illig– Roland Illig2020年06月15日 21:36:29 +00:00Commented Jun 15, 2020 at 21:36 -
\$\begingroup\$ @RolandIllig: It depends where you mean. For
find
it would make sense to passconst std::string &
but for thePattern
constructor, since we need to make a copy anyway, it makes sense. \$\endgroup\$Edward– Edward2020年06月15日 22:05:22 +00:00Commented Jun 15, 2020 at 22:05 -
1\$\begingroup\$ Only
#include
s that are required to understand the interface go in the.h
file. So if, for example I had used something from<cmath>
inparameter.cpp
that#include
would go only in the.cpp
file since it’s an internal detail not relevant to the interface. \$\endgroup\$Edward– Edward2020年06月16日 08:59:57 +00:00Commented Jun 16, 2020 at 8:59
std::basic_regex
? \$\endgroup\$/user/
, even if what follows doesn't look like a UUID at all. With regular expressions, you can write a pattern that only accepts valid UUIDs. \$\endgroup\$