\$\begingroup\$

Previously I wrote a very generic HTTP handler.
But in real life the server needs to handle different functions on different paths, e.g. /rest/addUser would add a user while /rest/getUser/45 would get user 45.

So we need to associate lambdas with generic paths that can pick up variables from the path provided to the server. This class allows you to register paths with embedded sections that will be provided to the lambda as variables.

Example: /rest/getUser/{id} would be a path that contains /rest/getUser/ and has a suffix that is put in the variable id for the lambda to use. Note the algorithm is generic so you can register multiple variable names in different sections. The variable will match against any character except '/'.
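As a rough illustration of that rule (a standalone sketch, not the class itself), the path /rest/getUser/{id} ends up as a regular expression in which {id} is replaced by a capture group for any run of non-'/' characters. The function name extractUserId is hypothetical and exists only for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper for illustration only: matches a path against the
// regular expression that "/rest/getUser/{id}" is translated into and
// returns the captured "id" (an empty string when the path does not match).
std::string extractUserId(std::string const& path)
{
    // "{id}" becomes a capture group matching any run of non-'/' characters.
    static std::regex const userPath{"/rest/getUser/([^/]*)"};

    std::smatch match;
    if (std::regex_match(path, match, userPath))
    {
        return match[1].str();
    }
    return "";
}
```

Calling extractUserId("/rest/getUser/45") yields "45", while "/rest/getUser/45/extra" does not match because the variable cannot contain '/'.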

PathMatcher.h

#ifndef THORSANVIL_NISSE_NISSEHTTP_PATH_MATCHER_H
#define THORSANVIL_NISSE_NISSEHTTP_PATH_MATCHER_H

#include <map>
#include <vector>
#include <string>
#include <functional>
#include <regex>

namespace ThorsAnvil::Nisse::NisseHTTP
{

class Request;
class Response;

using Match     = std::map<std::string, std::string>;
using Action    = std::function<void(Match const&, Request&, Response&)>;
using NameList  = std::vector<std::string>;

class PathMatcher
{
    struct MatchInfo
    {
        std::regex  test;
        NameList    names;
        Action      action;
    };
    std::vector<MatchInfo>  paths;

    public:
        void addPath(std::string pathMatch, Action&& action);
        bool findMatch(std::string const& path, Request& request, Response& response);
};

}

#endif

PathMatcher.cpp

#include "PathMatcher.h"
using namespace ThorsAnvil::Nisse::NisseHTTP;
void PathMatcher::addPath(std::string pathMatch, Action&& action)
{
    // Variables to be built.
    std::string     expr;   // Convert pathMatch into a regular expression.
    NameList        names;  // Extract list of names from pathMatch.

    // Search variables
    std::smatch     searchMatch;
    std::regex      pathNameExpr{"\\{[^}]*\\}"};

    while (std::regex_search(pathMatch, searchMatch, pathNameExpr))
    {
        expr += pathMatch.substr(0, searchMatch.position());
        expr += "([^/]*)";

        std::string match = searchMatch.str();
        names.emplace_back(match.substr(1, match.size() - 2));

        pathMatch = searchMatch.suffix().str();
    }
    expr += pathMatch;

    // Add the path information to the list.
    paths.emplace_back(std::regex{expr}, std::move(names), std::move(action));
}

bool PathMatcher::findMatch(std::string const& path, Request& request, Response& response)
{
    for (auto const& pathMatchInfo: paths)
    {
        std::smatch     match{};
        if (std::regex_match(path, match, pathMatchInfo.test))
        {
            Match   result;
            for (std::size_t loop = 0; loop < pathMatchInfo.names.size(); ++loop)
            {
                result.insert({pathMatchInfo.names[loop], match[loop+1].str()});
            }
            pathMatchInfo.action(result, request, response);
            return true;
        }
    }
    return false;
}

Test Case

TEST(PathMatcherTest, NameMatchMultiple)
{
    PathMatcher         pm;
    int                 count = 0;
    Match               hit;
    pm.addPath("/path1/{name}/{id}", [&count, &hit](Match const& match, Request&, Response&){++count;hit = match;});

    std::stringstream   ss{"GET /path1/path2/path3 HTTP/1.1\r\nhost: google.com\r\n\r\n"};
    Request             request("http", ss);
    Response            response(ss, Version::HTTP1_1);
    pm.findMatch("/path1/path2/path3", request, response);

    EXPECT_EQ(1, count);
    EXPECT_EQ(2, hit.size());
    EXPECT_EQ("name", (++hit.begin())->first);
    EXPECT_EQ("path2", (++hit.begin())->second);
    EXPECT_EQ("id", hit.begin()->first);
    EXPECT_EQ("path3", hit.begin()->second);
}
asked Oct 21, 2024 at 18:05
\$\endgroup\$

1 Answer

\$\begingroup\$

Limit the scope of identifiers

Match, Action and NameList are declared inside the NisseHTTP namespace, but they are specific to the PathMatcher, so instead declare them inside class PathMatcher.

Consider that a name like Action is very generic, and maybe you want to have some other thing in NisseHTTP that has the concept of actions, but takes different parameters.
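A minimal sketch of what that could look like, assuming the rest of the header stays as posted. Keeping NameList private is my own suggestion, since only MatchInfo uses it; Match and Action stay public because callers need them to write their lambdas.

```cpp
#include <functional>
#include <map>
#include <regex>
#include <string>
#include <vector>

namespace ThorsAnvil::Nisse::NisseHTTP
{

class Request;
class Response;

class PathMatcher
{
    public:
        // The aliases now live in the class scope, so other parts of
        // NisseHTTP are free to define their own notion of an "Action".
        using Match  = std::map<std::string, std::string>;
        using Action = std::function<void(Match const&, Request&, Response&)>;

        void addPath(std::string pathMatch, Action&& action);
        bool findMatch(std::string const& path, Request& request, Response& response);

    private:
        // Implementation detail: only MatchInfo needs this alias.
        using NameList = std::vector<std::string>;

        struct MatchInfo
        {
            std::regex  test;
            NameList    names;
            Action      action;
        };
        std::vector<MatchInfo>  paths;
};

}
```

Callers would then spell the parameter type as PathMatcher::Match const& in their lambdas.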

Do you really need regular expressions?

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. (Jamie Zawinski)

Regular expressions are a powerful tool, but they come at a price. They are not always that efficient: C++'s std::regex is not known for its speed, and a std::regex object might use a lot of memory. Also, the code you wrote to parse the pathMatch string could just as well be used to match the actual URL, and would probably be as efficient as a good regex implementation.
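To make that concrete, here is a rough sketch of matching a path directly against the match string without any regex. The free function matchPath is hypothetical, and it makes one simplifying assumption that differs from the regex version: a variable always runs up to the next '/' or the end of the path, so a match string like "/foo/{bar}x" is not supported.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <string_view>

using Match = std::map<std::string, std::string>;

// Hypothetical free function, not part of the reviewed code.
// Walks `pattern` and `path` in lockstep. Literal characters must be equal;
// "{name}" captures everything up to the next '/' (or the end of the path).
// Returns std::nullopt when the path does not match or the pattern has an
// unterminated '{'.
std::optional<Match> matchPath(std::string_view pattern, std::string_view path)
{
    Match       result;
    std::size_t i = 0;  // position in pattern
    std::size_t p = 0;  // position in path

    while (i < pattern.size())
    {
        if (pattern[i] == '{')
        {
            std::size_t const close = pattern.find('}', i);
            if (close == std::string_view::npos)
            {
                return std::nullopt;    // malformed pattern
            }
            std::string_view const name = pattern.substr(i + 1, close - i - 1);

            std::size_t end = path.find('/', p);
            if (end == std::string_view::npos)
            {
                end = path.size();
            }
            result[std::string(name)] = std::string(path.substr(p, end - p));

            p = end;
            i = close + 1;
        }
        else
        {
            if (p >= path.size() || path[p] != pattern[i])
            {
                return std::nullopt;    // literal mismatch
            }
            ++p;
            ++i;
        }
    }
    if (p != path.size())
    {
        return std::nullopt;            // trailing characters in the path
    }
    return result;
}
```

Because literal characters are compared directly, "/foo/bar.baz" only matches itself here, and parentheses in the match string have no special meaning.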

But most importantly:

Are you handling all corner cases?

I can think of a few match strings that have some issues:

  • "/foo/bar/{}" (empty name)
  • "/foo/{bar}/{bar}" (same name used twice)
  • "/foo/{{bar}/baz" ({ appears in name, should you allow that?)
  • "/foo/{bar}}/baz" (} is not part of the name, will it match?)
  • "/foo/bar.baz" (this will match /foo/bar/baz!)
  • "/foo/\\{bar}" (this will cause an out-of-bounds read)
  • "/foo/(bar)/{baz}" (this will cause the wrong string to be added to result)

And if you didn't know you could actually add regular expressions to the match string, what if you wanted to match a literal {?
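If you do keep std::regex, one possible way to deal with the "bar.baz" and "(bar)" cases is to escape the literal pieces of the match string before appending them to expr. The helper below is a hypothetical sketch, assuming the default ECMAScript grammar; it does nothing about empty or duplicate names, which would need separate validation.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: puts a backslash in front of every character that has
// a special meaning in an ECMAScript regular expression, so the text is
// matched literally.
std::string escapeLiteral(std::string const& text)
{
    static std::string const special{R"re(\^$.|?*+()[]{})re"};

    std::string result;
    for (char c: text)
    {
        if (special.find(c) != std::string::npos)
        {
            result += '\\';
        }
        result += c;
    }
    return result;
}

// Returns true when `path` matches `literal` exactly, going through
// std::regex to show the effect of the escaping.
bool matchesLiterally(std::string const& literal, std::string const& path)
{
    return std::regex_match(path, std::regex{escapeLiteral(literal)});
}
```

In addPath() this would apply to the text added by pathMatch.substr(0, searchMatch.position()) and to the final expr += pathMatch, but not to the "([^/]*)" groups.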

What if I added two matcher functions, with these two different match strings:

  • "/{foo}"
  • "/{bar}"
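Both strings translate to the same regular expression, and findMatch() returns after the first hit, so the action registered second would never be called. A rough way to catch this at registration time is to compare match strings with the variable names stripped out. The names normalizeTemplate and TemplateRegistry below are hypothetical, and the check only detects identical shapes, not overlapping ones such as "/{foo}" and "/bar".

```cpp
#include <regex>
#include <set>
#include <string>

// Hypothetical helper: replaces every "{name}" with "{}" so that match
// strings differing only in their variable names compare equal.
std::string normalizeTemplate(std::string const& pathMatch)
{
    static std::regex const nameExpr{"\\{[^}]*\\}"};
    return std::regex_replace(pathMatch, nameExpr, "{}");
}

// Hypothetical registry that remembers which shapes were already registered.
class TemplateRegistry
{
    std::set<std::string>   seen;

    public:
        // Returns false when an equivalent match string was added before.
        bool add(std::string const& pathMatch)
        {
            return seen.insert(normalizeTemplate(pathMatch)).second;
        }
};
```

addPath() could consult such a registry and report an error, or throw, when add() returns false.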
answered Oct 22, 2024 at 8:50
\$\endgroup\$
  • Yea, second thoughts on the regular expression matching already. :-( Commented Oct 22, 2024 at 16:50
  • Will move the types inside PathMatcher. Commented Oct 22, 2024 at 16:51
