I'm building a small MPL module in one of my utility libraries for fun and learning experience.
One of the problems I'm trying to solve is getting a list of all indices where a sequence of types appears inside another sequence of types.
Example:
static_assert(is_same
<
List<int, char, int, int, char, int, double, int, char>
::IdxsOfSeq<int, char>,
ListInt<0, 3, 7>
>());
The above code is valid and works as intended. However, finding out the indices of a subsequence in a long type sequence becomes very slow very fast.
Searching in a list with 20 or 30 elements can take up to 30 seconds of compilation time.
That would rarely be useful, but I'm also experimenting with compile-time strings implemented as character lists.
The current implementation of IdxsOfSeq
basically uses std::index_sequence
to perform a naive string matching algorithm over the type sequence.
The naive algorithm requires no preprocessing time, but the execution of the type matching requires a lot of computation time.
This is my current implementation of the string matching algorithm.
// Matches<bool, int> returns a ListInt<TStart> is the boolean
// is true, otherwise an empty ListInt<>
template<bool TAllTrue, int TStart>
struct Matches;
template<int TStart>
struct Matches<true, TStart>
{
using Type = ListInt<TStart>;
};
template<int TStart>
struct Matches<false, TStart>
{
using Type = ListInt<>;
};
// GetMatch<TS, TM, TStart, TMIdxs>::Type checks if every type
// of TS (the source type sequence) starting from the index TStart
// matches every type in TM (the subsequence to find)
template<typename TS, typename TM, int TStart, typename TMIdxs>
struct GetMatch;
template<typename TS, typename TM, int TStart, SizeT... TMIdxs>
struct GetMatch<TS, TM, TStart, IdxSeq<TMIdxs...>>
{
using Type = typename Matches
<
// True if all the types inside AreAllTrue are std::true_type
AreAllTrue
<
// Type checking with index sequence expansion
std::is_same_t
<
typename TS::template At<TStart + TMIdxs>,
typename TM::template At<TMIdxs>
>...
>{}(),
TStart
>::Type;
};
// Main helper template
// Finds every occurrence of the list TM inside TS and returns its index
template<typename TS, typename TM, typename TSIdxs, typename TMIdxs>
struct IdxsOfSeqHlpr;
template<typename TS, typename TM, SizeT... TSIdxs, typename TMIdxs>
struct IdxsOfSeqHlpr<TS, TM, IdxSeq<TSIdxs...>, TMIdxs>
{
// ConcatLists (obviously) concatenates the contents of the
// lists passed as template parameters in a single list
using Type = typename ConcatLists
<
ListInt<>,
typename GetMatch<TS, TM, TSIdxs, TMIdxs>::Type...
>::Type;
};
// If the subsequence to match is empty, return an empty ListInt<>
template<typename TS, SizeT... TSIdxs, typename TMIdxs>
struct IdxsOfSeqHlpr<TS, List<>, IdxSeq<TSIdxs...>, TMIdxs>
{
using Type = ListInt<>;
};
// Main typedef
//
template<typename TS, typename TM>
using IdxsOfSeq = typename IdxsOfSeqHlpr
<
// Source list
TS,
// Subsequence to match
TM,
// Index sequence of the source list
// Starts from 0, ends at (source_list_size - matching_list_size + 1)
std::make_index_sequence
<
getClampedMin(int(TS::size - TM::size + 1), 0)
>,
// Index sequence of the subsequence to match
std::make_index_sequence<TM::size>
>::Type;
Is there any part of the algorithm that can easily be optimized?
Or is it better to rewrite the solution from scratch, using another algorithm?
1 Answer 1
I'm going to throw out a fairly complicated solution here by breaking the problem into lots of smaller parts. The end result is that the compilation of the following:
using T = typelist<int, char, int, int, char, int, double, int, char,
int, char, int, int, char, int, double, int, char,
int, char, int, int, char, int, double, int, char,
int, char, int, int, char, int, double, int, char>;
static_assert(std::is_same<
matches<T, typelist<int, char>>,
intlist<0, 3, 7, 9, 12, 16, 18, 21, 25, 27, 30, 34>
>::value, "!");
is still almost instantaneous.
The algorithm is actually going to come from an idea in Python's itertools. What we want to do is walk our typelist N at a time (where N is the match length), and just use std::is_same
to compare. That is, taking the original problem's list:
using SRC = typelist<int, char, int, int, char, int, double, int, char>;
we want to turn it into something that looks like:
using X = typelist<typelist<int, char>,
typelist<char, int>,
typelist<int, int>,
...
>;
And then pass that into a metafunction taking an appropriate typelist<Ts...>
and std::index_sequence<Is...>
do:
using type = concat<
std::conditional_t<std::is_same<T, SUB>::value,
typelist<std::integral_constant<std::size_t, Idx>>,
typelist<>
>...>;
That's the basic idea anyway. Plus, the smaller pieces are probably going to be useful in other contexts.
The Small Algorithms
These are very straightforward and I will omit them here for brevity: head
, tail
, size
, concat
(variadic!), any_of
, and is_empty
.
take_by
To generate the X
typelist above, we need something like:
template <typename TL, int I>
using take_by = ???
The implementation of this is to make a typelist that is I
copies of TL
, each one dropping a steadily larger prefix off the top. Once we have that, we need to zip all the iterators together. So really, I need to start with the smaller pieces...
skip_n
Given a typelist, basically just call tail
n
times:
template <typename TL, int I>
struct skip_n_impl : skip_n_impl<tail<TL>, I-1> { };
template <typename TL>
struct skip_n_impl<TL, 0> {
using type = TL;
};
template <typename TL, int I>
using skip_n = typename skip_n_impl<TL, I>::type;
There may be a better way to do this, but I'm not sure. So, skip_n<typelist<int, char, double>, 2>
is typelist<double>
.
zip
This is the tricky part. Basically, we have a list of lists of types. If any of those typelists is empty, we're done. Otherwise, we concat all the heads with a recursive call against all the tails. We can't use std::conditional_t
here due to eager evaluation of both parts, so I'm introducing two different impl
types:
template <typename TL>
struct zip_impl;
template <typename TL, bool done>
struct zip_impl2;
template <typename TL>
struct zip_impl2<TL, true> {
using type = typelist<>;
};
template <typename... Ts>
struct zip_impl2<typelist<Ts...>, false> {
using type = concat<typelist<typelist<head<Ts>...>>,
typename zip_impl<typelist<tail<Ts>...>>::type
>;
};
template <typename... Ts>
struct zip_impl<typelist<Ts...>>
: zip_impl2<typelist<Ts...>, any_of<is_empty<Ts>::value...>::value>
{ };
template <typename T>
using zip = typename zip_impl<T>::type;
This gives us the pieces to get back to take_by
:
template <typename TL, int I, typename = std::make_index_sequence<I>>
struct take_by_impl;
template <typename TL, int I, std::size_t... Idx>
struct take_by_impl<TL, I, std::index_sequence<Idx...>>
{
using iters = typelist<skip_n<TL, Idx>...>;
using type = zip<iters>;
};
template <typename TL, int I>
using take_by = typename take_by_impl<TL, I>::type;
Hopefully that makes sense. First, we built up our "iterators" - which are all offset by one from each other. Then we just zip them up. Now we have the X
, and all we need is:
**matches**
:
template <typename SRC, typename SUB>
struct matches_impl
{
template <typename TL, typename = std::make_index_sequence<size<TL>::value>>
struct impl;
template <typename... T, std::size_t... Idx>
struct impl<typelist<T...>, std::index_sequence<Idx...>>
{
using type = concat<
std::conditional_t<std::is_same<T, SUB>::value,
typelist<std::integral_constant<std::size_t, Idx>>,
typelist<>
>...>;
};
using type = typename impl<take_by<SRC, size<SUB>::value>>::type;
};
template <typename SRC, typename SUB>
using matches = typename matches_impl<SRC, SUB>::type;
Basically, we end up creating a meta-list internally that would look like:
intlist<0>,
typelist<>,
typelist<>,
intlist<3>,
typelist<>,
...
But concat
will drop the empties, so when that gets combined, we get what we wanted - intlist<0, 3, 7>
.
Explore related questions
See similar questions with these tags.
std::index_sequence
(yourIdxSeq
) and evensize_t
(yourSizeT
). It would help me to understand the code if you would just use the standard entities directly, without renaming them. \$\endgroup\$X is a substring of Y
translates into simplyX is a member of Y::substrings
. \$\endgroup\$