Is this a good way to get all sublists of a sequence/list that have a given length?
An inefficient way to do it would be something like
f n = (filter (\x -> (length x) > n)) . (take n) . tails
This just takes the n
first elements of each tail of the original list. I think it should be slow because of the length check on every tail-element.
A smarter way would should be to "slide" a sequence of length n
over an input sequence and save the result of each slide by one to the right.
-- | Get all the subsequences of a given sequence sq of length n
ngrams::Int -> Seq a -> Seq (Seq a)
ngrams n sq | length sq < n = empty
ngrams n sq | otherwise = ngrams' restSequence initialWindow empty
where
initialWindow = take n sq
restSequence = drop n sq
ngrams' (viewl -> EmptyL) window acc = acc |> window
ngrams' (viewl -> x :< s) window@(viewl -> a :< r) acc =
ngrams' s (r |> x) (acc |> window)
Somehow I have the feeling I am missing an obvious way to do this better...
1 Answer 1
A little math goes a long way.
ngrams :: Int -> [a] -> [[a]]
ngrams n l = take (length l - (n - 1)) . map (take n) . tails $ l
-
\$\begingroup\$ Thanks for your reply. I guess this is still not optimal because it won't work on an infinite list. I guess that each call to
take
also takes n operations for each tail, right ? \$\endgroup\$Christof Schramm– Christof Schramm2014年11月17日 09:29:30 +00:00Commented Nov 17, 2014 at 9:29 -
\$\begingroup\$ Re: infinite lists, neither would your
Seq
version. Re:take
, yes but no, remember that it will be evaluated lazily, so you'll only compute the elements oftake n
as you use them, it won't walk the spine of the ngram twice. \$\endgroup\$bisserlis– bisserlis2014年11月17日 14:14:46 +00:00Commented Nov 17, 2014 at 14:14 -
\$\begingroup\$ I think it is better to replace
take (length) l - (n-1)
byzipWith (flip const) (drop (n-1) l)
. This increases lazyness and makes the function work for infinite lists. \$\endgroup\$Tashi Walde– Tashi Walde2018年09月07日 14:23:15 +00:00Commented Sep 7, 2018 at 14:23