3
\$\begingroup\$

This code works, but since I'm just learning Haskell I'm wondering if it could have been done in a more idiomatic way.

The main function is called tokenize, and it uses helper functions called tokenizeOne and tokenizeSeveral.

Here is the code:

tokenizeOne "" (' ':rest) = tokenizeOne "" rest 
tokenizeOne str ('\\':ch:rest) = tokenizeOne (str ++ [ch]) rest
tokenizeOne x (' ':rest) = (x, rest)
tokenizeOne x "" = (x, "")
tokenizeOne x (ch:rest) = tokenizeOne (x ++ [ch]) rest
tokenizeSeveral x "" = x
tokenizeSeveral x str = let (token, rest) = tokenizeOne "" str in
 tokenizeSeveral (x ++ [token]) rest
tokenize str = tokenizeSeveral [] str
asked Feb 5, 2014 at 15:17
\$\endgroup\$

1 Answer 1

2
\$\begingroup\$

Apart from general code layout I don't see much to improve - the only issue worth talking about is that you should never ever actually use something like ++ [ch]. Your function is needlessly O(n2) for that reason alone. That might not sound like much here, but it is a good idea to shake this habit early.

So let's talk about that a bit. Your options generally are:

  1. Find a library function that frees you from actually constructing the list yourself. But to be fair, for this examples I can't actually think of anything helpful myself.

  2. Use a data structure that copes better with it, such as Seq from Data.Sequence.

  3. First construct the list using just (:), then reverse it at the end. That might seem inefficient, but given that reverse is only O(n), it will never actually increase algorithmic complexity.

  4. Rebuild your function so it constructs the list in the right order right away.

Here is what things look like if we go for the last option:

import Control.Arrow ( first )
token :: String -> (String, String)
token "" = ("", "")
token (' ' : cs) = ("", cs)
token ('\\':c:cs) = first (c:) $ token cs
token (c : cs) = first (c:) $ token cs

The key here is that we add the character c using first (c:) after the recursive call returns, and therefore to the front of the string.

A notable advantage is that this version has better laziness properties:

*Main> head $ fst $ token ('t':undefined)
't'

We know that the first character is 't' even without looking at the rest of the string, whereas your function always needs to finish a complete token before it can return. And as we are programming in Haskell, we often like our functions as lazy as possible.

Finally, a matching tokens implementation:

tokens :: String -> [String]
tokens str
 | null str' = []
 | otherwise = tok : tokens rest
 where str' = dropWhile (==' ') str
 (tok, rest) = token str'

This is again basically your version using direct list construction. Only difference is that we now deal with whitespace once per token, which I feel is slightly more elegant.

answered Feb 7, 2014 at 17:51
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.