2
\$\begingroup\$

I started learning my first functional programming language (Haskell) yesterday, and I have been working on a lexer for an expression evaluator. The lexer is now basically feature-complete, but I'm not sure how to improve the code.

import Data.Char
import Data.List

data TokenType = Identifier |
                 RealNumberLiteral |
                 PlusSign |
                 MinusSign |
                 Asterisk |
                 ForwardSlash |
                 Caret |
                 LeftParenthesis |
                 RightParenthesis
                 deriving (Show)

data Token = Token TokenType String
             deriving (Show)

read_token :: String -> Token
read_token [] = error "Unexpectedly reached the end of the source code while reading a token."
read_token source_code@(next_character:_)
    | isSpace next_character = read_token (dropWhile isSpace source_code)
    | isAlpha next_character = Token Identifier (takeWhile isAlpha source_code)
    | isDigit next_character = let token_lexeme = (takeWhile (\x -> isDigit x || x == '.') source_code)
                               in let period_count = length (filter (=='.') token_lexeme)
                                  in Token RealNumberLiteral (if period_count <= 1
                                                              then token_lexeme
                                                              else error "There can only be one period in a real number literal.")
    | next_character == '+' = Token PlusSign "+"
    | next_character == '-' = Token MinusSign "-"
    | next_character == '*' = Token Asterisk "*"
    | next_character == '/' = Token ForwardSlash "/"
    | next_character == '^' = Token Caret "^"
    | next_character == '(' = Token LeftParenthesis "("
    | next_character == ')' = Token RightParenthesis ")"
    | otherwise = error ("Encountered an unexpected character (" ++ [next_character] ++ ") while reading a token.")

append_read_tokens :: [Token] -> String -> [Token]
append_read_tokens tokens source_code
    | null source_code = tokens
    | isSpace (head source_code) = append_read_tokens tokens (dropWhile isSpace source_code)
    | otherwise = let next_token@(Token next_token_type next_token_lexeme) = read_token source_code
                  in append_read_tokens (tokens ++ [next_token]) (drop (length next_token_lexeme) source_code)

tokenize :: String -> [Token]
tokenize [] = []
tokenize source_code = append_read_tokens [] source_code

Jamal
asked Jul 11, 2013 at 22:06
\$\endgroup\$

1 Answer

1
\$\begingroup\$
  • There is no need for a separate TokenType; the lexeme can live in the constructors that need it: data Token = Identifier String | RealNumberLiteral String | PlusSign ...

  • There is no need for nested let expressions; a single let can hold several bindings. See http://learnyouahaskell.com/syntax-in-functions#let-it-be

  • camelCase is the conventional naming style for Haskell functions and variables (for example, readToken rather than read_token).

  • It is generally more efficient to build a list by prepending with cons (the : operator) and reversing it at the end, if needed. This applies to the tokens ++ [next_token] fragment, which traverses the whole accumulated list on every append.

  • read_token could return a tuple of the token and the remaining input, so there is no need to drop (length next_token_lexeme) source_code afterwards.

Note that I didn't inspect the code's logic at all.
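
To make the points above concrete, here is a rough sketch of how the lexer might look with all five suggestions applied. It is only one possible arrangement, not a definitive rewrite: the camelCase names (readToken, periodCount) and the local helper go are my own choices for illustration, operator tokens no longer carry a lexeme, and errors are still raised with error as in the original.

```haskell
import Data.Char (isAlpha, isDigit, isSpace)

-- One constructor per token kind; only identifiers and numbers carry a lexeme.
data Token
  = Identifier String
  | RealNumberLiteral String
  | PlusSign
  | MinusSign
  | Asterisk
  | ForwardSlash
  | Caret
  | LeftParenthesis
  | RightParenthesis
  deriving (Show, Eq)

-- Read one token from input that does not start with whitespace and
-- return it together with the unconsumed remainder of the input.
readToken :: String -> (Token, String)
readToken [] = error "Unexpectedly reached the end of the source code while reading a token."
readToken source@(c : rest)
  | isAlpha c =
      let (name, rest') = span isAlpha source
      in (Identifier name, rest')
  | isDigit c =
      -- A single let holds both bindings; no nesting is needed.
      let (lexeme, rest') = span (\x -> isDigit x || x == '.') source
          periodCount = length (filter (== '.') lexeme)
      in if periodCount <= 1
           then (RealNumberLiteral lexeme, rest')
           else error "There can only be one period in a real number literal."
  | otherwise = case c of
      '+' -> (PlusSign, rest)
      '-' -> (MinusSign, rest)
      '*' -> (Asterisk, rest)
      '/' -> (ForwardSlash, rest)
      '^' -> (Caret, rest)
      '(' -> (LeftParenthesis, rest)
      ')' -> (RightParenthesis, rest)
      _ -> error ("Encountered an unexpected character (" ++ [c] ++ ") while reading a token.")

-- Accumulate tokens with (:) in reverse order, then reverse once at the end.
tokenize :: String -> [Token]
tokenize = go []
  where
    go acc source = case dropWhile isSpace source of
      [] -> reverse acc
      trimmed ->
        let (token, rest) = readToken trimmed
        in go (token : acc) rest
```

With this version, tokenize "3.14 * (x + 2)" evaluates to [RealNumberLiteral "3.14", Asterisk, LeftParenthesis, Identifier "x", PlusSign, RealNumberLiteral "2", RightParenthesis]. Because readToken hands back the remaining input, the caller never has to recompute how many characters were consumed, and whitespace is skipped in one place only.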

answered Jul 16, 2013 at 4:22
\$\endgroup\$
