Lexer for expression evaluator

Question 1

I started learning my first functional programming language (Haskell) yesterday and I have been working on a lexer for an expression evaluator. The lexer is now basically feature complete, but I'm not sure what I can do to improve the code.

import Data.Char
import Data.List

data TokenType = Identifier |
                 RealNumberLiteral |
                 PlusSign |
                 MinusSign |
                 Asterisk |
                 ForwardSlash |
                 Caret |
                 LeftParenthesis |
                 RightParenthesis
                 deriving (Show)

data Token = Token TokenType String
             deriving (Show)

read_token :: String -> Token
read_token [] = error "Unexpectedly reached the end of the source code while reading a token."
read_token source_code@(next_character:_)
    | isSpace next_character = read_token (dropWhile isSpace source_code)
    | isAlpha next_character = Token Identifier (takeWhile isAlpha source_code)
    | isDigit next_character = let token_lexeme = (takeWhile (\x -> isDigit x || x == '.') source_code)
                               in let period_count = length (filter (=='.') token_lexeme)
                                  in Token RealNumberLiteral (if period_count <= 1
                                                              then token_lexeme
                                                              else error "There can only be one period in a real number literal.")
    | next_character == '+' = Token PlusSign "+"
    | next_character == '-' = Token MinusSign "-"
    | next_character == '*' = Token Asterisk "*"
    | next_character == '/' = Token ForwardSlash "/"
    | next_character == '^' = Token Caret "^"
    | next_character == '(' = Token LeftParenthesis "("
    | next_character == ')' = Token RightParenthesis ")"
    | otherwise = error ("Encountered an unexpected character (" ++ [next_character] ++ ") while reading a token.")

append_read_tokens :: [Token] -> String -> [Token]
append_read_tokens tokens source_code
    | null source_code = tokens
    | isSpace (head source_code) = append_read_tokens tokens (dropWhile isSpace source_code)
    | otherwise = let next_token@(Token next_token_type next_token_lexeme) = read_token source_code
                  in append_read_tokens (tokens ++ [next_token]) (drop (length next_token_lexeme) source_code)

tokenize :: String -> [Token]
tokenize [] = []
tokenize source_code = append_read_tokens [] source_code

Answer 1

TokenType is not needed. Fold it into Token: data Token = Identifier String | RealNumberLiteral String | PlusSign ...
Nested let expressions are not needed, since a single let can hold several bindings. See http://learnyouahaskell.com/syntax-in-functions#let-it-be
camelCase naming is preferred everywhere (readToken rather than read_token).
It is generally more efficient to build a list with cons (the : operator) and reverse it at the end, if needed. This applies to the tokens ++ [next_token] fragment, which re-traverses the whole accumulated list on every append.
read_token could return a tuple of the token and the rest of the string, so there is no need to drop (length next_token_lexeme) source_code afterwards.

Note: I didn't inspect the code's logic at all.
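
The five suggestions above can be sketched together as follows. This is only an illustration under a few assumptions, not code from the question or the answer: the names readToken, tokenize, and go are hypothetical, and reporting failures through Either String instead of calling error is an extra choice that the suggestions do not ask for.

```haskell
import Data.Char (isAlpha, isDigit, isSpace)

-- A single type: only the constructors that need a lexeme carry one.
data Token
  = Identifier String
  | RealNumberLiteral String
  | PlusSign
  | MinusSign
  | Asterisk
  | ForwardSlash
  | Caret
  | LeftParenthesis
  | RightParenthesis
  deriving (Show, Eq)

-- Read one token and return it together with the unconsumed input.
-- The caller is expected to have skipped leading whitespace already.
-- Assumption: errors are returned as Left instead of being thrown.
readToken :: String -> Either String (Token, String)
readToken [] = Left "Unexpectedly reached the end of the source code while reading a token."
readToken input@(c:rest)
  | isAlpha c =
      let (name, remaining) = span isAlpha input
      in Right (Identifier name, remaining)
  | isDigit c =
      -- One let with two bindings replaces the nested lets.
      let (lexeme, remaining) = span (\x -> isDigit x || x == '.') input
          periodCount = length (filter (== '.') lexeme)
      in if periodCount <= 1
           then Right (RealNumberLiteral lexeme, remaining)
           else Left "There can only be one period in a real number literal."
  | otherwise = case c of
      '+' -> Right (PlusSign, rest)
      '-' -> Right (MinusSign, rest)
      '*' -> Right (Asterisk, rest)
      '/' -> Right (ForwardSlash, rest)
      '^' -> Right (Caret, rest)
      '(' -> Right (LeftParenthesis, rest)
      ')' -> Right (RightParenthesis, rest)
      _   -> Left ("Encountered an unexpected character (" ++ [c] ++ ") while reading a token.")

-- Accumulate tokens with (:) in reverse order and reverse once at the end.
tokenize :: String -> Either String [Token]
tokenize = go []
  where
    go acc source =
      case dropWhile isSpace source of
        [] -> Right (reverse acc)
        trimmed -> do
          (token, remaining) <- readToken trimmed
          go (token : acc) remaining
```

In GHCi, tokenize "x + 1.5" evaluates to Right [Identifier "x",PlusSign,RealNumberLiteral "1.5"]. Because readToken hands back the remaining input, tokenize never has to recompute how many characters the token consumed.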

leventov · Accepted Answer · 2013-07-16 04:22:49Z
