Finding the first most digit number in a string with Haskell. FUNCTION PARSER

Question 1

This problem stems from me wanting to create a function parser, a parse that will take any mathematical function with n variables and parse it into a tree structure that can be evaluated and returned as an answer. I want to do this because I am attempting to figure out a better way of doing some Numerical Analysis type stuff without using Matlab.

I am unsure if anyone else has had this issue but currently I have a working tree class but I have an issue with parsing a function into a tree form. My first issue lies with my inability to figure out how to traverse through a string and find the biggest number in the function. Ideally, I would call a function createFunction which has the signature createFunction :: String -> Function and it will take for example f(x)=34*x+92*x-3*x or even just 34*x+92*x-3*x but I can't figure out how to best parse this an find the numbers since using read would just get a 3 for 34 (I think.). I thought about how I would do this imperatively and I did a buffer till I didn't get a number and then read the number in that buffer or I guess a list in that case but that seems poor and very much not the in the spirit of functional programming. I just am having an issue conceptualizing the problem and solving it functionally.

If there's even just a way in Haskell to replace the x with the x value and do that, that would be interesting to see. I checked a few things but nothing seemed to make sense to me or understand what is going on to use it effectively. I just need someone to point me in a good direction. I suppose a regex would suffice but I am unsure. Here is my code for anyone who needs an idea of what I am doing.

My clear expectations is to take something like 34*x-4 and turn it into a tree that has a root node that is a subtraction sign, with a left child that is a multiplication sign with two leafs of 34 and x and the root subtraction sign will have a right child of 4. It will all be evaluated as that and replace the x with the input value. This might be over kill to use a tree. In no way is this tree necessary for the problem but I thought it would be the best way to do this!

operator.hs:

module Operator (Operator (Add, Sub, Mult, Div)) where
data Operator = Add | Sub | Mult | Div
 deriving (Eq,Show)

tree.hs:

module Tree (Tree (Leaf, Branch), fromFloat) where
import Operator (Operator(Add,Sub,Mult,Div))
data Tree = Leaf Float | Branch Operator Tree Tree
 deriving (Eq,Show)
fromFloat :: Float -> Tree
fromFloat = Leaf

function.hs:

module Function(fromFloat) where
import Tree
data Function = Function String Int Tree
 deriving (Eq,Show)

Question 2

So, you want a parser for polynomials written in a normal form, not arbitrary functions?

Question 3

For parsing, you should have a look at parser combinator libraries like megaparsec if you haven’t already — they’re great for parsing highly structured, nested data like this. Megaparsec even includes a module for parsing mathematical expressions!

Question 4

Alternately, if you don’t want to bother with parser combinators, you could split the polynomial on + and -, then write a function to turn each subexpression into a (coefficient, power) tuple, which can be mapped over the split polynomial to get a structure you could use.

Question 5

Thank you. I will look into this!! I want a normal form to be parsed.

Question 6

A fairly standard way of writing parsers from scratch in functional languages is to write functions that take as an argument the "input that's left", try to parse the initial part of that input, and return the result of the parse plus the "input remaining". Since you generally want to allow parsers to "fail gracefully" (see below), you usually wrap this return value in a Maybe or something.

For example, a parsing function to try and parse an integral number at the beginning of a string would look something like this:

import Data.Char
import Data.List
number :: String -> Maybe (Int, String)
number str = case span isDigit str of
 ([], _) -> Nothing -- no digits found
 (a, rest) -> Just (read a, rest)

This is how it works:

> number "34*x-4"
Just (34,"*x-4")
> number "x+2"
Nothing

It's not too tough to write a parser for simple polynomial expressions using this approach. Here's a parser for single-character variable names:

-- read a single-character variable name
var :: String -> Maybe (Char, String)
var (x:rest) | isAlpha x = Just (x, rest)
 | otherwise = Nothing

From this, you can construct a parser for "terms" of the form "2*x", "y", or "18".

-- read a "term": "2*x" or "y" or "18"
data Term = TermPoly Int Char | TermConstant Int
 deriving (Show)
term :: String -> Maybe (Term, String)
term str
 = case number str of
 Just (n, '*':str1) -> case var str1 of
 Just (x, str2) -> Just (TermPoly n x, str2)
 Nothing -> error "expected variable"
 Just (n, str3) -> Just (TermConstant n, str3)
 Nothing -> case var str of
 Just (x, str4) -> Just (TermPoly 1 x, str4)
 Nothing -> error "expected term"

Note how we use the potential failure of the initial number str parser (i.e., the Nothing branch of the topmost case) to try to parse the alternative of a variable without a coefficient (e.g., "y"). This is why allowing parsers to "fail gracefully" is helpful.

Now, we can write a parser for sums and differences of terms. The awkward use of the helper function go here makes sure that "2*x-4*y+3" gets parsed as (2*x-4*y)+3 instead of 2*x-(4*y+3) which is likely to happen if you use a more obvious recursive method of defining poly without the go helper:

-- read a polynomial as a sum of terms
data Poly = Single Term | Plus Poly Term | Minus Poly Term
 deriving (Show)
poly :: String -> Maybe (Poly, String)
poly str = case term str of
 Just (t, rest) -> go (Single t) rest
 Nothing -> error "expected term"
 where go :: Poly -> String -> Maybe (Poly, String)
 go p ('+':str1) = case term str1 of
 Just (t', str2) -> go (Plus p t') str2
 Nothing -> error "expected term"
 go p ('-':str1) = case term str1 of
 Just (t', str3) -> go (Minus p t') str3
 Nothing -> error "expected term"
 go p str4 = Just (p, str4)

With the above code, you get:

> poly "34*x-4"
Just (Minus (Single (TermPoly 34 'x')) (TermConstant 4),"")
> poly "34*x+92*x-3*x"
Just (Minus (Plus (Single (TermPoly 34 'x')) (TermPoly 92 'x')) (TermPoly 3 'x'),"")
> poly "x+y"
Just (Plus (Single (TermPoly 1 'x')) (TermPoly 1 'y'),"")

It doesn't handle a leading minus sign, though:

> poly "-2*x+y"
*** Exception: expected term
CallStack (from HasCallStack):
 error, called at PolyParser.hs:27:20 in main:Main

nor does it handle x*5 or 2*5*x or parentheses or spaces or lots of other cases.

Also, the biggest drawback of this approach is how complicated it is to combine simple component parsers into larger parsers. For example, term is really conceptually very simple: parse either a number times a var, or a lone number or lone var, but we have all these complicated case statements to combine number and var and parse the '*' character. This is where "parser combinators" or "monadic parsers" become important, as they provide a straightforward, simple syntax for combining parsers.

This is also why, realistically, no experienced Haskell programmer writes parsers from scratch. There are many excellent and powerful monadic parsing libraries available. They take some time to learn to use but are well worth it. A reasonably full-featured Megaparsec parser for an expression language takes about 30 minutes to write in 60-odd lines and can handle complex expressions (like the example in the main function below):

import Data.Void
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L
import Control.Monad.Combinators.Expr
type Parser = Parsec Void String
-- standard boilerplate for space handling
spaces :: Parser ()
spaces = L.space space1 empty empty
lexeme :: Parser a -> Parser a
lexeme = L.lexeme spaces
symbol :: String -> Parser String
symbol = L.symbol spaces
-- parse a number
number :: Parser Float
number = lexeme $ try L.float <|> fromIntegral <$> L.decimal
-- parse an identifier (e.g., "x_4")
identifier :: Parser String
identifier = lexeme $ (:) <$> (letterChar <|> char '_') <*> many (alphaNumChar <|> char '_')
-- abstract syntax tree for expressions
data Expr
 = Num Float
 | Var String
 | Negate Expr
 | BinOp BinOp Expr Expr
 deriving (Show)
data BinOp
 = Add | Sub | Mult | Div | Power
 deriving (Show)
-- parse a "term": number, identifier, or expression in parentheses
term :: Parser Expr
term = Num <$> number
 <|> Var <$> identifier
 <|> between (symbol "(") (symbol ")") expr
-- parse an expression combining terms with operators
expr :: Parser Expr
expr = makeExprParser term
 [ [binaryR "^" Power]
 , [Prefix (Negate <$ symbol "-")]
 , [binary "*" Mult, binary "/" Div]
 , [binary "+" Add, binary "-" Sub]
 ]
 where binary name f = InfixL (BinOp f <$ symbol name)
 binaryR name f = InfixR (BinOp f <$ symbol name)
-- parse a whole string as an expression
fullExpr :: Parser Expr
fullExpr = spaces *> expr <* eof
main = parseTest fullExpr "(pi*r^2 - 4*pi*r) / (c^2 - a^2 - b^2)"

As I say, getting to the point where you can write parsers like this takes quite a bit of time, but it's worth working through some tutorials, even if you aren't yet at the point where you'd be able to write parsers all on your own. Some resources that might be helpful to get started:

Jake Wheat's Intro to Parsing with Parsec in Haskell
Write Yourself a Scheme in 48 Hours
Hutton and Meijer's Monadic Parsing in Haskell
This Reddit thread from about a year ago with links to some hopefully up-to-date tutorials

K. A. Buhr K. A. Buhr 52.3k3 gold badges53 silver badges81 bronze badges · Accepted Answer · 2020-01-18 00:50:25Z

A fairly standard way of writing parsers from scratch in functional languages is to write functions that take as an argument the "input that's left", try to parse the initial part of that input, and return the result of the parse plus the "input remaining". Since you generally want to allow parsers to "fail gracefully" (see below), you usually wrap this return value in a Maybe or something.

For example, a parsing function to try and parse an integral number at the beginning of a string would look something like this:

import Data.Char
import Data.List
number :: String -> Maybe (Int, String)
number str = case span isDigit str of
 ([], _) -> Nothing -- no digits found
 (a, rest) -> Just (read a, rest)

This is how it works:

> number "34*x-4"
Just (34,"*x-4")
> number "x+2"
Nothing

It's not too tough to write a parser for simple polynomial expressions using this approach. Here's a parser for single-character variable names:

-- read a single-character variable name
var :: String -> Maybe (Char, String)
var (x:rest) | isAlpha x = Just (x, rest)
 | otherwise = Nothing

From this, you can construct a parser for "terms" of the form "2*x", "y", or "18".

-- read a "term": "2*x" or "y" or "18"
data Term = TermPoly Int Char | TermConstant Int
 deriving (Show)
term :: String -> Maybe (Term, String)
term str
 = case number str of
 Just (n, '*':str1) -> case var str1 of
 Just (x, str2) -> Just (TermPoly n x, str2)
 Nothing -> error "expected variable"
 Just (n, str3) -> Just (TermConstant n, str3)
 Nothing -> case var str of
 Just (x, str4) -> Just (TermPoly 1 x, str4)
 Nothing -> error "expected term"

Note how we use the potential failure of the initial number str parser (i.e., the Nothing branch of the topmost case) to try to parse the alternative of a variable without a coefficient (e.g., "y"). This is why allowing parsers to "fail gracefully" is helpful.

Now, we can write a parser for sums and differences of terms. The awkward use of the helper function go here makes sure that "2*x-4*y+3" gets parsed as (2*x-4*y)+3 instead of 2*x-(4*y+3) which is likely to happen if you use a more obvious recursive method of defining poly without the go helper:

-- read a polynomial as a sum of terms
data Poly = Single Term | Plus Poly Term | Minus Poly Term
 deriving (Show)
poly :: String -> Maybe (Poly, String)
poly str = case term str of
 Just (t, rest) -> go (Single t) rest
 Nothing -> error "expected term"
 where go :: Poly -> String -> Maybe (Poly, String)
 go p ('+':str1) = case term str1 of
 Just (t', str2) -> go (Plus p t') str2
 Nothing -> error "expected term"
 go p ('-':str1) = case term str1 of
 Just (t', str3) -> go (Minus p t') str3
 Nothing -> error "expected term"
 go p str4 = Just (p, str4)

With the above code, you get:

> poly "34*x-4"
Just (Minus (Single (TermPoly 34 'x')) (TermConstant 4),"")
> poly "34*x+92*x-3*x"
Just (Minus (Plus (Single (TermPoly 34 'x')) (TermPoly 92 'x')) (TermPoly 3 'x'),"")
> poly "x+y"
Just (Plus (Single (TermPoly 1 'x')) (TermPoly 1 'y'),"")

It doesn't handle a leading minus sign, though:

> poly "-2*x+y"
*** Exception: expected term
CallStack (from HasCallStack):
 error, called at PolyParser.hs:27:20 in main:Main

nor does it handle x*5 or 2*5*x or parentheses or spaces or lots of other cases.

Also, the biggest drawback of this approach is how complicated it is to combine simple component parsers into larger parsers. For example, term is really conceptually very simple: parse either a number times a var, or a lone number or lone var, but we have all these complicated case statements to combine number and var and parse the '*' character. This is where "parser combinators" or "monadic parsers" become important, as they provide a straightforward, simple syntax for combining parsers.

This is also why, realistically, no experienced Haskell programmer writes parsers from scratch. There are many excellent and powerful monadic parsing libraries available. They take some time to learn to use but are well worth it. A reasonably full-featured Megaparsec parser for an expression language takes about 30 minutes to write in 60-odd lines and can handle complex expressions (like the example in the main function below):

import Data.Void
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L
import Control.Monad.Combinators.Expr
type Parser = Parsec Void String
-- standard boilerplate for space handling
spaces :: Parser ()
spaces = L.space space1 empty empty
lexeme :: Parser a -> Parser a
lexeme = L.lexeme spaces
symbol :: String -> Parser String
symbol = L.symbol spaces
-- parse a number
number :: Parser Float
number = lexeme $ try L.float <|> fromIntegral <$> L.decimal
-- parse an identifier (e.g., "x_4")
identifier :: Parser String
identifier = lexeme $ (:) <$> (letterChar <|> char '_') <*> many (alphaNumChar <|> char '_')
-- abstract syntax tree for expressions
data Expr
 = Num Float
 | Var String
 | Negate Expr
 | BinOp BinOp Expr Expr
 deriving (Show)
data BinOp
 = Add | Sub | Mult | Div | Power
 deriving (Show)
-- parse a "term": number, identifier, or expression in parentheses
term :: Parser Expr
term = Num <$> number
 <|> Var <$> identifier
 <|> between (symbol "(") (symbol ")") expr
-- parse an expression combining terms with operators
expr :: Parser Expr
expr = makeExprParser term
 [ [binaryR "^" Power]
 , [Prefix (Negate <$ symbol "-")]
 , [binary "*" Mult, binary "/" Div]
 , [binary "+" Add, binary "-" Sub]
 ]
 where binary name f = InfixL (BinOp f <$ symbol name)
 binaryR name f = InfixR (BinOp f <$ symbol name)
-- parse a whole string as an expression
fullExpr :: Parser Expr
fullExpr = spaces *> expr <* eof
main = parseTest fullExpr "(pi*r^2 - 4*pi*r) / (c^2 - a^2 - b^2)"

As I say, getting to the point where you can write parsers like this takes quite a bit of time, but it's worth working through some tutorials, even if you aren't yet at the point where you'd be able to write parsers all on your own. Some resources that might be helpful to get started:

Jake Wheat's Intro to Parsing with Parsec in Haskell
Write Yourself a Scheme in 48 Hours
Hutton and Meijer's Monadic Parsing in Haskell
This Reddit thread from about a year ago with links to some hopefully up-to-date tutorials

CollectivesTM on Stack Overflow

Finding the first most digit number in a string with Haskell. FUNCTION PARSER

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

CollectivesTM on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related