1
\$\begingroup\$

I've written a date parser that parses different types of dates using Parsec.

Credit for the original problem goes to here.

The following formats are as such:

yyyy-mm-dd
mm/dd/yy
mm#yy#dd
dd*mm*yyyy
(month word) dd, yy
(month word) dd, yyyy

(month word) can be: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

Note if is yyyy it is a full 4 digit year. If it is yy then it is only the last 2 digits of the year. Years only go between 1950-2049.

module Main where
import System.Environment
import Text.Printf
import Text.ParserCombinators.Parsec
type Day = Int
type Month = Int
type Year = Int
data Date = Date Year Month Day
instance Show Date where
 show (Date year month day) =
 printf "%04d-%02d-%02d" year month day
parseDay :: Parser Day
parseDay = do
 digits <- count 2 digit
 let day = read digits
 if day < 0 || day > 31
 then fail $ printf "day %02d not in range" day
 else return day
parseMonth :: Parser Month
parseMonth = do
 digits <- count 2 digit
 let month = read digits
 if month < 0 || month > 12
 then fail $ printf "month %02d not in range" month
 else return month
wordMonths :: [String]
wordMonths = [
 "Jan",
 "Feb",
 "Mar",
 "Apr",
 "May",
 "Jun",
 "Jul",
 "Aug",
 "Sep",
 "Oct",
 "Nov",
 "Dec" ]
parseWordMonth :: Parser Month
parseWordMonth = choice $ map try parsers
 where
 parseWordMonth :: String -> Month -> Parser Month
 parseWordMonth word month = do
 string word
 return month
 parsers :: [Parser Month]
 parsers = zipWith parseWordMonth wordMonths [1..12]
parseYear :: Parser Year
parseYear = do
 digits <- count 4 digit
 return $ read digits
parseTwoYear :: Parser Year
parseTwoYear = do
 digits <- count 2 digit
 let year = read digits
 return $ if year > 49
 then 1900 + year
 else 2000 + year
parseDate :: Parser Date
parseDate = do
 date <- choice $ map try [parseISODate, parseNiceDate, parseWeirdDate, parseWTFDate, parseWordyDate]
 eof
 return date
 where
 -- yyyy-mm-dd
 parseISODate :: Parser Date
 parseISODate = do
 year <- parseYear
 char '-'
 month <- parseMonth
 char '-'
 day <- parseDay
 return $ Date year month day
 -- mm/dd/yy
 parseNiceDate :: Parser Date
 parseNiceDate = do
 month <- parseMonth
 char '/'
 day <- parseDay
 char '/'
 year <- parseTwoYear
 return $ Date year month day
 -- mm#yy#dd
 parseWeirdDate :: Parser Date
 parseWeirdDate = do
 month <- parseMonth
 char '#'
 year <- parseTwoYear
 char '#'
 day <- parseDay
 return $ Date year month day
 -- dd*mm*yyyy
 parseWTFDate :: Parser Date
 parseWTFDate = do
 day <- parseDay
 char '*'
 month <- parseMonth
 char '*'
 year <- parseYear
 return $ Date year month day
 -- (month word) dd, yyyy
 -- (month word) dd, yy
 parseWordyDate :: Parser Date
 parseWordyDate = do
 month <- parseWordMonth
 char ' '
 day <- parseDay
 string ", "
 year <- try parseYear <|> parseTwoYear
 return $ Date year month day
readDate :: String -> String
readDate xs = case parse parseDate "date" xs of
 Left err -> show xs ++ ": " ++ show err
 Right date -> show date
main :: IO ()
main = do
 contents <- getContents
 let dates = filter (/= "") (lines contents)
 mapM_ putStrLn $ map readDate dates

This is already quite polished but I wonder if more can be done to reduce the repetition. I'm looking for general advice on improving this parser (error handling, use of Parsec types, etc.). Perhaps there's a way to specify some sort of Date expression that could make this neater.

Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Nov 18, 2014 at 17:10
\$\endgroup\$

1 Answer 1

2
\$\begingroup\$

One interesting thing you could do is create the higher level date parser parseDate using the alternative instance on Parser with:

parseDate :: Parser Date
parseDate = do
 date <- parseISODate <|> parseNiceDate <|> parseWeirdDate <|> parseWTFDate <|> parseWordyDate
 eof
 return date

This really just makes it easy to read the code and is the general solution to combining monadic parsers.

The other way to do it without creating your own parser is to use something like the time package: http://hackage.haskell.org/package/time-1.5/docs/Data-Time-Format.html

Here you can define the parse strings:

yyyy-mm-dd => "%Y-%m-%d"
mm/dd/yy => "%m/%d/%y"
mm#yy#dd => "%m#%y#%d"
dd*mm*yyyy => "%d*%m*%Y"
(month word) dd, yy => "%b %d, %y"
(month word) dd, yyyy => "%b %d, %Y"

and then parse them in a similar way

dateFormats = ["%Y-%m-%d"
 ,"%m/%d/%y"
 ,"%m#%y#%d"
 ,"%d*%m*%Y"
 ,"%b %d, %y"
 ,"%b %d, %Y"]
parseDate :: String -> Maybe Day
parseDate inp = choice $ map (\fmt -> parseTimeM True defaultTimeLocale fmt inp) dateFormats

and then you can print the day using the function in Data.Time.Format

printIso8601 :: Day -> IO ()
printIso8601 d = putStrLn $ formatTime defaultTimeLocale fmt d
 where fmt = iso8601DateFormat Nothing

If it wasn't something like time that has been done multiple times I think Parsec is the right answer for parsing and you use it well.

It definitely looks good to me. I'm not a fan of printf but that's just a preference.

answered Nov 18, 2014 at 18:13
\$\endgroup\$
1
  • \$\begingroup\$ I've found that the parseDate function won't work that way because it doesn't backtrack when something goes wrong. As for the time package, thanks for your suggestion! I'll reimplement the parser using that. \$\endgroup\$ Commented Nov 19, 2014 at 5:03

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.