I've written a date parser that parses different types of dates using Parsec.
Credit for the original problem goes to here.
The following formats are as such:
yyyy-mm-dd
mm/dd/yy
mm#yy#dd
dd*mm*yyyy
(month word) dd, yy
(month word) dd, yyyy
(month word)
can be: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Note if is yyyy
it is a full 4 digit year. If it is yy
then it is only the last 2 digits of the year. Years only go between 1950-2049.
module Main where
import System.Environment
import Text.Printf
import Text.ParserCombinators.Parsec
type Day = Int
type Month = Int
type Year = Int
data Date = Date Year Month Day
instance Show Date where
show (Date year month day) =
printf "%04d-%02d-%02d" year month day
parseDay :: Parser Day
parseDay = do
digits <- count 2 digit
let day = read digits
if day < 0 || day > 31
then fail $ printf "day %02d not in range" day
else return day
parseMonth :: Parser Month
parseMonth = do
digits <- count 2 digit
let month = read digits
if month < 0 || month > 12
then fail $ printf "month %02d not in range" month
else return month
wordMonths :: [String]
wordMonths = [
"Jan",
"Feb",
"Mar",
"Apr",
"May",
"Jun",
"Jul",
"Aug",
"Sep",
"Oct",
"Nov",
"Dec" ]
parseWordMonth :: Parser Month
parseWordMonth = choice $ map try parsers
where
parseWordMonth :: String -> Month -> Parser Month
parseWordMonth word month = do
string word
return month
parsers :: [Parser Month]
parsers = zipWith parseWordMonth wordMonths [1..12]
parseYear :: Parser Year
parseYear = do
digits <- count 4 digit
return $ read digits
parseTwoYear :: Parser Year
parseTwoYear = do
digits <- count 2 digit
let year = read digits
return $ if year > 49
then 1900 + year
else 2000 + year
parseDate :: Parser Date
parseDate = do
date <- choice $ map try [parseISODate, parseNiceDate, parseWeirdDate, parseWTFDate, parseWordyDate]
eof
return date
where
-- yyyy-mm-dd
parseISODate :: Parser Date
parseISODate = do
year <- parseYear
char '-'
month <- parseMonth
char '-'
day <- parseDay
return $ Date year month day
-- mm/dd/yy
parseNiceDate :: Parser Date
parseNiceDate = do
month <- parseMonth
char '/'
day <- parseDay
char '/'
year <- parseTwoYear
return $ Date year month day
-- mm#yy#dd
parseWeirdDate :: Parser Date
parseWeirdDate = do
month <- parseMonth
char '#'
year <- parseTwoYear
char '#'
day <- parseDay
return $ Date year month day
-- dd*mm*yyyy
parseWTFDate :: Parser Date
parseWTFDate = do
day <- parseDay
char '*'
month <- parseMonth
char '*'
year <- parseYear
return $ Date year month day
-- (month word) dd, yyyy
-- (month word) dd, yy
parseWordyDate :: Parser Date
parseWordyDate = do
month <- parseWordMonth
char ' '
day <- parseDay
string ", "
year <- try parseYear <|> parseTwoYear
return $ Date year month day
readDate :: String -> String
readDate xs = case parse parseDate "date" xs of
Left err -> show xs ++ ": " ++ show err
Right date -> show date
main :: IO ()
main = do
contents <- getContents
let dates = filter (/= "") (lines contents)
mapM_ putStrLn $ map readDate dates
This is already quite polished but I wonder if more can be done to reduce the repetition. I'm looking for general advice on improving this parser (error handling, use of Parsec types, etc.). Perhaps there's a way to specify some sort of Date expression that could make this neater.
1 Answer 1
One interesting thing you could do is create the higher level date parser parseDate using the alternative instance on Parser with:
parseDate :: Parser Date
parseDate = do
date <- parseISODate <|> parseNiceDate <|> parseWeirdDate <|> parseWTFDate <|> parseWordyDate
eof
return date
This really just makes it easy to read the code and is the general solution to combining monadic parsers.
The other way to do it without creating your own parser is to use something like the time package: http://hackage.haskell.org/package/time-1.5/docs/Data-Time-Format.html
Here you can define the parse strings:
yyyy-mm-dd => "%Y-%m-%d"
mm/dd/yy => "%m/%d/%y"
mm#yy#dd => "%m#%y#%d"
dd*mm*yyyy => "%d*%m*%Y"
(month word) dd, yy => "%b %d, %y"
(month word) dd, yyyy => "%b %d, %Y"
and then parse them in a similar way
dateFormats = ["%Y-%m-%d"
,"%m/%d/%y"
,"%m#%y#%d"
,"%d*%m*%Y"
,"%b %d, %y"
,"%b %d, %Y"]
parseDate :: String -> Maybe Day
parseDate inp = choice $ map (\fmt -> parseTimeM True defaultTimeLocale fmt inp) dateFormats
and then you can print the day using the function in Data.Time.Format
printIso8601 :: Day -> IO ()
printIso8601 d = putStrLn $ formatTime defaultTimeLocale fmt d
where fmt = iso8601DateFormat Nothing
If it wasn't something like time that has been done multiple times I think Parsec is the right answer for parsing and you use it well.
It definitely looks good to me. I'm not a fan of printf but that's just a preference.
-
\$\begingroup\$ I've found that the parseDate function won't work that way because it doesn't backtrack when something goes wrong. As for the time package, thanks for your suggestion! I'll reimplement the parser using that. \$\endgroup\$wei2912– wei29122014年11月19日 05:03:25 +00:00Commented Nov 19, 2014 at 5:03