I'm a newbie with Haskell and I'm trying to learn the language and its idioms while working through a small data analysis project. The first step of my analysis is to take a very long list of data, generate the set of ngrams from that list, then create a histogram from the generated set of ngrams.
This approach seems to work nicely:
import Data.List
import Control.Arrow
histogram :: Ord a => [a] -> [(a,Int)]
histogram = map (head &&& length) . group . sort
ngrams :: Eq a => Int -> [a] -> [[a]]
ngrams n xs
| nx == xs = [xs]
| otherwise = [nx] ++ (ngrams n (tail xs))
where nx = take n xs
The next step is divide each ngram frequency by the total number of ngrams and then take the inverse of each of these values.
(//) a b = fromIntegral a / fromIntegral b
mapOverSnd :: (a -> b) -> [(c,a)] -> [(c,b)]
mapOverSnd f = map (fst &&& f . snd)
...
let myData = (long list of values)
let my3grams = ngrams 3 myData
let freq3grams = histogram my3grams
let novelty3grams = mapOverSnd ((1/) . (//(length freq3grams))) freq3grams
My first question is.. is this a reasonable way to be doing this? Is there a more idiomatic way to accomplish mapOverSnd? I don't yet have much dexterity wielding higher order structures to accomplish my tasks.
Also, I'd like to sort the resultant structure: first by the value stored in snd, then by the value stored in fst. I'd like the primary sort to be descending and the secondary sort to be ascending. This works:
my2TupleOrdering :: (Ord a, Ord b) => (a,b) -> (a,b) -> Ordering
my2TupleOrdering (a1,b1) (a2,b2)
| b1 < b2 = GT
| b1 > b2 = LT
| a1 < a2 = LT
| a1 > a2 = GT
| otherwise = EQ
however, I was wondering if there is a more idiomatic way to say "sort this 2-tuple, first by snd descending, then by fst ascending."
I realize these are relatively minor questions, but I'm interested in exploring the expressive power of Haskell and figured 'code review' might be the correct forum. Thanks!
1 Answer 1
Mapping over second
It is surprising that you know about &&&
, but not second
, also from Arrows:
mapOverSnd f = map (second f)
second
is of type Arrow a => a b c -> a (d, b) (d, c)
, which boils down to (b->c) -> (d,b) -> (d,c)
here.
Alternatively, you can take advantage that pairs are a Functor
instance: `fmap (+1) ("...",10) = ("...",11)
mapOverSnd f = map (fmap f)
Definitions can be infix
You wrote (//) a b =
, but definitions can be infix, too:
a // b = fromIntegral a / fromIntegral b
your2TupleOrdering
2-tuples are simply called pairs. sort
needs a function like compare
, so define yours in terms of compare
. Take advantage of Ordering
being an instance of Monoid
:
myPairOrdering (a1,b1) (a2,b2) = compare a1 a2 `mappend` compare b2 b1
The Monoid
instance of Ordering
is:
mempty = EQ
EQ `mappend` y = y -- only if left side equals, use right
x `mappend` _ = x -- shortcut otherwise
ngrams
Any solution to ngrams
will be ugly because you don't want the very last elements which are below the minimum length. You are forcing an Eq
constraint, which is not necessary. Avoid by comparing the length
. Another solution is
ngrams' n = filter ((==n).length) . map (take n) . tails
The filter
part here is ugly.
The let-chain
You can condense the let
-chain with composition, cutting out freq3grams
and my3grams
. At some point in you haskell journey, you will do that. When you grow wiser, you will value readability over succinctness. Long story short: This part is OK.
-
\$\begingroup\$ Great feedback. Thank you so much for taking the time to walk through this. \$\endgroup\$El Senor– El Senor2016年03月04日 22:14:53 +00:00Commented Mar 4, 2016 at 22:14