Follow Up to my earlier question.
This is an RSS Parser intended only to parse the feed Wordpress provides (and even then, only what I actually want to display on my website). System.Linq.Xml
and System.ServiceModel.Syndication
are unavailable because I'm targeting .Net Core instead of the full framework.
I think I've achieved a more idiomatic implementation this time. Where can I improve? Are there opportunities to compose higher order functions that I'm missing?
Rss.fs
module Rss
open System
type Item = {
Title: string
Link: Uri
Comments: Uri
PublishDate: DateTimeOffset
Description: string
}
type Channel = {
Title: string
Link: Uri
Description: string
LastBuildDate: DateTimeOffset
Items: seq<Item>
}
type RssFeed = {
Version: double
Channel: Channel
}
open System.Xml
let parse input =
// Helper Functions
let innerText (elementName: string) (node: XmlNode) =
let element = node.Item elementName
element.InnerText
let titleText node =
node |> innerText "title"
let descriptionText node =
node |> innerText "description"
let elementAsDateTimeOffset elementName node =
node |> innerText elementName |> DateTimeOffset.Parse
let elementAsUri elementName node =
node |> innerText elementName |> Uri
let linkUri node =
node |> elementAsUri "link"
let itemNodeToItem node = {
Title = node |> titleText
Link = node |> linkUri
Comments = node |> elementAsUri "comments"
PublishDate = node |> elementAsDateTimeOffset "pubDate"
Description = node |> descriptionText
}
// Load the document and begin parsing
let document = XmlDocument()
document.LoadXml input
let rss = document.DocumentElement
let channelNode = rss.FirstChild
let items =
channelNode.ChildNodes
|> Seq.cast<XmlNode>
|> Seq.filter (fun node -> node.Name.Equals "item")
|> Seq.map itemNodeToItem
let channel = {
Title = channelNode |> titleText
Link = channelNode |> linkUri
Description = channelNode |> descriptionText
LastBuildDate = channelNode |> elementAsDateTimeOffset "lastBuildDate"
Items = items
}
let version =
(rss.Attributes.ItemOf "version").Value
|> Double.Parse
let rssFeed = {Version = version; Channel = channel}
rssFeed
Controllers/BlogController.fs
namespace theupsyde.Controllers
open System
open System.Collections.Generic
open System.Linq
open System.Threading.Tasks
open Microsoft.AspNetCore.Mvc
open Rss
open System.Net
type BlogController () =
inherit Controller()
let fetchUrl url = async {
let req = WebRequest.Create(Uri(url))
use! resp = req.GetResponseAsync() |> Async.AwaitTask
use stream = resp.GetResponseStream()
use reader = new IO.StreamReader(stream)
return reader.ReadToEnd()
}
let rssFeed =
fetchUrl "https://christopherjmcclellan.wordpress.com/feed/"
|> Async.RunSynchronously
|> Rss.parse
let recent =
rssFeed.Channel.Items
member this.Index () =
this.View(rssFeed.Channel)
member this.Recent() =
this.PartialView("BlogPosts", recent)
member this.Latest () =
let latest =
recent
|> Seq.head
this.PartialView("BlogPost", latest)
2 Answers 2
The
Rss
helper functions are inside theparse
function, even though they don't use any of the state inside that scope. Consider moving these up to the module level, even though they are not used anywhere else. By doing this the main function is shorter and easier to follow. You could also make those functions private when you pull them out.Seq.cast
can throw an exception. Even though you know that every item will be anXmlNode
in this case, it can be safer to filter out objects that are not of the required type while casting. Unfortunately, F# doesn't have a function for this in theSeq
module but you can always use the one already in .NET:channelNode.ChildNodes |> System.Linq.Enumerable.OfType<XmlNode>
You fetch the RSS feed in the constructor of
BlogController
. This will happen once for each request. UsingAsync.RunSynchronously
will block a thread, so you lose the benefit of usingAsync
in the first place. You should avoidAsync.RunSynchronously
whenever you can. You can pull out the RSS fetching and parsing into another function that returns anasync
. Call this function from each controller method, do more async processing if necessary and then useAsync.StartAsTask
to return a task from the method. ASP.NET will then be able to run your code without unnecessary blocking.
EDIT:
I'm not too familiar with ASP.NET MVC, but based on this issue I think something like this should work:
let fetchRss() = async {
let! content = fetchUrl "https://christopherjmcclellan.wordpress.com/feed/"
return Rss.parse content }
type BlogController() =
inherit Controller()
member this.Index() = async {
let! rss = fetchRss()
return this.View(rss.Channel) }
-
\$\begingroup\$ Thanks for the review. I was struggling with the async and I had a feeling I was blocking... Could I trouble you to show me what you mean? A small example of how to do it right would really help me out. \$\endgroup\$RubberDuck– RubberDuck2017年08月14日 11:01:04 +00:00Commented Aug 14, 2017 at 11:01
-
1\$\begingroup\$ @RubberDuck See my edit \$\endgroup\$TheQuickBrownFox– TheQuickBrownFox2017年08月14日 14:47:17 +00:00Commented Aug 14, 2017 at 14:47
-
\$\begingroup\$ Thanks. That was close. It needs to be something like
member this.Index () = Async.StartAsTask(async {...})
. It's a bit ugly, but properly converts the Async<'a> into a Task<T> and runs everything asynchronously. \$\endgroup\$RubberDuck– RubberDuck2017年08月16日 02:02:37 +00:00Commented Aug 16, 2017 at 2:02
So TheQuickBrownFox is spot on when they mention the potential exception with Seq.Cast<'T>
, and I want to talk about that and introduce a couple new operators to you.
The potential problem lies in the fact that it could be possible for an item to not be castable to XmlNode
, and then we throw exceptions which is usually frowned upon in F#, we would instead want to return an error state.
One of the many operators in F# is the :?
operator, which tests if a value is castable to the specified type. (So x :? XmlNode
will test if x
is an XmlNode
.) We can thing of this as analogous to the is
operator in C#.
So I want to write a castSafe
method which will cast our object list to a type, without throwing an exception. We intend to use this as channelNode.ChildNodes |> Seq.castSafe<XmlNode>
.
The first step is to define the type-structure of our function, which I'll define as IEnumerable -> 'T option seq
. We also want to define what happens if an invalid cast is attempted? I'm going to return a 'T option
, that defines the result. If it's Some 'T
, then we casted successfully, if it's None
, then we didn't.
A basic, lazy implementation might look as follows:
let castSafe<'T> (items:IEnumerable) : 'T option seq =
items
|> Seq.cast<obj>
|> Seq.map (fun node ->
match node with
| :? 'T as result -> Some result
| _ -> None)
So we're immediately casting to obj
to allow the type-system to make it's next move. Since XmlNodeList
doesn't implement IEnumerable<T>
we can't enumerate each item as an XmlNode
: we have to enumerate as an obj
. Then we match
on the type: :? 'T
says "Is node
a 'T
type?" If it's a yes, then we return Some result
which is the node casted as 'T
, if not we return a None
.
Of course this means on the client-side we would have to test for Some node
or None
, which we don't want to do. So we'll also build a castSafeFilter
which will cast to 'T
and remove the None
entires. A naive implementation could look as follows:
let castSafeFilter<'T> (items:IEnumerable) : 'T seq =
items
|> castSafe
|> Seq.filter (fun node ->
match node with
| Some x -> true
| None -> false)
|> Seq.map (fun node -> node.Value)
But we don't want to do that: we're casting then testing for Some
/None
, but there's a more effective manner: we can filter first. Enter the :?>
operator. This will do a direct cast to a type lower in the type hierarchy, that is: this will do a runtime cast to a sub-type. (So if we cast XmlNode -> obj
, this let's us go back to XmlNode
.) I wish this were synonymous with the as
operator in C#, but it's more like an explicit cast: (T)obj
.
Our new function looks something like this:
let castSafeFilter<'T> (items:IEnumerable) : 'T seq =
items
|> Seq.cast<obj>
|> Seq.filter (fun node -> node :? 'T)
|> Seq.map (fun node -> node :?> 'T)
Do the <obj>
cast immediately, but then filter out all non-'T
items then cast.
By doing it like this we eliminate the double-map, and do a single map to 'T
if and only if it is a 'T
. More interesting: we can add these two methods directly to the Seq
module just as if they were extension methods:
module Seq
open System.Collections
let castSafe<'T> (items:IEnumerable) : 'T option seq =
items
|> Seq.cast<obj>
|> Seq.map (fun node ->
match node with
| :? 'T as result -> Some result
| _ -> None)
let castSafeFilter<'T> (items:IEnumerable) : 'T seq =
items
|> Seq.cast<obj>
|> Seq.filter (fun node -> node :? 'T)
|> Seq.map (fun node -> node :?> 'T)
Usually I would omit type-annotations, but when building a generic, multi-purpose module I always keep them. We could define an fsi
file instead that would contain all the signatures, but that's beyond the scope of what I wish to explain here.
Finally, we're back to our code, you had:
let items = channelNode.ChildNodes |> Seq.cast<XmlNode> |> Seq.filter (fun node -> node.Name.Equals "item") |> Seq.map itemNodeToItem
We're going to change that to:
let items =
channelNode.ChildNodes
|> Seq.castSafeFilter<XmlNode>
|> Seq.filter (fun node -> node.Name.Equals "item")
|> Seq.map itemNodeToItem
And bam, now we have safely casted the items to an XmlNode
, and allowed ourselves a great deal of reuse in the future.
And finally, with some extra cleverness, we could even devise a castFilter
method, that would test if our node
is a 'T
, and if so do the filtering at that level to avoid redundant calls. That is:
let castFilter<'T> (filter:'T -> bool) (items:IEnumerable) : 'T seq =
items
|> Seq.cast<obj>
|> Seq.filter (fun node ->
match node with
| :? 'T as result -> result |> filter
| _ -> false)
|> Seq.map (fun node -> node :?> 'T)
And interestingly if we work with result
on the filter at that point, the :? 'T as result
tests if node
is a 'T
, and does the node :?> 'T
cast for us, so we've finally achieved the same functionality as the C# as
keyword.
And as mentioned in comments, using Seq.choose
is another great alternative, which opens us up to a little bit of extraction:
let private castItemSafe<'T> (item:obj) : 'T option =
match item with
| :? 'T as result -> Some result
| _ -> None
let castSafe<'T> (items:IEnumerable) : 'T option seq =
items
|> Seq.cast<obj>
|> Seq.map castItemSafe
let castSafeFilter<'T> (items:IEnumerable) : 'T seq =
items
|> Seq.cast<obj>
|> Seq.choose castItemSafe
Because of the nature of our casting, we really only need to store the function which casts an individual item in one spot, since map
will return all items and choose
will naturally filter them out.
I previously touched on some basic composition stuff, and I want to continue that discussion with some new examples here.
We have opportunity to use some function composition in this code, which lends us to partial application, and allow us to not care about the node parameter to our functions.
For example, we can take your titleText
and descriptionText
functions and simply omit the node
parameter and our functions still work the same, thus leaving them agnostic to what's applied even moreso than they were. But what if we want to omit node from elementAsDateTimeOffset
? Well we can't really do that as it's written, because the pipe-right (|>
) operator expects a value, not a function. Now we can use the compose-right (>>
) operator to eliminate that requirement:
let titleText =
innerText "title"
let descriptionText =
innerText "description"
let elementAsDateTimeOffset elementName =
innerText elementName >> DateTimeOffset.Parse
let elementAsUri elementName =
innerText elementName >> Uri
let linkUri =
elementAsUri "link"
Now here is the part I really like: none of our signatures changed, but we no longer care about the parameter itself in these functions. We've partially applied the functions. For example, on titleText
it is simply applying the "title"
string to the innerText
function (which expects two parameters) and returning a function that just needs an XmlNode
applied to it.
I like to use this more and more in my code as it seems a bit clearer to me in the end. The net result is the same, and there's no significant advantage (that I know of - do note I'm no expert) to using one over the other, but it becomes a preference on the developer's end. (This also lends itself very easily towards the introduction of currying: that is, every function has one and only one parameter, even those with multiple parameters.)
-
\$\begingroup\$ An alternative implementation:
let castSafeFilter<'T> = Seq.map box >> Seq.choose (function :? 'T as x -> Some x | _ -> None)
. But even better:let castSafeFilter<'T> = System.Linq.Enumerable.OfType<'T>
. I think it's more idiomatic to use an existing .NET method instead of re-writing it :) \$\endgroup\$TheQuickBrownFox– TheQuickBrownFox2017年08月14日 17:16:23 +00:00Commented Aug 14, 2017 at 17:16 -
\$\begingroup\$ I'm unlikely to use the safe cast here, as I'm guaranteed that
XmlNodeList
only containsXmlNode
s, but it was highly educational none the less. Thank you. I'll see if I can grok your partial application suggestions later. \$\endgroup\$RubberDuck– RubberDuck2017年08月14日 18:20:28 +00:00Commented Aug 14, 2017 at 18:20 -
\$\begingroup\$ @RubberDuck I don't blame you there - I'm not saying this is the better idea, all I wanted to point at is that it is possible to build something like this that feels natural, allowing us to expand on what the language already does. :) \$\endgroup\$Der Kommissar– Der Kommissar2017年08月15日 21:11:27 +00:00Commented Aug 15, 2017 at 21:11
-
\$\begingroup\$ The partial application thing worked great. It was neat to see how it didn't change the signature of the function. Now I just need to figure out how/when I can leverage partial application in my code in general. You're a great teacher. Thanks again! \$\endgroup\$RubberDuck– RubberDuck2017年08月16日 22:43:32 +00:00Commented Aug 16, 2017 at 22:43