I'm trying to learn F# by creating a little web scraper that does custom scraping based on the URL's domain. For this, I need to create and select the correct kind of scraper, so I figured I would use a factory that determines the correct kind of scraper for me.
Here is the interface for my scraper:

```fsharp
type IExtractor =
    /// determines if the extractor will work for the url
    /// returns true if the extractor can handle the url
    abstract member Suitable : string -> bool
    /// name of the extractor
    abstract ExtractorName : string with get
```
Here is one of the scraper implementations:

```fsharp
type CustomExtractor1() =
    /// <summary>
    /// tests whether the input url is valid
    /// </summary>
    /// <param name="url">the url to test</param>
    let (|ValidUrl|_|) url =
        // verbatim string so the regex escapes \. and \? reach the regex engine unchanged
        let regexExpression = @"^http(s)?://www\.customsite\.com.*\?fd="
        // regex helper that returns string option
        Common.UrlMatch regexExpression url

    interface IExtractor with
        member this.Suitable url =
            match url with
            | ValidUrl _ -> true
            | _ -> false
        member this.ExtractorName with get() = "Custom extractor"
```
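The question doesn't show `Common.UrlMatch`. For anyone who wants to run the class above, here is a minimal sketch of what such a helper could look like, assuming it simply wraps the input in `Some` when the regex matches. The module body is a hypothetical stand-in, not the OP's actual code.

```fsharp
open System.Text.RegularExpressions

// Hypothetical stand-in for the question's Common module; the real
// Common.UrlMatch is not shown, so this body is an assumption.
module Common =
    /// Returns Some url when the regex pattern matches the url, otherwise None.
    let UrlMatch (pattern: string) (url: string) : string option =
        if Regex.IsMatch(url, pattern) then Some url else None
```

With this in place, `CustomExtractor1`'s `ValidUrl` active pattern succeeds for `https://www.customsite.com?fd=cWz5d` and fails for URLs on other domains.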
Here is the factory:

```fsharp
module Extractor =
    // explicit upcasts so the list elements are typed as IExtractor
    let private _extractors: IExtractor list =
        [ new CustomExtractor1() :> IExtractor
          new CustomExtractor2() :> IExtractor ]

    let create url =
        let isSuitable (input: string) (ex: IExtractor) = ex.Suitable input
        let one (input: string) (ex: IExtractor) = true
        match (_extractors |> List.tryFind (isSuitable url)) with
        | Some ex -> Some ex
        | _ -> None
```
Here is the usage:

```fsharp
let input = "https://www.customsite.com?fd=cWz5d"
let extractorOption =
    match Extractor.create input with
    | Some ex -> Some ex
    | _ ->
        // no extractor matched: stop with an exception
        failwith "no extractor found. Exiting"
let extractor = extractorOption.Value
// do stuff with extractor
printfn "%s" extractor.ExtractorName
```
The flow for this usage feels more like C# than F#, so it seems a little off, especially where I call `extractorOption.Value`. I am currently investigating using ROP (Railway Oriented Programming) instead of exception handling for better control flow.
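As a point of comparison, here is a minimal ROP-style sketch of the usage. It assumes F# 4.1 or later (for the built-in `Result` type) and uses plain strings for both the extractor and the error; `findExtractor`, `createExtractor`, `describe`, and `run` are hypothetical names. The missing-extractor case travels down the `Error` track instead of being raised, so there is no call to `.Value`.

```fsharp
// Hypothetical lookup: a plain string stands in for a real extractor.
let findExtractor (url: string) : string option =
    if url.Contains("customsite.com") then Some "Custom extractor" else None

/// Moves the "no extractor" case onto the Error track instead of raising.
let createExtractor (url: string) : Result<string, string> =
    match findExtractor url with
    | Some extractor -> Ok extractor
    | None -> Error (sprintf "no extractor found for %s" url)

/// A later pipeline step; Result.bind only runs it on the Ok track.
let describe (extractorName: string) : Result<string, string> =
    Ok (sprintf "using %s" extractorName)

let run url =
    createExtractor url
    |> Result.bind describe

// The caller handles both tracks once, at the end.
match run "https://www.customsite.com?fd=cWz5d" with
| Ok message -> printfn "%s" message
| Error err -> printfn "%s" err
```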
What do you think? Is there a way to improve this?
Answer
Instead of defining the `IExtractor` interface, wouldn't it be simpler to use a function?
Here, I assume that `ExtractorName` is a stand-in for something you'd really want the extractor to do, because otherwise, as stated in the question, the extractor doesn't really do anything. In that case, an extractor is nothing but a `string`, but I assume that in reality it would be a function.
This would enable you to make the `Extractor` module simpler:
```fsharp
module Extractor =
    let private fooExtractor url =
        if url = "foo"
        then Some "Foo Extractor" // Stand-in for actual implementation
        else None

    let private barExtractor url =
        if url = "bar"
        then Some "Bar Extractor" // Stand-in for actual implementation
        else None

    let private extractors = [fooExtractor; barExtractor]

    let create url =
        extractors
        |> List.choose (fun candidate -> candidate url)
        |> List.tryHead
```
Each element of `extractors` is a function with the type `string -> string option`. In order to be useful, the return value should probably be another function that you can subsequently call; something like `string -> ('a -> 'b) option`.
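To illustrate that signature, here is a minimal sketch where each candidate returns `(string -> string) option`, that is, an extraction function wrapped in an option. The `extractFoo` and `extractBar` functions are hypothetical stand-ins that take a page's HTML as a string; real ones would do the actual scraping.

```fsharp
module Extractor =
    // Hypothetical extraction functions: each takes a page's HTML and returns scraped data.
    let private extractFoo (html: string) = "foo: " + html
    let private extractBar (html: string) = "bar: " + html

    // Each candidate now has the type string -> (string -> string) option.
    let private fooExtractor (url: string) =
        if url = "foo" then Some extractFoo else None

    let private barExtractor (url: string) =
        if url = "bar" then Some extractBar else None

    let private extractors = [fooExtractor; barExtractor]

    let create url =
        extractors
        |> List.choose (fun candidate -> candidate url)
        |> List.tryHead

// The returned value is a function, so the caller can invoke it.
match Extractor.create "foo" with
| Some extract -> printfn "%s" (extract "<html>...</html>")
| None -> printfn "No extractor found; exiting."
```

The factory's shape is unchanged; only the payload inside `Some` went from a name to a callable function.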
Note the use of `List.choose` to keep only the results of those functions that return `Some 'a`. Instead of the object-oriented Try/Parse pattern, you can indicate match success or failure with an option value.
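One design note: `List.choose` followed by `List.tryHead` calls every candidate before taking the first match. If running a candidate is expensive, `List.tryPick` stops at the first candidate that returns `Some`. The sketch below counts calls to show the difference; the `calls` counter and the two toy extractors are illustrative only.

```fsharp
// Counts how many candidate extractors were called (illustration only).
let calls = ref 0

let fooExtractor (url: string) =
    calls.Value <- calls.Value + 1
    if url = "foo" then Some "Foo Extractor" else None

let barExtractor (url: string) =
    calls.Value <- calls.Value + 1
    if url = "bar" then Some "Bar Extractor" else None

let extractors = [fooExtractor; barExtractor]

// Calls every candidate, then keeps the first Some.
let createWithChoose url =
    extractors
    |> List.choose (fun candidate -> candidate url)
    |> List.tryHead

// Stops calling candidates as soon as one returns Some.
let createWithTryPick url =
    extractors
    |> List.tryPick (fun candidate -> candidate url)
```

For `"foo"`, both functions return `Some "Foo Extractor"`, but `createWithChoose` calls two candidates while `createWithTryPick` calls only one.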
Usage can be simplified as well:
```fsharp
let input = "https://www.customsite.com?fd=cWz5d"
match Extractor.create input with
| Some x ->
    // do stuff with extractor
    printfn "%s" x
| None -> printfn "%s" "No extractor found; exiting."
```
Here, we're simply printing the name of the extractor, since `x` is a `string`, but if you imagine that `x` was instead a function, you'd be able to call it.
Hit Enter too fast. I like it! Yes, the `ExtractorName` was one of the operations an extractor can perform. In one case, I want to have a pretty name for the extractor to print out on the console. I don't need the interface to enforce a specific signature; the compiler will complain if the signature doesn't match. Ultimately I was envisioning that the extractor would have a couple of methods on it. If I wanted the extractor to have richer functionality, wouldn't making it a function limit how much I can extend it? (ceiling cat, commented Feb 2, 2016 at 21:44)