PureScript breadth-first search implementation

Question 1

For context, I wrote a function, get_servers, for the game Bitburner which is usually played by writing JavaScript code. Instead, I used PureScript to write the function which is an implementation of something like the breadth-first search algorithm to get a list of all the servers in the game:

module Main (main) where
import Data.Array
import Data.Function.Uncurried
import Data.Maybe
import Effect
import Effect.Aff
import Effect.Class
import Effect.Uncurried
import Control.Promise
import Prelude
type NetscriptEnvironment = {
 exit :: EffectFn1 Unit Unit,
 getHostname :: Fn0 String,
 scan :: Fn1 String (Array String),
 tprint :: forall a. EffectFn1 a Unit
}
get_servers :: NetscriptEnvironment -> Array String
get_servers ns = sort $ get_servers' [] [runFn0 ns.getHostname] where
 get_servers' :: Array String -> Array String -> Array String
 get_servers' scanned unscanned =
 if null unscanned then
 scanned
 else
 get_servers' (concat [scanned, unscanned]) $ get_new unscanned where
 get_new :: Array String -> Array String
 get_new = get_new' [] where
 get_new' :: Array String -> Array String -> Array String
 get_new' accumulator servers =
 case head servers of
 Nothing -> accumulator
 Just head -> let
 get_new'' :: Array String -> String -> Array String
 get_new'' accumulator' server =
 if elem server accumulator' || elem server unscanned || elem server scanned then
 accumulator'
 else
 snoc accumulator' server
 in
 case tail servers of
 Nothing -> get_new' (foldl get_new'' accumulator $ runFn1 ns.scan head) []
 Just tail -> get_new' (foldl get_new'' accumulator $ runFn1 ns.scan head) tail
main :: EffectFn1 NetscriptEnvironment (Promise Unit)
main = let
 main' :: NetscriptEnvironment -> Effect (Promise Unit)
 main' ns = let
 exit :: Aff Unit
 exit =
 liftEffect $ do
 runEffectFn1 ns.exit unit
 tprint :: forall a. a -> Aff Unit
 tprint input =
 liftEffect $ do
 runEffectFn1 ns.tprint input
 pure unit
 in
 fromAff $ liftEffect $ launchAff_ do
 tprint $ get_servers ns
 exit
 in
 mkEffectFn1 main'

I'm still somewhat new to writing PureScript/Haskell-like code and so would appreciate some pointers on how I might refactor the above.

In particular, I don't like these about the current implementation:

It seems like there's a lot of right-ward drift. Maybe this is normal with regards to PS/Haskell-like syntax, but I'm not sure.
In lines 43 and 44, I called get_new' essentially twice with almost the same arguments.
I also feel like I'm maybe not leveraging existing types/functions in the standard libraries and functional programming techniques (maybe point-free style?) to make the code more concise.

So I want to know I might refactor the code such that:

There's not so much right-ward drift.
There's less redundancies and the code is more concise in general. I'm thinking I could maybe accomplish this by leveraging some kind of standard library types or mapping/binding functions, but I'm not too familiar with the standard library and would appreciate recommendations on which types and functions I should familiarise myself with (with the aim of both improving the code above and my ability to write future PureScript code in general).
It's more idiomatic according to the common best practices of writing PureScript code. I'd also appreciate some reasoning for any changes you might suggest to help me understand them better.

I also want to know:

Is it fine when inter-operating with JS code to work with the standard JS data-types, or is it better to convert to some of the other PS types (e.g., from Array to List or Sequence), manipulate those, then convert back to JS types when needed? If so, when should the latter be done?
Aside from the standard library, are there other convenience libraries/tools (e.g. for error-handling, testing, linting, formatting, etc.) that I should look into using when working with PS code in general?

Question 2

First, on rightward drift: you don't have to have every next function nested under the previous one, especially since none of them actually need access to the enclosing function's parameters. You can have everything bound under the same where. Except that get_new'' wants access to scanned and unscanned, so it has to be bound inside get_servers' unfortunately. But everything else can be flat:

get_servers :: NetscriptEnvironment -> Array String
get_servers ns = sort $ get_servers' [] [runFn0 ns.getHostname] 
 where
 get_servers' :: Array String -> Array String -> Array String
 get_servers' scanned unscanned =
 if null unscanned then
 scanned
 else
 get_servers' (concat [scanned, unscanned]) $ get_new unscanned
 where
 get_new :: Array String -> Array String
 get_new = get_new' []
 get_new' :: Array String -> Array String -> Array String
 get_new' accumulator servers =
 case head servers of
 Nothing -> accumulator
 Just head ->
 case tail servers of
 Nothing -> get_new' (foldl get_new'' accumulator $ runFn1 ns.scan head) []
 Just tail -> get_new' (foldl get_new'' accumulator $ runFn1 ns.scan head) tail
 get_new'' :: Array String -> String -> Array String
 get_new'' accumulator' server =
 if elem server accumulator' || elem server unscanned || elem server scanned then
 accumulator'
 else
 snoc accumulator' server

Second, instead of functions like head and null, you can use pattern matching. It's looks better, and a bit faster too:

 get_servers' :: Array String -> Array String -> Array String
 get_servers' scanned [] = scanned
 get_servers' scanned unscanned =
 get_servers' (scanned <> unscanned) (get_new unscanned)

Also note that I used operator <> instead of concat [] to glue the two arrays together.

Third, the difference between the two foldl lines seems to be just that when the tail servers == Nothing, you turn it into an empty array, and otherwise you use the tail itself. Well, you can encode that idea directly with fromMaybe:

 case head servers of
 Nothing -> accumulator
 Just head ->
 get_new' (foldl get_new'' accumulator $ runFn1 ns.scan head) (fromMaybe [] $ tail servers)

Fourth, get_new seems to be called only once, so there is really no reason to keep it around just so you can pass an empty list to get_new':

 get_servers' scanned unscanned =
 get_servers' (scanned <> unscanned) (get_new' [] unscanned)

And finally, I think you have structured the whole get_new logic in a very imperative style - essentially updating arrays as you go along. As far as I understand the idea, you should just (1) scan all unscanned servers, (2) concatenate all their results together, and then (3) subtract the already seen servers. The result of that should be newly discovered servers.

get_new :: Array String -> Array String
get_new servers = concatMap (runFn1 ns.scan) servers \\ (scanned <> unscanned)

So your whole program becomes just this:

get_servers :: NetscriptEnvironment -> Array String
get_servers ns = sort $ get_servers' [] [runFn0 ns.getHostname] 
 where
 get_servers' scanned [] = scanned
 get_servers' scanned unscanned = 
 get_servers' (scanned <> unscanned) $ 
 concatMap (runFn1 ns.scan) unscanned \\ (scanned <> unscanned)

Also, a few stylistic notes, which are generally accepted in the community, but ultimately a question of personal choice:

where- and let-bound variables don't need type signatures. You can just omit them.
Identifiers in PureScript are commonly camelCase, not snake_case. So getNew instead of get_new.
Fn1 and EffectFn1 are better kept at the very edges of FFI. Don't pass them around. Convert to normal function before passing.

Is it fine when inter-operating with JS code to work with the standard JS data-types, or is it better to convert to some of the other PS types (e.g., from Array to List or Sequence), manipulate those, then convert back to JS types when needed? If so, when should the latter be done?

Actually, Array is the default choice in PureScript, not List. List's advantage over Array is linear cons/uncons operations, so you would generally use it when that matters for your algorithm. Similarly with Sequence. Otherwise - Array is the default choice, because it's default on the underlying platform, and therefore is optimized the hell out of.

In general, as long as the JS type is "well-behaved" (i.e. doesn't have multi-typed or optional fields or some such), it's totally ok to use it from PureScript. That is, in fact, one of the selling points: zero-cost FFI.

Question 3

Thanks, this was exactly the kind of answer I was after! Regarding get_new though, I'm not sure your new version would work correctly if the scan output has nodes that share the same child nodes. I think it could cause duplicates in the final output, right? Wouldn't you need to either use nub to remove those (maybe before the sort), or use foldl instead of concatMap to track scan results and prevent duplicates from being added (this was what my version did). Also, is concatMap just an infix version of >>=, or are there other differences between them?

Question 4

Sorry, I meant prefix instead of infix.

Question 5

Yes, you're right: you do need to nub after the whole process to remove duplicates. Or, better yet, nub after each wave of scans. Less nubbing that way. And yes, concatMap is the same as >>=, but it's more understandable what's going on this way.

Question 6

About using pattern matching in lieu of head, how would that work exactly? I thought I could only pattern match on the head and tail of List, but not of Array, since the latter doesn't use Nil and Cons constructors and instead maps to native JS arrays? And why would nubbing after each wave of scans result in less nubbing relative to nubbing once after the whole process?

Question 7

For arrays PureScript has special support - for both constructing values and pattern matching. Nubbing after each waves ought to result is smaller nubs, because each wave is presumably smaller than the total list of servers. Though I guess I could be wrong on that point. Depends on the network topology.

Fyodor Soikin Fyodor Soikin 7393 silver badges7 bronze badges · Accepted Answer · 2021-06-29 16:03:17Z

First, on rightward drift: you don't have to have every next function nested under the previous one, especially since none of them actually need access to the enclosing function's parameters. You can have everything bound under the same where. Except that get_new'' wants access to scanned and unscanned, so it has to be bound inside get_servers' unfortunately. But everything else can be flat:

get_servers :: NetscriptEnvironment -> Array String
get_servers ns = sort $ get_servers' [] [runFn0 ns.getHostname] 
 where
 get_servers' :: Array String -> Array String -> Array String
 get_servers' scanned unscanned =
 if null unscanned then
 scanned
 else
 get_servers' (concat [scanned, unscanned]) $ get_new unscanned
 where
 get_new :: Array String -> Array String
 get_new = get_new' []
 get_new' :: Array String -> Array String -> Array String
 get_new' accumulator servers =
 case head servers of
 Nothing -> accumulator
 Just head ->
 case tail servers of
 Nothing -> get_new' (foldl get_new'' accumulator $ runFn1 ns.scan head) []
 Just tail -> get_new' (foldl get_new'' accumulator $ runFn1 ns.scan head) tail
 get_new'' :: Array String -> String -> Array String
 get_new'' accumulator' server =
 if elem server accumulator' || elem server unscanned || elem server scanned then
 accumulator'
 else
 snoc accumulator' server

Second, instead of functions like head and null, you can use pattern matching. It's looks better, and a bit faster too:

 get_servers' :: Array String -> Array String -> Array String
 get_servers' scanned [] = scanned
 get_servers' scanned unscanned =
 get_servers' (scanned <> unscanned) (get_new unscanned)

Also note that I used operator <> instead of concat [] to glue the two arrays together.

Third, the difference between the two foldl lines seems to be just that when the tail servers == Nothing, you turn it into an empty array, and otherwise you use the tail itself. Well, you can encode that idea directly with fromMaybe:

 case head servers of
 Nothing -> accumulator
 Just head ->
 get_new' (foldl get_new'' accumulator $ runFn1 ns.scan head) (fromMaybe [] $ tail servers)

Fourth, get_new seems to be called only once, so there is really no reason to keep it around just so you can pass an empty list to get_new':

 get_servers' scanned unscanned =
 get_servers' (scanned <> unscanned) (get_new' [] unscanned)

And finally, I think you have structured the whole get_new logic in a very imperative style - essentially updating arrays as you go along. As far as I understand the idea, you should just (1) scan all unscanned servers, (2) concatenate all their results together, and then (3) subtract the already seen servers. The result of that should be newly discovered servers.

get_new :: Array String -> Array String
get_new servers = concatMap (runFn1 ns.scan) servers \\ (scanned <> unscanned)

So your whole program becomes just this:

get_servers :: NetscriptEnvironment -> Array String
get_servers ns = sort $ get_servers' [] [runFn0 ns.getHostname] 
 where
 get_servers' scanned [] = scanned
 get_servers' scanned unscanned = 
 get_servers' (scanned <> unscanned) $ 
 concatMap (runFn1 ns.scan) unscanned \\ (scanned <> unscanned)

Also, a few stylistic notes, which are generally accepted in the community, but ultimately a question of personal choice:

where- and let-bound variables don't need type signatures. You can just omit them.
Identifiers in PureScript are commonly camelCase, not snake_case. So getNew instead of get_new.
Fn1 and EffectFn1 are better kept at the very edges of FFI. Don't pass them around. Convert to normal function before passing.

Is it fine when inter-operating with JS code to work with the standard JS data-types, or is it better to convert to some of the other PS types (e.g., from Array to List or Sequence), manipulate those, then convert back to JS types when needed? If so, when should the latter be done?

Actually, Array is the default choice in PureScript, not List. List's advantage over Array is linear cons/uncons operations, so you would generally use it when that matters for your algorithm. Similarly with Sequence. Otherwise - Array is the default choice, because it's default on the underlying platform, and therefore is optimized the hell out of.

In general, as long as the JS type is "well-behaved" (i.e. doesn't have multi-typed or optional fields or some such), it's totally ok to use it from PureScript. That is, in fact, one of the selling points: zero-cost FFI.

Thanks, this was exactly the kind of answer I was after! Regarding get_new though, I'm not sure your new version would work correctly if the scan output has nodes that share the same child nodes. I think it could cause duplicates in the final output, right? Wouldn't you need to either use nub to remove those (maybe before the sort), or use foldl instead of concatMap to track scan results and prevent duplicates from being added (this was what my version did). Also, is concatMap just an infix version of >>=, or are there other differences between them?
Yes, you're right: you do need to nub after the whole process to remove duplicates. Or, better yet, nub after each wave of scans. Less nubbing that way. And yes, concatMap is the same as >>=, but it's more understandable what's going on this way.
About using pattern matching in lieu of head, how would that work exactly? I thought I could only pattern match on the head and tail of List, but not of Array, since the latter doesn't use Nil and Cons constructors and instead maps to native JS arrays? And why would nubbing after each wave of scans result in less nubbing relative to nubbing once after the whole process?
For arrays PureScript has special support - for both constructing values and pattern matching. Nubbing after each waves ought to result is smaller nubs, because each wave is presumably smaller than the total list of servers. Though I guess I could be wrong on that point. Depends on the network topology.

Stack Exchange Network

PureScript breadth-first search implementation

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

PureScript breadth-first search implementation

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions