Kind of a functional programming newbie question here:
I've been reading the transcripts of some of Rich Hickey's talks, and in several of his more well-known ones, he recommends using queues as an alternative to having functions call one another. (E.g. in Design, Composition and Performance and in Simple Made Easy.)
I don't quite understand this, in a number of respects:
Is he talking about putting data in a queue and then having each function consume it? So instead of function A calling function B to carry out its own computation, we just have function B slap its output on a queue and have function A grab it from there? Or, alternatively, are we talking about putting functions on a queue and then successively applying them to data? (Surely not, because that would involve massive mutation, right? And also a multiplication of queues for multiple-arity functions, or trees of queues or something?)
How does that make things simpler? My intuition would be that this strategy would create more complexity, because the queue would be a kind of state, and then you have to worry "what if some other function snuck in and put some data on top of the queue?"
One answer to an implementation question on SO suggests that the idea is to create a bunch of different queues, so each function puts its output in its own queue(??). But that also confuses me, because if you're running a function once, why does it need a queue for its output when you could just take that output and slap a name on it (a var, an atom, an entry in a big hash table, whatever)? By contrast, if a function runs multiple times and you stick its output onto a queue, then you've inflicted state on yourself again: you have to worry about the order in which everything is called, downstream functions get less pure, etc.
Clearly I'm not understanding the point here. Can someone explain a little?
2 Answers
It's more of a design exercise than a general recommendation. You aren't usually going to put a queue between all your direct function calls. That would be ridiculous. However, if you don't design your functions as if a queue might be inserted between any of the direct function calls, you cannot justifiably claim you have written reusable and composable code. That's the point Rich Hickey is making.
This is a major reason behind the success of Apache Spark, for example. You write code that looks like it's making direct function calls on local collections, and the framework translates that code into passing messages on queues between cluster nodes. The kind of simple, composable, reusable coding style Rich Hickey advocates makes that possible.
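As a minimal sketch of what "a queue might be inserted between any of the direct function calls" means (my own illustration, not from the answer; the function price and the other names are invented), here is the same pure function first called directly, then driven through a pair of in-process queues, without the function itself changing at all:

import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Monad (forever)

-- A queue-friendly function: no side effects, plain data in, plain data out.
price :: Int -> Int
price qty = qty * 3

-- Ordinary direct call.
directCall :: Int
directCall = price 14

-- The same function with a queue inserted between caller and callee;
-- note that price itself did not have to change.
queuedCall :: IO ()
queuedCall = do
  requests <- newChan
  results  <- newChan
  _ <- forkIO $ forever $ do     -- the "callee" now drains a request queue
    qty <- readChan requests
    writeChan results (price qty)
  writeChan requests 14
  r <- readChan results
  print r                        -- prints 42, same as directCall

main :: IO ()
main = queuedCall

If price had read mutable shared state or returned a handle into local memory, this transformation would break; that is the design test being described.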
- But isn't that just a change in the method binding process? At the end of the day, a function call is just a function call, right? What happens after that depends on what the function does. So it seems less about making function calls than it is about how the underlying infrastructure is designed. – Robert Harvey, Apr 14, 2016 at 17:16
- To put it another way, what change would you make to your function calls to make them "queue-friendly?" – Robert Harvey, Apr 14, 2016 at 17:17
- Remember, the code on the other end of the queue doesn't necessarily have access to the same memory and IO. Queue-friendly functions are free of side effects and expect inputs and produce outputs that are immutable and easily serializable. That's not so easy a test to meet in many codebases. (A sketch of that test follows this comment thread.) – Karl Bielefeldt, Apr 14, 2016 at 17:44
- Ah, so "functional programming friendly" then. Kinda makes sense, since it is Rich Hickey discussing it. – Robert Harvey, Apr 14, 2016 at 17:56
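Here is a minimal sketch of that "queue-friendly" test (the Order and Priced types and priceOrder are invented for illustration): the function is free of side effects, and its input and output are plain immutable values that survive a serialization round trip, with Show/Read standing in for whatever wire format a real queue would use.

-- Illustrative types (mine, not from the thread): plain immutable data.
data Order  = Order  { orderId :: Int, quantity :: Int } deriving (Show, Read)
data Priced = Priced { orderRef :: Int, total :: Int }   deriving (Show, Read)

-- Queue-friendly: no side effects; everything it needs arrives in the message.
priceOrder :: Order -> Priced
priceOrder o = Priced (orderId o) (quantity o * 3)

-- Simulate the hop to "the other end of the queue": deserialize the input,
-- apply the function, serialize the output.
overTheWire :: String -> String
overTheWire = show . priceOrder . read

-- e.g. overTheWire (show (Order 1 14)) == show (Priced 1 42)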
One thing to note is that functional programming allows you to connect functions to each other indirectly, through mediator objects that take care of procuring arguments to feed into the functions and of routing their results to whatever recipients want them. So suppose you have some straightforward direct-calling code that looks like this example, in Haskell:
-- (f and g are assumed in scope, with f :: A -> D -> C and g :: A -> B -> D for some D)
myThing :: A -> B -> C
myThing a b = f a (g a b)
Well, using Haskell's Applicative class and its <$> and <*> operators, we can mechanically rewrite that code to this:
myThing :: Applicative m => m A -> m B -> m C
myThing a b = f <$> a <*> (g <$> a <*> b)
...where now myThing is no longer calling f and g directly, but rather connecting them through some mediator of type m. So, for example, m could be some Stream type provided by a library that offers an interface to a queueing system, in which case we'd have this type:
myThing :: Stream A -> Stream B -> Stream C
myThing a b = f <$> a <*> (g <$> a <*> b)
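To see that rewrite actually run, here is a concrete instantiation (f, g, and the Int types are made-up stand-ins, not from the answer) using Maybe as the mediator:

f :: Int -> Int -> Int
f x y = x + y

g :: Int -> Int -> Int
g x y = x * y

myThing :: Applicative m => m Int -> m Int -> m Int
myThing a b = f <$> a <*> (g <$> a <*> b)

-- Maybe mediates: values flow through when present, short-circuit when not.
demo1 :: Maybe Int
demo1 = myThing (Just 2) (Just 5)   -- Just 12, i.e. Just (f 2 (g 2 5))

demo2 :: Maybe Int
demo2 = myThing Nothing (Just 5)    -- Nothing: a missing input stops the pipeline

The point is that myThing's body never changes; only the mediator it runs against does.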
Systems like this do exist. In fact, you can look at Java 8 streams as a version of this paradigm. You get code like this:
// Assumes static imports of java.util.Comparator.comparing
// and java.util.stream.Collectors.toList.
List<Integer> transactionsIds =
    transactions.parallelStream()
        .filter(t -> t.getType() == Transaction.GROCERY)
        .sorted(comparing(Transaction::getValue).reversed())
        .map(Transaction::getId)
        .collect(toList());
Here you are using the following functions:
t -> t.getType() == Transaction.GROCERY
comparing(Transaction::getValue).reversed()
Transaction::getId
toList()
...and instead of having them call each other directly, you're using the Stream system to mediate between them. This example isn't calling the Transaction::getId function directly; the Stream is calling it, with the transactions that survived the earlier filter. You can think of the Stream as a very minimal sort of queue that couples functions indirectly and routes values between them.
Taking that further: create a Job object, push it onto a queue, and have one or more worker threads work on that queue. A Job then dispatches more Jobs onto the queue upon completion; return values are replaced by callbacks in that concept. It's a nightmare to debug and verify, as you lack a call stack, and it's efficient and flexible for the very same reason.
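A minimal sketch of that job-queue model (the Job and worker names are mine; only the base library's Chan and forkIO are used) shows both halves of the trade: no call stack, but also no caller waiting around.

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (forever, replicateM_)

-- A job is just an action that may push more jobs; "return values"
-- become further writes to the queue, i.e. callbacks.
newtype Job = Job (Chan Job -> IO ())

-- Workers drain the queue forever; there is no call stack to unwind.
worker :: Chan Job -> IO ()
worker queue = forever $ do
  Job run <- readChan queue
  run queue

main :: IO ()
main = do
  queue <- newChan
  replicateM_ 4 (forkIO (worker queue))   -- a small worker pool
  -- The first job computes something, then dispatches a follow-up job
  -- with the result baked in, instead of returning it to a caller.
  writeChan queue $ Job $ \q ->
    let result = 6 * 7
    in writeChan q (Job (\_ -> print result))
  threadDelay 100000  -- crude: give the workers time before main exits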