Wednesday, October 20, 2010

On outreach to applied communities

There's an argument playing out on cstheory (yes, we need a better name) over whether we should encourage or discourage questions of the form "how can I model phenomenon X theoretically?", or, more bluntly, "are applied questions off-topic?"

While Scott Aaronson posted an eloquent defense of why we should always entertain such questions, the general tone of the responses has been negative. The main fear has been a 'slippery slope' - that if we allow one modelling question, then the site would be overrun by modelling questions.

I think this sentiment is particularly interesting in the context of David Johnson's post requesting examples where approximation algorithms have been deployed usefully in practice (he noted that he himself couldn't come up with many good examples).

The truth is that outreach to more applied communities is not about taking your algorithms hammer and beating their heuristic over the head with it. As I've said before, you have to have a DRAMATIC improvement in performance before people will change their standard practices.

First and foremost, it's about communicating a style of asking and answering questions, even if the actual answers are "trivial". A collaboration where you persuade someone to use max flows might not be theoretically very splashy, but it has a huge impact on how people think about algorithms, theory and problem solving.

Doing this first prepares the ground for future collaborations. If you want someone to use your new and snazzy approximation for vertex cover, you have to first convince them that vertex cover makes sense, and before that you have to convince them that even modelling the problem as a graph makes sense. Believe me when I say that the very notion of defining a problem formally before solving it is not how work gets done in many applied areas.

Michael Mitzenmacher makes a similar point in a comment to the above post, where he says:
I do think, however, that sometimes the focus of theorists -- to get the best constant-approximation or to shave log or log log factors off -- obscures rather than highlights a good part of what I see as the real "practical value" of this line of research. Focusing on what algorithmic ideas underlie the approximation, and more implementation and experimentation, would (I think) give more real-world power to this theoretical work.

Sunday, February 07, 2010

The challenge of doing good experimental work

I recently submitted (with Arvind Agarwal and Jeff Phillips) a paper on a unified view of multidimensional scaling. It's primarily empirical, in that the main technique is a heuristic that has many nice properties (including providing a single technique for optimizing a whole host of cost measures for MDS). For anyone interested though, there's also a nice JL-style theorem for dimensionality reduction from (hi-D) sphere to (lo-D) sphere, which gets the "right" bound for the number of dimensions.

But this post isn't really about the paper. It's more about the challenges of doing good empirical work when you're trained to think formally about problems. This post is influenced by Michael Mitzenmacher's exhortations (one, two) on the importance of implementations, and Mikkel Thorup's guest lament on the lack of appreciation for simple algorithmic results that have major consequences in practice.

So you're looking at practical ramifications of some nice theory result, or you're trying to apply formal algorithmic tools to some application problem. If you're lucky, the problem doesn't have an existing base of heuristics to compare against, and so even a straight-up implementation of your ideas is a reasonable contribution.

Of course, you're rarely this lucky, and there's usually some mish-mash of heuristics to compare against. Some of them might be head-scratchingly weird, in the "why on earth should this work" category, and some are possibly more principled. At any rate, you go and implement your idea, and you suddenly realize to your horror that worst-case bounds don't mean s*** when your principled method is ten times slower than the crazy heuristics. So now what?

The central point that I want to make here is that while actually paying attention to implementation issues is of immense value if we want people to actually care about theoretical work, I don't think we get the right kind of training to do it well.

First off, we lack training in various heuristic design strategies. Now I don't actually mean the kinds of heuristics that one might come across in the Kleinberg-Tardos book (local search, annealing, and the like). I mean the much larger body of principle-driven heuristics that the optimization community is quite familiar with. Without even thinking too hard, it's easy to list heuristics like Newton's method, conjugate gradients, alternating optimization, matching pursuit, majorization, the Frank-Wolfe method, iteratively reweighted least-squares (and I could keep going on...)
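To give a sense of how compact these principle-driven heuristics tend to be, here is a minimal sketch (mine, not from the post) of the Frank-Wolfe method on a toy least-squares problem over the probability simplex; the problem, the function name, and the step-size rule are all just standard illustrative choices.

```python
import numpy as np

def frank_wolfe_simplex(A, b, iters=200):
    """Minimize ||Ax - b||^2 over the probability simplex {x : x >= 0, sum(x) = 1}."""
    n = A.shape[1]
    x = np.ones(n) / n                   # start at the center of the simplex
    for t in range(iters):
        grad = 2 * A.T @ (A @ x - b)     # gradient of the quadratic objective
        s = np.zeros(n)
        s[np.argmin(grad)] = 1.0         # linear subproblem: best simplex vertex
        gamma = 2.0 / (t + 2.0)          # standard diminishing step size
        x = (1 - gamma) * x + gamma * s  # move toward that vertex
    return x

# Toy usage with random data (purely illustrative).
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
x_hat = frank_wolfe_simplex(A, b)
print(x_hat, x_hat.sum())               # x_hat stays on the simplex
```

The whole method is "linearize, find the best vertex, take a small step", which is exactly the kind of simple, principled idea that rarely shows up in a standard algorithms curriculum.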

Of course you might point out that I seem to know all about these heuristics. Well, not quite. The second problem is that even if one knows about these methods, that's not the same thing as having a good intuitive feel for when and why they work well. Case in point: one of the thorns in our side in this paper was a heuristic for MDS called SMACOF. It's a nice technique based on majorization, and although it's a heuristic, it works pretty darn well most of the time and takes a lot of effort to beat, even though there's no clear reason (at least to me) why it should work so well. The only real way to get a sense for how different heuristics work is to actually implement them all, or at least have the right set of MATLAB/C/C++ tools lying around. I notice that ML folks tend to do this a lot.
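For anyone who wants to poke at SMACOF without writing it from scratch, here's a short sketch assuming scikit-learn is available; its sklearn.manifold.smacof routine implements the majorization iteration, and the toy data here is purely illustrative.

```python
import numpy as np
from sklearn.manifold import smacof
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))   # toy high-dimensional point set
D = pairwise_distances(X)           # dissimilarity matrix fed to SMACOF

# Each SMACOF iteration majorizes the stress objective
#   sum_{i<j} (D_ij - ||y_i - y_j||)^2
# with a simpler surrogate that it solves exactly, so the stress
# decreases monotonically to a local optimum.
Y, stress = smacof(D, n_components=2, random_state=0)
print(Y.shape, stress)
```

Having this kind of snippet lying around is roughly what I mean by "the right set of tools": it takes minutes to set up, and it tells you quickly how hard a baseline you're up against.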

The third problem that often comes up is by far the most irritating one: the actual cost function you're optimizing doesn't even matter that much at all. Returning again to our paper, there are many ways to define the "error" when embedding a metric into another metric. The traditional theoryCS way looks at dilation/contraction: the worst-case ratio between distances in the original and target space. Most variations on MDS actually look at an average difference (take the difference between the distances, and average some function of this). As anyone who mucks around with metric spaces will tell you, the actual error function used can make a huge difference to the complexity of the problem, the ability to approximate, and so on and so forth.
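To make the contrast concrete, here is a small sketch (the helper names and the random projection are mine, not the paper's) that computes both flavors of error for the same embedding: the worst-case distance-ratio distortion that theoryCS favors, and an averaged, MDS-style stress.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def worst_case_distortion(D_orig, D_emb):
    """TheoryCS-style error: worst expansion times worst contraction over all pairs."""
    iu = np.triu_indices_from(D_orig, k=1)
    ratios = D_emb[iu] / D_orig[iu]
    return ratios.max() / ratios.min()

def average_stress(D_orig, D_emb):
    """MDS-style error: average squared gap between original and embedded distances."""
    iu = np.triu_indices_from(D_orig, k=1)
    return np.mean((D_orig[iu] - D_emb[iu]) ** 2)

# Toy comparison: a random linear projection from 10-D down to 2-D.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
Y = X @ rng.standard_normal((10, 2)) / np.sqrt(2)
D_orig, D_emb = pairwise_distances(X), pairwise_distances(Y)
print(worst_case_distortion(D_orig, D_emb), average_stress(D_orig, D_emb))
```

Running both measures on the same embedding is also a quick way to see the next point: an embedding produced by optimizing one measure can look perfectly respectable under the other.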

But here's the thing we discovered: it's actually possible to run heuristics designed explicitly for one kind of error function that do just great for another kind of error function, and it takes a lot of work to construct examples that demonstrate the difference.

These points tie together in a more general theme. I was reading a post by Jake Abernethy at Inherent Uncertainty, and he makes a valuable point about the difference between algorithms/theory culture and ML culture (although ML could be replaced by other applied areas like db/data mining as well). His point is that in theoryCS, we are problem-centric: the goal is to prove results about problems, and taxonomize them well. Improve asymptotic run-time: great! Get a better approximation ratio: excellent! Reimplement the same algorithm with the same running time to get better behaviour in practice: hmmm. This is in contrast (as he puts it) to a lot of ML research, where the algorithmic technique comes first, and it's only later that results are generated to go along with it.

This of course drives us nuts: NOT EVERY CLUSTERING PROBLEM SHOULD BE SOLVED WITH k-MEANS! (phew - I feel better now). But if you reexamine this from an applied perspective, you realize its utility. I want to solve a problem - I pick up some off-the-shelf algorithm. Maybe it doesn't solve my problem exactly; maybe it solves a related problem. But it's either this, or some very complicated theoretical method that has no extant implementation, and is optimizing for a worst case that I might never encounter. What do I do then?

This is not a rant about worst-case analysis. Far from it. It's not a rant about O() notation either. What I'm merely trying to say is that a focus on worst-case analysis, asymptotic improvements, and provable guarantees, while essential to the enterprise of doing theory, leaves us little room for the kind of experience needed to do effective practical implementation of our ideas.