Thursday, January 29, 2015

More FOCS 2014-blogging

In the spirit of better late than never, some more updates from Amirali Abdullah from his sojourn at FOCS 2014. Previously, he had blogged about the higher-order Fourier analysis workshop at FOCS.

I'll discuss now the first official day of FOCS, with a quick digression into the food first: the reception was lovely, with some nice quality beverages, and delectable appetizers which I munched on to perhaps some slight excess. As for the lunches given to participants, I will think twice in future about selecting a kosher option under dietary restrictions. One hopes for a little better than a microwave instant meal at a catered lunch, with the clear plastic covering still awaiting being peeled off. In fairness to the organizers, once I decided to revert to the regular menu on the remaining days, the chicken and fish were perfectly tasty.

I will pick out a couple of the talks I was most interested in to summarize briefly. This is of course not necessarily a reflection of comparative quality or scientific value; just which talk titles caught my eye.

The first talk is "Discrepancy minimization for convex sets" by Thomas Rothvoss. The basic setup of a discrepancy problem is this: consider a universe of $n$ elements, $[n],ドル and a set system of $m$ sets ($m$ may also be infinite), $S = \{S_1, S_2, \ldots, S_m \},ドル where $S_i \subset [n]$. Then we want to find a 2ドル$-coloring $\chi : [n] \to \{-1, +1 \}$ such that each set is as evenly colored as possible. The discrepancy then measures how unevenly colored some set $S_i \in S$ must be under the best possible coloring.
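To make the definitions concrete, here is a tiny brute-force sketch (my own illustration, with a made-up set system; the problem is interesting precisely because exhaustive search is infeasible for large $n$):

```python
# A tiny brute-force sketch (illustration only): compute the discrepancy of a
# coloring, and search all 2^n colorings for the best one.
from itertools import product

def discrepancy(sets, chi):
    """chi is a sequence of +/-1 values; return max_i |sum_{j in S_i} chi[j]|."""
    return max(abs(sum(chi[j] for j in S)) for S in sets)

def best_coloring(n, sets):
    best_chi, best_disc = None, float("inf")
    for signs in product((-1, +1), repeat=n):
        d = discrepancy(sets, signs)
        if d < best_disc:
            best_chi, best_disc = signs, d
    return best_chi, best_disc

if __name__ == "__main__":
    sets = [{0, 1, 2}, {2, 3, 4}, {1, 3, 5}, {0, 4, 5}]
    chi, d = best_coloring(6, sets)
    print("best coloring:", chi, "discrepancy:", d)
```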

One fundamental result is that of Spencer, which shows that (when $m = O(n)$) there always exists a coloring of discrepancy $O(\sqrt{n})$. This shaves a logarithmic factor off of a simple random coloring, and the proof is non-constructive. This paper by Rothvoss gives an algorithm that serves as a constructive proof of the theorem.

The first (well-known) step is that Spencer's theorem can be recast as a problem in convex geometry. Each set $S_i$ can be converted to a geometric constraint in $R^n,ドル namely the slab $\{ x \in R^n : | \sum_{j \in S_i} x_j | \leq 100 \sqrt{n} \}$. The intersection of these constraints defines a polytope $K,ドル and if $K$ contains a point of the hypercube $\{-1 , +1 \}^n,ドル then that point is a valid low-discrepancy coloring.

One can also of course do a partial coloring iteratively - if a constant fraction of the elements can be colored with low discrepancy, it suffices to repeat.

The algorithm is surprisingly simple and follows the traditional idea of trying to solve a discrete problem via its continuous relaxation. Take a point $y$ generated from the spherical $n$-dimensional Gaussian with variance 1. Now find the point $x$ closest to $y$ that lies in the intersection of the constraint set $K$ with the continuous hypercube $[-1, +1]^n$ (for example, by using the polynomial-time ellipsoid method). It turns out that a constant fraction of the coordinates of $x$ are actually tight (i.e., integer-valued in $\{-1, +1 \}$), and so $x$ turns out to be a good partial coloring.

To prove this, the paper shows that with high probability all subsets of $[-1, +1]^n$ with very few tight coordinates are far from the starting point $y,ドル whereas with high probability the intersection of $K$ with some set having many tight coordinates is close to $y$. This boils down to showing that the latter has sufficiently large Gaussian measure, which follows from standard tools in convex analysis and probability theory. Or to rephrase, the proof works by arguing about the isoperimetry of the sets concerned.
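Here is a minimal sketch of one round of the partial-coloring step, assuming the cvxpy package as a stand-in for the ellipsoid method (my own illustration, not the paper's code; the set system and the bound are made up):

```python
# One round of the partial-coloring step (illustrative sketch): sample a
# Gaussian point y, then find the closest point to y in K intersected with
# the cube [-1, 1]^n, and report which coordinates came out tight.
import numpy as np
import cvxpy as cp

def partial_coloring(sets, n, bound, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(n)                 # spherical Gaussian starting point
    x = cp.Variable(n)
    constraints = [x >= -1, x <= 1]            # the continuous cube [-1, 1]^n
    for S in sets:                             # the slab |sum_{j in S} x_j| <= bound
        constraints.append(cp.abs(sum(x[j] for j in S)) <= bound)
    cp.Problem(cp.Minimize(cp.sum_squares(x - y)), constraints).solve()
    xv = x.value
    tight = np.abs(np.abs(xv) - 1) < tol       # coordinates pushed to +/- 1
    return xv, tight

if __name__ == "__main__":
    n = 20
    sets = [set(range(i, i + 5)) for i in range(0, n, 5)]
    xv, tight = partial_coloring(sets, n, bound=100 * np.sqrt(n))
    print("tight coordinates:", int(tight.sum()), "out of", n)
```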

The other talk I'm going to mention from the first day is by Karl Bringmann on the hardness of computing the Frechet distance between two curves. The Frechet distance is a measure of curve similarity, and is often popularly described as follows: "if a man and a dog each walk along two curves, each with a designated start and finish point, what is the shortest length leash required?"

The problem is solvable in $O(n^2)$ time by simple dynamic programming, and has since been improved to $O(n^2 / \log n)$ by Agarwal, Avraham, Kaplan and Sharir. It has long been conjectured that there is no strongly subquadratic algorithm for the Frechet distance. (A strongly subquadratic algorithm being defined as one with $O(n^{2 -\delta})$ complexity for some constant $\delta > 0,ドル as opposed to, say, $O(n^2 / \mathrm{polylog}(n))$.)
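For concreteness, here is the textbook quadratic dynamic program for the discrete variant of the Frechet distance (a sketch of my own; the continuous case needs a bit more machinery):

```python
# Classic O(n*m) dynamic program for the discrete Frechet distance between two
# curves P and Q given as lists of points: the man and the dog sit on vertices
# and may only wait in place or hop to the next vertex.
import math

def discrete_frechet(P, Q):
    n, m = len(P), len(Q)
    # c[i][j]: discrete Frechet distance between P[:i+1] and Q[:j+1]
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(P[i], Q[j])
            if i == 0 and j == 0:
                c[i][j] = d
            elif i == 0:
                c[i][j] = max(c[i][j - 1], d)
            elif j == 0:
                c[i][j] = max(c[i - 1][j], d)
            else:
                c[i][j] = max(min(c[i - 1][j], c[i][j - 1], c[i - 1][j - 1]), d)
    return c[n - 1][m - 1]

if __name__ == "__main__":
    P = [(0, 0), (1, 0), (2, 0)]
    Q = [(0, 1), (1, 1), (2, 1)]
    print(discrete_frechet(P, Q))   # 1.0: a unit-length leash suffices here
```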

The work by Bringmann shows this conjecture to be true, assuming SETH (the Strong Exponential Time Hypothesis), or more precisely that there is no $O^*((2- \delta)^N)$-time algorithm for CNF-SAT for any constant $\delta > 0$. The hardness result holds for both the discrete and continuous versions of the Frechet distance, as well as for any 1ドル.001$-approximation.

The proof works at a high level by directly reducing an instance of CNF-SAT to two curves whose Frechet distance is smaller than 1ドル$ iff the instance is satisfiable. Logically, one can imagine that the set of variables is split into two halves, one assigned to each curve. Each curve consists of a collection of "clause and assignment" gadgets, which encode whether all clauses are satisfied by a particular partial assignment. A different such gadget is created for each possible partial assignment, so that there are $O^*(2^{N/2})$ vertices in each curve. (This is why a strongly subquadratic algorithm for the Frechet distance would imply a violation of SETH.)

There are many technical and geometric details required in the gadgets which I won't go into here. I will note admiringly that the proof is surprisingly elementary. No involved machinery or complexity result is needed in the clever construction of the main result; mostly just explicit computations of the pairwise distances between the vertices of the gadgets.


I will have one more blog post in a few days about another couple of results I thought were interesting, and then comment on the Knuth Prize lecture by the distinguished Dick Lipton.

Tuesday, January 27, 2015

Streaming @ SODA: Part II

This is the second of two posts by Samira Daruki on the streaming sessions at SODA 2015. For the first post, see here.

The third paper from the streaming graph family at SODA15 is "Parameterized Streaming: Maximal Matching and Vertex Cover", in which Chitnis, Cormode, Hajiaghayi and Monemizadeh introduce a new approach to handling graph streams called parameterized streaming algorithms. In addition to the insertion-only model, they consider the dynamic model of streaming graphs, in which the input is a sequence of edge insertions and deletions.

This dynamic model of streaming graph processing is popular when the graph is changing over time, and has recently received much attention due to breakthroughs by Ahn, Guha and McGregor (one, and two). Over these two papers, they showed the first results for a number of graph problems over dynamic streams. This has provoked much interest in what can be computed over dynamic graph streams, although there is still not much work on solving graph optimization problems in this model. The challenge is that when an edge is deleted, substantial work may be needed to repair the solution, so we need to make sure that the algorithm has enough information to do so while keeping only a bounded amount of working space. (ed: not surprisingly, some of these ideas are useful for non-streaming dynamic graph algorithms: witness the paper by Kapron, King and Mountjoy on dynamic connectivity in (randomized) worst-case polylog time from SODA a few years ago)

Returning to parameterized streaming: in this paper, instead of solving the optimization problem on the graph stream exactly, the goal is to solve the "parameterized" version of the problem, where the parameter $k$ is given and we want to answer the following decision problem:
Is there a solution with size bounded by $k$?
The motivation for parameterizing the problem comes from real-world applications in which the solution of the graph problem is small compared to the size of the input (i.e., sublinear in the input size). In these cases, the interesting challenge is to solve graph optimization problems in a streaming fashion using space bounded by some function of the "solution size" instead of the "input size".

A standard technique for solving parameterized problems is kernelization, which uses polynomial-time preprocessing to map the input to an equivalent input of smaller size $f(k)$ (called a kernel), with a new parameter value $k’ \le g(k)$ for some computable function $g$.

In this paper, by combining kernelization techniques with randomized sketch structures, the first streaming algorithms for the parameterized version of the Vertex Cover problem are obtained. The main idea is to maintain a maximal matching of the underlying graph in a streaming fashion, and then run the well-known kernelization algorithm for Vertex Cover on the maintained maximal matching. The data structure that maintains the maximal matching uses $k$-sample recovery sketches, a generalization of linear sketching for $\ell_0$-sampling, as the main tool, applying them to the neighborhood (incident edges) of each vertex in the current matching. As edges are inserted or deleted, these sketches can be updated without knowledge of the full neighborhood of each node. However, deletions pose a challenge: as edges are deleted we need an intelligent mechanism to ensure the matching remains maximal using only the limited stored information.
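As a simplified illustration of the kernelization idea in the insertion-only setting (my own sketch; the paper's dynamic algorithm additionally needs the $k$-sample recovery sketches described above to survive deletions):

```python
# Insertion-only sketch: maintain a maximal matching greedily over the edge
# stream. If it ever exceeds k edges, those edges are vertex-disjoint and any
# vertex cover must pick one endpoint of each, so the answer is NO.
def parameterized_vc_stream(edge_stream, k):
    matching = []             # edges of the maximal matching
    matched = set()           # endpoints of matching edges
    for (u, v) in edge_stream:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
            if len(matching) > k:
                return False, None      # vertex cover size is certainly > k
    # Otherwise the <= 2k matched vertices form a vertex cover of size <= 2k.
    # To decide whether a cover of size <= k exists, one also needs a bounded
    # part of the edges incident to these vertices -- which is what the
    # paper's sketches recover.
    return True, matched

if __name__ == "__main__":
    stream = [(1, 2), (2, 3), (4, 5), (5, 6), (1, 6)]
    maybe, kernel_vertices = parameterized_vc_stream(stream, k=3)
    print(maybe, kernel_vertices)
```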

Another nice result here is a tight lower bound of $\Omega(k^2)$ (by reducing from the INDEX problem in communication complexity) for the space complexity of any (randomized) streaming algorithm for parameterized Vertex Cover, which holds even in the insertion-only model.

Besides the general insertion-only and dynamic models, a restricted dynamic model is also discussed, in which we are promised that at each time $i$ the vertex cover of the graph induced by the edges seen so far has size at most $k$. With this promise, they develop a dynamic parameterized streaming algorithm whose space usage matches the lower bound above.

It is interesting to think about other NP-hard problems in the framework of parameterized streaming, to explore how kernelization can help in this direction, and to see whether we can find other powerful hammers to overcome the challenges that arise in designing streaming algorithms for hard problems.


Coda:
Along with the three papers discussed above, there was another paper on streaming presented at SODA (by Naumovitz and Saks) which provides a deterministic polylogarithmic-space streaming algorithm for approximating distance to monotonicity for a sequence of $n$ numbers, compared to the corresponding randomized result presented at SODA two years ago.

While I won't discuss this work here so as to keep these posts just about streaming graph algorithms, I encourage the interested readers to take a look at this paper as well, as the last one in the streaming family of SODA15.

Streaming @ SODA: Part I

This two part series is written by my student Samira Daruki.

Modern graph data sets are too large to fit in memory. And so the streaming model is one of the more popular and attractive ones for analyzing massive graphs: in this model, for an input graph $G = (V, E)$ with $n$ vertices and $m$ edges, the edges arrive in an arbitrary order in a stream and the algorithm can only use $\tilde{O}(n)$ space. The algorithm is allowed to have several passes over the stream but usually the ideal case is to have just one pass.

For many graph problems such as matching, spanning tree, graph sparsification, approximate distance and counting subgraphs there now exist streaming algorithms with space complexity $O(n \text{poly} (\log n))$. In these algorithms, usually we assume that the nodes of the graphs can be stored in memory but edges cannot. Notice that for constructing matchings, spanners and sparsifiers, the output size is often $\Omega(n),ドル so it forces the streaming algorithm to use $\Omega(n)$ space.


But if you don't need to report the output, then this can be avoided. For example, see the work of Kapralov, Khanna and Sudan from SODA 2014, which approximates the matching size to a $\text{poly}(\log n)$ factor using $\text{poly}(\log n)$ space in a random stream (where edges appear in a random order rather than arbitrarily).

Thus, the question now is:
can we obtain a $\text{poly}\log$ space streaming algorithm for approximating the solution cost for other graph problems?
Consider for instance MAX-CUT. There are several results on approximating the maximum cut in graphs in the non-streaming model; the trivial approach is to take a random cut. This cuts half of the edges in expectation, and since the maximum cut is at most $m,ドル it gives a factor 1ドル/2$-approximation.

This implies that in the streaming model we can just count the number of edges $m$ and output $\frac{m}{2},ドル which gives an $O(\log n)$-space algorithm. By keeping a sample of the edge set we can get a different tradeoff: a $(1+\epsilon)$-approximation algorithm which uses $O(\frac{n}{\epsilon^2})$ space.
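Here is a small sketch (my own) of these two trivial ingredients: the edge-counting estimator, and reservoir sampling as one standard way to maintain a uniform sample of the edge stream on which a finer estimate can then be computed:

```python
# Two trivial streaming ingredients for MAX-CUT value estimation (sketch only).
import random

def maxcut_half_m(edge_stream):
    """Count edges and output m/2: a factor-2 estimate in O(log n) bits."""
    m = sum(1 for _ in edge_stream)
    return m / 2.0

def reservoir_sample(edge_stream, size, seed=0):
    """Keep a uniform sample of `size` edges from a one-pass stream."""
    rng = random.Random(seed)
    sample = []
    for t, e in enumerate(edge_stream):
        if t < size:
            sample.append(e)
        else:
            j = rng.randint(0, t)       # keep e with probability size/(t+1)
            if j < size:
                sample[j] = e
    return sample

if __name__ == "__main__":
    cycle = [(i, (i + 1) % 10) for i in range(10)]     # a 10-cycle: max cut = 10
    print(maxcut_half_m(iter(cycle)))                  # 5.0
    print(len(reservoir_sample(iter(cycle), size=4)))  # 4
```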

Can we get a streaming algorithm with better than factor-2ドル$ approximation using just $poly(\log n)$ space?

A paper by Kapralov, Khanna and Sudan in the streaming session of SODA15 this year answers this question. This is the latest in a line of results on streaming graph problems by Kapralov and others from SODA 2012, 2013 and 2014 (mentioned above).

Here is their main result:
For any constant $\epsilon > 0$ a single pass streaming algorithm for approximating the value of MAX-CUT to a factor 2ドル - \epsilon$ requires $\Omega(\sqrt{n})$ space, even in the random order model.
This result rules out the possibility of $\text{poly}(\log n)$ space, but suggests that $\tilde{O}(\sqrt{n})$ space cost might be possible in some specific settings.

Another major result of this paper is as follows:

For any constant $\epsilon > 0$ a single pass streaming algorithm for approximating MAX-CUT value to factor 1ドル + \epsilon$ requires $n^{1-O(\epsilon)}$ space in adversarial streams.

The main fact they use here is the connection between the MAX CUT value and (distance from) bipartiteness:
if a graph $G$ with $m$ edges is $\beta$-far from being bipartite, then the max-cut value of $G$ is at most $(1-\beta)m$.
The other simple observation is that any algorithm that computes a $\gamma$-approximation to MAX CUT distinguishes between bipartite graphs and graphs that are 1ドル-\frac{1}{\gamma}$-far from being bipartite. Thus to show that no streaming algorithm using space $c$ can achieve a $\gamma$-approximation with failure probability at most $\delta,ドル it's enough to show that no streaming algorithm using space $c$ can distinguish between bipartite graphs and graphs which are 1ドル- \frac{1}{\gamma}$-far from being bipartite with probability at least 1ドル- \delta$.

Given these facts, the core idea for proving the main result is to exhibit a distribution over inputs on which a $(2-\epsilon)$-approximation to MAX-CUT requires $\Omega(\sqrt{n})$ space.

More precisely, the input graph instances are based on random Erdos-Renyi graphs, which are either bipartite (in the YES case) or non-bipartite (in the NO case). In order to achieve a $(2-\epsilon)$-factor gap for the MAX CUT value in this construction, the expected degree of the vertices is chosen to be $\Theta(\frac{1}{\epsilon^2})$.

This way, the input streaming graph can be partitioned and given to the algorithm in $\Omega(\frac{1}{\epsilon^2})$ phases, which can be simulated as a $\frac{1}{\epsilon^2}$-party one-way communication game.

Then, by giving a reduction from a variant of Boolean Hidden Matching (BHM) called Distributional Boolean Hidden Partition (D-BHP) to MAX-CUT on these input instances, and showing that $\Omega(\sqrt{n})$ space is necessary to distinguish between the two cases, the main streaming lower bound for approximating MAX-CUT follows.

There are many technical details in performing this reduction, but roughly speaking they show that any algorithm that solves MAX-CUT on the constructed instances must solve the two-party communication problem in at least one of the phases.

There are still some open problems left by this paper:

  • One is whether breaking the 2ドル$-approximation barrier in $\sqrt{n}$ space is possible if we are allowed to have $\text{poly}(\log n)$ passes over the input stream.
  • Also it is interesting to think about designing streaming algorithms for other graph problems using $o(n)$ space.

This brings us to another paper presented in this session. In Streaming Algorithms for Estimating the Matching Size in Planar Graphs and Beyond (by Esfandiari, Hajiaghayi, Liaghat, Monemizadeh and Onak), this latter question is answered for estimating the maximum matching size in planar graphs using $o(n)$ space.

Here is the main result:
If the underlying graph is planar, then there is a streaming algorithm which provides an $O(1)$-approximation to the maximum matching size with high probability using $O(n^{\frac{2}{3}})$ space.
The main idea for proving the result here is to use a known structural graph property:
If we partition the nodes of the input graph by degree into light (deg < 9) and heavy (deg > 9) vertices, and define the shallow edges as the ones with two light endpoints, then we have the following nice property (writing |maximum matching| = m, |shallow edges| = s and |heavy vertices| = h):
$$ \frac{\max (s, h)}{12} \leq m \leq h + s$$

Then, using this structural fact as the main tool, we greedily maintain a small matching (of size at most $c n^{\frac{2}{3}}$) as we read the edges, and estimate the number of shallow edges and heavy vertices from the subgraph induced by a sample of $c n^{\frac{2}{3}}$ vertices; together these give a constant-factor approximation to the size of the maximum matching.
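Here is a small offline sketch (my own) of the structural estimate itself, ignoring the streaming and sampling components; the degree threshold follows the statement above:

```python
# Offline sketch: classify vertices by degree, count shallow edges and heavy
# vertices, and use max(s, h) as a constant-factor proxy for the maximum
# matching size (via the fact max(s, h)/12 <= |max matching| <= s + h).
from collections import defaultdict

def matching_size_proxy(edges, threshold=9):
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    heavy = {v for v, d in deg.items() if d > threshold}
    s = sum(1 for u, v in edges if u not in heavy and v not in heavy)
    h = len(heavy)
    return max(s, h)

if __name__ == "__main__":
    star = [(0, i) for i in range(1, 15)]   # one heavy center, 14 light leaves
    path = [(i, i + 1) for i in range(6)]   # a path: all vertices light
    print(matching_size_proxy(star), matching_size_proxy(path))   # 1 and 6
```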

In addition to the planar case, they show that similar results for approximating the maximum matching in other graph classes, such as $d$-degenerate graphs and forests, are achievable.

Coming up: parametrized streaming and VERTEX COVER.


Thursday, January 15, 2015

Brief Review of the post-SODA workshop on algorithmic challenges in machine learning.

My student +John Moeller attended the workshop on algorithmic challenges in machine learning organized by +Shachar Lovett and +Kamalika Chaudhuri after SODA. He wrote up a brief report of some papers that he liked.

Monday, January 12, 2015

FOCS Workshop on higher-order Fourier analysis: A Review

This is a guest post by my student Amirali Abdullah. Amirali attended FOCS 2014 and has a number of interesting reflections on the conference.

2014 was my first experience of attending a FOCS conference, and finally seeing the faces* attached to some of the cutting-edge as well as classical results in theoretical computer science. Not only did I get to enjoy the gentle strolls between conference halls and hear about fields I'd never known existed, I had the pleasure of doing so while revisiting historic Philadelphia.

With the talk videos now available, I thought now would be a good time to revisit some of the talks and my thoughts on the event. The usual disclaimers apply - this will be by no means comprehensive, and I won't go into the technicalities in much depth.

I'll begin with the first day, where I chose to attend the workshop on Higher-Order Fourier analysis. The starting point is that in the study of a function $f,ドル it is standard to consider its correlations with the Fourier basis of exponential functions (i.e., of the form $e(x) = e^{2 \pi \iota x},ドル also called linear phase functions). This basis is orthonormal and has many nice analytic properties.

A standard technique is to decompose a function $f$ into the heavy components of its Fourier expansion, and then argue that the contribution of the lower-weight components is negligible. This gives a decomposition $f = f_1 + f_2,ドル where $f_1$ has few non-zero Fourier coefficients, and $f_2$ has all Fourier coefficients close to 0. Another way to view this is that $f_1$ represents the structured part of $f$ and $f_2$ the pseudorandom part.
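As a concrete (if toy) illustration on the Hamming cube, where the linear phase functions are the Walsh-Hadamard characters, here is a sketch of the heavy/light decomposition (my own code, assuming numpy and scipy):

```python
# Toy heavy/light Fourier decomposition on the Hamming cube.
import numpy as np
from scipy.linalg import hadamard

def heavy_light_decomposition(f, tau):
    """f: array of length 2^n; tau: threshold on |Fourier coefficient|."""
    N = len(f)
    H = hadamard(N)                  # rows are the +/-1 characters
    fhat = H @ f / N                 # Fourier coefficients of f
    heavy = np.abs(fhat) >= tau
    f1 = H @ (fhat * heavy)          # structured part: few heavy characters
    f2 = f - f1                      # tail: all coefficients below tau
    return f1, f2, fhat

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = np.sign(rng.standard_normal(16))     # a random +/-1 function on {0,1}^4
    f1, f2, fhat = heavy_light_decomposition(f, tau=0.3)
    print("heavy coefficients:", int((np.abs(fhat) >= 0.3).sum()))
```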

However for some applications, the analysis of correlation with quadratic (or even higher order) phase functions of the form $e(x) = e^{2 \pi \iota x^2}$ is more powerful and indeed required. (An example of such a problem is where given a function on the integers, one desires to study its behavior on arithmetic progressions of length four or more.)

A subtlety in the study of higher order Fourier analysis is that notions such as "tail" and "weight" now have to be redefined. For regular Fourier analysis (at least on the Hamming cube) the natural notion corresponds with the $\ell_2^2$ norm or the linear correlation of a function and a linear basis element. However, in higher order Fourier analysis one has to define norms known as the Gowers uniformity norms, which capture higher order correlations with these higher degree functions. This yields a decomposition $f = f_1 + f_2 + f_3,ドル where $f_1$ consists of few non-zero higher order phase functions, $f_2$ has small $\ell_2$ norm and $f_3$ has small Gowers norm under the right notion.

There were several talks discussing the various subfields of higher order Fourier analysis. Madhur Tulsiani discussed some of the algorithmic questions, including computing such a decomposition of the function into higher order Fourier terms in an analog of the Goldreich-Levin algorithm. Yuichi Yoshida discussed applications to algebraic property testing.

Abhishek Bhowmick discussed his very interesting paper with Lovett showing that the list-decoding radius of the Reed-Muller code over finite prime fields equals (approximately) the minimum distance of the code. The application of higher order Fourier analysis here is essentially to decompose the input space of the code by high-degree polynomials so that any random distribution is well-spread over the resulting partition atoms.

I thought the workshop was an interesting exposure to some deep mathematics I had not previously seen, and gave some good pointers to the literature/work I can consult if I ever want a richer understanding of the toolset in the field.

Note: Thanks to Terry Tao's book on the subject for some useful background and context. All mistakes in this post are entirely mine. For a much more comprehensive, mathematically correct and broad view on the subject, do check out his blog.

* One can of course claim that 'seeing faces' could also include the digital images on faculty and student websites snapped in haste or misleading poses, but I choose to ignore this subtlety.

Privacy is dead, but not because Scott McNealy said so.

I decided to experiment with using Medium for a long-form not-quite technical post on the evolution of research in data privacy.
The idea of privacy has driven much of our concern over data for the last few years, and has been the driver of extremely successful research efforts, most notably in differential privacy. What we're seeing now, though, is a pivot away from the undifferentiated and problematic notion of privacy in data towards a more subtle and nuanced notion of how data is actually used, and how it should be used.

Here's the post: Medium allows this nifty way of commenting inline, so comment away there, or here.

Thursday, December 11, 2014

Accountability in data mining

For a while now, I've been thinking about the endgame in data mining. Or more precisely, about a world where everything is being mined for all kinds of patterns, and the contours of our life are shaped by the patterns learnt about us.

We're not too far from that world right now - especially if you use intelligent mail filtering, or get your news through social media. And when you live in such a world, the question is not "how do I find patterns in data", but rather
How can I trust the patterns being discovered ?
When you unpack this question, all kinds of questions come to mind: How do you verify the results of a computation ? How do you explain the results of a learning algorithm ? How do you ensure that algorithms are doing not just what they claim to be doing, but what they should be doing ?

The problem with algorithms is that they can often be opaque: but opacity is not the same as fairness, and this is now a growing concern amongst people who haven't yet drunk the machine learning kool-aid.

All of this is a lead up to two things. Firstly, the NIPS 2014 workshop on Fairness, Accountability and Transparency in Machine Learning, organized by +Moritz Hardt and +Solon Barocas and written about by Moritz here.

But secondly, it's about work that +Sorelle Friedler, +Carlos Scheidegger and I are doing on detecting when algorithms run afoul of a legal notion of bias called disparate impact. What we're trying to do is pull out from a legal notion of bias something more computational that we can use to test and certify whether algorithms are indulging in legally proscribed discriminatory behavior.

This is preliminary work, and we're grateful for the opportunity to air it out in a forum perfectly designed for it.

And here's the paper:
What does it mean for an algorithm to be biased?
In U.S. law, the notion of bias is typically encoded through the idea of disparate impact: namely, that a process (hiring, selection, etc) that on the surface seems completely neutral might still have widely different impacts on different groups. This legal determination expects an explicit understanding of the selection process.
If the process is an algorithm though (as is common these days), the process of determining disparate impact (and hence bias) becomes trickier. First, it might not be possible to disclose the process. Second, even if the process is open, it might be too complex to ascertain how the algorithm is making its decisions. In effect, since we don't have access to the algorithm, we must make inferences based on the data it uses.
We make three contributions to this problem. First, we link the legal notion of disparate impact to a measure of classification accuracy that while known, has not received as much attention as more traditional notions of accuracy. Second, we propose a test for the possibility of disparate impact based on analyzing the information leakage of protected information from the data. Finally, we describe methods by which data might be made "unbiased" in order to test an algorithm. Interestingly, our approach bears some resemblance to actual practices that have recently received legal scrutiny.
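As a toy illustration of the legal guideline that sits underneath all this (the so-called "80% rule"), here is a hypothetical sketch of the basic disparate impact ratio; this is not the test proposed in the paper, and the function names and data are made up:

```python
# Hypothetical sketch: compare the rate at which a protected group receives
# the positive outcome against a reference group, and flag ratios below 0.8.
def disparate_impact_ratio(outcomes, groups, protected, reference, positive=1):
    def rate(label):
        rows = [o for o, g in zip(outcomes, groups) if g == label]
        return sum(1 for o in rows if o == positive) / len(rows)
    return rate(protected) / rate(reference)

if __name__ == "__main__":
    outcomes = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
    groups   = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]
    r = disparate_impact_ratio(outcomes, groups, protected="a", reference="b")
    print(r, "flag for review" if r < 0.8 else "passes the 4/5 guideline")
```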
In one form or another, most of my recent research touches upon questions of accountability in data mining. It's a good thing that it's now a "trend for 2015". I'll have more to say about this in the next few months, and I hope that the issue of accountability stays relevant for more than a year.

Friday, December 05, 2014

Experiments with conference processes

NIPS is a premier conference in machine learning (arguably the best, or co-best with ICML). NIPS has also been a source of interesting and ongoing experiments with the process of reviewing.

For example, in 2010 Rich Zemel, who was a PC chair of NIPS at the time, experimented with a new system he and Laurent Charlin were developing that would determine the "fit" between a potential reviewer and a submitted paper. This system, called the Toronto Paper Matching System, is now being used regularly in the ML/vision communities.

This year, NIPS is trying another experiment. In brief,

10% of the papers submitted to NIPS were duplicated and reviewed by two independent groups of reviewers and Area Chairs.
And the goal is to determine how inconsistent the reviews are, as part of a larger effort to measure the variability in reviewing. There's even a prediction market set up to guess what the degree of inconsistency will be. Also see Neil Lawrence's fascinating blog describing the mechanics of constructing this year's PC and handling the review process.

I quite like the idea of 'data driven' experiments with format changes. It's a pity that we didn't have a way of measuring the effect of having a two-tier committee for STOC a few years ago, and instead had to rely on anecdotes about its effectiveness, and didn't even run the experiment long enough to collect enough data. I feel that every time there are proposals to change anything about the theory conference process, the discussion gets drowned out in a din of protests, irrelevant anecdotes, and opinions based entirely on ... nothing..., and nothing ever changes.

Maybe there's something about worst-case analysis (and thinking) that makes change hard :).

Thursday, November 20, 2014

Open access, ACM and the Gates Foundation.

Matt Cutts, in an article on the new Gates Foundation open access policy (ht +Fernando Pereira) says that
while the ACM continues to drag its heels, the Gates Foundation has made a big move to encourage Open Access...
Which got me thinking. Why can't the ACM use this policy as a guidelines to encourage open access ? Specifically,

  • Announce that from now on, it will subsidize/support the open access fees paid by ACM members
  • (partially) eat the cost of publication in ACM publications (journals/conferences/etc)
  • Use the resulting clout to negotiate cheaper open access rates with various publishers in exchange for supporting open access fees paid to those journals.
Of course this would put the membership side of ACM at odds with its publication side, which maybe points to another problem with ACM having these dual roles.

Monday, October 06, 2014

Algorithms is the new Algorithms...

Hal Daumé wrote a rather provocative post titled ‘Machine Learning is the new algorithms’, and has begged someone to throw his gauntlet back at him. Consider it now thrown !

His thesis is the following quote:
Everything that algorithms was to computer science 15 years ago, machine learning is today
And among the conclusions he draws is the following:
we should yank algorithms out as an UG requirement and replace it with machine learning
Having spent many years having coffees and writing papers with Hal, I know that he truly does understand algorithms and isn’t just trying to be a troll (at least I hope not). So I’m trying to figure out exactly what he’s trying to say. It will help if you read his article first before returning…

First off, I don’t understand the conclusion. Why not (say) replace architecture with ML, or databases with ML. Or why replace anything at all ? the assumption is that ML is a core tool that students will need more than any other topic. Now I have no problem with adding ML to the list of “things a CS person ought to know”, but I don’t see why it’s not important for a CS person to understand how a database works, or how a compiler processes code, or even how we design an algorithm. This fake mutual exclusiveness appears to be without basis.

But maybe he’s saying that algorithms and ML are two flavors of the same object, and hence one can be replaced by the other. If so, what exactly is that object ? In his view, that object is:
given an input, what’s the best way to produce an (optimal) output ?
This seems to be an overly charitable view of ML. In ML, the goal is to, well, learn. Or to be more stodgy about it, ML provides tools for making systematic inferences and predictions from data.

Which suggests that the concerns are fundamentally orthogonal, not in opposition (and Sasho Nikolov makes this point very well in a comment on Hal’s post). As Hal correctly argues, the hard work in ML is front-loaded: modeling, feature extraction, and the like. The algorithm itself is mostly an afterthought.

But what's ironic is that one of the most important trends in ML of late is the conversion of an ML problem to an optimization problem. The goal is to make good modeling choices that lead to an optimization problem that can be solved well. But wait: what do you need in order to solve an optimization problem well ? Wait for it...... ALGORITHMS !!

The argument about stability in algorithms is an odd one, especially coming from someone who’s just written a book on ML. Yes, some core algorithms techniques haven’t changed much in the last many years, but did you see that 2014 paper on improvements in recurrence analysis ? Or the new sorting algorithm by Mike Goodrich ? or even the flurry of new results for Hal’s beloved flow problems ?

As for “everything’s in a library”, I’ll take your boost graph library and give you back WEKA, or libSVM, or even scikit-learn. I don’t need to know anything from Hal’s book to do some basic futzing around in ML using libraries: much like I could invoke some standard hashing subroutine without knowing much about universal hash families.

In one sense though, Hal is right: ML is indeed where algorithms was 15 years ago. Because 15 years ago (approximately) is when the streaming revolution started, and with it the new interest in sublinear algorithms, communication complexity, matrix approximations, distributed algorithms with minimal communication, and the whole "theory of big data" effort. And ML is now trying to catch up to all of this, with some help from algorithms folks :).

What is true is this though: it wouldn’t hurt us to revisit how we construct the core algorithms classes (undergrad and grad). I realize that CLRS is the canon, and it would cause serious heart palpitations to contemplate not stuffing every piece of paper of that book down students’ throats, but maybe we should be adapting algorithms courses to the new and exciting developments in algorithms itself. I bring in heavy doses of approximation and randomization in my grad algorithms class, and before we forked off a whole new class, I used to teach bits of streaming, bloom filters, min-wise hashing and the like as well. My geometry class used to be all about the core concepts from the 3 Marks book, but now I throw in lots of high dimensional geometry, approximate geometry, kernels and the like.

Ultimately, I think a claim of fake mutual exclusivity is lazy, and ignores the true synergy between ML and algorithms. I think algorithms research has a lot to learn from ML about robust solution design and the value of "noise-tolerance", and ML has plenty to learn from algorithms about how to organize problems and solutions, and how deep dives into the structure of a problem can yield insights that are resilient to changes in the problem specification.



Wednesday, October 01, 2014

On the master theorem vs Akra-Bazzi

Everyone knows the master theorem.

Or at least everyone reading this blog does.

And I'm almost certain that everyone reading this blog has heard of the generalization of the master theorem due to Akra and Bazzi. It's particularly useful when you have recurrences of the form
$$ T(n) = \sum_i a_i T(n/b_i) + g(n) $$
because like the master theorem it gives you a quick way to generate the desired answer (or at least a guess that you can plug in to the recurrence to check).

(And yes, I'm aware of the generalization of A/B due to Drmota and Szpankowski)

When I started teaching grad algorithms this fall, I was convinced that I wanted to teach the Akra-Bazzi method instead of the master theorem. But I didn't, and here's why.

Let's write down the standard formulation that the master theorem applies to
$$ T(n) = a T(n/b) + f(n) $$
This recurrence represents the "battle" between the two terms involved in a recursive algorithm: the effort involved in dividing (the $a T(n/b)$) and the effort involved in putting things back together (the $f(n)$).

And the solution mirrors this tension: we look at which term is "stronger" and therefore dominates the resulting running time, or what happens when they balance each other out. In fact this is essentially how the proof works as well.

I have found this to be a useful way to make the master theorem "come alive" as it were, and allow students to see what's likely to happen in a recurrence without actually trying to solve it. And this is very valuable, because it reinforces the point I'm constantly harping on: that the study of recurrences is a way to see how to design a recursive algorithm. That decimation as a strategy can be seen to work just by looking at the recurrence. And so on.
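As a small illustration of that case analysis, here is the comparison spelled out for driving functions of the form $f(n) = \Theta(n^c)$ (my own sketch, not from any textbook):

```python
# Toy case-checker for T(n) = a T(n/b) + Theta(n^c): compare the exponent c of
# the combine cost with log_b(a), the exponent coming from the recursion tree.
import math

def master_theorem(a, b, c):
    """Return the asymptotic order of T(n) = a T(n/b) + Theta(n^c)."""
    crit = math.log(a, b)
    if c < crit - 1e-12:            # dividing dominates
        return f"Theta(n^{crit:.3f})"
    if c > crit + 1e-12:            # combining dominates
        return f"Theta(n^{c})"
    return f"Theta(n^{c} log n)"    # the two terms balance

if __name__ == "__main__":
    print(master_theorem(2, 2, 1))   # mergesort: terms balance, Theta(n log n)
    print(master_theorem(7, 2, 2))   # Strassen: dividing wins, Theta(n^2.807)
    print(master_theorem(2, 2, 2))   # combining wins, Theta(n^2)
```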

But the Akra-Bazzi method, even though it's tremendously powerful, admits no such easy intuition. The bound comes from solving the equation
$$ \sum_i a_i b_i^{-p} = 1 $$ for $p,ドル and this is a much more cryptic expression to parse. And the proof doesn't help make it any less cryptic.

Which is not to say you can't see how it works with sufficient experience. But that's the point: with sufficient experience. From a purely pedagogical perspective, I'd much rather teach the master theorem so that students get a more intuitive feel for recurrences, and then tell them about A/B for cases (like in median finding) where the master theorem can only provide an intuitive answer and not a rigorous one.
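That said, extracting the Akra-Bazzi exponent mechanically is easy enough. Here is a sketch (my own, assuming scipy) that solves $\sum_i a_i b_i^{-p} = 1$ numerically, using the median-of-medians recurrence as the example:

```python
# Numerically solve sum_i a_i * b_i^(-p) = 1 for the Akra-Bazzi exponent of
# the recurrence T(n) = sum_i a_i T(n / b_i) + g(n).
from scipy.optimize import brentq

def akra_bazzi_p(a, b, lo=-10.0, hi=10.0):
    phi = lambda p: sum(ai * bi ** (-p) for ai, bi in zip(a, b)) - 1.0
    return brentq(phi, lo, hi)   # phi is strictly decreasing, so the root is unique

if __name__ == "__main__":
    # Median of medians: T(n) = T(n/5) + T(7n/10) + Theta(n), i.e. b = [5, 10/7]
    p = akra_bazzi_p(a=[1, 1], b=[5, 10 / 7])
    print(round(p, 3))   # about 0.84 < 1, so the Theta(n) work dominates: T(n) = Theta(n)
```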


Friday, September 26, 2014

STOC 2015 Deadline: Nov 4, 2014

Via Ronitt Rubinfeld comes word that the STOC 2015 CFP is out.

Submission deadline: Tuesday, November 4, 2014, 3:59pm EST

Conference: June 14-17 2015, Portland, Oregon (part of FCRC)

Saturday, August 23, 2014

LaTeX is code...

I'm giving a talk on LaTeX this Monday as part of our new grad student "boot camp" series. It's really more of an interactive presentation: I'll use writelatex (or sharelatex) to demo examples, give students simple assignments, and use real-time chat to see how things are going. It should be quite interesting.

Here's the talk announcement:
Did you know that every time you use $..$ to italicize text, or use \\ to force a newline, Leslie Lamport cries out in agony and Don Knuth starts another fascicle of Volume IV of TAoCP ?
Come and learn about how to use LaTeX, and use it well. Your collaborators will love you, your advisors will love me, and you'll realize that the most awfully written drivel looks awesome when typeset well.
This will be interactive!! I'll be using a shared space for editing and viewing latex documents, and there will be class activities, so please do bring your laptops/tablets/other editing device so you can follow along and participate.

For this talk I solicited comments from colleagues as to what they'd like their students to learn. Probably the most useful comment I got was from +Robert Ricci and +Eric Eide: to wit,

LaTeX is code.

This might seem obvious, but once you internalize it, all kinds of other things become very natural. For example
  • You should really use some kind of IDE to write and build your documents
  • Version control is your friend
  • Sections should be separate files.
  • Text should be readable.
  • Use macros where convenient
  • Don't reinvent: use the many many built-in packages at ctan.org
  • Use tex.stackexchange.com to learn how to hack whatever you need in LaTeX.

A corollary: to see a theoretician editing LaTeX close to a STOC/FOCS/SODA deadline is to realize that theory folk are AWESOME programmers.







Tuesday, August 19, 2014

Long Live the Fall Workshop (guest post by Don Sheehy)

An announcement for the Fall Workshop in Computational Geometry, by Don Sheehy.


In all the conversation about SoCG leaving the ACM, there were many discussions about ownership, paywalls, and money. This leads naturally to questions of ideals. What can and ought a research community be like? What should it cost to realize this? Isn't it enough to bring researchers together in an unused lecture hall at some university somewhere, provide coffee (and wifi), and create a venue for sharing problems, solutions, and new research in an open and friendly atmosphere? There is a place for large conferences, with grand social events (Who will forget the boat cruise on the Seine at SoCG 2011?), but there is also a place for small meetings run on shoestring budgets that are the grassroots of a research community.

The Fall Workshop on Computational Geometry is such a meeting. It started in 1991, at SUNY Stony Brook and has been held annually every fall since. I first attended a Fall Workshop during my first year of graduate school, back in 2005. This year marks the 24th edition of the workshop, and this time, I will be hosting it at the University of Connecticut. It is organized as a labor of love, with no registration fees. There are no published proceedings and it is a great opportunity to discuss new work and fine-tune it in preparation for submission. It is perfectly timed to provide a forum for presenting and getting immediate feedback on your potential SoCG submissions. I cordially invite you to submit a short abstract to give a talk and I hope to see you there.

Important dates:
Submission deadline: Oct 3 midnight (anywhere on earth)
Conference: Oct 31-Nov 1, 2014.


Wednesday, August 13, 2014

Interdisciplinary research and the intellectual richness of data analysis

Slides on a brief introduction to themes in machine learning from an algorithms perspective, and some thoughts on the mathematical richness of the study of data.

Wednesday, August 06, 2014

A brief note on Fano's inequality

I've been bumping into Fano's inequality a lot lately, and have found the various explanations on the web somewhat lacking. Not because they aren't useful, but because their perspective is very different to the kind that I'd prefer as an algorithms person.

So after grumbling and mumbling and complaining, I decided the only solution was to write my own ! And here it is, as a raindrop.

Eh ? What's that you say ? And here we're just getting used to twitter ?

Raindrops are a web publishing form designed by the company run by our very own V. Vinay. When I was in India last I visited him in Bangalore, and he showed me the system. It's a nice way to make presentations or short lectures.

The raindrop I created is embedded directly in my blog, but can also be viewed directly at this link. I hope you like the medium, and the content !


Sunday, July 27, 2014

A response to Vint Cerf

A somewhat grumpy response to +Vint cerf 's request for feedback on how the ACM can do more for "professional programmers".

Dear Dr. Cerf
In your recent letter to the members of ACM, you write "I would like to ask readers how they satisfy their need to keep informed about computing practices and research results that may influence their own work". While I suspect your goal is to understand how ACM can serve the larger tech community rather than the research community, and I am a card-carrying member of the latter group, I thought I'd respond anyway.

First up, it's an ambitious (and brave!) idea to think that the ACM (or any single entity for that matter) can serve the needs of the vast technology enterprise. There was probably a time before the web when professional societies played an important role in collecting together people with shared interests and disseminating valuable information out to interested individuals. But now we have your current employer ! and online communities galore ! and Quora ! and the Stackexchange ecosystem ! and so many different ways for people to build communities, share information and learn about new ideas percolating through the world of tech.

It's a little funny though that you're worried about ACM's presence in the professional world. Many of us have long assumed that ACM spends most of its focus on that side of the computing community (the excellent revamp of the CACM under +Moshe Vardi being the exception that proved the rule). In fact, I'd go as far as to argue that the ACM would be much better served if it were instead to realize how it's driving itself into irrelevance in a research community that so desperately needs an institutional voice.

How do we satisfy our need to keep informed about results that might influence our work ? We (still) read papers and go to conferences. And how does the ACM help ? Well not very well.


  • Aggregating the deluge of information: anyone will tell you that the amount of research material to track and read has grown exponentially. But we still, to this day, have nothing like PUBMED/MEDLINE as a central clearinghouse for publications in CS-disciplines. The ACM DL is one step towards this, but it's a very poor imitation of what a 21st century repository of information should look like. It's not comprehensive, its bibliographic data is more erroneous than one expects, and the search mechanisms are just plain depressing (it's much easier to use Google).
  • Dealing with the changing nature of peer review and publication: Sadly, ACM, rather than acting like a society with its members' interests at heart, has been acting as a for-profit publisher with some window dressing to make it look less execrable. Many people have documented this far more effectively than I ever could.
  • Conference services: One of the services a national organization supposedly provides are the conference services that help keep communities running. But what exactly does the ACM do ? It sits back and nitpicks conference budgets, but provides little in the way of real institutional support. There's no infrastructure to help with conference review processes, no support for at-conference-time services like social networking, fostering online discussion and communities, and even modern web support. I only bring this up because all of these services exist, but piecemeal, and outside the ACM umbrella.

Underneath all of this is a slow but clear change in the overall CS research experience. The CRA has been doing yeoman service here: taking the temperature of the community every year with the Taulbee surveys, putting out a best practices document for postdocs after extensive community discussion, and even forming action groups to help gain more support for CS research from the government. Does the ACM do any of this ?

In many ways, this is a golden era for computer science, as the fruits of decades of work in our field seep out into the larger world under the guise of computational thinking, big data and learning. It's a perfect time for an organization that has deep connections in both the academic side of CS and the industrial side to help with the translation and tech transfer needed to maximize the impact of the amazing new technologies we're all developing, as well as reach out to similar institutions in other areas to bring more CS into their communities (as you rightly pointed out).


But there is no sign that ACM has ideas about how to do this or even wants to. And while it continues to chase a professional tech community that doesn't seem to care about it at all, the academics who would have cared are finding their own way.

Monday, June 09, 2014

A declaration of independence (kind of), and technology-induced disintermediation

Musings on the SoCG move to declare independence of the ACM.

It's a little odd to be sitting in Denmark while high quality rød pølse (with remoulade!) is being made across the world in Kyoto. There's an abysmal lack of blogging/tweeting/facebooking/instagramming/snapchatting/can-I-stop-now-ing from the conference.

Of course, following in the lines of the complexity conference, we're attempting to declare our own independence from the ~~EEVVIIL SKELETOR AND HIS MINIONS~~ ACM. Jeff Erickson, our ~~esteemed grand poobah~~ steering committee chair, has put out a series of posts outlining the issues at hand, the history of the matter, and the current status of discussions with SIGACT and ACM. Please visit http://makingsocg.wordpress.com to read, discuss and/or heckle as you see fit.

It might be obvious, but this new wave of devolution is being driven largely by technology - most importantly the existence of LIPIcs. Having a platform for open-access publication with minimal costs exposes as false the claims of institutional providers that they alone can provide the support needed for conferences. In fact, there are a number of related claims that appear to be not true in practice (or at least are not as obviously true as once thought).

* We need the imprimatur of an institutional provider as a signalling mechanism for quality. While it might be too soon to tell if dropping affiliation with established institutions (IEEE or ACM) will affect how publications in a venue will be perceived, there's a lot of confidence in the communities that their long-established reputation will outweigh any loss of prestige.

* Institutional providers provide quality editorial and archival services necessary for a serious venue. I think juxtaposing 'quality' and 'editorial' together with Sheridan Printing might cause my two remaining readers to die of hysterical laughter. But the archival issue is a good one. LIPIcs appears to be funded solidly by the German government for a while, and will take a fixed fee from a conference for publication (capped at 750 €). But the Arxiv was struggling for a while. Should we view the ACM and IEEE as "more likely to last" than any other entity ?

* Institutional providers provide the backing and clout needed for conference organization, hotel bookings and so on. This is another good example of a time-money tradeoff. Organizations like SIAM actually do take over the management of the conference: while the results are not always optimal, there is a clear reduction in hassle for the conference organizers. But organizations like the ACM don't take things over in the same way (and from what I can tell, neither does IEEE). I'm surprised though that there aren't yet lean-and-mean event planning providers that we can just pay money to and make our planning problems go away.

* Institutional providers have the financial wherewithal to manage cycles in revenue streams for a conference. This is another real issue. Conferences that have gone independent have eventually managed to maintain a steady income stream, but theory conferences are smaller and poorer: it remains to be seen whether we can generate the kind of endowment needed to insulate the community against the natural variation in revenue from year to year.

What's disappointing is that none of this had to play out this way.

  • Take LIPICs for example: they clearly marked out their scope -- indexing, archiving functions and hosting -- while staying away from the more content-driven aspects of the process (editing, proofing etc). This makes a lot of sense, given that everyone who publishes knows how to use LaTeX and style files, but might still not be able to make a web page. Why couldn't the ACM have realized this and provided a slimmed-down publishing service ?
  • Why do we go to Microsoft (or Easychair, or hotCRP, or Shai Halevi's software) for our conference submission servers ? If the ACM had provided a service of this kind (or even provided hosting for hotCRP/Shai's software), we'd be happily using it right now, and it could have then tied nicely into Regonline, that ACM already partners with.
  • A lot of the current angst seems to have tone as a root cause: a certain feeling about the provider's attitude towards the conference. This is again something that could have been recognized and addressed before things got to this stage.
While it's exciting to be part of the "academic spring" (?), people tend to forget that in all revolutions someone gets hurt, and often things don't get better for a long time. I'm intrigued by our attempt to move towards independence though, and the people involved have thought this through very carefully.


Tuesday, May 20, 2014

On beauty and truth in science.

Philip Ball writes a thought-provoking article in Aeon with the thesis that the kind of beauty that scientists describe does not necessarily match the aesthetic notions of art, and is not even consistent among scientists.

It was hard for me to get beyond the casual conflating of beauty in mathematics (the four-color theorem, the proof of Fermat's theorem, and proofs in general) and beauty in scientific theories (relativity, evolution, and so on). But if one goes beyond the artificial duality constructed by the author, the idea of beauty as a driver in science (and mathematics) is a rich one to explore.
A particular example: for a long time (and even codified in books) it was taught that there were five natural classes of approximation hardness: PTAS, constant factor-hard, log-hard, label-cover (superlogarithmic) hard, and near-linear hard. There were even canonical members of each class.

Of course, this nice classification no longer exists. There are even problems that are $\log^* n$-hard to approximate, and can also be approximated to that factor. And to be fair, I'm not sure how strong the belief was to begin with.

But it was such a beautiful idea.

At least in mathematics, the search for the beautiful result can be quite fruitful. It spurs us on to find better, simpler proofs, or even new ideas that connect many different proofs together. That notion of connection doesn't appear to be captured in the article above: that beauty can arise from the way a concept ties disparate areas together.

MADALGO Summer School on Learning At Scale

I'm pleased to announce that this year's MADALGO summer school (continuing a long line of summer programs on various topics in TCS) will be on algorithms and learning. The formal announcement is below, and registration information will be posted shortly.

Save the date ! Aug 11-14, 2014.
MADALGO Summer School on
LEARNING AT SCALE
August 11- 14, 2014, Aarhus University, Denmark

OVERVIEW AND GOAL
The MADALGO Summer School 2014 will introduce attendees to the latest developments in learning at scale. The topics will include high dimensional inference, algorithmic perspectives on learning and optimization, and challenges in learning with huge data.
LECTURES
The school will be taught by experts in learning:
  • Amr Ahmed (Google)
  • Mikhail Belkin (Ohio State)
  • Stefanie Jegelka (Berkeley)
  • Ankur Moitra (MIT)
PARTICIPATION
The summer school will take place on August 11-14, 2014 at Center for Massive Data Algorithmics (MADALGO) at the Department of Computer Science, Aarhus University, Denmark. The school is targeted at graduate students, as well as researchers interested in an in-depth introduction to Learning. Registration will open soon at the school webpage. Registration is free on a first-come-first serve basis - handouts, coffee breaks, lunches and a dinner will be provided by MADALGO and Aarhus University.
ORGANIZING COMMITTEE
  • Suresh Venkatasubramanian (University of Utah)
  • Peyman Afshani (MADALGO, Aarhus University)
  • Lars Arge (MADALGO, Aarhus University)
  • Gerth S. Brodal (MADALGO, Aarhus University)
  • Kasper Green Larsen (MADALGO, Aarhus University)
LOCAL ARRANGEMENTS
  • Trine Ji Holmgaard (MADALGO, Aarhus University)
  • Katrine Østergaard Rasmussen (MADALGO, Aarhus University)
ABOUT MADALGO
Center for Massive Data Algorithmics is a major basic research center funded by the Danish National Research Foundation. The center is located at the Department of Computer Science, Aarhus University, Denmark, but also includes researchers at CSAIL, Massachusetts Institute of Technology in the US, and at the Max Planck Institute for Informatics and at Frankfurt University in Germany. The center covers all areas of the design, analysis and implementation of algorithms and data structures for processing massive data (interpreted broadly to cover computations where data is large compared to the computational resources), but with a main focus on I/O-efficient, cache-oblivious and data stream algorithms.


AltStyle によって変換されたページ (->オリジナル) /