Tom Lord - Diagnosing Subversion

People have recently said things here along the lines of "svn fails to
significantly improve upon CVS and, to the degree it does, meta-CVS
and dcvs do the same job in a better way" (I pretty much agree) and
"it looks like an ego driven project" (perhaps, but then I'd like to
think that arch is a pride driven project and ultimately, isn't that
just a slight difference in spin?).
I've thought a lot about "what went wrong" with svn (and take it as
axiomatic, on this list, that _something_ went wrong) for two reasons:
(1) like Bob, I really tried to like svn; (2) as I started to think
about "what went wrong" -- it seemed like what went wrong was a bunch
of mistakes of exactly the sort that I am inclined towards myself and
therefore have to actively resist: there, but for the grace of
something, stand I.
Here's what I think went wrong. This is just my unscientific
impression based on following news of the project over the years.
A) It started with a brilliant idea for a hack: a transactional,
 write-once, versioned, hierarchical filesystem database.
 Around the time svn started, that idea was "going around" -- I 
 even had my own version for a little while.
 As an abstract data structure, that kind of database is a neat
 thing with many potential applications. If you ever spend time
 trying to write robust data-intensive apps on top of a unix
 filesystem without using a database, you really long for that kind
 of functionality.
 Moreover, it's _conceptually_ simple to implement: it's essentially
 just trees written in a functional (side-effectless) style. To
 "modify" a tree, you build a new tree, sharing unmodified nodes
 with the previous tree. Seems relatively cheap and transactions
 just fall out of that nearly for free.
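 To make that concrete, here's a toy sketch of the functional-tree
 style (Python, with made-up names -- nothing like svn's actual
 code): "modifying" a file rebuilds only the nodes on the path from
 that file up to the root, and everything else is shared.

     class Dir:
         """A write-once directory node: name -> Dir or file bytes."""
         def __init__(self, entries):
             self.entries = dict(entries)

     def set_path(root, path, content):
         """Return a NEW root with `path` bound to `content`;
         the old root is never touched."""
         new = dict(root.entries)
         if len(path) == 1:
             new[path[0]] = content
         else:
             child = new.get(path[0], Dir({}))
             new[path[0]] = set_path(child, path[1:], content)
         return Dir(new)

     rev1 = Dir({"trunk": Dir({"README": b"v1"})})
     rev2 = set_path(rev1, ["trunk", "README"], b"v2")
     assert rev1.entries["trunk"].entries["README"] == b"v1"
     assert rev2.entries["trunk"].entries["README"] == b"v2"

 A "transaction" is then just: build the new tree off to the side
 and atomically publish it as the head revision -- which is where
 the "nearly for free" feeling comes from.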
 So here's the first mistake: the idea of a transactional FS is 
 like a shiny new hammer. It's pretty natural to let it possess
 you and start running around looking for nails.
B) It took off from there with an underdeveloped notion of revision
 control.
 Suppose you have the same intuition that Walter expressed a while
 back, which I'll paraphrase as: "The first and most fundamental
 task of a revision control system is to take snapshots of working
 directories." 
 If you don't believe that that's a seductive (even though wrong)
 intuition, go back and look at how I replied. It took many, quite
 abstract paragraphs. What revision control is really about
 (archival, access, and manipulation of changesets) is subtle and
 _non_-intuitive. (Anecdotally, a few years before arch, I made an
 earlier attempt at revision control based on, guess what:
 snapshotting.) What's worse is that a set of working tree
 snapshots combined with a little meta-data is a kind of dual space
 to the kinds of records rev ctl is really about (they're logically
 interconvertible representations). Anything you say to a
 snapshotting fan about what you want to do with a
 changeset-librarian orientation they can reply to with "Yeah, but
 we could do that, too." So it's not even that the snapshot
 intuition is completely wrong: it's just putting an emphasis on the
 wrong details.
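 To see why the two views interconvert, consider this toy sketch
 (Python, hypothetical -- a real system also has to handle renames,
 directories, and merge ancestry): snapshots are dicts, changesets
 are diffs of adjacent snapshots, and either representation can be
 recovered from the other.

     def diff(old, new):
         """The changeset taking snapshot `old` to snapshot `new`."""
         return {
             "added":   {p: c for p, c in new.items() if p not in old},
             "removed": [p for p in old if p not in new],
             "changed": {p: c for p, c in new.items()
                         if p in old and old[p] != c},
         }

     def apply_cs(snap, cs):
         """Replay a changeset on top of a snapshot."""
         out = {p: c for p, c in snap.items()
                if p not in cs["removed"]}
         out.update(cs["added"])
         out.update(cs["changed"])
         return out

     s1 = {"a.c": "int x;", "b.c": "int y;"}
     s2 = {"a.c": "int x = 1;", "c.c": "int z;"}
     assert apply_cs(s1, diff(s1, s2)) == s2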
 Now the transactional filesystem DB takes snapshots handily. It's
 ideal for that. So if you have the snapshot-intuition, and the
 transactional fs hammer -- you're apt to leap to a wrong
 conclusion: you've solved the problem!
 And if, as some of the original svn contributors were, you're
 coming from hacking CVS and its screwy (historically constrained)
 repository format, an apparent escape route from that mess is just
 going to strengthen your convictions.
 Second mistake: The assumption that "a filesystem DB pretty much
 solves revision control -- all the rest is just a small matter of
 hacking".
C) It underwent fuzzy design conceptualization.
 I infer from some of the design documents and other materials that,
 early on, there must have been some bull sessions to plan out
 how svn would work. As an example, "history-sensitive merging" 
 has been part of the plan (such as it is) for as long as I've been
 aware of the project.
 Whatever planning there was: it didn't nail the details. Instead,
 it reduced a lot of problems, in a sort of hand-wavy manner, to
 meta-data mechanisms. I'm guessing (and inferring from docs), for
 example, that somebody straw-manned an intelligent merge operator,
 never really worried about its limitations, but worried more about 
 what kind of meta-data it needed. Since functionality like file 
 properties seemed more than adequate to record that meta-data,
 the problem was considered reduced to "a small matter of hacking".
 Well, that's the problem with some design patterns like attaching
 property lists to everything under the sun: they don't really
 solve design problems but they give you an operational language
 in which to _restate_ design problems. It's sometimes very hard to
 recognize the difference between a restatement of a design problem
 in operational terms and its actual solution. Application of
 patterns like property lists in a design bull session all too easily
 gives rise to the feeling that "all the problems we're thinking
 about have natural solutions in this design" even though all you're
 really saying is "the problems we need to solve can be expressed
 in terms of associative lookup".
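 For instance (a made-up sketch, not svn's actual property API):
 recording merge history as properties is trivial, and it feels
 like progress, but the merge algorithm that would have to consume
 that data is exactly as unwritten as before.

     properties = {}   # (path, revision) -> {prop_name: value}

     def set_prop(path, rev, name, value):
         properties.setdefault((path, rev), {})[name] = value

     # The easy, associative-lookup half of "history-sensitive merge":
     set_prop("trunk/foo.c", 42, "merged-from", "branches/feature@40")

     def history_sensitive_merge(path, rev):
         ancestry = properties.get((path, rev), {}).get("merged-from")
         # ...and here the actual design problem starts, untouched:
         # which changes does that ancestry let us skip? what's the
         # common ancestor? what about criss-cross merges?
         raise NotImplementedError("the property list didn't solve it")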
 Third mistake: insufficient skepticism about their own design 
 sketches, early on.
D) Narrow design focus combined with grand technology ambitions
 The original contributors included people who worked on CVS,
 people who used CVS, and people working on products that 
 incorporate CVS. In some sense, the itch they must have had
 in common was "Get this CVS monkey off my back; I'm sick of it."
 At the same time, they (justifiably) had their eyes on a real
 treasure -- that transactional filesystem database.
 In that context, it'd be hard to get behind the idea of just
 incrementally fixing CVS. It'd be hard to invent meta-CVS, for
 example.
 As the project has progressed, over the years, those conflicting
 goals have tended to be resolved in the "get 1.0 out the door"
 direction -- a scaling back of functionality ambitions in the
 direction of CVS necessitated by the degree of difficulty of the
 grand technology ambition.
 Fourth mistake: conflicting goals at opposite ends of the ambition
 spectrum -- the low end ultimately defining official success, the
 high end providing the personal motivations.
E) Leaping to unstable proto-standards.
 SVN came into being at a time when it looked to many like HTTP and
 Apache were the spec and best implementation of the new distributed
 OS for the world that would solve everything beautifully. There
 was a kind of dog-pile onto everything W3C based on irrational
 exuberance. Well, they weren't that OS and they don't solve
 everything beautifully.
 Fifth mistake: jumping on the W3C bandwagon.
F) The API Fallacy
 When you lack confidence about your intended way to implement
 something, a common pattern is to decide to hide the implementation
 under an API. That way you can always change the implementation
 later, right?
 The problems are: (1) unless you have at least one fully worked
 design for how to implement your API, you shouldn't have any
 confidence that good implementations can exist; (2) unless you have
 at least two fully worked designs for how to implement your API,
 and they make usefully contrary trade-offs, you should really start
 to wonder whether doing extra work to make an abstraction here is
 the right way to proceed.
 Sixth mistake: assuming that defining APIs all over the place 
 would improve chances for success.
G) Collision with reality.
 The transactional filesystem idea is, indeed, conceptually simple
 and elegant. The reality of implementing it, however, is a swamp
 of external considerations and inconvenient realities.
 Suppose you want to achieve high transaction rates and
 size-scalability. You have a _lot_ to consider: locking
 (contention over the root of the tree is especially fun),
 deadlocks, logging, crash recovery, physical layout of data, I/O
 bottlenecks, network protocols, etc. etc.
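 Why is contention over the root especially fun? In a write-once
 tree, every commit rewrites the path from its changed files up to
 the root, so any two concurrent commits collide at the root
 pointer no matter how disjoint their changes are. A toy sketch of
 that bottleneck (Python, made-up names again):

     import threading

     class Repo:
         def __init__(self, root):
             self._lock = threading.Lock()
             self.head = (0, root)           # (revision, root node)

         def commit(self, build_new_root):
             while True:
                 rev, root = self.head
                 new_root = build_new_root(root)  # pure; parallelizable
                 with self._lock:                 # ...this part isn't
                     if self.head[0] == rev:
                         self.head = (rev + 1, new_root)
                         return rev + 1
                 # lost the race: rebuild against the new head and retry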
 In short, implementing a really good, high-performance,
 transactional fs is an undertaking comparable in scope and
 complexity to implementing a really good, high-performance,
 relational database storage manager -- only while there's tons of
 literature and experience about RDB implementation, transactional
 filesystems are fresh territory. (As an aside, if you were to
 seriously undertake to make a transactional FS, I think you would
 not want to burden yourself with the extra work of concurrently 
 building a revision control system on top of it -- give that task
 to a separate team after you have something working.)
 Wanting to make progress simply and quickly, they spotted the
 Berkeley DB library. After all, it provides transactions with
 ACID properties for our favorite handwavy design tool -- the
 associative lookup table. As we all know, design problems can be
 "solved" simply by restating them in terms of associative tables.
 And anyway, even if Berkeley DB isn't the _best_ choice to
 implement this, it'll be the fastest way to get something working,
 and anyway it'll be hidden behind an API.
 Well, I think Berkeley DB is a lousy choice for this application.
 It creates administrative headaches, and it's optimized for simple
 associations, not hierarchical filesystems. It doesn't natively
 provide any sort of delta-compression -- you'll have to layer that.
 Ultimately _all_ that it buys you is transactions and locks --
 every other aspect is a force-fit.
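 Here's roughly what "layer that yourself" means -- a toy delta
 store over a plain associative table (Python; the scheme is
 hypothetical, svn's real format differs, and the hard part is
 doing this robustly inside BDB's transactions and logs):

     import difflib

     CHECKPOINT = 16
     store = {}   # (path, rev) -> ("full", text) or ("delta", ops)

     def make_delta(old, new):
         sm = difflib.SequenceMatcher(a=old, b=new)
         return [(tag, i1, i2, new[j1:j2])
                 for tag, i1, i2, j1, j2 in sm.get_opcodes()]

     def apply_delta(old, delta):
         return "".join(old[i1:i2] if tag == "equal" else repl
                        for tag, i1, i2, repl in delta)

     def put(path, rev, text):
         if rev % CHECKPOINT == 0:
             store[(path, rev)] = ("full", text)    # periodic fulltext
         else:
             store[(path, rev)] = (
                 "delta", make_delta(get(path, rev - 1), text))

     def get(path, rev):
         kind, payload = store[(path, rev)]
         if kind == "full":
             return payload
         return apply_delta(get(path, rev - 1), payload)  # replay chain

     put("a.c", 0, "int x;\n")
     put("a.c", 1, "int x = 1;\n")
     assert get("a.c", 1) == "int x = 1;\n"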
 And what resulted? Sure enough: years of fighting against
 excessive space consumption, disk-filling log files, and poor
 performance, characterized by substantial rewrites of core
 functionality and API changes.
 A similar mistake happened with network protocols: W3C solves
 everything from authentication to proxying to browsers-for-free;
 WebDAV just sweetens the deal, right? Except, no, as with Berkeley
 DB for physical storage, a lot is left out and, again, the result
 has been years of rewrites and API changes trying to get somewhere
 in the neighborhood of good performance, plus lots of dependencies
 on unstable externally developed libraries.
 Seventh mistake: underestimating the degree of difficulty of a 
 transactional FS.
 Eighth mistake: overconfidence in dubiously selected building
 blocks.
H) Failure to Fail
 If a team went away for six months and came back with SVN as it
 works today, I think it'd be pretty easy to say: "That's a great and
 even useful prototype. It definitely proves, at least
 schematically, the idea of using a transactional FS to back
 revision control. There are clearly some bad choices in the
 implementation, and clearly some important neglected revision
 control issues that competing projects are starting to leapfrog you
 over. And there's a heck of a lot of code here for what it does.
 Let's suspend development on it for a little while, and invest in
 a design effort to see how to get this thing _right_".
 That isn't the situation, though. A team spent _years_ on this
 and they justified the project institutionally and publicly not by
 saying "let's build a proof of concept" but by saying "let's
 replace CVS".
 And, sure enough, if you suggest to them a stop-and-think phase at
 this late date you get back, basically: "Um...no...not gonna
 happen."
 I won't label that one of their mistakes because I don't think
 the root cause is something they could have easily avoided.
 I'll label it:
 First bad circumstance: crappy socio-economic circumstances for
 smart, ambitious programmers in the free software industry and
 community -- way too much weight given to what supposedly
 successful projects "smell like" and too much resulting pressure
 on hackers to project an image resembling that false idol.
So, in summary, I don't think they're a bunch of egomaniacs or
anything. I think they rushed into something that looked like it
would be easier than it is, got boxed in by the mythologies of open
source and W3C, and now have way too much momentum and too many
constraints to do much about it.
What's really disappointed me most, though, is that, while I do
perceive them as smart and ambitious, they don't seem terribly
open-minded about stepping back to review their project for deep
structural mistakes that need fixing. My sense is that most of them
are pretty young and several have been associated with some successful
projects (like Apache and CVS). Good young programmers, since they
tend to be capable of so much more than their average peers, often
fall into a pit of overconfidence that is hard to recognize from the
inside until you've experienced a few disasters. The situation is
made worse since there's so little effective mentoring in the industry
from old salts who are good at making a religion of the
K.I.S.S. principle and making fun of the wealth of bloated, crappy,
yet slow-to-fail stall-ware projects that dominate so much of the
landscape. If you ask me, explosive growth during the dot-com bubble
really blunted the technology edges of the free software movement and
our industry generally. It left us collectively struggling to do
things the hard way, svn being just one small example.
-t
