As I've been exploring the OSM Rails app for other data, Git has hovered in the background of my thoughts, and I've been watching GeoGit and GitHub Geo Features closely. The conceptual basis of Git, distributed version control, solves issues we come up against regularly in OpenStreetMap, like how to keep an "authoritative" data source and community data in sync, or how to support offline editing in areas with bad or non-existent connectivity (something to explore with BRCK perhaps). As Jeff Johnson says, "OSM is a geodata repository with just a single branch".
Chris Holmes gives a thorough recap of BoundlessGeo's rationale and work so far with GeoGit (part 1, part 2), including experiments with using Git itself. Git is built around managing revisions of individual files, and hits performance issues with very large files or very large numbers of directories (which early GeoGit experimented with, using a directory hierarchy to support quad-tree indexing). So they worked to decouple Git's set of verbs from its backend, implement those concepts on top of spatial databases, and provide special verbs for interacting with OpenStreetMap (or perhaps even OSM clones). Perhaps that's comparable to the way Git integrates with svn.
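To make the "directory hierarchy as quad-tree" idea concrete, here is a minimal sketch of how a point could be mapped to a nested directory path, one quadrant digit per level. This is only an illustration using the common Web Mercator tile and quadkey convention; the function names are hypothetical and I'm not claiming this is the layout GeoGit actually used.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Convert lon/lat (degrees) to Web Mercator tile x/y at a zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that edge values such as lon == 180 stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def quadkey(x, y, zoom):
    """Interleave the bits of x and y into one base-4 digit per zoom level."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def quadtree_path(lon, lat, zoom):
    """Hypothetical directory path: one directory per quad-tree level."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return "/".join(quadkey(x, y, zoom))
```

Every extra zoom level multiplies the number of possible leaf directories by four, which gives a feel for how quickly a scheme like this runs into the "very large numbers of directories" problem.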
The work looks really promising, though they are still working on the internal technical challenges, and they've set the bar high: to fork the entirety of OSM, including all history! They're tremendously talented, so I expect they can get there. But what then? Git without the interface and social features of GitHub is a frustrating experience. Replicating that kind of community space for GeoGit is another very tall order, and I expect it's not the first order of business for a GIS-oriented customer base.
Now, if interacting with OSM clones is a core part of GeoGit, then perhaps we can simply use the OSM application ecosystem for editing and socializing on an individual branch. In that case, it makes sense to invest effort in improving OSM website features for Moabi, and to leverage any future interoperability with GeoGit itself.
"GitHub Geo" takes another approach. Accept the file limitations of git, GitHub and browser client displays (split up large files, if needed, etc). For rendering GeoJSON on GitHub, the limits seem to be in the 5-10 MB range. They seem to be loading the data as a GeoJSON layer in Leaflet/MapboxJS, so a performance improvement would be rendering of that data into map tiles and utf8grids. I would guess that there's already been thinking into what kind of infrastructure would be needed to support that, and they're watching uptake of Geo features before making that kind of investment.
The approach takes full advantage of GitHub's social functions, and there have been some fun examples of this; it will be great to see if the city of Chicago will accept pull requests. Just as important, the GitHub API can be used to build applications that interact with GeoJSON files. This is what MapBox has been doing with Prose.IO to edit GitHub Pages sites, and their GeoJSON.io provides an excellent editing environment for geodata coupled to GitHub, and would be the first mapping example of this pattern. (And as a mindbender, the entire GeoJSON.io site is hosted on GitHub Pages.) GeoJSON.io also has performance limits, probably similar to what we've seen with iD (I assume some of the internals are similar, but I haven't looked).
Another service matching this pattern, launched this week, is GitSpatial, which selectively syncs GeoJSON files in your repos and provides a simple geospatial query API, for example an API for Kenyan constituency boundaries, or counties near Nairobi.
This pattern could conceivably be used to build a rendering service that builds those tile sets and grids, or vector tiles, from very large GitHub GeoJSON files. Or a combination of GitSpatial's indexing and geo API with GeoJSON.io could provide the ability to edit a select area of a large file and commit just those changes. That would, I suppose, require GitSpatial to reassemble the GeoJSON features in the order they were received, or to use some convention for ordering features based on a geographic index, and then commit a diff to that file. Another useful service would be visualizing geographic diffs, something that OSM itself (or anything really) doesn't do particularly well. Though that feature is something I could expect from GitHub soon, seeing that they are now doing 3D file diffs.
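The raw material for a geographic diff can be sketched quite simply, assuming every feature carries a stable top-level "id" (which GeoJSON allows but does not require). The function name is hypothetical, and this is a feature-level comparison only; it says nothing about how the geometry changed, which is the part that really needs visualizing.

```python
def diff_features(old, new):
    """Compare two GeoJSON FeatureCollections feature by feature.

    Assumes each feature has a unique, stable "id" and that all ids are
    mutually comparable (for example, all strings).
    """
    old_by_id = {f["id"]: f for f in old["features"]}
    new_by_id = {f["id"]: f for f in new["features"]}
    added = sorted(new_by_id.keys() - old_by_id.keys())
    removed = sorted(old_by_id.keys() - new_by_id.keys())
    # A feature counts as modified if its geometry or properties differ at all.
    modified = sorted(
        key for key in old_by_id.keys() & new_by_id.keys()
        if old_by_id[key] != new_by_id[key]
    )
    return {"added": added, "removed": removed, "modified": modified}
```

Because the comparison is keyed by id rather than by position in the file, it does not depend on the features being written back in their original order.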
Large geographic data collaboration today
This is all so new and untested and, so far, not really built with large data sets in mind, unlike the OSM architecture, API and ecosystem, which can pretty solidly handle loads of data and provide lots of services. It's hard to get this kind of glimpse of the future while also having the needs of today to grapple with. For now, I reckon OSM is a really good place to experiment and build, while we keep a close eye on these other approaches.