Showing posts with label git. Show all posts

Thursday, December 4, 2008

What makes the Camp and Darcs VCSs unique

In short, they do not necessarily view a VCS repository from a purely chronological/historical perspective, but from a change perspective.

[Embedded object: http://www.youtube.com/v/iOGmwA5yBn0&color1=0xb1b1b1&color2=0xcfcfcf&hl=en&feature=player_embedded&fs=1]

From the Camp website via reddit.

Wednesday, November 12, 2008

Jungle Disk R.I.P.

In the past I have tried to manage my files by storing them in Git and keeping a central repository on Amazon S3 via Jungle Disk. I was using this system from two different Macs and a Linux machine and when it worked, it worked well, but the intermittent corruption/synchronisation issues were a killer.

My fallback plan was to store the central repositories on encFS on a USB memory stick instead of Jungle Disk over S3. I tried with the USB stick formatted as FAT32 and then ext3 but that also had intermittent failures.

Over the last few weeks I have ditched the Linux machine and stored the repositories in an encrypted sparse disk image on the USB stick formatted as Mac OS Extended (Journaled). It has worked very well so far!

Wednesday, August 20, 2008

File Management

I need to access my personal files in different locations such as home, work office or when traveling. I wish to be able to move between a number of trusted computers, rather than being tied to a specific laptop, as I like to cycle to and from work. I also don't want the hassle of running my own server at home (have done this in the past).

So my first step in solving this is to store all my files in a Distributed Version Control System. I get all the benefits of a common centralised VCS such as version history and version management between multiple machines, as well as being able to work normally when I don't have an internet connection (e.g. when travelling). I have chosen Git, although Mercurial would probably be a fine choice too.

The second step is having a location for a master repository that is accessible over the internet. I could have purchased a hosted linux virtual machine, but I didn't want to deal with setting it up, security, software upgrades, etc. Git can synchronise repositories located at different points on the same file system, so I thought I would try a locally mounted, encrypted virtual file system over Amazon S3. I chose JungleDisk for this purpose.

As it is only me using these Git repositories, I only have one machine writing to the master at any one time, so I don't have to worry about concurrency issues. Secondly, whenever I clone a repository from the master, I use the --no-hardlinks option, although I am not sure if that is necessary.
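The workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: temporary directories stand in for the mounted JungleDisk volume and for the individual machines, and the names master.git, seed, laptop and desktop, as well as the commit identity, are made up for the example.

```shell
#!/bin/sh
# Sketch only: temporary directories stand in for the mounted volume and the machines.
set -eu

# Hypothetical identity so that commits work in a clean environment.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

root=$(mktemp -d)
mkdir -p "$root/mounted-volume"
master="$root/mounted-volume/master.git"

# Seed a repository and publish it as the bare master on the mounted volume.
git init -q "$root/seed"
cd "$root/seed"
git symbolic-ref HEAD refs/heads/master   # use "master" whatever the local default is
echo "first note" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
git clone -q --bare --no-hardlinks "$root/seed" "$master"

# Each machine clones from the master by file system path.
git clone -q --no-hardlinks "$master" "$root/laptop"
git clone -q --no-hardlinks "$master" "$root/desktop"

# Work on one machine and push to the master.
cd "$root/laptop"
echo "second note" >> notes.txt
git commit -q -a -m "Add a second note"
git push -q origin master

# Later, on the other machine, pull the new commit from the master.
cd "$root/desktop"
git pull -q --ff-only origin master
cat notes.txt
```

Only one clone pushes at a time here, which mirrors the single-writer assumption above.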

In principle the ideas have worked out pretty well. I have run into some issues though. From minor to major:

  • S3 has been unavailable on two occasions when I tried to access it in the last three months.

  • Sometimes I have had errors pulling (synchronising) from the master. Recreating the local repository by cloning it again from the master has solved these issues. This may also be similar to the next one.

  • I have had a case where I don't get any errors pulling from the master, but I don't get the latest commits pushed from another machine either. This one has been a real pain. In the process of getting everything back to a stable state, I updated to Git 1.6.0, JungleDisk 2.10a, deleted my local JungleDisk caches and reduced the cache size down to the minimum (I would have liked to turn caching off altogether). I suspect the JungleDisk caching was the issue, but that is only a guess. Will see how things go over the next few weeks.

I now don't need backups from a file deletion point of view, as the VCS takes care of that (I am not using any of the Git features that modify history). I also keep a subset of the machines synchronised on a daily basis, so I don't need backups from a hardware failure/lost/stolen perspective either.

Tuesday, July 8, 2008

Music Management

After going to all the trouble of ripping and encoding my CDs to a lossless format, I want to:

  • Ensure integrity of the music library, i.e. at any point be able to validate that all the files exist, their contents haven't changed and that there are no extra files.

  • Have a recovery strategy should there be a problem with the files.

Ideally, I would satisfy these requirements by placing all the music in a Distributed VCS and storing a master copy somewhere like S3. Unfortunately there are a couple of problems:
  • I tried out Git, but after the initial commit of a music file, the repository storage space on the filesystem took up twice the size of the music file. Furthermore, changing metadata such as fixing a spelling mistake in the track name and committing increases the repository by the full size of the file again. I assume this is because the files are binary and already compressed. I didn't try out Mercurial, but I expect it will be the same.

  • The music files are already large, even without the extra overhead of the previous point and the data transfer costs here in Australia are just too high.
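The repository growth described in the first point can be measured with a small experiment like the one below. It is a sketch, not the original test: random bytes stand in for an already-compressed music file, appending a few bytes stands in for a metadata edit, and the file name track.bin and the commit identity are made up.

```shell
#!/bin/sh
# Sketch only: measures loose-object growth after a small edit to a large binary file.
set -eu

# Hypothetical identity so that commits work in a clean environment.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q .

loose_kib() {
    # The "size" field is the disk space used by loose objects, in KiB.
    git count-objects -v | awk '/^size:/ { print $2 }'
}

# Random bytes stand in for an already-compressed music file.
head -c 1000000 /dev/urandom > track.bin
git add track.bin
git commit -q -m "Add track"
before=$(loose_kib)

# Stand-in for a small metadata edit: the file changes only slightly.
printf 'fixed title' >> track.bin
git commit -q -a -m "Fix track title"
after=$(loose_kib)

echo "loose objects: ${before} KiB before the edit, ${after} KiB after"
```

This looks at loose objects only. Packing the repository with git gc may change the picture, which the sketch does not attempt to show.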

My current solution:
  • Store the music library on a removable drive on the Mac at home.

  • Keep a copy of the music library on my computer at work by either periodically taking in the removable drive and using rsync or copying newer music onto a USB drive if physical space is at a premium, such as when cycling.

  • Put checksums of the files in a Git repository stored on both machines. I can then verify the integrity of a music library at any time. Currently I use md5deep because it can recursively process a directory tree and is available for both Linux and Mac OS X. The default md5 program on the Mac does not seem to have the same feature set as md5sum on Linux.

  • I also store FLAC fingerprints in the Git repository. FLAC files store a checksum of the uncompressed audio in the metadata and various tools, such as xAct on the Mac, can verify the file against that. I am not sure how useful storing the fingerprints is, but I can think of a few unlikely situations where it might be helpful, plus it is small and easy to generate anyway.

To verify a music library, I do:

$ cd "$MUSIC_LIBRARY"
$ md5deep -rl * | sort | diff "$GIT_REPO/md5deep.txt" -


where $MUSIC_LIBRARY and $GIT_REPO represent appropriate file paths.

I originally tried the matching feature of md5deep instead:

$ cd "$MUSIC_LIBRARY"
$ md5deep -rX "$GIT_REPO/md5deep.txt" *


However this does not catch the case where a file has been deleted in the music library but is still present in the Git checksum file.
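The difference between the two approaches can be seen in a small self-contained sketch. It assumes GNU md5sum and find as a stand-in for md5deep (so the manifest format differs from the one above), and the temporary directory and file names are made up. Comparing a freshly generated sorted listing against the stored one with diff reports a deleted file, along with changed and extra ones.

```shell
#!/bin/sh
# Sketch only: md5sum + find stand in for md5deep; all paths are temporary.
set -eu

library=$(mktemp -d)    # stand-in for $MUSIC_LIBRARY
manifest=$(mktemp)      # stand-in for $GIT_REPO/md5deep.txt
report=$(mktemp)        # holds the diff output

printf 'track one\n' > "$library/01.flac"
printf 'track two\n' > "$library/02.flac"

checksums() {
    # One "hash  ./path" line per file under $1, in a stable order.
    (cd "$1" && find . -type f -exec md5sum {} + | LC_ALL=C sort)
}

# Record the known-good state.
checksums "$library" > "$manifest"

# Simulate a file going missing from the library.
rm "$library/02.flac"

# diff exits non-zero when the current listing differs from the manifest.
if checksums "$library" | diff "$manifest" - > "$report"; then
    status=clean
else
    status=differs
fi

echo "library $status"
cat "$report"
```

Lines starting with "<" in the report exist only in the stored manifest (deleted or changed files), and lines starting with ">" exist only in the library (new or changed files).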

Tuesday, May 6, 2008

Convert and Filter Subversion to Git

The Challenge
I have one large (25GB) Subversion repository that partly has a structure like this:


/brad
  /docs
    /finances
    /foo
    ...
I wish to convert the docs subtree (including history) into its own Git repository, without the foo directory.

The Solution
One way to achieve this would have been to dump the repository, filter the history, create a new repository, load the filtered history and then convert with git-svnimport.

Instead, I did the following:

1. Convert the docs subtree into a Mercurial repository, excluding the foo directory.
$ hg convert --filemap filemap --config convert.hg.usebranchnames=False file:///path/to/svnrepos/brad/docs docs-hg

filemap is a file in the current directory with only one line in it:
exclude foo

The effect of --config convert.hg.usebranchnames=False is to import onto the default branch in Mercurial. Without it, a docs branch would have been used and carried over to Git in the subsequent steps. I want the final Git repository to have just the conventional master branch.

2. Convert the Mercurial repository to Git.
$ mkdir docs
$ cd docs
$ git init


I installed Mercurial via MacPorts, so to get fast-export to work, I needed to use the right Python:
$ export PYTHON="/opt/local/bin/python2.5"
$ /path/to/fast-export/hg-fast-export.sh -A ../authors.txt -r ../docs-hg


The -A ../authors.txt option simply maps each Subversion commit username to a normal Git author format, the same as with git-svnimport.

$ git checkout master

3. Remove the intermediate Mercurial repository:
$ cd ..
$ rm -rf docs-hg


I did a diff of the docs subdirectories in the Subversion and Git working copies and did a quick check of the history. Looks like it worked successfully.
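A comparison of that kind might look like the sketch below. It is an assumption about how such a check could be done, not the exact command used: the directories are temporary stand-ins for the two working copies, and the metadata directories and the excluded foo directory are skipped with -x.

```shell
#!/bin/sh
# Sketch only: temporary directories stand in for the Subversion and Git working copies.
set -eu

root=$(mktemp -d)
svn_wc="$root/svn-wc/docs"   # stand-in for docs in the Subversion working copy
git_wc="$root/docs"          # stand-in for the Git working copy

mkdir -p "$svn_wc/.svn" "$svn_wc/foo" "$svn_wc/finances"
mkdir -p "$git_wc/.git" "$git_wc/finances"
echo "budget" > "$svn_wc/finances/2008.txt"
echo "budget" > "$git_wc/finances/2008.txt"
echo "excluded from the conversion" > "$svn_wc/foo/readme.txt"
echo "svn metadata" > "$svn_wc/.svn/entries"
echo "git metadata" > "$git_wc/.git/HEAD"

# Recursive comparison that ignores VCS metadata and the excluded directory.
if diff -r -x .svn -x .git -x foo "$svn_wc" "$git_wc"; then
    result=identical
else
    result=different
fi

echo "working copies are $result"
```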

Monday, May 5, 2008

And the Winner is: Git

I have decided to move from Subversion to a distributed VCS and have been considering Git and Mercurial. I have settled on Git for the following reasons:

  • Git seems more granular. I expect this to provide more flexibility to adapt to different circumstances, but at a greater learning time cost.

  • The way tags are managed in Mercurial (.hgtags) looks a bit odd.

  • The notion Git has of tracking content rather than files is interesting, although I don't understand the ramifications yet.
To be fair, either Mercurial or Git would be suitable for my current needs. Mercurial was initially more attractive as it seemed simpler to get up and going and the Subversion import works better on my existing repository.

There were a couple of interesting posts I found along the way: Experimenting with Git and The Differences Between Mercurial and Git.

Friday, March 7, 2008

Converting a Subversion repository to Git

Previously, I had tried converting from a local Subversion repository to Mercurial. I thought I would try converting the same projects into Git.

Firstly, to install Git on the Mac with Subversion support:
$ sudo port install git-core +svn

Secondly, set up an authors file to map Subversion committer names to the appropriate Git author format.

Now my local svn repository has multiple projects in it. The layout is like:


bar/
  artifacts
  branches
  tags
  trunk
foo/
  artifacts
  branches
  tags
  trunk

git-svnimport seems to expect the standard svn directories (branches, tags, trunk) to be at the root. So for each import, the paths to these directories need to be specified explicitly.

Foo is a fairly new project with no branches, 2 tags and about 27 commits.
$ git-svnimport -C foo -A authors.txt -T foo/trunk -t foo/tags file:///path/to/repos

Completed with no errors.

Bar is an older project with 1 branch, 13 tags and about 464 commits.
$ git-svnimport -C bar -A authors.txt -T bar/trunk -t bar/tags -b bar/branches file:///path/to/repos
Initialized empty Git repository in /Users/cbrad/tmp/bar/.git/
...
fatal: Needed a single revision
64: cannot find commit 'origin'!
...
Use of uninitialized value in system at /opt/local/bin/git-svnimport line 877.
fatal: Failed to resolve '' as a valid ref.
Cannot create tag 10:

There were many of the 'fatal: Needed a single revision' errors.

This article mentions the potential for issues if the svn repository layout has changed over time. The bar project was one of the first things in this repository, so it has been moved and changed several times as the overall repository layout evolved. I tried doing imports around the movement points in the history, with a view to merging them (like in this post), but that also failed.

I guess I can start a fresh Git repository, add the current project head and keep the svn repository lying around for reference, but I was hoping not to have to do that.

Thursday, March 6, 2008

John Goerzen and Git

John Goerzen has been writing about switching from Mercurial to Git recently on his blog.
