Darcs and Repositories

I think it’s reasonable to consider two sorts of “repository” when dealing with darcs — public repositories that are used to reflect a particular line of development, and private working directories that are used to actually do development. Unfortunately there’s some overlap here, pretty much taking the form of “copying your working directory around”.

The difference between the two main classes is nice and clear: for working directories you want as much control over what happens as you can get; and for public repositories you want consistency and accessibility. Which means working directories need to be local, public repositories can be remote; public repositories need to be consistent and append-only, while working directories can be “unpulled”, “unrecorded” and “reverted” as often as you like.

Now, darcs already handles working directories fine; but it’s arguably a bit too flexible as far as public repositories are concerned. We’ll just ignore the “in between” case and, presuming that one or the other extreme will be good enough in practice, work on adding some better support for public repositories.

(As an aside, I’m writing this entry concurrently with designing the actual code, kind-of a weird amalgam of blogging and literate programming. I wonder how it’ll work out.)

For public repositories we fundamentally want to be able to store patches easily: so we need to be able to do darcs push, and we want it to be reasonably space efficient. We also want to be able to get patches easily; hence we need darcs pull support, and we want it web accessible. We want to be able to deal with different projects, with different branches of a project, and different versions within a branch.

In order to have something that works with darcs directly, we only need to make available the patches for each tree we’ve got checked in, and the “inventory”, which tells us which patches the tree incorporates, and in what order they should be applied. Different trees in the same project will tend to have the same patches with one caveat: when they’re applied in different orders minor details will differ, particularly line numbers. Patches have a unique name based on the date they were created, the log message and some other things that don’t change when the patch is merged into a new tree, even if the actual contents of the patch are munged somewhat.

The way we’ll do this, then, is to support many branches per project, and share patches between branches. When a single patch takes a different form between two branches, we’ll represent that as the original patch, and a diff from the original to the variant; so rather than duplicating the entire patch, we’ll only note, say, the new line number where it’s to be applied. That should be capable and efficient enough that I don’t need to worry about wasting space gratuitously. We’ll just store each branch’s inventory as a separate file.
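To make the "original plus a diff" idea concrete, here's a minimal sketch, not the actual storage format. It assumes a patch is held as a list of text lines, and the function names make_delta and apply_delta are made up for illustration. It uses Python's difflib to record only the line ranges that differ between the original and the variant.

```python
from difflib import SequenceMatcher


def make_delta(original, variant):
    """Record only the ranges of `original` that change, with their replacements.

    Both arguments are lists of lines. Each entry is (start, end, new_lines),
    meaning original[start:end] is replaced by new_lines in the variant.
    """
    matcher = SequenceMatcher(a=original, b=variant, autojunk=False)
    return [
        (i1, i2, variant[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(original, delta):
    """Rebuild the variant from the original patch and a stored delta."""
    result = []
    pos = 0
    for start, end, new_lines in delta:
        result.extend(original[pos:start])  # unchanged lines before this change
        result.extend(new_lines)            # the variant's replacement lines
        pos = end
    result.extend(original[pos:])           # unchanged tail
    return result
```

For a patch whose only difference between branches is the line number in a hunk header, the delta is a single small entry, which is where the space saving would come from.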

How about naming? I think for the moment it’s best to say “freeform filename”, so say anything matching [a-zA-Z0-9+_.,-]+, and adopt the arch style of using foo--bar to indicate “bar” is a sub-branch of “foo”. Then I can say mainline--1.0 and have separate branches per major version, or debian--1.0.1 to record Debian changes, or debian-nmu--1.0.1-3 to record the NMUs to the 1.0.1-3 Debian package. If using slashes or something later turns out to be convenient, it can be hacked in later. For now, KISS.
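As a rough sketch of that naming rule (the helper names is_valid_name and branch_lineage are hypothetical, not part of any real tool), validating a name and walking up the foo--bar hierarchy could look like this:

```python
import re

# Pattern taken from the text above: one or more of these characters.
NAME_RE = re.compile(r"[a-zA-Z0-9+_.,-]+")


def is_valid_name(name):
    """Return True if the whole name matches the freeform-filename pattern."""
    return NAME_RE.fullmatch(name) is not None


def branch_lineage(name):
    """Split 'foo--bar--baz' into ['foo', 'foo--bar', 'foo--bar--baz']."""
    if not is_valid_name(name):
        raise ValueError("invalid branch name: %r" % (name,))
    parts = name.split("--")
    return ["--".join(parts[:i]) for i in range(1, len(parts) + 1)]
```

So branch_lineage("debian-nmu--1.0.1-3") gives the parent "debian-nmu" followed by the full name, and a name containing a slash is rejected outright.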

Finally, access. We want, perhaps, three forms of access: management, commits, and retrievals. In reverse order, retrievals we want to do over anonymous http, so we’ll need a CGI script to adapt our storage into precisely what darcs expects; for commits we’ll need to write some scripts at which to point DARCS_SSH and DARCS_SCP so we can get control over the process, and we’ll need to write a darcs-repo script to sit on the server that actually manages the repository and tries to avoid storing too much data.

In order to have the ssh hooks called, we need our repository name to be of the form [a-z]+:.*; and it seems like the most forwards-compatible thing to do is make it be darcs-repo:server/project/branch. Using URI syntax and putting a double-slash before the server name would unfortunately make darcs try to use curl instead, and wouldn’t get us anywhere. But hey, this is a “near enough” project. We’re also not going to worry about how you’d specify a username — .ssh/config will do for the time being.
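Pulling such a name apart is simple enough. The following is a sketch only: the RepoAddress type and parse_repo_address function are invented for illustration, and it assumes exactly three slash-separated components after the darcs-repo: prefix.

```python
from typing import NamedTuple


class RepoAddress(NamedTuple):
    server: str
    project: str
    branch: str


def parse_repo_address(address):
    """Parse 'darcs-repo:server/project/branch' into its three parts."""
    scheme, sep, rest = address.partition(":")
    if sep != ":" or scheme != "darcs-repo":
        raise ValueError("expected a darcs-repo: address, got %r" % (address,))
    parts = rest.split("/")
    # Exactly three non-empty components; this also rejects the
    # URI-style 'darcs-repo://server/...' form, which yields empty parts.
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected server/project/branch, got %r" % (rest,))
    return RepoAddress(*parts)
```

The server part is handed to ssh as-is, so any username or port settings would still come from .ssh/config.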

The darcs-repo script needs two modes — a get mode that can give us a branch inventory or a patch from a branch, and an apply mode that will actually commit to a branch. The former’s simple — it’s just a matter of unmunging our stored patches. The latter’s trickier: we need to be able to parse darcs’ “apply” format, and we need to make sure that the patch actually applies to our repository. Fortunately, we can do this by just comparing the presented context to what we’ve actually got in the inventory — if they’re not a perfect match, rather than performing a merge like darcs would, we can just error out; that’ll even get passed back to the user, so it’s all good.
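The "perfect match or error out" check in apply mode might be sketched like this. It assumes, purely for illustration, that both the branch inventory and the context presented with the patch have already been parsed into ordered lists of patch names; parsing darcs' real format is not shown, and ContextMismatch and check_and_append are hypothetical names.

```python
class ContextMismatch(Exception):
    """Raised when the presented context differs from the branch inventory."""


def check_and_append(inventory, context, new_patches):
    """Return the updated inventory, or raise if the context is not an exact match.

    No merging is attempted: the context must list the same patches as the
    inventory, in the same order.
    """
    if list(context) != list(inventory):
        raise ContextMismatch(
            "patch context does not match the branch inventory; "
            "pull the branch and try again"
        )
    return list(inventory) + list(new_patches)
```

Refusing anything but an exact match keeps the server side simple and the branch append-only; any merging has to happen in the committer's working directory first.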

The CGI script can then basically just call darcs-repo get based on the URL it’s given. Easy.

darcs-ssh and darcs-scp are also easy — they just need to catch invocations of ssh darcs-repo .. and convert them to ssh host darcs-repo apply project branch, and convert invocations of scp darcs-repo:host/project/branch dest to ssh host darcs-repo get project branch >dest.
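A sketch of the commands those wrappers would end up running, assuming the host, project and branch have already been extracted from the arguments darcs passes in (that part is not shown, and the function names here are made up):

```python
import subprocess


def apply_command(host, project, branch):
    """Command line that darcs-ssh would run for a push."""
    return ["ssh", host, "darcs-repo", "apply", project, branch]


def get_command(host, project, branch):
    """Command line that darcs-scp would run for a fetch."""
    return ["ssh", host, "darcs-repo", "get", project, branch]


def fetch_to_file(host, project, branch, dest, runner=subprocess.run):
    """Run the get command with stdout written to dest (the '>dest' part).

    `runner` defaults to subprocess.run and is a parameter only so the
    function can be exercised without a real ssh connection.
    """
    with open(dest, "wb") as out:
        return runner(get_command(host, project, branch), stdout=out, check=True)
```

Passing the command as a list rather than a shell string avoids quoting problems with odd characters in branch names.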

So, all that said, here’s an implementation of the above.

Bugs? Yes, there are some — darcs’ patch format isn’t parsed properly, so if you find yourself setting a preference to be “}”, you might have problems. --apply-as isn’t supported. Error messages also aren’t very nice. The CGI might also be a bit slow. Tags or checkpoints or contexts could break things.

Missing features? Creating new projects and branches has to be done manually (with a mkdir and a touch inventory-branch respectively). It’d probably also be nice to have tarballs automatically made from checked in code. Otherwise, it seems like a good first pass at the idea.

Neat trick learnt? If you want to apply a patch without actually writing the new file to the filesystem anywhere, one way is to use diff --ed style patches, and feed them to red - origfile, followed by the command 1,$p to print the entire resulting file to stdout.
