Thu, 14 Oct 2004
I think it’s reasonable to consider two sorts of “repository” when dealing with darcs – public repositories that are used to reflect a particular line of development, and private working directories that are used to actually do development. Unfortunately there’s some overlap here, pretty much taking the form of “copying your working directory around”.
The differences between the two main classes are nice and clear: for working directories you want as much control over what happens as you can get, and for public repositories you want consistency and accessibility. That means working directories need to be local, while public repositories can be remote; public repositories need to be consistent and append-only, while working directories can be “unpulled”, “unrecorded” and “reverted” as often as you like.
Now, darcs already handles working directories fine; but it’s arguably a bit too flexible as far as public repositories are concerned. We’ll just ignore the “in between” case and, presuming that one or the other extreme will be good enough in practice, work on adding some better support for public repositories.
(As an aside, I’m writing this entry concurrently with designing the actual code, kind-of a weird amalgam of blogging and literate programming. I wonder how it’ll work out.)
For public repositories we fundamentally want to be able to store
patches easily: so we need to be able to do darcs push,
and we want it to be reasonably space-efficient. We also want to be able
to get patches easily; hence we need darcs pull support,
and we want it web accessible. We want to be able to deal with different
projects, with different branches of a project, and different versions
within a branch.
In order to have something that works with darcs directly, we only need to make available the patches for each tree we’ve got checked in, and the “inventory”, which tells us which patches the tree incorporates and in what order they should be applied. Different trees in the same project will tend to have the same patches, with one caveat: when they’re applied in different orders, minor details will differ, particularly line numbers. Patches have a unique name based on the date they were created, the log message and some other things that don’t change when the patch is merged into a new tree, even if the actual contents of the patch are munged somewhat.
The way we’ll do this, then, is to support many branches per project, and share patches between branches. When a single patch takes a different form between two branches, we’ll represent that as the original patch plus a diff from the original to the variant – so rather than duplicating the entire patch, we’ll only note what changed, e.g. the line number where it’s to be applied. That should be capable and efficient enough that I don’t have to worry about wasting space gratuitously. We’ll just store each branch’s inventory as a separate file.
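To make the “original plus a diff” idea concrete, here’s a minimal sketch in Python. It is not the storage format used by the implementation; the function names make_delta and apply_delta are made up for illustration, the patch lines are only an approximation of what a darcs hunk looks like, and difflib from the standard library stands in for whatever diff tool you’d really use.

```python
import difflib

def make_delta(original, variant):
    """Record only the spans where the variant differs from the original.

    Each entry is (start, end, replacement_lines), with start/end indexing
    into the original list of lines. (Hypothetical format, for illustration.)
    """
    matcher = difflib.SequenceMatcher(a=original, b=variant, autojunk=False)
    return [
        (i1, i2, variant[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

def apply_delta(original, delta):
    """Rebuild the variant from the shared original and its stored delta."""
    result = []
    position = 0
    for start, end, replacement in delta:
        result.extend(original[position:start])  # unchanged lines
        result.extend(replacement)               # lines specific to the variant
        position = end
    result.extend(original[position:])
    return result

# Illustrative patch bodies that differ only in the line number of the hunk.
original = ["hunk ./foo.c 10", "-old line", "+new line"]
variant = ["hunk ./foo.c 12", "-old line", "+new line"]

delta = make_delta(original, variant)
rebuilt = apply_delta(original, delta)
```

Here the delta holds a single entry for the first line, so the second branch costs one stored line rather than a second copy of the whole patch.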
How about naming? I think for the moment it’s best to say “freeform
filename”, so say anything matching [a-zA-Z0-9+_.,-]+, and
adopt the arch style of using foo--bar to indicate “bar”
is a sub-branch of “foo”. Then I can say mainline--1.0 and
have separate branches per major version, or debian--1.0.1 to
record Debian changes, or debian-nmu--1.0.1-3 to record the
NMUs to the 1.0.1-3 Debian package. If using slashes or something later
turns out to be convenient, it can be hacked in later. For now, KISS.
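As a rough sketch of that naming rule in Python (the function names are my own invention, and treating every -- as a branch separator is an assumption about how the convention would be applied):

```python
import re

# The character class from the text: a freeform, filename-like name.
NAME_RE = re.compile(r"[a-zA-Z0-9+_.,-]+")

def is_valid_branch_name(name):
    """True if the whole name consists only of the permitted characters."""
    return NAME_RE.fullmatch(name) is not None

def split_branch(name):
    """Split an arch-style name into its components on "--".

    "foo--bar" means "bar" is a sub-branch of "foo"; a single "-" is just
    an ordinary character in a component.
    """
    if not is_valid_branch_name(name):
        raise ValueError("invalid branch name: %r" % name)
    return name.split("--")

parts = split_branch("debian-nmu--1.0.1-3")
```

Note that single hyphens stay inside a component, so debian-nmu--1.0.1-3 splits into debian-nmu and 1.0.1-3, while anything containing a slash is rejected outright.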
Finally, access. We want, perhaps, three forms of access: management,
commits, and retrievals. In reverse order, retrievals we want to do over
anonymous http, so we’ll need a CGI script to adapt our storage into
precisely what darcs expects; for commits we’ll need to write some scripts
at which to point DARCS_SSH and DARCS_SCP
so we can get control over the process, and we’ll need to write
a darcs-repo script to sit on the server that actually
manages the repository and tries to avoid storing too much data.
In order to have the ssh hooks called, we need our repository
name to be of the form [a-z]+:.*; and it seems
like the most forwards-compatible thing to do is make it be
darcs-repo:server/project/branch. Using
URI syntax and putting a double-slash before the server name would
unfortunately make darcs try to use curl instead, and wouldn’t get us
anywhere. But hey, this is a “near enough” project. We’re also not going
to worry about how you’d specify a username – .ssh/config
will do for the time being.
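For illustration, pulling such a name apart might look like the following Python sketch. RepoSpec and parse_repo_spec are hypothetical names, and the strictness (exactly three non-empty parts) is my assumption rather than anything darcs requires.

```python
from typing import NamedTuple

PREFIX = "darcs-repo:"

class RepoSpec(NamedTuple):
    server: str
    project: str
    branch: str

def parse_repo_spec(spec):
    """Parse "darcs-repo:server/project/branch" into its three parts."""
    if not spec.startswith(PREFIX):
        raise ValueError("not a darcs-repo name: %r" % spec)
    parts = spec[len(PREFIX):].split("/")
    # Exactly three non-empty components; this also rejects the
    # URI-style "darcs-repo://server/..." form, which yields empty parts.
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected server/project/branch in %r" % spec)
    return RepoSpec(*parts)

spec = parse_repo_spec("darcs-repo:example.org/myproject/mainline--1.0")
```

The server, project and branch names in the example are placeholders.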
The darcs-repo script needs two modes – a get mode that
can give us a branch inventory or a patch from a branch, and an apply
mode that will actually commit to a branch. The former’s simple – it’s
just a matter of unmunging our stored patches. The latter’s trickier:
we need to be able to parse darcs’ “apply” format, and we need to make
sure that the patch actually applies to our repository. Fortunately,
we can do this by just comparing the presented context to what we’ve
actually got in the inventory – if they’re not a perfect match, rather
than performing a merge like darcs would, we can just error out; that’ll
even get passed back to the user, so it’s all good.
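The “compare the context and error out” step could be sketched like this in Python. This is a simplification under stated assumptions: patches are represented only by their unique names, the context and the inventory are plain ordered lists, and the function names are invented for the example.

```python
def check_context(inventory, context):
    """Refuse the commit unless the presented context matches the inventory.

    Both arguments are ordered lists of patch names. Unlike darcs, no merge
    is attempted: anything short of a perfect match is an error.
    """
    if list(context) != list(inventory):
        raise ValueError(
            "patch context does not match the branch inventory; "
            "pull and re-record before pushing"
        )

def commit(inventory, context, new_patches):
    """Return the new inventory after appending patches (append-only)."""
    check_context(inventory, context)
    return list(inventory) + list(new_patches)

inventory = ["patch-a", "patch-b"]
updated = commit(inventory, ["patch-a", "patch-b"], ["patch-c"])
```

Because the inventory only ever grows at the end, the branch stays append-only, which is what we wanted from a public repository in the first place.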
The CGI script can then basically just call darcs-repo get
based on the URL it’s given. Easy.
darcs-ssh and darcs-scp are also easy – they just need to catch
invocations of ssh darcs-repo .. and convert them to
ssh host darcs-repo apply project branch,
and convert invocations of
scp darcs-repo:host/project/branch dest
to
ssh host darcs-repo get project branch >dest.
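The scp half of that translation is simple enough to sketch in Python. This is not the actual darcs-scp script: the function names are hypothetical, and it assumes darcs passes exactly a source and a destination argument. The ssh half is left out, since the arguments darcs hands to ssh aren’t spelled out above.

```python
import subprocess

PREFIX = "darcs-repo:"

def scp_to_ssh(src, dest):
    """Translate "scp darcs-repo:host/project/branch dest" into an ssh call.

    Returns the ssh argument list and the local file that should receive
    its standard output.
    """
    if not src.startswith(PREFIX):
        raise ValueError("not a darcs-repo source: %r" % src)
    host, project, branch = src[len(PREFIX):].split("/", 2)
    return ["ssh", host, "darcs-repo", "get", project, branch], dest

def fetch(src, dest):
    """Run the translated command, redirecting its output into dest."""
    argv, dest = scp_to_ssh(src, dest)
    with open(dest, "wb") as out:
        return subprocess.call(argv, stdout=out)

argv, target = scp_to_ssh("darcs-repo:example.org/myproject/mainline", "inventory.tmp")
```

Opening the destination file ourselves and handing it to the child process as stdout plays the role of the >dest redirection without going through a shell.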
So, all that said, here’s an implementation of the above.
Bugs? Yes, there are some – darcs’ patch format isn’t parsed properly,
so if you find yourself setting a preference to be “}”, you might have
problems. --apply-as isn’t supported. Error messages also
aren’t very nice. The CGI might also be a bit slow. Tags or checkpoints
or contexts could break things.
Missing features? Creating new projects and branches has
to be done manually (with a mkdir and a touch
inventory-branch respectively). It’d probably also be nice
to have tarballs automatically made from checked in code. Otherwise,
it seems like a good first pass at the idea.
Neat trick learnt? If you want to apply a patch without actually
writing the new file to the filesystem anywhere, one way is to use
diff --ed style patches, and feed them to red -
origfile, followed by the command 1,$p to
print the entire resulting file to stdout.
