indolence log

Thu, 10 Apr 2008

Select and Python Generators

One of the loveliest things about Unix is the select() function (or its replacement, poll()), and the way it lets a single thread handle a host of concurrent tasks efficiently by just using file descriptors as work queues.

Unfortunately, it can be a nuisance to use – you end up having to structure your program as a state machine around the select() invocation, rather than the actual procedure you want to have happen. You can avoid that by not using select() and instead just having a separate thread/process for every task you want to do – but that creates a bunch of tedious overhead for the OS (and admin) to worry about.

But magically making state machines is what Python’s generators are all about; so for my little pet project that involves forking a bunch of subprocesses to do the interesting computational work my Python program wants done, I thought I’d see if I could use that to make my code more obvious.

What I want to achieve is to have a bunch of subprocesses accepting some setup data, then a bunch of two byte ids, terminated by two bytes of 0xFF, and for each of the two byte inputs to output a line of text giving the calculation result. For the time being at least, I want the IO to be asynchronous: so I’ll give it as many inputs as I can, rather than waiting for the result before sending the next input.

So basically, I want to write something like:


import struct

def send_inputs(f, s, n):
	f.write(s) # write setup data
	for i in xrange(n):
		f.write(struct.pack("!H", i))
	f.write(struct.pack("!H", 0xFFFF))

def read_output(f):
	for line in f:
		if is_interesting(line):
			print line

Except of course, that doesn’t work directly because writing some data or reading a line can block, and when it does, I want it to be doing something else (reading instead of writing or vice-versa, or paying attention to another process).

Generators are the way to do that in Python, with the “yield” keyword passing control flow and some information back somewhere else, so adopting the theory that: (a) I’ll only resume from a “yield” when it’s okay to write some more data, (b) if I “yield None” there’s probably no point coming back to me unless you’ve got some more data for me to read, and (c) I’ll provide a single parameter which is an iterator that will give me input when it’s available and None when it’s not, I can code the above as:


def send_inputs(_):
	# s, n declared in enclosing scope
	yield s
	for i in xrange(n):
		yield struct.pack("!H", i)
	yield struct.pack("!H", 0xFFFF)

def read_output(f):
	for line in f:
		if line is None: yield None; continue
		if is_interesting(line):
			print line

There’s a few complications there. For one, I could be yielding more data than can actually be written, so I might want to buffer there to avoid blocking. (I haven’t bothered; just as I haven’t worried about “print” possibly blocking) Likewise, I might only receive part of a line, or I might receive more than one line at once, and afaics a buffer there is unavoidable. If I were doing fixed size reads (instead of line at a time), that might be different.

So far, the above seems pretty pleasant to me – those functions describe what I want to have happen in a nice procedural manner (almost as if they had a thread all to themselves) with the only extra bit being the “if line is None: yield None; continue” line, which I’m willing to accept in order not to use threads.

Making that actually function does need a little grunging around, but happily we can hide that away in a module – so my API looks like:


p = subprocess.Popen(["./helper"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, close_fds=True)
comm = communicate.Communication()
comm.add(send_inputs, p.stdin, None)
comm.add(read_output, None, p.stdout, communicate.ByLine())
comm.communicate()

The comm.add() function takes a generator function, an output fd (ie, the subprocess’s stdin), an input fd (the subprocess’s output), and an (optional) iterator. The generator gets created when communication starts, with the iterator passed as the argument. The iterator needs to have an “add” function (which gets given the bytes received), a “waiting” function, which returns True or False depending on whether it can provide any more input for the generator, and a “finish” function that gets called once EOF is hit on the input. (Actually, it doesn’t strictly need to be an iterator, though it’s convenient for the generator if it is)
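The iterator interface described above can be sketched roughly as follows. This is only a guess at what a line-buffering iterator like communicate.ByLine might look like, not the actual module: the class body, the attribute names buf and finished, and the use of text strings rather than raw bytes are all assumptions, and it is written in Python 3 syntax (so the iterator method is __next__ rather than next).

```python
class ByLine:
    """Hypothetical line-buffering iterator: hands out complete lines,
    or None when no complete line is available yet."""

    def __init__(self):
        self.buf = ""          # data received but not yet handed out
        self.finished = False  # set once EOF has been seen

    def add(self, data):
        # Called by the select() loop with whatever was just read.
        self.buf += data

    def finish(self):
        # Called once EOF is hit on the input.
        self.finished = True

    def waiting(self):
        # True when no full line is buffered and more input may still arrive.
        return "\n" not in self.buf and not self.finished

    def __iter__(self):
        return self

    def __next__(self):
        if "\n" in self.buf:
            line, self.buf = self.buf.split("\n", 1)
            return line + "\n"
        if self.finished:
            if self.buf:       # trailing data with no final newline
                line, self.buf = self.buf, ""
                return line
            raise StopIteration
        return None            # nothing complete yet; the generator should yield None
```

Partial lines stay in the buffer until their newline arrives, and several lines arriving in one read are handed out one at a time, which covers both of the complications mentioned earlier.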

The generator functions once “executed” return an object with a next() method that’ll run the function you defined until the next “yield” (in which case next() will return the value yielded), or a “return” is hit (in which case the StopIteration exception is raised).
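As a small illustration of that behaviour, here is a toy generator (countdown is a made-up name, not part of the project). It uses Python 3 syntax, where g.next() is spelled next(g).

```python
def countdown(n):
    # Runs until the next "yield", then pauses with its local state intact.
    while n > 0:
        yield n
        n -= 1
    # Falling off the end (or hitting "return") raises StopIteration in the caller.

g = countdown(2)       # creates the generator object; none of the body has run yet
first = next(g)        # runs to the first yield
second = next(g)       # resumes after the yield, loops, and yields again
try:
    next(g)            # the loop ends, so the function returns
    finished = False
except StopIteration:
    finished = True
```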

So what we want to do to have this all work is this: (a) do a select() on all the files we’ve been given; (b) for the ones we can read from, read them and add() to the corresponding iterators; (c) for the generators that don’t have an output file, or whose output file we can write to, invoke next() until either: they raise StopIteration, they yield a value for us to output, or they yield None and their iterator reports that it’s waiting. Add in some code to ensure that reads from the file descriptors don’t block, and you get:


# needs at module level: import fcntl, os, select
def communicate(self):
    readable, writable = [], []
    for g,o,i,iter in self.coroutines:
        if i is not None:
            fcntl.fcntl(i, fcntl.F_SETFL, 
                        fcntl.fcntl(i, fcntl.F_GETFL) | os.O_NONBLOCK)
            readable.append(i)
        if o is not None:
            writable.append(o)
    
    while readable != [] or writable != []:
        read, write, exc = select.select(readable, writable, [])
        for g,o,i,iter in self.coroutines:
            if i in read:
                x = i.read()
                if x == "": # eof
                    iter.finish()
                    readable.remove(i)
                else:
                    iter.add(x)

            if o is None or o in write:
                x = None
                try:
                    while x is None and not iter.waiting():
                        x = g.next()
                    if x is not None:
                        o.write(x)
                except StopIteration:
                    if o is not None:
                        writable.remove(o)
    return

You can break it by: (a) yielding more than you can write without blocking (it’ll block rather than buffer, and you might get a deadlock), (b) yielding a value from a generator that doesn’t have a file associated with it (None.write(x) won’t work), (c) having generators that don’t actually yield, and (d) probably some other ways. And it would’ve been nice if I could have somehow moved the “yield None” into the iterator so that it was implicit in the “for line in f”, rather than explicit.

But even so, I quite like it.

Sat, 12 Jan 2008

Tue, 08 Jan 2008

Baby Got Bloat

With the whole incipient git obsession I’ve been cleaning out some of my scratch dirs. In one, last touched in mid-2006, I found:


Oh. My. God.
Becky, look at that bloat!
It's so big...
It looks like one of those Microsoft products...
Who understands those Microsoft guys anyway?
They only code that crap because they're paid by the line...
I mean the bloat...
It's just so slow...
I can't believe it's so laggy...
It's just bloated...
I mean, gross...
Look, that just ain't a Hack.

I like big apps and I cannot lie.
You other bruthas can't deny,
That when some perl comes by, not a symbol to waste
Like line-noise, cut and paste --
You're bewitched;
But now my context's switched,
Coz I notice that glest's got glitz.
Oh BABY! I wanna apt-get ya,
Coz you got pictures,
Those hackers tried to warn me,
But the bling you got
/Make me so horny/
Oooo, app fantastic,
You say you wanna fill up my drive?
Well, use me, use me, coz you ain't that average GUI.

I've seen them typing,
To hell with reciting,
I point, and click, and never miss a single trick.

I'm tired of tech websites,
Sayin' command lines are the thing.
Ask the average power user what makes them tick --
You gotta point and click.

So hackers! (Yeah!) Hackers! (Yeah!)
Has your UI got the G? (Hell Yeah!)
Well click it (click it), click it (click it), and use that healthy glitz,
Baby got bloat.

(vi code with a KDE UI...)

And before you ask, no, I don’t know what I was drinking…

Sat, 05 Jan 2008

User configuration

Inspired mostly by Joey’s nonchalant way of dealing with the death of his laptop…

This seems less of a disaster than other times a laptop’s disk has died on me. When did it start to become routine? […] My mr and etckeeper setup made it easy to check everything back out from revision control. […]

…I’ve been looking at getting all my stuff version controlled too. I’ve just gotten round to checking all my dotfiles into git, and it crossed my mind that it’d be nice if I could just set an environment variable to tell apps to create their random new dot-files directly in my “.etc-garbage” repo. I figured using “$USER_ETC/foo” instead of “$HOME/.foo” would be pretty easy, and might be a fun release goal that other Debian folks might be interested in, so I did a quick google to see if something similar had already been suggested.

The first thing I stumbled upon was a mail from the PLD Linux folks who apparently were using $HOME_ETC at one time which sounded pretty good, though it doesn’t seem to have gotten anywhere. That thread included a pointer to the system that has gotten somewhere which is the XDG spec.

It’s actually pretty good, if you don’t mind it being ugly as all hell.

They define three classes of directory – configuration stuff, non-essential/cached data, and other data. That more or less matches the /etc, /var/cache and /var/lib directories for the system-wide equivalents, though if the “other data” is stuff that can be distributed by the OS vendor it might go in /usr/lib or /usr/share (or the /usr/local/ equivalents) too.

Which is all well and good. Where it gets ugly is the naming.

For the “/etc” configuration stuff, we have the environment variable $XDG_CONFIG_HOME, which defaults to ~/.config, and has a backup path defined by $XDG_CONFIG_DIRS, which defaults to /etc/xdg.

For the “/var/lib” other data stuff, we have the environment variable $XDG_DATA_HOME, which defaults to ~/.local/share, and has a backup path defined by $XDG_DATA_DIRS, which defaults to /usr/local/share:/usr/share. (Though if you’re using gdm, it’ll get set for you to also include /usr/share/gdm)

And for the “/var/cache” stuff, we have the environment variable $XDG_CACHE_HOME, which defaults to ~/.cache.
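To make the lookup rules concrete, here is a minimal sketch of how a program might compute those search paths. The function name xdg_dirs and its environ parameter are made up for illustration; the defaults are the ones listed above, and an empty variable is treated the same as an unset one.

```python
import os

def xdg_dirs(kind, environ=None):
    """Return the search path for kind ("config", "data" or "cache"),
    user directory first, then any shared fallback directories."""
    env = os.environ if environ is None else environ
    home = env.get("HOME") or os.path.expanduser("~")
    defaults = {
        "config": (os.path.join(home, ".config"), "/etc/xdg"),
        "data": (os.path.join(home, ".local", "share"),
                 "/usr/local/share:/usr/share"),
        "cache": (os.path.join(home, ".cache"), None),  # no shared cache path
    }
    home_default, dirs_default = defaults[kind]
    name = "XDG_" + kind.upper()
    path = [env.get(name + "_HOME") or home_default]
    if dirs_default is not None:
        dirs = env.get(name + "_DIRS") or dirs_default
        path.extend(d for d in dirs.split(":") if d)
    return path
```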

That seems to me like exactly the right idea, with way too much crap on it. If you simplify it obsessively – using existing names, dropping the desktop-centrism, you end up with:

Put configuration files in $HOME_ETC/foo or $HOME/.foo. For shared/fallback configuration, search $PATH_ETC if it’s set, or just /etc if it’s not.

Put data files in $HOME_LIB/foo or $HOME/.foo. For shared data, search $PATH_LIB if it’s set, or look through /var/lib, /usr/local/{lib,share} and /usr/{lib,share} if it’s not.

Put caches in $HOME_CACHE/foo or $HOME/.foo. For shared caches, search $PATH_CACHE if it’s set, or just look in /var/cache if it’s not.

That seems much simpler to me to the point of being self-explanatory, and much more in keeping with traditional Unix style. It’s also backwards compatible if you use both old and new versions of a program with the same home directory (or you happen to like dotfiles). And having the XDG variables set based on the above seems pretty easy too.
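A rough sketch of the proposed lookup, purely to show how little code it needs. Nothing here is an existing API: user_path, shared_paths and the CLASSES table are hypothetical names, and the fallback lists are just the ones suggested above.

```python
import os

# class -> (per-user variable, shared search-path variable, shared defaults)
CLASSES = {
    "etc": ("HOME_ETC", "PATH_ETC", ["/etc"]),
    "lib": ("HOME_LIB", "PATH_LIB",
            ["/var/lib", "/usr/local/lib", "/usr/local/share",
             "/usr/lib", "/usr/share"]),
    "cache": ("HOME_CACHE", "PATH_CACHE", ["/var/cache"]),
}

def user_path(kind, name, environ=None):
    """$HOME_ETC/foo if the variable is set, otherwise the classic $HOME/.foo."""
    env = os.environ if environ is None else environ
    home_var = CLASSES[kind][0]
    if env.get(home_var):
        return os.path.join(env[home_var], name)
    home = env.get("HOME") or os.path.expanduser("~")
    return os.path.join(home, "." + name)

def shared_paths(kind, name, environ=None):
    """Shared/fallback locations: $PATH_ETC if set, else the built-in list."""
    env = os.environ if environ is None else environ
    _, path_var, defaults = CLASSES[kind]
    dirs = env[path_var].split(":") if env.get(path_var) else defaults
    return [os.path.join(d, name) for d in dirs if d]
```

With HOME_ETC pointed at a version-controlled directory, user_path("etc", "foo") lands inside the repo; with it unset, the old dotfile location is used, which is where the backwards compatibility comes from.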

I wonder what other people think – does {HOME,PATH}_{ETC,LIB,CACHE} seem sensible, or is XDG_{CONFIG,DATA,CACHE}_{HOME,DIRS} already entrenched enough that it’s best just to accept what’s fated?

Thu, 03 Jan 2008

tempus fugit

I blogged a fair bit about darcs some time ago, but since then I’ve not been able to get comfortable with the patch algebra’s approach to dealing with conflicting merges – I think mostly because it doesn’t provide a way for the user to instruct darcs on how to recover from a conflict and continue on. I’ve had a look at bzr since then, but it just feels slow, to the point where I tend to rsync things around instead of using it properly, and it just generally hasn’t felt comfortable.

On the other hand, a whole bunch of other folks I respect have been a bit more decisive than I have on this, and from where I sit, there’s been a notable trend:

Keith Packard, Oct 2006
Repository formats matter, Tyrannical SCM selection
Ted Tso, Mar 2007
Git and hg
Joey Hess, Oct 2007
Git transitions, etckeeper, git archive as distro package format

Of course, Rusty swings the other way, as do the OpenSolaris guys. The OpenSolaris conclusions seem mostly out of date if you’re able to use git 1.5, and I haven’t learnt quilt to miss its mode of operation the way Rusty does. And as far as the basics go, Carl Worth did an interesting exercise in translating an introduction to Mercurial into the equivalent for git, so that looks okay for git too.

Fri, 16 Nov 2007

Hark!

What I want for christmas:

Fri, 19 Oct 2007

Multiple Repositories -- Simultaneously!

If, like me, you’ve been following development of Joey’s nifty new multi-repository tool and busily registering all your git and bzr and cvs and whatnot repos, you might have noticed a tantalising TODO item that’s recently appeared in the git repo:

* Ability to run commands in paralell? (-j n)

  If done right, this could make an update of a lot of repos faster. If
  done wrong, it could suck mightily. ;-)

Well, sucking mightily just means you need to prototype it first, so here’s a little add-on to mr(1) that runs multiple invocations of mr(1) simultaneously, naturally enough called mrs(1). Consideration of what that implies about superior multitasking is left as an exercise to the interested reader.

The implementation is slightly interesting: it’s a fairly simple perl script that first uses “mr list” to get a list of repositories to work with, then simply uses perl’s “open” function to run mr on each of those directories with the output piped to a filehandle. At that point, things get slightly complicated, since we want to keep them all running no matter what’s going on, so we have a select() loop that collects all the output into one buffer per command, which we put together later, and print out. And just for kicks, if the output’s longer than 20 lines, we pipe it through less after trimming out any ugly ^M nonsense we might have had thanks to progress updates or similar.
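The real mrs is a perl script; what follows is only a Python sketch of the same select() idea, not a translation of it. The name run_all is made up, the commands in the demo are placeholders rather than mr invocations, and it relies on select() working on pipes, so it is Unix-only.

```python
import os
import select
import subprocess
import sys

def run_all(commands):
    """Run all commands at once; return one output buffer per command."""
    procs = [subprocess.Popen(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT)
             for cmd in commands]
    owners = {p.stdout.fileno(): i for i, p in enumerate(procs)}
    buffers = [b""] * len(procs)
    while owners:
        # Block until at least one child has output (or has closed its pipe).
        ready, _, _ = select.select(list(owners), [], [])
        for fd in ready:
            chunk = os.read(fd, 4096)
            if chunk:
                buffers[owners[fd]] += chunk
            else:              # EOF: this command is done
                del owners[fd]
    for p in procs:
        p.stdout.close()
        p.wait()
    # Trim carriage returns left behind by progress updates.
    return [b.replace(b"\r", b"") for b in buffers]

if __name__ == "__main__":
    demo = [[sys.executable, "-c", "print(%d)" % n] for n in range(3)]
    for out in run_all(demo):
        sys.stdout.write(out.decode())
```

Because each command gets its own buffer, the outputs come back in the order the commands were listed, however the children happened to interleave.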

I like it, anyway. And happily, while “mr update” takes about fifty seconds for me, “mrs update” takes about ten. Fun!

(Joey: btw, it’s “parallel” :)

Sat, 16 Jun 2007

Amazon's S3

I’ve thought S3 was pretty awesome sounding ever since it launched early last year. Sadly I haven’t managed to get it set up yet – it’s refusing to accept my credit card (which I’ve used to buy stuff from Amazon before), so I just get a NotSignedUp error and told “Your account is not signed up for the S3 service.” The webservices help has responded, but not managed to work out wtf’s going on yet.

One of the things I’d like to do with it is (the pretty obvious and staid) remote backup thing – though what I’m actually hoping for is more to get something more or less the same as the backup described in Greg Egan’s Distress, which is more or less: your laptop automatically backs itself up to the web somewhere, then when your laptop dies or gets stolen, you get a random new one, authorise it with your passcode or thumbprint or whatever, and it automatically recovers itself from the backups you’ve got on the web.

Oddly though none of the command line tools seem to quite do backup “right” (by my definition) – there’re some that encrypt, some that do an rsync equivalent, and none that I’ve seen compress, all of which seem like requirements to me and, given S3’s support for metadata, pretty easy to actually handle.

It’d be even nicer if there were some other services with similar APIs (and cost structures!) so it wasn’t quite tying yourself to amazon so much, but until Google decide to release G-Drive or some other distributed thing happens, there’s not much to be done. Of course while amazon won’t even let me in, not much to be done…

Tue, 31 Oct 2006

Google Ate My Brane

After visiting Google for the Summer of Code Summit the other week, I thought I might actually try out some of the web services they’ve come up with, rather than just sticking with search and maps, and see if they did anything for me. To my surprise – as a certified hater of webapps generally – a couple did.

Writely, the web-based word processor, was kind-of interesting, but in the end didn’t work for me. The potential killer feature for me would’ve been SubEthaEdit or Gobby -like interactive collaboration, which seems like something Google ought to be able to do with their whacky AJAX techniques. Unfortunately, it seems to just be some sort of automated merge-on-commit, which does nothing for me.

What I’d really like as far as online document editing goes, is actually to be able to do Gobby-like editing of (moinmoin) wikis, rather than having to deal with advisory locking. I poked a bit further at that, and I suspect it ought to be possible to hack something up by using a tool like editmoin to edit wikipages with an editor rather than a webbrowser, and using gobby to do the editing, via a sobby server hosted on the same site as the wiki. It ought to be possible to automate all that complexity using an application/gobbymoin mime type; but I didn’t get anywhere because sobby seems to require IPv6 support. Oh well, maybe some other time.

I’ve played with GMail and Google Talk before, with minimal impact. GMail is kind-of nice, but I like to be able to read my mail offline, so whatever. It is useful as a backup email address if my regular one goes down though. Google Talk doesn’t seem to handle voice/video under Linux, so it’s just a Jabber server. Which is fine, since I hadn’t ever actually gotten any of these whizbang IM things set up. What’s less brilliant is that Gaim is a bit of a pain when it loses connectivity, which happens every time I suspend my laptop, which is every time I stop using it. But I need GMail in order to even try some of the interesting Google services these days, so whatever.

Google Calendar isn’t really something I expected much of. Sure, it’s a calendar app, but I’ve never gotten much use out of appointment diaries or planners or whatever anyway. Having it be web-based actually changes that a bit though, since it makes it trivial to publish to other people, and that even makes a calendar a little bit useful for me too. Having it be able to send reminder SMSes is also neat, at least now I’ve worked out how to default that behaviour to off… Oddly, though, I’ve found I’m getting more value out of it in listing things I’ve done rather than things I’ve got coming up. I guess it’s nicer to have a list of things you’ve actually done, rather than a list of things you should have done (but often didn’t), or a list of things you’ve got to do…

But the real winner is definitely Google Reader even if it’s still in Google Labs, rather than even being “beta”. While I’ve tried some aggregators in the past, none have remotely grabbed me, and I’ve been tending to just remember the URLs for the blogs and webcomics I like, and type them in when I’m feeling bored. That has the benefit that it limits the number of each I read, but the drawback that I waste time typing URLs and waiting for pages to load even when there haven’t been any updates. The keyboard interface to Reader is pretty pleasant, with the only drawback I’ve found a slight lag in loading entries at the start of the day. Having it be in my web browser is perfect, since I generally want to follow a few links from blog posts anyway. It’s also made it easy enough that I’ve added a few feeds from real newspapers (or news channels), which is probably a good thing as far as balancing my take on what’s going on in the world.

There’s a couple of downsides. One is that a lot of webcomics don’t have RSS feeds, or, if they do, don’t seem to include the actual comic, just a link to it. I don’t think there’s much of a reason for that – there are a few blogs I read that include ads in their RSS, so that doesn’t seem difficult to handle, and I can’t see any other potential objections. Also annoying is that posts that get aggregated on multiple planets (such as Planet Debian and Planet Linux Australia) show up multiple times, though admittedly I pretty much expected that. Probably the major downside is that it’s so easy to read stuff that I keep adding feeds to it, though…

Tue, 24 Oct 2006

Todo Lists

A while ago I read Steve Yegge’s rant about Agile development, though I’ve forgotten who linked to it. The thing that struck me as interesting was the bit about “work queues”:

With a priority queue, you have a dumping-ground for any and all ideas (and bugs) that people suggest as the project unfolds. No engineer is ever idle, unless the queue is empty, which by definition means the project has launched. Tasks can be suspended and resumed simply by putting them back in the queue with appropriate notes or documentation. You always know how much work is left, and if you like, you can make time estimates based on the remaining tasks. You can examine closed work items to infer anything from bug regression rates to (if you like) individual productivity. You can see which tasks are often passed over, which can help you discover root causes of pain in the organization. A work queue is completely transparent, so there is minimal risk of accidental duplication of work.

Sadly, googling for “work queue” doesn’t come up with any sort of todo list stuff, but rather a multiprocessing scheduling tool which is cool, but not immediately relevant for me. As far as I can see, whatever was actually being talked about either isn’t public, or is more of a concept than an actual tool.

The only Google Todo thing I could find was an applet thing for the personalised google homepage, which just lets you make todo items and set them as high/medium/low priority. And while that might be all that Steve Yegge was talking about, it doesn’t really feel terribly inspiring to me.

I guess what I’d really like is to have todo items get assigned to a project (so that I can ignore all the todos for projects I don’t want to worry about atm), and also to be able to give them a deadline (so I can treat them with a bit more urgency when necessary) and a priority (so that I can easily spot things I’m willing to defer or ignore completely when I find I don’t have time to do everything I’d like).
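As a data structure that wish list is tiny. A minimal sketch, with every name (Todo, queue, the 1-to-3 priority scale) invented for illustration:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Todo:
    title: str
    project: str
    priority: int = 2                # hypothetical scale: 1 = high, 3 = low
    deadline: Optional[date] = None  # None means "whenever"

def queue(todos, ignore_projects=()):
    """Todos for projects I care about: nearest deadline first, then priority."""
    live = [t for t in todos if t.project not in ignore_projects]
    return sorted(live, key=lambda t: (t.deadline or date.max, t.priority))
```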

I suspect I’ll probably just stick with writing notes in vi to keep track of things, though, same as I have been for years.

Tue, 17 Jan 2006

The GPL Keeps Me Awake At Night

Well, actually that confuses cause and effect. Anyway, a draft of the GPLv3 is out, and, at least at first glance I’m pretty impressed. Let’s add a break, since probably everyone’ll be throwing their two cents in soon enough anyway.

Read the rest ...

Thu, 11 Aug 2005

Code Comments Hate My Freedom

Stewart and Michael have chimed in on whether comments in code are evil or not. Michael reiterates the industry wisdom:

Getting the level of commenting right is hard, especially if you haven’t written much code, or if you are unfamiliar with the domain or the implementation language. But commenting done right can greatly assist yourself and others when you revisit that chunk of code - whether that be to find that heisenbug, or to add new functionality, or even just understand what you were trying to achieve back 3 weeks ago.

I’ve written a reasonable amount of code, and I still find getting the “right” amount of commenting right pretty hard. I’ve even tried a few weird and whacky “commenting” styles, like programming from formal specifications where you specify what you want to achieve and then work from that to the actual code, and literate programming where your primary work is the commentary and the code is scattered about that. Admittedly, I only tried the formal specification stuff at uni, and only for non-real-world problems. In any case, the impression I got from the former was that programming from maths tends to be both harder, and similarly likely to result in mistakes (although hopefully at a more detectable level, of course), and from the latter that explaining things in english doesn’t tend to be that much easier than writing the code (so doing both is twice as much work) and probably causes as much hassle in maintenance as it solves.

Read the rest ...

Wed, 18 May 2005

Shell Hacks

Complex and inefficient shell snippets? Sign me up!

Here’s my version:

dselect update
cat /var/lib/dpkg/available | sed -n 'p;s/^.//p' | sed 's/../&\
/g' | tr A-Z a-z | grep '[a-z][a-z]' | sort | uniq -u

It doesn’t quite match the original – having the letter pair appear twice for a single package will disqualify it, rather than disqualification only happen when it appears in two packages, and it probably looks at more text than it should (like field names as well as contents). On the other hand, it’s a damn sight quicker (O(n*lg(n)) time, O(n) space, where n is the size of your available file), and it includes sed fun.
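For anyone who’d rather not decode the sed: the first sed prints each line plus a copy with its first character dropped, the second chops both into two-character chunks (which between them cover every adjacent pair), and the rest lowercases, keeps only letter-letter pairs, and reports those occurring exactly once. A Python sketch of what I believe is the same computation (unique_pairs is a made-up name, and it counts with a hash table rather than sorting, so the complexity differs):

```python
import string
from collections import Counter

# ASCII-only lowercasing, like "tr A-Z a-z".
LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def unique_pairs(lines):
    """Adjacent letter pairs that occur exactly once across all lines."""
    counts = Counter()
    for line in lines:
        text = line.rstrip("\n").translate(LOWER)
        for a, b in zip(text, text[1:]):   # every adjacent pair in the line
            if "a" <= a <= "z" and "a" <= b <= "z":
                counts[a + b] += 1
    return sorted(pair for pair, n in counts.items() if n == 1)
```

It would be called as unique_pairs(open("/var/lib/dpkg/available")), assuming that file still exists on the system.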

Tue, 10 May 2005

Coding and Codeine

I originally wrote this entry in the middle of last month when the news of the moment was that Linus was giving up Bitkeeper. Then I went to linux.conf.au and got distracted, and by the time I got to thinking about finishing it off and posting it, it didn’t seem relevant anymore, so I deleted it. Then came the spat about Scott and Ian and I thought I’d post anyway, and went to find a copy I could resurrect.

Read the rest ...

Sat, 01 Jan 2005

Test Cases

One of the curious things about testing software is that as far as increasing your confidence in the code is concerned, it’s better to have a test suite that finds bugs, rather than one that doesn’t.

The problem is that for any piece of software that ever works, you can find an infinite number of test cases that pass, no matter how buggy it actually is. So if those are the only test cases you come up with, your testing methodology hasn’t actually achieved anything you hadn’t already managed to do when you were writing the code in the first place.

Read the rest ...

Tue, 21 Dec 2004

Worth Repeating

In his epic battle with Adrian over exceptions, Ben mentioned:

Save your work? That’s for sissys. Use a journalling file-saving model. Save everything the user does immediately. You can support the traditional file save/load facility using checkpoints or other niceties but I fail to see why any application in this modern age of fast hard drives should ever lose data the user has entered more than a few hundred milliseconds ago.

That’s really very true. Certainly there are caching issues – memory is much faster than disk, and you don’t want to store intermediate state all the time, and likewise you probably want to keep some history, but certainly don’t want to keep all of it. But generally, disk is fast enough (compared to user interaction and network speeds) that there really isn’t any excuse for losing data.

If my battery and power shorted out right now, I’d lose: (a) this entry; (b) the fact I’ve got Planet Humbug open to read Ben’s blog and my Back-button history; (c) the fact I’ve got three terminals open, and their history (both scrollback and command invocations); (d) what email I’m currently reading.

Saving this blog entry takes at most half a second, mostly because to do so I have to hit some keys and a menu gets highlighted. Saving all my scrollback from scratch seems to take under 5 seconds, mostly because I have to cut and paste it into another application first. And the other two items are fairly trivial annotation issues.

This isn’t even that hard an issue to solve: it’s only relevant for long-running apps with user interfaces – so xterm and bash, but not sed or ls. It requires somewhere for apps to dump their status ($TMPDIR or a dotfile in $HOME). And it requires some code to do recovery, possibly with a user interface in order to choose which point in history you want to recover. And given those issues, for many apps, there’s no reason not to then just automatically recover after a crash, at least as long as you give the user an easy way of avoiding getting into a “recover to a point where you always crash; always recover when you crash” loop (or at least an easy way of getting out of it).
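A minimal sketch of those pieces, assuming an append-only file of JSON records is an acceptable journal format. Journal, recover and default_journal_path are hypothetical names, and a real application would also want checkpoints so the journal doesn’t grow forever.

```python
import json
import os
import tempfile

def default_journal_path(app):
    # tempfile.gettempdir() honours $TMPDIR, one of the locations suggested above.
    return os.path.join(tempfile.gettempdir(), app + ".journal")

class Journal:
    """Append every user action to disk as soon as it happens."""

    def __init__(self, path):
        self.path = path
        self.fh = open(path, "a", encoding="utf-8")

    def record(self, event):
        self.fh.write(json.dumps(event) + "\n")
        self.fh.flush()
        os.fsync(self.fh.fileno())   # make sure it has really reached the disk

    def close(self):
        self.fh.close()

def recover(path, upto=None):
    """Replay the journal; upto picks an earlier point in history, which is
    the way out of a "recover to a point where you always crash" loop."""
    events = []
    try:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                try:
                    events.append(json.loads(line))
                except ValueError:   # torn final record from a crash mid-write
                    break
    except FileNotFoundError:
        pass
    return events if upto is None else events[:upto]
```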

Thu, 16 Dec 2004

Yay! Memory!

Decided to wander into NextByte to see about getting some more ram for my nice new iBook today. I’ve been tossing up whether to go for an extra 512MB (for 768MB total) or an extra GB (for 1280MB total) – I really wanted as much as possible, since OS X is a memory hog and I want to run virtual hosts on it too; but I couldn’t justify paying three times as much for twice as much memory. In particular, when I rang a few weeks ago, I was quoted something like $600 for memory, and $500 for a copy of VirtualPC with a copy of Windows XP Pro I didn’t want. Yick.

So anyway, I finally got fed up with 256MB, and decided it was time to, well, decide. So as I wandered past the shelves, I noticed the pretty boxes Microsoft has for VirtualPC, and in particular noticed they had Windows 2000 and Windows XP Home variants. They also had a “standalone” variant, that doesn’t include a Windows license! Woot! Also woot-worthy is that the price of a GB had tumbled to only around twice the cost of 512MB. So I said “yay, gimme, gimme”, and got the RAM installed there and then, and got told that the (shrinkwrapped) VirtualPC box was display only (ie, completely empty) and they’d order it in. So, soon I shall have some active Debian and Ubuntu development installs again. Sweet.

Sat, 11 Dec 2004

Thoughts on Darcs and Merging

One of the harder aspects of version control is dealing with merging issues. Normal development is straightforward – all you’re essentially doing is providing an annotated “undo” feature. darcs manages that, IMO, perfectly. And to be honest, that’s probably 80% of what I want from a version control system. But dealing with merging different lines of development is important too – it’s probably 80% of the remaining 20% :)

darcs doesn’t actually do too badly there – when you’re working on separate parts of the code, darcs will do a merge for you automatically quite happily. Where it falls apart is when the changes affect the same bit of code, and can’t be resolved automatically; I find myself really disliking darcs’ behaviour there, even independent of the performance issues.

Read the rest ...

UI Thoughts

One of the central ideas in Jef Raskin’s book The Humane Interface is the “zooming” interface – rather than 2d windows that you shift around and overlay on each other, you have a huge canvas that you can zoom into and out of, as well as move around on. Obviously your screen only displays a small portion of that canvas at any one time. A further implication is that you have a “document centric” model, and thus that your data is always visible, and you just start editing it rather than starting an app to view/modify it.

That has pretty heavy implications, and renders most existing software simply unsuitable for a “ZUI”. So while there’s a couple of attempts at implementing them, they’re nowhere near being generally usable. Still, they’re a cool idea.

The thing I like most about them (in theory) is that you can get a broad, 2d overview of everything you’re doing, then zoom in on a bit and work on that. I hate trying to work out how to organise myself in the file system, because I always leave crap lying around where it ends up getting in the way later.

But if you’re willing to compromise a bit, it’s probably possible to fake this. If you consider the “project view” to be whatever you can see on the screen (after using Exposé to get rid of overlapping windows maybe), then a ZUI is just a matter of being able to see many project views at once, possibly scaled, move amongst them, and zoom into one and make it functional. There’s no great need for zoomed out projects to be functional – you can’t see what you’re typing anyway.

That’s not impossible. If you have two modes: “zoomed out” for navigation, and “zoomed fully in” for manipulation, then the latter is obviously implemented by just having running apps, and the former can be implemented by: (a) taking a snapshot of the apps; (b) closing them; (c) embedding the snapshot in a larger canvas, along with other snapshots; (d) allowing you to navigate around the canvas; (e) when a snapshot is selected, restarting all the apps in the same state as when they were closed. (a), (c) and (d) are just graphics manipulation, and are trivial. (b) alone is trivial; (b) and (e) combined are something KDE and Gnome at least have been doing with session management for ages.

A real ZUI would let you drag components from one section of the canvas to another, too. And let you scale running apps so you can get an overall idea of what they’re displaying but not worry about them taking up much screen space. And a zillion other things too.

Wed, 08 Dec 2004

Darcs Hacking!

Cripes. This was meant to be a quick followup note about some more quick darcs hacks. So much for that – I’ve had to write an outline for this post for heaven’s sake.

(Side note: if someone wants a new title for their blog, the above’s free of charge!)

So, when last we met, darcs-repo had just come into the world, and we were still choking on the cigar smoke. Following that there were a couple of discussion threads. Interesting mails include this one, so that you just ask for a repository rather than a “branch” of a “project”, and the program works out how that’s stored, or this one (and its followups) about naming a collection of related repositories an “archive”, and changing the name from darcs-repo to darcshive. This one (and followups from December) includes some (applied!) patches to darcs itself to let me get rid of the horrific ssh/scp hacks.

Where does that leave us? Pretty much at the point of moving from a prototype/proof-of-concept darcs-repo to a functional darcshive. It’s been essentially self-hosting from the beginning, but a more challenging task is hosting darcs itself – since it’s likely that darcs exercises most of the interesting features of the darcs repository format.

Read the rest ...