Adding content from CVS

I wanted to add some old content to the Subversion repository I keep my Daizu websites in. I had this old stuff in CVS, so I thought it would be nice to keep that revision history rather than just copy the latest version of the files into Subversion. Not that I really care about those old commits in this case, but this is something I want Daizu to be able to cope with, so I gave it a try.

In effect, you have to add the revisions from CVS (or from another Subversion repository) to the repository where you keep your content, merging the histories of the two repositories.

First convert the original revisions from CVS into a Subversion dump file. Normally cvs2svn would import straight into a new, empty repository, but to merge the repositories you'll have to load a dump file into the existing one. I actually did the import into a separate repository first, so that I could check the result with a working copy and svn log, and then reran cvs2svn to get a dump file.

You might want to change the paths of your files in CVS before you import them. CVS doesn't really care what the paths are (its repository is just a directory tree of RCS ,v files), so you can go into the repository and, for example, move all the files for your content into a directory matching one in the Subversion repository. That way the newly imported files will appear in the right place.
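Since this is only a matter of moving files and directories around, it can be scripted. Here is a minimal Python sketch, assuming you run it on a copy of the CVS repository that nobody is committing to, and that you want everything in the module to end up one directory deeper. The nest_module function and the oldsite name are hypothetical, not part of CVS or cvs2svn.

```python
import shutil
from pathlib import Path


def nest_module(module_dir: str, subdir: str) -> list[str]:
    """Move everything in a CVS module directory into a new subdirectory.

    Hypothetical helper. Moving the RCS ,v files together with any Attic
    directories (where CVS keeps deleted files) carries their history
    along, so cvs2svn sees the files under the new path.
    """
    if "/" in subdir or subdir in ("", ".", ".."):
        raise ValueError("subdir must be a single directory name")
    root = Path(module_dir)
    # List the entries before creating the target, so it isn't moved into itself.
    names = sorted(p.name for p in root.iterdir())
    target = root / subdir
    target.mkdir()  # raises FileExistsError if that name is already taken
    for name in names:
        shutil.move(str(root / name), str(target / name))
    return names
```

For example, nest_module("/var/lib/cvs/OldWebsite", "oldsite") should make cvs2svn put the converted files under trunk/oldsite rather than directly in trunk.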

Here's the command I used once I had everything suitably arranged:

cvs2svn -q --username=geoff --mime-types /etc/mime.types \
             --no-default-eol --keywords-off \
             --dumpfile=old-website.dump \
             /var/lib/cvs/OldWebsite/

At this point you'll probably have to edit the dump file to remove the records that add any directories which already exist in the target repository. For example, the first revision will add the trunk directory, which you've presumably already got in your content repository, so that can go. If you don't do this, svnadmin will fail with a “File already exists” error and abort, leaving a half-finished transaction behind in the repository's db/transactions directory. Note that it doesn't matter if you end up with a dump file whose first revision is number 2, because the revisions get new numbers when they're loaded anyway.
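svndumpfilter isn't much help for this, because it filters by path prefix: excluding trunk would throw away everything under it too. Below is a minimal Python sketch of a hypothetical helper (not part of Daizu, cvs2svn or Subversion) that walks the dump's records using their Content-length headers and drops only plain "add directory" records for the paths you name, copying every other byte through unchanged. It assumes a dump like the ones cvs2svn writes, small enough to read into memory. It leaves the revision records themselves in place, so a revision that only created trunk would load as an empty revision; delete that record by hand if you'd rather not have it.

```python
import sys


def parse_headers(block: bytes) -> dict[str, str]:
    """Turn the 'Key: value' lines of one dump record header into a dict."""
    headers = {}
    for line in block.decode("utf-8").split("\n"):
        key, sep, value = line.partition(": ")
        if sep:
            headers[key] = value
    return headers


def drop_dir_adds(data: bytes, existing_dirs: set[str]) -> bytes:
    """Return the dump without the records that add the given directories.

    Only plain adds (no Node-copyfrom-path) of Node-kind 'dir' are dropped.
    """
    out = bytearray()
    pos, n = 0, len(data)
    while pos < n:
        start = pos
        # Blank lines separate records; keep them with the record that follows.
        while pos < n and data[pos] == 0x0A:
            pos += 1
        end_of_headers = data.find(b"\n\n", pos)
        if pos >= n or end_of_headers == -1:
            out += data[start:]  # trailing blank lines or a truncated record
            break
        headers = parse_headers(data[pos:end_of_headers])
        # Content-length covers properties plus text; fall back to the parts.
        length = int(headers.get("Content-length",
                                 int(headers.get("Prop-content-length", 0))
                                 + int(headers.get("Text-content-length", 0))))
        pos = end_of_headers + 2 + length
        unwanted = (headers.get("Node-kind") == "dir"
                    and headers.get("Node-action") == "add"
                    and "Node-copyfrom-path" not in headers
                    and headers.get("Node-path") in existing_dirs)
        if not unwanted:
            out += data[start:pos]
    return bytes(out)


if __name__ == "__main__" and len(sys.argv) >= 4:
    # Usage: python drop_dir_adds.py IN.dump OUT.dump DIR [DIR...]
    with open(sys.argv[1], "rb") as f:
        dump = f.read()
    with open(sys.argv[2], "wb") as f:
        f.write(drop_dir_adds(dump, set(sys.argv[3:])))
```

With the file names used here, that might be run as python drop_dir_adds.py old-website.dump old-website-trimmed.dump trunk, and the trimmed file loaded instead of the original.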

Before you load the revisions into the target repository, it's probably a good idea to make a backup, just in case. Then you can load the dump file like this:

svnadmin load repos/web_geoff <old-website.dump

You can't change a Subversion revision after it has been committed (apart from its revision properties), so you can't renumber revisions. The new revisions are therefore added after all the existing ones and get new revision numbers, which means your revision numbers may no longer be in the order of the dates and times at which the changes were committed. That's not really a problem, although it does mean that Subversion's date-based features, such as revision ranges given as {DATE}, might not work properly, because they rely on commit dates increasing along with revision numbers. Daizu stores the time at which each revision was committed for just this reason, so that a future web-based interface will be able to display revisions in the order they actually happened.

You can see this effect in the revision table of the Daizu database. Entries only appear there once you've loaded the new revisions into the database (with a command like daizu load-revision). Mine looks like this:

Example data from the revision table

revnum  committed_at
1       2006-04-03 14:23:52.461223
2       2006-04-03 18:56:26.222475
3       2006-04-14 17:52:13.223041
249     2006-10-16 21:26:29.966954
250     2006-10-17 20:11:17.889506
251     2002-05-14 20:18:44
252     2002-05-15 18:15:14
320     2005-11-17 19:07:36
321     2006-02-04 13:33:40
322     2006-10-18 01:56:15.559908
323     2006-10-18 02:22:36.278227

It seems CVS stores commit times only to the nearest second (hence the whole-second times on the imported revisions), whereas Subversion records them to the microsecond.
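To make the ordering point concrete, here is a small Python sketch that takes just the rows shown in the table above, checks whether commit times rise with revision number (the property Subversion's date-based revision lookups rely on), lists the revisions that break it, and recovers the order in which the commits actually happened, roughly what sorting on committed_at would give you. It works on hard-coded rows rather than querying Daizu's database, and the function names are made up for illustration.

```python
from datetime import datetime

# Only the rows shown in the example revision table (other revisions omitted).
ROWS = [
    (1, "2006-04-03 14:23:52.461223"),
    (2, "2006-04-03 18:56:26.222475"),
    (3, "2006-04-14 17:52:13.223041"),
    (249, "2006-10-16 21:26:29.966954"),
    (250, "2006-10-17 20:11:17.889506"),
    (251, "2002-05-14 20:18:44"),
    (252, "2002-05-15 18:15:14"),
    (320, "2005-11-17 19:07:36"),
    (321, "2006-02-04 13:33:40"),
    (322, "2006-10-18 01:56:15.559908"),
    (323, "2006-10-18 02:22:36.278227"),
]


def parse_rows(rows):
    """Return (revnum, datetime) pairs sorted by revision number.

    fromisoformat() accepts both the whole-second times imported from CVS
    and the microsecond times recorded by Subversion."""
    return sorted((rev, datetime.fromisoformat(ts)) for rev, ts in rows)


def dates_monotonic(revs):
    """True if commit times never go backwards as revision numbers rise."""
    return all(a[1] <= b[1] for a, b in zip(revs, revs[1:]))


def out_of_order(revs):
    """Revisions committed earlier than some lower-numbered revision."""
    late, newest = [], None
    for rev, when in revs:
        if newest is not None and when < newest:
            late.append(rev)
        else:
            newest = when
    return late


def chronological(revs):
    """Revision numbers in the order the commits actually happened."""
    return [rev for rev, when in sorted(revs, key=lambda r: r[1])]


revs = parse_rows(ROWS)
print(dates_monotonic(revs))  # False
print(out_of_order(revs))     # [251, 252, 320, 321]
print(chronological(revs))    # [251, 252, 320, 321, 1, 2, 3, 249, 250, 322, 323]
```

The imported revisions sort to the front by date even though they carry the highest numbers apart from the two committed after the import.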
