Git can't store metadata with files

Git doesn't have any facility for attaching bits of metadata to files, as Subversion does with its properties feature. Since Daizu relies heavily on metadata stored in properties, I'm going to discuss the problem and some possible workarounds below.

Separate metadata from file content

In Subversion, properties are named values stored alongside each file. The actual file data is separate, so when you check out a working copy you can access the files as normal (loading images into the Gimp or whatever). The properties can be edited separately from the file, and from each other, using commands like svn propset.

This doesn't seem to be a problem for most people using Git, and it isn't even mentioned on the Git/Subversion comparison page (at the time of writing). It probably isn't an important feature when you're storing source code, but I think some kind of separate metadata storage is essential when you're storing web content.

Reasons for using properties

Of course, for HTML files this isn't such a big problem. I could ask users to write a complete HTML file for each article, using the <title> and <meta> elements to store metadata, whereas currently Daizu expects the content of an article to be just what you would put inside the <body> element. That approach doesn't extend to other formats, though. For example:

  • I might write a plugin some day to allow articles to be written in a wiki-like plain text format, and I don't want to have to write separate metadata-extraction code for that.

  • I already support POD, where a reasonable title can be automatically extracted, but because of the way the format works I often like to override that with a custom title and description that work better for a web page. The same would apply if I were to write a plugin to publish legacy word-processor files as articles, because people often don't bother going into the word-processor's ‘Properties’ dialog to set a reasonable title for each of their documents. Whoever was adding the files to a website would want to be able to enter some extra metadata specially for web publishing.

  • Sometimes it's nice to store some default alt text with the image it applies to, and I'm already using a plugin to publish photos as articles, with a title and description attached to the image file itself. This way I don't have to create a separate trivial HTML file to wrap round each image.

Solutions

This isn't an insurmountable problem, but I haven't been able to think of a solution which doesn't smell a little inelegant. But there are various approaches which might work in practice:

  • Store the metadata in the database, and just use the repository for the actual file content. I don't like that, because I think the metadata is important enough that it should be under revision control, and kept close to the data it describes.

  • Combine the metadata and file content into a single file in the repository, separating them when they're loaded by Daizu. This could work well for things like text or HTML content, but for binary files you'd need some special tool to separate the two parts out if you wanted to edit the file content directly through the repository. It might also be confusing to someone trying to look under the hood to find a file called foo.jpg which isn't really a JPEG file (because it's got some textual crud at the start).

  • In the Git repository, for each file which has at least one metadata property attached, store a second file next to it. So you'd have file.pdf and file.pdf.meta or whatever. The metadata file would be some simple text format, easy to edit.

    Unfortunately, if you wanted to rename, move, or delete a file you'd have to remember to do the same to its metadata file. A nice UI could handle this transparently, but it might be awkward for people who want to get at the data directly through the repository for convenience.

I think the last of those is probably the best idea.
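A minimal sketch of that sidecar approach in Python, assuming a hypothetical convention: the metadata for file.pdf lives in file.pdf.meta as simple `name = value` lines, with blank lines and `#` comments ignored. The function names (meta_path, parse_meta, load_properties, move_with_meta) are illustrative only and not part of Daizu. The last function shows the bookkeeping that every rename or move would need so that a file and its metadata stay together:

```python
from pathlib import Path

# Hypothetical naming convention: file.pdf -> file.pdf.meta
META_SUFFIX = ".meta"


def meta_path(path: Path) -> Path:
    """Return the path of the sidecar metadata file for `path`."""
    return path.with_name(path.name + META_SUFFIX)


def parse_meta(text: str) -> dict[str, str]:
    """Parse 'name = value' lines into a dict of properties.

    Blank lines and lines starting with '#' are ignored. Only the first
    '=' separates name from value, so values may themselves contain '='.
    """
    props: dict[str, str] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'name = value'")
        props[name.strip()] = value.strip()
    return props


def load_properties(path: Path) -> dict[str, str]:
    """Return the file's properties, or an empty dict if it has no sidecar."""
    meta = meta_path(path)
    if not meta.exists():
        return {}
    return parse_meta(meta.read_text(encoding="utf-8"))


def move_with_meta(src: Path, dst: Path) -> None:
    """Rename a file and, if it has one, its sidecar, keeping them paired."""
    src_meta = meta_path(src)
    src.rename(dst)
    if src_meta.exists():
        src_meta.rename(meta_path(dst))
```

In a Git working copy you could instead run `git mv` on both paths, or rename them as above and let `git add -A` record the change. Either way the pairing is maintained only by convention, which is exactly the fragility described above for anyone working directly in the repository.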
