I'm storing metadata about content (such as the titles of articles) in
Subversion properties. This has the advantage that the metadata is all
version controlled by Subversion, and can be adjusted using the normal
svn command. The main reason for this though is so that
I can pull out these individual values from the repository without having
to parse them out of the actual content. So if you want to adorn your
images with descriptions, you can just add a property and leave the binary
image data untouched in its proper format.
When I update a working copy, I load all the properties for files into
records in the wc_property table. Eventually this will be
where properties are edited (in some hypothetical web interface) before
the changes are committed.
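As a rough illustration of that loading step, here is a minimal Python sketch using SQLite. It is not Daizu's actual code: the column names (`file_id`, `name`, `value`, `path`) and the "delete then re-insert" refresh strategy are assumptions made for the example.

```python
import sqlite3

# Hypothetical schema: the column names are assumptions for illustration only.
SCHEMA = """
CREATE TABLE wc_file (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL
);
CREATE TABLE wc_property (
    file_id INTEGER NOT NULL REFERENCES wc_file(id),
    name TEXT NOT NULL,
    value TEXT NOT NULL,
    PRIMARY KEY (file_id, name)
);
"""


def load_properties(conn, file_id, props):
    """Replace the stored properties for one file with those from the working copy."""
    with conn:  # run both statements in one transaction
        conn.execute("DELETE FROM wc_property WHERE file_id = ?", (file_id,))
        conn.executemany(
            "INSERT INTO wc_property (file_id, name, value) VALUES (?, ?, ?)",
            [(file_id, name, value) for name, value in sorted(props.items())],
        )


def properties_for(conn, file_id):
    """Read a file's properties back as a dict of name -> value."""
    rows = conn.execute(
        "SELECT name, value FROM wc_property WHERE file_id = ? ORDER BY name",
        (file_id,),
    )
    return dict(rows.fetchall())
```

Reloading after an update simply replaces the old rows, so a property deleted in the repository disappears from `wc_property` too.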
For the most important metadata though I want the values to be more
readily available, and some metadata needs to be processed in some way
before the information becomes useful. So for example a few values,
like the title and publication date of an article, should be stored
alongside the content in the wc_file table, so that you
don't have to do lots of extra queries or joins to get at them.
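To show why the denormalized columns help, this hypothetical SQLite sketch builds the same "articles by date" listing two ways. The schema is assumed for illustration, and sorting on `issued_at` as text only works because the Subversion-style timestamps used here sort lexically.

```python
import sqlite3

# Hypothetical schema: wc_file holds processed copies of a few values,
# while wc_property keeps every raw property.
SCHEMA = """
CREATE TABLE wc_file (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL,
    title TEXT,
    issued_at TEXT
);
CREATE TABLE wc_property (
    file_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    value TEXT NOT NULL
);
"""


def make_demo_db():
    """Build an in-memory database with two articles and one image."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO wc_file VALUES (?, ?, ?, ?)", [
        (1, "blog/first.html", "First post", "2006-01-10T09:00:00.000000Z"),
        (2, "blog/second.html", "Second post", "2006-02-20T18:30:00.000000Z"),
        (3, "images/logo.png", None, None),
    ])
    conn.executemany("INSERT INTO wc_property VALUES (?, ?, ?)", [
        (1, "dc:title", "First post"),
        (1, "dcterms:issued", "2006-01-10T09:00:00.000000Z"),
        (2, "dc:title", "Second post"),
        (2, "dcterms:issued", "2006-02-20T18:30:00.000000Z"),
    ])
    return conn


def articles_from_file_table(conn):
    """One table, no joins: the processed values sit alongside the file."""
    return conn.execute(
        "SELECT path, title FROM wc_file"
        " WHERE issued_at IS NOT NULL ORDER BY issued_at DESC"
    ).fetchall()


def articles_from_property_table(conn):
    """The same listing from raw properties needs one join per property."""
    return conn.execute(
        "SELECT f.path, t.value FROM wc_file f"
        " JOIN wc_property d ON d.file_id = f.id AND d.name = 'dcterms:issued'"
        " LEFT JOIN wc_property t ON t.file_id = f.id AND t.name = 'dc:title'"
        " ORDER BY d.value DESC"
    ).fetchall()
```

Both queries return the same rows here, but the second one also works on unprocessed values, so a title with stray whitespace or an unparseable date would leak straight through.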
I've put together a little architecture for loading these values.
Eventually I want to allow plugins to add their own metadata processing
code, but the plugin loading stuff will come later. For now, I've got
a hash of patterns which match the names of properties, and map them
to a callback function. The function is passed a hash of all the properties,
so that it can process them all at once if that's more efficient.
The patterns can just be the name of a property, or something like
foo:* for properties with a foo: prefix, or
* for all properties.
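Here is a minimal Python sketch of that dispatch idea. It is not Daizu's implementation: the function names are hypothetical, and it assumes each callback runs once if any property matches its pattern, receives the whole property dict, and returns a dict of column values to store.

```python
from typing import Callable, Dict, Mapping

Properties = Mapping[str, str]
# Hypothetical callback shape: all properties in, column values out.
Callback = Callable[[Properties], Dict[str, object]]


def pattern_matches(pattern: str, name: str) -> bool:
    """Match a property name against 'name', 'prefix:*' or '*'."""
    if pattern == "*":
        return True
    if pattern.endswith(":*"):
        # Keep the colon, so 'foo:*' matches 'foo:bar' but not 'foobar'.
        return name.startswith(pattern[:-1])
    return pattern == name


def run_callbacks(handlers: Mapping[str, Callback],
                  props: Properties) -> Dict[str, object]:
    """Call each handler once if any property matches its pattern."""
    results: Dict[str, object] = {}
    for pattern, callback in handlers.items():
        if any(pattern_matches(pattern, name) for name in props):
            # The callback sees every property, so it can handle them in one go.
            results.update(callback(props))
    return results


# Example handler table (hypothetical callbacks).
HANDLERS: Dict[str, Callback] = {
    "dc:title": lambda p: {"title": p["dc:title"].strip()},
    "daizu:*": lambda p: {"retired": p.get("daizu:status") == "retired"},
    "*": lambda p: {"property_count": len(p)},
}
```

For example, `run_callbacks(HANDLERS, {"dc:title": " Hi ", "daizu:status": "retired"})` fires all three handlers, while a file with only `svn:mime-type` triggers just the `*` one.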
These are the properties which I'm currently parsing with predefined
metadata callbacks, most of which simply store the value in the
wc_file table alongside the content:
(Update: the complete list of properties understood by Daizu is now documented properly.)
- svn:mime-type
  Stored in the content_type column, if it contains a single valid MIME type.
- dcterms:issued
  Stored in the issued_at column, if it contains a single valid date and time. Currently only the Subversion datetime format is understood, but I'd like to make it accept the full range of W3CDTF formats.
- dc:title
  Stored in the title column, if it contains anything other than whitespace. Leading and trailing whitespace is removed.
- dc:description
  Stored in the description column, with the same processing as dc:title.
- daizu:status
  Sets the retired column to true iff it has the value retired. A retired file should still be published as normal, but it shouldn't show up in navigation menus, section indexes, or blog feeds. The idea is that you can use this for an old file which is only kept for historical purposes. I'll probably want the templates to display a message on the page itself saying that it is of historical interest only. I might add other status values in the future, but I can't think of any useful ones yet.
- daizu:tags
  Stored separately in the wc_file_tag table. The value should be a list of tag names (terms), each on a separate line, which allows tags containing spaces and commas. These tags are ‘folksonomy tags’, like the ones used by Technorati.
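The processing rules in the list above can be sketched in a few small Python functions. This is a hypothetical illustration, not Daizu's code: the MIME check is a deliberately simplified `type/subtype` pattern rather than the full RFC grammar, only the Subversion timestamp format (`YYYY-MM-DDTHH:MM:SS.ffffffZ`) is parsed, as in the text, and stripping surrounding whitespace before checking `svn:mime-type` and `daizu:status` is an assumption.

```python
import re
from datetime import datetime, timezone
from typing import Dict, List, Mapping, Optional

# Simplified check: one type/subtype pair, no parameters, no whitespace.
_MIME_RE = re.compile(
    r"[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*/[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*")


def parse_mime_type(value: str) -> Optional[str]:
    """Return the value if it is a single plausible MIME type, else None."""
    value = value.strip()
    return value if _MIME_RE.fullmatch(value) else None


def parse_svn_datetime(value: str) -> Optional[datetime]:
    """Parse Subversion's UTC timestamp format; W3CDTF variants are not handled."""
    try:
        parsed = datetime.strptime(value.strip(), "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        return None
    return parsed.replace(tzinfo=timezone.utc)


def parse_text(value: str) -> Optional[str]:
    """Trim whitespace; a value that is only whitespace counts as missing."""
    value = value.strip()
    return value or None


def parse_tags(value: str) -> List[str]:
    """One tag per line, so tags may contain spaces and commas."""
    return [line.strip() for line in value.splitlines() if line.strip()]


def file_columns(props: Mapping[str, str]) -> Dict[str, object]:
    """Map raw properties to hypothetical wc_file column values."""
    return {
        "content_type": parse_mime_type(props.get("svn:mime-type", "")),
        "issued_at": parse_svn_datetime(props.get("dcterms:issued", "")),
        "title": parse_text(props.get("dc:title", "")),
        "description": parse_text(props.get("dc:description", "")),
        "retired": props.get("daizu:status", "").strip() == "retired",
    }
```

Tags come out of `parse_tags` as a list rather than a column value, since they would be stored as separate rows in `wc_file_tag`.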