Each file or directory can generate any number of URLs. Some will generate none at all. Documents and things like image files will typically generate a single URL. Reasons for generating multiple URLs from a single file include:
- Splitting an article into multiple pages
- Generating an additional printer-friendly page
- Generating a blog front page as well as feeds (possibly several formats containing the same data)
- Providing an alternative version of a file automatically converted into some other format, for example an SVG file might generate a URL for its original form plus a URL for a PNG rendition of its image
URLs must obviously be unique, so if a file tries to generate a URL which already points at a live file, then it can't be published until the conflict is resolved somehow.
If a file is deleted then we keep its URL around with a special status, so that we know to redirect somewhere, issue a 410 Gone status, or whatever. If some other file generates the same URL in the future, we can forget about its previous interpretation. Whatever happens from then on, the new file is now the best one to associate with the URL.
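The uniqueness and deletion rules above amount to a small state machine per URL. Here is a minimal sketch in Perl (the language the generators will use), assuming an in-memory hash in place of whatever database the real system would keep. The names %url_table, claim_url and delete_file are hypothetical, chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical in-memory registry mapping each URL to the file that
# generates it. A real system would keep this in a database.
my %url_table;    # url => { file_id => ..., status => 'live' | 'gone' }

# Claim a URL for a file. Fails if a different *live* file already owns
# it; an entry marked 'gone' is simply taken over by the new file.
sub claim_url {
    my ($url, $file_id) = @_;
    my $entry = $url_table{$url};
    if ($entry && $entry->{status} eq 'live' && $entry->{file_id} != $file_id) {
        die "URL conflict: $url already belongs to file $entry->{file_id}\n";
    }
    $url_table{$url} = { file_id => $file_id, status => 'live' };
    return;
}

# When a file is deleted, keep its URLs but mark them as gone, so the
# server still knows to redirect or answer with 410 Gone.
sub delete_file {
    my ($file_id) = @_;
    for my $entry (values %url_table) {
        $entry->{status} = 'gone' if $entry->{file_id} == $file_id;
    }
    return;
}
```

With this shape, a second file claiming a live URL dies with a conflict, but once the first file is deleted the same claim succeeds and the new file becomes the one associated with the URL.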
Special filenames
I've decided that files and directories whose names start with a single underscore should be considered special. To start with, these ones will affect URL generation:
- _template: I'm using this to store template files. Nothing under a directory with this name should ever generate URLs itself, whatever its metadata. Having a special name for these directories means that I can have them anywhere in the directory hierarchy, so that parts of a site can have different templates.
- _hide: should be treated the same way as _template. I'm using directories with this name to store one-off scripts used for importing content, just in case I need to refer to them again.
- Any file matching /^_index\./ should result in a URL ending with a /, with this filename left off. If a ‘slug’ is required to construct the URL, then the name of the parent directory should be used. For any other file, the slug is the filename with its extension stripped off.
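A minimal Perl sketch of these rules, assuming paths are relative strings with / separators. The helpers is_hidden_path, slug_for and url_path_for are hypothetical names, not part of Daizu.

```perl
use strict;
use warnings;

# True if any component of the path is _template or _hide, in which
# case nothing at that path may generate URLs of its own.
sub is_hidden_path {
    my ($path) = @_;
    my $hidden = grep { $_ eq '_template' || $_ eq '_hide' } split m{/}, $path;
    return $hidden ? 1 : 0;
}

# Slug for a file: _index.* files take their parent directory's name,
# other files use their own name with the extension stripped off.
sub slug_for {
    my ($path) = @_;
    my @parts = split m{/}, $path;
    my $name  = pop @parts;
    if ($name =~ /^_index\./) {
        return @parts ? $parts[-1] : undef;    # no parent at the root
    }
    (my $slug = $name) =~ s/\.[^.]*\z//;       # drop the last extension
    return $slug;
}

# Relative URL path under the default rules: a trailing _index.* file
# name is dropped, leaving a URL that ends in '/'.
sub url_path_for {
    my ($path) = @_;
    (my $url = $path) =~ s{(^|/)_index\.[^/]*\z}{$1};
    return $url;
}
```

For example, slug_for('blog/article/foo/slug/_index.html') gives 'slug', while url_path_for('blog/article/_index.html') gives 'blog/article/'.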
daizu:url
You can set a base URL for a file or directory by giving it
a daizu:url property. The code which generates URLs
will typically use this URL for the file or directory it is set on,
but its main use is as the base for resolving the relative URLs generated
by files beneath a directory.
So for example you might have a directory called example.com
for your website, which might have a base URL of
http://www.example.com/. If you then have a file inside
that directory called foo.html, then the default way of
generating a URL for it would simply be to give it the relative URL
of foo.html, which would then be resolved against
the base URL.
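Resolving a relative URL against a base is standard URL resolution (RFC 3986); in Perl the CPAN URI module does it properly with URI->new_abs($relative, $base). The sketch below avoids that dependency and handles only the simple case described here: a plain relative path resolved against a base URL that has a path component, such as http://www.example.com/. The name resolve_simple is hypothetical.

```perl
use strict;
use warnings;

# Simplified resolution of a relative path against a base URL. Only
# handles plain relative paths like 'foo.html' or 'dir/bar.html':
# no leading '/', no '..' segments, no query strings. Not RFC 3986.
sub resolve_simple {
    my ($relative, $base) = @_;
    die "absolute path or '..' not supported\n"
        if $relative =~ m{^/} || $relative =~ m{(?:^|/)\.\.(?:/|\z)};
    (my $dir = $base) =~ s{[^/]*\z}{};    # drop anything after the last '/'
    return $dir . $relative;
}
```

So resolve_simple('foo.html', 'http://www.example.com/') produces http://www.example.com/foo.html, matching the example above.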
If a subdirectory doesn't have a base URL defined as a property, then its base is the first URL it generates itself.
To avoid problems, I think it will be simplest to say that you
can only generate URLs for files which have a daizu:url
property themselves, or have an ancestor directory which does.
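A sketch of that rule, assuming properties are available in a hash keyed by path: walk up from the file until it or one of its ancestors has daizu:url, and generate no URLs if none does. The %props store and find_base are illustrative stand-ins, not Daizu's real storage or API.

```perl
use strict;
use warnings;

# Hypothetical property store: path => { property => value }.
my %props = (
    'example.com'          => { 'daizu:url' => 'http://www.example.com/' },
    'example.com/foo.html' => {},
    'other/bar.html'       => {},
);

# Walk up from the given path towards the root, returning the nearest
# path (the file itself or an ancestor) that has a daizu:url property,
# together with that URL. An empty list means no URLs may be generated.
sub find_base {
    my ($path) = @_;
    my @parts = split m{/}, $path;
    while (@parts) {
        my $candidate = join '/', @parts;
        my $p   = $props{$candidate};    # avoid autovivifying entries
        my $url = $p ? $p->{'daizu:url'} : undef;
        return ($candidate, $url) if defined $url;
        pop @parts;
    }
    return;
}
```

Here find_base('example.com/foo.html') returns ('example.com', 'http://www.example.com/'), and the remainder of the path, foo.html, is the relative URL to resolve against it; find_base('other/bar.html') returns nothing.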
Generator classes
In order to allow the most flexibility in generating URLs, I'll have Perl code generate them. Each time a file is altered, a Perl object of the appropriate class will be created and a method called on it with information about the file. These classes can also be responsible for doing any necessary processing to generate output.
There will be a default class, used if you don't specify any.
I'll call it Daizu::Gen, since it generates things.
To use a different class, put its name in the daizu:generator
property. Since generator classes will handle all the processing for
their descendants, it will for example be possible to make part of a
site into a blog just by setting a directory's generator class
to Daizu::Gen::Blog.
The default class will use daizu:url, and then pile
on the names of directories and files to produce the full URL, taking
off the _index part when a file is named like that.
The blog generator would do things a bit differently, using part of
the publication date in the URL.
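As an illustration of a date-based scheme, here is a sketch that assumes a year/month layout (giving paths like 2006/04/slug/) and a dcterms:issued value in ISO 8601 form. The helper blog_article_path is hypothetical, and a real blog generator could pick a different layout.

```perl
use strict;
use warnings;

# Build a date-based relative URL for a blog article from its
# dcterms:issued timestamp (e.g. 2006-04-24T14:35:30Z) and its slug.
# The year/month layout is an assumption made for illustration.
sub blog_article_path {
    my ($issued, $slug) = @_;
    my ($year, $month) = $issued =~ /^(\d{4})-(\d{2})-/
        or die "can't parse publication date '$issued'\n";
    return "$year/$month/$slug/";
}
```

Because only the date and the slug go into the path, the directories an article is stored in don't affect its URL.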
The generator classes might be called something like this:
    my $generator = Daizu::Gen::Blog->new(
        cms          => $cms,
        root_file_id => $blog_file_id,
        root_path    => 'daizucms.org/blog',
        root_url     => 'http://www.daizucms.org/blog/',
    );
    my @urls = $generator->urls(
        file_id => $file_id,
        path    => 'daizucms.org/blog/article/foo.html',
    );
The urls method should return a list of hash refs,
each containing a url and a type.
The latter is for the MIME type which the URL is expected to return.
I'm not entirely sure if that's necessary, but it seems important.
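To make the interface concrete, here is a toy generator class, Sketch::Gen, standing in for the default class; it is not Daizu::Gen's real code. Its constructor takes the root_path and root_url arguments from the call shown earlier, and its urls method returns one hash ref per URL. The mime_type argument is an assumption added so the sketch has something to put in type, and file_id is accepted but ignored.

```perl
use strict;
use warnings;

package Sketch::Gen;

# Toy default generator: a stand-in for Daizu::Gen, not its real code.
sub new {
    my ($class, %args) = @_;
    my $self = {
        root_path => $args{root_path},    # e.g. 'example.com'
        root_url  => $args{root_url},     # e.g. 'http://www.example.com/'
    };
    return bless $self, $class;
}

# Returns a list of hash refs, each with a 'url' and a 'type'. Here each
# file yields exactly one URL: its path below root_path appended to
# root_url (assumed to end in '/'), with a trailing _index.* name dropped.
sub urls {
    my ($self, %args) = @_;
    my $root = $self->{root_path};
    my $rel;
    if ($args{path} eq $root) {
        $rel = '';
    }
    elsif (index($args{path}, "$root/") == 0) {
        $rel = substr $args{path}, length("$root/");
    }
    else {
        die "$args{path} is not under $root\n";
    }
    $rel =~ s{(^|/)_index\.[^/]*\z}{$1};
    return {
        url  => $self->{root_url} . $rel,
        type => $args{mime_type},    # hypothetical extra argument
    };
}

package main;
```

A generator that splits articles into pages or adds feeds would simply return more hash refs from the same method.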
Two types of resource
In a Subversion filesystem, the nodes come in two distinct flavours:
files and directories. But this distinction doesn't mean too much when
you're generating webpages, which might have a directory-like URL
(ending in /) or a file-like one.
The important distinction for generating resources (the things you get when you resolve a URL) is whether any processing of the content is required before publication. Almost all websites have some way of templating web pages, so that they all have the same look and feel. I'll call the files which generate those pages articles. Any unprocessed files can be published simply by copying their content to the right place in the output directory.
(Of course this is a slight simplification. There are times when you might want to convert files, or scale images, or whatever. But those things would be handled by some special publishing code. I'm talking about what makes sense for the standard code you'd plug in to publish something like a catalog site or blog.)
I don't think there's any reasonable way to reliably tell
these two types of file apart without explicitly marking them, so
I'll use a Subversion property called daizu:type. If
it has the value article then the file is an article,
whatever its other metadata says; otherwise it's an unprocessed
file.
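The article/unprocessed split comes down to a single property check. This Perl sketch assumes a file's properties arrive as a hash ref; is_article, publish and render_article are hypothetical, and render_article is only a placeholder for real templating.

```perl
use strict;
use warnings;

# A file is an article only if daizu:type is exactly 'article',
# regardless of any other metadata it has.
sub is_article {
    my ($props) = @_;
    my $type = $props->{'daizu:type'};
    return defined $type && $type eq 'article';
}

# Articles go through templating; anything else is published verbatim.
sub publish {
    my ($props, $content) = @_;
    return is_article($props) ? render_article($props, $content) : $content;
}

# Placeholder for real templating: wraps the content in a minimal page.
sub render_article {
    my ($props, $content) = @_;
    my $title = $props->{'dc:title'} // '';
    return "<html><head><title>$title</title></head><body>$content</body></html>";
}
```

Note that svn:mime-type plays no part in the decision, so an HTML file without daizu:type is still copied through unchanged.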
Example
Here's an example based on this website, showing the properties of the site's root directory, the blog directory, a single article in the blog, and an unprocessed image file:
- daizucms.org/
  - daizu:url: http://www.daizucms.org/

  Root directory of the website. Unless overridden, all content will be published with the default generator (Daizu::Gen). The daizu:url property acts as a base URI, so that all the generating code can simply return paths and have them automatically turned into absolute URLs.

- daizucms.org/blog/
  - daizu:generator: Daizu::Gen::Blog

  Root directory of the blog. This directory will generate the blog homepage, feeds, and archive pages. Any articles stored under this (using any directory layout you want) will be published with date-based URLs.
- daizucms.org/blog/article/foo/slug/_index.html
  - daizu:type: article
  - dcterms:issued: 2006-04-24T14:35:30Z
  - svn:mime-type: text/html
  - dc:title: blah blah

  A blog article, which will return as its URL the path 2006/04/slug/, which Daizu will then resolve against the generator's URL of http://www.daizucms.org/blog/. As an article it will also be included in archive listings, feeds, etc. Note that the article/foo part of the path doesn't affect the URL, as it would with the default generator (Daizu::Gen), but it can still be used to organize articles into categories.

  Because this is an article, the MIME type only indicates what type of content the original file has, and doesn't necessarily determine the MIME type of the output. For example, a generator might use the MIME type to detect that an article is actually a Word document and automatically convert it to HTML before proceeding with the normal publication process.
- daizucms.org/blog/article/foo/slug/static.png
  - dcterms:issued: 2006-05-18T21:09:34Z
  - svn:mime-type: image/png
  - dc:title: blah blah

  An unprocessed file, which won't have any templating magic applied to it when its output is generated. Because it lives in the directory belonging to the article above, it will be associated with that article and receive a URL alongside it. The relative path returned for the URL will be 2006/04/slug/static.png. This is so that the article can reference it (in an img element or whatever) simply by its filename.

  Note that the only significant difference, in terms of metadata, between this and the article is that this one doesn't have the article type defined. This means that you can publish an HTML file in the same way, as an unprocessed file. I've done that in the past for blog articles which reference an example file that just happens to be HTML.

  Images won't typically have titles, but there's no reason why you can't use this value as a caption or something.
The generators will normally only generate a single URL for a non-article file, and the output for that URL will have the same MIME type as the original file.
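A sketch of that rule for a file like static.png, assuming the generator already knows the owning article's relative URL. The helper unprocessed_file_url is hypothetical, and the application/octet-stream fallback for a missing svn:mime-type is an assumption of this sketch, not documented Daizu behaviour.

```perl
use strict;
use warnings;

# A non-article file gets a single URL next to its owning article's URL
# and keeps the MIME type recorded in its svn:mime-type property.
sub unprocessed_file_url {
    my (%args) = @_;
    my $article_url = $args{article_url};    # e.g. '2006/04/slug/'
    my $filename    = $args{filename};       # e.g. 'static.png'
    $article_url .= '/' unless $article_url =~ m{/\z};
    return {
        url  => $article_url . $filename,
        # Assumed fallback when no svn:mime-type property is set.
        type => $args{props}{'svn:mime-type'} // 'application/octet-stream',
    };
}
```

Keeping the file's own MIME type for its output is what lets a plain copy of the content serve as the published resource.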