eneylon

Annotation Correlation

In general on June 18, 2010 at 9:06 pm

When structured content is converted for presentation, the relationship between source and rendition is often lost.

Increasingly documents are being made available not just for reading, but also for writing. Wikis allow editing of content from a raw state, but the bulk of annotation (for example in consultation exercises) still needs moderation or processing before affecting the source document. So the publication of transformed documents for annotation is a legitimate model for soliciting input to those documents.

The problem comes when the comments need to be tied back to the source. Lossy transformations are common when documents are converted into HTML. Reversing a transformation is rarely a design consideration, and the semantics of source tags are often lost when content is rendered.

What is needed is a means of commenting on a presentation form that allows annotation at the precision of the source document. The approach described here assumes the source is an XML document that is transformed for display to the reader.

To tie items in the HTML to their corresponding elements in the source document, each element in the source must be uniquely identifiable in the rendition. The approach advocated here is to insert the XPath of each source element into an attribute of the rendered content. It is proposed that this attribute be named noid.

Of course, there are other ways of achieving the same result, such as creating a lookup table of XPaths and giving each node a corresponding GUID. However, the direct approach of inserting the XPath in the attribute has the advantages of simplicity, transparency, and not needing another data structure. One disadvantage is that the increase in file size becomes a function of content structure and element naming rather than just the number of nodes in the source.
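For comparison, the lookup-table alternative might look something like the sketch below. It is not the approach taken in this post. The lookup and entry element names and the key and path attributes are made up for illustration, and generate-id() merely stands in for a real GUID: its values are only guaranteed to be unique and consistent within a single transformation, so the same run would also have to write the keys into the rendition.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Emit one (hypothetical) entry per source element: an opaque key plus its XPath -->
  <xsl:template match="/">
    <lookup>
      <xsl:for-each select="//*">
        <!-- generate-id() is an assumption standing in for a GUID -->
        <entry key="{generate-id()}">
          <xsl:attribute name="path">
            <!-- Same path trace as the direct approach: name and position per step -->
            <xsl:for-each select="ancestor-or-self::*">
              <xsl:value-of select="concat('/', name(), '[',
                  1 + count(preceding-sibling::*[name() = name(current())]), ']')"/>
            </xsl:for-each>
          </xsl:attribute>
        </entry>
      </xsl:for-each>
    </lookup>
  </xsl:template>
</xsl:stylesheet>
```

Even in this small form the extra moving part is visible: the table has to be stored, shipped, and kept in step with the rendition, which is the overhead the direct approach avoids.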

The solution uses XSLT to transform the source. This allows for easy extensibility and placement in a transformation pipeline. The code below enhances the identity transform by adding a new attribute to every element output. That attribute contains the XPath of the element that it corresponds to:

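<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <!-- Identity transform that also adds a noid attribute to every element -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <!-- xsl:copy ignores its content for attributes, text, comments and
           processing instructions, so noid is only ever added to elements -->
      <xsl:attribute name="noid">
        <!-- Walk from the document element down to the current element -->
        <xsl:for-each select="ancestor-or-self::*">
          <xsl:variable name="my-key-name" select="name()"/>
          <xsl:text>/</xsl:text>
          <xsl:value-of select="$my-key-name"/>
          <xsl:text>[</xsl:text>
          <!-- 1-based position among siblings with the same name; name() is used
               for both the step and the count so that the two always agree -->
          <xsl:value-of select="1+count(preceding-sibling::*[name()=$my-key-name])"/>
          <xsl:text>]</xsl:text>
        </xsl:for-each>
      </xsl:attribute>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>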

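The additional code performs a path trace. Starting from the current element, it walks the ancestor-or-self axis from the document element downwards and, for each step, writes the element name followed by its position among like-named siblings. The result is the route needed to reach that particular element in the source.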
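A small worked example may make the path trace more concrete. The report, section and para element names are invented for illustration. Given this input:

```xml
<report>
  <section>
    <para>First</para>
    <para>Second</para>
  </section>
  <section>
    <para>Third</para>
  </section>
</report>
```

the stylesheet should produce the following (the exact whitespace may differ between processors):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<report noid="/report[1]">
  <section noid="/report[1]/section[1]">
    <para noid="/report[1]/section[1]/para[1]">First</para>
    <para noid="/report[1]/section[1]/para[2]">Second</para>
  </section>
  <section noid="/report[1]/section[2]">
    <para noid="/report[1]/section[2]/para[1]">Third</para>
  </section>
</report>
```

Each noid value is itself an XPath expression that, evaluated against the original document, selects exactly the element it was generated from.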

The result of the transformation is an otherwise identical document that can be used to render HTML and that provides a route back to the source for every element. Any subsequent processing can choose to make use of those links. This would typically be done by using the noid attribute to populate the id on a div or span element in HTML.
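As a rough sketch of that subsequent step, a second stylesheet in the pipeline could carry each noid across to an HTML id. The mapping used here (elements with child elements become div, leaf elements become span, and the source element name is kept as a class) is an assumption made for illustration; a real rendition would have its own rules for each source element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8" indent="no"/>

  <!-- Wrap the rendition in a minimal HTML page -->
  <xsl:template match="/">
    <html>
      <head><title>Annotatable rendition</title></head>
      <body><xsl:apply-templates/></body>
    </html>
  </xsl:template>

  <!-- Elements that contain other elements become block-level div elements -->
  <xsl:template match="*[*]">
    <div id="{@noid}" class="{local-name()}">
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <!-- Leaf elements become inline span elements -->
  <xsl:template match="*">
    <span id="{@noid}" class="{local-name()}">
      <xsl:apply-templates/>
    </span>
  </xsl:template>
</xsl:stylesheet>
```

One caveat: values such as /report[1]/section[2] are acceptable id values under HTML5, which only forbids whitespace, but not under the stricter HTML 4 rules, and they need escaping if used in CSS selectors.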

This post has shown that an XML document can be transformed to provide a route back to the source document in a subsequent rendition. In the next post in this series I will cover how to make use of that path from the rendered document using JavaScript events and the HTML document object model.