Showing posts with label Saxon. Show all posts

Thursday, December 6, 2012

Merging DITA maps and topics

This week I did a major data conversion. For about 2,000 products we generated DITA maps, each pointing to 3 topics. Many products shared the same data, so the generated topics had identical <body> content. We therefore decided to merge the topics first, which also meant rewriting the topicrefs in the maps. After that we could merge the maps themselves wherever they ended up with the same topicrefs.

One important lesson learned: I first used timestamps to name the merged files. Saxon turned out to be fast enough to merge 4 cases within the same millisecond, so they ended up overwriting each other. I quickly had to look for an alternative and switched to using the hash code of the grouping key.
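
A name only has to be unique per group, so a clock is not strictly needed. As an alternative to both timestamps and hash codes (not the approach used in the stylesheets below), here is a minimal sketch that numbers the groups with position(). It assumes a hypothetical input document with a <topics> root wrapping several <p-topic> elements; the <merged> and <group> output elements are made up for the illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- hypothetical input: <topics> wrapping several <p-topic id="fb_..."> elements -->
  <xsl:template match="/topics">
    <merged>
      <xsl:for-each-group select="p-topic" group-by="body">
        <!-- inside for-each-group, position() is the 1-based index of the
             current group, so it cannot repeat within a single run -->
        <group id="{substring-before(@id, '_')}_{format-number(position(), '00000')}"
               members="{count(current-group())}"/>
      </xsl:for-each-group>
    </merged>
  </xsl:template>

</xsl:stylesheet>
```

The trade-off: position-based ids can change from run to run when the input changes, whereas a hash of the grouping key stays the same as long as the body text does. A 32-bit hash can in principle collide for two different keys, so a duplicate check on the generated ids is worth considering.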

Example map:
<?xml version="1.0" encoding="utf-8"?>
<value-proposition id="vp_BC51-10PA" rev="001.001" title="Value proposition" xml:lang="en-US">
  <topicmeta translate="no">
    <subtitle translate="yes">45 V, 1 A PNP medium power transistor</subtitle>
    <prodinfo><prodname>BC51-10PA</prodname></prodinfo>
  </topicmeta>
  <technical-summary-ref href="technical-summary/ts_BC51-10PA.dita"/>
  <features-benefits-ref href="features-benefits/fb_BC51-10PA.dita"/>
  <target-applications-ref href="target-applications/ta_BC51-10PA.dita"/>
</value-proposition>

Example topic
<?xml version="1.0" encoding="utf-8"?>
<p-topic id="fb_BC51-10PA" rev="001.001" xml:lang="en-US">
  <title translate="no">Features and benefits</title>
  <prolog translate="no">...</prolog>
  <body>
    <ul>
      <li><p>High current</p></li>
      <li><p>Three current gain selections</p></li>
      <li><p>High power dissipation capability</p></li>
      <li><p>Exposed heatsink for excellent thermal and electrical conductivity</p></li>
      <li><p>Leadless very small SMD plastic package with medium power capability</p></li>
      <li><p>AEC-Q101 qualified</p></li>
    </ul>
  </body>
</p-topic>

I'm just going to share the XSLTs that did the hard work of merging the topics and maps. I'm sure I can reuse the same approach in the future.
topicmerge.xslt
<?xml version="1.0" encoding="UTF-8"?>
<!--
Author: Robby Pelssers
This stylesheet will merge topics if they have the same body tag
-->

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:nxp="http://www.nxp.com">
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:param name="exportFolder"/>
  <xsl:param name="subFolder"/>
  <xsl:variable name="folderTemplate" select="concat('file:///', $exportFolder, $subFolder, '/topic-type/?select=*.dita')"/>

  <xsl:variable name="featuresandbenefits" select="collection(replace($folderTemplate, 'topic-type', 'features-benefits'))"/>
  <xsl:variable name="technicalsummaries" select="collection(replace($folderTemplate, 'topic-type', 'technical-summary'))"/>
  <xsl:variable name="targetapplications" select="collection(replace($folderTemplate, 'topic-type', 'target-applications'))"/>

  <xsl:variable name="date-format" select="'[Y0001]-[M01]-[D01]T[h01]:[m01]:[s01]'"/>

  <xsl:function name="nxp:getHashCode">
    <xsl:param name="stringvalue" as="xs:string"/>
    <xsl:value-of select="string:hashCode($stringvalue)" xmlns:string="java:java.lang.String"/>
  </xsl:function>

  <!-- handles a logical group of documents (featuresandbenefits | technicalsummaries | targetapplications) -->
  <xsl:template name="mergeDocumentGroup">
    <xsl:param name="documents"/>
    <xsl:for-each-group select="$documents" group-by="p-topic/body">
      <xsl:call-template name="p-topic">
        <xsl:with-param name="topics" select="current-group()/p-topic"/>
        <xsl:with-param name="grouping_key"  select="current-grouping-key()"/>
      </xsl:call-template>
    </xsl:for-each-group>
  </xsl:template>

  <xsl:template match="/">
    <result>
      <xsl:call-template name="mergeDocumentGroup">
        <xsl:with-param name="documents" select="$featuresandbenefits"/>
      </xsl:call-template>
      <xsl:call-template name="mergeDocumentGroup">
        <xsl:with-param name="documents" select="$technicalsummaries"/>
      </xsl:call-template>
      <xsl:call-template name="mergeDocumentGroup">
        <xsl:with-param name="documents" select="$targetapplications"/>
      </xsl:call-template>
    </result>
  </xsl:template>


  <xsl:template name="p-topic">
    <xsl:param name="topics"/>
    <xsl:param name="grouping_key"/>
    <xsl:variable name="topic" select="$topics[1]"/>
    <p-topic>
      <xsl:choose>
        <xsl:when test="count($topics) > 1">
          <xsl:apply-templates select="$topic/@* | $topic/node()" mode="merge">
            <xsl:with-param name="grouping_key" select="$grouping_key" tunnel="yes"/>
          </xsl:apply-templates>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="$topic/@* | $topic/node()"/>
        </xsl:otherwise>
      </xsl:choose>
      <!-- we temporarily add the original topic id's so we can easily alter the topicrefs in a subsequent transform -->
      <topics>
        <xsl:for-each select="$topics">
          <id><xsl:value-of select="./@id"/></id>
        </xsl:for-each>
      </topics>
    </p-topic>
  </xsl:template>

  <xsl:template match="p-topic/@id" mode="merge">
    <xsl:param name="grouping_key" tunnel="yes"/>
    <xsl:attribute name="id"
        select="concat(substring-before(., '_'), '_', translate(nxp:getHashCode($grouping_key), '-', ''))"/>
  </xsl:template>

  <!-- copy all nodes and attributes which are not processed by one of available templates -->
  <xsl:template match="@* | node()">
    <xsl:copy copy-namespaces="no">
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="@* | node()" mode="merge">
    <xsl:copy copy-namespaces="no">
      <xsl:apply-templates select="@*" mode="merge"/>
      <xsl:apply-templates mode="merge"/>
    </xsl:copy>
  </xsl:template>


</xsl:stylesheet>
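
The comment in topicmerge.xslt mentions a subsequent transform that uses the temporary <topics> list to alter the topicrefs. That stylesheet is not shown in this post. A minimal sketch of what such a step could look like follows, assuming the maps and the merged topics sit together under one <result> element and that every href ends in the original topic id followed by '.dita'. The key name topicByOriginalId is made up for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <!-- index every p-topic by each original topic id recorded in its <topics> list -->
  <xsl:key name="topicByOriginalId" match="p-topic" use="topics/id"/>

  <!-- rewrite the href of the *-ref elements inside a map -->
  <xsl:template match="value-proposition/*/@href">
    <!-- 'features-benefits/fb_BC51-10PA.dita' becomes 'fb_BC51-10PA' -->
    <xsl:variable name="originalId" select="replace(., '^(.*/)?([^/]+)\.dita$', '$2')"/>
    <xsl:variable name="merged" select="key('topicByOriginalId', $originalId)[1]"/>
    <!-- keep the folder part, swap the file name for the id of the merged topic;
         leave the href untouched when no topic lists the original id -->
    <xsl:attribute name="href"
        select="if ($merged)
                then concat(replace(., '[^/]+$', ''), $merged/@id, '.dita')
                else string(.)"/>
  </xsl:template>

  <!-- copy everything else unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy copy-namespaces="no">
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Using xsl:key keeps the lookup cheap even with thousands of topics, since each href is resolved through the index instead of a scan over all p-topic elements.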

mapmerge.xslt
<?xml version="1.0" encoding="UTF-8"?>
<!--
Author: Robby Pelssers
This stylesheet will merge maps which have same topic refs and same title.
-->
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:nxp="http://www.nxp.com">
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:variable name="date-format" select="'[Y0001]-[M01]-[D01]T[h01]:[m01]:[s01]'"/>

  <xsl:function name="nxp:getHashCode">
    <xsl:param name="stringvalue" as="xs:string"/>
    <xsl:value-of select="string:hashCode($stringvalue)" xmlns:string="java:java.lang.String"/>
  </xsl:function>

  <xsl:function name="nxp:getMapGroupingKey" as="xs:string">
    <xsl:param name="vp" as="element(value-proposition)"/>
    <xsl:sequence select="concat($vp/topicmeta/subtitle, $vp/technical-summary-ref/@href,
      $vp/features-benefits-ref/@href, $vp/target-applications-ref/@href)"/>
  </xsl:function>

  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="result">
    <result>
      <xsl:apply-templates select="p-topic"/>
      <xsl:for-each-group select="value-proposition" group-by="nxp:getMapGroupingKey(.)">
        <xsl:call-template name="value-proposition">
          <xsl:with-param name="valuepropositions" select="current-group()"/>
          <xsl:with-param name="grouping_key"  select="current-grouping-key()"/>
        </xsl:call-template>
      </xsl:for-each-group>
    </result>
  </xsl:template>

  <xsl:template name="value-proposition">
    <xsl:param name="valuepropositions"/>
    <xsl:param name="grouping_key"/>
    <xsl:variable name="vp" select="$valuepropositions[1]"/>
    <value-proposition>
      <xsl:choose>
        <xsl:when test="count($valuepropositions) > 1">
          <xsl:apply-templates select="$vp/@* | $vp/node()" mode="merge">
            <xsl:with-param name="valuepropositions" select="$valuepropositions" tunnel="yes"/>
            <xsl:with-param name="grouping_key" select="$grouping_key" tunnel="yes"/>
          </xsl:apply-templates>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="$vp/@* | $vp/node()"/>
        </xsl:otherwise>
      </xsl:choose>
    </value-proposition>
  </xsl:template>


  <xsl:template match="value-proposition/@id" mode="merge">
    <xsl:param name="grouping_key" tunnel="yes"/>
    <xsl:attribute name="id"
         select="concat(substring-before(., '_'), '_', translate(nxp:getHashCode($grouping_key), '-', ''))"/>
  </xsl:template>

  <!-- copy all nodes and attributes which are not processed by one of available templates -->
  <xsl:template match="@* | node()">
    <xsl:copy copy-namespaces="no">
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="@* | node()" mode="merge">
    <xsl:copy copy-namespaces="no">
      <xsl:apply-templates select="@*" mode="merge"/>
      <xsl:apply-templates mode="merge"/>
    </xsl:copy>
  </xsl:template>


</xsl:stylesheet>

Friday, October 19, 2012

Creating UNIX timestamp with XSLT2.0 (Saxon)

Creating timestamps is a common requirement. If you start googling for how to create one in XSLT, you find some exotic solutions. Today I set out to find an elegant one using XSLT extension functions.
If you take a look at the Java API, and in particular java.util.Date, you will see a method getTime() which returns exactly what I need.
long getTime()
Returns the number of milliseconds since January 1, 1970, 00:00:00 GMT represented by this Date object.

Now let's look at a simple input XML containing products. We want to generate a timestamp for each product node as it is processed.
<products>
  <product>
    This is a complex node
  </product>
  <product>
    This is a complex node
  </product>  
</products>

To understand how extension functions can be used with Saxon, take a look here. In this case we need to construct a new Date object and invoke the method getTime on it. We bind the prefix date to the namespace java:java.util.Date. Next we can construct a new Date object with date:new(). To invoke an instance method, you pass the object itself as the first argument. So date:getTime(date:new()) is the XSLT equivalent of the Java expression new java.util.Date().getTime().
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:nxp="http://www.nxp.com">
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:function name="nxp:getTimestamp">
    <xsl:value-of select="date:getTime(date:new())"  xmlns:date="java:java.util.Date"/>
  </xsl:function>

  <xsl:template match="product">
   <product processedTimestamp="{nxp:getTimestamp()}">
     <xsl:apply-templates/>
   </product>
  </xsl:template>

</xsl:stylesheet>

So when you execute that stylesheet you will end up with product tags having a new attribute like below:
<product processedTimestamp="1350635976117">
 ...
</product>
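
For comparison, a similar value can be computed with standard XPath 2.0 date arithmetic, without any Java extension function. This is a minimal sketch of that alternative, not the approach described above; the function name nxp:getEpochMillis is made up for the illustration. One important difference: current-dateTime() returns the same value for the whole transformation, so every product receives an identical timestamp, and the sub-second precision depends on the processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:nxp="http://www.nxp.com"
  exclude-result-prefixes="xs nxp">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <!-- milliseconds between the Unix epoch and the dateTime of this transformation:
       subtracting two dateTimes gives a dayTimeDuration, and dividing that by a
       duration of one millisecond gives a decimal number of milliseconds -->
  <xsl:function name="nxp:getEpochMillis" as="xs:integer">
    <xsl:sequence select="xs:integer(
        (current-dateTime() - xs:dateTime('1970-01-01T00:00:00Z'))
        div xs:dayTimeDuration('PT0.001S'))"/>
  </xsl:function>

  <xsl:template match="product">
    <product processedTimestamp="{nxp:getEpochMillis()}">
      <xsl:apply-templates/>
    </product>
  </xsl:template>

</xsl:stylesheet>
```

If each product really needs its own processing time, the Java extension shown above remains the better fit, because it reads the clock on every call.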

Monday, August 27, 2012

Still using XSLT1.0? Time to start using Saxon.


Folder structure:
   - input
        - jsonxml-1.xml
        - jsonxml-2.xml
        - jsonxml-3.xml
   - xslt
        - jsonxmltransformer.xslt
   - output (empty)

Below are some basic usage instructions. For more details, check out the official documentation. You can download the Saxon jar from the official Saxon home page or from this Maven repository.
java -jar Saxon-HE-9.4.jar [options] [params]

-s:filename    -- Identifies the source file or directory
-o:filename    -- Send output to named file. In the absence of this option, the results go to standard output.
                  If the source argument identifies a directory, this option is mandatory and must also identify a directory; 
                  on completion it will contain one output file for each file in the source directory
-threads:N     -- Used only when the -s option specifies a directory. Controls the number of threads used to process the files in the directory
-xsl:filename  -- Specifies the file containing the principal stylesheet module

Now let's see how easy it is to transform a single file jsonxml-1.xml and save the result to transformed-result1.xml
java -jar Saxon-HE-9.4.jar -s:C:/tmp/easytransform/input/jsonxml-1.xml -o:C:/tmp/easytransform/output/transformed-result1.xml -xsl:C:/tmp/easytransform/xslt/jsonxmltransformer.xslt
That was easy enough. But suppose we want to transform a complete directory of source files?
java -jar Saxon-HE-9.4.jar -s:C:/tmp/easytransform/input -o:C:/tmp/easytransform/output -xsl:C:/tmp/easytransform/xslt/jsonxmltransformer.xslt
This will by convention save the transformed results using the same filenames as the input files to the specified output directory.

Tuesday, November 29, 2011

Using client side XSLT with Saxon-CE

Yesterday I decided to have a go with Saxon-CE, as I can definitely see the benefits of doing client-side transformations. Saxon-CE is still in alpha, so it is not recommended for production use yet.
I wanted to come up with a nice demo, so I decided to check whether I could build a web page using live Twitter Search data. I hit my first issue right away: the @data-source attribute does not work cross-domain, and it does not even seem to work in Chrome or IE. I had to manually download the XML results of a Twitter search for Michael Kay's tweets. They are included in this downloadable zip, but to get an idea of what the data looks like you can open the following URL: Michael Kay's tweets

The final result can be seen in screenshot below:

Friday, September 30, 2011

The power of Apache Cocoon, Xquery, and XSLT extensions

This article describes a use case where product data is stored in an XML database. Your customer wants to be able to search products by (part of) the product name and display the product properties in an HTML page. However, all products are also part of a workflow, and to find the status of a particular product we have to fetch information from another system. For that, this demo includes a MonitorClient which offers the needed functionality. The demo shows how to generate the entire page using purely Apache Cocoon, XQuery and XSLT. The only custom Cocoon component we developed is the XQueryGenerator, which reads an XQuery from the specified @src attribute, injects any sitemap parameters from the match pattern and returns the results.


PH3330L.xml: Sample data which is stored in XMLDB:


products.xqlib: XQuery library for retrieving products:



products.xquery: returns an XHTML page containing matching products based on their name:









Beans configured in Spring application context:




client.xslt: replaces client:getStatus with actual value

Wednesday, July 6, 2011

Unit testing XQJ and Saxon

I have to admit that it took me quite a bit of time to get this unit test working. All these namespaces don't make life easier. But let me explain what is going on in the code snippets below. First I wrote a little module with one function that returns the groupId as a string. Luckily Saxon supports the "at" hint for importing modules. But it took me quite some time to understand which base URI Saxon uses. By default it seems to be "", but Saxon somehow knows how to resolve it to 'file:///c:/development/workspaces/cocoon3/cocoon-xmldb/'.



module containing 1 function that extracts groupId from pom


pom_module.xquery which imports pom.xqlib module and outputs groupId