\$\begingroup\$

I used Saxon 9 to profile an XSL transformation that converts XML to JSON. The profiler reports that the function that escapes certain characters accounts for about 70% of the total processing time. The escaping is necessary: without it, characters such as quotes and backslashes would break the string values and make the generated JSON invalid.

java -jar saxon9he.jar -xsl:jsontransform.xslt -s:input.xml -o:output.json -TP

The "method" used to escape the values looks like this:

<xsl:template name="escapejson">
  <xsl:param name="string"/>
  <xsl:sequence select="replace(
    replace(
      replace(
        replace(
          replace(
            replace(
              replace(
                replace(
                  replace($string, '\\', '\\\\'),
                  '/', '\\/'),
                '&quot;', '\\&quot;'),
              '&#xA;', '\\n'),
            '&#xD;', '\\r'),
          '&#x9;', '\\t'),
        '\n', '\\n'),
      '\r', '\\r'),
    '\t', '\\t')"/>
</xsl:template>

Unfortunately, my XSL knowledge is too limited to come up with an optimal solution on my own. I assume the repeated calls to replace are quite inefficient, but I don't know how to do better.

Malachi
asked Jul 21, 2015 at 12:09
\$\endgroup\$

1 Answer

\$\begingroup\$

I can believe that the replace calls are the problem: the nested calls create a lot of intermediate string values.

There's not much I can see to improve, though. The nesting can be reduced by combining a few of the replace calls:

replace(.... , '\n|&#xA;', '\\n')

Doing the same for \t and \r will reduce six calls to just three.
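
As a sketch of where that merging ends up: XPath 2.0 regular expressions follow the XML Schema regex syntax, in which \n, \r and \t already stand for the characters #xA, #xD and #x9. The three patterns written as character references and the three written as escapes therefore match the same characters, and the second group should never find anything left to replace. The version below keeps only six calls. The stylesheet wrapper and the match="/" driver are my additions for illustration and are not part of the original stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template name="escapejson">
    <xsl:param name="string"/>
    <!-- Six calls instead of nine: \n, \r and \t in the regex are the
         same characters as the references #xA, #xD and #x9. -->
    <xsl:sequence select="replace(
        replace(
          replace(
            replace(
              replace(
                replace($string, '\\', '\\\\'),
                '/', '\\/'),
              '&quot;', '\\&quot;'),
            '\n', '\\n'),
          '\r', '\\r'),
        '\t', '\\t')"/>
  </xsl:template>

  <!-- Hypothetical driver (assumption): escapes the string value of the
       document element so the template can be tried in isolation. -->
  <xsl:template match="/">
    <xsl:call-template name="escapejson">
      <xsl:with-param name="string" select="string(/*)"/>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>
```

Because the three dropped calls had nothing left to match, this change alone may not show up as a large speed-up; it is worth profiling again with -TP.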

Additionally, I would consider guarding the whole chain of replace calls with a single matches call that tests for any of the characters you are searching for, so that the full replacement stack runs only on values that actually need it. Depending on how many values need escaping, that one matches test can save time.
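
A minimal sketch of such a guard follows. The character class lists every character the chain escapes (backslash, slash, double quote, newline, carriage return, tab), and the else branch uses string($string) so that both branches return a string. The stylesheet wrapper is my addition, and the chain is shortened to six calls on the assumption that \n, \r and \t in an XPath regex denote the same characters as &#xA;, &#xD; and &#x9;.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template name="escapejson">
    <xsl:param name="string"/>
    <!-- One cheap test first; the nested replace calls run only when
         the value contains at least one character that needs escaping. -->
    <xsl:sequence select="
        if (matches($string, '[\\/&quot;\n\r\t]'))
        then replace(
               replace(
                 replace(
                   replace(
                     replace(
                       replace($string, '\\', '\\\\'),
                       '/', '\\/'),
                     '&quot;', '\\&quot;'),
                   '\n', '\\n'),
                 '\r', '\\r'),
               '\t', '\\t')
        else string($string)"/>
  </xsl:template>

</xsl:stylesheet>
```

Whether this pays off depends on the data: if most values contain a slash or a quote, the extra matches call is pure overhead, so it is worth measuring on a representative input.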

answered Jul 21, 2015 at 12:31
\$\endgroup\$
  • Thanks a lot, I will try this when I have access to my computer again. Do you think I could rewrite this into even fewer replace calls by using something similar to the declaration of this method: xsltfunctions.com/xsl/functx_escape-for-regex.html Commented Jul 21, 2015 at 16:12
  • I have worked with Michael Kay in the past on JDOM/Saxon integration, and he's very friendly, knowledgeable and active on Stack Overflow too. I would recommend you rephrase your question along the lines of "this code is my attempt at replacing search items in the document as part of the transformation and it sucks! What alternatives are there that I should consider?", and tag it with the saxon tag. I am pretty certain that he reads all saxon questions (and JDOM questions too). Commented Jul 21, 2015 at 16:26
  • I tried your first suggestion and converted the 6 replaces into 3 that make use of the regex. The strange / sad thing is that when comparing 10 transformations of the original and the modified XSLT, the times are worse or equal. Commented Jul 21, 2015 at 20:38
  • @Marged - can we chat in The 2nd monitor? Commented Jul 21, 2015 at 20:41
  • As suggested by you I created a new question on SO stackoverflow.com/questions/31549633/… Commented Jul 21, 2015 at 21:10
