I used Saxon 9 to profile an XSL transformation that converts XML to JSON. The profiler tells me that the template which escapes certain characters takes about 70% of the total processing time. The escaping is necessary because otherwise characters that break the JSON strings would make the generated file invalid.
java -jar saxon9he.jar -xsl:jsontransform.xslt -s:input.xml -o:output.json -TP
The "method" used to escape the values looks like this:
<xsl:template name="escapejson">
  <xsl:param name="string"/>
  <xsl:sequence select="replace(
    replace(
      replace(
        replace(
          replace(
            replace(
              replace(
                replace(
                  replace($string, '\\', '\\\\'),
                '/', '\\/'),
              '&quot;', '\\&quot;'),
            '&#10;', '\\n'),
          '&#13;', '\\r'),
        '&#9;', '\\t'),
      '\n', '\\n'),
    '\r', '\\r'),
  '\t', '\\t')"/>
</xsl:template>
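The nesting order in this template matters: the backslash replace has to be the innermost call, because every later replace inserts backslashes of its own. The minimal standalone stylesheet below shows what happens when the order is flipped. It assumes XSLT 2.0 with Saxon 9 HE; the initial template name main and the sample string are hypothetical, and it can be run without a source document, e.g. java -jar saxon9he.jar -it:main -xsl:demo.xslt (demo.xslt being a placeholder file name).

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical demo template; run with -it:main, no source document needed. -->
  <xsl:template name="main">
    <!-- The XPath string literal 'say "hi"' -->
    <xsl:variable name="s" select="'say &quot;hi&quot;'"/>

    <!-- Backslash first, as in the question. Prints: say \"hi\" -->
    <xsl:value-of select="replace(replace($s, '\\', '\\\\'), '&quot;', '\\&quot;')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Quote first: the backslashes it inserts are doubled by the
         second replace. Prints: say \\"hi\\" (a broken JSON string) -->
    <xsl:value-of select="replace(replace($s, '&quot;', '\\&quot;'), '\\', '\\\\')"/>
  </xsl:template>
</xsl:stylesheet>
```

Any rewrite of the chain therefore has to keep the backslash replace innermost.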
Unfortunately, my XSL knowledge is too limited to create an optimal solution on my own. I assume the repeated calls to replace are quite inefficient, but I don't know how to do it better.
Answer
I can believe that the replace calls are the problem: the nesting creates a lot of intermediate string values.
There's not much I can see to improve, though. The nesting can be reduced by combining some of the replaces, for example:
replace(.... , '\n|&#10;', '\\n')
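Doing the same for \t and \r will reduce six calls to just three. (Strictly speaking, \n in an XPath regular expression already matches a newline character, so the combined pattern is equivalent to '\n' alone; the same holds for \r and \t.)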
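Taking the combining idea one step further: because \n, \r and \t in an XPath regular expression already match the literal newline, carriage return and tab characters, the replaces on the literal characters can be dropped rather than merged. The sketch below assumes XSLT 2.0 and Saxon 9 HE; the main template and its sample string are hypothetical, only there for a quick check with -it:main.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template name="escapejson">
    <xsl:param name="string"/>
    <!-- Six calls instead of nine: the regex escapes '\n', '\r' and '\t'
         already match the literal newline, carriage return and tab,
         so separate replaces for the literal characters are not needed.
         The backslash replace stays innermost so that backslashes added
         by the later replaces are not doubled. -->
    <xsl:sequence select="replace(replace(replace(replace(replace(replace(
                            $string, '\\', '\\\\'),
                            '/', '\\/'),
                            '&quot;', '\\&quot;'),
                            '\n', '\\n'),
                            '\r', '\\r'),
                            '\t', '\\t')"/>
  </xsl:template>

  <!-- Hypothetical check, run with -it:main. Prints: a\/b\t\"c\"\n -->
  <xsl:template name="main">
    <xsl:call-template name="escapejson">
      <xsl:with-param name="string" select="'a/b&#9;&quot;c&quot;&#10;'"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

The output is the same as the nine-call version, because in the original the regex replaces for \n, \r and \t never find anything left to match.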
Additionally, I would consider wrapping the entire chain of replace calls in a matches() call that tests for any of the characters you are searching for, so the full replacement call stack only runs on values that actually need it. Depending on how often values actually require escaping, a single matches() for all the replaced characters can save time.
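A sketch of that matches() guard, assuming XSLT 2.0 and Saxon 9 HE: one character class covers everything the chain escapes, and values without any of those characters are returned unchanged. The main template and its sample strings are hypothetical. The string() call is a precaution added here: if a caller passes a node (such as an attribute), returning the node itself could put an attribute rather than text into the result, whereas the replace() branch always returns a string.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template name="escapejson">
    <xsl:param name="string"/>
    <!-- Cheap pre-check: if none of \ / " LF CR TAB occur, skip all replaces.
         string() keeps the return type a string, like the replace() branch. -->
    <xsl:sequence select="
        if (not(matches($string, '[\\/&quot;\n\r\t]')))
        then string($string)
        else replace(replace(replace(replace(replace(replace(
               $string, '\\', '\\\\'),
               '/', '\\/'),
               '&quot;', '\\&quot;'),
               '\n', '\\n'),
               '\r', '\\r'),
               '\t', '\\t')"/>
  </xsl:template>

  <!-- Hypothetical check, run with -it:main.
       Prints "plain text", then tab\there on the next line. -->
  <xsl:template name="main">
    <xsl:call-template name="escapejson">
      <xsl:with-param name="string" select="'plain text'"/>
    </xsl:call-template>
    <xsl:text>&#10;</xsl:text>
    <xsl:call-template name="escapejson">
      <xsl:with-param name="string" select="'tab&#9;here'"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

Whether this pays off depends on the data: it saves work only when most values need no escaping, and costs one extra scan for those that do.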
- Thanks a lot, I will try this when I have access to my computer again. Do you think I could rewrite this with even fewer replace calls by using something similar to the declaration of this function (xsltfunctions.com/xsl/functx_escape-for-regex.html)? – Marged, Jul 21, 2015 at 16:12
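The functx function linked in that comment escapes many characters with a single replace whose replacement puts a backslash in front of the match. The same idea applies to the three JSON characters that only need a backslash prefix (\, / and "); in an XPath replacement string, $0 stands for the whole match. The whitespace characters map to different letters (\n, \r, \t), so they still need their own calls, giving four calls in total. This is a sketch assuming XSLT 2.0 and Saxon 9 HE, not something measured; the main template and sample string are hypothetical.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template name="escapejson">
    <xsl:param name="string"/>
    <!-- One replace for the characters that only need a backslash in
         front of them: $0 is the whole match, so '\\$0' yields \ + match.
         It must stay innermost, because the whitespace replaces insert
         backslashes of their own. -->
    <xsl:sequence select="replace(replace(replace(replace(
                            $string, '[\\/&quot;]', '\\$0'),
                            '\n', '\\n'),
                            '\r', '\\r'),
                            '\t', '\\t')"/>
  </xsl:template>

  <!-- Hypothetical check, run with -it:main. Prints: C:\\dir\/\"x\"\n -->
  <xsl:template name="main">
    <xsl:call-template name="escapejson">
      <xsl:with-param name="string" select="'C:\dir/&quot;x&quot;&#10;'"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```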
- I have worked with Michael Kay in the past on JDOM/Saxon integration, and he's very friendly, knowledgeable and active on Stack Overflow too. I would recommend you rephrase your question along the lines of "this code is my attempt at replacing search items in the document as part of the transformation and it sucks! What alternatives should I consider?", and tag it with the saxon tag. I am pretty certain that he reads all saxon questions (and JDOM questions too). – rolfl, Jul 21, 2015 at 16:26
- I tried your first suggestion and converted the six replaces into three that use the regex. The strange / sad thing is that when comparing 10 transformations of the original and the modified XSLT, the times are worse or equal. – Marged, Jul 21, 2015 at 20:38
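Since merging the regexes did not pay off, one more variant that might be worth measuring the same way (with -TP) is a single pass with xsl:analyze-string: it scans the string once and emits an escape for each special character, instead of running one replace per character. This is a different technique from the replace-based ones discussed here, and whether it is actually faster in Saxon 9 HE is untested; treat it as a sketch to benchmark, not a known improvement. It assumes XSLT 2.0; the main template and sample string are hypothetical.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="text"/>

  <!-- Single pass: the regex finds each special character and the
       matching-substring branch emits its JSON escape; everything else
       is copied unchanged. The pieces are joined once at the end. -->
  <xsl:template name="escapejson">
    <xsl:param name="string"/>
    <xsl:variable name="parts" as="xs:string*">
      <xsl:analyze-string select="$string" regex="[\\/&quot;\n\r\t]">
        <xsl:matching-substring>
          <!-- These are plain output strings, not replacement strings:
               '\n' here is a backslash followed by the letter n. -->
          <xsl:sequence select="if (. = '&#10;') then '\n'
                                else if (. = '&#13;') then '\r'
                                else if (. = '&#9;') then '\t'
                                else concat('\', .)"/>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:sequence select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:variable>
    <xsl:sequence select="string-join($parts, '')"/>
  </xsl:template>

  <!-- Hypothetical check, run with -it:main. Prints: C:\\dir\/\"x\"\n -->
  <xsl:template name="main">
    <xsl:call-template name="escapejson">
      <xsl:with-param name="string" select="'C:\dir/&quot;x&quot;&#10;'"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

Because each character is examined once, there is no ordering constraint here: a backslash emitted for one match is never revisited by another rule.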
- @Marged - can we chat in The 2nd Monitor chat room? – rolfl, Jul 21, 2015 at 20:41
- As you suggested, I created a new question on SO: stackoverflow.com/questions/31549633/… – Marged, Jul 21, 2015 at 21:10