I am trying to generate an XML file in which some of the values contain special characters such as μmol/l, x10³ cells/µl and many more. I also need to be able to include superscripts.

I encoded the text μmol/l to something like this using an ordutf8 function from php.net:

&#956&#109&#111&#108&#47&#108

function ords_to_unistr($ords, $encoding = 'UTF-8'){
    // Turns an array of ordinal values into a string of unicode characters
    $str = '';
    for($i = 0; $i < sizeof($ords); $i++){
        // Pack this number into a 4-byte string
        // (Or multiple one-byte strings, depending on context.)
        $v = $ords[$i];
        $str .= pack("N",$v);
    }
    $str = mb_convert_encoding($str,$encoding,"UCS-4BE");
    return($str);
}

function unistr_to_ords($str, $encoding = 'UTF-8'){
    // Turns a string of unicode characters into an array of ordinal values,
    // Even if some of those characters are multibyte.
    $str = mb_convert_encoding($str,"UCS-4BE",$encoding);
    $ords = array();
    // Visit each unicode character
    for($i = 0; $i < mb_strlen($str,"UCS-4BE"); $i++){
        // Now we have 4 bytes. Find their total
        // numeric value.
        $s2 = mb_substr($str,$i,1,"UCS-4BE");
        $val = unpack("N",$s2);
        $ords[] = $val[1];
    }
    return($ords);
}

I have successfully converted this encoding back to rich text using PHPExcel to generate Excel documents and PDFs, but I now need to put it into an XML file.

If I use the &# sequences as is, I get an error message saying:

SimpleXMLElement::addChild(): invalid decimal character value

Here are more values I have in the database that need to be made XML-friendly:

&#120&#49&#48&#60&#115&#117&#112&#62&#54&#60&#47&#115&#117&#112&#62&#32&#99&#101&#108&#108&#115&#47&#181&#108

Converted from x103 cells/μl

asked May 11, 2016 at 11:15

1 Answer

There is no need to encode these characters. XML documents can use UTF-8 or another encoding, and the serializer will encode characters as necessary depending on that encoding.

$foo = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><foo/>');
$foo->addChild('bar', 'μmol/l, x10³ cells/µl');
echo $foo->asXML();

Output (special characters not encoded):

<?xml version="1.0" encoding="UTF-8"?>
<foo><bar>μmol/l, x10³ cells/µl</bar></foo>

To force entities for the special characters, you need to change the encoding:

$foo = new SimpleXmlElement('<?xml version="1.0" encoding="ASCII"?><foo/>');
$foo->addChild('bar', 'μmol/l, x10³ cells/µl');
echo $foo->asXml();

Output (special characters encoded):

<?xml version="1.0" encoding="ASCII"?>
<foo><bar>&#956;mol/l, x10&#179; cells/&#181;l</bar></foo>

I suggest you convert your custom encoding back to UTF-8 so that the XML API can take care of it. If you want to store strings in the custom encoding, you need to work around a bug.

A string like &#120&#49&#48&#60&#115&#117 triggers a bug in SimpleXML/DOM: the second argument of SimpleXMLElement::addChild() and DOMDocument::createElement() is not escaped correctly, so the & is read as the start of a character reference. You need to create the content as a text node and append it.

Here is a small class that extends SimpleXMLElement and adds a workaround:

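class MySimpleXMLElement extends SimpleXMLElement {
    // The $namespace parameter keeps the signature compatible with the parent method
    public function addChild($nodeName, $content = NULL, $namespace = NULL) {
        // Create the element without content ...
        $child = parent::addChild($nodeName, NULL, $namespace);
        if (isset($content)) {
            // ... and append the content as a text node, which gets escaped properly
            $node = dom_import_simplexml($child);
            $node->appendChild($node->ownerDocument->createTextNode($content));
        }
        return $child;
    }
}

$foo = new MySimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><foo/>');
$foo->addChild('bar', '&#120&#49&#48&#60&#115&#117');
echo $foo->asXML();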

Output:

<?xml version="1.0" encoding="UTF-8"?>
<foo><bar>&amp;#120&amp;#49&amp;#48&amp;#60&amp;#115&amp;#117</bar></foo>
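
The same text-node approach should also work when building the document with DOMDocument directly instead of SimpleXML. This is only a minimal sketch; the element names foo and bar are reused from the examples here, and the variable names are illustrative.

```php
<?php
// Build <foo><bar>...</bar></foo> with DOM and add the content as a text node.
$document = new DOMDocument('1.0', 'UTF-8');
$foo = $document->appendChild($document->createElement('foo'));

// Create the element WITHOUT passing the content as second argument ...
$bar = $foo->appendChild($document->createElement('bar'));

// ... and append it as a text node, so the "&" is treated as literal text.
$bar->appendChild($document->createTextNode('&#120&#49&#48&#60&#115&#117'));

echo $document->saveXML();
```

The text node keeps the string as is in memory, and the serializer writes each & as &amp; in the output.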

The & characters from your custom encoding get escaped as the entity &amp; because & is a special character in XML. The XML parser will decode it again when the document is read:

$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<foo><bar>&amp;#120&amp;#49&amp;#48&amp;#60&amp;#115&amp;#117</bar></foo>
XML;
$foo = new SimpleXMLElement($xml);
var_dump((string)$foo->bar);

Output:

string(27) "&#120&#49&#48&#60&#115&#117"
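
The suggestion above to convert the custom encoding back to UTF-8 could look roughly like the following sketch. The function name decode_numeric_refs() is hypothetical, and it assumes the stored values consist of decimal &#NNN sequences with or without a trailing semicolon. It reuses the pack()/mb_convert_encoding() technique from ords_to_unistr() in the question.

```php
<?php
// Hypothetical helper: turns "&#956&#109..." (semicolon optional) into UTF-8 text.
function decode_numeric_refs(string $encoded): string
{
    return preg_replace_callback(
        '/&#(\d+);?/',
        function (array $match): string {
            // Pack the code point into 4 bytes (UCS-4BE) and convert to UTF-8.
            return mb_convert_encoding(pack('N', (int) $match[1]), 'UTF-8', 'UCS-4BE');
        },
        $encoded
    );
}

// Value from the question, as stored in the database.
$stored = '&#956&#109&#111&#108&#47&#108';
$text = decode_numeric_refs($stored);

// The decoded UTF-8 string can be passed to the XML API as usual.
$foo = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><foo/>');
$foo->addChild('bar', $text);
echo $foo->asXML();
```

Values that contain markup, such as the <sup> sequence in the longer example, decode to plain text that includes those tags literally, so it is probably safer to add them through a text node as shown in the workaround class.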
answered May 11, 2016 at 11:58

1 Comment

I have these values in the database... &#120&#49&#48&#60&#115&#117&#112&#62&#54&#60&#47&#115&#117&#112&#62&#32&#99&#101&#108&#108&#115&#47&#181&#108
