I receive an XML file from a company that contains the following element. Its value is a URL with an unescaped ampersand, so the document is not well-formed:
<BrowserFormPost>
<URL>https://example.com/asdsad?type=1&id2</URL>
</BrowserFormPost>
They don't encode the & as &amp;, which makes it not XML.
Now the problem: I asked them to encode the URL properly but, unfortunately, they can't. They bought ERP software, and this is the only output it can produce.
In my PHP code I parse this XML with SimpleXML:
$returnUrl = mysqli_real_escape_string($conn,$xmlData->Request->PunchOutSetupRequest->BrowserFormPost->URL);
but now I receive an error:
Warning: simplexml_load_string(): Entity: line 28: parser error : EntityRef: expecting ';' in
And as you already guessed, this happens at the & character.
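To see the failure in isolation, here is a minimal sketch that feeds a cut-down payload to SimpleXML and collects the libxml errors instead of letting them surface as warnings. The payload string is a hypothetical stand-in for the real file, which has more elements around this one.

```php
<?php
// Hypothetical minimal payload, modeled on the fragment in the question.
$xml = '<BrowserFormPost><URL>https://example.com/asdsad?type=1&id2</URL></BrowserFormPost>';

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$doc = simplexml_load_string($xml);

$messages = [];
if ($doc === false) {
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError carrying the message, line and column.
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
}

foreach ($messages as $message) {
    echo $message, PHP_EOL;
}
```

With this in place, simplexml_load_string() simply returns false for the broken payload, and the script can decide what to do next rather than logging a warning.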
So now I have two questions:
1. Can I encode & as &amp; myself in PHP before parsing?
2. How do I deal with this kind of situation as the only software developer in a company? I explained to the tech guy at the other company that this isn't valid XML, and all he says is that he cannot change the XML on his side, because then it would no longer work for the other companies that also receive it. Our company wants this project to succeed because the other company brings in a lot of profit for us. So how should I deal with invalid code from other companies?
UPDATE
I needed to fix the problem myself, as they (the other company) could not change it to &amp;, so I did the following:
$xmlFile = trim(file_get_contents('php://input'));
$xmlDataEncoded = preg_replace('/&(?!#?[a-z0-9]+;)/', '&amp;', $xmlFile);
$xmlData = simplexml_load_string($xmlDataEncoded);
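A quick way to check this workaround is to run it on a cut-down payload and read the URL back. The sketch below assumes the replacement string is &amp; and uses a hypothetical inline string in place of php://input. The element path is shortened to match that sample.

```php
<?php
// Hypothetical request body standing in for php://input.
$xmlFile = '<BrowserFormPost><URL>https://example.com/asdsad?type=1&id2</URL></BrowserFormPost>';

// Escape every "&" that does not already start something shaped like an entity.
$xmlDataEncoded = preg_replace('/&(?!#?[a-z0-9]+;)/', '&amp;', $xmlFile);

libxml_use_internal_errors(true);
$xmlData = simplexml_load_string($xmlDataEncoded);
if ($xmlData === false) {
    throw new RuntimeException('Payload is still not well-formed XML');
}

// SimpleXML decodes &amp; back to "&" when the node is cast to a string,
// so the application sees the URL exactly as the sender intended it.
$returnUrl = (string) $xmlData->URL;
echo $returnUrl, PHP_EOL; // https://example.com/asdsad?type=1&id2
```

The escaping only exists between the raw input and the parser. Once parsed, the value is the plain URL again, so nothing downstream needs to know about the repair.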
- "which is very bad." Which makes it "not XML". (Quentin, Feb 15, 2016 at 17:31)
- By definition, XML is well-formed. (Parfait, Feb 15, 2016 at 19:22)
1 Answer
1. Yes, you can treat the textual data you receive as text (it's not XML) and use manual or automated string-based methods to replace & with &amp;, taking care not to replace it in places where it's already being used as an entity. It's ugly, error prone, and ought to be unnecessary.
2. You tell anyone who cares that the company is not sending XML and is forcing partners to work around their shortcomings. You then grow large enough that the company will fix their broken code or lose you as a partner. If that's not viable, see #1.
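The string-based repair in point 1 could be sketched roughly as follows. The helper name escapeBareAmpersands is hypothetical, and the pattern is one possible heuristic: it leaves alone anything shaped like a named entity, a decimal character reference, or a hexadecimal character reference, and escapes every other ampersand.

```php
<?php
/**
 * Hypothetical helper: escape "&" unless it starts something shaped like
 * a named entity (&name;), a decimal reference (&#38;) or a hex reference (&#x26;).
 */
function escapeBareAmpersands(string $text): string
{
    $pattern = '/&(?!(?:[A-Za-z_][A-Za-z0-9_.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/';

    // preg_replace() returns null on a regex engine error; fall back to the input.
    return preg_replace($pattern, '&amp;', $text) ?? $text;
}

$samples = [
    'a=1&b=2',          // bare ampersand, gets escaped
    'x &amp; y',        // already escaped, left alone
    '&#38; and &#x26;', // character references, left alone
    'R&D &lt; 5',       // mixed: only the first "&" is escaped
];

foreach ($samples as $sample) {
    echo escapeBareAmpersands($sample), PHP_EOL;
}
```

This remains a heuristic, not a parser. It would also rewrite ampersands inside CDATA sections, it cannot tell a query parameter that happens to be followed by a semicolon from a real entity, and a named entity that the document never declares would still make the parse fail.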
4 Comments
&lt; for instance) including those that might be defined in the document, although the idea behind the approach does. Ultimately it might require regex to fully implement. You'd want to replace & except when it occurs in a pattern like &\w+;.