
From a company I receive an XML file containing this element, whose value is a URL with an unescaped ampersand (which makes the XML not well-formed):

 <BrowserFormPost>
 <URL>https://example.com/asdsad?type=1&id2</URL>
 </BrowserFormPost>

They don't encode the & as &amp;, so the document is not XML. The problem: I asked them to encode the URL properly but, unfortunately, they can't. They bought an ERP package and it can only produce this output.

In PHP I parse this XML with SimpleXML:

$returnUrl = mysqli_real_escape_string($conn,$xmlData->Request->PunchOutSetupRequest->BrowserFormPost->URL);

but I get this error:

Warning: simplexml_load_string(): Entity: line 28: parser error : EntityRef: expecting ';' in

As you might guess, it is triggered by the & character.
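To see exactly where the parser gives up, you can collect libxml's errors yourself instead of letting them surface as PHP warnings. This is a minimal sketch that assumes a cut-down document containing only the BrowserFormPost element from above (the real file has more around it):

```php
<?php
// Cut-down reproduction of the problem: a bare "&" inside the URL text.
$xml = '<BrowserFormPost><URL>https://example.com/asdsad?type=1&id2</URL></BrowserFormPost>';

// Buffer libxml errors instead of emitting PHP warnings; remember the old setting.
$previous = libxml_use_internal_errors(true);
$doc = simplexml_load_string($xml);

if ($doc === false) {
    foreach (libxml_get_errors() as $error) {
        // Each LibXMLError exposes the position and the parser's message.
        echo "line {$error->line}, column {$error->column}: " . trim($error->message) . "\n";
    }
    libxml_clear_errors();
}

// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);
```

The parse fails as a whole (simplexml_load_string returns false), so nothing from the document is usable until the bare & is dealt with.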

So I have two questions:

  1. Can I encode this & as &amp; myself in PHP before parsing?

  2. How do I deal with this kind of situation as the only software developer in a company? I explained to the other company's tech contact that this isn't valid XML, and all he says is that he cannot change the XML on his side, because the other companies that receive the same XML would then stop working. Our company wants this project to succeed because the other company brings in a lot of profit for us. So how do you deal with invalid output from other companies?

UPDATE

Since they (the other company) could not change & to &amp;, I had to fix it myself, so I did the following:

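$xmlFile = trim(file_get_contents('php://input'));
// Escape every "&" that does not already start an entity or character reference.
// The "i" flag also covers references with upper-case letters, such as &#x2F;.
$xmlDataEncoded = preg_replace('/&(?!#?[a-z0-9]+;)/i', '&amp;', $xmlFile);
$xmlData = simplexml_load_string($xmlDataEncoded);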
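A minimal sketch of the same approach wrapped in a reusable function. The name loadLenientXml is hypothetical, the /i flag is an addition so that character references with upper-case hex digits (such as &#x2F;) are not double-escaped, and the input is the cut-down snippet from the question rather than the full file. It also shows that the escaping is lossless: SimpleXML decodes &amp; back to &, so the URL comes out exactly as it was sent.

```php
<?php
/**
 * Hypothetical helper: escape bare "&" characters, then parse with SimpleXML.
 * Returns null if the document still cannot be parsed.
 */
function loadLenientXml(string $raw): ?SimpleXMLElement
{
    // Same lookahead as above: leave "&name;", "&#123;" and "&#x1F;" alone.
    $escaped = preg_replace('/&(?!#?[a-z0-9]+;)/i', '&amp;', trim($raw));

    // Parse quietly; any remaining errors are discarded here.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($escaped);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $xml === false ? null : $xml;
}

$raw = '<BrowserFormPost><URL>https://example.com/asdsad?type=1&id2</URL></BrowserFormPost>';
$doc = loadLenientXml($raw);

// SimpleXML turns "&amp;" back into "&", so the original URL is recovered.
echo (string) $doc->URL, "\n"; // https://example.com/asdsad?type=1&id2
```

Existing references such as &lt; survive untouched, so a document that is already partly escaped is not mangled.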
asked Feb 15, 2016 at 15:31
  • Re "which is very bad.": Which makes it "not XML". Commented Feb 15, 2016 at 17:31
  • By definition, XML is well-formed. Commented Feb 15, 2016 at 19:22

1 Answer

  1. Yes, you can treat the textual data you receive as text (it's not XML) and use manual or automated string-based methods to replace & with &amp;, taking care not to replace it in places where it's already being used as an entity. It's ugly, error prone, and ought to be unnecessary.

  2. You tell anyone who cares that the company is not sending XML and is forcing its partners to work around its shortcomings. You then grow large enough that the company must either fix its broken code or lose you as a partner. If that's not viable, see #1.
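If you go the route of #1, a pattern closer to the XML 1.0 reference syntax is a little safer than "any letters or digits followed by ;". The sketch below (the function name is hypothetical, and entity names are restricted to ASCII for simplicity) leaves &name;, &#123; and &#x1F; alone and escapes every other &, including malformed ones such as &#zz;. It only checks syntax: a reference to an entity the document never declares, such as &nbsp;, is left untouched and will most likely still be rejected by the parser.

```php
<?php
/**
 * Hypothetical helper: escape every "&" that does not start a syntactically
 * valid XML reference. ASCII-only approximation of the XML 1.0 grammar:
 *   entity reference:        &Name;
 *   decimal char reference:  &#123;
 *   hex char reference:      &#x1F;
 */
function escapeBareAmpersandsStrict(string $text): string
{
    $pattern = '/&(?!(?:[A-Za-z_:][A-Za-z0-9._:-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/';
    return preg_replace($pattern, '&amp;', $text);
}

echo escapeBareAmpersandsStrict('a&b &amp; &#38; &#x26; &my-entity; &#zz;'), "\n";
// a&amp;b &amp; &#38; &#x26; &my-entity; &amp;#zz;
```

Unlike #?[a-z0-9]+, this keeps entity names containing "-", "." or "_" intact and escapes invalid character references instead of passing them through to the parser.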

answered Feb 15, 2016 at 17:28

4 Comments

I especially like no. 2
Perhaps your detractors were trying to express that such repairs are ugly, error prone, and ought to be unnecessary.
#1 doesn't quite work as stated, because you also have to worry about other entities (&lt;, for instance), including any defined in the document itself, although the idea behind the approach is sound. Fully implementing it will probably require a regex: you'd want to replace & except where it occurs in a pattern like &\w+;.
@Matthew: Right, I should have phrased my caution about not replacing & where it's already &amp; more generally. I've fixed it now. Thanks.
