<img ....> gets corrupted when using parse-html #20
Open
Description
- Mercury Parser API Version: Latest
- Node Version: 8
Expected Behavior
The parser should not corrupt the <img> content.
Current Behavior
The <img> tag originally is
<img src="https://cdn.example-domain.com/example1.jpg"/>
and after parsing
<img src="https://www.example-domain.com/%22https://cdn.example-domain.com/example1.jpg/%22/">
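One possible explanation for a value of this shape, offered as a guess rather than a confirmed diagnosis of the parser: the src attribute may reach a URL resolver with the escaping backslashes and quotes still attached. The sketch below uses Node's own WHATWG URL class, not Mercury's code, and `rawSrc` is a hypothetical attribute value chosen for illustration.

```javascript
// Sketch only: resolves a still-escaped src value against the page URL
// using Node's built-in WHATWG URL class. This is NOT Mercury's code.
const { URL } = require("url");

const base = "https://www.example-domain.com";

// Hypothetical attribute value: the backslash-quote pairs were never
// unescaped, and the "/" of the self-closing tag became part of the value.
const rawSrc = '\\"https://cdn.example-domain.com/example1.jpg\\"/';

// For http(s) URLs a backslash is treated like "/", so the value is read
// as a path on the base host, and each quote is percent-encoded as %22.
const resolved = new URL(rawSrc, base).href;
console.log(resolved);
```

If the printed string matches the corrupted src above, that would point at the escaping of the submitted HTML rather than at the image handling itself.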
Steps to Reproduce
- Take the following HTML
<html> <head> <body> Main content <br/> <img src="https://cdn.example-domain.com/example1.jpg"/> More content <br/> More Content to Simulate main content. <img src="https://cdn.example-domain.com/example2.jpg"/> </body> </html>
- Call the API with the path /parse-html. The API takes a POST with a JSON object containing a URL and HTML. The HTML is the HTML from step 1, first converted to the following format:
<html>\\n<head>\\n<body>\\nMain content\\n<br/>\\n<img src=\"https://cdn.example-domain.com/example1.jpg\"/>\\nMore content\\n<br/>\\nMore Content to Simulate main content.\\n<img src=\"https://cdn.example-domain.com/example2.jpg\"/>\\n</body>\\n</html>\\n
and the URL value that is passed is https://www.example-domain.com
- The content in the returned JSON result contains the main content, including the images. The image values, however, are corrupted:
<img src="https://www.example-domain.com/%22https://cdn.example-domain.com/example1.jpg/%22/"> <img src="https://www.example-domain.com/%22https://cdn.example-domain.com/example2.jpg/%22/">
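As a point of comparison for step 2, here is a minimal sketch of building the request body so that a JSON serializer applies the escaping exactly once. The field names `url` and `html` are assumptions, since the endpoint is undocumented, and no request is actually sent.

```javascript
// Sketch only: builds a JSON body for a POST to /parse-html.
// The field names `url` and `html` are assumed, not taken from any docs.
const html = [
  "<html>",
  "<head>",
  "<body>",
  "Main content",
  "<br/>",
  '<img src="https://cdn.example-domain.com/example1.jpg"/>',
  "More content",
  "<br/>",
  "More Content to Simulate main content.",
  '<img src="https://cdn.example-domain.com/example2.jpg"/>',
  "</body>",
  "</html>",
  ""
].join("\n");

// JSON.stringify escapes quotes and newlines once; the HTML itself is
// passed in unescaped, so no manual backslashes are needed.
const body = JSON.stringify({ url: "https://www.example-domain.com", html });

// Decoding the body gives back the original HTML with no stray backslashes.
const roundTripped = JSON.parse(body).html;
console.log(roundTripped === html);
console.log(roundTripped.includes("\\"));
```

If the HTML string is escaped by hand and then serialized again, the decoded value would still contain literal backslashes, which is worth ruling out before treating the result as a parser bug.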
Question/Comment
Am I using the API correctly? I could not find any documentation, so this is a bit of reverse engineering.
The reason for not doing this directly, i.e. using /parser?url=....., is that I am trying to work around a problem where a TypeError is returned. See. The page gives back a 202, which the parser cannot handle. I am now downloading the content and trying to pass the HTML into the API as a workaround instead. Unfortunately it doesn't behave as I expected it would.