<img ....> gets corrupted when using parse-html #20

Open

Description

fappelman opened on May 6, 2019

Mercury Parser API Version: Latest
Node Version: 8

Expected Behavior

The parser should not corrupt the <img> content.

Current Behavior

The <img> tag originally is

 <img src="https://cdn.example-domain.com/example1.jpg"/>

and after parsing

 <img src="https://www.example-domain.com/%22https://cdn.example-domain.com/example1.jpg/%22/">

Steps to Reproduce

Take the following HTML

<html>
<head>
<body> 
Main content
<br/>
<img src="https://cdn.example-domain.com/example1.jpg"/>
More content
<br/>
More Content to Simulate main content.
<img src="https://cdn.example-domain.com/example2.jpg"/>
</body>
</html>

Call the API at the path /parse-html. The API takes a POST with a JSON object containing a URL and HTML. The HTML is the document shown above, first converted to the following escaped format:

<html>\\n<head>\\n<body>\\nMain content\\n<br/>\\n<img src=\"https://cdn.example-domain.com/example1.jpg\"/>\\nMore content\\n<br/>\\nMore Content to Simulate main content.\\n<img src=\"https://cdn.example-domain.com/example2.jpg\"/>\\n</body>\\n</html>\\n

and the URL value that is passed is https://www.example-domain.com
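
For reference, here is a minimal Node sketch of how such a request body could be built so that the HTML is escaped exactly once. The field names `url` and `html` are assumptions based on the description above, not documented API, and the request itself is not sent.

```javascript
// Sketch only: the field names `url` and `html` are assumed, not taken from API docs.
const html = [
  '<html>',
  '<head>',
  '<body>',
  'Main content',
  '<br/>',
  '<img src="https://cdn.example-domain.com/example1.jpg"/>',
  '</body>',
  '</html>',
  '',
].join('\n');

// JSON.stringify escapes quotes and newlines once (\" and \n).
const body = JSON.stringify({ url: 'https://www.example-domain.com', html });

// Decoding the body once gives back the original markup, with no backslashes left.
const decoded = JSON.parse(body).html;
console.log(decoded === html);        // true
console.log(decoded.includes('\\'));  // false

// Escaping by hand and then serializing again leaves literal backslashes after one decode.
const handEscaped = html.replace(/"/g, '\\"').replace(/\n/g, '\\n');
const doubleEncoded = JSON.stringify({ url: 'https://www.example-domain.com', html: handEscaped });
console.log(JSON.parse(doubleEncoded).html.includes('src=\\"'));  // true
```

If the HTML is escaped by hand before being serialized, the receiving side would see `src=\"...\"` with literal backslashes after decoding the JSON once.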

The content returned in the JSON result includes the main content and the images. However, the image src values are corrupted:

<img src="https://www.example-domain.com/%22https://cdn.example-domain.com/example1.jpg/%22/">
<img src="https://www.example-domain.com/%22https://cdn.example-domain.com/example2.jpg/%22/">
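
These values are consistent with the parser receiving a `src` that still contains literal backslashes and quotes, then resolving it against the page URL. That is an assumption about the cause, not something confirmed in the parser's code. The sketch below only shows how Node's WHATWG `URL` class treats such a value; `escapedSrc` is a hypothetical input.

```javascript
const { URL } = require('url');

const base = 'https://www.example-domain.com';

// Hypothetical attribute value as it would look if the backslash escapes
// were still present in the HTML when it was parsed.
const escapedSrc = '\\"https://cdn.example-domain.com/example1.jpg\\"/';

// For http(s) URLs a backslash is treated like "/", and '"' is percent-encoded
// as %22, so the value is resolved as a path relative to the base.
const resolved = new URL(escapedSrc, base).href;
console.log(resolved);
// https://www.example-domain.com/%22https://cdn.example-domain.com/example1.jpg/%22/

// A clean absolute URL is left untouched.
const clean = new URL('https://cdn.example-domain.com/example1.jpg', base).href;
console.log(clean);
// https://cdn.example-domain.com/example1.jpg
```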

Question/Comment

Am I using the API in a correct way? I could not find any documentation so this is a bit of reverse engineering.

The reason for not doing this directly, i.e. using /parser?url=....., is that I am trying to work around a problem where a TypeError is returned. See. The page gives back a 202, which the parser cannot handle. I am now downloading the content and trying to pass the HTML into the API as a workaround instead. Unfortunately, it doesn't behave as I expected.

