
All accented characters are corrupted #13

Open

Description

  • Platform: AWS Lambda

Expected Behavior

When you POST a request with only the url parameter, the response is correctly encoded as UTF-8.
When I use the html parameter, the response should be correctly encoded as UTF-8 too.

The API should return a title like this: "Le démantèlement des réacteurs nucléaires, véritable filière industrielle"
And content like this :
... <p><strong>Dans les prochaines ann&#xE9;es, avec la transition &#xE9;nerg&#xE9;tique et le d&#xE9;mant&#xE8;lement ...

Current Behavior

Title returned: "Le d�mant�lement des r�acteurs nucl�aires, v�ritable fili�re industrielle"
Content returned:
... <p><strong>Dans les prochaines ann&#xFFFD;es, avec la transition &#xFFFD;nerg&#xFFFD;tique et le d&#xFFFD;mant&#xFFFD;lement ...
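
For anyone comparing the two content samples: `&#xFFFD;` is the numeric reference for U+FFFD, the Unicode replacement character, so the original accents are already lost before the HTML is serialized. A minimal sketch to check this, using a hypothetical helper (`decodeHexReferences` is not part of the API) that only handles hexadecimal references:

```javascript
// Hypothetical helper for illustration: decodes hexadecimal numeric
// character references such as &#xE9; and nothing else.
function decodeHexReferences(html) {
  return html.replace(/&#x([0-9a-fA-F]+);/g, (_, hex) =>
    String.fromCodePoint(parseInt(hex, 16))
  );
}

// Fragments taken from the expected and the returned content above.
const expected = decodeHexReferences("ann&#xE9;es ... d&#xE9;mant&#xE8;lement");
const actual = decodeHexReferences("ann&#xFFFD;es ... d&#xFFFD;mant&#xFFFD;lement");

// Count how many characters were replaced by U+FFFD in the returned content.
const lost = [...actual].filter((ch) => ch === "\uFFFD").length;

console.log(expected);
console.log(actual);
console.log(lost);
```

Every accented character maps to exactly one U+FFFD, which suggests a byte-level decoding problem rather than a problem with how the entities are written.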

Steps to Reproduce

I send a POST request to the parse-html endpoint with this body:
{ "url": "https://www.europeanscientist.com/fr/energie/demantelement-reacteurs-nucleaires-dechets-pngmdr/", "html" : [copy_paste_of_html_code] }

Possible Solution

I tried forcing the request's Content-Type header to UTF-8 with application/json; charset=utf-8, but it doesn't change the result.
When running this request locally, I get an iconv-lite deprecation warning related to encoding:
Iconv-lite warning: decode()-ing strings is deprecated. Refer to https://github.com/ashtuchkin/iconv-lite/wiki/Use-Buffers-when-decoding
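
One possible explanation, which is an assumption on my side and not confirmed anywhere in this issue: the html value is already a decoded string, it gets squeezed to one byte per character, and those bytes are then read as UTF-8. The sketch below reproduces the same symptom with Node's built-in Buffer only. It is not the parser's actual code, and the variable names are illustrative.

```javascript
// Sketch only: reproduces the symptom with Node built-ins.
const title = "Le démantèlement des réacteurs nucléaires";

// Assumed failure mode: the decoded string is reduced to one byte per
// character (what the "latin1"/"binary" encoding does), so é becomes 0xE9.
const singleByte = Buffer.from(title, "latin1");

// Reading those bytes as UTF-8 fails on every accented character,
// because a lone 0xE9 or 0xE8 is not a valid UTF-8 sequence.
const corrupted = singleByte.toString("utf8");

// Buffer-first handling: keep the raw UTF-8 bytes and decode them once.
// rawBytes stands in for the bytes of the request body.
const rawBytes = Buffer.from(title, "utf8");
const intact = rawBytes.toString("utf8");

console.log(corrupted); // accents replaced by U+FFFD
console.log(intact);    // accents preserved
```

If this is what happens, passing the body around as a Buffer and decoding it exactly once, as the linked wiki page recommends, should keep the accents intact.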
