Skip to main content
Stack Overflow
  1. About
  2. For Teams

You are not logged in. Your edit will be placed in a queue until it is peer reviewed.

We welcome edits that make the post easier to understand and more valuable for readers. Because community members review edits, please try to make the post substantially better than how you found it, for example, by fixing grammar or adding additional resources and hyperlinks.

Required fields*

Required fields*

Backpropagation of wrongly (double) encoded CSV

I have a CSV file that someone encoded wrongly.

The file is a database of movies with corresponding actors. I downloaded it in order to practise some coding for the so called Bacon number.

It looks like this:

movieId,title,actors
(...)
61,Eye for an Eye (1996),(a ton of other actors)|Dolores VelÌÁzquez|(more actors)
59,The Confessional (1995),(a ton of other actors)|Richard Fr̩chette|Fran̤ois Papineau|Marie Gignac|Normand Daneau|Anne-Marie Cadieux|Suzanne Cl̩ment|Lynda Beaulieu|Pascal Rollin|Billy Merasty|Paul H̩bert|Marthe Turgeon|Adreanne Lepage-Beaulieu|Andr̩e-Anne Th̩roux-Faille|Rodrigue Proteau|Philippe Paquin|Pierre H̩bert|Nathalie D'Anjou|Danielle Fichaud|Jules Philip|Jacques Laroche|Claude-Nicolas Demers|Jean-Philippe C̫t̩|Tristan Wiseman|Marc-Olivier Tremblay|Jacques Brouillet|Jean-Paul L'Allier|Denis Bernard|Ren̩e Hudon|Serge Laflamme|Carl Mathieu
(...)

Now as you can see, instead of Umlauts and letters with accents (ÄÖÜ, É, À, Û etc.), the actors have a combination of other special characters instead.

Thanks to very helpful input on this question, we have established that this is indeed a case of Mojibake.

My goal is to programmatically fix the broken characters by decoding and encoding in the correct order.

Answer*

Draft saved
Draft discarded
Cancel

lang-py

AltStyle によって変換されたページ (->オリジナル) /