I have a CSV file that was saved with the wrong encoding.
The file is a database of movies and their actors. I downloaded it to practise some coding for the so-called Bacon number.
It looks like this:
movieId,title,actors
(...)
61,Eye for an Eye (1996),(a ton of other actors)|Dolores VelÌÁzquez|(more actors)
59,The Confessional (1995),(a ton of other actors)|Richard Fr̩chette|Fran̤ois Papineau|Marie Gignac|Normand Daneau|Anne-Marie Cadieux|Suzanne Cl̩ment|Lynda Beaulieu|Pascal Rollin|Billy Merasty|Paul H̩bert|Marthe Turgeon|Adreanne Lepage-Beaulieu|Andr̩e-Anne Th̩roux-Faille|Rodrigue Proteau|Philippe Paquin|Pierre H̩bert|Nathalie D'Anjou|Danielle Fichaud|Jules Philip|Jacques Laroche|Claude-Nicolas Demers|Jean-Philippe C̫t̩|Tristan Wiseman|Marc-Olivier Tremblay|Jacques Brouillet|Jean-Paul L'Allier|Denis Bernard|Ren̩e Hudon|Serge Laflamme|Carl Mathieu
(...)
As you can see, instead of umlauts and accented letters (ÄÖÜ, É, À, Û etc.), the actors' names contain combinations of other special characters.
Thanks to very helpful input on this question, we have established that this is indeed a case of Mojibake.
My goal is to programmatically fix the broken characters by decoding and encoding in the correct order.
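To make clear what I mean by "decoding and encoding in the correct order", here is a minimal sketch of the round trip I have in mind. The helper names `fix_mojibake` and `candidate_fixes` are my own, and the codec pair `cp1252` / `utf-8` is only an assumption for illustration: I do not know yet which two encodings were actually mixed up in my file, so the demo uses a string I broke on purpose rather than a line from the CSV.

```python
def fix_mojibake(text, wrong="cp1252", right="utf-8"):
    """Undo one wrong decode: turn the text back into the original bytes
    using the codec that was wrongly assumed, then decode those bytes
    with the codec they were really written in.
    The default codec pair is an assumption, not a diagnosis of my file."""
    try:
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The round trip is not possible for this pair; leave the text alone.
        return text


def candidate_fixes(text, codecs=("cp1252", "latin-1", "mac_roman", "utf-8")):
    """Try every (wrong, right) pair from a hypothetical candidate list and
    collect the pairs that round-trip without errors and change the text."""
    results = {}
    for wrong in codecs:
        for right in codecs:
            if wrong == right:
                continue
            try:
                fixed = text.encode(wrong).decode(right)
            except (UnicodeEncodeError, UnicodeDecodeError):
                continue
            if fixed != text:
                results[(wrong, right)] = fixed
    return results


# Deliberately broken sample: UTF-8 bytes misread as cp1252.
broken = "Andrée".encode("utf-8").decode("cp1252")
repaired = fix_mojibake(broken)
```

With the real file I would apply something like `fix_mojibake` to the `actors` column only once the correct pair is known. `candidate_fixes` just lists every pair that happens to round-trip, so its output still has to be checked by eye against names I know.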