
I am working on a football dataset that contains some exotic names. How can I replace the accented letters that appear in it? Here are some examples of these "exotic" names:

'Lionel Andrés Messi Cuccittini', 'Neymar da Silva Santos Junior', 'Luis Alberto Suárez Díaz', 'David De Gea Quintana', 'Zlatan Ibrahimović'

The special letters are é, á, ć, etc. (letters with a diacritic, or accent mark, above them). I want to convert them to their base form: ć becomes c, á becomes a, and so on.

Many thanks in advance!

asked Dec 21, 2019 at 13:53
  • Does this answer your question? What is the best way to remove accents in a Python unicode string? Commented Dec 21, 2019 at 13:57
  • If it is not absolutely necessary to replace them, leave the names as they are. In all these languages the "special" characters are not equivalent to the similar-looking "normal" characters. For example, the German city of Düsseldorf (village on the Düssel) is not "Dusseldorf" (village of fools). Commented Dec 21, 2019 at 14:15
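If you do need an accent-free version, one common standard-library approach (a sketch, not necessarily the one given in the linked question) is to decompose each string with `unicodedata.normalize("NFKD", ...)` and drop the resulting combining marks. It assumes Python 3 and no third-party packages; `strip_accents` is a hypothetical helper name.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining diacritical marks, e.g. 'ć' -> 'c' (hypothetical helper)."""
    # NFKD splits a precomposed letter such as 'é' into 'e' + U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only characters that are not combining marks, i.e. the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


names = [
    "Lionel Andrés Messi Cuccittini",
    "Neymar da Silva Santos Junior",
    "Luis Alberto Suárez Díaz",
    "David De Gea Quintana",
    "Zlatan Ibrahimović",
]
for name in names:
    print(strip_accents(name))
```

For these five names it should print the same accent-free spellings as the unidecode answer. Two caveats: letters with no Unicode decomposition (such as ø or ł) pass through unchanged, and "Düsseldorf" becomes exactly the lossy "Dusseldorf" the second comment warns about.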

3 Answers


You could try this:

# playernames is your list of player names
for i in range(len(playernames)):
    playernames[i] = playernames[i].replace("é", "e")

and then, of course, do the same for every other accented character.
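Instead of one `.replace()` call per character, the same idea can be written as a single translation table with `str.maketrans` and applied with `str.translate`. This is a sketch; the `ACCENT_MAP` table is hypothetical and only covers the letters that appear in the question's examples, so you would still have to extend it by hand.

```python
# Hypothetical mapping: only the accented letters from the question's examples.
ACCENT_MAP = str.maketrans({"é": "e", "á": "a", "í": "i", "ć": "c"})

playernames = [
    "Lionel Andrés Messi Cuccittini",
    "Luis Alberto Suárez Díaz",
    "Zlatan Ibrahimović",
]
# translate() applies every substitution in the table in a single pass over each string.
playernames = [name.translate(ACCENT_MAP) for name in playernames]
print(playernames)
```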

answered Dec 21, 2019 at 13:57

You can use the unidecode package:

import unidecode
special_str = ['Lionel Andrés Messi Cuccittini', 'Neymar da Silva Santos Junior', 'Luis Alberto Suárez Díaz', 'David De Gea Quintana', 'Zlatan Ibrahimović']
for item in special_str:
    print(unidecode.unidecode(item))

The output will be:

Lionel Andres Messi Cuccittini
Neymar da Silva Santos Junior
Luis Alberto Suarez Diaz
David De Gea Quintana
Zlatan Ibrahimovic
answered Dec 21, 2019 at 14:19


You can try this:

import unidecode
new_string = unidecode.unidecode(your_string)
answered Dec 21, 2019 at 14:14
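Following the second comment's advice to keep the names as they are, one option for a dataset is to keep the original column untouched and add an ASCII copy next to it for matching or searching. This sketch uses only the standard library (`csv` plus `unicodedata` instead of unidecode); the CSV content and the `name` / `name_ascii` column names are hypothetical stand-ins for your data.

```python
import csv
import io
import unicodedata


def ascii_fold(text: str) -> str:
    # Decompose accented letters, then drop every non-ASCII code point:
    # the combining marks, but also any letter without a decomposition (e.g. 'ø').
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


# Hypothetical CSV standing in for the football dataset.
raw = io.StringIO(
    "name\n"
    "Luis Alberto Suárez Díaz\n"
    "Zlatan Ibrahimović\n"
)

rows = list(csv.DictReader(raw))
for row in rows:
    # Keep the original spelling and store the ASCII-only copy alongside it.
    row["name_ascii"] = ascii_fold(row["name"])

for row in rows:
    print(row["name"], "->", row["name_ascii"])
```

Unlike unidecode, which transliterates letters such as ø, the `encode("ascii", "ignore")` step silently deletes them, so check your data for such characters before relying on this variant.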
