I have a DataFrame in Python Pandas like below:
sentence
------------
ππ€πΎ
I like it
+1ππ
One :-) :)
hah
I need to select only rows containing emoticons or emojis, so as a result I need something like below:
sentence
------------
ππ€πΎ
+1ππ
One :-) :)
How can I do that in Python?
1 Answer
You can select the Unicode emojis with a regex character range:
df2 = df[df['sentence'].str.contains(r'[\u263a-\U0001f645]')]
output:
sentence
0 ππ€πΎ
2 +1ππ
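For anyone who wants to try this end to end, here is a minimal self-contained sketch. The emoji in the sample frame are stand-ins chosen for illustration (the originals are not recoverable here), and the names emoji_range and wide are assumptions, not part of the question. It also shows that the range is a rough heuristic: it spans every code point from U+263A to U+1F645, so it matches CJK text as well, and it misses emoji above U+1F645.

```python
import pandas as pd

# Sample data; the emoji are illustrative stand-ins, written as escapes.
df = pd.DataFrame({
    "sentence": [
        "\U0001f600\U0001f44d",   # two emoji
        "I like it",
        "+1\U0001f44d\U0001f44d",
        "One :-) :)",
        "hah",
    ]
})

# One character class covering U+263A through U+1F645.
emoji_range = r"[\u263a-\U0001f645]"

# Boolean mask: True where the sentence contains a character in the range.
df2 = df[df["sentence"].str.contains(emoji_range)]
print(df2)

# Caveat check: CJK characters fall inside the range, while the
# rocket emoji (U+1F680) lies above its upper bound.
wide = pd.Series(["\u6f22\u5b57", "\U0001f680"]).str.contains(emoji_range)
print(wide.tolist())
```

If stricter matching matters, a narrower set of ranges or a dedicated emoji library may be a better fit than a single wide range.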
This is, however, much more ambiguous for ASCII emoticons, as there is no standard definition and there are probably endless combinations. If you limit it to smiley faces made of eyes (':' or ';'), an optional single non-space character in between, and a mouth (')' or '('), you could use:
df[df['sentence'].str.contains(r'[\u263a-\U0001f645]|(?:[:;]\S?[\)\(])')]
output:
sentence
0 ππ€πΎ
2 +1ππ
3 One :-) :)
But you would be missing plenty of potential ASCII possibilities: :O, :P, 8D, etc.
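One way to widen the net is to spell out an eyes / optional nose / mouth structure and require that the emoticon is not glued to surrounding word characters. The sketch below is only an illustration under those assumptions: the character sets and the names EMOJI, EMOTICON, PATTERN and has_emoticon are made up for this example, and the list of eyes and mouths is far from complete.

```python
import re

# Unicode range from the answer above.
EMOJI = r"[\u263a-\U0001f645]"

# Assumed emoticon shape: eyes, optional nose, mouth.
# (?<!\w) and (?!\w) reject matches embedded in a word or ID.
EMOTICON = r"(?<!\w)[:;=8][-^']?[)(\]\[DPpOo/\\|](?!\w)"

PATTERN = re.compile(EMOJI + "|" + EMOTICON)


def has_emoticon(text: str) -> bool:
    """Return True if text contains an emoji in the range or an ASCII emoticon."""
    return PATTERN.search(text) is not None


for s in ["One :-) :)", "see :P", "I like it", "part AB8D12", "note:Please"]:
    print(repr(s), has_emoticon(s))
```

Since the pattern has no capture groups, its source string can be passed to pandas directly, for example df[df['sentence'].str.contains(PATTERN.pattern)]. A standalone 8D will still match, so some ambiguity remains.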
8 Comments
8D could be found in a legitimate product ID, or :P in a sentence with a missing space after the colon
( ͡° ͜ʖ ͡°), are not in fact ASCII ;-)
:-) is tricky
eye_character nose_character mouth_character