I have a DataFrame in Python pandas like the one below:

sentence
------------
😎🤘🏾
I like it
+1😍😘
One :-) :)
hah

I need to select only the rows that contain emoticons or emojis, so the result should look like this:

sentence
------------
😎🤘🏾
+1😍😘
One :-) :)

How can I do that in Python?

asked Jun 2, 2022 at 13:27
  • You can select the emojis by their Unicode range, but :-) is tricky Commented Jun 2, 2022 at 13:28
  • You could maybe create a hardcoded lookup table of the emoticons that aren't actual emojis, like ":)" and ":-)" and so on, and then check whether your sentences contain any element of that table? Commented Jun 2, 2022 at 13:31
  • Have you defined a set of emoticons you want to find? You could maybe put together a regex pattern if it's just combos of eye_character nose_character mouth_character Commented Jun 2, 2022 at 13:33
  • Does this answer your question? How to extract all the emojis from text? Commented Jun 2, 2022 at 13:33
  • eshirvana, but how do I apply a function from your link to my DataFrame? Moreover, I need to select rows with emojis and rows with emoticons, so not only emojis :) Commented Jun 2, 2022 at 13:34

1 Answer

You can select Unicode emojis with a regex character range (here U+263A to U+1F645; emojis outside that range are not matched):

df2 = df[df['sentence'].str.contains(r'[\u263a-\U0001f645]')]

output:

 sentence
0 😎🤘🏾
2 +1😍😘
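
For reference, here is a minimal self-contained sketch of the range filter above, run on the sample data from the question. The names `EMOJI_RANGE` and `mask` are only illustrative, and `na=False` is an added assumption so that missing values count as non-matches. The pattern is written as a regular (non-raw) string so the character class contains the literal code points.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {"sentence": ["😎🤘🏾", "I like it", "+1😍😘", "One :-) :)", "hah"]}
)

# Character class covering U+263A to U+1F645 only;
# emojis outside this range are NOT matched
EMOJI_RANGE = "[\u263a-\U0001f645]"

# na=False (assumption): treat missing values as "no match"
mask = df["sentence"].str.contains(EMOJI_RANGE, regex=True, na=False)
df2 = df[mask]
print(df2)
```

Only the rows with index 0 and 2 are kept here; the emoticon row is not, because `:-)` and `:)` are plain ASCII.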

This is, however, much more ambiguous for the ASCII "emojis" (emoticons), as there is no standard definition and probably endless combinations. If you limit it to smiley faces that have eyes (':' or ';'), an optional single non-space character as a nose, and a mouth (')' or '('), you could use:

df[df['sentence'].str.contains(r'[\u263a-\U0001f645]|(?:[:;]\S?[\)\(])')]

output:

 sentence
0 😎🤘🏾
2 +1😍😘
3 One :-) :)

But you would be missing plenty of potential ASCII possibilities: :O, :P, 8D, etc.
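
If you would rather match an explicit list of emoticons, as suggested in the comments, a rough sketch could look like the one below. The list `EMOTICONS`, the helper `has_emoji_or_emoticon`, and the two extra test rows are hypothetical and not part of the original question. Requiring whitespace (or the string boundary) on both sides is one possible way to cut down on false positives, at the cost of missing emoticons glued to a word.

```python
import re
import pandas as pd

# Hypothetical, deliberately short list; extend it for your own data
EMOTICONS = [":)", ":-)", ";)", ";-)", ":(", ":-(", ":O", ":P", "8D"]

# Escape every emoticon (longest first) and require whitespace or a
# string boundary on both sides, so "8D7" or "note:Please" do not match
alternatives = "|".join(
    re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True)
)
EMOTICON_RE = re.compile(rf"(?<!\S)(?:{alternatives})(?!\S)")
EMOJI_RE = re.compile("[\u263a-\U0001f645]")  # same range as above


def has_emoji_or_emoticon(text):
    """True if text has an emoji in the range or a listed emoticon."""
    if not isinstance(text, str):
        return False
    return bool(EMOJI_RE.search(text) or EMOTICON_RE.search(text))


df = pd.DataFrame({"sentence": [
    "😎🤘🏾", "I like it", "+1😍😘", "One :-) :)", "hah",
    "item 8D7", "note:Please reply",  # extra rows: should NOT match
]})
result = df[df["sentence"].map(has_emoji_or_emoticon)]
print(result)
```

With this boundary rule, something like "great:)" would not be selected either, so relax the lookarounds if your text often lacks spaces.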

answered Jun 2, 2022 at 13:34

8 Comments

Having a list of those ASCII emojis and then checking for hits that way is also a possibility? There can't be that many ASCII emojis?
@JosipJuros as an enthusiast, oh yes there can en.wikipedia.org/wiki/List_of_emoticons
You can add more eye/mouth characters, but the more you add, the more you risk edge cases with false positives; for instance, 8D could be found in a legitimate product ID, or :P in a sentence with a missing space after the colon ( ͡° ͜ʖ ͡°)
Pedantic point: some ASCII emoji, such as ツ, are not in fact ASCII ;-)
@snakecharmerb very true, this one is actually a (unicode) Japanese katakana ;)