Parsing Korean text into a list using regex

I have some data stored as a pandas DataFrame, and one of the columns contains text strings in Korean. I would like to process each of these strings. For example, I want to turn this:

my_string = '모질상태불량(피부상태불량, 심하게 야윔), 치석심함, 양측 수정체 백탁, 좌측 화농성 눈곱심함(7/22), 코로나음성(활력저하)'

into a comma-separated string like this:

parsed_text = '모질상태불량, 피부상태불량, 심하게 야윔, 치석심함, 양측 수정체 백탁, 좌측 화농성 눈곱심함(7/22), 코로나음성, 활력저하'

So the problem is to identify cases where a word (or several words) is followed by parentheses containing text only (one word, or several words separated by commas), and to replace each such case with all of the words (those before and those inside the parentheses) separated by commas, for later processing. If a word is followed by parentheses containing numbers (here, 7/22), it should be kept as it is. If a word is not followed by any parentheses, it should also be kept as it is. I would also like to preserve the order of the words as they appeared in the original string.

I can extract text in parentheses by using regex as follows:

import re
corrected_string = re.findall(r'(\w+)\((\D.*?)\)', my_string)

which yields this:

[('모질상태불량', '피부상태불량, 심하게 야윔'), ('코로나음성', '활력저하')] 

But I'm having difficulty creating my resulting string, i.e. replacing my original text with the pattern I've matched. Any suggestions? Thank you.
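
One possible way to build the resulting string is to reuse the pattern from the question with re.sub instead of re.findall, so that each match is rewritten in place and everything else is left untouched. The following is only a sketch under that assumption; the helper name flatten_parentheses is made up for illustration and is not part of the original post.

```python
import re

my_string = '모질상태불량(피부상태불량, 심하게 야윔), 치석심함, 양측 수정체 백탁, 좌측 화농성 눈곱심함(7/22), 코로나음성(활력저하)'

# Same pattern as in the question: a word, then parentheses whose
# first character is not a digit. Group 1 is the word, group 2 the contents.
PATTERN = re.compile(r'(\w+)\((\D.*?)\)')


def flatten_parentheses(text):
    """Hypothetical helper: rewrite 'word(a, b)' as 'word, a, b'.

    Parentheses that start with a digit, such as '(7/22)', do not match
    the pattern and are therefore left exactly as they were.
    """
    return PATTERN.sub(r'\1, \2', text)


parsed_text = flatten_parentheses(my_string)
print(parsed_text)
```

Because re.sub only touches the matched spans, the order of the words is preserved. Note that \D only inspects the first character inside the parentheses, so contents such as (a1) would still be flattened; a stricter pattern would be needed if that matters. For a whole column, pandas' Series.str.replace with regex=True and the same pattern should work in a similar way.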

  • Thanks. Works great! I appreciate it. Just one thing: if you look at the 3rd word in the result list, it left the right parenthesis in. Commented Jan 25, 2019 at 10:19
  • This is just for your understanding. You need to work from here. Also see Wiktor Stribiżew's approach. Commented Jan 25, 2019 at 10:21
  • @Rahul Thanks, Rahul. Commented Jan 25, 2019 at 10:26
