
I have a list of names:

names = ['Dr. Augsten, BÜNDNIS 90/DIE GRÜNEN', 'Dirk Adams, GRÜNE', 'Blechschmidt, DIE LINKE', 'Steffen Harzer, LINKE', 'Gerd Schuchardt, Minister für Wissenschaft, Forschung und Kultur', 'David-Christian Eckardt, SPD', 'Christine Ursula Klaus, SPD', 'Klaus von der Krone, CDU', 'Antje Ehrlich-Strathausen, SPD', 'Benno Lemke, PDS']

import re

# Normalize party names, then drop the title and the first given name.
names = [re.sub(r'(?<!DIE)\sLINKE', ' DIE LINKE', line) for line in names]
names = [re.sub(r'(?<!DIE)\sGRÜNE', ' BÜNDNIS 90/DIE GRÜNEN', line) for line in names]
names = [re.sub(r'Die Linke', 'DIE LINKE', line) for line in names]
names = [re.sub(r'PDS', 'DIE LINKE', line) for line in names]
names = [re.sub(r'Dr\.\s', '', line) for line in names]
actual_names = [re.sub(r'((?:^|(?:[.!?]\s))(\w+)\s)', '', line) for line in names]
print(actual_names)

actual_names = ['Augsten, BÜNDNIS 90/DIE GRÜNEN', 'Adams, BÜNDNIS 90/DIE GRÜNEN', 'Blechschmidt, DIE LINKE', 'Harzer, DIE LINKE', 'Schuchardt, Minister für Wissenschaft, Forschung und Kultur', 'David-Christian Eckardt, SPD', 'Ursula Klaus, SPD', 'von der Krone, CDU', 'Ehrlich-Strathausen, SPD', 'Lemke, DIE LINKE']

Questions:

  1. How do I need to change the regex to account for names that contain a hyphen (see 'David-Christian Eckardt, SPD')?
  2. How do I need to change the code to keep the original elements as well?

desired_names = ['Augsten, BÜNDNIS 90/DIE GRÜNEN', 'Adams, BÜNDNIS 90/DIE GRÜNEN', 'Adams, GRÜNE', 'Blechschmidt, DIE LINKE', 'Harzer, DIE LINKE', 'Harzer, LINKE', 'Schuchardt, Minister für Wissenschaft, Forschung und Kultur', 'Eckardt, SPD', 'Klaus, SPD', 'von der Krone, CDU', 'Ehrlich-Strathausen, SPD', 'Lemke, PDS', 'Lemke, DIE LINKE']

The order within the list does not matter.

Wiktor Stribiżew
asked Jul 5, 2022 at 16:04

1 Answer

Is regex necessary in this case? You can use str.split with the maxsplit=1 parameter, so that only the first ", " separates the name from the affiliation:

names = [
 "Dr. Augsten, BÜNDNIS 90/DIE GRÜNEN",
 "Dirk Adams, GRÜNE",
 "Blechschmidt, DIE LINKE",
 "Steffen Harzer, LINKE",
 "Gerd Schuchardt, Minister für Wissenschaft, Forschung und Kultur",
 "David-Christian Eckardt, SPD",
 "Christine Ursula Klaus, SPD",
 "Klaus von der Krone, CDU",
 "Antje Ehrlich-Strathausen, SPD",
 "Benno Lemke, PDS",
]
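# Map abbreviated or outdated party names to their canonical form.
m = {"LINKE": "DIE LINKE", "GRÜNE": "BÜNDNIS 90/DIE GRÜNEN", "PDS": "DIE LINKE"}
# Split only at the first ", " so affiliations containing commas stay intact.
out = [n.split(", ", maxsplit=1) for n in names]
# Keep the last word of the name part and normalize the affiliation.
out = [", ".join([a.split()[-1], m.get(b, b)]) for a, b in out]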
print(out)

Prints:

[
 "Augsten, BÜNDNIS 90/DIE GRÜNEN",
 "Adams, BÜNDNIS 90/DIE GRÜNEN",
 "Blechschmidt, DIE LINKE",
 "Harzer, DIE LINKE",
 "Schuchardt, Minister für Wissenschaft, Forschung und Kultur",
 "Eckardt, SPD",
 "Klaus, SPD",
 "Krone, CDU",
 "Ehrlich-Strathausen, SPD",
 "Lemke, DIE LINKE",
]
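
Note that this output differs from desired_names in two ways: "von der Krone" is reduced to "Krone", and the original affiliations (GRÜNE, LINKE, PDS) are replaced rather than kept. One possible variation is sketched below. It assumes that every entry contains ", " and that surname particles come from a small, hand-picked set; the names PARTY_ALIASES, PARTICLES, surname and normalize are hypothetical and only used for illustration.

```python
# Sketch only: the alias mapping and the particle set are assumptions.
PARTY_ALIASES = {
    "LINKE": "DIE LINKE",
    "GRÜNE": "BÜNDNIS 90/DIE GRÜNEN",
    "PDS": "DIE LINKE",
}
PARTICLES = {"von", "der", "van", "de", "zu"}  # assumed surname particles


def surname(person):
    """Return the last word plus any particles directly in front of it."""
    words = person.split()
    start = len(words) - 1
    # Walk backwards while the preceding word is a known particle.
    while start > 0 and words[start - 1] in PARTICLES:
        start -= 1
    return " ".join(words[start:])


def normalize(entries):
    """Emit each entry with its original affiliation and, if one exists, its alias."""
    out = []
    for entry in entries:
        # Split only once so affiliations that contain commas stay intact.
        person, affiliation = entry.split(", ", maxsplit=1)
        last = surname(person)
        out.append(f"{last}, {affiliation}")
        alias = PARTY_ALIASES.get(affiliation)
        if alias is not None:
            out.append(f"{last}, {alias}")
    return out


names = [
    "Dr. Augsten, BÜNDNIS 90/DIE GRÜNEN",
    "Dirk Adams, GRÜNE",
    "Blechschmidt, DIE LINKE",
    "Steffen Harzer, LINKE",
    "Gerd Schuchardt, Minister für Wissenschaft, Forschung und Kultur",
    "David-Christian Eckardt, SPD",
    "Christine Ursula Klaus, SPD",
    "Klaus von der Krone, CDU",
    "Antje Ehrlich-Strathausen, SPD",
    "Benno Lemke, PDS",
]
result = normalize(names)
print(result)
```

Because only whitespace is used to split the name part, hyphenated names such as "David-Christian" or "Ehrlich-Strathausen" are treated as single words and need no special handling.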
answered Jul 5, 2022 at 16:16