I'm a moderator of a forum and I need to prune all the bots that register there.
As you can see below, I can list all the users by:
Username number_of_messages register_date
Example:
- Thurman Valsin0190 0 Sat Jan 14, 2012 5:00 pm
- Rubye Tones01AD 0 Sat Jan 14, 2012 4:59 pm
I need a very simple Python program that parses each line of a text file and extracts only the nicknames from lines like the ones above:
- Thurman Valsin0190
- Rubye Tones01AD
This means that, for each line, the program has to delete the 0 and everything after it. The text is taken from a .txt file.
I know this is not that difficult, but I'm not very familiar with Python.
Thanks in advance!
4 Answers
It's not really a Python question, it's a regex/string parsing question...
Is it correct to say that every line contains the nickname, a tab character, and then a 0?
Then it should be as simple as:
(assuming line contains a single line from the file)
nickname = line.split("\t")[0]
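Applied to every line, that idea might look like the sketch below. It assumes the fields really are separated by tab characters; the function name nicknames_from_lines and the sample data are made up for illustration.

```python
def nicknames_from_lines(lines):
    """Return the text before the first tab on each non-empty line."""
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Assumption: the nickname is everything before the first tab.
        result.append(line.split("\t")[0])
    return result


# Hypothetical sample with tab-separated fields.
sample = [
    "- Thurman Valsin0190\t0\tSat Jan 14, 2012 5:00 pm\n",
    "- Rubye Tones01AD\t0\tSat Jan 14, 2012 4:59 pm\n",
]
for nick in nicknames_from_lines(sample):
    print(nick)
```

To read from a real file, pass the open file object instead of the sample list, since iterating over a file yields its lines.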
Consider using regular expressions:
import re
pattern = re.compile(r'(.*?)\s+0\s+')
pattern.findall('- Thurman Valsin0190 0 Sat Jan 14, 2012 5:00 pm')[0]
# - Thurman Valsin0190
This matches the substring that precedes a 0 surrounded by whitespace on both sides. You may give it a try.
Why not split on the 0 with its surrounding spaces (or tabs) included in the split key, to avoid splitting on other zeros:
with open("filename.txt", "r") as f:
    for line in f:
        nick = line.split(" 0 ")[0].strip()  # or .split("\t0\t") if those are tabs
        print(nick)
Parse each line by splitting on the " 0 " string, e.g., in extract-nickname.py:
#!/usr/bin/env python
import fileinput

for line in fileinput.input():
    nick, sep, rest = line.partition(" 0 ")
    if sep:
        print(nick.strip())
It assumes that nicknames can't contain " 0 " and that leading/trailing whitespace is not part of a nickname. Otherwise you could use line.partition("\t") if a tab character is the separator between Username and number_of_messages.
Example
$ python extract-nickname.py log.txt
- Thurman Valsin0190
- Rubye Tones01AD
If you need to change the file in place, you could pass inplace=True to fileinput.input().
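A rough sketch of the in-place variant follows, assuming Python 3 and the same " 0 " separator. It runs on a throwaway temporary file so no real data is touched; the sample lines are copied from the question. Note that lines without the separator are dropped from the file, because nothing is printed for them.

```python
import fileinput
import os
import tempfile

# Create a throwaway file so the sketch does not modify real data.
tmp = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
tmp.write("- Thurman Valsin0190 0 Sat Jan 14, 2012 5:00 pm\n")
tmp.write("- Rubye Tones01AD 0 Sat Jan 14, 2012 4:59 pm\n")
tmp.close()

# With inplace=True, print() output goes into the file, not to the terminal.
with fileinput.input(files=[tmp.name], inplace=True) as lines:
    for line in lines:
        nick, sep, rest = line.partition(" 0 ")
        if sep:
            print(nick.strip())

# Read the rewritten file back and clean up.
with open(tmp.name) as f:
    result = f.read()
os.remove(tmp.name)
print(result, end="")
```

fileinput also accepts a backup argument (for example backup=".bak") if you want to keep a copy of the original file.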
Will the 0 always be 0, or can it be any one-digit number?
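If the message count is not always 0, one possible approach is to match any run of digits that is followed by the start of the date. The sketch below assumes every date begins with an English three-letter weekday, as in the sample lines; the names NICK_PATTERN and extract_nick are made up for illustration.

```python
import re

# Assumption: the line looks like "<nickname> <count> <Weekday> <rest of date>".
NICK_PATTERN = re.compile(r"(.*?)\s+\d+\s+(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)\b")


def extract_nick(line):
    """Return the nickname, or None if the line does not fit the pattern."""
    match = NICK_PATTERN.match(line)
    return match.group(1).strip() if match else None


print(extract_nick("- Thurman Valsin0190 0 Sat Jan 14, 2012 5:00 pm"))
print(extract_nick("- Rubye Tones01AD 12 Sat Jan 14, 2012 4:59 pm"))
```

Anchoring on the weekday keeps digits inside a nickname (such as the 0190 in Valsin0190) from being mistaken for the message count.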