I'm looping through an XML document and matching usernames from a text file.
The text file looks like:
DPL bot
Nick Number
White whirlwind
Polisci
Flannel
And the program looks like:
```python
import xmltodict, json
with open('testarticles.xml', encoding='latin-1') as xml_file:
    dic_xml = xmltodict.parse(xml_file.read())
for page in dic_xml['mediawiki']['page']:
    for rev in page['revision']:
        for user in open("usernames.txt", "r"):
            print(user)
            if 'username' in rev['contributor'] and rev['contributor']['username'] == user:
                print(user)
                print(rev['timestamp'])
                timestamp = rev['timestamp']
                try:
                    print(rev['comment'])
                    comment = rev['comment']
                except:
                    print("no comment")
                    comment = ''
                print('\n')
                with open("User data/" + user + ".json", "a") as outfile:
                    json.dump({"timestamp": timestamp, "comment": comment}, outfile)
                    outfile.write('\n')
```
The problem is that the program only enters the if-block for the last line in the text file, even though the `print(user)` before the if-statement prints every username. All users have matching revisions in the XML file, and if I move a different user to the last line, that user's data is extracted into the JSON file instead.
Answer:
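Most likely every line except the last one ends with a newline character (`\n`). Iterating over a file yields each line including its line ending, so `user` is `"DPL bot\n"` rather than `"DPL bot"`, and the comparison with the username from the XML fails. The last line has no trailing newline, which is why it is the only one that matches.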
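To see the effect without the real files, here is a minimal sketch that uses `io.StringIO` as a stand-in for `usernames.txt` and a hand-written set as a stand-in for the usernames in the XML (both are assumptions for illustration; iterating a real text file yields lines the same way):

```python
import io

# Stand-in for usernames.txt: every line but the last ends with "\n"
fake_file = io.StringIO("DPL bot\nNick Number\nFlannel")
# Hypothetical usernames as they appear in the XML (no newlines)
xml_usernames = {"DPL bot", "Nick Number", "Flannel"}

raw = list(fake_file)
print(raw)  # ['DPL bot\n', 'Nick Number\n', 'Flannel']

# Only the last line (no trailing newline) matches
matches = [name in xml_usernames for name in raw]
print(matches)  # [False, False, True]

# After strip() every line matches
fixed = [name.strip() in xml_usernames for name in raw]
print(fixed)  # [True, True, True]
```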
Try this:
```python
for user in open("usernames.txt", "r"):
    user = user.strip()
    if 'username' in rev['contributor'] and rev...
```
Or use this construct, so we don't get a headache debating whether your code closes the file the way a `with` statement would :P
```python
with open("usernames.txt", "r") as f:
    for line in f:
        user = line.strip()
        if 'username' in rev['contributor'] and rev...
```
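The main thing is `user = user.strip()` (or `user = line.strip()`), which removes the trailing newline, along with any other leading or trailing whitespace, before the comparison.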
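Putting it together, here is a minimal sketch of the whole loop with the fix applied. It reads the usernames once into a set (the question's code reopens `usernames.txt` for every revision), strips each line and skips blank ones. To keep it self-contained, `parsed` is a hand-written dict assumed to have the same shape `xmltodict.parse()` produced in the question, and the sample values in it are made up; in the real script you would pass `dic_xml` and the opened file instead. The function names `read_usernames` and `matching_revisions` are hypothetical.

```python
import io

def read_usernames(lines):
    """Return the set of non-empty, whitespace-stripped usernames."""
    return {line.strip() for line in lines if line.strip()}

def matching_revisions(parsed, usernames):
    """Yield (username, timestamp, comment) for revisions by a listed user."""
    for page in parsed['mediawiki']['page']:
        for rev in page['revision']:
            # Anonymous edits have no 'username' key, so .get() returns None
            name = rev['contributor'].get('username')
            if name in usernames:
                # 'comment' is optional, so default to an empty string
                yield name, rev['timestamp'], rev.get('comment', '')

# Real script: with open("usernames.txt", "r") as f: usernames = read_usernames(f)
usernames = read_usernames(io.StringIO("DPL bot\nFlannel\n"))

# Hand-written stand-in for dic_xml (hypothetical sample data)
parsed = {'mediawiki': {'page': [
    {'revision': [
        {'contributor': {'username': 'DPL bot'},
         'timestamp': '2020-01-01T00:00:00Z', 'comment': 'fix links'},
        {'contributor': {'ip': '127.0.0.1'},
         'timestamp': '2020-01-02T00:00:00Z'},
        {'contributor': {'username': 'Flannel'},
         'timestamp': '2020-01-03T00:00:00Z'},
    ]},
]}}

results = list(matching_revisions(parsed, usernames))
for row in results:
    print(row)
```

Using a set also makes each membership test cheap, so the cost no longer grows with the number of lines in the text file for every revision.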
When in doubt, look at the raw bytes. That goes for all encoding issues as well, since an encoding is just a way of transforming ones and zeros into characters according to some translation table/code page.
"\n".encode("hex") == "0a" # True
# so if
user.encode("hex")
# has "0a" at the end, there is definitely a newline after "user"
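The same check as a runnable Python 3 sketch, using a hypothetical `user` value as it would come out of the file. `repr()` is often a quicker way to spot the stray newline, because it shows escape characters explicitly:

```python
user = "DPL bot\n"  # hypothetical value as read from usernames.txt

# Hex dump of the UTF-8 bytes: the trailing "0a" is the newline
print(user.encode().hex())  # 44504c20626f740a

# repr() shows the escape sequence directly
print(repr(user))  # 'DPL bot\n'

has_newline = user.encode().hex().endswith("0a")
print(has_newline)  # True
```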
Comments:
`user = user.strip()` or `line = line.strip()`, but make a separate variable that's only used in the comparison, or simply add the newline back in when you write to a file again (which you're already doing here: `outfile.write('\n')`). Alternatively, collect the results in a dictionary keyed by username and dump the whole thing as JSON once, after the loop (I'd probably do that instead of appending to the file once per user/line).
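A minimal sketch of that alternative: group each user's revisions in a dictionary during the loop and write JSON once at the end. The `(username, timestamp, comment)` tuples are hypothetical sample values, and `io.StringIO` stands in for the output file; in the real script you could equally write each user's list to its own `"User data/<name>.json"` file after the loop.

```python
import io
import json
from collections import defaultdict

# Hypothetical matches as (username, timestamp, comment) tuples
matches = [
    ("DPL bot", "2020-01-01T00:00:00Z", "fix links"),
    ("Flannel", "2020-01-03T00:00:00Z", ""),
    ("DPL bot", "2020-01-04T00:00:00Z", "more links"),
]

# Group revisions per user instead of appending to a file on every match
by_user = defaultdict(list)
for name, timestamp, comment in matches:
    by_user[name].append({"timestamp": timestamp, "comment": comment})

# Dump everything once, after the loop (StringIO stands in for a real file)
out = io.StringIO()
json.dump(by_user, out, indent=2)
print(out.getvalue())
```

A side benefit is that the output is one valid JSON document, whereas appending one object per line produces a file that `json.load()` cannot read in one go.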
... `else` clause and print `rev['contributor']` to see what's going on when it fails? Try `if 'username' in rev['contributor'] and rev['contributor']['username'] == user.strip():`

... `for line in open(...):` automatically assume `with` context and thus close the file when the loop is done? `open()` doesn't work like that normally but, then again, it's `open()` plus a `for` loop, which is a rather "ephemeral" construct that doesn't persist beyond its scope. Also, there's nothing to call `file.close()` on since `open()` has no handle when used in a `for` loop. It's all speculation though hehe.
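For the `with` version this is easy to check. A minimal sketch, using a temporary file created only for the demo, showing that the file object is closed as soon as the `with` block ends. In the bare `for line in open(...)` form the file object does exist (the loop holds a reference to it); it just isn't bound to a name. CPython usually closes it when that reference is dropped after the loop, but that relies on garbage-collection timing rather than a language guarantee, which is why the `with` form is the safer habit.

```python
import os
import tempfile

# Throwaway stand-in for usernames.txt, created only for this demo
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("DPL bot\nNick Number\nFlannel")

with open(path, "r") as f:
    users = [line.strip() for line in f]
    closed_inside = f.closed
    print(closed_inside)  # False: still inside the with block

print(f.closed)  # True: closed automatically when the block exited
print(users)     # ['DPL bot', 'Nick Number', 'Flannel']

os.remove(path)
```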