I have these two XMLs that I want to merge together based on these conditions:
- If any of the element is more than 7 weeks old, it has to be deleted.
- If any of the element is newer with the same date (regardless of time), take everything from the new XML.
Sample XML:
<programme start="20150317173000 -0000" stop="20150317180000">
<something else />
</programme>
I use start
field to check if the element should be deleted or not.
old_xml = ET.fromstring(old)
new_xml = ET.fromstring(new)
new_programmes = new_xml.findall('programme')
dates_to_delete = set()
programmes_to_add = []
for new_programme in new_programmes:
start = new_programme.get('start')
new_prog_st_t = datetime.datetime.strptime(start, DATETIME_FORMAT)
new_prog_str_t = new_prog_st_t.strftime('%Y%m%d')
dates_to_delete.add(new_prog_str_t)
svn_wks = datetime.datetime.now(TIMEZONE) - datetime.timedelta(weeks=7)
is_less_than_7_weeks = new_prog_st_t > svn_wks
if is_less_than_7_weeks:
programmes_to_add.append(new_programme)
all_programmes = old_xml.findall('programme')
for programme in all_programmes:
start = programme.get('start')
start_time = datetime.datetime.strptime(start, DATETIME_FORMAT)
svn_wks = datetime.datetime.now(TIMEZONE) - datetime.timedelta(weeks=7)
is_older_than_7_weeks = start_time < svn_wks
if is_older_than_7_weeks:
old_xml.remove(programme)
else:
start_st_t = start_time.strftime('%Y%m%d')
if start_st_t in dates_to_delete:
old_xml.remove(programme)
for p in programmes_to_add:
old_xml.append(p)
return old_xml
2 Answers 2
Eliminate for loops where you can
You can probably be more aggressive with
list comprehensions
. I would replace yourfor programme in all_programmes
with a list comprehension that calculated the date and created a list of indices to delete (but I suspect many others would vehemently disagree with this advice):enumerated_programmes = enumerate(all_programmes) to_keep = [i for i,p in enumerated_programmes if datetime.datetime.strptime(p.get('start'), DATETIME_FORMAT) < svn_wks] ... all_programmes = [all_programmes[i] for i in to_keep]
You can make a second list of indices that should be replaced with a newer record, using similar logic.
Naming
- Why not use
old_programmes
as a name instead ofall_programmes
to make the binary nature of the code clear?
Confusing code
- Why is there a
return
statement at the bottom of your code? I didn't see anydef
statements so it's not clear what/why you're returning/
Use available Python functions
There is no need to combine
append
with afor
loop. Instead you can extend one list with another. Say you havelist a
andlist a
. Usea.extend(b)
rather than appending each element ofb
. The code will be clearer, shorter, and more efficient.Similar to my advice above about using
list comprehensions
more aggressively, you could also usingfilter
rather thanif, else
statements. This would take more logic out of yourfor
loops, which is always a good thing.
DRY
- You compute
svn_weeks
in each iteration of each of yourfor loops
but it's a constant for whatever day you are running it. You should compute it once, before either of yourfor loops
.
Naming
Most of your naming is pretty good, there are a few variables that I'd consider renaming though.
svn_wks
- This is a pointless abbreviation. Since it is abbreviated, I don't know whatsvn
means. This should be expanded to something like???_weeks
.new_prog_st_t
,new_prog_str_t
, andstart_st_t
- These all have cryptic abbreviations, likest
, ort
which further obfuscate the meaning of the variable names. Preferably, you should only be using abbreviations if the abbreviation is clear enough, and maintains clarity and readability.p
- In thefor
loopfor p in programmes_to_add:
,p
should preferably be renamed to something likeprogram
, orprogramme
.
Nitpicks
You have two variables, is_less_than_7_weeks
, and is_older_than_7_weeks
. Rather than creating condition variables, I'd just write the two if
statements like this:
if new_prog_st_t > svn_wks:
programmes_to_add.append(new_programme)
...
if start_time < svn_wks:
old_xml.remove(programme)
If you really think that the extra clarity is needed, you can always drop a few inline comments (#
) above them.
Other than those issues, you code looks really good! Good job!