My friend wanted me to send her a list of songs from one of my iTunes playlists, so I exported the playlist from within the app itself. It comes as a plain text file, where each field is separated by a '\t' and each line is separated by a '\r'. Since there is so much extraneous information, I decided to write a Python script that would delete all fields but the song title and artist name.
How is the design and format of my code? How does it fit in with Python best practices? Is there a much easier way to accomplish the same job that I just overlooked?
The input file looks like:
Name\tArtist\tComposer\tAlbum\tGrouping\tGenre\tSize\tTime\tDisc Number\tDisc Count\tTrack Number\tTrack Count\tYear\tDate Modified\tDate Added\tBit Rate\tSample Rate\tVolume Adjustment\tKind\tEqualizer\tComments\tPlays\tLast Played\tSkips\tLast Skipped\tMy Rating\tLocation
You Shook Me All Night Long\tAC/DC\t\tThe Very Best\t\tHard Rock\t5468228\t212\t\t\t9\t\t2001\t4/17/12, 2:29 PM\t4/17/12, 2:26 PM\t192\t44100\t\tMPEG audio file\t\t\t5\t3/12/13, 10:41 PM\t\t\t\tMacintosh HD:Users:UserName:Music:iTunes:iTunes Media:Music:AC_DC:The Very Best:09 You Shook Me All Night Long.mp3
The output file looks like:
Name\tArtist\t
You Shook Me All Night Long\tAC/DC\t
Miss Murder\tAFI\t
My code is:
from sys import argv

def main(file):
    with open(argv[1], 'r') as file:
        data = file.read()
    newdata = data.split('\r')
    output = []
    for line in newdata:
        tabc = 0
        newline = ""
        for char in line:
            newline += char
            if char == '\t':
                tabc += 1
            if tabc == 2: break
        output.append(newline)
    outPutString = '\n'.join(output)
    with open(argv[1][:-4]+'Out.txt', 'w') as file:
        file.write(outPutString)

if __name__ == '__main__':
    file = argv[1]
    main(file)
- Interesting question; sounds like this tool could be handy! (Phrancis, Nov 6, 2014 at 17:32)
- 21st century mix tape. (Rick, Nov 6, 2014 at 17:45)
3 Answers
Simple mistake
The function main takes a filename as an argument but doesn't use it. Instead, it retrieves it from argv.
Also, filename would be a better name for a filename than file.
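A minimal sketch of that fix, keeping the question's read/process/write structure. It assumes Python 3, where text mode reads a lone '\r' as '\n'; the local name out_filename and the return value are additions for illustration, not part of the original script.

```python
def main(filename):
    """Write the first two tab-separated fields of each line to '<name>Out.txt'."""
    # Use the parameter; nothing in this function touches sys.argv.
    with open(filename) as fh:           # Python 3 text mode: '\r' is read as '\n'
        lines = fh.read().split('\n')
    output = ['\t'.join(line.split('\t')[:2]) for line in lines]
    out_filename = filename[:-4] + 'Out.txt'   # same naming rule as the question
    with open(out_filename, 'w') as fh:
        fh.write('\n'.join(output))
    return out_filename
```

The script's entry point would then pass the command-line argument in, for example main(sys.argv[1]) under the usual if __name__ == '__main__': guard.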
Easier solution
It would be easier to use a more specialized tool. It could be a one-line awk(1) script:
awk 'BEGIN { RS="\r"; FS="\t" } { print $1 "\t" $2 }' In.txt > Out.txt
The only non-trivial part is that you have to override the record separator character (and set the field separator to a tab, so that titles containing spaces are not split on the spaces).
Python
I like that you used with blocks for opening files. The program could use some improvement, though:
- Make use of the fileinput module to open files specified on the command line. The tricky bit, once again, is overriding the line separator. Since Python 2.5, fileinput.input() accepts a mode parameter, in which you can specify universal newline support.
- Avoid reading the entire file, and operate on a line at a time instead. It simplifies your program and scales better (not that your iTunes library would ever be huge enough to break your program, but it's a good programming practice).
- Iterating a character at a time is tedious. I recommend making use of str.split().
- Hard-coding the output filename hurts the reusability of your program. It would be better to parameterize the output filename. Even better, I recommend just printing to standard output and redirecting the output using the shell.
The result is quite simple:
import fileinput

# Python 2 syntax: print statement and the 'rU' (universal newline) mode
for line in fileinput.input(mode='rU'):
    columns = line.split("\t", 2)   # <-- The 2 is optional, but I consider it good practice
    print "\t".join(columns[:2])
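For readers on Python 3, the same idea might look like the sketch below. It assumes Python 3, where files opened in text mode already translate '\r' to '\n' on reading, so no mode argument is passed (the 'U' flag was deprecated in Python 3 and is rejected by recent versions). The names first_two_columns and main are hypothetical.

```python
import fileinput


def first_two_columns(line):
    """Return the first two tab-separated fields of one line, without its newline."""
    columns = line.rstrip('\n').split('\t', 2)
    return '\t'.join(columns[:2])


def main(files=None):
    # files=None lets fileinput fall back to the names in sys.argv[1:]
    # (or to standard input when no names are given).
    for line in fileinput.input(files=files):
        print(first_two_columns(line))
```

With a call to main() added at the bottom and the file saved under a hypothetical name such as songs.py, it could be run as python3 songs.py In.txt > Out.txt.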
- How might I redirect the output in the shell? I'm on a Mac using Terminal. (User1996, Nov 11, 2014 at 1:42)
- When you invoke the Python program from the shell, append > Out.txt, similar to the awk example. (200_success, Nov 11, 2014 at 2:11)
The middle part would be simpler this way:
with open(filename) as fh:
    output = []
    for line in fh:
        parts = line.split('\t')[:2]
        output.append('\t'.join(parts))
Improvements:
- filename should be a parameter received by the function, instead of argv[1]
- No need for the 'r' parameter in open, as that's the default anyway
- file is a built-in name in Python, so it's better to name it differently, for example fh
- No need to read the entire data and then split by \r, you can read line by line from fh directly
- Instead of iterating over the characters in the line, it's easier to just split it by \t
It may seem inefficient to split by \t, which might parse the entire line when you only need the first two columns. I'm not sure if that's a serious concern. Splitting by \t has the advantage of simplicity and flexibility in case you later decide you want some other columns as well.
But if that bothers you, here's an alternative that only processes up to the second column:
with open(filename) as fh:
    output = []
    for line in fh:
        first_two_tabs = line[:line.index('\t', line.index('\t') + 1)]
        output.append(first_two_tabs)
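One caveat with this variant: str.index raises ValueError when a line contains fewer than two tabs (a trailing blank line, for instance). A small sketch that guards against that by using str.find, which returns -1 instead of raising; the helper name up_to_second_tab is hypothetical.

```python
def up_to_second_tab(line):
    """Return the text before the second tab, or the whole line if there is no second tab."""
    first = line.find('\t')              # find() returns -1 when nothing is found
    if first == -1:
        return line.rstrip('\r\n')
    second = line.find('\t', first + 1)  # search resumes just past the first tab
    if second == -1:
        return line.rstrip('\r\n')
    return line[:second]
```

Inside the loop, output.append(up_to_second_tab(line)) would then replace the slicing expression.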
UPDATE
As @pjz pointed out in a comment, the solution with split can still be efficient by using a maxsplit=2 parameter like this:
parts = line.split('\t', 2)[:2]
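To make the effect of that second argument concrete, here is a small illustration on a shortened version of the sample record from the question (the shortening is only for readability):

```python
line = "You Shook Me All Night Long\tAC/DC\t\tThe Very Best\t\tHard Rock\n"

# At most two splits are performed, so the third element is the unsplit remainder.
pieces = line.split('\t', 2)
assert len(pieces) == 3
assert pieces[2] == "\tThe Very Best\t\tHard Rock\n"

# Keeping the first two elements yields the title and the artist.
parts = pieces[:2]
assert parts == ["You Shook Me All Night Long", "AC/DC"]
```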
Also, instead of accumulating the lines in a list, you could process the input and write the output at the same time like this:
filename_out = filename[:-4] + 'Out.txt'
with open(filename) as fh_in:
    with open(filename_out, 'w') as fh_out:
        for line in fh_in:
            parts = line.split('\t', 2)[:2]
            fh_out.write('\t'.join(parts) + '\n')
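One edge case in the streaming loop: when a line has fewer than three fields, the last kept field still carries its newline, so a blank line follows it in the output. A sketch that strips the line ending first and opens both files in a single with statement, assuming Python 3; the function name copy_first_two_columns is hypothetical.

```python
def copy_first_two_columns(filename, filename_out):
    """Stream filename to filename_out, keeping only the first two tab-separated fields."""
    with open(filename) as fh_in, open(filename_out, 'w') as fh_out:
        for line in fh_in:
            # Drop the newline before splitting so short lines do not keep it.
            parts = line.rstrip('\n').split('\t', 2)[:2]
            fh_out.write('\t'.join(parts) + '\n')
```

Passing the output name as a parameter also addresses the earlier point about not hard-coding it.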
- A more efficient 'efficient way' would be to do parts = line.split('\t',2)[:2] which limits it to two splits (so three parts), of which you only care about the first two. Also, you could print as you go instead of storing everything into an output list. (pjz, Nov 10, 2014 at 3:34)