Extracting two columns from exported iTunes playlist

Question

My friend wanted me to send her a list of songs from one of my iTunes playlists, so I exported the playlist from within the app itself. It comes as a plain text file, where each field is separated by a '\t' and each line is separated by a '\r'. Since there is so much extraneous information, I decided to write a Python script that would delete all fields but the song title and artist name.

How is the design and format of my code? How does it fit in with Python best practices? Is there a much easier way to accomplish the same job that I just overlooked?

The input file looks like:

Name\tArtist\tComposer\tAlbum\tGrouping\tGenre\tSize\tTime\tDisc Number\tDisc Count\tTrack Number\tTrack Count\tYear\tDate Modified\tDate Added\tBit Rate\tSample Rate\tVolume Adjustment\tKind\tEqualizer\tComments\tPlays\tLast Played\tSkips\tLast Skipped\tMy Rating\tLocation
You Shook Me All Night Long\tAC/DC\t\tThe Very Best\t\tHard Rock\t5468228\t212\t\t\t9\t\t2001\t4/17/12, 2:29 PM\t4/17/12, 2:26 PM\t192\t44100\t\tMPEG audio file\t\t\t5\t3/12/13, 10:41 PM\t\t\t\tMacintosh HD:Users:UserName:Music:iTunes:iTunes Media:Music:AC_DC:The Very Best:09 You Shook Me All Night Long.mp3

The output file looks like:

Name\tArtist\t
You Shook Me All Night Long\tAC/DC\t
Miss Murder\tAFI\t

My code is:

from sys import argv
def main(file):
    with open(argv[1], 'r') as file:
        data = file.read()
    newdata = data.split('\r')
    output = []
    for line in newdata:
        tabc = 0
        newline = ""
        for char in line:
            newline += char
            if char == '\t':
                tabc += 1
            if tabc == 2: break
        output.append(newline)
    outPutString = '\n'.join(output)
    with open(argv[1][:-4]+'Out.txt', 'w') as file:
        file.write(outPutString)
if __name__ == '__main__':
    file = argv[1]
    main(file)

Comment

Interesting question; sounds like this tool could be handy!

Comment

21st century mix tape.

Answer 1

Simple mistake

The function main takes a filename as an argument but doesn't use it. Instead, it retrieves it from argv.

Also, filename would be a better name for a filename than file.
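A minimal sketch of just that fix, assuming Python 3: main uses its filename parameter throughout, and sys.argv is read only at the entry point. The rest of the original logic is deliberately kept. The newline='' argument is an addition: Python 3's default universal-newline mode would otherwise turn every '\r' into '\n' on read, so split('\r') would never find anything to split on.

```python
import sys


def main(filename):
    # Every file operation goes through the parameter; sys.argv is not used here.
    # newline='' keeps the raw '\r' characters so that split('\r') still works.
    with open(filename, newline='') as fh:
        data = fh.read()
    output = []
    for line in data.split('\r'):
        tabc = 0
        newline = ""
        # Original loop: copy characters up to and including the second tab.
        for char in line:
            newline += char
            if char == '\t':
                tabc += 1
            if tabc == 2:
                break
        output.append(newline)
    with open(filename[:-4] + 'Out.txt', 'w') as fh:
        fh.write('\n'.join(output))


# Read argv only here, and only when a filename was actually given.
if __name__ == '__main__' and len(sys.argv) > 1:
    main(sys.argv[1])
```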

Answer 2

Easier solution

It would be easier to use a more specialized tool. It could be a one-line awk(1) script:

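awk 'BEGIN { RS = "\r"; FS = "\t" } { print 1ドル "\t" 2ドル }' In.txt > Out.txt

The only non-trivial parts are overriding the record separator (RS) and setting the field separator (FS) to a tab. By default awk splits fields on whitespace, which would break titles such as "You Shook Me All Night Long".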
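A quick Python check on the sample record shows why the field separator matters: str.split() with no arguments splits on runs of whitespace, roughly the way awk's default FS does, while splitting on a tab keeps the title in one piece. The record below is shortened from the question's sample.

```python
# Shortened version of the sample record from the question.
record = "You Shook Me All Night Long\tAC/DC\t\tThe Very Best"

# No argument: split on runs of whitespace (similar to awk's default FS).
whitespace_fields = record.split()
# Explicit tab: the equivalent of FS = "\t".
tab_fields = record.split("\t")

print(whitespace_fields[:2])  # ['You', 'Shook']
print(tab_fields[:2])         # ['You Shook Me All Night Long', 'AC/DC']
```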

Python

I like that you used with blocks for opening files. The program could use some improvement, though:

Make use of the fileinput module to open files specified on the command line. The tricky bit, once again, is overriding the line separator. Since Python 2.5, fileinput.input() accepts a mode parameter, in which you can specify universal newline support.
Avoid reading the entire file, and operate on a line at a time instead. It simplifies your program and scales better (not that your iTunes library would ever be huge enough to break your program, but it's a good programming practice).
Iterating a character at a time is tedious. I recommend making use of str.split().
Hard-coding the output filename hurts the reusability of your program. It would be better to parameterize the output filename. Even better, I recommend just printing to standard output and redirecting the output using the shell.
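One way to parameterize the output, sketched with the standard argparse module. The option names -o/--output and the function name parse_args are assumptions, not part of the original answer. When -o is omitted, the caller would write to sys.stdout and leave redirection to the shell.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical command line: the input path is required, the output path optional.
    parser = argparse.ArgumentParser(
        description="Keep only the Name and Artist columns of an exported iTunes playlist.")
    parser.add_argument("input", help="playlist file exported from iTunes")
    parser.add_argument("-o", "--output",
                        help="file to write (default: standard output)")
    # argv=None makes argparse read sys.argv[1:].
    return parser.parse_args(argv)
```

Invoked as, say, `python playlist.py In.txt -o Out.txt` (the script name is hypothetical), args.output would be "Out.txt"; without -o it is None.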

The result is quite simple:

import fileinput
for line in fileinput.input(mode='rU'):
    columns = line.split("\t", 2)  # <-- The 2 is optional, but I consider it good practice
    print "\t".join(columns[:2])
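The code above is Python 2. Python 3 opens files in text mode with universal newlines by default, and Python 3.11 removed the 'U' mode flag from open() and fileinput altogether, so mode='rU' would be rejected there. A Python 3 sketch of the same program might look like this; the helper name two_columns is hypothetical, and the entry point would simply call main() under the usual if __name__ == '__main__': guard.

```python
import fileinput


def two_columns(line):
    # maxsplit=2 stops after the second tab; only the first two fields are kept.
    # rstrip("\n") drops the line ending in case a line has fewer than two tabs.
    return "\t".join(line.rstrip("\n").split("\t", 2)[:2])


def main(files=None):
    # files=None lets fileinput fall back to sys.argv[1:] (or standard input).
    # Named files are opened in text mode, which in Python 3 already treats
    # '\r' as a line ending, so no 'U' flag is needed.
    with fileinput.input(files) as lines:
        for line in lines:
            print(two_columns(line))
```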

Comment

How might I redirect the output in the shell? I'm on a Mac using Terminal.

Comment

When you invoke the Python program from the shell, append > Out.txt — similar to the awk example.

Answer 3

The middle part would be simpler this way:

with open(filename) as fh:
    output = []
    for line in fh:
        parts = line.split('\t')[:2]
        output.append('\t'.join(parts))

Improvements:

filename should be a parameter received by the function, instead of argv[1]
No need for the 'r' parameter in open, as that's the default anyway
file is a built-in name in Python 2, so it's better to name the handle differently, for example fh
No need to read the entire data and then split by \r; you can read line by line from fh directly (Python 3 text mode recognizes '\r' line endings by default; in Python 2, open the file with mode 'rU')
Instead of iterating over the characters in the line, it's easier to just split it by \t
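Put together, the points above might look like the following complete function, assuming Python 3. The name first_two_columns is hypothetical, and it returns the list instead of writing a file so the caller can decide where the output goes.

```python
def first_two_columns(filename):
    output = []
    # filename comes in as a parameter, 'r' is omitted (it is the default),
    # and the handle is called fh rather than file.
    with open(filename) as fh:
        # Iterating over fh yields one line at a time; Python 3 text mode
        # already splits the export at its '\r' line endings.
        for line in fh:
            parts = line.rstrip('\n').split('\t')[:2]
            output.append('\t'.join(parts))
    return output
```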

It may seem inefficient to split by \t, which might parse the entire line when you only need the first two columns. I'm not sure if that's a serious concern. Splitting by \t has the advantage of simplicity and flexibility in case you later decide you want some other columns as well. But if that bothers you, here's an alternative that only processes up to the second column:

with open(filename) as fh:
    output = []
    for line in fh:
        first_two_tabs = line[:line.index('\t', line.index('\t') + 1)]
        output.append(first_two_tabs)
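One caveat with the index() version: str.index() raises ValueError on any line with fewer than two tabs, and the slice stops just before the second tab. A sketch of the same slicing idea using str.find(), which returns -1 instead of raising; the function name up_to_second_tab is hypothetical.

```python
def up_to_second_tab(line):
    # Same idea as the index() version: cut the line at its second tab.
    first = line.find('\t')
    second = line.find('\t', first + 1) if first != -1 else -1
    if second == -1:
        # Fewer than two tabs: index() would raise ValueError here,
        # so keep the whole line (minus its line ending) instead.
        return line.rstrip('\n')
    return line[:second]
```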

UPDATE

As @pjz pointed out in a comment, the solution with split can still be efficient by using a maxsplit=2 parameter like this:

 parts = line.split('\t', 2)[:2]
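To see what maxsplit does, here is a line shaped like the export (shortened from the question's sample), split with and without the limit:

```python
line = "You Shook Me All Night Long\tAC/DC\t\tThe Very Best\t\tHard Rock\n"

# maxsplit=2: at most two splits, so at most three pieces; the last piece
# holds the rest of the line unsplit.
limited = line.split('\t', 2)
# No limit: every tab is a split point.
full = line.split('\t')

print(len(limited))  # 3
print(len(full))     # 6
print('\t'.join(limited[:2]))
```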

Also, instead of accumulating the lines in a list, you could process the input and write the output at the same time like this:

filename_out = filename[:-4] + 'Out.txt'
with open(filename) as fh_in:
    with open(filename_out, 'w') as fh_out:
        for line in fh_in:
            parts = line.split('\t', 2)[:2]
            fh_out.write('\t'.join(parts) + '\n')

Comment

A more efficient 'efficient way' would be to do parts = line.split('\t',2)[:2] which limits it to two splits (so three parts), of which you only care about the first two. Also, you could print as you go instead of storing everything into an output list.

SylvainD · Answer 1 · 2014-11-06 17:44:29Z

200_success · Answer 2 · 2014-11-10 00:23:11Z

janos · Answer 3 · 2014-11-08 09:04:57Z
