Using a text file in Python

Question 1

I'm trying to take a text file and use only the first 30 lines of it in Python. This is what I wrote:

text = open("myText.txt")
lines = myText.readlines(30)
print lines

For some reason I get more than 150 lines when I print. What am I doing wrong?

Question 2

Shouldn't it be lines = text.readlines(30)?

Question 3

Use itertools.islice

import itertools
for line in itertools.islice(open("myText.txt"), 0, 30):
    print line
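For readers on Python 3, here is a minimal sketch of the same idea that also closes the file when done. The function name `first_lines` and the throwaway demo file are illustrative assumptions, not part of the answer above.

```python
import itertools
import os
import tempfile


def first_lines(path, n=30):
    """Return up to the first n lines of the file at path."""
    # The with block closes the file even if reading raises.
    with open(path) as f:
        # islice stops after n lines, or earlier if the file is shorter.
        return list(itertools.islice(f, n))


# Demo on a throwaway 100-line file (hypothetical data).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines("line %d\n" % i for i in range(100))

head = first_lines(tmp.name)
print(len(head))  # 30
os.remove(tmp.name)
```

Because islice simply stops when the file runs out, a file with fewer than 30 lines needs no special handling.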

Question 4

This solution seems to be affected by the same limitation as @ShawnChin's: it appears the entire file is loaded into memory before the slicing. I got: [1.9277660846710205, 1.9260480403900146, 1.9186549186706543] for a file of about 500 lines, and [1.5532219409942627, 1.5311739444732666, 1.5274620056152344] for one of 50, but I would appreciate cross-checking my findings...

Question 5

@mac no it doesn't. If you pass a file object into islice and repeat the operation twice, you'll see that it continues where it left off i.e. the file was not read till the end.
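The check described in this comment can be sketched as follows, assuming Python 3 and a throwaway 100-line file created just for the demonstration (the file contents are hypothetical).

```python
import itertools
import tempfile

# Throwaway 100-line file (hypothetical data) so the sketch runs anywhere.
with tempfile.TemporaryFile("w+") as f:
    f.writelines("line %d\n" % i for i in range(100))
    f.seek(0)

    # Two slices taken from the same open file object.
    first = list(itertools.islice(f, 30))
    second = list(itertools.islice(f, 30))

print(repr(first[0]), repr(second[0]))  # 'line 0\n' 'line 30\n'
```

The second slice starts at line 30, which would not be possible if the first call had consumed the whole file.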

Question 6

@ShawnChin - Thank you for this, it's most definitely a better way to test than using times as I did! :)

Question 7

If you are going to process your lines individually, an alternative could be to use a loop:

file = open('myText.txt')
for i in range(30):
    line = file.readline()
    # do stuff with line here

EDIT: some of the comments below express concern about this method assuming there are at least 30 lines in the file. If that is an issue for your application, you can check the value of line before processing. readline() will return an empty string '' once EOF has been reached:

for i in range(30):
    line = file.readline()
    if line == '':  # note that an empty line will return '\n', not ''!
        break
    # do stuff with line here
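To make the difference between a blank line and end of file concrete, here is a small Python 3 sketch. The three-line sample text is made up for the illustration.

```python
import tempfile

with tempfile.TemporaryFile("w+") as f:
    f.write("first\n\nthird\n")  # the second line is blank
    f.seek(0)

    # Ask for more lines than the file contains.
    results = [f.readline() for _ in range(5)]

print(results)  # ['first\n', '\n', 'third\n', '', '']
```

The blank line comes back as '\n', while every call past the end of the file returns '', so testing for '' is a safe EOF check.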

Question 8

@CésarBustíos - right, I just tried it locally with a smaller file and did not remember to update the code. Fixed!

Question 9

This only works if you know there are 30 lines. Otherwise the last readlines() will return "".

Question 10

@AndrewDalke - did you mean "if you know there are at least 30 lines" and then readline() instead of readlines()?

Question 11

@mac he meant if the file has fewer than 30 lines, then the remaining calls to readline() will return "". You're still iterating through 30 values even if there are fewer lines in the file.

Question 12

@mac: yes, and yes. "If there are 5 people waiting for a cashier then open up a new register" doesn't mean that if 6 people are waiting then a new register won't open up. And by "readlines()" I meant "readline()s", meaning "final calls to readline()".

Question 13

The sizehint argument for readlines isn't what you think it is (bytes, not lines).

If you really want to use readlines, try text.readlines()[:30] instead.

Do note that this is inefficient for large files as it first creates a list containing the whole file before returning a slice of it.
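A small Python 3 sketch of the difference between the size hint and a slice. One assumption to flag: Python 3 stops reading once the accumulated line sizes reach the hint, whereas the Python 2 file object in the question could round the hint up to an internal buffer size, so the exact counts differ between versions. The 100-line sample file is hypothetical.

```python
import tempfile

with tempfile.TemporaryFile("w+") as f:
    f.writelines("x\n" for _ in range(100))  # 100 lines, 2 characters each

    f.seek(0)
    by_hint = f.readlines(30)      # 30 is a size hint, not a line count

    f.seek(0)
    by_slice = f.readlines()[:30]  # reads everything, then keeps 30 lines

# by_hint holds fewer than 30 lines here; by_slice holds exactly 30.
print(len(by_hint), len(by_slice))
```

Either way, the number passed to readlines never means "this many lines", which is why slicing (or islice) is needed.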

A straightforward solution would be to use readline within a loop (as shown in mac's answer).

To handle files of various sizes (more or less than 30), Andrew's answer provides a robust solution using itertools.islice(). To achieve similar results without itertools, consider:

output = [line for _, line in zip(range(30), open("yourfile.txt", "r"))]

or as a generator expression (Python 2.4 or later):

output = (line for _, line in zip(range(30), open("yourfile.txt", "r")))
for line in output:
    pass  # do something with line
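The order of the arguments to zip matters if the file object is reused afterwards. The sketch below, assuming Python 3 and a hypothetical 100-line file, compares putting range first with putting the file first.

```python
import tempfile

with tempfile.TemporaryFile("w+") as f:
    f.writelines("line %d\n" % i for i in range(100))

    f.seek(0)
    head = [line for _, line in zip(range(30), f)]
    after_range_first = next(f)  # the file was advanced by exactly 30 lines

    f.seek(0)
    head_swapped = [line for line, _ in zip(f, range(30))]
    after_file_first = next(f)   # zip pulled a 31st line before range ran out

print(repr(after_range_first), repr(after_file_first))
# 'line 30\n' 'line 31\n'
```

Both versions collect the same 30 lines, but with the file first zip reads one extra line and discards it, so range(30) first is the safer order.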

Question 14

Not entirely sure, but won't this read all the lines into memory, only to keep just the first 30?

Question 15

The argument to readlines is a size hint in bytes, not a number of lines. The hint is approximate and may be rounded up to an internal buffer size, which is why asking for 30 returned 150+ lines.

Doing it with a for loop instead will give you proper results. Unfortunately, there doesn't seem to be a better built-in function for that.

Accepted answer: the itertools.islice answer above, by Andrew Dalke (answered 2011-11-30 19:08:04Z). Its three comments were posted by mac (2011-12-01 12:54:11Z), Shawn Chin (2011-12-01 13:10:14Z), and mac (2011-12-01 13:19:38Z).
