Python: list of strings

Question 1

I am trying to read a .txt file and build a list of the words in it. I want the words to be strings, but the output makes them look like lists.

import csv, math, os
os.chdir(r'C:\Users\jmela\canopy')
f=open("romeo.txt")
words = []
for row in csv.reader(f):
    line = str(row)
    for word in line.split():
        if word not in words:
            print word
            words.append(word)
words.sort()
print words

Does anyone know what I am doing wrong?

Question 2

Why on earth do you convert your rows to a string and then split that?

Question 3

This doesn't directly address your problem, but if you want a collection that has no duplicate values, consider using a set.

Question 4

You are getting a list of strings; you are probably confused because some of them have [ in them. See @Kasra's comment for why.

Question 5

What does your text file look like? csv.reader reads rows and splits each one into columns based on a delimiter. If your file is a list of words separated by commas, "row" will already be a list of words as strings.
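To make the point about rows and columns concrete, here is a minimal sketch that feeds csv.reader two in-memory lines instead of a real file. The sample text and the names `sample` and `rows` are made up for illustration, and the code assumes Python 3, whereas the thread itself uses Python 2 print statements.

```python
import csv
import io

# Hypothetical sample data standing in for the text file.
sample = "soft what light\nsun,moon,east\n"

# csv.reader yields one list of column strings per row.
rows = list(csv.reader(io.StringIO(sample)))

for row in rows:
    print(type(row).__name__, row)
```

The line without commas comes back as a one-element list holding the whole line, while the comma-separated line comes back already split into words.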

Question 6

When I try to do it directly:

for row in csv.reader(f):
    for word in row.split():
        if word not in words:
            print word
            words.append(word)

I get this error: AttributeError: 'list' object has no attribute 'split'

Question 7

Based on your latest comment, it doesn't look like you really need csv.reader. Just try this:

words = []
for line in open("romeo.txt", "r"):
    for word in line.split():
        if word not in words:
            words.append(word)
words.sort()
print words

And, as Kevin suggested, use a set() instead of a list.
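The set suggestion could look roughly like the sketch below. The function name `unique_words` and the sample lines are hypothetical, and the code assumes Python 3; it is one possible shape, not the answerer's own code.

```python
def unique_words(lines):
    """Return the sorted unique whitespace-separated words found in lines."""
    seen = set()
    for line in lines:
        # update() adds every word; duplicates are dropped by the set itself.
        seen.update(line.split())
    return sorted(seen)

# Hypothetical sample lines; with a real file, pass the open file object instead.
sample_lines = ["but soft what light\n", "what light through yonder window\n"]
print(unique_words(sample_lines))
```

A set removes the need for the explicit "if word not in words" check, and membership tests on a set do not scan the whole collection the way they do on a list.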

Question 8

Thanks, that works perfectly. I don't follow what was wrong with my original code. Do you know why that didn't work?

Question 9

Yes. As I said, csv.reader splits every row into columns based on the given delimiter (a comma by default). So row was actually something like ["this is a sentence"] (a list with one string, which is the whole line, since there were no commas). You then turned it into a string (e.g. "['this is a sentence']") and tried to split it on spaces. Please read up on csv.reader some more, and next time debug and look at what you get in every iteration of the loop; it will save you some time. :)
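The effect described in this comment can be reproduced without a file. This is a small sketch in Python 3; the variable names `row`, `line`, and `pieces` are chosen for illustration.

```python
# What csv.reader hands back for a line that contains no commas.
row = ["this is a sentence"]

# str() on a list gives its printed form, brackets and quotes included.
line = str(row)

# Splitting on whitespace keeps those brackets and quotes attached to the words.
pieces = line.split()

print(line)
print(pieces)
```

Every element of `pieces` is a string, but the first and last ones carry the bracket and quote characters from the list's printed form, which is why the output looked like lists.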

Question 10

I understand this & have learned from your explanation. Thank you.

Question 11

Don't read the text file as csv then. Simply remove all punctuation and non-letter/non-space characters like this:

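def replacePunct(string):
    # Keep only lowercase letters and spaces; everything else becomes a space.
    alphabets = " abcdefghijklmnopqrstuvwxyz"
    string = string.lower()
    for s in string:
        if s not in alphabets:
            string = string.replace(s, " ")
    # split() with no argument already drops empty strings and extra spaces.
    words = string.split()
    # A set cannot be used as a dict key, so return the unique words directly.
    return set(words)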

Question 12

Read the file as a normal text file and run this function on each line.
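One way to read this suggestion is sketched below, assuming Python 3. The helper names `words_in_line` and `unique_words_in_file` are hypothetical, and an in-memory io.StringIO object stands in for the opened text file so the example runs without romeo.txt.

```python
import io
import string

def words_in_line(line):
    """Lowercase the line, blank out anything that is not a-z or a space, and split."""
    allowed = set(string.ascii_lowercase + " ")
    cleaned = "".join(ch if ch in allowed else " " for ch in line.lower())
    return cleaned.split()

def unique_words_in_file(f):
    """Collect the unique words from every line of an open text file."""
    found = set()
    for line in f:
        found.update(words_in_line(line))
    return found

# io.StringIO stands in for open("romeo.txt") so the sketch runs without the file.
fake_file = io.StringIO("But soft, what light?\nIt is the east!\n")
result = unique_words_in_file(fake_file)
print(sorted(result))
```

With a real file, the same call would be made inside a with open(...) block, passing the file object in place of `fake_file`.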

Question 13

You could use a set to hold your words. This would give you a unique word list. Any non-alpha characters are converted to spaces. Each line is split into words, and the words are lowercased so that differently capitalized forms match.

import re

word_set = set()
re_nonalpha = re.compile('[^a-zA-Z ]+')
with open(r"romeo.txt", "r") as f_input:
    for line in f_input:
        line = re_nonalpha.sub(' ', line)    # Convert all non a-z to spaces
        for word in line.split():
            word_set.add(word.lower())
word_list = list(word_set)
word_list.sort()
print word_list

This would give you a list holding the following words:

['already', 'and', 'arise', 'breaks', 'but', 'east', 'envious', 'fair', 'grief', 'is', 'it', 'juliet', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'who', 'window', 'with', 'yonder']

Updated to also remove any punctuation.

Question 14

Make sure to account for extra spaces and hyphens.
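As a sketch of what this could mean in practice: split() with no argument already ignores runs of extra spaces, but replacing every non-letter with a space breaks a hyphenated word into two. The pattern below, written for Python 3, keeps internal hyphens instead. The name `tokenize` and the pattern itself are assumptions made for illustration, not something taken from the answers.

```python
import re

# Letters, optionally joined by single hyphens (pattern chosen for illustration).
WORD_RE = re.compile(r"[a-z]+(?:-[a-z]+)*")

def tokenize(line):
    """Return lowercase words, keeping internal hyphens and ignoring extra spaces."""
    return WORD_RE.findall(line.lower())

print(tokenize("A  well-known   star-crossed pair -- alas!"))
```

Whether a hyphenated word should count as one word or two depends on the text, so this is a choice to make deliberately rather than a fix that is always right.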

Accepted answer: the answer above that suggests dropping the csv reader, by Ronen Ness, answered 2015-07-12 14:06:14Z.

Comments on the accepted answer, in the order shown above: Joe (2015-07-12T14:16:57.893Z), Ronen Ness (2015-07-12T14:20:24.883Z), Joe (2015-07-12T14:43:08.82Z).
