Extracting information from a string in Python

Question 1

My .csv data looks like this:

June 8, 2009 Monday
June 8, 2009 Monday
June 6, 2009 Saturday
June 6, 2009 Saturday Correction Appended
June 6, 2009 Saturday
June 6, 2009 Saturday
June 6, 2009 Saturday
etc...

The data spans 10 years. I need to separate the months and years (and don't care about the dates and days).

To single out the months, I have the following lines of code:

for row in reader:
    date = row[1]
    # partition(' ') splits at the first space; element 0 is the month name
    month = date.partition(' ')[0]
    print(month)

However, I can't figure out how to extract the numeric year from the string. Would I have to use a regex for this?

Question 2

If your month is June 6, then you can get 6 by month.split(" ")[1]

Question 3

@TheMonk OP wants to extract "the numeric year"

Question 4

@OhAuth I was in a hurry and only glanced at the question. Good catch.

Answer

Try:

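for row in reader:
    # split() with no argument splits on any whitespace and drops empty strings
    row_split = row[1].split()
    month = row_split[0]
    # index 2 is the year; index 1 is the day with its trailing comma
    year = int(row_split[2])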

Explanation

row[1] == "June 8, 2009 Monday"

Therefore:

row[1].split() == ["June", "8,", "2009", "Monday"]

So, your month and year are extracted as follows:

"June" == row[1].split()[0]
2009 == int(row[1].split()[2])
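
As a minimal sketch of the same idea (not part of the original answer), the split-and-index steps could be wrapped in a small helper. The function name month_and_year and the sample strings are assumptions made for illustration; the second sample has a leading space and trailing text to show that the indices do not shift when split() is called without an argument.

```python
def month_and_year(date_text):
    """Return (month, year) from text such as 'June 8, 2009 Monday'."""
    # split() with no argument splits on runs of whitespace and ignores
    # leading/trailing whitespace, so the indices stay stable.
    parts = date_text.split()
    month = parts[0]       # e.g. "June"
    year = int(parts[2])   # e.g. 2009; parts[1] is the day plus a comma
    return month, year


samples = [
    "June 8, 2009 Monday",
    " June 6, 2009 Saturday Correction Appended",
]
for sample in samples:
    print(month_and_year(sample))

# For comparison, split(' ') keeps an empty string for the leading space,
# which would shift every index by one:
print(" June 8, 2009 Monday".split(" "))
```

Anything after the year (the weekday, "Correction Appended") is simply ignored, because only the first and third tokens are read.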

thodic · Accepted Answer · 2015-06-09 10:45:23Z

Comments

bereal · 2015-06-09T10:46:46.023Z
Shouldn't both be [0]?

thodic · 2015-06-09T10:54:49.697Z
@bereal There is a space at the start of the second column. If OPs data is truly comma separated this adds a blank string at the start of the split array. See the edit explanation.

bereal · 2015-06-09T10:58:27.003Z
' a b c '.split() returns ['a', 'b', 'c'], unlike ' a b c '.split(' ')

thodic · 2015-06-09T11:02:41.36Z
@bereal I originally had split(' '), however, a suggested edit changed it to split(). Needless to say I've fixed the issue from the suggested edit in my answer. +1 for the tip.

Zlo · 2015-06-09T11:05:43.88Z
Thanks for the explanation, but actually the data format I provided above is already extracted from row[1] in the .csv. So row[1] contains the full string e.g., June 6, 2009 Saturday Correction Appended.
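
On the original question of whether a regex is needed: it is not, but since row[1] holds the full string (possibly with a leading space or trailing text such as "Correction Appended"), a regex is one way to check the format while extracting. The following is only a sketch under that assumption; DATE_PATTERN and parse_month_year are hypothetical names, and the pattern assumes an English month name, a day followed by a comma, and a four-digit year.

```python
import re

# Hypothetical pattern: optional leading whitespace, a month name,
# a one- or two-digit day with a comma, then a four-digit year.
DATE_PATTERN = re.compile(r"\s*([A-Za-z]+)\s+\d{1,2},\s+(\d{4})\b")


def parse_month_year(date_text):
    """Return (month, year), or None when the text does not match."""
    match = DATE_PATTERN.match(date_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


print(parse_month_year(" June 6, 2009 Saturday Correction Appended"))
print(parse_month_year("etc..."))
```

Compared with indexing into split(), this returns None for rows that do not look like a date instead of raising an IndexError or ValueError, which may be convenient over ten years of data.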
