I have data loaded into Python that is in table form:
Name Sport Score
John Golf 100
Jill Rugby 55
John Hockey 100
Bob Golf 45
How can I structure this table in Python so that it is easy to sort or group items? For example, I might want to see the names of everyone who played Golf, everyone who scored 100 in any sport, or all of the data for just John.
-
Please clarify: is your problem storing this data, or printing this data? (JesseTG, Oct 24, 2015 at 19:13)
-
An ordered dictionary or named tuple may serve your purpose. (Ahasanul Haque, Oct 24, 2015 at 19:13)
-
@JesseTG The data will be stored and then written to Excel. (Prof, Oct 24, 2015 at 19:21)
-
@AhsanulHaque Everyone says that dictionaries are inherently unordered? (Prof, Oct 24, 2015 at 19:21)
-
Is there a reason why you do not use pandas? (Moritz, Oct 24, 2015 at 19:22)
5 Answers
pandas' DataFrame will be the way to go:
import pandas as pd
df = pd.DataFrame({'Name': ['John', 'Jill', 'John', 'Bob'],
                   'Sport': ['Golf', 'Rugby', 'Hockey', 'Golf'],
                   'Score': [100, 55, 100, 45]})
# the names of people that played Golf
df[df['Sport'] == 'Golf']['Name'].unique()
>> ['John' 'Bob']
# all of the people that scored 100 on any sport.
df[df['Score'] == 100]['Name'].unique()
>> ['John']
# all of the data for just John.
df[df['Name'] == 'John']
>>    Name   Sport  Score
0  John    Golf    100
2  John  Hockey    100
map and filter with namedtuples and lambdas can be used for this task.
from collections import namedtuple
# Create a named tuple to store the rows
Row = namedtuple('Row', ('name', 'sport', 'score'))
data = '''Name Sport Score
John Golf 100
Jill Rugby 55
John Hockey 100
Bob Golf 45'''
# Read the data, skip the first line
lines = data.splitlines()[1:]
rows = []
for line in lines:
    name, sport, score = line.strip().split()
    rows.append(Row(name, sport, int(score)))
# list() is needed on Python 3, where filter() and map() return lazy iterators
# People that played Golf
golf_filter = lambda row: row.sport == 'Golf'
golf_players = list(filter(golf_filter, rows))
# People that scored 100 on any sport
score_filter = lambda row: row.score == 100
scorers = list(filter(score_filter, rows))
# People named John
john_filter = lambda row: row.name == 'John'
john_data = list(filter(john_filter, rows))
# If you want a specific column, then you can map the data
# Names of golf players
get_name = lambda row: row.name
golf_players_names = list(map(get_name, golf_players))
Results:
>>> golf_players
[Row(name='John', sport='Golf', score=100),
Row(name='Bob', sport='Golf', score=45)]
>>> john_data
[Row(name='John', sport='Golf', score=100),
Row(name='John', sport='Hockey', score=100)]
>>> scorers
[Row(name='John', sport='Golf', score=100),
Row(name='John', sport='Hockey', score=100)]
>>> golf_players_names
['John', 'Bob']
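The question also asks about sorting and grouping, which the filters above do not cover. As a rough sketch using only the standard library (the variable names `by_score` and `by_sport` are made up for illustration), the same rows could be sorted with `sorted` and grouped with a `defaultdict`:

```python
from collections import namedtuple, defaultdict
from operator import attrgetter

Row = namedtuple('Row', ('name', 'sport', 'score'))
rows = [
    Row('John', 'Golf', 100),
    Row('Jill', 'Rugby', 55),
    Row('John', 'Hockey', 100),
    Row('Bob', 'Golf', 45),
]

# Sort by score, highest first; sorted() is stable, so ties keep input order
by_score = sorted(rows, key=attrgetter('score'), reverse=True)

# Group the rows by sport: each key maps to the list of rows for that sport
by_sport = defaultdict(list)
for row in rows:
    by_sport[row.sport].append(row)

# Names of everyone who played Golf, read from the grouped data
golf_names = [row.name for row in by_sport['Golf']]
print(golf_names)
```

Grouping once into a dictionary is convenient when you expect to look up several sports; for a single query, the `filter` approach above is just as good.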
What about this one?
yourDS={"name":["John","Jill","John","Bob"],
"sport":["Golf","Rugby","Hockey","Golf"],
"score":[100,55,100,45]
}
This keeps the entries of each row related, because lists are ordered: the same index in each list refers to the same row.
To avoid the effect of duplicate elements in a list, first make a new set from the list.
For the queries you describe, you can do something like this:
x = 100  # the score to look for (example value, pick your own)
for index, value in enumerate(yourDS["score"]):
    if value == x:
        print(yourDS["name"][index])
It's better to collect the results in a list and then convert it to a set, to handle cases where, for example, one person has a score of x in two different games.
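A minimal sketch of that list-then-set idea, assuming the same `yourDS` dictionary and taking 100 as the example value of x (both `names` and `unique_names` are illustrative names):

```python
yourDS = {"name": ["John", "Jill", "John", "Bob"],
          "sport": ["Golf", "Rugby", "Hockey", "Golf"],
          "score": [100, 55, 100, 45]}

x = 100  # assumed target score, chosen only for this example

# Pair each name with its score by position and keep the matching names
names = [name for name, score in zip(yourDS["name"], yourDS["score"])
         if score == x]

# John scored 100 twice, so the list has a duplicate; the set removes it
unique_names = set(names)
print(unique_names)
```

Note that a set does not preserve order, so sort the result if the order of the names matters.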
You can create a list of lists. Each row will be a list inside the outer list.
lst1 = [['John', 'Golf', 100], ['Jill', 'Rugby', 55],
        ['John', 'Hockey', 100], ['Bob', 'Golf', 45]]
# Collect every row whose score (index 2) is 100
lst100 = []
for lst in lst1:
    if lst[2] == 100:
        lst100.append(lst)
print(lst100)
If you want to retrieve information based on your data, I'd go with SQL. It's well-suited to answering questions like these:
...to see all the names of people that played Golf...
...all of the people that scored 100 on any sport...
...all of the data for just John.
The most popular database language these days is SQL, and as it happens Python actually has built-in support for it through the sqlite3 module.
SQL, while not a monumental task to learn, is beyond the scope of this answer. To learn that, I'd recommend checking out Codecademy, Code School, or SQLZOO (they're all interactive).
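Still, to give a rough idea of how the three queries from the question might look with `sqlite3`, here is a small sketch. The table name `results`, its column names, and the in-memory database are assumptions made for this example, not something taken from the question:

```python
import sqlite3

# An in-memory database; pass a file path instead to persist the data
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE results (name TEXT, sport TEXT, score INTEGER)')
conn.executemany(
    'INSERT INTO results VALUES (?, ?, ?)',
    [('John', 'Golf', 100), ('Jill', 'Rugby', 55),
     ('John', 'Hockey', 100), ('Bob', 'Golf', 45)],
)

# Names of everyone who played Golf
golf_names = [r[0] for r in conn.execute(
    'SELECT name FROM results WHERE sport = ? ORDER BY name', ('Golf',))]

# Everyone who scored 100 in any sport, without repeats
perfect = [r[0] for r in conn.execute(
    'SELECT DISTINCT name FROM results WHERE score = ?', (100,))]

# All of the data for just John
john_rows = conn.execute(
    'SELECT name, sport, score FROM results WHERE name = ? ORDER BY sport',
    ('John',)).fetchall()

conn.close()
print(golf_names, perfect, john_rows)
```

The `?` placeholders let `sqlite3` handle quoting, which is safer than building the query string by hand.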
Or, if you just want to read it in and write it out without caring about what it actually means, consider using the csv module, which is also built-in.
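As a sketch of that route, the snippet below writes the rows out as CSV (which Excel can open) and reads them back. It uses an `io.StringIO` buffer as a stand-in for a real file so that it runs anywhere; the variable names are illustrative:

```python
import csv
import io

rows = [
    ('John', 'Golf', 100),
    ('Jill', 'Rugby', 55),
    ('John', 'Hockey', 100),
    ('Bob', 'Golf', 45),
]

# Stand-in for a real file, e.g. open('scores.csv', 'w', newline='')
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['Name', 'Sport', 'Score'])  # header row
writer.writerows(rows)

# Read the data back; DictReader uses the header row as the keys
buffer.seek(0)
records = list(csv.DictReader(buffer))
print(records[0]['Name'], records[0]['Score'])
```

Keep in mind that the `csv` module reads every field back as a string, so scores need `int()` before any numeric comparison.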
2 Comments
sqlite3 is bundled with Python, so it's already there. It's part of the standard library.