Data structure for working with rows and columns

Question 1

I have data loaded into Python in table form:

Name  Sport   Score
John  Golf    100
Jill  Rugby   55
John  Hockey  100
Bob   Golf    45

How can I structure this table in Python so that it is easy to sort or group items? For example, I might want to see the names of everyone who played Golf, everyone who scored 100 in any sport, or all of the data for just John.

Question 2

Please clarify: is your problem storing this data or printing it?

Question 3

An ordered dictionary or a named tuple may serve your purpose.

Question 4

@JesseTG The data will be stored and then written to Excel.

Question 5

@AhsanulHaque Everyone says that dictionaries are inherently unordered?

Question 6

Is there a reason why you do not use pandas?

Question 7

A pandas DataFrame is the way to go:

import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jill', 'John', 'Bob'],
                   'Sport': ['Golf', 'Rugby', 'Hockey', 'Golf'],
                   'Score': [100, 55, 100, 45]})

# The names of people who played Golf
print(df[df['Sport'] == 'Golf']['Name'].unique())
# ['John' 'Bob']

# The people who scored 100 in any sport
print(df[df['Score'] == 100]['Name'].unique())
# ['John']

# All of the data for just John
print(df[df['Name'] == 'John'])
#    Name   Sport  Score
# 0  John    Golf    100
# 2  John  Hockey    100
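
The question also asks about sorting and grouping, which the filters above do not cover. The following is a minimal sketch of how that might look with pandas, using the scores from the question's table; the variable names (`by_score`, `players_by_sport`, `total_by_name`) are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jill', 'John', 'Bob'],
                   'Sport': ['Golf', 'Rugby', 'Hockey', 'Golf'],
                   'Score': [100, 55, 100, 45]})

# Sort rows by score, highest first; ties are broken by name.
by_score = df.sort_values(['Score', 'Name'], ascending=[False, True])

# Group by sport and collect the names of the players in each group.
players_by_sport = df.groupby('Sport')['Name'].apply(list).to_dict()

# Group by name and total each person's scores.
total_by_name = df.groupby('Name')['Score'].sum().to_dict()
```

Converting the grouped results to plain dictionaries is only a convenience for inspection; the grouped Series can be used directly as well.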

Question 8

map and filter with namedtuples and lambdas can be used for this task.

from collections import namedtuple

# Create a named tuple to store the rows
Row = namedtuple('Row', ('name', 'sport', 'score'))

data = '''Name Sport Score
 John Golf 100
 Jill Rugby 55
 John Hockey 100
 Bob Golf 45'''

# Read the data, skipping the header line
lines = data.splitlines()[1:]
rows = []
for line in lines:
    name, sport, score = line.strip().split()
    rows.append(Row(name, sport, int(score)))

# In Python 3, filter() and map() return lazy iterators, so wrap them
# in list() to get results that can be printed and reused.

# People that played Golf
golf_filter = lambda row: row.sport == 'Golf'
golf_players = list(filter(golf_filter, rows))

# People that scored 100 on any sport
score_filter = lambda row: row.score == 100
scorers = list(filter(score_filter, rows))

# People named John
john_filter = lambda row: row.name == 'John'
john_data = list(filter(john_filter, rows))

# If you want a specific column, then you can map the data
# Names of golf players
get_name = lambda row: row.name
golf_players_names = list(map(get_name, golf_players))

Results:

>>> golf_players
[Row(name='John', sport='Golf', score=100),
 Row(name='Bob', sport='Golf', score=45)]
>>> john_data
[Row(name='John', sport='Golf', score=100),
 Row(name='John', sport='Hockey', score=100)]
>>> scorers
[Row(name='John', sport='Golf', score=100),
 Row(name='John', sport='Hockey', score=100)]
>>> golf_players_names
['John', 'Bob']
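
The same named tuples can also be sorted and grouped with the standard library alone. This is a sketch under the assumption that the rows have already been parsed as above; `rows_by_score`, `rows_by_sport`, and `golf_names` are names introduced here for illustration.

```python
from collections import defaultdict, namedtuple
from operator import attrgetter

Row = namedtuple('Row', ('name', 'sport', 'score'))
rows = [
    Row('John', 'Golf', 100),
    Row('Jill', 'Rugby', 55),
    Row('John', 'Hockey', 100),
    Row('Bob', 'Golf', 45),
]

# Sort by score, lowest first. sorted() is stable, so rows with equal
# scores keep their original relative order.
rows_by_score = sorted(rows, key=attrgetter('score'))

# Group the rows by sport: each key maps to the list of rows for that sport.
rows_by_sport = defaultdict(list)
for row in rows:
    rows_by_sport[row.sport].append(row)

golf_names = [row.name for row in rows_by_sport['Golf']]
```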

Question 9

What about this one?

yourDS = {"name": ["John", "Jill", "John", "Bob"],
          "sport": ["Golf", "Rugby", "Hockey", "Golf"],
          "score": [100, 55, 100, 45]}

Because lists are ordered, the entries at the same index in the three lists belong to the same row.

A name can appear more than once in a result, so convert the result to a set if you want to remove duplicates.

For the queries you describe, you can do something like this:

x = 100  # the score to look for
for index, value in enumerate(yourDS["score"]):
    if value == x:
        print(yourDS["name"][index])

It's better to collect the matches in a list and then turn that list into a set; otherwise the same person is reported twice if they scored x in two different games.
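
The point about duplicates can be folded into a small helper. The function below is a hypothetical sketch (`names_with_score` is not from the original answer) that pairs the name and score columns with `zip` and returns a set, so each person appears at most once.

```python
yourDS = {"name": ["John", "Jill", "John", "Bob"],
          "sport": ["Golf", "Rugby", "Hockey", "Golf"],
          "score": [100, 55, 100, 45]}


def names_with_score(table, x):
    """Return the set of names that have score x in at least one row."""
    # zip() walks the two columns in step, so each pair is one row.
    return {name for name, score in zip(table["name"], table["score"])
            if score == x}


matches = names_with_score(yourDS, 100)
```

Here `matches` contains John only once even though he scored 100 in two sports.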

Question 10

But how would I "query" it? For example, how would I get all of the people that have score x?

Question 11

You can create a list of lists. Each row will be a list inside the outer list.

lst1 = [['John', 'Golf', 100], ['Jill', 'Rugby', 55], ['John', 'Hockey', 100], ['Bob', 'Golf', 45]]

# Collect every row whose score (index 2) is 100
lst100 = []
for lst in lst1:
    if lst[2] == 100:
        lst100.append(lst)
print(lst100)
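
With the list-of-lists layout, the other queries and the sorting from the question follow the same pattern. A minimal sketch, assuming the column positions are fixed; the index constants `NAME`, `SPORT`, and `SCORE` are introduced here only to avoid bare numbers.

```python
from operator import itemgetter

lst1 = [['John', 'Golf', 100], ['Jill', 'Rugby', 55],
        ['John', 'Hockey', 100], ['Bob', 'Golf', 45]]

# Column positions within each row (illustrative names).
NAME, SPORT, SCORE = 0, 1, 2

# All rows for people who played Golf.
golf_rows = [row for row in lst1 if row[SPORT] == 'Golf']

# All rows sorted by score, highest first.
sorted_rows = sorted(lst1, key=itemgetter(SCORE), reverse=True)
```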

Question 12

If you want to retrieve information based on your data, I'd go with SQL. It's well-suited to answering questions like these:

...to see all the names of people that played Golf...

...all of the people that scored 100 on any sport...

...all of the data for just John.

The most popular database language these days is SQL, and as it happens Python actually has built-in support for it through the sqlite3 module.
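
As a rough illustration of what that could look like, here is a sketch that loads the question's table into an in-memory SQLite database and runs the three queries. The table name `results` and the column names are assumptions made for this example.

```python
import sqlite3

rows = [('John', 'Golf', 100), ('Jill', 'Rugby', 55),
        ('John', 'Hockey', 100), ('Bob', 'Golf', 45)]

# ':memory:' creates a temporary database that needs no file on disk.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE results (name TEXT, sport TEXT, score INTEGER)')
conn.executemany('INSERT INTO results VALUES (?, ?, ?)', rows)

# Names of people that played Golf.
golfers = [r[0] for r in conn.execute(
    'SELECT DISTINCT name FROM results WHERE sport = ? ORDER BY name',
    ('Golf',))]

# Names of people that scored 100 in any sport.
top_scorers = [r[0] for r in conn.execute(
    'SELECT DISTINCT name FROM results WHERE score = ?', (100,))]

# All of the data for just John.
john_rows = conn.execute(
    'SELECT name, sport, score FROM results WHERE name = ? ORDER BY sport',
    ('John',)).fetchall()

conn.close()
```

The `?` placeholders let sqlite3 substitute the values safely instead of building the SQL string by hand.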

SQL, while not a monumental task to learn, is beyond the scope of this answer. To learn that, I'd recommend checking out Codecademy, Code School, or SQLZOO (they're all interactive).

Or, if you just want to read it in and write it out without caring about what it actually means, consider using the csv module, which is also built-in.
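
For the read-in, write-out case, a sketch with the csv module might look like this. An in-memory `io.StringIO` buffer stands in for a real file here, which is an assumption made to keep the example self-contained; a CSV file written this way can typically be opened in Excel.

```python
import csv
import io

rows = [('John', 'Golf', 100), ('Jill', 'Rugby', 55),
        ('John', 'Hockey', 100), ('Bob', 'Golf', 45)]

# Stand-in for a real file such as open('results.csv', 'w', newline='').
buffer = io.StringIO()

writer = csv.writer(buffer)
writer.writerow(['Name', 'Sport', 'Score'])  # header row
writer.writerows(rows)

# Read the data back; DictReader uses the header row as keys.
buffer.seek(0)
records = list(csv.DictReader(buffer))
```

Note that csv reads every field back as a string, so scores need `int(...)` before any numeric comparison.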

Question 13

Yes, I thought this all sounded perfect for SQL, but I want to develop this code for users who may not have SQL installed (I may even ship it as an .exe with dependencies bundled in). How would that affect things?

Question 14

Actually, sqlite3 is bundled with Python, so it's already there. It's part of the standard library.

DeepSpace 82.2k12 gold badges119 silver badges166 bronze badges · Accepted Answer · 2015-10-24 19:23:58Z

