How can I query columns that are lists or dicts? Here is some basic JSON-like data.
[
  {
    "id": 1,
    "name": "John Doe",
    "age": 30,
    "email": "[email protected]",
    "isStudent": false,
    "hobbies": ["reading", "gaming", "hiking"],
    "address": {
      "street": "123 Main St",
      "city": "Anytown",
      "country": "USA"
    }
  },
  {
    "id": 2,
    "name": "Jane Smith",
    "age": 25,
    "email": "[email protected]",
    "isStudent": true,
    "hobbies": ["painting", "yoga", "photography"],
    "address": {
      "street": "456 Oak Ave",
      "city": "Somewhere",
      "country": "Canada"
    }
  },
  {
    "id": 3,
    "name": "Bob Johnson",
    "age": 42,
    "email": "[email protected]",
    "isStudent": false,
    "hobbies": ["cooking", "fishing", "gardening"],
    "address": {
      "street": "789 Pine Rd",
      "city": "Otherville",
      "country": "UK"
    }
  },
  {
    "id": 4,
    "name": "Alice Chen",
    "age": 28,
    "email": "[email protected]",
    "isStudent": false,
    "hobbies": ["coding", "chess", "traveling"],
    "address": {
      "street": "321 Maple Blvd",
      "city": "Techcity",
      "country": "USA"
    }
  },
  {
    "id": 5,
    "name": "David Wilson",
    "age": 19,
    "email": "[email protected]",
    "isStudent": true,
    "hobbies": ["basketball", "music", "movies"],
    "address": {
      "street": "654 Cedar Ln",
      "city": "University Town",
      "country": "Australia"
    }
  }
]
df = pd.read_json("file.json")
For example, I want to find anyone who put gaming as a hobby and whose city is Anytown.
My initial instinct was to either explode the list indices and dict items into new columns, or to read the value of the cell as a string for searching.
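For reference, the explode/flatten instinct would look something like this (a sketch on a hypothetical two-record subset of the data above):

```python
import pandas as pd

# Hypothetical two-record subset of the question's data
data = [
    {"id": 1, "name": "John Doe",
     "hobbies": ["reading", "gaming", "hiking"],
     "address": {"street": "123 Main St", "city": "Anytown", "country": "USA"}},
    {"id": 2, "name": "Jane Smith",
     "hobbies": ["painting", "yoga", "photography"],
     "address": {"street": "456 Oak Ave", "city": "Somewhere", "country": "Canada"}},
]
df = pd.DataFrame(data)

# Explode list entries into one row per hobby
exploded = df.explode("hobbies")

# Flatten dict entries into new columns (street/city/country)
flat = pd.concat([df.drop(columns="address"),
                  pd.json_normalize(df["address"].tolist())], axis=1)

# Combined query: gaming as a hobby AND city == Anytown
gaming_ids = exploded.loc[exploded["hobbies"] == "gaming", "id"]
out = flat[flat["id"].isin(gaming_ids) & flat["city"].eq("Anytown")]
```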
- Please add your Pandas code for reference. I ask because this example doesn't contain any columns per se. – wjandrea, Dec 16, 2025 at 15:29
- Could you also add an example query, one for hobbies and one for address? Conceptually, there's a big difference between searching for people who put cooking as a hobby vs. people who put cooking as the first choice out of three. – wjandrea, Dec 16, 2025 at 15:34
- Ope, I just noticed you put those details in your answer, so I went ahead and added them to the question for you :) – wjandrea, Dec 16, 2025 at 15:45
3 Answers
Why don't you load your data with pd.json_normalize:
df = pd.json_normalize(data) # assuming data is a python object
# if a string, use json.loads
# import json
# df = pd.json_normalize(json.loads(data))
This will directly expand the dictionary keys as new columns (address.street/address.city/address.country):
id name age email isStudent hobbies address.street address.city address.country
0 1 John Doe 30 [email protected] False [reading, gaming, hiking] 123 Main St Anytown USA
1 2 Jane Smith 25 [email protected] True [painting, yoga, photography] 456 Oak Ave Somewhere Canada
2 3 Bob Johnson 42 [email protected] False [cooking, fishing, gardening] 789 Pine Rd Otherville UK
3 4 Alice Chen 28 [email protected] False [coding, chess, traveling] 321 Maple Blvd Techcity USA
4 5 David Wilson 19 [email protected] True [basketball, music, movies] 654 Cedar Ln University Town Australia
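With the flattened columns, the combined lookup from the question (gaming as a hobby, city is Anytown) might look like this. The list column still needs a Python-level membership test, but the dict keys are now ordinary columns (sketch on a two-record subset):

```python
import pandas as pd

# Hypothetical two-record subset of the question's data
data = [
    {"id": 1, "name": "John Doe",
     "hobbies": ["reading", "gaming", "hiking"],
     "address": {"city": "Anytown", "country": "USA"}},
    {"id": 2, "name": "Jane Smith",
     "hobbies": ["painting", "yoga", "photography"],
     "address": {"city": "Somewhere", "country": "Canada"}},
]
df = pd.json_normalize(data)

# membership test on the list column, plain comparison on the flat column
out = df[df["hobbies"].map(lambda h: "gaming" in h)
         & df["address.city"].eq("Anytown")]
```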
An alternative might be to use the str accessor (warning: this "hack" might change in the future):
df = pd.read_json("file.json")
df.address.str['street']
Output:
0 123 Main St
1 456 Oak Ave
2 789 Pine Rd
3 321 Maple Blvd
4 654 Cedar Ln
Name: address, dtype: object
For example:
out = df[df.address.str['street'].str.contains('Ave')]
Output:
id name age email isStudent hobbies address
1 2 Jane Smith 25 [email protected] True [painting, yoga, photography] {'street': '456 Oak Ave', 'city': 'Somewhere', 'country': 'Canada'}
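The same str hack can answer the question's combined query (gaming + Anytown); the list side still needs apply, since the str accessor has no membership test for lists (sketch on a two-record subset):

```python
import pandas as pd

# Hypothetical two-record subset of the question's data
data = [
    {"id": 1, "hobbies": ["reading", "gaming", "hiking"],
     "address": {"city": "Anytown"}},
    {"id": 4, "hobbies": ["coding", "chess", "traveling"],
     "address": {"city": "Techcity"}},
]
df = pd.DataFrame(data)

# .str['city'] indexes into each dict; apply handles list membership
mask = (df["address"].str["city"].eq("Anytown")
        & df["hobbies"].apply(lambda h: "gaming" in h))
out = df[mask]
```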
I'd suggest two paths:
- Preprocess the JSON before it comes into Pandas
- Use a library dedicated to ragged/non-rectangular data, in this case akimbo
In [64]: import akimbo.pandas
In [65]: df.hobbies.ak
Out[65]: <Array [['reading', 'gaming', 'hiking'], ..., [...]] type='5 * var * ?string'>
In [66]: df.hobbies.ak[1]
Out[66]:
0 painting
1 yoga
2 photography
dtype: string[pyarrow]
In [67]: df.address.ak['street']
Out[67]:
0 123 Main St
1 456 Oak Ave
2 789 Pine Rd
3 321 Maple Blvd
4 654 Cedar Ln
dtype: string[pyarrow]
# match a substring
In [71]: df.address.ak['street'].ak.str.match_substring('Ave')
Out[71]:
0 False
1 True
2 False
3 False
4 False
dtype: bool[pyarrow]
In [72]: df.loc[df.address.ak['street'].ak.str.match_substring('Ave')]
Out[72]:
id name age email isStudent hobbies address
1 2 Jane Smith 25 [email protected] True [painting, yoga, photography] {'street': '456 Oak Ave', 'city': 'Somewhere',...
Have a look at the docs for the various ways of using the library; it is quite powerful. It is built on awkward-array, so you have access to a plethora of methods via the .ak attribute.
The Pandas community generally steers away from the use of apply/lambda, but this is actually quite simple.
It's also more flexible than arbitrarily exploding the columns (e.g. hobby_1, hobby_2, etc.), because I can have lists of different lengths without sparsity, and I can add/remove address fields as needed, e.g. zip+4 or delivery_location.
df = pd.read_json("file.json")
df[
    # Working with lists
    df['hobbies'].apply(lambda x: 'gaming' in x) &
    # Working with dicts
    df['address'].apply(lambda x: x['city'] == 'Anytown')
]
Result
id name age email isStudent hobbies address
0 1 John Doe 30 [email protected] False [reading, gaming, hiking] {'street': '123 Main St', 'city': 'Anytown', '...
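If you'd rather keep apply off the list side, the same query can be sketched with explode, which turns each hobby into its own row so a plain comparison works (hypothetical two-record subset):

```python
import pandas as pd

# Hypothetical two-record subset of the question's data
data = [
    {"id": 1, "name": "John Doe",
     "hobbies": ["reading", "gaming", "hiking"],
     "address": {"city": "Anytown", "country": "USA"}},
    {"id": 5, "name": "David Wilson",
     "hobbies": ["basketball", "music", "movies"],
     "address": {"city": "University Town", "country": "Australia"}},
]
df = pd.DataFrame(data)

# one row per hobby, then an ordinary equality filter
gaming_ids = df.explode("hobbies").query("hobbies == 'gaming'")["id"]
out = df[df["id"].isin(gaming_ids)
         & df["address"].apply(lambda a: a["city"] == "Anytown")]
```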
2 Comments
- .apply() is generally avoided because it's slow on large datasets (since it operates at the Python layer instead of taking advantage of faster code at the C layer). If your datasets are small, have at it!
- df.pipe(lambda) is perfectly idiomatic in the right context.