How can I query columns that are lists or dicts? Here is some basic JSON-like data.
[
  {
    "id": 1,
    "name": "John Doe",
    "age": 30,
    "email": "[email protected]",
    "isStudent": false,
    "hobbies": ["reading", "gaming", "hiking"],
    "address": {
      "street": "123 Main St",
      "city": "Anytown",
      "country": "USA"
    }
  },
  {
    "id": 2,
    "name": "Jane Smith",
    "age": 25,
    "email": "[email protected]",
    "isStudent": true,
    "hobbies": ["painting", "yoga", "photography"],
    "address": {
      "street": "456 Oak Ave",
      "city": "Somewhere",
      "country": "Canada"
    }
  },
  {
    "id": 3,
    "name": "Bob Johnson",
    "age": 42,
    "email": "[email protected]",
    "isStudent": false,
    "hobbies": ["cooking", "fishing", "gardening"],
    "address": {
      "street": "789 Pine Rd",
      "city": "Otherville",
      "country": "UK"
    }
  },
  {
    "id": 4,
    "name": "Alice Chen",
    "age": 28,
    "email": "[email protected]",
    "isStudent": false,
    "hobbies": ["coding", "chess", "traveling"],
    "address": {
      "street": "321 Maple Blvd",
      "city": "Techcity",
      "country": "USA"
    }
  },
  {
    "id": 5,
    "name": "David Wilson",
    "age": 19,
    "email": "[email protected]",
    "isStudent": true,
    "hobbies": ["basketball", "music", "movies"],
    "address": {
      "street": "654 Cedar Ln",
      "city": "University Town",
      "country": "Australia"
    }
  }
]
df = pd.read_json("file.json")
For example, I want to find anyone who put gaming as a hobby and whose city is Anytown.
My initial instinct was to either explode the list indices and dict items into new columns, or to read the value of the cell as a string for searching.
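For reference, the explode/flatten instinct would look something like this (a sketch on a hypothetical two-record subset of the data above):

```python
import pandas as pd

# Hypothetical two-record subset of the question's data
data = [
    {"id": 1, "name": "John Doe",
     "hobbies": ["reading", "gaming", "hiking"],
     "address": {"street": "123 Main St", "city": "Anytown", "country": "USA"}},
    {"id": 2, "name": "Jane Smith",
     "hobbies": ["painting", "yoga", "photography"],
     "address": {"street": "456 Oak Ave", "city": "Somewhere", "country": "Canada"}},
]
df = pd.DataFrame(data)

# Explode list entries into one row per hobby
exploded = df.explode("hobbies")

# Flatten dict entries into new columns (street/city/country)
flat = pd.concat([df.drop(columns="address"),
                  pd.json_normalize(df["address"].tolist())], axis=1)

# Combined query: gaming as a hobby AND city == Anytown
gaming_ids = exploded.loc[exploded["hobbies"] == "gaming", "id"]
out = flat[flat["id"].isin(gaming_ids) & flat["city"].eq("Anytown")]
```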
- Please add your Pandas code for reference. I ask because this example doesn't contain any columns per se. – wjandrea, Dec 16, 2025 at 15:29
- Could you also add an example query, one for hobbies and one for address? Conceptually, there's a big difference between searching for people who put cooking as a hobby vs. people who put cooking as the first choice out of three. – wjandrea, Dec 16, 2025 at 15:34
- Ope, I just noticed you put those details in your answer, so I went ahead and added them to the question for you :) – wjandrea, Dec 16, 2025 at 15:45
3 Answers
Why don't you load your data with pd.json_normalize:
df = pd.json_normalize(data) # assuming data is a python object
# if a string, use json.loads
# import json
# df = pd.json_normalize(json.loads(data))
This will directly expand the dictionary keys as new columns (address.street/address.city/address.country):
id name age email isStudent hobbies address.street address.city address.country
0 1 John Doe 30 [email protected] False [reading, gaming, hiking] 123 Main St Anytown USA
1 2 Jane Smith 25 [email protected] True [painting, yoga, photography] 456 Oak Ave Somewhere Canada
2 3 Bob Johnson 42 [email protected] False [cooking, fishing, gardening] 789 Pine Rd Otherville UK
3 4 Alice Chen 28 [email protected] False [coding, chess, traveling] 321 Maple Blvd Techcity USA
4 5 David Wilson 19 [email protected] True [basketball, music, movies] 654 Cedar Ln University Town Australia
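With the flattened columns, the combined lookup from the question (gaming as a hobby, city is Anytown) might look like this. The list column still needs a Python-level membership test, but the dict keys are now ordinary columns (sketch on a two-record subset):

```python
import pandas as pd

# Hypothetical two-record subset of the question's data
data = [
    {"id": 1, "name": "John Doe",
     "hobbies": ["reading", "gaming", "hiking"],
     "address": {"city": "Anytown", "country": "USA"}},
    {"id": 2, "name": "Jane Smith",
     "hobbies": ["painting", "yoga", "photography"],
     "address": {"city": "Somewhere", "country": "Canada"}},
]
df = pd.json_normalize(data)

# membership test on the list column, plain comparison on the flat column
out = df[df["hobbies"].map(lambda h: "gaming" in h)
         & df["address.city"].eq("Anytown")]
```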
An alternative might be to use the str accessor (warning: this "hack" might change in the future):
df = pd.read_json("file.json")
df.address.str['street']
Output:
0 123 Main St
1 456 Oak Ave
2 789 Pine Rd
3 321 Maple Blvd
4 654 Cedar Ln
Name: address, dtype: object
For example:
out = df[df.address.str['street'].str.contains('Ave')]
Output:
id name age email isStudent hobbies address
1 2 Jane Smith 25 [email protected] True [painting, yoga, photography] {'street': '456 Oak Ave', 'city': 'Somewhere', 'country': 'Canada'}
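The same str hack can answer the question's combined query (gaming + Anytown); the list side still needs apply, since the str accessor has no membership test for lists (sketch on a two-record subset):

```python
import pandas as pd

# Hypothetical two-record subset of the question's data
data = [
    {"id": 1, "hobbies": ["reading", "gaming", "hiking"],
     "address": {"city": "Anytown"}},
    {"id": 4, "hobbies": ["coding", "chess", "traveling"],
     "address": {"city": "Techcity"}},
]
df = pd.DataFrame(data)

# .str['city'] indexes into each dict; apply handles list membership
mask = (df["address"].str["city"].eq("Anytown")
        & df["hobbies"].apply(lambda h: "gaming" in h))
out = df[mask]
```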
I'd suggest two paths:
- Preprocess the JSON before it comes into Pandas
- Use a library dedicated to ragged/non-rectangular data, in this case akimbo
In [64]: import akimbo.pandas
In [65]: df.hobbies.ak
Out[65]: <Array [['reading', 'gaming', 'hiking'], ..., [...]] type='5 * var * ?string'>
In [66]: df.hobbies.ak[1]
Out[66]:
0 painting
1 yoga
2 photography
dtype: string[pyarrow]
In [67]: df.address.ak['street']
Out[67]:
0 123 Main St
1 456 Oak Ave
2 789 Pine Rd
3 321 Maple Blvd
4 654 Cedar Ln
dtype: string[pyarrow]
# match a substring
In [71]: df.address.ak['street'].ak.str.match_substring('Ave')
Out[71]:
0 False
1 True
2 False
3 False
4 False
dtype: bool[pyarrow]
In [72]: df.loc[df.address.ak['street'].ak.str.match_substring('Ave')]
Out[72]:
id name age email isStudent hobbies address
1 2 Jane Smith 25 [email protected] True [painting, yoga, photography] {'street': '456 Oak Ave', 'city': 'Somewhere',...
Have a look at the docs for the various ways of using the library; it is quite powerful. It is built on awkward-array, so you have access to a plethora of methods via the .ak attribute.
The Pandas community generally steers away from the use of apply/lambda, but this is actually quite simple.
It's also more flexible than arbitrarily exploding the columns (e.g. hobby_1, hobby_2, etc.), because I can have lists of different lengths without sparsity, and I can add/remove address fields as needed, e.g. zip+4 or delivery_location.
df = pd.read_json("file.json")
df[
    # Working with lists
    df['hobbies'].apply(lambda x: 'gaming' in x) &
    # Working with dicts
    df['address'].apply(lambda x: x['city'] == 'Anytown')
]
Result
id name age email isStudent hobbies address
0 1 John Doe 30 [email protected] False [reading, gaming, hiking] {'street': '123 Main St', 'city': 'Anytown', '...
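If you'd rather keep apply off the list side, the same query can be sketched with explode, which turns each hobby into its own row so a plain comparison works (hypothetical two-record subset):

```python
import pandas as pd

# Hypothetical two-record subset of the question's data
data = [
    {"id": 1, "name": "John Doe",
     "hobbies": ["reading", "gaming", "hiking"],
     "address": {"city": "Anytown", "country": "USA"}},
    {"id": 5, "name": "David Wilson",
     "hobbies": ["basketball", "music", "movies"],
     "address": {"city": "University Town", "country": "Australia"}},
]
df = pd.DataFrame(data)

# one row per hobby, then an ordinary equality filter
gaming_ids = df.explode("hobbies").query("hobbies == 'gaming'")["id"]
out = df[df["id"].isin(gaming_ids)
         & df["address"].apply(lambda a: a["city"] == "Anytown")]
```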
2 Comments
- .apply() is generally avoided because it's slow on large datasets (since it operates at the Python layer instead of taking advantage of faster code at the C layer). If your datasets are small, have at it!
- df.pipe(lambda) is perfectly idiomatic in the right context.