
How can I query columns that are lists or dicts? Here is some basic JSON-like data.

[
  {
    "id": 1,
    "name": "John Doe",
    "age": 30,
    "email": "[email protected]",
    "isStudent": false,
    "hobbies": ["reading", "gaming", "hiking"],
    "address": {
      "street": "123 Main St",
      "city": "Anytown",
      "country": "USA"
    }
  },
  {
    "id": 2,
    "name": "Jane Smith",
    "age": 25,
    "email": "[email protected]",
    "isStudent": true,
    "hobbies": ["painting", "yoga", "photography"],
    "address": {
      "street": "456 Oak Ave",
      "city": "Somewhere",
      "country": "Canada"
    }
  },
  {
    "id": 3,
    "name": "Bob Johnson",
    "age": 42,
    "email": "[email protected]",
    "isStudent": false,
    "hobbies": ["cooking", "fishing", "gardening"],
    "address": {
      "street": "789 Pine Rd",
      "city": "Otherville",
      "country": "UK"
    }
  },
  {
    "id": 4,
    "name": "Alice Chen",
    "age": 28,
    "email": "[email protected]",
    "isStudent": false,
    "hobbies": ["coding", "chess", "traveling"],
    "address": {
      "street": "321 Maple Blvd",
      "city": "Techcity",
      "country": "USA"
    }
  },
  {
    "id": 5,
    "name": "David Wilson",
    "age": 19,
    "email": "[email protected]",
    "isStudent": true,
    "hobbies": ["basketball", "music", "movies"],
    "address": {
      "street": "654 Cedar Ln",
      "city": "University Town",
      "country": "Australia"
    }
  }
]

df = pd.read_json("file.json")

For example, I want to find anyone who put gaming as a hobby and whose city is Anytown.

My initial instinct was to either explode the list indices and dict items into new columns, or to read the value of the cell as a string for searching.
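Roughly, the "read the cell as a string" instinct would look something like this (a sketch, with df loaded as above), but it feels fragile since it's just substring matching:

# cast the list/dict cells to plain strings, then do text searches on them
as_text = df[['hobbies', 'address']].astype(str)
mask = (as_text['hobbies'].str.contains('gaming')
        & as_text['address'].str.contains('Anytown'))
df[mask]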

3 Comments

Please add your Pandas code for reference. I ask because this example doesn't contain any columns per se.
Could you also add an example query, one for hobbies and one for address? Conceptually, there's a big difference between searching for people who put cooking as a hobby vs people who put cooking as the first choice out of three.
Ope, I just noticed you put those details in your answer, so I went ahead and added them to the question for you :)

3 Answers


Why don't you load your data with pd.json_normalize:

df = pd.json_normalize(data) # assuming data is a python object
# if a string, use json.loads
# import json
# df = pd.json_normalize(json.loads(data))

This will directly expand the dictionary keys as new columns (address.street/address.city/address.country):

 id name age email isStudent hobbies address.street address.city address.country
0 1 John Doe 30 [email protected] False [reading, gaming, hiking] 123 Main St Anytown USA
1 2 Jane Smith 25 [email protected] True [painting, yoga, photography] 456 Oak Ave Somewhere Canada
2 3 Bob Johnson 42 [email protected] False [cooking, fishing, gardening] 789 Pine Rd Otherville UK
3 4 Alice Chen 28 [email protected] False [coding, chess, traveling] 321 Maple Blvd Techcity USA
4 5 David Wilson 19 [email protected] True [basketball, music, movies] 654 Cedar Ln University Town Australia
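With the dict flattened, the question's example filter (gaming as a hobby, city Anytown) can then be written against ordinary columns; only the list column still needs an element-wise check. A sketch:

out = df[
    df['hobbies'].apply(lambda h: 'gaming' in h)   # hobbies is still a list column
    & df['address.city'].eq('Anytown')             # the flattened dict key is now a plain column
]

Note that the flattened key is referenced by its full dotted name, 'address.city'.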

An alternative might be to use the str accessor (warning, this "hack" might change in the future*):

df = pd.read_json("file.json")  # as in the question; address stays a dict column
df.address.str['street']

Output:

0 123 Main St
1 456 Oak Ave
2 789 Pine Rd
3 321 Maple Blvd
4 654 Cedar Ln
Name: address, dtype: object

For example:

out = df[df.address.str['street'].str.contains('Ave')]

Output:

 id name age email isStudent hobbies address
1 2 Jane Smith 25 [email protected] True [painting, yoga, photography] {'street': '456 Oak Ave', 'city': 'Somewhere', 'country': 'Canada'}

* "Generally speaking, the .str accessor is intended to work only on strings. With very few exceptions, other uses are not supported, and may be disabled at a later point."



I'd suggest two paths:

  • Preprocess the JSON before it comes into Pandas (sketched just below)
  • Use a library dedicated to ragged/non-rectangular data, in this case akimbo
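For the first option, preprocessing can be as simple as filtering the parsed records before building the DataFrame (a sketch, reusing the question's file.json):

import json
import pandas as pd

# parse and filter the raw records before they ever become a DataFrame
with open("file.json") as f:
    data = json.load(f)

wanted = [
    rec for rec in data
    if 'gaming' in rec['hobbies'] and rec['address']['city'] == 'Anytown'
]
df = pd.DataFrame(wanted)

For the second option, akimbo adds an .ak accessor: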
In [64]: import akimbo.pandas
In [65]: df.hobbies.ak
Out[65]: <Array [['reading', 'gaming', 'hiking'], ..., [...]] type='5 * var * ?string'>
In [66]: df.hobbies.ak[1]
Out[66]:
0 painting
1 yoga
2 photography
dtype: string[pyarrow]
In [67]: df.address.ak['street']
Out[67]:
0 123 Main St
1 456 Oak Ave
2 789 Pine Rd
3 321 Maple Blvd
4 654 Cedar Ln
dtype: string[pyarrow]
# match a substring
In [71]: df.address.ak['street'].ak.str.match_substring('Ave')
Out[71]:
0 False
1 True
2 False
3 False
4 False
dtype: bool[pyarrow]
In [72]: df.loc[df.address.ak['street'].ak.str.match_substring('Ave')]
Out[72]:
 id name age email isStudent hobbies address
1 2 Jane Smith 25 [email protected] True [painting, yoga, photography] {'street': '456 Oak Ave', 'city': 'Somewhere',...

Have a look at the docs for the various ways of using the library; it is quite powerful. It is built on Awkward Array, so you have access to a plethora of methods via the .ak attribute.


4 Comments

Nice replacement if someday the behavior of str changes.
Thanks for sharing. How would you actually query it for specific values though?
You'll have to give some more details on what you mean by specific values.
Added an example for searching for a value, if that's what you have in mind.

The Pandas community generally steers away from apply/lambda, but this is actually quite simple.

It's also more flexible than arbitrarily exploding the columns (e.g. hobby_1, hobby_2, etc.), because I can have lists of different lengths without sparsity, and I can add or remove address fields as needed (e.g. zip+4 or delivery_location).

df = pd.read_json("file.json")
df[
 # Working with lists
 df['hobbies'].apply(lambda x: 'gaming' in x) &
 # Working with dicts
 df['address'].apply(lambda x: x['city'] == 'Anytown')
]

Result

 id name age email isStudent hobbies address
0 1 John Doe 30 [email protected] False [reading, gaming, hiking] {'street': '123 Main St', 'city': 'Anytown', '...
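If the dataset grows and apply becomes a bottleneck, the same filter can be expressed in a more vectorized way by exploding the list column and normalizing the dict column (a sketch, using the same column names):

has_gaming = df['hobbies'].explode().eq('gaming').groupby(level=0).any()
in_anytown = pd.json_normalize(df['address'].tolist())['city'].set_axis(df.index).eq('Anytown')
df[has_gaming & in_anytown]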

2 Comments

.apply() is generally avoided because it's slow on large datasets (since it operates at the Python layer instead of taking advantage of faster code at the C layer). If your datasets are small, have at it!
"generally steers away from the use of apply/ lambda" - Maybe this is a minor thing, but I'm not sure why you say lambda here; I haven't seen anyone steer away from it in my experience, e.g. df.pipe(lambda) is perfectly idiomatic in the right context.
