1

Trying to load a nested JSON file into Duckdb.
Here is the JSON contents:

{
 "MainLevel":[
 {
 "More":{
 }
 },
 {
 "More":{
 "Level2":[
 {
 "Field1":"A"
 }
 ]
 }
 }
 ]
}

This is the Duckdb SQL script I am using to load:

CREATE TABLE duckdbtest1.main.nested_JSON AS
SELECT 
 Level2.Field1,
FROM 
 (SELECT unnest(MainLevel) as MainLevel
 FROM read_JSON_auto('C:\\jsonfiles\\*.json', maximum_object_size = 999999999))
 as MainLevel,
 unnest(MainLevel.More.Level2) as Level2;

The error I get is:

SQL Error: java.sql.SQLException: Binder Error: Table "Level2" does not have a column named "Field1"
LINE 3: Level2.Field1,

I have done similar things in the past with no issues.

The only thing I can think of is that the first "More" has no "Level2".

Could that be causing an issue? Ideas of how to get this to work?

Have tried various combinations of unnest and left joins on unnest with no success.

Here is my hacky trial-and-error (I obviously have no idea what I'm doing):

1)

SELECT 
 *
FROM 
 (SELECT unnest(MainLevel) as MainLevel
 FROM read_JSON_auto('C:\\jsonfiles\\*.json', maximum_object_size = 999999999))
 as MainLevel,
 unnest(MainLevel.More.Level2) as Level2;

Gives:

MainLevel unnest
{More={Level2=[{Field1=A}]}} {Field1=A}

2)

SELECT 
 Mainlevel
FROM 
 (SELECT unnest(MainLevel) as MainLevel
 FROM read_JSON_auto('C:\\jsonfiles\\*.json', maximum_object_size = 999999999))
 as MainLevel,
 unnest(MainLevel.More.Level2) as Level2;

Gives:

MainLevel
{More={Level2=[{Field1=A}]}}

3)

SELECT 
 Mainlevel
FROM 
 (SELECT unnest(MainLevel) as MainLevel
 FROM read_JSON_auto('C:\\jsonfiles\\*.json', maximum_object_size = 999999999))
 as MainLevel;
 

Gives:

MainLevel
{More={Level2=null}}
{More={Level2=[{Field1=A}]}}

4)

SELECT 
 Mainlevel
FROM 
 (SELECT unnest(MainLevel, recursive := true) as MainLevel
 FROM read_JSON_auto('C:\\jsonfiles\\*.json', maximum_object_size = 999999999))
 as MainLevel;
 

Gives:

Mainlevel
{Level2=null}
{Level2=[{Field1=A}]}
peak
119k21 gold badges185 silver badges218 bronze badges
asked Sep 27, 2024 at 22:43
6
  • It works on older versions. If you remove the CREATE TABLE / SELECT Level2 and see the full output - you can see the 2nd unnest() call is ignoring the as Level2 name - which looks like a bug? Can you use unnest(..., recursive := true)? Commented Sep 27, 2024 at 23:23
  • Yeah, see attempt 1, the as Level2 is being ignored and it has a default "unnest(..)" name instead. This looks like a bug to me? Are you using a new version of DuckDB? 1.1.0? Commented Sep 28, 2024 at 0:29
  • It also means that SELECT "unnest".Field1 should work as it literally has the name unnest Commented Sep 28, 2024 at 0:31
  • Need to update to newest duckdb. Will try again and report back. Commented Sep 28, 2024 at 1:10
  • 1
    I'm using Python but your query works for me in 1.0 - this buggy behaviour happens in 1.1.0 and 1.1.1 (the nightly build) - so it seems to me that this is a regression bug that is yet to be reported to the DuckDB team. Commented Sep 28, 2024 at 1:23

2 Answers 2

0

There seems to be a behaviour change starting in DuckDB 1.1.0

duckdb.sql("""
from (
 from values ([{'More': {'Level2': [{'Field1': 'A'}]}}]) as tbl(MainLevel)
 select unnest(MainLevel) as MainLevel
), unnest(MainLevel.More.Level2) as FOOBAR
""") # ^^^^^^^^^ this was always "ignored" in all versions

Previously, it seems the unnest was inherting the name from the final identifier in the path.

DuckDB 1.0.0

┌──────────────────────────────────────────────────────┬────────────────────────┐
│ MainLevel │ Level2 │
│ struct(more struct(level2 struct(field1 varchar)[])) │ struct(field1 varchar) │
├──────────────────────────────────────────────────────┼────────────────────────┤
│ {'More': {'Level2': [{'Field1': A}]}} │ {'Field1': A} │
└──────────────────────────────────────────────────────┴────────────────────────┘

This made it seem like your as Level2 was doing something, but it was infact a "no-op".

DuckDB 1.1.0

┌──────────────────────────────────────────────────────┬────────────────────────┐
│ MainLevel │ unnest │
│ struct(more struct(level2 struct(field1 varchar)[])) │ struct(field1 varchar) │
├──────────────────────────────────────────────────────┼────────────────────────┤
│ {'More': {'Level2': [{'Field1': A}]}} │ {'Field1': A} │
└──────────────────────────────────────────────────────┴────────────────────────┘

Now you end up with the literal name unnest (I'm not sure if this is intended or not?)

As for the structure of your query: from (subquery), unnest()

That would read to me as from (subquery) select ... in which case the as ... applies.

duckdb.sql("""
from (
 from values ([{'More': {'Level2': [{'Field1': 'A'}]}}]) as tbl(MainLevel)
 select unnest(MainLevel) as MainLevel
)
select unnest(MainLevel.More.Level2) as FOOBAR
""")
┌────────────────────────┐
│ FOOBAR │
│ struct(field1 varchar) │
├────────────────────────┤
│ {'Field1': A} │
└────────────────────────┘
answered Sep 28, 2024 at 11:41
Sign up to request clarification or add additional context in comments.

Comments

0

An unproblematic way to express the query would be:

SELECT Level2.Field1
FROM (SELECT unnest(MainLevel.More.Level2) as Level2
 FROM (SELECT unnest(MainLevel) as MainLevel,
 FROM read_JSON_auto('input.json') ));

The hierarchy of SELECT/FROM statements allows DuckDB to work its unnest magic in a straightforward way. In particular, since unnest(NULL) has the effect of zapping a row, by the time the SELECT Level2.Field1 statement starts executing, the row that would otherwise have caused it to fail has been removed.

By contrast, the statement having the form:

FROM 
 (SELECT unnest(MainLevel) as MainLevel
 FROM read_JSON_auto('input.json') as MainLevel,
 unnest(MainLevel.More.Level2) as Level2

seems to be at best dubious: notice that in the following, the result does not include any column labeled Level2:


D select ({'a':1}) as MainLevel, unnest(MainLevel) as Level2;
┌───────────────────┬───────┐
│ MainLevel │ a │
│ struct(a integer) │ int32 │
├───────────────────┼───────┤
│ {'a': 1} │ 1 │
└───────────────────┴───────┘
answered Sep 28, 2024 at 20:37

Comments

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.