Trying to load a nested JSON file into Duckdb.
Here is the JSON contents:
{
"MainLevel":[
{
"More":{
}
},
{
"More":{
"Level2":[
{
"Field1":"A"
}
]
}
}
]
}
This is the Duckdb SQL script I am using to load:
CREATE TABLE duckdbtest1.main.nested_JSON AS
SELECT
Level2.Field1,
FROM
(SELECT unnest(MainLevel) as MainLevel
FROM read_JSON_auto('C:\\jsonfiles\\*.json', maximum_object_size = 999999999))
as MainLevel,
unnest(MainLevel.More.Level2) as Level2;
The error I get is:
SQL Error: java.sql.SQLException: Binder Error: Table "Level2" does not have a column named "Field1"
LINE 3: Level2.Field1,
I have done similar things in the past with no issues.
The only thing I can think of is that the first "More" has no "Level2".
Could that be causing an issue? Ideas of how to get this to work?
Have tried various combinations of unnest and left joins on unnest with no success.
Here is my hacky trial-and-error (I obviously have no idea what I'm doing):
1)
SELECT
*
FROM
(SELECT unnest(MainLevel) as MainLevel
FROM read_JSON_auto('C:\\jsonfiles\\*.json', maximum_object_size = 999999999))
as MainLevel,
unnest(MainLevel.More.Level2) as Level2;
Gives:
| MainLevel | unnest |
|---|---|
| {More={Level2=[{Field1=A}]}} | {Field1=A} |
2)
SELECT
Mainlevel
FROM
(SELECT unnest(MainLevel) as MainLevel
FROM read_JSON_auto('C:\\jsonfiles\\*.json', maximum_object_size = 999999999))
as MainLevel,
unnest(MainLevel.More.Level2) as Level2;
Gives:
| MainLevel |
|---|
| {More={Level2=[{Field1=A}]}} |
3)
SELECT
Mainlevel
FROM
(SELECT unnest(MainLevel) as MainLevel
FROM read_JSON_auto('C:\\jsonfiles\\*.json', maximum_object_size = 999999999))
as MainLevel;
Gives:
| MainLevel |
|---|
| {More={Level2=null}} |
| {More={Level2=[{Field1=A}]}} |
4)
SELECT
Mainlevel
FROM
(SELECT unnest(MainLevel, recursive := true) as MainLevel
FROM read_JSON_auto('C:\\jsonfiles\\*.json', maximum_object_size = 999999999))
as MainLevel;
Gives:
| Mainlevel |
|---|
| {Level2=null} |
| {Level2=[{Field1=A}]} |
2 Answers 2
There seems to be a behaviour change starting in DuckDB 1.1.0
duckdb.sql("""
from (
from values ([{'More': {'Level2': [{'Field1': 'A'}]}}]) as tbl(MainLevel)
select unnest(MainLevel) as MainLevel
), unnest(MainLevel.More.Level2) as FOOBAR
""") # ^^^^^^^^^ this was always "ignored" in all versions
Previously, it seems the unnest was inherting the name from the final identifier in the path.
DuckDB 1.0.0
┌──────────────────────────────────────────────────────┬────────────────────────┐
│ MainLevel │ Level2 │
│ struct(more struct(level2 struct(field1 varchar)[])) │ struct(field1 varchar) │
├──────────────────────────────────────────────────────┼────────────────────────┤
│ {'More': {'Level2': [{'Field1': A}]}} │ {'Field1': A} │
└──────────────────────────────────────────────────────┴────────────────────────┘
This made it seem like your as Level2 was doing something, but it was infact a "no-op".
DuckDB 1.1.0
┌──────────────────────────────────────────────────────┬────────────────────────┐
│ MainLevel │ unnest │
│ struct(more struct(level2 struct(field1 varchar)[])) │ struct(field1 varchar) │
├──────────────────────────────────────────────────────┼────────────────────────┤
│ {'More': {'Level2': [{'Field1': A}]}} │ {'Field1': A} │
└──────────────────────────────────────────────────────┴────────────────────────┘
Now you end up with the literal name unnest (I'm not sure if this is intended or not?)
As for the structure of your query: from (subquery), unnest()
That would read to me as from (subquery) select ... in which case the as ... applies.
duckdb.sql("""
from (
from values ([{'More': {'Level2': [{'Field1': 'A'}]}}]) as tbl(MainLevel)
select unnest(MainLevel) as MainLevel
)
select unnest(MainLevel.More.Level2) as FOOBAR
""")
┌────────────────────────┐
│ FOOBAR │
│ struct(field1 varchar) │
├────────────────────────┤
│ {'Field1': A} │
└────────────────────────┘
Comments
An unproblematic way to express the query would be:
SELECT Level2.Field1
FROM (SELECT unnest(MainLevel.More.Level2) as Level2
FROM (SELECT unnest(MainLevel) as MainLevel,
FROM read_JSON_auto('input.json') ));
The hierarchy of SELECT/FROM statements allows DuckDB to work its unnest magic in a straightforward way. In particular, since unnest(NULL) has the effect of zapping a row, by the time the
SELECT Level2.Field1 statement starts executing, the row that would otherwise have caused it to fail has been removed.
By contrast, the statement having the form:
FROM
(SELECT unnest(MainLevel) as MainLevel
FROM read_JSON_auto('input.json') as MainLevel,
unnest(MainLevel.More.Level2) as Level2
seems to be at best dubious: notice that in the following, the result does not include any column labeled Level2:
D select ({'a':1}) as MainLevel, unnest(MainLevel) as Level2;
┌───────────────────┬───────┐
│ MainLevel │ a │
│ struct(a integer) │ int32 │
├───────────────────┼───────┤
│ {'a': 1} │ 1 │
└───────────────────┴───────┘
unnest()call is ignoring theas Level2name - which looks like a bug? Can you useunnest(..., recursive := true)?as Level2is being ignored and it has a default "unnest(..)" name instead. This looks like a bug to me? Are you using a new version of DuckDB? 1.1.0?SELECT "unnest".Field1should work as it literally has the nameunnest1.0- this buggy behaviour happens in1.1.0and1.1.1(the nightly build) - so it seems to me that this is a regression bug that is yet to be reported to the DuckDB team.