Crashes while handling non-select result set (DataFrame) #428

Open

Labels

needs triage

Description

joaoe opened on Apr 15, 2026

What happens?

Hi.

Problem

Calling spark.sql("non-select"), where non-select is any SQL statement that is not a SELECT (e.g., USE, INSERT, DROP, CREATE, ...), correctly returns an empty DataFrame, which matches the behavior of the pyspark API.

However, that object crashes when any of its APIs is used, because the internal relation object is None. The same applies when trying to create an empty DataFrame without columns.
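The failure mode can be illustrated with a toy model. The class below is hypothetical and is not the actual duckdb wrapper; it only assumes, as described above, that the wrapper stores None when a statement produces no result set and then dereferences it without a check.

```python
class ToyDataFrame:
    """Hypothetical stand-in for a DataFrame wrapper (not duckdb's real class)."""

    def __init__(self, relation):
        # Assumption: a non-select statement yields no relation, so None is stored.
        self.relation = relation

    @property
    def columns(self):
        # No None check here, which is the kind of access that fails.
        return list(self.relation.columns)


sdf = ToyDataFrame(None)
try:
    sdf.columns
    error_message = ""
except AttributeError as exc:
    # Accessing an attribute on None raises AttributeError.
    error_message = str(exc)
```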

Fix

I think the best fix would require fixing the underlying C++ Relation object from the duckdb C++ library to support an empty relation without columns. There are also a couple of other fixes, like allowing the underlying duckdb.struct_type() to have no fields. That would make the low-level API more robust and require less patching in the Python layer.

Then the DuckDBPyConnection::RunQuery function needs to return an empty relation for non-select statements, instead of nullptr. All these fixes felt a bit overwhelming, so I won't submit a patch.
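The idea of returning an empty relation instead of a null value could look roughly like the sketch below. It is written in Python only for brevity; the real change proposed here is in the C++ layer. EmptyRelation, run_query and the execute callable are all hypothetical names used for illustration and are not part of duckdb.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmptyRelation:
    """Hypothetical relation with zero columns and zero rows."""

    columns: tuple = ()

    def fetchall(self):
        # An empty relation has no rows to return.
        return []


def run_query(sql, execute):
    """Always return a relation-like object, never None.

    `execute` is a hypothetical callable that returns a relation-like
    object for statements with a result set, and None otherwise.
    """
    result = execute(sql)
    return result if result is not None else EmptyRelation()


# Simulate a non-select statement whose execution yields no result set.
rel = run_query("USE my_db", lambda sql: None)
columns = list(rel.columns)
rows = rel.fetchall()
```

With a sentinel like this, callers no longer need a None check before reading columns or rows, which is the robustness gain the proposed low-level fix is after.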

To Reproduce

Testcase. All this works with Spark.

import pytest


@pytest.mark.parametrize("mode", ["pandas", "list", "non-select"])
def test_empty_sdf(spark_session_g, mode):
    from pyspark.sql import functions as f
    from pyspark.sql import types as t
    import pandas as pd

    spark = spark_session_g
    # Build an empty, column-less DataFrame in three different ways.
    if mode == "pandas":
        sdf = spark.createDataFrame(pd.DataFrame(), t.StructType([]))
    elif mode == "list":
        sdf = spark.createDataFrame([], t.StructType([]))
    else:
        curr_db = spark.catalog.currentDatabase()
        sdf = spark.sql(f"USE {curr_db}")  # non-result set query
    assert sdf.schema == t.StructType([])
    assert sdf.columns == []
    assert sdf.collect() == []
    assert sdf.toPandas().empty
    assert sdf.toArrow().shape == (0, 0)
    sdf.createOrReplaceTempView("my_vv1")
    assert spark.sql("SELECT * from my_vv1").toArrow().shape == (0, 0)
    sdf.show()  # no-op, no crash
    assert sdf.withColumn("col1", f.lit(1)).columns == ["col1"]
    assert sdf.withColumns({"col1": f.lit(1)}).columns == ["col1"]
    assert sdf.drop("noop").columns == []

OS:

Any

DuckDB Package Version:

Main branch

Python Version:

3.12

Full Name:

João Eiras

Affiliation:

private

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a source build

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

Yes, I have

Did you include all relevant configuration to reproduce the issue?

Yes, I have

