Open
@jorisvandenbossche
Description
This might eventually be something to support in PyArrow, but I think it is good to have an issue about this on the pandas side as well (and I was surprised to not find an existing one).
When trying to write the result of pd.cut(), which returns a column of categorical dtype with Interval categories, to Parquet, you get the following error:
>>> df = pd.DataFrame({"col": np.random.randn(100)})
>>> df["bins"] = pd.cut(df["col"], bins=10)
>>> df.to_parquet("test_category_interval.parquet")
...
File ~/conda/envs/dev/lib/python3.11/site-packages/pyarrow/parquet/core.py:1115, in ParquetWriter.write_table(self, table, row_group_size)
1110 msg = ('Table schema does not match schema used to create file: '
1111 '\ntable:\n{!s} vs. \nfile:\n{!s}'
1112 .format(table.schema, self.schema))
1113 raise ValueError(msg)
-> 1115 self.writer.write_table(table, row_group_size=row_group_size)
File ~/conda/envs/dev/lib/python3.11/site-packages/pyarrow/_parquet.pyx:2226, in pyarrow._parquet.ParquetWriter.write_table()
File ~/conda/envs/dev/lib/python3.11/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()
ArrowNotImplementedError: Unsupported cast from dictionary<values=extension<pandas.interval<ArrowIntervalType>>, indices=int8, ordered=1> to struct using function cast_struct
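Until this is supported, one possible workaround is to flatten the interval categories into plain numeric columns before writing. The following is only a sketch of that idea, not something pandas or PyArrow does for you: the column names `bins_left` and `bins_right` are made up for illustration, and it assumes the default `right=True` closedness of `pd.cut()`, which is not stored in the output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": np.random.randn(100)})
df["bins"] = pd.cut(df["col"], bins=10)

# The categories are an IntervalIndex; the codes map each row to one of them.
cats = df["bins"].cat.categories
codes = df["bins"].cat.codes.to_numpy()
valid = codes >= 0  # a code of -1 marks a missing value

# Hypothetical column names: store each row's interval bounds as floats.
out = df.drop(columns="bins")
out["bins_left"] = np.where(valid, cats.left.to_numpy()[codes], np.nan)
out["bins_right"] = np.where(valid, cats.right.to_numpy()[codes], np.nan)
```

The resulting `out` frame holds only float columns, so `out.to_parquet(...)` should not hit the unsupported cast above. The trade-off is that the categorical and interval dtypes are lost and have to be rebuilt after reading the file back.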