The metadata JSON file contains the schema for all snapshots. I have a few tables with thousands of columns, and the metadata JSON quickly grows to 1 GB, which impacts the Trino coordinator. I have to manually remove the schema for older snapshots.
I already run maintenance tasks (via Spark) to expire snapshots, but this does not remove the schemas of older snapshots from the latest metadata.json file.
How can this be fixed?
1 Answer
The clean_expired_metadata argument was added to the expire_snapshots procedure in Iceberg 1.10.0.
When true, cleans up metadata such as partition specs and schemas that are no longer referenced by snapshots.
Example:
CALL {catalog}.system.expire_snapshots(table => '{table_name}', clean_expired_metadata => true)
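To check whether the cleanup had any effect, you can compare the schemas listed in metadata.json with the ones that snapshots still point at. The following is only a rough sketch, not Iceberg's own logic: it assumes the field names from the Iceberg table spec (`schemas`, `schema-id`, `current-schema-id`, `snapshots`), and the helper name `unreferenced_schema_ids` is made up for illustration.

```python
import json


def unreferenced_schema_ids(metadata: dict) -> set:
    """Return schema IDs listed in the metadata that no snapshot references.

    Hypothetical helper; field names are assumed from the Iceberg table spec.
    The current schema is always treated as referenced.
    """
    all_ids = {schema["schema-id"] for schema in metadata.get("schemas", [])}
    referenced = {
        snapshot["schema-id"]
        for snapshot in metadata.get("snapshots", [])
        if "schema-id" in snapshot  # older snapshots may not record a schema ID
    }
    if "current-schema-id" in metadata:
        referenced.add(metadata["current-schema-id"])
    return all_ids - referenced


# Small in-memory stand-in for a real metadata.json, parsed from a JSON string.
sample = json.loads("""
{
  "current-schema-id": 2,
  "schemas": [{"schema-id": 0}, {"schema-id": 1}, {"schema-id": 2}],
  "snapshots": [{"snapshot-id": 10, "schema-id": 2}]
}
""")

print(sorted(unreferenced_schema_ids(sample)))  # prints [0, 1]
```

For a real table, load the latest metadata file with `json.load` and pass the resulting dict to the function. If the result is still non-empty after running expire_snapshots with clean_expired_metadata => true, the old schemas were not removed.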