Newest 'apache-iceberg' Questions

331 questions

1 vote

1 answer

30 views

How to keep iceberg metadata.json size in check

The metadata JSON file contains the schema for all snapshots. I have a few tables with thousands of columns, and the metadata JSON quickly grows to 1 GB, which impacts the Trino coordinator. I have to ...

apache-iceberg

Dev

13.8k

asked Dec 20, 2025 at 7:06

0 votes

1 answer

49 views

Iceberg field-id values - can I specify my own when creating a table?

I'm using AWS Glue Data Catalog to store Apache Iceberg tables. I use the Iceberg Java SDK to define the tables there. When I create an Iceberg table, I provide field-id values associated with each ...

pedorro

3,419

asked Dec 16, 2025 at 21:24

0 votes

0 answers

62 views

Flink: Could not initialize class org.apache.flink.runtime.util.HadoopUtils (Error when creating catalog with iceberg type in Flink)

I am trying to run a very simple Flink (Java) job that: creates an Iceberg JDBC catalog backed by PostgreSQL; sets the Iceberg warehouse to the Hadoop FileSystem. The job is built successfully with ...

Tai Lu

asked Dec 11, 2025 at 4:27

Advice

4 votes

1 reply

78 views

Parquet VS ORC In Iceberg

Hi, I have lately been interested in learning Iceberg. There is something I was not able to understand, so I thought I would ask here. I really want to know why Apache Parquet is the native file format used when ...

katz daniel

asked Nov 24, 2025 at 15:00

0 votes

0 answers

106 views

Timestamp precision (3) not supported for Iceberg. Use "timestamp(6)"

I am trying to create an incremental dbt model. The output is an Iceberg-format lakehouse on which I perform CRUD operations using Trino. {{ config( materialized='incremental', incremental_strategy='...

Kumar Sambhav

7,775

asked Nov 18, 2025 at 10:30

Advice

0 votes

0 replies

91 views

Flink Iceberg job loses authentication with REST catalog (Keycloak OAuth2) after short time

I’m running a Flink DataStream job that reads events from a Kafka topic and writes them into an Apache Iceberg table using the REST catalog (Lakekeeper). Authentication to the REST catalog is ...

Andrey

asked Nov 10, 2025 at 7:37

0 votes

0 answers

102 views

How to do bucket logic in partition for Iceberg Table using AWS Glue?

# ===================================================== # 🧊 Step 4. Write Data to Iceberg Table (Glue Catalog) # ===================================================== table_name = "glue_catalog....

Mohammed Suhail

asked Nov 4, 2025 at 17:35

0 votes

0 answers

88 views

How to Check if a Query Touches Data Files or just Uses Manifests and Metadata in Iceberg

I created a table as follows: CREATE TABLE IF NOT EXISTS raw_data.civ ( date timestamp, marketplace_id int, ... some more columns ) USING ICEBERG PARTITIONED BY ( marketplace_id, ...

shiva

2,781

asked Oct 25, 2025 at 15:11

1 vote

2 answers

115 views

Spark OutOfMemoryError when reading large JSON file (3.5GB) as wholeText due to colon in path

I’m trying to load JSON data into an Iceberg table. The source files are named with timestamps that include colons (:), so I need to read them as plain text first. Additionally, each file is in a ...

Raj Mhatre

asked Oct 25, 2025 at 4:56

0 votes

0 answers

100 views

Unexpected Write Behavior when using MERGE INTO/INSERT INTO Iceberg Spark Queries

I am observing different write behaviors when executing queries on EMR Notebook (correct behavior) vs when using spark-submit to submit a spark application to EMR Cluster (incorrect behavior). When I ...

shiva

2,781

asked Oct 21, 2025 at 20:58

0 votes

1 answer

109 views

Snowflake Iceberg Immutable data getting rewritten (which violates snapshot level isolation guarantee)

There appears to be a bug in Snowflake's implementation of Iceberg tables, which is evident if you try to capture change data. This seems to be a result of a more fundamental problem where a single parquet ...

Sumeet Keswani

asked Oct 17, 2025 at 16:30

0 votes

0 answers

63 views

Getting error while deploying IcebergSink connector

I am trying to deploy the IcebergSinkConnector, but when the connector tries to write data I get this error: Error: 'write.object-storage.path' has been depricated and will be removed in 2.0, use 'write.data....

Krishna Mane

asked Oct 15, 2025 at 5:22

0 votes

1 answer

205 views

Unable to Create Tables in Iceberg via Trino with MinIO and Iceberg REST Catalog

I'm setting up a data lake with the following stack using Docker Compose: MinIO + Iceberg REST + Trino + Superset. docker-compose.yml version: "3.9" services: # ---------------- MinIO --...

Jisson

3,735

asked Sep 17, 2025 at 6:25

0 votes

0 answers

69 views

How do you expire snapshot from Iceberg Glue Table

I have one Iceberg table in the Glue Catalog. I am unable to run a SELECT * because one of the metadata files is missing. I am trying to point the table to the latest metadata file. How can I do that? I am using EMR 7.7 with ...

user3858193

1,568

asked Aug 18, 2025 at 16:41

0 votes

1 answer

99 views

Unable to run trino for iceberg table: Invalid value 'hadoop' for type CatalogType

I have generated iceberg table with spark java program. Now I want to access it via trino. My docker compose is: version: '3.8' services: trino: image: trinodb/trino:latest container_name: ...

Mandroid

7,818

asked Aug 17, 2025 at 10:08
