331 questions
1 vote, 1 answer, 30 views
How to keep Iceberg metadata.json size in check
The metadata JSON file contains the schema for all snapshots. I have a few tables with thousands of columns, and the metadata JSON quickly grows to 1 GB, which impacts the Trino coordinator. I have to ...
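The Iceberg table spec keeps every historical schema in full in the metadata's `schemas` list, and each snapshot refers to one of them by `schema-id`. With thousands of columns, every schema change therefore adds another full copy of the column list. Below is a minimal sketch, assuming a parsed metadata.json dict, that measures which top-level sections account for the bytes. The helper names `section_sizes` and `wide_schema` and the sample table are hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple


def section_sizes(metadata: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Return (top-level key, size in bytes) pairs, largest first.

    Each section is re-serialized compactly, so the numbers approximate the
    on-disk file rather than reproduce it byte for byte.
    """
    sizes = [
        (key, len(json.dumps(value, separators=(",", ":")).encode("utf-8")))
        for key, value in metadata.items()
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


def wide_schema(schema_id: int, n_columns: int) -> Dict[str, Any]:
    """Hypothetical wide schema in the spec's JSON shape (all string columns)."""
    return {
        "type": "struct",
        "schema-id": schema_id,
        "fields": [
            {"id": i, "name": f"col_{i}", "required": False, "type": "string"}
            for i in range(1, n_columns + 1)
        ],
    }


if __name__ == "__main__":
    # With a real table you would json.load() a local copy of metadata.json.
    # Here: a hypothetical table with 1,000 columns and 20 schema versions.
    metadata = {
        "format-version": 2,
        "current-schema-id": 19,
        "schemas": [wide_schema(i, 1000) for i in range(20)],
        "snapshots": [
            {"snapshot-id": i, "timestamp-ms": 1_700_000_000_000 + i, "schema-id": i}
            for i in range(20)
        ],
    }
    for key, size in section_sizes(metadata):
        print(f"{key:<20} {size:>12,} bytes")
```

In this made-up example `schemas` dominates by orders of magnitude, which is the pattern the question describes. Running the same measurement on a real file shows whether schemas, snapshots, or the logs are the part worth trimming.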
0 votes, 1 answer, 49 views
Iceberg field-id values - can I specify my own when creating a table?
I'm using AWS Glue Data Catalog to store Apache Iceberg tables. I use the Iceberg Java SDK to define the tables there. When I create an Iceberg table, I provide field-id values associated with each ...
0 votes, 0 answers, 62 views
Flink: Could not initialize class org.apache.flink.runtime.util.HadoopUtils (Error when creating catalog with iceberg type in Flink)
I am trying to run a very simple Flink (Java) job that:
- Creates an Iceberg JDBC catalog backed by PostgreSQL
- Sets the Iceberg warehouse to the Hadoop FileSystem
The job is built successfully with ...
Advice
4 votes, 1 reply, 78 views
Parquet vs. ORC in Iceberg
Hi, I have lately been interested in learning Iceberg. There is something I was not able to figure out, so I thought I would ask here.
I really want to know why Apache Parquet is the native file format used when ...
0 votes, 0 answers, 106 views
Timestamp precision (3) not supported for Iceberg. Use "timestamp(6)"
I am trying to create an incremental dbt model. The output is an Iceberg-based lakehouse on which I perform CRUD operations using Trino.
{{
config(
materialized='incremental',
incremental_strategy='...
Advice
0 votes, 0 replies, 91 views
Flink Iceberg job loses authentication with REST catalog (Keycloak OAuth2) after short time
I'm running a Flink DataStream job that reads events from a Kafka topic and writes them into an Apache Iceberg table using the REST catalog (Lakekeeper).
Authentication to the REST catalog is ...
0 votes, 0 answers, 102 views
How to apply bucket partitioning to an Iceberg table using AWS Glue?
# =====================================================
# Step 4. Write Data to Iceberg Table (Glue Catalog)
# =====================================================
table_name = "glue_catalog....
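With the Iceberg Spark integration, bucketing is normally declared as a partition transform in the table DDL, for example `PARTITIONED BY (bucket(16, id))`, and Iceberg computes each row's bucket at write time. Per the Iceberg table spec, the transform is `(murmur3_x86_32(v) & Integer.MAX_VALUE) % N`, where int and long values are hashed as 8-byte little-endian longs and strings as their UTF-8 bytes. The sketch below reproduces that for a few types so you can check which bucket a given key should land in; the function names are hypothetical and other types (decimal, date, timestamp, uuid) are not covered.

```python
import struct
from typing import Union

_C1 = 0xCC9E2D51
_C2 = 0x1B873593
_MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & _MASK32


def _mix_k(k: int) -> int:
    k = (k * _C1) & _MASK32
    k = _rotl32(k, 15)
    return (k * _C2) & _MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86_32, returned as a signed 32-bit int (like a Java int)."""
    h = seed & _MASK32
    n_blocks = len(data) // 4
    for i in range(n_blocks):
        (k,) = struct.unpack_from("<I", data, i * 4)
        h ^= _mix_k(k)
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & _MASK32
    tail = data[n_blocks * 4:]
    if tail:
        k = 0
        for shift, byte in enumerate(tail):  # remaining 1-3 bytes, little-endian
            k |= byte << (8 * shift)
        h ^= _mix_k(k)
    h ^= len(data)
    # Final avalanche (fmix32).
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & _MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & _MASK32
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


def iceberg_hash(value: Union[int, str, bytes]) -> int:
    """Hash a value the way the Iceberg spec describes for int/long/string/binary."""
    if isinstance(value, bool):
        # bool is an int subclass in Python; boolean is not a bucket source type.
        raise TypeError("booleans cannot be bucketed")
    if isinstance(value, int):
        # int and long are both hashed as an 8-byte little-endian long.
        return murmur3_32(struct.pack("<q", value))
    if isinstance(value, str):
        return murmur3_32(value.encode("utf-8"))
    if isinstance(value, (bytes, bytearray)):
        return murmur3_32(bytes(value))
    raise TypeError(f"unsupported type: {type(value).__name__}")


def bucket(num_buckets: int, value: Union[int, str, bytes]) -> int:
    """Bucket index for bucket(num_buckets, col): (hash & Integer.MAX_VALUE) % N."""
    return (iceberg_hash(value) & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    for v in (34, 35, "iceberg"):
        print(v, iceberg_hash(v), bucket(16, v))
```

The spec's hash appendix lists 34 → 2017239379 and "iceberg" → 1210000089 as reference values, which this sketch is meant to reproduce. Masking with `0x7FFFFFFF` before the modulo keeps negative hashes from producing negative bucket numbers, matching the Java behavior.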
0 votes, 0 answers, 88 views
How to Check if a Query Touches Data Files or just Uses Manifests and Metadata in Iceberg
I created a table as follows:
CREATE TABLE IF NOT EXISTS raw_data.civ (
date timestamp,
marketplace_id int,
... some more columns
)
USING ICEBERG
PARTITIONED BY (
marketplace_id,
...
1 vote, 2 answers, 115 views
Spark OutOfMemoryError when reading large JSON file (3.5GB) as wholeText due to colon in path
I'm trying to load JSON data into an Iceberg table. The source files are named with timestamps that include colons (:), so I need to read them as plain text first. Additionally, each file is in a ...
0 votes, 0 answers, 100 views
Unexpected Write Behavior when using MERGE INTO/INSERT INTO Iceberg Spark Queries
I am observing different write behaviors when executing queries on EMR Notebook (correct behavior) vs when using spark-submit to submit a spark application to EMR Cluster (incorrect behavior).
When I ...
0 votes, 1 answer, 109 views
Snowflake Iceberg immutable data getting rewritten (which violates the snapshot-level isolation guarantee)
There appears to be a bug in Snowflake's implementation of Iceberg tables, which becomes evident if you try to capture change data. This seems to be the result of a more fundamental problem where a single Parquet ...
0 votes, 0 answers, 63 views
Getting error while deploying IcebergSink connector
I am trying to deploy IcebergSinkConnector, but when the connector tries to write data, I get an error:
Error: 'write.object-storage.path' has been depricated and will be removed in 2.0, use 'write.data....
0 votes, 1 answer, 205 views
Unable to Create Tables in Iceberg via Trino with MinIO and Iceberg REST Catalog
I'm setting up a data lake with the following stack, using Docker Compose:
MinIO + Iceberg REST + Trino + Superset.
docker-compose.yml
version: "3.9"
services:
# ---------------- MinIO --...
0 votes, 0 answers, 69 views
How do you expire snapshots from an Iceberg Glue table?
I have an Iceberg table in the Glue Catalog. I am unable to run a SELECT * because one of the metadata files is missing. I am trying to point the table to the latest metadata file. How can I do that? I am using EMR 7.7 with ...
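Iceberg tables in the Glue catalog keep their current pointer in the Glue table parameter `metadata_location`. Metadata files written through a metastore-style catalog such as Glue are, by Iceberg's default naming, called `<version>-<uuid>.metadata.json` (with `.gz` before `.metadata.json` when compressed); treat that pattern as an assumption for your setup. The sketch below, built around a hypothetical helper `latest_metadata_file`, picks the highest-versioned file from a listing of the table's `metadata/` prefix. The highest-numbered file is not guaranteed to be a committed version (a failed commit can leave one behind), so the result is a candidate to inspect, not a guaranteed-safe new pointer.

```python
import re
from typing import Iterable, Optional

# Assumed naming pattern: "<version>-<uuid>.metadata.json", optionally
# "<version>-<uuid>.gz.metadata.json" for gzip-compressed metadata.
_METADATA_NAME = re.compile(
    r"^(?P<version>\d+)-[0-9a-fA-F-]{36}(?:\.gz)?\.metadata\.json$"
)


def latest_metadata_file(paths: Iterable[str]) -> Optional[str]:
    """Return the path with the highest version prefix, or None if none match.

    Versions are compared as integers, so "100000-..." beats "99999-..."
    even though it sorts lower as a string.
    """
    best_version, best_path = -1, None
    for path in paths:
        name = path.rstrip("/").rsplit("/", 1)[-1]
        match = _METADATA_NAME.match(name)
        if match is None:
            continue  # e.g. manifest lists (snap-*.avro) and manifests
        version = int(match.group("version"))
        if version > best_version:
            best_version, best_path = version, path
    return best_path


if __name__ == "__main__":
    prefix = "s3://example-bucket/db/tbl/metadata/"  # hypothetical location
    uid = "1b2c3d4e-0000-4000-8000-000000000001"  # made-up UUID
    listing = [
        prefix + "00001-" + uid + ".metadata.json",
        prefix + "00003-" + uid + ".metadata.json",
        prefix + "snap-42-1-" + uid + ".avro",
    ]
    print(latest_metadata_file(listing))
```

Parsing the version as an integer instead of sorting file names avoids a subtle ordering bug once a table passes 99,999 commits and the zero-padded prefix grows a sixth digit.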
0 votes, 1 answer, 99 views
Unable to run Trino for an Iceberg table: Invalid value 'hadoop' for type CatalogType
I have generated an Iceberg table with a Spark Java program. Now I want to access it via Trino.
My docker compose is:
version: '3.8'
services:
trino:
image: trinodb/trino:latest
container_name: ...