10 questions
0 votes, 0 answers, 102 views
How to do bucket logic in partition for Iceberg Table using AWS Glue?
# =====================================================
# Step 4. Write Data to Iceberg Table (Glue Catalog)
# =====================================================
table_name = "glue_catalog....
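The bucket transform asked about above is defined in the Iceberg table spec as `(murmur3_x86_32_hash(v) & Integer.MAX_VALUE) % N`, where int and long values are hashed as their 8-byte little-endian encoding. The pure-Python sketch below only illustrates how a row's ID maps to a bucket. It is not Glue or PyIceberg code, and `iceberg_bucket_long` is a hypothetical helper. In a real table you would declare the transform in the partition spec (for example `PARTITIONED BY (bucket(16, id))` in Spark SQL) and let the engine compute it. The check value 34 → 2017239379 is the hash test value listed in the spec.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit unsigned integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit; returns a signed 32-bit int like Java's int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h1 = seed & MASK32
    n_blocks = len(data) // 4
    # Body: mix each full 4-byte little-endian block into the state.
    for i in range(n_blocks):
        k1 = int.from_bytes(data[4 * i : 4 * i + 4], "little")
        k1 = _rotl32((k1 * c1) & MASK32, 15)
        k1 = (k1 * c2) & MASK32
        h1 = _rotl32(h1 ^ k1, 13)
        h1 = (h1 * 5 + 0xE6546B64) & MASK32
    # Tail: mix the remaining 0-3 bytes, if any.
    tail = data[4 * n_blocks :]
    if tail:
        k1 = int.from_bytes(tail, "little")
        k1 = _rotl32((k1 * c1) & MASK32, 15)
        h1 ^= (k1 * c2) & MASK32
    # Finalization: fold in the length and avalanche the bits.
    h1 ^= len(data)
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK32
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK32
    h1 ^= h1 >> 16
    return h1 - (1 << 32) if h1 & 0x80000000 else h1


def iceberg_bucket_long(value: int, num_buckets: int) -> int:
    """Hypothetical helper: bucket[N] for an int/long column value."""
    h = murmur3_x86_32(struct.pack("<q", value))  # 8-byte little-endian long
    return (h & 0x7FFFFFFF) % num_buckets


print(iceberg_bucket_long(34, 16))  # → 3
```

Because the hash spreads sequential IDs across all N buckets, the number of partitions stays fixed at N no matter how large the ID range grows; the trade-off is that bucketing helps equality lookups on the ID but not range scans.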
0 votes, 0 answers, 79 views
PyIceberg: S3 Tables and boto3 session
In PyIceberg 0.10.0, it is now possible to use a botocore session for a REST catalog, so:
import io
import os
import pandas as pd
import pyarrow as pa
from boto3 import Session
from pyiceberg....
0 votes, 1 answer, 274 views
PyIceberg append fails with "Signer set, but token is not available"
I'm working on writing data to an Iceberg table using PyIceberg (0.6.0+) with a Ceph S3-compatible backend, via Lakekeeper (https://github.com/lakekeeper/lakekeeper) as my REST catalog and metadata ...
0 votes, 0 answers, 21 views
How to specify the AWS credentials separately for the S3 Tables REST connection?
We are planning to use S3 Tables to store our clients' data, but we want RBAC so that data can only be accessed according to each user's permissions. We are planning to create ...
0 votes, 0 answers, 170 views
Cannot access Iceberg table metadata on S3-compatible MinIO server from Nessie
I can write and read an Iceberg table successfully using PySpark, Nessie and MinIO:
#!/usr/bin/python3.9
from pyspark.sql import SparkSession
iceberg_spark_jar = '/AKE/iceberg-spark-runtime-3.5_2.12-1.9.0....
1 vote, 1 answer, 107 views
Hive metastore cannot find a suitable Postgres driver in the classpath when dropping an Apache Iceberg table
I have Hadoop 3.3.6 and Hive 4.0.0 downloaded, extracted, configured, and running.
I have postgresql-42.7.4.jar downloaded to $HIVE_HOME/lib.
I can connect to postgres14 successfully using "...
0 votes, 0 answers, 89 views
Apache Iceberg table partitioning based on ID
Can I partition an Iceberg table by an ID column whose values range into the millions, or is bucketing the best option?
I'm pushing 40-50 million records from a SQL table with an identity ID column using PyFlink. And then I want to ...
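For the ID-partitioning question above, Iceberg's `truncate[W]` transform is the range-style alternative to `bucket[N]`. The spec defines it for int and long values as `v - (((v % W) + W) % W)`, so each partition holds a contiguous block of W IDs. The pure-Python sketch below is only an illustration: `iceberg_truncate_long` is a hypothetical helper, and the data is scaled down to 5,000 hypothetical sequential IDs with W = 1,000. Applying the same arithmetic to the question's 40-50 million sequential IDs with W = 1,000,000 would give roughly 40-50 partitions. Whether range-style truncate or hash-style bucket fits better depends on the query patterns, which this sketch does not settle.

```python
from collections import Counter


def iceberg_truncate_long(value: int, width: int) -> int:
    """Hypothetical helper: truncate[W] for an int/long value (spec formula)."""
    if width <= 0:
        raise ValueError("width must be positive")
    # The extra "+ width) % width" keeps the remainder non-negative,
    # which matters in languages where % can be negative (e.g. Java).
    return value - (((value % width) + width) % width)


# Hypothetical scaled-down data: 5,000 sequential identity IDs, width 1,000.
ids = range(5000)
partitions = Counter(iceberg_truncate_long(i, 1000) for i in ids)
print(sorted(partitions.items()))
# → [(0, 1000), (1000, 1000), (2000, 1000), (3000, 1000), (4000, 1000)]
```

Negative values round toward negative infinity (for example, truncating -1 with width 10 gives -10), which matches the spec's example.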
0 votes, 0 answers, 40 views
pyiceberg-s3fs: can't set custom config_kwargs
When creating an instance of S3FileSystem class, you can provide the config_kwargs dictionary to set further properties (like region or signature_version).
The pyiceberg FileIO implementation is based ...
1 vote, 0 answers, 67 views
PyIceberg with AWS Glue Creates Unwanted Nested Directories in S3 Tables
I'm using PyIceberg with AWS Glue REST catalog to insert data into an Iceberg table stored in S3. The data insertion works fine, but I noticed that PyIceberg creates unwanted nested directories in S3 ...
1 vote, 0 answers, 145 views
Writing map types with pyiceberg
I'm not sure whether this is a bug or I'm just not structuring the data correctly; I couldn't find any examples of writing maps.
Given a table with a simple schema containing a map field:
from pyiceberg.schema ...