In PyIceberg 0.10.0 it is now possible to use a botocore session with a REST catalog, so:
import io
import os

import pandas as pd
import pyarrow as pa
from boto3 import Session
from pyiceberg.catalog import load_catalog

boto3_session = Session(profile_name='a_profile', region_name='us-east-1')

catalog = load_catalog(
    "catalog",
    type="rest",
    botocore_session=boto3_session._session,
    warehouse="arn:aws:s3tables:us-east-1:XXXXXXXXXXX:bucket/a_bucket",
    uri="https://s3tables.us-east-1.amazonaws.com/iceberg",
    **{
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": "us-east-1",
    })

table = catalog.load_table("namespace.a_table")

json_string = "[{\"data\":\"000000000000\", ...}]"
df = pd.read_json(io.StringIO(json_string), orient='records')
arrow_table = pa.Table.from_pandas(df=df, schema=table.schema().as_arrow())

table.overwrite(arrow_table)
It works until the overwrite:
OSError: When reading information for key 'metadata/snap-6778585584222594295-0-3ae9518f-fd1c-488f-b3d2-4ca1724317a1.avro' in bucket '2c8e7acb-67a1-4dc9-8ym9eg38966b8bazzfjn487w5o9wruse1b--table-s3': AWS Error UNKNOWN (HTTP status 400) during HeadObject operation: No response body.
To "fix" it, we can do:
boto3_session = Session(profile_name='a_profile', region_name='us-east-1')

catalog = load_catalog(
    "catalog",
    type="rest",
    botocore_session=boto3_session._session,
    warehouse="arn:aws:s3tables:us-east-1:XXXXXXXXXXX:bucket/a_bucket",
    uri="https://s3tables.us-east-1.amazonaws.com/iceberg",
    **{
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": "us-east-1",
    })

table = catalog.load_table("namespace.a_table")

json_string = "[{\"data\":\"000000000000\", ...}]"
df = pd.read_json(io.StringIO(json_string), orient='records')
arrow_table = pa.Table.from_pandas(df=df, schema=table.schema().as_arrow())

credentials = boto3_session.get_credentials().get_frozen_credentials()
os.environ["AWS_ACCESS_KEY_ID"] = credentials.access_key
os.environ["AWS_SECRET_ACCESS_KEY"] = credentials.secret_key
if credentials.token:
    os.environ["AWS_SESSION_TOKEN"] = credentials.token

table.overwrite(arrow_table)
which works, but defeats the purpose of passing the botocore session in the first place.
We can still access .schema() and similar metadata, so it seems the overwrite path is not using the proper SigV4Adapter (pyiceberg/catalog/rest/__init__.py).
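For reference, here is a quick way to see the split, using the same setup as above (which exact calls fail is partly an assumption on my side; anything that has to read or write objects in the bucket should reproduce it):

# Catalog/REST calls are SigV4-signed through the botocore session and work:
print(table.schema())            # only talks to the REST endpoint
print(table.current_snapshot())  # metadata already came back with load_table

# Anything that touches data/metadata files goes through PyIceberg's FileIO
# rather than the REST adapter, and fails because the FileIO never received
# S3 credentials:
table.scan().to_arrow()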
I have not been able to fix it. I'd like to not need the environment variables to access Iceberg tables in S3 Tables buckets.
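A slightly less global variant of the same workaround is to pass the frozen credentials as FileIO properties to load_catalog instead of exporting environment variables (an untested sketch on my side, assuming the s3.* properties are honored for S3 Tables buckets; it still materializes static credentials, so it does not solve the underlying issue):

credentials = boto3_session.get_credentials().get_frozen_credentials()

catalog = load_catalog(
    "catalog",
    type="rest",
    botocore_session=boto3_session._session,
    warehouse="arn:aws:s3tables:us-east-1:XXXXXXXXXXX:bucket/a_bucket",
    uri="https://s3tables.us-east-1.amazonaws.com/iceberg",
    **{
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": "us-east-1",
        # Credentials for the S3 FileIO that reads/writes data and metadata files:
        "s3.region": "us-east-1",
        "s3.access-key-id": credentials.access_key,
        "s3.secret-access-key": credentials.secret_key,
        **({"s3.session-token": credentials.token} if credentials.token else {}),
    })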
-
Did you manage to solve the issue? I have the same issue trying to use refreshable credentials. Somehow pyiceberg needs the credentials (key/token) to work properly; related issue: using botocore. – asyraf, Dec 9, 2025 at 8:41
-
No, it's still an issue: github.com/apache/iceberg-python/issues/2657 – Flo, Dec 9, 2025 at 13:53