In PyIceberg 0.10.0 it is now possible to use a botocore session with a REST catalog, so:
import io
import os

import pandas as pd
import pyarrow as pa
from boto3 import Session
from pyiceberg.catalog import load_catalog

boto3_session = Session(profile_name='a_profile', region_name='us-east-1')

catalog = load_catalog(
    "catalog",
    type="rest",
    botocore_session=boto3_session._session,
    warehouse="arn:aws:s3tables:us-east-1:XXXXXXXXXXX:bucket/a_bucket",
    uri="https://s3tables.us-east-1.amazonaws.com/iceberg",
    **{
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": "us-east-1",
    })

table = catalog.load_table("namespace.a_table")

json_string = "[{\"data\":\"000000000000\", ...}]"
df = pd.read_json(io.StringIO(json_string), orient='records')
arrow_table = pa.Table.from_pandas(df=df, schema=table.schema().as_arrow())

table.overwrite(arrow_table)
It works until the overwrite:
OSError: When reading information for key 'metadata/snap-6778585584222594295-0-3ae9518f-fd1c-488f-b3d2-4ca1724317a1.avro' in bucket '2c8e7acb-67a1-4dc9-8ym9eg38966b8bazzfjn487w5o9wruse1b--table-s3': AWS Error UNKNOWN (HTTP status 400) during HeadObject operation: No response body.
To "fix" it, we can do:
boto3_session = Session(profile_name='a_profile', region_name='us-east-1')

catalog = load_catalog(
    "catalog",
    type="rest",
    botocore_session=boto3_session._session,
    warehouse="arn:aws:s3tables:us-east-1:XXXXXXXXXXX:bucket/a_bucket",
    uri="https://s3tables.us-east-1.amazonaws.com/iceberg",
    **{
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": "us-east-1",
    })

table = catalog.load_table("namespace.a_table")

json_string = "[{\"data\":\"000000000000\", ...}]"
df = pd.read_json(io.StringIO(json_string), orient='records')
arrow_table = pa.Table.from_pandas(df=df, schema=table.schema().as_arrow())

credentials = boto3_session.get_credentials().get_frozen_credentials()
os.environ["AWS_ACCESS_KEY_ID"] = credentials.access_key
os.environ["AWS_SECRET_ACCESS_KEY"] = credentials.secret_key
if credentials.token:
    os.environ["AWS_SESSION_TOKEN"] = credentials.token

table.overwrite(arrow_table)
which works, but defeats the purpose of passing the botocore session in the first place.
We can still access .schema() and similar metadata, so it seems the overwrite path is not using the proper SigV4Adapter (pyiceberg/catalog/rest/__init__.py).
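For reference, here is a quick way to see the split, using the same setup as above (which exact calls fail is partly an assumption on my side; anything that has to read or write objects in the bucket should reproduce it):

# Catalog/REST calls are SigV4-signed through the botocore session and work:
print(table.schema())            # only talks to the REST endpoint
print(table.current_snapshot())  # metadata already came back with load_table

# Anything that touches data/metadata files goes through PyIceberg's FileIO
# rather than the REST adapter, and fails because the FileIO never received
# S3 credentials:
table.scan().to_arrow()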
I have not been able to fix it. I'd like to not need the environment variables to access Iceberg tables in S3 Tables buckets.
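A slightly less global variant of the same workaround is to pass the frozen credentials as FileIO properties to load_catalog instead of exporting environment variables (an untested sketch on my side, assuming the s3.* properties are honored for S3 Tables buckets; it still materializes static credentials, so it does not solve the underlying issue):

credentials = boto3_session.get_credentials().get_frozen_credentials()

catalog = load_catalog(
    "catalog",
    type="rest",
    botocore_session=boto3_session._session,
    warehouse="arn:aws:s3tables:us-east-1:XXXXXXXXXXX:bucket/a_bucket",
    uri="https://s3tables.us-east-1.amazonaws.com/iceberg",
    **{
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": "us-east-1",
        # Credentials for the S3 FileIO that reads/writes data and metadata files:
        "s3.region": "us-east-1",
        "s3.access-key-id": credentials.access_key,
        "s3.secret-access-key": credentials.secret_key,
        **({"s3.session-token": credentials.token} if credentials.token else {}),
    })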
-
Did you manage to solve the issue? I have the same issue trying to use refreshable credentials. Somehow pyiceberg needs the credentials (key/token) to work properly; related issue: using botocore. – asyraf, Dec 9, 2025 at 8:41
-
No, it's still an issue: github.com/apache/iceberg-python/issues/2657 – Flo, Dec 9, 2025 at 13:53