Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

adding support to use a kms key for s3 buckets data encryption (AWS only) #2802

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
fabio-rizzo-01 wants to merge 10 commits into apache:main
base: main
Choose a base branch
Loading
from fabio-rizzo-01:feature/s3-kms-key

Conversation

@fabio-rizzo-01
Copy link
Contributor

@fabio-rizzo-01 fabio-rizzo-01 commented Oct 13, 2025
edited
Loading

What changes were proposed in this pull request?

The addition of a kms key used to encrypt data in the s3 bucket

Why are the changes needed?

For anyone wish to use data encryption.

Does this PR introduce any user-facing change?

Yes, the addition of new param in the catalog creation : kms key arn

How was this patch tested?

Tested in AWS EKS

CHANGELOG.md

Update catalogs creation to include AWS kms key to be used in S3 data encryption

@jbonofre jbonofre self-requested a review October 13, 2025 14:27
Comment on lines 249 to 256
IamStatement.Builder allowKms =
IamStatement.builder()
.effect(IamEffect.ALLOW)
.addAction("kms:GenerateDataKeyWithoutPlaintext")
.addAction("kms:Encrypt")
.addAction("kms:DescribeKey")
.addAction("kms:Decrypt")
.addAction("kms:GenerateDataKey");
Copy link
Contributor

@collado-mike collado-mike Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add the kms encryption context to this statement?

singhpk234 reacted with thumbs up emoji
Copy link
Contributor Author

@fabio-rizzo-01 fabio-rizzo-01 Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you suggesting to allow the user to specify a new param during catalog creation for the encryption context condition?

Copy link
Contributor

@collado-mike collado-mike Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean for SSE, the encryption context is always the path to the file. In our IAM policy statement, we can specify that the key can be used only to en/decrypt files in the encrpytion context pattern we specify - e.g., see https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html and https://docs.aws.amazon.com/kms/latest/developerguide/encrypt_context.html#encryption-context-authorization . The user doesn't need to specify this parameter - it's automatic.

Copy link
Contributor Author

@fabio-rizzo-01 fabio-rizzo-01 Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is needed, the policy for the key and S3 bucket is restrictive enough and adding more restriction would cause the GenerateDataKey operation during the assume role to probably fail.

This is an example of a policy generated while creating a table:

IamPolicy(version=2012年10月17日, statements=[
IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=s3:PutObject), IamAction(value=s3:DeleteObject)], resources=[IamResource(value=arn:aws:s3:::app-id-bucket/fns/t3/*)]), 
IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=kms:GenerateDataKeyWithoutPlaintext), IamAction(value=kms:DescribeKey), IamAction(value=kms:Decrypt), IamAction(value=kms:GenerateDataKey), IamAction(value=kms:Encrypt)], resources=[IamResource(value=arn:aws:kms:us-east-1:*********:key/id)]), 
IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=s3:ListBucket)], resources=[IamResource(value=arn:aws:s3:::app-id-bucket)], conditions=[IamCondition(operator=StringLike, key=s3:prefix, value=fns/t3/*)]), 
IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=s3:GetBucketLocation)], resources=[IamResource(value=arn:aws:s3:::app-id-bucket)]), 
IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=s3:GetObject), IamAction(value=s3:GetObjectVersion)], resources=[IamResource(value=arn:aws:s3:::app-id-bucket/fns/t3/*)])])

Also in https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html#encryption-context
it says that the bucket path is added by default in the encryption context.

Checking the encryption context in CloudTrail for the GenerateDataKey request I can see:

"encryptionContext": {
"aws:s3:arn": "arn:aws:s3:::app-id-*****"
}

Comment on lines 253 to 255
.addAction("kms:Encrypt")
.addAction("kms:DescribeKey")
.addAction("kms:Decrypt")
Copy link
Contributor

@singhpk234 singhpk234 Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we give encrypt or decrypt based on the priviledge ? for example TABLE_READ should get decrypt and TABLE_READ_AND_WRITE should both encrypt and decrypt ?

example: "arn:aws:iam::123456789001:user/abc1-b-self1234"
kmsKeyArn:
type: string
description: the aws kms key arn used to encrypt s3 data
Copy link
Contributor

@singhpk234 singhpk234 Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by s3 data we mean data stored in s3 or iceberg data files ?
if its just data stored in s3, a part of iceberg metadata is generated by polaris itself, do we need to plumb the encryption to the fileIO polaris uses ?

if its just iceberg data files how are we making sure its not used against iceberg metadata files such as manifest lists ?

Copy link
Contributor Author

@fabio-rizzo-01 fabio-rizzo-01 Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the key is used by s3 to do encryption of any data in the bucket at rest, so as long as the IAM role has got the right entitlements on the key everything will work fine. I don't think anything needs to be changed on the polaris side.

Copy link
Contributor

@singhpk234 singhpk234 Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am talking of setting these properties in the FileIO which polaris uses - https://iceberg.apache.org/docs/nightly/aws/#s3-server-side-encryption or are you relying on these catalog properties seeping automatically to the FileIO

Copy link
Contributor Author

@fabio-rizzo-01 fabio-rizzo-01 Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow I might be missing something, how would you specify the kms key in Polaris without adding in the storage config, also we need to add the policy in there to be able to use right? thx

Copy link
Contributor

dimas-b commented Oct 14, 2025

What's the relationship of this PR and #1424 ?

singhpk234 reacted with thumbs up emoji

type: string
description: the aws user arn used to assume the aws role
example: "arn:aws:iam::123456789001:user/abc1-b-self1234"
kmsKeyArn:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By convention, adding REST API parameters needs to be discussed on the dev ML. Could you start a thread there and reference this PR?

Copy link
Contributor Author

@fabio-rizzo-01 fabio-rizzo-01 Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already a thread for this PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake... I did not see it 🤦

For ref: https://lists.apache.org/thread/to7lbdw1k6dym5c6gnk1nm32vqdw24qk

fabio-rizzo-01 reacted with thumbs up emoji
Copy link
Contributor Author

What's the relationship of this PR and #1424 ?

sorry I didn't see that one.

Copy link
Contributor

dimas-b commented Oct 14, 2025
edited
Loading

Re: #1424 - that PR appears to be dormant, so it's fine to offer an alternative. I was just trying to understand whether the feature is the same or not :)


String arnPrefix = arnPrefixForPartition(awsPartition);
String arnPrefix = arnPrefixForPartition(storageConfigurationInfo.getAwsPartition());
String kmsKeyArn = storageConfigurationInfo.getKmsKeyArn();
Copy link
Contributor

@dimas-b dimas-b Oct 14, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can / will this work with MinIO KMS? Cf. https://github.com/minio/minio/blob/master/docs/kms/README.md

Copy link
Contributor Author

@fabio-rizzo-01 fabio-rizzo-01 Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will need to test that.

Copy link
Contributor

@dimas-b dimas-b Oct 15, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine either way, but if it did not work with MinIO it would be good to mention AWS (only) in the PR description.

Also, I'm pretty sure we'll have to support KMS in MinIO at some point, so if current changes were reusable it would be great (but not required).

fabio-rizzo-01 reacted with thumbs up emoji
@fabio-rizzo-01 fabio-rizzo-01 changed the title (削除) adding support to use a kms key for s3 buckets data encryption (削除ここまで) (追記) adding support to use a kms key for s3 buckets data encryption for AWS (追記ここまで) Oct 21, 2025
Copy link
Contributor Author

Updated the PR with temporary changes, to see if this is the right direction. The build will fail because tests have been updated yet.

@fabio-rizzo-01 fabio-rizzo-01 changed the title (削除) adding support to use a kms key for s3 buckets data encryption for AWS (削除ここまで) (追記) adding support to use a kms key for s3 buckets data encryption (AWS only) (追記ここまで) Oct 21, 2025
k.allowedWriteLocations(),
k.refreshCredentialsEndpoint());
k.refreshCredentialsEndpoint(),
props);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If prop is an argument to generating AccessConfig is should be part of the cache key.

Optional<String> refreshCredentialsEndpoint,
Map props) {
LOGGER.info("Getting subscoped creds props: {}", props);
String kmsKey = props.get("s3.sse.key") != null ? props.get("s3.sse.key").toString() : null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think that "storage integration" code should work independently of table properties. In other words Polaris should own storage integration and AccessConfig. Table properties are settable by the end users outside the context of a catalog (e.g. via registerTable). Mixing both sources of integration config is confusing IMHO.

Is the intention here to support different KMS config per table?

Polaris may need to be able to get AccessConfig in order to load table metadata (which has table properties)... Does this not create a chicken-and-egg problem?

That said, I'm open to a wider discussion about this on the dev ML.

Copy link
Contributor Author

@fabio-rizzo-01 fabio-rizzo-01 Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is to support the usage of the s3 properties defined in iceberg, we don't really need to pass down props or have that code in there as long as we can agree on the code https://github.com/apache/polaris/pull/2802/files#diff-d305f7a426a7690c576722c114257792b3fcee726624655d15893b71499827f8R274.
If no kms key was defined in the polaris AWS storage then we allow the usage of any keys owned by that account.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we discuss arnKeyAll (and I do not have any objections there), I think we need to resolve the conceptual issue of using metadata properties for generating AccessConfig. Polaris may need AccessConfig in order to read metadata JSON files (which have table properties)... How can this work in principle?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

@singhpk234 singhpk234 singhpk234 left review comments

@collado-mike collado-mike collado-mike left review comments

@dimas-b dimas-b dimas-b left review comments

@jbonofre jbonofre Awaiting requested review from jbonofre

At least 1 approving review is required to merge this pull request.

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /