-
Notifications
You must be signed in to change notification settings - Fork 327
adding support to use a kms key for s3 buckets data encryption (AWS only) #2802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add the kms encryption context to this statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you suggesting to allow the user to specify a new param during catalog creation for the encryption context condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I mean for SSE, the encryption context is always the path to the file. In our IAM policy statement, we can specify that the key can be used only to en/decrypt files in the encrpytion context pattern we specify - e.g., see https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html and https://docs.aws.amazon.com/kms/latest/developerguide/encrypt_context.html#encryption-context-authorization . The user doesn't need to specify this parameter - it's automatic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is needed, the policy for the key and S3 bucket is restrictive enough and adding more restriction would cause the GenerateDataKey operation during the assume role to probably fail.
This is an example of a policy generated while creating a table:
IamPolicy(version=2012年10月17日, statements=[
IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=s3:PutObject), IamAction(value=s3:DeleteObject)], resources=[IamResource(value=arn:aws:s3:::app-id-bucket/fns/t3/*)]),
IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=kms:GenerateDataKeyWithoutPlaintext), IamAction(value=kms:DescribeKey), IamAction(value=kms:Decrypt), IamAction(value=kms:GenerateDataKey), IamAction(value=kms:Encrypt)], resources=[IamResource(value=arn:aws:kms:us-east-1:*********:key/id)]),
IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=s3:ListBucket)], resources=[IamResource(value=arn:aws:s3:::app-id-bucket)], conditions=[IamCondition(operator=StringLike, key=s3:prefix, value=fns/t3/*)]),
IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=s3:GetBucketLocation)], resources=[IamResource(value=arn:aws:s3:::app-id-bucket)]),
IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=s3:GetObject), IamAction(value=s3:GetObjectVersion)], resources=[IamResource(value=arn:aws:s3:::app-id-bucket/fns/t3/*)])])
Also in https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html#encryption-context
it says that the bucket path is added by default in the encryption context.
Checking the encryption context in CloudTrail for the GenerateDataKey request I can see:
"encryptionContext": {
"aws:s3:arn": "arn:aws:s3:::app-id-*****"
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we give encrypt or decrypt based on the priviledge ? for example TABLE_READ should get decrypt and TABLE_READ_AND_WRITE should both encrypt and decrypt ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by s3 data we mean data stored in s3 or iceberg data files ?
if its just data stored in s3, a part of iceberg metadata is generated by polaris itself, do we need to plumb the encryption to the fileIO polaris uses ?
if its just iceberg data files how are we making sure its not used against iceberg metadata files such as manifest lists ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the key is used by s3 to do encryption of any data in the bucket at rest, so as long as the IAM role has got the right entitlements on the key everything will work fine. I don't think anything needs to be changed on the polaris side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am talking of setting these properties in the FileIO which polaris uses - https://iceberg.apache.org/docs/nightly/aws/#s3-server-side-encryption or are you relying on these catalog properties seeping automatically to the FileIO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow I might be missing something, how would you specify the kms key in Polaris without adding in the storage config, also we need to add the policy in there to be able to use right? thx
What's the relationship of this PR and #1424 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By convention, adding REST API parameters needs to be discussed on the dev ML. Could you start a thread there and reference this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already a thread for this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My mistake... I did not see it 🤦
For ref: https://lists.apache.org/thread/to7lbdw1k6dym5c6gnk1nm32vqdw24qk
What's the relationship of this PR and #1424 ?
sorry I didn't see that one.
Re: #1424 - that PR appears to be dormant, so it's fine to offer an alternative. I was just trying to understand whether the feature is the same or not :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can / will this work with MinIO KMS? Cf. https://github.com/minio/minio/blob/master/docs/kms/README.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will need to test that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine either way, but if it did not work with MinIO it would be good to mention AWS (only) in the PR description.
Also, I'm pretty sure we'll have to support KMS in MinIO at some point, so if current changes were reusable it would be great (but not required).
5234672 to
351dc07
Compare
Updated the PR with temporary changes, to see if this is the right direction. The build will fail because tests have been updated yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If prop is an argument to generating AccessConfig is should be part of the cache key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally think that "storage integration" code should work independently of table properties. In other words Polaris should own storage integration and AccessConfig. Table properties are settable by the end users outside the context of a catalog (e.g. via registerTable). Mixing both sources of integration config is confusing IMHO.
Is the intention here to support different KMS config per table?
Polaris may need to be able to get AccessConfig in order to load table metadata (which has table properties)... Does this not create a chicken-and-egg problem?
That said, I'm open to a wider discussion about this on the dev ML.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that is to support the usage of the s3 properties defined in iceberg, we don't really need to pass down props or have that code in there as long as we can agree on the code https://github.com/apache/polaris/pull/2802/files#diff-d305f7a426a7690c576722c114257792b3fcee726624655d15893b71499827f8R274.
If no kms key was defined in the polaris AWS storage then we allow the usage of any keys owned by that account.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we discuss arnKeyAll (and I do not have any objections there), I think we need to resolve the conceptual issue of using metadata properties for generating AccessConfig. Polaris may need AccessConfig in order to read metadata JSON files (which have table properties)... How can this work in principle?
Uh oh!
There was an error while loading. Please reload this page.
What changes were proposed in this pull request?
The addition of a kms key used to encrypt data in the s3 bucket
Why are the changes needed?
For anyone wish to use data encryption.
Does this PR introduce any user-facing change?
Yes, the addition of new param in the catalog creation : kms key arn
How was this patch tested?
Tested in AWS EKS
CHANGELOG.md
Update catalogs creation to include AWS kms key to be used in S3 data encryption