Newest 'amazon-emr' Questions

4,988 questions

0 votes

0 answers

15 views

EMR Spark cluster getting stuck on resizing

I have an EMR Spark cluster on which I have enabled EMR managed auto scaling, with these instance types: primary - c5a.xlarge, core - c5a.xlarge, task - c5a.xlarge. With these cluster ...

Koushik

asked Dec 24 at 5:14

-3 votes

1 answer

159 views

Flink Job Manager Direct Buffer Memory gets exhausted when checkpointing enabled

Issue: Flink application throws Thread 'jobmanager-io-thread-25' produced an uncaught exception. java.lang.OutOfMemoryError: Direct buffer memory and terminates after running for 2-3 days. No matter ...

Strange

1,514

asked Nov 12 at 18:14

0 votes

0 answers

97 views

Unexpected Write Behavior when using MERGE INTO/INSERT INTO Iceberg Spark Queries

I am observing different write behaviors when executing queries on EMR Notebook (correct behavior) vs when using spark-submit to submit a spark application to EMR Cluster (incorrect behavior). When I ...

shiva

2,781

asked Oct 21 at 20:58

0 votes

0 answers

81 views

EMR Spark Job Fails to Connect to MSK with IAM Auth - Timeout Waiting for Node Assignment Error

I am running an Apache Spark job on Amazon EMR that needs to connect to an Amazon MSK cluster configured with IAM authentication. The EMR cluster has an IAM role with full MSK permissions, and I can ...

Vishwas Singh

asked Oct 1 at 11:20

1 vote

0 answers

76 views

Sagemaker Unified Studio overriding delta lake configuration to iceberg on EMR

I am connecting to an EMR cluster through SageMaker Unified Studio (JupyterLab). My EMR cluster is configured with Delta Lake support, and I have the following Spark properties set on the cluster: ...

sakshi

asked Sep 11 at 17:55

0 votes

0 answers

68 views

How do you expire snapshot from Iceberg Glue Table

I have one Iceberg table in the Glue Catalog. I am unable to run a select * because one of the metadata files is missing. I am trying to point to the latest metadata file. How can I do that? I am using EMR 7.7 with ...

user3858193

1,568

asked Aug 18 at 16:41

2 votes

0 answers

179 views

Unable to connect to EMR cluster from SageMaker Unified Studio using runtime role – credentials are null

I'm trying to connect to an existing EMR cluster from SageMaker Unified Studio to run SQL queries via JupyterLab. SageMaker requires that the EMR cluster be runtime role-enabled to integrate with ...

valzor

asked Jul 30 at 19:00

0 votes

1 answer

63 views

Unable to register database/table in aws glue when hudi job is submitted from emrserverless

I am using EMR 6.15 and Hudi 0.14. I submitted the following Hudi job, which should create a database and a table in AWS Glue. The IAM role assigned to EMR Serverless has all necessary permissions for S3 and ...

Roobal Jindal

asked Jul 9 at 7:00

1 vote

0 answers

64 views

Spark Dynamic Resource Allocation Configuration while using IBM S3 Shuffle Plugin on EMR on EKS

I have successfully implemented the IBM S3 Shuffle Plugin v0.9.6 (https://github.com/IBM/spark-s3-shuffle) on EMR on EKS (Spark 3.5.0) and the shuffle operations are working correctly with S3 storage. ...

metersk

12.7k

asked Jul 1 at 16:26

0 votes

1 answer

172 views

Why Iceberg load is creating many folders in s3?

I am writing data into S3, and the table format is Iceberg in the Glue Catalog. I see the /data and /metadata folders getting created. However, when I write data, it creates 001/002-style folders. ...

user3858193

1,568

asked Jun 28 at 11:19

0 votes

0 answers

41 views

Installing external python packages on EMR on EC2

I want to install external Python packages on EMR with an EC2 setup, but currently, apart from bootstrap actions, nothing else seems to be working. The problem with this setup is that if I want to ...

RushHour

asked Jun 27 at 6:23

3 votes

1 answer

131 views

EMR on EKS: Dynamic Allocation + FSx Lustre -- Executors with shuffle data won't terminate despite idle timeout

Having trouble getting dynamic allocation to properly terminate idle executors when using FSx Lustre for shuffle persistence on EMR 7.8 (Spark 3.5.4) on EKS. Trying this strategy out to battle cost ...

metersk

12.7k

asked Jun 26 at 18:59

0 votes

0 answers

45 views

Data write into Iceberg Glue Table (saveAsTable vs option("path", s3_output_path))

I am exploring writing data into a Glue table (Iceberg table format). I have been using the saveAsTable method, mentioned as option 1. However, is there any difference between the two methods? Iceberg stores ...

user3858193

1,568

asked Jun 26 at 15:21

0 votes

1 answer

111 views

Can not read from S3 with AssumedRoleCredentialProvider after upgrade from EMR serverless 6.9 to 7.5

I have a PySpark script that reads data from S3 in a different AWS account using AssumedRoleCredentialProvider. It works on EMR Serverless 6.9, but when I upgrade to EMR Serverless 7.5 it fails ...

Sayed

asked Jun 14 at 16:00

0 votes

0 answers

33 views

Unable to access Livy after enabling IAM Identity Center (SSO) on my EMR cluster

I have an EMR cluster configured with the following SecurityConfiguration: "AuthenticationConfiguration": { "IdentityCenterConfiguration": { "EnableIdentityCenter":...

ExK

asked Jun 3 at 17:08
