Stack Overflow
0 votes
0 answers
15 views

I have an EMR Spark cluster on which I have enabled EMR managed auto scaling as the auto scaling configuration, with these instance types: primary - c5a.xlarge, core - c5a.xlarge, task - c5a.xlarge. With these cluster ...
-3 votes
1 answer
159 views

Issue: my Flink application throws "Thread 'jobmanager-io-thread-25' produced an uncaught exception. java.lang.OutOfMemoryError: Direct buffer memory" and terminates after running for 2-3 days. No matter ...
0 votes
0 answers
97 views

I am observing different write behaviors when executing queries in an EMR Notebook (correct behavior) versus using spark-submit to submit a Spark application to an EMR cluster (incorrect behavior). When I ...
0 votes
0 answers
81 views

I am running an Apache Spark job on Amazon EMR that needs to connect to an Amazon MSK cluster configured with IAM authentication. The EMR cluster has an IAM role with full MSK permissions, and I can ...
1 vote
0 answers
76 views

I am connecting to an EMR cluster through SageMaker Unified Studio (JupyterLab). My EMR cluster is configured with Delta Lake support, and I have the following Spark properties set on the cluster: ...
0 votes
0 answers
68 views

I have an Iceberg table in the Glue Catalog. I am unable to run a select * because one of the metadata files is missing. I am trying to point to the latest metadata file. How can I do that? I am using EMR 7.7 with ...
2 votes
0 answers
179 views

I'm trying to connect to an existing EMR cluster from SageMaker Unified Studio to run SQL queries via JupyterLab. SageMaker requires that the EMR cluster be runtime role-enabled to integrate with ...
0 votes
1 answer
63 views

I am using EMR 6.15 and Hudi 0.14. I submitted the following Hudi job, which should create a database and a table in AWS Glue. The IAM role assigned to EMR Serverless has all necessary permissions for S3 and ...
1 vote
0 answers
64 views

I have successfully implemented the IBM S3 Shuffle Plugin v0.9.6 (https://github.com/IBM/spark-s3-shuffle) on EMR on EKS (Spark 3.5.0) and the shuffle operations are working correctly with S3 storage. ...
0 votes
1 answer
172 views

I am writing data to S3 in an Iceberg table registered in the Glue Catalog. I see that the /data and /metadata folders are being created. However, when I write data, it creates folders named like 001/002. ...
0 votes
0 answers
41 views

I want to install external Python packages on EMR with an EC2 setup, but currently, apart from bootstrap actions, nothing else seems to be working. The problem with this setup is that if I want to ...
3 votes
1 answer
131 views

Having trouble getting dynamic allocation to properly terminate idle executors when using FSx Lustre for shuffle persistence on EMR 7.8 (Spark 3.5.4) on EKS. Trying this strategy out to battle cost ...
0 votes
0 answers
45 views

I am exploring writing data into a Glue table (Iceberg table format). I have been using the saveAsTable method, shown as option 1. However, is there any difference between the two methods? Iceberg stores ...
0 votes
1 answer
111 views

I have a PySpark script that reads data from S3 in a different AWS account using AssumedRoleCredentialProvider. It works on EMR Serverless 6.9, but when I upgrade to EMR Serverless 7.5 it fails ...
0 votes
0 answers
33 views

I have an EMR cluster configured with the following SecurityConfiguration: "AuthenticationConfiguration": { "IdentityCenterConfiguration": { "EnableIdentityCenter":...
