108 questions
0
votes
0
answers
81
views
EMR Spark Job Fails to Connect to MSK with IAM Auth - Timeout Waiting for Node Assignment Error
I am running an Apache Spark job on Amazon EMR that needs to connect to an Amazon MSK cluster configured with IAM authentication. The EMR cluster has an IAM role with full MSK permissions, and I can ...
0
votes
0
answers
52
views
[Spark-stream]: Stream Batches processing time reduce over time causing Kafka Lag
I have been using the Spark v3.5 streaming functionality for the use case below, and I am observing the following issue in one of the environments. I would appreciate some assistance with ...
0
votes
1
answer
143
views
Unable to push the data from the written kafka topic to Postgres table
I am trying to load the data written to a Kafka topic into a Postgres table. I can see that the topic is receiving new messages every second, and the data looks good.
However, when I use the ...
0
votes
1
answer
125
views
ERROR SparkContext: Failed to add spark-streaming-kafka-0-10_2.13-3.5.2.jar
ERROR SparkContext: Failed to add home/areaapache/software/spark-3.5.2-bin-hadoop3/jars/spark-streaming-kafka-0-10_2.13-3.5.2.jar \
to Spark environment
import logging
from pyspark.sql import ...
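The jar file name in the error above encodes both a Scala binary version and a library version, which is worth checking against the Spark distribution in use. A minimal sketch, assuming the common `<artifact>_<scala>-<version>.jar` naming convention; `parse_spark_jar` and `JarInfo` are hypothetical helpers, not part of PySpark:

```python
import re
from typing import NamedTuple, Optional


class JarInfo(NamedTuple):
    artifact: str
    scala: str
    version: str


# Assumed naming convention: <artifact>_<scala binary version>-<version>.jar
_JAR_RE = re.compile(
    r"^(?P<artifact>.+)_(?P<scala>\d+\.\d+)-(?P<version>\d+(?:\.\d+)*)\.jar$"
)


def parse_spark_jar(path: str) -> Optional[JarInfo]:
    """Split a jar file name into artifact, Scala version, and library version."""
    # Keep only the file name, whatever the path separator is.
    name = path.replace("\\", "/").rsplit("/", 1)[-1]
    match = _JAR_RE.match(name)
    if match is None:
        return None  # the name does not follow the assumed convention
    return JarInfo(match["artifact"], match["scala"], match["version"])


info = parse_spark_jar("spark-streaming-kafka-0-10_2.13-3.5.2.jar")
print(info)
```

If the Scala version reported here differs from the one the Spark installation was built with, that mismatch is one possible cause to rule out; it is not necessarily the cause of this particular error.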
3
votes
0
answers
83
views
Are there options to send Spark Streaming executor metrics directly instead of via the driver?
I have a Spark Streaming application running on Argo + K8s that reads Kafka topics by subscribe pattern, applies some transformations, and writes to a target.
Several different producers may write ...
0
votes
1
answer
309
views
Save a file in Databricks Workspace using Scala/Java
My goal is to run a Spark job using Databricks. My challenge is that I can't store files in the local filesystem: the file is saved on the driver, but when my executors try to access the ...
1
vote
0
answers
400
views
Spark : java.lang.NoClassDefFoundError: org/apache/spark/kafka010/KafkaConfigUpdater
I am working with Spark Streaming and reading data from a Kafka topic, but I get the error java.lang.NoClassDefFoundError: org/apache/spark/kafka010/KafkaConfigUpdater. I am running my code in K8s and provide ...
2
votes
0
answers
132
views
Multiple Kafka source topic + Spark Structured streaming + multiple delta table sink
I have multiple topics in Kafka that I need to sink into their respective Delta tables.
A) 1 Streaming query for all topics
If I use one streaming query, then the RDD/DF should contain data from ...
2
votes
1
answer
3k
views
ClassNotFoundException for scala.$less$colon$less. Problem with different Scala versions?
When I try to run this .py:
import logging
from cassandra.cluster import Cluster
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import ...
2
votes
1
answer
556
views
"java.lang.NoSuchMethodError: 'scala.collection.JavaConverters$AsJava scala.collection" error when I stream kafka messages using Pyspark
I am in a bind here. I am trying to implement a very basic pipeline that reads data from Kafka and processes it in Spark. The problem I am facing is that Apache Spark shuts down abruptly, giving the ...
1
vote
0
answers
99
views
spark-connect with standalone spark cluster error
I'm trying to read a stream from Kafka using PySpark.
The stack I'm working with:
Kubernetes.
Standalone Spark cluster with 2 workers.
spark-connect connected to the cluster and has the dependencies ...
0
votes
1
answer
171
views
Py4JJavaError An error occurred while calling javalangNoSuchMethodError org.apache.spark.sql.AnalysisException org.apache.spark.sql.kafka.KafkaWriter
I can't write to Kafka from Spark. Spark is reading but not writing; if I write to the console, it doesn't give an error.
Traceback (most recent call last):
File "f:\Sistema de Informação\TCC\...
0
votes
1
answer
78
views
Spark incoming JSON stream processing
I have been trying to complete a project in which I need to send a data stream through Kafka to a local Spark instance that processes the incoming data. However, I cannot show and use the data frame in the right ...
0
votes
0
answers
33
views
Failed to find data source: kafka. Please deploy the application as per the deployment section of Structured Streaming + Kafka Integration Guide Spark [duplicate]
Hello, I am trying to use PySpark with Kafka. To do this, I execute this command to set up the Spark application
Spark version is 3.5.0 | spark-3.5.0-bin-hadoop3
Kafka version is - ...
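For the "Failed to find data source: kafka" message above, the deployment section of the integration guide asks for the `spark-sql-kafka-0-10` artifact to be supplied, for example through `--packages`. A minimal sketch of building that Maven coordinate for the Spark version stated in the question; `kafka_package` is a hypothetical helper, `app.py` is a placeholder script name, and the Scala version 2.12 is an assumption (the default `spark-3.5.0-bin-hadoop3` build) that should be verified against the actual installation:

```python
def kafka_package(spark_version: str, scala_version: str = "2.12") -> str:
    """Build the Maven coordinate of the Structured Streaming Kafka connector."""
    # Coordinate format: groupId:artifactId_<scala binary version>:<spark version>
    return f"org.apache.spark:spark-sql-kafka-0-10_{scala_version}:{spark_version}"


# Spark version taken from the question; the Scala version is assumed.
coordinate = kafka_package("3.5.0")

# app.py is a placeholder for the user's own script.
print(f"spark-submit --packages {coordinate} app.py")
```

The same coordinate can alternatively be set through the `spark.jars.packages` configuration property when the session is created from Python.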
0
votes
0
answers
273
views
Spark-Kafka Integration not working: Kafka broker with producer and consumer script are getting stuck as soon as we run spark script(consumer.py)
I'm trying to read data from a Kafka topic using Spark Structured Streaming on an EC2 (Ubuntu) machine.
If I read the data using only the Kafka console consumer (kafka-console-consumer.sh), there is no ...