On an Azure VM, I have installed Spark 4.0 in standalone mode. On the same VM I have Python 3.11 with Jupyter deployed. In my notebook I submitted the following program:
from pyspark.sql import SparkSession
spark = SparkSession.builder.remote("sc://192.168.2.5:15002").getOrCreate()
df = spark.range(10)
df.show()
Everything works fine. Next, I tried to read the sample data that ships with Spark, using the following program:
UsersDF = spark.read.load("examples/src/main/resources/users.parquet", "parquet")
UsersDF.show()
This program generates the following error message:
UnknownException: (java.net.ConnectException) Call From vm-name/192.168.2.5 to vm-name.internal.cloudapp.net:9001 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
I'd be grateful for any suggestions on how to fix it!
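The error message is misleading. A path without a scheme is resolved against Hadoop's default filesystem (fs.defaultFS), which here apparently points to an HDFS endpoint at vm-name.internal.cloudapp.net:9001 that is not running, hence the refused connection. Prefixing the path with the file:// scheme makes Spark read from the local filesystem instead, and then it works without any further problems: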
UsersDF = spark.read.load("file:///examples/src/main/resources/users.parquet", "parquet")
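Note that file:/// followed by a path is an absolute local path, so file:///examples/... refers to /examples on the filesystem root. If the examples directory lives elsewhere (typically under the Spark installation), the URI has to point there. A minimal sketch for building it, assuming SPARK_HOME is set on the machine running the Spark Connect server: with Spark Connect the path is resolved on the server side, not in the notebook, which is harmless here because both run on the same VM. In a multi-node standalone cluster the file would also have to exist at that path on every worker. The helper name example_file_uri is hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def example_file_uri(relative_path: str, spark_home: Optional[str] = None) -> str:
    """Build an absolute file:// URI for a file under the Spark installation.

    Hypothetical helper: assumes the data lives under SPARK_HOME on the
    machine that runs the Spark Connect server (here, the same VM).
    """
    base = spark_home or os.environ.get("SPARK_HOME")
    if not base:
        raise ValueError("pass spark_home or set the SPARK_HOME environment variable")
    # resolve() makes the path absolute; as_uri() prepends the file:// scheme,
    # so Spark reads from the local filesystem instead of fs.defaultFS.
    return (Path(base) / relative_path).resolve().as_uri()


# Usage (assumes the `spark` session from the question):
# UsersDF = spark.read.load(
#     example_file_uri("examples/src/main/resources/users.parquet"), "parquet")
# UsersDF.show()
```

Building the URI from SPARK_HOME rather than hard-coding it keeps the notebook working if Spark is installed in a different directory.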
"sc://192.168.2.5:15002": do you have a Spark cluster with the Spark Connect service running?
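"Connection refused" usually means nothing is listening on that host and port. A quick way to check both endpoints from the notebook is a plain TCP probe: port 15002 for the Spark Connect server in the question, and port 9001 for the Hadoop endpoint named in the error message. This is a minimal sketch using only the Python standard library; the function name port_is_open is hypothetical.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises an OSError subclass (ConnectionRefusedError,
        # TimeoutError, or socket.gaierror for unknown hosts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage with the hosts and ports from the question and its error message:
# port_is_open("192.168.2.5", 15002)                   # Spark Connect server
# port_is_open("vm-name.internal.cloudapp.net", 9001)  # fs.defaultFS endpoint
```

If 15002 is open but 9001 is not, Spark Connect itself is fine and the failure comes from resolving the schemeless path against the default filesystem.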