
Commit a7a6308

Add Pyspark listener (#248)
This PR adds support for a PySpark listener that is intended to receive queries from the MC2 Client. It contains SBT code to generate Python protobuf files and package them into a zip automatically for submission to a standalone Spark cluster. The instructions for running the listener are in the client docs. Unfortunately, it also introduces a Python dependency into Opaque SQL, since additional packages are required to start the Python listener service.
1 parent 2cc3f6e commit a7a6308
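For orientation, the end-to-end flow this commit enables looks roughly like the following. This is a sketch condensed from the documentation changes below: the paths and the <Spark configuration parameters> placeholder come from the updated client docs, and the assumption that build/sbt assembly also produces target/python.zip follows from the compile dependency on the new zip task in build.sbt.

    # Build the fat jar; compile now depends on zipPythonFilesTask, which produces target/python.zip
    build/sbt assembly

    # Submit the PySpark listener to a standalone Spark cluster
    spark-submit --class edu.berkeley.cs.rise.opaque.rpc.Listener \
      <Spark configuration parameters> \
      --deploy-mode client \
      --jars ${OPAQUE_HOME}/target/scala-2.12/opaque-assembly-0.1.jar \
      --py-files ${OPAQUE_HOME}/target/python.zip \
      ${OPAQUE_HOME}/target/python/listener.py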

File tree: 15 files changed, +122 −16 lines changed


‎.github/scripts/build.sh

Lines changed: 1 addition & 0 deletions

@@ -11,6 +11,7 @@ sudo apt-get -y install clang-8 libssl-dev gdb libsgx-enclave-common libsgx-quot
 
 # Install Opaque Dependencies
 sudo apt-get -y install wget build-essential openjdk-8-jdk python libssl-dev libmbedtls-dev
+pip3 install grpcio grpcio-tools # Needed for PySpark listener
 
 # Install a newer version of CMake (3.15)
 wget https://github.com/Kitware/CMake/releases/download/v3.15.6/cmake-3.15.6-Linux-x86_64.sh

‎.gitignore

Lines changed: 5 additions & 0 deletions

@@ -68,6 +68,11 @@ lint-r-report.log
 /.tags
 .metals/
 
+# For generated Python protobuf files
+/src/python/protobuf/
+!/src/python/protobuf/__init__.py
+!/src/python/protobuf/README.md
+
 # For Hive
 metastore_db/
 metastore/

‎build.sbt

Lines changed: 34 additions & 2 deletions

@@ -161,8 +161,6 @@ resourceGenerators in Compile += copyEnclaveLibrariesToResourcesTask.taskValue
 // Add the managed resource directory to the resource classpath so we can find libraries at runtime
 managedResourceDirectories in Compile += resourceManaged.value
 
-unmanagedResources in Compile ++= ((sourceDirectory.value / "python") ** "*.py").get
-
 Compile / PB.protoSources := Seq(sourceDirectory.value / "protobuf")
 Compile / PB.targets := Seq(scalapb.gen() -> (Compile / sourceManaged).value / "scalapb")
 
@@ -192,6 +190,13 @@ def data = Command.command("data") { state =>
   state
 }
 
+val zipPythonFilesTask = TaskKey[Unit](
+  "zipPythonFilesTask",
+  "Creates a zip of all the Python files (including generated protobuf files) and the listener in the target directory."
+)
+
+compile in Compile := { (compile in Compile).dependsOn(zipPythonFilesTask).value }
+
 commands += oeGdbCommand
 commands += data
 
@@ -438,3 +443,30 @@ synthBenchmarkDataTask := {
     }
   }
 }
+
+zipPythonFilesTask := {
+  val pythonDir = sourceDirectory.value / "python"
+
+  // First generate the Python protobuf files
+  val protoSourceDir = sourceDirectory.value / "protobuf"
+  val pythonProtoDir = pythonDir / "protobuf"
+  Process(
+    Seq(
+      "python3",
+      "-m",
+      "grpc_tools.protoc",
+      s"-I=${protoSourceDir}",
+      s"--python_out=${pythonProtoDir}",
+      s"--grpc_python_out=${pythonProtoDir}",
+      s"${protoSourceDir}/client.proto",
+      s"${protoSourceDir}/listener.proto"
+    )
+  ).!
+
+  // Then move all Python files
+  Process(Seq("cp", "-a", s"${pythonDir}", s"${target.value}")).!
+
+  // Finally zip everything together
+  val pythonBuildDir = target.value / "python"
+  IO.zip(Path.allSubpaths(pythonBuildDir), target.value / "python.zip")
+}
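The new zipPythonFilesTask shells out to grpc_tools.protoc and then stages and zips the Python sources. A rough standalone-shell equivalent, which can be handy for debugging the generation step outside sbt, might look like the following. This is only a sketch: it assumes sourceDirectory resolves to src/ and that the build output directory is target/, and it uses the zip CLI in place of sbt's IO.zip, so the archive layout may differ slightly from what the task produces.

    # Generate Python protobuf/gRPC stubs into src/python/protobuf/
    python3 -m grpc_tools.protoc \
      -I=src/protobuf \
      --python_out=src/python/protobuf \
      --grpc_python_out=src/python/protobuf \
      src/protobuf/client.proto src/protobuf/listener.proto

    # Stage the Python sources and zip them for spark-submit --py-files
    cp -a src/python target/
    (cd target && zip -r python.zip python)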

‎docker/Dockerfile

Lines changed: 2 additions & 0 deletions

@@ -4,6 +4,8 @@ RUN apt-get update && \
     apt-get install -y git wget build-essential openjdk-8-jdk-headless python cmake libssl-dev libmbedtls-dev && \
     rm -rf /var/lib/apt/lists/*
 
+RUN pip3 install grpcio grpcio-tools # Needed for PySpark listener
+
 RUN wget -O sgx_installer.bin https://download.01.org/intel-sgx/linux-2.3.1/ubuntu16.04/sgx_linux_x64_sdk_2.3.101.46683.bin && \
     chmod +x ./sgx_installer.bin && \
     echo $'no\n/usr/local' | ./sgx_installer.bin && \

‎opaque-sql-docs/src/install/install.rst

Lines changed: 1 addition & 0 deletions

@@ -27,6 +27,7 @@ After downloading the Opaque codebase, install necessary dependencies as follows
 
    # For Ubuntu 18.04:
    sudo apt -y install wget build-essential openjdk-8-jdk python libssl-dev libmbedtls-dev
+   pip3 install grpcio grpcio-tools # Needed for PySpark listener
 
    # Install a newer version of CMake (3.15)
    wget https://github.com/Kitware/CMake/releases/download/v3.15.6/cmake-3.15.6-Linux-x86_64.sh

‎opaque-sql-docs/src/usage/client.rst

Lines changed: 26 additions & 8 deletions

@@ -9,36 +9,54 @@ Query submission via the MC\ :sup:`2` Client
 Overview of Opaque SQL running with the MC\ :sup:`2` Client
 
 
-Starting Opaque SQL
-###################
-
-Here we outline how to use `the client <https://github.com/mc2-project/mc2>`_ to submit queries. This is the primary method of running Opaque SQL and integrates with the rest of the MC\ :sup:`2` projects, such as `Secure XGBoost <https://github.com/mc2-project/secure-xgboost>`_.
+In this section, we outline how to use `the client <https://github.com/mc2-project/mc2>`_ for submitting queries to Opaque SQL. This is the primary method of running Opaque SQL and integrates with the rest of the MC\ :sup:`2` projects, such as `Secure XGBoost <https://github.com/mc2-project/secure-xgboost>`_.
 
 .. warning::
    Note that, by default, Opaque SQL uses a pre-generated RSA private key for enclave signing located at ``${OPAQUE_HOME}/src/test/keys/mc2_test_key.pem``. This key **should not** be used in a production environment. This can be reconfigured by changing the environment variable ``PRIVATE_KEY_PATH`` to another key file and having the MC\ :sup:`2` Client use its public peer to verify the signature.
 
 Running in this mode enables the driver to be located on *untrusted* hardware while still providing the complete security guarantees of the MC\ :sup:`2` platform.
 
 Starting the gRPC Listeners
-***************************
+###########################
 
 Opaque SQL uses two listeners: the first for remote attestation on port 50051, and the second for query requests on port 50052. The MC\ :sup:`2` Client uses these by default to connect to the Opaque SQL driver.
 
+Opaque SQL supports running both regular (Scala) Spark as well as PySpark.
+
+Scala Instructions
+******************
+
 1. To run both listeners locally using ``sbt``:
 
 .. code-block:: bash
 
    build/sbt run
 
-2. To launch the listener on a standalone Spark cluster:
+2. To launch the listeners on a standalone Spark cluster:
 
 .. code-block:: bash
 
    build/sbt assembly # create a fat jar with all necessary dependencies
 
    spark-submit --class edu.berkeley.cs.rise.opaque.rpc.Listener \
-     <Spark configuration parameters> \
-     --deploy-mode client ${OPAQUE_HOME}/target/scala-2.12/opaque-assembly-0.1.jar
+      <Spark configuration parameters> \
+      --deploy-mode client ${OPAQUE_HOME}/target/scala-2.12/opaque-assembly-0.1.jar
+
+Python Instructions
+*******************
+
+To launch the listeners on a standalone Spark cluster:
+
+.. code-block:: bash
+
+   build/sbt assembly # create a fat jar with all necessary dependencies
+
+   spark-submit --class edu.berkeley.cs.rise.opaque.rpc.Listener \
+      <Spark configuration parameters> \
+      --deploy-mode client \
+      --jars ${OPAQUE_HOME}/target/scala-2.12/opaque-assembly-0.1.jar \
+      --py-files ${OPAQUE_HOME}/target/python.zip \
+      ${OPAQUE_HOME}/target/python/listener.py \
 
 Using the MC\ :sup:`2` Client
 #############################
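Once the listener has been submitted, a quick way to confirm that the two gRPC endpoints described in these docs are accepting connections is a zero-I/O netcat probe against the documented ports. This is a hypothetical sanity check, not part of the committed docs, and it assumes the Opaque SQL driver is running on localhost.

    # Attestation listener (50051) and query listener (50052)
    nc -z localhost 50051 && echo "attestation listener up"
    nc -z localhost 50052 && echo "query listener up"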

‎opaque-sql-docs/src/usage/untrusted.rst

Lines changed: 1 addition & 1 deletion

@@ -78,7 +78,7 @@ Encrypting a DataFrame with the Driver
 ######################################
 
 .. note::
-   The methods shown in this section are *only* supported in insecure mode.
+   The Opaque SQL methods shown in this section are *only* supported in insecure mode, since the driver needs the key for encryption/decryption.
 
 1. Create an unencrypted DataFrame.
 

‎opaque-sql-docs/src/usage/usage.rst

Lines changed: 0 additions & 3 deletions

@@ -4,9 +4,6 @@ Usage
 
 This section goes over how to manipulate an encrypted DataFrame in either client or insecure mode.
 
-.. note::
-   Currently, Python is only supported while running in **insecure** mode.
-
 .. _save_df:
 
 Saving a DataFrame

‎scripts/install.sh

Lines changed: 1 addition & 0 deletions

@@ -26,6 +26,7 @@ sudo apt -y install clang-8 libssl-dev gdb libsgx-enclave-common libsgx-quote-ex
 
 # Install SBT dependencies
 sudo apt -y install wget build-essential openjdk-8-jdk python libssl-dev libmbedtls-dev
+pip3 install grpcio grpcio-tools # Needed for Pyspark listener
 
 # Install a newer version of CMake (3.15)
 wget https://github.com/Kitware/CMake/releases/download/v3.15.6/cmake-3.15.6-Linux-x86_64.sh

‎src/main/resources/log4j.properties

Lines changed: 1 addition & 1 deletion

@@ -1,5 +1,5 @@
 # Set everything to be logged to the console
-log4j.rootCategory=INFO, console
+log4j.rootCategory=WARN, console
 log4j.appender.console=org.apache.log4j.ConsoleAppender
 log4j.appender.console.target=System.out
 log4j.appender.console.layout=org.apache.log4j.PatternLayout
