Edwin Chan edited this page Sep 5, 2022 · 12 revisions

Development Guide

Getting Started on Spark development (spark-branch!)

  1. Install the system requirements (Spark, Java, Anaconda)

Mac

Linux or Windows with WSL

export SPARK_VERSION=3.2.0
export SPARK_DIRECTORY=/opt/spark
export HADOOP_VERSION=2.7
mkdir -p ${SPARK_DIRECTORY}
sudo apt-get update
sudo apt-get -y install openjdk-8-jdk
curl https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz \
--output ${SPARK_DIRECTORY}/spark.tgz
cd ${SPARK_DIRECTORY} && tar -xvzf spark.tgz && mv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION} spark
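
After the commands above, a quick sanity check can confirm that the pieces landed where expected. The sketch below is not part of the repository; `check_spark_install` is a hypothetical helper, and it assumes the default `SPARK_DIRECTORY=/opt/spark` with the unpacked distribution renamed to `spark` as in the last command.

```python
import shutil
from pathlib import Path


def check_spark_install(spark_directory="/opt/spark"):
    """Report which pieces of the step 1 setup appear to be in place.

    Hypothetical helper: spark_directory mirrors SPARK_DIRECTORY from the
    commands above, and the distribution is assumed to live in its
    "spark" subfolder.
    """
    spark_home = Path(spark_directory) / "spark"
    return {
        # shutil.which returns None when the executable is not on PATH
        "java_on_path": shutil.which("java") is not None,
        "spark_home_exists": spark_home.is_dir(),
        # spark-submit ships in the bin/ folder of the binary distribution
        "spark_submit_present": (spark_home / "bin" / "spark-submit").is_file(),
    }


if __name__ == "__main__":
    for name, ok in check_spark_install().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

If any entry prints `missing`, re-running the corresponding command from the list above is a reasonable first step.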
  2. Install the Python library requirements in a conda env. Pull [spark-branch](https://github.com/ydataai/pandas-profiling/tree/spark-branch) and run:
conda env create -f venv/spark.yml

This creates a conda env for Spark called spark-env, with all requirements installed.

Then activate the environment using:

source activate spark-env
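
To confirm that the activation took effect, a small check along these lines may help. It is only a sketch: `check_python_env` is a hypothetical helper, and it assumes that `pyspark` is among the requirements listed in `venv/spark.yml`, which this page does not state explicitly.

```python
import importlib.util
import os


def check_python_env(expected_env="spark-env", environ=None):
    """Report whether the conda env from this step looks active and usable.

    Hypothetical helper: environ defaults to the real process environment
    and can be replaced by a plain dict for testing.
    """
    environ = os.environ if environ is None else environ
    return {
        # conda sets CONDA_DEFAULT_ENV to the name (or path) of the active env
        "env_active": environ.get("CONDA_DEFAULT_ENV") == expected_env,
        # find_spec returns None when a top-level package cannot be found
        # (assumption: pyspark is installed by venv/spark.yml)
        "pyspark_importable": importlib.util.find_spec("pyspark") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_python_env().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

Run it with the interpreter from the activated environment, since the `pyspark_importable` result reflects whichever Python executes the script.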
  3. Finally, run the example script, which should execute and produce a profiling report for some Spark data:
tests/backends/spark_backend/example.py

Don't worry about any errors you see for now, as long as the report builds properly.
