πflow is a big data flow engine with Spark support.
piflow is an easy to use, powerful big data pipeline system.

Features

  • Easy to use
    • provides a WYSIWYG web interface to configure data flows
    • monitors data flow status
    • checks the logs of a data flow
    • provides checkpoints
  • Strong scalability
    • supports customized development of data processing components
  • Superior performance
    • based on the distributed computing engine Spark
  • Powerful
    • 100+ data processing components available
    • includes Spark, MLlib, Hadoop, Hive, HBase, Solr, Redis, Memcache, Elasticsearch, JDBC, MongoDB, HTTP, FTP, XML, CSV, JSON, etc.

Architecture

Requirements

  • JDK 1.8 or newer
  • Apache Maven 3.1.0 or newer
  • Git Client (used during build process by 'bower' plugin)
  • Spark-2.1.0
  • Hadoop-2.6.0
  • Hive-1.2.1

Getting Started

To Build: mvn clean package -Dmaven.test.skip=true

 [INFO] Replacing original artifact with shaded artifact.
 [INFO] Replacing /opt/project/piflow/piflow-server/target/piflow-server-0.9.jar with /opt/project/piflow/piflow-server/target/piflow-server-0.9-shaded.jar
 [INFO] ------------------------------------------------------------------------
 [INFO] Reactor Summary:
 [INFO] 
 [INFO] piflow-project ..................................... SUCCESS [ 4.602 s]
 [INFO] piflow-core ........................................ SUCCESS [ 56.533 s]
 [INFO] piflow-bundle ...................................... SUCCESS [02:15 min]
 [INFO] piflow-server ...................................... SUCCESS [03:01 min]
 [INFO] ------------------------------------------------------------------------
 [INFO] BUILD SUCCESS
 [INFO] ------------------------------------------------------------------------
 [INFO] Total time: 06:18 min
 [INFO] Finished at: 2018-12-24T16:54:16+08:00
 [INFO] Final Memory: 41M/812M
 [INFO] ------------------------------------------------------------------------

To Run Piflow Server:

  • run piflow server in IntelliJ:

    • edit config.properties
    • build piflow to generate piflow-server.jar
    • the main class is cn.piflow.api.Main (remember to set SPARK_HOME)
  • run piflow server from a release version

  • how to configure config.properties:

    #server ip and port
    server.ip=10.0.86.191
    server.port=8002
    #h2 db port
    h2.port=50002
    #spark and yarn config
    spark.master=yarn
    spark.deploy.mode=cluster
    yarn.resourcemanager.hostname=10.0.86.191
    yarn.resourcemanager.address=10.0.86.191:8032
    yarn.access.namenode=hdfs://10.0.86.191:9000
    yarn.stagingDir=hdfs://10.0.86.191:9000/tmp/
    yarn.jars=hdfs://10.0.86.191:9000/user/spark/share/lib/*.jar
    yarn.url=http://10.0.86.191:8088/ws/v1/cluster/apps/
    #hive config
    hive.metastore.uris=thrift://10.0.86.191:9083
    #piflow-server.jar path
    piflow.bundle=/opt/piflowServer/piflow-server-0.9.jar
    #checkpoint hdfs path
    checkpoint.path=hdfs://10.0.86.89:9000/piflow/checkpoints/
    #debug path
    debug.path=hdfs://10.0.88.191:9000/piflow/debug/
    #the count of data rows shown in the log
    data.show=10
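The file above is a standard Java key=value properties file. As a minimal sketch (a hypothetical helper, not part of piflow), the following Python snippet parses such a file and checks that a few keys the server obviously needs are present; the set of "required" keys is an assumption, not documented piflow behavior:

```python
# Sketch: parse a piflow-style config.properties and check for required keys.
# Hypothetical helper, not part of piflow; REQUIRED_KEYS is an assumption.

REQUIRED_KEYS = {"server.ip", "server.port", "spark.master", "piflow.bundle"}

def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

config = """
# server ip and port
server.ip=10.0.86.191
server.port=8002
spark.master=yarn
piflow.bundle=/opt/piflowServer/piflow-server-0.9.jar
"""

props = parse_properties(config)
missing = REQUIRED_KEYS - props.keys()
print(sorted(missing))  # → [] when every required key is set
```

A check like this catches a typo in config.properties before the server is launched, which is cheaper than debugging a failed Spark submission.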
    

To Run Piflow Web:

To Use:

  • command line
    • flow config example

      {
        "flow": {
          "name": "test",
          "uuid": "1234",
          "checkpoint": "Merge",
          "stops": [
            {
              "uuid": "1111",
              "name": "XmlParser",
              "bundle": "cn.piflow.bundle.xml.XmlParser",
              "properties": {
                "xmlpath": "hdfs://10.0.86.89:9000/xjzhu/dblp.mini.xml",
                "rowTag": "phdthesis"
              }
            },
            {
              "uuid": "2222",
              "name": "SelectField",
              "bundle": "cn.piflow.bundle.common.SelectField",
              "properties": {
                "schema": "title,author,pages"
              }
            },
            {
              "uuid": "3333",
              "name": "PutHiveStreaming",
              "bundle": "cn.piflow.bundle.hive.PutHiveStreaming",
              "properties": {
                "database": "sparktest",
                "table": "dblp_phdthesis"
              }
            },
            {
              "uuid": "4444",
              "name": "CsvParser",
              "bundle": "cn.piflow.bundle.csv.CsvParser",
              "properties": {
                "csvPath": "hdfs://10.0.86.89:9000/xjzhu/phdthesis.csv",
                "header": "false",
                "delimiter": ",",
                "schema": "title,author,pages"
              }
            },
            {
              "uuid": "555",
              "name": "Merge",
              "bundle": "cn.piflow.bundle.common.Merge",
              "properties": {
                "inports": "data1,data2"
              }
            },
            {
              "uuid": "666",
              "name": "Fork",
              "bundle": "cn.piflow.bundle.common.Fork",
              "properties": {
                "outports": "out1,out2,out3"
              }
            },
            {
              "uuid": "777",
              "name": "JsonSave",
              "bundle": "cn.piflow.bundle.json.JsonSave",
              "properties": {
                "jsonSavePath": "hdfs://10.0.86.89:9000/xjzhu/phdthesis.json"
              }
            },
            {
              "uuid": "888",
              "name": "CsvSave",
              "bundle": "cn.piflow.bundle.csv.CsvSave",
              "properties": {
                "csvSavePath": "hdfs://10.0.86.89:9000/xjzhu/phdthesis_result.csv",
                "header": "true",
                "delimiter": ","
              }
            }
          ],
          "paths": [
            {
              "from": "XmlParser",
              "outport": "",
              "inport": "",
              "to": "SelectField"
            },
            {
              "from": "SelectField",
              "outport": "",
              "inport": "data1",
              "to": "Merge"
            },
            {
              "from": "CsvParser",
              "outport": "",
              "inport": "data2",
              "to": "Merge"
            },
            {
              "from": "Merge",
              "outport": "",
              "inport": "",
              "to": "Fork"
            },
            {
              "from": "Fork",
              "outport": "out1",
              "inport": "",
              "to": "PutHiveStreaming"
            },
            {
              "from": "Fork",
              "outport": "out2",
              "inport": "",
              "to": "JsonSave"
            },
            {
              "from": "Fork",
              "outport": "out3",
              "inport": "",
              "to": "CsvSave"
            }
          ]
        }
      }

    • curl -0 -X POST http://10.0.86.191:8002/flow/start -H "Content-type: application/json" -d 'this is your flow json'

  • piflow web: try it at http://piflow.ml/piflow-web (user/password: admin/admin)
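A flow like the one above is just JSON posted to the /flow/start endpoint. The sketch below builds a smaller two-stop flow (CsvParser to CsvSave) in Python, sanity-checks that every path endpoint names a defined stop, and prepares the POST request. The server address and HDFS paths are placeholders from the example, and the network call itself is left commented out since it needs a live piflow server:

```python
import json
from urllib import request

# Build a minimal two-stop flow mirroring the JSON structure shown above.
# HDFS paths are placeholders; bundle names follow the example flow.
flow = {
    "flow": {
        "name": "demo",
        "uuid": "0001",
        "stops": [
            {"uuid": "1", "name": "CsvParser",
             "bundle": "cn.piflow.bundle.csv.CsvParser",
             "properties": {"csvPath": "hdfs://10.0.86.89:9000/xjzhu/phdthesis.csv",
                            "header": "false", "delimiter": ",",
                            "schema": "title,author,pages"}},
            {"uuid": "2", "name": "CsvSave",
             "bundle": "cn.piflow.bundle.csv.CsvSave",
             "properties": {"csvSavePath": "hdfs://10.0.86.89:9000/xjzhu/out.csv",
                            "header": "true", "delimiter": ","}},
        ],
        "paths": [
            {"from": "CsvParser", "outport": "", "inport": "", "to": "CsvSave"},
        ],
    }
}

# Sanity check: every path endpoint must name a defined stop.
stop_names = {s["name"] for s in flow["flow"]["stops"]}
for p in flow["flow"]["paths"]:
    assert p["from"] in stop_names and p["to"] in stop_names

# Prepare the POST (server address assumed, as in the curl example above).
req = request.Request(
    "http://10.0.86.191:8002/flow/start",
    data=json.dumps(flow).encode("utf-8"),
    headers={"Content-type": "application/json"},
)
# Uncomment against a live piflow server:
# with request.urlopen(req) as resp:
#     print(resp.read().decode())
print(req.get_method())  # → POST
```

Validating the stop/path graph client-side, as the loop above does, surfaces a misspelled stop name before the flow is ever submitted.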
