|
2 | 2 |
|
3 | 3 | General information is in the main [Readme](../readme.md).
|
4 | 4 |
|
5 | | -### Example 1 - Gradient Boosting with H2O.ai for Prediction of Flight Delays |
| 5 | +## Example 1 - Gradient Boosting with H2O.ai for Prediction of Flight Delays |
6 | 6 |
|
7 | | -**Use Case** |
| 7 | +### Use Case |
8 | 8 |
|
9 | 9 | Uses a Gradient Boosting Machine (GBM) to predict flight delays.
|
10 | 10 | An H2O-generated GBM Java model (POJO) is instantiated and used in a Kafka Streams application to do inference on new events.
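Conceptually, a GBM scores an event by adding the outputs of many small decision trees to an initial value and, for a yes/no target such as "delayed", passing that sum through the logistic function. The sketch below illustrates only that idea: `ToyGbm`, its two hand-written trees, the chosen features `DepTime` and `Distance`, and all split points and leaf values are hypothetical. The generated `gbm_pojo_test.java` contains H2O's own, much larger tree code instead.

```java
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Toy gradient-boosted tree ensemble for a binary target (hypothetical values only). */
final class ToyGbm {
    // Initial prediction on the log-odds scale.
    private final double initF;
    // Each tree maps the feature values to a leaf value (learning rate already folded in).
    private final List<ToDoubleFunction<Map<String, Double>>> trees;

    ToyGbm(double initF, List<ToDoubleFunction<Map<String, Double>>> trees) {
        this.initF = initF;
        this.trees = trees;
    }

    /** Returns P(delayed) = sigmoid(initF + sum of all tree outputs). */
    double predictProbability(Map<String, Double> features) {
        double f = initF;
        for (ToDoubleFunction<Map<String, Double>> tree : trees) {
            f += tree.applyAsDouble(features);
        }
        return 1.0 / (1.0 + Math.exp(-f));
    }

    /** Two hand-written depth-1 trees ("stumps"); assumes both features are present. */
    static ToyGbm example() {
        List<ToDoubleFunction<Map<String, Double>>> trees = List.of(
            x -> x.get("DepTime") > 1700 ? 0.4 : -0.2,   // late departures push towards "delayed"
            x -> x.get("Distance") > 1000 ? 0.1 : -0.1); // long flights push slightly towards "delayed"
        return new ToyGbm(0.0, trees);
    }
}
```

A real ensemble differs mainly in scale (hundreds of deeper trees learned from data), not in how the final score is combined.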
|
11 | 11 |
|
12 | | -**Machine Learning Technology** |
| 12 | +### Machine Learning Technology |
13 | 13 |
|
14 | 14 | * [H2O](https://www.h2o.ai)
|
15 | 15 | * Check the [H2O demo](https://github.com/h2oai/h2o-2/wiki/Hacking-Airline-DataSet-with-H2O) to understand the test and how the model was built
|
16 | 16 | * You can reuse the generated Java model included in this project ([gbm_pojo_test.java](src/main/java/com/github/megachucky/kafka/streams/machinelearning/models/gbm_pojo_test.java)) or build your own model using R, Python, Flow UI or any other technology supported by the H2O framework.
|
17 | 17 |
|
18 | | -**Source Code** |
| 18 | +### Source Code |
19 | 19 |
|
| 20 | +Business Logic (applying the analytic model to do the prediction): |
| 21 | +[Kafka_Streams_MachineLearning_H2O_Application.java](src/main/java/com/github/megachucky/kafka/streams/machinelearning/Kafka_Streams_MachineLearning_H2O_Application.java) |
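The business logic boils down to a per-record function: take the CSV value read from `AirlineInputTopic`, ask the model for a prediction and emit a result to `AirlineOutputTopic`. The sketch below shows that shape in plain Java. `DelayModel`, `PredictionMapper`, the threshold parameter and the `delayed=... p=...` output format are hypothetical and are not taken from `Kafka_Streams_MachineLearning_H2O_Application.java`.

```java
import java.util.Locale;
import java.util.function.Function;

/** Stand-in for the H2O model wrapper (hypothetical interface). */
interface DelayModel {
    /** Returns the probability that the flight described by the CSV fields is delayed. */
    double probabilityDelayed(String[] fields);
}

/** Per-record logic: CSV value in, prediction string out. */
final class PredictionMapper implements Function<String, String> {
    private final DelayModel model;
    private final double threshold;

    PredictionMapper(DelayModel model, double threshold) {
        this.model = model;
        this.threshold = threshold;
    }

    @Override
    public String apply(String csvValue) {
        // The limit -1 keeps trailing empty fields, so column positions stay stable.
        String[] fields = csvValue.split(",", -1);
        double p = model.probabilityDelayed(fields);
        String label = p >= threshold ? "YES" : "NO";
        return String.format(Locale.ROOT, "delayed=%s p=%.2f", label, p);
    }
}
```

For example, `new PredictionMapper(model, 0.5)` labels a record as delayed when the model's probability is at least 0.5. In a Kafka Streams topology, such a function could be applied with `KStream#mapValues`, e.g. `stream.mapValues(mapper::apply)`.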
| 22 | + |
| 23 | +Specification of the used model: |
20 | 24 | [Kafka_Streams_MachineLearning_H2O_GBM_Example.java](src/main/java/com/github/megachucky/kafka/streams/machinelearning/Kafka_Streams_MachineLearning_H2O_GBM_Example.java)
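Keeping the model specification in its own class lets the same application logic run different models. One way to implement such a split (a sketch under assumptions, not necessarily how this project does it) is to pass the model's fully qualified class name to the generic application and instantiate it via reflection. `ScoringModel`, `ModelLoader` and `ConstantModel` are hypothetical names.

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical common type that every pluggable model class implements. */
interface ScoringModel extends ToDoubleFunction<String[]> { }

/** Instantiates a model class from its fully qualified (binary) class name. */
final class ModelLoader {
    private ModelLoader() { }

    static ScoringModel load(String className) {
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            if (instance instanceof ScoringModel) {
                return (ScoringModel) instance;
            }
            throw new IllegalArgumentException(className + " does not implement ScoringModel");
        } catch (ReflectiveOperationException e) {
            // Missing class, missing no-arg constructor, or constructor failure.
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}

/** Trivial example model that could be named in configuration. */
final class ConstantModel implements ScoringModel {
    @Override
    public double applyAsDouble(String[] fields) {
        return 0.5;
    }
}
```

Generated H2O POJOs extend `hex.genmodel.GenModel`, so a real loader would cast to that type rather than to the hypothetical `ScoringModel` interface.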
|
21 | | -->Logic in [Kafka_Streams_MachineLearning_H2O_Application.java](src/main/java/com/github/megachucky/kafka/streams/machinelearning/Kafka_Streams_MachineLearning_H2O_Application.java) |
22 | 25 |
|
23 | | -**Unit Test** |
| 26 | +### Automated Tests |
24 | 27 |
|
| 28 | +Unit Test using TopologyTestDriver: |
25 | 29 | [Kafka_Streams_MachineLearning_H2O_GBM_ExampleTest.java](src/test/java/com/github/megachucky/kafka/streams/machinelearning/Kafka_Streams_MachineLearning_H2O_GBM_ExampleTest.java)
|
26 | | -[Kafka_Streams_MachineLearning_H2O_GBM_Example_IntegrationTest.java](src/test/java/com/github/megachucky/kafka/streams/machinelearning/test/Kafka_Streams_MachineLearning_H2O_GBM_Example_IntegrationTest.java) |
27 | 30 |
|
28 | | -**Manual Testing** |
| 31 | +Integration Test using EmbeddedKafkaCluster: |
| 32 | +[Kafka_Streams_MachineLearning_H2O_GBM_Example_IntegrationTest.java](src/test/java/com/github/megachucky/kafka/streams/machinelearning/Kafka_Streams_MachineLearning_H2O_GBM_Example_IntegrationTest.java) |
| 33 | + |
| 34 | +### Manual Testing |
29 | 35 |
|
30 | 36 | You can easily test this yourself. Here are the steps:
|
31 | | -- Start Kafka, e.g. with Confluent CLI: |
| 37 | + |
| 38 | +* Start Kafka, e.g. with Confluent CLI: |
32 | 39 |
|
33 | 40 | confluent start kafka
|
34 | | -- Create topics AirlineInputTopic and AirlineOutputTopic |
| 41 | +* Create topics AirlineInputTopic and AirlineOutputTopic |
35 | 42 |
|
36 | 43 | kafka-topics --zookeeper localhost:2181 --create --topic AirlineInputTopic --partitions 3 --replication-factor 1
|
37 | 44 |
|
38 | 45 | kafka-topics --zookeeper localhost:2181 --create --topic AirlineOutputTopic --partitions 3 --replication-factor 1
|
39 | | -- Start the Kafka Streams app: |
| 46 | +* Start the Kafka Streams app: |
40 | 47 |
|
41 | | - java -cp target/h2o-gbm-CP51_AK21-jar-with-dependencies.jar com.github.megachucky.kafka.streams.machinelearning.Kafka_Streams_MachineLearning_H2O_GBM_Example |
42 | | -- Send messages, e.g. with kafkacat: |
| 48 | + java -cp h2o-gbm/target/h2o-gbm-CP51_AK21-jar-with-dependencies.jar com.github.megachucky.kafka.streams.machinelearning.Kafka_Streams_MachineLearning_H2O_GBM_Example |
| 49 | +* Send messages, e.g. with kafkacat: |
43 | 50 |
|
44 | 51 | echo -e "1987,10,14,3,741,730,912,849,PS,1451,NA,91,79,NA,23,11,SAN,SFO,447,NA,NA,0,NA,0,NA,NA,NA,NA,NA,YES,YES" | kafkacat -b localhost:9092 -P -t AirlineInputTopic
|
45 | | -- Consume predictions: |
| 52 | +* Consume predictions: |
46 | 53 |
|
47 | 54 | kafka-console-consumer --bootstrap-server localhost:9092 --topic AirlineOutputTopic --from-beginning
|
48 | | -- Find more details in the unit test... |
| 55 | +* See the unit test for more details. |
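Each test message is a single comma-separated airline record, so when crafting your own messages it helps to see which value lands in which column. The sketch below assumes the 31-column order of the public H2O airline dataset (`Year`, `Month`, ..., `IsArrDelayed`, `IsDepDelayed`) and treats `NA` as a missing value; `AirlineCsv` is a hypothetical helper, not part of the project.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that maps one airline CSV record to named columns. */
final class AirlineCsv {
    // Column order of the public H2O airline dataset (assumed to match the input messages).
    static final List<String> COLUMNS = List.of(
        "Year", "Month", "DayofMonth", "DayOfWeek", "DepTime", "CRSDepTime",
        "ArrTime", "CRSArrTime", "UniqueCarrier", "FlightNum", "TailNum",
        "ActualElapsedTime", "CRSElapsedTime", "AirTime", "ArrDelay", "DepDelay",
        "Origin", "Dest", "Distance", "TaxiIn", "TaxiOut", "Cancelled",
        "CancellationCode", "Diverted", "CarrierDelay", "WeatherDelay", "NASDelay",
        "SecurityDelay", "LateAircraftDelay", "IsArrDelayed", "IsDepDelayed");

    private AirlineCsv() { }

    /** Returns column name -> value in column order; "NA" cells are left out as missing. */
    static Map<String, String> parse(String line) {
        String[] values = line.split(",", -1);
        if (values.length != COLUMNS.size()) {
            throw new IllegalArgumentException(
                "Expected " + COLUMNS.size() + " fields, got " + values.length);
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            String value = values[i].trim();
            if (!value.equals("NA")) {
                row.put(COLUMNS.get(i), value);
            }
        }
        return row;
    }
}
```

Applied to the kafkacat message above, `parse` yields `Origin=SAN`, `Dest=SFO` and `DepDelay=11`, and leaves out the ten `NA` cells.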
49 | 56 |
|
50 | | - |
51 | | -**H2O Deep Learning instead of H2O GBM Model** |
| 57 | +## H2O Deep Learning instead of H2O GBM Model |
52 | 58 |
|
53 | 59 | The project includes another example with similar code that uses an [H2O Deep Learning model](src/main/java/com/github/megachucky/kafka/streams/machinelearning/models/deeplearning_fe7c1f02_08ec_4070_b784_c2531147e451.java) instead of the H2O GBM model: [Kafka_Streams_MachineLearning_H2O_DeepLearning_Example_IntegrationTest.java](src/test/java/com/github/megachucky/kafka/streams/machinelearning/Kafka_Streams_MachineLearning_H2O_DeepLearning_Example_IntegrationTest.java)
|
54 | 60 | This shows how you can easily test or replace different analytic models for one use case, or even use them for A/B testing.
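Because the GBM and Deep Learning examples share the same application logic, swapping models mostly comes down to which model gets called. For A/B testing, one common approach (assumed here, not taken from this project) is to route each record by a stable hash of its key, so a given key always reaches the same model. The models are represented by `ToDoubleFunction<String[]>` stand-ins; `AbRouter` and its percentage split are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.ToDoubleFunction;
import java.util.zip.CRC32;

/** Hypothetical router that sends a fixed share of keys to model B, the rest to model A. */
final class AbRouter {
    private final ToDoubleFunction<String[]> modelA;
    private final ToDoubleFunction<String[]> modelB;
    private final int percentToB; // share of keys (0..100) that go to model B

    AbRouter(ToDoubleFunction<String[]> modelA, ToDoubleFunction<String[]> modelB, int percentToB) {
        if (percentToB < 0 || percentToB > 100) {
            throw new IllegalArgumentException("percentToB must be in [0, 100]");
        }
        this.modelA = modelA;
        this.modelB = modelB;
        this.percentToB = percentToB;
    }

    /** Deterministic bucket in [0, 100): the same key always lands in the same bucket. */
    static int bucket(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }

    boolean usesB(String key) {
        return bucket(key) < percentToB;
    }

    /** Scores the record with whichever model the key is assigned to. */
    double score(String key, String[] fields) {
        return usesB(key) ? modelB.applyAsDouble(fields) : modelA.applyAsDouble(fields);
    }
}
```

Hashing the key instead of picking a model at random keeps assignments reproducible across restarts, which makes the two models' results easier to compare.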
|
55 | 61 |
|
56 | | -**Source Code** |
| 62 | +### Source Code |
| 63 | + |
| 64 | +Business Logic (applying the analytic model to do the prediction): |
| 65 | +[Kafka_Streams_MachineLearning_H2O_Application.java](src/main/java/com/github/megachucky/kafka/streams/machinelearning/Kafka_Streams_MachineLearning_H2O_Application.java) |
57 | 66 |
|
| 67 | +Specification of the used model: |
58 | 68 | [Kafka_Streams_MachineLearning_H2O_DeepLearning_Example.java](src/main/java/com/github/megachucky/kafka/streams/machinelearning/Kafka_Streams_MachineLearning_H2O_DeepLearning_Example.java)
|
59 | | -->Logic in [Kafka_Streams_MachineLearning_H2O_Application.java](src/main/java/com/github/megachucky/kafka/streams/machinelearning/Kafka_Streams_MachineLearning_H2O_Application.java) |
60 | 69 |
|
61 | | -**Unit Test** |
| 70 | +### Automated Tests |
62 | 71 |
|
| 72 | +Unit Test using TopologyTestDriver: |
63 | 73 | [Kafka_Streams_MachineLearning_H2O_DeepLearning_ExampleTest.java](src/test/java/com/github/megachucky/kafka/streams/machinelearning/Kafka_Streams_MachineLearning_H2O_DeepLearning_ExampleTest.java)
|
64 | | -[Kafka_Streams_MachineLearning_H2O_DeepLearning_Example_IntegrationTest.java](src/test/java/com/github/megachucky/kafka/streams/machinelearning/test/Kafka_Streams_MachineLearning_H2O_DeepLearning_Example_IntegrationTest.java) |
65 | 74 |
|
| 75 | +Integration Test using EmbeddedKafkaCluster: |
| 76 | +[Kafka_Streams_MachineLearning_H2O_DeepLearning_Example_IntegrationTest.java](src/test/java/com/github/megachucky/kafka/streams/machinelearning/Kafka_Streams_MachineLearning_H2O_DeepLearning_Example_IntegrationTest.java) |
66 | 77 |
|
67 | | -**Manual Testing** |
| 78 | +### Manual Testing |
68 | 79 |
|
69 | 80 | Same as above, but start the app with the Deep Learning example class:
|
70 | 81 |
|
71 | | -- Start the Kafka Streams app: |
72 | | - |
73 | | - java -cp target/h2o-gbm-CP51_AK21-jar-with-dependencies.jar com.github.megachucky.kafka.streams.machinelearning.Kafka_Streams_MachineLearning_H2O_DeepLearning_Example |
74 | | - |
75 | | - |
76 | | - |
| 82 | +* Start the Kafka Streams app: |
77 | 83 |
|
| 84 | + java -cp h2o-gbm/target/h2o-gbm-CP51_AK21-jar-with-dependencies.jar com.github.megachucky.kafka.streams.machinelearning.Kafka_Streams_MachineLearning_H2O_DeepLearning_Example |