Gluten Flink Pending tasks #12164
Currently, we support the nexmark sql tasks q0 through q22 in gluten flink, but some operators still need to be refined. We plan to prioritize support for stateless operators first, then add support for stateful operators.
Stateless operators
- Connector support: kafka, pulsar, hdfs
  - The kafka ([Flink][Feature]Support kafka connector bigo-sg/velox#2) and hdfs ([FLINK]Support filesystem sink bigo-sg/velox#6) connectors are already implemented; they need more tests and refinement.
  - The pulsar connector is not implemented yet.
- Format support: json, pb, parquet, orc
  - The json format is done; pb, parquet, and orc are not implemented yet.
- State management: pulsar/kafka offset, hive write part counter, ...
- Checkpoint: periodic snapshot and restore
  - The interfaces have been defined.
- Failover: exceptions or errors need to be caught by velox, and flink controls the failover process.
- Support multiple parallelism
  - Currently only a parallelism of 1 is supported.
- Basic data types
  - Primitive and complex types already supported: Boolean, Int, Bigint, Double, Varchar, Char, Timestamp, Decimal, Date, Row, Array, Map.
  - Types not fully supported: Timestamp with precision or time zone, e.g. timestamp(n), timestamp_ltz.
- Expression support
  - FROM_UNIXTIME, DATE_FORMAT, UNIX_TIMESTAMP, HOUR, DAY, YEAR, TO_TIMESTAMP, PROCTIME, unix_timestamp, concat, Json_Value, TO_DATE, TO_TIMESTAMP_LTZ, SUBSTRING, CURRENT_DATE, FROM_BASE64, COALESCE, ifnull, TIMESTAMPADD, LOCALTIMESTAMP
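To make the state management and checkpoint items above more concrete, here is a minimal Python sketch of snapshotting and restoring per-partition source offsets. It is only an illustration under assumptions: `OffsetState`, `advance`, `snapshot`, and `restore` are hypothetical names, not the interfaces already defined in gluten flink or velox, and JSON is an arbitrary serialization choice.

```python
import json
from dataclasses import dataclass, field


@dataclass
class OffsetState:
    """Hypothetical holder for per-partition source offsets (not a real gluten/velox type)."""

    # partition id -> next offset to read
    offsets: dict = field(default_factory=dict)

    def advance(self, partition: int, offset: int) -> None:
        # Record the next offset to read for this partition.
        self.offsets[partition] = offset

    def snapshot(self, checkpoint_id: int) -> bytes:
        # Serialize the offsets together with the checkpoint they belong to.
        payload = {
            "checkpoint_id": checkpoint_id,
            "offsets": {str(p): o for p, o in self.offsets.items()},
        }
        return json.dumps(payload, sort_keys=True).encode("utf-8")

    @classmethod
    def restore(cls, blob: bytes) -> "OffsetState":
        # Rebuild the state from a snapshot taken earlier.
        payload = json.loads(blob.decode("utf-8"))
        return cls({int(p): o for p, o in payload["offsets"].items()})


state = OffsetState()
state.advance(0, 42)
state.advance(1, 7)
blob = state.snapshot(checkpoint_id=1)

state.advance(0, 50)                  # progress made after the checkpoint
restored = OffsetState.restore(blob)  # after a failure, resume from the snapshot
```

After the restore, reading would resume from the offsets recorded at the checkpoint, so the records consumed after the snapshot are read again.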
Stateful operators
- Window aggregate: tumble / hop / session / cumulative
  - Some PRs: [FLINK] Support Processing time window aggregate for nexmark q12 bigo-sg/velox#30 (processing time window), [FLINK]Support event time window KevinyhZou/velox#2 (event time window).
  - Session and cumulative windows still need to be implemented.
- Watermark
  - A basic version has been implemented.
- Retraction semantics
  - This feature, which is required for group aggregate and join, has not been implemented yet. Maybe we should introduce the rowkind, just as defined in flink, and combine it with the row vector.
- Group aggregate
  - A simple version has been implemented.
- TopN rank
  - A simple version has been implemented; it needs more tests and refinement.
- Join: regular join / window join / interval join / temporal join
  - Some PRs: [FLINK] Support filesystem dim table join bigo-sg/velox#16 (temporal join for nexmark q13).
- State: rocksdb state / heap state
  - A simple version of the rocksdb state backend has been implemented: [FLINK]Support rocksdb state bigo-sg/velox#21, yet further refinement is still needed.
  - A simple version of the heap state backend has been introduced in PR [FLINK] Support Processing time window aggregate for nexmark q12 bigo-sg/velox#30.
- Timers: processing time timers and event time timers
  - A basic version has been implemented: [FLINK] Support timer service bigo-sg/velox#29. Some additional work is still needed: timer serialization / stateful storage.
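As a rough illustration of the retraction idea above (a rowkind carried alongside the row vector), here is a small Python sketch. The four kinds and their short strings mirror flink's `RowKind`; everything else, including `apply_changelog` and the columnar layout with a separate `kinds` column, is a hypothetical simplification and not the velox row vector API.

```python
from collections import defaultdict
from enum import Enum


class RowKind(Enum):
    # Same four kinds and short strings as flink's RowKind.
    INSERT = "+I"
    UPDATE_BEFORE = "-U"
    UPDATE_AFTER = "+U"
    DELETE = "-D"


def apply_changelog(sums, counts, kinds, keys, values):
    """Fold one columnar batch into running per-key sums.

    `kinds` is an extra column stored next to the data columns, so each
    row says whether it accumulates into or retracts from the aggregate.
    """
    for kind, key, value in zip(kinds, keys, values):
        if kind in (RowKind.INSERT, RowKind.UPDATE_AFTER):
            sums[key] += value
            counts[key] += 1
        else:
            # UPDATE_BEFORE and DELETE retract a previously accumulated row.
            sums[key] -= value
            counts[key] -= 1
            if counts[key] == 0:
                # No rows left for this key: drop its state.
                del sums[key]
                del counts[key]
    return sums


sums, counts = defaultdict(int), defaultdict(int)
kinds = [RowKind.INSERT, RowKind.INSERT, RowKind.UPDATE_BEFORE,
         RowKind.UPDATE_AFTER, RowKind.DELETE]
keys = ["a", "b", "a", "a", "b"]
values = [10, 5, 10, 12, 5]
result = dict(apply_changelog(sums, counts, kinds, keys, values))
```

In this batch, key `a` is updated from 10 to 12 and key `b` is deleted, so only `a` remains in the aggregate state. A group aggregate or join downstream of an updating input needs this kind of per-row flag to undo earlier results.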
Feature list
- Support parquet format: [FLINK] Support parquet format in hdfs writer #12202
- Support orc format: [FLINK] Support orc format in hdfs writer #12203
- Support pulsar connector: [FLINK] Support pulsar connector #12204
- Kafka / Pulsar offset state management: [FLINK] Kafka/Pulsar offset state management #12205
- Checkpoint snapshot and restore: [FLINK] Checkpoint snapshot and restore #12206
- Support multiple parallelism: [FLINK] Support multiple parallelism #12207
- Failover controlled by Flink: [FLINK] Failover controll by flink #12208
- Data type support: timestamp with precision and timezone: [FLINK] Support timestamp with precision and timezone #12209
- Expressions support: [FLINK] Support regular expressions #12210
Replies: 2 comments
-
Great! We have to make it happen. Let's start with stateless SQL; please open some issues for the stateless computing tasks.
-
bigo-sg/velox4j#35 pulsar connector
support multiple parallelism