Stack Overflow
0 votes
0 answers
84 views

I use Spark + Hudi to write data into S3. I was writing data in bulk_insert mode, which left many small Parquet files in the Hudi table. Then I tried to schedule clustering on the Hudi table: ...
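
For reference, a minimal PySpark sketch of scheduling inline clustering to merge such small files; the table name, path, thresholds, and the df DataFrame below are placeholders to adapt, not values from the question.

    # Inline clustering: Hudi schedules and runs a clustering plan as part of the write.
    clustering_options = {
        'hoodie.table.name': 'my_table',  # placeholder table name
        'hoodie.datasource.write.operation': 'upsert',
        'hoodie.clustering.inline': 'true',
        'hoodie.clustering.inline.max.commits': '4',  # cluster after every 4 commits
        # Files under 100 MB are candidates; aim for ~1 GB output files.
        'hoodie.clustering.plan.strategy.small.file.limit': str(100 * 1024 * 1024),
        'hoodie.clustering.plan.strategy.target.file.max.bytes': str(1024 * 1024 * 1024),
    }
    df.write.format('hudi').options(**clustering_options).mode('append').save('s3://bucket/my_table')
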
0 votes
0 answers
69 views

I'm using Flink CDC + Apache Hudi in Flink to sync data from MySQL to AWS S3. My Flink job looks like: parallelism = 1 env = StreamExecutionEnvironment.get_execution_environment(config) ...
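
For context, a minimal PyFlink sketch of such a MySQL-to-Hudi pipeline; every hostname, credential, schema, and path is a placeholder, and it assumes the mysql-cdc and hudi connector jars are on the classpath.

    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.table import StreamTableEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    env.set_parallelism(1)
    env.enable_checkpointing(60000)  # Hudi commits on checkpoints, so checkpointing must be on
    t_env = StreamTableEnvironment.create(env)

    # MySQL CDC source; all connection values are placeholders.
    t_env.execute_sql("""
        CREATE TABLE users_source (
            id BIGINT,
            name STRING,
            PRIMARY KEY (id) NOT ENFORCED
        ) WITH (
            'connector' = 'mysql-cdc',
            'hostname' = 'mysql-host',
            'port' = '3306',
            'username' = 'user',
            'password' = 'pass',
            'database-name' = 'mydb',
            'table-name' = 'users'
        )
    """)

    # Hudi sink on S3; MERGE_ON_READ is the usual table type for CDC streams.
    t_env.execute_sql("""
        CREATE TABLE users_hudi (
            id BIGINT,
            name STRING,
            PRIMARY KEY (id) NOT ENFORCED
        ) WITH (
            'connector' = 'hudi',
            'path' = 's3a://bucket/users_hudi',
            'table.type' = 'MERGE_ON_READ'
        )
    """)

    t_env.execute_sql("INSERT INTO users_hudi SELECT * FROM users_source")
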
0 votes
1 answer
65 views

I was using Flink in batch mode to read data from one source and then write it directly to the file system in Parquet format. The code was like: hudi_source_ddl = f""" ...
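
A minimal PyFlink batch sketch of writing straight to Parquet with the filesystem connector; the datagen source and the local output path stand in for the question's actual source and S3 location.

    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())

    # Stand-in source; replace with the real source DDL.
    t_env.execute_sql("""
        CREATE TABLE some_source (
            id BIGINT,
            name STRING
        ) WITH (
            'connector' = 'datagen',
            'number-of-rows' = '10'
        )
    """)

    # Filesystem sink writing Parquet; the path is a placeholder.
    t_env.execute_sql("""
        CREATE TABLE parquet_sink (
            id BIGINT,
            name STRING
        ) WITH (
            'connector' = 'filesystem',
            'path' = 'file:///tmp/out',
            'format' = 'parquet'
        )
    """)

    t_env.execute_sql("INSERT INTO parquet_sink SELECT id, name FROM some_source").wait()
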
0 votes
1 answer
63 views

I am using EMR 6.15 and Hudi 0.14. I submitted the following Hudi job, which should create a database and a table in AWS Glue. The IAM role assigned to EMR Serverless has all necessary permissions for S3 and ...
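
A hedged sketch of the Hudi hive-sync options typically involved when the table should land in AWS Glue; the database, table, path, and the df DataFrame are placeholders, and the Glue client factory is the usual EMR-side Spark conf.

    # Hive sync in 'hms' mode talks to the metastore, which EMR points at Glue when
    # spark.hadoop.hive.metastore.client.factory.class is set to
    # com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory
    hudi_options = {
        'hoodie.table.name': 'my_table',
        'hoodie.datasource.hive_sync.enable': 'true',
        'hoodie.datasource.hive_sync.mode': 'hms',
        'hoodie.datasource.hive_sync.database': 'my_glue_db',  # placeholder Glue database
        'hoodie.datasource.hive_sync.table': 'my_table',       # placeholder Glue table
    }
    df.write.format('hudi').options(**hudi_options).mode('append').save('s3://bucket/my_table')
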
0 votes
1 answer
52 views

I get this error when I try to execute Spark SQL: Caused by: org.apache.spark.sql.AnalysisException: [NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT] CREATE Hive TABLE (AS SELECT) is not supported, if ...
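
That error generally means the SparkSession was built without Hive support; a minimal sketch of enabling it (the app name is a placeholder).

    from pyspark.sql import SparkSession

    # enableHiveSupport() sets spark.sql.catalogImplementation=hive, which
    # CREATE TABLE ... AS SELECT against Hive-format tables requires.
    spark = (
        SparkSession.builder
        .appName('hudi-sql')
        .enableHiveSupport()
        .getOrCreate()
    )
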
0 votes
1 answer
100 views

I'm trying to use Flink CDC to capture data changes from MySQL and update the Hudi table in S3. My PyFlink job looks like: env = StreamExecutionEnvironment.get_execution_environment(config) env....
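
Complementing the CDC sketch above, a hedged fragment of the Hudi sink side of such a job; the schema, path, and option values are placeholders.

    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.table import StreamTableEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    env.enable_checkpointing(60000)  # Hudi only commits on checkpoints
    t_env = StreamTableEnvironment.create(env)

    # Sink tuned for a parallelism-1 CDC stream; async compaction keeps
    # delta-log merging off the ingestion path.
    t_env.execute_sql("""
        CREATE TABLE orders_hudi (
            order_id BIGINT,
            status STRING,
            PRIMARY KEY (order_id) NOT ENFORCED
        ) WITH (
            'connector' = 'hudi',
            'path' = 's3a://bucket/orders_hudi',
            'table.type' = 'MERGE_ON_READ',
            'write.tasks' = '1',
            'compaction.async.enabled' = 'true'
        )
    """)
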
0 votes
1 answer
139 views

I run Flink in Docker in my local environment, and I am trying to write a Flink job that uses CDC to sync MySQL data to S3 (stored in Apache Hudi format). My Flink job looks like: t_env = StreamTableEnvironment....
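
In a local Docker setup the S3 endpoint and credentials usually come from flink-conf.yaml; as a hedged alternative, they can be passed to the local environment programmatically. Every value below is a placeholder, and whether programmatic settings take effect depends on the Flink version and the S3 filesystem plugin in use.

    from pyflink.common import Configuration
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.table import StreamTableEnvironment

    config = Configuration()
    config.set_string('s3.endpoint', 'http://localhost:9000')  # placeholder endpoint
    config.set_string('s3.access-key', 'ACCESS_KEY')           # placeholder credential
    config.set_string('s3.secret-key', 'SECRET_KEY')           # placeholder credential
    config.set_string('s3.path.style.access', 'true')          # often needed off AWS

    env = StreamExecutionEnvironment.get_execution_environment(config)
    env.enable_checkpointing(30000)  # Hudi needs checkpoints to commit
    t_env = StreamTableEnvironment.create(env)
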
0 votes
0 answers
57 views

I have written a pipeline that sinks data from Kafka to Hudi on S3. It works, but compaction is very slow. It is a batch job that runs every hour and sinks the last hour's data to ...
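
A hedged sketch of the compaction knobs usually examined first for a slow MERGE_ON_READ pipeline; the values are illustrative starting points, not recommendations.

    # Move compaction execution out of the hourly write path: only schedule
    # plans inline and run them separately (e.g., with Hudi's offline compactor).
    compaction_options = {
        'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
        'hoodie.compact.inline': 'false',
        'hoodie.compact.schedule.inline': 'true',
        'hoodie.compact.inline.max.delta.commits': '5',  # compact roughly every 5 delta commits
    }
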
0 votes
0 answers
75 views

I have a use case with two sinks. Sink(S1) -> I have written a Spark job that sinks data from OpenSearch to S3. Sink(S2) -> I have another job that sinks data from Kafka to S3 into the ...
1 vote
0 answers
21 views

This exception is thrown for VarScoreData, which is a case class. Code: case class VarScoreData(part: String, day: String, tel: String, var_array: Array[Double], score_array: Array[Double]) ...
-1 votes
1 answer
53 views

Question: I am working with Apache Flink (Flink SQL) to manage Hudi tables, and I noticed that Hudi supports multiple index types. According to the official documentation on Index Types in Hudi, these ...
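
For illustration, a hedged Flink SQL fragment selecting the bucket index on a Hudi table; the schema, path, and bucket count are placeholders.

    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

    # 'index.type' = 'BUCKET' hashes records into a fixed number of buckets
    # instead of tracking keys in Flink state.
    t_env.execute_sql("""
        CREATE TABLE events_hudi (
            id BIGINT,
            payload STRING,
            PRIMARY KEY (id) NOT ENFORCED
        ) WITH (
            'connector' = 'hudi',
            'path' = 's3a://bucket/events_hudi',
            'index.type' = 'BUCKET',
            'hoodie.bucket.index.num.buckets' = '16'
        )
    """)
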
1 vote
1 answer
148 views

When PySpark is used to write data to the Hudi table, with options as follows: hudi_options = { 'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator', ...
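
For context, a hedged sketch of a complete options dict around ComplexKeyGenerator, which expects a comma-separated list of record key fields; every field name, the table name, and the df DataFrame are placeholders.

    hudi_options = {
        'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
        'hoodie.datasource.write.recordkey.field': 'id,region',  # composite record key
        'hoodie.datasource.write.partitionpath.field': 'dt',
        'hoodie.datasource.write.precombine.field': 'ts',
        'hoodie.table.name': 'my_table',
    }
    df.write.format('hudi').options(**hudi_options).mode('append').save('s3://bucket/my_table')
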
1 vote
0 answers
38 views

We have data written to S3 in Hudi format with a dt partition. Recently, we started receiving very large numbers for some columns stored as a long datatype. These numbers exceeded the maximum limit of the ...
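
One hedged way to sidestep the long overflow is to widen such columns to decimal before writing; this sketch assumes the oversized values still arrive as strings in a hypothetical big_col column.

    from pyspark.sql.functions import col
    from pyspark.sql.types import DecimalType

    # DecimalType(38, 0) holds integers far beyond the 64-bit long maximum (2^63 - 1).
    df = df.withColumn('big_col', col('big_col').cast(DecimalType(38, 0)))
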
0 votes
1 answer
148 views

It's not really clear to me how Hudi ensures efficient snapshot queries (see https://hudi.apache.org/docs/next/table_types/). What I see in the .hoodie folder is just a timeline consisting of lots ...
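
For reference, the query type can also be requested explicitly when reading; a minimal PySpark sketch with a placeholder path (snapshot is already the default read type for Hudi).

    snapshot_df = (
        spark.read.format('hudi')
        .option('hoodie.datasource.query.type', 'snapshot')
        .load('s3://bucket/my_table')  # placeholder table path
    )
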
0 votes
1 answer
169 views

Using EMR 7.2: spark-sql (default)> ALTER TABLE account RENAME TO accountinfo; 24/08/14 02:31:04 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager ...
