So with starrocks I have a pipe that looks for new files in my s3 bucket, now I would like to have this file in a temporary table which would get overwritten with the latest file that lands in my s3 bucket. But a starrocks pipe does not support INSERT OVERWRITE so how would one achieve something like this. My materialized view looks something like this.
CREATE MATERIALIZED VIEW mv_segmentation_manager REFRESH ASYNC AS
SELECT *
FROM segmentation_manager
WHERE segment_name NOT IN (SELECT DISTINCT segment_name FROM temp_segments)
UNION ALL
SELECT *
FROM temp_segments;
My pipe has the following configuration
CREATE PIPE segment_manager
PROPERTIES
(
"AUTO_INGEST" = "TRUE",
"POLL_INTERVAL" = "5"
)
AS
INSERT OVERWRITE temp_segments
SELECT * FROM FILES
(
"path" = "s3://my-bucket/*.snappy.parquet",
"format" = "parquet",
"aws.s3.region" = "region",
"aws.s3.use_instance_profile" = "true"
);
So basically what needs to happen is that my materialized view checks for the segment_name in the latest s3 file and excludes all the rows in the segmentation_manager table and unions the new s3 file this is just a upsert without checking. But currently I am inserting this s3 file into a temp_table which if not truncated will become very big depending on the amount of s3 files that land in the bucket so this should be empty whenever a s3 file lands in the bucket. Truncating on a schedule would not be good as this could overlap with an s3 file that lands in the bucket and potentially lose data. Any ideas how this could be achieved in starrocks? Looking forward to hearing you guys ideas!
-
I removed mysql tag as there is no reference in the question to mysql and mysql does not support the view creation syntax in the questionShadow– Shadow2025年01月14日 11:24:10 +00:00Commented Jan 14, 2025 at 11:24