Stack Overflow
0 votes
0 answers
85 views

I use Spark + Hudi to write data into S3. I was writing data in bulk_insert mode, which caused many small Parquet files in the Hudi table. Then I tried to schedule clustering on the Hudi table: ...
0 votes
1 answer
261 views

I am trying to import DataPipeline, Notebook and Warehouse files into a remote Fabric workspace: https://api.fabric.microsoft.com/.default What I have tried: Connected to the workspace: fab auth ...
0 votes
0 answers
73 views

I’m currently using the following stack: MinIO + Apache Iceberg + Project Nessie + Dremio. In my MinIO I have two buckets: one is the datalake bucket and the other is warehouse. Here is an image: In MinIO, I ...
1 vote
1 answer
122 views

Does auto compaction break existing Z-ordered tables in Delta Lake?
0 votes
1 answer
53 views

I am trying to deploy a data lake on AWS using the source: https://aws-ia.github.io/cfn-ps-datalake-foundation/ but I am getting an error. Data-lake-foundation-DataLakeFoundationStack-IS67G4LRJQIU-...
-2 votes
1 answer
167 views

I'm wondering what is the most efficient way to store older data that I don't need to access outside of the database. Context: I'm managing billions of rows of data in a single relational database ...
0 votes
0 answers
100 views

We use AWS DMS to replicate (initial load + CDC) AWS Aurora MySQL data into our Redshift (in the future we are going to use Iceberg). In case there is any issue with DMS (RI break, Task break, MySQL ...
1 vote
0 answers
164 views

Does Delta Lake provide transaction-level consistency? I know Delta Lake provides optimistic concurrency control for two concurrent operations, but I am talking about two concurrent transactions, not ...
2 votes
0 answers
259 views

We have a number of datasets we would like to move to Iceberg that already have a historical or audit component to them. In each case we either capture changes in the same table or in an audit table (...
-2 votes
1 answer
2k views

I'm having trouble starting the MinIO service in a distributed setting across 2 virtual machines (not Docker). I'm encountering an Error: Read failure. Insufficient number of drives online. Waiting for ...
0 votes
1 answer
114 views

I have an order table with order_date and customer_id fields, and for each date in 2022 and beyond I want to compute the 12-month rolling distinct active customers using SQL. I've tried SELECT ...
0 votes
2 answers
1k views

I have shared a database called ingest between account A (Source) and account B (Target). Once it was shared, I went to Resource Access Manager on account B and accepted the share request. I can now see the ...
1 vote
1 answer
780 views

I'm trying to write data to an Iceberg table using Spark streaming (written in Scala). Writer code: val streamResult = joined.writeStream .format("iceberg") .partitionBy("...
0 votes
1 answer
282 views

According to the docs: "Currently, Delta format support for lake databases is not supported in Synapse Studio." Does it mean that, at present, the database designer does not support the Delta format to visually create ...
0 votes
1 answer
29 views

There is a framework used to ingest files into a data lake in AWS S3, called the Serverless DataLake Framework (SDLF). Some configuration is needed to move a file through many stages in the S3 ...
