159 questions
0 votes · 0 answers · 85 views
How to merge small parquet files in Hudi into larger files
I use Spark + Hudi to write data into S3. I was writing data in bulk_insert mode, which caused many small parquet files to pile up in the Hudi table.
I then tried to schedule clustering on the Hudi table:
...
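For reference, inline clustering in Hudi is typically enabled through writer options like the following; the keys are standard Hudi clustering configs, but the threshold values here are illustrative and depend on the table:

```properties
# Run clustering inline as part of the write path
hoodie.clustering.inline=true
# Trigger clustering after every 4 commits
hoodie.clustering.inline.max.commits=4
# Files under ~100 MB are candidates to be merged
hoodie.clustering.plan.strategy.small.file.limit=104857600
# Aim for ~1 GB output files after clustering
hoodie.clustering.plan.strategy.target.file.max.bytes=1073741824
```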
0 votes · 1 answer · 261 views
Using the Fabric CLI to import files into a workspace
I am trying to import DataPipeline, Notebook and Warehouse files into a remote Fabric workspace: https://api.fabric.microsoft.com/.default
What I have tried:
Connected to the workspace:
fab auth ...
0 votes · 0 answers · 73 views
How to load CSV data from a Data Lake into a Nessie table
I’m currently using the following stack: MinIO + Apache Iceberg + Project Nessie + Dremio. In my MinIO instance I have two buckets: one is a datalake bucket and the other is a warehouse bucket.
In MinIO, I ...
1 vote · 1 answer · 122 views
Does auto compaction break z-ordering? [closed]
Does auto compaction break existing z-ordered tables in Delta Lake?
0 votes · 1 answer · 53 views
copyLamdaFunction creation is failing when I try to deploy a Data Lake in AWS
I am trying to deploy a data lake on AWS using this source: https://aws-ia.github.io/cfn-ps-datalake-foundation/ but I am getting an error.
Data-lake-foundation-DataLakeFoundationStack-IS67G4LRJQIU-...
-2 votes · 1 answer · 167 views
Database Management: Where to store old data outside the database [closed]
I'm wondering what is the most efficient way to store older data that I don't need to access outside of the database.
Context: I'm managing billions of rows of data in a single relational database ...
0 votes · 0 answers · 100 views
Break dependency from DMS reload/restart
We use AWS DMS to replicate (initial load + CDC) AWS Aurora MySQL data into our Redshift (in the future we are going to use Iceberg). In case there is any issue with DMS (RI break, task break, MySQL ...
1 vote · 0 answers · 164 views
Transactional consistency in Delta lake
Does Delta Lake provide transaction-level consistency? I know Delta Lake provides optimistic concurrency control for two concurrent operations, but I am talking about two concurrent transactions, not ...
2 votes · 0 answers · 259 views
Apache Iceberg: Is it possible to manually set the snapshot time during a historical load?
We have a number of datasets we would like to move to iceberg that already have a historical or audit component to them. In each case we either capture changes in the same table or in an audit table (...
-2 votes · 1 answer · 2k views
"Insufficient number of drives online" error when running distributed MinIO in virtual machines [closed]
I'm having trouble starting the MinIO service in a distributed setup across 2 virtual machines (not Docker). I'm encountering an Error: Read failure. Insufficient number of drives online. Waiting for ...
0 votes · 1 answer · 114 views
12-Month Rolling Active Customers
I have an order table with order_date and customer_id fields, and for each date in 2022 and beyond I want to compute the 12-month rolling count of distinct active customers using SQL.
I've tried
SELECT
...
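The windowing logic behind the rolling-distinct-customers question above can be sketched outside SQL. A minimal Python version, assuming orders arrive as (order_date, customer_id) pairs (the function and data are illustrative, not from the question):

```python
from datetime import date

def rolling_active_customers(orders, as_of):
    """Count distinct customers with at least one order in the
    12 months ending at as_of (exclusive lower bound).
    Note: replace(year=...) raises ValueError for Feb 29."""
    start = as_of.replace(year=as_of.year - 1)
    return len({cust for d, cust in orders if start < d <= as_of})

orders = [
    (date(2022, 1, 5), "a"),
    (date(2022, 6, 1), "b"),
    (date(2021, 3, 1), "a"),  # outside the window for 2022-06-30
]
print(rolling_active_customers(orders, date(2022, 6, 30)))  # → 2
```

A SQL equivalent would typically join each date to the orders in its trailing window and take `COUNT(DISTINCT customer_id)`; the set comprehension above is the same idea for a single as-of date.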
0 votes · 2 answers · 1k views
Lake Formation Cross-Account Share, Issues seeing database in external account
I have shared a database called ingest between account A (source) and account B (target). Once shared, I went to Resource Access Manager on account B and accepted the share request. I can now see the ...
1 vote · 1 answer · 780 views
Scala Spark Iceberg writeStream: how to set a bucket?
I'm trying to write data to an Iceberg table from Spark Structured Streaming (written in Scala).
Writer code:
val streamResult = joined.writeStream
.format("iceberg")
.partitionBy("...
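Since `partitionBy` on the streaming writer takes column names rather than partition transforms, one common workaround is to declare the bucket spec when creating the Iceberg table via SQL and then stream into the existing table; the catalog, table, and column names below are illustrative:

```sql
-- Create the table with a bucket partition transform up front…
CREATE TABLE catalog.db.events (
  id BIGINT,
  ts TIMESTAMP,
  payload STRING
)
USING iceberg
PARTITIONED BY (bucket(16, id));
-- …then target it from writeStream (e.g. .toTable("catalog.db.events"))
-- instead of passing the transform to partitionBy.
```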
0 votes · 1 answer · 282 views
Does the Azure Synapse Analytics database designer not support the Delta format?
According to the docs:
Currently, Delta format support for lake databases is not supported in
Synapse Studio.
Does this mean that, at present, the database designer does not support the Delta format to visually create ...
0 votes · 1 answer · 29 views
How to set a numeric value in the source_mappings.json file in an AWS SDLF pipeline?
There is a framework used to ingest files into a data lake in AWS S3 called the Serverless DataLake Framework (SDLF). Some configuration is needed to move a file through the stages in the S3 ...