159 questions
0 votes · 0 answers · 85 views
How to merge small parquet files in Hudi into larger files
I use Spark + Hudi to write data into S3. I was writing data in bulk_insert mode, which caused many small parquet files to pile up in the Hudi table.
I then tried to schedule clustering on the Hudi table:
...
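For reference, inline clustering in Hudi is typically enabled through writer options like the following; the keys are standard Hudi clustering configs, but the threshold values here are illustrative and depend on the table:

```properties
# Run clustering inline as part of the write path
hoodie.clustering.inline=true
# Trigger clustering after every 4 commits
hoodie.clustering.inline.max.commits=4
# Files under ~100 MB are candidates to be merged
hoodie.clustering.plan.strategy.small.file.limit=104857600
# Aim for ~1 GB output files after clustering
hoodie.clustering.plan.strategy.target.file.max.bytes=1073741824
```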
0 votes · 1 answer · 261 views
Using the Fabric CLI to import files into a workspace
I am trying to import DataPipeline, Notebook and Warehouse files into a remote Fabric workspace: https://api.fabric.microsoft.com/.default
What I have tried:
Connected to the workspace:
fab auth ...
0 votes · 0 answers · 73 views
How to load CSV data from a Data Lake into a Nessie table
I’m currently using the following stack: MinIO + Apache Iceberg + Project Nessie + Dremio. In my MinIO instance I have two buckets: one is a datalake bucket and the other is a warehouse bucket.
In MinIO, I ...
1 vote · 1 answer · 122 views
Does auto compaction break z-ordering? [closed]
Does auto compaction break existing z-ordered tables in Delta Lake?
0 votes · 1 answer · 53 views
copyLamdaFunction creation is failing when I try to deploy a Data Lake in AWS
I am trying to deploy a data lake on AWS using this source: https://aws-ia.github.io/cfn-ps-datalake-foundation/ but I am getting an error.
Data-lake-foundation-DataLakeFoundationStack-IS67G4LRJQIU-...
-2 votes · 1 answer · 167 views
Database Management: Where to store old data outside the database [closed]
I'm wondering what is the most efficient way to store older data that I don't need to access outside of the database.
Context: I'm managing billions of rows of data in a single relational database ...
0 votes · 0 answers · 100 views
Break dependency from DMS reload/restart
We use AWS DMS to replicate (initial load + CDC) AWS Aurora MySQL data into our Redshift (in the future we are going to use Iceberg). In case there is any issue with DMS (RI break, task break, MySQL ...
1 vote · 0 answers · 164 views
Transactional consistency in Delta lake
Does Delta Lake provide transaction-level consistency? I know Delta Lake provides optimistic concurrency control for two concurrent operations, but I am talking about two concurrent transactions, not ...
2 votes · 0 answers · 259 views
Apache Iceberg: Is it possible to manually set the snapshot time during a historical load?
We have a number of datasets we would like to move to iceberg that already have a historical or audit component to them. In each case we either capture changes in the same table or in an audit table (...
-2 votes · 1 answer · 2k views
"Insufficient number of drives online" error when running distributed MinIO in virtual machines [closed]
I'm having trouble starting the MinIO service in a distributed setup across 2 virtual machines (not Docker). I'm encountering an Error: Read failure. Insufficient number of drives online. Waiting for ...
0 votes · 1 answer · 114 views
12-Month Rolling Active Customers
I have an order table with order_date and customer_id fields, and for each date in 2022 and beyond I want to compute the 12-month rolling count of distinct active customers using SQL.
I've tried
SELECT
...
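The windowing logic behind the rolling-distinct-customers question above can be sketched outside SQL. A minimal Python version, assuming orders arrive as (order_date, customer_id) pairs (the function and data are illustrative, not from the question):

```python
from datetime import date

def rolling_active_customers(orders, as_of):
    """Count distinct customers with at least one order in the
    12 months ending at as_of (exclusive lower bound).
    Note: replace(year=...) raises ValueError for Feb 29."""
    start = as_of.replace(year=as_of.year - 1)
    return len({cust for d, cust in orders if start < d <= as_of})

orders = [
    (date(2022, 1, 5), "a"),
    (date(2022, 6, 1), "b"),
    (date(2021, 3, 1), "a"),  # outside the window for 2022-06-30
]
print(rolling_active_customers(orders, date(2022, 6, 30)))  # → 2
```

A SQL equivalent would typically join each date to the orders in its trailing window and take `COUNT(DISTINCT customer_id)`; the set comprehension above is the same idea for a single as-of date.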
0 votes · 2 answers · 1k views
Lake Formation Cross-Account Share, Issues seeing database in external account
I have shared a database called ingest between account A (source) and account B (target). Once shared, I went to Resource Access Manager on account B and accepted the share request. I can now see the ...
1 vote · 1 answer · 780 views
Scala Spark Iceberg writeStream: how to set a bucket?
I'm trying to write data to an Iceberg table from Spark Structured Streaming (written in Scala).
Writer code:
val streamResult = joined.writeStream
.format("iceberg")
.partitionBy("...
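Since `partitionBy` on the streaming writer takes column names rather than partition transforms, one common workaround is to declare the bucket spec when creating the Iceberg table via SQL and then stream into the existing table; the catalog, table, and column names below are illustrative:

```sql
-- Create the table with a bucket partition transform up front…
CREATE TABLE catalog.db.events (
  id BIGINT,
  ts TIMESTAMP,
  payload STRING
)
USING iceberg
PARTITIONED BY (bucket(16, id));
-- …then target it from writeStream (e.g. .toTable("catalog.db.events"))
-- instead of passing the transform to partitionBy.
```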
0 votes · 1 answer · 282 views
Does the Azure Synapse Analytics database designer not support the Delta format?
According to the docs:
Currently, Delta format support for lake databases is not supported in
Synapse Studio.
Does this mean that, at present, the database designer does not support the Delta format to visually create ...
0 votes · 1 answer · 29 views
How to set a numeric value in the source_mappings.json file in an AWS SDLF pipeline?
There is a framework used to ingest files into a data lake in AWS S3 called the Serverless DataLake Framework (SDLF). Some configuration is needed to move a file through the stages in the S3 ...