Newest 'etl' Questions

Stack Overflow


5,958 questions

Best practices

1 vote

2 replies

110 views

T-SQL ETL update evaluation recommendation: What is the most elegant & performative way to evaluate a source value to update a target value?

What recommendations might be offered for the most elegant and performative way with T-SQL to evaluate whether a source value should update a target value, as part of an ETL update process in which ...

504more

asked Dec 27, 2025 at 18:05

1 vote

1 answer

66 views

AWS Glue PySpark job taking 4 hours to process small JSON files from S3

I have an AWS Glue job that processes thousands of small JSON files from S3 (historical data load for Adobe Experience Platform). The job is taking approximately 4 hours to complete, which is ...

Jayron Soares

asked Dec 20, 2025 at 12:08

Best practices

0 votes

0 replies

45 views

How can I set up an ETL process for DataFrames (dbt or alternative tools)?

Question: I'm currently working on a dashboard prototype and storing data from a 22-page PDF document as 22 separate DataFrames. These DataFrames should undergo an ETL process (especially data type ...

Ricca

asked Nov 27, 2025 at 21:37

0 votes

1 answer

87 views

Why are my data quality validation rules not triggering for null values in my dataset?

I’m working on a data quality workflow where I validate incoming records for null or missing values. Even when a column clearly contains nulls, my rule doesn’t trigger and the record passes validation....

Neha _DQ

asked Nov 17, 2025 at 12:55

0 votes

1 answer

61 views

DataStage XML export modified via Python — new stage not appearing after re-import

I’m working with IBM InfoSphere DataStage 11.7. I exported several jobs as XML files using istool export. Then, using a Python script, I modified the XML to add another database stage in parallel to ...

techguy11

asked Oct 27, 2025 at 5:19

1 vote

1 answer

69 views

Why aren’t my changes reflected after modifying and reimporting an IBM DataStage job XML export?

I’m trying to programmatically modify IBM DataStage jobs to add a new database connector stage in parallel to an existing Database stage. Here’s my workflow: Export a job from DataStage Designer as ...

techguy11

asked Oct 26, 2025 at 18:52

2 votes

0 answers

102 views

Using Prefect with FastAPI is still displaying old logs

I tried using Prefect with a FastAPI project. After I updated the log messages and redeployed the repo, along with the Prefect deployments and flows, it runs and displays the logs (basically, Prefect is still ...

Needa

asked Oct 25, 2025 at 11:38

-4 votes

1 answer

83 views

Programmatically modifying IBM DataStage job XML – changes not reflected after reimport [closed]

I’m trying to programmatically add a new database stage in parallel to an existing DataStage job by modifying its exported XML. I export the job from DataStage Designer, modify the XML via a Python ...

DataEngineer03

asked Oct 24, 2025 at 11:29

0 votes

0 answers

52 views

How to use data pre-computed in previous ETL SSIS Nodes?

I'm building ETL packages in SSIS. My data comes from an OLE DB Source that calls a stored procedure in SQL Server. I want to add a new Lookup (or a similar transformation) that uses some of the input ...

sabotage

asked Sep 27, 2025 at 12:07

0 votes

0 answers

201 views

Unable to start worker on prefect - httpx.connecterror: all connection attempts failed

I have started the Prefect server on a remote desktop using prefect server start --host 0.0.0.0 --port 8080. After this I am able to access the UI from different computers present on this network. I create a ...

Anzar

asked Sep 11, 2025 at 15:09

1 vote

2 answers

185 views

Power Query – Cancel last N positives based on N negatives

I have a table in Power Query like this:

PO - Purchase Order
SID - Ship ID
QTY - Quantity

PO    SID   QTY
1001  A001  2000
1001  A001  2000
1001  A001  -2000  (This line cancels the previous one)
1002  A002  3000
...

Stephane Ducci

asked Sep 3, 2025 at 18:39

0 votes

1 answer

149 views

Can't connect to Ollama hosted locally from python script

I am building an ETL pipeline that uses an LLM to extract some information. I have Ollama installed locally. I am on a MacBook M4 Max. I don't understand why I get this error from my worker. ads-worker-1 | 2025-08-28 15:...

Mael Fosso

asked Aug 28, 2025 at 15:24

0 votes

0 answers

101 views

Apache Flink FileSink compaction extremely slow with many hot buckets/paths

I have a Flink ETL job that reads from ~13 Kafka topics and writes data into HDFS using a FileSink with compaction enabled. Right now, we have around 40 different output paths (buckets), and roughly ...

Hello

asked Aug 26, 2025 at 19:41

0 votes

0 answers

56 views

Error loading data: 'Engine' object has no attribute 'cursor': chan="stdout": source="task"

I am trying to run a batch process using Apache Airflow. The Extract and Transform stages work fine, but the Load stage raises an error. Here is my code: from airflow.decorators import dag, ...

Nwaogu Eziuche

asked Aug 17, 2025 at 2:06

0 votes

0 answers

94 views

Airflow ModuleNotFoundError: No module named 'pyarrow'

I'm trying Apache Airflow for the first time and built a simple ETL. But after loading the data and proceeding to the transform phase, it throws an error saying pyarrow was not found. I'm ...

Enzo Martins

asked Aug 12, 2025 at 6:14
