Stack Overflow
Advice
0 votes
1 reply
36 views

In my dataset, I noticed that the actual data type of a column differs from the expected data type. In this situation, should the data be type-cast during processing, or should such records be moved to ...
Best practices
0 votes
1 reply
35 views

Can anybody help me understand the Archive and Reject folders in the Bronze layer of the Medallion Architecture? Let's say I have 4 folders in Bronze, namely Raw, Stage, Archive and Reject. To what extent a ...
Best practices
0 votes
0 replies
61 views

Imagine there's a reporting tool for which users might have the permission 'Admin' or 'User'. We have a dimension in our models called admin_view and if the value is true then only users with Admin ...
Advice
4 votes
1 reply
77 views

Hi, I have lately been interested in learning Iceberg. There is something I was not able to understand, so I thought I would ask here. I really want to know why Apache Parquet is the native file format used when ...
Advice
0 votes
4 replies
97 views

I’m attempting high-volume bulk inserts into Azure SQL, but the performance is lower than expected. One known factor is the Max Log Rate (MiB/s) limit, which depends on the service tier (see Microsoft’...
Best practices
0 votes
5 replies
96 views

I have been working as a Data Engineer and ran into this issue. I came across a use case where I have a view (let's call it inputView) which is created by reading data from some source. Now somewhere ...
1 vote
0 answers
84 views

Is there any way to use Python to extract BigQuery Dataform metadata, or something else, to get the dependencies/dependents of each action in a repository? The purpose is that I want to collect the ...
Best practices
0 votes
0 replies
36 views

How would this relational schema be drawn as an ERD? My attempt is shown above, though it is incorrect. I do not understand why. Here is the relational schema: CREATE TABLE student ( name TEXT, ...
-4 votes
1 answer
81 views

I’m trying to programmatically add a new database stage in parallel to an existing DataStage job by modifying its exported XML. I export the job from DataStage Designer, modify the XML via a Python ...
1 vote
1 answer
86 views

Our old ERP system generates orphaned HTML reports with the following format, which I import into Pandas:

   Work Order  Item Type  Material  Labor
0      552603     Budget     71119   4567
1      552603 ...
0 votes
0 answers
65 views

I work with clinical data at a company that, until I arrived, didn't have a data policy. Currently, raw data extraction relies solely on manually downloading CSV/Excel files from an internal portal ...
0 votes
0 answers
89 views

I'm trying Apache Airflow for the first time and built a simple ETL. But after loading the data and proceeding to the transform phase, it throws an error saying pyarrow was not found. I'm ...
0 votes
0 answers
315 views

Just wanted to flag a frustrating issue I've run into with dbt snapshots that seems to be a regression, and maybe get your ideas for workarounds. TL;DR: If you use the check strategy with check_cols: all,...
-1 votes
1 answer
112 views

I'm fairly new to Azure Data Factory and need help with a pipeline I'm building. My goal is to read data from a CSV file stored in an Amazon S3 bucket, filter out records where the Status column is '...
0 votes
1 answer
274 views

In my current dbt project, each time a dbt model is run, a new container is created and runs the command dbt run --select <model name>. So, each time it runs, the whole dbt project needs to ...
