8,467 questions
0
votes
0
answers
36
views
OLE DB or ODBC error: [DataSource.Error] ERROR [HY000] [Microsoft][Hardy] (35)
I get this error when I'm trying to switch my Databricks dataset from DirectQuery to Import. Do you know where this comes from?
Error Message
I'm trying to modify the data sources in order to be able to ...
0
votes
1
answer
19
views
Databricks Notebook debugger says I'm not attached to a cluster, but I am
Environment
Azure-Databricks
Cluster version 13.3 (see small screenshot)
A cell in a Notebook with some breakpoints.
Problem
When I want to debug that cell ("Debug Cell" option), that ...
Advice
0
votes
0
replies
62
views
HOW TO: [DBT Local] Elementary Anomaly Tests Custom Thresholds
Elementary is a tool that focuses on data quality for dbt models. One of the data quality tests it offers is anomaly detection: https://docs.elementary-data.com/data-tests/how-anomaly-detection-works
I am ...
Advice
0
votes
1
replies
36
views
What are the validation checks made in order to push data to the reject folder in a medallion architecture?
In my dataset, I noticed that the actual data type of a column differs from the expected data type. In this situation, should the data be type-cast during processing, or should such records be moved to ...
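A minimal PySpark sketch of one common approach (paths and column names are hypothetical, not from the question): attempt the cast, keep rows where it succeeds, and route rows where the cast fails for a non-null source value to the reject folder.
from pyspark.sql import functions as F
# Hypothetical paths and columns, for illustration only.
df = spark.read.format("parquet").load("s3://bucket/bronze/raw/orders/")
casted = df.withColumn("amount_casted", F.col("amount").cast("decimal(18,2)"))
# Rows where the source value is present but the cast produced NULL.
rejected = casted.filter(F.col("amount").isNotNull() & F.col("amount_casted").isNull())
accepted = casted.filter(F.col("amount").isNull() | F.col("amount_casted").isNotNull())
rejected.write.mode("append").format("parquet").save("s3://bucket/bronze/reject/orders/")
accepted.drop("amount").withColumnRenamed("amount_casted", "amount").write.mode("append").format("parquet").save("s3://bucket/bronze/stage/orders/")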
Best practices
0
votes
1
replies
35
views
When should data go to Archive vs Reject in the Bronze layer (Medallion Architecture)?
Can anybody help with understanding the Archive and Reject folders in the Bronze layer of a Medallion Architecture? Let's say I have 4 folders in Bronze, namely Raw, Stage, Archive and Reject. To what extent a ...
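One convention, sketched with Databricks utilities (folder names taken from the question, bucket paths hypothetical): after a Raw file is successfully ingested into Stage, move it to Archive; if it fails parsing or validation, move it to Reject instead.
# Sketch only; assumes a Databricks notebook where spark and dbutils are available.
raw_dir = "s3://bucket/bronze/raw/orders/"
for f in dbutils.fs.ls(raw_dir):
    try:
        df = spark.read.format("csv").option("header", "true").load(f.path)
        df.write.mode("append").format("delta").save("s3://bucket/bronze/stage/orders/")
        dbutils.fs.mv(f.path, "s3://bucket/bronze/archive/orders/" + f.name)
    except Exception:
        dbutils.fs.mv(f.path, "s3://bucket/bronze/reject/orders/" + f.name)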
0
votes
0
answers
67
views
How to stop Databricks retaining widget selection between runs?
I have a Python notebook in Databricks. Within it I have a multiselect widget, which is defined like this:
widget_values = spark.sql(f'''
    SELECT my_column
    FROM my_table
    GROUP BY ...
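One workaround sometimes suggested (not necessarily the accepted answer) is to remove the widget before redefining it, so the previously stored selection is not carried over between runs; widget and table names below are illustrative.
try:
    dbutils.widgets.remove("my_multiselect")
except Exception:
    pass  # widget did not exist yet on the first run
choices = [r.my_column for r in spark.sql("SELECT my_column FROM my_table GROUP BY my_column").collect()]
dbutils.widgets.multiselect("my_multiselect", choices[0], choices)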
0
votes
0
answers
68
views
Spark (Databricks) fails to read SPSS .sav files extracted from ZIP
I’m reading various file types in Databricks using Spark — including PDF, DOCX, PPTX, XLSX, and CSV.
Some inputs are ZIP archives that contain multiple files, including SPSS .sav files.
My workflow is:...
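A sketch of one way to handle the .sav files, assuming the pyreadstat library (an assumption, not mentioned in the question) is installed on the cluster: copy the ZIP to driver-local storage, extract it, read the .sav into pandas, then hand it to Spark.
import zipfile
import pyreadstat  # assumption: installed on the cluster
# Hypothetical paths for illustration.
local_zip = "/tmp/archive.zip"
dbutils.fs.cp("dbfs:/mnt/raw/archive.zip", "file:" + local_zip)
with zipfile.ZipFile(local_zip) as z:
    z.extractall("/tmp/extracted")
# Read the .sav on the driver with pyreadstat, then convert to a Spark DataFrame.
pdf, meta = pyreadstat.read_sav("/tmp/extracted/survey.sav")
sdf = spark.createDataFrame(pdf)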
0
votes
0
answers
66
views
Databricks pipeline fails on executing a Python script for expectations with error: Update FAILED; _UNCLASSIFIED_PYTHON_COMMAND_ERROR
I'm working on a Databricks pipeline and trying to create and apply expectations on a pipeline. I have the code but I keep getting an error that I cannot resolve. There is not much to go on, but I keep ...
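For reference, a minimal sketch of Python expectations in a Databricks (Delta Live Tables) pipeline using the dlt module's expectation decorators; the table name and rules are illustrative, not the asker's code.
import dlt
@dlt.table(name="orders_clean")
@dlt.expect_or_drop("valid_id", "order_id IS NOT NULL")
@dlt.expect("positive_amount", "amount > 0")
def orders_clean():
    # Illustrative source table; replace with the pipeline's actual input.
    return spark.read.table("bronze.orders")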
1
vote
0
answers
91
views
PHP ODBC connection to Databricks returns junk bytes
I'm using the Simba Spark ODBC driver provided by Databricks on Windows to connect to a Databricks instance which is running in Azure.
Most of the SQL queries run fine, but sometimes the result ...
Advice
0
votes
4
replies
78
views
Use RSA key Snowflake connection options instead of a password
I want to connect to a Snowflake database from a Databricks notebook. I have an RSA key (.pem file) and I don't want to use a traditional method like username and password, as it is not as secure as ...
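A hedged sketch of key-pair authentication with the Snowflake Spark connector, assuming the connector version in use supports the pem_private_key option (worth confirming in the connector docs); account, secret scope, and table names are illustrative.
# Sketch only; reads the PEM key body from a Databricks secret scope.
pem_key = dbutils.secrets.get(scope="snowflake", key="rsa_private_key")
options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "MY_USER",
    "sfDatabase": "MY_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "MY_WH",
    "pem_private_key": pem_key,  # used instead of "sfPassword"
}
df = (spark.read.format("snowflake")
      .options(**options)
      .option("dbtable", "MY_TABLE")
      .load())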
0
votes
1
answer
110
views
Does Databricks Spark SQL evaluate all CASE branches for UDFs?
I'm using Databricks SQL and have SQL UDFs for GeoIP / ISP lookups.
Each UDF branches on IPv4 vs IPv6 using a CASE expression like:
CASE
WHEN ip_address LIKE '%:%:%' THEN -- IPv6 path
...
...
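One way to check this empirically (a sketch using Python UDFs rather than the asker's SQL UDFs): make each lookup fail loudly when handed the "wrong" address family, then run the CASE over a couple of rows; if the non-matching branch is also evaluated, the query raises the error.
from pyspark.sql.types import StringType
def ipv4_lookup(ip):
    # Fail loudly if this branch is evaluated for an IPv6 address.
    if ":" in ip:
        raise ValueError("ipv4_lookup called with IPv6 input: " + ip)
    return "v4:" + ip
def ipv6_lookup(ip):
    if ":" not in ip:
        raise ValueError("ipv6_lookup called with IPv4 input: " + ip)
    return "v6:" + ip
spark.udf.register("ipv4_lookup", ipv4_lookup, StringType())
spark.udf.register("ipv6_lookup", ipv6_lookup, StringType())
# Succeeds if only the matching branch runs per row; fails if both branches are invoked.
spark.sql("""
    SELECT CASE WHEN ip LIKE '%:%:%' THEN ipv6_lookup(ip)
                ELSE ipv4_lookup(ip) END AS result
    FROM VALUES ('1.2.3.4'), ('2001:db8::1') AS t(ip)
""").show()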
-1
votes
0
answers
53
views
PySpark - Azure Databricks: when writing a DataFrame with saveAsTable, why do string values differ between the DataFrame and the table?
My team is new to PySpark and we are specifically using Azure Databricks. We have a piece of code where we are essentially
Displaying a dataframe
Saving it to a table
Displaying the output of the ...
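A sketch of how to pin down where the values diverge (the toy DataFrame, table, and column names are hypothetical): write the DataFrame, read the table back, and diff the string column in both directions.
# Toy data standing in for the team's DataFrame.
df = spark.createDataFrame([(1, "café"), (2, "naïve")], ["id", "my_string_col"])
df.write.mode("overwrite").saveAsTable("my_schema.my_table")
read_back = spark.table("my_schema.my_table")
only_in_df = df.select("id", "my_string_col").exceptAll(read_back.select("id", "my_string_col"))
only_in_table = read_back.select("id", "my_string_col").exceptAll(df.select("id", "my_string_col"))
display(only_in_df)
display(only_in_table)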
Advice
0
votes
1
replies
58
views
Databricks - How to restrict the values permitted in a job or task parameter?
I have a notebook with a parameter set within it via a widget, like this:
dbutils.widgets.dropdown("My widget", "A", ["A", "B", "C"])
my_variable = ...
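Since a job or task parameter can pass in any string regardless of the widget's dropdown choices, one common safeguard (a sketch, reusing the widget from the question) is to validate the received value explicitly and fail fast.
ALLOWED = ["A", "B", "C"]
dbutils.widgets.dropdown("My widget", "A", ALLOWED)
my_variable = dbutils.widgets.get("My widget")
if my_variable not in ALLOWED:
    raise ValueError(f"Invalid value for 'My widget': {my_variable!r}; expected one of {ALLOWED}")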
0
votes
0
answers
33
views
Databricks Federated Token Exchange (GCP → Databricks)
I'm trying to implement federated authentication (token exchange) from Google Cloud → Databricks without using a client ID / client secret, only a Google-issued service account token. I have also ...
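For orientation, a sketch of the OAuth 2.0 token-exchange call (RFC 8693) against the workspace's OIDC token endpoint, assuming a Databricks federation policy trusting the Google-issued service account token is already configured; the endpoint path, scope, and parameter names should be verified against the Databricks token federation documentation, and the workspace URL below is hypothetical.
import requests  # assumption: requests is available in the environment
workspace = "https://1234567890123456.7.gcp.databricks.com"  # hypothetical workspace URL
google_id_token = "<Google-issued service account ID token>"
resp = requests.post(
    f"{workspace}/oidc/v1/token",
    data={
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": google_id_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "scope": "all-apis",
    },
)
resp.raise_for_status()
databricks_token = resp.json()["access_token"]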
0
votes
0
answers
32
views
Databricks external table lagging behind source files
I have a Databricks external table pointed at an S3 bucket which contains an ever-growing number of Parquet files (currently around 2000 of them). Each row in the file is timestamped to ...
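Two things often worth checking for an external Parquet table that appears stale (a sketch, not a diagnosis; the table name is hypothetical): refresh Spark's cached file listing for the table, and, if the table is partitioned by directory, sync the partition metadata.
# Refresh the cached file listing so newly added Parquet files are picked up.
spark.sql("REFRESH TABLE my_catalog.my_schema.my_external_table")
# Only relevant if the table is partitioned by directory.
spark.sql("MSCK REPAIR TABLE my_catalog.my_schema.my_external_table")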