I have some Python code that runs as part of an Azure Synapse Analytics Apache Spark notebook (Synapse notebook) and I would like to add effective error handling. The code simply executes a given SQL script against the database. It runs, but I sometimes see errors like "attempt to use closed connection". I would like to do the following:
- Improve the code I wrote through peer review.
- Can I improve the error handling? E.g., in pseudo-code: if the connection is still open, close the connection. A rough sketch of what I mean follows below.
- The code using SQL auth works; I would like to authenticate as the Managed Identity instead. I've tried using the object id of the MI in the connection string with Authentication=ActiveDirectoryMsi, but it didn't work.
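To make that pseudo-code concrete, this is roughly the pattern I have in mind (a sketch only; connection_string and content stand in for the values built in the cells below, not my actual notebook code):

import pyodbc

conn = None
try:
    # connection_string is a placeholder for the real connection string
    conn = pyodbc.connect(connection_string, autocommit=True)
    # content is a placeholder for the script text read from storage
    for batch in content.split("GO"):
        conn.execute(batch)
finally:
    # Close only if the connection was actually opened; touching a
    # connection after close() is what raises the closed-connection error.
    if conn is not None:
        conn.close()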
Cell 1 - parameters
pAccountName = 'someStorageAccount'
pContainerName = 'someContainer'
pRelativePath = '/raw/sql/some_sql_files/'
pFileName = 'someSQLscript.sql'
Cell 2 - main
import pyodbc
from pyspark import SparkFiles

try:
    # Add the file to the cluster so we can view it
    sqlFilepath = f"""abfss://{pContainerName}""" + "@" + f"""{pAccountName}.dfs.core.windows.net{pRelativePath}{pFileName}"""
    sc.addFile(sqlFilepath, False)

    # Open the file for reading
    with open(SparkFiles.getRootDirectory() + f'/{pFileName}', 'r') as f:
        lines = f.read()  # read all text from the file into a string

    # Open the database connection
    conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                          'SERVER=someServer.sql.azuresynapse.net;'
                          'DATABASE=someDatabase;UID=batchOnlyUser;'
                          'PWD=youWish;', autocommit=True)

    # Split the script into batches separated by "GO"
    for batch in lines.split("GO"):
        conn.execute(batch)  # execute the SQL statement
except:
    raise
finally:
    # Tidy up
    conn.close()
1 Answer
Ignoring your closed-connection and managed-identity questions, since (a) I don't know how to answer them and (b) on their own they're off topic, there is still review to be done:
- A Pythonic way to tell the standard library to format your URI instead of doing it yourself is to call into urlunparse.
- Consider using pathlib.
- connect accepts kwargs as an alternative to the conn string, and the former will lay out your connection parameters more nicely.
- except / raise is redundant and should be deleted.
- Your try starts too early; it should only start after the connection has been established.
- You're not reading a lines list; you're reading content, which would be a better variable name.
Suggested
from pathlib import Path
from urllib.parse import urlunparse, ParseResult

import pyodbc
from pyspark import SparkFiles

pAccountName = 'someStorageAccount'
pContainerName = 'someContainer'
pRelativePath = '/raw/sql/some_sql_files/'
pFileName = 'someSQLscript.sql'

# Add the file to the cluster so we can view it
sql_filename = urlunparse(
    ParseResult(
        scheme='abfss',
        netloc=f'{pContainerName}@{pAccountName}.dfs.core.windows.net',
        path=f'{pRelativePath}{pFileName}',
        params=None, query=None, fragment=None,
    )
)
sc.addFile(sql_filename, False)

# Open the file for reading
sql_file_path = Path(SparkFiles.getRootDirectory()) / pFileName
with sql_file_path.open() as f:
    content = f.read()

# Open the database connection
conn = pyodbc.connect(
    DRIVER='{ODBC Driver 17 for SQL Server}',
    SERVER='someServer.sql.azuresynapse.net',
    DATABASE='someDatabase',
    UID='batchOnlyUser',
    PWD='youWish',
    autocommit=True,
)
try:
    # Split the script into batches separated by "GO"
    for batch in content.split("GO"):
        conn.execute(batch)  # execute the SQL statement
finally:
    conn.close()
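One caveat the review above doesn't address: splitting on the bare substring "GO" will also split any batch that merely contains those two letters, e.g. inside an identifier like CATEGORY, a string literal, or a comment. T-SQL batch separators must sit on a line of their own, so a line-anchored regex is safer. A sketch of that alternative (re and GO_PATTERN are my own additions, not part of the suggested code above; it would replace the split inside the try block):

import re

# Match lines consisting solely of GO (case-insensitive), the form
# sqlcmd/SSMS expect for a batch separator, rather than the raw substring.
GO_PATTERN = re.compile(r'^\s*GO\s*$', flags=re.IGNORECASE | re.MULTILINE)

for batch in GO_PATTERN.split(content):
    if batch.strip():  # skip empty fragments around separators
        conn.execute(batch)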