Generate SQL UPDATE from Excel CSV file

Question

I need to apply updates to a database from a flat file (CSV), and I want to do it in the shell, with tools such as AWK.

#!/bin/bash
cat in.csv | sed -e '1d' | awk -F';' -v q=\' '{ # For each line.
 print "DECLARE @v_trmID varchar(16) = " q 1ドル q
 print "DECLARE @v_trmNom varchar(6) = " q 3ドル q
 print "DECLARE @v_trmNbrTrav smallint = " 4ドル
 print "IF EXISTS (SELECT 1 FROM trimestre WHERE trmID = @v_trmID AND trmNom = @v_trmNom)"
 print " BEGIN"
 print " UPDATE trimestre"
 print " SET trmNbrTrav = @v_trmNbrTrav"
 print " WHERE trmID = @v_trmID AND trmNom = @v_trmNom"
 print " END"
 print "ELSE"
 print " BEGIN"
 print " PRINT " q "The script execution FAILED for record " NR " (pfiID " q " + @v_trmID + " q ", trimestre " q " + @v_trmNom + " q ")." q
 print " END"
 print "go"
 print ""
}'

However, there are two things I don't like:

The way quotes are inserted: it becomes really difficult to follow, even though I chose the simplest way I found to write quotes inside an AWK string (instead of multiple escape sequences). It is still not very readable.
The SQL itself is not readable as is, and the editor does not highlight it as SQL. I'd like to find a "here doc" solution, where I wouldn't have to prefix every line with print.

Do you have any advice or better ideas for writing robust code (robust because it is more readable and easier to modify)?

Answer 1

Why involve bash, cat, sed, and awk, when the whole solution could be done in awk alone? Not only does it simplify the execution, it also reduces your quoting headaches.

In addition, I recommend dropping the v_ Hungarian prefix.

Here, I've used printf instead of print, but it's your choice.

#!/usr/bin/awk -f
BEGIN { FS = ";" }
NR > 1 { # Skip the header row
 trmID = 1ドル;
 trmNom = 3ドル;
 trmNbrTrav = 4ドル;
 printf "IF EXISTS (SELECT 1 FROM trimestre WHERE trmID = '%s' AND trmNom = '%s')\n", trmID, trmNom;
 print " BEGIN"
 printf " UPDATE trimestre SET trmNbrTrav = %d WHERE trmID = '%s' AND trmNom = '%s'\n", trmNbrTrav, trmID, trmNom;
 print " END"
 print "ELSE"
 print " BEGIN"
 printf " PRINT 'The script execution FAILED for record %d (pfiID ''%s'', trimestre ''%s'').'\n", NR - 1, trmID, trmNom;
 print " END"
 print "go"
}

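Note that any technique that pastes CSV values directly into SQL text, as these awk scripts do, is susceptible to SQL injection; even an innocent value containing a quote, such as O'Brien, would break the generated statement. Presumably your CSV data isn't hostile.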

Personally, I would do it a bit differently, such that the entire operation is atomic:

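1. Create a temporary table.

CREATE TABLE #csv_upload
( trmID VARCHAR(16)
, trmNom VARCHAR(6)
, trmNbrTrav SMALLINT
);

2. Copy all of the CSV data to the temporary table. You can use awk to generate INSERT statements or do a BULK INSERT #csv_upload FROM 'filename.csv' WITH ( FIELDTERMINATOR = ';', FIRSTROW = 2 ). Keep in mind that BULK INSERT maps the file's fields to the table's columns in order and that the file is read by the SQL Server service, so it must be reachable from the server; since your CSV has at least four fields (the scripts above skip the second one), the temporary table would need a column for every field, or you would need a format file.
3. Do a JOIN query to verify that every row in the temporary table corresponds to a row in the trimestre table. If not, report an error before anything gets modified in the trimestre table.
4. Perform a single UPDATE for the entire batch, preferably within a transaction.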

Comment

This code is the simplest to read. And thanks a lot for this new way of handling the problem! I did not know about BULK INSERT.

Answer 2

You can turn this into a "here doc" solution by putting the awk program in a here doc inside a bash process substitution <(...), like this:

awk -f <(cat << "EOF"
{
 # ... awk script, as if in a file
}
EOF
)

This way, you can write ' directly, as you would in a .awk script file, because ' no longer delimits the program on the command line.

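Note that the delimiter must be quoted (cat << "EOF" rather than cat << EOF): with an unquoted delimiter, the shell would expand 1ドル, 3ドル and 4ドル inside the here doc, and it would also remove the backslash-newline sequences that the script below relies on.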

To avoid multiple print statements, embed the newlines as \n and end each line with \, so that a single print string continues onto the next line.

Finally, you don't need sed -e 1d, because awk can skip the header itself with an NR > 1 pattern. Since awk still counts the header line, though, every use of NR in the script must become NR - 1 to report the same record numbers.

Putting it all together, this script is equivalent to yours (it produces the same output):

awk -F';' -f <(cat << "EOF"
NR > 1 {
print "\
DECLARE @v_trmID varchar(16) = '"1ドル"'\n\
DECLARE @v_trmNom varchar(6) = '"3ドル"'\n\
DECLARE @v_trmNbrTrav smallint = "4ドル"\n\
IF EXISTS (SELECT 1 FROM trimestre WHERE trmID = @v_trmID AND trmNom = @v_trmNom)\n\
 BEGIN\n\
 UPDATE trimestre\n\
 SET trmNbrTrav = @v_trmNbrTrav\n\
 WHERE trmID = @v_trmID AND trmNom = @v_trmNom\n\
 END\n\
ELSE\n\
 BEGIN\n\
 PRINT 'The script execution FAILED for record " (NR - 1) " (pfiID ' + @v_trmID + ', trimestre ' + @v_trmNom + ').'\n\
 END\n\
go\n\
"
}
EOF
) < in.csv

Accepted answer: the awk-only answer above, by 200_success · 2016-09-30 23:06:12Z
