I must write updates in a database from a flat file (CSV). I want to do that in the shell, with tools such as AWK.
#!/bin/bash
cat in.csv | sed -e '1d' | awk -F';' -v q=\' '{ # For each line.
print "DECLARE @v_trmID varchar(16) = " q 1ドル q
print "DECLARE @v_trmNom varchar(6) = " q 3ドル q
print "DECLARE @v_trmNbrTrav smallint = " 4ドル
print "IF EXISTS (SELECT 1 FROM trimestre WHERE trmID = @v_trmID AND trmNom = @v_trmNom)"
print " BEGIN"
print " UPDATE trimestre"
print " SET trmNbrTrav = @v_trmNbrTrav"
print " WHERE trmID = @v_trmID AND trmNom = @v_trmNom"
print " END"
print "ELSE"
print " BEGIN"
print " PRINT " q "The script execution FAILED for record " NR " (pfiID " q " + @v_trmID + " q ", trimestre " q " + @v_trmNom + " q ")." q
print " END"
print "go"
print ""
}'
Though, there a 2 things I don't like:
The way quotes are inserted; it becomes really difficult to follow, even if I choose for the simpler way I found to write quotes inside an AWK string (instead of multiple escape sequences). Still, not that readable.
The fact that every SQL line is not readable as is. Code is not highlighted as SQL. I'd like to find a "here doc" solution, where I wouldn't have to prefix lines with
printf
.
Do you have any pieces of advice or better ideas on the way to write robust (because more readable / easily modifiable) code?
2 Answers 2
Why involve bash
, cat
, sed
, and awk
, when the whole solution could be done in awk
alone? Not only does it simplify the execution, it also reduces your quoting headaches.
In addition, I recommend dropping the v_
Hungarian prefix.
Here, I've used printf
instead of print
, but it's your choice.
#!/usr/bin/awk -f
BEGIN { FS = ";" }
NR > 1 { # Skip the header row
trmID = 1ドル;
trmNom = 3ドル;
trmNbrTrav = 4ドル;
printf "IF EXISTS (SELECT 1 FROM trimestre WHERE trmID = '%s' AND trmNom = '%s')\n", trmID, trmNom;
print " BEGIN"
printf " UPDATE trimestre SET trmNbrTrav = %d WHERE trmID = '%s' AND trmNom = '%s'\n", trmNbrTrav, trmID, trmNom;
print " END"
print "ELSE"
print " BEGIN"
printf " PRINT 'The script execution FAILED for record %d (pfiID ''%s'', trimestre ''%s'').'\n", NR - 1, trmID, trmNom;
print " END"
print "go"
}
Note that any awk
-based technique is susceptible to SQL injection. Presumably your CSV data isn't hostile.
Personally, I would do it a bit differently, such that the entire operation is atomic:
Create a temporary table.
CREATE TABLE #csv_upload ( trmID VARCHAR(16) , trmNom VARCHAR(6) , trmNbrTrav SMALLINT );
Copy all of the CSV data to the temporary table. You can use
awk
to generateINSERT
statements or do aBULK INSERT #csv_upload FROM 'filename.csv' WITH ( FIELDTERMINATOR = ';', FIRSTROW = 2 )
.Do a
JOIN
query to verify that every row in the temporary table corresponds to a row in thetrimestre
table. If not, then report an error before anything gets modified in thetrimestre
table.Perform one single
UPDATE
for the entire batch, preferably within a transaction.
-
\$\begingroup\$ This code is the most simpler to read... And thanks a lot for your new way to handle the problem at hand! Did not know about BULK INSERT... \$\endgroup\$user3341592– user33415922016年10月04日 19:25:00 +00:00Commented Oct 4, 2016 at 19:25
You can turn this into "here doc" solution by using wrapping it in a process substitution <(...)
, like this:
awk -f <(cat << "EOF"
{
... // awk script, as if in a file
}
EOF
)
This way, you can write '
directly, as if in a .awk
script file,
because there '
is no longer an enclosing character.
Note that cat << "EOF"
is necessary instead of simply cat << EOF
to avoid variable expansion of 1ドル
, 3ドル
and 4ドル
.
Also, to avoid multiple print
statements,
you will have to embed newlines with \n
, and end lines with \
to continue the same print
statement on the next line.
Finally, you don't need the sed -e 1d
, because awk can do this alone, using a NR > 1
filter. However, to account for the extra line in the input, whenever you use NR
in the awk script, you would have to change that to NR - 1
.
Putting the above together, this script is equivalent to yours (produces same output):
awk -F';' -f <(cat << "EOF"
NR > 1 {
print "\
DECLARE @v_trmPfiID_fk varchar(16) = '"1ドル"'\n\
DECLARE @v_trmNom varchar(6) = '"3ドル"'\n\
DECLARE @v_trmNbrTrav smallint = "4ドル"\n\
IF EXISTS (SELECT 1 FROM trimestre WHERE trmPfiID_fk = @v_trmPfiID_fk AND trmNom = @v_trmNom)\n\
BEGIN\n\
UPDATE trimestre\n\
SET trmNbrTrav = @v_trmNbrTrav\n\
WHERE trmPfiID_fk = @v_trmPfiID_fk AND trmNom = @v_trmNom\n\
END\n\
ELSE\n\
BEGIN\n\
PRINT 'The script execution FAILED for record " (NR - 1) " (pfiID ' + @v_trmPfiID_fk + ', trimestre ' + @v_trmNom + ').'\n\
END\n\
go\n\
"
}
EOF
) < in.csv