I'm not too experienced with database problems, so please excuse the wall of text, I believe the context is important to understand the problem and goal.
I receive a CSV file from another system every day that I want to retain in a database running on Microsoft SQL Server. To accomplish this, I have a PowerShell-script scheduled daily that uses the bcp utility to "load" the CSV (accounts.csv) into a table (accounts_stage
) that has the "exact" layout of the CSV file. In addition, I "write" the timestamp into another table (import_stage
). The accounts_stage
table has a trigger set, and bcp is configured to fire that trigger.
After the import is finished, I want to "normalize" the data. For a basic example, assume this concerns only the contract number (in the CSV file a field called "Vnr"), which is a string. I have another table (contracts
), which gets filled by the trigger:
INSERT INTO dzp.contractid(vnr)
SELECT DISTINCT
a.Vnr
FROM dzp.accounts_stage a
WHERE NOT EXISTS(
SELECT id.vnr
FROM dzp.contractid id
)
This should, and seems to accomplish, add all contract numbers that are ever imported to the table contractids
. contractids.contractId
is a BIGINT PRIMARY KEY, IDENTITY, and auto increment.
I add the import-metadata to my table like this:
CREATE TABLE #IMPORTMETADATA
([importid] bigint
,[generated_timestamp] datetime
,[imported_timestamp] datetime
)
INSERT INTO #IMPORTMETADATA
SELECT TOP 1 [importid], [timestamp_import], [generated_timestamp]
FROM
[QGTAA].[dzp].[import]
ORDER BY
[timestamp_import] DESC
Next, the trigger should INSERT the data FROM accounts_stage
INTO accounts
, such that the contractId
is replaced (from the original string representation to the PRIMARY KEY of the contracts
table).
INSERT INTO [dzp].[accounts]
([Udnr]
,[ContractId]
-- snip
,[generated_timestamp]
,[importid]
)
SELECT
a.[Udnr],
cid.contractId,
-- snip
meta.[generated_timestamp],
meta.[importid]
FROM [QGTAA].[dzp].[accounts_stage] a
LEFT JOIN dzp.contractid cid ON cid.vnr = a.Vnr
CROSS JOIN #IMPORTMETADATA meta
However, this last step fails, because the SELECT operation in the last block yields rows where contractId
IS NULL, which I did not expect. Now I believe the issue is somewhere with the LEFT JOIN, but I'm not sure why.
Questions
Can someone please point me in the right direction on this specific problem?
I am not too experienced with databases, is this even a sane way of accomplishing what I intend, or is this a completely insane approach? Is there a better way?
Thank you!
-
For a better understanding of joins, start here: blog.codinghorror.com/a-visual-explanation-of-sql-joinsErik Reasonable Rates Darling– Erik Reasonable Rates Darling2017年09月06日 16:30:15 +00:00Commented Sep 6, 2017 at 16:30
1 Answer 1
There's a problem with your first NOT EXISTS
. Let's take a look at your current query:
INSERT INTO dzp.contractid(vnr)
SELECT DISTINCT
a.Vnr
FROM dzp.accounts_stage a
WHERE NOT EXISTS(
SELECT id.vnr
FROM dzp.contractid id
)
So, what's happening here is that the NOT EXISTS
that you are using will return true only when the dzp.contractid
table is empty. Once it has one row (or more), the NOT EXISTS
will return false
for every row of your main table, since SELECT id.vnr FROM dzp.contractid
has a result set, and you end up inserting duplicated values in your table.
What you need to do is correlate this subquery with the main table:
INSERT INTO dzp.contractid(vnr)
SELECT DISTINCT
a.Vnr
FROM dzp.accounts_stage a
WHERE NOT EXISTS(
SELECT id.vnr
FROM dzp.contractid id
WHERE id.vnr = a.vnr
);
Also, the CROSS JOIN
with the data from the latest import doesn't make much sense to me, and neither the fact that you aren't checking if the data exists already in the accounts
table.
-
Additionally, the final step requires contractId to not be null but the query is left joining the table that it comes from and leaving open the possibility it will be null. If it's required that it exists then this should be changed to an inner join.indiri– indiri2017年09月06日 16:35:54 +00:00Commented Sep 6, 2017 at 16:35
-
1@indiri yeah, the last query needs to be rewritten, but I don't understand completely if that's what op needs or notLamak– Lamak2017年09月06日 16:36:52 +00:00Commented Sep 6, 2017 at 16:36
-
I finally came to continue working on this problem, and what you describe is exactly right. The solution works, i had to add the part about the import though. The import table only stores the one file that is currently being imported (as I need to get metadata about the source file into the database through Powershell + bcp. Thanks!zuiqo– zuiqo2017年09月15日 08:29:42 +00:00Commented Sep 15, 2017 at 8:29
Explore related questions
See similar questions with these tags.