Joining multiple tables results in duplicate rows

Question 1

I am getting more rows then I expect returned by my query.

I believe it has something to do with my join statements.

There are multiple tables that have different information in them. Person holds the primary information about the person but not the address, or phone, or email. This is because the original designer wanted the table to be able to hold multiple phone numbers and emails and addresses.

SELECT (person.FirstName + ' ' + person.LastName) as FullName
 ,ISNULL(Person.isClient, '')
 ,ISNULL(Person.UDF1, '')
 ,ISNULL(Address.City, '')
 ,ISNULL(Address.state, '')
 ,PersonAddress.Person
 ,PersonAddress.Address
 ,ISNULL(Phone.PhoneNumber, 'N/A')
 ,Email.Email
 ,Person.Website
FROM Person
 left join PersonAddress on Person.ID = PersonAddress.Person
 left join Address on PersonAddress.Address = Address.ID
 left join PersonPhone on Person.ID = PersonPhone.Person
 left join Phone on PersonPhone.Person = Phone.ID
 left join Email with (nolock) on Person.ID = Email.Person
WHERE (
 isclient = 'prospect'
 or isclient = 'client'
 )
 and Address is not null
 and name like '%Mike%'
ORDER BY isClient asc;

For this example, I get 6 rows of "Mike Worths". 3 of the copies have one email and three have another email.

For "Mike Pamstein" I get two duplicate rows with the same email.

I need the results to only contain a single unique row for each person.

I want to drop the second email.

Question 2

Presumably, you want to see a single entry for each unique person/address/email/website combination. If so, try this:

SELECT (person.FirstName + ' ' + person.LastName) as FullName
 , ISNULL(Person.isClient, '')
 , ISNULL(Person.UDF1, '')
 , ISNULL([Address].City, '')
 , ISNULL([Address].[state], '')
 , PersonAddress.Person
 , PersonAddress.[Address]
 , ISNULL(Phone.PhoneNumber, 'N/A')
 , Email.Email
 , Person.Website
FROM dbo.Person
 LEFT JOIN dbo.PersonAddress ON Person.ID = PersonAddress.Person
 LEFT JOIN dbo.[Address] ON PersonAddress.[Address] = [Address].ID
 LEFT JOIN dbo.PersonPhone ON Person.ID = PersonPhone.Person
 LEFT JOIN dbo.Phone ON PersonPhone.Person = Phone.ID
 LEFT JOIN dbo.Email WITH (NOLOCK) ON Person.ID = Email.Person
WHERE (
 isclient = 'prospect'
 or isclient = 'client'
 )
 and [Address] is not null
 and name like '%Mike%'
GROUP BY (person.FirstName + ' ' + person.LastName)
 , ISNULL(Person.isClient, '')
 , ISNULL(Person.UDF1, '')
 , ISNULL([Address].City, '')
 , ISNULL([Address].state, '')
 , PersonAddress.Person
 , PersonAddress.[Address]
 , ISNULL(Phone.PhoneNumber, 'N/A')
 , Email.Email
 , Person.Website
ORDER BY isClient asc;

The GROUP BY clause at the end ensures only a single row is returned for each unique combination of columns in the GROUP BY clause. This should prevent duplicate rows being displayed in your results.

A couple of things to note:

Always use the schema qualifier on the FROM clause. FROM Person should be FROM dbo.Person -> this eliminates any confusion if you introduce new schemas in the future, and prevents the query optimizer from having to look for the default schema for your user.
For maintainability in the future, you probably want to name columns the same regardless of which table they are in. So for instance, instead of the ID column in the People table being named ID, and it being named Person in the Address table, I'd name it PersonID in both tables. This prevents confusion (read bugs) in joins such as dbo.Person LEFT JOIN dbo.Address ON Person.ID = Address.Person.
Instead of naming tables like Person, they should be named after the collection of items they contain, in plural. So, Person becomes People, and Address becomes Addresses. This eliminates confusion -> does the Address table actually contain a single address or multiple addresses?
WITH (NOLOCK) should be avoided like the plague it is unless you fully understand the consequences of reading rows that have been modified by other transactions but not yet committed. From MSDN:

Transactions running at the READ UNCOMMITTED level do not issue shared locks to prevent other transactions from modifying data read by the current transaction. READ UNCOMMITTED transactions are also not blocked by exclusive locks that would prevent the current transaction from reading rows that have been modified but not committed by other transactions. When this option is set, it is possible to read uncommitted modifications, which are called dirty reads. Values in the data can be changed and rows can appear or disappear in the data set before the end of the transaction. This option has the same effect as setting NOLOCK on all tables in all SELECT statements in a transaction. This is the least restrictive of the isolation levels.

In SQL Server, you can also minimize locking contention while protecting transactions from dirty reads of uncommitted data modifications using either:

The READ COMMITTED isolation level with the READ_COMMITTED_SNAPSHOT database option set to ON.

The SNAPSHOT isolation level.

Question 3

Could you use subqueries to return one record back in your joins like so?

 ...
 FROM dbo.Person
 LEFT JOIN (SELECT MAX(AddressID) AS AddressID, Person FROM dbo.PersonAddress GROUP BY Person) PersonAddress ON Person.ID = PersonAddress.Person
 LEFT JOIN dbo.[Address] ON PersonAddress.[Address] = [Address].ID

In this case, I'm just using MAX to force it to one Person, but you could use other logic to get it down it down to one record per person and eliminate duplicates that way.

score 7 · Answer 1 · 2015-06-04 18:40:30Z

Presumably, you want to see a single entry for each unique person/address/email/website combination. If so, try this:

SELECT (person.FirstName + ' ' + person.LastName) as FullName
 , ISNULL(Person.isClient, '')
 , ISNULL(Person.UDF1, '')
 , ISNULL([Address].City, '')
 , ISNULL([Address].[state], '')
 , PersonAddress.Person
 , PersonAddress.[Address]
 , ISNULL(Phone.PhoneNumber, 'N/A')
 , Email.Email
 , Person.Website
FROM dbo.Person
 LEFT JOIN dbo.PersonAddress ON Person.ID = PersonAddress.Person
 LEFT JOIN dbo.[Address] ON PersonAddress.[Address] = [Address].ID
 LEFT JOIN dbo.PersonPhone ON Person.ID = PersonPhone.Person
 LEFT JOIN dbo.Phone ON PersonPhone.Person = Phone.ID
 LEFT JOIN dbo.Email WITH (NOLOCK) ON Person.ID = Email.Person
WHERE (
 isclient = 'prospect'
 or isclient = 'client'
 )
 and [Address] is not null
 and name like '%Mike%'
GROUP BY (person.FirstName + ' ' + person.LastName)
 , ISNULL(Person.isClient, '')
 , ISNULL(Person.UDF1, '')
 , ISNULL([Address].City, '')
 , ISNULL([Address].state, '')
 , PersonAddress.Person
 , PersonAddress.[Address]
 , ISNULL(Phone.PhoneNumber, 'N/A')
 , Email.Email
 , Person.Website
ORDER BY isClient asc;

The GROUP BY clause at the end ensures only a single row is returned for each unique combination of columns in the GROUP BY clause. This should prevent duplicate rows being displayed in your results.

A couple of things to note:

Always use the schema qualifier on the FROM clause. FROM Person should be FROM dbo.Person -> this eliminates any confusion if you introduce new schemas in the future, and prevents the query optimizer from having to look for the default schema for your user.
For maintainability in the future, you probably want to name columns the same regardless of which table they are in. So for instance, instead of the ID column in the People table being named ID, and it being named Person in the Address table, I'd name it PersonID in both tables. This prevents confusion (read bugs) in joins such as dbo.Person LEFT JOIN dbo.Address ON Person.ID = Address.Person.
Instead of naming tables like Person, they should be named after the collection of items they contain, in plural. So, Person becomes People, and Address becomes Addresses. This eliminates confusion -> does the Address table actually contain a single address or multiple addresses?
WITH (NOLOCK) should be avoided like the plague it is unless you fully understand the consequences of reading rows that have been modified by other transactions but not yet committed. From MSDN:

Transactions running at the READ UNCOMMITTED level do not issue shared locks to prevent other transactions from modifying data read by the current transaction. READ UNCOMMITTED transactions are also not blocked by exclusive locks that would prevent the current transaction from reading rows that have been modified but not committed by other transactions. When this option is set, it is possible to read uncommitted modifications, which are called dirty reads. Values in the data can be changed and rows can appear or disappear in the data set before the end of the transaction. This option has the same effect as setting NOLOCK on all tables in all SELECT statements in a transaction. This is the least restrictive of the isolation levels.

In SQL Server, you can also minimize locking contention while protecting transactions from dirty reads of uncommitted data modifications using either:

The READ COMMITTED isolation level with the READ_COMMITTED_SNAPSHOT database option set to ON.

The SNAPSHOT isolation level.

codedawg82 codedawg82 5443 silver badges6 bronze badges · Answer 2 · 2017-07-28 20:36:46Z

Could you use subqueries to return one record back in your joins like so?

 ...
 FROM dbo.Person
 LEFT JOIN (SELECT MAX(AddressID) AS AddressID, Person FROM dbo.PersonAddress GROUP BY Person) PersonAddress ON Person.ID = PersonAddress.Person
 LEFT JOIN dbo.[Address] ON PersonAddress.[Address] = [Address].ID

In this case, I'm just using MAX to force it to one Person, but you could use other logic to get it down it down to one record per person and eliminate duplicates that way.

Stack Exchange Network

Joining multiple tables results in duplicate rows

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Joining multiple tables results in duplicate rows

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions