Below is an example of my customer table. Some records have multiple BIRTHDAY values (by mistake or otherwise). I only want to select the records that have the same LASTNAME, MIDDLENAME, FIRSTNAME and SSN but a different BIRTHDAY:
Member table
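| LASTNAME | MIDDLENAME | FIRSTNAME | SSN  | BIRTHDAY  |
|----------|------------|-----------|------|-----------|
| Jones    | M          | Carol     | 1234 | 17-DEC-45 |
| Jones    | M          | Carol     | 1234 | 17-DEC-45 |
| Jones    | M          | Carol     | 4425 | 20-APR-70 |
| Black    | S          | Ted       | 5555 | 15-MAY-57 |
| Roberts  | T          | Cole      | 1412 | 14-MAY-57 |
| Roberts  | T          | Cole      | 1412 | 20-OCT-57 |
| Roberts  | S          | Cole      | 1412 | 15-MAY-57 |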
I would like the result to be:
| LASTNAME | MIDDLENAME | FIRSTNAME | SSN  | BIRTHDAY  |
|----------|------------|-----------|------|-----------|
| Roberts  | T          | Cole      | 1412 | 14-MAY-57 |
| Roberts  | T          | Cole      | 1412 | 20-OCT-57 |
Notice that a few accounts share the same SSN or the same full name; they are not selected because not all four fields match. Jones M Carol with SSN 1234 is not selected either, because her two accounts have the same birthday.
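As a rough restatement of this rule, here is a plain Python sketch (not SQL) that groups the sample rows on (LASTNAME, MIDDLENAME, FIRSTNAME, SSN) and keeps only the groups with more than one distinct BIRTHDAY. The tuple layout and the function name rows_with_conflicting_birthdays are illustrative assumptions; exact duplicate rows are collapsed, as in the expected output.

```python
from collections import defaultdict

# Sample rows from the question: (LASTNAME, MIDDLENAME, FIRSTNAME, SSN, BIRTHDAY)
MEMBERS = [
    ("Jones", "M", "Carol", 1234, "17-DEC-45"),
    ("Jones", "M", "Carol", 1234, "17-DEC-45"),
    ("Jones", "M", "Carol", 4425, "20-APR-70"),
    ("Black", "S", "Ted", 5555, "15-MAY-57"),
    ("Roberts", "T", "Cole", 1412, "14-MAY-57"),
    ("Roberts", "T", "Cole", 1412, "20-OCT-57"),
    ("Roberts", "S", "Cole", 1412, "15-MAY-57"),
]


def rows_with_conflicting_birthdays(rows):
    """Return distinct rows whose name+SSN key maps to more than one birthday."""
    birthdays = defaultdict(set)
    for last, middle, first, ssn, birthday in rows:
        birthdays[(last, middle, first, ssn)].add(birthday)
    # Keep one copy of each row whose key has two or more distinct birthdays.
    return sorted({row for row in rows if len(birthdays[row[:4]]) > 1})


for row in rows_with_conflicting_birthdays(MEMBERS):
    print(row)
```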
This is the SQL query I have so far, but it doesn't work as intended:
SELECT x.FIRST_NM, x.MDL_NM, x.LAST_NM, x.SSN, x.BRTH_DT
FROM Member_table x
WHERE EXISTS
(
SELECT FIRST_NM, MDL_NM, LAST_NM, SSN, COUNT(*)
from Member_table
WHERE CURRENT_RECORD_IN = 'Y'
group by FIRST_NM, MDL_NM, LAST_NM, SSN
having count(distinct BRTH_DT) > 1
)
ORDER BY FIRST_NM ASC, LAST_NM ASC, MDL_NM ASC, SSN ASC;
Any advice for this query?
-
What should happen if there are 2 rows with the same birthday and a 3rd row with a different birthday? – ypercubeᵀᴹ, Jun 28, 2018 at 13:30
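To make the question concrete, here is a small sketch of how a correlated EXISTS query (the approach used in the first answer) treats that case. It uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for the unstated DBMS, and the table t with only two key columns is hypothetical. Every row has some partner with a different birthday, so all three rows are returned, including both copies of the duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table reduced to two key columns for brevity.
conn.execute("CREATE TABLE t (lastname TEXT, ssn INTEGER, birthday TEXT)")
# Two identical rows plus a third row with a different birthday.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("Doe", 1, "01-JAN-50"), ("Doe", 1, "01-JAN-50"), ("Doe", 1, "02-FEB-51")],
)

# Keep a row if another row with the same key has a different birthday.
rows = conn.execute(
    """
    SELECT lastname, ssn, birthday
    FROM t AS t1
    WHERE EXISTS (
        SELECT 1 FROM t
        WHERE lastname = t1.lastname
          AND ssn = t1.ssn
          AND birthday <> t1.birthday
    )
    ORDER BY birthday
    """
).fetchall()
print(rows)
```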
5 Answers
Here's an example that uses EXISTS with a correlated subquery. I tested it on SQL Server, but it will probably work on other RDBMSs.
drop table if exists table1
CREATE TABLE Table1
(LASTNAME varchar(7), MIDDLENAME varchar(1), FIRSTNAME varchar(5), SSN int, BIRTHDAY varchar(9))
;
INSERT INTO Table1
(LASTNAME, MIDDLENAME, FIRSTNAME, SSN, BIRTHDAY)
VALUES
('Jones', 'M', 'Carol', 1234, '17-DEC-45'),
('Jones', 'M', 'Carol', 1234, '17-DEC-45'),
('Jones', 'M', 'Carol', 4425, '20-APR-70'),
('Black', 'S', 'Ted', 5555, '15-MAY-57'),
('Roberts', 'T', 'Cole', 1412, '14-MAY-57'),
('Roberts', 'T', 'Cole', 1412, '20-OCT-57'),
('Roberts', 'S', 'Cole', 1412, '15-MAY-57')
;
SELECT *
FROM table1 t1
WHERE EXISTS (
SELECT *
FROM table1
WHERE LASTNAME = t1.LASTNAME
AND MIDDLENAME = t1.MIDDLENAME
AND FIRSTNAME = t1.FIRSTNAME
AND SSN = t1.SSN
AND BIRTHDAY <> t1.BIRTHDAY
)
| LASTNAME | MIDDLENAME | FIRSTNAME | SSN | BIRTHDAY |
|----------|------------|-----------|------|-----------|
| Roberts | T | Cole | 1412 | 14-MAY-57 |
| Roberts | T | Cole | 1412 | 20-OCT-57 |
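The same query can be checked without a SQL Server instance. This sketch runs it against an in-memory SQLite database via Python's sqlite3 module (SQLite is an assumption here, not the platform the answer was tested on) and adds an ORDER BY only so the result order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (LASTNAME TEXT, MIDDLENAME TEXT, FIRSTNAME TEXT,"
    " SSN INTEGER, BIRTHDAY TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [
        ("Jones", "M", "Carol", 1234, "17-DEC-45"),
        ("Jones", "M", "Carol", 1234, "17-DEC-45"),
        ("Jones", "M", "Carol", 4425, "20-APR-70"),
        ("Black", "S", "Ted", 5555, "15-MAY-57"),
        ("Roberts", "T", "Cole", 1412, "14-MAY-57"),
        ("Roberts", "T", "Cole", 1412, "20-OCT-57"),
        ("Roberts", "S", "Cole", 1412, "15-MAY-57"),
    ],
)

# Correlated EXISTS: the inner query refers to the outer alias t1.
result = conn.execute(
    """
    SELECT * FROM table1 t1
    WHERE EXISTS (
        SELECT * FROM table1
        WHERE LASTNAME = t1.LASTNAME
          AND MIDDLENAME = t1.MIDDLENAME
          AND FIRSTNAME = t1.FIRSTNAME
          AND SSN = t1.SSN
          AND BIRTHDAY <> t1.BIRTHDAY
    )
    ORDER BY BIRTHDAY
    """
).fetchall()
for row in result:
    print(row)
```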
-
It might be rewritten using an INNER JOIN if you need the list of all pairs:
SELECT * FROM table1 AS t1 INNER JOIN table1 AS t2 ON (t2.LASTNAME = t1.LASTNAME AND t2.MIDDLENAME = t1.MIDDLENAME AND t2.FIRSTNAME = t1.FIRSTNAME AND t2.SSN = t1.SSN AND t2.BIRTHDAY <> t1.BIRTHDAY)
Otherwise, SELECT DISTINCT t1.* FROM ... can be used to get only the t1 rows (not the t1/t2 pairs). – Xenos, Jun 28, 2018 at 13:45
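A quick way to see the difference between the two forms is to count rows. In this sketch (SQLite through Python's sqlite3; the three-birthday data for one person is illustrative, not from the question), the join returns every ordered pair with differing birthdays, 3 × 2 = 6, while SELECT DISTINCT t1.* collapses them back to the 3 original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (LASTNAME TEXT, MIDDLENAME TEXT, FIRSTNAME TEXT,"
    " SSN INTEGER, BIRTHDAY TEXT)"
)
# Illustrative data: one person with three different birthdays.
conn.executemany(
    "INSERT INTO table1 VALUES ('Roberts', 'T', 'Cole', 1412, ?)",
    [("14-MAY-57",), ("20-OCT-57",), ("15-MAY-57",)],
)

JOIN = """
    FROM table1 AS t1
    INNER JOIN table1 AS t2
      ON t2.LASTNAME = t1.LASTNAME
     AND t2.MIDDLENAME = t1.MIDDLENAME
     AND t2.FIRSTNAME = t1.FIRSTNAME
     AND t2.SSN = t1.SSN
     AND t2.BIRTHDAY <> t1.BIRTHDAY
"""

# Every ordered (t1, t2) pair whose birthdays differ.
pairs = conn.execute("SELECT t1.BIRTHDAY, t2.BIRTHDAY " + JOIN).fetchall()
# DISTINCT t1.* keeps one row per original t1 row.
rows = conn.execute("SELECT DISTINCT t1.* " + JOIN + " ORDER BY t1.BIRTHDAY").fetchall()
print(len(pairs), len(rows))
```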
This also works on Oracle.
CREATE TABLE MEMBER
(
LASTNAME VARCHAR2(7),
MIDDLENAME CHAR(1),
FIRSTNAME VARCHAR2(5),
SSN INT,
BIRTHDAY VARCHAR2(9)
);
Insert into MEMBER (LASTNAME,MIDDLENAME,FIRSTNAME,SSN,BIRTHDAY) values ('Jones','M','Carol',1234,'17-DEC-45');
Insert into MEMBER (LASTNAME,MIDDLENAME,FIRSTNAME,SSN,BIRTHDAY) values ('Jones','M','Carol',1234,'17-DEC-45');
Insert into MEMBER (LASTNAME,MIDDLENAME,FIRSTNAME,SSN,BIRTHDAY) values ('Jones','M','Carol',4425,'20-APR-70');
Insert into MEMBER (LASTNAME,MIDDLENAME,FIRSTNAME,SSN,BIRTHDAY) values ('Black','S','Ted',5555,'15-MAY-57');
Insert into MEMBER (LASTNAME,MIDDLENAME,FIRSTNAME,SSN,BIRTHDAY) values ('Roberts','T','Cole',1412,'14-MAY-57');
Insert into MEMBER (LASTNAME,MIDDLENAME,FIRSTNAME,SSN,BIRTHDAY) values ('Roberts','T','Cole',1412,'20-OCT-57');
Insert into MEMBER (LASTNAME,MIDDLENAME,FIRSTNAME,SSN,BIRTHDAY) values ('Roberts','S','Cole',1412,'15-MAY-57');
Insert into MEMBER (LASTNAME,MIDDLENAME,FIRSTNAME,SSN,BIRTHDAY) values ('James','N','Rob',7890,'18-JUN-58');
Insert into MEMBER (LASTNAME,MIDDLENAME,FIRSTNAME,SSN,BIRTHDAY) values ('James','N','Rob',7890,'15-JUN-58');
Insert into MEMBER (LASTNAME,MIDDLENAME,FIRSTNAME,SSN,BIRTHDAY) values ('James','N','Rob',7890,'20-MAR-56');
Insert into MEMBER (LASTNAME,MIDDLENAME,FIRSTNAME,SSN,BIRTHDAY) values ('James','N','Rob',7890,'14-APR-55');
SELECT DISTINCT a.*
FROM member a,member b
WHERE a.lastname=b.lastname
AND a.middlename=b.middlename
AND a.firstname=b.firstname
AND a.ssn=b.ssn
AND a.birthday != b.birthday
ORDER BY a.lastname,a.middlename,a.firstname,a.ssn,a.birthday;
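Output:

| LASTNAME | MIDDLENAME | FIRSTNAME | SSN  | BIRTHDAY  |
|----------|------------|-----------|------|-----------|
| James    | N          | Rob       | 7890 | 14-APR-55 |
| James    | N          | Rob       | 7890 | 15-JUN-58 |
| James    | N          | Rob       | 7890 | 18-JUN-58 |
| James    | N          | Rob       | 7890 | 20-MAR-56 |
| Roberts  | T          | Cole      | 1412 | 14-MAY-57 |
| Roberts  | T          | Cole      | 1412 | 20-OCT-57 |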
-
This would work, but it may produce duplicate results (for example, if there are 4 rows that match on everything except the birthday, each of the 4 will appear in the result 3 times). – ypercubeᵀᴹ, Jun 28, 2018 at 13:26
-
You're right; I fixed the code with DISTINCT. I never thought of that possibility, but thanks for pointing it out. – user153556, Jun 28, 2018 at 14:33
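The duplication described in this exchange is easy to reproduce. This sketch loads only the four "James N Rob" rows from the Oracle answer into an in-memory SQLite database (a stand-in for Oracle, via Python's sqlite3) and runs the self-join with and without DISTINCT: each of the 4 rows matches the other 3, giving 12 rows, and DISTINCT reduces that to 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE member (lastname TEXT, middlename TEXT, firstname TEXT,"
    " ssn INTEGER, birthday TEXT)"
)
# The four 'James N Rob' rows from the Oracle answer.
conn.executemany(
    "INSERT INTO member VALUES ('James', 'N', 'Rob', 7890, ?)",
    [("18-JUN-58",), ("15-JUN-58",), ("20-MAR-56",), ("14-APR-55",)],
)

SELF_JOIN = """
    FROM member a, member b
    WHERE a.lastname = b.lastname
      AND a.middlename = b.middlename
      AND a.firstname = b.firstname
      AND a.ssn = b.ssn
      AND a.birthday != b.birthday
"""

without_distinct = conn.execute("SELECT a.* " + SELF_JOIN).fetchall()
with_distinct = conn.execute("SELECT DISTINCT a.* " + SELF_JOIN).fetchall()
print(len(without_distinct), len(with_distinct))  # 12 4
```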
Using aggregation, this is simple:
select *
from member_table
-- row-value IN (...) works in e.g. Oracle, PostgreSQL and MySQL, but not in SQL Server
where (firstname, middlename, lastname, ssn) in (
select firstname, middlename, lastname, ssn
from member_table
group by firstname, middlename, lastname, ssn
having min(birthday) <> max(birthday)
);
The outer query is only necessary if you need the actual birthdays (as in your sample output), otherwise the inner query suffices.
Note that this works for any comparable data type. If BIRTHDAY is stored as a string such as '17-DEC-45', min() and max() return the alphabetically first and last values rather than the earliest and latest dates, but in this particular case that doesn't matter: all that counts is whether they differ.
Note that nothing ties your WHERE EXISTS subquery to the outer query: if any group of records in your table has identical values in the first four columns but different birth dates, then every row of your table qualifies.
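This behaviour can be reproduced with the asker's column names. The sketch below uses SQLite via Python's sqlite3 as a stand-in DBMS and assumes every row is marked CURRENT_RECORD_IN = 'Y'. The uncorrelated EXISTS returns all 7 rows; correlating the subquery to x (one possible fix, not the derived-table approach this answer goes on to recommend) returns only the two Roberts T Cole rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the asker's query; all rows are current (an assumption).
conn.execute(
    "CREATE TABLE Member_table (LAST_NM TEXT, MDL_NM TEXT, FIRST_NM TEXT,"
    " SSN INTEGER, BRTH_DT TEXT, CURRENT_RECORD_IN TEXT DEFAULT 'Y')"
)
conn.executemany(
    "INSERT INTO Member_table (LAST_NM, MDL_NM, FIRST_NM, SSN, BRTH_DT)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        ("Jones", "M", "Carol", 1234, "17-DEC-45"),
        ("Jones", "M", "Carol", 1234, "17-DEC-45"),
        ("Jones", "M", "Carol", 4425, "20-APR-70"),
        ("Black", "S", "Ted", 5555, "15-MAY-57"),
        ("Roberts", "T", "Cole", 1412, "14-MAY-57"),
        ("Roberts", "T", "Cole", 1412, "20-OCT-57"),
        ("Roberts", "S", "Cole", 1412, "15-MAY-57"),
    ],
)

# The subquery never mentions x, so EXISTS is true for every row or for none.
uncorrelated = conn.execute(
    """
    SELECT x.* FROM Member_table x
    WHERE EXISTS (
        SELECT FIRST_NM, MDL_NM, LAST_NM, SSN, COUNT(*)
        FROM Member_table
        WHERE CURRENT_RECORD_IN = 'Y'
        GROUP BY FIRST_NM, MDL_NM, LAST_NM, SSN
        HAVING COUNT(DISTINCT BRTH_DT) > 1
    )
    """
).fetchall()

# Correlated version: only look at rows sharing x's name and SSN.
correlated = conn.execute(
    """
    SELECT x.* FROM Member_table x
    WHERE EXISTS (
        SELECT 1
        FROM Member_table m
        WHERE m.CURRENT_RECORD_IN = 'Y'
          AND m.FIRST_NM = x.FIRST_NM
          AND m.MDL_NM = x.MDL_NM
          AND m.LAST_NM = x.LAST_NM
          AND m.SSN = x.SSN
        GROUP BY m.FIRST_NM, m.MDL_NM, m.LAST_NM, m.SSN
        HAVING COUNT(DISTINCT m.BRTH_DT) > 1
    )
    """
).fetchall()
print(len(uncorrelated), len(correlated))
```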
The simplest solution is to use the WHERE EXISTS subquery as a derived table and join it to Member_table:
SELECT DISTINCT x.FIRST_NM, x.MDL_NM, x.LAST_NM, x.SSN, x.BRTH_DT
FROM Member_table x
INNER JOIN (
-- COUNT(*) is dropped: it was unused, and SQL Server rejects unnamed derived-table columns
SELECT FIRST_NM, MDL_NM, LAST_NM, SSN
from Member_table
WHERE CURRENT_RECORD_IN = 'Y'
group by FIRST_NM, MDL_NM, LAST_NM, SSN
having count(distinct BRTH_DT) > 1
) bd ON ( x.FIRST_NM = bd.FIRST_NM
AND x.MDL_NM = bd.MDL_NM
AND x.LAST_NM = bd.LAST_NM
AND x.SSN = bd.SSN
)
ORDER BY x.FIRST_NM ASC, x.LAST_NM ASC, x.MDL_NM ASC, x.SSN ASC;
So:
- Your subquery returns the name and SSN of every group that has more than one distinct birth date.
- By joining that to Member_table on the name and SSN columns, you ensure that you only grab rows where multiple birth dates exist.
- I also added DISTINCT to the main query's SELECT list, so that you get only one copy of each row in your output.
Untested, as your specific DBMS was unstated.
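As a quick check under stated assumptions, this sketch runs the derived-table join in SQLite via Python's sqlite3, with all rows marked CURRENT_RECORD_IN = 'Y' and one extra, hypothetical duplicate of a Roberts row added so that the effect of DISTINCT is visible. Only the two distinct Roberts T Cole rows come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Member_table (LAST_NM TEXT, MDL_NM TEXT, FIRST_NM TEXT,"
    " SSN INTEGER, BRTH_DT TEXT, CURRENT_RECORD_IN TEXT DEFAULT 'Y')"
)
data = [
    ("Jones", "M", "Carol", 1234, "17-DEC-45"),
    ("Jones", "M", "Carol", 1234, "17-DEC-45"),
    ("Jones", "M", "Carol", 4425, "20-APR-70"),
    ("Black", "S", "Ted", 5555, "15-MAY-57"),
    ("Roberts", "T", "Cole", 1412, "14-MAY-57"),
    ("Roberts", "T", "Cole", 1412, "20-OCT-57"),
    ("Roberts", "S", "Cole", 1412, "15-MAY-57"),
    ("Roberts", "T", "Cole", 1412, "14-MAY-57"),  # hypothetical extra duplicate
]
conn.executemany(
    "INSERT INTO Member_table (LAST_NM, MDL_NM, FIRST_NM, SSN, BRTH_DT)"
    " VALUES (?, ?, ?, ?, ?)",
    data,
)

# Derived table bd lists the name+SSN groups with 2+ birth dates.
result = conn.execute(
    """
    SELECT DISTINCT x.FIRST_NM, x.MDL_NM, x.LAST_NM, x.SSN, x.BRTH_DT
    FROM Member_table x
    INNER JOIN (
        SELECT FIRST_NM, MDL_NM, LAST_NM, SSN
        FROM Member_table
        WHERE CURRENT_RECORD_IN = 'Y'
        GROUP BY FIRST_NM, MDL_NM, LAST_NM, SSN
        HAVING COUNT(DISTINCT BRTH_DT) > 1
    ) bd ON x.FIRST_NM = bd.FIRST_NM
        AND x.MDL_NM = bd.MDL_NM
        AND x.LAST_NM = bd.LAST_NM
        AND x.SSN = bd.SSN
    ORDER BY x.FIRST_NM, x.LAST_NM, x.MDL_NM, x.SSN, x.BRTH_DT
    """
).fetchall()
for row in result:
    print(row)
```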
If your Oracle version supports window functions:
select LASTNAME, MIDDLENAME, FIRSTNAME, SSN, BIRTHDAY
from (
select LASTNAME, MIDDLENAME, FIRSTNAME, SSN, BIRTHDAY
, first_value(BIRTHDAY) over (partition by LASTNAME, MIDDLENAME, FIRSTNAME, SSN order by BIRTHDAY) as fst
, first_value(BIRTHDAY) over (partition by LASTNAME, MIDDLENAME, FIRSTNAME, SSN order by BIRTHDAY desc) as lst
from Table1
) t
where fst <> lst;
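The window-function version can also be tried outside Oracle. This sketch assumes SQLite 3.25 or later (the first release with window functions) through Python's sqlite3, loads the question's sample rows into Table1, and adds an outer ORDER BY only for a stable result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (LASTNAME TEXT, MIDDLENAME TEXT, FIRSTNAME TEXT,"
    " SSN INTEGER, BIRTHDAY TEXT)"
)
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?, ?)",
    [
        ("Jones", "M", "Carol", 1234, "17-DEC-45"),
        ("Jones", "M", "Carol", 1234, "17-DEC-45"),
        ("Jones", "M", "Carol", 4425, "20-APR-70"),
        ("Black", "S", "Ted", 5555, "15-MAY-57"),
        ("Roberts", "T", "Cole", 1412, "14-MAY-57"),
        ("Roberts", "T", "Cole", 1412, "20-OCT-57"),
        ("Roberts", "S", "Cole", 1412, "15-MAY-57"),
    ],
)

# fst = smallest BIRTHDAY per group, lst = largest; they differ only when
# the group has more than one distinct value.
result = conn.execute(
    """
    SELECT LASTNAME, MIDDLENAME, FIRSTNAME, SSN, BIRTHDAY
    FROM (
        SELECT LASTNAME, MIDDLENAME, FIRSTNAME, SSN, BIRTHDAY,
               FIRST_VALUE(BIRTHDAY) OVER (
                   PARTITION BY LASTNAME, MIDDLENAME, FIRSTNAME, SSN
                   ORDER BY BIRTHDAY) AS fst,
               FIRST_VALUE(BIRTHDAY) OVER (
                   PARTITION BY LASTNAME, MIDDLENAME, FIRSTNAME, SSN
                   ORDER BY BIRTHDAY DESC) AS lst
        FROM Table1
    ) t
    WHERE fst <> lst
    ORDER BY BIRTHDAY
    """
).fetchall()
for row in result:
    print(row)
```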
Note that fst uses ascending order and lst uses descending order. An alternative is to use last_value for the latter, but then we would have to declare the window frame:
last_value(BIRTHDAY) over (
partition by LASTNAME, MIDDLENAME, FIRSTNAME, SSN
order by BIRTHDAY
rows between current row and unbounded following
) as lst
This is not necessary for first_value, since the default frame (when the window has an ORDER BY) is:
range between unbounded preceding and current row
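The effect of the default frame is easy to see side by side. In this sketch (SQLite 3.25+ via Python's sqlite3 as a stand-in; the two-column table is a hypothetical reduction), last_value with the default frame just echoes the current row's BIRTHDAY, because the frame ends at the current row, while the explicit frame reaches the end of the partition and always returns the latest value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-column table holding one person's two birthdays.
conn.execute("CREATE TABLE Table1 (SSN INTEGER, BIRTHDAY TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (1412, ?)", [("14-MAY-57",), ("20-OCT-57",)]
)

rows = conn.execute(
    """
    SELECT BIRTHDAY,
           LAST_VALUE(BIRTHDAY) OVER (
               PARTITION BY SSN ORDER BY BIRTHDAY) AS default_frame,
           LAST_VALUE(BIRTHDAY) OVER (
               PARTITION BY SSN ORDER BY BIRTHDAY
               ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS explicit_frame
    FROM Table1
    ORDER BY BIRTHDAY
    """
).fetchall()
for row in rows:
    print(row)
```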