I have a large patient dataset with diagnosis information (it has more columns than shown below, but they are not needed for this question). I have generated a sample dataset like the one below:
ID Name Gender Diagnosis_Class
ID_100 John Male Primary
ID_101 David Male Primary
ID_102 Susan Female Primary
ID_102 Susan Female Related
ID_103 Steve Male Primary
ID_103 Steve Male Primary
ID_103 Steve Male Related
ID_104 Kevin Male Primary
ID_104 Kevin Male Primary
ID_105 Peter Male Related
ID_106 Ben Male Primary
ID_107 Sophie Female Related
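To reproduce the sample, a setup script along these lines could be used. This is only a sketch: the table name `patient` and the column types are assumptions, not taken from the real dataset.

```sql
-- Hypothetical setup for the sample data above.
-- Table name and VARCHAR lengths are assumptions.
CREATE TABLE patient (
    ID              VARCHAR(10) NOT NULL,
    Name            VARCHAR(50) NOT NULL,
    Gender          VARCHAR(10) NOT NULL,
    Diagnosis_Class VARCHAR(10) NOT NULL
);

INSERT INTO patient (ID, Name, Gender, Diagnosis_Class) VALUES
    ('ID_100', 'John',   'Male',   'Primary'),
    ('ID_101', 'David',  'Male',   'Primary'),
    ('ID_102', 'Susan',  'Female', 'Primary'),
    ('ID_102', 'Susan',  'Female', 'Related'),
    ('ID_103', 'Steve',  'Male',   'Primary'),
    ('ID_103', 'Steve',  'Male',   'Primary'),
    ('ID_103', 'Steve',  'Male',   'Related'),
    ('ID_104', 'Kevin',  'Male',   'Primary'),
    ('ID_104', 'Kevin',  'Male',   'Primary'),
    ('ID_105', 'Peter',  'Male',   'Related'),
    ('ID_106', 'Ben',    'Male',   'Primary'),
    ('ID_107', 'Sophie', 'Female', 'Related');
```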
As you can see from the dataset, some records are duplicated but their Diagnosis_Class values differ. Some patients only have Primary, some only have Related, and some have both.
My question is: how can I make sure that entries with a Related Diagnosis_Class also have a Primary Diagnosis_Class? For example, Susan has both Primary and Related Diagnosis_Class, so I need to write a query that flags these Related entries (this can be in another column with YES or NO, Y or N, etc.).
Also, I only need one of each Diagnosis_Class type (one entry with Primary and one entry with Related) when a patient has multiple Primary or Related values. For example, Steve has two records with the Primary diagnosis class and one record with the Related diagnosis class. I only need two of those three records (one Primary and one Related; it doesn't really matter which Primary record is picked). As for records with only the Related diagnosis class, the record can be kept as it is.
I am fairly new to the SQL Server (v17.9.1) world. Any help is greatly appreciated. Thank you very much for your time.
Thanks for your quick response.
Here's the desired output:
ID Name Gender Diagnosis_Class Flag
ID_100 John Male Primary
ID_101 David Male Primary
ID_102 Susan Female Primary
ID_102 Susan Female Related Yes
ID_103 Steve Male Primary
ID_103 Steve Male Related Yes
ID_104 Kevin Male Primary
ID_104 Kevin Male Primary
ID_105 Peter Male Related No
ID_106 Ben Male Primary
ID_107 Sophie Female Related No
As you can see, I have removed the following entries:
ID_103 Steve Male Primary
and ID_103 Steve Male Primary
because there are two entries with Primary Diagnosis_Class and I only need one Primary and one Related (which Primary is dropped is not important, as I am more interested in Related whenever an entry has one).
I also flagged all entries with Related Diagnosis_Class: Yes when they have a Primary Diagnosis_Class attached to them, and No when there is no Primary Diagnosis_Class attached to them. The rest can be left NULL, set to 'N/A', or simply left empty (not important).
As for entries with multiple Primary Diagnosis_Class but no Related Diagnosis_Class, I kept them all as they were.
I hope I am making sense here as I am struggling since it's too complicated for my experience.
Thank you for your time. I really appreciate it.
1 Answer
You can take advantage of the DISTINCT clause to avoid duplicates. As for the second part:
How can I make sure entries with Related Diagnosis_Class also have Primary Diagnosis_Class?
You can use a CASE expression that, for the rows where Diagnosis_Class = 'Related', tries to find a matching row with Diagnosis_Class = 'Primary'.
Assuming your table name is patient:
SELECT DISTINCT
    p.ID,
    p.Name,
    p.Gender,
    p.Diagnosis_Class,
    -- Flag only the Related rows: 'Yes' if the same ID also has a Primary row
    CASE WHEN p.Diagnosis_Class = 'Related' THEN
        IIF (EXISTS (SELECT 1 FROM patient p2 WHERE p2.ID = p.ID AND p2.Diagnosis_Class = 'Primary'), 'Yes', 'No')
    ELSE ''
    END AS Flag
FROM
    patient p;
Gives you this result:
ID | Name | Gender | Diagnosis_Class | Flag |
---|---|---|---|---|
ID_100 | John | Male | Primary | |
ID_101 | David | Male | Primary | |
ID_102 | Susan | Female | Primary | |
ID_102 | Susan | Female | Related | Yes |
ID_103 | Steve | Male | Primary | |
ID_103 | Steve | Male | Related | Yes |
ID_104 | Kevin | Male | Primary | |
ID_105 | Peter | Male | Related | No |
ID_106 | Ben | Male | Primary | |
ID_107 | Sophie | Female | Related | No |
db<>fiddle here
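Note that DISTINCT also collapses the duplicate Primary rows of a patient who has no Related row (Kevin appears once above, but twice in the desired output). If those duplicates need to be kept, a possible variant is to number the rows per ID and Diagnosis_Class with ROW_NUMBER and to compute per-ID indicators with windowed MAX. This is a sketch under the same assumption that the table is named patient; the names marked, rn, has_primary and has_related are made up for illustration. It also assumes that duplicate Related rows should be reduced to one, which the question does not state explicitly.

```sql
-- Sketch: dedupe only for patients that have a Related row;
-- patients with Primary rows only keep all of their rows.
WITH marked AS (
    SELECT
        ID,
        Name,
        Gender,
        Diagnosis_Class,
        -- Position of the row within its (ID, Diagnosis_Class) group; order is arbitrary
        ROW_NUMBER() OVER (PARTITION BY ID, Diagnosis_Class
                           ORDER BY (SELECT NULL)) AS rn,
        -- 1 if this ID has at least one Primary row
        MAX(CASE WHEN Diagnosis_Class = 'Primary' THEN 1 ELSE 0 END)
            OVER (PARTITION BY ID) AS has_primary,
        -- 1 if this ID has at least one Related row
        MAX(CASE WHEN Diagnosis_Class = 'Related' THEN 1 ELSE 0 END)
            OVER (PARTITION BY ID) AS has_related
    FROM patient
)
SELECT
    ID,
    Name,
    Gender,
    Diagnosis_Class,
    CASE
        WHEN Diagnosis_Class = 'Related' AND has_primary = 1 THEN 'Yes'
        WHEN Diagnosis_Class = 'Related' THEN 'No'
        ELSE ''
    END AS Flag
FROM marked
WHERE rn = 1            -- one row per class for each ID
   OR has_related = 0   -- keep every row when the ID has no Related row
ORDER BY ID, Diagnosis_Class;
```

On the sample rows this should keep both of Kevin's Primary rows while still dropping Steve's extra Primary row.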
This is exactly the output I need and it works perfectly on my dataset. Thanks for your answer McNets. – PDRA, Sep 30, 2021 at 11:21
Diagnosis_Class values are Primary and Related?