I am trying to join two tables on muliple criteria, however the set of criteria would be different for each row. Then, in addition, I want to get a percentile rank for values from one column from one of the tables compared to a column from the other. Let's say these are the tables: T1:
+--------+-------+--------+-------+-------+-------+
| height | width | weight | power | price | score1|
+--------+-------+--------+-------+-------+-------+
| 180 | 130 | 30 | 20 | 100 | 1 |
| 170 | 130 | 90 | 50 | 200 | 5 |
| 180 | 130 | 30 | 20 | 50 | 1 |
| 210 | 180 | 90 | 90 | 1000 | 1 |
| 210 | 300 | 90 | 90 | 1000 | 5 |
| 100 | 130 | 30 | 20 | 1000 | 5 |
| 170 | 300 | 90 | 90 | 200 | 4 |
| 180 | 80 | 90 | 10 | 1000 | 3 |
| 210 | 300 | 90 | 90 | 1000 | 6 |
+--------+-------+--------+-------+-------+-------+
T2:
+-----------+----------+-----------+----------+----------+-------+
| height_t2 | width_t2 | weight_t2 | power_t2 | price_t2 | score2|
+-----------+----------+-----------+----------+----------+-------+
| - | 130 | 30 | 20 | - | 2 |
| 170 | - | 90 | - | 200 | 2 |
| 180 | 80 | - | 10 | - | 5 |
| 210 | - | 90 | - | 1000 | 6 |
| | | | | | |
+-----------+----------+-----------+----------+----------+-------+
I would like to join them on all values that are non-null from the first 5 columns (that is: everything except for the score) and then I would like to check the percentile rank for the scores from the second table, when the range would be all scores from the matching rows from the first table. So: the first score (2) would be tested against 1, 1 and 5, so the result would be 67%, because it's greater than the scores in 2/3 of the matching rows
I have already tried the following approach(please bear in mind that I simplified the tables, my dataset is much larger than this):
SELECT * FROM (SELECT
*,
APPROX_PERCENTILE (score1 OVER (
PARTITION BY height, width, weight, power, price
)) AS approx_percentile
FROM
t1
JOIN
t2 ON (
(
t1.height IS NULL
OR height = height_t2
)
AND (
t1.weight IS NULL
OR weight = weight_t2
)
AND (
power IS NULL
OR power = power_t2
)
AND (
price IS NULL
OR price = price_t2
)
AND (
width IS NULL
OR width = width_t2
)
))
WHERE
score1 = score2
...But the results are not accurate. Is there anything wrong with my logic? Or perhaps I should try something completely different?
1 Answer 1
I believe your logic/concept is correct; however you are testing if the values in T1 IS NULL when your null values are in T2.
Changing the logic to test T2 for null results in output that matches what you explain:
SELECT *
FROM
t1
JOIN
t2 ON (
(
height_t2 IS NULL
OR height = height_t2
)
AND (
weight_t2 IS NULL
OR weight = weight_t2
)
AND (
power_t2 IS NULL
OR power = power_t2
)
AND (
price_t2 IS NULL
OR price = price_t2
)
AND (
width_t2 IS NULL
OR width = width_t2
)
)
| height | width | weight | power | price | score1 | height_t2 | width_t2 | weight_t2 | power_t2 | price_t2 | score2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 180 | 130 | 30 | 20 | 100 | 1 | null | 130 | 30 | 20 | null | 2 |
| 180 | 130 | 30 | 20 | 100 | 1 | null | null | null | null | null | null |
| 170 | 130 | 90 | 50 | 200 | 5 | 170 | null | 90 | null | 200 | 2 |
| 170 | 130 | 90 | 50 | 200 | 5 | null | null | null | null | null | null |
| 180 | 130 | 30 | 20 | 50 | 1 | null | 130 | 30 | 20 | null | 2 |
| 180 | 130 | 30 | 20 | 50 | 1 | null | null | null | null | null | null |
| 210 | 180 | 90 | 90 | 1000 | 1 | 210 | null | 90 | null | 1000 | 6 |
| 210 | 180 | 90 | 90 | 1000 | 1 | null | null | null | null | null | null |
| 210 | 300 | 90 | 90 | 1000 | 5 | 210 | null | 90 | null | 1000 | 6 |
| 210 | 300 | 90 | 90 | 1000 | 5 | null | null | null | null | null | null |
| 100 | 130 | 30 | 20 | 1000 | 5 | null | 130 | 30 | 20 | null | 2 |
| 100 | 130 | 30 | 20 | 1000 | 5 | null | null | null | null | null | null |
| 170 | 300 | 90 | 90 | 200 | 4 | 170 | null | 90 | null | 200 | 2 |
| 170 | 300 | 90 | 90 | 200 | 4 | null | null | null | null | null | null |
| 180 | 80 | 90 | 10 | 1000 | 3 | 180 | 80 | null | 10 | null | 5 |
| 180 | 80 | 90 | 10 | 1000 | 3 | null | null | null | null | null | null |
| 210 | 300 | 90 | 90 | 1000 | 6 | 210 | null | 90 | null | 1000 | 6 |
| 210 | 300 | 90 | 90 | 1000 | 6 | null | null | null | null | null | null |
Comments
Explore related questions
See similar questions with these tags.