3

I have a table with path1, path2, and sha1 columns. For any given pair of path2 and sha1 values, there can be multiple values of path1. I just want one of those paths; I don't really care which one.

I'm thinking I can do a group by for path2 and sha1. Now I just need to select one of the values of path1. I suppose I could select the minimum value of path1 but that would be doing extra work that isn't really needed.

Google tells me that Microsoft has "FIRST" but I don't see that in the postgres pages. Plus... I'd like to stick with normal SQL if possible.

asked Dec 18, 2017 at 19:15
1
  • 1
    You can use DISTINCT ON. See this similar but generic question: dba.stackexchange.com/questions/24327/… For example: SELECT DISTINCT ON (path2, sha1) path2, sha1, path1 FROM table_name; Commented Dec 18, 2017 at 19:21

2 Answers

2

There are several ways to do this. One of them is DISTINCT ON, as @Ypercube suggested:

SELECT DISTINCT ON (path2, sha1) path2, sha1, path1
FROM table_name
ORDER BY path2, sha1;
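To see what this returns, here is a small sketch on made-up data (the rows below are hypothetical, not from the question, and the temporary table shadows any real table_name for the session). Adding path1 as a third sort key makes the chosen row deterministic. Without it, PostgreSQL may return any path1 from each group, which is fine when any one will do.

```sql
-- Hypothetical sample data, for illustration only.
CREATE TEMP TABLE table_name (path1 text, path2 text, sha1 text);

INSERT INTO table_name (path1, path2, sha1) VALUES
    ('/a/x', '/p/one', 'aaa'),
    ('/b/x', '/p/one', 'aaa'),
    ('/c/y', '/p/two', 'bbb');

-- One row per (path2, sha1); the extra path1 sort key picks the smallest path1.
SELECT DISTINCT ON (path2, sha1) path2, sha1, path1
FROM table_name
ORDER BY path2, sha1, path1;
-- Returns two rows: ('/p/one', 'aaa', '/a/x') and ('/p/two', 'bbb', '/c/y').
```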

You can also use an ordered-set aggregate, which should generally be slower:

SELECT percentile_disc(0) WITHIN GROUP (ORDER BY path1) AS path1, path2, sha1
FROM table_name
GROUP BY path2, sha1;
answered Dec 18, 2017 at 19:36
0

A simple approach is to take the min/max of the path1:

select path2, sha1, max(path1) from table_name group by path2, sha1;

This also works in MySQL, where you don't have window functions. An index on (path2, sha1, path1) speeds up the query.
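A minimal sketch of such an index is below. The index name is made up, and whether the planner actually uses it depends on the table and its statistics, so treat it as something to test rather than a guarantee. Building it on a very large table also takes time of its own.

```sql
-- Index name is hypothetical; the column order matches the grouping keys,
-- with path1 last so min/max per group can be read from the index.
CREATE INDEX table_name_path2_sha1_path1_idx
    ON table_name (path2, sha1, path1);
```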

answered Dec 18, 2017 at 21:31
5
  • I may run this as a test. The DISTINCT ON took a very long time. Commented Dec 19, 2017 at 22:04
  • The index would probably help in both cases. Commented Dec 19, 2017 at 23:07
  • If anyone is curious... the database has almost 500M tuples. Using DISTINCT ON as Evan suggested took 553 minutes and the max(path1) that Razvan suggested took 925 minutes. Thank you to both for helping out. I did not create an index because this is a one-time query -- so far ;-) Commented Dec 21, 2017 at 21:40
  • Makes sense, max/min involves extra computation. What is the size of the result? Commented Dec 22, 2017 at 15:52
  • The result was 908800 tuples. Commented Dec 27, 2017 at 18:18
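
For anyone comparing the two approaches on a table this large, one low-cost check is to look at the query plans before running anything. This is only a sketch: plain EXPLAIN shows the planner's estimate and does not execute the query, so it will not reproduce the timings reported above.

```sql
-- EXPLAIN without ANALYZE only displays the plan; the query is not run.
EXPLAIN
SELECT DISTINCT ON (path2, sha1) path2, sha1, path1
FROM table_name
ORDER BY path2, sha1;

EXPLAIN
SELECT path2, sha1, max(path1)
FROM table_name
GROUP BY path2, sha1;
```

Whether each plan sorts, hashes, or scans an index is visible in the output, which helps when deciding if an index is worth building.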
