I want to use TABLESAMPLE to randomly sample from rows that meet a certain condition with PostgreSQL.
This runs fine:
select * from customers tablesample bernoulli (1);
But I cannot figure out how to embed the condition in the script. This for example
select * from customers where last_name = 'powell' tablesample bernoulli (1);
throws this error:
SQL Error [42601]: ERROR: syntax error at or near "tablesample"
Position: 71
2 Answers 2
tablesample
is an "attribute" of a table, not a query. So you need to write it write after the table name:
select *
from customers tablesample system (1)
where last_name = 'powell';
Note that the where
clause will be applied after the table has been sampled. It will not return 1% of all customers that have the last name 'powell'. But instead it will apply the filter on the sampled table.
-
1Thank you. So, by design, it is not possible to apply the filter first and then do the tablesample? Your note is actually my concern. For example, when the rows that meet the condition are very rare.Reveille– Reveille2020年01月27日 14:49:40 +00:00Commented Jan 27, 2020 at 14:49
-
3@reveille No, it is called tablesample not rowsample for a reason. You can just add
and random()<0.01
ororder by random() limit ...
jjanes– jjanes2020年01月27日 17:00:55 +00:00Commented Jan 27, 2020 at 17:00
I was also wondering if this was possible and stumbled on this post. I didn't do much research on the tablesample attribute, so technically the other answers are more correct to your direct question.
However you can work around this if you really want to use the tablesample attribute by creating a temporary table (or similar) based on your conditional query.
create temp table powell_customers as select * from customers where last_name = 'powell';
SELECT * from powell_customers tablesample system(1);
Explore related questions
See similar questions with these tags.