PostgreSQL 9.4
I have a table called customers with an integer column income. After running ANALYZE on it, I got the following statistics for the column:
most_common_vals
{20000,80000,40000,60000,100000}
Now I run the following simple query to understand how the row count is estimated: EXPLAIN ANALYZE SELECT * FROM customers WHERE income=123123. Output:
Seq Scan on customers (cost=0.00..738.00 rows=1 width=268) (actual time=4.669..4.669 rows=0 loops=1)
Filter: (income = 123123)
Rows Removed by Filter: 20000
Since the optimizer has no statistics for the income value 123123, I expected it to fall back on a guess of 0.5% of the table size. The table has 20,000 rows, so the estimated row count should have been 100. But the optimizer estimated 1. Why? Maybe I misunderstand how row counts are estimated for values that aren't in the statistics?
Could you explain it a bit?
Since 123123 is not in most_common_vals, PostgreSQL next considers the rest of the column's statistics. If there is no histogram, it concludes that the most common values are the only values present in the table, and therefore the estimate for 123123 is zero. However, the planner doesn't allow zero-row estimates in most places (to avoid division-by-zero errors) and clamps the estimate to 1 instead.
0.5% is for cases where there is no information, like for a generic query plan or a join where the value won't be known at planning time.
But a list of MCVs that doesn't contain the value you're looking for, with no histogram to fall back on, does constitute information.
You mean the histogram_bounds column? In my case it's empty. BTW, correlation = 0.199117. – St.Antario, commented Nov 24, 2015 at 5:49
But what if there's a histogram? I tried another column of customers. Specifically, zip has non-empty most_common_vals and histogram_bounds. I executed EXPLAIN ANALYZE SELECT * FROM customers WHERE zip=62358, where 62538 is within the last bucket of histogram_bounds (62539 and 72365) but not present in most_common_vals. The planner again estimated only one row. So, if a value isn't in most_common_vals, does the planner conclude that there's only one row? – St.Antario, commented Nov 24, 2015 at 5:57
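For the zip case, here is a minimal sketch based on my reading of PostgreSQL's var_eq_const(): the histogram does not seem to enter an equality estimate at all. Instead, the leftover fraction (1 - sum of most_common_freqs - null_frac) is split evenly over the distinct values outside the MCV list, so a column with many distinct values gives each non-MCV value about one row. All statistics below are hypothetical, since the comment doesn't show zip's most_common_freqs or n_distinct. The function names are illustrative. pg_stats reports a negative n_distinct as minus the fraction of rows that are distinct.

```python
def distinct_count(n_distinct, reltuples):
    """Convert pg_stats.n_distinct to an absolute count.

    Negative values mean "minus the fraction of rows that are distinct".
    """
    return -n_distinct * reltuples if n_distinct < 0 else n_distinct


def estimate_eq_rows_non_mcv(reltuples, mcv_freqs, null_frac, n_distinct):
    """Estimated rows for `col = const` when const is not an MCV (sketch)."""
    ndistinct = distinct_count(n_distinct, reltuples)
    # Leftover fraction of rows: neither NULL nor one of the MCVs.
    selec = min(max(1.0 - sum(mcv_freqs) - null_frac, 0.0), 1.0)
    # Shared evenly among the distinct values outside the MCV list.
    other_distinct = ndistinct - len(mcv_freqs)
    if other_distinct > 1:
        selec /= other_distinct
    # Never rate a non-MCV value above the least common MCV.
    if mcv_freqs and selec > min(mcv_freqs):
        selec = min(mcv_freqs)
    rows = reltuples * selec
    # Round, but never estimate fewer than one row.
    return 1.0 if rows <= 1.0 else float(round(rows))


# Hypothetical zip statistics: 100 MCVs at 0.02% each, no NULLs, and
# n_distinct = -0.75 (about 15000 distinct values in 20000 rows).
print(estimate_eq_rows_non_mcv(20000, [0.0002] * 100, 0.0, -0.75))  # 1.0
```

With far fewer distinct values (say n_distinct = 1000), the same function returns a larger estimate, capped at the least common MCV's frequency. So "one row" reflects how many distinct values the column has, not a rule for values missing from most_common_vals.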