
Haas: Big ideas [for PostgreSQL]

[Posted May 5, 2010 by corbet]

Robert Haas contemplates development ideas for PostgreSQL once the 9.0 release is done. "When I first started using PostgreSQL, the product had a reputation for being slow, principally because of issues relating to VACUUM. That reputation wasn't entirely justified even back then, and I think we've made enormous progress here in 8.3 and 8.4, but there might be more improvements we can make. Where are the remaining bottlenecks?"


Haas: Big ideas [for PostgreSQL]

Posted May 5, 2010 23:08 UTC (Wed) by arjan (subscriber, #36785) [Link] (11 responses)

One thing I would love to see fixed is Postgres using indexes on date fields.

Right now, if you do selects on time ranges (let's say, the last 7 days), it will not use the index but will instead walk all records.
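
For concreteness, a minimal sketch of the scenario (table and column names are hypothetical); EXPLAIN shows which plan the planner actually picks:

    CREATE INDEX orders_created_idx ON orders (created_at);
    EXPLAIN SELECT * FROM orders
        WHERE created_at > now() - interval '7 days';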

Haas: Big ideas [for PostgreSQL]

Posted May 6, 2010 0:03 UTC (Thu) by leonov (guest, #6295) [Link] (5 responses)

What I do is simply use Unix epoch timestamps for time/date storage. They are just integers, so they take little space and index well, and they have other benefits too.

Sounds crazy at first blush, but has worked out great for me over the years. I wrote a short article about them here:
http://lost.co.nz/programming/epoch.html
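
A minimal sketch of the approach (table and column names are hypothetical):

    CREATE TABLE hits (
        id      serial PRIMARY KEY,
        at_time integer NOT NULL  -- seconds since 1970年01月01日 UTC
    );
    CREATE INDEX hits_at_time_idx ON hits (at_time);

    -- "last 7 days" becomes plain integer arithmetic:
    SELECT count(*) FROM hits
        WHERE at_time > extract(epoch FROM now())::integer - 7 * 86400;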

Haas: Big ideas [for PostgreSQL]

Posted May 6, 2010 3:10 UTC (Thu) by jhardin@impsec.org (guest, #15045) [Link] (1 responses)

...that's the sort of thing the db engine should be doing for you.

Haas: Big ideas [for PostgreSQL]

Posted May 10, 2010 16:11 UTC (Mon) by intgr (subscriber, #39733) [Link]

And it does, the original poster is simply confused.

Haas: Big ideas [for PostgreSQL]

Posted May 6, 2010 4:33 UTC (Thu) by wahern (subscriber, #37304) [Link] (2 responses)

Your web page assumes that `time_t' stores a Unix timestamp -- the number of seconds since the epoch of 1970年01月01日. But the C standard has no such requirement, either concerning the epoch or whether time_t counts in seconds; C specifies difftime() for getting the difference between two time_t values in seconds.

Languages such as Perl and Python, which expose the C routines, also make no such guarantees, and their documentation mentions this (Python actually specifies the unit but leaves the epoch unspecified). If you pass a time() value from one system to another, for example by converting it to an integer and inserting it into a database, you can't assume that it refers to the same calendar time, or is even meaningful at all, merely because both systems use C. Basically, the only meaningful things you can do with a time_t value, without relying on other standards, are to use it with strftime(), difftime(), and the other date/time-related C interfaces.

The only way to exchange a time_t value between systems in a portable manner is to serialize it, for example into ISO 8601 format, though historically the asctime() and RFC 822 formats were preferred.
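
To illustrate the serialization idea in SQL (a sketch of my own, not from the page under discussion; the timestamp value is made up): ship the time as ISO 8601 text and parse it locally on the receiving side, instead of exchanging raw integers:

    -- serialize the current time as ISO 8601 in UTC:
    SELECT to_char(now() AT TIME ZONE 'UTC', 'YYYY-MM-DD"T"HH24:MI:SS"Z"');
    -- parse it back on the receiving side:
    SELECT '2010-05-06T04:33:00Z'::timestamptz;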

Of course, depending on POSIX is another story altogether. But it's POSIX that's being used, not C.

Haas: Big ideas [for PostgreSQL]

Posted May 6, 2010 14:46 UTC (Thu) by cruff (subscriber, #7201) [Link]

You also cannot assume that time_t is even an integer; it could be floating point, which is permissible according to IEEE 1003.1.

Haas: Big ideas [for PostgreSQL]

Posted May 8, 2010 19:49 UTC (Sat) by HenrikH (subscriber, #31152) [Link]

In theory that is correct, yes, but in practice you can even export a time_t value to Windows without any problems.

Honestly, you have to search very far and wide to find a system using time_t that is:

1. not using integers
2. not counting the number of seconds
3. not having a base of 1970年01月01日T00:00:00

And as noted, even Windows follows the rules.

Haas: Big ideas [for PostgreSQL]

Posted May 6, 2010 8:23 UTC (Thu) by petereisentraut (guest, #59453) [Link] (1 responses)

Indexes on dates are supported just fine. What you are apparently experiencing is an optimizer decision to choose one plan over another, based on statistics and the cost model. If you think the optimizer is wrong in a particular case, please submit a bug report. But there is no new "feature" here; it already works, in theory.

Haas: Big ideas [for PostgreSQL]

Posted May 6, 2010 19:47 UTC (Thu) by jeremiah (subscriber, #1221) [Link]

You might also find that you need to adjust the statistics collection a little and then rerun VACUUM ANALYZE. You'll also find that a sorted index on your dates works EXTREMELY well.
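
A sketch of what that looks like, with hypothetical table and column names: SET STATISTICS raises the sampling target for the column, and the VACUUM ANALYZE step refreshes the planner's statistics:

    ALTER TABLE orders ALTER COLUMN order_date SET STATISTICS 500;
    VACUUM ANALYZE orders;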

Haas: Big ideas [for PostgreSQL]

Posted May 6, 2010 16:27 UTC (Thu) by endecotp (guest, #36428) [Link] (2 responses)

My recollection is that

age(fieldname) < '1 week'::interval

does not use the index, while

fieldname > now()-'1 week'::interval

does use the index.

This is from a while ago, so I might have forgotten something, and things may have changed.

If this is important to you, I suggest you ask on the mailing list. It is normally one of the more friendly places to ask such questions.
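
A quick sketch of comparing the two forms with EXPLAIN (names hypothetical); the second keeps the indexed column bare on one side of the comparison, which is what lets the planner consider the index:

    EXPLAIN SELECT * FROM events WHERE age(created_at) < interval '1 week';
    -- typically a sequential scan: the column is wrapped in a function

    EXPLAIN SELECT * FROM events WHERE created_at > now() - interval '1 week';
    -- can use an index on created_at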

Haas: Big ideas [for PostgreSQL]

Posted May 6, 2010 21:02 UTC (Thu) by nevyn (guest, #33129) [Link] (1 responses)

That seems reasonable, as you'd actually need an index on age(mydate) ... or the optimizer would need to know that even though age(mydate) gives different results each time, those results are always directly related to mydate: age() is monotonically decreasing in its argument, so if mydate1 > mydate2 then age(mydate1) < age(mydate2), and a condition on age(mydate) could be rewritten as a condition on mydate itself. This seems hard though (but I'm not a PSQL developer :).

I'm also assuming that current_date is fixed for the duration of the SELECT/transaction ... but that might not be true either, in which case it's impossible to optimize.

Haas: Big ideas [for PostgreSQL]

Posted May 10, 2010 16:07 UTC (Mon) by intgr (subscriber, #39733) [Link]

> as you'd actually need an index on age(mydate)

Obviously you can't do that, because the age() function is not immutable -- its output depends on the current date, so it changes over time.

Heck, I didn't even know that there's an age() function. I always did it the old-fashioned way: where some_timestamp > now() - interval '1 day'
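
For the record, a sketch of why the expression-index route fails (names hypothetical): index expressions must use IMMUTABLE functions, and age() does not qualify:

    CREATE INDEX events_age_idx ON events (age(created_at));
    -- ERROR:  functions in index expression must be marked IMMUTABLE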

Haas: Big ideas [for PostgreSQL]

Posted May 6, 2010 0:23 UTC (Thu) by alankila (guest, #47141) [Link] (1 responses)

Optimized NOT IN queries. "SELECT * FROM foo WHERE id NOT IN (SELECT id FROM foo)" should return no rows, but PostgreSQL appears to take quite some time to work this one out.

Haas: Big ideas [for PostgreSQL]

Posted May 8, 2010 14:01 UTC (Sat) by kleptog (subscriber, #1183) [Link]

NOT IN is a somewhat annoying construct in that the SQL standard specifies that if *any* value produced by the subquery is NULL, the NOT IN comparison can never evaluate to true (only to false or unknown), regardless of the other values. The equivalent NOT EXISTS construct doesn't have this property, which is why that one *can* be optimised.

Other than this slight annoyance, it is just an anti-join, which Postgres can optimise just fine. It just needs to prove first that the subquery produces no NULLs. There would need to be some work in the optimiser to see when this can be shown, and that hasn't happened.
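
A sketch of the two spellings (hypothetical tables): if the subquery produces a NULL, the first can never be true for any row, while the second has no such NULL semantics and can be planned as an anti-join:

    SELECT * FROM foo WHERE id NOT IN (SELECT foo_id FROM bar);

    SELECT * FROM foo f
        WHERE NOT EXISTS (SELECT 1 FROM bar b WHERE b.foo_id = f.id);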

Haas: Big ideas [for PostgreSQL]

Posted May 6, 2010 11:49 UTC (Thu) by ringerc (subscriber, #3071) [Link] (2 responses)

Pg scales up well on a many-core, many-spindle machine to service many users, though above a certain point a connection pool is required, because Pg doesn't separate the query executor/backend from the connection/frontend, and it is expensive to have too many backends.

Pg doesn't scale out particularly well across many machines, though various third-party replication options address this at varying costs in complexity (Slony, etc.). Hopefully SR+HS in 9.0 will help here too, especially if 9.x grows automatic client referral or proxy capability to direct write attempts to the master one way or the other. It's still perfectly usable for horizontal scaling (see: Bucardo, Skype, etc.), just limited.

The big one to me is that Pg scales up rather poorly for a workload focused on one or two big, complex queries at a time. Each backend is single-threaded and uses synchronous I/O, so it's less able to request blocks from multiple fast storage devices to concurrently read indexes, table partitions, etc. It can use at most one core, which for CPU-heavy queries is crippling. These days it's hard to "just buy a faster CPU", and easy to buy 10 slightly slower ones.

So I guess "where to go from here", in an ideal world, is "make Pg able to use the multi-core, multi-CPU, multi-disk/array grunt of modern machines to execute single complex queries faster".

Pity that'd be a near-total re-design :S

It doesn't help that partitioning is clunky and limited (foreign-key/reference issues, management overhead, etc.), which further limits scaling for big workloads.
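
For readers who haven't run into it, a rough sketch of what 8.x-era partitioning involves (names hypothetical), which is where the clunkiness shows:

    CREATE TABLE measurements (logged date NOT NULL, value numeric);
    CREATE TABLE measurements_2010_05 (
        CHECK (logged >= '2010-05-01' AND logged < '2010-06-01')
    ) INHERITS (measurements);
    -- plus a trigger or rule to route INSERTs, indexes created per child,
    -- and constraint_exclusion enabled so the planner can skip children;
    -- foreign keys referencing the parent never see the child rows.
    SET constraint_exclusion = on;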

I'd like to note that I _really_ like Pg and use it heavily. Furthermore, these issues don't actually affect me all that much, though they do affect others. Pg keeps getting better in significant ways that I use, all thanks to the hard work of Tom Lane, Robert Haas, and all the other experts on pgsql-hackers. While I'm bluntly describing perceived shortcomings above, I don't want that to detract from the general day-to-day awesomeness of the database.

Haas: Big ideas [for PostgreSQL]

Posted May 6, 2010 13:41 UTC (Thu) by hazmat (subscriber, #668) [Link] (1 responses)

This doesn't directly address your comment about the lack of distributed query processing, but as for horizontal scale-out, I think the promising solution for the future is http://sf.net/projects/postgres-xc, a multi-master write/read-scalable Postgres cluster.

Haas: Big ideas [for PostgreSQL]

Posted May 10, 2010 16:33 UTC (Mon) by intgr (subscriber, #39733) [Link]

> i think the promising solution for the future is postgres-xc

Yeah right. Remember PGCluster? PGCluster II? Postgres-R? pgReplicator? Daffodil Replicator? GridSQL?

As far as I can tell, all of these projects touted multi-node scalability for PostgreSQL, some more ambitious than others, and some even claimed no drawbacks. Who uses them? If none of these projects managed to solve the problem, why would Postgres-XC?

As far as I can tell, the _only_ clustering and write-scalable system that anyone significant has used is pgpool-II, and it places significant restrictions on the way you can write your application. Why do people keep thinking that this is a simple problem that can be solved once and for everyone?

Multi-master is just too complex for a database that aims to support ACID properties and all kinds of complex queries -- it actually tends to scale negatively. It's a cost-benefit tradeoff, but if you do outgrow the monolithic SQL beasts, then NoSQL and eventual consistency are the way to go. I'll be glad to be proven wrong, however. :)

