When I first started using PostgreSQL, the product had a reputation for being slow, principally because of issues relating to VACUUM. That reputation wasn't entirely justified even back then, and I think we've made enormous progress here in 8.3 and 8.4, but there might be more improvements we can make. Where are the remaining bottlenecks?
Posted May 5, 2010 23:08 UTC (Wed)
by arjan (subscriber, #36785)
[Link] (11 responses)
Right now, if you do selects on time ranges (let's say, the last 7 days), it will not use the index but instead walk all records.
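A sketch of the two query shapes involved (table and index names here are hypothetical, and planner behaviour varies by version and statistics):

```sql
-- Hypothetical schema
CREATE TABLE log (id bigserial PRIMARY KEY, ts timestamptz NOT NULL);
CREATE INDEX log_ts_idx ON log (ts);

-- Tends to walk all records: ts is wrapped in an expression that is
-- re-evaluated per row, so the btree index on ts cannot be used.
SELECT * FROM log WHERE age(ts) < interval '7 days';

-- Tends to use the index: the right-hand side is computed once,
-- leaving a plain range comparison on the indexed column.
SELECT * FROM log WHERE ts > now() - interval '7 days';
```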
Posted May 6, 2010 0:03 UTC (Thu)
by leonov (guest, #6295)
[Link] (5 responses)
Sounds crazy at first blush, but it has worked out great for me over the years. I wrote a short article about them here: http://lost.co.nz/programming/epoch.html
Posted May 6, 2010 3:10 UTC (Thu)
by jhardin@impsec.org (guest, #15045)
[Link] (1 responses)
...that's the sort of thing the db engine should be doing for you.
Posted May 10, 2010 16:11 UTC (Mon)
by intgr (subscriber, #39733)
[Link]
Posted May 6, 2010 4:33 UTC (Thu)
by wahern (subscriber, #37304)
[Link] (2 responses)
Languages such as Perl and Python which expose the C routines also do not make any such guarantees, and the documentation mentions this (actually, Python specifies the unit but leaves unspecified the epoch). If you pass a time() value from one system to another, for example by converting to an integer and inserting into a database, you can't assume that it refers to the same calendar time, or is even meaningful at all, merely because both systems are using C. Basically, the only meaningful thing you can do with a time_t value, without relying on other standards, is to use it with strftime(), difftime(), and the other date-time related C interfaces.
The only way to exchange a time_t value between systems in a portable manner is by serializing the value, for example into the ISO8601 format, though historically the asctime() and RFC822 formats were preferred.
Of course, depending on POSIX is another story altogether. But it's POSIX that's being used, not C.
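The serialization point can be sketched in Python, which the comment mentions (the function names here are made up for illustration): rather than shipping a raw time.time() value between systems, render it through the calendar interfaces into ISO 8601 and parse it back on the other side.

```python
import time
from datetime import datetime, timezone

def to_wire(ts: float) -> str:
    """Serialize a timestamp as ISO 8601 UTC rather than a raw number."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def from_wire(s: str) -> float:
    """Parse an ISO 8601 UTC string back into this system's timestamp."""
    dt = datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return dt.timestamp()

# The wire format carries the calendar time; each side interprets it
# with its own epoch and unit conventions.
wire = to_wire(time.time())
```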
Posted May 6, 2010 14:46 UTC (Thu)
by cruff (subscriber, #7201)
[Link]
Posted May 8, 2010 19:49 UTC (Sat)
by HenrikH (subscriber, #31152)
[Link]
Honestly, you have to search very far and wide to find a system using time_t that is:
1. not using integers
2. not counting the number of seconds
3. not having a base of 1970-01-01T00:00:00
And as noted, even Windows follows the rules.
Posted May 6, 2010 8:23 UTC (Thu)
by petereisentraut (guest, #59453)
[Link] (1 responses)
Posted May 6, 2010 19:47 UTC (Thu)
by jeremiah (subscriber, #1221)
[Link]
Posted May 6, 2010 16:27 UTC (Thu)
by endecotp (guest, #36428)
[Link] (2 responses)
age(fieldname) < '1 week'::interval
does not use the index, while
fieldname > now()-'1 week'::interval
does use the index.
This is from a while ago, so I might have forgotten something, and things may have changed.
If this is important to you, I suggest you ask on the mailing list. It is normally one of the more friendly places to ask such questions.
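For anyone checking this locally, EXPLAIN shows which form the planner can turn into an index scan (the table name here is hypothetical; the predicates are the ones from the comment above):

```sql
EXPLAIN SELECT * FROM events WHERE age(fieldname) < '1 week'::interval;
-- Typically a Seq Scan with a filter on age(fieldname).

EXPLAIN SELECT * FROM events WHERE fieldname > now() - '1 week'::interval;
-- Typically an Index Scan (or Bitmap Index Scan) on the fieldname index.
```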
Posted May 6, 2010 21:02 UTC (Thu)
by nevyn (guest, #33129)
[Link] (1 responses)
I'm also assuming that current_date is fixed for the SELECT/transaction ... but that might not be true either, in which case it's impossible to optimize.
Posted May 10, 2010 16:07 UTC (Mon)
by intgr (subscriber, #39733)
[Link]
Obviously you can't do that, because the age() function is not stable -- its output changes every microsecond.
Heck, I didn't even know that there's an age() function. I always did it the old-fashioned way: where some_timestamp > now() - interval '1 day'
Posted May 6, 2010 0:23 UTC (Thu)
by alankila (guest, #47141)
[Link] (1 responses)
Posted May 8, 2010 14:01 UTC (Sat)
by kleptog (subscriber, #1183)
[Link]
Other than this slight annoyance, it is just an anti-join which postgres can optimise just fine. It just needs to prove that the subquery has no NULLs first. There would need to be some work in the optimiser to see when this can be shown, and that hasn't happened.
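The NULL subtlety being referred to can be sketched like this (table names hypothetical):

```sql
-- Hypothetical tables
CREATE TABLE orders  (id int PRIMARY KEY);
CREATE TABLE shipped (order_id int);   -- nullable on purpose

-- If shipped.order_id is ever NULL, "id NOT IN (...)" yields NULL
-- rather than true for every outer row, so the planner can only use a
-- fast anti-join after proving the subquery produces no NULLs.
SELECT id FROM orders
WHERE id NOT IN (SELECT order_id FROM shipped);

-- NOT EXISTS has no such subtlety and is planned as a plain anti-join.
SELECT o.id FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM shipped s WHERE s.order_id = o.id);
```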
Posted May 6, 2010 11:49 UTC (Thu)
by ringerc (subscriber, #3071)
[Link] (2 responses)
Pg doesn't scale out particularly well across many machines, though various 3rd-party replication options address this with varying degrees of complexity and cost (Slony, etc). Hopefully SR+HS in 9.0 will help here too, especially if 9.x grows automatic client referral or proxy capability to direct write attempts to the master one way or the other. It's still perfectly usable for horizontal scaling (see: Bucardo, Skype, etc), just limited.
The big one to me is that Pg scales up rather poorly for a workload focused on one or two big, complex queries at a time. Each backend is single threaded and uses synchronous I/O so it's less able to request blocks from multiple fast storage devices to concurrently read indexes, table partitions, etc. It can use at most one core, which for CPU-heavy queries is crippling. These days it's hard to "just buy a faster CPU" - and easy to buy 10 slightly slower ones.
So I guess my answer to "where to go from here" is, in an ideal world, "make Pg able to use the multi-core, multi-CPU, multi-disk/array grunt of modern machines to execute single complex queries faster".
Pity that'd be a near-total re-design :S
It doesn't help that partitioning is clunky and limited (fkey/ref issues, management, etc) which further limits scaling for big workloads.
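For context, partitioning in the 8.x era meant manual table inheritance; a minimal sketch (names hypothetical) of the clunkiness being described:

```sql
CREATE TABLE measurements (ts timestamptz NOT NULL, value numeric);

-- One child table per month, with a CHECK constraint so
-- constraint_exclusion can skip irrelevant partitions.
CREATE TABLE measurements_2010_05 (
    CHECK (ts >= '2010-05-01' AND ts < '2010-06-01')
) INHERITS (measurements);

-- Rows must be routed by hand, e.g. with a trigger, and foreign keys
-- referencing the parent do not see child rows -- two of the
-- limitations complained about above.
CREATE OR REPLACE FUNCTION measurements_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.ts >= '2010-05-01' AND NEW.ts < '2010-06-01' THEN
        INSERT INTO measurements_2010_05 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for %', NEW.ts;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurements_route
    BEFORE INSERT ON measurements
    FOR EACH ROW EXECUTE PROCEDURE measurements_insert();
```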
I'd like to note that I _really_ like Pg, and use it heavily. Furthermore, these issues don't actually impact me all that much, though they do others. Pg keeps on getting better in significant ways that I use, all thanks to the hard work of Tom Lane, Robert Haas, and all the other experts on pgsql-hackers. While I'm bluntly describing perceived shortcomings in the above, I don't want that to detract from the general day-to-day awesome of the database.
Posted May 6, 2010 13:41 UTC (Thu)
by hazmat (subscriber, #668)
[Link] (1 responses)
Posted May 10, 2010 16:33 UTC (Mon)
by intgr (subscriber, #39733)
[Link]
Yeah right. Remember PGCluster? PGCluster II? Postgres-R? pgReplicator? Daffodil Replicator? GridSQL?
As far as I can tell, all of these projects touted multi-node scalability for PostgreSQL, some more ambitious than others, and some even claimed no drawbacks. Who uses them? If none of these projects managed to solve the problem, why would Postgres-XC?
As far as I can tell, the _only_ clustering and write-scalable system that anyone significant has used is pgpool-II, and it places significant restrictions on the way you can write your application. Why do people keep thinking that it's a simple problem that can be solved once, for everyone?
Multi-master is just too complex for a database that aims to support ACID properties and all kinds of complex queries -- it actually tends to scale negatively. It's a cost-benefit tradeoff, but if you do outgrow monolithic SQL beasts, then NoSQL and eventual consistency is the way to go. I'll be glad to be proven wrong, however. :)
Haas: Big ideas [for PostgreSQL]
Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds