Everyone knows that, in tables that use InnoDB as the engine, queries like SELECT COUNT(*) FROM mytable are very inexact and very slow, especially when the table gets bigger and there are constant row insertions/deletions while the query executes.
As I understand it, InnoDB doesn't store the row count in an internal variable, which is the reason for this problem.
My question is: why is this so? Would it be so hard to store such information? It's important information to know in so many situations. The only difficulty I see in implementing such an internal count is when transactions are involved: if a transaction is uncommitted, do you count the rows inserted by it or not?
PS: I'm not an expert on DBs, I'm just someone who has MySQL as a simple hobby. So if I just asked something stupid, don't be excessively critical :D
3 Answers
I agree with @RemusRusanu (+1 for his answer)
SELECT COUNT(*) FROM mydb.mytable in InnoDB behaves like a transactional storage engine should. Compare it to MyISAM.
MyISAM
If mydb.mytable is a MyISAM table, launching SELECT COUNT(*) FROM mydb.mytable; is just like running SELECT table_rows FROM information_schema.tables WHERE table_schema = 'mydb' AND table_name = 'mytable'; and triggers a quick lookup of the row count in the header of the MyISAM table.
InnoDB
If mydb.mytable is an InnoDB table, you get a hodge-podge of things going on. You have MVCC going on, governing the following:
- ib_logfile0/ib_logfile1 (Redo Logs)
- ibdata1
- Undo Logs
- Rollbacks
- Data Dictionary Changes
- Buffer Pool Management
- Transaction Isolation (4 types)
  - Repeatable Read
  - Read Committed
  - Read Uncommitted
  - Serializable
Asking InnoDB for a table count requires navigating through all of these ominous things. In fact, one never really knows whether SELECT COUNT(*) FROM mydb.mytable counts only the rows visible under repeatable read, or also includes rows that have since been committed and rows that are still uncommitted.
You could try to stabilize things a little by disabling innodb_stats_on_metadata.
According to the MySQL Documentation on innodb_stats_on_metadata:
When this variable is enabled (which is the default, as before the variable was created), InnoDB updates statistics during metadata statements such as SHOW TABLE STATUS or SHOW INDEX, or when accessing the INFORMATION_SCHEMA tables TABLES or STATISTICS. (These updates are similar to what happens for ANALYZE TABLE.) When disabled, InnoDB does not update statistics during these operations. Disabling this variable can improve access speed for schemas that have a large number of tables or indexes. It can also improve the stability of execution plans for queries that involve InnoDB tables.
Disabling it may or may not give you a more stable count in terms of setting up EXPLAIN plans. It may affect the performance of SELECT COUNT(*) FROM mydb.mytable in a good way, a bad way, or not at all. Give it a try and see!
For starters, there is no such thing as the 'current count' to store in a variable. A query like SELECT COUNT(*) FROM ... is subject to the current isolation level and all concurrent pending transactions. Depending on the isolation level, the query may or may not see rows inserted or deleted by pending uncommitted transactions. The only way to answer is to count the rows that are visible to the current transaction.
Note that I did not even touch the even more thorny subject of concurrent transactions that start or end during the count. Not to mention rollbacks...
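To make the visibility argument concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not how InnoDB implements MVCC: the names RowVersion and visible_count, and the idea of passing a set of committed transaction ids, are all hypothetical and chosen for illustration. It shows four readers looking at the same table at the same moment and each getting a different, equally valid count.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class RowVersion:
    """One row, tagged with the transactions that touched it (toy model)."""
    created_by: int                    # id of the transaction that inserted the row
    deleted_by: Optional[int] = None   # id of the transaction that deleted it, if any


def visible_count(rows: Iterable[RowVersion],
                  committed: Set[int],
                  own_txn: Optional[int] = None,
                  read_uncommitted: bool = False) -> int:
    """Count the rows a reader can see.

    committed        -- transaction ids already committed in the reader's snapshot
    own_txn          -- the reader's own transaction id (it sees its own changes)
    read_uncommitted -- if True, pending changes of other transactions are visible too
    """
    def applies(txn_id: Optional[int]) -> bool:
        if txn_id is None:
            return False
        return read_uncommitted or txn_id in committed or txn_id == own_txn

    return sum(1 for r in rows
               if applies(r.created_by) and not applies(r.deleted_by))


# Transaction 1 inserted three rows and committed.
# Transaction 2 (still pending) inserted two more rows.
# Transaction 3 (still pending) deleted one of the original rows.
rows = [
    RowVersion(1), RowVersion(1), RowVersion(1, deleted_by=3),
    RowVersion(2), RowVersion(2),
]
committed = {1}

print(visible_count(rows, committed))                         # 3: plain snapshot reader
print(visible_count(rows, committed, own_txn=2))              # 5: transaction 2 itself
print(visible_count(rows, committed, own_txn=3))              # 2: transaction 3 itself
print(visible_count(rows, committed, read_uncommitted=True))  # 4: sees all pending work
```

A single stored counter could hold only one of these four numbers, which is why the answer has to be computed per reader.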
- Ok, so it's dependent on the isolation level, that makes sense. But it still can be implemented. – Radu Murzea, May 16, 2012 at 8:31
- @SoboLAN There are plenty of reasons why it shouldn't & can't be, most of which are listed above. Would you implement it by maintaining a list of counts per table per transaction start (whatever Oracle's SCN is in MySQL)? Managing such counts would be a massive overhead - think about a database with 100s or 1000s of concurrent sessions each doing large amounts of INSERTs/DELETEs on the same table. Impossible to maintain. – Philᵀᴹ, May 16, 2012 at 8:36
- Implementing this is quite difficult. Just think that the count has to be persisted in the DB, which means somewhere in the metadata, and this count has to be maintained by every transaction that inserts or deletes a row. How would you lock that metadata? And how would you handle rollbacks? It is far from trivial. And the result would be usable for a very, very narrow subset of queries. – Remus Rusanu, May 16, 2012 at 8:47
- @JackDouglas Interesting. From what I've seen in the past, COUNT(*) queries are seldom needed in reality & are usually the result of developer inexperience (count the rows before we select them!) or bad app design. – Philᵀᴹ, May 16, 2012 at 9:13
- @SoboLAN - no, it wouldn't. Having a service that updates some sort of statistics table at predefined time intervals is much better. Imagine having a large database and several administrators querying most of the tables with SELECT COUNT(*), add a non-optimized WHERE to the table and you'll have a few users bringing the db to its knees for several questionably-useful stat counters. – N.B., May 16, 2012 at 12:04
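The idea of refreshing a statistic at fixed intervals, rather than counting on every request, can be sketched as below. This is only an illustration under assumptions: CachedRowCount, count_rows and max_age_seconds are hypothetical names, and the refresh is done lazily on read instead of by a separate background service, which is a simplification of what the comment describes.

```python
import time
from typing import Callable, Optional


class CachedRowCount:
    """Serve a possibly stale count; recompute it at most once per interval."""

    def __init__(self,
                 count_rows: Callable[[], int],
                 max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._count_rows = count_rows      # the expensive call, e.g. a real COUNT(*)
        self._max_age = max_age_seconds
        self._clock = clock                # injectable so the class is easy to test
        self._value: Optional[int] = None
        self._refreshed_at = 0.0

    def get(self) -> int:
        now = self._clock()
        stale = self._value is None or now - self._refreshed_at >= self._max_age
        if stale:
            self._value = self._count_rows()
            self._refreshed_at = now
        return self._value
```

However many administrators call get(), the expensive count runs at most once per interval; the price is that the number can be up to max_age_seconds old.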
While it would theoretically be possible to keep an accurate count of the number of rows for a given table with InnoDB, it would be at the cost of a lot of locking, which would negatively affect performance. It would also differ based on the isolation level.
MyISAM already does table level locking, so no extra cost there.
I seldom require a row count for a table, though I do use COUNT(*) quite a bit. I generally have a WHERE clause attached. Using an efficient index on a small result set, I find that they're fast enough.
I disagree that the counts are inaccurate. The counts represent a snapshot of the data, and I've always found them to be exact.
In short, MySQL leaves it up to you to implement this for InnoDB. You could store a count and increment/decrement it after each query. Though, the easier solution is probably to switch to MyISAM.
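A minimal sketch of the "store a count yourself" approach follows. It uses Python's built-in sqlite3 module purely as a runnable stand-in; it is not MySQL or InnoDB, and the row_counts table, the trigger names and the helper functions are hypothetical. The point it illustrates is that if the counter is updated by triggers inside the same transaction as the data change, a rollback undoes both together.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a trigger-maintained row counter."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE mytable (id INTEGER PRIMARY KEY, payload TEXT);

        -- Hypothetical side table holding one counter per tracked table.
        CREATE TABLE row_counts (table_name TEXT PRIMARY KEY, n INTEGER NOT NULL);
        INSERT INTO row_counts VALUES ('mytable', 0);

        CREATE TRIGGER mytable_ai AFTER INSERT ON mytable
        BEGIN
            UPDATE row_counts SET n = n + 1 WHERE table_name = 'mytable';
        END;

        CREATE TRIGGER mytable_ad AFTER DELETE ON mytable
        BEGIN
            UPDATE row_counts SET n = n - 1 WHERE table_name = 'mytable';
        END;
    """)
    return conn


def stored_count(conn: sqlite3.Connection) -> int:
    """Read the maintained counter (a single-row lookup)."""
    return conn.execute(
        "SELECT n FROM row_counts WHERE table_name = 'mytable'").fetchone()[0]


def real_count(conn: sqlite3.Connection) -> int:
    """Count the rows the slow way, for comparison."""
    return conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]


conn = make_db()
conn.executemany("INSERT INTO mytable (payload) VALUES (?)",
                 [("a",), ("b",), ("c",)])
conn.commit()

conn.execute("DELETE FROM mytable WHERE payload = 'a'")
conn.rollback()  # the delete and the counter decrement are undone together

print(stored_count(conn), real_count(conn))  # 3 3
```

The cost is the one described above: every writer has to update the same counter row, so concurrent inserts and deletes queue up behind each other on it.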
- It's not possible to keep an accurate count of the rows in a transactional system, because there are as many different (and correct) row counts as active transactions. – user1822, May 16, 2012 at 20:19
- I gave a -1 here for 'Though, the easier solution is probably to switch to MyISAM.' I would never recommend switching to MyISAM simply to get the row count. – Derek Downey, May 16, 2012 at 20:30
- @a_horse_with_no_name, so you agree that there would be a "correct" rowcount for each transaction. Seems possible to me. – Marcus Adams, May 17, 2012 at 12:40
- @DTest, I never said "simply to get the row count". – Marcus Adams, May 17, 2012 at 12:40
- @a_horse_with_no_name, that doesn't seem right. Surely we are only counting the number of rows when the transaction gets committed, right? – Pacerier, Apr 9, 2015 at 15:21
SELECT COUNT(*) FROM ... queries are precise. If you prefer, phpMyAdmin can be configured to always use exact row counts at the expense of speed. More info: stackoverflow.com/questions/11926259/…