Why doesn't InnoDB store the row count?

Question 1

Everyone knows that, in tables that use InnoDB as engine, queries like SELECT COUNT(*) FROM mytable are very inexact and very slow, especially when the table gets bigger and there are constant row insertions/deletions while that query executes.

As I understood it, InnoDB doesn't store the row count in an internal variable, which is the reason for this problem.

My question is: Why is this so ? Would it be so hard to store such information ? It's an important information to know in so many situations. The only difficulty I see if such an internal count would be implemented is when transactions are involved: if the transaction is uncommitted, do you count the rows inserted by it or not ?

PS: I'm not an expert on DBs, I'm just someone who has MySQL as a simple hobby. So if I just asked something stupid, don't be excessively critical :D .

Question 2

Slow, yes. Inexact, no. It's slow because it gives the exact result. When you have a 200M rows table, and possibly many other transactions that insert/delete into the same table, possibly many rows per second, another question is "do you need the exact number?"

Question 3

@ypercube I know I saw a few times in phpmyadmin some row count values that were very off. Plus, there is a comment there saying something like "may not be accurate".

Question 4

@RaduMurzea phpMyAdmin users an alternate method to calculate table counts for InnoDB tables for the speed reasons you know about. This is where the innaccuracy you mentioned comes into play. Actual SELECT COUNT(*) FROM ... queries are precise. If you prefer, phpMyAdmin can be configured to always use exact row counts at the expense of speed. More info: stackoverflow.com/questions/11926259/…

Question 5

I agree with @RemusRusanu (+1 for his answer)

SELECT COUNT(*) FROM mydb.mytable in InnoDB behaves like a transactional storage engine should. Compare it to MyISAM.

MyISAM

If mydb.mytable is a MyISAM table, launching SELECT COUNT(*) FROM mydb.mytable; is just like running SELECT table_rows FROM information_schema.table WHERE table_schema = 'mydb' AND table_name = 'mytable';. This triggers a quick lookup of the row count in the header of the MyISAM table.

InnoDB

If mydb.mytable is a InnoDB table, you get hodge-podge of things going on. You have MVCC going on, governing the following:

ib_logfile0/ib_logfile1 (Redo Logs)
ibdata1
- Undo Logs
- Rollbacks
- Data Dictionary Changes
Buffer Pool Management
Transaction Isolation (4 types)
- Repeatable Reads
- Read Committed
- Read Uncommitted
- Serializable

Asking InnoDB for a table count requires navigation through these ominous things. In fact, one never really knows if SELECT COUNT(*) from mydb.mytable counts repeatable reads only or includes reads that have been committed and those that are uncommitted.

You could try to stabilize things a little by enabling innodb_stats_on_metadata.

According to the MySQL Documentation on innodb_stats_on_meta_data

When this variable is enabled (which is the default, as before the variable was created), InnoDB updates statistics during metadata statements such as SHOW TABLE STATUS or SHOW INDEX, or when accessing the INFORMATION_SCHEMA tables TABLES or STATISTICS. (These updates are similar to what happens for ANALYZE TABLE.) When disabled, InnoDB does not update statistics during these operations. Disabling this variable can improve access speed for schemas that have a large number of tables or indexes. It can also improve the stability of execution plans for queries that involve InnoDB tables.

Disabling it may or may not give you a more stable count in terms of setting up EXPLAIN plans. It may affect performance of SELECT COUNT(*) from mydb.mytable in either a good way, bad way, or not at all. Give it a Try and See !!!

Question 6

For starter there is no such thing as the 'current count' to store in a variable. A query like SELECT COUNT(*) FROM ... is subject to the current isolation level and all concurrent pending transactions. Depending on the isolation level, the query can see or not see rows inserted or deleted by pending uncommitted transactions. The only way to answer is to count the rows that are visible to the current transaction.

Note that I did not even touch the even more thorny subject of concurrent transactions that start or end during the count. Not to mention rollbacks...

Question 7

Ok, so it's dependent on the isolation level, that makes sense. But it still can be implemented.

Question 8

@SoboLAN There are plenty of reasons why it shouldn't & can't be, most of which are listed above. Would you implement it by maintaining a list of counts per table per transaction start (whatever Oracle's SCN is in MySQL)? Managing such counts would be a massive overhead - think about a database with 100s or 1000s of concurrent sessions each doing large amounts of INSERTs/DELETEs on the same table. Impossible to maintain.

Question 9

Implementing this is quite difficult. Just think that the count has to be persisted in the DB, that means somewhere in the metadata, and this count has to be maintained by every transaction that inserts or deletes a row. How would you lock that metadata? And how would you handle rollbacks? Is far from trivial. And the result would be usable for a very very narrow subset of queries.

Question 10

@JackDouglas Interesting. From what I've seen in the past COUNT(*) queries are seldom needed in reality & are usually the result of developer inexperience (count the rows before we select them!) or bad app design.

Question 11

@SoboLAN - no, it wouldn't. Having a service that updates some sort of statistics table at predefined time intervals is much better. Imagine having a large database and several administrators querying most of the tables with SELECT COUNT(*), add a non-optimized WHERE to the table and you'll have a few users bringing the db to its knees for several questionably-useful stat counters.

Question 12

While it would theoretically be possible to keep an accurate count of the number of rows for a given table with InnoDB, it would be at the cost of a lot of locking, which would negatively affect performance. It would also differ based on the isolation level.

MyISAM already does table level locking, so no extra cost there.

I seldom require a row count for a table, though I do use COUNT(*) quite a bit. I generally have a WHERE clause attached. Using an efficient index on a small result set, I find that they're fast enough.

I disagree that the counts are inaccurate. The counts represent a snapshot of the data, and I've always found them to be exact.

In short, MySQL leaves it up to you to implement this for InnoDB. You could store a count and increment/decrement it after each query. Though, the easier solution is probably to switch to MyISAM.

Question 13

It's not possible to keep an accurate count of the of rows in a transactional system. Because there are as many different (and correct) rowcounts as active transactions.

Question 14

I gave a -1 here for 'Though, the easier solution is probably to switch to MyISAM.' I would never recommend switching to MyISAM simply to get the row count.

Question 15

@a_horse_with_no_name, so you agree that there would be a "correct" rowcount for each transaction. Seems possible to me.

Question 16

@DTest, I never said "simply to get the row count".

Question 17

@a_horse_with_no_name, That doesn't seem right. Surely we are only counting the number of rows when the transactions gets committed right?

score 9 · Accepted Answer · 2012-05-16 15:33:36Z

I agree with @RemusRusanu (+1 for his answer)

SELECT COUNT(*) FROM mydb.mytable in InnoDB behaves like a transactional storage engine should. Compare it to MyISAM.

MyISAM

If mydb.mytable is a MyISAM table, launching SELECT COUNT(*) FROM mydb.mytable; is just like running SELECT table_rows FROM information_schema.table WHERE table_schema = 'mydb' AND table_name = 'mytable';. This triggers a quick lookup of the row count in the header of the MyISAM table.

InnoDB

If mydb.mytable is a InnoDB table, you get hodge-podge of things going on. You have MVCC going on, governing the following:

ib_logfile0/ib_logfile1 (Redo Logs)
ibdata1
- Undo Logs
- Rollbacks
- Data Dictionary Changes
Buffer Pool Management
Transaction Isolation (4 types)
- Repeatable Reads
- Read Committed
- Read Uncommitted
- Serializable

Asking InnoDB for a table count requires navigation through these ominous things. In fact, one never really knows if SELECT COUNT(*) from mydb.mytable counts repeatable reads only or includes reads that have been committed and those that are uncommitted.

You could try to stabilize things a little by enabling innodb_stats_on_metadata.

According to the MySQL Documentation on innodb_stats_on_meta_data

When this variable is enabled (which is the default, as before the variable was created), InnoDB updates statistics during metadata statements such as SHOW TABLE STATUS or SHOW INDEX, or when accessing the INFORMATION_SCHEMA tables TABLES or STATISTICS. (These updates are similar to what happens for ANALYZE TABLE.) When disabled, InnoDB does not update statistics during these operations. Disabling this variable can improve access speed for schemas that have a large number of tables or indexes. It can also improve the stability of execution plans for queries that involve InnoDB tables.

Disabling it may or may not give you a more stable count in terms of setting up EXPLAIN plans. It may affect performance of SELECT COUNT(*) from mydb.mytable in either a good way, bad way, or not at all. Give it a Try and See !!!

Stack Exchange Network

Why doesn't InnoDB store the row count?

3 Answers 3

MyISAM

InnoDB

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Why doesn't InnoDB store the row count?

3 Answers 3

MyISAM

InnoDB

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions