I have a database table (created by someone else). This table contains billions of records, and new records are inserted roughly every second.
I need to optimize this table, and the queries against it, so that data can be fetched faster. The structure of the ProductCatalog table is as follows.
id int(10)
SerialNumber varchar(20)
BasePrice decimal(4,1)
BatchCode tinyint(3)
Type varchar(5)
ItemCode varchar(5)
ArrivalDate datetime
InsertTimestamp int(10)
BrandID tinyint(3)
CompanyID tinyint(4)
Model varchar(10)
Description text
Here is the script to create the table:
CREATE TABLE `ProductCatalog` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`SerialNumber` varchar(20) DEFAULT NULL,
`BasePrice` decimal(4,1) DEFAULT NULL,
`BatchCode` tinyint(3) unsigned DEFAULT NULL,
`Type` varchar(5) DEFAULT NULL,
`ItemCode` varchar(5) DEFAULT NULL,
`ArrivalDate` datetime DEFAULT NULL,
`InsertTimestamp` int(10) unsigned NOT NULL,
`BrandID` tinyint(3) unsigned DEFAULT NULL,
`Model` varchar(10) NOT NULL DEFAULT 'RX209',
`Description` text,
PRIMARY KEY (`id`),
KEY `SerialNumber` (`SerialNumber`,`ArrivalDate`,`ItemCode`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Here are the sample insert queries.
insert into `productcatalog` (`id`, `SerialNumber`, `BasePrice`, `BatchCode`, `Type`, `ItemCode`, `ArrivalDate`, `InsertTimestamp`, `BrandID`, `Model`, `Description`) values('1','345618761','23','25','ACC','ABCD','2021-02-14 12:24:44','1613371259','37','RX209',NULL);
insert into `productcatalog` (`id`, `SerialNumber`, `BasePrice`, `BatchCode`, `Type`, `ItemCode`, `ArrivalDate`, `InsertTimestamp`, `BrandID`, `Model`, `Description`) values('2','345618761','24','25','RAF','ABCD','2021-02-15 12:55:17','1613373031','45','GA317',NULL);
insert into `productcatalog` (`id`, `SerialNumber`, `BasePrice`, `BatchCode`, `Type`, `ItemCode`, `ArrivalDate`, `InsertTimestamp`, `BrandID`, `Model`, `Description`) values('3','569014575','43','21','DAT','TPRS','2021-02-13 12:56:34','1613373082','34','PX452',NULL);
There can be many entries for the same SerialNumber and ItemCode with different ArrivalDate values.
Initially there were three indexes
1. id => Primary
2. SerialNumber
3. ArrivalDate
These are the queries I run against this table.
SELECT *
FROM ProductCatalog
WHERE SerialNumber='1234567890'
AND ItemCode!="ABCD"
ORDER BY id DESC LIMIT 1; -- This query seems slower
SELECT BasePrice
FROM ProductCatalog
WHERE SerialNumber='123456789'
AND ItemCode!="ABCD"
and ItemCode!="PQRS"
AND ItemCode!="MNOP"
ORDER BY id DESC LIMIT 1; -- This query seems slower
SELECT *
FROM ProductCatalog
WHERE SerialNumber='123456789'
AND (ArrivalDate>='2019-01-01 00:00:00' AND ArrivalDate<='2020-12-31 23:59:59')
AND ItemCode='ABCD'
ORDER BY ArrivalDate ASC; -- This query looks OK
SELECT BatchCode
FROM ProductCatalog
WHERE SerialNumber='123456789'
AND ItemCode!="ABCD"
and ItemCode!="PQRS"
AND ItemCode!="MNOP"
ORDER BY id DESC LIMIT 1; -- This query seems slower
Then I changed the indexes such that we have only two indexes now. The primary and the composite one.
1. id => Primary
2. SerialNumber, ArrivalDate, ItemCode
MySQL Information
MySQL Version 5.7
Engine: InnoDB
Problem
The results are still not satisfactory.
- Are the indexes I changed correct for a performance gain?
- Is the order of columns in the index correct?
- Column SerialNumber contains a 16-digit numeric value. Should I change it to int instead of varchar to gain performance?
1 Answer
"billions of records" and id int(10) unsigned NOT NULL AUTO_INCREMENT
-- Beware! That will top out at about 4 billion.
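A minimal sketch of checking the remaining headroom and widening the key. It assumes the table can tolerate a rebuild; on a table of this size the ALTER will run for a long time, so this is an illustration rather than a tested migration plan.

```sql
-- INT UNSIGNED tops out at 4294967295.
-- See how many ids are left before AUTO_INCREMENT hits the ceiling.
SELECT MAX(id) AS current_max_id,
       4294967295 - MAX(id) AS ids_remaining
FROM ProductCatalog;

-- Widen the column to BIGINT UNSIGNED (max 18446744073709551615).
-- Changing the column type rebuilds the whole table.
ALTER TABLE ProductCatalog
  MODIFY id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;
```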
To speed up the inserts, see if you can batch them, something like this:
insert into `productcatalog`
(`id`, `SerialNumber`, `BasePrice`, `BatchCode`, `Type`, `ItemCode`,
`ArrivalDate`, `InsertTimestamp`, `BrandID`, `Model`, `Description`)
values
('1','345618761','23','25','ACC','ABCD','2021-02-14 12:24:44','1613371259','37','RX209',NULL),
('2','345618761','24','25','RAF','ABCD','2021-02-15 12:55:17','1613373031','45','GA317',NULL),
('3','569014575','43','21','DAT','TPRS','2021-02-13 12:56:34','1613373082','34','PX452',NULL);
For the SELECTs, these indexes (with the columns in the order specified) should help:
INDEX(SerialNumber, ID, ItemCode, BasePrice) -- for queries 1,2
INDEX(ItemCode, SerialNumber, ArrivalDate) -- for query 3
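Written out as DDL, the two suggestions might look like the sketch below. The index names are made up for illustration. The ALGORITHM and LOCK clauses are optional; they ask MySQL 5.7 to build the indexes in place without blocking writes, and the statement fails with an error if that is not possible.

```sql
ALTER TABLE ProductCatalog
  -- for queries 1 and 2 (hypothetical index name)
  ADD INDEX idx_serial_id_item_price (SerialNumber, id, ItemCode, BasePrice),
  -- for query 3 (hypothetical index name)
  ADD INDEX idx_item_serial_arrival (ItemCode, SerialNumber, ArrivalDate),
  ALGORITHM=INPLACE, LOCK=NONE;
```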
Query 1 will (I think) use SerialNumber, ID from the first index and stop after one or two rows.
Query 2 is "covering", so even if it needs to scan several rows, it would not have to bounce between the index's BTree and the data's BTree.
Query 3 -- The = columns need to be first and the "range" column (ArrivalDate) last.
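One way to check these expectations is to run EXPLAIN on each query after the indexes exist. A sketch for query 2, assuming the first index above has been added:

```sql
EXPLAIN
SELECT BasePrice
FROM ProductCatalog
WHERE SerialNumber = '123456789'
  AND ItemCode NOT IN ('ABCD', 'PQRS', 'MNOP')
ORDER BY id DESC
LIMIT 1;
-- In the output, the `key` column shows which index the optimizer chose,
-- and `Extra` contains "Using index" when the query is served entirely
-- from the index (that is, the index is covering).
```

If `Extra` shows "Using filesort" or `key` names a different index, the optimizer is not using the index the way the reasoning above assumes.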
The following changes won't improve performance; I just think they are clearer. Instead of
AND ItemCode!="ABCD"
AND ItemCode!="PQRS"
AND ItemCode!="MNOP"
AND (ArrivalDate>='2019-01-01 00:00:00' AND ArrivalDate<='2020-12-31 23:59:59')
I prefer these:
AND ItemCode NOT IN ("ABCD", "PQRS", "MNOP")
AND ArrivalDate >= '2019-01-01'
AND ArrivalDate < '2019-01-01' + INTERVAL 1 YEAR
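As a sketch, here is query 3 from the question rewritten in the half-open style. Note that the question's range runs through the end of 2020, so the exclusive upper bound that matches it is '2021-01-01' (two years after the start). For a user-selected range the same pattern applies with the day after the chosen end date.

```sql
SELECT *
FROM ProductCatalog
WHERE SerialNumber = '123456789'
  AND ItemCode = 'ABCD'
  AND ArrivalDate >= '2019-01-01'
  AND ArrivalDate <  '2021-01-01'  -- same as '2019-01-01' + INTERVAL 2 YEAR
ORDER BY ArrivalDate ASC;
```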
The following will explain some of my suggestions: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
This seems odd for a "price": decimal(4,1)
- Thanks. So should the third query be changed like this: ..SerialNumber='123456789' AND ItemCode='ABCD' AND (ArrivalDate>='2019-01-01 00:00:00' AND ArrivalDate<='2020-12-31 23:59:59')? You have added INTERVAL 1 YEAR, but in my case the date range is selected by users, so it could be anything. – WatsMyName, Commented Feb 15, 2021 at 8:26
- @WatsMyName - True. If you force the user to type that out, keep the range you describe. It might help to change the UI to imply >= and <, so as not to force the user to think about, for example, "leap year" -- < 20..--03-01 implies "through February" in all cases. – Rick James, Commented Feb 15, 2021 at 19:51
- There is one more select query against this table that I missed; please see the 4th select query in the OP. Do I need to add one more index like this: INDEX(SerialNumber, ID, ItemCode, BatchCode)? Won't too many indexes slow down the queries? Do these indexes have any impact on the insertion process when insertions happen so frequently? Thanks – WatsMyName, Commented Feb 17, 2021 at 5:38
- It should get reasonable performance out of the first recommended index here as is. Alternatively, BatchCode could be appended to the first index to make it covering for that query too, rather than adding a new index (it's a small column, so that's easy). Indexes add insert cost, but the query performance gains are usually worth it. Fewer than 5 non-primary indexes is probably OK. Beyond that, consider (preferably measure) the costs and benefits. – danblack, Commented Feb 17, 2021 at 5:57
- @RickJames with the index changes you recommended, one query (query 3) is still slow the first time it runs for a particular SerialNumber and faster on the next execution. ProductCatalog now has the following indexes: INDEX(ItemCode, SerialNumber, ArrivalDate) and INDEX(SerialNumber, ArrivalDate, ItemCode, BasePrice). – WatsMyName, Commented Feb 22, 2021 at 6:29
(SerialNumber, ItemCode) index (also try (SerialNumber, ItemCode, id)) - but in most cases the inequality does not allow the index to be used effectively.