I have a database table (created by someone else). It holds billions of records, and new records are inserted every second or so.

I need to optimize this table, and the queries against it, so that data is fetched faster. The structure of the ProductCatalog table is as follows.

id int(10)
SerialNumber varchar(20) 
BasePrice decimal(4,1) 
BatchCode tinyint(3)
Type varchar(5) 
ItemCode varchar(5) 
ArrivalDate datetime 
InsertTimestamp int(10)
BrandID tinyint(3)
CompanyID tinyint(4) 
Model varchar(10) 
Description text 

Here is the script to create the table:

CREATE TABLE `ProductCatalog` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `SerialNumber` varchar(20) DEFAULT NULL,
 `BasePrice` decimal(4,1) DEFAULT NULL,
 `BatchCode` tinyint(3) unsigned DEFAULT NULL,
 `Type` varchar(5) DEFAULT NULL,
 `ItemCode` varchar(5) DEFAULT NULL,
 `ArrivalDate` datetime DEFAULT NULL,
 `InsertTimestamp` int(10) unsigned NOT NULL,
 `BrandID` tinyint(3) unsigned DEFAULT NULL,
 `Model` varchar(10) NOT NULL DEFAULT 'RX209',
 `Description` text,
 PRIMARY KEY (`id`),
 KEY `SerialNumber` (`SerialNumber`,`ArrivalDate`,`ItemCode`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Here are the sample insert queries.

insert into `productcatalog` (`id`, `SerialNumber`, `BasePrice`, `BatchCode`, `Type`, `ItemCode`, `ArrivalDate`, `InsertTimestamp`, `BrandID`, `Model`, `Description`) values('1','345618761','23','25','ACC','ABCD','2021-02-14 12:24:44','1613371259','37','RX209',NULL);
insert into `productcatalog` (`id`, `SerialNumber`, `BasePrice`, `BatchCode`, `Type`, `ItemCode`, `ArrivalDate`, `InsertTimestamp`, `BrandID`, `Model`, `Description`) values('2','345618761','24','25','RAF','ABCD','2021-02-15 12:55:17','1613373031','45','GA317',NULL);
insert into `productcatalog` (`id`, `SerialNumber`, `BasePrice`, `BatchCode`, `Type`, `ItemCode`, `ArrivalDate`, `InsertTimestamp`, `BrandID`, `Model`, `Description`) values('3','569014575','43','21','DAT','TPRS','2021-02-13 12:56:34','1613373082','34','PX452',NULL);

There can be many entries for the same SerialNumber and ItemCode with different ArrivalDate values.

Initially there were three indexes

1. id => Primary
2. SerialNumber
3. ArrivalDate

These are the queries I run against this table:

SELECT * 
FROM ProductCatalog 
WHERE SerialNumber='1234567890' 
 AND ItemCode!="ABCD" 
ORDER BY id DESC LIMIT 1; -- This query seems slow
 
SELECT BasePrice 
FROM ProductCatalog 
WHERE SerialNumber='123456789' 
 AND ItemCode!="ABCD" 
 and ItemCode!="PQRS" 
 AND ItemCode!="MNOP" 
ORDER BY id DESC LIMIT 1; -- This query seems slow
 
SELECT * 
FROM ProductCatalog 
WHERE SerialNumber='123456789' 
 AND (ArrivalDate>='2019-01-01 00:00:00' AND ArrivalDate<='2020-12-31 23:59:59') 
 AND ItemCode='ABCD' 
ORDER BY ArrivalDate ASC; -- This query looks OK
 
SELECT BatchCode
FROM ProductCatalog 
WHERE SerialNumber='123456789' 
 AND ItemCode!="ABCD" 
 and ItemCode!="PQRS" 
 AND ItemCode!="MNOP" 
ORDER BY id DESC LIMIT 1; -- This query seems slow

Then I changed the indexes such that we have only two indexes now. The primary and the composite one.

1. id => Primary
2. SerialNumber, ArrivalDate, ItemCode

MySQL information

MySQL Version 5.7
Engine: InnoDB

Problem

The results are still not that satisfactory.

  1. Are the indexes I changed to the right ones for a performance gain?
  2. Is the order of the columns in the composite index correct?
  3. The SerialNumber column contains a 16-digit numeric value. Should I change it to int instead of varchar to gain performance?
asked Feb 15, 2021 at 5:39
  • Show table structure as complete CREATE TABLE script, including indices. Show some sample data (2-3 rows). Commented Feb 15, 2021 at 6:10
  • You may try to improve the first 2 queries by creating a (SerialNumber, ItemCode) index (also try (SerialNumber, ItemCode, id)), but in most cases an inequality condition does not allow the index to be used effectively. Commented Feb 15, 2021 at 7:17
  • How does this question differ from dba.stackexchange.com/questions/285600/… Commented Feb 19, 2021 at 19:29

1 Answer

"billions of records" and id int(10) unsigned NOT NULL AUTO_INCREMENT -- Beware! That will top out at about 4 billion.
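
A minimal sketch of how one might check the remaining headroom and widen the key, assuming the statements run in the schema that holds ProductCatalog. The ceiling for INT UNSIGNED is 4294967295. The ALTER rebuilds the table, so on a table of this size it is likely to be slow, and in MySQL 5.7 a data type change is expected to block concurrent writes; treat it as something to plan and test, not a quick fix.

```sql
-- How close is the counter to the INT UNSIGNED ceiling (4294967295)?
-- Assumption: the current default schema is the one holding ProductCatalog.
SELECT AUTO_INCREMENT,
       4294967295 - AUTO_INCREMENT AS ids_remaining
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'ProductCatalog';

-- Widen the primary key so it no longer tops out at about 4 billion.
-- This rebuilds the whole table; schedule and test it accordingly.
ALTER TABLE ProductCatalog
  MODIFY id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;
```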

To speed up the inserts, see if you can batch them, something like this:

insert into `productcatalog`
 (`id`, `SerialNumber`, `BasePrice`, `BatchCode`, `Type`, `ItemCode`,
 `ArrivalDate`, `InsertTimestamp`, `BrandID`, `Model`, `Description`)
 values
 ('1','345618761','23','25','ACC','ABCD','2021-02-14 12:24:44','1613371259','37','RX209',NULL),
 ('2','345618761','24','25','RAF','ABCD','2021-02-15 12:55:17','1613373031','45','GA317',NULL),
 ('3','569014575','43','21','DAT','TPRS','2021-02-13 12:56:34','1613373082','34','PX452',NULL);

For the SELECTs, these indexes (with the columns in the order specified) should help:

INDEX(SerialNumber, ID, ItemCode, BasePrice) -- for queries 1,2
INDEX(ItemCode, SerialNumber, ArrivalDate) -- for query 3

Query 1 will (I think) use SerialNumber, ID from the first index and stop after one or two rows.

For Query 2 the index is "covering", so even if it needs to scan several rows, it will not have to bounce between the index's BTree and the data's BTree.

Query 3 -- The = columns need to be first and the "range" column (Arrival Date) last.
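
As a rough sketch, the two indexes above could be added in one statement and then checked with EXPLAIN. The index names are hypothetical (the answer does not name them); only the column order comes from the suggestion above. Whether the optimizer actually picks the new index depends on the data, so the EXPLAIN output is the thing to look at: a "Using index" entry in the Extra column would indicate that query 2 is being served from the covering index.

```sql
-- Index names are made up for illustration; the column order is what matters.
ALTER TABLE ProductCatalog
  ADD INDEX idx_serial_id_item_price (SerialNumber, id, ItemCode, BasePrice),
  ADD INDEX idx_item_serial_arrival (ItemCode, SerialNumber, ArrivalDate);

-- Check the plan for query 2 after the indexes exist.
EXPLAIN
SELECT BasePrice
FROM ProductCatalog
WHERE SerialNumber = '123456789'
  AND ItemCode NOT IN ('ABCD', 'PQRS', 'MNOP')
ORDER BY id DESC
LIMIT 1;
```

Adding both indexes in a single ALTER avoids processing the table twice, which matters on a table with billions of rows.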

The following changes won't improve performance; I just think they are clearer. Instead of

AND ItemCode!="ABCD" 
AND ItemCode!="PQRS" 
AND ItemCode!="MNOP" 
AND (ArrivalDate>='2019-01-01 00:00:00' AND ArrivalDate<='2020-12-31 23:59:59') 

I prefer these:

AND ItemCode NOT IN ("ABCD", "PQRS", "MNOP")
AND ArrivalDate >= '2019-01-01'
AND ArrivalDate < '2019-01-01' + INTERVAL 1 YEAR
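
If the end of the range is an inclusive day rather than a fixed span, the same half-open pattern still applies. The sketch below assumes the two date literals stand in for whatever start and end days are supplied; with the values from the original query 3 (which runs through the end of 2020) it would look like this:

```sql
-- '2019-01-01' and '2020-12-31' stand in for the chosen first and last day.
-- "< last day + 1 day" includes all of the last day without spelling out 23:59:59.
SELECT *
FROM ProductCatalog
WHERE SerialNumber = '123456789'
  AND ItemCode = 'ABCD'
  AND ArrivalDate >= '2019-01-01'
  AND ArrivalDate <  '2020-12-31' + INTERVAL 1 DAY
ORDER BY ArrivalDate ASC;
```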

The following will explain some of my suggestions: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

This seems odd for a "price": decimal(4,1)

answered Feb 15, 2021 at 7:42
  • Thanks, so the third query should be changed like this: ...SerialNumber='123456789' AND ItemCode='ABCD' AND (ArrivalDate>='2019-01-01 00:00:00' AND ArrivalDate<='2020-12-31 23:59:59')? You have added INTERVAL 1 YEAR, but in my case the date range is selected by users, so it could be anything. Commented Feb 15, 2021 at 8:26
  • @WatsMyName - True. If you force the user to type that out, keep the range you describe. It might help to change the UI to imply >= and <, so that the user does not have to think about, for example, leap years: < 20..-03-01 implies "through February" in all cases. Commented Feb 15, 2021 at 19:51
  • There is one more select query against this table that I missed; please see the 4th select query in the OP. Do I need to add one more index, like INDEX(SerialNumber, ID, ItemCode, BatchCode)? Won't too many indexes slow down the queries? Do these indexes have any impact on inserts, given that inserts happen very frequently? Thanks. Commented Feb 17, 2021 at 5:38
  • It should get reasonable performance out of the first recommended index here as is. Alternatively, BatchCode could be appended to the first index to make it covering for that query too, rather than adding a new index (it's a small column, so this is cheap). Indexes add cost to inserts, but the query performance gains are usually worth it. Fewer than 5 non-primary indexes is probably OK. Beyond that, consider, and preferably measure, the costs and benefits. Commented Feb 17, 2021 at 5:57
  • @RickJames With the index changes you recommended, one query (query 3) is still slow the first time it runs for a particular SerialNumber and faster on the next execution. ProductCatalog now has the following indexes: INDEX(ItemCode, SerialNumber, ArrivalDate) and INDEX(SerialNumber, ArrivalDate, ItemCode, BasePrice). Commented Feb 22, 2021 at 6:29
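
The suggestion in the comments to append BatchCode to the first recommended index, instead of creating another index for query 4, might be sketched as follows. Both index names are hypothetical; the actual name of the existing index can be found with SHOW CREATE TABLE ProductCatalog.

```sql
-- Replace the (SerialNumber, id, ItemCode, BasePrice) index with one that
-- also carries BatchCode, so queries 2 and 4 can both be covered.
-- Index names are made up for illustration.
ALTER TABLE ProductCatalog
  DROP INDEX idx_serial_id_item_price,
  ADD INDEX idx_serial_id_item_price_batch
    (SerialNumber, id, ItemCode, BasePrice, BatchCode);
```

Keeping one wider index rather than two near-duplicates means one fewer structure to maintain on every insert, which is the trade-off the comment describes.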
