Here is another database problem I stumbled upon.
I have a date-range-partitioned MyISAM lookup table with 200M records and ~150 columns. On this table I need to perform cascading SELECT statements to filter the data. Output:
filter 1: 126M rows
filter 2: 110M rows
filter 3: 40M rows
filter 4: 5M rows
filter 5: 100k rows
Every single SELECT is highly complex, with regexes (so no index is possible) and multiple comparisons, which is why I want each one to query the fewest rows possible.
There are about 500 unique filters and around 200 constant users. Every filter needs to be run for each user, in total around 100k combinations.
Big question: Is there a way for each subsequent SELECT statement to query only the previous subset?
Example: Filter #5 should only have to query the 5M rows produced by filter #4 to get its 100k results. At the moment it has to scan all 200M records.
EDIT: my current approach is a cache table:
CREATE TABLE IF NOT EXISTS `cache` (
`filter_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`lookup_id` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
ALTER TABLE `cache`
  -- lookup_id must be part of the key; (filter_id, user_id) alone would allow only one row per combination
  ADD PRIMARY KEY (`filter_id`,`user_id`,`lookup_id`);
This would contain the relation between individual data rows from the lookup table and the filters. PLUS, I'd be able to use the primary key to fetch all of the lookup_ids from the previous filter.
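For illustration, a minimal sketch of how the first (base) filter's result set could be written into the cache; @user_id, regex_column and the pattern are placeholders, not names from the original schema:

-- filter 1 still has to scan the full lookup table once
SET @user_id = 42;
INSERT INTO cache (filter_id, user_id, lookup_id)
SELECT 1, @user_id, l.id
FROM lookup_table l
WHERE l.regex_column RLIKE '<pattern for filter 1>';  -- plus the other filter-1 conditions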
Query for subsequent filters:
SELECT SUM(l.column), COUNT(*)
FROM cache c
-- JOIN instead of LEFT JOIN: the WHERE condition on l discards non-matches anyway
JOIN lookup_table l ON c.lookup_id = l.id
WHERE c.filter_id = 1
  AND c.user_id = x
  AND l.regex_column RLIKE ...
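To continue the cascade inside the cache table itself, filter 2 could be materialized from filter 1's rows; a minimal sketch under the same assumptions (filter ids, @user_id and the pattern are placeholders):

INSERT INTO cache (filter_id, user_id, lookup_id)
SELECT 2, c.user_id, c.lookup_id
FROM cache c
JOIN lookup_table l ON c.lookup_id = l.id
WHERE c.filter_id = 1
  AND c.user_id = @user_id
  AND l.regex_column RLIKE '<pattern for filter 2>';
-- filter 2 now reads only filter 1's 126M cached rows instead of all 200M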
1 Answer
If filter 110M is exactly a subset of filter 126M, then appending more AND conditions to the WHERE clause would have done the job.
$sql1 = "SELECT ..... WHERE ...";
$sql2 = $sql1 . " AND column_name = ....";
$sql3 = $sql2 . " AND column_name = ....";
If that is complex to accomplish, try to CREATE VIEW from the previous SELECT statement, and let the next SELECT statement query the view.
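For example, a minimal sketch of the view approach; the view name, column name and patterns are placeholders:

CREATE VIEW filter1_subset AS
SELECT * FROM lookup_table
WHERE regex_column RLIKE '<pattern 1>';   -- filter-1 conditions

SELECT SUM(column), COUNT(*)
FROM filter1_subset
WHERE regex_column RLIKE '<pattern 2>';   -- filter-2 conditions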
Yes, each subsequent query is a subset of the previous one. The dynamic AND structure is working right now, but it scans the whole table (200M rows) each time, which is painfully slow. I would need a view for every filter/user combination, which totals about 100k additional tables. Each row needs 0.5 kB of space and the average row count per view would be 10M, so that still comes out to more than 500 TB of data; no way my system could handle that :( – C-D Apr 23, 2015 at 8:53
"Every single SELECT is highly complex, with regexes (so no index is possible) and multiple comparisons, which is why I want each one to query the fewest rows possible." – have you looked at indexing GENERATED columns? You can perhaps index regexes that way. Not 100% on this!
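A hedged sketch of that idea, assuming MySQL 5.7+ (generated columns) and that an RLIKE expression is accepted in a stored generated column; matches_filter1 and the pattern are made-up names. Since this needs one column per regex, it would likely only pay off for the most expensive filters:

ALTER TABLE lookup_table
  ADD COLUMN matches_filter1 TINYINT
    AS (regex_column RLIKE '<pattern 1>') STORED,
  ADD INDEX idx_matches_filter1 (matches_filter1);

-- the filter can then use the index instead of evaluating the regex per row
SELECT ... FROM lookup_table WHERE matches_filter1 = 1;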