6

Here is another database problem I stumbled upon.

I have a date-range partitioned MyIsam look-up table with 200M records and ~150 columns. On this Table I need to perform cascading SELECT-Statements to filter the data. Output:

filter 126M 
filter 110M 
filter 40M 
filter 5M 
filter 100k

Every single SELECT is highly complex with regex (=no index possible) and multiple comparisons, which is why I want them to query the least amount of rows possible.

There are about 500 unique filters and around 200 constant users. Every filter needs to be run for each user, in total around 100k combinations.

Big question: Is there a way for each subsequent SELECT statement to query only the previous subset?

Example: Filter #5 should only have to query the 5M rows out of query 4 to get those 100k results. At the moment it has to scan through all 200M records.

EDIT current approach: cache table

CREATE TABLE IF NOT EXISTS `cache` (
 `filter_id` int(11) NOT NULL,
 `user_id` int(11) NOT NULL,
 `lookup_id` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
ALTER TABLE `cache`
 ADD PRIMARY KEY (`filter_id`,`user_id`);

This would contain the relation between individual data-rows from the lookup table and the filters. PLUS I'd be able to use the rrimary index to get all of the lookup_ids from the previous filter.

Query for subsequent filters:

SELECT SUM( column), COUNT(*)
 FROM cache c
 LEFT JOIN lookup_table l ON c.lookup_id= l.id
 WHERE
 c.filter_id = 1
 AND c. user_id= x
 AND l.regex_column = preg_rlike...
asked Apr 22, 2015 at 11:30
5
  • There is nothing builtin for cascading SELECTs. Show us some more details about the table and the queries; we may be able to suggest a different way to solve the overall problem. Commented Apr 22, 2015 at 21:07
  • edited the question with my current approach. the SELECT queries sum up 10 values and calculate an average. WHERE columns contain mostly regex or wildcard "like" search without being able to use indizies Commented Apr 23, 2015 at 9:03
  • Every single SELECT is highly complex with regex (=no index possible) and multiple comparisons, which is why I want them to query the least amount of rows possible.- have you looked at indexing GENERATED columns - you can perhaps index regexes? Not 100% on this! Commented Jan 14, 2021 at 11:49
  • @Vérace we endet up pre-calculating the regex results per row and saving them into a separate Table, basically a static view, to get them indexed perfectly. Indexing GENERATED columns sounds interesting, that might work too Commented Jan 15, 2021 at 12:14
  • OK - you have to at version 8 (AFAIK) of MySQL - really should be latest of 8 - lots of goodies - if you could give a sample of the code in your regexes? MySQL's implementation (even in 8.0.22) is, ahem..., patchy to say the least... check here Commented Jan 15, 2021 at 12:39

1 Answer 1

1

If filter 110M is exactly a subset of filter 126M, then appending more AND with WHERE would have done the job.

$sql1 = "SELECT ..... WHERE ...";
$sql2 = $sql1 . "AND column-name = ....";
$sql3 = $sql2 . "AND column-name = ....";

If that is complex to accomplish try to CREATE VIEW of the previous SELECT statement and the next SELECT statement should query from the view.

answered Apr 22, 2015 at 12:00
1
  • yes, each subsequent query is a subset of the previous query. The dynamic - AND structure is working right now, but scanning the whole table (200M rows) each time which is painfully slow. i would need a view for every filter/user combination, which totals in about 100k additional tables. Each row needs 0.5kb of space, the average row count of each view would be 10M. That still comes out to more that 500TB of data, no way my system could handle that :( Commented Apr 23, 2015 at 8:53

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.