I explained a similar situation with plain text files in Grep huge number of patterns from huge file. Many people there said I should use a database, so now I'm migrating my data to a sqlite database. I have a file from which I extract about 10,000 patterns. For each pattern, I check whether the database contains it; if it doesn't, I save the pattern to a file for further processing:
for id in $(grep '^[0-9]' keys); do
  if [[ -z $(sqlite3 db.sqlite "select id from main where id = $id") ]]; then
    echo "$id" >>file
  fi
done
Since I'm new to SQL, I couldn't find a simple way to do this. Also, this loop is unusable: it's about 20 times slower than what I achieved with awk in the question linked above. Since the database is huge and keeps growing, and I run this loop very frequently, is it possible to make it faster?
3 Answers
For each pattern, you're invoking a new instance of the sqlite3 program, which connects to the database anew. That's a waste. You should build a single query that looks for all of the keys at once, then execute that one query. Database engines are good at executing large queries.
If the matching lines in the keys file contain only digits, then you can build the query as follows:
{
  echo 'select id from main where id in (';
  <keys grep -x '[0-9][0-9]*' |   # retain only lines containing only digits
  sed -e '1! s/^/, /';            # add ", " at the beginning of every line except the first
  echo ');';
} | sqlite3 db.sqlite
For more general input data, you get the idea: use text transformations to build a single large query. Be careful to validate your input; here we make sure that what gets injected into the query is syntactically valid. There's actually a corner case in the example above: if there is no match in the file, then the SQL syntax is invalid; if that might happen, you'll need to treat this case specially. Here's more complex code that takes care of the empty case:
<keys grep -x '[0-9][0-9]*' |
if read first; then
  {
    echo 'select id from main where id in (' "$first"
    sed -e 's/^/, /'   # each remaining line gets a leading ", "
    echo ');'
  } | sqlite3 db.sqlite
fi
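Note that this prints the ids that are present in the database, whereas the question asks for the ones that are absent. One way to flip it, as a sketch: SQLite names the columns of a bare VALUES clause column1, column2, and so on, so you can turn the key list into an inline table and anti-join it against main (the same empty-input caveat applies):
{
  echo 'select column1 from (values'
  <keys grep -x '[0-9][0-9]*' | sed -e 's/.*/(&)/' -e '1! s/^/, /'
  echo ') where column1 not in (select id from main);'
} | sqlite3 db.sqlite >file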
First things first, you really should replace the if with a list. Actually, I would even replace the [[ ]]s with [ ]s, and then run it in dash or some other lighter sh. This even seems simple enough to ditch the entire for loop and run with xargs (always my preference; better performance). So, for example, maybe something like this ...
grep '^[0-9]' keys | xargs -P0 -n1 \
  sh -c '[ -z "$(sqlite3 db.sqlite "select id from main where id = $1")" ] && echo "$1" >>file' sh
Quoting inside sh -c is easy to get wrong, which is why the id is passed to the inner shell as a positional parameter ($1) instead of being spliced into the command string; either way, this should point you in the right direction.
I would suspect this would perform MUCH faster, not least because you would be running in parallel via -P.
If for some reason even this is crawling, you could always look into something that dumps the sqlite db and does the comparison outside the database. You would likely end up writing a script if you took that approach, so I would only consider it if it were necessary.
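For instance, a rough sketch of that idea, assuming a full dump of the id column is acceptable:
# dump every id currently in the database, sorted for comm
sqlite3 db.sqlite 'select id from main' | sort -u >db_ids
# keep only the candidate ids that do not appear in the dump
grep '^[0-9]' keys | sort -u | comm -23 - db_ids >file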
Import your IDs into a new table, then use that table to query your main table. Drop the table when you're done with it.
{ echo id; grep '^[0-9]' keys; } |
sqlite3 database.db \
  'CREATE TABLE ids ( id INTEGER UNIQUE )' \
  '.import /dev/stdin ids' \
  'SELECT * FROM main NATURAL JOIN ids' \
  'DROP TABLE ids'
Testing:
sqlite> .mode box
sqlite> SELECT * FROM main;
┌────┬─────────────────┐
│ id │ word │
├────┼─────────────────┤
│ 1 │ concessionaire │
│ 2 │ goniometrically │
│ 3 │ meshed │
│ 4 │ Celtic │
│ 5 │ guiltless │
│ 6 │ sclerodactylia │
│ 7 │ spiritism │
│ 8 │ ratchel │
│ 9 │ Bajau │
│ 10 │ semimineral │
└────┴─────────────────┘
$ cat keys
3
7
78
190
Running the given command would output the following (use select id instead of * to output only the IDs):
3|meshed
7|spiritism
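Since the question ultimately wants the ids that are missing from the database, the same temporary-table trick covers that too. A sketch, swapping the join for an EXCEPT; the echo id header is dropped here because, when the target table already exists, current sqlite3 versions treat every imported row as data, and a stray "id" row would show up in the result:
grep '^[0-9]' keys |
sqlite3 database.db \
  'CREATE TABLE ids ( id INTEGER UNIQUE )' \
  '.import /dev/stdin ids' \
  'SELECT id FROM ids EXCEPT SELECT id FROM main' \
  'DROP TABLE ids' >file
With the sample data above, file should end up containing 78 and 190.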