
I am using sqlite3 in an application to conduct an analysis of distances between sets of points. Distances between these points have been precomputed using Dijkstra's algorithm, leaving me with a table like:

 point | p0   p1   p2   ... p1500
 ------|-------------------------
 p0    |   0  100   15  ... X
 p1    | 100    0   10  ... Y
 p2    |  15   10    0  ... Z
 p1500 |   X    Y    Z  ... 0

I will be doing lots of I/O on the table. For tens of thousands of records I need to look up a set of points, combine the distances with other factors, and determine which point is best for that record; then repeat.

My question is: in terms of performance for a database (and SQLite in particular), would it be better to leave the table as is or transform it into a long format?

 aPoint bPoint | dist
 --------------|-----
 p0     p0     |    0
 p0     p1     |  100
 p0     p2     |   15
 p0     p1500  |    X
 p1     p0     |  100
 p1     p1     |    0
 p1     p2     |   10
 p1     p1500  |    Y
 p2     p0     |   15
 p2     p1     |   10
 p2     p2     |    0
 p2     p1500  |    Z
 p1500  p0     |    X
 p1500  p1     |    Y
 p1500  p2     |    Z
 p1500  p1500  |    0
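To make the comparison concrete, here is a minimal sketch of the long format using Python's sqlite3 module (the real application is C++, but the SQL is identical; table and column names here are illustrative, not from the project):

```python
import sqlite3

# An in-memory database stands in for the tmpfs-backed file
# mentioned in the comments below.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE pt_dist (
        a_point TEXT NOT NULL,
        b_point TEXT NOT NULL,
        dist    REAL NOT NULL,
        PRIMARY KEY (a_point, b_point)
    ) WITHOUT ROWID
""")

# A few of the precomputed distances from the example table.
con.executemany(
    "INSERT INTO pt_dist VALUES (?, ?, ?)",
    [("p0", "p1", 100), ("p0", "p2", 15), ("p1", "p2", 10)],
)

# Looking up one pair is a single primary-key probe rather than a
# scan across 1500 columns.
dist = con.execute(
    "SELECT dist FROM pt_dist WHERE a_point = ? AND b_point = ?",
    ("p0", "p2"),
).fetchone()[0]
print(dist)  # prints 15.0
```

With `WITHOUT ROWID` the table is stored as a clustered B-tree on the pair, so each lookup is one logarithmic-time index probe.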
asked Jan 25, 2016 at 15:12
  • Is SQLite the only choice? Commented Jan 25, 2016 at 16:56
  • Yes. I'm embedding it into my C++ application. It also needs to be portable to run on other systems which don't have access to databases like MariaDB or PostgreSQL. The other benefit of using SQLite is that I am putting the db on a tmpfs/ramdisk to speed up access. Commented Jan 25, 2016 at 18:08
  • In that case, I'd build the table like the second one you've described from the beginning. I'm not very familiar with SQLite, but I haven't heard much about it having data-warehouse capabilities, and the lack of those usually translates into poor performance on high-cardinality tables. Commented Jan 26, 2016 at 0:15
  • @Wanderer, can you share your insights? Did you test the performance? What choices did you eventually make? Commented Jun 11, 2020 at 19:20
  • That project ended prematurely a while ago. However, I do recall that I ended up transforming the table and creating an index on the point pairs to speed up processing. Commented Jun 15, 2020 at 12:51
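The "index on the point pairs" approach the last comment describes can be verified with SQLite's EXPLAIN QUERY PLAN; a quick sketch via Python's sqlite3 module (index and table names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pt_dist (a_point TEXT, b_point TEXT, dist REAL)")
# The pair index the commenter describes.
con.execute("CREATE INDEX idx_pair ON pt_dist (a_point, b_point)")

# EXPLAIN QUERY PLAN reports whether SQLite will use the index
# for a pair lookup; the detail column should mention idx_pair.
plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT dist FROM pt_dist WHERE a_point = ? AND b_point = ?",
    ("p0", "p1"),
).fetchall()
print(plan)
```

The detail text of the plan row should show a SEARCH using `idx_pair` rather than a full-table SCAN, confirming pair lookups are index probes.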

2 Answers


Regardless of the database you are using, you normally don't add thousands of columns to a table. Your second model is more 'normalized' (and it's easy to index), and should work better (I'm not an expert, anyway). My opinion is based on the normal limits of databases, which are not meant to work the wide way: you're either going to hit some inner limit of the engine, or misuse the engine relative to how it was designed.

You can also explore the functions and datatypes of more modern versions of MySQL, MariaDB and others, which have dedicated types for points and similar data.

answered Jan 25, 2016 at 17:04

Building on the second example: storing each pair only once cuts the data volume in half, and you cannot make a mistake on a "reverse" entry. In the app, check that A <= B before inserting; swap if not. Lexicographical comparison is OK. Just remember to adjust queries the same way.

-- Distance from point PT_A to point PT_B is DIST.
CREATE TABLE pt_dist (
    pt_a TEXT NOT NULL,
    pt_b TEXT NOT NULL,
    dist REAL NOT NULL,
    PRIMARY KEY (pt_a, pt_b),
    CHECK (pt_a <= pt_b)
);
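A minimal sketch of this scheme via Python's sqlite3 module (the store/lookup helper names are illustrative, not part of the answer); both the insert and the query normalize the pair so either orientation finds the single stored row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE pt_dist (
        pt_a TEXT NOT NULL,
        pt_b TEXT NOT NULL,
        dist REAL NOT NULL,
        PRIMARY KEY (pt_a, pt_b),
        CHECK (pt_a <= pt_b)
    )
""")

def store(con, p, q, d):
    # App-side check: swap so the lexicographically smaller
    # name always comes first, satisfying the CHECK constraint.
    if p > q:
        p, q = q, p
    con.execute("INSERT INTO pt_dist VALUES (?, ?, ?)", (p, q, d))

def lookup(con, p, q):
    # Queries must normalize the pair the same way.
    if p > q:
        p, q = q, p
    row = con.execute(
        "SELECT dist FROM pt_dist WHERE pt_a = ? AND pt_b = ?", (p, q)
    ).fetchone()
    return row[0] if row else None

store(con, "p2", "p0", 15)       # stored as ("p0", "p2")
print(lookup(con, "p0", "p2"))   # prints 15.0
print(lookup(con, "p2", "p0"))   # prints 15.0 -- same row, either order
```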
answered Apr 26, 2020 at 18:06
