I am using sqlite3 in an application to analyze distances between sets of points. Distances between these points have been precomputed using Dijkstra's algorithm, leaving me with a table like:
point | p0   p1   p2  ... p1500
------|--------------------------
p0    | 0    100  15  ... X
p1    | 100  0    10  ... Y
p2    | 15   10   0   ... Z
p1500 | X    Y    Z   ... 0
I will be doing lots of I/O on the table. For tens of thousands of records I need to look up a set of points, combine the distances with other factors, and determine which point is best for that record. Then repeat.
My question is: in terms of database performance (and SQLite in particular), would it be better to leave the table as is or transform it into a long format?
aPoint bPoint | dist
--------------|-----
p0     p0     | 0
p0     p1     | 100
p0     p2     | 15
p0     p1500  | X
p1     p0     | 100
p1     p1     | 0
p1     p2     | 10
p1     p1500  | Y
p2     p0     | 15
p2     p1     | 10
p2     p2     | 0
p2     p1500  | Z
p1500  p0     | X
p1500  p1     | Y
p1500  p2     | Z
p1500  p1500  | 0
-
Is SQLite the only choice? – vegatripy, Jan 25, 2016 at 16:56
-
Yes. I'm embedding it into my C++ application. It also needs to be portable to run on other systems which don't have access to databases like MariaDB or PostgreSQL. The other benefit of using SQLite is that I am putting the db on a tmpfs/ramdisk to speed up access. – Wanderer, Jan 25, 2016 at 18:08
-
In that case, I'd try building the table like the second one you've described from the beginning. I'm not very familiar with SQLite, but I haven't heard much about it having data-warehouse capabilities, which usually translates into poor performance on high-cardinality tables. – vegatripy, Jan 26, 2016 at 0:15
-
@Wanderer, can you share your insights? Did you test the performance? What choices did you eventually make? – Davor Josipovic, Jun 11, 2020 at 19:20
-
That project ended prematurely a while ago. However, I do recall that I ended up transforming the table and creating an index on the point pairs to speed up processing. – Wanderer, Jun 15, 2020 at 12:51
2 Answers
Regardless of the database you are using, you normally don't add thousands of columns to a table. Your second model is more normalized (and easy to index) and should work better (I'm not an expert, though). My opinion is based on the usual limits of database engines, which are not designed to work that way: with thousands of columns you are likely to hit some internal limit of the engine, or at least misuse it relative to how it was designed.
You can also explore the functions and datatypes of more modern versions of MySQL, MariaDB and others, which have defined types for points and the like.
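As a rough illustration of the long format with a composite primary key, here is a minimal sketch using Python's built-in sqlite3 module (the table and column names are made up; the real application is C++, but the SQL is the same):

```python
import sqlite3

# In-memory database for illustration; the real application
# would open a file (possibly on a tmpfs/ramdisk) instead.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE pt_dist (
        a_point INTEGER NOT NULL,
        b_point INTEGER NOT NULL,
        dist    REAL    NOT NULL,
        PRIMARY KEY (a_point, b_point)
    ) WITHOUT ROWID
""")

# A few of the precomputed distances from the example above.
con.executemany(
    "INSERT INTO pt_dist VALUES (?, ?, ?)",
    [(0, 0, 0.0), (0, 1, 100.0), (0, 2, 15.0),
     (1, 1, 0.0), (1, 2, 10.0), (2, 2, 0.0)],
)

# Looking up one pair is a single seek on the primary-key index.
row = con.execute(
    "SELECT dist FROM pt_dist WHERE a_point = ? AND b_point = ?",
    (0, 2),
).fetchone()
print(row[0])  # 15.0
```

`WITHOUT ROWID` stores the table clustered on the `(a_point, b_point)` key itself, which suits a table whose only access path is lookup by point pair.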
For the second example: storing each pair only once cuts data volume in half, and you cannot make a mistake with a "reverse" entry. In the app, check that A <= B before inserting, and swap the points if not. Lexicographical comparison is OK. Just remember to adjust queries accordingly.
-- Distance from point PT_A to
-- point PT_B is DIST.
--
pt_dist {PT_A, PT_B, DIST}
PK {PT_A, PT_B}
CHECK (PT_A <= PT_B)
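A minimal sketch of the swap-before-insert/lookup logic described above, in Python's built-in sqlite3 module for brevity (the helper names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE pt_dist (
        pt_a INTEGER NOT NULL,
        pt_b INTEGER NOT NULL,
        dist REAL    NOT NULL,
        PRIMARY KEY (pt_a, pt_b),
        CHECK (pt_a <= pt_b)
    )
""")

def ordered(a, b):
    # Canonicalize the pair in the app so the smaller id comes
    # first, matching the CHECK (pt_a <= pt_b) constraint.
    return (a, b) if a <= b else (b, a)

def insert_dist(a, b, d):
    con.execute("INSERT INTO pt_dist VALUES (?, ?, ?)",
                (*ordered(a, b), d))

def get_dist(a, b):
    row = con.execute(
        "SELECT dist FROM pt_dist WHERE pt_a = ? AND pt_b = ?",
        ordered(a, b),
    ).fetchone()
    return row[0] if row else None

insert_dist(2, 0, 15.0)   # stored as (0, 2) after the swap
print(get_dist(0, 2))     # 15.0
print(get_dist(2, 0))     # 15.0 -- same row, either order
```

Because the distance matrix is symmetric, this stores each unordered pair exactly once and answers lookups in either argument order.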