Consider a simple relational database with two tables:
item_index: item_id|item_name
item_property: item_id|property_name
Each item has several properties, and I want to iterate over the collection of items in my database. For this I need to write a function with the following signature, where database denotes the type of database handles:
contents : database -> (item_name * (property_name list)) stream
This signature means that the function contents returns a stream of pairs whose first member is the item name and whose second member is its list of properties.
There are two simple options to implement contents:
1. Concurrently request the stream of items and the stream of properties from the database, sorted appropriately, and aggregate the data by examining the two streams.
2. Request the stream of rows of an inner join, so that each item with its properties is represented by several rows, like
   item_id|item_name|property_name
   and aggregate the items from this data (sketched just below).
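For concreteness, here is a minimal OCaml sketch of the second option, using the standard library's Seq.t as the stream type and assuming the database layer (not shown) streams the rows of a query such as SELECT item_id, item_name, property_name FROM item_index JOIN item_property USING (item_id) ORDER BY item_id; the row type and the grouping code are illustrative, not tied to any particular driver.

    (* Sketch of option 2: the database layer streams the rows of the join,
       ordered by item_id, and we fold consecutive rows sharing an item_id
       into one (item_name, properties) pair. *)
    type item_id = int
    type item_name = string
    type property_name = string
    type row = item_id * item_name * property_name

    let contents (rows : row Seq.t) : (item_name * property_name list) Seq.t =
      (* Emit one pair per run of rows that share the same item_id. *)
      let rec group id name props rows () =
        match rows () with
        | Seq.Cons ((id', _, p), rest) when id' = id ->
            group id name (p :: props) rest ()
        | node ->
            Seq.Cons ((name, List.rev props), group_next (fun () -> node))
      and group_next rows () =
        match rows () with
        | Seq.Nil -> Seq.Nil
        | Seq.Cons ((id, name, p), rest) -> group id name [p] rest ()
      in
      group_next rows

With this shape a single query drives the whole stream, and the application-side work reduces to folding consecutive rows that share an item_id.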
The second option seems to put more overhead on the database because of the join (probably wrapped in a view), while the first is significantly harder to program because of the concurrent access to the database.
Am I right in thinking that implementing the contents function using concurrent requests amounts to poorly implementing a join operation in the application? This would imply that the second design is superior to the first, as it leads to comparable time complexity and simpler code.
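For comparison, here is a rough sketch of what the first option amounts to, assuming the database layer hands the application two sequences sorted by item_id (the fetching code itself is omitted); the merge is essentially a hand-written merge join.

    (* Sketch of option 1: two separate queries,
         SELECT item_id, item_name     FROM item_index    ORDER BY item_id
         SELECT item_id, property_name FROM item_property ORDER BY item_id
       whose results the application merges by hand. Assumes both inputs are
       sorted by item_id and every property row refers to an existing item. *)
    type item_id = int
    type item_name = string
    type property_name = string

    let contents
        (items : (item_id * item_name) Seq.t)
        (props : (item_id * property_name) Seq.t)
      : (item_name * property_name list) Seq.t =
      (* Take the properties belonging to [id] off the front of [props],
         returning them together with the rest of the sequence. *)
      let rec take_props id acc props =
        match props () with
        | Seq.Cons ((id', p), rest) when id' = id -> take_props id (p :: acc) rest
        | node -> (List.rev acc, (fun () -> node))
      in
      let rec merge items props () =
        match items () with
        | Seq.Nil -> Seq.Nil
        | Seq.Cons ((id, name), rest) ->
            let ps, props' = take_props id [] props in
            Seq.Cons ((name, ps), merge rest props')
      in
      merge items props

Even this simplified version has to assume consistent ordering and that every property row refers to an existing item, which is part of why it feels like a reimplementation of the join.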
Comments:
- Concurrent access to the database also has a lot of overhead. (Pieter B, May 6, 2015)
- In general for most of these cases the JOIN would be the recommended solution. It will limit the requests, be more readable and the database has highly optimized code (and can use the indexes), so most likely much faster. (thorsten müller, May 6, 2015)
- @Michael How often are Items added to the database? How often are Items and Item Properties assigned or modified? (Michael Riley - AKA Gunny, May 6, 2015)
- @MichaelRiley-AKAGunny This is a write once read several times system. (Michaël Le Barbier, May 6, 2015)
1 Answer
In the general case, you're better off letting the database do the work.
Time complexity is actually a good reason to follow this advice. Unless you're lucky with your data, you'll end up needing to index one or both tables to ensure you have efficient lookup.
Your database will already have various strategies in place for this, and the heuristics to do a decent job of finding the best one. One can sometimes do better by hand (as you may well have more information), but unless you know for certain that is the case (and always will be), it's best to leave it to the DB.
A similar argument can probably be made for the concurrency and sorting aspects. By letting the DB do all the work, you're also giving it more information to help optimise the retrieval. Again, it is conceivable that splitting the work between your application and the DB could do better, but that would need careful testing, and I would definitely err on the side of simplicity.