Consider a simple relational database with two tables:

item_index: item_id|item_name
item_property: item_id|property_name

Each item has several properties, and I want to iterate over the collection of items in my database. For this I need to write a function with the following signature, where database denotes the type of database handles:

contents : database -> (item_name * (property_name list)) stream

This signature means that the function contents returns a stream of pairs whose first member is the item name and the second member its list of properties.

There are two simple options to implement contents:

  1. Concurrently request the stream of items and the stream of properties from the database, both sorted appropriately (for example by item_id), and aggregate the data by walking the two streams in step.

  2. Request the stream of rows of an inner join, so that each item with its properties is represented by several rows of the form item_id|item_name|property_name, and aggregate the items from this data.

The second seems to put more overhead on the database, because of the join – probably packed as a view – while the first is significantly harder to program because of the concurrent access to the database.
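To make the first option concrete, here is a minimal sketch, assuming SQLite through Python's standard sqlite3 module and a generator in place of the stream type. The function name contents_two_streams and the sample rows are hypothetical and only serve the illustration. It opens two cursors on the same connection, both ordered by item_id, and advances them in step:

```python
import sqlite3

# Hypothetical sample data matching the two tables described above.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE item_index (item_id INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE item_property (item_id INTEGER, property_name TEXT);
INSERT INTO item_index VALUES (1, 'apple'), (2, 'pear'), (3, 'plum');
INSERT INTO item_property VALUES (1, 'red'), (1, 'round'), (2, 'green');
""")

def contents_two_streams(db):
    """Yield (item_name, [property_name, ...]) by merging two sorted streams."""
    items = db.execute(
        "SELECT item_id, item_name FROM item_index ORDER BY item_id")
    props = db.execute(
        "SELECT item_id, property_name FROM item_property "
        "ORDER BY item_id, property_name")
    pending = next(props, None)  # one row of look-ahead on the property stream
    for item_id, name in items:
        # Skip properties whose item_id has no matching item.
        while pending is not None and pending[0] < item_id:
            pending = next(props, None)
        collected = []
        # Collect every property row that belongs to the current item.
        while pending is not None and pending[0] == item_id:
            collected.append(pending[1])
            pending = next(props, None)
        yield name, collected

for name, properties in contents_two_streams(db):
    print(name, properties)
```

Note that this sketch also yields items that have no properties (with an empty list), which an inner join would drop, and that the look-ahead bookkeeping is exactly the part of a merge join that the database would otherwise do.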

Am I right in thinking that implementing the contents function with concurrent requests amounts to poorly reimplementing a join operation in the application? This would imply that the second design is superior to the first, as it leads to comparable time complexity and simpler code.
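For comparison, a minimal sketch of the second option under the same assumptions (SQLite via Python's sqlite3, a generator standing in for the stream, hypothetical sample rows). A single joined query is issued and the rows are grouped as they arrive:

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data matching the two tables described above.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE item_index (item_id INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE item_property (item_id INTEGER, property_name TEXT);
INSERT INTO item_index VALUES (1, 'apple'), (2, 'pear');
INSERT INTO item_property VALUES (1, 'red'), (1, 'round'), (2, 'green');
""")

def contents(db):
    """Yield (item_name, [property_name, ...]) from one inner-join query."""
    rows = db.execute("""
        SELECT i.item_id, i.item_name, p.property_name
        FROM item_index AS i
        JOIN item_property AS p ON p.item_id = i.item_id
        ORDER BY i.item_id, p.property_name
    """)
    # ORDER BY keeps all rows of one item adjacent, so groupby can
    # aggregate lazily without loading the whole result set.
    for (_, name), group in groupby(rows, key=itemgetter(0, 1)):
        yield name, [prop for _, _, prop in group]

for name, properties in contents(db):
    print(name, properties)
```

Because this is an inner join, items without any property do not appear in the output; a LEFT JOIN would be needed to keep them.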

asked May 6, 2015 at 9:36
  • Concurrent access to the database also has a lot of overhead. Commented May 6, 2015 at 9:50
  • In general for most of these cases the JOIN would be the recommended solution. It will limit the requests, be more readable and the database has highly optimized code (and can use the indexes), so most likely much faster. Commented May 6, 2015 at 9:55
  • @Michael - How often are Items added to the database? How often are Items and Item Properties assigned or modified? Commented May 6, 2015 at 13:00
  • @MichaelRiley-AKAGunny This is a write once read several times system. Commented May 6, 2015 at 14:31

1 Answer

In the general case, you're better off letting the database do the work.

Time complexity is actually a good reason to follow this advice. Unless you're lucky with your data, you'll end up needing to index one or both tables to ensure you have efficient lookup.

Your database will already have various strategies in place for this, along with the heuristics to do a decent job of picking the best one. You can sometimes do better by hand (as you may well have more information), but unless you know for certain that this is the case (and always will be), it's best to leave it to the DB.
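As an illustration of how to see which strategy the database picks, here is a small sketch assuming SQLite through Python's sqlite3 module. The index name item_property_by_item is hypothetical, and other engines expose their plans through different commands. It adds an index on the join column and asks SQLite to describe its plan for the join:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE item_index (item_id INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE item_property (item_id INTEGER, property_name TEXT);
-- Hypothetical index on the join column of item_property.
CREATE INDEX item_property_by_item ON item_property (item_id);
""")

# EXPLAIN QUERY PLAN returns one row per plan step; the last column
# holds a human-readable description of that step.
plan = db.execute("""
    EXPLAIN QUERY PLAN
    SELECT i.item_name, p.property_name
    FROM item_index AS i
    JOIN item_property AS p ON p.item_id = i.item_id
""").fetchall()

for step in plan:
    print(step[-1])
```

The exact wording of the plan depends on the SQLite version and on table statistics, so it is worth running this against realistic data rather than relying on a guess.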

A similar argument can probably be made for the concurrency and sorting aspects. By letting the DB do all the work, you're also giving it more information to help optimise the retrieval. Again, it is conceivable that splitting the work between your code and the DB is better, but this would need careful testing, and I would definitely err on the side of simplicity.

answered May 6, 2015 at 10:03
