Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability (QCon)
Normalisation is, in many ways, the antithesis of typical cache design. We tend to denormalise for speed. Building a data store (rather than a cache) is a little different: manageability, versioning and bi-temporal reconstitution become more important factors. Normalisation helps solve these problems, but in distributed architectures it suffers from the problem of distributed joins, which require iterative network calls.
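To make the cost of a distributed join concrete, here is a toy, single-process Python sketch. It is not Coherence code. `PartitionedStore`, `load_trade_graph` and the Trade/Party/Country model are hypothetical. Each key lives on exactly one node, chosen by hashing the key. Each lookup is modelled as one network round trip. The calls are iterative because the foreign key needed for the next hop is only known once the previous object has arrived.

```python
class PartitionedStore:
    """Toy partitioned grid: each key is owned by exactly one node (hypothetical)."""

    def __init__(self, num_nodes):
        self.nodes = [{} for _ in range(num_nodes)]
        self.round_trips = 0  # remote reads issued so far

    def _owner(self, key):
        # Assumption: simple hash partitioning of keys across nodes.
        return self.nodes[hash(key) % len(self.nodes)]

    def put(self, key, value):
        # Loading data; not counted as part of the query cost.
        self._owner(key)[key] = value

    def get(self, key):
        # Each lookup is modelled as one network round trip to the owning node.
        self.round_trips += 1
        return self._owner(key)[key]


def load_trade_graph(store, trade_id):
    """Reassemble a normalised Trade -> Party -> Country graph, one hop at a time."""
    trade = store.get(("trade", trade_id))
    party = store.get(("party", trade["party_id"]))        # needs the trade first
    country = store.get(("country", party["country_id"]))  # needs the party first
    return trade, party, country


store = PartitionedStore(num_nodes=3)
store.put(("country", "GB"), {"name": "United Kingdom"})
store.put(("party", "P1"), {"name": "Acme", "country_id": "GB"})
store.put(("trade", 1), {"party_id": "P1", "notional": 100})

trade, party, country = load_trade_graph(store, 1)
print(store.round_trips)  # 3: one sequential round trip per level of the graph
```

Batching can reduce the number of calls per level. It cannot remove the dependency between levels, so deeper graphs still mean more sequential network hops.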
We’ve developed a mechanism for managing normalisation based on a variant of the Star Schema model used in data warehousing. In our implementation, Facts are held distributed (partitioned) across the data nodes, and Dimensions are replicated across the query-processing nodes. To save space, we track ‘used’ data (or, as we term it, ‘connected’ data) so that only useful objects are replicated.
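The Facts/Dimensions split and ‘connected’ replication can be sketched in a few lines of Python. This is a toy, single-process model under stated assumptions. It is not Coherence code and not the implementation described here. Partitions and the query-node replica are plain dictionaries, and facts are placed by hashing their id. ‘Connected’ is approximated by reference counting: a dimension is replicated while at least one fact references it. `StarSchemaGrid` and its methods are hypothetical names. Only direct fact-to-dimension links are modelled. Dimensions that reference other dimensions (a snowflake) are not.

```python
class StarSchemaGrid:
    """Toy model: facts partitioned across data nodes, connected dimensions replicated."""

    def __init__(self, num_partitions):
        self.fact_partitions = [{} for _ in range(num_partitions)]
        self.dimensions = {}   # authoritative copy of every dimension
        self.replica = {}      # dimensions replicated to the query nodes
        self.ref_counts = {}   # dimension key -> number of facts referencing it

    def _partition_for(self, fact_id):
        # Assumption: facts are placed by hashing their id.
        return self.fact_partitions[hash(fact_id) % len(self.fact_partitions)]

    def put_dimension(self, dim_key, value):
        self.dimensions[dim_key] = value
        if self.ref_counts.get(dim_key, 0) > 0:
            self.replica[dim_key] = value  # keep the replicated copy current

    def put_fact(self, fact_id, fact, dim_keys):
        # This sketch requires dimensions to be loaded before the facts using them.
        missing = [k for k in dim_keys if k not in self.dimensions]
        if missing:
            raise KeyError(f"unknown dimensions: {missing}")
        part = self._partition_for(fact_id)
        if fact_id in part:
            self.remove_fact(fact_id)  # release the old version's references first
        part[fact_id] = (fact, tuple(dim_keys))
        for key in dim_keys:
            self.ref_counts[key] = self.ref_counts.get(key, 0) + 1
            if self.ref_counts[key] == 1:
                # First reference "connects" the dimension, so replicate it.
                self.replica[key] = self.dimensions[key]

    def remove_fact(self, fact_id):
        _, dim_keys = self._partition_for(fact_id).pop(fact_id)
        for key in dim_keys:
            self.ref_counts[key] -= 1
            if self.ref_counts[key] == 0:
                # No fact references it any more: stop replicating it.
                del self.ref_counts[key]
                del self.replica[key]

    def query(self, predicate):
        # Facts are filtered partition by partition (in parallel in a real grid);
        # the join to dimensions uses only the local replica, so no extra hops.
        results = []
        for part in self.fact_partitions:
            for fact, dim_keys in part.values():
                if predicate(fact):
                    results.append((fact, [self.replica[k] for k in dim_keys]))
        return results


grid = StarSchemaGrid(num_partitions=4)
grid.put_dimension(("party", "P1"), {"name": "Acme"})
grid.put_dimension(("party", "P2"), {"name": "Dormant Ltd"})
grid.put_fact("T1", {"id": "T1", "notional": 100}, [("party", "P1")])
print(sorted(grid.replica))  # [('party', 'P1')]: P2 is not connected, so not replicated
```

Because the dimension lookup in `query` stays inside the local process, the join costs no additional network round trips. The price is the memory held by the replica, and tracking connected data is what keeps that replica small.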
This model was presented at QCon 2011 and at the Coherence SIG.
You can find the slides here (PowerPoint, 7MB).
23:58 GMT
Nice presentation, especially the part about handling complex graphs using the snowflake schema.
6:13 GMT
Thanks Ashwin. Appreciated. QCon actually published the slides today with me rambling on in the background! I’m not sure if it’s a good thing or not though. http://www.infoq.com/presentations/ODC-Beyond-The-Data-Grid
10:16 GMT
Could you update the links to the slides? Both of them are dead. I attended your presentation at JavaOne and it was very interesting.
15:09 GMT
Hi Vasily
I’m very pleased you found it useful. I’ve updated the links on this page so you should now be able to download the ppt or pdf. Sorry they are so large. I’ve also pointed to the version I presented @JavaOne which is slightly different to this one.
B
14:13 GMT
Brilliant work. I enjoy reading all your work.
15:21 GMT
Thanks Lyndon. Appreciated!
B
22:05 GMT
I wanted to thank you for this fantastic read! I definitely loved every bit of it.
I’ve got you bookmarked to check out new things you post...