
I'm wondering what the best on-disk data structure is for storing immutable time-series data (99% of the data is truly immutable; the remaining 1% is metadata kept separate from the immutable data). I've been looking at log-structured merge-trees in particular because of their heavy use by Cassandra and the like.
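To make the LSM idea concrete, here is a minimal sketch (hypothetical class and names, not Cassandra's actual implementation): writes go to an in-memory sorted buffer (the "memtable"); when it fills, it is flushed to an immutable, sorted segment that is only ever read afterwards.

```python
import bisect

class TinyLSM:
    """Toy LSM-style store: mutable memtable + immutable sorted segments."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}       # timestamp -> value, the mutable write buffer
        self.segments = []       # immutable, sorted lists of (timestamp, value)
        self.memtable_limit = memtable_limit

    def put(self, ts, value):
        self.memtable[ts] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # An immutable segment: sorted once at flush time, never rewritten.
        # On disk this would be an SSTable-like file; here it is a list.
        self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, ts):
        if ts in self.memtable:
            return self.memtable[ts]
        # Check newest segments first; binary search works because each
        # segment is sorted by timestamp.
        for seg in reversed(self.segments):
            i = bisect.bisect_left(seg, (ts,))
            if i < len(seg) and seg[i][0] == ts:
                return seg[i][1]
        return None
```

Because flushed segments are never modified, they map naturally onto data that is itself immutable; real LSM engines add compaction (merging segments) mainly to reclaim space from overwrites and deletes, which matters far less here.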

asked Sep 10, 2013 at 8:30
  • The first question I have is: what are the read and write characteristics? How often are new data entries created, how often are they read, and how do you know what you need to read? Do you need to search the data? Commented Sep 10, 2013 at 9:17
  • Based on what you have said, I'm of the view that either you are designing a large database (SQL/NoSQL) product, or you should probably be using one for your data store.... Commented Sep 10, 2013 at 9:44
  • That's fine; sorry if I came across as abrupt, but it's now clear that you are aware you are taking on a major challenge and have actually thought about it beforehand. Commented Sep 10, 2013 at 9:53
  • I don't know what your queries look like, but couldn't you just use Cassandra or HBase and implement the missing query features on top of that technology? Commented Sep 10, 2013 at 13:56
  • 3
    It's becoming way too localized. Commented Sep 10, 2013 at 17:23

1 Answer


I do not really see what immutability has to do with this.

You simply store it in the normal way and choose not to update it.

Your problem seems to be how to deal with a high insert rate, which, unless you are Google, Amazon, or Facebook, will be easily handled by any modern database.

answered Sep 10, 2013 at 10:33
  • 1
    Being able to perform the queries over many TB of data is not trivial for any DB system, even if you are Google, Amazon, or Facebook. Commented Sep 10, 2013 at 11:05
  • Nevertheless, +1 because I agree; I don't really see what immutability has to do with storage. Store it in the usual way, but don't update it. Commented Sep 10, 2013 at 15:33
  • @Ptolemy -- Most commercial databases handle these volumes with ease. There are specialist products like Teradata or Netezza specifically designed for these workloads. Given the poster's emphasis on "avg" etc., he should stick with proven relational technology. Commented Sep 11, 2013 at 1:09
  • I'm not sure we know enough about the requirements of this project to be confident that Oracle SQL can scale to them. Of the organisations I have worked with that faced these issues, only one went the Oracle route. They were the company already most committed to Oracle, and the one whose internal technical competence I rated lowest. That company has yet to demonstrate that Oracle has delivered on their scalability needs. Commented Sep 11, 2013 at 8:45
  • @JamesAnderson I probably shouldn't have gone into such detail, because the problem isn't dealing with a high insert rate; Cassandra and HBase handle that quite well. However, many of these databases don't support things like real-time ad-hoc queries (Cassandra), and HBase is heavyweight, dependency-laden, and hard to manage. The reason for the question was to support a very domain-specific data type, which lets me eliminate many of the complexities in normal, generic database systems. I probably should've added that the data is all time-series data. Commented Sep 14, 2013 at 12:33
