
I'm wondering what the best on-disk data structure is for storing immutable time-series data (99% of the data is truly immutable; the remaining 1% is metadata kept separate from the immutable data). I've been looking at log-structured merge-trees in particular because of their heavy use by Cassandra and the like.
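To make the LSM idea concrete, here is a minimal sketch (hypothetical class and names, not Cassandra's actual implementation): writes go to an in-memory sorted buffer (the "memtable"); when it fills, it is flushed to an immutable, sorted segment that is only ever read afterwards.

```python
import bisect

class TinyLSM:
    """Toy LSM-style store: mutable memtable + immutable sorted segments."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}       # timestamp -> value, the mutable write buffer
        self.segments = []       # immutable, sorted lists of (timestamp, value)
        self.memtable_limit = memtable_limit

    def put(self, ts, value):
        self.memtable[ts] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # An immutable segment: sorted once at flush time, never rewritten.
        # On disk this would be an SSTable-like file; here it is a list.
        self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, ts):
        if ts in self.memtable:
            return self.memtable[ts]
        # Check newest segments first; binary search works because each
        # segment is sorted by timestamp.
        for seg in reversed(self.segments):
            i = bisect.bisect_left(seg, (ts,))
            if i < len(seg) and seg[i][0] == ts:
                return seg[i][1]
        return None
```

Because flushed segments are never modified, they map naturally onto data that is itself immutable; real LSM engines add compaction (merging segments) mainly to reclaim space from overwrites and deletes, which matters far less here.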

asked Sep 10, 2013 at 8:30
  • The first question I have is: what are the read and write characteristics? How often are new data entries created, how often are they read, and how do you know what you need to read? Do you need to search the data? Commented Sep 10, 2013 at 9:17
  • Based on what you have said, I'm of the view that either you are designing a large database (SQL/NoSQL) product, or you should probably be using one for your data store.... Commented Sep 10, 2013 at 9:44
  • That's fine; sorry if I came across as abrupt, but it's now clear that you are aware you are taking on a major challenge and have actually thought about it beforehand. Commented Sep 10, 2013 at 9:53
  • I don't know what your queries look like, but couldn't you just use Cassandra or HBase and implement the missing query features on top of that technology? Commented Sep 10, 2013 at 13:56
  • 3
    It's becoming way too localized. Commented Sep 10, 2013 at 17:23

1 Answer


I do not really see what immutability has to do with this.

You simply store it in the normal way and choose not to update it.

Your problem seems to be how to deal with a high insert rate, which, unless you are Google, Amazon, or Facebook, will be easily handled by any modern database.

answered Sep 10, 2013 at 10:33
  • 1
    Being able to perform the queries over many TB of data is not trivial for any DB system, even if you are Google, Amazon, or Facebook. Commented Sep 10, 2013 at 11:05
  • Nevertheless, +1 because I agree; I don't really see what immutability has to do with storage. Store it in the usual way, but don't update it. Commented Sep 10, 2013 at 15:33
  • @Ptolemy -- Most commercial databases handle these volumes with ease. There are specialist products like Teradata or Netezza specifically designed for these workloads. Given the poster's emphasis on "avg" etc., he should stick with proven relational technology. Commented Sep 11, 2013 at 1:09
  • I'm not sure we know enough about the requirements of this project to be confident that Oracle SQL can scale to them. Of the organisations I have worked with that faced these issues, only one went the Oracle route. They were the company already most committed to Oracle, and the one whose internal technical competence I rated lowest. That company has yet to demonstrate that Oracle has delivered on their scalability needs. Commented Sep 11, 2013 at 8:45
  • @JamesAnderson I probably shouldn't have gone into such detail, because the problem isn't dealing with a high insert rate; Cassandra and HBase handle that quite well. However, many of these databases don't support things like real-time ad-hoc queries (Cassandra), and HBase is heavyweight, dependency-laden, and hard to manage. The reason for the question was to support a very domain-specific data type, which lets me eliminate many of the complexities in normal, generic database systems. I probably should've added that the data is all time-series data. Commented Sep 14, 2013 at 12:33
