Data structure for storing sorted data

Question 1

What's a good way to store loads of sorted string on a file so that inserts and reads are fast?

I could store it up in a tree, and read/modify/store it as needed, but that seems wasteful for operations like 'seeking the nth object'. Isn't there an indexed data structure that's used specifically to store and retrieve sorted information?

Of course a database would work, but how databases store their index for faster access? I doubt they linear scan trough an index for figuring out the nth element of a list and they read/write all the index to update a single row in the middle.

Question 2

Trees are the standard approach for ordered indexes.

Question 3

What makes you believe a tree is "wasteful"? Anything that uses anything more than the physical order of records to store information about their logical order must contain additional structures that take up space, and you can't get much smaller than the structures in a simple tree.

Question 4

I'm referring to the naive implementation of storing a tree as a flat file, so one would have to seek it all before doing anything, like seeking for the 46th record and that alone

Question 5

I think your question is currently underspecified. Are you after a structure that provides fast inserts and reads by numerical index (like arrays) or are you after fast inserts and reads by key (like trees and hash tables) or some kind of as-yet-undefined compromise between the two? I'm pretty sure most database queries boil down to reads/inserts by key rather than index so various kinds of trees and hashes are exactly what they'd focus on (such as the B-Trees the existing answer mentions).

Question 6

If you are talking about huge data, you should consider using BTree data structure (or its variants). BTrees are used by most relational databases for indexing huge data.

You can check B-tree on Wikipedia for more details.

Based on the amount of data you have to deal with, your objective functions vary. For example, if you have to deal with very huge data, your objective function is to optimize (minimize) the number of disk reads(/seeks). In case you have to deal with small set of data (that can fit in your RAM), your objective function is to optimize (minimize) the number of CPU cycles.

prasadr prasadr 193 bronze badges · Answer 1 · 2016-04-08 14:38:17Z

If you are talking about huge data, you should consider using BTree data structure (or its variants). BTrees are used by most relational databases for indexing huge data.

You can check B-tree on Wikipedia for more details.

Based on the amount of data you have to deal with, your objective functions vary. For example, if you have to deal with very huge data, your objective function is to optimize (minimize) the number of disk reads(/seeks). In case you have to deal with small set of data (that can fit in your RAM), your objective function is to optimize (minimize) the number of CPU cycles.

Stack Exchange Network

Data structure for storing sorted data

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Data structure for storing sorted data

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions