What's a good way to store loads of sorted string on a file so that inserts and reads are fast?
I could store it up in a tree, and read/modify/store it as needed, but that seems wasteful for operations like 'seeking the nth object'. Isn't there an indexed data structure that's used specifically to store and retrieve sorted information?
Of course a database would work, but how databases store their index for faster access? I doubt they linear scan trough an index for figuring out the nth element of a list and they read/write all the index to update a single row in the middle.
-
Trees are the standard approach for ordered indexes.CodesInChaos– CodesInChaos2016年04月08日 13:57:07 +00:00Commented Apr 8, 2016 at 13:57
-
2What makes you believe a tree is "wasteful"? Anything that uses anything more than the physical order of records to store information about their logical order must contain additional structures that take up space, and you can't get much smaller than the structures in a simple tree.Kilian Foth– Kilian Foth2016年04月08日 14:07:18 +00:00Commented Apr 8, 2016 at 14:07
-
I'm referring to the naive implementation of storing a tree as a flat file, so one would have to seek it all before doing anything, like seeking for the 46th record and that aloneLorenzo Boccaccia– Lorenzo Boccaccia2016年04月08日 14:52:02 +00:00Commented Apr 8, 2016 at 14:52
-
1I think your question is currently underspecified. Are you after a structure that provides fast inserts and reads by numerical index (like arrays) or are you after fast inserts and reads by key (like trees and hash tables) or some kind of as-yet-undefined compromise between the two? I'm pretty sure most database queries boil down to reads/inserts by key rather than index so various kinds of trees and hashes are exactly what they'd focus on (such as the B-Trees the existing answer mentions).Ixrec– Ixrec2016年04月09日 14:10:46 +00:00Commented Apr 9, 2016 at 14:10
1 Answer 1
If you are talking about huge data, you should consider using BTree data structure (or its variants). BTrees are used by most relational databases for indexing huge data.
You can check B-tree on Wikipedia for more details.
Based on the amount of data you have to deal with, your objective functions vary. For example, if you have to deal with very huge data, your objective function is to optimize (minimize) the number of disk reads(/seeks). In case you have to deal with small set of data (that can fit in your RAM), your objective function is to optimize (minimize) the number of CPU cycles.
Explore related questions
See similar questions with these tags.