
What's a good way to store loads of sorted strings in a file so that inserts and reads are fast?

I could store it in a tree and read/modify/store it as needed, but that seems wasteful for operations like seeking the nth object. Isn't there an indexed data structure designed specifically to store and retrieve sorted information?

Of course a database would work, but how do databases store their indexes for fast access? I doubt they linearly scan through an index to find the nth element of a list, or read and rewrite the whole index to update a single row in the middle.

asked Apr 8, 2016 at 13:55
  • Trees are the standard approach for ordered indexes. Commented Apr 8, 2016 at 13:57
  • 2
    What makes you believe a tree is "wasteful"? Anything that uses anything more than the physical order of records to store information about their logical order must contain additional structures that take up space, and you can't get much smaller than the structures in a simple tree. Commented Apr 8, 2016 at 14:07
  • I'm referring to the naive implementation of storing a tree as a flat file, where one would have to scan through it all before doing anything, like finding just the 46th record. Commented Apr 8, 2016 at 14:52
  • 1
    I think your question is currently underspecified. Are you after a structure that provides fast inserts and reads by numerical index (like arrays) or are you after fast inserts and reads by key (like trees and hash tables) or some kind of as-yet-undefined compromise between the two? I'm pretty sure most database queries boil down to reads/inserts by key rather than index so various kinds of trees and hashes are exactly what they'd focus on (such as the B-Trees the existing answer mentions). Commented Apr 9, 2016 at 14:10

1 Answer


If you are talking about huge amounts of data, you should consider using a B-tree data structure (or one of its variants). B-trees are used by most relational databases for indexing large volumes of data.

You can check B-tree on Wikipedia for more details.

Your objective function varies with the amount of data you have to deal with. For example, if you have to deal with very large data sets, your objective is to minimize the number of disk reads (and seeks). If your data set is small enough to fit in RAM, your objective is to minimize the number of CPU cycles.
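To make the idea concrete, here is a minimal in-memory sketch (not from the original answer; the BTreeNode class and its methods are made-up names for illustration) of a B-tree-style node augmented with cached subtree sizes. High fan-out keeps the tree shallow, so a lookup only touches a few nodes, and the size counts let you find the nth key without a linear scan, which is the operation the question worries about.

```python
import bisect


class BTreeNode:
    """A single node of a B-tree-like structure, augmented with subtree sizes.

    Hypothetical, illustrative code: the class and method names are made up
    for this sketch and are not taken from any particular database engine.
    """

    def __init__(self, keys=None, children=None):
        self.keys = keys or []          # sorted keys held in this node
        self.children = children or []  # child nodes; empty list for a leaf
        # Cached number of keys in this whole subtree; an on-disk B-tree
        # would persist this alongside the node (one node per disk page).
        self.size = len(self.keys) + sum(c.size for c in self.children)

    def is_leaf(self):
        return not self.children

    def search(self, key):
        """Membership test; visits at most one node per tree level."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return True
        if self.is_leaf():
            return False
        return self.children[i].search(key)

    def nth(self, n):
        """Return the n-th smallest key (0-based) using the cached subtree
        sizes, instead of linearly scanning the whole index."""
        if self.is_leaf():
            return self.keys[n]
        for i, child in enumerate(self.children):
            if n < child.size:
                return child.nth(n)
            n -= child.size
            if i < len(self.keys):
                if n == 0:
                    return self.keys[i]
                n -= 1
        raise IndexError(n)


# Tiny hand-built tree, just to show the two operations.
left = BTreeNode(keys=["apple", "banana"])
right = BTreeNode(keys=["date", "fig"])
root = BTreeNode(keys=["cherry"], children=[left, right])

print(root.search("fig"))  # True
print(root.nth(3))         # 'date'  (order: apple, banana, cherry, date, fig)
```

On disk, each node would correspond to a fixed-size page holding hundreds of keys, so the height of the tree, and therefore the number of page reads per lookup, stays very small.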

Andy
answered Apr 8, 2016 at 14:38
