
I am writing a program to store, retrieve and delete "blocks" of data of varying sizes.

It currently works by keeping a database that stores the locations of the blocks and the locations of the free space in the file.

The file is split into pages such that within a page no two free chunks are adjacent (during a delete operation, any adjacent free chunks are merged into one larger chunk).

The problem with this is that I am seeing horrible IO performance when I remove a bunch of blocks and insert new ones of different sizes. Blocks range from 1 kB to about 200 kB and may be written anywhere in the file, provided they fit in an existing free chunk; if no such free chunk is found, a new page is created at the end of the file.
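To make the setup concrete, the behaviour described above is roughly the following sketch (the names and the page size are illustrative only, not my actual code): free chunks are tracked as (offset, size) pairs, allocation is first-fit, and a delete merges adjacent free chunks.

```python
import bisect

PAGE_SIZE = 256 * 1024  # page size assumed only for the sketch

class FreeSpaceMap:
    def __init__(self):
        self.free = []        # sorted list of (offset, size) free chunks
        self.file_end = 0     # current end of the file

    def allocate(self, size):
        # First fit: scan free chunks in offset order.
        for i, (off, sz) in enumerate(self.free):
            if sz >= size:
                if sz == size:
                    self.free.pop(i)
                else:
                    self.free[i] = (off + size, sz - size)
                return off
        # No fit: append a new page at the end of the file.
        off = self.file_end
        page = max(size, PAGE_SIZE)
        self.file_end += page
        if page > size:
            bisect.insort(self.free, (off + size, page - size))
        return off

    def release(self, off, size):
        # Insert the freed chunk and merge it with any adjacent free chunks.
        bisect.insort(self.free, (off, size))
        merged = []
        for o, s in self.free:
            if merged and merged[-1][0] + merged[-1][1] == o:
                merged[-1] = (merged[-1][0], merged[-1][1] + s)
            else:
                merged.append((o, s))
        self.free = merged
```

So, for example, allocating 4 kB out of a fresh page and then releasing it collapses the page back into a single free chunk.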

Can anyone suggest a way to improve on this, or maybe point me in the right direction?

asked Mar 27, 2013 at 10:08
  • Are you interested in conceptual, or rather in implementation-level answers? Commented Mar 27, 2013 at 12:14
  • Conceptual would be enough; I just want some ideas for how I can improve. This does have to take average disk IO into account (a normal hard disk, not an SSD). I included the implementation details so whoever answers can get an idea of how the file is being used. Commented Mar 27, 2013 at 16:01

2 Answers


A starting point for this would be heap-management algorithms; perhaps start here. I remember that Knuth has a detailed discussion in The Art of Computer Programming, I believe in Volume 1. The algorithms described there are designed for RAM; adding in the non-uniform access of rotating disks will be quite a challenge...

Isn't it much simpler to use a database and blobs? Your time is valuable too...

answered Mar 28, 2013 at 0:41
  • I can't really use a database for this thing :( Commented Mar 28, 2013 at 10:37

Since you mention in a comment on your original post that you are writing to an ordinary HDD, the likely culprit is the way you write or remap the data.

Ordinary hard drives store and read data on rotating magnetic platters; the data itself is laid out in concentric "rings" (tracks) on the platters.

Because of how they work, hard drives are most efficient when data can be read or written sequentially. When reading or writing at random places in a file or file system, performance drops drastically, as the drive has to wait (some 6-10+ milliseconds per request) for the heads to seek and for the requested data to spin past them.

There are many likely culprits, but a few may be:

  • The file you are working on is itself heavily fragmented. Try defragmenting the hard drive.
  • Your method of searching for free chunks may be inefficient (and may also generate lots of random reads on the HDD, especially if you use a parallelized/multithreaded search approach); see the sketch after this list.
  • Your chunk/page/block size and/or your method of reading/writing them may generate a lot of overhead or random reads/writes, perhaps because the chunk/page size is too small (and thus there are more pages/chunks to handle).
  • Your method of updating your database of pages/chunks may be dragging you down.
  • You may have a lot of (meta)data that is processed even though it might not need to be.
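On the free-chunk search in particular: if the chunk metadata fits in memory, keeping the free chunks in a list sorted by size turns the "find a chunk of at least N bytes" query into a binary search instead of a scan that may itself touch the disk. A rough sketch (the names are mine, not your code):

```python
import bisect

class SizeIndex:
    def __init__(self):
        self.by_size = []   # sorted list of (size, offset) pairs

    def add(self, offset, size):
        bisect.insort(self.by_size, (size, offset))

    def take_best_fit(self, size):
        # Smallest free chunk that is still large enough (best fit).
        i = bisect.bisect_left(self.by_size, (size, 0))
        if i == len(self.by_size):
            return None                      # no chunk is big enough
        sz, off = self.by_size.pop(i)
        if sz > size:
            self.add(off + size, sz - size)  # return the remainder to the index
        return off
```

First fit over an offset-sorted list works too; a size-sorted index simply avoids walking past many chunks that are too small.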

One suggestion for optimizing performance and efficiency on HDDs would be to allow fragmentation of data blocks.

The drawback is that you will have to write a "defragmenter" program (for future maintenance) and implement a block "jump" scheme (see the sketch after the list below).

The advantages you get are:

  • Considerably more efficient use and allocation of disk space, as large blocks can be spread over numerous smaller free chunks.
  • Increased IO speed, as the read/write load is adapted to the hardware in use and kept as sequential as possible.
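To make the "jump" scheme concrete, here is a rough sketch in which a logical block is stored as a list of (offset, length) extents; the function names and the extent-table layout are illustrative, not a fixed format:

```python
def write_block(f, data, extents):
    """Write `data` across the given extents; extents must exactly cover len(data)."""
    pos = 0
    for offset, length in extents:
        f.seek(offset)
        f.write(data[pos:pos + length])
        pos += length
    assert pos == len(data), "extents do not cover the block"

def read_block(f, extents):
    """Re-assemble a block by following its extent list in order."""
    parts = []
    for offset, length in extents:
        f.seek(offset)
        parts.append(f.read(length))
    return b"".join(parts)
```

Keeping each block's extents in ascending offset order keeps the head movement as close to sequential as the free-space layout allows.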

If you have major speed problems in your deletion step, don't actually delete the data (unless you really need to): simply clear the "in use" marker (that is your whole "deletion" step) and let those chunks be overwritten with new data on a later write. This is how most filesystems and hard drives handle file "deletion" (and I'm not talking about trash cans :o)).
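In other words, a "delete" can be metadata-only. A rough sketch, where block_table is assumed to be a dict mapping block id to (offset, size) and free_chunks a list of reusable ranges:

```python
def delete_block(block_table, free_chunks, block_id):
    offset, size = block_table.pop(block_id)  # forget where the block lived
    free_chunks.append((offset, size))        # mark that range as reusable
    # No seek/write of the data itself: the old bytes are simply overwritten
    # whenever the allocator hands this range to a future insert.
```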

answered Feb 20, 2018 at 19:41