We have a module in an application that stores data in multiple files across multilevel directories and accesses them from multiple threads (both reads and writes). The directory structure is based on a split hash value, for example:
b1/94/6a/92/a.txt
b1/94/6a/ee/a.txt
a1/0e/db/bb/b.txt
...
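To make the layout concrete, here is a minimal sketch of how such a path might be derived. The class name `HashPath`, the `levels` parameter, and the two-characters-per-level split are my assumptions based on the examples above, not the module's actual code.

```java
import java.nio.file.Path;

final class HashPath {
    // Splits the first `levels * 2` hex characters of `hash` into
    // two-character directory names under `root`, then appends `fileName`.
    // Hypothetical helper: the names and the split width are assumptions.
    static Path resolve(Path root, String hash, int levels, String fileName) {
        if (hash.length() < levels * 2) {
            throw new IllegalArgumentException("hash too short: " + hash);
        }
        Path dir = root;
        for (int i = 0; i < levels; i++) {
            dir = dir.resolve(hash.substring(i * 2, i * 2 + 2));
        }
        return dir.resolve(fileName);
    }
}
```

With a hash of `b1946a92` and four levels, this yields `b1/94/6a/92/a.txt` relative to the root.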
If a caller removes a file, the module deletes any directories that become empty immediately.
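A rough sketch of that delete-and-prune step, assuming the store lives under a known root directory (the `Pruner` class and `deleteAndPrune` method are hypothetical names). It relies on `Files.delete` refusing to remove a non-empty directory, so a concurrent writer that has already created an entry stops the pruning.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

final class Pruner {
    // Deletes `file`, then removes now-empty parent directories,
    // walking upward and stopping before `root` itself.
    static void deleteAndPrune(Path root, Path file) throws IOException {
        Files.deleteIfExists(file);
        Path dir = file.getParent();
        while (dir != null && !dir.equals(root) && dir.startsWith(root)) {
            try {
                Files.delete(dir);
            } catch (DirectoryNotEmptyException e) {
                return; // still in use by other files; nothing more to prune
            } catch (NoSuchFileException e) {
                // another thread already removed it; keep walking up
            }
            dir = dir.getParent();
        }
    }
}
```

Note that this does not remove the race on the writer side: a thread that creates the directories and then the file can find a directory pruned in between, so the writer would need to retry.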
What concurrency level should the module have? Is it worth creating and deleting folders from multiple threads at the same time? Can filesystems handle this efficiently?
Is it worth using a multi-threaded module at all? (It would be much easier to write a single-threaded one.)
(The application is written in Java and mainly runs on Windows with NTFS on non-SSD drives, but I'm also interested in other operating systems and filesystems if there are differences.)
3 Answers
Of course any OS will handle multiple requests, but when you ask a filesystem on physical media with seek times to do more than one thing at once, performance gets progressively worse. In my experience, it's better for performance to have a single thread that queues up your requests and hands them to the OS sequentially.
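A minimal sketch of that single-thread queue in Java 8+, using a single-threaded executor as the queue. `SerializedStore` and its methods are hypothetical names; callers on any thread get a `Future` back while the actual file operations run one at a time.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SerializedStore implements AutoCloseable {
    // One worker thread: submitted tasks run strictly in submission order.
    private final ExecutorService io = Executors.newSingleThreadExecutor();

    Future<?> write(Path file, byte[] data) {
        return io.submit(() -> {
            try {
                Files.createDirectories(file.getParent());
                Files.write(file, data);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    Future<byte[]> read(Path file) {
        return io.submit(() -> Files.readAllBytes(file));
    }

    @Override
    public void close() {
        io.shutdown(); // lets already queued work finish, accepts no new tasks
    }
}
```

A side benefit of this design is that directory creation and deletion never race with each other, because only one thread ever touches the tree.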
One option would be for this thread to set aside the unlink operations and hold them until it detects a slack period with few data operations, and only then delete the directories. That keeps your data requests as fast as possible and your housekeeping out of the way.
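One way this deferred housekeeping could look, as a sketch only: directories are queued for cleanup, and the queue is drained only when no data operation has been recorded for a given idle period. `DeferredCleanup`, `recordActivity`, and `drainIfIdle` are hypothetical names, and the idle threshold is something you would have to tune.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class DeferredCleanup {
    private final Queue<Path> pending = new ConcurrentLinkedQueue<>();
    private final AtomicLong lastActivityNanos = new AtomicLong(System.nanoTime());

    // Call from every read/write so the cleaner knows the store is busy.
    void recordActivity() {
        lastActivityNanos.set(System.nanoTime());
    }

    // Remember a directory that may have become empty.
    void markForCleanup(Path dir) {
        pending.add(dir);
    }

    // Deletes queued directories only if the store has been idle for at
    // least `idleMillis`; returns how many directories were removed.
    int drainIfIdle(long idleMillis) {
        long idleNanos = System.nanoTime() - lastActivityNanos.get();
        if (idleNanos < TimeUnit.MILLISECONDS.toNanos(idleMillis)) {
            return 0;
        }
        int removed = 0;
        Path dir;
        while ((dir = pending.poll()) != null) {
            try {
                Files.delete(dir); // fails if the directory is not empty
                removed++;
            } catch (IOException e) {
                // not empty, already gone, or in use: skip it
            }
        }
        return removed;
    }
}
```

`drainIfIdle` could be called periodically from the same single I/O thread, so cleanup never overlaps with a data request.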
It's obviously a simple solution, but I think simple is good until some other requirement forces an optimization or reorganization.
You should profile your app to be sure. Depending on how you use your files, it could be bad or very good. If you only read and write a handful of files, those files will be cached and their blocks buffered, so the disk might not be touched for some time.
If you create a lot of files and directories and remove them as quickly as you create them, then you will surely thrash the cache and the buffers, no matter how many threads are doing it.
If you have more processes (not threads), you will get a bigger share of file system time.
Having a thread gather file read/write requests from other threads only duplicates the job of the OS, and the OS can usually schedule and reorder those requests better than anything you would want to invest time in building.
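For the profiling advice, a crude timing harness along these lines might be a starting point. It is a sketch under assumptions: `FsBench`, the 1 KB file size, the two-level layout, and the thread counts are all made up for illustration, and the OS cache will skew the numbers unless the workload resembles the real one.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FsBench {
    // Creates and deletes `files` small files under `root` using `threads`
    // worker threads; returns the elapsed time in milliseconds.
    static long run(Path root, int threads, int files) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> pending = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < files; i++) {
            // Cheap pseudo-hash so files spread across directories.
            String hex = String.format("%08x", i * 0x9E3779B9);
            Path file = root.resolve(hex.substring(0, 2))
                            .resolve(hex.substring(2, 4))
                            .resolve(hex + ".txt");
            pending.add(pool.submit(() -> {
                Files.createDirectories(file.getParent());
                Files.write(file, new byte[1024]);
                Files.delete(file);
                return null;
            }));
        }
        for (Future<?> f : pending) {
            f.get(); // propagate any I/O failure
        }
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        Path root = Files.createTempDirectory("fsbench");
        for (int threads : new int[] {1, 2, 4, 8}) {
            System.out.println(threads + " thread(s): " + run(root, threads, 2000) + " ms");
        }
    }
}
```

Run it on the target disk and filesystem, and compare the single-thread figure with the others before deciding how much concurrency is worth the complexity.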
Umm, I'm not sure what you are attempting to do here, but are you sure your needs wouldn't be better served by a database rather than by accessing the filesystem directly? A database would handle multiple concurrent requests for you and help guard against race conditions.
The stored files can be quite big (1GB+), although they are usually just a few kilobytes. Is there any lightweight database (or anything similar) which can be embedded and supports big files too? – usr95, Jul 4, 2013 at 7:40
nio and perform event-based asynchronous I/O instead of threaded blocking I/O; it's as simple IMO and not very resource-consuming.
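For files, the asynchronous option in Java's NIO.2 is `AsynchronousFileChannel` (Java 7+); I'm assuming that is roughly what the comment has in mind. A minimal sketch, with the hypothetical `AsyncRead.readAll` helper and the simplifying assumption that the file fits in one buffer and arrives in a single read:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;

final class AsyncRead {
    // Starts a non-blocking read of `file` and completes the returned
    // future from the channel's callback thread.
    static CompletableFuture<byte[]> readAll(Path file) throws IOException {
        AsynchronousFileChannel ch =
                AsynchronousFileChannel.open(file, StandardOpenOption.READ);
        // Sketch only: assumes the file is small enough for one buffer.
        ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
        CompletableFuture<byte[]> result = new CompletableFuture<>();
        ch.read(buf, 0, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void attachment) {
                closeQuietly(ch);
                // bytesRead is -1 at end of file, so clamp it to zero.
                result.complete(Arrays.copyOf(buf.array(), Math.max(bytesRead, 0)));
            }

            @Override
            public void failed(Throwable error, Void attachment) {
                closeQuietly(ch);
                result.completeExceptionally(error);
            }
        });
        return result;
    }

    private static void closeQuietly(AsynchronousFileChannel ch) {
        try {
            ch.close();
        } catch (IOException ignored) {
            // nothing useful to do in a sketch
        }
    }
}
```

A production version would loop until the buffer is full or end of file is reached, since a single read may return fewer bytes than requested. Whether this beats a small thread pool on spinning disks is something only measurement will tell.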