How many threads should access the file system at the same time?

Question 1

We have a module in an application that stores data in multiple files across multilevel directories and accesses them from multiple threads (both reads and writes). The directory structure is based on a split hash value, like:

b1/94/6a/92/a.txt
b1/94/6a/ee/a.txt
a1/0e/db/bb/b.txt
...
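For illustration, such a path could be derived as in the following minimal Java sketch. It assumes (hypothetically) that the four directory levels are the first four bytes of a SHA-1 digest of some key, each printed as two hex digits. The question does not say which hash or key is used, so `HashedPaths`, `pathFor` and the SHA-1 choice are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: maps a key to root/xx/xx/xx/xx/<fileName>, using the
// first four bytes of a SHA-1 digest of the key as the directory levels.
final class HashedPaths {
    private static final int LEVELS = 4; // four levels, as in b1/94/6a/92

    static Path pathFor(Path root, String key, String fileName) {
        byte[] digest = sha1(key.getBytes(StandardCharsets.UTF_8));
        Path dir = root;
        for (int i = 0; i < LEVELS; i++) {
            // Each level is one digest byte rendered as two lowercase hex digits.
            dir = dir.resolve(String.format("%02x", digest[i] & 0xff));
        }
        return dir.resolve(fileName);
    }

    private static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-1, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }
}
```

Under these assumptions, `HashedPaths.pathFor(Paths.get("root"), "abc", "a.txt")` returns `root/a9/99/3e/36/a.txt` (with the platform's separator), because the SHA-1 digest of "abc" begins with `a9993e36`.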

If a caller removes a file, the module immediately deletes any directories that have become empty.
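One way to implement this immediate cleanup is sketched below, assuming `java.nio.file` and a hypothetical `EmptyDirPruner` helper. It relies on `Files.delete` refusing to remove a non-empty directory, so the emptiness check and the deletion happen as one file-system operation and a file added by another thread at the same moment is never lost. It does not remove every race, though: a writer that has just created a directory but not yet its file can see that directory vanish, so writers would need to retry on `NoSuchFileException`.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical helper: deletes a file, then removes every parent directory that
// has become empty, walking upwards and stopping at (never deleting) the root.
final class EmptyDirPruner {
    static void deleteAndPrune(Path root, Path file) throws IOException {
        Files.deleteIfExists(file);
        Path stop = root.toAbsolutePath().normalize();
        Path dir = file.toAbsolutePath().normalize().getParent();
        while (dir != null && dir.startsWith(stop) && !dir.equals(stop)) {
            try {
                // Files.delete refuses to remove a non-empty directory, so if a
                // concurrent writer has already put a file here, this throws.
                Files.delete(dir);
            } catch (DirectoryNotEmptyException | NoSuchFileException e) {
                return; // still in use, or another thread pruned it first
            }
            dir = dir.getParent();
        }
    }
}
```

With the example tree from the question, deleting `b1/94/6a/92/a.txt` would remove `92` and then stop at `6a`, which still contains `ee`.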

What should the concurrency level of the module be? Is it worth creating and deleting folders from multiple threads at the same time? Can file systems handle this efficiently?

Is it worth using a multi-threaded module at all? (It would be much easier to write a single-threaded one.)

(The application is written in Java and mainly runs on Windows, on NTFS and non-SSD drives, but I'm also interested in other operating systems and file systems if they behave differently.)

Comment

What do you expect to achieve by using multiple threads in that context?

Comment

You should consider NIO and perform event-based asynchronous I/O instead of threaded blocking I/O; IMO it's just as simple and not very resource-intensive.
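A minimal sketch of this suggestion, using the JDK's `AsynchronousFileChannel` with a `CompletionHandler`; `AsyncWriter` and `writeAsync` are hypothetical names. Two caveats apply: the JDK only provides asynchronous file reads and writes, not asynchronous directory creation or deletion, and according to its Javadoc an `AsynchronousFileChannel` is associated with a thread pool, so on some platforms this still uses threads under the hood.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

// Hypothetical helper: writes a whole buffer asynchronously and completes the
// returned future once every byte has been written (or an error occurs).
final class AsyncWriter {
    static CompletableFuture<Void> writeAsync(Path file, ByteBuffer data) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        final AsynchronousFileChannel ch;
        try {
            ch = AsynchronousFileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
        } catch (IOException e) {
            done.completeExceptionally(e);
            return done;
        }
        // The handler re-issues the write until the buffer is drained, because a
        // single write call may transfer fewer bytes than requested.
        CompletionHandler<Integer, Long> handler = new CompletionHandler<Integer, Long>() {
            @Override
            public void completed(Integer written, Long position) {
                long next = position + written;
                if (data.hasRemaining()) {
                    ch.write(data, next, next, this);
                } else {
                    closeQuietly(ch);
                    done.complete(null);
                }
            }

            @Override
            public void failed(Throwable exc, Long position) {
                closeQuietly(ch);
                done.completeExceptionally(exc);
            }
        };
        ch.write(data, 0L, 0L, handler);
        return done;
    }

    private static void closeQuietly(AsynchronousFileChannel ch) {
        try {
            ch.close();
        } catch (IOException ignored) {
            // nothing useful to do if close fails here
        }
    }
}
```

A caller would typically chain follow-up work with `thenRun` or `whenComplete` on the returned future instead of blocking a thread per request.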

Comment

Why do you want to delete the folders while the app is running? How long-lived is the app?

Comment

@user61852: Higher throughput.

Comment

@MichaelT: Just to remove garbage from the disk. The runtime varies from a few hours to months (it can run as a service too). I thought about a garbage collector thread too, but it also seemed like overkill.
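For comparison, the garbage collector thread mentioned here need not be much code. A minimal sketch, assuming a single-threaded `ScheduledExecutorService` that periodically walks the storage tree and deletes empty directories deepest first; `EmptyDirSweeper` and its methods are hypothetical. Like immediate deletion, it can race with a writer that has just created a directory, so writers would still need to retry on `NoSuchFileException`.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical background sweeper: removes empty directories under root,
// deepest first, but never root itself.
final class EmptyDirSweeper {
    static int sweep(Path root) throws IOException {
        int[] removed = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null) throw exc;
                if (!dir.equals(root)) {
                    try {
                        Files.delete(dir); // only succeeds if the directory is empty
                        removed[0]++;
                    } catch (DirectoryNotEmptyException | NoSuchFileException ignored) {
                        // still holds files, or another thread removed it first
                    }
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return removed[0];
    }

    static ScheduledExecutorService start(Path root, long period, TimeUnit unit) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(() -> {
            try {
                sweep(root);
            } catch (IOException e) {
                // A real module would log this; letting it escape would cancel the schedule.
            }
        }, period, period, unit);
        return ses;
    }
}
```

For example, `EmptyDirSweeper.start(root, 1, TimeUnit.HOURS)` would sweep hourly; the returned executor should be shut down when the application stops, since its thread is not a daemon.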

Answer 1 · Patrick Hughes · 2013-07-03 21:40:59Z

Of course any OS will handle multiple requests, but when you ask a file system on physical media with seek times to do more than one thing at once, your performance will get progressively worse. In my experience it's better for performance to have just one thread that queues up your requests and then hands them off to the OS sequentially.
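The answer describes this without code; one possible reading is the following minimal sketch, in which a single-threaded `ExecutorService` acts as the queue and every caller submits its file operation and receives a `Future`. `SerialFileStore`, `write` and `read` are hypothetical names.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical store that funnels every file operation through one thread,
// so the OS receives this module's requests strictly one at a time.
final class SerialFileStore implements AutoCloseable {
    private final ExecutorService io = Executors.newSingleThreadExecutor();

    Future<Void> write(Path file, byte[] data) {
        return io.submit(() -> {
            Files.createDirectories(file.getParent());
            Files.write(file, data);
            return null;
        });
    }

    Future<byte[]> read(Path file) {
        return io.submit(() -> Files.readAllBytes(file));
    }

    @Override
    public void close() {
        io.shutdown(); // already-queued requests still run before the thread exits
    }
}
```

Because the executor runs one task at a time, in submission order, the disk never sees two of this module's requests at once.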

One option would be for this thread to set the unlink operations aside and hold them until it detects a slack period when few data operations are happening, and then delete those directories. That keeps your data requests as fast as possible and your housekeeping out of the way.
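The slack-time idea could be sketched as a hand-rolled worker loop with two queues, assuming data requests and deferred directory deletions are submitted separately. The worker treats "no data request for `idleMillis`" as slack time and only then runs one deferred deletion. `IdleAwareWorker`, its methods and the idle-timeout heuristic are hypothetical; the answer does not say how slack time should be detected.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical single I/O worker: data requests always win, and deferred
// housekeeping (e.g. deleting empty directories) runs only after the data
// queue has been idle for idleMillis.
final class IdleAwareWorker implements Runnable {
    private final BlockingQueue<Runnable> data = new LinkedBlockingQueue<>();
    private final BlockingQueue<Runnable> housekeeping = new LinkedBlockingQueue<>();
    private final long idleMillis;
    private volatile boolean running = true;

    IdleAwareWorker(long idleMillis) {
        this.idleMillis = idleMillis;
    }

    void submitData(Runnable task) { data.add(task); }

    void submitHousekeeping(Runnable task) { housekeeping.add(task); }

    void stop() { running = false; }

    @Override
    public void run() {
        while (running) {
            Runnable next;
            try {
                next = data.poll(idleMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (next != null) {
                runSafely(next); // a normal read/write request
            } else {
                // No data request arrived for idleMillis: treat this as slack
                // time and run at most one deferred deletion, then re-check.
                Runnable chore = housekeeping.poll();
                if (chore != null) runSafely(chore);
            }
        }
    }

    private static void runSafely(Runnable task) {
        try {
            task.run();
        } catch (RuntimeException e) {
            // A real module would log this; swallowing it keeps the worker alive.
        }
    }
}
```

The worker would run on its own thread, e.g. `new Thread(worker, "file-io").start()`, with the data path calling `submitData` and the unlink logic calling `submitHousekeeping`.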

It's obviously a simple solution, but I think that simple is good until some other requirement forces an optimization or reorganization.

Answer 2 · imel96 · 2013-07-03 22:53:50Z

You should profile your app to be sure. Depending on how you use your files, it could be bad or very good. If you only read and write a handful of files, those files will be cached and their blocks buffered, so it might not touch the disk for some time.
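To follow the advice to profile, a rough micro-benchmark could time the same batch of small writes with different thread counts. This is a minimal sketch under stated assumptions: 4 KB files, a two-level hex directory layout derived from a counter, and a hypothetical `ThreadCountBenchmark` class. A meaningful measurement would also need warm-up runs, realistic file sizes and access patterns, and ideally a cold cache, since, as this answer notes, the OS cache can keep the disk out of the picture entirely.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical micro-benchmark: writes `files` small files under `root` using
// `threads` worker threads and returns the elapsed wall-clock time in ms.
final class ThreadCountBenchmark {
    static long run(Path root, int threads, int files) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        byte[] payload = new byte[4096]; // "a few kilobytes", as the asker describes
        long start = System.nanoTime();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < files; i++) {
                String hex = String.format("%08x", i);
                // Two directory levels taken from the counter, mimicking xx/yy/ splitting.
                Path file = root.resolve(hex.substring(4, 6))
                                .resolve(hex.substring(6, 8))
                                .resolve(hex + ".bin");
                pending.add(pool.submit(() -> {
                    Files.createDirectories(file.getParent());
                    Files.write(file, payload);
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // rethrows any I/O failure from a worker
            }
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Running it with 1, 2, 4 and 8 threads against a fresh directory each time gives a first impression of whether extra threads help on a particular drive and file system.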

If you create a lot of files and directories and remove them as quickly as you create them, you will surely thrash the cache and the buffers, no matter how many threads are doing it.

If you have more processes (not threads), you will get a bigger share of file system time.

Having a thread gather file read/write requests from other threads only duplicates the job of the OS, and the OS can usually schedule and reorder those requests better than anything you'd want to invest time in.

Answer 3 · Zhehao Mao · 2013-07-03 22:57:54Z

Umm, I'm not sure what you are attempting to do here, but are you sure your needs wouldn't be better served by using a database rather than accessing the file system directly? A database would handle multiple concurrent requests for you and help guard against race conditions.

Comment

The stored files can be quite big (1 GB+), although they are usually just a few kilobytes. Is there any lightweight database (or anything else) that can be embedded and also supports big files?
