How to handle file access transactions properly?

Question 1

I'm looking for good practices or any advice regarding file access transaction mechanisms. We will have multiple instances of an application spread over a redundant network (cloud) watching a directory on a 100% available NAS. I'm also looking for any alternative architecture that may be more appropriate.

We will have thousands of mobile devices accessing the system every minute. Each device produces a binary file (a picture, for example) that is sent to a RESTful service without transformation. The device must finish the upload quickly to preserve battery.
Since the RESTful service must accept the input as fast as possible, the binary file is currently written directly as a file on the 100% available NAS.
If an error occurs (the file is not written correctly), the client is notified and can try again.

We must build a worker service that watches the folder for new files and processes them. The worker will convert each file to another format, then move it to another folder on the NAS. We will run as many worker instances as needed to handle the load.

My main concern is about file access. The RESTful service is writing to a temporary file with an extension that the worker service is ignoring (.tmp), then after the last byte is written, the file is renamed. The process is quick, but we can't afford the case where 2 or more worker services access the same file at the same time.
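For reference, the write-then-rename step described above can be sketched as follows. This is a minimal illustration, not the actual service code: the function names `store_upload` and `visible_files` are hypothetical, and it assumes the rename is atomic on the target file system, which holds for a local POSIX file system but should be verified for the NAS protocol in use.

```python
import os
import tempfile


def store_upload(directory, final_name, payload):
    """Write payload to '<final_name>.tmp', then rename it to final_name."""
    tmp_path = os.path.join(directory, final_name + ".tmp")
    final_path = os.path.join(directory, final_name)
    with open(tmp_path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to storage before publishing
    # Assumption: rename is atomic here (true on a local POSIX file system).
    os.replace(tmp_path, final_path)
    return final_path


def visible_files(directory):
    """What a worker would see: every file except in-progress .tmp files."""
    return sorted(n for n in os.listdir(directory) if not n.endswith(".tmp"))


def demo():
    with tempfile.TemporaryDirectory() as d:
        # An unfinished upload stays invisible to workers.
        with open(os.path.join(d, "partial.jpg.tmp"), "wb") as f:
            f.write(b"\x00")
        store_upload(d, "photo.jpg", b"\xff\xd8\xff")
        return visible_files(d)
```

This only protects workers from reading half-written files. It does nothing to stop two workers from picking up the same finished file, which is the open problem.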

We need some transaction mechanism or any other enterprise architecture pattern.

Is there any kind of file system transaction mechanism that scales?

Question 2

Have you considered using a queue? The RESTful service puts all new filenames on the queue (Redis may work), and your workers pick them up atomically, one at a time.

Question 3

@MartinWickman: interesting. Can you elaborate on that in an answer? I'm not familiar with Redis.

Question 4

Redis is an optimized key-value store with support for lists, counters, hashes, etc. (think memcache++). Using it as a queue is straightforward, but I suppose any database could do the trick. Anyway, the nice thing is that you're not relying on quirks in your distributed file system to manage file locks and the like.
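The queue idea from the comments above can be illustrated with a small sketch. To keep it runnable without a server, Python's `queue.Queue` stands in for a Redis list; with Redis, the producer would typically push with `LPUSH` and workers would pop with `RPOP` or `BRPOP`. The function name `run_workers` and the file names are made up for the example.

```python
import queue
import threading


def run_workers(filenames, worker_count=4):
    """Hand each filename to one worker.

    Returns (claimed, duplicates): a dict mapping filename to worker id,
    and a list of filenames that were handed out more than once.
    """
    pending = queue.Queue()      # stand-in for a Redis list
    for name in filenames:       # the RESTful service would push here
        pending.put(name)

    claimed = {}
    duplicates = []
    lock = threading.Lock()      # protects the bookkeeping, not the queue

    def worker(worker_id):
        while True:
            try:
                # Atomic pop: each item goes to exactly one consumer.
                name = pending.get_nowait()
            except queue.Empty:
                return
            with lock:
                if name in claimed:
                    duplicates.append(name)
                claimed[name] = worker_id

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(worker_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed, duplicates
```

The point of the design is that the pop operation, not the file system, decides which worker owns a file.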


James Anderson · Answer 1 · 2012-05-09 09:57:12Z

A database, for example, would handle this situation transactionally, and almost any database except MS Access and SQLite will handle it more quickly, more reliably and more efficiently than file access.

Another possibility is to use one of the many available queue managers (RabbitMQ, WebSphere MQ, etc.), which would also ease any scheduling and load-balancing problems you have.
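A rough sketch of the database approach follows. The table layout (`files`, with `name`, `status` and `worker` columns) and the function names are assumptions made for illustration. SQLite is used only because it ships with Python and keeps the example self-contained; as noted above, a server database would be the realistic choice.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files ("
        " name TEXT PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " worker TEXT)"
    )
    return conn


def register(conn, name):
    """Called by the RESTful service once a file is fully written."""
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO files (name) VALUES (?)", (name,))


def claim_next(conn, worker_id):
    """Try to claim one pending file; return its name, or None.

    None means either that nothing is pending or that another worker
    claimed the candidate first; a caller would simply try again later.
    """
    with conn:
        row = conn.execute(
            "SELECT name FROM files WHERE status = 'pending' "
            "ORDER BY name LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status check in the WHERE clause makes this a compare-and-set:
        # only one worker can flip a given row from 'pending'.
        cur = conn.execute(
            "UPDATE files SET status = 'processing', worker = ? "
            "WHERE name = ? AND status = 'pending'",
            (worker_id, row[0]),
        )
        return row[0] if cur.rowcount == 1 else None
```

A worker would call `claim_next` in a loop and touch only the files it has successfully claimed.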

Great suggestion. The message could be the filename (instead of the file itself), so we have some guarantee that no two workers will access the same file at the same time.
– user2567, May 9, 2012 at 10:05

By default, most queue managers will deliver a message only once. The first requester, and only that requester, will get the message at the top of the queue. The next requester gets the next message, and so on.
– James Anderson, May 10, 2012 at 1:46

NoChance · Answer 2 · 2012-05-09 10:14:48Z

My understanding of your description of the RESTful service is:

Input:

1- A file (and its name)

Output:

1- A temp file

2- A renamed temp file

Problem:

Prevent access to the same file by two services.

Suggestion:

Based on this, if you name the temp file in output (1) with a unique name for every file, such as a GUID plus the file name, you get a safe and unique temp file name. Only the service instance that created the temp file knows that GUID. That same instance would use the same value to rename the temp file after it is moved away from the temp folder.
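One possible reading of this suggestion, sketched in Python: the GUID is generated per upload and kept in the final name as well, so two uploads with the same original file name never collide. The function name `save_with_unique_temp` and the `<guid>_<name>` naming scheme are assumptions for illustration, not something the answer specifies.

```python
import os
import tempfile
import uuid


def save_with_unique_temp(directory, original_name, payload):
    """Write under a GUID-prefixed temp name, then rename to a GUID-prefixed final name."""
    token = uuid.uuid4().hex  # known only to this call
    tmp_path = os.path.join(directory, token + "_" + original_name + ".tmp")
    final_path = os.path.join(directory, token + "_" + original_name)
    # Mode 'x' fails if the name already exists, so a clash cannot go unnoticed.
    with open(tmp_path, "xb") as f:
        f.write(payload)
    os.replace(tmp_path, final_path)
    return final_path


def demo():
    """Store two uploads that share the same original name."""
    with tempfile.TemporaryDirectory() as d:
        p1 = save_with_unique_temp(d, "photo.jpg", b"a")
        p2 = save_with_unique_temp(d, "photo.jpg", b"b")
        names = sorted(os.listdir(d))
        return os.path.basename(p1), os.path.basename(p2), names
```

Unique names keep concurrent writers from overwriting each other. On their own they do not stop two workers from picking up the same finished file, so this would likely be combined with a queue or database claim step.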
