I'm looking for good practices or any advice regarding file access transaction mechanisms. We will have multiple instances of an application spread over a redundant network (cloud) watching a directory on a 100% available NAS. I'm also looking for any alternative architecture that may be more appropriate.

  • We will have thousands of mobile devices accessing the system every minute. Each device produces a binary file (a picture, for example) that is sent to a RESTful service without transformation. The device must do the job quickly to preserve battery.
  • Since the RESTful service must accept the input as fast as possible, the binary file is currently written directly as a file on a 100% available NAS.
  • If an error occurs (file not being written correctly), the client is notified and can try again.

We must build a worker service that will watch the file folder for new files to process them. The worker will convert them to another format then move them to another folder on the NAS. We will use as many instances as possible to handle load.

My main concern is about file access. The RESTful service is writing to a temporary file with an extension that the worker service is ignoring (.tmp), then after the last byte is written, the file is renamed. The process is quick, but we can't afford the case where 2 or more worker services access the same file at the same time.
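A minimal sketch of the write-then-rename step described above, extended with one way a worker could claim a file by renaming it into its own folder. The function names and the per-worker folder are hypothetical, not part of the existing system. The sketch assumes that rename is atomic on the file system in use; that holds on a local POSIX file system, but it should be verified for the NAS protocol (NFS, SMB) before relying on it.

```python
import os
from typing import List, Optional


def publish_atomically(directory: str, final_name: str, payload: bytes) -> str:
    """Write payload to a .tmp file, then rename it so workers never see a partial file."""
    tmp_path = os.path.join(directory, final_name + ".tmp")
    final_path = os.path.join(directory, final_name)
    with open(tmp_path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to storage before the rename
    os.replace(tmp_path, final_path)
    return final_path


def visible_to_workers(directory: str) -> List[str]:
    """List the files a worker should consider, skipping in-progress .tmp files."""
    return sorted(
        name
        for name in os.listdir(directory)
        if not name.endswith(".tmp") and os.path.isfile(os.path.join(directory, name))
    )


def try_claim(incoming_dir: str, name: str, worker_dir: str) -> Optional[str]:
    """Claim a file by moving it into this worker's folder (hypothetical layout).

    If another worker already moved the file, the source no longer exists
    and the rename fails, so this worker gets None and skips the file.
    """
    os.makedirs(worker_dir, exist_ok=True)
    src = os.path.join(incoming_dir, name)
    dst = os.path.join(worker_dir, name)
    try:
        os.rename(src, dst)
    except FileNotFoundError:
        return None
    return dst
```

With this layout, each worker would call `try_claim` for every name returned by `visible_to_workers` and process only the files for which it gets a path back.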

We need some transaction mechanism or any other enterprise architecture pattern.

Is there any kind of file system transaction mechanism that scales?

Mat
asked May 9, 2012 at 8:23
  • Have you considered using a queue? The RESTful service puts all new filenames on the queue (Redis may work), which your workers can pick up, atomically, one at a time. Commented May 9, 2012 at 9:02
  • @MartinWickman: interesting. Can you elaborate on that in an answer? I'm not familiar with Redis. Commented May 9, 2012 at 9:03
  • Redis is an optimized key-value store with support for lists, counters, hashes, etc. (think memcache++). Using it as a queue is straightforward, but I suppose any database could do the trick. Anyway, the nice thing is that you're not relying on quirks in your distributed file system to manage file locks, etc. Commented May 9, 2012 at 9:12

2 Answers


A database, for example, would handle this situation transactionally, and almost any database except MS Access and SQLite will handle it more quickly, more reliably, and more efficiently than file access.

Another possibility is to use one of the many available queue managers (RabbitMQ, WebSphere MQ, etc.), which would also ease any scheduling and load-balancing problems you have.

answered May 9, 2012 at 9:57
  • Great suggestion. The message could be the filename (instead of the file itself), so we are more or less guaranteed that no two workers will access the same file at the same time. Commented May 9, 2012 at 10:05
  • By default, most queue managers will deliver a message only once. The first requester, and only that requester, will get the message at the top of the queue. The next requester gets the next message, and so on. Commented May 10, 2012 at 1:46
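
The "filename as message" idea from these comments can be sketched as follows. This is only an illustration: Python's in-process `queue.Queue` stands in for a real broker such as RabbitMQ or Redis, the function name `process_all` is hypothetical, and acknowledgement or redelivery after a worker crash is not modelled.

```python
import queue
import threading
from typing import List, Tuple


def process_all(filenames: List[str], worker_count: int = 4) -> List[Tuple[int, str]]:
    """Hand each filename to exactly one worker and record which worker got it."""
    jobs: "queue.Queue[str]" = queue.Queue()
    for name in filenames:
        jobs.put(name)  # the producer enqueues the filename, not the file itself

    handled: List[Tuple[int, str]] = []
    handled_lock = threading.Lock()

    def worker(worker_id: int) -> None:
        while True:
            try:
                # get_nowait removes the item, so no other worker can receive it
                name = jobs.get_nowait()
            except queue.Empty:
                return
            # A real worker would convert the file and move it here.
            with handled_lock:
                handled.append((worker_id, name))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled
```

Because taking an item off the queue removes it, the workers never need to coordinate through the file system.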

My understanding of the RESTful service, from your description, is:

Input:

1- A file (and its name)

Output:

1- A temp file

2- A renamed temp file

Problem:

Prevent two services from accessing the same file at the same time.

Suggestion:

Based on this, if you give the temp file in output (1) a unique name for every file, such as a GUID plus the original file name, you get a safe and unique temp file name. Only the service instance that created the temp file knows that GUID. That same instance would use the same value to rename the temp file after it is moved out of the temp folder.
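
One way to build such a name, as a rough sketch: the helper `unique_tmp_name` and the `<guid>_<name>.tmp` layout are assumptions made for illustration, not something specified above.

```python
import os
import uuid


def unique_tmp_name(original_name: str) -> str:
    """Return a temp file name of the form <guid>_<original name>.tmp."""
    # basename drops any directory part a client may have sent along
    safe_name = os.path.basename(original_name)
    return "%s_%s.tmp" % (uuid.uuid4().hex, safe_name)
```

The instance that calls `unique_tmp_name` keeps the returned value, so it is the only one that can later locate and rename that temp file.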

answered May 9, 2012 at 10:14
