I'm attempting to cache contents of a file to avoid frequent reads, using a very simple implementation in a single class. The file contains a list of emailids, one per line. The file changes rarely, and the cache should be updated immediately when it does. So I've checked the file's timestamp in each read operation. The class will be used in servlets and rest apis - as member variable. Although I've synchronized the only public method of the class, I'm not sure whether the code is threadsafe. Do I need to add/change something here to ensure thread safety ?
public class CachedFile {
private File file ;
private long lastModified ;
private List<String> fileLines ;
private boolean fileCached ;
public CachedFile( final String inputFileName )
{
file = new File( inputFileName ) ;
fileCached = false ;
}
public synchronized List<String> getLines() throws IOException
{
//if file is not cached or if its modified after the last read, read & cache it
if( fileCached == false || fileModifiedAfterLastRead() )
{
readFile() ;
}
return fileLines ;
}
private void readFile() throws IOException
{
//read file to cache
fileLines = FileUtils.readLines( file ) ;
//cache the last modified time
lastModified = file.lastModified() ;
//set the cached flag to true
fileCached = true ;
}
private boolean fileModifiedAfterLastRead()
{
return( file.lastModified() > lastModified ) ;
}
}
2 Answers 2
Since you mentioned that the file could be written by another process (assuming in another jvm), there is no way you can introduce the Object monitor based thread safety. So you can depend upon the last modified time of the file object. You can also remove the synchronization
from your getLines()
if you dont return the instance variable. But you do return that so synchronization
should be there in getLines().
Basically you want to read the file without the data corruption. That could happen if another process is writing to the file when your process reads from it. If that is the case, you want to re-read the file. Here is how I would implement it.
private void readFile() throws IOException {
int retryCount = 5;
while (retryCount > 0) {
long beforeRead = file.lastModified();
//read file to cache
fileLines = FileUtils.readLines(file);
if (beforeRead == file.lastModified())
break;
retryCount--;
}
//cache the last modified time
lastModified = file.lastModified() ;
//set the cached flag to true
fileCached = true ;
}
The while loop will retry the read operation if there is a mismatch. Also the retry has to be finite number to avoid looping forever in case of any file system failures.
-
\$\begingroup\$ Thanks Tiny Rick. But if I don't synchronize, will the list fileLines have consistent data ? \$\endgroup\$amolkul– amolkul2017年10月31日 08:59:27 +00:00Commented Oct 31, 2017 at 8:59
-
\$\begingroup\$ Thats what we made sure in the while loop. We checked if the data has been changed at the time our read operation by once again checking the modified time with our pre-read modified time. So answering your question.. Yes without synchronization your list of file lines will be consistent with the file. \$\endgroup\$Tiny Rick– Tiny Rick2017年10月31日 09:25:13 +00:00Commented Oct 31, 2017 at 9:25
-
\$\begingroup\$ The retry will never work like it is now though. Update the
beforeRead
right beforeretryCount--;
That way you check if it has stayed the same in that retry attempt. \$\endgroup\$Imus– Imus2017年10月31日 10:19:20 +00:00Commented Oct 31, 2017 at 10:19 -
\$\begingroup\$ Thanks @Imus . Now beforeRead will be initialized just before the actual read operation. \$\endgroup\$Tiny Rick– Tiny Rick2017年10月31日 10:41:58 +00:00Commented Oct 31, 2017 at 10:41
-
\$\begingroup\$ The
synchronized
is still necessary. Otherwise one thread might see incomplete lines in the list. Read the whole Java Memory Model documentation for more information. \$\endgroup\$Roland Illig– Roland Illig2017年10月31日 12:54:33 +00:00Commented Oct 31, 2017 at 12:54
Your cache is fine and correctly synchronized.
One thing I would change is to remove the throws IOException
from the public method. Just for convenience of the calling code.
For the process that updates the file, make sure that it updates the file atomically. That is, first write the data to a temporary file and then rename that to the final file.
-
\$\begingroup\$ How else would you handle IO errors? \$\endgroup\$200_success– 200_success2017年11月01日 05:25:34 +00:00Commented Nov 1, 2017 at 5:25
-
\$\begingroup\$ In the constructor, load the file or fail (depending on the application needs). Then, after the initial loading has succeeded, if loading the file should ever fail, just log that and keep the previous list. Add some monitoring for the logging. Rationale: since there are probably several places in the code where the cache is used, it would not be correct to do the error handling there. \$\endgroup\$Roland Illig– Roland Illig2017年11月01日 07:14:51 +00:00Commented Nov 1, 2017 at 7:14
-
\$\begingroup\$ If reading fails, writing to a log file might also fail. The bigger problem, though, is that the rest of the consumer application won't be able to report the error; it has to learn of the staleness of the data in a roundabout way. That is a much more onerous design than just propagating an IOException. \$\endgroup\$200_success– 200_success2017年11月01日 07:20:53 +00:00Commented Nov 1, 2017 at 7:20
-
\$\begingroup\$ @RolandIllig +1 for mentioning atomic updates \$\endgroup\$amolkul– amolkul2017年11月01日 07:40:34 +00:00Commented Nov 1, 2017 at 7:40
file
might no longer be valid. \$\endgroup\$