Concerning the optimization of this simple function
String formatDate(String format, Locale locale, Date date) {...}
formatting a Date
using a SimpleDateFormat
created with the arguments format
and locale
. I'm curious what you thing about the code below. I know it's terrible, but let's assume two things
- The code must be as fast as possible.
- There's no suitable object to hold the cache.
- It must not cause any memory leaks.
Because of 1., a thread local cache should be used. However, the passed format and arguments may differ from those used in the cache and this must be checked. Funnily, although SimpleDateFormat
is mutable (which is the cause of all problems), there seem to be no way how to change its format
or locale
. Actually, the parameters hardly ever change, but there's no guarantee.
Because of 2., the cache must be a static field.
Because of 3., no ThreadLocal.initialValue
can be used as it'd lead to memory leaks just like in this question. For the same reason no custom class may be put into the ThreadLocal
.
This all is pretty restrictive. So I wrote the following ugliness
private String toString(Date obj, String format, Locale locale) {
return getSimpleDateFormat(format, locale).format(obj);
}
private SimpleDateFormat getSimpleDateFormat(String format, Locale locale) {
Object[] list = SDF.get();
// An identity check is faster and correct as missing an entry is allowed.
// It's surely good enough as passing equals but not same parameters hardly ever happens.
if (list!=null && list[0] == format && list[1] == locale) {
return (SimpleDateFormat) list[2];
}
final SimpleDateFormat result = new SimpleDateFormat(format, locale);
list = new Object[] {format, locale, result};
SDF.set(list);
return result;
}
/** Contains triples (String format, Locale locale, SimpleDateFormat sdf) */
private static final ThreadLocal<Object[]> SDF = new ThreadLocal<Object[]>();
I don't care about naming and such, as this was just a quick implementation of an idea. I'm curious if there's a substantially better solution satisfying all the above conditions.
If you say, it's premature optimization, I'll fully agree, but let's assume for a while it must be done. If you say, that's hardly faster than no caching, then show me your figures.
Obviously, I'm caching just one value per thread. It should be enough in many cases like producing a CSV with one one or more dates per row as I haven't yet see them to be formatted differently.
However, time is "date" too (at least in Java), and a CSV line containing both date and time in separate columns is rather common and this would render my poor man's cache fully ineffective.
5 Answers 5
What about using a WeakHashMap
? The memory leak is not really an issue since the memory can be reclaimed at any time.
Also, if I were using a map, I would define a FormatAndLocale
util class, with valid hashCode
and equals
, to be used as keys.
-
\$\begingroup\$ And what about thread safety? Do you mean combined with
ThreadLocal
? If so, please elaborate. \$\endgroup\$maaartinus– maaartinus2014年09月21日 23:42:45 +00:00Commented Sep 21, 2014 at 23:42 -
\$\begingroup\$ Yes, I meant like you are doing, but using a WeakHashMap instead of only caching one value. I believe weak references were introduced for caching, just like you are doing. \$\endgroup\$toto2– toto22014年09月21日 23:44:57 +00:00Commented Sep 21, 2014 at 23:44
Your above solution seems to be violating your 3rd requirement.
- It must not cause any memory leaks.
How are you cleaning the cache when the thread dies.
Also your approach is more like ThreadLocal only but with the added contention of LocadingCache and no GC (because you don't really know when any thread dies).
If your server provides a method which is invoked when the app is undeployed, see if you can use the same to your advantage.
-
\$\begingroup\$ When the thread dies, everything is fine. Don't blame me for leaks created by a webserver. There may be a sort of memory leak, but just a very tiny one; not the whole app as usual with webservers, it's just a few bytes, as all the classes used belong to the boot classloader. The Cache is designed to work with multiple threads well. Unlike my TL approach, the Cache works even when multiple formats are needed (like in "format date, format time, format date, format time, ...."). \$\endgroup\$maaartinus– maaartinus2017年09月01日 11:00:22 +00:00Commented Sep 1, 2017 at 11:00
My suggestion is to use a simple synchronized Map:
private static final Map<String, Map<Locale, SimpleDateFormat>> cache = Collections.synchronizedMap(new HashMap<String, Map<Locale, SimpleDateFormat>>());
public static String formatDateDefault(String format, Locale locale, Date date) {
SimpleDateFormat dateFormat = new SimpleDateFormat(format, locale);
return dateFormat.format(date);
}
public static String formatDateCached(String format, Locale locale, Date date) {
Map<Locale, SimpleDateFormat> map = cache.get(format);
if (map == null) {
cache.put(format, map = Collections.synchronizedMap(new HashMap<Locale, SimpleDateFormat>()));
}
SimpleDateFormat dateFormat = map.get(locale);
if (dateFormat == null) {
map.put(locale, dateFormat = new SimpleDateFormat(format, locale));
}
return dateFormat.format(date);
}
public static void main(String[] args) {
String[] formats = { "EEE MMM dd HH:mm:ss zzz yyyy", "dd-MM-yyyy", "dd/MM/yy", "dd-MM-yy:HH:mm:SS", "dd-MM-yy:HH:mm:SS Z" };
Locale[] locales = { Locale.CANADA, Locale.GERMANY, Locale.FRANCE, Locale.ITALY };
long s, e;
Random r = new Random();
int maxIterations = 1_000_000;
int maxWarmUPIterations = 10_000;
// WARMUP
for (int i = 0; i < maxWarmUPIterations; i++) {
formatDateDefault(formats[i % formats.length], locales[i % locales.length], new Date(System.currentTimeMillis() + r.nextInt(Integer.MAX_VALUE)));
}
for (int i = 0; i < maxWarmUPIterations; i++) {
formatDateCached(formats[i % formats.length], locales[i % locales.length], new Date(System.currentTimeMillis() + r.nextInt(Integer.MAX_VALUE)));
}
// GO!!!
s = System.currentTimeMillis();
for (int i = 0; i < maxIterations; i++) {
formatDateDefault(formats[i % formats.length], locales[i % locales.length], new Date(System.currentTimeMillis() + r.nextInt(Integer.MAX_VALUE)));
}
e = System.currentTimeMillis();
System.out.println((e - s) + " ms.");
s = System.currentTimeMillis();
for (int i = 0; i < maxIterations; i++) {
formatDateCached(formats[i % formats.length], locales[i % locales.length], new Date(System.currentTimeMillis() + r.nextInt(Integer.MAX_VALUE)));
}
e = System.currentTimeMillis();
System.out.println((e - s) + " ms.");
}
Output:
4484 ms.
1478 ms.
-
\$\begingroup\$ The synchronization may or may not help, as it still allows two different threads to use the same SDF at the same time. This may work, or not, depending on what way the SDF is unsafe. \$\endgroup\$maaartinus– maaartinus2014年09月21日 04:31:08 +00:00Commented Sep 21, 2014 at 4:31
-
\$\begingroup\$
SimpleDateFormat
is not thread-safe. So while you're safely managing the collection of formatters, you're allowing multiple threads to access the same instance which is unsafe. Thus the use ofThreadLocal
to segregate the instances. \$\endgroup\$David Harkness– David Harkness2014年09月21日 19:57:37 +00:00Commented Sep 21, 2014 at 19:57 -
\$\begingroup\$ By inspecting the code of
SimpleDateFormat
we might find out that it's thread-safe w.r.t. to using it with the same format and locale and changing the date only. But for sure, we'd see that it's pretty terrible piece of code. Besides avoiding SDF, the best solutions seems to be mine (3-part cache key). \$\endgroup\$maaartinus– maaartinus2014年09月27日 22:09:12 +00:00Commented Sep 27, 2014 at 22:09
My suggestion is to use LinkedHashMap instead of ThreadLocals and WeakReferences.
private static final Integer MAX_ENTRIES = new Integer(30);
private static final Map<Long, SimpleDateFormat> CACHE = Collections.synchronizedMap(new LinkedHashMap<Long, SimpleDateFormat>(MAX_ENTRIES + 1, 0.75F, true) {
@Override protected boolean removeEldestEntry(Map.Entry<Long, SimpleDateFormat> eldest) {
return size() > MAX_ENTRIES;
}
});
private String toString(Date obj, String format, Locale locale) {
return getSimpleDateFormat(format, locale).format(obj);
}
private SimpleDateFormat getSimpleDateFormat(String format, Locale locale) {
Long id = Long.valueOf(String.valueOf(format.hashCode()) + Math.abs(locale.hashCode()));
SimpleDateFormat sdf = CACHE.get(id);
if (sdf == null) {
synchronized (MAX_ENTRIES) {
sdf = CACHE.get(id);
if (sdf == null) {
sdf = new SimpleDateFormat(format, locale);
CACHE.put(id, sdf);
}
}
}
return sdf.clone();
}
-
1\$\begingroup\$ You're synchronizing the cache, but you hand out the
SimpleDateFormat
to multiple threads. Funnily, you're synchronizing on a public object (the autoboxed 30 comes fromInteger.cache
. \$\endgroup\$maaartinus– maaartinus2014年09月26日 16:33:09 +00:00Commented Sep 26, 2014 at 16:33
This is what I did in the end. As pointed by others, my single element cache would be totally ineffective if multiple formats would be used in an interleaved fashion (formatting a date differently is probably rare, but using both date and time is rather common).
There's a LoadingCache in Guava, so I went for it. It needs a composed key, so I created a tiny class and used Lombok so I didn't go mad when writing all the boring stuff.
As the stupid class (just in case it's unclear, I mean SimpleDateFormat
) is documented to be thread-unsafe, the same SimpleDateFormat
mustn't be handled to different threads. So my CacheKey
contains the thread id as well. I'd bet, this is simpler and faster than putting a Cache
into a ThreadLocal
or the other way round.
final class DateHelper2 {
@RequiredArgsConstructor @EqualsAndHashCode
private static final class CacheKey {
private final Thread thread;
private final String format;
private final Locale locale;
}
static String format(String format, Locale locale, Date date) {
return get(format, locale).format(date);
}
private static SimpleDateFormat get(String format, Locale locale) {
CacheKey cacheKey = new CacheKey(Thread.currentThread(), format, locale);
return loadingCache.getUnchecked(cacheKey);
}
private static final LoadingCache<CacheKey, SimpleDateFormat> loadingCache =
CacheBuilder
.newBuilder()
.maximumSize(1000)
.build(new CacheLoader<CacheKey, SimpleDateFormat>() {
@Override public SimpleDateFormat load(CacheKey key) {
return new SimpleDateFormat(key.format, key.locale);
}
});
}
Is it worth the effort?
Surely, using FastDateFormat
would be much easier, but let's assume it doesn't exists. I ran a single-threaded benchmark with a one (right) or two (left) SDFs
and got mostly expected results:
results
The red bars are for no caching at all, while the green ones corresponds with manually creating the SDF before the loop. You can see that there's nearly a factor of 4 between them. The solution from this answer is in yellow, not very far from the optimum. In case a single SDF is needed, the approach from my question is even better (blue).
There's also a funny method in SDF
StringBuffer format(Date date, StringBuffer toAppendTo, FieldPosition fieldPosition)
which should be faster, and it usually is, but only a bit and not always. The magenta bars show an inexplicable factor 2 slowdown, when it gets used together with Guava's cache. So I'd suggest to forget about StringBuffer
and related methods.
Update
First I was using threadId, but this makes little sense, as Thread::getId()
is not final
. Edited to using Thread
directly.
Explore related questions
See similar questions with these tags.
DateFormat.getDateInstance(style, locale)
aren't good enough for you? \$\endgroup\$format("Drink %d beers on %[yy-mm-dd]", 6, tomorrow)
, so there's no use for the standard instance. While supporting those standard formats makes sense, too, I'd bet they suffer from the same problem and you need some TL caching, too. I guess, I should benchmark it. \$\endgroup\$ThreadLocal
causes this problem. \$\endgroup\$