Understanding JVM soft references for great good (and building a cache)

by Michael Pollmeier on April 2, 2019

There are plenty of good and popular caching libraries on the JVM, including ehcache, guava and many others. However in some situations it’s worth exploring other options. Maybe you need better performance. Or you want to allow the cache to grow and fill up the entire heap, yet shrink automatically when your application needs more space elsewhere. Typical caching libraries require you to define a static upper bound, which means you’ll need to be very conservative with sizing your cache and still risk running into an OutOfMemoryError. They also need to keep track of the size of their elements themselves, which is potentially expensive.

One alternative are soft references. There are many great articles about the different reference types: strong, soft, weak and phantom. In a nutshell, strongly referenced objects (the common case, e.g. `String s = “abc”`) are never collected by the garbage collector, and can therefor lead to an OutOfMemoryError if you allocate more than fit onto your heap. In contrast, softly referenced objects are collected as a last resort before an OutOfMemoryError is thrown.

You can create a soft reference using
SoftReference<String> softRef = new SoftReference<>(“abc”)

To access the underlying object just call softRef.get(), which may return null. Note that if you (additionally) hold a strong reference to the same underlying object, it’s not (only) softly referenced any more and can’t be automatically freed.

‘The internet’ often discourages the use of soft references, typically without giving a good explanation, so I gave them a try and actually found them to be a good tool to have at my disposal. If you understand how soft references work, you can quite easily build a very simple and efficient cache, which has excellent performance and uses and frees up memory as required in other parts of your application. Before we look at that, let’s discuss some common pitfalls with soft references.

Sponsorships Available

1. don’t trust the javadoc:

All soft references to softly-reachable objects are guaranteed to have been freed before the virtual machine throws an OutOfMemoryError

That’s a lie. It was true when soft references were first introduced in java 1.2, but from java 1.3.1 the jvm property -XX:SoftRefLRUPolicyMSPerMB was introduced. It defaults to 1000 (milliseconds), meaning that if there’s only 10MB available heap, the garbage collector will free references that have been used more than 10s ago. I.e. everything else will not be freed, leading to an OutOfMemoryError, breaking the guarantee from the javadoc (I’ll try to get that changed).
No problem, let’s just set it to -XX:SoftRefLRUPolicyMSPerMB=0 and the javadoc is suddenly true again.

2. it’s all or nothing

When the GC figures that memory is running low and it better frees some softly referenced objects, it will free all of them. This will make our cache very inefficient, because it’s expensive to recreate those objects. It would be better if the GC would only free a small portion of the available soft references.

Working around those issues to build a simple yet efficient cache

Since 1) is easily fixed, how do we fix the issue that the GC frees all soft references? A very straightforward (if not the most efficient) approach is to simply hold additional strong references to the objects you don’t want to get freed.
Obviously, we need to ensure that we drop these strong references if more memory is needed, e.g. when other softly referenced objects have been freed. That’s why we override the finalized method, which gets invoked when the GC frees an object. As long as we always have some softly referenced objects, we’ll not run into an OutOfMemoryError.

Note that we could also perform other actions in finalized, like serializing the object somewhere. If you do, keep in mind that it must be a fast operation that doesn’t require allocating a lot of additional memory, otherwise we’re risking to run out of memory again.

Summary

Soft references are a simple and powerful concept on the JVM, and it’s very useful to understand how they work. That’s true not only if you want to build your own “poor man’s cache”, but also if you use “proper” caching libraries. Some even have the option to use soft references internally, in which case it’s essential to understand the caveats detailed above.

Your default choice should still be a caching library, but if you need better performance, or want the cache to grow and fill up the entire heap, yet shrink automatically when your application needs more space elsewhere, soft references may be for you. Beware that they are rather low level, and come with their own set of tradeoffs. As always: choose the best tool for the job, and keep soft references in the back of your head (literally).

Here’s a simple example for such a cache, if you want to try this yourself:

<a href="https://medium.com/media/6910efedc09d2fdaf3edc6ae51360d8e/href">https://medium.com/media/6910efedc09d2fdaf3edc6ae51360d8e/href</a>

Understanding JVM soft references for great good (and building a cache) was originally published in ShiftLeft Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.