Once Upon a Class

This is a story about classes. A story that harks back to early days, both historically and metaphorically, and it begins with a mystery.

I’ve been trying some more things, and something fairly unexpected is that with a CIDER/nREPL setup, each evaluation adds an extra classloader 🙈

⸺ Arne, Clojureverse, 2018

An account of this confounding observation was made back in 2012 on the Clojure mailing list. Vladimir started a thread titled class loaders stack constant grow in REPL, and concluded:

I think, the main problem is nobody has ever tried to write an article «Class loading in Clojure». If such article existed, it would make life much easier for many developers.

⸺ Vladimir, Clojure mailing list, Dec 10, 2012

Our exploration of the topic makes heavy use of a REPL, so I invite you to fire one up and play along. Let’s start by examining the class loader hierarchy.

(->> (.. java.lang.Thread currentThread getContextClassLoader)
     (iterate #(.getParent %))
     (take-while identity))

You’ll notice Clojure’s DynamicClassLoader, plus the troika of built-in class loaders at the top. If you see only two out of the three triumvirs, don’t fret. The Primordial class loader is responsible for bootstrapping Java’s core classes. It is written in native code, out of reach, represented by nil.

(nil? (.getClassLoader java.lang.Class))

Object, String, Long and List are also core classes.

(every? #(nil? (.getClassLoader (class %))) [4 "hello" (java.lang.Object.) (java.util.Collections/EMPTY_LIST)])

Other classes are loaded by the Platform class loader.

(.getName (.getClassLoader (class (java.sql.Time. 1 1 1))))
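The lookups so far follow one pattern: ask a class for its loader, and treat nil as the Primordial loader. Here is a minimal sketch of that pattern as a helper. The name `loader-name` and the "bootstrap" label are my own inventions, and the code assumes Java 9 or later, where class loaders carry a name.

```clojure
;; Hypothetical helper, not part of Clojure: names the loader that defined a class.
;; Assumes Java 9+, where ClassLoader has a getName method.
(defn loader-name [^Class c]
  (if-let [cl (.getClassLoader c)]
    (or (.getName ^ClassLoader cl) (str cl)) ; unnamed loaders return nil from getName
    "bootstrap"))                            ; nil stands for the Primordial loader

;; Pair each class name with the name of its loader.
(map (juxt #(.getName ^Class %) loader-name)
     [String java.sql.Time])
```

Any class can be passed in, which makes it easy to probe the classes of your own dependencies.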

The classes that you care most about, those that you declare as dependencies, are loaded by the Application class loader. It is the one that loads jars and resources on the class path. Clojure’s classes are loaded by the Application class loader.

(str (.getName (class {})) " was loaded by " (.getName (.getClassLoader (class {}))))
"clojure.lang.PersistentArrayMap was loaded by app"

Finally, at the REPL we are creating new classes all the time. We may not notice it, but we do.

(defn foo [x] (+ x x))

To us, foo is a function, but to the JVM it is a class.

(class foo)

Let’s have a look at its class hierarchy.

(->> (class foo)
  (iterate #(.getSuperclass %))
  (take-while identity))
(user$foo clojure.lang.AFunction clojure.lang.AFn java.lang.Object)

Contrary to the previous examples, foo was not present when the JVM started. It’s a brand new class loaded at runtime. This is an important observation. Indeed, the Java class model is designed in such a way that it need not know ahead of time the classes it is going to load and run.

Note: Java was born as Greentalk, a nod to Smalltalk because that was the state of the art in terms of virtual machine and JIT compilation. For a time the system was projected to be running on set-top boxes. The need for runtime class loading was anticipated as classes were going to travel across the wire.

So who is responsible for loading Clojure code on-the-fly?

(class (.getClassLoader (class foo)))

When you create foo at the REPL, Clojure’s compiler emits bytecode for consumption by DynamicClassLoader. It will create a new class with the defineClass method before linking it.

Note: Linking is the process of taking a class and combining it into the run-time state of the Java Virtual Machine so that it can be executed.

Once a class loader links a class, it is final. Attempting to define the same class again in that loader fails with a LinkageError rather than replacing it. Imagine if your first attempt at writing foo was the last one allowed! To work around this limitation, a new DynamicClassLoader is created for each evaluation. This is the hat trick that Clojure pulls off to ensure that the user is able to override existing classes, not merely create new ones.
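The effect of one loader per evaluation can be observed directly. The sketch below defines a function twice and compares the classes behind the two definitions. The names `twice`, `first-class` and `second-class` are hypothetical, chosen only for this illustration.

```clojure
;; `twice` is a hypothetical function used only for this illustration.
(defn twice [x] (+ x x))
(def first-class (class twice))   ; remember the class behind the first definition

(defn twice [x] (* 2 x))          ; redefine it, as one would at the REPL
(def second-class (class twice))

{:same-name?   (= (.getName first-class) (.getName second-class))
 :same-class?  (identical? first-class second-class)
 :same-loader? (identical? (.getClassLoader first-class)
                           (.getClassLoader second-class))}
;; => {:same-name? true, :same-class? false, :same-loader? false}
```

Two classes share a name yet remain distinct, because each was defined by its own loader.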

The compiler tracks the current instance of DynamicClassLoader in clojure.lang.Compiler/LOADER, while DynamicClassLoader tracks its classes via a cache. The latter is backed by a reference queue, helping the garbage collector do its job. We can peek into it via the Reflection API.

(defn inspect-cache []
  (let [cache (.getDeclaredField clojure.lang.DynamicClassLoader "classCache")]
    (.setAccessible cache true)
    (.get cache nil)))

This will reveal a mapping between the names of the generated classes and soft references. If you redefine foo at the REPL, the soft reference associated with foo in the cache will be updated.
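To follow a single entry from name to class, the reference has to be dereferenced. Here is a minimal sketch under the same assumption as inspect-cache, namely that the private field is called classCache, an implementation detail that may change between Clojure versions. The names `cached-class` and `probe` are hypothetical.

```clojure
;; Hypothetical helper: looks up a generated class by name in the static cache.
;; Relies on the private field "classCache", which is an implementation detail.
(defn cached-class [class-name]
  (let [field (doto (.getDeclaredField clojure.lang.DynamicClassLoader "classCache")
                (.setAccessible true))
        ^java.util.Map cache (.get field nil)]   ; nil receiver: the field is static
    (when-let [soft-ref (.get cache class-name)]
      (.get ^java.lang.ref.Reference soft-ref)))) ; unwrap the reference

(defn probe [x] x)   ; hypothetical function to look up

(identical? (class probe)
            (cached-class (.getName (class probe))))
;; => true
```

Redefining probe and evaluating the last form again should still give true, since the cache entry follows the latest definition.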

A new class loader instance is used for every top-level form.

(defn foo [x] (identity x))
(defn bar [y] (identity y))
(= (.getClassLoader (class foo)) (.getClassLoader (class bar)))
| #'user/foo |
| #'user/bar |
| false      |

Compare and contrast.

(let [foo (fn [x] (identity x))
      bar (fn [y] (identity y))]
  (= (.getClassLoader (class foo)) (.getClassLoader (class bar))))

We’ve shown how class loader instances are being repeatedly created at the REPL, and it sounds like an explanation for the mystery we mentioned at the start. It is not. Let’s take a closer look at the observation that has befuddled inquisitive developers since 2012. It is worth reproducing the experiment in a plain Clojure REPL and an nREPL client side by side.

Upon launching a REPL, Clojure replaces the context class loader with a class loader of its own. The following is the first line of clojure.main/repl’s source code.

(let [cl (.getContextClassLoader (Thread/currentThread))]
  (.setContextClassLoader (Thread/currentThread) (clojure.lang.DynamicClassLoader. cl)))

This translates to: instead of using the default Application class loader on the REPL thread, use mine. The expected behavior is that the context class loader is set once, unlike the per-evaluation class loader stored in clojure.lang.Compiler/LOADER. The relationship between the context class loader and the fleeting instances referenced by clojure.lang.Compiler/LOADER is that of parent-child. In Java, the parent-delegation model is canonical.

The delegation model requires that any request for a class loader to load a given class is first delegated to its parent class loader before the requested class loader tries to load the class itself. The parent class loader, in turn, goes through the same process of asking its parent. This chain of delegation continues through to the primordial class loader. A ClassNotFoundException is thrown if no class loader was privy to the requested class.

Custom class loaders are at liberty to implement different strategies. Clojure’s approach is having DynamicClassLoader check its cache first, see if it can load a class and delegate otherwise.

protected Class<?> findClass(String name) throws ClassNotFoundException {
    Class c = findInMemoryClass(name);
    if (c != null)
        return c;
    else
        return super.findClass(name);
}
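The cache-first lookup can be watched from the Clojure side. The sketch below asks a brand-new DynamicClassLoader, one that never defined anything itself, for a generated class by name. It relies on the cache being shared between instances, which is my reading of the fact that inspect-cache reads it as a static field. The name `cached-fn` is hypothetical.

```clojure
(defn cached-fn [x] x)   ; hypothetical function; compiles to a generated class

(let [fresh-loader (clojure.lang.DynamicClassLoader.)  ; has defined no class of its own
      class-name   (.getName (class cached-fn))]
  ;; The fresh loader hands back the very class the compiler generated.
  (identical? (class cached-fn)
              (.loadClass fresh-loader class-name)))
;; => true
```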

In a regular Clojure REPL, the context class loader is the direct parent of the class loader instances generated by Clojure’s compiler. Not so in an nREPL client.

(= (.getContextClassLoader (Thread/currentThread)) (.getParent (.getClassLoader (class foo))))

Run the snippets above and below in a default REPL and in an nREPL client side by side. Then rinse and repeat.

(hash (.getContextClassLoader (Thread/currentThread)))

In the default REPL, the same instance of DynamicClassLoader stays associated with the REPL throughout the session. In an nREPL client, instances of DynamicClassLoader keep piling up.

(count (->> (.. java.lang.Thread currentThread getContextClassLoader)
   (iterate #(.getParent %))
   (take-while #(instance? clojure.lang.DynamicClassLoader %))))

Clojure’s entry point to the REPL is clojure.main/repl. However, for historical reasons, nREPL runs clojure.main/repl for every evaluation. It isn’t a long-running process like Clojure’s native REPL. Since clojure.main/repl starts by setting the context class loader to a new instance of DynamicClassLoader, we end up with an unbounded stack of class loaders.
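The stacking can be simulated without nREPL and without touching the thread’s real context class loader. The sketch below wraps a loader three times, the way three successive calls to clojure.main/repl would, and measures how much deeper the chain became. The helpers `dcl-depth` and `wrap` are hypothetical names.

```clojure
;; Hypothetical helpers for illustration; the thread's context loader is only read.
(defn dcl-depth [^ClassLoader cl]
  (->> cl
       (iterate #(.getParent ^ClassLoader %))
       (take-while #(instance? clojure.lang.DynamicClassLoader %))
       count))

(defn wrap [^ClassLoader parent]
  ;; Mirrors the first line of clojure.main/repl: a new loader on top of the old one.
  (clojure.lang.DynamicClassLoader. parent))

(let [base    (.getContextClassLoader (Thread/currentThread))
      stacked (nth (iterate wrap base) 3)]   ; three simulated evaluations
  (- (dcl-depth stacked) (dcl-depth base)))
;; => 3
```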

This is unfortunate, but remember that nREPL is a major community effort to elevate Clojure’s REPL experience. In a native REPL, it is not possible to interrupt an evaluation. nREPL brings that capability, at the cost of the quirk that we’ve described. Colin Jones reported the issue in 2012. Naturally, solutions were envisaged.

Yes, this should be fixed upstream; a new DynamicClassLoader should only be set as the thread-context classloader if one is not already in place…

⸺ Chas Emerick, issue 8, nREPL repository.

A downstream solution, maybe.

I think nREPL will end up having to stop using clojure.main/repl, and maintain a modified version of it itself (something I wanted to avoid exactly so as to benefit from the changes to clojure.main/repl from version to version of Clojure).

⸺ Chas Emerick, NREPL-31, Jira.

Ultimately, pragmatism prevailed.

Some years on, and it’s clear that this is fundamentally a minor problem (insofar as hardly anyone has complained AFAIK)…

⸺ Chas Emerick, issue 8, nREPL repository.

How come nobody is complaining? That’s because there are no side-effects apart from the redundant allocation of objects. At a cost of 112 bytes per instance of DynamicClassLoader, the increased memory usage isn’t immediately noticeable.
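A back-of-the-envelope calculation puts that figure in perspective. It takes the 112 bytes quoted above at face value. The evaluation counts are made up for illustration, and `bytes-per-loader` is a hypothetical name.

```clojure
;; 112 bytes is the per-instance figure quoted above; the evaluation counts are invented.
(def bytes-per-loader 112)

(for [evaluations [1000 100000 1000000]]
  [evaluations (quot (* evaluations bytes-per-loader) 1000)]) ; kilobytes, rounded down
;; => ([1000 112] [100000 11200] [1000000 112000])
```

Even a million evaluations in one session would amount to roughly 112 MB under this estimate.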

We’ve mentioned that the JVM was always capable of loading new classes at runtime. However, and it may come as a surprise, it does not provide a user-accessible API to do so. For years, developers tapped into the fact that the application class loader was an instance of URLClassLoader. Then Java 9 came along and reminded everyone that this was an implementation detail. Divorced from URLClassLoader, the application class loader stopped being augmentation-friendly. Project Jigsaw and the new module system were a big refactoring towards a more secure platform. It was still possible to augment the class path, but it became the responsibility of tooling and framework implementors to create specialized class loaders, inheriting from URLClassLoader if need be. Exactly what Clojure had been doing all along.
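Both halves of that story can be checked at the REPL. This is a small sketch: the first result depends on the Java version you run, so no output is given for it.

```clojure
;; Expected to be false on Java 9 and later, and true on Java 8.
(instance? java.net.URLClassLoader (ClassLoader/getSystemClassLoader))

;; Clojure's loader extends URLClassLoader, regardless of the Java version.
(isa? clojure.lang.DynamicClassLoader java.net.URLClassLoader)
;; => true
```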

Suppose you’re at the REPL and you realize that you need data.json, but you haven’t declared the dependency. No problem.

(def json-jar "https://repo1.maven.org/maven2/org/clojure/data.json/2.3.0/data.json-2.3.0.jar")
(defn addlib [jar]
  (-> (Thread/currentThread)
      (.getContextClassLoader)
      (.addURL (java.net.URL. jar))))
| #'user/json-jar |
| #'user/addlib   |

Now you can.

(addlib json-jar)
(require '[clojure.data.json :as json])
(json/write-str {:foo "bar"})

At one point, Clojure’s core library boasted a function called add-classpath, which had a similar look and feel to addlib. However, it was already deprecated when Clojure hit version 1.1. It appeared to be fragile, prone to failure in environments such as frameworks or IDEs where the context class loader could be foreign. At the REPL, it works fine though. Meyvn builds a user interface on top of addlib, so that the user can select dependencies directly from the Clojars catalog.
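One way to make that fragility visible is to check the loader before touching it. The following is a sketch of a more defensive variant, not how add-classpath or Meyvn work. The name `addlib-checked`, its optional loader argument and the error message are all my own inventions.

```clojure
;; Hypothetical variant of addlib: refuses to run when the loader is foreign.
(defn addlib-checked
  ([jar]
   (addlib-checked jar (.getContextClassLoader (Thread/currentThread))))
  ([jar loader]
   (if (instance? clojure.lang.DynamicClassLoader loader)
     (do (.addURL ^clojure.lang.DynamicClassLoader loader (java.net.URL. jar))
         jar)                                   ; return the URL that was added
     (throw (ex-info "Context class loader is not a DynamicClassLoader"
                     {:loader (some-> loader class)})))))
```

Called with a single argument at the REPL it behaves like addlib. In a framework or IDE with a foreign context class loader it fails loudly, with the offending loader class attached to the exception data.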

In conclusion, Clojure leverages the built-in capabilities of the JVM to provide a dynamic runtime environment. Class path augmentation is begotten by virtue of URLClassLoader inheritance. Redefinition of classes is made possible because each top-level form gets its own class loader instance. In an nREPL client, an extraneous instance is created for every evaluation, a quirk that should not obscure the fact that it provides the more capable REPL.
