
Ghosts in the machine

There is a crack in everything. That's how the light gets in. – Leonard Cohen

lain-animated2.gif

Some stories start with a side note. This is one of them. Then again, the side note might be the end of the story. I’m not sure anymore. But the journey we are going to embark on will quickly transcend it, leading us to explore programming semantics and revisit some timeless philosophical allegories.

TL;DR In which we submit reproducible research about the Clojure runtime, particularly concerning code reloading.

Back to the beginning (or the end). Use with caution. The side note, remember?

clojure.core/remove-ns
 [sym]
Added in 1.0
  Removes the namespace named by the symbol. Use with caution.
  Cannot be used to remove the clojure namespace.

OK, fine. But wait a second, why? What are we guarding ourselves against? What dangers are lurking when invoking the above function? A caution works only insofar as it is understood. How can one be cautious if one doesn’t know the first thing about the danger or the risk incurred?

remove-ns is not typically found in application codebases, but it is found at the heart of a widely used foundation library: org.clojure/tools.namespace. There it is the primary mechanism, along with ns-unmap, for unloading active definitions.

delete.gif

So what is there to be so cautious about?

There is no way around this: we will have to go down the rabbit hole to find out. The main method is formal and involves a demonstration at the REPL through code. But we will also discuss an idiomatic use of a Clojure pattern familiar to web application developers. Finally, we will look at the output of a memory profiler through several reloading cycles.

First, let’s create a namespace and intern an entry. (Type the code in a REPL to follow along or watch the live demo).

(create-ns 'choice)
#namespace[choice]
(intern 'choice 'pill "red")
#'choice/pill

This is the equivalent of (def pill "red") evaluated while *ns* (the current namespace) is “choice”. The explicit form lets us appreciate some lesser-used operations that stand behind the machinery of creating Vars.

Clojure has a host of features (Java interop, runtime polymorphism, concurrency primitives, a unified sequence abstraction and whatnot), but underlying it all is a scoping and binding mechanism embodied by namespaces and Vars. This is the foundation.

When we define a named function, we are interning a symbol in a namespace, which creates a Var bound to a value (a function in this case). In the above snippet, we have defined a Var named “pill” bound to the value “red”.

(ns-publics 'choice)
{pill #'choice/pill}

So namespaces are mappings from symbols to Vars, and Vars are for naming and looking up values.
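As a minimal sketch of that mapping, the three lookups below should all land on the same Var object. The namespace demo.lookup and the names pill and lookups are hypothetical, chosen only so the sketch does not disturb the choice namespace used in the walkthrough.

```clojure
;; Hypothetical namespace and Var names, used only for this sketch.
(create-ns 'demo.lookup)
(intern 'demo.lookup 'pill "red")

(def lookups
  (let [by-map     (get (ns-publics 'demo.lookup) 'pill) ; namespace as a symbol->Var map
        by-resolve (ns-resolve 'demo.lookup 'pill)        ; resolve a symbol in that namespace
        by-find    (find-var 'demo.lookup/pill)]          ; fully qualified lookup
    {:same-var? (and (identical? by-map by-resolve)
                     (identical? by-resolve by-find))
     :value     (var-get by-map)}))

lookups
;; => {:same-var? true, :value "red"}
```

Whichever route we take, we reach one Var, and the value is obtained by dereferencing it.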

Identities are mental tools we use to superimpose continuity on a world which is constantly, functionally, creating new values of itself.

The key thing here is that a Var is a stable logical entity. Vars have a root binding that is observable from all threads. Let’s prove this:

Note: We are restricting the present discussion to static Vars. Dynamic Vars have per-thread bindings, but this is orthogonal to our concerns.

(create-ns 'matrix)
#namespace[matrix]
(intern 'matrix 'Neo (agent [#'choice/pill]))
#'matrix/Neo
(send matrix/Neo #(conj % (var-get (first %))))
#agent[{:status :ready, :val [#'choice/pill "red"]} 0x49d7b18a]

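Actions dispatched to an agent run on a separate thread drawn from a pool, so when we send var-get to the agent, we are getting the root binding of #'pill as seen from a different thread. Note that send returns immediately, so the agent printed at the REPL may or may not yet include the result of the action just sent.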
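To check the threading claim directly, here is a small sketch that records the name of the thread running the action, and uses await so that we only read the agent once the action has completed. The names demo.threads, observer and caller-thread are hypothetical and exist only for this illustration.

```clojure
;; Hypothetical names (demo.threads, pill, observer), used only for this sketch.
(create-ns 'demo.threads)
(intern 'demo.threads 'pill "red")

(def caller-thread (.getName (Thread/currentThread)))

(def observer (agent nil))

;; The action runs on a thread from the agent pool, not on the calling thread.
(send observer
      (fn [_]
        {:thread (.getName (Thread/currentThread))
         :value  (var-get (find-var 'demo.threads/pill))}))

;; send returns immediately; await blocks until the action has run.
(await observer)

(:value @observer)
;; => "red"
(= caller-thread (:thread @observer))
;; => false
```

Reading the agent only after await removes any doubt about whether a printed state predates the action.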

If we redefine our Var, our agent will pick up the new value.

(intern 'choice 'pill "blue")
#'choice/pill
(send matrix/Neo #(conj % (var-get (first %))))
#agent[{:status :ready, :val [#'choice/pill "red" "blue"]} 0x49d7b18a]
(= (first @matrix/Neo) #'choice/pill)
true
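The same property can be checked without an agent. In this sketch (the namespace demo.stable and the name stable-check are hypothetical), re-interning a symbol hands back the very same Var object, and a reference captured earlier observes the new root value.

```clojure
;; Hypothetical namespace and names, used only for this sketch.
(create-ns 'demo.stable)

(def stable-check
  (let [v1          (intern 'demo.stable 'pill "red")
        seen-before @v1                                 ; root value before redefinition
        v2          (intern 'demo.stable 'pill "blue")] ; redefine the same symbol
    {:same-var? (identical? v1 v2)
     :before    seen-before
     :after     @v1}))                                  ; the old reference sees the new value

stable-check
;; => {:same-var? true, :before "red", :after "blue"}
```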

The official documentation mentions this:

This means that, unless they have been unmap-ed, Var objects are stable references…

Note the caveat. Unless unmap-ed.
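A minimal sketch of what the caveat means in practice, using a hypothetical namespace demo.unmap: once a symbol has been unmapped, interning it again creates a fresh Var rather than reusing the old one.

```clojure
;; Hypothetical namespace and names, used only for this sketch.
(create-ns 'demo.unmap)

(def unmap-check
  (let [before (intern 'demo.unmap 'pill "red")]
    (ns-unmap 'demo.unmap 'pill)                   ; drop the symbol->Var mapping
    (let [after (intern 'demo.unmap 'pill "red")]  ; same symbol, same value
      {:same-var? (identical? before after)})))

unmap-check
;; => {:same-var? false}
```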

Incidentally, we have now set the stage to discuss the caution found in the remove-ns docstring.

(remove-ns 'choice)

Now let’s try to retrieve the color of the pill.

(try (eval '(var-get #'choice/pill))
    (catch Exception e (:cause (Throwable->map e))))
"Unable to resolve var: choice/pill in this context"

Of course. But what if we ask Neo?

(send matrix/Neo #(conj % (var-get (first %))))
#agent[{:status :ready, :val [#'choice/pill "red" "blue"]} 0x49d7b18a]

Oh, in the agent thread, Neo still holds on to the blue pill. Isn’t the Var gone with the namespace? This is a perfectly reasonable expectation, but false in the current state of affairs.

Let’s go further down the rabbit hole, and recreate a red pill.

(create-ns 'choice)
#namespace[choice]
(intern 'choice 'pill "red")
#'choice/pill

Now let’s ask Neo again what pill he has chosen.

If this seems like a contrived exercise, please note that this is exactly what happens when reloading code in the runtime via a library like tools.namespace: it calls remove-ns before reloading definitions with (require ns-sym :reload).

(send matrix/Neo #(conj % (var-get (first %))))
#agent[{:status :ready, :val [#'choice/pill "red" "blue" "blue" "blue"]} 0x49d7b18a]

Blue? Really? Isn’t our pill red?

(var-get #'choice/pill)
"red"

Yes it is, but it seems that we are looking through the Looking-Glass, and things aren’t what they seem. Lewis Carroll would have enjoyed the sight.

(= (first @matrix/Neo) #'choice/pill)
false

At this point, we are not looking at the same Var objects anymore. What does it mean for something to not be equal to itself? Having lost its reflexive property, what good is a Var now?
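The whole experiment can be condensed into one self-contained sketch that needs no agent. The namespace demo.ghost and the name ghost-check are hypothetical; the sequence of calls mirrors the steps taken above.

```clojure
;; Hypothetical namespace and names, used only for this sketch.
(create-ns 'demo.ghost)
(intern 'demo.ghost 'pill "blue")

(def ghost-check
  (let [old-var (ns-resolve 'demo.ghost 'pill)]   ; a reference held elsewhere in the program
    (remove-ns 'demo.ghost)                       ; the namespace goes away...
    (create-ns 'demo.ghost)                       ; ...and is recreated
    (intern 'demo.ghost 'pill "red")
    (let [new-var (ns-resolve 'demo.ghost 'pill)]
      {:same-var? (identical? old-var new-var)
       :old-value @old-var                        ; the held reference keeps its stale value
       :new-value @new-var})))

ghost-check
;; => {:same-var? false, :old-value "blue", :new-value "red"}
```

Both Vars print as #'demo.ghost/pill, which is what makes the situation so easy to miss at the REPL.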

Let’s savour this moment, because this is when the sirens start blasting, all lights go red in the control room, and agents are dispatched in swarm formation to repair the breach in the matrix, removing evidence and making sure no witness remains alive.

sirens.gif

Except that we’ve been warned.

Remember the cryptic message from the man himself: Use with caution.

So, only namespaces that are not referenced by anything else can be safely removed - it's a very special-purpose operation. — Rich Hickey

The danger that lurks behind remove-ns is abstraction leakage (or breakage). From stable references, Vars become unstable references, mere ghosts of themselves.

The phrase “the ghost in the machine” came up in the context of a philosophical discussion around the body and mind. Gilbert Ryle coined it to demolish the notion that body and mind are separate entities working in parallel, which was the classical, Cartesian view. Ryle argued that treating the mind as a substance, like the body, no matter how different, was a logical fallacy. It was, to be accurate, a category mistake.

A foreigner visiting Oxford or Cambridge for the first time is shown a number of colleges, libraries, playing fields, museums, scientific departments and administrative offices. He then asks 'But where is the University?' … It has then to be explained to him that the University is not another collateral institution, some ulterior counterpart to the colleges, laboratories and offices which he has seen. The University is just the way in which all that he has already seen is organized.

Vars don’t exist as such in bytecode. Vars are an abstraction in the same way as a University. Taken in isolation, an abstraction is like a Platonic ideal: principled, self-contained, true in all instances. This is not our kind of world. Every single process involved in engineering a piece of software is real, not an abstraction.

The equivalent of the body and mind dualism in computer engineering is the duality of the runtime and the specification. The runtime implements an execution model that may or may not coincide with the specification of a client programming language. We often think of programming semantics as prior to the runtime, or as taking precedence over it, but the two belong to different categories altogether.

If you have ever developed a Ring application, you might know that there is a trick to make your handler pick up changes on the fly. Instead of passing it directly to the HTTP adapter, you pass it a Var.

(run-jetty (var handler) {:port 8080 :join? false})

Because Vars implement the IFn interface, we can invoke them as if they were functions: (#'+ 1 1) is equivalent to (+ 1 1).
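Here is a minimal sketch of why this matters for reloading, with no web server involved. The handler and the names by-value, by-var and bodies are hypothetical; the second defn stands in for editing the function and re-evaluating it.

```clojure
;; Hypothetical handler and names, used only for this sketch.
(defn handler [_request] {:status 200 :body "v1"})

(def by-value handler)    ; captures the function object that exists right now
(def by-var   #'handler)  ; captures the Var, which is dereferenced on every call

;; Simulate editing and re-evaluating the handler.
(defn handler [_request] {:status 200 :body "v2"})

(def bodies
  {:by-value (:body (by-value {}))
   :by-var   (:body (by-var {}))})

bodies
;; => {:by-value "v1", :by-var "v2"}
```

The reference held by value is frozen at the old definition, while the call through the Var follows the redefinition.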

Instead of directly passing whatever handler evaluates to, we pass the Var, #'handler. This way, any change in your handler will be reflected immediately, without the need to restart the web server. Merely reloading the namespace via require does the trick. You can do this manually, or via a Leiningen plugin. This is how lein-ring works: it re-requires your namespace when it detects a file change. However, if you use refresh from tools.namespace, this stops working. But now we know why. After remove-ns, the Var object passed to the server thread is orphaned.

We can see that this is true when profiling the memory with a Java introspection tool.

This is what happens to a Var named foo when reloading a namespace with require ns-sym :reload.

liveness-foo2.png

Only one live object is present at any time because the same Var object gets overwritten over and over. (You can force garbage collection in VisualVM if you don’t see the results immediately.)

And this is what happens to a Var named foo when reloading a namespace with remove-ns ns-sym + require ns-sym :reload.

liveness-foo1.png

Here every refresh results in the duplication of foo. Those Var objects are inaccessible from the Clojure program but are still considered live by the Java runtime, and so they won’t be garbage collected.
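The difference between the two strategies can be approximated at the REPL by counting distinct Var objects across simulated reload cycles. This is only a sketch under stated assumptions: reload-cycle and distinct-vars are hypothetical helpers, intern stands in for re-evaluating a source file, and the count reflects object identity, not what a profiler reports about heap liveness.

```clojure
;; Hypothetical helpers, used only for this sketch.
(defn reload-cycle
  "Simulates one reload of ns-sym and returns the Var for foo.
  When remove-first? is true, the namespace is removed before being recreated."
  [ns-sym remove-first?]
  (when remove-first?
    (remove-ns ns-sym))
  (create-ns ns-sym)                      ; returns the existing namespace if there is one
  (intern ns-sym 'foo (System/nanoTime))) ; stands in for re-evaluating (def foo ...)

(defn distinct-vars
  "Runs n simulated reloads and counts how many different Var objects were produced."
  [ns-sym remove-first? n]
  (count (distinct (doall (repeatedly n #(reload-cycle ns-sym remove-first?))))))

(distinct-vars 'demo.reload-only false 5)
;; => 1
(distinct-vars 'demo.remove-first true 5)
;; => 5
```

One Var overwritten five times in the first case, five separate Vars in the second. Whether the older ones stay on the heap depends on something still holding on to them, which is what the profiler output shows.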

(= (:handler (:web system.repl/system)) #'app)

This returns true before the first call to remove-ns, and false afterwards.

remove-ns breaks the Var machinery. It jeopardizes Var’s standing as a stable reference type.

If you use the reloaded workflow and you restart all parts of your system, the Jetty component (or your HTTP server component of choice) gets passed the current Var upon restarting, so everything works as expected and you don’t notice that the runtime is tainted with unstable Vars: unreachable objects with stale values.

How big a problem is this? Well, a proliferation of objects that won’t be deallocated falls into the class of memory leaks. The heap will grow at a rate proportional to the number of times refresh is called. This only happens during development time, so I personally wouldn’t worry about it, especially considering the benefits of using tools.namespace. Additionally, starting with system 0.3.0, you are given the option to skip the remove-ns on a per-namespace basis, giving you fine-grained control over what refresh should do.

Further study: