Timed idempotency

benjamin.jpg

tl;dr Benjamin, time-constrained idempotency with side-effects.

A procedure with side-effects is idempotent if successive invocations don’t repeat the side-effects beyond its initial run.

This is not the same as idempotency in the mathematical or functional programming sense, where computing a value is considered to be the primary effect and any other effects, well, side-effects.

Idempotency is an overloaded term. In Mathematics, it is a function that satisfies f (f (x)) = f (x). abs() or f (x) = |x| is idempotent , while pow() or f (x) = x² is not.

In computing, the focus shifts from functional composition to sequential composition. This makes sense. Operations often need to be repeated. The question becomes: if I run function f twice with the same input, will it:

Pure functions are defined through this lens. Not all pure functions are idempotent in the mathematical sense, but all pure functions are idempotent with regards to their return value. And so are eligible for memoization, an optimization technique in which the return value is computed once for a given input and cached henceforth.

Procedures, on the other hand, are referentally opaque and much less amenable to optimization. It is up to the programmer to propose an implementation with idempotency guarantees.

Idempotent operations are often used in the design of network protocols, where a request to perform an operation is guaranteed to happen at least once, but might also happen more than once. If the operation is idempotent, then there is no harm in performing the operation two or more times.

When implementing idempotent HTTP verbs (PUT or DELETE, for example), we provide an implementation that executes a side-effect on the first invocation but not on subsequent runs. Those implementations are often ad-hoc. Moreover, we often need to reconsider the desirability of a side-effect, and act accordingly.

In the following sections of this article, we will see how we can derive an abstraction from a specific use case. And at the very end, we will introduce a library that provides that abstraction ready for (re)use.

Consider the following snippet of code:

(doseq [user users]
  (send-newsletter user))

If the command failed midway, we will need to resend the newsletter to some users. But who has gotten our newsletter, and who hasn’t?

Suppose send-newsletter was procedurally idempotent, we would be able to repeat the operation, and still ensure that our users receive one newsletter and not duplicates.

But one month later, with the latest newsletter hot off the press, nobody would receive it!

What we want really is procedural idempotency for a period of time. send-newsletter should be procedurally idempotent in the interval between newsletter issues, but not beyond.

Another way to put it is that we want self-cancelling idempotency. Or think of it as cyclical idempotency. Or something along the lines of conditional idempotency, controlled idempotency or parametrisable idempotency. It doesn’t really matter what you call it. To keep it simple, I call it timed idempotency.

There are numerous examples of this mode of operation in the physical world. Take an elevator, for example. When you press the button to the sixth floor, you can press the button again without fear of being catapulted to the twelth floor. After a roundtrip, the button becomes enabled again.

How do we emulate this type of functionality in our programs?

One way to do it is with a logbook. It looks like this:

{:logbook [{:newsletter timestamp}]}

Over time, we will send many newsletters, and our logbook will look like this:

{:logbook [{:newsletter timestamp}
           {:newsletter timestamp}
           {:newsletter timestamp}]}

A logbook is flexible because you can record all kinds of events that occured with regards to an entity.

{:first-name "Benjamin"
 :last-name "Peirce"
 :occupation "Mathematician"
 :email "benjamin.peirce@harvard.edu"
 :logbook [{:welcome-email timestamp}
           {:subscription-reminder timestamp}
           {:subscription-reminder timestamp}
           {:newsletter timestamp}
           {:newsletter timestamp}
           {:newsletter timestamp}]}

At first glance, we can tell that we have sent our user a welcome email, two subscription reminders and three newsletters.

We can now write:

(doseq [user users
        :when (some #(is-timestamp-outside-range? %) (filter :newsletter (:logbook user)))]
  (send-newsletter user))

Our newsletter is monthly, so the is-timestamp-outside-range? predicate tells us if it is time to send a new edition of our newsletter.

On the other hand, our welcome email gets sent only once, so an idempotent send-welcome-email would look a little bit different:

(doseq [user users
        :when (empty? (filter :welcome-email (:logbook user)))]
  (send-welcome-email user))

Let’s standardize our predicates and make them operate uniformly on one logbook entry at the time:

(defn unique? #(some? %)
(defn last-month? #(if-let [date (first (vals %))]
                     (time/last-month? date)
                     false)

This allows us to compose them neatly with logbook sequences.

(some unique? (filter :welcome-email (:logbook user)))
(some last-month? (filter :newsletter (:logbook user)))

We architecture our solution around events and predicates. In our example, the events are :welcome-email, :subscription-reminder and :newsletter. Now let’s look back at our newsletter sending operation.

(doseq [user users
        :when (some last-month? (filter :newsletter (:logbook user)))]
  (send-newsletter user))

Something is missing, isn’t it? If the newsletter is sent successfully, we need to write an entry in the logbook.

(doseq [user users
        :when (some last-month? (filter :newsletter (:logbook user)))]
  (let [response (send-newsletter user)]
    (when (success? response)
      (write-to-logbook user))))

If you have some experience writing applications, you probably recognize this pattern. And if you have read SICP, you may remember how we muster means of abstraction to generalize a problem. This allows us in turn to devise a solution for a whole class of related problems. What our solution lacks is generality.

(doseq [user users
        :when (some last-month? (filter :newsletter (:logbook user)))]
  (let [response (send-newsletter user)]
    (when (success? response)
      (write-to-logbook user))))

This ad-hoc snippet really boils down to:

(doseq [entity entities
        :when (some pred logbook)]
  (let [response (operation)]
    (when (success? response)
      (write logbook))))

With proper design and the magic of macros, we can actually write the following:

(doseq [entity entities]
  (with-logbook entity event
    body))

We are left with the entity (user), the body with the side-effect (send-newsletter) and the event (from which we derive the predicate). All the rest has been made implicit: how we retrieve the logbook for an entity, how we determine if the side-effect was successful, how we append events to the logbook.

Instead of cluttering the calling site, the user makes those assumptions explicit via a configuration mechanism. From then on, the with-logbook macro will do the reshuffle and the wiring behind the scenes.

It is useful to have that kind of functionality decoupled from application code because the need for timed idempotency pops up time and again. This is the value proposal of Benjamin, which I’m open sourcing today. It’s a small library, 9 lines of macro code and the same amount for a helper function. Quite Lispy in style.

Check out Benjamin on Github.

P.S. Follow me on Twitter.

Daniel Szmulewicz 07 August 2017
blog comments powered by Disqus