Bioscoop
Creative coding with video
Motivation
Historically, video editing has progressed from destructive techniques, i.e. cutting the negative, to non-destructive ones, i.e. non-linear editing. The latter became possible once film was no longer analog but digital. In both cases, however, the creative process is manual, or mouse-based. A different approach, often overlooked, is programmatic: the editor spells out the edits to apply, and an underlying system carries them out. FFmpeg is such a system, capable of carrying out pretty much any editing instruction you can think of.
FFmpeg is embedded in virtually every major platform and application that handles media, yet its use in the creative coding community to make video art is limited. Most video creators prefer to stay away from the notoriously complex and error-prone string-based syntax. Indeed, its usability is hindered by the information density of its textual interface.
Bioscoop unleashes FFmpeg from its shackles and puts programmability at the center of the creative process.
AST convergence
The standout feature in terms of implementation and design is that Bioscoop does both standalone compilation and macro expansion without code duplication. Early in the project, as soon as the grammar for the language was defined, I realized that I didn't want to choose between an internal or external DSL. I wanted both.
Typically, external DSLs use a parser that produces a custom abstract syntax tree (AST), which then gets processed by a dedicated transformation pipeline specific to that parser. Meanwhile, internal DSLs use macros that directly generate target code without going through an intermediate AST structure. This creates two completely separate code generation paths - one for parsed text and another for macro-expanded forms - leading to duplicated transformation logic, potential behavioral inconsistencies between the two modalities, and significant maintenance overhead as changes must be applied to both transformation systems independently.
Traditional DSL Implementation:
- External DSL: Parser → AST → Transformer
- Internal DSL: Macro → Direct Code Generation
- Problem: Transformation logic is duplicated.
In Bioscoop, a single transform-ast multimethod handles both external
and internal DSLs. The external parser processes text strings into a
parse tree, which is then transformed into AST nodes. The internal
macro converts Clojure forms into the exact same parse tree structure,
enabling both input modalities to share the single transformation
logic that converts the parse tree into the final AST structure. This
convergence at the parse tree level ensures behavioral consistency
while eliminating redundant transformation implementations.
AST Convergence Approach:
- External DSL: Parser → AST → Transformer
- Internal DSL: Macro → AST → Transformer
- Solution: Single transformation logic.
AST Convergence occurs when both the external DSL parser and the internal DSL macro system produce identical abstract syntax trees, enabling a single transformation pipeline to handle both input modalities.
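The idea can be illustrated with a minimal sketch in plain Clojure. It is not Bioscoop's implementation: `form->tree`, `parse-text`, `dsl` and `transform-tree` are hypothetical names, and `clojure.edn/read-string` stands in for a real grammar-driven parser. What it shows is two front ends that emit the same tree shape, so that one multimethod can serve both.

```clojure
(require '[clojure.edn :as edn])

;; Shared intermediate shape (an assumption for this sketch):
;; [:list [:symbol "scale"] [:number 1920] [:number 1080]]
(defn form->tree
  "Turn an already-read Clojure form into the shared tree shape."
  [form]
  (cond
    (seq? form)    (into [:list] (map form->tree form))
    (symbol? form) [:symbol (name form)]
    (number? form) [:number form]
    (string? form) [:string form]
    :else          (throw (ex-info "Unsupported form" {:form form}))))

;; External front end: text in, tree out.
(defn parse-text [s]
  (form->tree (edn/read-string s)))

;; Internal front end: the macro sees the unevaluated form
;; and expands to the very same tree, quoted.
(defmacro dsl [form]
  (list 'quote (form->tree form)))

;; The single transformation, dispatching on the node tag.
(defmulti transform-tree first)
(defmethod transform-tree :number [[_ n]] n)
(defmethod transform-tree :string [[_ s]] s)
(defmethod transform-tree :symbol [[_ s]] s)
(defmethod transform-tree :list [[_ op & args]]
  {:filter (transform-tree op)
   :args   (mapv transform-tree args)})

(def from-text  (transform-tree (parse-text "(scale 1920 1080)")))
(def from-macro (transform-tree (dsl (scale 1920 1080))))

(= from-text from-macro) ;; => true
```

Because both paths meet before `transform-tree`, any change to the transformation applies to text input and macro input alike.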
An academic paper is available in draft form. It will be submitted to a peer-reviewed journal soon. That paper elaborates further on the language and related concepts.
Internals
The project is structured around several key components:
DSL Parser (src/bioscoop/dsl.clj)
- Uses Instaparse to parse a Lisp-like syntax defined in resources/lisp-grammar.bnf
- Grammar allows for expressions like (scale 1920 1080) and (let [width 1920] (scale width 1080))
- Supports typical Lisp constructs: functions, let bindings, symbols, keywords, strings, numbers, booleans
- Transformation of the parse tree into internal representation
Core Data Structures
Three main records represent FFmpeg concepts:
- Filter: A single filter operation (e.g., scale, overlay, crop)
- FilterChain: A sequence of filters connected in series (comma-separated in FFmpeg)
- FilterGraph: Multiple filter chains running in parallel (semicolon-separated in FFmpeg)
Domain Model (src/bioscoop/domain/)
- Uses Clojure Spec to define the structure of filters, filter chains, and filtergraphs
- Provides validation for the core data structures
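As an illustration of what spec-based validation of these structures could look like, here is a small sketch using `clojure.spec.alpha`. The spec names and the map shapes (`:name`, `:args`, `:filters`, `:chains`) are assumptions made for the example, not the specs found in src/bioscoop/domain/.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical shapes: a filter is a map with a name and an options map.
(s/def ::name string?)
(s/def ::args map?)
(s/def ::filter (s/keys :req-un [::name ::args]))

;; A chain holds at least one filter; a graph holds a vector of chains.
(s/def ::filters (s/coll-of ::filter :kind vector? :min-count 1))
(s/def ::filter-chain (s/keys :req-un [::filters]))
(s/def ::chains (s/coll-of ::filter-chain :kind vector?))
(s/def ::filter-graph (s/keys :req-un [::chains]))

(def good {:chains [{:filters [{:name "scale" :args {:width 1920 :height 1080}}]}]})
(def bad  {:chains [{:filters []}]}) ; an empty chain violates :min-count

(s/valid? ::filter-graph good) ;; => true
(s/valid? ::filter-graph bad)  ;; => false
```

When a value is rejected, `s/explain-data` reports which part of the structure failed and why.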
Rendering Engine (src/bioscoop/render.clj)
- Implements the FFmpegRenderable protocol to convert data structures to FFmpeg syntax
- Handles input/output labels for complex filtergraphs
- Ensures proper escaping and formatting
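A protocol-based renderer over the three record types might look roughly like the sketch below. The protocol name `Renderable`, the record fields and the helper `pad-labels` are assumptions for illustration. Escaping is deliberately left out, so this is not a substitute for the project's FFmpegRenderable implementation.

```clojure
(require '[clojure.string :as str])

;; Hypothetical protocol; the project's own protocol is called FFmpegRenderable.
(defprotocol Renderable
  (render [this] "Return FFmpeg filtergraph text for this value."))

(defn pad-labels
  "Wrap each label in square brackets and concatenate them."
  [labels]
  (apply str (map #(str "[" % "]") labels)))

;; A filter renders as name=key=value:key=value, or just the name without options.
(defrecord Filter [fname opts]
  Renderable
  (render [_]
    (if (seq opts)
      (str fname "=" (str/join ":" (map (fn [[k v]] (str (name k) "=" v)) opts)))
      fname)))

;; Filters in a chain are comma-separated, framed by input and output labels.
(defrecord FilterChain [inputs filters outputs]
  Renderable
  (render [_]
    (str (pad-labels inputs)
         (str/join "," (map render filters))
         (pad-labels outputs))))

;; Chains in a graph are semicolon-separated.
(defrecord FilterGraph [chains]
  Renderable
  (render [_] (str/join ";" (map render chains))))

(def graph
  (->FilterGraph
   [(->FilterChain ["0:v"] [(->Filter "scale" {:width 1920 :height 1080})] ["scaled"])
    (->FilterChain ["1:v"] [(->Filter "vflip" {})] ["flipped"])
    (->FilterChain ["scaled" "flipped"] [(->Filter "overlay" {})] ["out"])]))

(render graph)
;; => "[0:v]scale=width=1920:height=1080[scaled];[1:v]vflip[flipped];[scaled][flipped]overlay[out]"
```

The option order relies on small map literals keeping insertion order; a production renderer would want an explicit ordering.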
Bidirectional Translation
- src/bioscoop/render.clj: Converts DSL → FFmpeg filtergraph strings
- src/bioscoop/ffmpeg.clj: Parses existing FFmpeg filtergraph strings → DSL structures
The filtergraph
In FFmpeg, atomic editing operations—such as scaling, cropping, blending, and color correction—are implemented as filters. With over 500 filters available, FFmpeg provides extensive transformation capabilities.
When multiple filters are applied sequentially to source material,
they form a filterchain, written as comma-separated
commands. Filterchains can be labeled at their input and output
points, allowing one chain's output to serve as another's input. These
interconnected filterchains create a directed acyclic graph (DAG)
structure, which FFmpeg calls a filtergraph. Filtergraphs are passed
to FFmpeg using the -filter_complex parameter.
That string-based syntax maps closely onto the underlying libavfilter
library that parses it. What happens is the following: the parser
(libavfilter/graphparser.c) tokenizes the string, identifying
individual filters, their parameters, and the connections between
them.
Each filter name in the string (like scale, overlay, colorkey) maps to
a registered filter implementation in libavfilter. FFmpeg looks up
these filters in its internal registry and instantiates them as
AVFilterContext objects. Key-value pairs within each filter
specification are parsed and passed to the filter's initialization
function, which validates and stores them in the filter's private
context structure.
The parser creates an AVFilterGraph object and connects the instantiated filters according to the semicolons (filterchain boundaries) and labels in the string. Each connection becomes an AVFilterLink that defines data flow between filter pads.
FFmpeg validates the complete graph topology, checking that input/output pad counts match, media types are compatible (audio vs. video), and that there are no cycles.
In summary, the elements of the string syntax map onto libavfilter as follows:
- Filter names → AVFilter structs registered in libavfilter
- Parameters (key=value) → Filter-specific configuration passed to AVFilter->init()
- Filterchains (comma-separated) → Linked sequences of AVFilterContext nodes
- Labels ([label]) → Named AVFilterLink references for graph routing
- Semicolons → Graph branching points that create multiple parallel paths
This one-way translation from string to internal structures is the
core problem: there's no inverse mapping. Once parsed, the
AVFilterGraph exists in memory, but there's no standardized way to
serialize it back or manipulate it programmatically before the string
parsing step. Developers must work in strings because libavfilter's
graph construction API, while programmatically accessible, is complex
and poorly documented compared to the string syntax.
The string format is essentially a convenience layer over
libavfilter's C API—and it's become the only practical interface,
despite its limitations.
This creates several challenges:
No programmatic structure: Unlike many modern tools that use JSON, YAML, or object-based APIs, filtergraphs cannot be easily constructed, validated, or manipulated programmatically. There's no schema to reference, no type checking, and no ability to introspect the graph structure before execution.
String concatenation dependency: Building dynamic filtergraphs requires manual string concatenation, making the code fragile and error-prone. A single misplaced comma, semicolon, or bracket can break the entire pipeline, with errors only surfacing at runtime.
Limited tooling support: Because filtergraphs lack a formal representation, IDEs cannot provide syntax highlighting, auto-completion, or validation. Developers must memorize the syntax or constantly reference documentation.
Debugging difficulty: When a filtergraph fails, error messages reference the string position rather than logical components, making it hard to identify which filter or connection caused the problem.
This string-only representation means filtergraphs are essentially "write-only" code—difficult to read, maintain, and programmatically generate at scale.
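To make the contrast concrete, the sketch below represents a filtergraph as plain data and checks its label wiring before any string is produced. The data shape and the function names `stream-specifier?` and `dangling-inputs` are hypothetical, and the stream-specifier pattern is a simplification. The point is only that such a check is trivial on data and awkward on a concatenated string.

```clojure
;; Hypothetical data shape: each chain lists its input labels,
;; its filters, and its output labels.
(def chains
  [{:in ["0:v"] :filters ["scale=1920:1080"] :out ["scaled"]}
   {:in ["1:v"] :filters ["vflip"] :out ["flipped"]}
   ;; deliberate typo: "fliped" is never produced by any chain
   {:in ["scaled" "fliped"] :filters ["overlay"] :out ["out"]}])

(defn stream-specifier?
  "True for labels such as 0:v or 1:a:0 that refer to an input file
  rather than to another chain's output (simplified pattern)."
  [label]
  (boolean (re-matches #"\d+(:[a-zA-Z]+(:\d+)?)?" label)))

(defn dangling-inputs
  "Input labels that no chain produces as an output."
  [chains]
  (let [produced (set (mapcat :out chains))
        consumed (remove stream-specifier? (mapcat :in chains))]
    (vec (remove produced consumed))))

(dangling-inputs chains) ;; => ["fliped"]
```

With the string form, the same mistake would only surface when FFmpeg rejects the graph at runtime.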
The Language
The DSL accepts a Lisp-like syntax with the following core constructs:
Basic Syntax Elements
Program Structure:
- A program consists of zero or more expressions
- Multiple expressions are automatically composed into filter graphs
Atoms:
- Numbers: 42, -3.14, 1920
- Strings: "hello world", "in", "1920x1080"
- Symbols: scale, my-filter, input-vid
- Keywords: :color, :size
- Booleans: true, false
Core Language Constructs
- 1. Function Calls (Lists)
(scale {:width 1920 :height 1080})
(drawtext {:text "Hello World" :x 100 :y 50})
Standard FFmpeg filters are treated as functions in Bioscoop. Clojure functions are available as well, but since all expressions in Bioscoop must produce a Filtergraph, this is a valid Bioscoop program:
(scale {:width (inc 1919) :height 1080})
While this is not:
(inc 1919)
In typical Clojure fashion, keywords are functions. So you can write the following:
(let [data {:width 1920 :height 1080}]
  (scale {:width (:width data) :height (:height data)}))
- 2. Variable Binding (let)
(let [width 1920
      height 1080]
  (scale {:width width :height height}))
- 3. Filterchains and filtergraphs
(chain (scale {:width 1920 :height 1080}) (transpose {:dir "cclock"}))
Filters in a single chain are transformations on a stream that flow from left to right. The video scales to 1080p, and then that scaled output is rotated 90 degrees counter-clockwise.
Because streams can split or merge, you use input/output labels enclosed in square brackets [v] to link the chains together.
[["0:v"] (scale {:width 1920 :height 1080}) ["scaled"]]
[["1:v"] (flip) ["flipped"]]
[["scaled"]["flipped"] (overlay) ["out"]]
- Chain 1 scales the first video input and names the output [scaled].
- Chain 2 vertically flips a second video input and names the output [flipped].
- These two unique chains are combined by the overlay filter.
To combine those separate filterchains in one filtergraph, you use Bioscoop's special form compose:
(compose [["0:v"] (scale {:width 1920 :height 1080}) ["scaled"]]
         [["1:v"] (flip) ["flipped"]]
         [["scaled"]["flipped"] (overlay) ["out"]])
- 4. Iteration (for)
(for [i (range n)]
  [[(str "v" i)]
   (xfade {:transition "fade" :duration 1 :offset (+ (* i 4) 3)})
   [(str "v" (inc i))]])
The for form binds one symbol to each element of a range expression (any seqable: range, map, filter, iterate all work) and evaluates the body once per element. The range expression is evaluated in the current environment, so it can reference let bindings and injected Clojure locals. The for form isn't limited to producing filtergraph bodies: it can also appear in label position, generating a sequence of label strings that gets flattened into a padded graph's input/output labels:
[[(for [i (range n)] (str "b" i "c"))] stacking]
;; generates ["b0c" "b1c" ... "b(n-1)c"] as input labels
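The values such an iteration produces can be previewed in plain Clojure, whose own `for` behaves analogously for this purpose. The sketch below is not Bioscoop code: `n` is fixed at 3 for the example, and the map keys `:in`, `:offset` and `:out` are made up to hold the results.

```clojure
(def n 3)

;; Label generation: one label string per element of the range.
(def labels
  (vec (for [i (range n)] (str "b" i "c"))))

labels ;; => ["b0c" "b1c" "b2c"]

;; The per-iteration values used by the xfade example:
;; input label, offset in seconds, output label.
(def steps
  (vec (for [i (range n)]
         {:in     (str "v" i)
          :offset (+ (* i 4) 3)
          :out    (str "v" (inc i))})))

(mapv :offset steps) ;; => [3 7 11]
```

Each step's output label equals the next step's input label, which is what threads the iterations into one chain of transitions.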
- 5. Graph Definitions
Graph definitions are named, reusable components. They can appear anywhere in the source code. Subsequent definitions can refer to previous ones. They are resolved at runtime.
On the JVM (Clojure runtime), they behave like Vars (which they essentially are). In the standalone compiler, they are processed in the interpreter's environment.
(defgraph my-scale (scale {:width 1920 :height 1080}))
(defgraph mirror-pipeline
  (compose [["0:v"] my-scale ["scaled"]]
           [["1:v"] (flip) ["flipped"]]
           [["scaled"]["flipped"] (overlay) ["out"]])
  [["v:0"] (crop "iw/2" "ih" "0" "0")])
Error Handling
Bioscoop follows a fail-soft discipline with typed sentinels: compilation errors don't abort evaluation. A failing subexpression returns an empty FilterGraph in place, while the error itself is recorded in a dynamically-scoped, deduplicated error accumulator, printed immediately, and inspectable after compilation completes. This is a deliberate REPL-oriented design choice: partial results and immediate feedback matter more than hard failure boundaries for interactive creative coding.
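A fail-soft accumulator of this kind can be sketched with a dynamic var bound to an atom. All names here (`*errors*`, `record-error!`, `compile-filter`, `compile-all`, `known-filters`) are hypothetical, and a plain map `{:chains []}` stands in for an empty FilterGraph. Typed sentinels are not modeled.

```clojure
;; Dynamically scoped accumulator; nil outside of a compilation.
(def ^:dynamic *errors* nil)

(defn record-error!
  "Print the error, record it once, and return an empty graph in place."
  [msg]
  (println "error:" msg)
  (when *errors*
    (swap! *errors* (fn [errs] (if (some #{msg} errs) errs (conj errs msg)))))
  {:chains []})

(def known-filters #{"scale" "crop" "overlay"})

(defn compile-filter
  "Compile one filter name, failing softly on unknown names."
  [fname]
  (if (known-filters fname)
    {:chains [[fname]]}
    (record-error! (str "unknown filter: " fname))))

(defn compile-all
  "Compile every name; errors are collected instead of thrown."
  [fnames]
  (binding [*errors* (atom [])]
    ;; mapv is eager, so all work happens inside the binding
    (let [graphs (mapv compile-filter fnames)]
      {:graphs graphs :errors @*errors*})))

(def result (compile-all ["scale" "scael" "scael" "crop"]))

(:errors result) ;; => ["unknown filter: scael"]
(:graphs result) ;; => [{:chains [["scale"]]} {:chains []} {:chains []} {:chains [["crop"]]}]
```

Evaluation continues past the misspelled filter, and the caller still gets the two valid graphs along with a single recorded error.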
Built-in Functions
The DSL provides access to numerous FFmpeg filters including:
- Scaling & Cropping: scale, crop, pad
- Color & Effects: color, hue, negate, curves, threshold
- Layout & Composition: hstack, vstack, xstack, overlay, blend
- Transforms: hflip, vflip, zoompan
- Text & Drawing: drawtext, drawgrid
- Sources: testsrc, rgbtestsrc, smptebars
- Time-based: fade, loop, trim, setpts
- Advanced: split, concat, lut, lagfun, cellauto
Parameters
Following FFmpeg, Bioscoop supports positional parameters, but those are error-prone. Named parameters are recommended:
(= (scale 1920 1080) (scale {:width 1920 :height 1080}))
(= (crop "iw/2" "400" "ih" "800") (crop {:out_w "iw/2" :w "400" :out_h "ih" :h "800"}))
Grammar Rules (BNF Summary)
The language follows these production rules:
program = expression*
<expression> = atom | list | let-binding | for-binding | map | compose | graph-definition | padded-graph
let-binding = <'('> <'let'> binding-vector expression+ <')'>
<binding-vector> = <'['> binding* <']'>
binding = symbol expression
for-binding = <'('> <'for'> <'['> symbol expression <']'> expression+ <')'>
graph-definition = <'('> <'defgraph'> symbol expression* <')'>
padded-graph = <'['> label* expression label* <']'>
compose = <'('> 'compose' expression+ <')'>
list = <'('> expression* <')'>
<mapentry> = keyword expression [<','>]
map = <'{'> mapentry* <'}'>
label = <'['> label-content <']'>
<label-content> = string | number | identifier | for-binding | list
<atom> = number | boolean | !boolean !number symbol | string | keyword
number = #'-?\d+(\.\d+)?'
string = <'"'> #'[^"]*' <'"'>
<operator> = '=' | '<' | '>' | '<=' | '>=' | '+' | '-' | '*' | '/'
<identifier> = #'[a-zA-Z_][a-zA-Z0-9_\-:]*[?!]?'
symbol = operator | identifier
keyword = <':'> identifier
boolean = "true" | "false"
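The two regular-expression terminals in the grammar can be tried out directly in Clojure, without Instaparse. The helper names `number-token?` and `identifier-token?` are made up for this sketch. `re-matches` tests a whole token, which differs from how a parser consumes input incrementally.

```clojure
;; The patterns are copied from the number and identifier rules above.
(def number-re     #"-?\d+(\.\d+)?")
(def identifier-re #"[a-zA-Z_][a-zA-Z0-9_\-:]*[?!]?")

(defn number-token? [s]
  (some? (re-matches number-re s)))

(defn identifier-token? [s]
  (some? (re-matches identifier-re s)))

(number-token? "-3.14")         ;; => true
(number-token? "1920x1080")     ;; => false, so it has to be written as a string
(identifier-token? "my-filter") ;; => true
(identifier-token? "0:v")       ;; => false, identifiers cannot start with a digit
```

This is consistent with the examples earlier in the document, where stream specifiers such as "0:v" appear as quoted strings.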
Inspiration
The acts of the mind, wherein it exerts its power over simple ideas, are chiefly these three:
- Combining several simple ideas into one compound one, and thus all complex ideas are made.
- The second is bringing two ideas, whether simple or complex, together, and setting them by one another so as to take a view of them at once, without uniting them into one, by which it gets all its ideas of relations.
- The third is separating them from all other ideas that accompany them in their real existence: this is called abstraction, and thus all its general ideas are made.
—John Locke, An Essay Concerning Human Understanding (1690)