Bioscoop
Creative coding with video
Motivation
Historically, video editing has progressed from destructive techniques, i.e. cutting the negative, to non-destructive ones, i.e. non-linear editing. The latter became possible once film moved from analog to digital. In both cases, however, the creative process is manual and mouse-driven. A different, often overlooked approach is programmatic: the editor spells out the desired edits, and an underlying system carries them out. FFmpeg is such a system, capable of carrying out pretty much any editing instruction you can think of.
FFmpeg is embedded in virtually every major platform and application that handles media, yet its use in the creative coding community to make video art is limited. Most video creators prefer to stay away from the notoriously complex and error-prone string-based syntax. Indeed, its usability is hindered by the information density of its textual interface.
Bioscoop unleashes FFmpeg from its shackles and puts programmability in the center of the creative process.
AST convergence
The standout feature in terms of implementation and design is that Bioscoop does both standalone compilation and macro expansion without code duplication. Early in the project, as soon as the grammar for the language was defined, I realized that I didn't want to choose between an internal or external DSL. I wanted both.
Typically, external DSLs use a parser that produces a custom abstract syntax tree (AST), which then gets processed by a transformation pipeline specific to that parser. Internal DSLs, meanwhile, use macros that generate target code directly, without going through an intermediate AST. This creates two completely separate code generation paths, one for parsed text and another for macro-expanded forms, leading to duplicated transformation logic, potential behavioral inconsistencies between the two modalities, and significant maintenance overhead, as every change must be applied to both transformation systems independently.
Traditional DSL Implementation:
- External DSL: Parser → AST → Transformer
- Internal DSL: Macro → Direct Code Generation
- Problem: Transformation logic is duplicated.
In Bioscoop, a single transform-ast multimethod handles both the external and the internal DSL. The external parser processes text strings into a parse tree, which is then transformed into AST nodes. The internal macro converts Clojure forms into the exact same parse tree structure, enabling both input modalities to share the single transformation logic that converts the parse tree into the final AST structure. This convergence at the parse tree level ensures behavioral consistency while eliminating redundant transformation implementations.
AST Convergence Approach:
- External DSL: Parser → Parse Tree → transform-ast → AST
- Internal DSL: Macro → Parse Tree → transform-ast → AST
- Solution: Single transformation logic.
AST Convergence occurs when both the external DSL parser and the internal DSL macro system produce identical abstract syntax trees, enabling a single transformation pipeline to handle both input modalities.
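A rough sketch of this shared pipeline follows. Only the name transform-ast comes from the text; the dispatch key, node tags, and helper names (parse-dsl, form->parse-tree) are illustrative assumptions, not Bioscoop's actual API:

```clojure
;; Illustrative sketch -- dispatch key, node tags, and helpers are assumed.
(defmulti transform-ast
  "Parse-tree node -> AST node. Dispatches on the node's tag,
   regardless of whether the node came from the Instaparse parser
   (external DSL) or from the macro expander (internal DSL)."
  first)

(defmethod transform-ast :number
  [[_ n]] n)

(defmethod transform-ast :filter
  [[_ fname & args]]
  {:type :filter
   :name fname
   :args (mapv transform-ast args)})

;; External path: text -> parse tree -> transform-ast
(defn compile-string [s]
  (transform-ast (parse-dsl s)))          ; parse-dsl = Instaparse parser (assumed)

;; Internal path: Clojure forms -> the same parse tree -> transform-ast
(defmacro bioscoop [form]
  `(transform-ast '~(form->parse-tree form)))  ; form->parse-tree is assumed
```

Because both paths converge on the same parse-tree shape before transform-ast runs, a fix to any defmethod automatically applies to both modalities.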
A camera-ready LaTeX paper (for conference submissions) is available in the repository. It elaborates further on the concept of AST convergence.
Internals
The project is structured around several key components:
DSL Parser (src/bioscoop/dsl.clj)
- Uses Instaparse to parse a Lisp-like syntax defined in resources/lisp-grammar.bnf
- Grammar allows for expressions like (scale 1920 1080) and (let [width 1920] (scale width 1080))
- Supports typical Lisp constructs: functions, let bindings, symbols, keywords, strings, numbers, booleans
- Transforms the parse tree into the internal representation
Core Data Structures
Three main records represent FFmpeg concepts:
- Filter: A single filter operation (e.g., scale, overlay, crop)
- FilterChain: A sequence of filters connected in series (comma-separated in FFmpeg)
- FilterGraph: Multiple filter chains running in parallel (semicolon-separated in FFmpeg)
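A minimal sketch of how these records might look (the field names are assumptions for illustration, not the actual definitions):

```clojure
(defrecord Filter [name args])        ; one operation, e.g. scale, overlay, crop
(defrecord FilterChain [filters])     ; rendered comma-separated
(defrecord FilterGraph [chains])      ; rendered semicolon-separated

;; e.g. the chain scale=1920:1080,hflip as data:
(->FilterGraph
 [(->FilterChain [(->Filter "scale" ["1920" "1080"])
                  (->Filter "hflip" [])])])
```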
Domain Model (src/bioscoop/domain/)
- Uses Clojure Spec to define the structure of filters, filter chains, and filtergraphs
- Provides validation for the core data structures
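A hedged sketch of what such specs could look like (the spec keys and data shapes below are assumptions, not the project's actual specs):

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::name string?)
(s/def ::args (s/coll-of string?))
(s/def ::filter (s/keys :req-un [::name ::args]))
(s/def ::filterchain (s/coll-of ::filter :min-count 1))

;; Validation before rendering catches malformed structures early:
(s/valid? ::filter {:name "scale" :args ["1920" "1080"]})  ; => true
```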
Rendering Engine (src/bioscoop/render.clj)
- Implements the FFmpegRenderable protocol to convert data structures to FFmpeg syntax
- Handles input/output labels for complex filtergraphs
- Ensures proper escaping and formatting
Bidirectional Translation
- src/bioscoop/render.clj: Converts DSL → FFmpeg filtergraph strings
- src/bioscoop/ffmpeg.clj: Parses existing FFmpeg filtergraph strings → DSL structures
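In principle this enables round-tripping; a hypothetical session (function names are assumed, not the actual API):

```clojure
;; string -> data -> string (names are illustrative)
(-> "[0:v]scale=1280:720[out]"
    ffmpeg/parse-filtergraph   ; FFmpeg string -> DSL data structures
    render/to-ffmpeg)          ; DSL data structures -> FFmpeg string
```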
The filtergraph
In FFmpeg, atomic editing operations—such as scaling, cropping, blending, and color correction—are implemented as filters. With over 500 filters available, FFmpeg provides extensive transformation capabilities.
When multiple filters are applied sequentially to source material, they form a filterchain, written as comma-separated commands. Filterchains can be labeled at their input and output points, allowing one chain's output to serve as another's input. These interconnected filterchains create a directed acyclic graph (DAG) structure, which FFmpeg calls a filtergraph. Filtergraphs are passed to FFmpeg using the -filter_complex parameter.
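For example, a filtergraph that scales two inputs and stacks them side by side could be written as follows; commas join filters within a chain, semicolons separate chains, and bracketed labels route streams between chains:

```
[0:v]scale=1280:720[left];[1:v]scale=1280:720[right];[left][right]hstack[out]
```

Passed via -filter_complex, the labeled [out] stream can then be mapped to the output file with -map "[out]".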
That string-based syntax maps closely to the underlying libavfilter code that parses it. What happens is the following: the parser (libavfilter/graphparser.c) tokenizes the string, identifying individual filters, their parameters, and the connections between them.
Each filter name in the string (like scale, overlay, colorkey) maps to a registered filter implementation in libavfilter. FFmpeg looks up these filters in its internal registry and instantiates them as AVFilterContext objects. Key-value pairs within each filter specification are parsed and passed to the filter's initialization function, which validates and stores them in the filter's private context structure.
The parser creates an AVFilterGraph object and connects the instantiated filters according to the semicolons (filterchain boundaries) and labels in the string. Each connection becomes an AVFilterLink that defines data flow between filter pads.
FFmpeg validates the complete graph topology, checking that input/output pad counts match, media types are compatible (audio vs. video), and that there are no cycles.
- Filter names → AVFilter structs registered in libavfilter
- Parameters (key=value) → Filter-specific configuration passed to AVFilter->init()
- Filterchains (comma-separated) → Linked sequences of AVFilterContext nodes
- Labels ([label]) → Named AVFilterLink references for graph routing
- Semicolons → Graph branching points that create multiple parallel paths
This one-way translation from string to internal structures is the core problem: there is no inverse mapping. Once parsed, the AVFilterGraph exists in memory, but there is no standardized way to serialize it back or to manipulate it programmatically before the string parsing step. Developers must work in strings because libavfilter's graph construction API, while programmatically accessible, is complex and poorly documented compared to the string syntax.
The string format is essentially a convenience layer over libavfilter's C API, and it has become the only practical interface despite its limitations.
This creates several challenges:
No programmatic structure: Unlike many modern tools that use JSON, YAML, or object-based APIs, filtergraphs cannot be easily constructed, validated, or manipulated programmatically. There's no schema to reference, no type checking, and no ability to introspect the graph structure before execution.
String concatenation dependency: Building dynamic filtergraphs requires manual string concatenation, making the code fragile and error-prone. A single misplaced comma, semicolon, or bracket can break the entire pipeline, with errors only surfacing at runtime.
Limited tooling support: Because filtergraphs lack a formal representation, IDEs cannot provide syntax highlighting, auto-completion, or validation. Developers must memorize the syntax or constantly reference documentation.
Debugging difficulty: When a filtergraph fails, error messages reference the string position rather than logical components, making it hard to identify which filter or connection caused the problem.
This string-only representation means filtergraphs are essentially "write-only" code—difficult to read, maintain, and programmatically generate at scale.
The Language
The DSL accepts a Lisp-like syntax with the following core constructs:
Basic Syntax Elements
Program Structure:
- A program consists of zero or more expressions
- Multiple expressions are automatically composed into filter graphs
Atoms:
- Numbers: 42, -3.14, 1920
- Strings: "hello world", "in", "1920x1080"
- Symbols: scale, my-filter, input-vid
- Keywords: :input, :output, :color
- Booleans: true, false
Core Language Constructs
1. Function Calls (Lists)

(scale 1920 1080)
(overlay {:input "main"} {:input "overlay"})
(drawtext "text='Hello World'" "x=100" "y=50")

2. Variable Binding (let)

(let [width 1920 height 1080]
  (scale width height))

3. Filter Chains

(chain (scale 1920 1080)
       (crop "iw/2" "ih" "0" "0")
       (hflip))

4. Filter Graphs (Parallel Processing)

(graph (chain (scale 1920 1080) (crop "220"))
       (chain (hflip) (vflip)))

5. Graph Definitions (Reusable Components)

(defgraph my-scale
  (scale 1920 1080))

(defgraph mirror-pipeline
  (chain (crop "iw/2" "ih" "0" "0")
         (split {:output "left"} {:output "tmp"})
         (hflip {:input "tmp"} {:output "right"})
         (hstack {:input "left"} {:input "right"})))

6. Label Management

Explicit Labels:
(input-labels "in" "video0")
(output-labels "out" "processed")

Inline Labels (Map Syntax):
(scale 1920 1080 {:input "in"} {:output "scaled"})

7. Padded Graphs (Complex Labeling)

[[in][offset] (chain (scale 1920 1080) (crop "220")) [out]]
[[v:0][v:1] my-complex-filter [processed]]

8. Composition

(compose graph1 graph2 graph3)
Built-in Functions
The DSL provides access to numerous FFmpeg filters including:
- Scaling & Cropping: scale, crop, pad
- Color & Effects: color, hue, negate, curves, threshold
- Layout & Composition: hstack, vstack, xstack, overlay, blend
- Transforms: hflip, vflip, zoompan
- Text & Drawing: drawtext, drawgrid
- Sources: testsrc, rgbtestsrc, smptebars
- Time-based: fade, loop, trim, setpts
- Advanced: split, concat, lut, lagfun, cellauto
Parameter Passing
Positional Parameters:

(scale 1920 1080)
(crop "iw/2" "ih" "0" "0")

Named Parameters (Maps):

(color {:color "blue" :size "1920x1080" :rate 24 :duration "10"})
(drawtext {:text "Hello World" :x 100 :y 50 :fontsize 24})
Advanced Features
- Mathematical Expressions

(let [width (mod 10 6)
      size  (max 1920 1080)
      next  (inc 1919)]
  (scale width size))

- Complex Pipeline Example

(let [out-left-tmp  (output-labels "left" "tmp")
      in-tmp        (input-labels "tmp")
      out-right     (output-labels "right")
      in-left-right (input-labels "left" "right")]
  (graph (chain (crop "iw/2" "ih" "0" "0")
                (split out-left-tmp))
         (hflip in-tmp out-right)
         (hstack in-left-right)))

This compiles to the FFmpeg filtergraph:

crop=out_w=iw/2:out_h=ih:x=0:y=0,split[left][tmp];[tmp]hflip[right];[left][right]hstack
Language Design Principles
- Composability: Every construct can be composed with others
- Immutability: Variables are bound once and cannot be reassigned
- Explicit Labeling: Stream labels are first-class citizens
- Structural Equivalence: The DSL produces the same internal structures as parsing FFmpeg commands directly
- Error Handling: Comprehensive error reporting for invalid parameters and syntax
Grammar Rules (BNF Summary)
The language follows these production rules:
program = expression*
expression = atom | list | let-binding | map | compose | graph-definition | padded-graph
let-binding = '(' 'let' binding-vector expression+ ')'
graph-definition = '(' 'defgraph' symbol expression* ')'
padded-graph = '[' label+ expression* label+ ']'
compose = '(' 'compose' expression+ ')'
list = '(' expression* ')'
map = '{' mapentry* '}'
atom = number | boolean | symbol | string | keyword
Inspiration
The acts of the mind, wherein it exerts its power over simple ideas, are chiefly these three:
- Combining several simple ideas into one compound one, and thus all complex ideas are made.
- The second is bringing two ideas, whether simple or complex, together, and setting them by one another so as to take a view of them at once, without uniting them into one, by which it gets all its ideas of relations.
- The third is separating them from all other ideas that accompany them in their real existence: this is called abstraction, and thus all its general ideas are made.
—John Locke, An Essay Concerning Human Understanding (1690)