Bioscoop
Creative coding with video

Motivation

Historically, video editing has progressed from destructive techniques (i.e., cutting the negative) to non-destructive ones (i.e., non-linear editing). The latter became possible once film moved from analog to digital. In both cases, however, the creative process is manual, or mouse-based. A different approach, often overlooked, is programmatic: the editor spells out the edits they want to apply, and an underlying system carries them out. FFmpeg is such a system, capable of carrying out pretty much any editing instruction you can think of.

FFmpeg is embedded in virtually every major platform and application that handles media, yet its use in the creative coding community to make video art is limited. Most video creators prefer to stay away from the notoriously complex and error-prone string-based syntax. Indeed, its usability is hindered by the information density of its textual interface.

Bioscoop frees FFmpeg from those shackles and puts programmability at the center of the creative process.

AST convergence

The standout feature in terms of implementation and design is that Bioscoop does both standalone compilation and macro expansion without code duplication. Early in the project, as soon as the grammar for the language was defined, I realized that I didn't want to choose between an internal and an external DSL. I wanted both.

Typically, external DSLs use a parser that produces a custom abstract syntax tree (AST), which then gets processed by a dedicated transformation pipeline specific to that parser. Meanwhile, internal DSLs use macros that directly generate target code without going through an intermediate AST structure. This creates two completely separate code generation paths, one for parsed text and another for macro-expanded forms. The result is duplicated transformation logic, potential behavioral inconsistencies between the two modalities, and significant maintenance overhead, since changes must be applied to both transformation systems independently.

Traditional DSL Implementation:

  • External DSL: Parser → AST → Transformer
  • Internal DSL: Macro → Direct Code Generation
  • Problem: Transformation logic is duplicated.

In Bioscoop, a single transform-ast multimethod handles both external and internal DSLs. The external parser processes text strings into a parse tree, which is then transformed into AST nodes. The internal macro converts Clojure forms into the exact same parse tree structure, enabling both input modalities to share the single transformation logic that converts the parse tree into the final AST structure. This convergence at the parse tree level ensures behavioral consistency while eliminating redundant transformation implementations.

AST Convergence Approach:

  • External DSL: Parser → AST → Transformer
  • Internal DSL: Macro → AST → Transformer
  • Solution: Single transformation logic.

AST Convergence occurs when both the external DSL parser and the internal DSL macro system produce identical abstract syntax trees, enabling a single transformation pipeline to handle both input modalities.
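
The idea can be sketched in a few lines of plain Clojure. This is a minimal illustration, not Bioscoop's implementation: the EDN reader stands in for the Instaparse parser, and the node shapes, the helper names (form->tree, parse-text, dsl) and the dispatch function are all assumptions. Only the name transform-ast is taken from the description above.

```clojure
(require '[clojure.edn :as edn])

;; Assumed node shape: hiccup-style vectors tagged with a node type.
(defn form->tree
  "Turn a Clojure form into a tagged parse tree."
  [form]
  (cond
    (seq? form)    (into [:list] (map form->tree form))
    (symbol? form) [:symbol (name form)]
    (number? form) [:number form]
    (string? form) [:string form]
    :else          (throw (ex-info "Unsupported form" {:form form}))))

;; External path: text in, parse tree out (EDN reader as a stand-in parser).
(defn parse-text [s]
  (form->tree (edn/read-string s)))

;; Internal path: a macro captures the unevaluated form and yields the same tree.
(defmacro dsl [form]
  `(form->tree '~form))

;; One transformation, shared by both paths.
(defmulti transform-ast first)
(defmethod transform-ast :number [[_ n]] n)
(defmethod transform-ast :string [[_ s]] s)
(defmethod transform-ast :symbol [[_ s]] s)
(defmethod transform-ast :list [[_ op & args]]
  {:filter (transform-ast op) :args (mapv transform-ast args)})

(parse-text "(scale 1920 1080)")
;; => [:list [:symbol "scale"] [:number 1920] [:number 1080]]
(transform-ast (dsl (scale 1920 1080)))
;; => {:filter "scale", :args [1920 1080]}
```

Because both entry points return the same tree, any change to transform-ast applies to text input and macro input alike.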

A camera-ready LaTeX paper (for conference submissions) is available in the repository. It elaborates further on the concept of AST convergence.

Internals

The project is structured around several key components:

DSL Parser (src/bioscoop/dsl.clj)

  • Uses Instaparse to parse a Lisp-like syntax defined in resources/lisp-grammar.bnf
  • Grammar allows for expressions like (scale 1920 1080) and (let [width 1920] (scale width 1080))
  • Supports typical Lisp constructs: functions, let bindings, symbols, keywords, strings, numbers, booleans
  • Transforms the parse tree into the internal representation

Core Data Structures

Three main records represent FFmpeg concepts:

  • Filter: A single filter operation (e.g., scale, overlay, crop)
  • FilterChain: A sequence of filters connected in series (comma-separated in FFmpeg)
  • FilterGraph: Multiple filter chains running in parallel (semicolon-separated in FFmpeg)
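
As a rough sketch, the three records could look like the following. The record names come from the list above; the field names (name, args, filters, chains) and the filter-names helper are assumptions made for illustration and may differ from the actual definitions.

```clojure
;; Field names are hypothetical.
(defrecord Filter [name args])
(defrecord FilterChain [filters])
(defrecord FilterGraph [chains])

;; Two chains running in parallel: scale then hflip, and vflip on its own.
(def example-graph
  (->FilterGraph
   [(->FilterChain [(->Filter "scale" [1920 1080])
                    (->Filter "hflip" [])])
    (->FilterChain [(->Filter "vflip" [])])]))

;; Records behave like maps, so ordinary sequence functions can inspect them.
(defn filter-names [graph]
  (for [chain (:chains graph)
        f     (:filters chain)]
    (:name f)))

(filter-names example-graph)
;; => ("scale" "hflip" "vflip")
```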

Domain Model (src/bioscoop/domain/)

  • Uses Clojure Spec to define the structure of filters, filter chains, and filtergraphs
  • Provides validation for the core data structures
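
A minimal sketch of what such specs might look like, using clojure.spec.alpha. The spec names and key names below are assumptions for illustration, not the ones defined in src/bioscoop/domain/.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical specs: a filter has a non-empty name and a vector of arguments.
(s/def ::name (s/and string? not-empty))
(s/def ::args (s/coll-of (s/or :number number? :string string?) :kind vector?))
(s/def ::filter (s/keys :req-un [::name ::args]))

;; A chain needs at least one filter, a graph at least one chain.
(s/def ::filters (s/coll-of ::filter :min-count 1))
(s/def ::filterchain (s/keys :req-un [::filters]))
(s/def ::chains (s/coll-of ::filterchain :min-count 1))
(s/def ::filtergraph (s/keys :req-un [::chains]))

(s/valid? ::filter {:name "scale" :args [1920 1080]})
;; => true
(s/valid? ::filter {:name "scale" :args "1920"})
;; => false
(s/valid? ::filtergraph {:chains [{:filters [{:name "hflip" :args []}]}]})
;; => true
```

Validating data before rendering lets malformed graphs be rejected before any FFmpeg string is produced.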

Rendering Engine (src/bioscoop/render.clj)

  • Implements the FFmpegRenderable protocol to convert data structures to FFmpeg syntax
  • Handles input/output labels for complex filtergraphs
  • Ensures proper escaping and formatting
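
The rendering step can be pictured with a small protocol-based sketch. The protocol name FFmpegRenderable is the one mentioned above; the method name to-ffmpeg, the record fields and the label-str helper are assumptions, and escaping is deliberately left out to keep the example short.

```clojure
(require '[clojure.string :as str])

(defprotocol FFmpegRenderable
  (to-ffmpeg [this] "Render this value as FFmpeg filtergraph syntax."))

;; Hypothetical record fields.
(defrecord Filter [name args inputs outputs])
(defrecord FilterChain [filters])
(defrecord FilterGraph [chains])

(defn- label-str
  "Render label names as [a][b]..."
  [labels]
  (apply str (map #(str "[" % "]") labels)))

(extend-protocol FFmpegRenderable
  Filter
  (to-ffmpeg [{:keys [name args inputs outputs]}]
    (str (label-str inputs)
         name
         (when (seq args) (str "=" (str/join ":" args)))
         (label-str outputs)))
  FilterChain
  (to-ffmpeg [{:keys [filters]}]
    (str/join "," (map to-ffmpeg filters)))   ; filters in series
  FilterGraph
  (to-ffmpeg [{:keys [chains]}]
    (str/join ";" (map to-ffmpeg chains))))   ; chains in parallel

(def mirror
  (->FilterGraph
   [(->FilterChain [(->Filter "crop" ["iw/2" "ih" 0 0] [] [])
                    (->Filter "split" [] [] ["left" "tmp"])])
    (->FilterChain [(->Filter "hflip" [] ["tmp"] ["right"])])
    (->FilterChain [(->Filter "hstack" [] ["left" "right"] [])])]))

(to-ffmpeg mirror)
;; => "crop=iw/2:ih:0:0,split[left][tmp];[tmp]hflip[right];[left][right]hstack"
```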

Bidirectional Translation

  • src/bioscoop/render.clj: Converts DSL → FFmpeg filtergraph strings
  • src/bioscoop/ffmpeg.clj: Parses existing FFmpeg filtergraph strings → DSL structures
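
To give a feel for the reverse direction, here is a deliberately naive sketch that turns a filtergraph string into plain Clojure data. It is not the parser in src/bioscoop/ffmpeg.clj: the function names and the output shape are assumptions, and it ignores quoting and escaping, so separators inside quoted arguments would be split incorrectly and malformed input is not handled.

```clojure
(require '[clojure.string :as str])

(defn parse-filter
  "Parse one filter such as \"[tmp]hflip[right]\" into a map."
  [s]
  (let [[_ ins body outs] (re-matches #"((?:\[[^\]]+\])*)([^\[\]]+)((?:\[[^\]]+\])*)"
                                      (str/trim s))
        [fname args]      (str/split body #"=" 2)
        label-names       (fn [labels] (mapv second (re-seq #"\[([^\]]+)\]" labels)))]
    {:name    fname
     :args    (if args (str/split args #":") [])
     :inputs  (label-names ins)
     :outputs (label-names outs)}))

(defn parse-filtergraph
  "Semicolons separate chains; commas separate filters within a chain."
  [s]
  (mapv (fn [chain] (mapv parse-filter (str/split chain #",")))
        (str/split s #";")))

(parse-filtergraph "scale=w=1920:h=1080,split[left][tmp];[tmp]hflip[right]")
;; => [[{:name "scale", :args ["w=1920" "h=1080"], :inputs [], :outputs []}
;;      {:name "split", :args [], :inputs [], :outputs ["left" "tmp"]}]
;;     [{:name "hflip", :args [], :inputs ["tmp"], :outputs ["right"]}]]
```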

The filtergraph

In FFmpeg, atomic editing operations—such as scaling, cropping, blending, and color correction—are implemented as filters. With over 500 filters available, FFmpeg provides extensive transformation capabilities.

When multiple filters are applied sequentially to source material, they form a filterchain, written as comma-separated commands. Filterchains can be labeled at their input and output points, allowing one chain's output to serve as another's input. These interconnected filterchains create a directed acyclic graph (DAG) structure, which FFmpeg calls a filtergraph. Filtergraphs are passed to FFmpeg using the -filter_complex parameter.

That string-based syntax maps closely onto the underlying libavfilter library that parses it. The parser (libavfilter/graphparser.c) tokenizes the string, identifying individual filters, their parameters, and the connections between them.

Each filter name in the string (like scale, overlay, colorkey) maps to a registered filter implementation in libavfilter. FFmpeg looks up these filters in its internal registry and instantiates them as AVFilterContext objects. Key-value pairs within each filter specification are parsed and passed to the filter's initialization function, which validates and stores them in the filter's private context structure.

The parser creates an AVFilterGraph object and connects the instantiated filters according to the semicolons (filterchain boundaries) and labels in the string. Each connection becomes an AVFilterLink that defines data flow between filter pads.

FFmpeg validates the complete graph topology, checking that input/output pad counts match, media types are compatible (audio vs. video), and that there are no cycles.

In summary, the string syntax maps onto libavfilter structures as follows:

  • Filter names → AVFilter structs registered in libavfilter
  • Parameters (key=value) → Filter-specific configuration passed to AVFilter->init()
  • Filterchains (comma-separated) → Linked sequences of AVFilterContext nodes
  • Labels ([label]) → Named AVFilterLink references for graph routing
  • Semicolons → Graph branching points that create multiple parallel paths
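
The role of labels in wiring the graph can be illustrated with plain data. The sketch below is only an analogy for how named outputs get paired with named inputs; it is not how libavfilter builds its AVFilterLink objects, and the data shape and the links function are assumptions.

```clojure
;; Each chain lists the labels it consumes and the labels it produces.
(def chains
  [{:id :crop-split :inputs []               :outputs ["left" "tmp"]}
   {:id :flip       :inputs ["tmp"]          :outputs ["right"]}
   {:id :stack      :inputs ["left" "right"] :outputs []}])

(defn links
  "Pair every output label with the chain that consumes it."
  [chains]
  (let [consumer (into {} (for [c chains, l (:inputs c)] [l (:id c)]))]
    (for [c chains
          l (:outputs c)
          :let [to (consumer l)]
          :when to]
      {:from (:id c) :label l :to to})))

(links chains)
;; => ({:from :crop-split, :label "left", :to :stack}
;;     {:from :crop-split, :label "tmp", :to :flip}
;;     {:from :flip, :label "right", :to :stack})
```

With the graph held as data, the edges are available for inspection before anything is handed to FFmpeg.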

This one-way translation from string to internal structures is the core problem: there's no inverse mapping. Once parsed, the AVFilterGraph exists in memory, but there's no standardized way to serialize it back or manipulate it programmatically before the string parsing step. Developers must work in strings because libavfilter's graph construction API, while programmatically accessible, is complex and poorly documented compared to the string syntax.

The string format is essentially a convenience layer over libavfilter's C API—and it's become the only practical interface, despite its limitations.

This creates several challenges:

No programmatic structure: Unlike many modern tools that use JSON, YAML, or object-based APIs, filtergraphs cannot be easily constructed, validated, or manipulated programmatically. There's no schema to reference, no type checking, and no ability to introspect the graph structure before execution.

String concatenation dependency: Building dynamic filtergraphs requires manual string concatenation, making the code fragile and error-prone. A single misplaced comma, semicolon, or bracket can break the entire pipeline, with errors only surfacing at runtime.

Limited tooling support: Because filtergraphs lack a formal representation, IDEs cannot provide syntax highlighting, auto-completion, or validation. Developers must memorize the syntax or constantly reference documentation.

Debugging difficulty: When a filtergraph fails, error messages reference the string position rather than logical components, making it hard to identify which filter or connection caused the problem.

This string-only representation means filtergraphs are essentially "write-only" code—difficult to read, maintain, and programmatically generate at scale.

The Language

The DSL accepts a Lisp-like syntax with the following core constructs:

Basic Syntax Elements

Program Structure:

  • A program consists of zero or more expressions
  • Multiple expressions are automatically composed into filter graphs

Atoms:

  • Numbers: 42, -3.14, 1920
  • Strings: "hello world", "in", "1920x1080"
  • Symbols: scale, my-filter, input-vid
  • Keywords: :input, :output, :color
  • Booleans: true, false

Core Language Constructs

  • 1. Function Calls (Lists)
    (scale 1920 1080)
    (overlay {:input "main"} {:input "overlay"})
    (drawtext "text='Hello World'" "x=100" "y=50")
    
  • 2. Variable Binding (let)
    (let [width 1920
          height 1080]
      (scale width height))
    
  • 3. Filter Chains
    (chain
      (scale 1920 1080)
      (crop "iw/2" "ih" "0" "0")
      (hflip))
    
  • 4. Filter Graphs (Parallel Processing)
    (graph
      (chain (scale 1920 1080) (crop "220"))
      (chain (hflip) (vflip)))
    
  • 5. Graph Definitions (Reusable Components)
    (defgraph my-scale (scale 1920 1080))
    (defgraph mirror-pipeline 
      (chain 
        (crop "iw/2" "ih" "0" "0") 
        (split {:output "left"} {:output "tmp"})
        (hflip {:input "tmp"} {:output "right"})
        (hstack {:input "left"} {:input "right"})))
    
  • 6. Label Management

    Explicit Labels:

    (input-labels "in" "video0")
    (output-labels "out" "processed")
    

    Inline Labels (Map Syntax):

    (scale 1920 1080 {:input "in"} {:output "scaled"})
    
  • 7. Padded Graphs (Complex Labeling)
    [[in][offset] (chain (scale 1920 1080) (crop "220")) [out]]
    [[v:0][v:1] my-complex-filter [processed]]
    
  • 8. Composition
    (compose graph1 graph2 graph3)
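
    How let bindings and helper functions might be resolved can be sketched with a tiny evaluator over quoted forms. This is an illustration under assumptions, not Bioscoop's evaluator: the names evaluate and builtins, and the choice to leave unknown symbols (such as filter names) untouched, are hypothetical.

```clojure
(def builtins
  "Hypothetical table of helper functions usable inside expressions."
  {'inc inc, 'max max, 'mod mod})

(defn evaluate
  "Resolve symbols against env, expand let bindings, apply known helpers."
  [env form]
  (cond
    ;; Bound symbols resolve to their value; anything else stays as-is.
    (symbol? form)
    (get env form form)

    ;; let extends the environment one binding at a time, left to right.
    (and (seq? form) (= 'let (first form)))
    (let [[_ bindings & body] form
          env' (reduce (fn [e [sym expr]] (assoc e sym (evaluate e expr)))
                       env
                       (partition 2 bindings))]
      (last (map #(evaluate env' %) body)))

    ;; Other lists: evaluate the arguments, then apply a helper or keep the call.
    (seq? form)
    (let [[head & args] form
          args (map #(evaluate env %) args)]
      (if-let [f (builtins head)]
        (apply f args)
        (apply list head args)))

    :else form))

(evaluate {} '(let [width 1920 height 1080] (scale width height)))
;; => (scale 1920 1080)
(evaluate {} '(let [width (mod 10 6) size (max 1920 1080)] (scale width size)))
;; => (scale 4 1920)
```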
    

Built-in Functions

The DSL provides access to numerous FFmpeg filters including:

  • Scaling & Cropping: scale, crop, pad
  • Color & Effects: color, hue, negate, curves, threshold
  • Layout & Composition: hstack, vstack, xstack, overlay, blend
  • Transforms: hflip, vflip, zoompan
  • Text & Drawing: drawtext, drawgrid
  • Sources: testsrc, rgbtestsrc, smptebars
  • Time-based: fade, loop, trim, setpts
  • Advanced: split, concat, lut, lagfun, cellauto

Parameter Passing

Positional Parameters:

(scale 1920 1080)
(crop "iw/2" "ih" "0" "0")

Named Parameters (Maps):

(color {:color "blue" :size "1920x1080" :rate 24 :duration "10"})
(drawtext {:text "Hello World" :x 100 :y 50 :fontsize 24})
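
Both styles end up in FFmpeg's filter=value:key=value notation. The sketch below shows one way the two could be rendered; render-params and render-filter are hypothetical helpers, and escaping of special characters is omitted.

```clojure
(require '[clojure.string :as str])

(defn render-params
  "Positional values render as-is; map entries render as key=value."
  [params]
  (str/join ":"
            (mapcat (fn [p]
                      (if (map? p)
                        (for [[k v] p] (str (name k) "=" v))
                        [(str p)]))
                    params)))

(defn render-filter
  [fname & params]
  (if (seq params)
    (str fname "=" (render-params params))
    fname))

(render-filter "scale" 1920 1080)
;; => "scale=1920:1080"
(render-filter "color" {:color "blue" :size "1920x1080" :rate 24})
;; => "color=color=blue:size=1920x1080:rate=24"
(render-filter "hflip")
;; => "hflip"
```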

Advanced Features

  • Mathematical Expressions
    (let [width (mod 10 6)
          size (max 1920 1080)
          next (inc 1919)]
      (scale width size))
    
  • Complex Pipeline Example
    (let [out-left-tmp (output-labels "left" "tmp")
          in-tmp (input-labels "tmp") 
          out-right (output-labels "right")
          in-left-right (input-labels "left" "right")]
      (graph
        (chain
          (crop "iw/2" "ih" "0" "0")
          (split out-left-tmp))
        (hflip in-tmp out-right)
        (hstack in-left-right)))
    

    This compiles to the FFmpeg filter graph:

    crop=out_w=iw/2:out_h=ih:x=0:y=0,split[left][tmp];[tmp]hflip[right];[left][right]hstack
    

Language Design Principles

  1. Composability: Every construct can be composed with others
  2. Immutability: Variables are bound once and cannot be reassigned
  3. Explicit Labeling: Stream labels are first-class citizens
  4. Structural Equivalence: The DSL produces the same internal structures as parsing FFmpeg commands directly
  5. Error Handling: Comprehensive error reporting for invalid parameters and syntax

Grammar Rules (BNF Summary)

The language follows these production rules:

program = expression*
expression = atom | list | let-binding | map | compose | graph-definition | padded-graph
let-binding = '(' 'let' binding-vector expression+ ')'
graph-definition = '(' 'defgraph' symbol expression* ')'
padded-graph = '[' label+ expression* label+ ']'
compose = '(' 'compose' expression+ ')'
list = '(' expression* ')'
map = '{' mapentry* '}'
atom = number | boolean | symbol | string | keyword
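
To make the atom production concrete, the following sketch classifies tokens into the five atom kinds. It borrows Clojure's EDN reader as a stand-in for the actual Instaparse grammar, so edge cases may differ from what resources/lisp-grammar.bnf accepts; atom-type is a hypothetical helper.

```clojure
(require '[clojure.edn :as edn])

(defn atom-type
  "Classify a token according to the atom alternatives in the grammar summary."
  [s]
  (let [v (edn/read-string s)]
    (cond
      (number? v)  :number
      (boolean? v) :boolean
      (string? v)  :string
      (keyword? v) :keyword
      (symbol? v)  :symbol
      :else        :not-an-atom)))

(map atom-type ["-3.14" "true" "\"1920x1080\"" ":input" "my-filter" "(scale 1920 1080)"])
;; => (:number :boolean :string :keyword :symbol :not-an-atom)
```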

Inspiration

The acts of the mind, wherein it exerts its power over simple ideas, are chiefly these three:

  1. Combining several simple ideas into one compound one, and thus all complex ideas are made.
  2. The second is bringing two ideas, whether simple or complex, together, and setting them by one another so as to take a view of them at once, without uniting them into one, by which it gets all its ideas of relations.
  3. The third is separating them from all other ideas that accompany them in their real existence: this is called abstraction, and thus all its general ideas are made.

—John Locke, An Essay Concerning Human Understanding (1690)

Author: Daniel Szmulewicz

Created: 2025-10-07 Tue 02:46
