Trust the Data, Not Your Imagination

March 24, 2026 · Oskar Austegard

Claude Opus can look at an image, understand what it sees, and write SVG code to reproduce it. Over 31 painstaking iterations on Kandinsky's Around the Circle (1940), it produced increasingly faithful reproductions — hand-drawing each path from visual interpretation, adjusting colors by eye, layering shapes based on artistic judgment.

The result after all that work: 1,064 paths, 959KB, and a recognizable reproduction.

Then I ran a 20-second automated pipeline on the same image and got comparable fidelity at well under half the file size (358KB versus 959KB).

The Experiment

The starting point was three SVG snapshots from a marathon Opus session — versions 9, 14, and 31 of 31 attempts to reproduce Kandinsky's painting. Version 9 was the earliest approach: clean geometric primitives (circles, rectangles, polygons) placed by visual interpretation. It looked like a diagram of the painting rather than a reproduction. By version 14, Opus had switched to organic traced paths, and the biomorphic forms finally started to flow. Versions 14 through 31 were incremental refinements within that paradigm.

Watching this progression made the failure mode obvious: Claude's visual interpretation of images is unreliable for precise color matching, shape positioning, and spatial relationships. Each iteration was an LLM guessing at hex values, estimating coordinates, and hoping the shapes would land in the right place. Sometimes they did. Often they didn't. 31 iterations is a lot of correction cycles.

So I built a skill around a different principle: every shape, color, and position must come from computational analysis of the source pixels.

The Pipeline

The pipeline is straightforward computer vision:

1. Preprocessing: bilateral filter to remove texture while preserving edges, then gentle Gaussian blur
2. Color quantization: K-means clustering (K=28–36) on the pixel color space, reducing millions of colors to a manageable palette
3. Background detection: identify background clusters by edge contact (colors that dominate image borders)
4. Contour extraction: for each color cluster, extract filled contours via OpenCV, simplify with approxPolyDP
5. Z-ordering: painter's algorithm, largest shapes first
6. SVG assembly: emit paths with extracted hex colors
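To make the quantization and background-detection steps concrete, here is a minimal pure-Python sketch on a toy image. It is not the skill's code: the skill uses OpenCV on real images, and the function names, the `min_share=0.3` border threshold, and the 6x6 test image are all assumptions made for illustration.

```python
import random
from collections import Counter

def nearest(color, centers):
    """Index of the center closest to `color` in squared RGB distance."""
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(color, centers[c])))

def kmeans_colors(pixels, k, iters=10, seed=0):
    """Plain Lloyd's k-means over RGB tuples. Returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(pixels)), k)  # k distinct starting colors
    for _ in range(iters):
        labels = [nearest(p, centers) for p in pixels]
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centers, [nearest(p, centers) for p in pixels]

def background_clusters(labels, width, height, min_share=0.3):
    """Clusters that own at least `min_share` of the border pixels (assumed rule)."""
    border = [labels[y * width + x]
              for y in range(height) for x in range(width)
              if x in (0, width - 1) or y in (0, height - 1)]
    counts = Counter(border)
    return {c for c, n in counts.items() if n / len(border) >= min_share}

# Toy 6x6 image: a slightly noisy light border around a slightly noisy red block.
W = H = 6
pixels = []
for y in range(H):
    for x in range(W):
        if 1 <= x <= 4 and 1 <= y <= 4:
            pixels.append((200, 30, 30) if (x + y) % 2 else (210, 20, 20))
        else:
            pixels.append((250, 250, 250) if (x + y) % 2 else (245, 245, 245))

centers, labels = kmeans_colors(pixels, k=2)
bg = background_clusters(labels, W, H)
palette = ["#{:02x}{:02x}{:02x}".format(*(round(v) for v in c)) for c in centers]
print(palette, "background clusters:", bg)
```

The point of the sketch is that both the palette and the background decision fall out of pixel statistics. Nothing in it requires looking at the image.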

No visual interpretation involved. The pipeline doesn't know it's looking at a Kandinsky — it sees pixel clusters and contour boundaries.
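As a rough illustration of what happens to a contour boundary on its way into the SVG, the sketch below simplifies a closed contour, orders shapes by area, and emits paths. OpenCV documents approxPolyDP as using the Douglas-Peucker algorithm, so a small pure-Python version stands in for it here. The helper names and the treatment of closed contours are assumptions, not the skill's implementation.

```python
import math

def rdp(points, eps):
    """Ramer-Douglas-Peucker on an open polyline given as a list of (x, y)."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    seg = math.hypot(x2 - x1, y2 - y1)

    def dist(p):
        if seg == 0:  # closed loop: measure from the shared endpoint instead
            return math.hypot(p[0] - x1, p[1] - y1)
        return abs((x2 - x1) * (y1 - p[1]) - (x1 - p[0]) * (y2 - y1)) / seg

    idx = max(range(1, len(points) - 1), key=lambda i: dist(points[i]))
    if dist(points[idx]) <= eps:
        return [points[0], points[-1]]
    return rdp(points[:idx + 1], eps)[:-1] + rdp(points[idx:], eps)

def simplify_closed(contour, eps):
    """Close the contour, run RDP, then drop the duplicated start point."""
    return rdp(list(contour) + [contour[0]], eps)[:-1]

def polygon_area(poly):
    """Shoelace formula, absolute value so winding direction is irrelevant."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))) / 2

def to_svg(shapes, width, height, background="#ffffff"):
    """shapes: list of (polygon, (r, g, b)). Largest area is emitted first,
    so smaller shapes are painted on top of it."""
    out = [f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width} {height}">',
           f'<rect width="{width}" height="{height}" fill="{background}"/>']
    for poly, rgb in sorted(shapes, key=lambda s: polygon_area(s[0]), reverse=True):
        d = "M" + " L".join(f"{x:g},{y:g}" for x, y in poly) + " Z"
        out.append('<path d="{}" fill="#{:02x}{:02x}{:02x}"/>'.format(d, *rgb))
    out.append("</svg>")
    return "\n".join(out)

# A square traced with redundant midpoints, plus a smaller square inside it.
square = [(0, 0), (5, 0), (10, 0), (10, 5), (10, 10), (5, 10), (0, 10), (0, 5)]
big = simplify_closed(square, eps=1.0)  # collinear midpoints drop out
small = [(2, 2), (4, 2), (4, 4), (2, 4)]
svg = to_svg([(small, (255, 0, 0)), (big, (0, 0, 255))], 10, 10)
print(svg)
```

SVG paints elements in document order, which is why sorting by descending area is enough to implement a painter's algorithm: the small red square is emitted after the large blue one even though it was passed in first.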

Kandinsky: The Scorecard

Five-way comparison of the original Kandinsky, Opus v9, v14, v31, and the skill pipeline output. Top row: original, Opus v9 (geometric), Opus v14 (organic paths). Bottom row: Opus v31 (final), skill pipeline (single pass).

Metric	Opus v9	Opus v14	Opus v31	Skill
RMSE	55.2	22.3	23.3	23.5
Mean ΔE	69.4	21.6	23.4	21.5
Correlation	0.12	0.86	0.85	0.84
File size	29KB	825KB	959KB	358KB
Paths	~300	986	1,064	825
Time	31 iterations (v1 through v31)			20 seconds

The skill's single automated pass comes within a point or two of v14 and v31 on RMSE and mean ΔE, and within 0.02 on correlation, at less than half the file size of either. Its mean color distance is narrowly the lowest of all the SVG versions (21.5 against 21.6 for v14), because K-means extracted colors from actual pixels instead of Claude eyeballing hex values.
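For readers who want to reproduce a scorecard like this one, here is a minimal sketch of the three fidelity metrics. The post does not say exactly how they were computed, so everything here is an assumption: RMSE on 0-255 channel values, Pearson correlation over flattened channels, and ΔE as the CIE76 distance in Lab with a D65 white point. Images are plain lists of RGB tuples to keep the example dependency-free.

```python
import math

def rmse(img_a, img_b):
    """Root-mean-square error over every channel value (0-255 scale)."""
    sq = [(a - b) ** 2 for pa, pb in zip(img_a, img_b) for a, b in zip(pa, pb)]
    return math.sqrt(sum(sq) / len(sq))

def correlation(img_a, img_b):
    """Pearson correlation of the flattened channel values."""
    xs = [v for p in img_a for v in p]
    ys = [v for p in img_b for v in p]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def srgb_to_lab(rgb):
    """8-bit sRGB to CIE L*a*b* (D65 white point)."""
    def linear(c):
        c = c / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linear(c) for c in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    def f(t):
        return t ** (1 / 3) if t > 216 / 24389 else (24389 / 27 * t + 16) / 116
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def mean_delta_e(img_a, img_b):
    """Mean CIE76 color difference (Euclidean distance in Lab) per pixel."""
    total = sum(math.dist(srgb_to_lab(a), srgb_to_lab(b))
                for a, b in zip(img_a, img_b))
    return total / len(img_a)

# Hypothetical four-pixel "images": the render is off by 10 in every channel.
reference = [(0, 0, 0), (255, 255, 255), (200, 30, 30), (30, 30, 200)]
render = [(10, 10, 10), (245, 245, 245), (190, 40, 40), (40, 40, 190)]
print(rmse(reference, render), correlation(reference, render),
      mean_delta_e(reference, render))
```

In practice the SVG would first be rasterized at the source resolution so the two pixel lists line up. The skill lists librsvg2-bin for rendering verification, but that step is outside this sketch.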

Here's the skill's Kandinsky reproduction as an SVG you can zoom into:

SVG reproduction of Kandinsky's Around the Circle: 358KB, 825 paths, 20 seconds.

For comparison, here's Opus's early geometric attempt — version 9 — to illustrate the gap that visual interpretation has to cross:

Opus v9 geometric SVG attempt. The v9 "diagram" approach: clean primitives, correct composition, but it reads as a schematic rather than a reproduction.

The Sfumato Stress Test

Kandinsky's geometric abstraction is arguably the easy case for flat-fill SVG — the painting already consists of discrete color regions. The real test: Leonardo's Mona Lisa, where the entire technique is built on invisible tonal gradients.

Three-way comparison of the original Mona Lisa, the quantized ceiling, and the SVG output. Left: original. Center: quantized image (K=36), the theoretical ceiling. Right: SVG extraction (2,505 paths).

The numbers are counterintuitively good: RMSE 19.0, correlation 0.93 — better than Kandinsky. That's because the Mona Lisa has large continuous tonal regions; even quantized into 36 colors, the big shapes occupy roughly the right spatial areas and the pixel error stays low.

But the numbers lie about perceptual quality. Look at the face. Leonardo's sfumato — the invisible transitions between light and shadow at the corners of the mouth, the modeling around the eyes — gets carved into discrete zones with hard edges. The famous smile, which exists entirely in the gradient between light and shadow, becomes a harder line. The result has a woodcut quality that's immediately visible even though the metrics say it's close.
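A one-dimensional toy case shows how a low pixel error and visible banding can coexist. The sketch below quantizes a smooth 256-step ramp into 8 flat bands. Uniform bands stand in for K-means here (an assumption made for simplicity), and the ramp is a stand-in for a sfumato gradient, not data from the painting.

```python
import math

def quantize_row(row, bands, levels=256):
    """Replace each value with the midpoint of its band (uniform bands)."""
    step = levels // bands
    return [(v // step) * step + (step - 1) / 2 for v in row]

def rmse(a, b):
    """Root-mean-square error between two equal-length rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def max_jump(row):
    """Largest difference between neighboring pixels."""
    return max(abs(b - a) for a, b in zip(row, row[1:]))

gradient = list(range(256))            # smooth ramp, neighbors differ by 1
banded = quantize_row(gradient, bands=8)

print(round(rmse(gradient, banded), 2))        # 9.23
print(max_jump(gradient), max_jump(banded))    # 1 32.0
```

The error metric stays under 10 on a 0-255 scale, yet the largest step between neighboring pixels grows from 1 to 32. Averaged metrics barely register those seven hard edges, while the eye goes straight to them.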

SVG reproduction of the Mona Lisa: 1.27MB, 2,505 paths, 75 seconds. Unmistakably her, but the sfumato is gone.

What This Shows

The interesting finding isn't that computer vision beats visual interpretation — that's expected. It's the specific failure modes on each side.

Visual interpretation fails at precision. When Claude hand-draws SVG from looking at an image, it gets the gestalt right but the details wrong. Colors drift. Shapes land in approximately the right place. Each iteration fixes some errors and introduces others. 31 iterations is a lot of human-in-the-loop correction for a result that an automated pipeline matches in seconds.

Automated extraction fails at semantics. The pipeline doesn't know that the red circle should be perfectly round, or that the smile should be a smooth transition rather than a polygon boundary. It faithfully reproduces whatever the contour extraction finds — including artifacts from the K-means boundaries. It can't make the judgment calls that would push quality beyond what the pixel data directly provides.
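To give a sense of what one such judgment call would look like if it were encoded, the sketch below tests whether an extracted contour is nearly circular and, if so, proposes a circle to replace it. This is a hypothetical post-processing step, not something the skill is described as doing. The 0.98 threshold and the use of the vertex average as the center (which assumes evenly spaced contour points) are illustrative assumptions.

```python
import math

def circularity(poly):
    """4*pi*area / perimeter**2: approaches 1.0 for a circle, lower otherwise."""
    edges = list(zip(poly, poly[1:] + poly[:1]))
    area = abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in edges)) / 2
    perimeter = sum(math.hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in edges)
    return 4 * math.pi * area / perimeter ** 2

def as_circle(poly, threshold=0.98):
    """Return (cx, cy, r) if the contour is nearly circular, else None."""
    if circularity(poly) < threshold:
        return None
    cx = sum(x for x, _ in poly) / len(poly)
    cy = sum(y for _, y in poly) / len(poly)
    r = sum(math.hypot(x - cx, y - cy) for x, y in poly) / len(poly)
    return cx, cy, r

# A 64-point contour sampled from a circle, and a square for contrast.
ring = [(50 + 20 * math.cos(2 * math.pi * i / 64),
         40 + 20 * math.sin(2 * math.pi * i / 64)) for i in range(64)]
square = [(0, 0), (10, 0), (10, 10), (0, 10)]

print(as_circle(ring))    # roughly (50, 40, 20)
print(as_circle(square))  # None
```

Even this small rule shows why semantics are hard to automate. A threshold cannot tell a circle the painter intended from a blob that happens to be round, which is exactly the kind of call that needs an understanding of the image.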

Both hit the same ceiling. The quantized image scores an RMSE of 9.9 for Kandinsky and 11.8 for the Mona Lisa, against 23.5 and 19.0 for the SVGs extracted from it, so there is still a significant gap between what the pixel data knows and what contour extraction can represent. That gap is texture: the micro-variation in brushwork, the granular color transitions, the material quality of paint on canvas. Flat-fill SVG polygons can't capture it without fundamentally different techniques such as SVG filters, gradient meshes, noise textures, or vastly more overlapping semi-transparent paths.

The Skill

The image-to-svg skill (v1.0.0) is available in the claude-skills repo. It's designed for Claude.ai's containerized compute environment and relies on opencv-python-headless, scikit-image, and scipy, plus librsvg2-bin for rendering verification.

The core principle that emerged from watching 31 iterations fail in instructive ways: trust the data, not your imagination. Claude's visual interpretation is unreliable for precise spatial reasoning. Every shape, color, and position should come from computational analysis of source pixels. The skill codifies this into a pipeline with explicit anti-patterns — never hand-draw shapes, never claim a fix works without rendering, never boost saturation globally.

For photographic or painterly images, the current pipeline hits the flat-fill ceiling hard. A v2 could explore higher K values (48–64), finer polygon approximation, hierarchical contour extraction for nested shapes, and optional SVG filters to re-soften skin tones. But for graphic art, illustrations, and geometric compositions, a single pass already matches what iterative visual interpretation achieves — at a fraction of the cost.

All reproductions generated by Claude using OpenCV and K-means clustering. Kandinsky's Around the Circle (1940) and the Mona Lisa are in the public domain. The image-to-svg skill is open source.