Pixel Canvas

Pixel Canvas is a real-time collaborative canvas that I’m building as a learning project to better understand Elixir and OTP. I chose it as a use case that is easy to understand and test, yet has high-throughput, low-latency requirements.

Constraints

Because this project is meant to help me deeply understand the building blocks of OTP, I decided to take a no-dependencies approach. This means:

  • manual TCP/IP handling
  • hand-rolled WebSocket implementation (a frame-parsing sketch follows this list)
  • custom binary WS protocol
  • custom telemetry infrastructure
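
To give a sense of what hand-rolled means here, below is a minimal sketch of parsing a masked client WebSocket frame per RFC 6455 with binary pattern matching. The module name and the simplifications (no extended payload lengths, no fragmentation) are illustrative; this is not the project’s actual code.

    defmodule WSFrame do
      # Match a masked client frame with a 7-bit payload length (< 126).
      # Extended 16/64-bit lengths and fragmentation are omitted for brevity.
      def parse(<<fin::1, _rsv::3, opcode::4, 1::1, len::7,
                  mask::binary-size(4), payload::binary-size(len),
                  rest::binary>>)
          when len < 126 do
        {:ok, %{fin?: fin == 1, opcode: opcode, payload: unmask(payload, mask)}, rest}
      end

      def parse(_partial), do: :incomplete

      # XOR the payload against the repeating 4-byte masking key.
      defp unmask(payload, mask) do
        pad = :binary.copy(mask, div(byte_size(payload), 4) + 1)
        :crypto.exor(payload, binary_part(pad, 0, byte_size(payload)))
      end
    end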

AI Usage

During this process, I’ve been refining an AI-based learning framework. I wanted to leverage Claude as a mentor to help me along the way, but I set a hard rule that I wouldn’t let the LLM write any of my application code. The first commit in this project includes a CLAUDE.md that directs the agent to NEVER make direct code changes. At a high level, I directed Claude to:

  • provide honest, even harsh code reviews and architectural feedback
  • discuss architectural options with me without jumping straight to the “correct” answer
  • write unit tests based on specs I’ve settled on, but before I implement (classic TDD flow)

This has been like a more interactive codecrafters.io - having an expert available to explain tradeoffs and catch mistakes in real time has dramatically accelerated my learning.

Scaling Goals

I have set some very high goals for this project in an attempt to really stretch into handling large-scale traffic. The numbers are intentionally unrealistic - I’m simulating the kind of worst-case sustained load an IoT aggregator might see:

  • accept and broadcast 1M pixel changes per second
  • each client can send up to 60 messages/second
  • this requires the app to handle ~17k concurrent connections (1,000,000 ÷ 60 ≈ 16,700) all maxing out their rate limits (one way to enforce that cap is sketched below)
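
For concreteness, here is a minimal fixed-window sketch of the kind of per-client rate check the 60 messages/second cap implies. The module name, state shape, and windowing strategy are my assumptions, not the project’s implementation.

    defmodule RateLimit do
      @limit 60          # messages allowed per window
      @window_ms 1_000   # window length in milliseconds

      # Fresh limiter state for a new connection.
      def new, do: {@limit, System.monotonic_time(:millisecond)}

      # State is {tokens_left, window_started_at}; returns {:ok | :drop, new_state}.
      def check({tokens, started_at} = state) do
        now = System.monotonic_time(:millisecond)

        cond do
          # Window expired: start a fresh one and spend a token.
          now - started_at >= @window_ms -> {:ok, {@limit - 1, now}}
          # Tokens remain in the current window: spend one.
          tokens > 0 -> {:ok, {tokens - 1, started_at}}
          # Client is over its 60 msg/s budget: drop the message.
          true -> {:drop, state}
        end
      end
    end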

Since I don’t have a bunch of IoT devices, the canvas gives me something people can easily interact with while I stress-test the underlying infrastructure.

Current Progress

This is a pretty ambitious project that I will probably be woodshedding for a while. As of 2025-10-22, I have accomplished the following:

  • a working prototype that handles ~70k messages per second across 500 connections
  • simple hand-rolled telemetry infrastructure for measuring performance (sketched below)
  • in-project load tests for local testing
  • the beginnings of a distributed load tester
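
As an example of what dependency-free telemetry can look like, this sketch counts events with Erlang’s :counters, which supports cheap concurrent increments. The module and metric names are illustrative rather than the project’s actual ones.

    defmodule Metrics do
      @metrics [:messages_in, :messages_out, :connections]

      # Allocate one counter slot per metric and share the ref process-wide.
      def init do
        ref = :counters.new(length(@metrics), [:write_concurrency])
        :persistent_term.put(__MODULE__, ref)
      end

      # Hot path: bump a counter without any message passing or locks.
      def incr(metric, by \\ 1) do
        :counters.add(:persistent_term.get(__MODULE__), index(metric), by)
      end

      # A reporter process can snapshot everything once a second.
      def snapshot do
        ref = :persistent_term.get(__MODULE__)
        for m <- @metrics, into: %{}, do: {m, :counters.get(ref, index(m))}
      end

      defp index(metric), do: Enum.find_index(@metrics, &(&1 == metric)) + 1
    end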

The main bottleneck I’ve identified is message buildup in process mailboxes - to get from 70k to 1M msg/s (a ~14x improvement), I need to address GenServer mailbox congestion.
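
For reference, mailbox congestion is straightforward to observe from an IEx shell with Process.info/2; the registered name here is hypothetical.

    # Check how many messages are waiting in a suspect process's mailbox.
    pid = Process.whereis(:broadcaster)   # hypothetical registered name
    {:message_queue_len, depth} = Process.info(pid, :message_queue_len)
    IO.puts("broadcaster mailbox depth: #{depth}")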

What’s Next

  • connection pooling to reduce the number of spawned processes
  • broadcast pooling to distribute the message-queue load across multiple processes (see the sharding sketch after this list)
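
One common shape for that broadcast pool is to hash each pixel update onto a fixed set of worker processes so that no single mailbox absorbs the whole stream. This is a sketch of the general technique under my own naming, not the project’s design.

    defmodule BroadcastPool do
      @pool_size 16

      # Start @pool_size broadcaster processes, each with its own mailbox.
      def start do
        for i <- 0..(@pool_size - 1) do
          pid = spawn_link(fn -> loop() end)
          Process.register(pid, name(i))
        end

        :ok
      end

      # Hash the pixel's coordinates onto one worker so broadcast fan-out
      # spreads across @pool_size mailboxes instead of piling up in one.
      def publish({x, y, _color} = pixel) do
        send(name(:erlang.phash2({x, y}, @pool_size)), {:broadcast, pixel})
      end

      defp name(i), do: :"broadcaster_#{i}"

      defp loop do
        receive do
          {:broadcast, _pixel} ->
            # fan out to this shard's subscribers here
            loop()
        end
      end
    end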
