Why We Built Tornado in Rust: Performance at Scale
When we set out to build Tornado, we needed a technology stack that could handle thousands of concurrent downloads while maintaining reliability and minimizing resource usage. Here's why we chose Rust.
The Challenge
Video downloading at scale presents unique challenges:
- High concurrency — We process 600-1000 videos/hour per cluster
- Network-bound I/O — Downloads are limited by bandwidth, not CPU
- Memory efficiency — Streaming large files without loading into memory
- Reliability — Jobs must complete even when individual downloads fail
Why Rust?
Zero-Cost Abstractions
Rust's ownership model and zero-cost abstractions let us write high-level code that compiles to efficient machine code. There are no garbage collector pauses, and abstractions such as iterators and generics add little or no runtime cost.
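As a small illustration (not taken from Tornado's codebase), the sketch below totals the bytes of completed downloads twice: once with an iterator chain and once with a hand-written loop. The `Download` struct and both function names are hypothetical. The compiler can generally optimize the two versions to similar machine code, which is the point of a zero-cost abstraction.

```rust
/// Hypothetical record for one download; the fields are illustrative only.
pub struct Download {
    pub bytes: u64,
    pub completed: bool,
}

/// High-level version: filter and sum with iterator adapters.
pub fn total_completed_bytes(downloads: &[Download]) -> u64 {
    downloads
        .iter()
        .filter(|d| d.completed)
        .map(|d| d.bytes)
        .sum()
}

/// Hand-written loop that computes the same total.
pub fn total_completed_bytes_loop(downloads: &[Download]) -> u64 {
    let mut total = 0;
    for d in downloads {
        if d.completed {
            total += d.bytes;
        }
    }
    total
}
```

Both functions return the same value for any input slice; the iterator version simply states the intent more directly.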
Fearless Concurrency
In safe Rust, the ownership and type system rule out data races at compile time. This is critical when managing thousands of concurrent connections and shared state.
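To make this concrete, here is a minimal sketch using only the standard library. Several threads update one shared map of finished-job counts. The function name `record_jobs` and the map layout are assumptions for illustration, not Tornado's actual state model.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `workers` threads that each record `jobs_per_worker` finished jobs
/// in a shared map, then returns the total number of jobs recorded.
pub fn record_jobs(workers: usize, jobs_per_worker: usize) -> usize {
    // Arc gives shared ownership across threads; Mutex serializes access.
    let finished: Arc<Mutex<HashMap<usize, usize>>> =
        Arc::new(Mutex::new(HashMap::new()));

    let handles: Vec<_> = (0..workers)
        .map(|worker_id| {
            let finished = Arc::clone(&finished);
            thread::spawn(move || {
                for _ in 0..jobs_per_worker {
                    // The lock is released at the end of each iteration.
                    let mut map = finished.lock().unwrap();
                    *map.entry(worker_id).or_insert(0) += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let map = finished.lock().unwrap();
    let total: usize = map.values().sum();
    total
}
```

If the `Arc<Mutex<...>>` wrapper is removed and the threads try to mutate a plain `HashMap` directly, the program does not compile. That rejection is the guarantee described above.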
Tokio Async Runtime
Tokio is the most widely used async runtime for Rust. It provides:
- Efficient async I/O for network operations
- A work-stealing scheduler that balances load across threads
- Integration with the Tower ecosystem for middleware and utilities
Architecture Overview
┌─────────────────────────────────────────┐
│               API Gateway               │
│               (Actix-Web)               │
└─────────────────┬───────────────────────┘
                  │
┌─────────────────▼───────────────────────┐
│                Job Queue                │
│                 (Redis)                 │
└─────────────────┬───────────────────────┘
                  │
┌─────────────────▼───────────────────────┐
│             Worker Cluster              │
│     ┌─────────────────────────────┐     │
│     │  Downloader (yt-dlp)        │     │
│     │  Muxer (FFmpeg)             │     │
│     │  Uploader (S3/Azure/GCS)    │     │
│     └─────────────────────────────┘     │
└─────────────────────────────────────────┘
Key Design Decisions
Streaming Uploads
We never load entire files into memory. Downloads stream directly to cloud storage using multipart uploads, handling files of any size efficiently.
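The idea can be sketched with the standard library alone. The function below reads any `Read` source through a single reusable buffer and hands each filled chunk to a callback, which stands in for one part of a multipart upload. The names `stream_in_parts` and `upload_part` are hypothetical, and a real uploader would call a cloud storage SDK inside the callback.

```rust
use std::io::{self, Read};

/// Reads `source` in chunks of at most `part_size` bytes and hands each chunk
/// to `upload_part` together with a 1-based part number.
/// Returns the number of parts sent.
pub fn stream_in_parts<R, F>(
    mut source: R,
    part_size: usize,
    mut upload_part: F,
) -> io::Result<usize>
where
    R: Read,
    F: FnMut(usize, &[u8]) -> io::Result<()>,
{
    assert!(part_size > 0, "part_size must be non-zero");
    // The only buffer; it is reused for every part.
    let mut buffer = vec![0u8; part_size];
    let mut part_number = 0;

    loop {
        // Fill the buffer; a single read() may return fewer bytes than asked.
        let mut filled = 0;
        while filled < part_size {
            let n = match source.read(&mut buffer[filled..]) {
                Ok(n) => n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            };
            if n == 0 {
                break; // end of stream
            }
            filled += n;
        }
        if filled == 0 {
            break; // nothing left to send
        }
        part_number += 1;
        upload_part(part_number, &buffer[..filled])?;
        if filled < part_size {
            break; // a short part means the stream has ended
        }
    }
    Ok(part_number)
}
```

Memory use is bounded by `part_size` regardless of how large the source is, which is what makes files of any size manageable.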
Graceful Degradation
When proxies get blocked or sources become unavailable, the system automatically retries with different configurations. Jobs fail only after exhausting all options.
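A minimal sketch of this retry pattern follows. The `AttemptConfig` struct, its fields, and `run_with_fallbacks` are hypothetical names chosen for illustration; the real system presumably tracks more than a proxy and a format.

```rust
/// Hypothetical per-attempt configuration; the fields are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct AttemptConfig {
    pub proxy: String,
    pub format: String,
}

/// Tries `attempt` with each configuration in order and returns the first
/// success. If every configuration fails, returns all collected errors.
pub fn run_with_fallbacks<T, E, F>(
    configs: &[AttemptConfig],
    mut attempt: F,
) -> Result<T, Vec<E>>
where
    F: FnMut(&AttemptConfig) -> Result<T, E>,
{
    let mut errors = Vec::new();
    for config in configs {
        match attempt(config) {
            Ok(value) => return Ok(value),
            // Remember why this option failed, then move on to the next one.
            Err(err) => errors.push(err),
        }
    }
    Err(errors)
}
```

The job is reported as failed only when the list of configurations is exhausted, and the caller receives every error for diagnosis.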
Horizontal Scaling
Each worker node is stateless. Because workers share no local state, throughput grows roughly linearly as nodes are added. Redis handles job distribution and state management.
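The worker model can be sketched in a single process. In the example below, an in-memory channel stands in for the Redis queue and threads stand in for worker nodes; both substitutions are assumptions made to keep the example self-contained, and `run_workers` is a hypothetical name.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `worker_count` identical workers that pull job IDs from one shared
/// queue until it is empty, and returns how many jobs each worker handled.
pub fn run_workers(worker_count: usize, jobs: Vec<u32>) -> Vec<usize> {
    let (sender, receiver) = mpsc::channel::<u32>();
    for job in jobs {
        sender.send(job).unwrap();
    }
    // Closing the channel lets workers exit once the queue drains.
    drop(sender);

    let receiver = Arc::new(Mutex::new(receiver));
    let handles: Vec<_> = (0..worker_count)
        .map(|_| {
            let receiver = Arc::clone(&receiver);
            thread::spawn(move || {
                let mut handled = 0;
                loop {
                    // Hold the lock only while taking the next job.
                    let next = receiver.lock().unwrap().recv();
                    match next {
                        // A real worker would download and upload here.
                        Ok(_job_id) => handled += 1,
                        // The queue is empty and closed.
                        Err(_) => break,
                    }
                }
                handled
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Workers keep no state between jobs, so changing `worker_count` changes only how the same queue is divided among them.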
Performance Numbers
| Metric | Value |
|---|---|
| Videos/hour (10-node cluster) | 600-1000 |
| Memory per worker | ~200 MB |
| CPU utilization | ~30% (network-bound) |
| P99 latency (API) | < 50ms |
Why Not Python or Node.js?
Most download tools and APIs are built with Python or Node.js. These are fine for prototypes, but they struggle at the scale we operate:
| Aspect | Rust (Tornado) | Python | Node.js |
|---|---|---|---|
| Memory per 1000 connections | ~200 MB | ~2 GB | ~800 MB |
| CPU overhead | Near zero | GIL bottleneck | Event loop limits |
| Concurrency model | Multi-threaded async (Tokio) | asyncio (single-threaded event loop) | Single-threaded event loop |
| Crash safety | Compile-time guarantees | Runtime exceptions | Runtime exceptions |
| Deploy size | ~15 MB binary | ~500 MB with deps | ~200 MB with deps |
When you're processing 400+ TB daily, every percentage point of efficiency matters. Rust lets us run more workers per machine, handle more concurrent downloads per worker, and maintain sub-50ms API latency even under heavy load.
The Anti-Restriction Engine
The most critical part of Tornado isn't the download pipeline—it's the anti-restriction engine. This is what sets us apart from open-source tools like yt-dlp that can't maintain reliable access at scale.
Our engine manages:
- Proxy selection — Intelligent routing through residential proxies based on target, geography, and recent success rates
- Session management — Cookie and authentication state maintained per-proxy to mimic real user behavior
- Fingerprint randomization — TLS fingerprints, HTTP headers, and request patterns vary per session
- Rate limiting — Adaptive rate control that adjusts throughput based on platform response signals
- Automatic recovery — When a proxy gets flagged, the system seamlessly switches to a fresh one mid-download
All of this is written in Rust for maximum performance. The engine makes sub-millisecond decisions on proxy selection and can process thousands of routing decisions per second.
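As a rough sketch of the proxy selection step only, the function below picks the proxy with the best recent success rate for a target region. The `ProxyStats` fields, the add-one smoothing, and the scoring rule are all assumptions made for illustration; the actual engine's selection logic is not described in this post.

```rust
/// Hypothetical per-proxy statistics; the fields are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct ProxyStats {
    pub id: String,
    pub region: String,
    pub successes: u32,
    pub failures: u32,
}

impl ProxyStats {
    /// Success rate with add-one smoothing, so an untried proxy scores 0.5
    /// instead of causing a division by zero.
    pub fn success_rate(&self) -> f64 {
        let successes = self.successes as f64;
        let failures = self.failures as f64;
        (successes + 1.0) / (successes + failures + 2.0)
    }
}

/// Picks the proxy in `region` with the highest smoothed success rate.
/// Returns `None` when no proxy serves that region.
pub fn select_proxy<'a>(
    proxies: &'a [ProxyStats],
    region: &str,
) -> Option<&'a ProxyStats> {
    proxies
        .iter()
        .filter(|p| p.region == region)
        .max_by(|a, b| a.success_rate().total_cmp(&b.success_rate()))
}
```

A selection like this is a linear scan over in-memory statistics with no allocation, which is the kind of work that fits comfortably in a sub-millisecond budget for modest pool sizes.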
Lessons Learned
- Rust's learning curve is worth it: The upfront investment pays off in reliability, performance, and reduced operational costs.
- Async is essential for I/O-bound workloads: Tokio handles thousands of connections with minimal resource usage.
- Type safety catches bugs early: Many potential runtime errors become compile-time errors, which is critical for a service with a 99.9% uptime SLA.
- Performance enables features: Being fast enough to process terabytes per hour means we can offer pricing that makes large-scale AI data collection economically viable.
What This Means for You
You don't need to care about our tech stack to use Tornado; it's a simple REST API. But our architecture is why you can download several terabytes per hour without a single 403 error, while other solutions struggle with a few hundred videos.
Whether you're building AI training datasets, powering a long-form to short-form video AI pipeline, or running any media processing workflow at scale, Tornado's Rust-powered infrastructure ensures your downloads are fast, reliable, and cost-effective.