# Benchmarks
Performance comparison of django-vtasks against Celery and django-tasks-rq.
## Methodology
All benchmarks simulate an async Django view dispatching tasks — the common ASGI use case.
- Enqueue: All frameworks enqueue from an async context (`await aenqueue()` for VTasks, `sync_to_async(task.delay)` for Celery, `sync_to_async(backend.enqueue)` for RQ). This reflects real async Django views.
- Processing: Each framework runs a single worker process with 200 concurrency (except RQ, which is single-threaded).
- Task types: NoOp (raw overhead) and Sleep 10ms (simulates a lightweight DB query or API call).
- Infrastructure: Valkey 9, Python 3.14, Docker containers on the same host.
- Cloud simulation: 2ms network latency added via `tc netem` on the Valkey container, simulating cloud-hosted Valkey/ElastiCache.
- Measurement: Total time from worker start to all tasks completed, including worker startup.
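The enqueue-path difference described above can be sketched with the standard library alone. In this illustrative example, `asyncio.to_thread` stands in for asgiref's `sync_to_async` (both offload a blocking call to a worker thread), and `aenqueue`/`delay` are simplified stand-ins for the real framework calls, with a plain list replacing the Valkey-backed queue:

```python
import asyncio

queue: list[str] = []  # stand-in for the Valkey list backing the task queue

async def aenqueue(payload: str) -> None:
    """Native-async enqueue, in the style of VTasks' `await task.aenqueue()`."""
    queue.append(payload)

def delay(payload: str) -> None:
    """Blocking enqueue, in the style of Celery's `task.delay()`."""
    queue.append(payload)

async def async_view() -> None:
    # VTasks-style: awaited directly on the event loop, no thread hop.
    await aenqueue("vtasks-task")
    # Celery-style: the sync call must be offloaded to a worker thread.
    # (asyncio.to_thread stands in for asgiref's sync_to_async here.)
    await asyncio.to_thread(delay, "celery-task")

asyncio.run(async_view())
print(queue)  # ['vtasks-task', 'celery-task']
```

The thread hop is the "`sync_to_async` tax" the benchmarks measure: every wrapped enqueue pays thread-pool scheduling on top of the Valkey round trip.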
## Results — Local (0ms latency)
| Scenario | Tasks | Enqueue (ops/s) | Process (ops/s) | Peak RSS (MB) | Valkey Conns |
|---|---|---|---|---|---|
| VTasks — NoOp | 5,000 | 4,471 | 4,493 | 76 | 3 |
| VTasks — Sleep 10ms | 5,000 | 5,203 | 3,796 | 76 | 3 |
| Celery Threads — NoOp | 5,000 | 2,142 | 1,042 | 122 | 11 |
| Celery Threads — Sleep 10ms | 5,000 | 2,228 | 894 | 123 | 11 |
| RQ — NoOp | 500 | 500 | 52 | 174 | 4 |
| RQ — Sleep 10ms | 500 | 436 | 25 | 170 | 4 |
## Results — Simulated Cloud (2ms RTT)
To simulate a production environment where Valkey runs on a separate host (e.g. AWS ElastiCache), we add 2ms network latency using `tc netem` on the Valkey container.
| Scenario | Tasks | Enqueue (ops/s) | Process (ops/s) | Peak RSS (MB) | Valkey Conns |
|---|---|---|---|---|---|
| VTasks — NoOp | 5,000 | 398 | 403 | 74 | 3 |
| VTasks — Sleep 10ms | 5,000 | 395 | 411 | 74 | 3 |
| Celery Threads — NoOp | 5,000 | 293 | 121 | 122 | 11 |
| Celery Threads — Sleep 10ms | 5,000 | 292 | 121 | 122 | 11 |
| RQ — NoOp | 500 | 55 | 11 | 174 | 4 |
| RQ — Sleep 10ms | 500 | 55 | 9 | 170 | 4 |
## Analysis
### vs Celery Threads (same deployment model: 1 process, 200 concurrency)
- Enqueue: VTasks is ~2x faster — native async vs `sync_to_async` wrapping
- Processing (local): VTasks is ~4x faster for I/O tasks (3,796 vs 894 ops/s)
- Processing (cloud): VTasks is ~3.4x faster (411 vs 121 ops/s)
- Memory: VTasks uses 38% less RAM (76 MB vs 123 MB)
- Connections: VTasks uses 3 Valkey connections vs 11
### vs RQ (django-tasks ecosystem)
- Processing: VTasks is 75-150x faster — RQ is single-threaded with no async support
- Enqueue: VTasks is ~10x faster
- Memory: VTasks uses 56% less RAM
## Why VTasks is faster
- Rust I/O driver (django-vcache) — async Valkey communication with minimal overhead
- Native asyncio — no thread pool overhead for async tasks, no `sync_to_async` tax on enqueue
- Minimal connections — 2-3 multiplexed connections vs per-worker connections
- Efficient serialization — `orjson` with optional zstd compression
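The serialize-then-compress shape of that last point can be illustrated with the standard library. This sketch uses `json` and `zlib` in place of `orjson` and zstd to stay dependency-free; the envelope format (a one-byte compression flag, a size threshold of 1024 bytes) is an assumption for illustration, not django-vtasks' actual wire format:

```python
import json
import zlib

def dumps(task_payload: dict, *, compress_threshold: int = 1024) -> bytes:
    """Serialize a task payload; compress only when it is large enough to pay off."""
    raw = json.dumps(task_payload).encode()
    if len(raw) >= compress_threshold:
        return b"\x01" + zlib.compress(raw)  # 1-byte flag: compressed
    return b"\x00" + raw                     # 1-byte flag: plain

def loads(data: bytes) -> dict:
    """Inverse of dumps: check the flag byte, decompress if needed, parse."""
    body = data[1:]
    if data[:1] == b"\x01":
        body = zlib.decompress(body)
    return json.loads(body)

small = {"task": "send_email", "args": ["user@example.com"], "retries": 0}
big = {"blob": "x" * 10_000}
assert loads(dumps(small)) == small   # small payloads skip compression
assert loads(dumps(big)) == big       # large payloads round-trip compressed
```

Compressing only above a threshold matters because small payloads dominate typical task queues, and compressing them costs CPU for no byte savings.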
## Network latency impact
With 2ms RTT, all frameworks degrade significantly because each Valkey operation pays the round-trip cost. VTasks maintains its relative advantage but the absolute throughput drops ~10x. The worker currently fetches one task at a time via `BLMOVE` — a prefetch optimization using `BLMPOP` to batch-fetch tasks is planned.
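The ~10x drop follows from back-of-the-envelope arithmetic: a fetch loop that retrieves one task per round trip can never exceed 1000/RTT tasks per second, regardless of how fast the tasks themselves run. The function below is illustrative, not part of the library:

```python
def fetch_ceiling(rtt_ms: float, batch_size: int = 1) -> float:
    """Upper bound on tasks/s for a fetch loop that pays one network
    round trip per batch (ignores server time and task execution)."""
    return batch_size * 1000.0 / rtt_ms

# One task per BLMOVE round trip at 2ms RTT: 500 tasks/s ceiling,
# consistent with the ~400 ops/s measured in the cloud tables above.
print(fetch_ceiling(2.0))                 # 500.0
# Batch-fetching 10 tasks per round trip lifts the ceiling tenfold:
print(fetch_ceiling(2.0, batch_size=10))  # 5000.0
```

This is why the planned batch-fetch prefetch matters mostly in the cloud scenario: at 0ms RTT the fetch round trip is nearly free, so batching buys little locally.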
## Running Benchmarks
```bash
# Start services
docker compose up -d db cache

# Run the comparison suite
docker compose run --rm web uv run python benchmarks/compare.py

# For cloud simulation, add 2ms latency to the Valkey container
# (requires cap_add: NET_ADMIN on the cache service in compose.yml):
docker compose exec cache apt-get update -qq && apt-get install -y -qq iproute2
docker compose exec cache tc qdisc add dev eth0 root netem delay 2ms

# Run benchmarks again
docker compose run --rm web uv run python benchmarks/compare.py

# Remove latency
docker compose exec cache tc qdisc del dev eth0 root
```
## Notes
- These benchmarks focus on throughput under controlled conditions
- Real-world performance depends on task complexity, network latency, and infrastructure
- Celery offers features (chains, chords, result backends) that django-vtasks intentionally omits
- RQ uses 500 tasks (vs 5,000) because its single-threaded worker would take too long otherwise
- Celery enqueue uses `sync_to_async(task.delay)` — this is what you actually do in an async Django view
- VTasks enqueue uses native `await task.aenqueue()` — no sync wrapping needed
Benchmarks last updated: April 2026