Polyglot Benchmarks: Right Tool for the Job

A short, skim-readable companion. For the full architecture deep-dive — harness layout, decision matrix, anti-patterns, and how to fork the benchmarks — read the long article.

Your org picked one stack in 2019. Your workloads did not.

Most teams still standardise on a single language for every new service: everything in Node, everything in Python, or a heroic bet on Rust for greenfield only. That habit is understandable — hiring, CI templates, and security review all get easier when the stack is uniform. The problem is that workloads are not uniform. An API gateway, a nightly ETL job, an NGINX edge transform, and a chat backend with long-lived connections stress different parts of the runtime. Picking one winner from a hallway debate or a single micro-benchmark is how you end up with the wrong tool carrying the wrong load.

Polyglot Benchmarks is the antidote: a reproducible harness and live dashboard that compares six runtimes on the same HTTP workloads, so architects can match evidence to bounded contexts instead of default-stack bias.


What it measures

The public dashboard compares NGINX njs, OpenResty Lua, Python (FastAPI), Go (net/http), Rust (Actix), and Bun across seven synthetic tests that mirror real API and edge patterns: plain text baseline, JSON serialization, CPU-bound Fibonacci, string manipulation, request inspection, internal subrequest + transform, and routing logic. Each test reports requests per second, average and tail latency (including P99), time to first byte from curl, and error counts, streamed live as bench.sh completes.
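Why report P99 next to the average? A small number of slow requests barely moves the mean but dominates the tail. The sketch below uses the nearest-rank percentile method on made-up latency samples; it is an illustration of the metric, not the harness's own calculation (wrk computes its percentiles internally).

```python
import math


def percentile_nearest_rank(samples_ms: list[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) of samples using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    # Nearest rank: the smallest sample such that at least p% of samples are <= it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(samples_ms: list[float]) -> dict[str, float]:
    """Average and P99 latency for one test row, both in milliseconds."""
    return {
        "avg_ms": sum(samples_ms) / len(samples_ms),
        "p99_ms": percentile_nearest_rank(samples_ms, 99),
    }


# Hypothetical samples: 98 fast requests at 2 ms and two slow ones at 250 ms.
samples = [2.0] * 98 + [250.0, 250.0]
print(summarize(samples))  # average stays under 7 ms while P99 is 250 ms
```

Two slow requests out of a hundred leave the average looking healthy, which is why latency-sensitive rows should be judged on the tail column.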

The harness lives in workflow-examples/benchmarks: Docker Compose services, a shared wrk profile (10 seconds, 4 threads, 100 connections), and JSON results consumed by the dashboard. You can fork it for your own candidates and hot paths.
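The shared wrk profile can be captured once and reused for every candidate so no runtime gets a friendlier load. This is a minimal sketch, assuming a Python wrapper around the wrk CLI; the actual bench.sh invocation may pass different flags, and the service URL below is hypothetical.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class WrkProfile:
    """The shared load profile: 10 s duration, 4 threads, 100 connections."""
    duration_s: int = 10
    threads: int = 4
    connections: int = 100

    def __post_init__(self) -> None:
        # wrk refuses to run with fewer connections than threads, so fail early here too.
        if self.connections < self.threads:
            raise ValueError("connections must be >= threads")

    def argv(self, url: str) -> list[str]:
        # -t threads, -c open HTTP connections, -d duration, --latency prints the latency distribution.
        return [
            "wrk",
            f"-t{self.threads}",
            f"-c{self.connections}",
            f"-d{self.duration_s}s",
            "--latency",
            url,
        ]


# Hypothetical service URL; the real Compose service names and ports may differ.
print(shlex.join(WrkProfile().argv("http://go:8080/json")))
# wrk -t4 -c100 -d10s --latency http://go:8080/json
```

Freezing the dataclass keeps one run from quietly mutating the profile another run relies on.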

Why this is a decision framework, not a leaderboard

Polyglot Benchmarks does not crown one language for all time. Winners change per test row, which is exactly what you want when designing microservices. The dashboard’s verdict section even maps results to use cases (edge routing in Lua/njs, core concurrency in Rust/Go, developer velocity in Python). That is the thesis: polyglot by design, with data for architecture review boards instead of opinion.
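Picking a winner per row, rather than one overall champion, is a small computation. The sketch below assumes a hypothetical in-memory shape ({test: {runtime: {metric: value}}}); the runtime labels and numbers are placeholders, not measured results.

```python
# Shape assumed for illustration: {test_name: {runtime: {metric: value}}}.
Rows = dict[str, dict[str, dict[str, float]]]


def winners_by_row(rows: Rows, metric: str = "rps", higher_is_better: bool = True) -> dict[str, str]:
    """Return the winning runtime for each test row under a single metric."""
    pick = max if higher_is_better else min
    return {
        test: pick(by_runtime, key=lambda runtime: by_runtime[runtime][metric])
        for test, by_runtime in rows.items()
    }


# Placeholder numbers and runtime labels, not measured results.
rows: Rows = {
    "json": {"A": {"rps": 900.0, "p99_ms": 12.0}, "B": {"rps": 700.0, "p99_ms": 8.0}},
    "fibonacci": {"A": {"rps": 300.0, "p99_ms": 40.0}, "B": {"rps": 500.0, "p99_ms": 30.0}},
}
print(winners_by_row(rows))                                    # {'json': 'A', 'fibonacci': 'B'}
print(winners_by_row(rows, "p99_ms", higher_is_better=False))  # {'json': 'B', 'fibonacci': 'B'}
```

The same data yields different winners depending on both the row and the metric, so state which metric a bounded context cares about before reading the table.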

The four-step approach in one frame: define the workload, run the harness, compare on the live dashboard, then decide with an ADR.

Six benefits for platform and engineering leads

Evidence over opinion — attach charts and config to ADRs; settle stack debates with measured runs.
Workload-specific winners — latency-sensitive paths vs batch vs edge transforms get different leaders.
Total cost of ownership — raw RPS is not enough; weigh build time, image size, team skill fit, and ops burden.
Reproducible workflows — same repo, same Compose file, same wrk script; rerunnable in CI.
Legitimate polyglot microservices — different languages per service boundary without shame or surprise.
Risk reduction — prototype in the runner-up before an org-wide mandate.
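One way to act on the total-cost-of-ownership point is a weighted decision matrix: normalize each criterion to 0..1 (1.0 is always best), then take a weighted average. This is a sketch under stated assumptions; the criteria, weights, candidate labels, and raw values are all hypothetical and should come from your own runs and team review.

```python
def normalize(values: dict[str, float], higher_is_better: bool) -> dict[str, float]:
    """Min-max scale raw values to 0..1 so that 1.0 is always the best candidate."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {name: 1.0 for name in values}
    span = hi - lo
    return {
        name: (v - lo) / span if higher_is_better else (hi - v) / span
        for name, v in values.items()
    }


def tco_scores(
    raw: dict[str, dict[str, float]],
    higher_is_better: dict[str, bool],
    weights: dict[str, int],
) -> dict[str, float]:
    """Weighted average of normalized criteria per candidate, in the range 0..1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    normalized = {c: normalize(raw[c], higher_is_better[c]) for c in weights}
    candidates = raw[next(iter(weights))].keys()
    return {
        name: sum(weights[c] * normalized[c][name] for c in weights) / total
        for name in candidates
    }


# Hypothetical criteria and values: throughput, container image size, team skill fit (1-5).
raw = {
    "rps": {"A": 900.0, "B": 600.0},
    "image_mb": {"A": 120.0, "B": 20.0},
    "team_fit": {"A": 2.0, "B": 5.0},
}
direction = {"rps": True, "image_mb": False, "team_fit": True}
weights = {"rps": 2, "image_mb": 1, "team_fit": 2}
print(tco_scores(raw, direction, weights))  # {'A': 0.4, 'B': 0.6}
```

Candidate A wins on raw RPS yet ranks second once image size and team fit carry weight. Min-max scaling is only one choice; it exaggerates small gaps when candidates are close, so record the raw values in the ADR as well.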

Polyglot Benchmarks workflow, from problem spec to ADR

Quick start

Open the live dashboard while a run is in progress (or start one locally).
Clone workflow-examples, cd into benchmarks, and run docker compose up.
Read results.json on the dashboard volume and map winners to your workload rows.
Write the ADR — include duration, threads, connections, and hardware class.
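For the last step, it helps to render the run context the same way every time so reviewers can rerun or challenge the numbers. The sketch below is hypothetical: the section layout and the hardware-class string are illustrative, and the winners mapping would come from your own reading of results.json, whose schema is not described here.

```python
def adr_benchmark_section(
    duration_s: int,
    threads: int,
    connections: int,
    hardware_class: str,
    winners: dict[str, str],
) -> str:
    """Render the benchmark run context and per-test winners for an ADR."""
    lines = [
        "## Benchmark evidence",
        f"- wrk profile: {duration_s} s, {threads} threads, {connections} connections",
        f"- Hardware class: {hardware_class}",
        "- Per-test winners:",
    ]
    # Sort rows so the section stays stable across reruns and diffs cleanly.
    lines += [f"  - {test}: {runtime}" for test, runtime in sorted(winners.items())]
    return "\n".join(lines)


# Hypothetical hardware class and placeholder winners.
print(adr_benchmark_section(10, 4, 100, "8 vCPU / 16 GB cloud VM", {"json": "A", "fibonacci": "B"}))
```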

Read the long version

The long article covers the full repo layout, a criteria table, the decision-matrix diagram, case patterns per test family, anti-patterns, limitations, and SEO-ready snippets for sharing with your ARB. Published by Workstation; benchmark site hosted at polyglot-benchmarks.fictionally.org.
