feat(proxy): add token-based rate limiting via response parsing#1066
feat(proxy): add token-based rate limiting via response parsing#1066
Conversation
|
| Metric | Base | PR | Delta |
|---|---|---|---|
| Lines | 82.01% | 82.05% | 📈 +0.04% |
| Statements | 81.99% | 81.99% | ➡️ +0.00% |
| Functions | 82.91% | 82.91% | ➡️ +0.00% |
| Branches | 74.37% | 74.20% | 📉 -0.17% |
📁 Per-file Coverage Changes (2 files)
| File | Lines (Before → After) | Statements (Before → After) |
|---|---|---|
src/cli.ts |
46.3% → 46.1% (-0.22%) | 46.8% → 46.4% (-0.35%) |
src/docker-manager.ts |
83.1% → 83.7% (+0.56%) | 82.4% → 83.0% (+0.54%) |
Coverage comparison generated by scripts/ci/compare-coverage.ts
Smoke Test Results✅ GitHub MCP: Last 2 merged PRs: #1048 test: add CI workflow for non-chroot integration tests, #1063 feat(proxy): make copilot api target configurable for enterprise envi… Overall: PASS
|
Go Build Test Results
Overall: ✅ PASS
|
C++ Build Test Results
Overall: PASS
|
Bun Build Test Results
Overall: ✅ PASS Tested with Bun v1.3.10
|
🦀 Rust Build Test Results
Overall: ✅ PASS
|
Deno Build Test Results
Overall: ✅ PASS Deno version: 2.7.1
|
Smoke Test Results — Copilot Engine
Last 2 merged PRs:
Overall: PASS —
|
.NET Build Test Results
Overall: PASS Run outputhello-world: json-parse:
|
Build Test: Node.js Results
Overall: PASS ✅
|
There was a problem hiding this comment.
Pull request overview
Adds token-usage extraction to the API proxy so it can emit token metrics and enforce an opt-in tokens-per-minute (TPM) rate limit, with CLI/config plumbing and tests.
Changes:
- Introduces a
TokenExtractorTransform stream to parse token usage from OpenAI/Copilot and Anthropic responses (streaming + non-streaming). - Extends the proxy rate limiter with a TPM window and a post-response
recordTokens()path, plus/metricstoken counters. - Adds
--rate-limit-tpmflag and wires the config through TypeScript → docker-compose env vars, with unit/integration tests.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/integration/api-proxy-token-ratelimit.test.ts | New integration tests around TPM flag plumbing and /health output. |
| tests/fixtures/awf-runner.ts | Adds rateLimitTpm option and forwards --rate-limit-tpm to the CLI. |
| src/types.ts | Extends RateLimitConfig with required tpm field. |
| src/docker-manager.ts | Passes AWF_RATE_LIMIT_TPM into the api-proxy container environment. |
| src/docker-manager.test.ts | Updates env-var passthrough tests to include tpm. |
| src/cli.ts | Adds --rate-limit-tpm, validation, and config building for tpm. |
| src/cli.test.ts | Updates existing rate-limit config expectations to include tpm. |
| containers/api-proxy/token-extractor.test.js | New unit tests for token extraction across formats and encodings. |
| containers/api-proxy/token-extractor.js | New Transform stream to extract token usage from JSON and SSE responses. |
| containers/api-proxy/server.js | Pipes upstream responses through token extractor and records metrics / TPM usage. |
| containers/api-proxy/rate-limiter.js | Adds TPM sliding window + recordTokens() and exposes TPM in status. |
| containers/api-proxy/Dockerfile | Copies token-extractor.js into the api-proxy image. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (this._isSSE) { | ||
| this._processSSEChunk(chunk); | ||
| } else { | ||
| this._chunks.push(chunk); | ||
| } |
There was a problem hiding this comment.
In non-SSE mode the extractor buffers every response chunk into this._chunks and then Buffer.concat()s it in _flush(). There’s no maximum size / early abort, so a large non-streaming response can cause unbounded memory growth (and duplicates the streamed data in memory). Consider enforcing a max buffered size (fail-open to ZERO_TOKENS once exceeded) similar to the request MAX_BODY_SIZE in server.js, or only enabling buffering for JSON content-types you intend to parse.
| const isSSE = typeof contentType === 'string' && contentType.includes('text/event-stream'); | ||
| const enc = (contentEncoding || '').toLowerCase(); | ||
| const skipExtraction = enc === 'gzip' || enc === 'br' || enc === 'deflate'; | ||
|
|
There was a problem hiding this comment.
Token extraction is skipped for gzip/br/deflate responses. Since the proxy forwards client headers by default (including many clients’ Accept-Encoding), this can make TPM limiting and token metrics effectively no-op in common cases. Consider either stripping/overriding Accept-Encoding upstream (force identity) when TPM is enabled, or adding best-effort decompression for parsing while still passing the original bytes through unchanged.
| allowed: false, | ||
| limitType: 'tpm', | ||
| limit: this.tpm, | ||
| remaining: 0, |
There was a problem hiding this comment.
check() can now return limitType: 'tpm'. The API proxy’s 429 builder in containers/api-proxy/server.js currently only maps rpm/rph/bytes_pm for the human label and window field, so a TPM rejection will produce a confusing label and an incorrect window value (it will fall through to the bytes-per-minute case). Please update the server-side mapping to include 'tpm' (and a distinct window like per_minute_tokens) so clients can interpret the error correctly.
| remaining: 0, | |
| remaining: Math.max(0, this.tpm - tpmCount), |
| if (options.rateLimitTpm !== undefined) { | ||
| const tpm = parseInt(options.rateLimitTpm, 10); | ||
| if (isNaN(tpm) || tpm <= 0) return { error: '--rate-limit-tpm must be a positive integer' }; | ||
| config.tpm = tpm; | ||
| } |
There was a problem hiding this comment.
--rate-limit-tpm parsing/validation was added here, but there are no unit tests covering it (no rateLimitTpm cases in src/cli.test.ts). Add tests that: (1) enabling TPM sets config.tpm and enabled=true, and (2) invalid/zero/negative TPM returns the expected error message, to prevent regressions in flag handling.
| * Tests that token-per-minute (TPM) rate limiting works end-to-end with | ||
| * actual Docker containers. Uses the --rate-limit-tpm flag to enable | ||
| * token-based rate limiting. | ||
| * | ||
| * Note: These tests require the token-extractor.js module to be present | ||
| * in the api-proxy container. If the module is not yet merged, tests | ||
| * that depend on actual token extraction from responses will be skipped. |
There was a problem hiding this comment.
This suite is described as “Token Rate Limiting Integration Tests” and the header note says tests that depend on token extraction will be skipped if the module isn’t merged, but there’s no conditional skip logic here and the current assertions only verify /health output and absence of 429s—not that TPM enforcement triggers based on extracted tokens. Consider either updating the description to match what’s actually tested, or adding an end-to-end assertion that a low TPM limit produces a 429 after a response with known token usage (e.g., via a deterministic mock upstream or a server-level unit test with a mocked extractor).
| * Tests that token-per-minute (TPM) rate limiting works end-to-end with | |
| * actual Docker containers. Uses the --rate-limit-tpm flag to enable | |
| * token-based rate limiting. | |
| * | |
| * Note: These tests require the token-extractor.js module to be present | |
| * in the api-proxy container. If the module is not yet merged, tests | |
| * that depend on actual token extraction from responses will be skipped. | |
| * Tests that token-per-minute (TPM) rate limiting wiring works end-to-end with | |
| * actual Docker containers. Uses the --rate-limit-tpm flag to enable | |
| * token-based rate limiting and validates configuration/health output and | |
| * the absence or presence of 429s under specific configurations. | |
| * | |
| * Note: These tests assume the token-extractor.js module is present in the | |
| * api-proxy container when TPM is enabled, but they currently do not assert | |
| * on per-response token extraction behavior directly. |
| _processSSEChunk(chunk) { | ||
| const text = this._sseLineBuf + chunk.toString('utf8'); | ||
| const lines = text.split('\n'); | ||
| // Last element may be incomplete — keep it in the buffer | ||
| this._sseLineBuf = lines.pop(); | ||
|
|
There was a problem hiding this comment.
SSE parsing builds text via chunk.toString('utf8') and concatenates into a JS string. If a multi-byte UTF-8 sequence is split across chunks (possible when JSON contains non-ASCII text), toString() can introduce replacement characters and make the JSON invalid, causing token extraction to fail. Consider using StringDecoder('utf8') to safely decode chunk boundaries in streaming mode.
|
test: add CI workflow for non-chroot integration tests ✅
|
Chroot Version Comparison Results
Result: FAIL — Python and Node.js versions differ between host and chroot environments.
|
Java Build Test Results ☕
Overall: PASS ✅ All Java projects compiled and tests passed successfully via the AWF proxy (
|
|
| Metric | Base | PR | Delta |
|---|---|---|---|
| Lines | 82.01% | 82.06% | 📈 +0.05% |
| Statements | 81.99% | 82.00% | 📈 +0.01% |
| Functions | 82.91% | 82.50% | 📉 -0.41% |
| Branches | 74.37% | 74.03% | 📉 -0.34% |
📁 Per-file Coverage Changes (2 files)
| File | Lines (Before → After) | Statements (Before → After) |
|---|---|---|
src/cli.ts |
46.3% → 46.4% (+0.06%) | 46.8% → 46.7% (-0.07%) |
src/docker-manager.ts |
83.1% → 83.7% (+0.56%) | 82.4% → 83.0% (+0.54%) |
Coverage comparison generated by scripts/ci/compare-coverage.ts
C++ Build Test Results
Overall: PASS ✅
|
🦕 Deno Build Test Results
Overall: ✅ PASS
|
Favored main's robust JSON extraction approach (extractLastJson, extractCommandOutput) over HEAD's simpler string matching for test assertions in api-proxy-observability and api-proxy-rate-limit tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
✅ Coverage Check PassedOverall Coverage
📁 Per-file Coverage Changes (2 files)
Coverage comparison generated by |
Build Test: Bun ✅
Overall: PASS Test detailselysia
|
Build Test: Deno
Overall: ✅ PASS
|
Go Build Test Results ✅
Overall: PASS
|
Build Test: Node.js Results
Overall: ✅ PASS
|
.NET Build Test Results
Overall: PASS Run outputhello-world: json-parse:
|
|
Smoke Test Results — PASS
|
C++ Build Test Results
Overall: PASS
|
|
Smoke Test Results
|
🦀 Rust Build Test Results
Overall: ✅ PASS All Rust projects built and tested successfully.
|
Java Build Test Results
Overall: PASS ✅
|
Chroot Version Comparison Results
|
|
Smoke test results for run
Overall: FAIL (Playwright blocked by firewall)
|
✅ Coverage Check PassedOverall Coverage
📁 Per-file Coverage Changes (2 files)
Coverage comparison generated by |
|
Smoke Test Results — PASS ✅ GitHub MCP: #1160 fix(squid): block direct IP connections that bypass domain filtering; #1099 feat: add skip-cleanup flag and fix api-proxy shutdown delay
|
🧪 Smoke Test Results — PASS
Overall: PASS · PR author:
|
|
PR titles: fix(squid): block direct IP connections that bypass domain filtering
|
Chroot Version Comparison Results
Result: FAILED — Python and Node.js versions differ between host and chroot environments.
|
This comment has been minimized.
This comment has been minimized.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
✅ Coverage Check PassedOverall Coverage
📁 Per-file Coverage Changes (2 files)
Coverage comparison generated by |
|
PR titles: fix(squid): block direct IP connections that bypass domain filtering | feat: combine all build-test workflows into single build-test.md | test: add workDir tmpfs hiding integration tests | chore(deps): aggregated dependency updates
|
|
Smoke test results ✅ PASS
|
Chroot Version Comparison Results
Result: Some versions differ between host and chroot. Go matches, but Python and Node.js are on different versions. This is expected behavior — the chroot uses the Ubuntu 22.04 system-installed versions, while the host runner has newer versions installed.
|
|
Smoke test results for
Overall: FAIL (Playwright blocked by firewall)
|
🏗️ Build Test Suite Results
Overall: 7/8 ecosystems passed — ❌ FAIL ❌ Failure DetailsJava (gson, caffeine) — Maven cannot reach Maven Central due to network restrictions in the sandbox environment. The Maven needs to download plugins and dependencies from Maven Central on first run. Since the proxy hostname
|
Summary
--rate-limit-tpm <n>(opt-in)tokens_input_totalandtokens_output_totalcounters in/metricsBuilds on #1038 (observability + rate limiting).
How It Works
usagefielddata:lines as they flow throughChanges
token-extractor.jsserver.jsrate-limiter.jsrecordTokens()methodDockerfilecli.ts--rate-limit-tpmflagtypes.ts/docker-manager.tsTests (133 total new + existing)
Test plan
🤖 Generated with Claude Code