Engine Enhancement Proposal
Scope
This document reviews the current engine implementation in gh-aw and proposes a backward-compatible refactoring to remove the hard-coded engine patterns that currently limit flexibility.
The current design has several anti-patterns that make new engine support expensive and brittle:
- fixed engine inventories duplicated across the runtime registry, constants, schema, CLI prompts, and docs
- a single `EngineConfig` type that mixes generic settings with runtime-specific fields
- provider auth and request behavior embedded directly in Go runtime implementations instead of declarative definitions
- secret handling coupled to built-in runtimes instead of provider definitions
- adding a new engine or provider requiring edits in multiple hard-coded locations rather than adding a catalog entry
The result is a system where engine support is tightly coupled to compiled code, configuration is not reusable, and non-standard backends require new bespoke integrations instead of fitting into a common definition model.
The goal is not to mimic the widely used AI SDK API exactly, but to bring the same architectural separation to gh-aw:
- runtime adapter selection
- provider selection
- model/profile selection
- runtime-managed provider delegation where applicable
- central registry-based resolution
- backward-compatible aliases for existing engine names
Reference material:
- https://ai-sdk.dev/docs/foundations/providers-and-models
- https://ai-sdk.dev/docs/ai-sdk-core/provider-management
Files Reviewed
Core engine and config flow:
- `pkg/workflow/agentic_engine.go`
- `pkg/workflow/engine.go`
- `pkg/workflow/engine_helpers.go`
- `pkg/workflow/compiler_orchestrator_engine.go`
- `pkg/workflow/threat_detection.go`
- `pkg/workflow/strict_mode_validation.go`
- `pkg/workflow/safe_outputs_env.go`
Concrete runtimes:
- `pkg/workflow/claude_engine.go`
- `pkg/workflow/codex_engine.go`
- `pkg/workflow/copilot_engine.go`
- `pkg/workflow/copilot_engine_execution.go`
- `pkg/workflow/copilot_engine_installation.go`
- `pkg/workflow/gemini_engine.go`
CLI and metadata surfaces:
- `pkg/cli/engine_secrets.go`
- `pkg/cli/add_interactive_engine.go`
- `pkg/cli/logs_orchestrator.go`
- `pkg/constants/constants.go`
- `pkg/parser/schemas/main_workflow_schema.json`
- `docs/src/content/docs/reference/engines.md`
Current Architecture
Today, "engine" means "CLI/runtime implementation", not "LLM provider configuration".
Examples:
- `claude` means the Claude Code CLI runtime
- `codex` means the Codex CLI runtime
- `copilot` means the GitHub Copilot CLI runtime
- `gemini` means the Gemini CLI runtime
`agentic_engine.go` defines a composite `CodingAgentEngine` interface that combines:
- identity
- capability flags
- install step generation
- execution step generation
- MCP config rendering
- log parsing
- secret requirements
- model env var handling
`engine.go` parses frontmatter into a single `EngineConfig`:
- shared-looking fields: `id`, `version`, `model`, `command`, `args`, `env`
- runtime-specific fields: `agent`, `user-agent`, `config`, `max-continuations`
Each runtime then translates that raw config into runtime-specific shell commands, environment variables, log parsing, and secret validation.
Key Discoveries
1. The abstraction boundary is at the wrong level
The current registry is effectively a runtime registry, but the user-facing name `engine` suggests provider selection.
That mismatch is manageable with four built-ins, but it does not scale to:
- multiple providers using one runtime
- provider-specific auth and base URLs
- model aliases and profiles
- OpenAI-compatible/self-hosted endpoints
- organization-specific defaults
2. EngineConfig mixes generic and runtime-specific concerns
Current fields are not consistently generic:
- generic-ish: `model`, `version`, `env`, `command`, `args`
- Copilot-only: `agent`, `max-continuations`
- Codex-only: `user-agent`, `config`
- runtime/firewall coupling: `firewall`
That means the schema looks portable, but behavior is runtime-specific and often implicit.
3. Provider metadata is duplicated across the codebase
The current engine inventory exists in multiple places:
- `NewEngineRegistry()` in `pkg/workflow/agentic_engine.go`
- `EngineOptions` in `pkg/constants/constants.go`
- `AgenticEngines` in `pkg/constants/constants.go`
- schema enums in `pkg/parser/schemas/main_workflow_schema.json`
- docs in `docs/src/content/docs/reference/engines.md`
- CLI selection logic in `pkg/cli/add_interactive_engine.go`
This is already drifting:
- `gemini` is registered in the runtime registry and schema
- `gemini` is not present in `EngineOptions`
- `gemini` is not present in `AgenticEngines`
That is a concrete sign that there is no single source of truth.
4. Secret handling is coupled to runtimes, not providers
Examples:
- the Claude runtime returns `ANTHROPIC_API_KEY`
- the Codex runtime returns `OPENAI_API_KEY` and `CODEX_API_KEY`
- the Copilot runtime returns `COPILOT_GITHUB_TOKEN`
This works for the built-ins, but it makes it hard to express:
- one provider with multiple accepted auth schemes
- short-lived token exchange before the model call
- provider-specific secret overrides independent of runtime choice
- organization-defined secret aliases
- custom providers with custom auth headers or base URLs
5. Provider identity alone is not enough
Some backends look OpenAI-like at the model endpoint but still require non-standard transport behavior.
Example requirements we already have within Cisco:
- exchange `clientId` and `clientSecret` for a short-lived token first
- send that token in `api-key`, not `Authorization`
- build the request URL with a custom path template and an `api-version` query parameter
- inject a JSON string containing `appKey` into the request body `user` field
That means the design must support:
- auth strategy
- token exchange inputs and outputs
- header mapping
- request URL templates
- request body shaping
If the design cannot express those declaratively, then this class of backend still becomes a new hard-coded engine.
6. Model selection is inconsistent by design
Model selection currently depends on runtime-specific translation rules:
- native env vars for Claude, Copilot, Gemini
- shell config injection for Codex
- detection jobs use separate fallback env vars
This should remain a runtime concern internally, but it should not leak into the user-facing provider/model abstraction.
7. Too many consumers depend on raw EngineConfig
Engine config is consumed outside the runtime implementations:
- compiler orchestration
- strict-mode secret validation
- threat detection engine cloning
- safe-output metadata
- CLI secret setup
`threat_detection.go` currently copies only a subset of engine fields into a new `EngineConfig`. That is already fragile. A richer provider config will make this worse unless resolution becomes centralized.
8. Adding a provider today would require creating a new "engine"
With the current design, supporting something like Azure OpenAI or an OpenAI-compatible endpoint would likely require:
- a new engine ID
- new schema enum entries
- new docs
- new CLI secret handling
- new runtime-specific mapping logic
That is the wrong scaling model. Providers and runtimes are different axes.
Primary Objective
The principal update is to support engine definitions outside the current fixed built-ins.
Today, adding a new engine usually means changing Go code in multiple places:
- runtime registration
- schema enums
- docs
- CLI prompts
- secret handling
- tests
That does not scale. The core change should be:
- built-in engines become built-in definitions
- new engines can be declared through data
- Go runtime code remains responsible only for execution behavior that cannot be expressed declaratively
Provider/model separation matters because it makes those definitions reusable and composable, but it is not the primary story. The primary story is moving from a fixed engine list to a definition-based engine catalog.
Design Principles
- Keep `engine` as the workflow-facing selection mechanism.
- Preserve current built-in engine behavior.
- Separate runtime implementation from engine definition.
- Make one catalog the source of truth for built-in and external engines.
- Resolve engine configuration once and pass the resolved result downstream.
- Allow new engines to be added without editing multiple hard-coded engine lists.
- Allow provider definitions to describe auth flows and request shaping, not just names and secrets.
Proposed Architecture
1. Separate runtime adapters from engine definitions
The current `CodingAgentEngine` interface is really a runtime adapter interface.
It should be treated as the layer that knows how to:
- install a runtime
- execute a runtime
- render runtime-specific config
- parse logs
- expose runtime capabilities
It should not be the primary source of truth for the list of available engines.
Instead, introduce a first-class `EngineDefinition` model:

```go
type EngineDefinition struct {
	ID          string
	DisplayName string
	Description string
	RuntimeID   string
	Install     InstallDefinition
	Provider    ProviderSelection
	Models      ModelSelection
	Auth        []AuthBinding
	Options     map[string]any
}
```

Built-in engines such as `claude`, `codex`, `copilot`, and `gemini` should be represented as built-in `EngineDefinition` entries in the same model that future external engines will use.
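As a sketch of what "built-ins become definitions" could look like, the snippet below restates the `claude` built-in as data. The simplified field set and the concrete values (runtime ID, provider ID) are illustrative assumptions for this proposal, not the existing gh-aw API.

```go
package main

import "fmt"

// EngineDefinition is reduced to a few fields for this sketch; the real model
// would carry the full Install/Provider/Models/Auth structure.
type EngineDefinition struct {
	ID          string
	DisplayName string
	RuntimeID   string
	ProviderID  string
	Secrets     []string
}

// builtinClaude restates the hard-coded claude engine as a catalog entry: the
// runtime adapter stays in Go, but identity, provider, and secret requirements
// become declarative data.
var builtinClaude = EngineDefinition{
	ID:          "claude",
	DisplayName: "Claude Code",
	RuntimeID:   "claude-cli",
	ProviderID:  "anthropic",
	Secrets:     []string{"ANTHROPIC_API_KEY"},
}

func main() {
	fmt.Println(builtinClaude.ID, builtinClaude.RuntimeID, builtinClaude.Secrets[0])
}
```

An external engine would be another `EngineDefinition` value loaded from data instead of compiled into the registry.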
2. Introduce an engine catalog
Add a catalog that resolves engine definitions from one place.
The catalog should merge:
- built-in engine definitions shipped with `gh-aw`
- repo-local or imported engine definitions
The exact storage format can be decided later. The important part is the model:
- the catalog is the source of truth
- schema, docs, CLI prompts, and validation derive from it
- runtime registration is separate from engine definition registration
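A minimal sketch of that catalog model, with type and method names taken from this proposal rather than existing code: built-ins register first, external definitions merge on top, and duplicate IDs are rejected per the validation rules proposed later.

```go
package main

import "fmt"

// EngineDefinition is reduced to two fields for this sketch.
type EngineDefinition struct {
	ID        string
	RuntimeID string
}

// EngineCatalog sketches the proposed single source of truth: built-ins are
// registered first, then repo-local or imported definitions are merged on top.
type EngineCatalog struct {
	defs map[string]EngineDefinition
}

func NewEngineCatalog(builtins []EngineDefinition) *EngineCatalog {
	c := &EngineCatalog{defs: map[string]EngineDefinition{}}
	for _, d := range builtins {
		c.defs[d.ID] = d
	}
	return c
}

// Merge layers external definitions over the built-ins; reusing an existing ID
// is rejected to avoid silently shadowing a built-in engine.
func (c *EngineCatalog) Merge(external []EngineDefinition) error {
	for _, d := range external {
		if _, exists := c.defs[d.ID]; exists {
			return fmt.Errorf("duplicate engine ID %q", d.ID)
		}
		c.defs[d.ID] = d
	}
	return nil
}

// Resolve is the single lookup that schema, docs, CLI prompts, and the
// compiler would all derive from.
func (c *EngineCatalog) Resolve(id string) (EngineDefinition, bool) {
	d, ok := c.defs[id]
	return d, ok
}

func main() {
	cat := NewEngineCatalog([]EngineDefinition{{ID: "claude", RuntimeID: "claude-cli"}})
	if err := cat.Merge([]EngineDefinition{{ID: "internal-coder", RuntimeID: "codex"}}); err != nil {
		panic(err)
	}
	d, ok := cat.Resolve("internal-coder")
	fmt.Println(ok, d.RuntimeID)
}
```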
3. Add provider and model structure to engine definitions
The AI SDK-style provider/model split is still valuable, but as support for engine definitions rather than as a standalone feature.
An engine definition should be able to specify:
- provider identity
- model defaults
- optional secondary models
- auth bindings
- provider-specific options such as base URL or headers
- request transport details such as path templates and query parameters
- body shaping rules for non-standard request payload fields
This allows fixed built-ins and external engines to share the same shape.
4. Add provider-owned auth and request shaping
Provider definitions need more than static secret names.
They should be able to describe:
- static secret headers
- OAuth2 client credentials or other token exchange flows
- token response parsing
- header mapping from token response fields
- custom request path templates
- query parameters
- request body injection rules
This is required for backends that are OpenAI-like but not wire-compatible.
For example, our Cisco AI requirements imply all of the following must be definable without creating a brand-new engine type:
- defaults: `baseUrl`, `apiVersion`, `model`
- required config: `appKey`, `tokenAuthUrl`, `clientId`, `clientSecret`
- auth flow:
  - POST form-encoded client credentials request
  - parse `access_token`
  - map it into `api-key`
- request shaping:
  - URL template using the deployment model name and `api-version`
  - body injection of `user: "{\"appkey\":\"...\"}"`
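The auth-flow half of those requirements can be sketched declaratively. The helpers below are illustrative, not gh-aw code: one renders the form-encoded client-credentials body, the other maps the parsed token response into the provider's expected header name.

```go
package main

import (
	"fmt"
	"net/url"
)

// AuthDefinition mirrors the proposal's declarative auth block. The endpoint
// and credential values used below are placeholders.
type AuthDefinition struct {
	TokenURL     string
	ClientID     string
	ClientSecret string
	TokenField   string // e.g. "access_token"
	HeaderName   string // e.g. "api-key"
}

// buildTokenRequestBody renders the form-encoded client-credentials body that
// would be POSTed to TokenURL before the first model call.
func buildTokenRequestBody(a AuthDefinition) string {
	v := url.Values{}
	v.Set("grant_type", "client_credentials")
	v.Set("client_id", a.ClientID)
	v.Set("client_secret", a.ClientSecret)
	return v.Encode()
}

// mapTokenToHeader extracts the configured field from the parsed token
// response and pairs it with the provider's header name (api-key rather than
// Authorization, per the requirements above).
func mapTokenToHeader(a AuthDefinition, tokenResp map[string]string) (string, string) {
	return a.HeaderName, tokenResp[a.TokenField]
}

func main() {
	a := AuthDefinition{
		TokenURL:     "https://id.example.com/oauth2/token",
		ClientID:     "client-id",
		ClientSecret: "client-secret",
		TokenField:   "access_token",
		HeaderName:   "api-key",
	}
	fmt.Println(buildTokenRequestBody(a))
	name, val := mapTokenToHeader(a, map[string]string{"access_token": "tok123"})
	fmt.Println(name, val)
}
```

The point is that nothing in this flow is specific to one backend: any provider definition with the same strategy reuses the same generic code path.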
5. Add runtime-owned config rendering
Some runtimes are configured by flags and env vars. Others need generated config files.
The architecture should support both.
That means the runtime adapter should be able to consume a resolved engine definition and produce:
- command arguments
- env vars
- generated config files
- additional runtime metadata
This keeps engine definitions declarative while still allowing richer integrations.
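A sketch of that hook, with all names assumed for illustration; the TOML-style content emitted below is a placeholder, not the actual Codex config format.

```go
package main

import "fmt"

// GeneratedConfigFile and ConfigRenderer sketch the runtime-owned config
// rendering hook from this proposal.
type GeneratedConfigFile struct {
	Path    string
	Content string
}

// ResolvedEngineTarget is reduced to the fields the renderer needs here.
type ResolvedEngineTarget struct {
	Model   string
	BaseURL string
}

// ConfigRenderer is the hook a runtime adapter would implement when flags and
// env vars are not enough to configure the runtime.
type ConfigRenderer interface {
	RenderConfig(t ResolvedEngineTarget) ([]GeneratedConfigFile, error)
}

type codexRenderer struct{}

// RenderConfig turns the resolved target into a generated config file, the
// way a Codex-style runtime consumes configuration.
func (codexRenderer) RenderConfig(t ResolvedEngineTarget) ([]GeneratedConfigFile, error) {
	content := fmt.Sprintf("model = %q\nbase_url = %q\n", t.Model, t.BaseURL)
	return []GeneratedConfigFile{{Path: ".codex/config.toml", Content: content}}, nil
}

func main() {
	var r ConfigRenderer = codexRenderer{}
	files, err := r.RenderConfig(ResolvedEngineTarget{Model: "gpt-5", BaseURL: "https://llm.example.com/v1"})
	if err != nil {
		panic(err)
	}
	fmt.Println(files[0].Path)
	fmt.Print(files[0].Content)
}
```

A flag-and-env-var runtime would simply return an empty file list from the same hook.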
6. Resolve once, consume everywhere
The compiler should resolve raw frontmatter into one `ResolvedEngineTarget`.
Everything downstream should consume that object instead of re-reading `EngineConfig`.
That includes:
- runtime execution
- strict-mode env validation
- threat detection
- safe-output metadata
- CLI secret requirement calculation
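The resolve-once flow might look like the following sketch, with placeholder resolution logic and names from this proposal: every consumer receives the same resolved value, and cloning copies the whole struct instead of a hand-picked field subset.

```go
package main

import "fmt"

// ResolvedEngineTarget is reduced to three fields for this sketch.
type ResolvedEngineTarget struct {
	EngineID  string
	RuntimeID string
	Secrets   []string
}

// resolveEngine runs once in the compiler; the real version would consult the
// engine catalog instead of returning a hard-coded placeholder.
func resolveEngine(frontmatterEngine string) ResolvedEngineTarget {
	return ResolvedEngineTarget{
		EngineID:  frontmatterEngine,
		RuntimeID: "codex-cli",
		Secrets:   []string{"OPENAI_API_KEY"},
	}
}

// validateSecrets stands in for strict-mode validation: it reads the resolved
// target rather than re-parsing raw EngineConfig fields.
func validateSecrets(t ResolvedEngineTarget) error {
	if len(t.Secrets) == 0 {
		return fmt.Errorf("engine %s declares no auth secrets", t.EngineID)
	}
	return nil
}

// threatDetectionClone copies the whole resolved value instead of a subset of
// fields, avoiding the drift currently possible in threat_detection.go.
func threatDetectionClone(t ResolvedEngineTarget) ResolvedEngineTarget {
	return t
}

func main() {
	target := resolveEngine("codex")
	if err := validateSecrets(target); err != nil {
		panic(err)
	}
	clone := threatDetectionClone(target)
	fmt.Println(clone.EngineID, clone.Secrets)
}
```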
Suggested Internal Model
```go
type RuntimeSelection struct {
	ID      string
	Version string
	Command string
	Args    []string
	Env     map[string]string
	Options map[string]any
}

type ProviderSelection struct {
	ID       string
	Model    string
	BaseURL  string
	Headers  map[string]string
	Options  map[string]any
	Auth     AuthDefinition
	Request  RequestShape
	Defaults map[string]string
	Required []string
}

type ModelSelection struct {
	Primary string
	Small   string
	Extra   map[string]string
}

type EngineDefinition struct {
	ID          string
	DisplayName string
	Description string
	Runtime     RuntimeSelection
	Provider    ProviderSelection
	Models      ModelSelection
	Auth        []AuthBinding
	Options     map[string]any
}

type ResolvedEngineTarget struct {
	Definition           EngineDefinition
	Runtime              AgentRuntime
	SecretBindings       []SecretBinding
	GeneratedConfigFiles []GeneratedConfigFile
	Metadata             map[string]string
}

type AuthDefinition struct {
	Strategy     string
	TokenURL     string
	ClientID     string
	ClientSecret string
	TokenField   string
	HeaderName   string
}

type RequestShape struct {
	PathTemplate string
	Query        map[string]string
	Headers      map[string]string
	BodyInject   map[string]string
}
```

Key point: workflows select an engine definition, and the runtime adapter executes the resolved target.
Proposed Workflow Shape
Backward-compatible legacy forms
These should continue to work:
```yaml
engine: claude
```

```yaml
engine:
  id: codex
  model: gpt-5
```

These map to built-in engine definitions.
Proposed engine definition form
```yaml
engines:
  internal-coder:
    display-name: Internal Coder
    description: Company-standard coding engine
    runtime:
      id: codex
      version: 0.105.0
    provider:
      id: openai-compatible
      base-url: https://llm.example.com/v1
      auth:
        secret: INTERNAL_LLM_API_KEY
    models:
      primary: qwen3-coder-30b
      small: qwen3-coder-7b

engine: internal-coder
```

This is the key capability the redesign should unlock: new engine definitions without hard-coding a new engine ID into the binary.
Proposed token-auth provider form
This class of definition is important because it proves the design can support custom auth and payload shaping without inventing a new hard-coded engine:
```yaml
engines:
  cisco-review:
    display-name: Cisco Review Engine
    runtime:
      id: codex
    provider:
      id: openai
      defaults:
        baseUrl: https://chat-ai.cisco.com
        apiVersion: 2025-04-01-preview
        model: gpt-4o-mini
      required:
        - appKey
        - tokenAuthUrl
        - clientId
        - clientSecret
      auth:
        strategy: oauth-client-credentials
        tokenUrl: "{tokenAuthUrl}"
        clientId: "{clientId}"
        clientSecret: "{clientSecret}"
        tokenField: access_token
        headerName: api-key
      request:
        pathTemplate: /openai/deployments/{model}/chat/completions
        query:
          api-version: "{apiVersion}"
        bodyInject:
          user: "{\"appkey\":\"{appKey}\"}"

engine: cisco-review
```

This example demonstrates the intended design outcome:
- the engine still resolves through the same catalog model
- the runtime still stays generic
- the provider definition carries the custom auth and request semantics
- the backend does not require a new dedicated engine ID just because its auth flow is unusual
- required provider config values can be referenced by auth and request templates without leaking those details into a new engine type
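The request-shaping half of this example can be sketched with a small template renderer. `renderTemplate`, `buildRequestURL`, and `injectBody` are hypothetical helpers written for illustration, not proposed gh-aw APIs.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// RequestShape mirrors the proposal's struct (Headers omitted for brevity).
type RequestShape struct {
	PathTemplate string
	Query        map[string]string
	BodyInject   map[string]string
}

// renderTemplate is a minimal "{name}" substitution helper.
func renderTemplate(tmpl string, vars map[string]string) string {
	out := tmpl
	for k, v := range vars {
		out = strings.ReplaceAll(out, "{"+k+"}", v)
	}
	return out
}

// buildRequestURL assembles the final endpoint from the base URL, the path
// template, and the query parameters, as in the cisco-review example.
func buildRequestURL(base string, shape RequestShape, vars map[string]string) string {
	u, err := url.Parse(base)
	if err != nil {
		panic(err)
	}
	u.Path = renderTemplate(shape.PathTemplate, vars)
	q := u.Query()
	for k, v := range shape.Query {
		q.Set(k, renderTemplate(v, vars))
	}
	u.RawQuery = q.Encode()
	return u.String()
}

// injectBody applies BodyInject rules to the outgoing payload, e.g. placing an
// appKey JSON string into the "user" field.
func injectBody(body map[string]any, shape RequestShape, vars map[string]string) {
	for field, tmpl := range shape.BodyInject {
		body[field] = renderTemplate(tmpl, vars)
	}
}

func main() {
	shape := RequestShape{
		PathTemplate: "/openai/deployments/{model}/chat/completions",
		Query:        map[string]string{"api-version": "{apiVersion}"},
		BodyInject:   map[string]string{"user": `{"appkey":"{appKey}"}`},
	}
	vars := map[string]string{
		"model":      "gpt-4o-mini",
		"apiVersion": "2025-04-01-preview",
		"appKey":     "my-app",
	}
	fmt.Println(buildRequestURL("https://chat-ai.cisco.com", shape, vars))
	body := map[string]any{}
	injectBody(body, shape, vars)
	fmt.Println(body["user"])
}
```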
Proposed explicit selection form
For workflows that do not need a named catalog entry:
```yaml
engine:
  runtime:
    id: codex
    version: 0.105.0
  provider:
    id: openai
    model: gpt-5
    auth:
      secret: OPENAI_API_KEY
```

This can be treated as an inline engine definition that resolves the same way as a catalog entry.
Backward Compatibility
Legacy built-ins should remain available as named engine definitions:
- `claude`
- `codex`
- `copilot`
- `gemini`
Compatibility rules:
- `engine: claude` continues to work
- `engine.id` continues to work for built-ins
- existing engine-specific fields continue to parse during migration
- documentation can gradually shift from "engine = built-in ID" to "engine = resolved definition"
Non-Goals
- This proposal does not require every runtime to support every provider.
- This proposal does not require all engine behavior to become declarative.
- This proposal does not make OpenCode the center of the design.
OpenCode should be treated as one future integration that can fit into the architecture, not as the architecture itself.
OpenCode Fit
OpenCode remains a good example of why this direction is useful:
- it is an engine integration outside the original fixed built-ins
- it already separates provider and model selection
- it likely needs runtime-owned config rendering rather than just flags
But that is a consequence of the design, not the driver of the design.
Reference: community-tagged issue #20122
Proposed Implementation Plan
Phase 1: Introduce the internal engine definition model
Goal: separate engine definition data from runtime adapter code without changing workflow behavior.
Steps:
- Introduce `EngineDefinition`, `EngineCatalog`, and `ResolvedEngineTarget`.
- Treat existing runtime implementations as runtime adapters.
- Represent current built-ins as built-in engine definitions.
- Update downstream consumers to use `ResolvedEngineTarget`.
- Add regression tests proving legacy built-ins still resolve exactly as before.
Phase 2: Create a single source of truth catalog
Goal: eliminate duplicated engine metadata.
Steps:
- Move built-in engine metadata into one catalog.
- Derive from that catalog:
- supported engine lists
- CLI selection options
- schema defaults and enums where still needed
- documentation tables and examples
- Remove hard-coded duplicate lists from constants, docs, and CLI helpers.
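The derivation step above can be sketched as a single function, assuming a simple catalog map: every supported-engine list is computed from the catalog rather than maintained by hand in constants.

```go
package main

import (
	"fmt"
	"sort"
)

// supportedEngineIDs derives the engine list for schema enums, CLI prompts,
// and docs tables from one catalog map. The inline catalog is illustrative.
func supportedEngineIDs(catalog map[string]struct{ DisplayName string }) []string {
	ids := make([]string, 0, len(catalog))
	for id := range catalog {
		ids = append(ids, id)
	}
	sort.Strings(ids) // deterministic order for generated enums and docs
	return ids
}

func main() {
	catalog := map[string]struct{ DisplayName string }{
		"claude":  {"Claude Code"},
		"codex":   {"Codex"},
		"copilot": {"GitHub Copilot"},
		"gemini":  {"Gemini"},
	}
	fmt.Println(supportedEngineIDs(catalog))
}
```

With this in place, a drift like `gemini` missing from `EngineOptions` cannot happen, because there is no second list to forget.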
Phase 3: Add external engine definitions
Goal: support engines outside the built-in set.
Steps:
- Extend schema and parsing to support named engine definitions.
- Support inline engine definitions and catalog-defined engine definitions.
- Add validation for:
- unknown runtime IDs
- missing auth bindings
- invalid provider/model config
- invalid auth strategy or token mapping
- invalid request templates or body injection rules
- duplicate engine IDs
- Add import and merge semantics if engine definitions can come from shared workflows.
Phase 4: Add provider-owned auth and request shaping
Goal: support non-standard backends without hard-coded engines.
Steps:
- Add `AuthDefinition` and `RequestShape` to the engine definition model.
- Implement validation for:
- token exchange configuration
- response token extraction
- header mapping
- URL/query template rendering
- request body injection rules
- Ensure secret handling and strict-mode validation operate on provider auth definitions rather than just secret names.
Phase 5: Add runtime-owned config rendering
Goal: support richer engines that need generated configuration, not just flags and env vars.
Steps:
- Add a config-renderer hook to the runtime adapter layer.
- Let runtimes emit generated config files and runtime metadata.
- Keep provider/model/auth resolution in the engine definition layer.
Phase 6: Add new integrations on top of the definition model
Goal: validate that the architecture supports engines beyond the built-ins.
Candidate integrations:
- OpenCode
- openai-compatible internal engines
- other future runtime adapters
Testing Plan
Add or update tests in these areas:
- legacy engine resolution
- catalog resolution
- inline engine definition parsing
- external engine definition parsing
- provider/model/auth validation
- token exchange auth validation
- request template and body injection validation
- threat detection preservation of resolved engine data
- safe-output metadata emission
- CLI engine selection and secret prompts
- runtime config rendering for engines that need generated config
Concrete First Refactors
The lowest-risk first changes are:
- Introduce `ResolvedEngineTarget`.
- Make `compiler_orchestrator_engine.go` resolve once and pass the result around.
- Convert current built-ins into catalog-backed engine definitions.
- Replace duplicated engine lists with catalog-derived data.
- Add the runtime config-renderer hook.
Summary
The document should stay centered on one change:
- move from a fixed built-in engine list to a definition-based engine catalog
Provider/model separation supports that goal. Runtime adapters support that goal. Future integrations such as OpenCode benefit from that goal.
That is the cohesive direction for the engine redesign.