AI Agent for Trucking: v1 Proposal
| Field | Value |
|---|---|
| Status | Draft: decisions locked, code not started |
| Owner | Scott Asher |
| Target repos | attunelogic-api, attunelogic-service |
| Industries | Trucking only (v1) |
| LLM provider | Anthropic (Claude) |
| Implementation plan | docs/plans/ai-agent-trucking-v1.md |
| Last updated | 2026-05-16 |
Reading order: start here for the what and why. When you're ready for the how and in what order, jump to the implementation plan; it mirrors the Phase 1 → 2 → 3 build order with checkbox-trackable todos.
Locked decisions (May 2026)
Captured during the 2026-05-16 pre-build review. These are settled; code can be written against them without re-litigation. Anything not on this list defers to the section it lives in below.
Scope & posture
- Phase 1 and Phase 2 will be built; Phase 3 stays deferred until an Anthropic API key is acquired AND the data-policy review is signed off.
- "Live for development purposes" means: Phase 1+2 code deploys to beta only. AI_AGENT_GLOBAL_ENABLED=true in beta, false in alpha and main. No external tenant gets the flag forced on until the pre-release checklist is green.
- Code work is not started yet. Lock the decisions in this doc, then revisit kickoff after the current feature/tenant-roles-permissions work lands and the Anthropic key is in hand.
Names, paths & schemas (lock; renames cost migrations later)
- Feature flag key: aiAgent.enabled
- Config block: configs.aiAgent.{ model, monthlyTokenCap, perUserDailyMessageCap, monthlyCostCeilingUsd, allowedTools, defaultUndoWindowMs, dryRun }
- System-level kill switch doc: Config doc { type: "system", key: "aiAgent.globallyEnabled" }
- Env vars: AI_AGENT_GLOBAL_ENABLED, ANTHROPIC_API_KEY, AI_AGENT_DEFAULT_MODEL
- Routes: POST /api/v1/ai/agent/messages, GET /api/v1/ai/agent/sessions/:id, GET /api/v1/system/ai-status (no auth), GET /api/v1/admin/ai-agent/health (super-admin), GET /api/v1/admin/ai-agent/:parentCompanyId/activity, PATCH /api/v1/account/feature-flags
- Mongo collection: agentsessions (default Mongoose pluralization)
- Source tree: src/services/ai/{ anthropic.js, agent/{ index.js, systemPrompt.js, addressRedactor.js, addressDetector.js, tools/ } }
- Additive Job fields: aiCreated: Boolean (default false), aiCreatedAt: Date, pendingAiReview: Boolean (default false). All sparse-indexable, backwards compatible.
Cost ceilings & alerts (dev posture)
- Internal/dev tenants: no hard auto-disable ceiling. App-side enforcement is off for internal tenants.
- Alerts only: super-admin gets a Sentry alert + email when:
  - Per-tenant cost crosses $10 in a single day for that tenant
  - Per-tenant MTD cost crosses $50 for the month for that tenant
- The L2 runtime kill switch and the per-tenant Force Off remain available as manual cutoffs if alerts trip.
- External pilot ceilings are unchanged from the proposal: $25/tenant/mo, $200/day platform. Those become live the moment a non-internal tenant is enabled. A tenant is treated as external when pilot.aiAgent === false (or is unset) on the Customer doc; internal/dev tenants explicitly set pilot.aiAgent = "internal" to opt into alert-only mode.
Audit retention
- AgentSession documents: 90-day TTL on startedAt. Mongo TTL index handles purge.
- Rationale: balances storage, debugging headroom (3 months of user-reported issue investigation), and audit-exposure surface. Longer-term audit is satisfied by Sentry breadcrumbs + Mongo backups, not by keeping primary agentsessions indefinitely.
- Revisit before GA if real usage patterns suggest a different number.
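The retention arithmetic can be sketched as follows. This is a minimal illustration, assuming the TTL is expressed through MongoDB's standard expireAfterSeconds index option; the constant and helper names are hypothetical and only the 90-day figure and the startedAt field come from this doc.

```javascript
// Hypothetical constants; only "90 days" and "startedAt" are from the proposal.
const AGENT_SESSION_TTL_DAYS = 90;
const SECONDS_PER_DAY = 24 * 60 * 60;

// Index definition in the shape a TTL index takes: a single-field key on a
// Date field plus the expireAfterSeconds option.
const agentSessionTtlIndex = {
  key: { startedAt: 1 },
  options: { expireAfterSeconds: AGENT_SESSION_TTL_DAYS * SECONDS_PER_DAY },
};

// Earliest moment a session that started at `startedAt` becomes eligible for purge.
function purgeEligibleAt(startedAt) {
  const ttlMs = agentSessionTtlIndex.options.expireAfterSeconds * 1000;
  return new Date(startedAt.getTime() + ttlMs);
}
```

With Mongoose, the same key and options would typically be passed to schema.index(key, options). MongoDB's TTL monitor runs periodically, so documents are removed some time after they become eligible rather than at the exact instant.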
Dry-run mode default
- No global default. aiAgent.dryRun is set per-tenant, super-admin only.
- New internal tenants are added with dryRun = true until the team is satisfied with tool-call quality on that tenant's actual data, then flipped to false deliberately.
- This avoids both extremes: never accidentally writing for a tenant we haven't vetted, and never gating internal work behind dry-run forever.
Anthropic provider (Phase 3 only)
- API key: to be created when Phase 3 starts. Not blocking Phase 1+2.
- Data-policy review: narrowed scope is "client names + location names + city/state strings only." No addresses, lat/lng, postal, phone, or email leave the API. Sign-off required before Phase 3 begins.
- Default model: pin to a specific dated Claude version at Phase 3 kickoff (current direction: claude-sonnet-4-...). Read from the AI_AGENT_DEFAULT_MODEL env var so version bumps don't require code changes.
Cross-references
- Implementation plan with checkbox-trackable todos: docs/plans/ai-agent-trucking-v1.md
- Risk callouts and the "lock this now or regret it" list live further down in this doc.
TL;DR
Add an in-app AI assistant on the service web that lets trucking customers create multi-leg Jobs from natural language ("schedule a load with 3 orders for Acme from Dallas to Houston"). The agent runs as an orchestrator on the API using Claude with tool-use, resolves entities through scoped read-only tools, and commits via the existing handleExtractedJobCreate pipeline so created records flow through current validation, tenancy, and audit. Created jobs are flagged as AI-originated and surfaced in the drawer with a 5-minute Undo window.
Two non-negotiable safety properties for launch:
- No addresses ever leave the API. The LLM works only with { id, name, city, state }. Street, ZIP, lat/lng, and pre-formatted address strings stay server-side.
- Four independent kill-switch layers gate every request, so the agent can be disabled at the env, runtime, tenant, or industry level without touching the others.
Stakeholder review checklist
Use this section when sharing with stakeholders to capture sign-off.
- Product: confirm scope (trucking-only, create-only, drivers excluded, no net-new locations) is acceptable for v1
- Security / Privacy: sign off on the no-address PII model and the four-layer kill switch
- Operations: sign off on cost ceilings ($25/tenant/mo pilot, $200/day platform), runbook, and panic-button procedure
- Anthropic data-policy review: narrowed scope is "client names + location names + city/state strings only"
- Engineering: confirm phased build order is feasible and resource it
Architecture
flowchart LR
User["User (trucking tenant)"] -->|prompt| Drawer["AgentDrawer<br/>(service web)"]
Drawer -->|"POST /api/v1/ai/agent/messages"| Route["Express route<br/>L1+L2+L3+L4 gates<br/>+ rate limit"]
Route --> Orchestrator["Agent service<br/>(Claude + tool loop)"]
Orchestrator -->|tool calls| Tools["Scoped tools (parentCompany)"]
Tools --> Read[("Clients,<br/>Locations")]
Orchestrator -->|"createDraftJob tool"| Existing["createJob -> handleExtractedJobCreate<br/>(existing controller)"]
Existing --> DB[("MongoDB")]
Existing -->|"jobId + aiCreatedAt"| Drawer
Drawer -->|"5-min Undo<br/>DELETE /jobs/:id"| Existing
Core principle: the LLM never writes directly. It calls scoped read tools to resolve entities, then a single createDraftJob tool that calls the existing createJob controller path with an extractedData.legs payload. This reuses every guardrail already in attunelogic-api/src/controllers/jobs/create.js.
Critical implementation notes (validated against the codebase)
- createDraftJob payload shape (deterministic, low-risk). The agent's tool always emits a fully pre-resolved payload, with no name strings and no fuzzy fields, so the server-side OCR matching path is short-circuited:

    {
      "client": "<Client._id>",
      "appointmentDate": "<ISO>",
      "extractedData": {
        "legs": [
          { "origin": "<Location._id>", "destination": "<Location._id>", "resolvedLocation": { "id": "<Location._id>" }, "orderNumber": "1" },
          { "origin": "<Location._id>", "destination": "<Location._id>", "resolvedLocation": { "id": "<Location._id>" }, "orderNumber": "2" }
        ]
      },
      "aiCreated": true,
      "pendingAiReview": true
    }

  All location IDs come from searchLocations (which only returns IDs scoped to parentCompany). Client ID comes from searchClients. The LLM never invents IDs.
- No driver assignment in v1. Setting leg.driver triggers handleUserAssignedToLeg, which sends notifications. Drivers are dispatched by humans after approval.
- Approval status is NOT a schedule gate. schedule.all for trucking does not filter by approval.status, so a pending job is still visible on dispatch screens. Mitigation: new pendingAiReview flag on Job + an opt-in exclude filter so AI-created jobs stay off dispatch schedules until approved.
- Search tools reuse existing endpoints rather than creating parallel ones:
  - searchClients → GET /clients?search=true&searchTerm=... (Atlas Search autocomplete on name)
  - searchLocations → GET /locations?search=true&searchTerm=...&clientId=... (regex-with-scoring)
- Live flag refresh. ConfigProvider currently only initializes configs once; we patch its useEffect to re-sync on configData change so admin toggles propagate to the launcher without a page reload.
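The "LLM never invents IDs" rule can be illustrated with a small provenance check. This is a sketch under assumptions: the shape of the stored tool results (a flat list of { tool, results }) and both helper names are hypothetical, while the payload fields mirror the createDraftJob shape shown in this section.

```javascript
// Collect every id that a search tool returned earlier in the session.
// `toolResults` is an assumed shape: [{ tool: "searchClients", results: [{ id, ... }] }, ...]
function collectKnownIds(toolResults) {
  const known = { clients: new Set(), locations: new Set() };
  for (const { tool, results } of toolResults) {
    let bucket = null;
    if (tool === "searchClients") bucket = known.clients;
    if (tool === "searchLocations") bucket = known.locations;
    if (!bucket) continue; // ignore non-search tools
    for (const r of results) bucket.add(String(r.id));
  }
  return known;
}

// Return every id in the payload that no prior search call produced.
// An empty array means the payload only references ids the server handed out.
function findUnknownIds(payload, known) {
  const unknown = [];
  if (!known.clients.has(String(payload.client))) unknown.push(payload.client);
  for (const leg of payload.extractedData.legs) {
    for (const id of [leg.origin, leg.destination]) {
      if (!known.locations.has(String(id))) unknown.push(id);
    }
  }
  return unknown;
}
```

A createDraftJob handler could reject the call whenever findUnknownIds returns a non-empty list, which keeps the check independent of whatever the model claims about where an ID came from.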
Data sent to Anthropic (PII minimization)
Hard rule: no addresses ever leave the API. This includes street, postalCode, country, lat/lng, and any pre-formatted "123 Main St, Dallas TX 75001" strings. The LLM works entirely with identifiers + display labels (id + name + city + state).
flowchart LR
User[User prompt] --> Guard["Address detector<br/>(regex on inbound prompt)"]
Guard -->|address-shaped| Reject["400 ADDRESS_DETECTED<br/>friendly nudge in UI<br/>nothing forwarded"]
Guard -->|clean| Orch[Orchestrator]
Orch -->|searchLocations clientId, query| Tool[searchLocations tool]
Tool --> DB[(Locations<br/>full docs)]
DB --> Redact["addressRedactor<br/>→ {id, name, city, state}"]
Redact -->|redacted JSON| Orch
Orch -->|"createDraftJob<br/>legs[].pickup={locationId},<br/>legs[].drop={locationId}"| Existing[handleExtractedJobCreate]
Existing --> Resolve["Server resolves IDs<br/>→ full Location docs<br/>(addresses stay server-side)"]
What goes to Anthropic:
- The user's prompt (after passing the address-detector guard)
- The system prompt (no tenant data)
- Tool results in the form { id, name, city, state } only
- Created job summaries by id + leg names
What never goes to Anthropic:
- Street addresses, postal codes, country codes
- Latitude / longitude
- Pre-formatted address strings
- Phone numbers, emails, contact names
- Driver names or IDs (driver assignment is out of scope for v1)
- Anything from the user's prompt that looks address-shaped (rejected pre-flight)
How it's enforced (six layers):
- Inbound guard. A regex-based addressDetector scans the user's message before it reaches the LLM. Address-shaped tokens (street suffixes like St/Ave/Blvd, ZIP/postal patterns, lat/lng pairs) trigger a friendly 400 with code ADDRESS_DETECTED. The message is never forwarded.
- Outbound redaction. Every tool that touches Location data passes its result through addressRedactor before returning to the orchestrator. Deny-list (drops street, postalCode, country, coordinates, formattedAddress) layered with an allow-list (only id, name, city, state survive).
- Reference-by-ID createDraftJob. The LLM cannot construct a location; it can only reference IDs returned by a prior searchLocations call in the same session. The server re-resolves the ID against the full DB record (with addresses) when calling handleExtractedJobCreate. The LLM never sees, types, or stores the address.
- AgentSession audit storage. Tool result snapshots persisted on the session use the redacted shape. Even if a future bug exposed sessions, no addresses would leak.
- Regression test gate. tests/services/ai/no-address-leak.test.js runs every tool against fixtures with recognizable addresses and fails if any address-shaped string appears in the orchestrator-bound payload or the persisted session. Runs in CI on every PR touching src/services/ai/**.
- UX nudge. The InputBar tells users up front: "Refer to locations by name; please don't paste addresses." The 400 error renders an inline nudge with a quick link to create a saved Location.
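The first two layers can be sketched in a few lines. This is an illustration only: the function names and the regex patterns are assumptions, not the planned addressRedactor/addressDetector implementations, and the patterns cover just the three categories named above (street suffixes, US ZIP codes, lat/lng pairs).

```javascript
// Allow-list redaction: only these keys survive, so an unexpected new
// address-bearing field on Location is dropped by default.
const ALLOWED_KEYS = ["id", "name", "city", "state"];

function redactLocation(location) {
  const out = {};
  for (const key of ALLOWED_KEYS) {
    if (location[key] !== undefined) out[key] = location[key];
  }
  return out;
}

// Illustrative inbound patterns (assumed, deliberately simple).
const ADDRESS_PATTERNS = [
  // House number followed by words and a common street suffix: "123 Main St"
  /\b\d{1,6}\s+[A-Za-z0-9.' ]+?\s(?:St|Street|Ave|Avenue|Blvd|Boulevard|Rd|Road|Dr|Drive|Ln|Lane)\b/i,
  // US ZIP or ZIP+4: "75001", "75001-1234"
  /\b\d{5}(?:-\d{4})?\b/,
  // Decimal lat/lng pair: "32.7767, -96.7970"
  /-?\d{1,3}\.\d{3,}\s*,\s*-?\d{1,3}\.\d{3,}/,
];

// True when the prompt contains anything address-shaped.
function looksLikeAddress(text) {
  return ADDRESS_PATTERNS.some((re) => re.test(text));
}
```

Patterns this simple produce false positives: a five-digit order number matches the ZIP pattern, and a phrase like "3 orders to St Louis" matches the street pattern. Whether that trade-off is acceptable, or whether the patterns need tightening, is a tuning decision for the real detector.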
Trade-offs the user should know about:
- City/state ambiguity. Two locations in the same city share the city/state label. The LLM resolves them by name suffix ("Acme Dallas DC" vs "Acme Dallas Yard"). Search results return both with city/state to help disambiguate.
- Net-new locations are out of scope for v1. v1 will not let the LLM create a Location, since creating one would require the LLM to handle an address. If the user wants a stop at a location not yet saved, the agent responds: "I can't add a new stop yet. Please create the location first, then I can use it." This is a deliberate v1 limit that keeps the no-address rule airtight.
- Slightly chattier prompts. Users have to refer to locations by name rather than describing them by address. Mitigated by the searchLocations autocomplete being good enough that "Acme Dallas" finds "Acme Dallas DC".
The Anthropic data-policy review is now narrowly scoped to client names + location names + city/state strings, in a context where Anthropic has zero data-retention beyond standard processing. This is a significantly smaller review than the original "names + addresses" scope.
Kill-switch hierarchy (defense in depth)
The agent is gated by four independent layers that are evaluated in order on every request. Any one of them can disable the system without touching the others. This is the core safety story for the launch.
flowchart TD
Req["Incoming agent request"] --> L1{"L1<br/>AI_AGENT_GLOBAL_ENABLED env var"}
L1 -->|false| Block1["503 Service Unavailable<br/>(no DB, no LLM, no logs beyond a counter)"]
L1 -->|true| L2{"L2<br/>System runtime flag<br/>(Config type=system)"}
L2 -->|false| Block2["503: global runtime kill"]
L2 -->|true| L3{"L3<br/>Tenant flag<br/>featureFlags.aiAgent.enabled"}
L3 -->|false| Block3["403: tenant has agent off"]
L3 -->|true| L4{"L4<br/>Industry gate<br/>appType === trucking"}
L4 -->|false| Block4["403: wrong industry"]
L4 -->|true| L5["Run rate limits, cost ceiling,<br/>dry-run check, then handler"]
| Layer | Source of truth | Who controls it | How fast to flip | Use case |
|---|---|---|---|---|
| L1: Env var | AI_AGENT_GLOBAL_ENABLED env on every API instance | Devops (deploy or env update) | Minutes (rolling restart) | Hardest kill. Defaults to false in production; the agent is opt-in per environment. Survives DB outages. |
| L2: System runtime flag | Config doc with type: "system", key aiAgent.globallyEnabled | Super-admin via panic button | Seconds (cached 30s per process; cache busted on toggle) | Operational kill. Use during incidents, billing spikes, or vendor outages. No deploy needed. |
| L3: Tenant flag | Config.configs.featureFlagOverrides.aiAgent.enabled | Super-admin (Force On/Off) or tenant admin (self-serve) | Immediate | Per-customer enablement. The feature for everyday admin work. |
| L4: Industry gate | Customer.appType === "trucking" | N/A (derived from tenant data) | N/A | Belt-and-suspenders. Prevents accidental enablement on a non-trucking tenant. |
Key properties:
- L1 short-circuits everything. When the env var is false, the route returns 503 before any DB query, before reading any tenant config, before touching the LLM. No cost, no logs (except a counter for monitoring), no risk.
- L2 has a 30s per-process cache so it doesn't add a DB hit per request. The panic-button toggle explicitly busts the cache across all instances via a Mongo change stream / pub-sub; the simpler v1 alternative is to accept a 30s kill window and document that expectation.
- L4 is hard-coded by industry, not a flag. Even if someone sets Force On for a service-repair tenant, the industry gate still blocks it. It cannot be bypassed from the admin UI.
- The launcher in service web honors the same hierarchy. A new public-ish GET /api/v1/system/ai-status endpoint returns { globalEnabled } with no auth required (just the L1+L2 result), polled every 60s + on window focus by the service web. When global goes off, the launcher disappears from active sessions within 60s without requiring a re-login.
- The circuit breaker can flip L2 automatically when error rate or platform cost exceeds its threshold. Recovery is intentionally manual: a human must verify before re-enabling.
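The ordering of the four layers, and the "L1 before any DB query" property, can be sketched as a pure function. The function and loader names are hypothetical; the status codes and the order follow the flowchart and table in this section.

```javascript
// Evaluate the four layers in order and stop at the first one that blocks.
// readRuntimeFlag and readTenant are assumed loaders standing in for the cached
// system Config lookup and the tenant lookup; they run only when reached.
function evaluateGates({ envEnabled, readRuntimeFlag, readTenant }) {
  // L1: env var. Checked first so a disabled environment never touches the DB.
  if (!envEnabled) return { allowed: false, status: 503, layer: "L1" };

  // L2: system runtime flag (panic button / circuit breaker).
  if (!readRuntimeFlag()) return { allowed: false, status: 503, layer: "L2" };

  const tenant = readTenant();

  // L3: resolved tenant feature flag.
  if (!tenant.aiAgentEnabled) return { allowed: false, status: 403, layer: "L3" };

  // L4: industry gate, applied even when L3 is forced on.
  if (tenant.appType !== "trucking") return { allowed: false, status: 403, layer: "L4" };

  return { allowed: true, status: 200, layer: null };
}
```

In an Express middleware, envEnabled would presumably come from something like process.env.AI_AGENT_GLOBAL_ENABLED === "true", so an unset variable fails closed.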
Automatic safety circuits
Beyond the manual kill switches, the system protects itself:
- Error-rate circuit breaker. Sliding 5-minute window of Anthropic API calls. If the error rate exceeds 25% over at least 10 calls, the breaker automatically flips L2 to off and sends a Sentry alert + super-admin email. Manual flip required to re-enable.
- Platform cost ceiling. Independent of per-tenant ceilings; spend is summed across all tenants. If 24h rolling spend exceeds the platform threshold (configured value, e.g. $200/day to start), flips L2 to off + alerts.
- Per-tenant cost ceiling. When a tenant's MTD cost hits its monthlyCostCeilingUsd, that tenant is auto-disabled (their L3 effectively flips to off via a tenantSuspendedUntil field). Other tenants unaffected. Resets on the 1st of the next month.
- Per-tenant message rate limit. perUserDailyMessageCap and aiAgentLimiter (express-rate-limit). 429 when exceeded.
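The error-rate breaker described above can be sketched as a small in-memory class. The class name and method names are hypothetical, and a real implementation would need shared state across API instances; the thresholds (5-minute window, 25%, at least 10 calls, manual reset) are the ones stated in this section.

```javascript
const WINDOW_MS = 5 * 60 * 1000;   // sliding 5-minute window
const MIN_CALLS = 10;              // do not trip on tiny samples
const ERROR_RATE_THRESHOLD = 0.25; // trip when the rate is strictly above 25%

class ErrorRateBreaker {
  constructor() {
    this.calls = [];     // [{ ok: boolean, at: epoch ms }]
    this.tripped = false;
  }

  // Record one provider call outcome; returns true while the breaker is tripped.
  record(ok, now = Date.now()) {
    this.calls.push({ ok, at: now });
    // Drop calls that have aged out of the window.
    this.calls = this.calls.filter((c) => now - c.at <= WINDOW_MS);
    const errors = this.calls.filter((c) => !c.ok).length;
    if (this.calls.length >= MIN_CALLS && errors / this.calls.length > ERROR_RATE_THRESHOLD) {
      this.tripped = true; // latched: stays tripped until reset() is called
    }
    return this.tripped;
  }

  // Manual recovery only, mirroring "a human must verify before re-enabling".
  reset() {
    this.tripped = false;
    this.calls = [];
  }
}
```

The latch is the important design choice: once tripped, later successful calls do not clear the state, so the only way back is an explicit reset.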
Dry-run mode
A per-tenant aiAgent.dryRun: true config flag (super-admin only) makes createDraftJob return a preview payload without creating a Job. Lets us:
- Pilot in production with a real tenant and zero write risk
- Demo the agent to a prospect without touching their data
- Test the LLM's tool selection on real prompts before flipping write-mode on
The drawer UI shows a clear "Preview mode: no jobs will be created" banner when dryRun is on.
Per-customer on/off (first-class requirement)
Two control surfaces: super-admin (full control) and tenant admin (self-serve for their own org only).
Super-admin (existing UI, just register the flag)
- SuperAdmin > Feature Flags (/admin/feature-flags) → pick a tenant → the aiAgent.enabled row appears with Inherit / Force On / Force Off. No new admin screen needed.
- Same UI also surfaces Tier Defaults, beta allowlist, and (new) the AI Activity widget below.
Tenant admin self-serve (new, scoped)
- New endpoint PATCH /api/v1/account/feature-flags with verifyToken + admin (role) + verifyParent. Body: { featureKey, value } where value ∈ { true, false, null } (null = inherit).
- Allowlist enforcement: the endpoint only accepts flags whose registry entry has tenantAdminToggleable: true. Any other key returns 403. This means we explicitly opt features in to self-serve: aiAgent.enabled is opted in; sensitive infra flags are not.
- Writes to the same Config.configs.featureFlagOverrides that super-admin uses, so both surfaces share one source of truth and the resolver doesn't change.
- Where it appears in the UI: new "AI Assistant" card in the existing tenant settings/account area (admin-visible only), with a single on/off toggle, short description, and a "Learn more" link. Toggle is hidden for non-admins.
- Live effect: on success, optimistically refresh useConfig() so the launcher shows/hides immediately without a reload.
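The allowlist rule for the tenant-admin endpoint can be sketched as a validation step. The function name, the error codes, the example registry, and the 400 for a malformed value are assumptions for illustration; the 403 for non-toggleable keys and the { featureKey, value } body are from the description above.

```javascript
// Example registry (hypothetical contents apart from aiAgent.enabled).
const exampleRegistry = {
  "aiAgent.enabled": { tenantAdminToggleable: true },
  "infra.exampleFlag": { tenantAdminToggleable: false },
};

// Validate a tenant-admin toggle request body against the flag registry.
function validateTenantToggle({ featureKey, value }, registry) {
  // value must be exactly true, false, or null (null = inherit).
  if (![true, false, null].includes(value)) {
    return { status: 400, error: "INVALID_VALUE" }; // assumed status for bad input
  }
  // Own-property check so inherited keys like "constructor" are never accepted.
  const known = Object.prototype.hasOwnProperty.call(registry, featureKey);
  if (!known || registry[featureKey].tenantAdminToggleable !== true) {
    return { status: 403, error: "FLAG_NOT_TOGGLEABLE" };
  }
  return { status: 200, override: { featureKey, value } };
}
```

Treating unknown keys the same as non-toggleable ones keeps the endpoint fail-closed: a flag must be registered and explicitly opted in before a tenant admin can touch it.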
Rollout knobs available out of the box
- Default off for everyone via defaultEnabled: false + lifecycle: "beta" in the registry.
- Force On / Force Off per tenant from super-admin (writes featureFlagOverrides).
- Tenant admin self-serve via the new account-scoped endpoint.
- Tier-based defaults (e.g., enable for tier3+) via existing Tier Defaults tab.
- Beta allowlist (channel = latest) for staged rollouts without a Force On.
- Kill switch: Force Off any tenant from super-admin to override even the tenant's own preference (Force Off beats tenant admin On per existing resolver precedence).
AI Activity widget (super-admin)
Lives on SuperAdmin > Feature Flags and renders when a tenant is selected and the registry contains aiAgent.enabled. Lets ops monitor adoption and cost per customer alongside the toggle.
- Endpoint: GET /api/v1/admin/ai-agent/:parentCompanyId/activity (super-admin only).
- Response shape:

    {
      enabled: boolean,
      tokenUsage: {
        monthToDate: { input: number, output: number, estimatedCostUsd: number },
        perDay: [{ date: "YYYY-MM-DD", input, output }] // last 30 days
      },
      sessionCounts: { mtd: number, today: number },
      jobsCreated: { mtd: number, today: number },
      recentSessions: [
        { _id, user: { _id, name }, startedAt, promptPreview, toolsUsed: [string], jobsCreatedCount, tokenUsage: { input, output }, status }
      ] // last 10
    }

- Aggregations: computed from AgentSession (filtered by parentCompany). MTD aggregations use a single Mongo aggregation pipeline; recent sessions = find().sort({ startedAt: -1 }).limit(10).
- Cost estimate: model-pricing table in src/services/ai/pricing.js (input/output $/1M tokens by model name); easy to update without redeploy by reading from config.
- UI: new card built with existing shared/Card, shared/Badge, and a small inline sparkline. Shows totals at top, table below, sparkline on the right.
- Empty state: when enabled === false and no sessions exist, shows "No AI activity for this tenant yet" with a hint to enable the flag.
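The cost estimate behind estimatedCostUsd can be sketched as a table lookup. The model name and the prices below are placeholders for illustration, not real vendor pricing, and the function name is hypothetical; only the "$ per 1M tokens, keyed by model name" structure comes from the description above.

```javascript
// Placeholder pricing table: USD per 1M tokens, keyed by model name.
// These numbers are NOT real prices; real values would come from pricing.js or config.
const EXAMPLE_PRICING = {
  "example-model": { inputPerMTok: 2, outputPerMTok: 10 },
};

// Estimate cost in USD for a { input, output } token-usage record.
function estimateCostUsd(model, usage, pricing = EXAMPLE_PRICING) {
  const entry = pricing[model];
  if (!entry) {
    // Fail loudly rather than silently reporting $0 for an unpriced model.
    throw new Error(`No pricing entry for model: ${model}`);
  }
  const inputCost = (usage.input / 1e6) * entry.inputPerMTok;
  const outputCost = (usage.output / 1e6) * entry.outputPerMTok;
  return inputCost + outputCost;
}
```

Throwing on an unknown model is one possible choice; since the cost ceilings depend on this number, silently returning zero for a newly pinned model would leave those ceilings blind.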
API changes (attunelogic-api)
New dependency
- @anthropic-ai/sdk (latest). API key in env: ANTHROPIC_API_KEY (add to .env.example, config/keys.js, config/index.js).
New files
- src/services/ai/anthropic.js: singleton client wrapper, model selection (default claude-sonnet-4-…), token usage logger.
- src/services/ai/agent/systemPrompt.js: tenant-aware system prompt (injects appType, today's date, tenant timezone, rules: "always confirm ambiguous client/location matches before creating", etc.).
- src/services/ai/agent/index.js: runAgent({ messages, tenantContext }). Implements the Claude tool-use loop with a hard cap of N iterations.
- src/services/ai/agent/tools/index.js: tool registry exporting { name, description, input_schema, handler }.
- src/services/ai/agent/tools/searchClients.js: fuzzy match on Client.name, scoped by parentCompany from customerConfigStorage. Returns top 5 with _id, name, location count.
- src/services/ai/agent/tools/searchLocations.js: search by city/state/name, optionally filtered by clientId. Returns top 5, redacted to { id, name, city, state }.
- src/services/ai/agent/tools/createDraftJob.js: builds the extractedData payload and invokes existing createJob (or its inner handleExtractedJobCreate). Sets aiCreated: true, aiCreatedAt: new Date(), pendingAiReview: true on the Job document. Returns the created jobId + summary.
- src/services/ai/agent/addressRedactor.js: utility that strips address-bearing fields from any object before it reaches the orchestrator.
- src/services/ai/agent/addressDetector.js: regex guard for inbound user prompts.
- src/controllers/ai/agent/index.js: messages.create handler. Validates body, attaches tenant + user context, calls runAgent, returns { messages, toolEvents, createdJobId? }. Persists transcript to a new AgentSession model.
- src/models/AgentSession.js: { parentCompany, user, messages: [{ role, content, toolUses, toolResults, ts }], tokenUsage, createdJobIds }. Tenant-scoped, indexed on parentCompany + user.
- src/routes/api/v1/ai/agent.js: POST /messages, GET /sessions/:id. Middlewares: verifyToken, admin, verifyParent, requireFeature("aiAgent.enabled"), dedicated aiAgentLimiter.
- src/routes/api/v1/system/ai-status.js: GET /system/ai-status (no auth) returning { globalEnabled } for L1+L2 only.
- src/routes/api/v1/admin/ai-agent/health.js: super-admin health endpoint.
- src/routes/api/v1/account/feature-flags.js: tenant-admin allowlisted PATCH.
- Wire all routes in src/routes/api/v1/index.js.
Touched files
- src/models/Job.js: add optional aiCreated: Boolean, aiCreatedAt: Date, pendingAiReview: Boolean (additive, backwards compatible).
- src/controllers/schedule/index.js: opt-in filter excluding pendingAiReview: true jobs from dispatcher schedule.
- src/services/config/default-configs/feature-flags.js: register aiAgent.enabled with { lifecycle: "beta", defaultEnabled: false, description: "AI assistant for creating loads from natural language", tenantAdminToggleable: true }.
- src/services/config/default-configs/index.js: add aiAgent config block: { model, monthlyTokenCap, perUserDailyMessageCap, monthlyCostCeilingUsd, allowedTools, defaultUndoWindowMs, dryRun }.
- src/middlewares/rateLimiting.js: add aiAgentLimiter (e.g., 30 req / 5 min per user, 500 req / day per tenant; values in config).
- src/middlewares/featureGates.js (new or existing): add requireAppType("trucking") and requireGlobalAiEnabled middlewares.
- .env.example, config/keys.js, config/index.js: ANTHROPIC_API_KEY, AI_AGENT_DEFAULT_MODEL, AI_AGENT_GLOBAL_ENABLED.
Tests (tests/controllers/ai/agent/, tests/services/ai/, tests/controllers/account/)
- Tool-call loop with mocked Anthropic client (single-leg, multi-leg, ambiguous client → asks for clarification).
- Tenancy: tools cannot return data from other parentCompany values.
- Feature flag off → 403 on every agent route.
- L1 env var off → 503 before any DB hit.
- L2 runtime flag off → 503 within cache window.
- L4 industry gate → 403 for non-trucking tenants even with flag forced on.
- Rate limit triggers 429.
- Created Job carries aiCreated: true, pendingAiReview: true, hidden from dispatcher schedule.
- Undo: DELETE /jobs/:id within window succeeds.
- Tenant-admin toggle: PATCH /account/feature-flags as admin enables/disables aiAgent.enabled and the change is reflected in GET /config. Same call with a non-toggleable flag → 403. Same call as a non-admin user → 403. Cross-tenant attempt blocked by verifyParent.
- AI Activity endpoint: super-admin gets correct aggregates; non-super-admin gets 403; another tenant's data is never returned.
- tests/services/ai/no-address-leak.test.js: runs every tool against address-bearing fixtures and asserts zero address-shaped strings reach the orchestrator-bound payload or the persisted AgentSession. CI gate for src/services/ai/**.
- Address detector: positive tests for street suffixes, ZIP/postal, and lat/lng patterns returning 400 ADDRESS_DETECTED.
Service web changes (attunelogic-service)
New files
- src/redux/services/ai/agentApi.js: RTK Query slice with sendAgentMessage mutation, getAgentSession query, getAiSystemStatus query. Invalidates Jobs/Schedule tags on successful job creation.
- src/components/AIAgent/AgentLauncher.jsx: FAB. Stacks above ChatLauncher using the existing --right-sidebar-offset CSS var. Hidden when L1/L2/L3/L4 disable the agent.
- src/components/AIAgent/AgentDrawer.jsx: uses shared/Drawer. Sections: message list, tool-call status pills ("Looking up Acme…", "Checked Dallas locations"), composer, and the Undo banner.
- src/components/AIAgent/MessageList.jsx, InputBar.jsx, UndoBanner.jsx, ToolCallPill.jsx.
- Optional v1.1: src/components/AIAgent/JobPreviewCard.jsx: clickable card linking to /jobs/:id after creation.
- src/pages/SuperAdmin/FeatureFlags/AiSystemStatusPanel.jsx: health endpoint output + panic button.
- src/components/Settings/AiAssistantCard.jsx: tenant-admin on/off toggle.
Touched files
- src/layouts/Dashboard/index.jsx: mount <AgentLauncher /> + <AgentDrawer /> next to existing ChatWidget/ChatLauncher.
- src/hooks/useConfig.tsx: fix ConfigProvider useEffect to re-sync configs whenever configData changes.
- src/pages/SuperAdmin/FeatureFlags/index.tsx: mount AI Activity widget + AI System Status panel.
Undo flow
- On an agent response containing createdJobId, the drawer starts a 5-minute countdown banner with an "Undo" action that fires the existing useDeleteJobMutation and shows a confirmation toast. After expiry, the banner converts to a passive "Created job #X, view" link.
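The countdown logic for the Undo banner can be sketched as a pure helper, which keeps it testable outside React. The function name and return shape are hypothetical; the 5-minute default mirrors defaultUndoWindowMs as described in this proposal.

```javascript
const DEFAULT_UNDO_WINDOW_MS = 5 * 60 * 1000; // assumed default for defaultUndoWindowMs

// Compute banner state from the job's aiCreatedAt timestamp and the current time.
function undoState(aiCreatedAt, now, undoWindowMs = DEFAULT_UNDO_WINDOW_MS) {
  const expiresAt = aiCreatedAt.getTime() + undoWindowMs;
  const remainingMs = Math.max(0, expiresAt - now.getTime());
  const totalSeconds = Math.ceil(remainingMs / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = String(totalSeconds % 60).padStart(2, "0");
  return {
    canUndo: remainingMs > 0,        // false once the window has elapsed
    remainingMs,
    label: `${minutes}:${seconds}`,  // m:ss countdown text for the banner
  };
}
```

Because the client clock drives the banner, the server would still need its own check if the window is meant to be enforced rather than just displayed.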
No-address UX
- InputBar shows persistent helper text: "Refer to locations by name (e.g. 'Acme Dallas DC'); please don't paste addresses."
- If the API returns the 400 ADDRESS_DETECTED error code, show an inline error nudging the user to use a saved location name and offer a quick link to create one.
- Tool-call pills render only redacted fields (name + city/state).
Phased build order
Live todos for this section live in the implementation plan. This proposal lists the phase goals; the plan lists the per-task checkboxes you tick off as work lands.
The Anthropic data-policy review is a blocker for sending real customer data to the LLM, but it does NOT block any of the surrounding infrastructure. We build in 3 phases and only Phase 3 needs the policy decision.
Phase 1: Infrastructure & kill switches (no LLM, no Anthropic key needed)
Goal: a fully gated, observable, killable system before a single token is spent.
- L1 env-var gate (AI_AGENT_GLOBAL_ENABLED) wired into config and middleware.
- L2 system-runtime kill switch (Config doc type: "system") + cache + super-admin endpoint to flip it.
- L3: register aiAgent.enabled in the feature-flag registry; tenantAdminToggleable: true.
- L4 industry-gate middleware (requireAppType("trucking")).
- aiAgent config block (model, monthlyTokenCap, perUserDailyMessageCap, monthlyCostCeilingUsd, defaultUndoWindowMs, dryRun).
- Job.aiCreated, Job.aiCreatedAt, Job.pendingAiReview fields + schedule filter excluding pending-review jobs by default.
- Tenant-admin PATCH /account/feature-flags endpoint with allowlist enforcement.
- GET /api/v1/system/ai-status (no auth; returns { globalEnabled } only).
- GET /api/v1/admin/ai-agent/health (super-admin); initially returns env/runtime status only.
- Service: fix ConfigProvider live-refresh; add tenant settings "AI Assistant" toggle card; add Super-Admin "AI System Status" panel with panic button + AI Activity widget shell; launcher polls /system/ai-status every 60s.
Ship value: every kill switch in place and verifiable. The platform can guarantee "AI is off" before any AI exists.
Phase 2: Tools & scaffolding with stubbed LLM (still no real Anthropic call)
- addressRedactor utility + tests/services/ai/no-address-leak.test.js regression suite.
- Inbound addressDetector regex guard at the route boundary returning 400 ADDRESS_DETECTED.
- AgentSession model + audit storage (PII-trimmed tool results, redacted shape only).
- Scoped tools (searchClients, searchLocations, createDraftJob) with full unit tests including cross-tenant isolation tests AND no-address-leak snapshot tests.
- aiAgentLimiter (express-rate-limit), iteration cap, token cap, dry-run mode plumbing in createDraftJob.
- Agent route + controller with a stubbed LLM provider (returns canned tool-call sequences). End-to-end test: stubbed LLM emits "call createDraftJob with these IDs" → real Job is created with aiCreated: true, pendingAiReview: true, no driver, hidden from schedule by default.
- AI agent drawer/launcher UI shell wired to the route, gated behind L1+L2+L3+L4.
- Undo flow end-to-end (UI banner + DELETE call).
- Circuit-breaker stub: track call counts and error rates against the stubbed provider so we can unit-test the auto-trip logic.
Ship value: entire system testable, killable, observable, and reviewable without any real LLM call.
Phase 3: Wire Anthropic (requires data-policy decision + API key)
- Implement src/services/ai/anthropic.js (fail-closed when ANTHROPIC_API_KEY is missing → 503).
- Real tool-use loop in runAgent with iteration cap and token accounting.
- Tenant-aware system prompt with timezone, today's date, and "always confirm ambiguous matches" rule.
- Cost estimation against the pricing.js table; enforce per-tenant monthly $ ceiling and platform 24h ceiling; circuit breaker flips L2 on threshold breach.
- Wire health endpoint to report anthropicReachable, lastSuccessfulCallAt, real error rates, MTD cost.
- Internal smoke test in dry-run mode against a test tenant. Then internal employee tenant with real writes. Then 1 friendly external pilot tenant.
Pre-release safety checklist (run before any production tenant is enabled)
Every box must be green. This is not optional.
Kill-switch verification (end-to-end, in alpha)
- L1 env var = false → API returns 503; service launcher hidden within 60s
- L2 runtime flag flipped via panic button → API returns 503; cache busts within 30s; launcher hidden
- L3 tenant flag off → API returns 403; launcher hidden for that tenant only
- L4 industry gate → service-repair tenant with flag forced on still gets 403
- Restart all API instances with AI_AGENT_GLOBAL_ENABLED=false → smoke check that no agent traffic succeeds
Cost & rate controls
- Per-user daily message cap tested (429 after threshold)
- Per-tenant monthly token cap tested (graceful rejection)
- Per-tenant monthly $ ceiling tested (auto-disables that tenant only)
- Platform 24h $ ceiling tested (trips L2 + alert)
- Error-rate circuit breaker tested (forced 50% error rate trips L2 + alert)
- Manual recovery from circuit-breaker trip works (super-admin flips L2 back)
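To make the error-rate breaker item concrete, here is a small sketch of auto-trip logic with manual recovery. The 50% threshold mirrors the forced error rate named in the checklist; the class name, the `minCalls` floor, and the `onTrip` callback are assumptions, and a real implementation would likely count over a rolling time window rather than since the last reset.

```javascript
// Hypothetical breaker: trips once the error rate reaches `threshold`
// after at least `minCalls` calls have been recorded.
class ErrorRateBreaker {
  constructor({ threshold = 0.5, minCalls = 20, onTrip = () => {} } = {}) {
    this.threshold = threshold;
    this.minCalls = minCalls; // avoids tripping on the first failed call
    this.onTrip = onTrip;     // e.g. flip L2 off and send the alert
    this.calls = 0;
    this.errors = 0;
    this.tripped = false;
  }

  // Record the outcome of one provider call; `ok` is true on success.
  record(ok) {
    if (this.tripped) return; // stay tripped until a manual reset
    this.calls += 1;
    if (!ok) this.errors += 1;
    const rate = this.errors / this.calls;
    if (this.calls >= this.minCalls && rate >= this.threshold) {
      this.tripped = true;
      this.onTrip({ calls: this.calls, errors: this.errors });
    }
  }

  // Manual recovery, matching the "super-admin flips L2 back" step.
  reset() {
    this.calls = 0;
    this.errors = 0;
    this.tripped = false;
  }
}
```

Keeping recovery manual means a flapping provider cannot re-enable the agent on its own.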
Data & write safety
- Cross-tenant isolation: tools never return data from another `parentCompany` (covered by tests)
- LLM cannot invent IDs (`createDraftJob` rejects IDs not produced by recent search tool calls in the same session)
- AI-created jobs are flagged and hidden from dispatcher schedule by default
- No `leg.driver` is ever set by the agent (verified by test)
- Undo within 5 min cleanly deletes the job; AgentSession records the undo
- Dry-run mode in production tenant does NOT create any Job
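One way to enforce the "LLM cannot invent IDs" rule is a per-session ledger of IDs returned by search tools, consulted before `createDraftJob` writes anything. The sketch below assumes that approach; the ledger API, the `UNKNOWN_ID` code, and the 400 status are illustrative, and it ignores the "recent" qualifier (a real version would also expire old entries).

```javascript
// Hypothetical per-session ledger of IDs the agent has actually seen.
function createSessionIdLedger() {
  const seen = new Set();
  return {
    // Call after every search tool returns, with that tool's result rows.
    recordSearchResults(results) {
      for (const row of results) seen.add(String(row.id));
    },
    // Call before a write tool runs; throws if any ID was never returned
    // by a search in this session.
    assertKnown(ids) {
      const unknown = ids.map(String).filter((id) => !seen.has(id));
      if (unknown.length > 0) {
        const err = new Error(`IDs not produced by a search in this session: ${unknown.join(", ")}`);
        err.code = "UNKNOWN_ID"; // assumed error code
        err.status = 400;        // assumed HTTP status
        throw err;
      }
    },
  };
}

// Usage: a search result makes "c1" usable; "c999" was never returned.
const ledger = createSessionIdLedger();
ledger.recordSearchResults([{ id: "c1" }, { id: "loc7" }]);
ledger.assertKnown(["c1", "loc7"]); // passes silently
```

Because the ledger is scoped to one session, an ID that is valid for another tenant or another conversation is rejected the same way as a fabricated one.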
No-address guarantee (PII minimization)
- `no-address-leak.test.js` is green and wired into CI for `src/services/ai/**` paths
- Manual proxy-trace of a real session in alpha confirms outbound Anthropic payload contains zero address-shaped strings
- Inbound address detector rejects pasted addresses with `400 ADDRESS_DETECTED` (positive test for street, ZIP, and lat/lng patterns)
- AgentSession docs in alpha verified to contain only redacted location shapes
- InputBar helper text + inline error nudge render correctly
- Anthropic data-policy review signed off for the narrowed scope (names + city/state only; no addresses)
Observability
- Sentry receives breadcrumbs for tool loop iterations and provider errors
- AI Activity widget shows correct MTD usage for a test tenant after a session
- Logs verified PII-free (no full prompts, no full client lists, no tokens, no addresses at info level)
- Health endpoint returns sane values from a freshly restarted instance
- Alert email actually delivers when circuit breaker trips (test in alpha)
Process
- Runbook published in `attunelogic-docs/docs/operations/ai-agent-runbook.md`
- On-call rotation knows about the panic button and how to use it
- First pilot tenant has been briefed and given the Undo + report-issue paths
- Per-tenant `monthlyCostCeilingUsd` set conservatively for pilot (e.g. $25)
- Platform `dailyCostCeilingUsd` set (e.g. $200 for pilot, raise as adoption grows)
Pilot rollout plan (gated by checklist above)
- Internal alpha (employees only): dry-run mode on a test tenant; verify tool selection quality on 50+ real-world prompts.
- Internal beta (employees only): dry-run off, real writes, hard $5/day ceiling, ~1 week.
- Friendly external pilot (1 tenant): real writes, $25/month ceiling, daily check-in for first week.
- Expanded pilot (3-5 tenants): real writes, monitor AI Activity widget across all of them.
- GA flip: lifecycle `beta` → `ga` in registry; tier defaults can opt in tier3+ tenants by default. Tenant admins can self-serve from there.
Any pilot tenant can be cut at any moment via L3 (super-admin Force Off). All pilot tenants can be cut simultaneously via L2 (panic button). The whole platform can be cut via L1 (env redeploy).
Out of scope for v1 (called out for follow-ups)
- Service/Repair industry support (separate tool set + system prompt).
- Read-only Q&A ("show me this week's loads") and updates/reschedules.
- Mobile assistant.
- Streaming responses (v1 returns full reply when tool loop ends; v1.1 can switch to SSE).
- Document/email ingestion ("create jobs from this attached PDF"): the `extractedData` pipeline already supports it; we'd add an `ingestDocument` tool later.
- Net-new Location creation by the agent (would require LLM to handle addresses; deliberately deferred).
- Driver assignment by the agent (would trigger notifications; humans dispatch after approval).
Open / pending external decisions
- Anthropic data-policy review (blocks Phase 3 only): scope narrowed to client names + location names + city/state strings; no addresses, lat/lng, postal codes, phone, or email. Pending stakeholder sign-off.
- Anthropic API key: not yet created; build will wire env var and a fail-closed `503` when missing so Phase 1/2 can ship without it.
- Per-tenant monthly $ ceiling default: starting recommendation is `$25/month` for pilots; easy to raise per tenant via super-admin once we see real usage patterns. Confirm before Phase 3.
- Platform 24h $ ceiling: starting recommendation `$200/day`. Confirm before Phase 3.
- Default model: starting recommendation `claude-sonnet-4-...` (latest). Confirm before Phase 3.
Branches
- API: `feature/ai-agent-trucking-v1` (already created)
- Service: `feature/ai-agent-trucking-v1` (to create when Phase 2 begins)
- Promotion: `feature/* → beta → alpha → main` per `44-release-branch-policy`
Related work
- Implementation plan with checkbox-trackable todos: `docs/plans/ai-agent-trucking-v1.md`
- Operational runbook (to be authored in Phase 1): `docs/operations/ai-agent-runbook.md`
- Local Cursor plan (live todo state, not committed): `~/.cursor/plans/ai_agent_trucking_v1_*.plan.md`
- Existing Job extraction pipeline: `attunelogic-api/docs/JOB_EXTRACTION_API.md` and `attunelogic-api/src/controllers/jobs/create.js#handleExtractedJobCreate`
- Feature flag system: `attunelogic-api/src/services/feature-flags/resolveFeatureFlags.js`, `attunelogic-service/src/pages/SuperAdmin/FeatureFlags/index.tsx`
- Cross-repo branch policy: `44-release-branch-policy`