AI Behavioral Contract Compilation

THE PRINCIPLE

A tool's description, as the agent sees it, is not documentation that explains the tool to a developer. It is a compiled behavioral artifact that shapes how the agent uses the tool at runtime. The text in the description field is the agent's specification, not the developer's reference.

When a tool description is treated as documentation, it tends to be vague, optimistic, and incomplete. The agent reads it and produces non-deterministic behavior: sometimes correct, sometimes wrong, almost never auditable. When a tool description is treated as a behavioral contract, it tends to be specific, conservative, and complete. The agent reads it and produces more predictable behavior because the contract has already closed the ambiguity.

The practical difference between the two postures shows up in how often an agentic system needs manual intervention to stay on track.

FIVE REQUIRED ELEMENTS

PER-TOOL DESCRIPTION CHECKLIST

ELEMENT 01

What this tool DOES (one sentence, present tense)

Direct statement of the operation. Not "this tool can help you..." but "Returns the list of...", "Writes a record to...", "Sends an email via...". The agent maps from intent to tool by matching this sentence.

ELEMENT 02

When to use it (explicit triggers)

Conditions under which this is the right tool. "Use when the user asks for X." "Use after Y." Specifying the trigger reduces wrong-tool selection. If two tools have overlapping triggers, that overlap is itself a design defect that must be resolved.

ELEMENT 03

When NOT to use it (explicit anti-triggers)

Conditions under which a different tool or no tool at all is the right answer. "Do not use when the data is already in context." "Do not use for read-only operations." The anti-triggers prevent the most common misuse patterns.

ELEMENT 04

Argument semantics (what each argument means in this tool's context)

Not just type but meaning. "scope: must be the org slug, not the team slug." Argument semantics are where agents go wrong most often when the description is thin.

ELEMENT 05

Failure modes and observable signals

What the tool returns on an error versus on success with an empty result. The agent's recovery behavior depends on distinguishing the two. Vague descriptions cause the agent to treat empty as failure or failure as empty. Both are wrong, and both are common.
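The error-versus-empty distinction in Element 05 can also be made mechanical on the calling side. The sketch below is illustrative only: the result shapes (an `error` key for failures, an `items` list for results) and the `classify_result` helper are assumptions made for this example, not the API of any particular tool.

```python
from enum import Enum


class Outcome(Enum):
    ERROR = "error"  # the call failed; follow the documented recovery step
    EMPTY = "empty"  # the call succeeded and legitimately found nothing
    DATA = "data"    # the call succeeded and returned results


def classify_result(result: dict) -> Outcome:
    """Classify a tool result.

    Assumed (hypothetical) shapes:
      failure: {"error": "<code>"}
      success: {"ok": True, "items": [...]}
    """
    if "error" in result:
        return Outcome.ERROR
    if not result.get("items"):
        return Outcome.EMPTY
    return Outcome.DATA
```

A description that states both shapes explicitly gives the agent the same three-way split: retry or ask on ERROR, report "nothing found" on EMPTY, and proceed on DATA.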

WORKED EXAMPLE

Documentation-style (insufficient)

{
  name: "send_message",
  description: "Sends a message to a channel"
}

Behavioral-contract-style (sufficient)

{
  name: "send_message",
  description: `Sends a message to a Slack channel.

  USE when the user explicitly asks to post or announce
  something in Slack. USE after the user has approved
  the exact message text in this session.

  DO NOT USE for drafting (use draft_message instead).
  DO NOT USE if the channel name is ambiguous; ask first.
  DO NOT USE for messages containing credentials or PII.

  ARGUMENT SEMANTICS:
    channel: must be the channel ID (C0...), not the channel
      name. If the user provides a channel name, resolve it
      with lookup_channel first.
    text: the exact text approved in this session. Markdown
      is rendered. Tagging with @channel requires a
      separate approval flag.

  FAILURE MODES:
    - Returns {error: 'channel_not_found'} if the channel ID
      is invalid. Recovery: do not retry; ask user to verify.
    - Returns {ok: true, ts: '...'} on success. The ts
      value is the message timestamp; persist it if you
      may need to delete or edit the message later.
    - Empty text is rejected at API level, not silently.`
}

The second version produces more predictable behavior because the ambiguities an agent would face at runtime are resolved at compile time, in the description itself.
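One way to take "compile time" literally is to keep the five elements as structured fields and render the description text from them, refusing to emit anything when an element is missing. The following is a minimal sketch under that assumption; `ToolContract` and `compile_description` are hypothetical names introduced here, and the rendered layout simply mirrors the worked example above. As a simplification, it also rejects tools that take no arguments.

```python
from dataclasses import dataclass


@dataclass
class ToolContract:
    """Hypothetical container for the five checklist elements."""
    name: str
    does: str                    # Element 01: one sentence, present tense
    use_when: list[str]          # Element 02: explicit triggers
    do_not_use_when: list[str]   # Element 03: explicit anti-triggers
    arguments: dict[str, str]    # Element 04: argument name -> meaning
    failure_modes: list[str]     # Element 05: errors and observable signals


def compile_description(contract: ToolContract) -> str:
    """Render the description text; raise if any element is empty."""
    required = {
        "does": contract.does,
        "use_when": contract.use_when,
        "do_not_use_when": contract.do_not_use_when,
        "arguments": contract.arguments,
        "failure_modes": contract.failure_modes,
    }
    missing = [key for key, value in required.items() if not value]
    if missing:
        raise ValueError(f"{contract.name}: missing elements: {', '.join(missing)}")

    lines = [contract.does, ""]
    lines += [f"USE {trigger}" for trigger in contract.use_when]
    lines.append("")
    lines += [f"DO NOT USE {anti}" for anti in contract.do_not_use_when]
    lines += ["", "ARGUMENT SEMANTICS:"]
    lines += [f"  {arg}: {meaning}" for arg, meaning in contract.arguments.items()]
    lines += ["", "FAILURE MODES:"]
    lines += [f"  - {mode}" for mode in contract.failure_modes]
    return "\n".join(lines)


# Usage: a cut-down version of the send_message contract.
contract = ToolContract(
    name="send_message",
    does="Sends a message to a Slack channel.",
    use_when=["when the user explicitly asks to post or announce something in Slack."],
    do_not_use_when=["for drafting (use draft_message instead)."],
    arguments={"channel": "must be the channel ID (C0...), not the channel name."},
    failure_modes=["Returns {error: 'channel_not_found'} if the channel ID is invalid."],
)
description = compile_description(contract)
```

The design choice is that an incomplete contract fails loudly at build time instead of reaching the agent as a thin description.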

ANTI-PATTERNS

Marketing copy in the description. "Easily send messages." "Powerful API for channel communication." Promotional adjectives carry no behavioral information the agent can act on.
Cross-referencing external docs. "See the Slack API docs for details." The agent cannot follow that reference at runtime. The detail must be in the description.
Aspirational descriptions. "Can handle complex message threading." If the tool cannot actually do that yet, the description invites the agent to fabricate behavior the tool does not have.
Single-line descriptions for multi-arg tools. A tool with five arguments needs description content for each argument's semantics, not a generic one-liner.
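Some of these anti-patterns can be caught by a crude automated check before review. The sketch below is an assumption-laden illustration: the word list, the patterns, and the `lint_description` helper are all hypothetical, and mentioning an argument by name is only a rough proxy for documenting its semantics. Aspirational descriptions cannot be detected from the text alone and still need a human reviewer.

```python
import re

# Hypothetical, deliberately small pattern sets; extend them for real use.
MARKETING_WORDS = re.compile(r"\b(easily|powerful|seamless(ly)?|simply)\b", re.IGNORECASE)
EXTERNAL_REFERENCE = re.compile(r"https?://|\bsee the\b.*\bdocs\b", re.IGNORECASE)


def lint_description(description: str, argument_names: list[str]) -> list[str]:
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []
    if MARKETING_WORDS.search(description):
        findings.append("marketing copy")
    if EXTERNAL_REFERENCE.search(description):
        findings.append("external reference")
    # Flag arguments that the description never mentions by name.
    undocumented = [
        name for name in argument_names
        if not re.search(rf"\b{re.escape(name)}\b", description)
    ]
    if undocumented:
        findings.append("undocumented arguments: " + ", ".join(undocumented))
    return findings
```

For instance, calling `lint_description("Sends a message to a channel", ["channel", "text"])` flags `text` as undocumented, which matches the complaint about one-line descriptions for multi-argument tools.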

COMPILATION DISCIPLINE

Treat the tool definition file the way a compiler treats source code: every change is a behavioral contract change. Review tool descriptions in pull requests with the same rigor as production code. A change to a tool description is a change to the agent's runtime behavior, even if no other code changed.

Versioning tool descriptions independently of tool code is sometimes appropriate. The behavior of the agent might change because the description got more specific, even though the underlying API did not change.
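A lightweight way to enforce this discipline is to fingerprint each description and fail a check when the text changes without a matching version bump. This is a sketch, assuming a hypothetical `description_version` field on the tool definition and a previously recorded fingerprint stored alongside it; neither is part of any standard tool schema.

```python
import hashlib


def description_fingerprint(description: str) -> str:
    """Return a stable SHA-256 hex digest of the description text."""
    return hashlib.sha256(description.encode("utf-8")).hexdigest()


def check_versioned(tool: dict, recorded: dict) -> bool:
    """Pass if the description is unchanged, or if it changed with a new version.

    `tool` is the current definition; `recorded` holds the fingerprint and
    description_version captured at the last reviewed release (both hypothetical).
    """
    same_text = description_fingerprint(tool["description"]) == recorded["fingerprint"]
    same_version = tool["description_version"] == recorded["description_version"]
    # Fail only when the text changed but the version did not.
    return same_text or not same_version
```

Run in continuous integration, a check like this turns a silent description edit into a visible review item even when no tool code changed.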