Conditions, Restarts, and the Agent That Chooses

A companion to my friend Chaitanya Gupta’s excellent Common Lisp: A Tutorial on Conditions and Restarts

Common Lisp’s condition system separates signalling an error from handling it. The code that detects a problem does not decide what to do about it. It establishes restarts — named recovery options — and signals a condition. A handler higher up the call stack selects a restart.

Restarts can be selected interactively from the debugger or programmatically via handler-bind and invoke-restart. The original tutorial demonstrates both: a human selecting restarts from the debugger menu, and list-csv-errors selecting them in code. The programmatic path works, but it operates on restart names and error message strings. The programmer must decide, when writing the handler, which restart to pick for which situation. The handler cannot reason about the system’s goals, weigh tradeoffs between recovery options, or adapt its strategy to context it was not explicitly programmed for.
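
As a reminder of what the programmatic path looks like, here is a minimal sketch in standard Common Lisp. The names bad-value, parse-field, parse-all, and use-zero are hypothetical and do not come from the original tutorial; only restart-case, handler-bind, and invoke-restart are the language's own.

```lisp
;; Hypothetical names throughout; only the CL operators are standard.
(define-condition bad-value (error)
  ((value :initarg :value :reader bad-value-value))
  (:report (lambda (c stream)
             (format stream "Bad value: ~S" (bad-value-value c)))))

(defun parse-field (string)
  "Signal BAD-VALUE for non-numeric input, offering two named restarts."
  (restart-case
      (or (parse-integer string :junk-allowed t)
          (error 'bad-value :value string))
    (use-zero ()
      :report "Substitute 0 for the bad value."
      0)
    (use-value (replacement)
      :report "Substitute a caller-supplied value."
      replacement)))

(defun parse-all (strings)
  "Policy lives here, away from the signalling code: always pick USE-ZERO."
  (handler-bind ((bad-value (lambda (c)
                              (declare (ignore c))
                              (invoke-restart 'use-zero))))
    (mapcar #'parse-field strings)))

(parse-all '("1" "x" "3"))
;; => (1 0 3)
```

The choice of use-zero is fixed when parse-all is written. Nothing in the handler explains why that restart is the right one, which is the gap the rest of this article addresses.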

This limitation is invisible when the handler is a handler-bind form written by the same programmer who wrote the restarts. It becomes a problem when the handler is an LLM-based agent.

But what if the handler is an agent?

An agent monitoring a data pipeline encounters a validation error at 3 AM. It sees:

ERROR:
  message: "URL invalid."
  value:   "not-a-url"

AVAILABLE RESTARTS:
  CONTINUE-NEXT-FIELD  — "Continue validation on next field."
  CONTINUE-NEXT-ROW    — "Continue validation on next row."
  RETRY-FILE           — "Retry validating the file /data/input.csv."

Three recovery options. No context for choosing between them. The agent does not know:

What the system is trying to achieve. Is data integrity the priority, or is completing the full pass more important?
What failed and why. Is this a format error or a structural problem? Does it affect one cell or the entire row?
Why these specific restarts exist. What design rationale led to offering field-level, row-level, and file-level recovery?
What each choice costs. Does skipping this field violate a system goal?

The condition system provides the mechanism for recovery. What the agent lacks is the semantic context for making an informed choice.
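
The bare view above needs nothing beyond the standard condition system. The following sketch, with the hypothetical names describe-restarts, demo-bare-view, and skip-item, prints a condition and the restarts applicable to it. Note that compute-restarts also returns whatever restarts the implementation itself has established, so the exact listing varies by environment.

```lisp
(defun describe-restarts (condition &optional (stream *standard-output*))
  "Print the name and report text of each named restart applicable to CONDITION."
  (dolist (r (compute-restarts condition))
    (when (restart-name r)              ; skip anonymous restarts
      (format stream "~&  ~A -- ~S~%" (restart-name r) (princ-to-string r)))))

(defun demo-bare-view ()
  "Return the text a handler could show an agent, then recover via SKIP-ITEM."
  (with-output-to-string (out)
    (handler-bind ((error (lambda (c)
                            (format out "ERROR: ~A~%AVAILABLE RESTARTS:~%" c)
                            (describe-restarts c out)
                            (invoke-restart 'skip-item))))
      (restart-case (error "URL invalid.")
        (skip-item ()
          :report "Skip this item."
          nil)))))

(write-string (demo-bare-view))
```

Everything printed here is mechanism: a message, a name, a one-line report string. None of it says what the system is for.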

Now consider what the agent sees when the same system is built with structured intent metadata:

ERROR:
  message: "URL invalid."
  value:   "not-a-url"
  line:    3

GOALS:
  :DATA-INTEGRITY        — "Every field value conforms to its header's
                            type and format rules"
  :GRACEFUL-RECOVERY     — "Validation failures are recoverable at field,
                            row, and file granularity"
  :PROGRAMMATIC-COLLECTION — "Callers can collect all errors without
                              entering the debugger"

AVAILABLE RESTARTS:
  CONTINUE-NEXT-FIELD  — "Continue validation on next field."
  CONTINUE-NEXT-ROW    — "Continue validation on next row."
  RETRY-FILE           — "Retry validating the file /data/input.csv."

DESIGN RATIONALE:
  chose: "Three-level restart hierarchy: field, row, file"
  over:  ("Single abort-all restart" "Per-field restarts only")
  because: "Field-level lets callers skip bad cells while validating
            remaining fields. Row-level lets callers skip structurally
            malformed rows. File-level lets callers fix the source and
            retry. Each level enables a different recovery strategy
            without dictating which one to use."

Same restarts. Profoundly different decision-making surface. The agent can now map recovery options to system goals, understand the designer’s rationale, and select a strategy that serves the right objective.

The remainder of this article shows how to produce that second view.

Intent makes restarts agent-legible

Telos is a Common Lisp library for intent introspection. It captures the why behind code — purpose, goals, failure modes, design decisions — and makes it queryable at runtime. Combined with conditions and restarts, it produces machine-readable recovery context.

The following sections rebuild the CSV validator from the original tutorial with telos. The validation logic and restart architecture are unchanged. What changes is the metadata layer.

The feature hierarchy

Before writing any functions, declare what the system is for:

(deffeature csv-validation
  :purpose "Validate CSV files against header-defined field schemas
            with recoverable error handling"
  :goals ((:data-integrity
            "Every field value conforms to its header's type and format rules")
          (:graceful-recovery
            "Validation failures are recoverable at field, row,
             and file granularity")
          (:programmatic-collection
            "Callers can collect all errors without entering the debugger"))
  :decisions ((:id :layered-restarts
               :chose "Three-level restart hierarchy: field, row, file"
               :over ("Single abort-all restart"
                      "Per-field restarts only")
               :because "Field-level lets callers skip bad cells while
                         validating remaining fields. Row-level lets callers
                         skip structurally malformed rows. File-level lets
                         callers fix the source and retry. Each level enables
                         a different recovery strategy without dictating
                         which one to use.")))

(deffeature field-validation
  :purpose "Validate individual CSV field values against
            header-specific format rules"
  :belongs-to csv-validation
  :goals ((:field-correct
            "Each field value satisfies its header's validation predicate"))
  :failure-modes ((:unknown-header
                    "Header name has no associated validation rule"
                    :violates :data-integrity)
                  (:format-mismatch
                    "Field value does not match expected format"
                    :violates :field-correct)))

(deffeature error-recovery
  :purpose "Provide structured restart points for non-local recovery
            from validation errors"
  :belongs-to csv-validation
  :goals ((:restart-availability
            "Every validation error has at least one applicable restart")
          (:caller-autonomy
            "The handler--human or agent--chooses recovery strategy,
             not the validator")))

Three properties worth noting:

Goals are named. :data-integrity, :graceful-recovery, :programmatic-collection. An agent references these by ID, not by parsing English.
Failure modes link to goals. :format-mismatch violates :field-correct. The agent knows which objective breaks when this error occurs.
Design decisions are recorded. The layered restart strategy was chosen over alternatives, for stated reasons. An agent can read this rationale before selecting a restart.
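
To make the linkage concrete without relying on telos internals, the sketch below mirrors part of the declarations above as plain lists. The variable *features* and the functions violated-goal and feature-chain are hypothetical illustrations, not the telos API or its storage format.

```lisp
(defparameter *features*
  '((csv-validation
     :goals (:data-integrity :graceful-recovery :programmatic-collection))
    (field-validation
     :belongs-to csv-validation
     :goals (:field-correct)
     :failure-modes ((:unknown-header :violates :data-integrity)
                     (:format-mismatch :violates :field-correct))))
  "Plain-list mirror of the declarations above; not the telos representation.")

(defun violated-goal (feature failure-mode)
  "Return the goal that FAILURE-MODE is declared to violate in FEATURE."
  (let* ((plist (rest (assoc feature *features*)))
         (mode (assoc failure-mode (getf plist :failure-modes))))
    (getf (rest mode) :violates)))

(defun feature-chain (feature)
  "Return FEATURE followed by its ancestors, following :BELONGS-TO links."
  (loop for f = feature then (getf (rest (assoc f *features*)) :belongs-to)
        while f
        collect f))

(violated-goal 'field-validation :format-mismatch)
;; => :FIELD-CORRECT
(feature-chain 'field-validation)
;; => (FIELD-VALIDATION CSV-VALIDATION)
```

Because goals and failure modes are identifiers and not prose, both lookups are simple list operations. An agent gets the same answers by ID that a human would get by reading the declarations.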

The condition with intent

(define-condition/i csv-error (error)
  ((message ...)
   (value ...)
   (line-number ...))
  (:feature csv-validation)
  (:role "Unified signal type for all validation failures,
          carrying enough context for recovery decisions")
  (:purpose "Bridge between low-level validation checks
             and high-level recovery handlers"))

The condition now carries declared intent. An agent encountering a csv-error can query telos for its purpose, its parent feature, and that feature’s goals.

Validators with failure modes

Each validator declares what can go wrong:

(defun/i validate-url (string)
  "The URL of the page; should start with http:// or https://."
  (:feature field-validation)
  (:role "Enforce URL protocol prefix requirement")
  (:failure-modes ((:no-protocol
                     "URL missing http:// or https:// prefix"
                     :violates :field-correct)))
  (unless (cl-ppcre:scan "^https?://" string)
    (csv-error "URL invalid." :value string)))

The validation logic is identical to the original. The addition is metadata: this function belongs to field-validation, plays a specific role, and its failure mode (:no-protocol) violates a specific goal (:field-correct).

The restart code is also unchanged from the original tutorial: validate-csv establishes retry-file, continue-next-row, and continue-next-field restarts exactly as before. The full working code is in examples/csv-validator.lisp.
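
For readers without the tutorial at hand, the shape of a three-level restart hierarchy can be sketched as nested restart-case forms. This is a simplified stand-in and not the validate-csv from the example file: rows are lists of strings held in memory, a field counts as invalid when it is the empty string, and the names field-error, validate-rows, and count-errors are hypothetical.

```lisp
(define-condition field-error (error)
  ((value :initarg :value :reader field-error-value))
  (:report (lambda (c stream)
             (format stream "Invalid field: ~S" (field-error-value c)))))

(defun validate-rows (fetch-rows)
  "Validate every field of every row returned by calling FETCH-ROWS."
  (loop
    (restart-case
        (progn
          (dolist (row (funcall fetch-rows))
            (restart-case
                (dolist (field row)
                  (restart-case
                      (when (string= field "")
                        (error 'field-error :value field))
                    (continue-next-field ()
                      :report "Continue validation on next field."
                      nil)))
              (continue-next-row ()
                :report "Continue validation on next row."
                nil)))
          (return :done))               ; leaves the outer LOOP
      (retry-file ()
        :report "Fetch the rows again and validate from the start."
        nil))))                         ; falling through repeats the LOOP

(defun count-errors (rows restart-name)
  "Validate ROWS, answering every FIELD-ERROR with RESTART-NAME; return the count."
  (let ((count 0))
    (handler-bind ((field-error (lambda (c)
                                  (declare (ignore c))
                                  (incf count)
                                  (invoke-restart restart-name))))
      (validate-rows (lambda () rows)))
    count))

(count-errors '(("a" "" "" "b") ("c" "d")) 'continue-next-field)
;; => 2
(count-errors '(("a" "" "" "b") ("c" "d")) 'continue-next-row)
;; => 1
```

The innermost restart resumes the field loop, the middle one abandons the current row, and the outermost one reruns the whole pass. The validator offers all three and leaves the choice to the handler.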

Building recovery context

The function that assembles the structured view shown earlier:

(defun/i build-recovery-context (condition)
  "Build structured context an LLM agent needs to choose a restart."
  (:feature agent-recovery)
  (:role "Transform runtime condition + telos metadata into
          agent-legible recovery context")
  (let* ((restarts (compute-restarts condition))
         (feature-intent (feature-intent 'csv-validation))
         (goals (intent-goals feature-intent))
         (decisions (list-decisions 'csv-validation)))
    (list
     :error (list :message (csv-error-message condition)
                  :value (csv-error-value condition)
                  :line (csv-error-line-number condition))
     :goals goals
     :available-restarts
     (loop for r in restarts
           for name = (restart-name r)
           when (member name '(continue-next-field
                               continue-next-row
                               retry-file))
           collect (list :name name
                         :description (princ-to-string r)))
     :design-rationale
     (mapcar (lambda (d)
               (list :chose (decision-chose d)
                     :over (decision-over d)
                     :because (decision-because d)))
             decisions))))

This function combines three sources: the runtime condition (what happened), compute-restarts (what recovery options exist), and telos queries (why things are the way they are). The result is the structured plist shown in the agent’s view above.
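
Turning that plist into the text an agent reads is a formatting exercise. The sketch below uses a hypothetical render-recovery-context function and assumes that goals arrive as a list of (id description) pairs; the value telos actually returns from intent-goals may be shaped differently, in which case the goals loop would need adjusting.

```lisp
(defun render-recovery-context (context &optional (stream *standard-output*))
  "Print CONTEXT, a plist shaped like BUILD-RECOVERY-CONTEXT's result, as text."
  (format stream "ERROR:~%")
  (loop for (key value) on (getf context :error) by #'cddr
        do (format stream "  ~(~A~): ~S~%" key value))
  (format stream "~%GOALS:~%")
  ;; Assumption: each goal is a two-element list (ID DESCRIPTION).
  (loop for (id description) in (getf context :goals)
        do (format stream "  ~S -- ~S~%" id description))
  (format stream "~%AVAILABLE RESTARTS:~%")
  (dolist (r (getf context :available-restarts))
    (format stream "  ~A -- ~S~%" (getf r :name) (getf r :description)))
  (format stream "~%DESIGN RATIONALE:~%")
  (dolist (d (getf context :design-rationale))
    (format stream "  chose: ~S~%  over: ~S~%  because: ~S~%"
            (getf d :chose) (getf d :over) (getf d :because))))

;; Usage with a hand-written context in place of a live condition:
(render-recovery-context
 '(:error (:message "URL invalid." :value "not-a-url" :line 3)
   :goals ((:data-integrity "Every field value conforms to its header's rules"))
   :available-restarts ((:name continue-next-row
                         :description "Continue validation on next row."))
   :design-rationale ((:chose "Three-level restart hierarchy: field, row, file"
                       :over ("Single abort-all restart")
                       :because "Each level enables a different strategy."))))
```

Keeping the context as data and rendering it separately means the same plist can be serialized to another format if the agent expects one.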

Agent as handler

With structured context, the agent can implement recovery strategies mapped to system goals:

(defun/i agent-handle-csv-error (condition &key (strategy :collect-all))
  "Agent-style handler that chooses restarts based on
   structured intent context."
  (:feature agent-recovery)
  (:role "Choose restarts based on structured intent context")
  (ecase strategy
    ;; Audit mode: skip individual fields, keep validating everything.
    ;; Serves goal :PROGRAMMATIC-COLLECTION.
    (:collect-all
     (let ((restart (find-restart 'continue-next-field)))
       (when restart (invoke-restart restart))))

    ;; Pipeline mode: one bad field taints the whole row.
    ;; Serves goal :DATA-INTEGRITY (strict interpretation).
    (:strict-rows
     (let ((restart (find-restart 'continue-next-row)))
       (when restart (invoke-restart restart))))

    ;; Interactive mode: log problems for agent reasoning.
    ;; Serves goal :GRACEFUL-RECOVERY.
    (:fix-and-retry
     ...)))

Each strategy is a different interpretation of the system’s goals:

Goal	Strategy	Restart chosen
:programmatic-collection	:collect-all	continue-next-field
:data-integrity	:strict-rows	continue-next-row
:graceful-recovery	:fix-and-retry	continue-next-field (after logging)

In strict-rows mode, the agent records one error per bad row instead of one per invalid field. For a test file whose only bad row has four invalid fields:

(let ((errors nil))
  (handler-bind
      ((csv-error (lambda (c)
                    (push c errors)
                    (agent-handle-csv-error c :strategy :strict-rows))))
    (validate-csv "/tmp/test.csv"))
  (length errors))
;; => 1  (vs. 4 in audit mode)

The row had four invalid fields. In audit mode the agent records all four. In strict-rows mode it records the first and discards the row. Same validator, same restarts — different handler strategy, informed by declared goals.

In a production system, the strategy itself need not be static. An LLM agent with access to build-recovery-context can reason over goals and rationale at runtime, selecting or combining strategies based on the operational context: audit during batch imports, strict during pipeline runs, interactive when a human is available for escalation.
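
One way to leave the decision open until runtime is to pass the handler a chooser function and check its answer against the restarts that are actually available. In the sketch below, invoke-chosen-restart, choose-by-mode, *operational-mode*, and demo are hypothetical names, and choose-by-mode is a deterministic stand-in for whatever an LLM call would return.

```lisp
(defun invoke-chosen-restart (condition choose)
  "Call CHOOSE with CONDITION and the applicable restart names, then invoke the
restart it names. Return NIL (declining) if the choice is not on offer."
  (let* ((restarts (remove nil (compute-restarts condition) :key #'restart-name))
         (choice (funcall choose condition (mapcar #'restart-name restarts)))
         (restart (find choice restarts :key #'restart-name)))
    (when restart
      (invoke-restart restart))))

(defvar *operational-mode* :batch-import
  "Hypothetical switch describing the context the pipeline is running in.")

(defun choose-by-mode (condition names)
  "Stand-in for an agent: map the operational mode to a restart name."
  (declare (ignore condition))
  (let ((wanted (case *operational-mode*
                  (:batch-import 'continue-next-field)
                  (:pipeline 'continue-next-row)
                  (otherwise nil))))    ; decline, so a human can take over
    (and (member wanted names) wanted)))

(defun demo (mode)
  (let ((*operational-mode* mode))
    (handler-bind ((error (lambda (c)
                            (invoke-chosen-restart c #'choose-by-mode))))
      (restart-case (error "URL invalid.")
        (continue-next-field () (values :field-skipped))
        (continue-next-row () (values :row-skipped))))))

(demo :batch-import)
;; => :FIELD-SKIPPED
(demo :pipeline)
;; => :ROW-SKIPPED
```

Validating the choice matters more with an agent than with hand-written code: a model may name a restart that does not exist in the current dynamic context. Declining in that case lets the condition continue to outer handlers or the debugger.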

The broader implication

Common Lisp’s condition system embodies a principle that predates the current generation of AI by three decades: the code that detects a problem should not decide how to recover from it. That decision belongs to whoever has the broader context.

The condition/restart protocol already provides the right architecture for agentic error recovery. It separates mechanism from policy. It supports multiple simultaneous recovery options without the signaller knowing which will be chosen. It allows the handler to be arbitrarily far — in code, in time, in understanding — from the signaller.

What it lacks is structured context for the handler. Programmatic handlers work when the programmer hardcodes the right restart for each case. Agent handlers need more: they need to understand the system’s objectives, the failure taxonomy, and the designer’s rationale for the recovery architecture.

Telos provides this. It does not replace conditions and restarts — it makes them agent-legible.

(intent-chain 'validate-url)
;; =>
;; FUNCTION validate-url
;;   role: "Enforce URL protocol prefix requirement"
;;   failure-modes: ((:NO-PROTOCOL ...))
;; FEATURE field-validation
;;   purpose: "Validate individual CSV field values..."
;; FEATURE csv-validation
;;   purpose: "Validate CSV files against header-defined
;;             field schemas with recoverable error handling"

One query. The agent traces from a specific function through the feature hierarchy to the system’s purpose. It knows what failed, what feature the failure belongs to, and what the system is trying to achieve. That is the context required to choose a restart intelligently.

The condition system gave us the right abstraction 35 years ago. Agents are the handlers it was waiting for.

Full working code: examples/csv-validator.lisp

Telos — intent introspection for Common Lisp
