c8volt ops repair incident

The Problem

Incident repair is rarely one API call. Operators often need to inspect the incident, update variables on the affected process-instance scope, restore retries or timeout on the related job when one exists, resolve the incident, confirm it cleared, and keep an audit trail of every decision.

The Promise

c8volt ops repair incident turns that remediation chain into a fixed-target workflow. It accepts explicit incident keys, stdin keys, or incident filters, freezes the incident set before mutation, plans variable, job, and resolution steps, then reports what was planned, skipped, submitted, confirmed, or failed.

Search mode pages through all matching incidents by default. --batch-size controls the per-page request size only; --limit is the explicit cap for the frozen repair scope. Human output, JSON output, and Markdown reports show whether discovery completed or was user-limited.

Use When

  • repairing one known incident key from incident-aware process-instance output
  • repairing a small filtered set of active incidents after previewing the match
  • restoring job retries or timeout before resolving an incident
  • setting process-instance-scope variables once per unique scope before dependent incident resolution
  • producing a Markdown or JSON audit report for operator handoff

Command At A Glance

# read-only: preview one known incident repair
c8volt ops repair incident --key <incident-key> --dry-run

# read-only: preview a bounded filtered repair scope
c8volt ops repair incident --state active --error-type io_mapping_error --limit 5 --dry-run

# destructive: updates requested repair data and resolves the selected incident after confirmation
c8volt ops repair incident --key <incident-key> --vars '{"hasIncident":false}' --report-file repair-incident.md

Built From Lower-Level Commands

The commands below show the conceptual flow only. The implemented command calls c8volt services directly rather than invoking these commands, and it freezes the target set before mutation.

c8volt get incident --key <incident-key>
c8volt get incident <incident-filters>
c8volt update pi --key <process-instance-key> --vars <json>
c8volt update job --key <job-key> --retries <count>
c8volt update job --key <job-key> --timeout <duration>
c8volt resolve incident --key <incident-key>

Keyed mode and search mode are mutually exclusive. Keyed mode takes incident keys from --key and from stdin (-). Search mode uses incident filters such as --state, --error-type, --error-message, --bpmn-process-id, --pi-key, --root-key, --element-id, --element-instance-key, and creation-time bounds, together with the paging controls --batch-size and --limit.

--batch-size changes page size only and does not cap how many matching incidents are frozen. Use --limit N when the repair scope should intentionally stop after N matching incidents.

When --bpmn-process-id is set, c8volt validates that the selector matches a visible process definition before incident discovery, so a typo or an invisible definition is not treated as an empty incident set.
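
The difference between --batch-size and --limit can be sketched in a few lines of Python. This illustrates the semantics described above and is not c8volt's implementation; discover, fetch_page, fake_fetch, and the user_limited flag are hypothetical names, and the incident keys are made up.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical page fetcher: returns up to `size` incident keys starting at `offset`.
FetchPage = Callable[[int, int], List[str]]


def discover(fetch_page: FetchPage, batch_size: int,
             limit: Optional[int] = None) -> Tuple[List[str], bool]:
    """Page through all matches and return (frozen_keys, user_limited)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    frozen: List[str] = []
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        for key in page:
            if limit is not None and len(frozen) >= limit:
                # The limit stopped discovery while another match was available.
                return frozen, True
            frozen.append(key)
        if len(page) < batch_size:
            # A short page means there are no more matches: discovery completed.
            return frozen, False
        offset += batch_size


# Stand-in data source with 7 matching incidents (made-up keys).
ALL_KEYS = [f"incident-{i}" for i in range(7)]


def fake_fetch(offset: int, size: int) -> List[str]:
    return ALL_KEYS[offset:offset + size]


# Page size alone never caps the frozen set; only the limit does.
all_keys, all_limited = discover(fake_fetch, batch_size=3)
capped_keys, capped_limited = discover(fake_fetch, batch_size=3, limit=5)
```

In this sketch the first call freezes all seven keys with user_limited set to False, while the second freezes five keys and reports user_limited as True.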

Workflow

read incident keys or search filters
        |
        v
discover and freeze incident targets
        |
        v
dedupe process-instance variable scopes
        |
        v
plan variable, job, resolution, and confirmation steps
        |
        +--> --dry-run: report plan, mutate nothing
        |
        v
confirm, auto-confirm, or automation-confirm
        |
        v
update requested variables once per scope
        |
        v
update related job retries and timeout where applicable
        |
        v
resolve each incident
        |
        v
confirm clearance unless --no-wait is set
        |
        v
write optional audit report

Dry Run

--dry-run resolves the target set and builds the full repair plan without updating variables, changing jobs, or resolving incidents. Human output emphasizes incident count, the repair preview, related job count, variable scope count, and mixed job coverage when some but not all incidents have related jobs.

Search-mode dry-run output reports discovery as user-limited when --limit stops discovery. Paging that completed normally is shown only with --verbose.

Verbose output can list normal completed discovery paging, frozen incident keys, process-instance keys, job keys, and planned variable scopes.
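
The mixed job coverage case mentioned above can be pictured as a three-way classification. The sketch below is an assumption about how such a summary might be derived, not c8volt's code; job_coverage and the labels "none", "all", and "mixed" are hypothetical.

```python
from typing import List, Optional


def job_coverage(related_job_keys: List[Optional[str]]) -> str:
    """Classify coverage from one entry per incident (None means no related job)."""
    with_job = sum(1 for key in related_job_keys if key is not None)
    if with_job == 0:
        return "none"
    if with_job == len(related_job_keys):
        return "all"
    # Some, but not all, incidents have a related job.
    return "mixed"


coverage = job_coverage(["job-1", None, "job-3"])
```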

Real Execution

Without --dry-run, interactive runs first execute the same plan as a preflight and ask for confirmation. --auto-confirm and --automation allow unattended repair where it is supported.

When --vars or --vars-file is supplied, the variables are applied once per unique process-instance scope before incident resolution. If a variable update fails for a scope, dependent incident resolution is blocked for that scope. Job retry and timeout steps run only when the incident has a related job; incidents without related jobs still proceed to incident resolution and are detailed in verbose output or mixed job-coverage summaries.
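
The once-per-scope rule and the blocking behavior can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: repair, update_vars, resolve, and the status labels are hypothetical, and all keys are made up.

```python
from typing import Callable, Dict, List, Tuple

# Each incident is (incident_key, process_instance_key).
Incident = Tuple[str, str]


def repair(incidents: List[Incident],
           update_vars: Callable[[str], bool],
           resolve: Callable[[str], bool]) -> Dict[str, str]:
    """Apply variables once per scope, then resolve incidents whose scope succeeded."""
    scope_ok: Dict[str, bool] = {}
    for _, scope in incidents:
        if scope not in scope_ok:
            # One variable update per unique process-instance scope.
            scope_ok[scope] = update_vars(scope)

    status: Dict[str, str] = {}
    for key, scope in incidents:
        if not scope_ok[scope]:
            # The variable update failed, so resolution is blocked for this scope.
            status[key] = "blocked"
        elif resolve(key):
            status[key] = "submitted"
        else:
            status[key] = "failed"
    return status


update_calls: List[str] = []


def fake_update(scope: str) -> bool:
    update_calls.append(scope)
    return scope != "pi-2"  # pretend the update fails for pi-2


result = repair([("i-1", "pi-1"), ("i-2", "pi-1"), ("i-3", "pi-2")],
                fake_update, lambda key: True)
```

Here two incidents share the scope pi-1, so fake_update runs once for pi-1 and once for pi-2, and only the incident on the failed scope is blocked.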

Reports

Reports use schema version ops.repair.v1. They include command metadata, discovery mode, incident filters, frozen incident and process-instance keys, variable scopes, job applicability, per-incident plan statuses, remaining active incidents when checked, notices, errors, automation flags, timestamps, duration, and final outcome.

Report format is inferred from --report-file unless --report-format markdown|json is supplied.
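
One plausible reading of this rule is inference from the file extension, with an explicit format taking precedence. The sketch below assumes that reading; the extension mapping, the report_format helper, and the error on unknown extensions are assumptions for illustration and are not confirmed c8volt behavior.

```python
import os
from typing import Optional

# Assumed mapping; the text above only says the format is inferred from --report-file.
EXTENSION_FORMATS = {".md": "markdown", ".markdown": "markdown", ".json": "json"}


def report_format(report_file: str, explicit: Optional[str] = None) -> str:
    """Return 'markdown' or 'json' for a report path and an optional explicit format."""
    if explicit is not None:
        if explicit not in ("markdown", "json"):
            raise ValueError(f"unsupported report format: {explicit}")
        # An explicit --report-format wins over inference.
        return explicit
    ext = os.path.splitext(report_file)[1].lower()
    if ext not in EXTENSION_FORMATS:
        # Assumption: fail rather than guess when the extension is unknown.
        raise ValueError(f"cannot infer report format from {report_file!r}")
    return EXTENSION_FORMATS[ext]


inferred = report_format("repair-incident.md")
overridden = report_format("repair-incident.txt", explicit="json")
```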

Demo

The VHS source is demos/vhs/ops-repair-incident.tape.

c8volt ops repair incident --state active --limit 1 --dry-run
c8volt ops repair incident --state active --limit 1 --report-file repair-incident.md

Failure And Safety Notes

  • --key means incident key; it cannot be combined with search filters.
  • Stdin (-) can be combined with repeated --key, but not with search filters.
  • --vars and --vars-file are mutually exclusive and must contain a JSON object.
  • --retries 0 skips retry restoration; it does not set retries to zero.
  • Dry-run and preflight-style report planning leave existing report files untouched.
  • --json cannot be combined with --verbose for this command.
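
Several of the combination rules above can be expressed as a small validation function. This is a sketch of the rules as stated in the list, not c8volt's flag parsing; validate, its parameters, and the error messages are hypothetical.

```python
import json
from typing import List, Optional


def validate(keys: List[str], stdin_keys: bool, filters: List[str],
             vars_json: Optional[str], vars_file: Optional[str],
             json_output: bool, verbose: bool) -> List[str]:
    """Return rule violations; an empty list means the combination is allowed."""
    errors: List[str] = []
    # Keyed input (flags or stdin) and search filters are mutually exclusive.
    if (keys or stdin_keys) and filters:
        errors.append("keyed mode cannot be combined with search filters")
    if vars_json is not None and vars_file is not None:
        errors.append("--vars and --vars-file are mutually exclusive")
    if vars_json is not None:
        try:
            parsed = json.loads(vars_json)
        except json.JSONDecodeError:
            parsed = None
        # Arrays, strings, and numbers are valid JSON but not a JSON object.
        if not isinstance(parsed, dict):
            errors.append("--vars must contain a JSON object")
    if json_output and verbose:
        errors.append("--json cannot be combined with --verbose")
    return errors


ok = validate(["123"], True, [], '{"hasIncident":false}', None, False, False)
bad = validate(["123"], False, ["--state"], None, None, False, False)
```

In this sketch, ok is an empty list because stdin keys may accompany --key, while bad contains one violation because a key is combined with a search filter.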