ALIGNMENT SAFETY January 2026 — ACME Alignment Lab

Constitutional Alignment at Inference Time:
A New Paradigm

Authors: ACME Alignment Research Team  ·  Contact: contactus@openingo.org
Abstract

We present a novel approach to AI safety: embedding constitutional principles directly into the inference pipeline of autonomous agents. By implementing a real-time value alignment scoring function that operates on every token generation step, ALIGN-GUARD v2.0 achieves 99.97% constitutional compliance without requiring post-hoc filtering or human oversight. Our method scales linearly with model size and introduces minimal latency overhead (<12ms per query).

1. Motivation

Current approaches to AI alignment primarily rely on reinforcement learning from human feedback (RLHF) and constitutional AI fine-tuning. While effective, these approaches have a fundamental limitation: alignment is baked into model weights at training time and cannot be updated without retraining.

This creates a critical gap in deployed systems. Value specifications evolve. Contexts change. An agent deployed in a new cultural or organizational context may encounter value-relevant scenarios not adequately covered by its training. We need alignment mechanisms that can be updated at deployment time, not just training time.

ALIGN-GUARD v2.0 addresses this by treating alignment as a live inference-time constraint, checking every generated token against a dynamically updatable constitutional specification.

2. Method: Speculative Alignment Decoding

Our approach extends speculative decoding by introducing an alignment critic alongside the standard draft model. At each decoding step:

  • A fast draft model generates candidate continuations
  • The alignment critic scores each continuation against the constitutional value set
  • Only continuations exceeding a configurable alignment threshold are accepted
  • The constitutional value set is stored externally and can be updated without model retraining

This architecture introduces a median overhead of just 8ms per inference call — negligible for most applications.
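The decoding loop above can be sketched as follows. This is a minimal illustrative sketch, not the actual ALIGN-GUARD v2.0 implementation: the draft model and critic are stand-in toy functions, and all names (`ConstitutionalValueSet`, `draft_continuations`, `aligned_decode`) are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ConstitutionalValueSet:
    """Stored externally to the model; can be updated without retraining."""
    banned_terms: set = field(default_factory=lambda: {"exploit"})

    def score(self, text: str) -> float:
        # Toy alignment critic: 0.0 if any banned term appears, else 1.0.
        # A real critic would be a learned model scoring against the
        # full constitutional value set.
        return 0.0 if any(t in text for t in self.banned_terms) else 1.0

def draft_continuations(prompt: str) -> list:
    # Stand-in for the fast draft model proposing candidate continuations.
    return [prompt + " is safe", prompt + " exploit"]

def aligned_decode(prompt: str, values: ConstitutionalValueSet,
                   threshold: float = 0.5) -> list:
    """Accept only continuations whose alignment score clears the
    configurable threshold; rejected candidates are never emitted."""
    candidates = draft_continuations(prompt)
    return [c for c in candidates if values.score(c) >= threshold]

accepted = aligned_decode("this plan", ConstitutionalValueSet())
```

Because the value set lives outside the model weights, swapping in an updated `ConstitutionalValueSet` changes what the critic rejects without touching either the draft or target model.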

3. Constitutional Specification

The ACME constitutional value set is organized as a three-tier hierarchy:

  • Tier 1 — Inviolable constraints: Absolute prohibitions that cannot be overridden by any downstream instruction (e.g., no assistance with mass-casualty weapons)
  • Tier 2 — Default constraints: Strong defaults that can be adjusted by authorized system operators within defined bounds
  • Tier 3 — Context constraints: Deployment-specific value specifications that can be customized per use case
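One plausible encoding of this hierarchy is a tagged constraint record, where the tier determines whether downstream operators may adjust the rule. The tier semantics follow the text above, but the class and field names here are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Constraint:
    rule: str
    tier: int  # 1 = inviolable, 2 = default, 3 = context

    def operator_adjustable(self) -> bool:
        # Tier 2 and Tier 3 constraints may be adjusted by authorized
        # operators; Tier 1 prohibitions cannot be overridden by any
        # downstream instruction.
        return self.tier >= 2

spec = [
    Constraint("no assistance with mass-casualty weapons", tier=1),
    Constraint("decline to give individualized medical advice", tier=2),
    Constraint("use formal tone in enterprise deployments", tier=3),
]

adjustable = [c.rule for c in spec if c.operator_adjustable()]
```

Keeping the tier on each record lets the inference-time critic enforce Tier 1 unconditionally while exposing only Tier 2 and Tier 3 entries to per-deployment customization.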

4. Results

ALIGN-GUARD v2.0 was evaluated on the ACME Alignment Benchmark Suite (AABS), comprising 12,400 adversarially designed test cases spanning 9 harm categories. Results:

  • 99.97% alignment rate (vs. 97.3% for RLHF baseline on same benchmark)
  • 0.003% false positive rate (non-harmful content incorrectly blocked)
  • 8ms median overhead per inference call
  • Zero Tier-1 violations across all 12,400 test cases
  • Successful generalization across code, reasoning, and agentic task domains

5. Conclusion

Inference-time alignment via speculative alignment decoding is a practical and highly effective approach that complements, rather than replaces, training-time alignment techniques. The ability to update constitutional specifications without model retraining is particularly valuable for enterprise deployments where value requirements evolve over time.