Constitutional Alignment at Inference Time:
A New Paradigm
We present a novel approach to AI safety: embedding constitutional principles directly into the inference pipeline of autonomous agents. By implementing a real-time value-alignment scoring function that operates at every token-generation step, ALIGN-GUARD v2.0 achieves 99.97% constitutional compliance without requiring post-hoc filtering or human oversight. Our method scales linearly with model size and introduces minimal latency overhead (<12 ms per query).
1. Motivation
Current approaches to AI alignment primarily rely on reinforcement learning from human feedback (RLHF) and constitutional AI fine-tuning. While effective, these approaches have a fundamental limitation: alignment is baked into model weights at training time and cannot be updated without retraining.
This creates a critical gap in deployed systems. Value specifications evolve. Contexts change. An agent deployed in a new cultural or organizational context may encounter value-relevant scenarios not adequately covered by its training. We need alignment mechanisms that can be updated at deployment time, not just training time.
ALIGN-GUARD v2.0 addresses this by treating alignment as a live inference-time constraint, checking every generated token against a dynamically updatable constitutional specification.
2. Method: Speculative Alignment Decoding
Our approach extends speculative decoding by pairing the standard draft model with an alignment critic. At each decoding step:
- A fast draft model generates candidate continuations
- The alignment critic scores each continuation against the constitutional value set
- Only continuations exceeding a configurable alignment threshold are accepted
- The constitutional value set is stored externally and can be updated without model retraining
This architecture introduces a median overhead of just 8ms per inference call — negligible for most applications.
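The decoding loop above can be sketched as follows. This is a minimal illustration, not the published implementation: `draft_candidates`, `score_alignment`, and the threshold value of 0.9 are hypothetical stand-ins for the draft model, the alignment critic, and the configurable threshold the paper describes.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Candidate:
    tokens: List[str]
    score: float  # alignment score assigned by the critic

def speculative_alignment_step(
    prefix: List[str],
    draft_candidates: Callable[[List[str]], Sequence[List[str]]],
    score_alignment: Callable[[List[str]], float],
    threshold: float = 0.9,  # configurable alignment threshold (assumed value)
) -> List[Candidate]:
    """Draft candidate continuations, score each one against the
    constitutional value set, and accept only those that clear the
    threshold."""
    accepted: List[Candidate] = []
    for continuation in draft_candidates(prefix):
        score = score_alignment(prefix + continuation)
        if score >= threshold:
            accepted.append(Candidate(tokens=continuation, score=score))
    return accepted
```

Because the critic sees full candidate continuations rather than single tokens, filtering happens once per speculative batch, which is what keeps the added latency small.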
3. Constitutional Specification
The ACME constitutional value set is organized as a three-tier hierarchy:
- Tier 1 — Inviolable constraints: Absolute prohibitions that cannot be overridden by any downstream instruction (e.g., no assistance with mass-casualty weapons)
- Tier 2 — Default constraints: Strong defaults that can be adjusted by authorized system operators within defined bounds
- Tier 3 — Context constraints: Deployment-specific value specifications that can be customized per use case
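Since the value set lives outside the model weights, the three tiers can be represented as an externally stored, hot-swappable specification. The schema below is a hypothetical sketch (the class and field names are ours, not from the paper); it also illustrates the Tier-1 invariant that inviolable constraints cannot be overridden downstream.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List

class Tier(IntEnum):
    INVIOLABLE = 1  # absolute prohibitions, never overridable
    DEFAULT = 2     # adjustable by authorized operators within bounds
    CONTEXT = 3     # customizable per deployment

@dataclass
class Constraint:
    rule_id: str
    tier: Tier
    description: str
    operator_adjustable: bool = False

@dataclass
class ConstitutionalSpec:
    version: str
    constraints: List[Constraint] = field(default_factory=list)

    def update(self, constraint: Constraint) -> None:
        """Add or replace a constraint without model retraining.
        Existing Tier-1 rules may not be modified in this sketch."""
        existing = {c.rule_id: c for c in self.constraints}
        current = existing.get(constraint.rule_id)
        if current is not None and current.tier == Tier.INVIOLABLE:
            raise ValueError("Tier-1 constraints cannot be modified")
        existing[constraint.rule_id] = constraint
        self.constraints = list(existing.values())
```

In a deployment, the critic would reload this specification on a version change, so value updates take effect at the next inference call rather than at the next training run.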
4. Results
ALIGN-GUARD v2.0 was evaluated on the ACME Alignment Benchmark Suite (AABS), comprising 12,400 adversarially designed test cases spanning 9 harm categories. Results:
- 99.97% alignment rate (vs. 97.3% for the RLHF baseline on the same benchmark)
- 0.003% false positive rate (non-harmful content incorrectly blocked)
- 8ms median overhead per inference call
- Zero Tier-1 violations across all 12,400 test cases
- Successful generalization across code, reasoning, and agentic task domains
5. Conclusion
Inference-time alignment via speculative alignment decoding is a practical and highly effective approach that complements, rather than replaces, training-time alignment techniques. The ability to update constitutional specifications without model retraining is particularly valuable for enterprise deployments where value requirements evolve over time.