dev/brief — quantum frontline QF
DEEP DIVE openQSE HPC SOFTWARE STACK
> tldr A consortium surveyed 9 quantum-HPC software stacks and published a reference architecture called openQSE that defines shared interface boundaries without replacing existing SDKs.
> significance Every HPC site integrating a QPU today rebuilds the same scheduling, resource description, and runtime plumbing from scratch; openQSE's layer model is the first cross-vendor attempt to standardize those boundaries.
> dev relevance OpenQASM 3, Qiskit, CUDA-Q, and Slurm are the usable common surfaces right now, but the broader openQSE interface contracts are still conceptual with no implementation timeline confirmed.
> _

If you have tried to wire a quantum processor into an HPC workflow, you have probably rebuilt the same plumbing more than once. Job submission logic that works with IBM's Qiskit runtime fails against an IonQ endpoint. A Slurm plugin tuned for one provider needs a complete rewrite for the next. Data paths, auth models, and compilation stages all get reimplemented per vendor.

That fragmentation is the subject of a new preprint from a broad consortium of national labs, hardware vendors, and academic groups, including Oak Ridge National Laboratory, Lawrence Berkeley National Laboratory, Munich Quantum Software Company, Technical University of Munich, Quantinuum, IonQ, Qblox, AMD, AWS, and more. The paper, "Quantum-HPC Software Stacks and the openQSE Reference Architecture", surveys nine production quantum-HPC stacks and proposes a common architectural blueprint called openQSE (open Quantum-HPC Software Ecosystem). The goal is to create a shared set of interface boundaries that let existing stacks interoperate without requiring any vendor to rebuild their internals.

The Scheduling Assumptions That Break When You Add a QPU

QPUs do not scale like classical HPC resources.

Classical HPC adds capacity incrementally: cores, nodes, and GPUs can be fractionally allocated and preempted. QPUs cannot. Qubit counts grow through hardware generations, not by stacking more units. Quantum information cannot be copied, which means QPU jobs cannot be checkpointed or preempted mid-run. A QPU is allocated whole or not at all.

This asymmetry means that bolting a QPU onto an HPC cluster using the same resource management logic as a GPU partition will produce idle HPC nodes, stalled workflows, and unpredictable utilization.

Four Axes for Evaluating Any QHPC Stack

The authors assess all nine stacks along four dimensions. These are worth internalizing because they map directly onto deployment decisions teams face today:

  1. Deployment Environment
    Cloud, on-premises, or federated (cross-site). Each imposes different latency budgets, ownership boundaries, and scheduler semantics. A stack that assumes cloud-managed containers will need significant rework to run in a tightly-coupled HPC environment, and vice versa.

  2. Hardware Modality
    Superconducting qubits (IBM, Quantum Brilliance), trapped ions (IonQ, Quantinuum), neutral atoms (Pasqal), and photonic systems (Xanadu) each have fundamentally different timing, control, and error profiles. A compiler pass that works on a superconducting device will not port cleanly to a neutral-atom system. The paper argues that abstraction layers must isolate modality-specific details from application code, which is something most current stacks handle internally and inconsistently.

  3. Application Interaction Pattern
    The survey identifies two meta-patterns for how applications call into quantum resources:

    • Workflow-centric: Classical and quantum stages are separate, coordinated by a scheduler or workflow engine. Data and control exchange only at stage boundaries. This model maps well to remote or cloud QPU access and is the dominant pattern today.

    • Accelerator-centric: A classical HPC program dispatches quantum kernels on demand, analogous to GPU offload. This requires co-allocation, tight coupling, low latency, and often mid-circuit feedback. It is more demanding but enables closed-loop hybrid execution. (A minimal sketch contrasting the two patterns follows this list.)

  4. Quantum Computing Era
    Current workloads tolerate loose timing. A variational algorithm running a VQE loop is performance-sensitive but not microsecond-critical. Eventual fault-tolerant quantum computing (FTQC) will be different: syndrome readout and error correction must happen within the qubit coherence time, introducing hard real-time constraints at the software layer. The paper separates these into distinct timing domains (physical, deterministic, real-time, application) and designs the architecture so that FTQC requirements do not destabilize existing application interfaces.
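To make axis 3 concrete, here is a minimal Python sketch of both interaction patterns. Every function in it is an invented placeholder, not part of any surveyed SDK; the point is the difference in control-flow shape, not the API names.

```python
# Illustrative sketch of the two interaction patterns (axis 3).
# submit_stage, poll_result, and dispatch_kernel are hypothetical
# placeholders, not functions from any surveyed SDK.
import time


def submit_stage(circuit: str) -> str:
    """Stand-in for a workflow engine submitting a quantum stage."""
    return "job-42"  # a queued job handle


def poll_result(job_id: str) -> dict:
    """Stand-in for polling a remote QPU endpoint for results."""
    return {"counts": {"00": 510, "11": 490}}


def workflow_centric(circuit: str) -> dict:
    # Classical and quantum stages are decoupled: submit, wait,
    # and exchange data only at the stage boundary.
    job_id = submit_stage(circuit)
    time.sleep(0.1)  # in practice: minutes to hours in a queue
    return poll_result(job_id)


def dispatch_kernel(circuit: str, params: list[float]) -> dict:
    """Stand-in for a co-allocated, low-latency kernel launch."""
    return {"energy": -1.0 - 0.1 * sum(params)}


def accelerator_centric(initial_params: list[float]) -> list[float]:
    # GPU-offload analogy: the classical program keeps control and
    # dispatches quantum kernels inside its optimization loop, using
    # each result to pick the next parameters (closed-loop execution).
    params = initial_params
    for _ in range(5):
        result = dispatch_kernel("vqe_ansatz", params)
        params = [p - 0.01 * result["energy"] for p in params]
    return params


print(workflow_centric("bell_pair"))
print(accelerator_centric([0.1, 0.2]))
```

The structural difference is where control lives: the workflow engine owns it in the first pattern, the classical application in the second. That is also why the second pattern needs co-allocation and a low-latency path.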

If you are writing hybrid workflows today, axes 1 and 3 are immediately actionable. Knowing whether your QPU access is workflow-centric or accelerator-centric will tell you which parts of openQSE matter to you now. Axes 2 and 4 are forward-looking, but worth tracking as hardware diversity and FTQC timelines evolve.

The Nine Stacks

The survey covers AWS Braket, IBM Quantum, IonQ, JHPC-Quantum (RIKEN/Softbank/Tokyo), the Munich Quantum Software Stack (MQSS), Pasqal, Quantinuum, Quantum Brilliance, and Xanadu. The table below summarizes key integration characteristics:

Table: Key interface characteristics across nine production QHPC stacks (April 2026). OpenQASM 3 is the only interface with universal adoption. QDMI and QRMI remain early-stage but are gaining traction.

OpenQASM 3, Qiskit, CUDA-Q, and Slurm Are the Common Ground

OpenQASM 3 is the closest thing to a universal IR across the surveyed stacks; all nine support it. Qiskit and CUDA-Q also appear across almost every stack, reflecting the de facto role both have taken as portable quantum programming frontends.
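As a quick illustration of that portability, the snippet below builds a two-qubit circuit in Qiskit and emits OpenQASM 3 text via Qiskit's standard qasm3 module. The only assumption is a reasonably recent Qiskit installation.

```python
# Export a circuit to OpenQASM 3, the one IR every surveyed stack accepts.
# Assumes only a recent Qiskit install.
from qiskit import QuantumCircuit, qasm3

qc = QuantumCircuit(2, 2)
qc.h(0)                      # Hadamard on qubit 0
qc.cx(0, 1)                  # entangle: CNOT with qubit 0 as control
qc.measure([0, 1], [0, 1])

# qasm3.dumps returns portable OpenQASM 3 source as a string, which can
# then be handed to any backend-specific submission path.
print(qasm3.dumps(qc))
```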

Slurm integration (via SPANK plugins, GRES, or Lua-based license plugins) appears in seven of nine stacks, making it the dominant HPC scheduler pathway. If you are running quantum jobs from an HPC environment today, Slurm compatibility is the most reliable common surface.
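What that looks like in practice varies by site. The sketch below assumes a site that exposes its QPU as a Slurm GRES named qpu; the GRES label and the job script contents are hypothetical, while sbatch and --gres themselves are standard Slurm.

```python
# A minimal submission sketch, assuming the site defines a GRES named
# "qpu" in its slurm.conf (other sites use Lua license plugins instead).
import subprocess
import tempfile

# A scheduler-agnostic view of what the job needs...
job_spec = {
    "nodes": 1,
    "time": "00:10:00",
    "quantum_resource": "qpu:1",   # hypothetical, site-defined GRES label
    "script": "python hybrid_vqe.py",
}

# ...rendered down to one concrete resource manager (Slurm).
batch = (
    "#!/bin/bash\n"
    f"#SBATCH --nodes={job_spec['nodes']}\n"
    f"#SBATCH --time={job_spec['time']}\n"
    f"#SBATCH --gres={job_spec['quantum_resource']}\n"
    f"{job_spec['script']}\n"
)

with tempfile.NamedTemporaryFile("w", suffix=".sbatch", delete=False) as f:
    f.write(batch)
    path = f.name

# sbatch and --gres are standard Slurm; whether a "qpu" GRES exists
# depends entirely on the site's configuration.
subprocess.run(["sbatch", path], check=True)
```

Separating the abstract job_spec from the rendered batch script is exactly the kind of decoupling openQSE's job submission abstraction (described below) aims to standardize.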

QDMI and QRMI Are the Traction Points; Everything Else Is Still Fragmented

Runtime boundaries, resource description formats, and interconnect semantics vary significantly. A job specification that works in one stack's QRMI implementation may not translate to another without manual adaptation. Telemetry and observability exist in every stack, but what gets exposed, in what format, and to which layers differs from stack to stack.

QDMI (Quantum Device Management Interface) and QRMI (Quantum Resource Management Interface) are the two cross-stack interface efforts that have gained the most traction. QDMI provides a C-level standard for device interaction without embedding vendor APIs in upper layers. QRMI sits one level up and handles resource allocation, job submission, and lifecycle management within HPC resource managers like Slurm. As of April 2026, three of the nine stacks (IBM, IonQ, and MQSS) have adopted QDMI, QRMI, or both.
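The division of labor is easiest to see as a layering sketch. The real QDMI is a C-level specification and QRMI integrates with resource managers, so the Python class names below are invented for illustration only; what is faithful to the paper is the split of responsibilities.

```python
# Shape sketch of the QDMI/QRMI split, with invented names throughout:
# the real QDMI is a C-level spec and QRMI plugs into resource managers;
# neither exposes this Python API.
from dataclasses import dataclass


@dataclass
class DeviceProperties:
    qubit_count: int
    native_gates: list[str]
    status: str  # e.g. "online", "calibrating"


class DeviceLayer:
    """QDMI-like role: answer device questions, hide the vendor API."""

    def query(self) -> DeviceProperties:
        return DeviceProperties(36, ["rz", "rxx"], "online")


class ResourceLayer:
    """QRMI-like role: allocation, submission, lifecycle -- one level up."""

    def __init__(self, device: DeviceLayer):
        self.device = device

    def acquire(self) -> bool:
        # A QPU is allocated whole or not at all (no fractional share).
        return self.device.query().status == "online"

    def submit(self, qasm: str) -> str:
        if not self.acquire():
            raise RuntimeError("device unavailable (e.g. calibrating)")
        return "job-7"  # lifecycle handle the scheduler can track


rm = ResourceLayer(DeviceLayer())
print(rm.submit("OPENQASM 3.0; ..."))
```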

The openQSE Architecture

The architecture is organized into five logical elements: an HPC control node, HPC/classical compute nodes, quantum access nodes, quantum resources, and shared network/data infrastructure.

  • Job Submission Abstraction
    The control node sits at the top and exposes a standardized, scheduler-agnostic resource description. Today, Slurm and Flux each have different mechanisms for describing quantum resources; the same job specification cannot be submitted across both without modification. openQSE defines a consistent resource description layer that decouples the job specification from the underlying resource manager. This is the layer most relevant for HPC-integrated workflows today.

  • Quantum Runtime Interface (QRI)
    QRI is the application-facing runtime boundary that hybrid application code calls. The paper draws an explicit analogy to AMD's HIP for heterogeneous GPU environments: applications continue using their preferred SDK, but they interact with a common execution contract rather than a different backend runtime for each target. QRI handles resource-aware submission, lifecycle management, and multi-resource launch. It is available on both HPC compute nodes and quantum access nodes.

  • QHPC-Interconnect API
    This is the bidirectional boundary between compute elements. Most current stacks treat the classical-to-quantum communication path as one-way: submit a job, poll for results. The QHPC-Interconnect API explicitly supports upstream signaling. Quantum-side services can trigger callbacks, adaptation requests, and dynamic resource requests back to the classical tier. This matters most in the accelerator-centric pattern, where mid-circuit feedback and adaptive compilation require a two-way channel.

  • Compiler Tool Pipeline
    The paper defines a governed pipeline model for combining compilation passes in ordered stages. Passes can run at different points in the stack — algorithm-level transformations on HPC nodes, backend-specific lowering near the quantum resource. The architecture does not mandate a single compilation location; it standardizes how stages are described and connected. This is directly relevant to anyone managing compiler toolchains across multiple QPU targets. (A minimal pipeline sketch appears at the end of this section.)

  • Control Electronics as an Architectural Concern
    Control electronics, the layer that translates digital commands into physical qubit operations, are treated as an architectural concern rather than a vendor-internal detail. The IonQ Forte system is used as a concrete example: calibration cycles consume approximately 47% of total operational time, leaving 53% for user circuits. Single-qubit gates run at 110-130 microseconds, two-qubit gates at 900-950 microseconds, and ion-chain cooling at around 3 ms per shot.

The point is that this information needs to surface to the scheduler. openQSE calls for a capability-discovery interface so upper layers can query supported timing windows, branching primitives, and feedback latencies, rather than assuming a uniform execution model across platforms.
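Since openQSE describes the capability-discovery interface without yet specifying it, the following is a hypothetical sketch with invented names; the numbers, though, are the Forte figures quoted above.

```python
# Hypothetical capability-discovery sketch (openQSE describes this
# interface but has not specified it; names here are invented). The
# values are the IonQ Forte figures quoted in the survey.
from dataclasses import dataclass


@dataclass
class DeviceCapabilities:
    single_qubit_gate_us: tuple[float, float]   # (min, max) duration
    two_qubit_gate_us: tuple[float, float]
    cooling_ms_per_shot: float
    calibration_duty_fraction: float            # share of wall-clock time
    supports_midcircuit_feedback: bool


def discover() -> DeviceCapabilities:
    # A real implementation would query the control electronics layer;
    # here we return the Forte figures from the paper as static data.
    return DeviceCapabilities(
        single_qubit_gate_us=(110.0, 130.0),
        two_qubit_gate_us=(900.0, 950.0),
        cooling_ms_per_shot=3.0,
        calibration_duty_fraction=0.47,
        supports_midcircuit_feedback=True,
    )


caps = discover()
# A scheduler can now budget realistically instead of assuming a uniform
# execution model: e.g. effective user time per wall-clock hour.
user_minutes_per_hour = 60 * (1 - caps.calibration_duty_fraction)
print(f"{user_minutes_per_hour:.0f} user-minutes per wall-clock hour")
```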

The Logical Control Layer (LCL) and Fault-Tolerant Execution Engine (FTEE) in the architecture are explicitly scoped to the future FTQC era. The LCL operates in the real-time domain and exposes parameter streaming, JIT compilation hooks, and hierarchical error correction interfaces. The FTEE handles bounded-latency syndrome decoding. Neither is required for current deployments. The architecture is designed so these layers are additive, not disruptive to current workflows.
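Returning to the compiler tool pipeline element, here is the promised minimal sketch of a governed pass pipeline. Everything in it is invented for illustration; the faithful part is the model: passes are described, ordered, and bound to a location in the stack rather than to a single compilation site.

```python
# Minimal sketch of a governed pass pipeline, assuming nothing beyond
# the paper's description. All names and transformations are invented.
from typing import Callable

# A pass is a circuit-to-circuit transformation tagged with where it is
# allowed to run ("hpc_node" vs "quantum_access_node").
Pass = tuple[str, str, Callable[[str], str]]

pipeline: list[Pass] = [
    ("inline_subroutines", "hpc_node", lambda c: c.replace("call", "body")),
    ("map_to_native_gates", "quantum_access_node", lambda c: c + " | lowered"),
]


def run(circuit: str, location: str) -> str:
    # Execute, in declared order, only the passes governed to this
    # location; remaining stages run closer to the quantum resource.
    for name, where, transform in pipeline:
        if where == location:
            circuit = transform(circuit)
    return circuit


partial = run("call bell()", "hpc_node")        # algorithm-level stage
final = run(partial, "quantum_access_node")     # backend lowering stage
print(final)
```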

Status Check: What You Can Actually Use Today

The paper is explicit about its limitations, which is helpful for calibrating how much weight to put on different pieces.

Available and in production:

  • Slurm integration via SPANK or Lua plugins — seven of nine stacks support this today

  • OpenQASM 3 as a portable circuit representation — universal across surveyed stacks

  • Qiskit and CUDA-Q as cross-vendor SDK entrypoints — supported by nearly every stack

  • QRMI and QDMI reference implementations — available in IBM, IonQ, and MQSS stacks

Announced or in progress:

  • QDMI adoption expanding — AWS Braket recently released a QDMI implementation, and IonQ has on-premises QDMI pathways in development

  • Xanadu's Catalyst compiler adding FTQC error correction passes to its MLIR lowering stack

  • IonQ's on-premises and federated QRMI pathways — listed as emerging

Conceptual / not yet specified:

  • openQSE interface contracts — the architecture is a reference model, not a reference implementation; concrete API specifications are future work

  • Cross-stack quantitative benchmarks — the paper explicitly scopes this out; structural comparison only

  • Unified capability-discovery interface for control electronics — described in the architecture but not yet standardized

The Practical Case for a Reference Architecture

Reference architectures often feel academic until they do not. The value of openQSE is in providing a shared vocabulary for interface negotiation between HPC centers, QPU vendors, and SDK developers.

Right now, every HPC center deploying a QPU is solving the same set of integration problems independently. How do you describe a QPU to Slurm? How do you handle calibration downtime in a fair-share queue? How do you expose compiler capability metadata without hard-coding vendor APIs into your workflow engine? The paper surfaces these as recurring patterns precisely because they keep being rebuilt from scratch. If openQSE's interface boundaries stabilize into something like QDMI and QRMI, it will reduce the cost of supporting multiple QPU backends in a single HPC environment.

If you are actively integrating QPU access into an HPC environment, the stack-by-stack analysis and the openQSE architecture layers are the highest-signal sections. The interface convergence matrix in the original paper is a useful reference for understanding which stacks support which interfaces.

If you are evaluating SDK strategy across multiple hardware targets, the classification axes and the stack-specific SDK support are the most actionable.

If you are tracking FTQC software readiness, the timing domain separation and the LCL/FTEE architecture give a cleaner conceptual model than most vendor materials.

Shehata et al., "Quantum-HPC Software Stacks and the openQSE Reference Architecture: A Survey," arXiv:2604.20912v1 [quant-ph], April 22, 2026.

openQSE initiative: github.com/openQSE/Workshops
