QSBench: Synthetic quantum circuit datasets for QML benchmarking

Hi everyone,

I’m sharing QSBench — a collection of synthetic quantum circuit datasets designed for machine learning benchmarking, especially for graph-based models and noise-aware learning.

What is QSBench?

QSBench is an ecosystem of datasets and tools for generating quantum circuits enriched with structural and physical metadata.

The goal is to move beyond:

  • purely random circuits
  • classical datasets embedded into quantum states

and instead provide structured, ML-ready quantum data.


Key Features

Structural Metadata (Graph-Ready)

Each circuit includes:

  • Adjacency matrices
  • Gate-level statistics
  • Entanglement metrics

This makes the datasets directly usable with Graph Neural Networks (GNNs).
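
As a rough sketch of what "graph-ready" can mean in practice, the snippet below turns a dense adjacency matrix into the two-row edge-index layout (sources, targets) that GNN libraries such as PyTorch Geometric expect. It assumes the matrix is a square nested list in which a nonzero entry (i, j) marks an edge from node i to node j; the actual QSBench encoding may differ, and `adjacency_to_edge_index` is a hypothetical helper, not part of QSBench.

```python
def adjacency_to_edge_index(adj):
    """Convert a dense adjacency matrix into [sources, targets] lists.

    Assumption: adj is a square nested list and any nonzero entry (i, j)
    means there is an edge from node i to node j.
    """
    n = len(adj)
    if any(len(row) != n for row in adj):
        raise ValueError("adjacency matrix must be square")
    sources, targets = [], []
    for i, row in enumerate(adj):
        for j, weight in enumerate(row):
            if weight != 0:
                sources.append(i)
                targets.append(j)
    return [sources, targets]


# Toy 3-node example: node 0 is connected to 1, and 1 to 2 (symmetric).
toy_adj = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
edge_index = adjacency_to_edge_index(toy_adj)
print(edge_index)  # [[0, 1, 1, 2], [1, 0, 2, 1]]
```

The resulting nested list can be wrapped in a tensor of integer type before being handed to a GNN layer.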


Noise-Aware Design

QSBench explicitly models different physical noise channels:

  • Depolarizing noise
  • Amplitude damping
  • Thermal relaxation (T1/T2)
  • Readout errors
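
For readers less familiar with these channels, here is a minimal single-qubit sketch in NumPy. It uses the textbook definitions (depolarizing as a mix with the maximally mixed state, amplitude damping via two Kraus operators, readout error as a classical confusion matrix). The function names and parameter values are illustrative assumptions and say nothing about how QSBench itself generates its noisy data.

```python
import numpy as np


def depolarizing(rho, p):
    """Depolarizing channel: rho -> (1 - p) * rho + p * I / 2 (unit-trace rho)."""
    return (1 - p) * rho + p * np.eye(2) / 2


def amplitude_damping(rho, gamma):
    """Amplitude damping with decay probability gamma, via Kraus operators."""
    k0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    k1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return k0 @ rho @ k0.conj().T + k1 @ rho @ k1.conj().T


def readout_error(probs, p01, p10):
    """Apply a classical confusion matrix to ideal outcome probabilities.

    p01 = P(read 1 | prepared 0), p10 = P(read 0 | prepared 1).
    """
    confusion = np.array([[1 - p01, p10], [p01, 1 - p10]])
    return confusion @ probs


# Start in the excited state |1><1|.
rho_one = np.array([[0, 0], [0, 1]], dtype=complex)

rho_dep = depolarizing(rho_one, p=0.2)
rho_amp = amplitude_damping(rho_one, gamma=0.2)
noisy_probs = readout_error(np.array([0.0, 1.0]), p01=0.01, p10=0.05)

print(rho_dep.real.diagonal())  # populations after depolarizing noise
print(rho_amp.real.diagonal())  # populations after amplitude damping
print(noisy_probs)              # measured outcome probabilities
```

With these values the depolarized populations are 0.1 and 0.9, while amplitude damping moves 0.2 of the population from |1> to |0>. That one-directional transfer is what makes amplitude damping an asymmetric channel.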

High-Performance Format

All datasets are stored in Apache Parquet, enabling:

  • Fast column-selective reads (Parquet is a columnar format)
  • Efficient large-scale processing
  • Better integration with ML pipelines

Available Datasets

QSBench-Core

  • Clean structural dataset (no noise)
  • Includes QASM, adjacency matrices, and entanglement metrics

QSBench-Depolarizing

  • Circuits with depolarizing noise
  • Designed for robustness and error mitigation research

QSBench-Amplitude

  • Focused on amplitude damping noise
  • Suitable for modeling asymmetric noise (relaxation from |1⟩ toward |0⟩)

QSBench-Transpilation

  • Raw vs transpiled circuits
  • Useful for studying compilation overhead and optimization
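
One simple way to quantify compilation overhead is the relative increase of each metric between the raw and the transpiled circuit. The sketch below assumes each circuit is summarized as a dictionary of counts; the keys (`gate_count`, `depth`, `two_qubit_gates`) and the numbers are hypothetical and may not match the actual QSBench-Transpilation columns.

```python
def transpilation_overhead(raw, transpiled):
    """Relative increase per metric: (transpiled - raw) / raw.

    Metrics missing from the transpiled summary are skipped; a raw value
    of zero yields None because the ratio is undefined.
    """
    overhead = {}
    for key, raw_value in raw.items():
        if key not in transpiled:
            continue
        if raw_value == 0:
            overhead[key] = None
        else:
            overhead[key] = (transpiled[key] - raw_value) / raw_value
    return overhead


# Made-up summaries for one circuit before and after transpilation.
raw_stats = {"gate_count": 40, "depth": 12, "two_qubit_gates": 10}
transpiled_stats = {"gate_count": 70, "depth": 21, "two_qubit_gates": 16}

overhead = transpilation_overhead(raw_stats, transpiled_stats)
print(overhead)
```

Here the gate count and depth both grow by 75% and the two-qubit gate count by 60%. Ratios like these are natural regression targets for transpilation cost estimation.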

QSBench-Thermal

  • Thermal relaxation noise (T1/T2)
  • Designed for decoherence-aware modeling
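
To make the T1/T2 parameters concrete, the sketch below applies the standard relations for a qubit idling for a time t: the relaxation probability is 1 - exp(-t/T1), coherences decay by exp(-t/T2), and the pure-dephasing time follows from 1/T_phi = 1/T2 - 1/(2*T1). The helper name and the example values are assumptions for illustration, not parameters taken from QSBench-Thermal.

```python
import math


def thermal_relaxation_params(t1, t2, gate_time):
    """Return (p_relax, coherence_decay, t_phi) for one idle or gate period.

    p_relax         = 1 - exp(-gate_time / t1), probability of energy relaxation
    coherence_decay = exp(-gate_time / t2), factor applied to off-diagonal terms
    t_phi           = pure-dephasing time from 1/t_phi = 1/t2 - 1/(2 * t1)
    All three times must be given in the same unit.
    """
    if t2 > 2 * t1:
        raise ValueError("physical noise requires T2 <= 2 * T1")
    p_relax = 1 - math.exp(-gate_time / t1)
    coherence_decay = math.exp(-gate_time / t2)
    if math.isclose(t2, 2 * t1):
        t_phi = math.inf  # T2 is limited by T1 alone, no pure dephasing
    else:
        t_phi = 1 / (1 / t2 - 1 / (2 * t1))
    return p_relax, coherence_decay, t_phi


# Illustrative values in microseconds: T1 = 100, T2 = 80, gate time = 0.1.
p_relax, coherence_decay, t_phi = thermal_relaxation_params(100.0, 80.0, 0.1)
print(p_relax, coherence_decay, t_phi)
```

The T2 <= 2 * T1 check reflects the physical bound on dephasing; parameter sets that violate it do not describe a valid thermal relaxation channel.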

QSBench-Device

  • Hardware-inspired noise models
  • Includes realistic combinations of error sources

Example Usage

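from datasets import load_dataset

# Load the demo subset of QSBench-Core from the Hugging Face Hub.
dataset = load_dataset("QSBench/QSBench-Core-v1.0.0-demo")

# Take the first circuit record from the "train" split.
sample = dataset["train"][0]

print(sample["gate_count"])             # gate count stored for this circuit
print(len(sample["adjacency_matrix"]))  # number of rows in the adjacency matrix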

Use Cases

  • Predicting circuit properties from structure
  • Training GNNs on quantum circuits
  • Noise classification and error mitigation
  • Transpilation cost estimation
  • Hardware-aware ML modeling

Roadmap

  • Targeted entanglement generation
  • Dynamic circuits (mid-circuit measurements)
  • Integration with physical Hamiltonians

Feedback

Would love feedback, especially on:

  • Missing features or metadata
  • Additional noise models
  • Real-world use cases

Thanks!
