QSBench: Synthetic Quantum Circuit Datasets for QML Benchmarking
Hi everyone,
I’m sharing QSBench, a collection of synthetic quantum circuit datasets for benchmarking machine learning models, especially graph-based models and noise-aware learning.
What is QSBench?
QSBench is an ecosystem of datasets and tools for generating quantum circuits enriched with structural and physical metadata.
The goal is to move beyond:
- purely random circuits
- classical datasets embedded into quantum states
and instead provide structured, ML-ready quantum data.
Key Features
Structural Metadata (Graph-Ready)
Each circuit includes:
- Adjacency matrices
- Gate-level statistics
- Entanglement metrics
This makes the datasets directly usable with Graph Neural Networks (GNNs).
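As a rough illustration of what "graph-ready" could look like in practice, the sketch below turns a dense adjacency matrix into the edge list that most GNN libraries expect. It assumes the adjacency matrix is a square nested list where a nonzero entry marks an edge; the actual encoding in QSBench may differ, and `adjacency_to_edge_list` is a hypothetical helper, not part of the QSBench tooling.

```python
def adjacency_to_edge_list(adj):
    """Convert a square adjacency matrix into (sources, targets, weights).

    Assumption: adj[i][j] != 0 means there is an edge from node i to node j.
    """
    n = len(adj)
    if any(len(row) != n for row in adj):
        raise ValueError("adjacency matrix must be square")
    sources, targets, weights = [], [], []
    for i, row in enumerate(adj):
        for j, value in enumerate(row):
            if value != 0:
                sources.append(i)
                targets.append(j)
                weights.append(value)
    return sources, targets, weights


# Toy 3-node example with made-up values (not taken from the dataset).
toy_adj = [
    [0, 1, 0],
    [1, 0, 2],
    [0, 2, 0],
]
src, dst, w = adjacency_to_edge_list(toy_adj)
print(list(zip(src, dst, w)))
```

The two index lists can then be stacked into the 2 x E `edge_index` layout used by libraries such as PyTorch Geometric, with the weights as optional edge attributes.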
Noise-Aware Design
QSBench explicitly models different physical noise channels:
- Depolarizing noise
- Amplitude damping
- Thermal relaxation (T1/T2)
- Readout errors
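For readers less familiar with these channels, here is a minimal single-qubit sketch of the first two, written in plain Python. It uses the textbook definitions (depolarizing as a mix with the maximally mixed state, amplitude damping via its two Kraus operators) and is only an illustration; how QSBench parameterizes or simulates noise internally is not shown here, and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def apply_kraus(rho, kraus_ops):
    """Apply a channel of the form rho -> sum_k K rho K^dagger."""
    out = [[0j, 0j], [0j, 0j]]
    for k in kraus_ops:
        term = matmul(matmul(k, rho), dagger(k))
        out = [[out[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return out


def depolarizing(rho, p):
    """rho -> (1 - p) * rho + p * I / 2, assuming trace(rho) == 1.

    This is one common parameterization; other conventions rescale p.
    """
    return [[(1 - p) * rho[i][j] + (p / 2 if i == j else 0) for j in range(2)]
            for i in range(2)]


def amplitude_damping(rho, gamma):
    """Amplitude damping: |1> decays to |0> with probability gamma."""
    k0 = [[1, 0], [0, math.sqrt(1 - gamma)]]
    k1 = [[0, math.sqrt(gamma)], [0, 0]]
    return apply_kraus(rho, [k0, k1])


excited = [[0, 0], [0, 1]]  # density matrix of the |1> state
depolarized = depolarizing(excited, 0.2)
damped = amplitude_damping(excited, 0.3)
print(depolarized)
print(damped)
```

Starting from |1>, depolarizing moves population symmetrically toward the maximally mixed state, while amplitude damping only moves it from |1> to |0>.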
High-Performance Format
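All datasets are stored in Apache Parquet, a compressed columnar format, enabling:
- Faster queries that read only the columns they need
- Efficient large-scale processing
- Direct loading with common ML tooling (pandas, PyArrow, Hugging Face datasets)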
Available Datasets
QSBench-Core
- Clean structural dataset (no noise)
- Includes QASM, adjacency matrices, and entanglement metrics
QSBench-Depolarizing
- Circuits with depolarizing noise
- Designed for robustness and error mitigation research
QSBench-Amplitude
- Focused on amplitude damping noise
- Suitable for asymmetric noise modeling (amplitude damping drives |1⟩ toward |0⟩, not the reverse)
QSBench-Transpilation
- Raw vs transpiled circuits
- Useful for studying compilation overhead and optimization
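One simple way to quantify compilation overhead is the relative growth of a metric (depth, gate count, two-qubit gate count) between the raw and transpiled circuit. The sketch below assumes you have already extracted the two numbers; `transpilation_overhead` is a hypothetical helper and the values are made up, since the column names in this dataset are not listed here.

```python
def transpilation_overhead(raw_value, transpiled_value):
    """Relative overhead of a metric: 0.0 means unchanged, 1.0 means doubled."""
    if raw_value <= 0:
        raise ValueError("raw_value must be positive")
    return (transpiled_value - raw_value) / raw_value


# Made-up depths for illustration: raw depth 12, transpiled depth 30.
overhead = transpilation_overhead(12, 30)
print(overhead)
```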
QSBench-Thermal
- Thermal relaxation noise (T1/T2)
- Designed for decoherence-aware modeling
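As a quick reminder of what T1 and T2 control, the sketch below computes the standard exponential decay factors for a qubit idling for time t: the excited-state population relaxes on the T1 timescale and the off-diagonal coherence decays on the T2 timescale. It assumes relaxation toward |0> (zero temperature) and enforces the physical constraint T2 <= 2 * T1; `thermal_decay` is a hypothetical helper and the numbers are made up, not taken from the dataset.

```python
import math


def thermal_decay(t, t1, t2):
    """Return (relaxation probability, remaining coherence factor) after time t.

    All three arguments must use the same time unit.
    """
    if t2 > 2 * t1:
        raise ValueError("physical relaxation requires T2 <= 2 * T1")
    p_relax = 1 - math.exp(-t / t1)   # probability that |1> has decayed to |0>
    coherence = math.exp(-t / t2)     # scaling of the off-diagonal terms
    return p_relax, coherence


# Made-up example: 1 microsecond idle with T1 = 100 us and T2 = 80 us.
p_relax, coherence = thermal_decay(t=1.0, t1=100.0, t2=80.0)
print(p_relax, coherence)
```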
QSBench-Device
- Hardware-inspired noise models
- Includes realistic combinations of error sources
Example Usage
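from datasets import load_dataset
# Download the demo dataset from the Hugging Face Hub
dataset = load_dataset("QSBench/QSBench-Core-v1.0.0-demo")
# Take the first circuit record from the train split
sample = dataset["train"][0]
print(sample["gate_count"])  # gate count of this circuit
print(len(sample["adjacency_matrix"]))  # number of rows in the adjacency matrix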
Use Cases
- Predicting circuit properties from structure
- Training GNNs on quantum circuits
- Noise classification and error mitigation
- Transpilation cost estimation
- Hardware-aware ML modeling
Roadmap
- Targeted entanglement generation
- Dynamic circuits (mid-circuit measurements)
- Integration with physical Hamiltonians
Feedback
Would love feedback, especially on:
- Missing features or metadata
- Additional noise models
- Real-world use cases
Thanks!