Quantum‑Kernel‑Molecular‑Classification

Abstract

This report benchmarks classical graph kernels against quantum‑inspired embeddings for molecular classification on TU datasets loaded via PyTorch Geometric’s TUDataset, ensuring consistent preparation across AIDS, PROTEINS, NCI1, PTC‑MR, and MUTAG.

The proposed ego‑graph quantum‑walk (Ego‑QW) feature map caps qubit width by encoding local neighborhoods with compact, Trotterized circuits using QURI simulators; Feature Vector is extracted as expectation readout of list of simple observables.

A complementary full‑graph continuous‑time quantum walk (CTQW) embedding summarizes global transport using exact unitary evolution \( \psi(t)=e^{-iHt}\psi(0) \) with adjacency or Laplacian Hamiltonians, evaluated efficiently at multiple times via eigen‑decomposition.

Evaluation uses SVM with an RBF kernel and nested cross‑validation to report mean ± std of Accuracy and \(F1\), enabling a fair comparison between local quantum features, global walk summaries, and classical baselines (Shortest‑Path, WL‑subtree).

        Innovation. Fix qubit‑count scaling while still capturing functional‑group topology via ego‑graphs, enabling efficient, accurate feature extraction and improved classification on molecular benchmarks.  
      

1. Introduction

The objective is to design an original graph feature map \( \phi(G) \) that captures molecular structure for SVM classification with a chosen kernel, and to evaluate performance on TU datasets using a standard cross‑validation protocol; datasets are loaded via PyTorch Geometric’s TUDataset for consistent preparation and metadata.

Quantum‑walk encodings on full graphs scale qubits with nodes, motivating an ego‑graph approach that preserves discriminative local dynamics while capping circuit width for practical QURI‑based simulation and potential hardware alignment.

2. Background and Motivation

Continuous‑time quantum walks (CTQW) evolve states unitarily by \(U(t)=e^{-iHt}\) where \(H\) is typically the adjacency \(A\) or Laplacian \(L=D-A\); CTQWs do not converge to a stationary distribution but can be characterized via time‑averaged distributions and spreading metrics.

Ego‑graph processing is a scalable strategy for quantum graph learning: decomposing large graphs into local neighborhoods compatible with limited qubit counts, consistent with modern ego‑graph quantum learning designs.

3. Methods

3.1 Ego‑QW (Proposed)

Centers are sampled with a degree‑biased policy; radius‑\( r \) ego‑graphs are built and remapped to contiguous indices; integer node labels are encoded as a diagonal potential \(V\) on the register for label‑aware phase evolution.
A first‑order Trotter product approximates \( e^{-iHt} \) with \(H=\gamma A+V\): per step, apply label phases via \(R_Z(-\Delta t\,V_i)\), then two‑qubit edge interactions implementing \(e^{-i\gamma \Delta t (X_iX_j+Y_iY_j)}\) via RXX/RYY.
Observables include projectors \(|i\rangle\!\langle i|\), adjacency \(A\), \(A^2\), and degree diagonal; expectations \( \langle O \rangle=\psi^\dagger O \psi \) are averaged across times and centers to form fixed‑length features for SVM‑RBF classification.

3.2 CTQW (Full Graph)

Full‑graph evolution uses \( H=\gamma A+V \) or \( H=\gamma L+V \) and \( \psi(t)=e^{-iHt}\psi(0) \); eigen‑decomposition yields multi‑time states efficiently as \( \psi(t)=V e^{-it\Lambda} V^\dagger \psi(0) \), avoiding repeated exponentials.
Features summarize \(p(t)=|\psi(t)|^2\) via mean, std, max, Shannon entropy, and degree expectation across times, capturing global connectivity and transport behaviors for classification.

3.3 Classical Baselines

Shortest‑Path: counts over tuples \( (\min(\ell_u,\ell_v),\max(\ell_u,\ell_v),d(u,v)) \) provide label‑aware geodesic signatures that perform strongly on labeled TU datasets.
Weisfeiler–Lehman (WL‑subtree): \(h\) rounds of 1‑WL color refinement with bag‑of‑colors features yield efficient multi‑scale neighborhood signatures for graph classification.

4. Datasets

AIDS, PROTEINS, NCI1, PTC‑MR, and MUTAG are accessed through PyTorch Geometric’s TUDataset, which automates download and processing; conversions to NetworkX with to_networkx provide integer node labels for pipelines that require label diagonals in \(H\) or observables.

Sample graphs from NCI1 dataset with node labels and edge structures

5. Implementation Details

5.1 Data loading and label normalization (PyG → NetworkX)

TUDataset(root, name) provides consistent loaders; to_networkx converts to NetworkX; node labels are derived as argmax over one‑hot x when present (else zeros) and nodes are relabeled to 0..n‑1 for array‑aligned computation in QW routines.

5.2 Graph preprocessing for walks

For CTQW, adjacency \(A\), degrees \(D\), and \(L=D-A\) are built from NetworkX; for QURI ego‑QW, a radius‑\(r\) ego‑graph is extracted and truncated to max_nodes, preserving neighborhood topology while fitting qubit limits.

5.3 Ego sampling

A degree‑biased sampling strategy selects half the centers from top‑degree nodes and the rest from remaining nodes, improving coverage of hubs and typical vertices under a small ego budget for runtime stability.

5.4 QURI ego‑QW circuits

The register is initialized to \( |\!+\rangle^{\otimes n} \) via Hadamards; per Trotter slice, diagonal label phases use \(R_Z\) gates and edges use RXX/RYY composites implementing \(e^{-i\gamma \Delta t (X_iX_j+Y_iY_j)}\); statevectors are obtained via evaluate_state_to_vector and normalized before expectation calculations.

5.5 Trotter product formula (first‑order)

The first‑order product formula \( e^{-i(H_1+H_2)t} \approx \left(e^{-iH_1 \Delta t} e^{-iH_2 \Delta t}\right)^{t/\Delta t} \) is a standard tool for non‑commuting operator exponentials; higher‑order formulas reduce error with more gates.

5.6 State padding and slice‑safe expectations

Ego states are padded or truncated to max_nodes to match observable canvases; a slice‑safe \( \langle O \rangle=\psi^\dagger O \psi \) computes with \(k=\min(\lvert \psi\rvert, \lvert O\rvert)\), preventing dimension errors across graphs of varying ego sizes.

5.7 CTQW eigen‑decomposition

Multi‑time evolution uses the spectral decomposition \(H=V\Lambda V^\dagger\) with \( \psi(t)=V e^{-it\Lambda} V^\dagger \psi(0) \), avoiding repeated exponentials; CTQW mixing and spreading properties are analyzed via time‑averaged distributions rather than stationary convergence.

5.8 Baselines

Shortest‑Path builds labeled distance histograms; WL‑subtree performs 1‑WL color refinement and accumulates color counts for \(h\) iterations; both align with established graph kernel literature and strong TU baselines.

5.9 Classifier and nested CV

Features are standardized and fed to an SVM with RBF \(K(x,y)=\exp(-\gamma\lVert x-y\rVert^2)\); \(C,\gamma\) are selected with an inner grid in nested cross‑validation for unbiased outer‑fold metrics (Accuracy, \(F1\)).

5.10 Reproducibility and performance

Random seeds are fixed for Python/NumPy/PyTorch; max_graphs_per_dataset bounds wall‑time; small time grids and Trotter steps stabilize QURI ego‑QW runtime, while CTQW’s \(O(n^3)\) eigen‑decomposition amortizes over multiple times.

5.11 Failure modes and fixes

In QURI, always extract vectors via evaluate_state_to_vector(...).vector and normalize before expectations; ensure padding/slice‑safe expectations for ego states vs fixed‑size observable matrices to avoid matmul shape errors.

5.12 Project notebooks (src/)

The src directory contains the primary notebooks to explore data, reproduce models, and visualize circuits for this work.

src/EDA.ipynb — Exploratory data analysis and visualisations of AIDS, PROTEINS, NCI1, PTC‑MR, and MUTAG.
src/main.ipynb — Main implementation of all model codes and benchmarks.
src/Quri_Quantum_Circuit.ipynb — Example visualization of an ego‑subgraph and the Trotterized quantum circuit.

Note: Ego‑graph decomposition and neighborhood‑wise quantum processing let fixed‑size circuits summarize large graphs by pooling local expectations into graph‑level features.

6. Results

Bar chart of mean accuracy across datasets for QURI, CTQW, SP, and WL

Bar chart of mean F1-score across datasets for QURI, CTQW, SP, and WL

Method	Dataset	Accuracy (Mean ± Std)	F1-score (Mean ± Std)
QURI Ego‑QW	AIDS	0.9650 ± 0.0255	0.9662 ± 0.0237
CTQW	AIDS	0.9950 ± 0.0100	0.9948 ± 0.0103
Shortest‑Path	AIDS	0.9600 ± 0.0200	0.9598 ± 0.0207
WL Subtree	AIDS	0.9300 ± 0.0245	0.9193 ± 0.0309
QURI Ego‑QW	MUTAG	0.8772 ± 0.0336	0.8793 ± 0.0324
CTQW	MUTAG	0.8514 ± 0.0255	0.8527 ± 0.0269
Shortest‑Path	MUTAG	0.7872 ± 0.0169	0.7812 ± 0.0161
WL Subtree	MUTAG	0.7818 ± 0.0273	0.7783 ± 0.0295

Replace placeholders with your computed nested‑CV statistics from runs to finalize this table.

7. Discussion

Ego‑QW captures local interference patterns and short‑hop structure under qubit caps, while CTQW captures global connectivity via spectral dynamics; SP and WL remain strong classical baselines encoding geodesic and refined neighborhood structure.

Future hybrid designs may combine local quantum features with global classical summaries, or leverage higher‑order Trotterization and compilation to improve fidelity‑vs‑depth trade‑offs in circuit‑based encodings.

8. Limitations and Future Work

Ego radius trades local fidelity against global context, as small egos may miss long‑range motifs that CTQW summaries capture.
Trotter error versus depth requires tuning; higher‑order product formulas reduce error but increase gate count and runtime.
Extending observables and Hamiltonians to include bond types/charges could enrich chemistry‑aware features beyond degree and adjacency powers.

Our innovation

Full‑graph quantum walk embeddings scale qubits with nodes, so we pivot to ego‑graphs that bound circuit width while preserving discriminative local walk dynamics around chemically meaningful centers.

Each ego‑subgraph is encoded by a compact Trotterized circuit using QURI Parts, features are extracted from observables, and aggregated over time points and centers to yield a fixed‑size vector per molecule.

This decouples hardware width from global graph size by capping qubits at the max ego size, enabling NISQ‑feasible circuits without losing salient local patterns.

9. Team

Team The Cat’s Cradle acknowledges the following contributors and contacts for this project.

Name	Discord Handle
Jajapuram Shiva Sai	@frosty
Amon Koike	@thedaemon_AK
Ramesh Makwana	@Ramesh Makwana
Dr. Sushant Tapase	@Dr-Sushant

10. Reproduce

Open the src folder, choose the required notebook (.ipynb), and click “Open in Colab” to run.

References

X. Ai et al., “Towards Quantum Graph Neural Networks: An Ego‑Graph Learning Approach,” arXiv:2201.05158v3 (2024). HTML.
D. Aharonov, A. Ambainis, J. Kempe, U. Vazirani, “Quantum Walks on Graphs,” STOC’01; arXiv:quant‑ph/0012090. PDF.
C. Kluber, “Trotterization in Quantum Theory,” arXiv:2310.13296. PDF.
N. Shervashidze et al., “Weisfeiler‑Lehman Graph Kernels,” JMLR 12, 2539–2561 (2011). PDF.
K. M. Borgwardt, H.‑P. Kriegel, “Shortest‑path kernels on graphs,” ICDM 2005. PDF.
PyTorch Geometric TUDataset. docs.
PyTorch Geometric utils (to_networkx, conversions). utils.
QURI Parts simulator and states. simulator, states.
Continuous‑time quantum walk overview. overview.
scikit‑learn SVC and RBF kernel. SVC, rbf_kernel.

Back to top