Comparing classical graph kernels and quantum‑inspired embeddings for molecular classification — Team The Cat’s Cradle
This report benchmarks classical graph kernels against quantum‑inspired embeddings for molecular classification on TU datasets loaded via PyTorch Geometric’s TUDataset, ensuring consistent preparation across AIDS, PROTEINS, NCI1, PTC‑MR, and MUTAG.
The proposed ego‑graph quantum‑walk (Ego‑QW) feature map caps qubit width by encoding local neighborhoods with compact, Trotterized circuits using QURI simulators; Feature Vector is extracted as expectation readout of list of simple observables.
A complementary full‑graph continuous‑time quantum walk (CTQW) embedding summarizes global transport using exact unitary evolution \( \psi(t)=e^{-iHt}\psi(0) \) with adjacency or Laplacian Hamiltonians, evaluated efficiently at multiple times via eigen‑decomposition.
Evaluation uses SVM with an RBF kernel and nested cross‑validation to report mean ± std of Accuracy and \(F1\), enabling a fair comparison between local quantum features, global walk summaries, and classical baselines (Shortest‑Path, WL‑subtree).
The objective is to design an original graph feature map \( \phi(G) \) that captures molecular structure for SVM classification with a chosen kernel, and to evaluate performance on TU datasets using a standard cross‑validation protocol; datasets are loaded via PyTorch Geometric’s TUDataset for consistent preparation and metadata.
Quantum‑walk encodings on full graphs scale qubits with nodes, motivating an ego‑graph approach that preserves discriminative local dynamics while capping circuit width for practical QURI‑based simulation and potential hardware alignment.
AIDS, PROTEINS, NCI1, PTC‑MR, and MUTAG are accessed through PyTorch Geometric’s TUDataset, which automates download and processing; conversions to NetworkX with to_networkx provide integer node labels for pipelines that require label diagonals in \(H\) or observables.
TUDataset(root, name) provides consistent loaders; to_networkx converts to NetworkX; node labels are derived as argmax over one‑hot x when present (else zeros) and nodes are relabeled to 0..n‑1 for array‑aligned computation in QW routines.
For CTQW, adjacency \(A\), degrees \(D\), and \(L=D-A\) are built from NetworkX; for QURI ego‑QW, a radius‑\(r\) ego‑graph is extracted and truncated to max_nodes, preserving neighborhood topology while fitting qubit limits.
A degree‑biased sampling strategy selects half the centers from top‑degree nodes and the rest from remaining nodes, improving coverage of hubs and typical vertices under a small ego budget for runtime stability.
The register is initialized to \( |\!+\rangle^{\otimes n} \) via Hadamards; per Trotter slice, diagonal label phases use \(R_Z\) gates and edges use RXX/RYY composites implementing \(e^{-i\gamma \Delta t (X_iX_j+Y_iY_j)}\); statevectors are obtained via evaluate_state_to_vector and normalized before expectation calculations.
The first‑order product formula \( e^{-i(H_1+H_2)t} \approx \left(e^{-iH_1 \Delta t} e^{-iH_2 \Delta t}\right)^{t/\Delta t} \) is a standard tool for non‑commuting operator exponentials; higher‑order formulas reduce error with more gates.
Ego states are padded or truncated to max_nodes to match observable canvases; a slice‑safe \( \langle O \rangle=\psi^\dagger O \psi \) computes with \(k=\min(\lvert \psi\rvert, \lvert O\rvert)\), preventing dimension errors across graphs of varying ego sizes.
Multi‑time evolution uses the spectral decomposition \(H=V\Lambda V^\dagger\) with \( \psi(t)=V e^{-it\Lambda} V^\dagger \psi(0) \), avoiding repeated exponentials; CTQW mixing and spreading properties are analyzed via time‑averaged distributions rather than stationary convergence.
Shortest‑Path builds labeled distance histograms; WL‑subtree performs 1‑WL color refinement and accumulates color counts for \(h\) iterations; both align with established graph kernel literature and strong TU baselines.
Features are standardized and fed to an SVM with RBF \(K(x,y)=\exp(-\gamma\lVert x-y\rVert^2)\); \(C,\gamma\) are selected with an inner grid in nested cross‑validation for unbiased outer‑fold metrics (Accuracy, \(F1\)).
Random seeds are fixed for Python/NumPy/PyTorch; max_graphs_per_dataset bounds wall‑time; small time grids and Trotter steps stabilize QURI ego‑QW runtime, while CTQW’s \(O(n^3)\) eigen‑decomposition amortizes over multiple times.
In QURI, always extract vectors via evaluate_state_to_vector(...).vector and normalize before expectations; ensure padding/slice‑safe expectations for ego states vs fixed‑size observable matrices to avoid matmul shape errors.
The src directory contains the primary notebooks to explore data, reproduce models, and visualize circuits for this work.
src/EDA.ipynb — Exploratory data analysis and visualisations of AIDS, PROTEINS, NCI1, PTC‑MR, and MUTAG.src/main.ipynb — Main implementation of all model codes and benchmarks.src/Quri_Quantum_Circuit.ipynb — Example visualization of an ego‑subgraph and the Trotterized quantum circuit.
| Method | Dataset | Accuracy (Mean ± Std) | F1-score (Mean ± Std) |
|---|---|---|---|
| QURI Ego‑QW | AIDS | 0.9650 ± 0.0255 | 0.9662 ± 0.0237 |
| CTQW | AIDS | 0.9950 ± 0.0100 | 0.9948 ± 0.0103 |
| Shortest‑Path | AIDS | 0.9600 ± 0.0200 | 0.9598 ± 0.0207 |
| WL Subtree | AIDS | 0.9300 ± 0.0245 | 0.9193 ± 0.0309 |
| QURI Ego‑QW | MUTAG | 0.8772 ± 0.0336 | 0.8793 ± 0.0324 |
| CTQW | MUTAG | 0.8514 ± 0.0255 | 0.8527 ± 0.0269 |
| Shortest‑Path | MUTAG | 0.7872 ± 0.0169 | 0.7812 ± 0.0161 |
| WL Subtree | MUTAG | 0.7818 ± 0.0273 | 0.7783 ± 0.0295 |
Replace placeholders with your computed nested‑CV statistics from runs to finalize this table.
Ego‑QW captures local interference patterns and short‑hop structure under qubit caps, while CTQW captures global connectivity via spectral dynamics; SP and WL remain strong classical baselines encoding geodesic and refined neighborhood structure.
Future hybrid designs may combine local quantum features with global classical summaries, or leverage higher‑order Trotterization and compilation to improve fidelity‑vs‑depth trade‑offs in circuit‑based encodings.
Full‑graph quantum walk embeddings scale qubits with nodes, so we pivot to ego‑graphs that bound circuit width while preserving discriminative local walk dynamics around chemically meaningful centers.
Each ego‑subgraph is encoded by a compact Trotterized circuit using QURI Parts, features are extracted from observables, and aggregated over time points and centers to yield a fixed‑size vector per molecule.
Team The Cat’s Cradle acknowledges the following contributors and contacts for this project.
| Name | Discord Handle |
|---|---|
| Jajapuram Shiva Sai | @frosty |
| Amon Koike | @thedaemon_AK |
| Ramesh Makwana | @Ramesh Makwana |
| Dr. Sushant Tapase | @Dr-Sushant |
Open the src folder, choose the required notebook (.ipynb), and click “Open in Colab” to run.