Parnassus: Fast Detector Simulation and Reconstruction
Simulated datasets are an integral part of interpreting complex experimental data. These datasets must represent nature accurately and precisely; they bridge fundamental theory and observable quantities. The full science program of a given particle physics experiment requires many synthetic datasets, with many variations on fundamental (Standard Model and beyond), phenomenological, and instrumental parameters. Producing them is a significant computational bottleneck, particularly acute for the ATLAS and CMS experiments at the Large Hadron Collider (LHC), where simulation and reconstruction of synthetic data during the high-luminosity era will require more computing resources than real data processing.
To address this challenge, a suite of fast simulation programs has been developed. Experimental collaborations (like ATLAS or CMS) have developed tailored approaches that target the slowest parts of the simulation, notably the calorimeters. These fast surrogate models mimic the output of full detector simulations and are thus processed with the same reconstruction algorithms as the data. While this approach offers a dramatic speedup for simulation, it does not alleviate the significant computational cost of reconstruction.
Our group proposed and developed an end-to-end fast simulation model, Parnassus [1,2], that includes both generation and reconstruction and can be trained for any detector and reconstruction configuration. Parnassus is a deep generative model based on flow matching: it generates a point cloud of reconstructed particles conditioned on another point cloud of detector-stable particles.
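To make the flow-matching idea concrete, the following is a minimal, self-contained sketch (not the Parnassus implementation) of the two ingredients such a model needs: the linear-interpolant regression target used during training, and an Euler-step ODE sampler that transports a noise point cloud toward reconstructed particles. The `toy_velocity` field below is a hypothetical closed-form stand-in for the trained neural network.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_target(x0, x1, t):
    """Linear interpolant x_t between noise x0 and data x1 at time t,
    and the constant velocity (x1 - x0) that the network regresses."""
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target

def euler_sample(velocity_fn, cond, x0, n_steps=20):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with Euler steps,
    turning a noise point cloud into a generated point cloud."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t, cond)
    return x

# Placeholder "trained" field: drifts every point toward the conditioning
# point cloud. In Parnassus this role is played by a learned network
# conditioned on the detector-stable particles.
def toy_velocity(x, t, cond):
    return cond - x

truth = rng.normal(size=(8, 3))  # conditioning cloud (detector-stable particles)
noise = rng.normal(size=(8, 3))  # initial noise point cloud
reco = euler_sample(toy_velocity, truth, noise)  # generated "reconstructed" cloud
```

Because the toy field is linear, each Euler step shrinks the deviation from the conditioning cloud by a factor (1 - dt); a trained model would instead produce samples from the learned reconstruction-level distribution.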
Using a variety of physics processes at the LHC, we show that the extended Parnassus generalizes beyond its training dataset and outperforms the standard, public tool Delphes.
Since it is based on a neural network, the entire framework is written in Python with few dependencies and is automatically compatible with Graphics Processing Units (GPUs), allowing fast and efficient data generation.