Abstract
Accurate molecular force fields are of paramount importance for the efficient implementation of molecular dynamics techniques at large scales. In the last decade, machine learning (ML) methods have demonstrated impressive performance in predicting accurate values for energy and forces when trained on finite-size ensembles generated with ab initio techniques. At the same time, quantum computers have recently started to offer new viable computational paradigms to tackle such problems. On the one hand, quantum algorithms may notably be used to extend the reach of electronic structure calculations. On the other hand, quantum ML is also emerging as an alternative and promising path to quantum advantage. Here we follow this second route and establish a direct connection between classical and quantum solutions for learning neural network (NN) potentials. To this end, we design a quantum NN architecture and apply it successfully to different molecules of growing complexity. The quantum models exhibit a larger effective dimension than their classical counterparts and can reach competitive performances, thus pointing towards potential quantum advantages in natural science applications via quantum ML.

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
1. Introduction
For more than half a century, atomistic simulations have represented one of the sharpest tools available for scientific investigation in a wide range of research fields, such as chemistry, materials science and biology [1, 2]. To perform a molecular dynamics (MD) calculation, in which the classical equations of motion are numerically integrated for each atom in the system under study, an accurate knowledge of potential energy surfaces (PES) and local forces is required. This information, originating from the quantum mechanical behaviour of electrons and nuclei, could in principle be deduced from the exact solution of the Schrödinger equation. However, the complexity of such a task makes it impractical beyond a few small-scale paradigmatic examples. In a delicate balance between performance and accuracy, approximate solution methods such as density functional theory (DFT) have therefore been proposed, leading to the family of so-called ab initio MD techniques [3, 4]. These precise yet computationally demanding strategies can be applied up to medium-sized systems, while larger problems may only be tackled, typically at a lower accuracy, with empirical force fields (FFs) [5]. In fact, MD runs require on-the-fly computations of energy and forces at each time step and for each configuration visited by the system during its evolution: therefore, the use of simple parametrized functional potentials (e.g. the FFs) that can be evaluated in a fraction of the time required by actual quantum mechanical calculations is the only viable strategy when thousands of atoms are involved.
Recently, machine learning (ML) has emerged as a new technological paradigm offering promising and effective solutions for physics and chemistry [6, 7]. In the context of MD simulations, a pioneering approach to ML-powered FFs was proposed by Behler and Parrinello using neural networks (NNs) [8]. The original idea has since been refined and extended [9–16], also promoting the development of specific software libraries [17, 18]. The fundamental insight behind the so-called neural network potentials (NNPs) is two-fold: first, they incorporate the idea that large performance gains can be achieved by directly modelling some form of functional relationship between structure (i.e. atomic positions) and properties of interest (e.g. energies), essentially bypassing the explicit solution of the underlying quantum mechanical problem. Second, NNPs typically enjoy the generalization capabilities of ML models, maintaining extremely good accuracy even on previously unseen configurations when trained on data sets constructed with ab initio methods.
Despite the success of classical ML techniques in the realm of atomic and molecular dynamical processes, the quantum mechanical character of the fundamental laws governing such phenomena immediately leads to the question of whether quantum machine learning (QML) methods could provide further significant advantages [6, 19, 20]. The interested reader can refer to [21, 22] for a thorough introduction to quantum computation and QML, respectively.
Given the high practical relevance of the problem, the learning and generation of molecular FFs may actually constitute a very natural and appealing playground in which QML could be tested and compared with state-of-the-art classical counterparts. An important aspect of such a comparison lies in the fact that the properties to be learned, namely the relationships between configurations, energies and forces, are generally hard to derive directly from first principles due to their quantum mechanical origin. At the same time, it has recently been suggested that information theoretic complexity considerations are strongly affected by the availability of training data even when quantum systems are involved [23], with classical ML methods showing competitive performances, e.g. in predicting non-trivial many-body properties [24]. Several important questions may therefore be addressed, from an overall assessment of the capabilities of QML protocols for PES and FF reconstruction to systematic tests of classical versus quantum techniques for learning specific quantum mechanical properties through data.
In this work, we take a first step in this direction by establishing a direct connection between QML and NNPs. In particular, we demonstrate how quantum neural networks (QNNs) can be employed, in combination with classical data sets, as trainable models for the prediction of energies and forces in molecular systems. Although many different realizations of QNNs and quantum perceptrons are known in the literature [25–33], here we will focus on variational parametrized quantum circuits (PQCs) [34–38], which offer the greatest flexibility for near-term applications. It is worth mentioning that some instances of QNNs, such as the ones that we will implement in the following, are known to exhibit greater power, as measured by the effective dimension in model space, compared to their classical equivalents [39], a fact that places them among the most promising candidates in the quest for quantum advantage in ML. We also notice that, while variational quantum algorithms are often employed for the direct ab initio computation of Hamiltonian spectra [40, 41] and corresponding forces [42], here we take a rather different approach [43], using quantum resources to learn an implicit mapping between atomic coordinates, energy and forces, without any explicit solution of the quantum mechanical problem itself.
2. Model and methods
2.1. Quantum neural networks
We adopt a supervised learning approach with training sets of the form $\mathcal{T} = \{(\mathbf{x}_\alpha, E_\alpha, \mathbf{F}_\alpha)\}_\alpha$, namely collections of $n$-atom molecular configurations $\mathbf{x}_\alpha \in \mathbb{R}^{3n}$ with associated total energies $E_\alpha$ and forces $\mathbf{F}_\alpha$. Here, the index $\alpha$ runs over the different elements of the training set, $x_{\alpha,i}^{c}$ for $c \in \{x, y, z\}$ is the Cartesian coordinate of the $i$th atom and:

$$F_{\alpha,i}^{c} = -\frac{\partial E_\alpha}{\partial x_{\alpha,i}^{c}} \qquad (1)$$

is the force acting on atom $a_i$ along the direction $c$. We will also denote with $|\mathcal{T}|$ the number of samples contained in $\mathcal{T}$.
The training data, derived from classical ab initio methods that will be specified in the following, are used to optimize the predictions made by quantum models, specifically QNNs. These are based on the general notion of PQCs [33, 35, 37], consisting of a reference initial state $|\psi_0\rangle$, an output observable $\hat{O}$ and a model unitary $U(\mathbf{x}, \Theta)$, depending on both some input data $\mathbf{x}$ and a set of trainable parameters $\Theta$. The function expressed by a QNN takes the general form:

$$f(\mathbf{x}, \Theta) = \langle\psi_0|\, U^\dagger(\mathbf{x}, \Theta)\, \hat{O}\, U(\mathbf{x}, \Theta)\, |\psi_0\rangle \qquad (2)$$

and can be represented schematically as shown in figure 1. For our application of QNNs to PES learning, the Cartesian coordinates $\mathbf{x}_\alpha$ will play the role of the input $\mathbf{x}$. In practical realizations, we allow for an additional classical preprocessing map $W: \mathbf{x} \mapsto \mathbf{y} = W(\mathbf{x})$, which in our specific context can serve the purpose of converting between Cartesian coordinates and more suitable molecular descriptors, as well as enhancing the effective nonlinearity of the model [34].
Figure 1. Quantum neural network model. The input data $\mathbf{x}$ are preprocessed with the classical (yellow) function $W$ and encoded with the map $\Phi$ (blue). The variational layers (red) contain free parameters that are optimized to minimize the loss function $\mathcal{L}$. The symbol $\langle\cdot\rangle$ represents the computation of the expectation value of the observable through quantum measurements.
It is worth noticing that, while the term QNN is used to emphasize some significant similarities between parametrized quantum models and classical neural networks (NNs), most prominently the layered structure and the use of trainable parameters, some important differences also exist. Indeed, contrary to the usual feed-forward scheme of classical NNs, QNNs are quite naturally designed and interpreted in the form of re-uploading circuits [44, 45], where trainable layers $U_i(\theta_i)$ (red in figure 1) are alternated with data encoding ones $\Phi(\mathbf{y})$ (blue in figure 1), with the same classical input values appearing multiple times. The choice of this re-uploading architecture is actually related to a more fundamental difference between classical and quantum NNs, namely the fact that quantum unitary layers act linearly on quantum states, hence lacking an explicit implementation of the usual nonlinear activation function as a separate step. Instead, the nonlinear manipulation of the classical input data is essentially delegated to the choice of the encoding map, the trainable quantum gates and the final measurements.
It has actually been shown that the re-uploading mechanism makes QNNs universal function approximators [44–46]: more specifically, any QNN output function (equation (2)) can be recast into a truncated Fourier series, with a set of available independent frequencies determined by the eigenvalues of the encoding map and growing with the number of re-uploading steps. While recent results suggest that, by expanding the size of the quantum registers, these models can actually be mapped back onto simpler sequential ones in which all input operations appear at the beginning [47], one can still profit from the insights offered by the re-uploading picture as a guide for intuition in the actual design of application-specific QNNs. As an example, theory suggests that, by adjusting the number of input layers, the richness of the Fourier spectrum can be systematically increased.
The structure of QNNs is completed by choosing a physical observable $\hat{O}$ and a suitable loss function $\mathcal{L}$, whose minimization drives the update of the trainable parameters via a classical optimization routine. In the following, we will make the simple choice:

$$\hat{O} = \sigma_z^{(1)} \qquad (3)$$

namely we will take the expectation value of the Pauli-z operator on the first qubit to construct the network output. Moreover, we will use a quadratic mean square error (MSE) loss function:

$$\mathcal{L}(\Theta) = \frac{1}{|\mathcal{T}|} \sum_{\alpha} \left[ \left(E_\alpha - \tilde{E}_\alpha\right)^2 + \frac{\chi}{3n} \sum_{i,c} \left(F_{\alpha,i}^{c} - \tilde{F}_{\alpha,i}^{c}\right)^2 \right] \qquad (4)$$

with the hyperparameter $\chi$ weighting the contribution of energy and forces [48]. In practice, we directly associate the output of the QNN with the potential energy surface by defining:

$$\tilde{E}(\mathbf{x}, \Theta) = f(\mathbf{x}, \Theta) \qquad (5)$$

In parallel, consistent predictions for the forces are obtained by taking the derivative of the quantum circuit output with respect to the molecular coordinates:

$$\tilde{F}_{i}^{c}(\mathbf{x}, \Theta) = -\frac{\partial f(\mathbf{x}, \Theta)}{\partial x_{i}^{c}} \qquad (6)$$
We leave for future works the investigation of alternative strategies, such as the use of two independent quantum circuits for the separate learning of energies and forces.
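As a concrete illustration of the loss in equation (4), the following minimal Python sketch (an illustration with array shapes and function names of our choosing, not the code used for the experiments) evaluates the combined energy-force MSE for a batch of predictions:

```python
import numpy as np

def mse_loss(E_pred, E_ref, F_pred, F_ref, chi=1.0):
    """Combined energy/force loss in the spirit of equation (4).

    E_pred, E_ref: shape (T,) predicted and reference energies.
    F_pred, F_ref: shape (T, 3 * n) predicted and reference force components.
    chi: weight of the force contribution (chi = 0 recovers an energy-only loss).
    """
    energy_term = np.mean((E_pred - E_ref) ** 2)
    force_term = np.mean((F_pred - F_ref) ** 2)   # averaged over the 3n components
    return energy_term + chi * force_term
```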
2.2. Encoding layers
As mentioned in the previous section, the encoding operations $\Phi(\mathbf{y})$ are used to input the molecular configurations in the QNN model. Here, we assume for simplicity that these encoding unitaries are identical across different layers or re-uploading stages, i.e. $\Phi_i(\mathbf{y}) = \Phi(\mathbf{y})$ for all $i$. Moreover, let us denote by $N$ the linear dimension of the feature vector $\mathbf{y} = W(\mathbf{x})$, where in general $N$ can differ from $3n$ due to the preprocessing stage. Following standard practices [49], we will use $N$ qubits to encode and manipulate $N$-dimensional features $\mathbf{y}$.

The quantum feature map $\Phi(\mathbf{y})$ is then constructed according to the expression:

$$\Phi(\mathbf{y}) = U_{\mathrm{ent}}(\mathbf{y})\, R_y(\mathbf{y}) \qquad (7)$$

where $R_y(\mathbf{y})$ is a collection of single-qubit Pauli-y rotations:

$$R_y(\mathbf{y}) = \bigotimes_{j=1}^{N} e^{-i\, y_j\, \sigma_y^{(j)}/2} \qquad (8)$$

and $U_{\mathrm{ent}}(\mathbf{y})$ is an entangling operation of the form:

$$U_{\mathrm{ent}}(\mathbf{y}) = \prod_{(j,k) \in \mathcal{I}} e^{-i\, y_j y_k\, \sigma_z^{(j)} \sigma_z^{(k)}} \qquad (9)$$
The latter directly resembles the so-called ZZ feature map originally introduced in [49]. The set of qubit pairs $\mathcal{I}$ can be chosen in different ways, balancing ease of implementation on physical architectures with functional expressivity. Standard examples include linear entanglement:

$$\mathcal{I}_{\mathrm{lin}} = \{(j, j+1)\}_{j=1}^{N-1} \qquad (10)$$

circular entanglement:

$$\mathcal{I}_{\mathrm{circ}} = \mathcal{I}_{\mathrm{lin}} \cup \{(N, 1)\} \qquad (11)$$

and full entanglement:

$$\mathcal{I}_{\mathrm{full}} = \{(j, k)\ |\ 1 \leqslant j < k \leqslant N\} \qquad (12)$$

For further generality, we also define a natural extension of the ZZ feature map to degree $l$ interactions, adding factors of the form:

$$e^{-i\, y_{j_1} y_{j_2} \cdots y_{j_l}\, \sigma_z^{(j_1)} \sigma_z^{(j_2)} \cdots \sigma_z^{(j_l)}} \qquad (13)$$

where, in principle, the $l$ qubits $j_1, \ldots, j_l$ can be arbitrarily chosen among the $N$ available.
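These interaction sets are straightforward to generate programmatically; a minimal sketch of the pair lists of equations (10)-(12), with zero-based qubit indices and a helper name of our choosing, reads:

```python
from itertools import combinations

def entanglement_pairs(n_qubits: int, scheme: str = "linear"):
    """Qubit pair sets I of equations (10)-(12), zero-based indices."""
    linear = [(j, j + 1) for j in range(n_qubits - 1)]
    if scheme == "linear":
        return linear
    if scheme == "circular":
        return linear + [(n_qubits - 1, 0)]    # close the ring
    if scheme == "full":
        return list(combinations(range(n_qubits), 2))
    raise ValueError(f"unknown entanglement scheme: {scheme}")
```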
In figure 2 we explicitly show a 3-qubit example with a full 2-qubit entangling map and an additional $l = 3$ operation. All multi-qubit operations are already decomposed into a standard universal set made of single-qubit rotations and CNOTs [50], which is typical of superconducting quantum computing architectures.
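For concreteness, one encoding layer can be sketched with Qiskit's standard circuit API as follows. The construction uses the CNOT-plus-rotation decomposition just described, with the convention $R_z(\phi) = e^{-i\phi\sigma_z/2}$ so that an angle $2\, y_j y_k$ reproduces the factors of equation (9); the function name and signature are illustrative:

```python
from qiskit import QuantumCircuit

def encoding_layer(y, pairs, triples=()):
    """One feature map layer, equations (7)-(13), as Ry, Rz and CNOT gates.

    y: sequence of N classical features (already preprocessed by W).
    pairs/triples: index tuples selecting the l = 2 and l = 3 interactions.
    """
    n = len(y)
    qc = QuantumCircuit(n)
    for j in range(n):                     # single-qubit rotation block
        qc.ry(y[j], j)
    for j, k in pairs:                     # exp(-i y_j y_k Z_j Z_k)
        qc.cx(j, k)
        qc.rz(2 * y[j] * y[k], k)
        qc.cx(j, k)
    for j, k, m in triples:                # exp(-i y_j y_k y_m Z_j Z_k Z_m)
        qc.cx(j, k)
        qc.cx(k, m)
        qc.rz(2 * y[j] * y[k] * y[m], m)
        qc.cx(k, m)
        qc.cx(j, k)
    return qc
```

For example, `encoding_layer([0.1, 0.2, 0.3], entanglement_pairs(3, "full"), triples=[(0, 1, 2)])` reproduces the structure depicted in figure 2.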
Figure 2. Example of a full encoding map $\Phi(\mathbf{y})$ (blue) with single-qubit rotations (green), $l = 2$ (yellow) and $l = 3$ interactions (red) on $N = 3$ qubits.
Before moving to the description of the trainable part of the QNN models, let us also remark on a few points about the classical preprocessing step $W$ (see figure 1). As suggested above, this classical manipulation is generally used to enhance the nonlinear behaviour of the network, for example by taking inverse trigonometric functions of the original inputs [34]. At the same time, we can use this initial step to embed the relevant set of physical symmetries into the abstract representation of the target molecular systems seen by the QNN. This is known to be crucial already for classical ML methods, where the role of symmetry-preserving features is played, e.g. by the so-called symmetry functions [8, 10].
To limit the complexity of the quantum models, in the following we will use a set of internal coordinates, namely bond distances and angles, which by design respect translational and rotational symmetries. The integration of more advanced techniques, including fragmentation of large systems into local atomic environments, the use of other classes of molecular fingerprints and, possibly, the realization of symmetry adapted quantum circuits all represent natural future extensions of the present work.
We explicitly notice that the conversion between Cartesian and internal coordinates ($\mathbf{y} = W(\mathbf{x})$) must be taken into account when computing the quantum forces predictions, as these are defined with respect to the former class (equation (1)). As a result, equations (5) and (6) are more properly rewritten as:

$$\tilde{E}(\mathbf{x}, \Theta) = f(W(\mathbf{x}), \Theta) \qquad (14)$$

and

$$\tilde{F}_{i}^{c}(\mathbf{x}, \Theta) = -\sum_{j=1}^{N} \frac{\partial f(\mathbf{y}, \Theta)}{\partial y_j}\bigg|_{\mathbf{y} = W(\mathbf{x})} \frac{\partial W_j(\mathbf{x})}{\partial x_{i}^{c}} \qquad (15)$$
2.3. Trainable layers
The parametrized operations $U_i(\theta_i)$ represent the adjustable components of the QNN model. If suitably optimized during the learning phase, these select the most appropriate mapping from inputs (molecular coordinates) to outputs (energy and forces).

The precise structure of the $U_i(\theta_i)$ can vary significantly across different models and applications, with several popular choices known in the literature. For our specific functional regression problem, we follow the steps of Mitarai et al [34] and make use of a physics-inspired ansatz which is known to generate highly entangled states. In particular, we consider the following fully connected transverse field Ising Hamiltonian:

$$H = \sum_{j=1}^{N} a_j\, \sigma_y^{(j)} + \sum_{j<k} J_{jk}\, \sigma_z^{(j)} \sigma_z^{(k)} \qquad (16)$$

and we use the induced time evolution operator $e^{-iHt}$ as a template for the trainable layers. By using the approximate product formula:

$$e^{-iHt} \approx \left[\prod_{j=1}^{N} e^{-i\, a_j t\, \sigma_y^{(j)}}\right] \left[\prod_{j<k} e^{-i\, J_{jk} t\, \sigma_z^{(j)} \sigma_z^{(k)}}\right] \qquad (17)$$

and by replacing $a_j t$ and $J_{jk} t$ with free parameters, we obtain a unitary operation of the form:

$$U(\theta) = \left[\prod_{j=1}^{N} e^{-i\, \theta_j\, \sigma_y^{(j)}}\right] \left[\prod_{j<k} e^{-i\, \theta_{jk}\, \sigma_z^{(j)} \sigma_z^{(k)}}\right] \qquad (18)$$

where now $\theta$ is the collection of all $\theta_j$ and $\theta_{jk}$. It is easy to see that $U(\theta)$ can be implemented with single-qubit Pauli-y rotations and two-qubit ZZ operations, similarly to what happens for the encoding map $\Phi$ defined in the previous section. We stress however that, while the encoding layers are identical across different re-uploading stages, as they repeatedly input the same classical data $\mathbf{y}$, the trainable parameters $\theta_i$ are allowed to vary across different layers $U_i(\theta_i)$.
For the first layer $U_0$, we make the special choice $U_0(\theta_0) = R_y(\theta_0)$, namely we only use single-qubit rotations at the beginning of the computation (see figure 1). Moreover, we empirically find that including generalized $l$-qubit interactions up to $l = 3$, i.e. terms of the form $e^{-i\, \theta_{jkm}\, \sigma_z^{(j)} \sigma_z^{(k)} \sigma_z^{(m)}}$, improves the overall performances of the model at the cost of only a modest increase of circuit complexity. The resulting model unitary $U(\mathbf{x}, \Theta)$, as seen in figure 1, can thus be written as:

$$U(\mathbf{x}, \Theta) = \left[\prod_{i=D}^{1} U_i(\theta_i)\, \Phi(W(\mathbf{x}))\right] U_0(\theta_0) \qquad (19)$$

Notice that the product runs from $i = D$ to $i = 1$ from left to right, so that $U_0$ is the first operation applied to the initial state.
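A single trainable layer can be sketched along the same lines as the encoding layer above, with one free parameter per gate as in equation (18). The $l = 3$ trainable terms are omitted here for brevity; `ParameterVector` is Qiskit's container for symbolic circuit parameters, and the helper name is again of our choosing:

```python
from qiskit import QuantumCircuit
from qiskit.circuit import ParameterVector

def trainable_layer(n_qubits, pairs, label="theta"):
    """One trainable block in the spirit of equation (18): a free parameter
    per single-qubit Ry rotation and per two-qubit ZZ interaction."""
    params = ParameterVector(label, n_qubits + len(pairs))
    qc = QuantumCircuit(n_qubits)
    for j in range(n_qubits):                  # Ry rotations (transverse field terms)
        qc.ry(params[j], j)
    for idx, (j, k) in enumerate(pairs):       # parametrized ZZ couplings
        qc.cx(j, k)
        qc.rz(params[n_qubits + idx], k)
        qc.cx(j, k)
    return qc, params
```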
2.4. Model training
We train QNN models by minimizing an average quadratic error that in principle contains both energy and forces labels, see equation (4). In most instances, we make use of an update rule which follows the negative gradient direction of the loss function:

$$\Theta_{t+1} = \Theta_t - \eta\, \nabla_\Theta \mathcal{L}(\Theta_t) \qquad (20)$$

where $\eta$ is the learning rate and $\Theta_t$ the set of trainable parameters at the optimization step $t$. For energy predictions derived from a quantum circuit, derivatives with respect to any given trainable parameter $\mu$ can be easily computed with the parameter shift rule [51]:

$$\frac{\partial f(\mathbf{y}, \Theta)}{\partial \mu} = \frac{1}{2}\left[f(\mathbf{y}, \Theta_\mu^{+}) - f(\mathbf{y}, \Theta_\mu^{-})\right] \qquad (21)$$

where $\Theta_\mu^{\pm} = \left(\mu \pm \frac{\pi}{2}, \Theta_{\setminus\mu}\right)$ and $\Theta_{\setminus\mu}$ is the set of all trainable parameters without $\mu$. The corresponding result for the forces predictions is obtained with the iterative parameter shift rule [52], and schematically reads:

$$\frac{\partial^2 f(\mathbf{y}, \Theta)}{\partial y_j\, \partial \mu} = \frac{1}{4}\left[f\!\left(\mathbf{y} + \tfrac{\pi}{2}\hat{e}_j, \Theta_\mu^{+}\right) - f\!\left(\mathbf{y} - \tfrac{\pi}{2}\hat{e}_j, \Theta_\mu^{+}\right) - f\!\left(\mathbf{y} + \tfrac{\pi}{2}\hat{e}_j, \Theta_\mu^{-}\right) + f\!\left(\mathbf{y} - \tfrac{\pi}{2}\hat{e}_j, \Theta_\mu^{-}\right)\right] \qquad (22)$$

with $\hat{e}_j$ the standard basis vectors.
Notice that the parameter shift rule of equation (21) can also be used to retrieve the actual QNN forces predictions, namely:

$$\frac{\partial f(\mathbf{y}, \Theta)}{\partial y_j} = \frac{1}{2}\left[f\!\left(\mathbf{y} + \tfrac{\pi}{2}\hat{e}_j, \Theta\right) - f\!\left(\mathbf{y} - \tfrac{\pi}{2}\hat{e}_j, \Theta\right)\right] \qquad (23)$$

In this case, factors of the form $\partial f/\partial y_j$ must be computed from the quantum circuit, while the chain rule factors $\partial W_j/\partial x_{i}^{c}$ taking into account the classical preprocessing (equation (15)) are known analytically and are determined solely by the mapping function $W$.
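A schematic implementation of these rules is sketched below, under the simplifying assumption that each parameter or feature enters a single gate of the form $e^{-i\theta\sigma/2}$; repeated appearances generated by re-uploading and by the product terms require the iterative treatment of [52]. The argument `jac_W` stands for the analytic preprocessing Jacobian just discussed:

```python
import numpy as np

def parameter_shift_grad(f, params, shift=np.pi / 2):
    """Gradient of a circuit expectation value via equation (21), assuming each
    entry of `params` feeds a single gate of the form exp(-i * p * P / 2)."""
    grad = np.zeros_like(params, dtype=float)
    for m in range(len(params)):
        e_m = np.zeros_like(params, dtype=float)
        e_m[m] = shift
        grad[m] = 0.5 * (f(params + e_m) - f(params - e_m))
    return grad

def qnn_forces(f_of_y, y, jac_W):
    """Force predictions of equation (15): shift-rule gradient with respect to
    the encoded features y, chained with the analytic Jacobian of W."""
    dfdy = parameter_shift_grad(f_of_y, y)   # circuit derivatives df/dy_j
    return -dfdy @ jac_W                     # jac_W[j, c] = dW_j / dx_i^c
```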
At the beginning of the training, all free parameters are initialized at zero, so that each hidden layer acts as the identity operator. This essentially follows the recommendations given in [53] to promote effective optimization steps in the early training phase.
2.5. Effective dimension
In the remaining part of this work, we will provide a series of examples to demonstrate how QNN models are able to learn and give consistent predictions of PES and FFs for individual molecules. The overall performances will be assessed primarily through the evaluation of the average root MSE on suitable test sets.
In addition to that, we will also make use of the concept of effective model dimension—originally introduced in [39]—to inform the comparison with classical counterparts and to investigate potential advantages. As a brief summary, the effective dimension quantifies the capacity of both classical and quantum parametrized ML models [54] by measuring the portion of model space that they occupy. In other words, it estimates the capability of a model in covering the functional space defined by a particular model class by making a productive use of all its parameters, going beyond naive parameter-counting arguments. A high effective dimension is therefore related to a richer set of expressible functions and better trainability, as it can lead to a more favourable landscape for gradient based methods [39]. Moreover, the effective dimension can sensibly bound the model generalization error [39, 54].
For a given statistical model $p(x, y; \theta)$, the effective dimension is defined as:

$$d_{\mathrm{eff}} = 2\, \frac{\log\left(\frac{1}{V_\Theta} \int_\Theta \sqrt{\det\left(\mathbb{1}_d + \frac{\gamma\, \mathfrak{n}}{2\pi \log \mathfrak{n}}\, \hat{F}(\theta)\right)}\, \mathrm{d}\theta\right)}{\log\left(\frac{\gamma\, \mathfrak{n}}{2\pi \log \mathfrak{n}}\right)} \qquad (24)$$

where $\Theta \subset \mathbb{R}^d$ is a $d$-dimensional parameter space of volume $V_\Theta$, $\gamma \in (0, 1]$ is a constant and $\mathfrak{n}$ is the number of data samples. Here, we have also introduced the Fisher information matrix:

$$F_{ij}(\theta) = \mathbb{E}_{(x,y) \sim p}\left[\frac{\partial}{\partial \theta_i} \log p(x, y; \theta)\, \frac{\partial}{\partial \theta_j} \log p(x, y; \theta)\right] \qquad (25)$$

$$\hat{F}_{ij}(\theta) = d\, \frac{V_\Theta}{\int_\Theta \mathrm{tr}\left(F(\theta)\right) \mathrm{d}\theta}\, F_{ij}(\theta) \qquad (26)$$

with $p(x, y; \theta) = p(y|x; \theta)\, p(x)$ being the probability distribution describing the model. The effective dimension is bounded by the rank of the Fisher information matrix and is usually normalized with the total number of parameters $d$.
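For reference, the following sketch evaluates equation (24) by Monte Carlo sampling of the parameter space, given a batch of Fisher matrices estimated at random parameter sets. It is an illustration of the definition in [39], not the original implementation:

```python
import numpy as np

def effective_dimension(fishers, n_data, gamma=1.0):
    """Effective dimension of equation (24).

    fishers: array of shape (k, d, d) with Fisher matrices sampled at k
             random parameter sets; n_data: number of data samples.
    """
    k, d, _ = fishers.shape
    # Normalization of equation (26): the average trace equals d.
    f_hat = d * fishers / np.mean(np.trace(fishers, axis1=1, axis2=2))
    c = gamma * n_data / (2 * np.pi * np.log(n_data))
    # Stable evaluation of sqrt(det(1 + c * F_hat)) via log-determinants.
    _, logdets = np.linalg.slogdet(np.eye(d) + c * f_hat)
    log_avg = np.log(np.mean(np.exp(0.5 * logdets)))
    return 2 * log_avg / np.log(c)
```

Dividing the result by `d` yields the normalized effective dimension quoted in the tables below.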
3. Results
3.1. Diatomic molecules
3.1.1. Lithium hydride
As a first proof-of-concept, we consider a single LiH molecule and we design a QNN model that learns its dissociation curve, and the corresponding FF, as a function of the bond length r.
The presence of a single internal coordinate, which is obtained from the Cartesian positions of the Li and H atoms with the mapping $\mathbf{x} \mapsto r = \|\mathbf{x}_{\mathrm{Li}} - \mathbf{x}_{\mathrm{H}}\|$, makes the problem effectively one dimensional. We construct a classical dataset by numerically diagonalizing the second quantized Hamiltonian expressed in the STO-3G basis set [41] for different bond lengths $r$, spanning a range that extends up to 4.5 Å. Forces are computed via finite differences over the exact PES.

To make better use of the Fourier series structure of the QNN, we make the exact PES symmetric by mirroring it around r = 4.5 Å and selecting the data set over the extended range, see appendix A. The input coordinate is first rescaled with the scikit-learn MinMaxScaler, then we apply the map:

$$\qquad (27)$$

We employ a 3-qubit QNN model with $D = 10$ trainable layers, alternated with the same amount of input stages. Notice that, even if a single qubit could in principle be used to encode the 1-dimensional problem under study, we instead choose to engineer 3-dimensional feature vectors from the single independent internal coordinate available (the interatomic distance) to enrich by design the set of expressible functions and to enable the use of genuine multi-qubit quantum resources such as entanglement. More in general, a very common choice when using a parametric type of encoding is to use one qubit per feature. However, similarly to the depth of the model and the overall number of trainable parameters, this is by no means a strict requirement and could be freely modified, subject to appropriate convergence tests, to improve performances or in response to additional constraints (e.g. a limited number of qubits or finite quantum coherence). We use a full feature map, as introduced in section 2.2, with $l = 3$ (see also figure 2), and similar trainable layers with full entanglement and degree $l = 3$ interactions. The input of the feature map is $\mathbf{y} = W(r)$, with $W$ as in equation (27) and $r$ the Li-H inter-atomic distance.
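The preprocessing pipeline can be sketched as follows. The arcsin-style map below is purely illustrative, in the spirit of [34], and merely stands in for the actual map of equation (27); the bond length grid and the feature range are also placeholders:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Bond lengths on the training range; the grid values are placeholders.
r = np.linspace(0.5, 4.5, 50).reshape(-1, 1)

# Rescale with the scikit-learn MinMaxScaler, as described in the text
# (the feature range is chosen here only for illustration).
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(r)
r_scaled = scaler.transform(r)

def preprocess(rp):
    # Hypothetical stand-in for equation (27): three features engineered
    # from the single scaled coordinate via inverse trigonometric functions.
    return np.arcsin(np.column_stack([rp, rp**2, rp**3]))

y = preprocess(r_scaled)   # shape (50, 3): one 3-feature vector per sample
```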
We benchmark our model against a fully connected classical NN, which is given access to the same set of seven engineered inputs and products thereof shown in figure 2. The classical NN is composed of five layers of units with hyperbolic tangent activation functions. Such a network topology is chosen through a random search over all possible configurations with the same number of parameters (identical to the QNN), selecting the one that achieves the best validation loss.
Both the classical and the QNN models are trained with 4000 steps of the ADAM [55] algorithm on 50 data points using the energy-only ($\chi = 0$) loss. We remark that the small size of the classical NN makes it quite sensitive to parameter initialization, even when using the Glorot and Bengio [56] scheme. Hence, the model underfits in a significant fraction of the cases and needs 1600 epochs to reach convergence. On the other hand, the QNN appears to be robust against weight initialization and converges in 230 epochs.

The validation root mean square errors (RMSE) for both models, computed on a test set with 120 data points, are given in table 1 and the respective predictions are shown in figure 3. We notice that, in this setting, the QNN outperforms the best classical counterpart with the same number of parameters in terms of stability and quality of the predictions, and also exhibits a larger effective dimension. Although the classical model can match the QNN results in absolute terms when its size grows or if the forces are explicitly included in the loss function ($\chi \neq 0$), the present comparison, supported by the effective dimension analysis, certifies the competitiveness of the proposed quantum models. In appendix B, we further employ the trained quantum model to drive a molecular dynamics simulation of the LiH molecule.
Figure 3. Prediction of the LiH energy (shifted by −212.8 eV) (a) and force (b) as a function of the inter-atomic distance r. The exact solution (black line) is compared with the QNN (red dots) and the classical NN (green stars). Insets show an enlarged view of the energy minimum region.
Table 1. Validation RMSE for the LiH energy (eV) and forces (eV Å−1), together with the normalized effective dimension ($d_{\mathrm{eff}}/d$), the total number of parameters ($d$) and the number of training epochs for both the QNN and the classical NN models.

| LiH | RMSE (E) | RMSE (F) | $d_{\mathrm{eff}}/d$ | $d$ | Epochs |
|---|---|---|---|---|---|
| QNN | 0.003 | 0.044 | 0.9 | 73 | 230 |
| NN | 0.034 | 0.21 | 0.38 | 73 | 1600 |
3.1.2. Argon dimer
As a second experiment, we consider two Argon atoms separated by a bond length $r$. This example covers a richer set of intermolecular interactions, as Ar2 is loosely bound via Van der Waals forces. The dataset is constructed using density functional theory (DFT/B3LYP with Grimme D3 corrections, as implemented in Gaussian [57]), and smoothed by fitting to an analytical model function.

The training strategy is almost identical to the LiH case, including the QNN architecture (with $D = 9$) and the dataset mirroring. One notable difference is the use of the COBYLA optimizer, which empirically showed better performance than the ADAM one. The benchmark is performed against a fully connected classical NN, taking as input the bond length, composed of ten layers of hidden units. The architecture is chosen with a random search through the models with the same number of parameters as the QNN and a variable number of inputs, selecting the one that achieves the best validation loss.
The validation RMSEs for both models, computed on a test set with 120 data points, are given in table 2 and the respective predictions are shown in figure 4. As for the LiH case, we observe that the QNN outperforms the best classical counterpart with the same number of parameters in terms of stability and quality of the predictions, and also exhibits a larger effective dimension. Notably, the region around the PES minimum is better reconstructed by the QNN than by the classical NN.
Figure 4. Prediction of the Ar2 energy (shifted by −1.086 eV) (a) and force (b) as a function of the inter-atomic distance r. The exact solution (black line and crosses) is compared with the QNN (red dots) and the classical NN (green stars). Insets show an enlarged view of the energy minimum region.
Table 2. Validation RMSE for the Ar2 energy (eV) and forces (eV Å−1), together with the normalized effective dimension ($d_{\mathrm{eff}}/d$) and the total number of parameters ($d$) for both the QNN and the classical NN models.

| Ar2 | RMSE (E) | RMSE (F) | $d_{\mathrm{eff}}/d$ | $d$ |
|---|---|---|---|---|
| QNN | | 0.058 | 0.77 | 66 |
| NN | | 0.6 | 0.11 | 66 |
3.2. A single H2O molecule
We move towards the multi-dimensional case by considering an individual H2O molecule. For this more challenging test, we choose 3 internal coordinates, namely the two O–H bond lengths and the H–H planar angle, which we rescale again to a fixed interval and preprocess to create the following 3-dimensional feature vector:

which is then input into the model via the feature map. The classical configurations and the corresponding energy and forces labels, computed with DFT, are retrieved from [58]. To simplify the problem, we concentrate on configurations around the equilibrium position, discarding those with bond lengths outside a restricted range.
Our QNN model is constructed similarly to the LiH case, with three qubits and a depth of D = 12 encoding and trainable layers. We compare again its results with those of a classical NN model with the same number of parameters and designed with the same conditions applied in the LiH case presented above. Furthermore, we also use as a reference the state-of-the-art specialized n2p2 [59] classical package, which makes full use of symmetry functions and of the sub-network fragmentation idea [8], and from which we expect peak performance.
More specifically, the simplified classical NN model is a fully connected 6-layer network of units with hyperbolic tangent activation functions. Instead, the n2p2 network uses three sub-networks with two hidden layers, each with 15 units, and hyperbolic tangent activation. This model takes as input a set of 15 symmetry functions for the Oxygen atom and 20 for the two Hydrogen ones. The models are trained by minimizing a combined energy-force loss function (equation (4)) on 300 data points (1000 for n2p2). Notice that here we set $\chi = 1$: indeed, as pointed out in [60], it is important to incorporate the forces in the training if we wish to predict them accurately. However, this makes a full gradient descent computationally very intensive for the simulation of the quantum model, as it requires the calculation of the circuit Hessian matrix with recursive parameter shifts (see equation (22)). For this reason, we choose the gradient-free optimizer COBYLA for the present demonstration, while the classical models are trained with the ADAM method.
The validation loss results (computed on a test set with 650 data points) are presented in table 3, while figure 5 shows the predicted energy and forces against the reference data points. The QNN is once again competitive with respect to classical counterparts, outperforming the non-specialized classical NN and reporting the largest normalized effective dimension.
Figure 5. Predicted H2O energy (a) and forces (b) from the QNN (red crosses), the generic NN (green crosses) and the n2p2 (blue crosses) models compared to reference DFT data points (black line).
Table 3. Validation RMSE for the H2O energy (eV) and forces (eV Å−1), together with the normalized effective dimension ($d_{\mathrm{eff}}/d$) and the total number of parameters ($d$) for the QNN and the classical NN and n2p2 models.

| H2O | RMSE (E) | RMSE (F) | $d_{\mathrm{eff}}/d$ | $d$ |
|---|---|---|---|---|
| QNN | 0.005 | 0.06 | 0.72 | 87 |
| NN | 0.006 | 0.1 | 0.25 | 87 |
| n2p2 | | 0.01 | 0.04 | 1642 |
3.3. Umbrella motion of hydronium
In our last test, we consider a single hydronium (H3O+) molecule. Following [61], we prepare a training set by sampling configurations traversed along the inversion pathway shown in figure 6 using DFT-based MD simulations at 400 K and collecting the corresponding energies and forces. In order to sample the full profile along the dihedral angle 'HHHO' from −0.78 to 0.78 rad, we also applied a dynamical constraint with increments of 0.005 rad at each MD time step. All calculations were performed with the plane-wave (PW) code CPMD [62] using unrestricted Kohn–Sham DFT with the PBE functional [63], a PW cutoff of 70 Ry, a cubic simulation box of edge 14 Å, and Troullier–Martins pseudo-potentials [64]. For the MD, a time step of 10 a.u. (0.242 fs) was used.
Figure 6. Predicted H3O+ energy (shifted by −474.45 eV) from the QNN (red), the generic classical NN (green) and the n2p2 (blue) models as a function of the dihedral angle (a) and compared to the reference DFT energy (b). The molecule is represented at the two energy minima, where the oxygen atom sits above or below the plane formed by the hydrogen ones, and at the saddle point, where the oxygen and hydrogen atoms are co-planar.
We describe each configuration with 6 internal degrees of freedom, namely three O–H bond lengths, two O–H angles and the dihedral angle formed by the four atoms (see appendix C). These coordinates are rescaled and preprocessed, and then loaded into the model via the feature map, see appendix D.
A 6-qubit QNN model is constructed with $D = 10$ repetitions of the encoding map and the trainable layers, both with linear entanglement and order $l = 3$ interactions. The latter are also placed with linear couplings across all neighbouring 3-qubit groups. Due to the complexity of the system and the high computational cost, we only include energy labels in the training set, leaving a more refined study of the forces for future investigations. Hence, we train the model by minimizing an energy-only ($\chi = 0$) loss on 500 data points (9000 for n2p2) with 5000 steps of the ADAM algorithm.
As in the previous examples, we compare the QNN model with a classical NN, whose topology is optimized under the constraint that the number of trainable parameters be the same as in the QNN, and with an n2p2-built model. In this case, the classical NN has [6, 14, 2, 1] units, while the n2p2 NN is composed of four sub-networks, each one with a structure similar to that used for the H2O molecule in section 3.2.
The RMSE results for a test set with 500 data points are reported in table 4, and energy predictions are also depicted in figure 6. Here, the QNN model is still competitive with respect to the generic classical NN, although both of them are clearly outperformed by the specialized and much larger n2p2 model. It is also worth noticing that the forces predictions are much worse than the energy ones, which confirms the necessity of including them in the loss function to achieve good results in non-trivial systems.
Table 4. Validation RMSE for the H3O+ energy (eV) and forces (eV Å−1), together with the normalized effective dimension ($d_{\mathrm{eff}}/d$) and the total number of parameters ($d$) for the QNN and the classical NN and n2p2 models.

| H3O+ | RMSE (E) | RMSE (F) | $d_{\mathrm{eff}}/d$ | $d$ |
|---|---|---|---|---|
| QNN | | 0.26 | 0.67 | 135 |
| NN | | 0.207 | 0.19 | 135 |
| n2p2 | | 0.19 | 0.03 | 2214 |
4. Conclusions
In this work we have successfully demonstrated the systematic application of QML techniques, and specifically of parametrized QNNs, to the problem of learning molecular PES from classical ab initio data sets and generating the corresponding molecular FFs. In all our numerical simulations, the proposed QNN models already achieved competitive performances with respect to comparable classical ones, reporting good prediction accuracy for several paradigmatic single-molecule examples.
The present assessment naturally opens several questions and future research directions. On the one hand, the design of more specialized QNN architectures and molecular descriptors would allow a more refined and effective treatment of the problem, including the possibility of tackling bulk materials. For example, classical state-of-the-art implementations [59] crucially benefit from the fragmentation of large systems into local environments and corresponding sub-networks [8]. Similar techniques could possibly be engineered for QNNs, e.g. by resorting to more general perceptron-based models [30, 31]: these could explicitly be partitioned into local components, and entangling connections between different sub-networks could eventually be realized. In addition to that, the development of efficient quantum gradient estimation protocols, which currently represent one of the main algorithmic bottlenecks, will be crucial for larger scale implementations of the proposed methods and for an adequate treatment of forces. It is also worth mentioning that another viable alternative may be represented by quantum kernel methods [49], whose classical counterparts are already extensively used in the context of ML potentials [9], for example in combination with Coulomb matrix descriptors [11].
On the other hand, the most interesting open questions concern the potential quantum advantages brought by QML approaches. In this work, we focused on the problem of learning from classical data and we observed, following [39], that suitably designed QNN models exhibit a larger effective dimension than their classical equivalents. This in turn relates to stable and fast training capabilities, and points toward a more effective handling of larger systems. Indeed, large normalized effective dimensions signal, in the spirit of capacity measures, an effective use of the available model parameters, thus suggesting that quantum models could represent economic and manageable tools to tackle large molecular simulations.
At the same time, this analysis does not yet take into account the role of overparametrization, which is known to contribute in a crucial way to the performance of classical NNs. In fact, our numerical experiments confirm that large classical models, such as the ones employed in sections 3.2 and 3.3, still achieve the best prediction and generalization accuracy. Interestingly, while the normalized effective dimension of those classical models is quite small, the absolute effective dimension is actually comparable to that of their direct quantum counterparts, thus hinting at some form of 'computational phase transition' in their behaviour [66–68]. A few similar observations have already been made for QML models [68, 69], and a thorough exploration of QNN capabilities in such a highly overparametrized regime (including, among others, the question of whether this could be realized in a more effective way or with different qualitative behaviours compared to classical models) represents an interesting open research direction in general. We leave a complete analysis of its application in the context of NNPs for future investigations.
On a larger perspective, one may also envisage the use of QML methods on quantum data [20, 29, 70] retrieved from experiments or quantum chemistry simulations, e.g. in the form of quantum wavefunctions generated through variational [41, 71] or dynamical methods. The extraction of physical/chemical properties and the classification of materials directly at the quantum level of description could then likely represent one of the most advanced and exploratory efforts in the quest for quantum advantage with QML.
Data availability statement
The data and code that support the findings of this study are available upon reasonable request from the authors.
Acknowledgments
We thank Amira Abbas, Julian Schuhmacher and Stefan Wörner for insightful discussions. We acknowledge financial support from the Swiss National Science Foundation (SNF) through the Grant No. 200021-179312. O K acknowledges financial support through a scholarship from the Dominik and Patrick Gemperlé Foundation.
IBM, the IBM logo, and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. The current list of IBM trademarks is available at www.ibm.com/legal/copytrade.
Appendix A.: Mirroring
As we recalled in the main text, QNN models introduced in equation (2) can be expressed as partial Fourier series [44]:

$$f(\mathbf{x}, \Theta) = \sum_{n \in \Omega} c_n\, e^{i\, n \cdot \mathbf{x}} \qquad (A1)$$

where $\Omega$ is the set of available frequencies and depends exclusively on the encoding map, while the coefficients $c_n$ are determined by the trainable unitary gates and the observable. QNNs of this form become universal function approximators in the limit $|\Omega| \to \infty$. In practice, we make use of the Fourier series interpretation of QNNs and of Fourier analysis to guide model design, for example by changing the model complexity (in our case essentially determined by the depth $D$ for a fixed qubit number) to control the number of available frequencies.

In simple proof-of-principle experiments, such as LiH, we can also compare the model spectrum with that of the data, although this may not be possible in general. We also empirically find that better performances are obtained for periodic data sets, which intuitively correspond to finite frequency spectra and hence reduce spurious oscillations in the final solution. We apply this intuition to the 1-dimensional LiH and Ar2 cases, whose PES can be artificially made periodic by mirroring around r = 4.5 Å (for LiH) and r = 6.3 Å (for Ar2), as shown in figure 7.
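A minimal sketch of the mirroring construction (assuming sorted, equally spaced bond lengths; the helper name is our own) reads:

```python
import numpy as np

def mirror_dataset(r, E, r_mirror):
    """Reflect a sampled PES around r = r_mirror (appendix A), producing an
    (approximately) periodic extension of the data set."""
    r_ext = np.concatenate([r, 2.0 * r_mirror - r[::-1]])   # mirrored abscissae
    E_ext = np.concatenate([E, E[::-1]])                    # mirrored energies
    return r_ext, E_ext

# e.g. LiH mirrored around 4.5 Å, Ar2 around 6.3 Å
```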
Figure 7. LiH data set (blue) and the corresponding mirrored extension (red).
Appendix B.: LiH—molecular dynamics
The motivation behind both quantum and classical NNPs is ultimately to provide computationally effective access to highly accurate FFs to drive MD. In this section, we provide a first demonstration of a quantum NNP used to simulate the oscillations of a LiH molecule around its equilibrium position. We use the Velocity Verlet algorithm [72] to compute the time evolution of inter-atomic distance and velocity, starting from some given initial conditions. At each point, the forces are predicted with the trained QNN presented in section 3.1. For comparison, we also present results obtained with the classical NNP introduced in section 3.1 of the main text.
We start with the atoms at rest and close to each other, with an initial inter-atomic distance of 1.05 Å, and let the system evolve according to Newton's equations. The time evolution of the inter-atomic distance is shown in figure 8(a), where we compare the results obtained with classical and quantum NNPs to the exact solution obtained numerically. To assess the overall quality of the trajectories, we compute the frequency spectrum of the oscillations with a Fourier transform, which we apply to an evolution of 0.6 fs, repeated 1000 times and passed through a Hamming window. As reported in figure 8(b), the quantum and classical models can accurately reproduce the dominant frequencies.
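The integrator itself is standard; a compact sketch of the velocity Verlet scheme [72], driven by a generic force callback standing in for the trained NNP, is given below. When propagating the relative coordinate of a diatomic, `mass` should be the reduced mass of the pair:

```python
import numpy as np

def velocity_verlet(r0, v0, force, mass, dt, n_steps):
    """Velocity Verlet integration [72] of a 1D bond coordinate; `force` is
    a callback returning the NNP force prediction at a given distance."""
    r, v = r0, v0
    a = force(r) / mass
    trajectory = [r]
    for _ in range(n_steps):
        r = r + v * dt + 0.5 * a * dt ** 2   # position update
        a_new = force(r) / mass              # force from the trained model
        v = v + 0.5 * (a + a_new) * dt       # velocity update
        a = a_new
        trajectory.append(r)
    return np.array(trajectory)
```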
Figure 8. Molecular dynamics of a LiH molecule with initial conditions $r(0) = 1.05$ Å and $v(0) = 0$ (at.u.). The exact solution (blue) is compared to the classical NNP (green) and to the quantum NNP (red) ones.
Appendix C.: H3O+—internal coordinates
A hydronium molecule has $3 \times 4 - 6 = 6$ internal degrees of freedom. To describe its configurations, we used three bond lengths ($r_1$, $r_2$ and $r_3$), two angles ($\theta_1$ and $\theta_2$) and one dihedral angle ($d_{\mathrm{HHHO}}$). We used the following definition for the dihedral angle with atoms $(ijkl)$:

$$\mathbf{n}_1 = \mathbf{r}_{ij} \times \mathbf{r}_{jk}, \qquad \mathbf{n}_2 = \mathbf{r}_{jk} \times \mathbf{r}_{kl} \qquad (C1)$$

$$d_{ijkl} = \mathrm{atan2}\left(|\mathbf{r}_{jk}|\; \mathbf{r}_{ij} \cdot \mathbf{n}_2,\; \mathbf{n}_1 \cdot \mathbf{n}_2\right) \qquad (C2)$$

where $\mathbf{r}_{ij}$ is the distance vector between atoms $i$ and $j$. The above definitions and the formulas for their gradients can be found in [65].
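An equivalent NumPy evaluation of equations (C1) and (C2) is sketched below, with atom positions as 3-vectors and the sign convention following the atan2 construction above:

```python
import numpy as np

def dihedral(x_i, x_j, x_k, x_l):
    """Dihedral angle for atoms (ijkl), equations (C1) and (C2)."""
    b1, b2, b3 = x_j - x_i, x_k - x_j, x_l - x_k
    n1 = np.cross(b1, b2)     # normal to the (i, j, k) plane
    n2 = np.cross(b2, b3)     # normal to the (j, k, l) plane
    return np.arctan2(np.linalg.norm(b2) * np.dot(b1, n2), np.dot(n1, n2))
```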
Appendix D.: Explicit QNN construction
In this section we show the explicit construction of the QNN in the case of a H2O molecule. The data $\mathbf{x}$ consist of three Euclidean coordinates for each of the three atoms, leading to nine features per individual configuration. However, by taking into account the planar structure of the molecule, as well as rotation and translation symmetries, we can define a set of three independent internal coordinates. We choose them as the two O–H bond lengths and the H–H planar angle,

$$\mathbf{x} \mapsto \left(r_{\mathrm{OH}_1}, r_{\mathrm{OH}_2}, \varphi\right) \qquad (D1)$$

which we rescale and preprocess to create the following 3-dimensional feature vector:
For each data point in the training set and for any fixed value of the trainable parameters $\Theta$ during the model learning process, we compute the QNN output, defined as (see equation (2)):

$$f(\mathbf{y}, \Theta) = \langle\psi_0|\, U^\dagger(\mathbf{y}, \Theta)\, \sigma_z^{(1)}\, U(\mathbf{y}, \Theta)\, |\psi_0\rangle \qquad (D3)$$

by executing the corresponding quantum circuit and measuring the expectation value of $\sigma_z^{(1)}$. As discussed in the main text, see section 2.1, the QNN model unitary $U(\mathbf{y}, \Theta)$ is constructed as a repetition of trainable and data encoding layers. The feature map is chosen as:

$$\Phi(\mathbf{y}) = U_{\mathrm{ent}}(\mathbf{y})\, R_y(\mathbf{y}) \qquad (D4)$$

where $R_y(\mathbf{y})$ is a collection of single-qubit Pauli-y rotations:

$$R_y(\mathbf{y}) = e^{-i\, y_1\, \sigma_y^{(1)}/2}\; e^{-i\, y_2\, \sigma_y^{(2)}/2}\; e^{-i\, y_3\, \sigma_y^{(3)}/2} \qquad (D5)$$

and $U_{\mathrm{ent}}(\mathbf{y})$ is an entangling operation. Here, $\sigma_a^{(i)}$ ($a = x, y, z$) correspond to the Pauli matrices:

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \qquad (D6)$$

acting on the $i$th qubit, i.e.

$$\sigma_a^{(i)} = \mathbb{1}^{\otimes (i-1)} \otimes \sigma_a \otimes \mathbb{1}^{\otimes (N-i)} \qquad (D7)$$
Notice that equation (D5) is written explicitly for the 3-qubit model that was used for H2O, and it corresponds to the first set of operations highlighted in green in figure 2. As explained in section 3.2, we then used a full feature map with an order $l = 3$ interaction, i.e.

$$U_{\mathrm{ent}}(\mathbf{y}) = e^{-i\, y_1 y_2\, \sigma_z^{(1)} \sigma_z^{(2)}}\; e^{-i\, y_1 y_3\, \sigma_z^{(1)} \sigma_z^{(3)}}\; e^{-i\, y_2 y_3\, \sigma_z^{(2)} \sigma_z^{(3)}}\; e^{-i\, y_1 y_2 y_3\, \sigma_z^{(1)} \sigma_z^{(2)} \sigma_z^{(3)}} \qquad (D8)$$

as explicitly depicted in the quantum circuit of figure 2 for $N = 3$. In particular, the order $l = 2$ operations (yellow in figure 2) are decomposed as:

$$e^{-i\, y_j y_k\, \sigma_z^{(j)} \sigma_z^{(k)}} = \mathrm{CX}_{jk}\; R_z^{(k)}(2\, y_j y_k)\; \mathrm{CX}_{jk} \qquad (D9)$$

while $l = 3$ ones (red in figure 2) read:

$$e^{-i\, y_j y_k y_m\, \sigma_z^{(j)} \sigma_z^{(k)} \sigma_z^{(m)}} = \mathrm{CX}_{jk}\; \mathrm{CX}_{km}\; R_z^{(m)}(2\, y_j y_k y_m)\; \mathrm{CX}_{km}\; \mathrm{CX}_{jk} \qquad (D10)$$

here, $R_z^{(m)}(\phi) = e^{-i\, \phi\, \sigma_z^{(m)}/2}$ is a z-rotation on qubit $m$, and $\mathrm{CX}_{jk}$ denotes a CNOT [21] operation between qubits $j$ and $k$.
The trainable layers follow the same structure, except that the data entries of the form $y_j$, $y_j y_k$ and $y_1 y_2 y_3$ are replaced with distinct free parameters $\theta_m$, where $m$ is an index assigned to each trainable gate.
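Putting the pieces together, the full 3-qubit model can be assembled as in the following sketch, which reuses the illustrative `encoding_layer` and `trainable_layer` helpers defined in sections 2.2 and 2.3 and evaluates the expectation value of equation (D3) with an exact statevector simulation. The $l = 3$ trainable terms are omitted, so the parameter count differs slightly from the $d = 87$ quoted in table 3:

```python
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import SparsePauliOp, Statevector

N, D = 3, 12                          # qubits and layer repetitions for H2O
pairs = [(0, 1), (0, 2), (1, 2)]      # full 2-qubit entanglement
triples = [(0, 1, 2)]                 # single l = 3 interaction

def qnn_output(y, thetas):
    """Expectation value of equation (D3) for a feature vector y.

    thetas[0] (length N) parametrizes the Ry-only layer U_0; each of the
    following D entries (length N + len(pairs)) parametrizes one U_i.
    """
    qc = QuantumCircuit(N)
    u0, _ = trainable_layer(N, pairs=[], label="t0")
    qc.compose(u0.assign_parameters(thetas[0]), inplace=True)
    for i in range(D):
        qc.compose(encoding_layer(y, pairs, triples), inplace=True)
        ui, _ = trainable_layer(N, pairs, label=f"t{i + 1}")
        qc.compose(ui.assign_parameters(thetas[i + 1]), inplace=True)
    state = Statevector.from_instruction(qc)
    return np.real(state.expectation_value(SparsePauliOp("IIZ")))  # Z on qubit 0

# Identity initialization of section 2.4: all free parameters start at zero.
thetas0 = [np.zeros(N)] + [np.zeros(N + len(pairs)) for _ in range(D)]
```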