1 Introduction

Adapting well-known machine learning algorithms to quantum computing has been shown to bring advantages. These advantages appear in the final accuracy of the model or even in a simplified set of parameters. Grant et al., for instance, show these accuracy advantages in their work evaluating different quantum machine learning algorithms [1]. Other authors who evidence these advantages are Biamonte et al., who provide an overview of different quantum machine learning algorithms [2]. Experimentally, other quantum achievements have gained merit over their classical counterparts [3, 4]. There is also the benefit of reduced complexity, as Pastorello et al. propose in [5] with a simple swap test equivalent to a two-layer feed-forward neural network.

Having algorithms that provably work better in some aspects than their classical counterparts is indeed promising [6,7,8,9,10]. However, in the real-world scenario of open quantum systems, the expected behavior is not the actual one. Piatrenka et al. demonstrate that running some algorithms on a real quantum computer can lead to less accurate results [11]. Fortunately, this issue can be circumvented by exploiting the unavoidable interaction between system and environment in an open quantum system. From this broader vision, intelligent quantum algorithms based on open quantum systems emerge. One example is the work of Korkmaz et al., who propose a scheme where the system is coupled to thermal reservoirs. The input data are encoded in these reservoirs, and it is possible to show that this system works as a natural classifier resembling the perceptron from Rosenblatt [12, 13]. As done by Korkmaz et al., investigating open quantum algorithms for machine learning is one alternative to work around the gap between the expected and the actual behavior of quantum computing.

Several other works follow this paradigm. Wang et al. propose a classifier that consists of a QuBit of interest influenced by the dissipative dynamics of auxiliary QuBits dispersed in the environment [14]. Still on the idea of quantum dissipative models, Korkmaz et al. present a full study of the learning task on a collision-based quantum classifier [15], showing that the model is suitable for gradient descent training. Zhang et al. bring another approach by using the environment to carry the influence of each instance attribute: the normalized numerical value of each attribute is spread over the environment, which is combined with a one-QuBit system through a unitary operator derived from the Hamiltonian of the system, outputting the probability of the instance belonging to class 0 or 1 [16]. In this work, the model from Zhang et al. will be referred to as IQC, short for Interactive Quantum Classifier. In the same direction of using open quantum systems as classifier models, Türkpençe et al. suggest the use of different quantum reservoirs weakly coupled to each other, with the reservoirs representing the information of the input data. The authors prove mathematically that this open quantum system reaches a steady state that behaves as a nonlinear activation function [17]. Türkpençe also proposes an open quantum neuron activated by thermal reservoirs; for this classifier, the temperature of the reservoirs carries the information of the input data, and after some time the steady state contains the activation of the neuron [18]. Korkmaz et al. also propose a classifier based on quantum collisions of the system induced by thermal reservoirs [19]; in their model, the input data are again loaded through the temperature control of the reservoirs. Korkmaz et al. further address a gradient descent algorithm to train an open quantum classifier that uses a dissipative protocol to classify data. With their experiments, they show that the traditional way of training machine learning models also works for open quantum classifiers [20, 21].

Another aspect of open quantum systems that is also relevant for algorithm execution is entanglement. To better understand it, Korkmaz et al. studied the evolution of entanglement during the execution of an open quantum neural network and demonstrated that it is possible to control the entanglement lifetime of a system by engineering the reservoir states and the initial states of the target system [22]. Ballarin et al. also investigated the entanglement of multiple approaches for adapting classical neural networks to quantum ones [23], presenting an analysis of open quantum system machine learning models on real-world datasets that covers accuracy, entanglement speed, expressibility, and the distribution of eigenvalues.

In this work, we propose to change the way the input information is loaded into the classifier proposed by Zhang et al. In our proposal, the attribute information is embedded in the probability amplitudes of the quantum states of the system, while the parameters to be adjusted remain part of the Hamiltonian. This is another way of exploring the construction of open quantum systems applied to machine learning, and it shows an improvement in the quality of classification for several databases.

This work is divided as follows: in Sect. 2, we present in more detail the work proposed by Zhang et al. In Sect. 3, the proposed model is presented. The experimental protocol to be carried out is detailed in Sect. 4. The results found are described in Sect. 5. Conclusions and future works are presented in Sect. 6.

2 Interactive quantum classifier (IQC)—an open quantum system neuron

Using the idea that it is possible to control the environmental impact on a specific quantum system, Zhang et al. propose a binary classification neuron [16]. The concept is rather simple: a unitary operator, defined by the Hamiltonian of the system, combines the attributes of the instance with a custom vector of weights to be learned by the classifier.

To build the classifier, Zhang et al. [16] propose to load the data of each instance into a unitary operator constructed from the Hamiltonian of the open quantum system, which controls the interaction between the system and the environment (see Fig. 1). The model has a simple one-QuBit system (\(\left| \psi _\mathrm{{sys}}\right\rangle \), Eq. 1) that interacts with a \(\mathrm{{log}}_{2}N\)-QuBit environment (\(\left| \psi _\mathrm{{env}}\right\rangle \), Eq. 2), where N is the number of attributes of each instance; both are initialized in equal-probability-amplitude superposition states.

Fig. 1

Circuit from Zhang et al. depicting the interaction between the system and the environment through a unitary operator

$$\begin{aligned} \left| \psi _\mathrm{{sys}}\right\rangle = \frac{1}{\sqrt{2}}\left| 0\right\rangle + \frac{1}{\sqrt{2}}\left| 1\right\rangle \end{aligned}$$
(1)
$$\begin{aligned} \left| \psi _\mathrm{{env}}\right\rangle = \frac{1}{\sqrt{N}}\left| 0\right\rangle + \frac{1}{\sqrt{N}}\left| 1\right\rangle + \dots + \frac{1}{\sqrt{N}}\left| N-1\right\rangle \end{aligned}$$
(2)
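As a concrete illustration, the initial states of Eqs. 1 and 2 can be built with a few lines of Numpy, the library used in this work. This is a minimal sketch under our own naming, not the authors' implementation:

```python
import numpy as np

def system_state():
    # Eq. (1): one-QuBit system in an equal superposition.
    return np.ones(2) / np.sqrt(2)

def environment_state(n_attributes):
    # Eq. (2): log2(N)-QuBit environment with uniform amplitudes 1/sqrt(N);
    # n_attributes must be a power of two for the QuBit picture to hold.
    return np.ones(n_attributes) / np.sqrt(n_attributes)

psi_sys = system_state()
psi_env = environment_state(4)  # e.g., Iris: N = 4 attributes -> 2 QuBits
```

Both vectors are normalized by construction, as required for quantum states.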

Since a one-QuBit system is combined with a \(\mathrm{{log}}_{2}N\)-QuBit environment, a \((1 + \mathrm{{log}}_{2}N)\)-QuBit unitary operator is needed from the Hamiltonian. Therefore, Zhang et al. reduce the unitary operator to Eq. 3, where \(\tau = gt\), g is a coupling constant, and t is the normalized vector of the attributes of the instance.

$$\begin{aligned} U(\tau ) = e^{i\sigma ^{Q}\otimes \sigma ^{E}(\tau )} \end{aligned}$$
(3)

In this way, one \(\sigma ^{Q}\) related to the original system and one \(\sigma ^{E}\) related to the environment are needed. \(\sigma ^{Q}\) is defined by Eq. 7, and \(\sigma ^{E}\) by Eq. 8.

$$\begin{aligned} \sigma ^X = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \end{aligned}$$
(4)
$$\begin{aligned} \sigma ^Y = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix} \end{aligned}$$
(5)
$$\begin{aligned} \sigma ^Z = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \end{aligned}$$
(6)
$$\begin{aligned} \sigma ^{Q} = \sigma ^{X} + \sigma ^{Y} + \sigma ^{Z} = \begin{bmatrix} 1 & 1 - i \\ 1 + i & -1 \end{bmatrix} \end{aligned}$$
(7)
$$\begin{aligned} \sigma ^{E}(\tau ) = \begin{bmatrix} w_1\tau _1 & & \\ & \ddots & \\ & & w_n\tau _n \end{bmatrix} \end{aligned}$$
(8)
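To make the construction concrete, the following Numpy/Scipy sketch (function names are ours) assembles \(\sigma ^{Q}\), \(\sigma ^{E}(\tau )\), and the unitary of Eq. 3. Since the exponent is \(i\) times a Hermitian matrix, the result is unitary by construction:

```python
import numpy as np
from scipy.linalg import expm

# Pauli matrices (Eqs. 4-6) and their sum sigma^Q (Eq. 7).
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]])
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)
sigma_q = sigma_x + sigma_y + sigma_z  # equals [[1, 1-i], [1+i, -1]]

def sigma_e(weights, tau):
    # Eq. (8): diagonal matrix whose entries are w_i * tau_i.
    return np.diag(np.asarray(weights, dtype=complex) * np.asarray(tau))

def unitary(weights, tau):
    # Eq. (3): U = exp(i sigma^Q (x) sigma^E(tau)); (x) is the Kronecker
    # product, so U acts on the joint system-environment space.
    return expm(1j * np.kron(sigma_q, sigma_e(weights, tau)))

U = unitary([1.0, 1.0, 1.0, 1.0], [0.2, 0.4, 0.6, 0.8])
```

For four attributes (two environment QuBits) U is an 8x8 unitary matrix.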

Combining all the elements and generating the unitary operator, one can calculate the evolution of the system, described in Eq. 9, where \(\rho _\mathrm{{cog}}\) is the density matrix of the system and \(\rho _\mathrm{{env}}\) is the density matrix of the environment. Since the system contains one QuBit, the result \(\varepsilon (\rho _\mathrm{{cog}})\) yields a binary classification, as shown in Eq. 9, where \(p_0^2\) and \(p_1^2\) indicate the probability of belonging to class 0 and class 1, respectively. In this sense, the instance is assigned to class 0 if \(p_0 \ge p_1\) and to class 1 otherwise.

$$\begin{aligned} \varepsilon (\rho _\mathrm{{cog}}) = \mathrm{{tr}}_\mathrm{{env}}[U(\rho _\mathrm{{cog}} \otimes \rho _\mathrm{{env}})U^{\dagger }] = p^2_{0}\left| 0\right\rangle \left\langle 0\right| + p^2_{1} \left| 1\right\rangle \left\langle 1\right| \end{aligned}$$
(9)
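The channel of Eq. 9 amounts to evolving the joint state and taking a partial trace over the environment. A minimal Numpy sketch (our own helper, assuming the dimensions of the two subsystems are known) is:

```python
import numpy as np

def evolve(rho_cog, rho_env, U):
    # Eq. (9): apply U to the joint state and trace out the environment.
    d_sys, d_env = rho_cog.shape[0], rho_env.shape[0]
    rho = U @ np.kron(rho_cog, rho_env) @ U.conj().T
    # Partial trace over the environment: reshape to (i, k, j, l) with i, j
    # indexing the system and k, l the environment, then sum over k == l.
    return np.einsum('ikjk->ij', rho.reshape(d_sys, d_env, d_sys, d_env))

# Sanity check: with U = identity, the system state is returned unchanged.
rho_sys = np.full((2, 2), 0.5, dtype=complex)   # |+><+|
rho_env = np.eye(4, dtype=complex) / 4          # maximally mixed environment
out = evolve(rho_sys, rho_env, np.eye(8, dtype=complex))
p0, p1 = out[0, 0].real, out[1, 1].real         # class-0 / class-1 probabilities
```

The diagonal of the returned one-QuBit density matrix gives the two class probabilities used for the decision rule.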

This classifier can be trained by updating the weight vector used in the \(\sigma ^{E}\) component, applying a simplified version of the delta rule after interacting with each training instance, following Eq. 10.

$$\begin{aligned} w_{i, k+1} = w_{i, k} + \eta (z_{j} - y_{j})(1 - p_{1}^{2})x_{i} \end{aligned}$$
(10)

In Eq. 10:

  • \(w_{i, k}\) = weight related to the ith attribute, at epoch k.

  • \(\eta \) = learning rate.

  • \(z_{j}\) = actual class of the jth training element.

  • \(y_{j}\) = predicted class of the jth training element.

  • \(p_{1}\) = probability of the jth training element belonging to the class 1.

  • \(x_{i}\) = ith attribute value.
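The update of Eq. 10 is a one-line rule; a sketch in Numpy (function name ours) reads:

```python
import numpy as np

def update_weights(w, x, z, y, p1, eta=0.01):
    # Eq. (10): simplified delta rule; z is the actual class, y the predicted
    # class, and p1 the probability of this instance belonging to class 1.
    return np.asarray(w) + eta * (z - y) * (1 - p1 ** 2) * np.asarray(x)

w = np.array([0.5, 0.5, 0.5, 0.5])
w_new = update_weights(w, x=[0.1, 0.2, 0.3, 0.4], z=1, y=0, p1=0.6)
```

Note that a correctly classified instance (z equal to y) leaves the weights unchanged, as in the classical delta rule.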

Zhang et al. also suggest normalizing the data used to train and evaluate the classifiers between \(-1\) and 0. In their work, both the Iris [24] and Wine [25] datasets are used; both are described in Sect. 4.1. The pseudo-code for IQC used in this paper is described in Algorithm 1. The whole code is available in this GitHub repository.

Algorithm 1

3 Interactive quantum classifier with amplitude information loading (IQC-AIL)

In this work, we propose a quantum neuron that follows the idea described in Sect. 2, but splits \(\sigma ^{E}\) (Eq. 8) in two: the attributes of the instances are loaded into the density matrix of the environment (\(\rho _\mathrm{{env}}\), Eqs. 11 and 12), and the new \(\sigma ^{E}\) is a diagonal matrix containing only the calculated weights (Eq. 13). Since this consists of loading the problem information into the environment amplitudes, the proposed model will be referred to as IQC-AIL, short for Interactive Quantum Classifier with Amplitude Information Loading, later in this work. The classification in IQC-AIL is done in the same way as in IQC—by evaluating \(p_0\) and \(p_1\). The pseudo-code for IQC-AIL used in this paper is described in Algorithm 2. The whole code is available in this GitHub repository.

$$\begin{aligned} \rho _\mathrm{{env}} = \left| \psi _\mathrm{{env}}\right\rangle \left\langle \psi _\mathrm{{env}}\right| \end{aligned}$$
(11)
$$\begin{aligned} \left| \psi _\mathrm{{env}}\right\rangle = \frac{x_{0}}{\mathrm{{norm}}(x)}\left| 0\right\rangle + \frac{x_{1}}{\mathrm{{norm}}(x)}\left| 1\right\rangle + \dots + \frac{x_{N-1}}{\mathrm{{norm}}(x)}\left| N-1\right\rangle \end{aligned}$$
(12)
$$\begin{aligned} \sigma ^{E} = \begin{bmatrix} w_1 & & \\ & \ddots & \\ & & w_n \end{bmatrix} \end{aligned}$$
(13)
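The amplitude loading of Eqs. 11-13 can be sketched as follows (a minimal Numpy illustration with our own function names, not the repository code):

```python
import numpy as np

def rho_env_from_instance(x):
    # Eqs. (11)-(12): the instance attributes become the normalized amplitudes
    # of the environment state (the length of x must be a power of two).
    psi = np.asarray(x, dtype=float) / np.linalg.norm(x)
    return np.outer(psi, psi)

def sigma_e_weights_only(w):
    # Eq. (13): sigma^E now carries only the trainable weights.
    return np.diag(np.asarray(w, dtype=complex))

rho = rho_env_from_instance([0.1, 0.2, 0.3, 0.4])
```

Because the amplitudes are normalized, \(\rho _\mathrm{{env}}\) has unit trace and is a valid (pure) density matrix.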
Algorithm 2

3.1 Complexity and state initialization

The proposed model deals with the loading of information into the probability amplitudes of quantum states. To do so, one algorithm that can be used is the state preparation based on uniformly controlled rotations defined by Möttönen et al. [27], which is also used by popular quantum development libraries such as Qiskit [28].

Regarding complexity, the height of the circuit would be \(1 + \mathrm{{log}}_{2}N\), since one QuBit is necessary for the binary classification system and \(\mathrm{{log}}_{2}N\) QuBits are needed in the environment to describe the instance. Analyzing the width of the circuit, \(O(4n)\) gates are needed for the state preparation, where n is the number of QuBits—since it is a combination of four rotation gates per QuBit: 1 \(R_{Z}\) + 2 \(R_{Y}\) + 1 \(R_{Z}\)—plus the Hamiltonian operator. The latter, however, needs to be decomposed into simpler quantum gates; this decomposition can be done, for example, using the quantum Shannon decomposition (QSD) [29], which produces \(O(\frac{3}{4}4^{n})\) gates [30]. QSD is also used by Qiskit, but any other decomposition could be used, such as the ZYZ or its rotated versions [30]. Thus, using the QSD, the width of the circuit would result in \(O(\frac{3}{4}4^{n} + 4n)\) gates.
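The counts above can be collected in a small helper. The function below is a hypothetical illustration of the arithmetic, not part of any published implementation:

```python
from math import ceil, log2

def circuit_size_estimate(n_attributes):
    # Hypothetical helper reproducing the counts above: 1 system QuBit plus
    # ceil(log2 N) environment QuBits; 4n state-preparation rotations and
    # (3/4) * 4^n gates for a QSD decomposition of the unitary.
    n_qubits = 1 + ceil(log2(n_attributes))
    prep_gates = 4 * n_qubits
    qsd_gates = (3 / 4) * 4 ** n_qubits
    return n_qubits, prep_gates, qsd_gates
```

For the Iris dataset (N = 4) this gives 3 QuBits in total; for Wine (N = 13, padded to 16), 5 QuBits.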

4 Model implementation

This work implemented the IQC and IQC-AIL models and their training algorithms in Python 3.9.16, using Numpy 1.23.5, Scipy 1.10.1, Sklearn 1.2.2, and Toqito 1.0.4.

Both classifiers described in Sect. 2 were trained on the real-world scenarios of the Iris dataset, Wine dataset, Pima Indians diabetes dataset, and Caesarian section classification dataset, apart from seven toy problems: Blobs, Circles, Ellipses, Moons, Stripes, and also a linearly separable and a XOR problem. All databases are defined in Sect. 4.1. The sklearn library was used to create the synthetic databases [31]. A stratified tenfold cross-validation method was applied over 20 different seeds for each database, except for the Pima Indians diabetes dataset, in which 10 seeds were used due to the time taken to train and evaluate on this dataset. The average, highest, and minimum accuracy and F1 scores were taken into consideration. All input data were also normalized between 0 and 1 when used in the calculations, in two different ways: by column and by row—which differs from the experiment made by Zhang et al., since the authors used a normalization between \(-1\) and 0. Another difference between this experiment and the one from Zhang et al. is that no batches were used in any experiment. Besides, for the Iris dataset (Sect. 4.1.1), different combinations of \(\sigma ^{Q}\) were considered: having different weights for each one of the Pauli matrices (Eqs. 4, 5, and 6) plus the identity matrix (Eq. 14, in which a, b, c, and d are parameters), or a polar version of it (Eq. 15). tr(\(\cdot \)) is the matrix trace operation of linear algebra. In the polar version of \(\sigma ^{Q}\), \(r_{x} = r \cdot \mathrm{{sin}}(\theta ) \cdot \mathrm{{cos}}(\phi )\), \(r_{y} = r \cdot \mathrm{{sin}}(\theta ) \cdot \mathrm{{sin}}(\phi )\), and \(r_{z} = r \cdot \mathrm{{cos}}(\theta )\), in which r, \(\theta \), and \(\phi \) are parameters. In this work, both IQC and IQC-AIL are also analyzed on two more real-world datasets and seven new toy problems.

$$\begin{aligned} \sigma ^{Q} = \frac{a \cdot \sigma ^{X} + b\cdot \sigma ^{Y} + c\cdot \sigma ^{Z} + d\cdot \mathbf {I_{n}}}{\mathrm{{tr}}(a\cdot \sigma ^{X} + b\cdot \sigma ^{Y} + c\cdot \sigma ^{Z} + d\cdot \mathbf {I_{n}})} \end{aligned}$$
(14)
$$\begin{aligned} \sigma ^{Q} = \frac{\mathbf {I_{n}} + (r_{x}\cdot \sigma ^{X} + r_{y}\cdot \sigma ^{Y} + r_{z}\cdot \sigma ^{Z})}{2} \end{aligned}$$
(15)
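Both parameterizations of \(\sigma ^{Q}\) are straightforward to express in Numpy. The sketch below uses our own function names; note that the trace in the denominator of Eq. 14 equals 2d, so the weighted form assumes a nonzero d:

```python
import numpy as np

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]])
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)
identity = np.eye(2, dtype=complex)

def sigma_q_weighted(a, b, c, d):
    # Eq. (14): trace-normalized weighted Pauli combination. The trace of the
    # numerator is 2d, so d must be nonzero for this to be well defined.
    m = a * sigma_x + b * sigma_y + c * sigma_z + d * identity
    return m / np.trace(m)

def sigma_q_polar(r, theta, phi):
    # Eq. (15): polar parameterization of sigma^Q.
    rx = r * np.sin(theta) * np.cos(phi)
    ry = r * np.sin(theta) * np.sin(phi)
    rz = r * np.cos(theta)
    return (identity + rx * sigma_x + ry * sigma_y + rz * sigma_z) / 2
```

For instance, the polar form with r = 1 and \(\theta = 0\) reduces to \((\mathbf {I} + \sigma ^{Z})/2\), the projector onto \(\left| 0\right\rangle \).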

The search for the best \(\sigma ^{Q}\) combination was done by randomized search cross-validation, selecting 100 different combinations evaluated in a tenfold cross-validation using accuracy as the score. The values for the weighted \(\sigma ^{Q}\) (Eq. 14) varied over two subsets: one with integer values from 0 to 15 (\(a, b, c, d \in [0..15]\)) and another with float values from 0 to 1 (\(a, b, c, d \in [0, 1]\)). As for the polar version of \(\sigma ^{Q}\) (Eq. 15), there was one subset with float values from 0 to 1 and from 0 to \(\pi \) (\(r \in [0, 1]\), and \(\theta , \phi \in [0, \pi ]\)).
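The sampling of the parameter subsets described above can be sketched as follows. This is only an illustration of the search space (dictionary keys are ours), not the actual experiment code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_configurations(n=100):
    # Sketch of the randomized search space: integer weights a, b, c, d in
    # [0..15], float weights in [0, 1], and polar parameters r in [0, 1],
    # theta and phi in [0, pi].
    return [{
        'weighted_int': rng.integers(0, 16, size=4),
        'weighted_float': rng.random(4),
        'polar': (rng.random(), rng.random() * np.pi, rng.random() * np.pi),
    } for _ in range(n)]

configs = sample_configurations()
```

Each sampled configuration would then be scored by tenfold cross-validated accuracy, keeping the best one.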

All datasets were split into two sets: the training subset and the test subset. The training subset corresponds to 70% of the dataset and is used for training the classifier. The test data—the remaining 30% of the dataset—is used for the classifier evaluation. For the Iris dataset, the bipartite entanglement between the system (\(\rho _\mathrm{{cog}}\)) and the environment (\(\rho _\mathrm{{env}}\)) was evaluated using the average negativity (\(\textit{N}\), Eq. 16) of the classifier output density matrix when evaluating the test data. \(\rho _\mathrm{{cog}}\) is a system with one QuBit, and \(\rho _\mathrm{{env}}\) a system with two QuBits (\(\mathrm{{log}}_{2}4 = 2\)). Since the Iris dataset is a multiclass problem, the average of the negativity over all classes was considered; the average negativity per class was also taken into consideration. The negativity is defined in Eq. 16, where \(\rho ^{\Gamma _\mathrm{{sys}}}\) is the partial transpose of \(\rho \) with respect to the system of interest, and \(\Vert X \Vert _{1}\) is the trace norm of X.

$$\begin{aligned} N(\rho ) = \frac{\Vert \rho ^{\Gamma _\mathrm{{sys}}}\Vert _{1} - 1}{2} \end{aligned}$$
(16)
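Eq. 16 is easy to evaluate numerically: the partial transpose swaps the two system indices, and the trace norm of the resulting Hermitian matrix is the sum of the absolute eigenvalues. A minimal Numpy sketch (function name ours) is:

```python
import numpy as np

def negativity(rho, d_sys, d_env):
    # Eq. (16): N(rho) = (||rho^{Gamma_sys}||_1 - 1) / 2. Reshape to
    # (i, k, j, l), swap the system indices i and j (partial transpose),
    # then sum the absolute eigenvalues of the Hermitian result.
    r = rho.reshape(d_sys, d_env, d_sys, d_env)
    r_pt = np.transpose(r, (2, 1, 0, 3)).reshape(d_sys * d_env, d_sys * d_env)
    trace_norm = np.abs(np.linalg.eigvalsh(r_pt)).sum()
    return (trace_norm - 1) / 2

# A maximally entangled Bell state has negativity 1/2.
bell = np.zeros((4, 4))
bell[0, 0] = bell[0, 3] = bell[3, 0] = bell[3, 3] = 0.5
```

A product (separable) state, by contrast, has zero negativity, which makes the two cases convenient sanity checks.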

To understand the functioning of the neuron, both the IQC and the IQC-AIL were evaluated on a simple linearly separable artificial problem, and the decision area was drawn. The computer used in all stages of the experiments had the following configuration: i5-11400H, 8 GB 3200 MHz DDR4 RAM, and an Adata SU650 M.2 2280, running Windows 11 Version 22H2. The code is available in this GitHub repository.

4.1 Databases

The real-world databases used to test the quantum-inspired neurons are described in this section. To assert that the classifiers describe different distributions, the results were submitted to Student's t test [32], the Mann–Whitney U test [33], the Wilcoxon signed-rank test [34], and the Kruskal–Wallis H test [35]; for most of the tests, the IQC-AIL and IQC results are determined to come from different distributions.

4.1.1 Iris dataset

The Iris dataset [24] consists of 150 instances of 4 features each: sepal length, sepal width, petal length, and petal width. The goal of this dataset is to define whether each flower is an Iris Setosa, Iris Versicolour, or Iris Virginica, and it contains 50 examples of each flower. For this experiment, a two-QuBit environment is used (\(\mathrm{{log}}_{2}4 = 2\)), with a learning rate of 0.001.

4.1.2 Wine dataset

The Wine dataset [25] consists of 178 instances of 13 features each: (1) alcohol, (2) malic acid, (3) ash, (4) alkalinity of ash, (5) magnesium, (6) total phenols, (7) flavanoids, (8) nonflavanoid phenols, (9) proanthocyanins, (10) color intensity, (11) hue, (12) OD280/OD315 of diluted wines, and (13) proline. The goal of this dataset is to define which of the three types of wine the evaluated wine is, and it contains 59 samples of class 1 wine, 71 samples of class 2 wine, and 48 samples of class 3 wine. For this experiment, a four-QuBit environment is used (\(\lceil \mathrm{{log}}_{2}13\rceil = 4\)); for the remaining three values, a fixed bias of 1 was used, with a learning rate of 0.009.
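The padding of the 13 attributes to the 16 amplitudes of a four-QuBit environment can be sketched as follows (our own helper, illustrating the fixed bias of 1 described above):

```python
import numpy as np

def pad_with_bias(x, n_qubits):
    # The 13 Wine attributes need ceil(log2 13) = 4 QuBits, i.e. 16
    # amplitudes; the three leftover positions receive a fixed bias of 1.
    target = 2 ** n_qubits
    padded = np.ones(target)
    padded[:len(x)] = x
    return padded

padded = pad_with_bias(np.arange(13, dtype=float), 4)
```

The padded vector is then normalized as usual before being loaded into the environment.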

4.1.3 Pima Indians diabetes dataset

The Pima Indians diabetes dataset [36] consists of 768 instances of 8 features each: (1) number of times pregnant, (2) plasma glucose concentration at 2 h in an oral glucose tolerance test, (3) diastolic blood pressure (mmHg), (4) triceps skin fold thickness (mm), (5) 2-h serum insulin (\(\mu U/ml\)), (6) body mass index (weight in kg/(height in m)\(^2\)), (7) diabetes pedigree function, and (8) age (years). The goal is to decide whether the instance (a person) has the diabetes mellitus disease or not. Out of the 768 people, 500 do not have the disease and 268 have it. For this experiment, a three-QuBit environment is used (\(\mathrm{{log}}_{2}8 = 3\)), with a learning rate of 0.009 (the same as the Wine dataset).

4.1.4 Caesarian section classification dataset

The Caesarian section classification dataset [37] consists of 80 instances of 5 features each: (1) age, (2) delivery number, (3) delivery time (0 = timely, 1 = premature, 2 = latecomer), (4) blood pressure (0 = low, 1 = normal, 2 = high), and (5) heart problem. The goal of this dataset is to predict whether the woman needed a caesarian section surgery while giving birth. Out of the 80 people, 46 needed the surgery and 34 did not. For this experiment, a three-QuBit environment is used (\(\lceil \mathrm{{log}}_{2}5\rceil = 3\)); for the remaining three values, a fixed bias of 1 was used, with a learning rate of 0.01.

4.1.5 Artificial databases

All artificial databases are common toy problems used to benchmark artificial intelligence models. Each has a specific shape, contains two attributes, and poses a binary classification problem. These shapes are:

  • Blobs dataset: depicted in Fig. 2a, b, normalized by column and row, respectively;

  • Circles dataset: a dataset where the instances are organized as concentric circles, each circle belonging to a different class. Depicted in Fig. 3a, b, normalized by column and row, respectively;

  • Ellipses dataset: a dataset where the instances are organized as concentric ellipses, each ellipse belonging to a different class. Depicted in Fig. 4a, b, normalized by column and row, respectively;

  • Linear separable problem: a problem that is solvable by a single line splitting classes. Depicted in Fig. 5a, b, normalized by column and row, respectively;

  • Moons dataset: a dataset where the instances organization resembles two half moons. Depicted in Fig. 6a, b, normalized by column and row, respectively;

  • Stripes dataset: a dataset that consists of instances spread through stripes. Depicted in Fig. 7a, b, normalized by column and row, respectively;

  • XOR problem: recreates the XOR problem. Depicted in Fig. 8a, b, normalized by column and row, respectively;

Fig. 2

Toy problem named “Blobs” data distribution

Fig. 3

Toy problem named “Circles” data distribution

Fig. 4

Toy problem named “Ellipses” data distribution

Fig. 5

Toy linear separable problem data distribution

Fig. 6

Toy problem named “Moons” data distribution

Fig. 7

Toy problem named “Stripes” data distribution

Fig. 8

Toy XOR problem data distribution

5 Results

This section displays all results obtained after running the experiments described in Sect. 4. The results for data normalized by column and by row are displayed in each table. Differently from Zhang et al. in [16], in this work we analyze the performance of the IQC and IQC-AIL classifiers in two more real-world datasets apart from Iris and Wine, plus seven artificial toy problems.

5.1 Iris dataset

Looking at the results for the Iris dataset (Tables 1 and 2), the experiments show that the IQC works when given row-normalized input—which is not the common way of normalizing input, since different attributes have different scales: jointly normalizing someone's age with their height in meters, for example, does not make much sense. The IQC-AIL, however, maintains its performance even when the normalization approach changes.

Table 1 Highest, minimum, and average accuracy for Iris dataset after following the experiments described in Sect. 4.1.1
Table 2 Highest, minimum, and average F1 scores for Iris dataset after following the experiments described in Sect. 4.1.1

5.1.1 Exploratory search result

As the extensive search for better parameters when varying the weights in \(\sigma ^{Q}\) takes too long to finish, the search was restricted to the cases in which the input data are normalized by column. However, for both neurons—the IQC and the IQC-AIL—the best combination was the original one: \(\text {learning rate} = 0.01\) and \(\sigma ^{Q}_\mathrm{{weights}} = [1, 1, 1, 1]\) for the IQC-AIL, or \(\sigma ^{Q}_\mathrm{{weights}} = [1, 1, 1]\) for the IQC.

5.1.2 Negativity for Iris dataset

In this section, we present the results of the negativity calculation experiments on the Iris database. The average negativity of the density matrix was taken during the evaluation of the test data. Figures 9 and 10 indicate that the negativity of the system is not related to its accuracy, while Fig. 11 depicts that the negativity stabilizes after a certain training time, following a pattern similar to the accuracy graph. Figure 11 also shows that there is no direct relation between the training and the increase in the system entanglement. The results for the negativity bring no direct intuition apart from appearing random—which matches the results from Mangini in [23], in the sense that more layers would be needed to obtain a stabilized negativity in all cases.

Fig. 9

Evolution of negativity against accuracy, analyzing class by class, after running Iris dataset throughout the experiments described in Sect. 4

Fig. 10

Evolution of negativity against accuracy after running Iris dataset throughout the experiments described in Sect. 4. Each Figure relates to either the IQC or IQC-AIL and uses column or row normalization

Fig. 11

Evolution of negativity against epochs, analyzing class by class, after running Iris dataset throughout the experiments described in Sect. 4

5.2 Wine dataset

For the Wine dataset, the performance of the IQC-AIL with column normalization overcame every other combination in both accuracy and F1 score. Its pattern of performing better with column normalization is repeated, but now the IQC behaves the same under either column or row normalization (Tables 3 and 4).

Table 3 Highest, minimum, and average accuracy for Wine dataset after following the experiments described in Sect. 4.1.2
Table 4 Highest, minimum, and average F1 scores for Wine dataset after following the experiments described in Sect. 4.1.2

5.3 Pima Indians diabetes dataset

The behavior for the Pima Indians diabetes dataset (Tables 5 and 6) follows the same pattern as the Wine dataset: the IQC-AIL with column normalization outperforms every other configuration. In this case, however, the IQC shows increased performance with row normalization, which resembles its behavior on the Iris dataset.

Table 5 Highest, minimum, and average accuracy for Pima Indians diabetes dataset after following the experiments described in Sect. 4.1.3
Table 6 Highest, minimum, and average F1 scores for Pima Indians diabetes dataset after following the experiments described in Sect. 4.1.3

5.4 Caesarian section classification dataset

Nearly following the previous results, the IQC-AIL with column normalization outperforms the other configurations in almost every metric—except for the minimum accuracy. The results for the Caesarian section classification dataset (Tables 7 and 8) mirror those from the Wine dataset, in which the results using column or row normalization are the same for the IQC.

Table 7 Highest, minimum, and average accuracy for Caesarian Section classification dataset after following the experiments described in Sect. 4.1.4
Table 8 Highest, minimum, and average F1 scores for Caesarian Section classification dataset after following the experiments described in Sect. 4.1.4

5.5 Artificial datasets

Out of the seven artificial datasets, the IQC-AIL performed better on three: Blobs (Table 9), Circles (Table 10), and Stripes (Table 14). The IQC did better on the Ellipses (Table 11) and Moons (Table 13) datasets, and on the XOR and linearly separable problems (Tables 15 and 12). However, problems considered trivial, such as the linearly separable one, were not effectively solved by either of the models.

Table 9 Highest, minimum, and average accuracy for Blobs dataset after following the experiments described in Sect. 4.1.5
Table 10 Highest, minimum, and average accuracy for Circles dataset after following the experiments described in Sect. 4.1.5
Table 11 Highest, minimum, and average accuracy for Ellipses dataset after following the experiments described in Sect. 4.1.5
Table 12 Highest, minimum, and average accuracy for linear separable problem after following the experiments described in Sect. 4.1.5
Table 13 Highest, minimum, and average accuracy for Moons dataset after following the experiments described in Sect. 4.1.5
Table 14 Highest, minimum, and average accuracy for Stripes dataset after following the experiments described in Sect. 4.1.5
Table 15 Highest, minimum, and average accuracy for XOR problem after following the experiments described in Sect. 4.1.5

5.6 Analyzing decision area

Aiming for a better understanding of the functioning of the model, its decision area after training on a linearly separable problem is drawn in Fig. 12. It is possible to see that, for a two-dimensional classification problem, the decision area of the IQC has a more complex shape, which repeats at a certain interval. The IQC-AIL, however, produces a simpler shape—four cones sharing the center of the graph. During the studies, the weights of \(\sigma ^{Q}\) were changed one by one to investigate their influence on the decision area, but no pattern was observed in the experiments. The variation also did not change the angle of the lines that shape the cones for the IQC-AIL (Fig. 12c, d). The result of this variation can be seen on GitHub as a GIF, available on this link for the IQC—varying the weights on \(\sigma _I\), \(\sigma _X\), \(\sigma _Y\), or \(\sigma _Z\)—and on this link for the IQC-AIL, similar to what is done for the IQC. For this training, both models had a 0.01 learning rate, no batches, and trained for up to 1000 epochs. Although it is not possible to visualize the decision region in higher-dimensional spaces, it is reasonable to expect that the IQC-AIL classifier deals with more complex regions than those presented. This assumption comes from the fact that the classifiers solved nonlinear problems, as shown in the results in Sects. 5.1, 5.2, 5.3, 5.4, and 5.5.

Fig. 12

Region of decision of both IQC-AIL and IQC, when training on the linear separation dataset defined in Sect. 4.1.5, using a 0.01 learning rate, no batches, and training up to 1000 epochs, using either row or column normalization

5.7 Comparing results

The results for the Iris dataset were better than those for the other datasets for both models. This is probably due to the simple nature of the problem: on the Iris dataset, the classes are easily separable from each other, even using the regions depicted in Sect. 5.6. This simplicity is shown in Fig. 13, which presents a 2-PCA distribution of the Iris dataset. The 2-PCA is reliable enough to show the simplicity of the problem, since the first two principal components carry circa 95.81% of its variance.

Fig. 13

A 2-PCA (principal component analysis [38]) of the Iris dataset

When comparing the real-world databases, the IQC-AIL had better performance than the IQC. However, when analyzing the toy problems, the IQC-AIL outperformed the IQC in 2 out of 7 databases. Considering every situation analyzed in this work, the IQC-AIL had better results in 6 out of 11 problems. Considering only column normalization—the most common way of normalizing data—the number of cases in which the IQC-AIL had better results grows to 8 out of 11 problems. The comparison between both classifiers is made in Table 16.

Table 16 Summary of results without using the exponential kernel

6 Conclusion

In this work, we proposed an open quantum system-inspired neuron that loads the information into the environment amplitudes. The model handles the interaction between a one-QuBit system and the environment with a unitary gate built from the Hamiltonian of the whole system. We also compared the proposed model with the classifier from Zhang et al. [16]. For the analysis, both classifiers were evaluated on seven toy problems and four real-world datasets, normalized both by column and by row. Out of these, the IQC-AIL outperformed the classifier from Zhang et al. in 8 out of 11 problems (considering column normalization—the most common way of normalizing data): all four real-world datasets and four of the toy problems, proving to be a valid option as a classifier. For the negativity analysis on the Iris dataset, no pattern was observed in the results, which also indicates that the entanglement is not directly related to the accuracy.

When comparing with other related works, although the methodology differs from author to author, the IQC-AIL outperformed the classifier from Schuld et al. in [39]: the distance-based classifier from Schuld et al. scored an average accuracy of 0.911, against the 0.9360 of the IQC-AIL. The quantum nearest centroid from Johri et al. in [40] reached 84% accuracy, also being outperformed by the IQC-AIL. When comparing with the consecutive rotation variational classifier from Adhikary et al. in [41], the IQC-AIL also performs better, since their classifier scored an average accuracy of 81.99%. However, when comparing with the Qiskit simulation of the variational quantum classifier with different configurations from Piatrenka et al. in [11], the IQC-AIL is outperformed by 16 of their configurations and practically matches the other 17, scoring better by 0.003. Both the IQC and IQC-AIL were also submitted to Student's t test [32], the Mann–Whitney U test [33], the Wilcoxon signed-rank test [34], and the Kruskal–Wallis H test [35], asserting that they correspond to statistically different results, meaning that they do not describe the same behavior. When it comes to entanglement, no patterns were observable for any of those models. This also means that, according to the experiments, the entanglement has no relation with the time spent training or with the accuracy of the model.

For future works, the matrices used to build \(\sigma ^{Q}\) in Eq. 14 could be changed to use matrices other than the Pauli ones (Eqs. 4, 5, and 6)—or perhaps none of the Pauli matrices. The usage of kernels to map the input data could also be studied. Studying the impact of having multiple QuBits in the system (instead of one, for the binary classification) to obtain multiclass classification in a single neuron is another path for future work. Another study is a real implementation of the models on a quantum computer, which would bring further validation of the classifier. Analyzing the behavior in a real thermal reservoir, considering the interference of the environment in the state preparation, is another interesting direction for future work.