Universal Approximation Property via
Quantum Feature Maps
Quoc Hoan Tran*, Takahiro Goto*, and Kohei Nakajima
Physical Intelligence Lab, UTokyo
QTML2021
arXiv:2009.00298
Phys. Rev. Lett. 127, 090506 (2021)
(*) Equal contribution
The future of intelligent computations
2/20
[Diagram: Bits → Qubits; Neurons → Quantum mind]
Quantum computing: high efficiency + fragile
Neural networks: versatile + adaptive + robust
Two rising fields can be complementary
Converging to an adaptive platform: Quantum Neural Networks
QNN via Quantum Feature Map[1,2]
3/20
[1] V. Havlíček et al., Supervised Learning with Quantum-Enhanced Feature Spaces, Nature 567, 209 (2019)
[2] M. Schuld and N. Killoran, Quantum Machine Learning in Feature Hilbert Spaces, Phys. Rev. Lett. 122, 040504 (2019)
[3] Y. Liu et al., A Rigorous and Robust Quantum Speed-Up in Supervised Machine Learning, Nat. Phys. 17, 1013 (2021)
[Diagram: the input space 𝒳 is mapped into a quantum Hilbert space, 𝒙 ↦ |Ψ(𝒙)⟩, accessed via measurements or by computing the kernel $\kappa(\boldsymbol{x}, \boldsymbol{x}') = \langle \Psi(\boldsymbol{x}) | \Psi(\boldsymbol{x}') \rangle$; the hope is at least the same expressivity as classical feature maps]
Heuristic: Quantum Circuit Learning (K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, Phys. Rev. A 98, 032309 (2018))
◼ For a special problem, a formal quantum advantage[3] can be proven (but it is neither "near-term" nor practical)
◼ We expect quantum feature maps to be more expressive than their classical counterparts
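To make the kernel $\kappa(\boldsymbol{x}, \boldsymbol{x}')$ above concrete, here is a minimal classical simulation (my own toy example with an assumed single-qubit encoding $|\Psi(x)\rangle = e^{-ixY}|0\rangle$, not a circuit from the talk):

```python
import numpy as np

# Minimal sketch (my own toy example, not from the talk): the quantum kernel
# kappa(x, x') = <Psi(x)|Psi(x')> for the single-qubit feature map
# |Psi(x)> = e^{-i x Y}|0> = (cos x, sin x)^T.
def feature(x):
    return np.array([np.cos(x), np.sin(x)])

def kernel(x, xp):
    return feature(x) @ feature(xp)          # inner product in Hilbert space

print(kernel(0.3, 0.8), np.cos(0.3 - 0.8))   # both equal cos(x - x')
```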
Universal Approximation Property (UAP)
4/20
[Diagram: a one-hidden-layer feed-forward network with inputs 𝑥₁, 𝑥₂, output weights 𝑤₁, 𝑤₂, …, 𝑤_𝐾, and output 𝑦]
◼ A feed-forward NN with one hidden layer can approximate any continuous function* on a closed and bounded subset of ℝ^d to arbitrary accuracy 𝜀 (Hornik+1989, Cybenko+1989)
* This belongs to the larger class of Borel measurable functions (beyond this talk)
➢ Given enough hidden units with any squashing activation function (step, sigmoid, tanh, ReLU)
➢ Extensive extensions to non-continuous activation functions and resource evaluation
UAP of Quantum ML models
5/20
Data Re-uploading
Adrián Pérez-Salinas et al., Data re-uploading for a universal quantum
classifier, Quantum 4, 226 (2020)
Expressive power via a partial
Fourier series
M. Schuld, R. Sweke, and J. J. Meyer, Effect of data encoding on the expressive power of variational quantum-machine-learning models, Phys. Rev. A 103, 032430 (2021)
$f_{\boldsymbol{\theta}}(\boldsymbol{x}) = \sum_{\omega \in \Omega} c_\omega(\boldsymbol{\theta})\, e^{i \omega \boldsymbol{x}}$
Our simple but clear questions
◼ UAP in the language of observables and quantum feature maps
◼ How well can the approximation perform?
◼ How many parameters does it need (vs. classical NNs)?
QNN via Quantum Feature Map
◼ Choosing the variational circuit 𝒲(𝜽) is equivalent to selecting suitable observables
6/20
◼ The output is a linear expansion of the basis functions with a suitable set of observables:
$\psi_i(\boldsymbol{x}) = \langle \Psi(\boldsymbol{x}) |\, O_i\, | \Psi(\boldsymbol{x}) \rangle, \qquad f(\boldsymbol{x}; \Psi, \boldsymbol{O}, \boldsymbol{w}) = \sum_{i=1}^{K} w_i\, \psi_i(\boldsymbol{x})$
[1]V. Havlíček et al., Supervised Learning with Quantum-Enhanced Feature Spaces, Nature 567, 209 (2019)
Output weights and Pauli-group observables:
$w_\beta(\boldsymbol{\theta}) = \mathrm{Tr}\!\left[ \mathcal{W}^\dagger(\boldsymbol{\theta})\, Z^{\otimes N}\, \mathcal{W}(\boldsymbol{\theta})\, P_\beta \right], \qquad P_\beta \in \{I, X, Y, Z\}^{\otimes N}$
Basis functions (expectations): $\psi_\beta(\boldsymbol{x}) = \mathrm{Tr}\!\left[\, |\Psi(\boldsymbol{x})\rangle\langle\Psi(\boldsymbol{x})|\, P_\beta \right]$
E.g., decision function[1]: $\mathrm{sign}\!\left( \frac{1}{2^N} \sum_\beta w_\beta(\boldsymbol{\theta})\, \psi_\beta(\boldsymbol{x}) + b \right)$
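A minimal NumPy sanity check of this Pauli decomposition (my own sketch, not the authors' code), on N = 2 qubits with a random unitary standing in for 𝒲(𝜽):

```python
import numpy as np
from itertools import product
from functools import reduce

# Expand the rotated observable W^dag Z^{(x)2} W as (1/2^N) sum_beta w_beta P_beta,
# with w_beta = Tr[W^dag Z^{(x)2} W P_beta], matching the convention above.
P = {'I': np.eye(2), 'X': np.array([[0, 1], [1, 0]]),
     'Y': np.array([[0, -1j], [1j, 0]]), 'Z': np.diag([1.0, -1.0])}

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
W, _ = np.linalg.qr(M)                         # random unitary, stand-in for W(theta)
O = W.conj().T @ np.kron(P['Z'], P['Z']) @ W   # rotated observable

w = {}
for s in product('IXYZ', repeat=2):
    P_beta = reduce(np.kron, [P[c] for c in s])
    w[''.join(s)] = np.trace(O @ P_beta).real  # w_beta (real since O is Hermitian)

recon = sum(wb * reduce(np.kron, [P[c] for c in s]) for s, wb in w.items()) / 4
print(np.allclose(recon, O))                   # True: O = (1/2^N) sum_beta w_beta P_beta
```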
UAP via Quantum Feature Map
Given a compact set $\mathcal{X} \subset \mathbb{R}^d$, for any continuous function $g: \mathcal{X} \to \mathbb{R}$, can we find $\boldsymbol{\Psi}, \boldsymbol{O}, \boldsymbol{w}$ s.t.
$f(\boldsymbol{x}; \Psi, \boldsymbol{O}, \boldsymbol{w}) = \sum_{i=1}^{K} w_i\, \psi_i(\boldsymbol{x}) \approx g(\boldsymbol{x})$?
7/20
UAP. Let 𝒢 be a space of continuous functions 𝑔: 𝒳 → ℝ. The quantum feature framework ℱ = {𝑓(𝒙; Ψ, 𝑶, 𝒘)} has the UAP w.r.t. 𝒢 and a norm ‖⋅‖ if, for any 𝑔 ∈ 𝒢 and any 𝜀 > 0, there exists 𝑓 ∈ ℱ s.t. ‖𝑓 − 𝑔‖ < 𝜀
Classification capability. For arbitrary disjoint regions 𝒦1, 𝒦2, …, 𝒦𝑛 ⊂ 𝒳, there exists 𝑓 ∈ ℱ that can separate these regions (this can be induced from the UAP)
[Diagram: a function 𝑓 ∈ ℱ from 𝒳 to ℝ separating the regions 𝒦1 and 𝒦2]
UAP - Scenarios
[Diagram: the parallel and sequential circuit architectures]
◼ Encode the "nonlinear" property in the basis functions for the UAP
1. Using appropriate observables to construct a polynomial approximation
2. Using appropriate activation functions in pre-processing
8/20
Parallel scenario
Approach 1. Produce the "nonlinearity" via suitable observables
◼ The basis functions can be polynomials if we choose correlated measurement operators
◼ We then apply the Weierstrass polynomial approximation theorem
Feature map: $|\Psi_N(\boldsymbol{x})\rangle = U_N(\boldsymbol{x})\, |0\rangle^{\otimes N}$ with
$U_N(\boldsymbol{x}) = V_1(\boldsymbol{x}) \otimes \cdots \otimes V_N(\boldsymbol{x}), \qquad V_j(\boldsymbol{x}) = e^{-i \theta_j(\boldsymbol{x}) Y}, \qquad \theta_j(\boldsymbol{x}) = \arccos\!\left(\sqrt{x_k}\right) \ \text{for } 1 \le k \le d,\ j \equiv k \ (\mathrm{mod}\ d)$
Observables: $O_{\boldsymbol{a}} = Z^{a_1} \otimes \cdots \otimes Z^{a_N}, \quad \boldsymbol{a} \in \{0,1\}^N$
Basis functions: $\psi_{\boldsymbol{a}}(\boldsymbol{x}) = \langle \Psi_N(\boldsymbol{x}) |\, O_{\boldsymbol{a}}\, | \Psi_N(\boldsymbol{x}) \rangle = \prod_{i=1}^{N} \left( 2 x_{[i]} - 1 \right)^{a_i}$, where $x_{[i]} = x_k$ for $1 \le k \le d,\ i \equiv k \ (\mathrm{mod}\ d)$
9/20
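A quick numerical check of these basis functions (a minimal NumPy sketch of my own, not the authors' code) for d = 2 and N = 4:

```python
import numpy as np
from functools import reduce

# Qubit i encodes x_k (k = i mod d) via theta = arccos(sqrt(x_k)); the Z-string
# observable O_a then gives psi_a(x) = prod_i (2 x_[i] - 1)^{a_i}.
Z, I2 = np.diag([1.0, -1.0]), np.eye(2)

def qubit_state(xk):                      # e^{-i arccos(sqrt(x_k)) Y} |0>
    t = np.arccos(np.sqrt(xk))
    return np.array([np.cos(t), np.sin(t)])

d, N = 2, 4
x = np.array([0.2, 0.9])
psi = reduce(np.kron, [qubit_state(x[i % d]) for i in range(N)])   # |Psi_N(x)>

a = (1, 0, 1, 1)                          # a in {0,1}^N selects Z vs. I per qubit
O_a = reduce(np.kron, [Z if ai else I2 for ai in a])
lhs = psi @ O_a @ psi                     # <Psi_N(x)| O_a |Psi_N(x)>
rhs = np.prod([(2 * x[i % d] - 1) ** ai for i, ai in enumerate(a)])
print(np.isclose(lhs, rhs))               # True
```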
Parallel scenario
Approach 1. Produce the “nonlinear” by suitable observables
Lemma 1. Consider a polynomial $P(\boldsymbol{x})$ of the input $\boldsymbol{x} = (x_1, x_2, \dots, x_d) \in [0,1]^d$ where the degree of $x_j$ in $P(\boldsymbol{x})$ is less than or equal to $\frac{N + (d - j)}{d}$ for $j = 1, \dots, d$ $(N \ge d)$; then there exists a collection of output weights $\{ w_{\boldsymbol{a}} \in \mathbb{R} \mid \boldsymbol{a} \in \{0,1\}^N \}$ s.t.
$\sum_{\boldsymbol{a} \in \{0,1\}^N} w_{\boldsymbol{a}}\, \psi_{\boldsymbol{a}}(\boldsymbol{x}) = P(\boldsymbol{x})$
Result 1 (UAP in the parallel scenario). For any continuous function $g: \mathcal{X} \to \mathbb{R}$,
$\lim_{N \to \infty} \min_{\boldsymbol{w}} \left\| \sum_{\boldsymbol{a} \in \{0,1\}^N} w_{\boldsymbol{a}}\, \psi_{\boldsymbol{a}}(\boldsymbol{x}) - g(\boldsymbol{x}) \right\|_\infty = 0$
※ The number of observables is at most $\left( 2 + \left\lfloor \frac{N-1}{d} \right\rfloor \right)^{d}$
10/20
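Lemma 1 can be checked numerically in the simplest case (a minimal sketch of my own, for d = 1 and N = 3, where the distinct basis functions reduce to the powers (2x − 1)^m):

```python
import numpy as np

# For d = 1, N = 3 the distinct basis functions are (2x - 1)^m, m = 0..3, so
# any polynomial of degree <= 3, e.g. P(x) = 5x^3 - 2x + 1, is an exact
# linear combination sum_a w_a psi_a(x).
x = np.linspace(0, 1, 50)
B = np.stack([(2 * x - 1) ** m for m in range(4)], axis=1)   # basis matrix
P = 5 * x**3 - 2 * x + 1
w, *_ = np.linalg.lstsq(B, P, rcond=None)                    # solve for weights
print(np.abs(B @ w - P).max())    # ~1e-14: the representation is exact
```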
Parallel scenario
Approach 2. Activation functions in pre-processing (straightforward)
[Huang+, Neurocomputing, 2007] Let $\sigma: \mathbb{R} \to [-1, 1]$ be a nonconstant piecewise continuous function such that $\sigma(\boldsymbol{a} \cdot \boldsymbol{x} + b)$ is dense in $L^2(\mathcal{X})$. Then, for any continuous function $g: \mathcal{X} \to \mathbb{R}$ and any function sequence $\sigma_j(\boldsymbol{x}) = \sigma(\boldsymbol{a}_j \cdot \boldsymbol{x} + b_j)$, where $\boldsymbol{a}_j, b_j$ are randomly generated from any continuous sampling distribution,
$\lim_{N \to \infty} \min_{\boldsymbol{w}} \left\| \sum_{j=1}^{N} w_j\, \sigma_j(\boldsymbol{x}) - g(\boldsymbol{x}) \right\|_{L^2} = 0$
We obtain the UAP if we can implement the squashing activation function in the pre-processing step (Result 2):
• $U_N(\boldsymbol{x}) = V_1(\boldsymbol{x}) \otimes \cdots \otimes V_N(\boldsymbol{x})$
• $V_j(\boldsymbol{x}) = e^{-i \theta_j(\boldsymbol{x}) Y}$
• $\theta_j(\boldsymbol{x}) = \arccos\!\left( \sqrt{ \frac{1 + \sigma(\boldsymbol{a}_j \cdot \boldsymbol{x} + b_j)}{2} } \right)$
$\psi_j(\boldsymbol{x}) = \langle \Psi_N(\boldsymbol{x}) |\, Z_j\, | \Psi_N(\boldsymbol{x}) \rangle = \langle 0 |\, e^{i \theta_j Y} Z e^{-i \theta_j Y}\, | 0 \rangle = 2 \cos^2(\theta_j) - 1 = \sigma(\boldsymbol{a}_j \cdot \boldsymbol{x} + b_j) = \sigma_j(\boldsymbol{x})$
11/20
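An exact classical simulation of this construction (my own sketch under assumed choices: σ = tanh, Gaussian random a_j and b_j, least-squares output weights, and an arbitrary target function):

```python
import numpy as np

# Each qubit j is prepared as e^{-i theta_j(x) Y}|0> with the pre-processed
# angle above, so <Z_j> = 2 cos^2(theta_j) - 1 = sigma(a_j . x + b_j); the
# output weights are then fit as in random-feature regression.
rng = np.random.default_rng(0)
d, N = 2, 200                              # input dimension, number of qubits
A = 3.0 * rng.normal(size=(N, d))          # random weights a_j
b = rng.normal(size=N)                     # random biases b_j
sigma = np.tanh                            # squashing activation into [-1, 1]

def basis(X):                              # psi_j(x) for all j, simulated exactly
    theta = np.arccos(np.sqrt((1 + sigma(X @ A.T + b)) / 2))
    return 2 * np.cos(theta) ** 2 - 1      # <Z_j> = sigma_j(x)

def g(X):                                  # target continuous function
    return np.sin(2 * np.pi * X[:, 0]) * X[:, 1]

X_train = rng.uniform(size=(500, d))
w, *_ = np.linalg.lstsq(basis(X_train), g(X_train), rcond=None)

X_test = rng.uniform(size=(200, d))
print(np.abs(basis(X_test) @ w - g(X_test)).mean())   # small; shrinks as N grows
```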
Sequential scenario with a single qubit
Finite input set 𝒳 = {𝒙1, 𝒙2, … , 𝒙𝑀}
$V(\boldsymbol{x}) = e^{-\pi i \theta(\boldsymbol{x}) Y} \;\longrightarrow\; V^n(\boldsymbol{x}) = e^{-\pi i n \theta(\boldsymbol{x}) Y}$
Basis functions with the Pauli-Z:
$\psi_n(\boldsymbol{x}) = \langle 0 |\, V^{n\,\dagger}(\boldsymbol{x})\, Z\, V^n(\boldsymbol{x})\, | 0 \rangle = 2 \cos^2(\pi n \theta(\boldsymbol{x})) - 1 = \cos(2 \pi n \theta(\boldsymbol{x})) = \cos(2 \pi \{ n \theta(\boldsymbol{x}) \})$
We can use the ergodic theory of dynamical systems on the M-dimensional torus.
Lemma 3. If the real numbers $1, a_1, \dots, a_M$ are linearly independent over ℚ, then $\left\{ (\{ n a_1 \}, \dots, \{ n a_M \}) \right\}_{n \in \mathbb{N}}$ is dense in $[0, 1]^M$.
Result 3. If $1, \theta(\boldsymbol{x}_1), \dots, \theta(\boldsymbol{x}_M)$ are linearly independent over ℚ, then for any function $g: \mathcal{X} \to \mathbb{R}$ and any $\varepsilon > 0$, there exists $n \in \mathbb{N}$ such that $|f_n(\boldsymbol{x}) - g(\boldsymbol{x})| < \varepsilon$ for all $\boldsymbol{x} \in \mathcal{X}$
12/20
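A numerical illustration of the density argument (my own sketch, with assumed irrational angles, not the authors' code): for M = 2 inputs, some power n makes the single basis function hit arbitrary targets in [−1, 1].

```python
import numpy as np

# 1, sqrt(2)-1, sqrt(3)-1 are linearly independent over Q, so the pairs
# ({n theta_1}, {n theta_2}) fill the torus (Lemma 3), and some n makes
# psi_n(x_i) = cos(2 pi n theta(x_i)) approach any targets in [-1, 1].
theta = np.array([np.sqrt(2) - 1, np.sqrt(3) - 1])   # theta(x_1), theta(x_2)
g = np.array([0.3, -0.7])                            # arbitrary targets

n = np.arange(1, 500_000)[:, None]
psi = np.cos(2 * np.pi * n * theta)                  # psi_n(x_i) for every n
err = np.abs(psi - g).max(axis=1)
best = err.argmin()
print(n[best, 0], err[best])                         # some n gets within ~1e-2
```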
UAP – Approximation Rate
Classical UAP of DNNs:
$\inf_{\boldsymbol{w}} \left\| f_{\boldsymbol{w}} - g \right\| = O(K^{-\beta/d})$ (optimal rate[3])
[1] Hrushikesh N. Mhaskar, Neural networks for optimal approximation of smooth and analytic functions, Neural Computation 8(1), 164-167 (1996)
[2] Dmitry Yarotsky, Error bounds for approximations with deep ReLU networks, Neural Networks 94, 103-114 (2017)
[3] Ronald A. DeVore, Optimal nonlinear approximation, Manuscripta Mathematica 63(4), 469-478 (1989)
[Plot: approximation error 𝜀 vs. 𝐾; curves $O(K^{-\alpha_1})$ and $O(K^{-\alpha_2})$ with $\alpha_1 < \alpha_2$; the larger exponent decays faster (better rate)]
13/20
The statement above[1,2] holds for continuous activation functions (with $L = 2$) and for ReLU (with $L = O\!\left(\frac{\beta \log K}{d}\right)$), where
➢ 𝑑: input dimension
➢ 𝐾: number of network parameters
➢ 𝛽: order of the derivatives of 𝑔
➢ 𝐿: number of layers
How to describe relatively good and bad UAP?
Approximation rate = decay rate of the approximation error vs. the number of parameters (a faster decay is a better rate)
UAP – Approximation Rate
Approximation rate = decay rate of the approximation error vs.:
➢ the number K of observables
➢ the input dimension d
➢ the number N of qubits
[Plot: approximation error 𝜀 vs. 𝐾; curves $O(K^{-1/(3d)})$ and $O(K^{-1/d})$]
14/20
In the parallel scenario: $K = O(N^d) \;\Rightarrow\; \varepsilon = O(K^{-1/d})$
Result 4. If $\mathcal{X} = [0,1]^d$ and the target function $g: \mathcal{X} \to \mathbb{R}$ is Lipschitz continuous w.r.t. the Euclidean norm, we can construct an explicit form of the approximator to $g$ in the parallel scenario with $N$ qubits and error $\varepsilon = O(d^{7/6} N^{-1/3})$. Furthermore, there exists an approximator with $\varepsilon = O(d^{3/2} N^{-1})$ (the optimal rate)
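For the optimal-rate case, the conversion from a rate in N to a rate in K is a one-line check (my own restatement of the two statements above):

```latex
K = O(N^d) \;\Rightarrow\; N = \Omega\!\big(K^{1/d}\big) \;\Rightarrow\;
\varepsilon = O\!\big(d^{3/2} N^{-1}\big) = O\!\big(d^{3/2} K^{-1/d}\big).
```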
Sketch of the proof
Use multivariate Bernstein polynomials to approximate the continuous function 𝑔
Number of qubits: 𝑁 = 𝑛𝑑 (𝑑: input dimension)
How do we create these terms with observables?
15/20
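For reference, the multivariate Bernstein polynomial of a function g on [0,1]^d (the textbook definition, not a formula specific to this work) is:

```latex
B_n(g)(\boldsymbol{x}) = \sum_{k_1=0}^{n} \cdots \sum_{k_d=0}^{n}
  g\!\left(\frac{k_1}{n}, \dots, \frac{k_d}{n}\right)
  \prod_{j=1}^{d} \binom{n}{k_j}\, x_j^{k_j} (1 - x_j)^{\,n - k_j},
\qquad B_n(g) \to g \ \text{uniformly on } [0,1]^d.
```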
Consider the operators for each 𝒑 = (𝑝1, … , 𝑝𝑑)
16/20
We choose the following observables
Basis functions
17/20
Approximation rate in the parallel scenario
We evaluate the approximation rate via the approximation of a continuous function by multivariate Bernstein polynomials
18/20
Approximation rate in the parallel scenario
We use the Jackson theorem in higher dimensions, which gives quantitative information on the degree of polynomial approximation (D. Newman+1964)
Bernstein basis polynomials form a basis of the linear space of 𝑑-variate polynomials of degree less than or equal to 𝑁 (but we do not know the explicit form of the expansion)
19/20
Summary
◼ UAP via simple quantum feature maps with observables
◼ Requires a large number of qubits
◼ Other questions:
➢ Evaluate how well the approximation can perform (vs. the classical scenario)
➢ UAP with other properties of the target function: non-continuous, smoothness varying from place to place
➢ Entanglement and the UAP
➢ Approximation rate with the number of layers (such as in the data re-uploading scheme)
Thank you for listening!
20/20