Transition role of entangled data in quantum machine learning

We first recap the task of learning quantum dynamics. Let $U \in \mathrm{SU}(2^n)$ be the target unitary and $O \in \mathbb{C}^{2^n \times 2^n}$ be the observable, a Hermitian matrix acting on an $n$-qubit quantum system. Here we specify the observable as the projective measurement $O = |o\rangle\langle o|$, since any observable reads out classical information from the quantum system via its eigenvectors. The goal of quantum dynamics learning is to predict functions of the form
$$f_U(\psi) = \mathrm{Tr}\big(O\, U |\psi\rangle\langle\psi| U^{\dagger}\big),$$
(1)
where $|\psi\rangle$ is an $n$-qubit quantum state residing in a $2^n$-dimensional Hilbert space $\mathcal{H}_{\mathcal{X}}$. This task can be accomplished by using the training data $\mathcal{S}$ to construct a unitary $V_{\mathcal{S}}$, i.e., the learned hypothesis takes the form $h_{\mathcal{S}}(\psi) = \mathrm{Tr}\big(O\, V_{\mathcal{S}} |\psi\rangle\langle\psi| V_{\mathcal{S}}^{\dagger}\big)$, which is expected to accurately approximate $f_U(\psi)$ on unseen data. While the learned unitary acts on the $n$-qubit system $\mathcal{H}_{\mathcal{X}}$, the input state may be entangled with a reference system $\mathcal{H}_{\mathcal{R}}$, i.e., $|\psi\rangle \in \mathcal{H}_{\mathcal{X}} \otimes \mathcal{H}_{\mathcal{R}}$. We assume that all input states have the same Schmidt rank $r \in \{1, \cdots, 2^n\}$. Then the response of the state $|\psi_j\rangle$ is given by the measurement output $o_j = \sum_{k=1}^{m} o_{jk}/m$, where $m$ is the number of measurements and $o_{jk}$ is the outcome of the $k$-th measurement of the observable $O$ on the output quantum state $(U \otimes \mathbb{I}_{\mathcal{R}})|\psi_j\rangle$.
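As a concrete illustration of this measurement model, the following NumPy sketch (all names and dimensions are illustrative; the target is drawn Haar-randomly and the observable is the projector $O = |0\cdots 0\rangle\langle 0\cdots 0|$) estimates $f_U(\psi)$ from $m$ single-shot outcomes:

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(d, rng):
    # Haar-random unitary via QR decomposition of a complex Ginibre matrix
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

n = 3
d = 2 ** n
U = haar_unitary(d, rng)                 # target unitary
o_vec = np.zeros(d)
o_vec[0] = 1.0                           # O = |0...0><0...0|
psi = haar_unitary(d, rng)[:, 0]         # a random rank-1 (unentangled) input state

f_exact = np.abs(np.vdot(o_vec, U @ psi)) ** 2   # f_U(psi) = Tr(O U rho U^dag)

m = 2000
o_jk = rng.binomial(1, f_exact, size=m)  # m single-shot projective outcomes in {0, 1}
o_j = o_jk.mean()                        # finite-shot response o_j with E[o_j] = f_U(psi)
```

The finite-shot response `o_j` is an unbiased estimate of $f_U(\psi)$ whose statistical error shrinks as $1/\sqrt{m}$, which is exactly the resource counted by the query complexity below.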
In this way, the training data with $N$ examples take the form $\mathcal{S} = \{(|\psi_j\rangle, o_j) : |\psi_j\rangle \in \mathcal{H}_{\mathcal{X}} \otimes \mathcal{H}_{\mathcal{R}},\ \mathbb{E}[o_j] = u_j\}_{j=1}^{N}$ with $u_j = \mathrm{Tr}\big((U^{\dagger} O U \otimes \mathbb{I}_{\mathcal{R}}) |\psi_j\rangle\langle\psi_j|\big)$ being the expectation value of the observable $O$ on the state $(U \otimes \mathbb{I}_{\mathcal{R}})|\psi_j\rangle$ and $N$ being the size of the training data. Notably, in quantum dynamics learning, sample complexity refers to the size of the training data $N$, or equivalently, the number of quantum states in the training data; query complexity refers to the total number of queries to the explored quantum system, i.e., the product of the sample complexity and the number of measurements, $Nm$.

The risk function is an essential measure in statistical learning theory to quantify how well the hypothesis function $h_{\mathcal{S}}$ performs in predicting $f_U$, defined as
$$R_U(V_{\mathcal{S}}) = \int \mathrm{d}\psi \left(f_U(\psi) - h_{\mathcal{S}}(\psi)\right)^2,$$
(2)
where the integral is over the uniform Haar measure $\mathrm{d}\psi$ on the state space. Intuitively, $R_U(V_{\mathcal{S}})$ amounts to the average squared error between the true output $f_U(\psi)$ and the hypothesis output $h_{\mathcal{S}}(\psi)$. Moreover, we follow the treatment in ref. 48, choosing a Haar-random unitary as the target unitary. Additionally, we construct a sampling rule for the training input states which approximates the uniform distribution over all entangled states with Schmidt rank $r$ (refer to Supplementary Note 2).

Under the above setting, we prove the following quantum NFL theorem for learning quantum dynamics, where the formal statement and proof are deferred to Supplementary Note 3.
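The Haar integral in Eq. (2) has no closed form for a generic hypothesis, but it can be estimated by Monte Carlo averaging over Haar-random pure states. A minimal sketch (illustrative names, small dimension, projective observable):

```python
import numpy as np

rng = np.random.default_rng(1)

def haar_state(d, rng):
    # Haar-random pure state: normalized complex Gaussian vector
    z = rng.standard_normal(d) + 1j * rng.standard_normal(d)
    return z / np.linalg.norm(z)

def risk_mc(U, V, o_vec, d, n_samples, rng):
    # Monte Carlo estimate of R_U(V) = \int dpsi (f_U(psi) - h(psi))^2
    # for the projective observable O = |o><o|.
    total = 0.0
    for _ in range(n_samples):
        psi = haar_state(d, rng)
        f = np.abs(np.vdot(o_vec, U @ psi)) ** 2   # true output f_U(psi)
        h = np.abs(np.vdot(o_vec, V @ psi)) ** 2   # hypothesis output h_S(psi)
        total += (f - h) ** 2
    return total / n_samples

d = 4
o_vec = np.eye(d)[0]
U = np.linalg.qr(rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))[0]
perfect = risk_mc(U, U, o_vec, d, 200, rng)                    # V = U gives zero risk
wrong = risk_mc(U, np.eye(d, dtype=complex), o_vec, d, 200, rng)  # a mismatched V
```

A perfect hypothesis ($V_{\mathcal{S}} = U$) yields zero risk, while any mismatched unitary yields a strictly positive Monte Carlo estimate.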
Theorem 1
(Quantum NFL theorem in learning quantum dynamics, informal). Following the settings in Eq. (1), suppose that the training error of the learned hypothesis on the training data $\mathcal{S}$ is less than $\varepsilon = \mathcal{O}(1/2^n)$. Then the lower bound of the averaged prediction error in Eq. (2) yields
$$\mathbb{E}_{U,\mathcal{S}}\, R_U(V_{\mathcal{S}}) \ge \Omega\left(\frac{\tilde{\varepsilon}^2}{4^n}\left(1 - \frac{N \cdot \min\{m/(2^n r c_1),\, rn\}}{2^n c_2}\right)\right),$$
where $c_1 = 128/\tilde{\varepsilon}^2$, $c_2 = \min\{(1-2\tilde{\varepsilon})^2, (64\tilde{\varepsilon}^2 - 1)^2\}$, $\tilde{\varepsilon} = \Theta(2^n \varepsilon)$, and the expectation is taken over all target unitaries $U$, entangled states $|\psi_j\rangle$, and measurement outputs $o_j$.
The achieved results indicate the transition role of the entangled data in determining the prediction error. Particularly, when a sufficient number of measurements $m$ is allowed such that the Schmidt rank $r$ obeys $r < \sqrt{m/(c_1 2^n n)}$, the minimum term in the achieved lower bound refers to $Nrn$ and hence increasing $r$ can constantly decrease the prediction error. Accordingly, in the two extreme cases of $r = 1$ and $r = 2^n$, achieving zero averaged risk requires $N = 2^n c_2/n$ and $N = 1$ training input states, where the latter achieves an exponential reduction in the number of training data compared with the former. This observation implies that entangled data empower QML with a provable quantum advantage, which accords with the results of ref. 48 in the ideal coherent learning protocol with infinite measurements.

By contrast, in the scenario with $r \ge \sqrt{m/(c_1 2^n n)}$, increasing $r$ could enlarge the prediction error. This result indicates that entangled data can be harmful to achieving quantum advantages, which contrasts with previous results where entanglement (e.g., entangled operations or measurements) is believed to contribute to the quantum advantage48,53,54,55. This counterintuitive phenomenon stems from the fact that when incoherently learning quantum dynamics, the information obtained from each measurement decreases with increased $r$, and hence a small $m$ is incapable of extracting all the information about the target unitary carried by the entangled state.

Another implication of Theorem 1 is that although the number of measurements $m$ contributes to a small prediction error, it is not decisive for the ultimate performance. Specifically, when $m \ge r^2 c_1 2^n n$, further increasing $m$ does not help decrease the prediction error, which is then determined by the entanglement degree and the size of the training data, i.e., $r$ and $N$. Meanwhile, at least $r^2 c_1 2^n n$ measurements are required to fully utilize the power of entangled data.
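The crossover described above can be made concrete with a few lines of arithmetic. The sketch below (parameter values are illustrative, not taken from the paper) checks which branch of $\min\{m/(2^n r c_1),\, rn\}$ is active and computes the critical Schmidt rank $r^{*} = \sqrt{m/(c_1 2^n n)}$:

```python
import math

def active_branch(n, r, m, eps_tilde):
    # Which term of min{ m / (2^n r c1), r n } dominates the Theorem-1 bound,
    # with c1 = 128 / eps_tilde^2 as in Theorem 1.
    c1 = 128 / eps_tilde ** 2
    measurement_term = m / (2 ** n * r * c1)
    entanglement_term = r * n
    return "rn" if entanglement_term <= measurement_term else "measurement-limited"

n, eps_tilde, m = 4, 0.25, 10 ** 6
c1 = 128 / eps_tilde ** 2
r_crit = math.sqrt(m / (c1 * 2 ** n * n))   # critical Schmidt rank, ~2.76 here
# Below r_crit the min equals rn, so increasing the Schmidt rank lowers the
# achievable prediction error; above it, more entanglement hurts.
```

For these illustrative values, `active_branch` returns `"rn"` at $r = 1$ and flips to `"measurement-limited"` at $r = 4 > r^{*}$, mirroring the transition stated in Theorem 1.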
These results suggest that the value of $m$ should be adapted to $r$ to pursue a low prediction error.

We next examine the scenario in which the lower bound of the averaged risk in Theorem 1 reaches zero, and relate it to the results in quantum state learning and quantum dynamics learning26,27,29,30,56,57. In particular, the main focus of those studies is proving the minimum query complexity of the target unitary required to warrant zero risk. The results in Theorem 1 indicate that the minimum query complexity is $Nm = \Omega(4^n r c_1 c_2)$, implying a proportional relation between the entanglement degree $r$ and the query complexity. Notably, this lower bound is tighter than that achieved in ref. 26 in the same setting. The achieved results in terms of query complexity are also non-trivial, as previous works show that query complexity can benefit from using entanglement in quantum data58,59 and quantum measurements26,30. The advance of our results stems from the fact that ref. 26 simply employs Holevo's theorem to give an upper bound on the information extracted in a single measurement, while our bound integrates a more refined analysis, such as the consideration of the Schmidt rank $r$, the direct use of a connection between the mutual information of the target unitary $U$ and the measurement outputs $o_j$, and the KL divergence of the related distributions (refer to Supplementary Note 3 for more details). Moreover, the adopted projective measurement $O$ in Eq. (1) hints that the learning task explored in our study amounts to learning a pure state $U^{\dagger} O U$. From the perspective of state learning, the derived lower bound in Theorem 1 is optimal for nonadaptive measurements with a constant number of outcomes60.
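The proportionality between the entanglement degree and the query complexity can be sanity-checked directly from the $Nm = \Omega(4^n r c_1 c_2)$ scaling; a hedged order-of-magnitude sketch with illustrative parameter values:

```python
import math

def min_query_complexity(n, r, eps_tilde):
    # Order-of-magnitude sketch of the Theorem-1 query lower bound
    # Nm = Omega(4^n * r * c1 * c2); constants as defined in Theorem 1,
    # numerical values purely illustrative.
    c1 = 128 / eps_tilde ** 2
    c2 = min((1 - 2 * eps_tilde) ** 2, (64 * eps_tilde ** 2 - 1) ** 2)
    return 4 ** n * r * c1 * c2

# The query lower bound grows proportionally with the Schmidt rank r:
ratio = min_query_complexity(4, 8, 0.1) / min_query_complexity(4, 1, 0.1)
```

Raising $r$ from 1 to 8 scales the bound by exactly 8, which is the sense in which higher entanglement, despite helping the sample complexity, inflates the query complexity.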
Taken together, while entangled data hold the promise of gaining advantages in terms of the sample complexity for achieving the same level of prediction error, they may be inferior to training data without entanglement in terms of query complexity.

The transition role of entanglement explained above leads to the following construction rule for quantum learning models. First, when a large number of measurements is allowed, the use of entangled data is encouraged for improving the prediction performance. To this end, initial research efforts61,62,63,64,65,66, which develop effective methods for preparing and storing entangled states, may contribute to QML. Second, when the total number of measurements is limited, it is advised to refrain from using entangled data for learning quantum dynamics.

Remark. (i) The training error scaling $\varepsilon = \mathcal{O}(1/2^n)$ in Theorem 1 and the factor $\tilde{\varepsilon}^2/4^n$ in the achieved lower bound come from the consideration of average performance over Haar unitaries, where the expectation value of the observable $O$ scales as $\mathrm{Tr}(O)/2^n$ (refer to Supplementary Note 2). (ii) The results on the transition role of entangled data achieved in Theorem 1 can be generalized to mixed states, because a mixed state can be produced by taking the partial trace of a pure entangled state.

In a more generic learning setting, the observable used in the target function defined in Eq. (1) and the measurement used for collecting the response $o$ of the training data could be arbitrary and varied. In particular, we consider that the observable $O$ defined in Eq. (1) could be an arbitrary Hermitian operator satisfying $\|O\|_1 \le \infty$. The response $a_j$ for a given input state $|\psi_j\rangle$ could be obtained from measuring the output state on system $\mathcal{X}$ with an $\ell$-outcome POVM.
The training dataset in this case refers to $\mathcal{S}_{\ell} = \{(|\psi_j\rangle, a_j) : |\psi_j\rangle \in \mathcal{H}_{\mathcal{X}\mathcal{R}},\ a_j = (a_{j1}, \cdots, a_{jm}),\ a_{jk} \in \{z_1, \cdots, z_{\ell}\}\}_{j=1}^{N}$, where $|\psi_j\rangle$ refers to the entangled states with Schmidt rank $r$, $a_j$ collects the $m$ measurement outputs of the $\ell$-outcome POVM, and $\{z_i\}_{i=1}^{\ell}$ are the $\ell$ possible outcomes of the employed POVM. In this case, denoting the learned unitary as $V_{\mathcal{S}_{\ell}}$, we obtain the following quantum NFL theorem for learning quantum dynamics with generic measurements, where the formal statement and proof are deferred to Supplementary Note 4.

Theorem 2

(Quantum NFL theorem in learning quantum dynamics for generic measurements, informal). Following the settings in Eq. (1) with an arbitrary $O$ satisfying $\|O\|_1 \le \infty$, suppose the hypothesis is learned from the training data $\mathcal{S}_{\ell}$. Then the lower bound of the averaged prediction error in Eq. (2) yields
$$\mathbb{E}_{U,\mathcal{S}_{\ell}}\, R_U(V_{\mathcal{S}_{\ell}}) \ge \varepsilon^2\left(1 - \frac{N \cdot \min\{4m/r,\, 6m\ell/(2^n r),\, rn\}}{\log(|\mathcal{X}_{2\varepsilon}(O)|)}\right),$$
where $|\mathcal{X}_{2\varepsilon}(O)|$ refers to the model complexity and only depends on $\varepsilon$ and the employed observable $O$.
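To see how the POVM outcome count $\ell$ enters, one can evaluate the three-way minimum in the Theorem 2 bound directly; a hedged sketch with illustrative parameters:

```python
def info_term(n, r, m, ell):
    # The min{4m/r, 6 m ell / (2^n r), r n} term of the Theorem-2 bound;
    # larger values make the lower bound on the risk smaller.
    return min(4 * m / r, 6 * m * ell / (2 ** n * r), r * n)

n, r, m = 10, 4, 100
few_outcomes = info_term(n, r, m, ell=1)        # ell = Theta(1)
many_outcomes = info_term(n, r, m, ell=2 ** n)  # ell = Theta(2^n)
# With ell = 1 the bound is bottlenecked by 6 m ell / (2^n r); an exponentially
# large ell lifts that bottleneck and the min saturates at r n.
```

For these values, a constant-size POVM leaves the information term far below $rn$, while an exponentially large outcome set saturates it, which is the mechanism behind the exponential reduction in query complexity discussed next.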
For the projective measurement $O = |o\rangle\langle o|$, $\log(|\mathcal{X}_{2\varepsilon}(O)|) = 2^n c_2$ recovers the denominator of the achieved lower bound in Theorem 1. The achieved results in Theorem 2 deliver three implications. First, the transition role of entangled data still holds for arbitrary observables and POVMs. In particular, no matter how large the number of possible POVM outcomes $\ell$ is, increasing the Schmidt rank will decrease the prediction error as long as the number of measurements $m$ satisfies $rn \le \min\{4m/r,\, 6m\ell/(2^n r)\}$, and increase the prediction error otherwise. Second, when the observable is a projective measurement and the number of possible outcomes $\ell$ is of constant order, the result in Theorem 2 reduces to the result achieved in Theorem 1 up to a constant factor. Third, increasing the number of possible POVM outcomes $\ell$ can exponentially reduce the number of measurements required to achieve the same level of prediction error. Particularly, considering the two extreme cases of $\ell$ being of constant scaling $\Theta(1)$ and exponential scaling $\Theta(2^n)$, achieving the same level of prediction error requires query complexities of order $2^n r \log(|\mathcal{X}_{2\varepsilon}(O)|)$ and $r \log(|\mathcal{X}_{2\varepsilon}(O)|)$, respectively, where the latter case achieves an exponential reduction in the query complexity.

Numerical results

We conduct numerical simulations to exhibit the transition role of entangled data and the effects of the number of measurements and the training data size on the prediction error.
The omitted construction details and results are deferred to Supplementary Note 5.

We focus on the task of learning an $n$-qubit unitary under a fixed projective measurement $O = (|0\rangle\langle 0|)^{\otimes n}$. The number of qubits is $n = 4$. The target unitary $U_X$ is chosen uniformly from a discrete set $\{U_i\}_{i=1}^{M}$, where $M = 2^n$ refers to the set size and the operators $U_j^{\dagger} O U_j$ with $U_j$ in this set are orthogonal, so that these operators are well distinguished. The entangled states in $\mathcal{S}$ are uniformly sampled from the set $\{\sum_{j=1}^{r} \sqrt{c_j}\, U_j |0\rangle \otimes |\xi_j\rangle \,:\, (\sqrt{c_1}, \cdots, \sqrt{c_r})^{\top} \in \mathrm{SU}(r),\ |\xi_j\rangle \in \mathrm{SU}(2^n)\}$. The size of the training data is $N \in \{1, 2, \cdots, 16\}$ and the Schmidt rank takes $r \in \{2^0, \cdots, 2^4\}$. The number of measurements takes $m \in \{10, 100, 300, \cdots, 5000, 20000\}$. We record the averaged prediction error of learning four different 4-qubit unitaries over 10 training datasets.

The simulation results are displayed in Fig. 2. Particularly, Fig. 2a shows that for both cases of $N = 2$ and $N = 8$, the prediction error constantly decreases with respect to an increased number of measurements $m$ and an increased Schmidt rank $r$ when the number of measurements is large enough, namely $m > 1000$.
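A sketch of how such Schmidt-rank-$r$ inputs can be sampled numerically. This is not the paper's exact sampler: the Dirichlet choice of coefficients and the specific orthonormal bases are illustrative assumptions, but using orthonormal vectors on both subsystems guarantees the rank is exactly $r$:

```python
import numpy as np

rng = np.random.default_rng(2)

def haar_unitary(d, rng):
    # Haar-random unitary via QR of a complex Ginibre matrix
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r_mat = np.linalg.qr(z)
    return q * (np.diag(r_mat) / np.abs(np.diag(r_mat)))

def entangled_input(r, n, rng):
    # |psi> = sum_j sqrt(c_j) |e_j>_X (x) |xi_j>_R with orthonormal |e_j>, |xi_j>
    d = 2 ** n
    c = rng.dirichlet(np.ones(r))          # Schmidt coefficients, sum_j c_j = 1
    sys = np.eye(d)[:, :r]                 # orthonormal system states
    ref = haar_unitary(d, rng)[:, :r]      # orthonormal reference states
    return sum(np.sqrt(c[j]) * np.kron(sys[:, j], ref[:, j]) for j in range(r))

n, r = 2, 3
psi = entangled_input(r, n, rng)
# Schmidt rank = number of nonzero singular values of the reshaped amplitude matrix
sv = np.linalg.svd(psi.reshape(2 ** n, 2 ** n), compute_uv=False)
```

Reshaping the $4^n$-dimensional state vector into a $2^n \times 2^n$ amplitude matrix and taking its SVD recovers the Schmidt coefficients $\sqrt{c_j}$, so the number of nonzero singular values verifies the intended rank.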
On the other hand, for a small number of measurements with $m \le 100$ in the case of $N = 8$, as the Schmidt rank is gradually increased, the averaged prediction error initially decreases and then increases after the Schmidt rank surpasses a critical point, which is $r = 3$ for $m = 10$ and $r = 4$ for $m = 100$. This phenomenon accords with the theoretical results in Theorem 1 in the sense that the entangled data play a transition role in determining the prediction error for a limited number of measurements. This observation is also verified in Fig. 2b for the varying sizes of training data, where for the small number of measurements $m = 10$, increasing the Schmidt rank may not be beneficial for decreasing the prediction error. By contrast, a large training data size constantly contributes to a small prediction error, which echoes Theorem 1.

Fig. 2: Simulation results of the quantum NFL theorem when incoherently learning quantum dynamics. a The averaged prediction error with a varying number of measurements $m$ and Schmidt rank $r$ when $N = 2$ and $N = 8$. The z-axis refers to the averaged prediction error defined in Eq. (2). b The averaged prediction error with varying sizes of training data. The label "r = a & m = b" means that the Schmidt rank is $a$ and the number of measurements is $b$. The label "(× 2/4^n)" means that the plotted prediction error is normalized by a multiplicative factor of $2/4^n$.

https://www.nature.com/articles/s41467-024-47983-1