  1. Mixed states
    1. Density Matrix
    2. Schmidt Decomposition
    3. Purification
  2. Quantifying information
    1. Shannon entropy (classical)
    2. Von Neumann entropy (quantum)
  3. State distinguishability - Fidelity

Mixed states

Density Matrix

The density matrix formalism is another way of representing quantum states. Given a quantum state \( \ket{\psi} \), we define its density matrix as \( \rho = \ket{\psi}\bra{\psi} \). This formalism allows us to study not only what are called pure states but also mixed states. Pure states are the kinds of states that you study in introductory quantum mechanics, denoted by a ket vector \( \ket{\psi} \). A mixed state is a statistical ensemble of pure states, i.e. if we had states \( \ket{\psi_{i}} \), each with a (classical) probability \( p_{i} \) of occurring, then its density matrix is given by \( \rho = \sum_{i} p_{i} \ket{\psi_{i}} \bra{\psi_{i}}\ , p_{i} \geq 0\ , \sum_{i} p_{i} = 1 \). Note that these are fundamentally two different types of states; let us analyze them in the following example:

If we have the following superposition of states of a one qubit system \[ \ket{\psi} = \frac{1}{\sqrt{2} } \left( \ket{0} + \ket{1}\right)\ , \] then its corresponding density matrix is given by \[ \rho = \ket{\psi}\bra{\psi} = \frac{1}{2} \left( \ket{0}\bra{0} + \ket{0}\bra{1} + \ket{1} \bra{0} + \ket{1 } \bra{1} \right)\ , \] which can be written in matrix form as \[ \rho = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix}\ . \]

Now let us imagine that we want to prepare a statistical ensemble of a one qubit system such that the two basis states \( \ket{0} \) and \( \ket{1} \) are equiprobable. Then we would have \[ \rho = \frac{1}{2} \ket{0}\bra{0} + \frac{1}{2} \ket{1} \bra{1} = \operatorname{diag} \left( \frac{1}{2}, \frac{1}{2} \right)\ . \]

As we can see, we obtain two very different matrices! In the first case, where we have a pure state, the system is in a state such that, when observed, it has equal probability of being found in either the \( 0 \) or the \( 1 \) state. In the second case, however, the setup is very different: the system has equal probability of already being in either of those states, i.e., there is no collapse of the wave function happening here. In fact, we do not even know what the state of the system is to begin with; in that sense, for mixed states we do not have complete information about the system.
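
To make the contrast concrete, here is a minimal numerical sketch (Python with numpy is our choice of tooling here; nothing in it is specific to the formalism):

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# Pure state |psi> = (|0> + |1>)/sqrt(2) and its density matrix |psi><psi|
psi = (ket0 + ket1) / np.sqrt(2)
rho_pure = np.outer(psi, psi.conj())

# Mixed state: an equal-weight classical ensemble of |0> and |1>
rho_mixed = 0.5 * np.outer(ket0, ket0) + 0.5 * np.outer(ket1, ket1)

print(rho_pure)   # [[0.5 0.5]   off-diagonal coherences present
                  #  [0.5 0.5]]
print(rho_mixed)  # [[0.5 0. ]   diagonal: no coherences, only ignorance
                  #  [0.  0.5]]
```

The off-diagonal elements (coherences) are exactly what distinguishes the superposition from the classical mixture.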

Another way of producing mixed states is by focusing on a subsystem of a composite system. For example, assume that the Hilbert space is a tensor product of systems A and B with orthonormal basis given by \( \ket{\phi_{i}^{A}} \otimes \ket{\phi_{j}^{B}} \), and that this system is in a pure state \( \ket{\psi} = \sum_{ij} \psi_{ij} \ket{\phi_{i}^{A}} \otimes \ket{\phi_{j}^{B}} \). If we only look into subsystem A we have, for an observable \( O_{A} \) that only acts on subsystem A, that \[ \left< O_{A}\right> = \bra{\psi}O_{A}\ket{\psi} = \operatorname{Tr} \left( O_{A} \ket{\psi} \bra{\psi} \right) = \operatorname{Tr}_{A} \left(O_{A} \rho_{A}\right)\ , \] with the reduced density matrix \( \rho_{A} = \operatorname{Tr}_{B} \rho = \sum_{i i' j} \psi_{ij} \bar{\psi}_{i' j} \ket{\phi_{i}^{A}} \bra{\phi_{i'}^{A}} \). If \( \psi_{ij} = \sqrt{p_{i}} \delta_{ij} \), then \( \rho_{A} \) is diagonal.
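
Numerically, the reduced density matrix above is a one-line contraction. A sketch with numpy, using an arbitrary random state and illustrative dimensions, which also exhibits the unit trace and Hermiticity properties listed next:

```python
import numpy as np

dA, dB = 2, 3                                  # illustrative dimensions
psi = np.random.randn(dA, dB) + 1j * np.random.randn(dA, dB)
psi /= np.linalg.norm(psi)                     # amplitudes psi_ij, normalized

# rho_A[i, i'] = sum_j psi_ij conj(psi_i'j), i.e. Tr_B |psi><psi|
rho_A = np.einsum('ij,kj->ik', psi, psi.conj())

print(np.isclose(np.trace(rho_A), 1.0))        # True: unit trace
print(np.allclose(rho_A, rho_A.conj().T))      # True: Hermitian
```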

This density matrix has several properties:

  1. Hermiticity: \( \rho_{A} = \rho_{A}^{\dagger} \), which can be seen directly from the definitions of both the density matrix and the reduced one.
  2. Positivity: \( \bra{\varphi} \rho_{A} \ket{\varphi} \geq 0 \) for any \( \ket{\varphi} \); for density matrices this can also be seen directly from the definition.
  3. Unit Trace: \( \operatorname{Tr}\rho_{A} = 1 \), which also follows from the definitions of the density matrix and the reduced density matrix.

To see that the reduced density matrix also has the positivity property, see below:

Positivity of reduced density matrix
Note that \( \bra{\phi}\rho_{A}\ket{\phi} = \bra{\phi} \operatorname{Tr}_{B} \rho \ket{\phi} \). We can rewrite this as \[ \bra{\phi}\rho_{A}\ket{\phi} = \sum_{i} \left( \bra{\phi} \otimes \bra{\phi_{i}^{B}} \right) \rho \left( \ket{\phi} \otimes \ket{\phi_{i}^{B}} \right) \geq 0\ , \] where in the end we used the fact that the density matrix \( \rho \) is positive semi-definite to argue that each term of the sum must be non-negative.

The aforementioned conditions imply that any density matrix can be diagonalized in some orthonormal basis \( \ket{\tilde{\phi}_{i}} \), being given by \( \rho = \sum_{i} p_{i} \ket{\tilde{\phi}_{i}} \bra{\tilde{\phi}_{i}} \). The density matrix obtained this way is the same one that would be obtained via a statistical ensemble using this orthonormal basis of states. Note, however, that just because the density matrix is diagonal, it does not mean the system is in state \( \ket{\tilde{\phi}_{i}} \) with probability \( p_{i} \)! This is because a given mixed density matrix can be written in multiple ways, i.e. multiple ensembles can lead to the same density matrix.

There is a criterion to see whether a given density matrix represents a pure or a mixed state. We compute what is called the purity of the density matrix, given by \( \operatorname{Tr} \rho^{2} \). For pure states we can write \( \rho = \ket{\psi}\bra{\psi} \implies \rho^{2} = \rho \implies \operatorname{Tr} \rho^{2} = 1 \). For mixed states, remembering that the trace operation is basis independent, by going to the orthonormal eigenbasis of the density operator we will have \[ \operatorname{Tr} \rho^{2} = \sum_{i} \rho_{ii}^{2}\ . \] Because \( \operatorname{Tr} \rho = 1 \) and the entries on the diagonal must be non-negative (this follows from positivity), if only one entry is positive, then that entry equals unity and we have a pure state. However, if more than one entry is positive, then each such entry is less than one, and as such \( \rho_{ii}^{2} < \rho_{ii} \), which implies \( \operatorname{Tr} \rho^{2} = \sum_{i} \rho_{ii}^{2} < 1 \). Thus, we see that a pure state has purity equal to one, while a mixed state has purity strictly less than one. An equivalent criterion is to check whether \( \rho = \rho^{2} \): if so the state is pure, being mixed otherwise.
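
A quick numerical check of this criterion on the two qubit states from the earlier example (rebuilt here so the sketch is self-contained):

```python
import numpy as np

def purity(rho):
    # Tr(rho^2); take the real part to discard numerical noise
    return np.trace(rho @ rho).real

rho_pure = np.full((2, 2), 0.5)    # |+><+| from the superposition example
rho_mixed = 0.5 * np.eye(2)        # the equal-weight classical ensemble

print(purity(rho_pure))            # 1.0 -> pure
print(purity(rho_mixed))           # 0.5 -> mixed
```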

Schmidt Decomposition

The Schmidt decomposition is a procedure to obtain orthonormal bases for A and B such that a pure state of the composite system can be written as \( \ket{\psi} = \sum_{i} \sqrt{p_{i}} \ket{\tilde{\phi}_{i}^{A}} \otimes \ket{\tilde{\phi}_{i}^{B}} \). An advantage of this is that both reduced density matrices of the subsystems A and B will be diagonal. Another advantage is that this gives us a canonical basis to study entanglement between two systems.

To see how we can construct this basis, let us start with an initial state of the system given by \( \ket{\psi} = \sum_{ij} \psi_{ij} \ket{\phi_{i}^{A}} \otimes \ket{\phi_{j}^{B}} \), where the state is written in an orthonormal basis of each subsystem. We consider the amplitudes \( \psi_{ij} \) as an \( m \times n \) matrix and perform its singular value decomposition (SVD), from which we obtain \( \psi = U D V^{\dagger} \), where \( U, V \) are unitary with dimensions \( m \times m, n \times n \), respectively, and \( D \) is an \( m \times n \) diagonal matrix with non-negative entries (we have the freedom to choose this, since phases can be absorbed into \( U \) and \( V \)). Using this decomposition we thus have \[ \ket{\psi} = \sum_{ijk} U_{ik} D_{kk} V^{\dagger}_{kj} \ket{\phi^{A}_{i}} \otimes \ket{\phi_{j}^{B}} = \sum_{k} \underbrace{D_{kk}}_{\sqrt{p_{k}}} \underbrace{\left( \sum_{i} U_{ik} \ket{\phi_{i}^{A}}\right)}_{ \ket{\tilde{\phi}_{k}^{A}}} \otimes \underbrace{\left( \sum_{j} V^{\dagger}_{kj} \ket{\phi_{j}^{B}}\right)}_{\ket{\tilde{\phi}_{k}^{B}}}\ . \]

Proof of orthogonality

Take \( \ket{\tilde{\phi}_{k}^{A}} = \sum_{i} U_{ik} \ket{\phi_{i}^{A}} \); then we have that \[ \bra{\tilde{\phi}_{j}^{A}} \ket{\tilde{\phi}_{k}^{A}} = \sum_{i i'} \bra{\phi_{i'}^{A}} \ket{\phi_{i}^{A}} U^{\dagger}_{j i'}U_{i k} = \sum_{i} U^{\dagger}_{j i} U_{i k} = \left[ U^{\dagger} U \right]_{j k} = \delta_{j k}\ . \] Thus unitarity of \( U \) ensures orthonormality. The calculation is similar for the other subsystem.

One remark worth mentioning here is that the Schmidt decomposition allows one to reduce any bipartite pure state to being described by \( \min(m, n) \) non-negative real numbers, the Schmidt coefficients \( \sqrt{p_{k}} \). A numerical sketch of the procedure follows.
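
Numerically the whole construction is a thin wrapper around the SVD. Here is a sketch with numpy (the dimensions and the random state are illustrative) that rebuilds \( \ket{\psi} \) from its Schmidt form:

```python
import numpy as np

m, n = 2, 3                                    # illustrative dimensions
psi = np.random.randn(m, n) + 1j * np.random.randn(m, n)
psi /= np.linalg.norm(psi)                     # amplitudes psi_ij

U, d, Vh = np.linalg.svd(psi)                  # psi = U @ diag(d) @ Vh, d >= 0
p = d**2                                       # Schmidt probabilities p_k
print(np.isclose(p.sum(), 1.0))                # True: normalization

# |psi> = sum_k sqrt(p_k) |phi_k^A> (x) |phi_k^B>, with the basis vectors
# read off from the columns of U and the rows of Vh
recon = sum(d[k] * np.kron(U[:, k], Vh[k, :]) for k in range(min(m, n)))
print(np.allclose(recon, psi.reshape(-1)))     # True
```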

Purification

The purification of a state is the converse of the Schmidt decomposition, in the sense that we want to go from a subsystem to a pure state of a bigger composite system, i.e. find a pure state \( \ket{\psi} \) of the composite system such that, when reduced to our initial subsystem, we recover the original density matrix, \( \rho_{A} = \operatorname{Tr}_{B} \left[\ket{\psi}\bra{\psi}\right] \).

From the previous subsection we know that we can write \( \ket{\psi} \) as \[ \ket{\psi} = \sum_{i} \sqrt{p_{i}} \ket{\tilde{\phi}_{i}^{A}} \otimes \ket{\tilde{\phi}_{i}^{B}}\ . \] However, as we do not really care about subsystem B, the states \( \ket{\tilde{\phi}_{i}^{B}} \) are defined only up to an arbitrary unitary transformation, since the set of states \( U \ket{\tilde{\phi}_{i}^{B}} \) would give an equally good purification. The only condition is that the basis of states must be orthonormal.

It is of note that the dimension of \( B \) must be greater than or equal to the rank of \( \rho_{A} \); in particular, taking \( B \) with the same dimension as \( A \) always suffices [1].
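
As a sketch of the construction (the input state is illustrative, and we make the simplest choice \( \ket{\tilde{\phi}_{i}^{B}} = \ket{\tilde{\phi}_{i}^{A}} \) for the auxiliary basis):

```python
import numpy as np

rho_A = np.array([[0.75, 0.00],
                  [0.00, 0.25]])               # an illustrative mixed state

p, phi = np.linalg.eigh(rho_A)                 # eigenvalues p_i, eigenvectors

# |psi> = sum_i sqrt(p_i) |phi_i^A> (x) |phi_i^B>, with |phi_i^B> = |phi_i^A>
psi = sum(np.sqrt(p[i]) * np.kron(phi[:, i], phi[:, i]) for i in range(len(p)))

# Tracing out B must give back rho_A
psi_mat = psi.reshape(2, 2)
rho_back = np.einsum('ij,kj->ik', psi_mat, psi_mat.conj())
print(np.allclose(rho_back, rho_A))            # True
```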

Quantifying information

Shannon entropy (classical)

In the diagonal basis of the density matrix, we can interpret the diagonal elements as a classical probability distribution, since we have \( \left< A\right> = \sum_{i} p_{i} A_{i} \) with \( A_{i} = \bra{\tilde{\phi}_{i}} A \ket{\tilde{\phi}_{i}} \). A useful notion to quantify the degree of uncertainty of such a probability distribution is the Shannon entropy of a random process \( X = \left\{ x, p_{x} \right\} \), given by \[ H(X) = - \sum_{x} p_{x} \ln p_{x}\ . \]
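
As a small helper (our own, in Python), the definition translates directly; zero-probability outcomes are dropped, anticipating the \( 0 \ln 0 = 0 \) convention of property 3 below:

```python
import numpy as np

def shannon(p):
    # Shannon entropy in nats; use np.log2 instead for bits
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                   # 0 ln 0 = 0 convention
    return -np.sum(p * np.log(p))

print(shannon([0.5, 0.5]))         # ln 2 ~ 0.693: maximal for two outcomes
print(shannon([1.0, 0.0]))         # 0.0: a certain outcome carries no surprise
```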

This quantity has a remarkable number of properties:

  1. This quantity represents the average amount of information (number of bits per letter if we replace \( \ln \rightarrow \log_{2} \)) needed to encode a message in which the letter \( x \) occurs with probability \( p_{x} \).
  2. It is maximized by the uniform distribution \( q_{x} = \frac{1}{\Omega} \), with \( \Omega = \sum_{x} 1 \).
    Maximizing entropy.
    Defining \( s(p) = - p \ln p \), we have that \( s(\sum_{n} \lambda_{n} p_{n}) \geq \sum_{n} \lambda_{n} s(p_{n}) \) where \( \sum_{n} \lambda_{n} = 1, \lambda_{n} \geq 0 \); this follows from Jensen's inequality for concave functions. Applying it with \( \lambda_{n} = \frac{1}{\Omega} \) gives \( \frac{1}{\Omega} \sum_{x} s(p_{x}) \leq s\left(\frac{1}{\Omega} \sum_{x} p_{x}\right) = s\left(\frac{1}{\Omega}\right) \), and multiplying by \( \Omega \) yields \( H(X) \leq \ln \Omega \), the entropy of the uniform distribution. Another method is to use Lagrange multipliers to maximize the entropy with the restriction that \( \sum_{x} p_{x} = 1 \). For that, construct the function \[ F(p_{x}, \lambda) = - \sum_{x} p_{x} \ln p_{x} - \lambda \left( \sum_{x} p_{x} - 1 \right)\ . \] Note that \[ \frac{dF}{dp_{x}} = 0 \Leftrightarrow - \lambda - 1 - \ln p_{x} = 0 \Leftrightarrow p_{x} = e^{ - (1 + \lambda)}\ , \] and so, because \( \sum_{x} p_{x} = 1 \), we have that \( \Omega e^{ - (1 + \lambda)} = 1 \implies p_{x} = \frac{1}{\Omega}\ , \forall x \), thus proving that the uniform distribution is the one that maximizes the entropy.

  3. The entropy is stable against the addition of other outcomes with zero probability: \( H(p_{1}, \dots, p_{\Omega}) = H(p_{1}, \dots, p_{\Omega}, p_{\Omega + 1}) \) if \( p_{\Omega + 1} = 0 \). This works by defining \( 0 \ln 0 = 0 \), which is natural when one takes the limit coming from positive \( p_{x} \).
  4. Tells us that information is extensive. Let \( XY = \left\{ \left(x, y\right), p_{x,y} \right\} \) be a stochastic process. Describe an element \( \left(x, y\right) \) as being the event of an object \( x \) belonging to a box \( y \). Then \( Y = \left\{ y, p_{y} = \sum_{x} p_{x, y} \right\} \) describes the probability of any element being in box \( y \), and \( X_{y} = \left\{ x, p_{x|y} = \frac{p_{x,y}}{\sum_{x} p_{x,y}} \right\} \) the conditional distribution of objects within box \( y \).

    Then the following equality holds: \[ H(XY) = H(Y) + \sum_{y} p_{y} H(X_{y})\ , \] i.e., the entropy of the entire distribution of elements inside boxes equals the entropy of the distribution over boxes plus the average entropy of objects within each box.

  5. Information decreases ignorance, thus entropy. For the random processes \( XY, Y, X_{y} \) defined in the previous point, the average entropy over the possible values of \( y \) is given by \[ H(X|Y) = \sum_{y} p_{y} H(X_{y}) = H(XY) - H(Y) \leq H(XY)\ , \] called the conditional entropy of \( x \) given \( y \). This quantity represents the amount of information (number of bits per letter) needed to specify \( x \) once \( y \) is known. The amount of information learned about \( x \) by knowing \( y \) is given by \[ I(X;Y) = H(X) - H(X|Y) = H(X) + H(Y) - H(XY)\ , \] which is a quantity symmetric in \( x \) and \( y \) called the mutual information. If \( I(X;Y) = 0 \) then \( X \) and \( Y \) are uncorrelated, i.e. independent processes, \( p_{x,y} = p_{x} p_{y} \). These identities are easy to check numerically, as in the sketch below.
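
A sketch with an arbitrary illustrative joint distribution \( p_{x,y} \):

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

p_xy = np.array([[0.3, 0.1],
                 [0.2, 0.4]])                  # rows: x, columns: y
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

# Chain rule: H(X|Y) = H(XY) - H(Y) equals the average entropy per box
H_cond = H(p_xy) - H(p_y)
avg = sum(p_y[y] * H(p_xy[:, y] / p_y[y]) for y in range(len(p_y)))
print(np.isclose(H_cond, avg))                 # True

# Mutual information is symmetric and non-negative
print(H(p_x) + H(p_y) - H(p_xy))               # I(X;Y) >= 0
```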

Von Neumann entropy (quantum)

The Von Neumann entropy is the quantum generalization of the Shannon entropy for a density matrix \( \rho = \sum_{x} p_{x} \ket{\tilde{\phi}_{x}} \bra{\tilde{\phi}_{x}} \): \[ S(\rho) = - \operatorname{Tr} \left[ \rho \ln \rho \right] = - \sum_{x} p_{x} \ln p_{x}\ , \] where the second equality follows by noting that for \( \rho \) in its diagonal form we have \[ \rho^{n} = \sum_{x} p_{x}^{n} \ket{\tilde{\phi}_{x}}\bra{\tilde{\phi}_{x}}\ , \] and as such, analytic functions of \( \rho \) can be written as \( f(\rho) = \sum_{x} f(p_{x}) \ket{\tilde{\phi}_{x}}\bra{\tilde{\phi}_{x}} \).
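
Numerically, this reduces the computation of \( S(\rho) \) to an eigenvalue problem, as in this small sketch (the helper and the test states are our own):

```python
import numpy as np

def von_neumann(rho):
    p = np.linalg.eigvalsh(rho)    # eigenvalues of the (Hermitian) rho
    p = p[p > 1e-12]               # 0 ln 0 = 0 convention
    return -np.sum(p * np.log(p))

print(von_neumann(np.full((2, 2), 0.5)))   # ~0: the pure state |+><+|
print(von_neumann(0.5 * np.eye(2)))        # ln 2 ~ 0.693: maximally mixed qubit
```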

It also has a list of remarkable properties:

  1. \( S(\ket{\psi}\bra{\psi}) = 0 \), i.e. a pure state has no entropy.
  2. \( S(U\rho U^{\dagger}) = S(\rho) \), where \( U \) is a unitary matrix. Because \( \left(U \rho U^{\dagger}\right)^{n} = U \rho^{n} U^{\dagger} \) for any power \( n \), if you write the logarithm in its Taylor series form and use the cyclic property of the trace, every unitary matrix cancels out. This statement is intuitive because we can look at \( U \rho U^{\dagger} \) as just being a redefinition of the basis in which the density matrix is written, and the entropy should not change with it.
  3. \( S(\rho) \leq \ln d \), with \( d \) being the dimension of the Hilbert space, with equality for the maximally mixed state \( \rho = \mathbb{1}/d \).
  4. \( S\left(\sum_{n} \lambda_{n} \rho_{n}\right) \geq \sum_{n} \lambda_{n} S(\rho_{n}) \), with \( \sum_{n} \lambda_{n} = 1, \lambda_{n} \geq 0 \), by the same argument as for the Shannon entropy.
  5. \( S(\rho_{AB}) \leq S\left(\rho_{A}\right) + S(\rho_{B}) \), called subadditivity, which states that the joint state, containing the correlations, holds more information (less entropy) than the two reduced density matrices of the subsystems on their own.
  6. For pure states \( \rho_{AB} = \ket{\psi}\bra{\psi} \) we have that \( S(\rho_{A}) = S(\rho_{B}) \).
    Proof
    Let \( \ket{\psi} \) be the pure state the system is in. Performing the Schmidt decomposition of this state we arrive at the following density matrix \[ \rho = \sum_{ij} \sqrt{p_{i} p_{j}} \left( \ket{\tilde{\phi}_{i}^{A}} \otimes \ket{\tilde{\phi}_{i}^{B}} \right) \left( \bra{\tilde{\phi}_{j}^{A}} \otimes \bra{\tilde{\phi}_{j}^{B}}\right)\ , \] where the density matrix is written in an orthonormal basis of both A and B. Performing the partial trace of \( \rho \) over either B or A leads us to \begin{align} & \rho_{A} = \operatorname{Tr}_{B} \left[\rho \right] = \sum_{k} p_{k} \ket{\tilde{\phi}_{k}^{A}} \bra{\tilde{\phi}_{k}^{A}} \\ & \rho_{B} = \operatorname{Tr}_{A} \left[\rho \right] = \sum_{k} p_{k} \ket{\tilde{\phi}_{k}^{B}} \bra{\tilde{\phi}_{k}^{B}}\ , \end{align} thus, as both of the above are density matrices written in diagonal form, when we compute their Von Neumann entropies we arrive at the same result \[ S(\rho_{A}) = S(\rho_{B}) = - \sum_{k} p_{k} \ln{p_{k}}\ . \]

  7. The mutual information \( I(A;B) = S(\rho_{A}) + S(\rho_{B}) - S(\rho_{AB}) \) is non-negative.
  8. Triangle inequality \( S(\rho_{AB}) \geq \left|S(\rho_{A}) - S(\rho_{B})\right| \).
    Proof
    Let us purify the state the system AB is in by considering the system ABC. As ABC is in a pure state, we know from the previous properties that \( S(\rho_{AB}) = S(\rho_{C}) \), \( S(\rho_{A}) = S(\rho_{BC}) \) and \( S(\rho_{AC}) = S(\rho_{B}) \). As such, using subadditivity, we have \begin{align} &S(\rho_{BC}) \leq S(\rho_{B}) + S(\rho_{C}) \Leftrightarrow S(\rho_{AB}) \geq S(\rho_{A}) - S(\rho_{B}) \\ &S(\rho_{AC}) \leq S(\rho_{A}) + S(\rho_{C}) \Leftrightarrow S(\rho_{AB}) \geq S(\rho_{B}) - S(\rho_{A})\ , \end{align} thus leading to the intended result.
    This contrasts with \( H(XY) \geq H(X), H(Y) \) for the Shannon entropy, which states that the entropy of a part cannot be larger than that of the whole system. In the quantum case this is not true! For example, in a pure state we will have \( S(\rho_{AB}) = 0 \) but, in general, \( S(\rho_{A}) = S(\rho_{B}) \neq 0 \). The same can be said for the quantum conditional entropy, defined as \[ S(A|B) = S(\rho_{AB}) - S(\rho_{B})\ , \] which classically is always non-negative, expressing that information decreases ignorance. In the quantum case, however, it can be negative, as an entangled pure state shows; the sketch below makes this concrete.
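
For the Bell state \( \left(\ket{00} + \ket{11}\right)/\sqrt{2} \) the joint state is pure while either marginal is maximally mixed, so \( S(A|B) = -\ln 2 \). A sketch (the entropy helper from above is repeated for self-containment):

```python
import numpy as np

def von_neumann(rho):
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]               # 0 ln 0 = 0 convention
    return -np.sum(p * np.log(p))

bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)
rho_AB = np.outer(bell, bell)

psi = bell.reshape(2, 2)                              # amplitudes psi_ij
rho_B = np.einsum('ij,ik->jk', psi, psi.conj())       # Tr_A |psi><psi|

print(von_neumann(rho_AB))                            # ~0: pure joint state
print(von_neumann(rho_AB) - von_neumann(rho_B))       # -ln 2 < 0: S(A|B)
```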

State distinguishability - Fidelity

On the space of pure states, the similarity between two states \( \ket{\psi} \) and \( \ket{\varphi} \) can be quantified by their overlap \( \left|\bra{\psi}\ket{\varphi}\right|^2 \), also called the fidelity. For density matrices it is not so easy to come up with a generalization of this simple notion. A quantity that does the trick is \[ F(\rho, \sigma) = \left(\operatorname{Tr}\left[\sqrt{\rho^{\frac{1}{2}}\sigma\rho^{\frac{1}{2}}} \right]\right)^{2} = \left|\left|\sigma^{\frac{1}{2}}\rho^{\frac{1}{2}}\right|\right|_{1}^2\ , \] with \( \left|\left|A\right|\right|_{1} = \operatorname{Tr}\sqrt{A^{\dagger}A} \) the trace norm.
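
The formula translates directly into code; a sketch assuming scipy for the matrix square root (the test states are illustrative):

```python
import numpy as np
from scipy.linalg import sqrtm

def fidelity(rho, sigma):
    # F(rho, sigma) = (Tr sqrt(sqrt(rho) sigma sqrt(rho)))^2
    s = sqrtm(rho)
    return np.real(np.trace(sqrtm(s @ sigma @ s)))**2

rho = np.full((2, 2), 0.5)         # |+><+|
sigma = 0.5 * np.eye(2)            # maximally mixed qubit

print(fidelity(rho, rho))          # 1.0: identical states
print(fidelity(rho, sigma))        # 0.5: reduces to <+|sigma|+> for pure rho
```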

How to arrive at the fidelity?

Imagine we want to calculate the fidelity between two density matrices, \( \rho, \sigma \). As we know how to do this for pure states, one possible idea is to perform the purification of both density matrices and then calculate the fidelity of those pure states. Writing \( \rho = \sum_{i} p_{i} \ket{\phi_{i}^{A}}\bra{\phi_{i}^{A}} \) and \( \sigma = \sum_{i} q_{i} \ket{\tilde{\phi}_{i}^{A}}\bra{\tilde{\phi}_{i}^{A}} \) in their eigenbases, the purifications read \begin{align} &\rho \rightarrow \ket{\psi_{\rho}(U)} = \sum_{i} \sqrt{p_{i}} \ket{{\phi}_{i}^{A}} \otimes U \ket{{\phi}_{i}^{B}} = \left(\rho^{\frac{1}{2}} \otimes U \right) \sum_{i} \ket{{\phi}_{i}^{A}} \otimes \ket{{\phi}_{i}^{B}} \\ &\sigma \rightarrow \ket{\psi_{\sigma}(U')} = \sum_{i} \sqrt{q_{i}} \ket{\tilde{\phi}_{i}^{A}} \otimes U' \ket{\tilde{\phi}_{i}^{B}} = \left(\sigma^{\frac{1}{2}} \otimes U'\right) \sum_{i} \ket{\tilde{\phi}_{i}^{A}} \otimes \ket{\tilde{\phi}_{i}^{B}}\ , \end{align} where \( U, U' \) are two unitary transformations of the eigenbasis of the auxiliary space B, which we have the freedom to set during the purification process.

Afterwards, we calculate the usual fidelity between the purified states and maximize it over the unitaries \( U, U' \); the maximal overlap \( \left|\bra{\psi_{\rho}(U)}\ket{\psi_{\sigma}(U')}\right|^{2} \) reproduces \( F(\rho, \sigma) \) above (this is the content of Uhlmann's theorem).

The fidelity has a number of interesting properties.



Footnotes


[1] If it were smaller, there would not be enough orthonormal states \( \ket{\tilde{\phi}_{i}^{B}} \) in B to pair with every nonzero Schmidt coefficient, so the state could not be purified.