Some Practical Results and Derivations in Matrix Calculus

断了今生、忘了曾经, submitted on 2019-12-26 23:06:56


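In optimization problems we often have to choose a vector or a matrix that optimizes some objective function. Solving such a problem analytically requires computing the derivative of the objective with respect to that vector or matrix correctly. For example, estimating the coefficients of a multiple regression model by least squares means solving:
$$\min_{\beta} Q = (Y - X\beta)^T (Y - X\beta)$$
Textbooks and papers, however, tend to look up or derive whatever they need on the spot, and systematic summaries of how to differentiate an objective with respect to a vector or matrix are rare. This post collects the common vector and matrix operations and shows how to differentiate each of them. The Kronecker (tensor) product and the vec operator are enough for most applications, so tensors and tensor analysis are not covered. For practical reasons, all vectors and matrices considered here have real entries.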

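Operations on vectors and matrices

Addition and subtraction are simple enough to skip. Vectors are column vectors by default. Apart from the trace, the determinant, and operations built on the determinant, which need a square matrix, the operations below also apply to non-square matrices.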

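Unary operations on matrices

The vec operator

The vec operator (Matrix Vec Operator) stacks the columns of a matrix into a single column vector. For $A \in \mathbb{R}^{m \times n}$ with columns $A_{.1}, \dots, A_{.n}$:
$$\mathrm{vec}(A) = \left[ \begin{matrix} A_{.1} \\ A_{.2} \\ \vdots \\ A_{.n-1} \\ A_{.n} \end{matrix} \right] \in \mathbb{R}^{mn}$$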
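As a small illustration, the column stacking can be sketched in NumPy. The helper name `vec` is an assumption of this sketch, not part of any library; it relies on column-major (`order="F"`) reshaping.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one 1-D array (column-major order)."""
    return np.asarray(A).reshape(-1, order="F")

A = np.array([[1, 2, 3],
              [4, 5, 6]])
v = vec(A)  # columns (1, 4), (2, 5), (3, 6) stacked top to bottom
```

The default row-major reshape would stack rows instead, which corresponds to vec of the transpose.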

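Trace

The trace of a square matrix $A \in \mathbb{R}^{n \times n}$ is written $\mathrm{tr}(A)$. In the Einstein summation convention, where a repeated index is summed over, $\mathrm{tr}(A) = \sum_{i=1}^{n} A_{ii} = A_{ii}$.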

Determinant

Define a permutation of $n$ elements:
$$\sigma = \left[ \begin{matrix} 1 & 2 & \cdots & n-1 & n \\ \sigma(1) & \sigma(2) & \cdots & \sigma(n-1) & \sigma(n) \end{matrix} \right]$$
It rearranges the indices $1,2,\dots,n$. The sign of a permutation depends on how many swaps of two elements are needed to restore the order $1,2,\dots,n$: $\mathrm{sgn}(\sigma) = +1$ if that number is even and $-1$ if it is odd. Let $P_n$ be the set of all permutations of $n$ elements. The determinant of $A$ is defined as:
$$\det(A) = |A| = \sum_{\sigma \in P_n} \mathrm{sgn}(\sigma) A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)}$$
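The permutation sum can be evaluated literally for small matrices. The sketch below is only an illustration of the definition (the function names `perm_sign` and `det_leibniz` are made up here); it computes the sign from the parity of the inversion count, which agrees with the parity of the number of swaps, and it costs $O(n!)$ operations.

```python
import itertools
import numpy as np

def perm_sign(perm):
    """Sign of a permutation given as a tuple of 0-based images: +1 if even, -1 if odd."""
    inversions = sum(
        1
        for a in range(len(perm))
        for b in range(a + 1, len(perm))
        if perm[a] > perm[b]
    )
    return -1 if inversions % 2 else 1

def det_leibniz(A):
    """Determinant as the signed sum over all permutations (small n only)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    total = 0.0
    for perm in itertools.permutations(range(n)):
        term = float(perm_sign(perm))
        for row, col in enumerate(perm):
            term *= A[row, col]  # factor A[row, sigma(row)]
        total += term
    return total

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
d = det_leibniz(A)
```

For this matrix the sum agrees with `np.linalg.det(A)` up to rounding.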

Adjugate matrix and matrix inverse

The adjugate of $A$ is written $\mathrm{adj}(A)$ or $A^*$, with $(A^*)_{ij} = M_{ji}$, where $M_{ji}$ is the cofactor of the entry $A_{ji}$:
$$M_{ji} = \sum_{\sigma \in P_n, \sigma(j)=i} \mathrm{sgn}(\sigma) A_{1\sigma(1)} \cdots A_{j-1,\sigma(j-1)} A_{j+1,\sigma(j+1)} \cdots A_{n\sigma(n)}$$
One result follows directly. Split the terms of the determinant into those that contain $A_{ij}$ and those that do not:
$$|A| = \sum_{\sigma \in P_n, \sigma(i)=j} \mathrm{sgn}(\sigma) A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)} + \sum_{\sigma \in P_n, \sigma(i) \ne j} \mathrm{sgn}(\sigma) A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)}$$
Every term of the first sum is linear in $A_{ij}$ and no term of the second sum depends on it, so the derivative of the determinant with respect to $A_{ij}$ is
$$\frac{\partial |A|}{\partial A_{ij}} = \sum_{\sigma \in P_n, \sigma(i)=j} \mathrm{sgn}(\sigma) A_{1\sigma(1)} \cdots A_{i-1,\sigma(i-1)} A_{i+1,\sigma(i+1)} \cdots A_{n\sigma(n)} = M_{ij}$$
When $|A| \ne 0$, the inverse of the matrix is
$$A^{-1} = \frac{A^*}{|A|}$$

Binary operations on matrices

Matrix multiplication

The matrix product is built from the products of the rows of the first factor with the columns of the second. For $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{n\times m}$, with $A_{i.}$ the $i$-th row of $A$ and $B_{.j}$ the $j$-th column of $B$:
$$AB = \left[ \begin{matrix} A_{1.}B_{.1} & A_{1.}B_{.2} & \cdots & A_{1.}B_{.m} \\ A_{2.}B_{.1} & A_{2.}B_{.2} & \cdots & A_{2.}B_{.m} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m.}B_{.1} & A_{m.}B_{.2} & \cdots & A_{m.}B_{.m} \end{matrix} \right]$$
With this definition one can verify that $AA^* = |A| I_n$ for a square matrix $A$, so the inverse given above does satisfy the definition of an inverse.
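The identity $AA^* = |A| I_n$ is easy to check numerically. The sketch below builds the adjugate from signed minors, which is equivalent to the permutation-sum cofactor used above; the function name `adjugate` is an assumption of this example.

```python
import numpy as np

def adjugate(A):
    """Adjugate A*: entry (i, j) is the cofactor of A[j, i]."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    adj = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # Delete row j and column i, then apply the sign (-1)^(i+j).
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
            adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
A_star = adjugate(A)
det_A = np.linalg.det(A)
product = A @ A_star           # should equal det_A * I
inverse = A_star / det_A       # should equal the usual inverse
```

In practice `np.linalg.inv` or a linear solve is preferable; the adjugate route is shown only because it mirrors the formula.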

Hadamard product

The Hadamard product is another widely used operation. It multiplies two matrices of the same shape entry by entry and keeps that shape. For $A, B \in \mathbb{R}^{n \times m}$:
$$A\circ B = \left[ \begin{matrix} A_{11}B_{11} & A_{12}B_{12} & \cdots & A_{1m}B_{1m} \\ A_{21}B_{21} & A_{22}B_{22} & \cdots & A_{2m}B_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1}B_{n1} & A_{n2}B_{n2} & \cdots & A_{nm}B_{nm} \end{matrix} \right]$$

Kronecker product

The Kronecker product places no requirement on the shapes of the matrices. For $A \in \mathbb{R}^{n\times m}$ and $B \in \mathbb{R}^{p\times q}$, $A \otimes B \in \mathbb{R}^{np \times mq}$:
$$A\otimes B = \left[ \begin{matrix} A_{11}B & A_{12}B & \cdots & A_{1m}B \\ A_{21}B & A_{22}B & \cdots & A_{2m}B \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1}B & A_{n2}B & \cdots & A_{nm}B \end{matrix} \right]$$
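Both products map directly onto NumPy, which may help when checking shapes. This is a minimal sketch with arbitrary example matrices: `*` is the entrywise (Hadamard) product and `np.kron` is the Kronecker product.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])

hadamard = A * B        # entrywise product, same shape as A and B
kron = np.kron(A, B)    # block (i, j) of the result is A[i, j] * B

C = np.ones((2, 3))     # shapes need not match for the Kronecker product
D = np.ones((4, 5))
kron_shape = np.kron(C, D).shape
```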

Derivative of a scalar with respect to a vector

Derivative of a scalar with respect to a column vector

For $x \in \mathbb{R}^n$ and $y \in \mathbb{R}$,
$$\frac{\partial y}{\partial x} = \left[ \begin{matrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \cdots & \frac{\partial y}{\partial x_{n-1}} & \frac{\partial y}{\partial x_n} \end{matrix} \right]^T$$
Extending this, for $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^n$ the derivative of a column vector with respect to a column vector is the matrix whose $(i,j)$ entry is $\frac{\partial y_i}{\partial x_j}$:
$$\frac{\partial y}{\partial x} = \left[ \begin{matrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_n}{\partial x_1} & \frac{\partial y_n}{\partial x_2} & \cdots & \frac{\partial y_n}{\partial x_n} \end{matrix} \right]$$
Clearly $\frac{\partial y}{\partial y} = I_n$.
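A finite-difference approximation is a convenient way to sanity-check any of the vector derivatives in this post. The sketch below assumes the layout just described, entry $(i,j) = \partial y_i / \partial x_j$; the function `numerical_jacobian` and the test map `f` are hypothetical names chosen for illustration.

```python
import numpy as np

def numerical_jacobian(f, x, h=1e-6):
    """Central-difference Jacobian with entry (i, j) = d f_i / d x_j."""
    x = np.asarray(x, dtype=float)
    y0 = np.asarray(f(x), dtype=float)
    J = np.empty((y0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        J[:, j] = (np.asarray(f(x + step)) - np.asarray(f(x - step))) / (2 * h)
    return J

def f(x):
    # Example map R^2 -> R^2, chosen only for illustration.
    return np.array([x[0] * x[1], np.sin(x[0])])

x0 = np.array([0.5, 2.0])
J = numerical_jacobian(f, x0)
J_exact = np.array([[x0[1], x0[0]],
                    [np.cos(x0[0]), 0.0]])
J_identity = numerical_jacobian(lambda y: y, x0)  # derivative of y w.r.t. y
```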

Derivative of the inner product

For $x \in \mathbb{R}^n, y\in \mathbb{R}^n, z\in \mathbb{R}^n$,
$$\frac{\partial (y,z)}{\partial x} = \left[ \begin{matrix} \frac{\partial (y,z)}{\partial x_1} & \frac{\partial (y,z)}{\partial x_2} & \cdots & \frac{\partial (y,z)}{\partial x_{n-1}} & \frac{\partial (y,z)}{\partial x_n} \end{matrix} \right]^T$$
where
$$\frac{\partial (y,z)}{\partial x_{k}} = \frac{\partial y^i z_i}{\partial x_{k}} = \frac{\partial y^i}{\partial x_{k}} z_i + y^i \frac{\partial z_i}{\partial x_{k}}$$
If $x=y$ and $z$ is constant, then
$$\frac{\partial (y,z)}{\partial y_{k}} = \frac{\partial y^i}{\partial y_{k}} z_i + y^i \frac{\partial z_i}{\partial y_{k}} = \delta_k^i z_i = z_k$$
$$\frac{\partial (y,z)}{\partial y} = \left[ \begin{matrix} z_1 & z_2 & \cdots & z_{n-1} & z_n \end{matrix} \right]^T = z$$
If $x=y=z$, then
$$\frac{\partial (y,y)}{\partial y_{k}} = \frac{\partial y^i}{\partial y_{k}} y_i + y^i \frac{\partial y_i}{\partial y_{k}} = \delta_k^i y_i + y^i \delta_{ik} = 2y_k$$
$$\frac{\partial (y,y)}{\partial y} = 2\left[ \begin{matrix} y_1 & y_2 & \cdots & y_{n-1} & y_n \end{matrix} \right]^T = 2y$$
In general, $\frac{\partial y^i}{\partial x_k} z_i$ is the $k$-th component of $\left(\frac{\partial y}{\partial x}\right)^T z$, because the $(i,k)$ entry of $\frac{\partial y}{\partial x}$ is $\frac{\partial y_i}{\partial x_k}$. The general result is therefore
$$\begin{aligned} \frac{\partial (y,z)}{\partial x} &= \left[ \begin{matrix} \frac{\partial y^i}{\partial x_1} z_i & \frac{\partial y^i}{\partial x_2} z_i & \cdots & \frac{\partial y^i}{\partial x_n} z_i \end{matrix} \right]^T + \left[ \begin{matrix} y_i \frac{\partial z^i}{\partial x_1} & y_i \frac{\partial z^i}{\partial x_2} & \cdots & y_i \frac{\partial z^i}{\partial x_n} \end{matrix} \right]^T \\ &= \left(\frac{\partial y}{\partial x}\right)^T z + \left(\frac{\partial z}{\partial x}\right)^T y \end{aligned}$$
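The general rule can be checked numerically. In the sketch below $y = Px$ and $z = Qx$ for arbitrary matrices `P` and `Q` (assumptions of this example), so the Jacobians are $P$ and $Q$. With the Jacobian laid out as entry $(i,j) = \partial y_i / \partial x_j$, the column gradient of $(y,z)$ is $P^T z + Q^T y$, and the code compares that with central differences.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
P = rng.standard_normal((n, n))
Q = rng.standard_normal((n, n))
x = rng.standard_normal(n)

def inner(x):
    # (y, z) with y = P x and z = Q x
    return (P @ x) @ (Q @ x)

# Jacobians are dy/dx = P and dz/dx = Q, so the gradient is P^T z + Q^T y.
grad_formula = P.T @ (Q @ x) + Q.T @ (P @ x)

h = 1e-6
grad_numeric = np.array([
    (inner(x + h * e) - inner(x - h * e)) / (2 * h)
    for e in np.eye(n)
])
```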

Derivative of a matrix-vector product

Let $A \in \mathbb{R}^{m \times n}$ and $x \in \mathbb{R}^n$, and write $y=Ax$. Then
$$\frac{\partial Ax}{\partial x} = \left[ \begin{matrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{matrix} \right]$$
where
$$\frac{\partial y_i}{\partial x_j} = \frac{\partial A_{ik}x^k}{\partial x_j} = A_{ij}$$
so $\frac{\partial Ax}{\partial x} = A$.

Derivative of a quadratic form

Consider the bilinear form $x^TAy$ with $x \in \mathbb{R}^m, A \in \mathbb{R}^{m \times n}, y \in \mathbb{R}^n$, and write $u=Ay$, $v^T=x^TA$. Then
$$\frac{\partial x^TAy}{\partial x} = \frac{\partial (x,u)}{\partial x} = u = Ay$$
$$\frac{\partial x^TAy}{\partial y} = \frac{\partial (v,y)}{\partial y} = v = A^Tx$$
If $A$ is square and $x=y$, both factors of the inner product depend on $x$, and
$$\frac{\partial x^TAx}{\partial x} = \frac{\partial (x,u)}{\partial x} = \left(\frac{\partial x}{\partial x}\right)^T u + \left(\frac{\partial Ax}{\partial x}\right)^T x = I_n Ax + A^T x = (A + A^T)x$$
which reduces to $2Ax$ when $A$ is symmetric.
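A quick numerical check makes the role of symmetry visible. The sketch below uses a small non-symmetric matrix chosen for illustration: the finite-difference gradient of $x^TAx$ matches $(A + A^T)x$, and it matches $2Ax$ only after $A$ is replaced by its symmetric part.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])     # deliberately not symmetric
x = np.array([1.0, 1.0])

def q(x):
    return x @ A @ x           # the quadratic form x^T A x

h = 1e-6
grad_numeric = np.array([
    (q(x + h * e) - q(x - h * e)) / (2 * h)
    for e in np.eye(2)
])
grad_general = (A + A.T) @ x   # valid for any square A
S = (A + A.T) / 2              # symmetric part of A, same quadratic form
grad_symmetric = 2 * S @ x
grad_naive = 2 * A @ x         # only correct when A is symmetric
```

Here `grad_general` is `[4, 8]` while `grad_naive` is `[6, 6]`, so the two differ for this `A`.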

Derivative of a matrix with respect to a matrix

Derivative of a scalar with respect to a matrix

For $a \in \mathbb{R}$ and $X \in \mathbb{R}^{m \times n}$, define
$$\frac{\partial a}{\partial X} = \left[ \begin{matrix} \frac{\partial a}{\partial X_{11}} & \frac{\partial a}{\partial X_{12}} & \cdots & \frac{\partial a}{\partial X_{1n}} \\ \frac{\partial a}{\partial X_{21}} & \frac{\partial a}{\partial X_{22}} & \cdots & \frac{\partial a}{\partial X_{2n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial a}{\partial X_{m1}} & \frac{\partial a}{\partial X_{m2}} & \cdots & \frac{\partial a}{\partial X_{mn}} \end{matrix} \right]$$
If $a=X_{ij}$, then
$$\frac{\partial X_{ij}}{\partial X} = E_{ij}$$
where $E_{ij}$ is the matrix whose entry in row $i$, column $j$ is 1 and whose other entries are all 0. Similarly, for $Y \in \mathbb{R}^{p \times q}$ and $X \in \mathbb{R}^{m \times n}$ we can define the derivative of a matrix with respect to a matrix as the block matrix
$$\frac{\partial Y}{\partial X} = \left[ \begin{matrix} \frac{\partial Y_{11}}{\partial X} & \frac{\partial Y_{12}}{\partial X} & \cdots & \frac{\partial Y_{1q}}{\partial X} \\ \frac{\partial Y_{21}}{\partial X} & \frac{\partial Y_{22}}{\partial X} & \cdots & \frac{\partial Y_{2q}}{\partial X} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial Y_{p1}}{\partial X} & \frac{\partial Y_{p2}}{\partial X} & \cdots & \frac{\partial Y_{pq}}{\partial X} \end{matrix} \right] \in \mathbb{R}^{pm \times qn}$$
If $Y=X$,
$$\frac{\partial X}{\partial X} = \left[ \begin{matrix} E_{11} & E_{12} & \cdots & E_{1n} \\ E_{21} & E_{22} & \cdots & E_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ E_{m1} & E_{m2} & \cdots & E_{mn} \end{matrix} \right]$$
Call this matrix $E \in \mathbb{R}^{m^2 \times n^2}$. The derivative of the transpose is computed the same way: the $(i,j)$ block of $\frac{\partial X^T}{\partial X}$ is $\frac{\partial X_{ji}}{\partial X} = E_{ji}$, so
$$\frac{\partial X^T}{\partial X} = \left[ \begin{matrix} E_{11} & E_{21} & \cdots & E_{m1} \\ E_{12} & E_{22} & \cdots & E_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ E_{1n} & E_{2n} & \cdots & E_{mn} \end{matrix} \right] \in \mathbb{R}^{nm \times mn}$$
This is $E$ with its grid of blocks transposed (block $(i,j)$ exchanged with block $(j,i)$, each block left as it is). It differs in general from the entrywise transpose $E^T$.
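The two block matrices can be built explicitly to see how they differ. In this sketch the helper names `unit`, `dX_dX`, `dXT_dX` and `vec` are assumptions made for illustration. For a square $X$, the matrix $E$ equals $\mathrm{vec}(I)\,\mathrm{vec}(I)^T$, while the derivative of $X^T$ is the permutation that sends $\mathrm{vec}(A)$ to $\mathrm{vec}(A^T)$.

```python
import numpy as np

def unit(m, n, i, j):
    """E_ij: the m-by-n matrix with a single 1 at row i, column j (0-based)."""
    M = np.zeros((m, n))
    M[i, j] = 1.0
    return M

def dX_dX(m, n):
    """E: an m-by-n grid of blocks whose (i, j) block is E_ij."""
    return np.block([[unit(m, n, i, j) for j in range(n)] for i in range(m)])

def dXT_dX(m, n):
    """Derivative of X^T: an n-by-m grid whose (i, j) block is E_ji."""
    return np.block([[unit(m, n, j, i) for j in range(m)] for i in range(n)])

def vec(M):
    return np.asarray(M).reshape(-1, order="F")

E = dX_dX(2, 3)      # shape (2*2, 3*3)
K = dXT_dX(2, 3)     # shape (3*2, 2*3)

# Square case, where both matrices are 4-by-4 and can be compared directly.
E_sq = dX_dX(2, 2)
K_sq = dXT_dX(2, 2)
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
swapped = K_sq @ vec(A)   # equals vec(A^T)
```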

Derivatives of unary matrix operations

Derivative of the vec operator

For $X \in \mathbb{R}^{m \times n}$,
$$\frac{\partial \mathrm{vec}(X)}{\partial X} = \frac{\partial }{\partial X} \left[ \begin{matrix} X_{.1} \\ X_{.2} \\ \vdots \\ X_{.n-1} \\ X_{.n} \end{matrix} \right]$$
where
$$\frac{\partial X_{.j}}{\partial X} = \frac{\partial }{\partial X} \left[ \begin{matrix} X_{1j} \\ X_{2j} \\ \vdots \\ X_{(m-1)j} \\ X_{mj} \end{matrix} \right] = \left[ \begin{matrix} E_{1j} \\ E_{2j} \\ \vdots \\ E_{(m-1)j} \\ E_{mj} \end{matrix} \right] = E_{.j}$$
so
$$\frac{\partial \mathrm{vec}(X)}{\partial X} = \left[ \begin{matrix} E_{.1} \\ E_{.2} \\ \vdots \\ E_{.n-1} \\ E_{.n} \end{matrix} \right]$$

Derivative of the trace

Let $a=\mathrm{tr}(X)$, where $X$ is a square matrix of order $n$. Then
$$\frac{\partial a}{\partial X} = \left[ \begin{matrix} \frac{\partial a}{\partial X_{11}} & \frac{\partial a}{\partial X_{12}} & \cdots & \frac{\partial a}{\partial X_{1n}} \\ \frac{\partial a}{\partial X_{21}} & \frac{\partial a}{\partial X_{22}} & \cdots & \frac{\partial a}{\partial X_{2n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial a}{\partial X_{n1}} & \frac{\partial a}{\partial X_{n2}} & \cdots & \frac{\partial a}{\partial X_{nn}} \end{matrix} \right]$$
where
$$\frac{\partial a}{\partial X_{ij}} = \frac{\partial \mathrm{tr}(X)}{\partial X_{ij}} = \frac{\partial X_{kk}}{\partial X_{ij}} = \delta_{ij}$$
so
$$\frac{\partial \mathrm{tr}(X)}{\partial X} = I_n$$
Interestingly, $\mathrm{tr}(X)=I_n:X$, where the double dot product is taken as $A:B = A_{ij}B_{ji}$, so differentiating a double dot product looks much like differentiating a constant times a variable. We can check whether the analogy really holds. Consider
$$a=\mathrm{tr}(XY)=X:Y$$
As before, $\frac{\partial a}{\partial X}$ is the $n \times n$ matrix whose $(i,j)$ entry is $\frac{\partial a}{\partial X_{ij}}$, where
$$\frac{\partial a}{\partial X_{ij}} = \frac{\partial \mathrm{tr}(XY)}{\partial X_{ij}} = \frac{\partial X_{k.}Y_{.k}}{\partial X_{ij}} = Y_{ji}+X_{k.}\frac{\partial Y_{.k}}{\partial X_{ij}}$$
If $Y$ is constant, the second term vanishes and
$$\frac{\partial \mathrm{tr}(XY)}{\partial X} = Y^T$$
Similarly,
$$\frac{\partial \mathrm{tr}(XY)}{\partial Y} = X^T$$
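The three trace results above can be checked by finite differences. The helper `scalar_grad` below is a hypothetical name for this sketch; it fills the matrix of partial derivatives entry by entry, in the same layout as the scalar-by-matrix derivative defined earlier.

```python
import numpy as np

def scalar_grad(f, X, h=1e-6):
    """Central-difference matrix of partials: entry (i, j) = d f / d X_ij."""
    X = np.asarray(X, dtype=float)
    G = np.empty_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            step = np.zeros_like(X)
            step[i, j] = h
            G[i, j] = (f(X + step) - f(X - step)) / (2 * h)
    return G

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 3))
Y = rng.standard_normal((3, 3))

grad_trace = scalar_grad(np.trace, X)                    # expect I_3
grad_wrt_X = scalar_grad(lambda M: np.trace(M @ Y), X)   # expect Y^T
grad_wrt_Y = scalar_grad(lambda M: np.trace(X @ M), Y)   # expect X^T
```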

Derivative of the determinant

We have already derived
$$\frac{\partial |A|}{\partial A_{ij}} = M_{ij} = (A^*)_{ji}$$
so
$$\frac{\partial |A|}{\partial A} = (A^*)^T$$
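For an invertible matrix, $A^* = |A| A^{-1}$, so the result can be written as $|A| (A^{-1})^T$ and compared with finite differences. The sketch below uses an arbitrary example matrix, and the function name `det_grad_numeric` is an assumption of this example.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

def det_grad_numeric(A, h=1e-6):
    """Central-difference matrix with entry (i, j) = d det(A) / d A_ij."""
    G = np.empty_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            step = np.zeros_like(A)
            step[i, j] = h
            G[i, j] = (np.linalg.det(A + step) - np.linalg.det(A - step)) / (2 * h)
    return G

grad_numeric = det_grad_numeric(A)
# (A*)^T written via the inverse; valid only when det(A) != 0.
grad_formula = np.linalg.det(A) * np.linalg.inv(A).T
```

The entry in the first row and first column is the cofactor $3 \cdot 4 - 1 \cdot 1 = 11$.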

Derivatives of binary matrix operations

Derivative of the matrix product

As before, let $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{n\times m}$. Suppose $A$ is constant:
$$\frac{\partial AB}{\partial B} = \frac{\partial }{\partial B} \left[ \begin{matrix} A_{1.}B_{.1} & A_{1.}B_{.2} & \cdots & A_{1.}B_{.m} \\ A_{2.}B_{.1} & A_{2.}B_{.2} & \cdots & A_{2.}B_{.m} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m.}B_{.1} & A_{m.}B_{.2} & \cdots & A_{m.}B_{.m} \end{matrix} \right]$$
Let $l$ denote the all-ones column vector, $J$ the all-ones matrix, and $e_j$ the $j$-th standard basis column vector. Each block is
$$\frac{\partial A_{i.}B_{.j}}{\partial B} = \frac{\partial A_{ik}B_{kj}}{\partial B} = e_j^T \otimes A_{i.}^T$$
that is, the $n \times m$ matrix whose $j$-th column is $A_{i.}^T$ and whose other columns are zero. Therefore
$$\frac{\partial AB}{\partial B} = \mathrm{vec}(I_m)^T \otimes \mathrm{vec}(A^T)$$

Suppose $B$ is constant:
$$\frac{\partial AB}{\partial A} = \frac{\partial }{\partial A} \left[ \begin{matrix} A_{1.}B_{.1} & A_{1.}B_{.2} & \cdots & A_{1.}B_{.m} \\ A_{2.}B_{.1} & A_{2.}B_{.2} & \cdots & A_{2.}B_{.m} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m.}B_{.1} & A_{m.}B_{.2} & \cdots & A_{m.}B_{.m} \end{matrix} \right]$$
where
$$\frac{\partial A_{i.}B_{.j}}{\partial A} = \frac{\partial A_{ik}B_{kj}}{\partial A} = e_i \otimes B_{.j}^T$$
that is, the $m \times n$ matrix whose $i$-th row is $B_{.j}^T$ and whose other rows are zero. Therefore
$$\frac{\partial AB}{\partial A} = \mathrm{vec}(I_m) \otimes \mathrm{vec}(B)^T$$
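Block derivatives like these are easy to get wrong by a transpose, so a numerical check is worthwhile. The sketch below assumes the block layout defined earlier, where block $(r,c)$ of $\partial F / \partial X$ is the matrix $\partial F_{rc} / \partial X$; the helper `block_derivative` is a hypothetical name. For $A$ of shape $(m, n)$ and $B$ of shape $(n, m)$, the identity matrix in the closed forms has order $m$, and both derivatives come out as rank-one Kronecker products.

```python
import numpy as np

def block_derivative(F, X, h=1e-6):
    """dF(X)/dX in block layout: block (r, c) is the matrix d F_rc / dX."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    p, q = F(X).shape
    D = np.zeros((p * m, q * n))
    for k in range(m):
        for l in range(n):
            step = np.zeros_like(X)
            step[k, l] = h
            dF = (F(X + step) - F(X - step)) / (2 * h)  # d F / d X_kl, shape (p, q)
            D[k::m, l::n] = dF                          # entry (k, l) of every block
    return D

def vec(M):
    return np.asarray(M).reshape(-1, order="F")

rng = np.random.default_rng(3)
m, n = 2, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))

dAB_dB = block_derivative(lambda M: A @ M, B)   # shape (m*n, m*m)
dAB_dA = block_derivative(lambda M: M @ B, A)   # shape (m*m, m*n)

I_m = np.eye(m)
# vec(I_m)^T (x) vec(A^T)  and  vec(I_m) (x) vec(B)^T
formula_dB = np.kron(vec(I_m)[None, :], vec(A.T)[:, None])
formula_dA = np.kron(vec(I_m)[:, None], vec(B)[None, :])
```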

Derivative of the Hadamard product

For $A, B \in \mathbb{R}^{n \times m}$ with $A$ constant,
$$\begin{aligned} \frac{ \partial A\circ B}{\partial B} &= \frac{ \partial }{\partial B} \left[ \begin{matrix} A_{11}B_{11} & A_{12}B_{12} & \cdots & A_{1m}B_{1m} \\ A_{21}B_{21} & A_{22}B_{22} & \cdots & A_{2m}B_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1}B_{n1} & A_{n2}B_{n2} & \cdots & A_{nm}B_{nm} \end{matrix} \right] \\ &= \left[ \begin{matrix} A_{11}E_{11} & A_{12}E_{12} & \cdots & A_{1m}E_{1m} \\ A_{21}E_{21} & A_{22}E_{22} & \cdots & A_{2m}E_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1}E_{n1} & A_{n2}E_{n2} & \cdots & A_{nm}E_{nm} \end{matrix} \right] \end{aligned}$$
so, with $E = \frac{\partial B}{\partial B}$ the block matrix of the $E_{ij}$ and $J$ the $n \times m$ all-ones matrix,
$$\frac{ \partial A\circ B}{\partial B} = (A \otimes J) \circ E$$
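The closed form can be compared with the blockwise expression directly, without any differentiation. In this sketch the helper `unit` and the example matrix are assumptions made for illustration; `np.kron(A, J)` repeats each $A_{ij}$ over a whole block, and the entrywise product with $E$ keeps only the entry at position $(i,j)$ of that block.

```python
import numpy as np

def unit(n, m, i, j):
    """E_ij: the n-by-m matrix with a single 1 at row i, column j (0-based)."""
    M = np.zeros((n, m))
    M[i, j] = 1.0
    return M

n, m = 2, 3
A = np.arange(1.0, 7.0).reshape(n, m)

# Blockwise form: block (i, j) is A_ij * E_ij.
blockwise = np.block([[A[i, j] * unit(n, m, i, j) for j in range(m)]
                      for i in range(n)])

# Closed form: (A (x) J) o E.
E = np.block([[unit(n, m, i, j) for j in range(m)] for i in range(n)])
J = np.ones((n, m))
closed_form = np.kron(A, J) * E
```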

Derivative of the Kronecker product

For $A \in \mathbb{R}^{n \times m}$ constant and $E = \frac{\partial B}{\partial B}$,
$$\frac{ \partial A\otimes B}{\partial B} = \frac{ \partial }{\partial B} \left[ \begin{matrix} A_{11}B & A_{12}B & \cdots & A_{1m}B \\ A_{21}B & A_{22}B & \cdots & A_{2m}B \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1}B & A_{n2}B & \cdots & A_{nm}B \end{matrix} \right] = A \otimes E$$

For $B$ constant, each entry $A_{ij}B_{ab}$ of $A \otimes B$ contributes the block $B_{ab}E_{ij}$, where $E_{ij}$ now has the shape of $A$, so
$$\frac{ \partial A\otimes B}{\partial A} = \frac{ \partial }{\partial A} \left[ \begin{matrix} A_{11}B & A_{12}B & \cdots & A_{1m}B \\ A_{21}B & A_{22}B & \cdots & A_{2m}B \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1}B & A_{n2}B & \cdots & A_{nm}B \end{matrix} \right] = \left[ \begin{matrix} B \otimes E_{11} & B \otimes E_{12} & \cdots & B \otimes E_{1m} \\ B \otimes E_{21} & B \otimes E_{22} & \cdots & B \otimes E_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ B \otimes E_{n1} & B \otimes E_{n2} & \cdots & B \otimes E_{nm} \end{matrix} \right]$$
This is a rearrangement of $E \otimes B$, whose corresponding blocks would be $E_{ij} \otimes B$ instead of $B \otimes E_{ij}$.
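Both Kronecker derivatives can be checked by finite differences under the block layout used in this post (block $(r,c)$ of $\partial Y / \partial X$ is $\partial Y_{rc} / \partial X$). The helpers `unit` and `block_derivative` are hypothetical names for this sketch. Under that layout the derivative with respect to $B$ matches $A \otimes E$, while the derivative with respect to $A$ comes out as the block matrix whose $(i,j)$ block is $B \otimes E_{ij}$, which is not the same arrangement as $E \otimes B$.

```python
import numpy as np

def unit(n, m, i, j):
    """E_ij: the n-by-m matrix with a single 1 at row i, column j (0-based)."""
    M = np.zeros((n, m))
    M[i, j] = 1.0
    return M

def block_derivative(F, X, h=1e-6):
    """dF(X)/dX in block layout: block (r, c) is the matrix d F_rc / dX."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    p, q = F(X).shape
    D = np.zeros((p * m, q * n))
    for k in range(m):
        for l in range(n):
            step = np.zeros_like(X)
            step[k, l] = h
            D[k::m, l::n] = (F(X + step) - F(X - step)) / (2 * h)
    return D

n, m, p, q = 2, 2, 2, 3
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.arange(1.0, 7.0).reshape(p, q)

E_B = np.block([[unit(p, q, a, b) for b in range(q)] for a in range(p)])
E_A = np.block([[unit(n, m, i, j) for j in range(m)] for i in range(n)])

num_dB = block_derivative(lambda M: np.kron(A, M), B)
num_dA = block_derivative(lambda M: np.kron(M, B), A)

formula_dB = np.kron(A, E_B)
formula_dA = np.block([[np.kron(B, unit(n, m, i, j)) for j in range(m)]
                       for i in range(n)])
other_arrangement = np.kron(E_A, B)   # same shape, different entries
```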
