Vector
Intro
Vectors are the most fundamental concept in linear algebra, and they are used universally in machine learning algorithms. One simple application is to write a familiar pair of simultaneous equations in vector form. Take, for example,

$$2a + 3b = 8$$
$$10a + b = 13 \tag{1}$$

Equation (1) can be written as

$$a\begin{pmatrix}2\\10\end{pmatrix} + b\begin{pmatrix}3\\1\end{pmatrix} = \begin{pmatrix}8\\13\end{pmatrix}$$

This vector form is what we will use most of the time to write out machine learning equations.
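As a quick check, here is a minimal NumPy sketch that solves the illustrative system above (the coefficients are just example values) and verifies the vector-form identity:

```python
import numpy as np

# Coefficient matrix and right-hand side of the example system
# 2a + 3b = 8, 10a + b = 13
A = np.array([[2.0, 3.0],
              [10.0, 1.0]])
y = np.array([8.0, 13.0])

a, b = np.linalg.solve(A, y)

# Vector form: a * (2, 10) + b * (3, 1) should reproduce (8, 13)
result = a * np.array([2.0, 10.0]) + b * np.array([3.0, 1.0])
print(result)                   # [ 8. 13.]
print(np.allclose(result, y))   # True
```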
In addition, we can summarize an object's features in a vector. For example, if we have a house of 120 square meters with 2 bedrooms and 1 bathroom located at the city center, we can specify the house parameters as

$$x_{house}=\begin{pmatrix}120\\2\\1\\1\end{pmatrix}$$

The first entry represents the area of the house. The second and third entries represent the number of bedrooms and bathrooms respectively. The last entry is a boolean (0 or 1) value that indicates whether the house is at the city center (value 1) or outside the city center (value 0). This vectorized expression can then be used as input to our machine learning algorithms.
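In code, such a feature vector is simply an array. A minimal sketch of the house example:

```python
import numpy as np

# Feature vector: [area_m2, bedrooms, bathrooms, is_city_center]
x_house = np.array([120.0, 2.0, 1.0, 1.0])
print(x_house.shape)  # (4,) - a 4-dimensional feature vector
```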
Lastly, we can also express our model parameters as a vector. A normal distribution has two parameters, μ and σ, that specify its center and spread. So a normal distribution can be represented by the parameter vector

$$p = \begin{pmatrix}\mu\\\sigma\end{pmatrix}$$

In machine learning, we keep optimizing the model parameters so that they better fit the actual data. During this process, we are effectively updating the μ and σ entries of our model vector in an iterative manner.
In this chapter, we will cover some essential topics on vectors. We will start by explaining the basic vector operations. Then we will introduce one of the most important vector operations, the dot product. After that, we will see how to use the dot product to calculate the angle between two vectors and how to perform vector projections. Lastly, we will discuss the basis against which vectors are referenced and how to change a vector's basis.
Basic Vector Operations
There are 4 basic operations on vectors, namely addition, subtraction, scalar multiplication, and modulus.
To explain these operations, we first define two vectors $r$ and $s$, where

$$r=\begin{pmatrix}4\\3\end{pmatrix}, \quad s=\begin{pmatrix}-1\\2\end{pmatrix}$$

Plotted on a graph, $r$ points 4 units right and 3 units up, while $s$ points 1 unit left and 2 units up.
Addition
To add vectors $r$ and $s$, we add the elements of $r$ and $s$ together, element by element:

$$r+s=\begin{pmatrix}4+(-1)\\3+2\end{pmatrix}=\begin{pmatrix}3\\5\end{pmatrix}$$

This can be shown graphically as follows. We shift vector $s$ parallel to itself so that it starts at the end of vector $r$; the resultant vector $r+s$ then runs from the beginning of $r$ to the end of the shifted $s$.

It is also worth noting that vector addition is commutative, i.e. $r+s=s+r$, and associative, i.e. $(r+s)+t=r+(s+t)$.
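Here is a minimal NumPy sketch of vector addition, using the example vectors $r$ and $s$ defined above (the extra vector $t$ is only there to check associativity):

```python
import numpy as np

r = np.array([4.0, 3.0])
s = np.array([-1.0, 2.0])
t = np.array([2.0, -5.0])   # arbitrary helper vector for the associativity check

print(r + s)                                   # [3. 5.]
print(np.array_equal(r + s, s + r))            # True: addition is commutative
print(np.allclose((r + s) + t, r + (s + t)))   # True: addition is associative
```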
Subtraction
Vector subtraction is similar to vector addition. We subtract each element of one vector from the corresponding element of the other:

$$r-s=\begin{pmatrix}4-(-1)\\3-2\end{pmatrix}=\begin{pmatrix}5\\1\end{pmatrix}$$

To solve this graphically, there are essentially two steps involved. First, we calculate the negative of vector $s$:

$$-s=\begin{pmatrix}1\\-2\end{pmatrix}$$

Then, we perform a normal vector addition of vector $r$ and vector $-s$: the vector $-s$ is shifted parallel to itself to the end of vector $r$, and the resultant vector runs from the beginning of $r$ to the end of the shifted $-s$.
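A similar sketch for subtraction, confirming that subtracting $s$ is the same as adding $-s$:

```python
import numpy as np

r = np.array([4.0, 3.0])
s = np.array([-1.0, 2.0])

print(-s)                                # [ 1. -2.]  the negative of s
print(r - s)                             # [5. 1.]
print(np.array_equal(r - s, r + (-s)))   # True: subtraction adds the negative
```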
Scalar Multiplication
Scalar multiplication calculates a multiple of a vector. Again, the operation is performed element-wise:

$$2r = \begin{pmatrix}2\times 4\\2\times 3\end{pmatrix} = \begin{pmatrix}8\\6\end{pmatrix}$$

It is equivalent to performing vector addition multiple times:

$$2r = r + r$$

Graphically, this extends the vector along the line on which it lies. Multiplying a vector by a negative scalar works almost the same way, except that the resulting vector points in the opposite direction along that line.
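A short sketch of scalar multiplication, including the repeated-addition view and the negative-scalar case:

```python
import numpy as np

r = np.array([4.0, 3.0])

print(2 * r)                          # [8. 6.]
print(np.array_equal(2 * r, r + r))   # True: 2r is r added to itself
print(-1 * r)                         # [-4. -3.]  same line, opposite direction
```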
Modulus
Lastly, the modulus of a vector is the length of that vector. By Pythagoras' theorem, the square of the hypotenuse (the longest side of a right triangle) is equal to the sum of the squares of the other two sides. To calculate the modulus of vector $r$, note that $r$ has a horizontal length of 4 and a vertical length of 3. Therefore,

$$|r| = \sqrt{4^2 + 3^2} = \sqrt{25} = 5$$

The modulus operation is represented by two vertical bars enclosing the vector, as in $|r|$. It is not limited to 2-dimensional space. The modulus of a vector with more dimensions is calculated the same way - take the square root of the sum of squares of the vector's components:

$$|r| = \sqrt{\sum_i r_i^2}$$
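A quick sketch of the modulus calculation; `np.linalg.norm` computes exactly this square root of the sum of squares:

```python
import numpy as np

r = np.array([4.0, 3.0])

# Square root of the sum of squared components...
print(np.sqrt(np.sum(r ** 2)))   # 5.0
# ...which is what np.linalg.norm computes
print(np.linalg.norm(r))         # 5.0

# The same formula works in any number of dimensions
v = np.array([1.0, 2.0, 2.0])
print(np.linalg.norm(v))         # 3.0
```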
That is all you need to know about the basic vector operations. Let’s move on to our next topic for more advanced vector operations.
Dot Product
The dot product, sometimes called the inner product, is one of the most important vector operations. You are going to see it a lot later when we dive into the derivations of different machine learning algorithms. It is also the foundation for calculating the angle between two vectors and for projecting one vector onto another.

We have learnt that scalar multiplication multiplies a vector by a scalar. The dot product, on the other hand, multiplies a vector by another vector. In general, for $n$-dimensional vectors $r$ and $s$, the dot product $r\cdot s$ evaluates to a scalar:

$$r\cdot s = \sum_{i=1}^{n} r_i s_i$$

For our previously defined vectors $r$ and $s$,

$$r\cdot s = 4\times(-1) + 3\times 2 = 2$$
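A minimal sketch of the dot product, computed both element-wise and with NumPy's built-in:

```python
import numpy as np

r = np.array([4.0, 3.0])
s = np.array([-1.0, 2.0])

# Element-wise products, summed up
print(np.sum(r * s))   # 2.0
# Equivalent built-in (r @ s does the same)
print(np.dot(r, s))    # 2.0
```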
There are three notable properties of the dot product:

- Dot products are commutative: $r\cdot s = s\cdot r$.
- Dot products are distributive over addition: $r\cdot(s+t) = r\cdot s + r\cdot t$.
- Dot products are not associative: $r\cdot(s\cdot t) \neq (r\cdot s)\cdot t$ in general. In fact, since $s\cdot t$ is a scalar, $r\cdot(s\cdot t)$ is a scalar multiplication rather than a dot product.

It is also interesting to note that the dot product of a vector with itself is equal to the square of its modulus:

$$r\cdot r = \sum_i r_i^2 = |r|^2 \tag{2}$$
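These properties and equation (2) are easy to verify numerically. A small sketch, again with an arbitrary helper vector $t$:

```python
import numpy as np

r = np.array([4.0, 3.0])
s = np.array([-1.0, 2.0])
t = np.array([2.0, -5.0])   # arbitrary helper vector for the distributive check

print(np.dot(r, s) == np.dot(s, r))                               # True: commutative
print(np.isclose(np.dot(r, s + t), np.dot(r, s) + np.dot(r, t)))  # True: distributive
print(np.isclose(np.dot(r, r), np.linalg.norm(r) ** 2))           # True: r.r == |r|^2
```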
Calculating the Angle Between Two Vectors
Now we are ready to derive the angle between two vectors using what we have learnt about the dot product.

First, let's refresh our memory of the cosine rule. Given the lengths of two sides of a triangle ($a$ and $b$) and the angle θ between them, we can calculate the length of the opposite side ($c$) using the following formula:

$$c^2 = a^2 + b^2 - 2ab\cos\theta \tag{3}$$

If we take side $a$ to be our vector $r$ and side $b$ to be our vector $s$, then side $c$ is the vector $r-s$.

Equation (3) can then be rewritten as

$$|r-s|^2 = |r|^2 + |s|^2 - 2|r||s|\cos\theta \tag{4}$$

Recall from equation (2) that the square of the modulus of a vector is equal to the dot product of the vector with itself.

So now we have $|r|$, $|s|$, and $|r-s|$ related by equation (4); how can we calculate the angle θ between $r$ and $s$?

On the left-hand side of equation (4), we can convert $|r-s|^2$ into dot products:

$$|r-s|^2 = (r-s)\cdot(r-s) = r\cdot r - 2\,r\cdot s + s\cdot s = |r|^2 - 2\,r\cdot s + |s|^2$$

Substituting this back into equation (4), we get

$$|r|^2 - 2\,r\cdot s + |s|^2 = |r|^2 + |s|^2 - 2|r||s|\cos\theta$$

$$\cos\theta = \frac{r\cdot s}{|r||s|} \tag{5}$$

Therefore, the angle θ between vectors $r$ and $s$ can be calculated from the dot product of $r$ and $s$ and the moduli of $r$ and $s$.
We are also interested in some special angles θ between $r$ and $s$. For example:

- When θ = 0°, $r$ and $s$ point in the same direction, and $\cos\theta=\frac{r\cdot s}{|r||s|}=1$, so $r\cdot s = |r||s|$.
- When θ = 90°, $r$ and $s$ are orthogonal to each other, and $\cos\theta=\frac{r\cdot s}{|r||s|}=0$, so $r\cdot s = 0$.
- When θ = 180°, $r$ and $s$ point in opposite directions, and $\cos\theta=\frac{r\cdot s}{|r||s|}=-1$, so $r\cdot s = -|r||s|$.
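A short sketch computing θ for our example vectors, plus a check that an orthogonal pair (here a hypothetical pair $u$, $v$) gives a dot product of 0:

```python
import numpy as np

r = np.array([4.0, 3.0])
s = np.array([-1.0, 2.0])

cos_theta = np.dot(r, s) / (np.linalg.norm(r) * np.linalg.norm(s))
theta = np.arccos(cos_theta)   # angle in radians
print(np.degrees(theta))       # about 79.7 degrees

# An orthogonal pair gives a dot product of 0, i.e. theta = 90 degrees
u, v = np.array([1.0, 0.0]), np.array([0.0, 2.0])
print(np.dot(u, v))            # 0.0
```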
Vector Projection
Another important concept involving vectors is projection. For vectors $r$ and $s$, we can draw a line from the end of $s$ down to $r$ such that the line is perpendicular to $r$. The length from the origin to the point where this perpendicular meets $r$ represents the projection of vector $s$ onto vector $r$.

We know from basic trigonometry that

$$\cos\theta = \frac{\text{adjacent}}{\text{hypotenuse}} = \frac{\text{projection}}{|s|} \tag{6}$$

Substituting equation (5) into (6), we get

$$\frac{\text{projection}}{|s|} = \frac{r\cdot s}{|r||s|} \quad\Longrightarrow\quad \text{projection} = \frac{r\cdot s}{|r|}$$

$\frac{r\cdot s}{|r|}$ is called the scalar projection of vector $s$ onto $r$. It has only magnitude, but no direction. In order to find the direction of the projection, we need to use the following formula:

$$\frac{r\cdot s}{|r|}\frac{r}{|r|} = \frac{r\cdot s}{r\cdot r}\,r$$

$\frac{r}{|r|}$ is the unit-length vector in the direction of $r$. Multiplying the scalar projection by this unit-length vector gives us the projection in the direction of $r$. This is called the vector projection of $s$ onto $r$.
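The two projection formulas translate directly into code; a minimal sketch with hypothetical helper functions `scalar_projection` and `vector_projection`:

```python
import numpy as np

def scalar_projection(s, r):
    """Length of the shadow that s casts on r."""
    return np.dot(r, s) / np.linalg.norm(r)

def vector_projection(s, r):
    """Scalar projection times the unit vector along r."""
    return np.dot(r, s) / np.dot(r, r) * r

r = np.array([4.0, 3.0])
s = np.array([-1.0, 2.0])

print(scalar_projection(s, r))   # 0.4          (= 2/5)
print(vector_projection(s, r))   # [0.32 0.24]  (= (8/25, 6/25))
```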
With this, we have concluded our discussion of the dot product operation and the calculation of angles and projections between two vectors. In the next topic, we will see these concepts in action when they are applied to changing the basis of a vector.
Changing Basis
So far we have only seen vectors in their own coordinate system. It is worthwhile to define the coordinate system, or basis, against which our vectors are referenced.

We can express a 2-dimensional vector as a sum of two basis vectors. For our vector $r$, we can define 2 basis vectors $e_1=\begin{pmatrix}1\\0\end{pmatrix}$ and $e_2=\begin{pmatrix}0\\1\end{pmatrix}$ such that

$$r = 4e_1 + 3e_2 = \begin{pmatrix}4\\3\end{pmatrix}$$
However, the choice of basis vectors $e_1$ and $e_2$ is arbitrary; it depends entirely on how the coordinate system is set up. You might want two basis vectors that are of unequal lengths, or that are not orthogonal to each other. Let's see what happens when we change to a different set of basis vectors.
For example, we can define a new set of basis vectors $b_1=\begin{pmatrix}2\\1\end{pmatrix}$ and $b_2=\begin{pmatrix}-2\\4\end{pmatrix}$, where $b_1$ and $b_2$ are themselves written in terms of the basis vectors $e_1$ and $e_2$. What is our vector $r$ expressed in $b_1$ and $b_2$?
This is where vector projection comes into play. We need to calculate the vector projection of $r$ onto each of the new basis vectors $b_1$ and $b_2$.
To calculate the vector projection of $r$ onto $b_1$,

$$\frac{r\cdot b_1}{|b_1|} = \frac{4\times 2 + 3\times 1}{\sqrt{2^2+1^2}} = \frac{11}{\sqrt{5}}$$

$\frac{r\cdot b_1}{|b_1|}$ gives us the scalar projection of $r$ onto $b_1$. By dividing that by the magnitude of $b_1$ once more, we find that the projection is $\frac{11}{5}$ times the length of $b_1$; thus the vector projection of $r$ onto $b_1$ is

$$\frac{r\cdot b_1}{|b_1|^2}\, b_1 = \frac{11}{5}\begin{pmatrix}2\\1\end{pmatrix}$$

Similarly, we can calculate the vector projection of $r$ onto $b_2$:

$$\frac{r\cdot b_2}{|b_2|^2}\, b_2 = \frac{4\times(-2)+3\times 4}{(-2)^2+4^2}\begin{pmatrix}-2\\4\end{pmatrix} = \frac{1}{5}\begin{pmatrix}-2\\4\end{pmatrix}$$

So our vector $r$ can be expressed as the vector sum of $\frac{11}{5}b_1$ and $\frac{1}{5}b_2$; in the new basis, its coordinates are

$$r_b = \begin{pmatrix}11/5\\1/5\end{pmatrix}$$

If we evaluate this expression by substituting vectors $b_1$ and $b_2$ in the original $e_1$, $e_2$ basis, we get back our original vector $r$:

$$\frac{11}{5}\begin{pmatrix}2\\1\end{pmatrix} + \frac{1}{5}\begin{pmatrix}-2\\4\end{pmatrix} = \begin{pmatrix}22/5 - 2/5\\11/5 + 4/5\end{pmatrix} = \begin{pmatrix}4\\3\end{pmatrix}$$
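The whole change of basis fits in a few lines of NumPy; a sketch using the example basis above (valid only because $b_1\cdot b_2=0$):

```python
import numpy as np

r = np.array([4.0, 3.0])     # vector in the standard e1, e2 basis
b1 = np.array([2.0, 1.0])    # new basis vectors (orthogonal example above)
b2 = np.array([-2.0, 4.0])

# Coordinates in the new basis via projection
c1 = np.dot(r, b1) / np.dot(b1, b1)   # 11/5 = 2.2
c2 = np.dot(r, b2) / np.dot(b2, b2)   # 1/5  = 0.2
print(c1, c2)                         # 2.2 0.2

# Substituting back into the original basis recovers r
print(c1 * b1 + c2 * b2)              # [4. 3.]
```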
Note that $b_1$ and $b_2$ here are orthogonal to each other. We can verify this by calculating the cosine of the angle θ between $b_1$ and $b_2$:

$$\cos\theta = \frac{b_1\cdot b_2}{|b_1||b_2|} = \frac{2\times(-2) + 1\times 4}{|b_1||b_2|} = 0$$

Since $\cos\theta = 0$, θ = 90°.
So we have successfully converted our vector $r$ from the original basis vectors $e_1$ and $e_2$ to the new basis vectors $b_1$ and $b_2$. This method of changing basis works as long as the new basis vectors are orthogonal to each other. The more general case, where the new basis vectors may have any angle between them, involves a matrix operation that will be covered in the next chapter.
When we extend this method to a space of 3 or more dimensions, it is critical that each additional basis vector is not a linear combination of the existing ones. This property is called linear independence. It means we cannot find values ⍺ and β that satisfy the linear equation below, so that $b_3$ does not lie in the same plane as $b_1$ and $b_2$:

$$b_3 = \alpha b_1 + \beta b_2$$
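One quick way to check linear independence numerically is the matrix rank; a sketch with a hypothetical third basis vector $b_3$:

```python
import numpy as np

b1 = np.array([2.0, 1.0, 0.0])
b2 = np.array([-2.0, 4.0, 0.0])
b3 = np.array([0.0, 0.0, 1.0])   # hypothetical third basis vector

# If the rank equals the number of vectors, none of them
# is a linear combination of the others
B = np.stack([b1, b2, b3])
print(np.linalg.matrix_rank(B))  # 3 -> linearly independent

# A vector in the b1-b2 plane breaks independence
b3_bad = 2.0 * b1 + 1.0 * b2     # alpha = 2, beta = 1
print(np.linalg.matrix_rank(np.stack([b1, b2, b3_bad])))  # 2 -> dependent
```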
That is it! We have completed our discussion on vectors in linear algebra. You have built a solid foundation for what we will explore further in future chapters.
(Inspired by the Mathematics for Machine Learning lecture series from Imperial College London)
Source: CSDN
Author: Lin D.
Link: https://blog.csdn.net/datascientistlin/article/details/103874756