# Vectorization of deep learning formula

2022-01-27 05:34:50

This is my participation 8 The fourth of the yuegengwen challenge 18 God , Check out the activity details ：8 Yuegengwen challenge

Σ( ° △ °|||)︴ Let's start with a few nonsense . As a biology student （ At that time, all majors chose computers , As a result, I was forced to learn biology ）, But I had to study by myself . I use C++ As an introductory language , When I first started learning data structures , Because it's self-study , A lot of pain , I make complaints about it every day. ： Yes STL Why write it yourself ！！！ I learned later , In order to practice , Therefore, when you often brush questions, you can write some functions that even the library function already has to realize . At that time, all roads lead to Rome in my eyes , You can get the same result anyway . Although I learned the time complexity of the algorithm, I didn't put it into practice , After all, there is not so much test data in question brushing .

Later, an elder told me , You can have a look at the source code of the library function in your spare time . For example, the ranking of others may be a combination of several sorts , They are all algorithms optimized by various bosses .

This is also true for some formulas in our in-depth learning , If you can simplify , It's great to use the library of linear algebra built in the language .

Benefits of using built-in algorithms ：

• Faster
• Implemented in less code
• It's less error prone than what you write yourself
• Better coordination with hardware system

## Take a simple chestnut ：

$h_{\theta}(x)=\sum_{j=0}^{n} \theta_{j} x_{j}$

You can think of it as $\theta^{T} x$ And calculate , This translates into two vectors $\theta=\left[\begin{array}{l}\theta_{0} \\ \theta_{1} \\ \theta_{2} \\ ... \end{array}\right] \quad x=\left[\begin{array}{l}x_{0} \\ x_{1} \\ x_{2} \\ ...\end{array}\right]$ The product of .

octave：

Unvectorized implementation

Is to declare two vectors , Traverse .

%  No vectorization code is as follows ：
prediction =0.0;
for j = 1:n+l
prediction = prediction + theta(j) * x(j)
end
Copy code 

Vectorized implementation

%  After vectorization, the code is as follows ：
prediction= theta' * x; %  Single quotation mark in English  '  Represents the transpose of a matrix or vector . Forget to go back and see octave grammar   Copy code 

C++：

Unvectorized implementation

double prediction = 0.0;
for(int j = 0; j<=n; j++)
{
prediction += theta[j]*x[j];
}
Copy code 

Vectorized implementation

double prediction = theta.transpose()*x;
Copy code 

## Another chestnut ：

$\begin{array}{l} \theta_{j}:=\theta_{j}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)} \end{array} \text{ for all j}$

This formula is familiar , Is the formula for gradient descent . In this formula x It's a data matrix ,y Is a column vector .

For multiple linear regression ：

$\begin{array}{l} \theta_{0}:=\theta_{0}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{0}^{(i)} \\ \theta_{1}:=\theta_{1}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{1}^{(i)} \\ \theta_{2}:=\theta_{2}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{2}^{(i)} \end{array} \\...$

First simplify the gradient descent formula to make it easier to write code ：

$\theta:=\theta-\alpha \delta$
$\delta = \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x^{(i)}$

among δ It's a column vector ： $\delta=\left[\begin{array}{l} \delta_{0} \\ \delta_{1} \\ \delta_{2} \\... \end{array}\right]$

And for x Come on , $x^{(i)}=\left[\begin{array}{c} x_{0}^{(i)} \\ x_{1}^{(i)} \\ x_{2}^{(i)} \\... \end{array}\right]$

$\delta = Count ×(...)× x^{(i)}$

So it must be $\delta_{n \times 1} = Count ×(...)× x_{1×n}^{(i)}$ namely $x_j^{(i)}$ To transpose .

If this logic is not vectorized, it may need to write many cycles to complete . Just a little garbage like me can't write it for a while . I'm handy c++ I wrote it , No guarantee, right , Just take a general look , I didn't run this code either .

for(int j=0;j<n;j++)
{
// First calculate h(x) Come on
for(int i=0;i<len;i++)
{
hx[i] += theta[i][j]*x[i][j];
}
// Calculation [h(x)-j]*x And sum up
for(int i=0;i<len;i++)
{
sum += (h[i] - y[i])*x[i][j];
}
// The rest of the formula
theta[j] = theta[j] - alpha * (1/m) * sum;
}
Copy code 

But if you vectorize it, you can write it as ：

%  Suppose it's binary now
hx = X * theta;
theta(1) = theta(1) - alpha * (1/m) * sum((hx-y)*X(:,1:1))
theta(2) = theta(2) - alpha * (1/m) * sum((hx-y)*X(:,2:2))
% Be careful octave and C And so on , Subscript from 1 Start
Copy code 
%  If it is n yuan
hx = X * theta;
for i = 1:n
theta(i) = theta(i) - alpha * (1/m) * sum((hx-y)*X(:,i:i))
% Be careful octave and C And so on , Subscript from 1 Start
endfor
Copy code 

After vectorization, everything is easy to deal with , But without vectorization , Then you may have to nest a lot for Circulated . So we should make good use of vectorization to reduce the workload .