
Vectorization of deep learning formula

2022-01-27 05:34:50 LolitaAnn

This is day 18 of my participation in the August Update Challenge. Check out the event details: August Update Challenge.


Σ( ° △ °|||)︴ A bit of rambling first. I'm a biology student (back then everyone in my year wanted computer science, and I ended up stuck with biology), so I had to teach myself programming. I used C++ as my first language. Learning data structures on my own was painful, and I complained every day: STL already exists, why should I write it myself?! Later I understood it was for practice, which is why grinding problems often means re-implementing functions the standard library already provides. At the time it felt like all roads lead to Rome, since you get the same result either way. I had learned about time complexity but never felt it in practice, because problem sets rarely have that much test data.

Later a senior told me that in my spare time I should read the source code of library functions. For example, someone else's sort may actually be a combination of several sorting algorithms, all carefully optimized by experts.

Get down to business

The same is true for the formulas in deep learning: if you can simplify them into vector operations, it is well worth using the linear algebra libraries built into the language.

Benefits of using the built-in algorithms:

  • Faster
  • Less code
  • Less error-prone than what you write yourself
  • Makes better use of the hardware

Take a simple example:

h_{\theta}(x)=\sum_{j=0}^{n} \theta_{j} x_{j}

You can think of it as \theta^{T} x, the product of the two vectors \theta=\left[\begin{array}{l}\theta_{0} \\ \theta_{1} \\ \theta_{2} \\ \vdots\end{array}\right] and x=\left[\begin{array}{l}x_{0} \\ x_{1} \\ x_{2} \\ \vdots\end{array}\right].

Octave:

Unvectorized implementation

Declare the two vectors and loop over them:

% Unvectorized code:
prediction = 0.0;
for j = 1:n+1
    prediction = prediction + theta(j) * x(j);
end

Vectorized implementation

% Vectorized code:
prediction = theta' * x;   % in Octave a single quote ' is the transpose of a matrix or vector (if you've forgotten, go back and review Octave syntax)
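
If you want to feel the difference yourself, you can time the two versions on a large random vector. A quick sketch I'm adding here (n is arbitrary, and this is not a rigorous benchmark):

% Rough timing comparison (sketch; n chosen arbitrarily)
n = 1000000;
theta = rand(n, 1);
x = rand(n, 1);

tic;                              % loop version
prediction = 0.0;
for j = 1:n
    prediction = prediction + theta(j) * x(j);
end
toc;

tic;                              % vectorized version
prediction = theta' * x;
toc;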

C++:

Unvectorized implementation

double prediction = 0.0; 
for(int j = 0; j<=n; j++)
{
    prediction += theta[j]*x[j];
}

Vectorized implementation

double prediction = theta.transpose() * x;  // assumes a matrix library such as Eigen; theta.dot(x) is equivalent

Another example:

\theta_{j}:=\theta_{j}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)} \quad \text{for all } j

This formula should look familiar: it is the gradient descent update. Here x is the data matrix and y is a column vector.

For multiple linear regression:

\begin{array}{l} \theta_{0}:=\theta_{0}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{0}^{(i)} \\ \theta_{1}:=\theta_{1}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{1}^{(i)} \\ \theta_{2}:=\theta_{2}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{2}^{(i)} \\ \vdots \end{array}

First simplify the gradient descent formula so it is easier to turn into code:

\theta:=\theta-\alpha \delta
\delta = \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x^{(i)}

where \delta is a column vector: \delta=\left[\begin{array}{l} \delta_{0} \\ \delta_{1} \\ \delta_{2} \\ \vdots \end{array}\right]

and each x^{(i)} is also a column vector: x^{(i)}=\left[\begin{array}{c} x_{0}^{(i)} \\ x_{1}^{(i)} \\ x_{2}^{(i)} \\ \vdots \end{array}\right]

Each term has the form \delta = \text{scalar} \times x^{(i)}, because the factor \left(h_{\theta}(x^{(i)})-y^{(i)}\right) is just a number.

Since \delta must be n \times 1, while each example sits in the data matrix as a row x^{(i)}_{1 \times n}, the x_{j}^{(i)} have to be transposed when we vectorize.
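
Stacking all m examples, this collapses into a single matrix product. As a sketch (my own rewriting, not spelled out in the original; here X_{m \times n} is the data matrix whose i-th row is (x^{(i)})^{T}):

\delta = \frac{1}{m} X^{T}\left(X \theta - y\right), \qquad \theta := \theta - \alpha \delta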

Without vectorization, this logic takes quite a few nested loops to write. A scrub like me can't get it right in one sitting; I dashed off a C++ version by hand, but I haven't run it, so no guarantees it's correct, just take it as a rough sketch.

// First compute h(x) for every example
for (int i = 0; i < m; i++)
{
    hx[i] = 0.0;
    for (int j = 0; j < n; j++)
        hx[i] += theta[j] * x[i][j];
}
// Then update each theta[j]: sum [h(x)-y]*x over the examples
for (int j = 0; j < n; j++)
{
    double sum = 0.0;
    for (int i = 0; i < m; i++)
        sum += (hx[i] - y[i]) * x[i][j];
    // The rest of the formula
    theta[j] = theta[j] - alpha * (1.0 / m) * sum;
}

But if you vectorize it, you can write it as:

% Suppose for now there are just two parameters
hx = X * theta;
theta(1) = theta(1) - alpha * (1/m) * sum((hx - y) .* X(:, 1));
theta(2) = theta(2) - alpha * (1/m) * sum((hx - y) .* X(:, 2));
% Note: unlike C and the like, Octave indexes start from 1
% For n parameters
hx = X * theta;
for i = 1:n
    theta(i) = theta(i) - alpha * (1/m) * sum((hx - y) .* X(:, i));
    % Note: unlike C and the like, Octave indexes start from 1
endfor
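
And following the \delta formula above, even the loop over the parameters can be dropped. A sketch of the fully vectorized update (my own addition, assuming X already contains the column of ones for the intercept term):

% Fully vectorized gradient descent step
hx = X * theta;                                    % m x 1 vector of predictions
theta = theta - alpha * (1/m) * (X' * (hx - y));   % X' * (hx - y) is the delta vector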

Once vectorized, everything is easy to handle; without vectorization you may end up nesting a pile of for loops. So make good use of vectorization to cut down the work.

Copyright notice
Author: LolitaAnn. Please include a link to the original when reprinting, thank you.
https://en.cdmana.com/2022/01/202201270534167776.html
