
Linear Algebra and Probability Theory

2022-08-06 18:09:55 · deliberately

To talk about artificial intelligence and machine learning, you need some basic mathematical knowledge; only then can you understand their nature. The most important of this foundational mathematics consists of two subjects: linear algebra and probability theory.

Linear Algebra

The core idea of linear algebra: everything can be abstracted as a combination of certain features and observed, both statically and dynamically, within a framework defined by preset rules.

In linear algebra, an element consisting of a single number a is called a scalar. A scalar a can be an integer, a real number, or a complex number.

Multiple scalars a1, a2, a3, ..., an arranged in a definite order form a sequence, and such an element is called a vector. A vector can be seen as an extension of a scalar: a single number is replaced by a group of numbers, which raises the dimension.

If every scalar in a vector is replaced by a vector of the same size, the result is a matrix.

Relative to a vector, a matrix again raises the dimension: each element of a matrix is located by two indices. By the same logic, replacing each scalar element of a matrix with a vector yields a tensor. A tensor can be thought of as a higher-order matrix, a three-dimensional (or higher) concept.

In computer storage, a scalar occupies a zero-dimensional array, such as a single binary character; a vector occupies a one-dimensional array, such as a speech signal; a matrix occupies a two-dimensional array, such as a grayscale image; a tensor occupies a three-dimensional array, such as an RGB image or a video.
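A minimal NumPy sketch of these storage dimensions (the arrays below are toy placeholders, not real signals or images):

```python
import numpy as np

scalar = np.array(1.0)              # 0-d array: a single number
vector = np.array([1.0, 2.0, 3.0])  # 1-d array: e.g. samples of a speech signal
matrix = np.zeros((4, 4))           # 2-d array: e.g. a 4x4 grayscale image
tensor = np.zeros((4, 4, 3))        # 3-d array: e.g. an RGB image (height x width x channels)

for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    print(name, "ndim =", arr.ndim, "shape =", arr.shape)
```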

Describing mathematical objects such as vectors requires a specific mathematical language; the norm and the inner product are representative examples.

The norm measures the size of a single vector and describes a property of the vector itself; its role is to map a vector to a non-negative number. The general L^p norm is defined as: ||x||_p = (Σ_i |x_i|^p)^(1/p)

The L^1 norm is the sum of the absolute values of all the elements of a vector; the L^2 norm is the vector's length in the usual sense; the L^∞ norm is the maximum absolute value among the vector's elements.
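As a quick illustration, the three norms can be computed with NumPy (the vector here is an arbitrary example):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

l1 = np.linalg.norm(x, 1)         # L^1: |3| + |-4| + |1| = 8.0
l2 = np.linalg.norm(x, 2)         # L^2: sqrt(9 + 16 + 1) ≈ 5.10
linf = np.linalg.norm(x, np.inf)  # L^∞: largest absolute element, 4.0

print(l1, l2, linf)
```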

The norm measures the scale of a single vector, while the inner product measures the relationship between two vectors. For two vectors of the same dimension, the inner product is:

⟨x, y⟩ = Σ_i x_i ⋅ y_i

The inner product describes the relative position of two vectors, that is, the angle between them. A special case is an inner product of 0, i.e., ⟨x, y⟩ = 0. In two-dimensional space, this means the angle between the two vectors is 90 degrees: they are perpendicular. In higher-dimensional spaces this relationship is called orthogonality. If two vectors are orthogonal, they are linearly independent and do not affect each other.
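A small sketch of this idea, using two hand-picked orthogonal vectors:

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([0.0, 2.0])

dot = np.dot(x, y)  # inner product is 0.0: the vectors are orthogonal
cos_angle = dot / (np.linalg.norm(x) * np.linalg.norm(y))
print(dot, np.degrees(np.arccos(cos_angle)))  # 0.0, 90.0
```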

In a linear space, any vector represents a point in n-dimensional space; conversely, any point in the space can be uniquely represented by a vector.

An important feature of a linear space is that it can carry change. Once a standard orthonormal basis is fixed as the frame of reference, every point in the space can be represented by a vector. When a point moves from one position to another, the vector describing it changes as well. The movement of the point corresponds to a linear transformation of the vector, and the mathematical language for describing such object changes, or vector transformations, is the matrix.

In a linear space there are two ways to realize a change: the point itself changes, or the frame of reference changes. Accordingly, the product of a matrix and a vector, Ax = y, admits different readings:

This expression can be read as the vector x being transformed, as described by matrix A, into the vector y; or as one and the same object that measures as vector x under coordinate system A and as vector y under the standard coordinate system I (the identity matrix: ones on the main diagonal, zeros elsewhere).
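A minimal sketch of Ax = y; the matrix here is an arbitrary 90-degree rotation chosen for illustration:

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])  # rotates vectors by 90 degrees counterclockwise
x = np.array([1.0, 0.0])

y = A @ x             # reading 1: A transforms x into y
print(y)              # [0. 1.]
print(np.eye(2) @ y)  # reading 2: y is the same object measured in the standard basis I
```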

Important parameters for describing a matrix are its eigenvalues and eigenvectors. For a given matrix A, suppose λ is an eigenvalue and x the corresponding eigenvector; then the relationship between them is as follows: Ax = λx

A matrix represents a transformation of vectors, and its effect is usually to change both the direction and the scale of the original vector. But for certain special vectors, the matrix changes only the scale and not the direction: it stretches or shrinks without rotating. For a given matrix, these special vectors are its eigenvectors, and the scaling coefficient of an eigenvector is the corresponding eigenvalue.

The dynamic significance of a matrix's eigenvalues and eigenvectors is that they represent the speed and direction of change.
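The relation Ax = λx can be checked numerically; the diagonal matrix below is an arbitrary example whose eigenvectors are easy to see:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)  # columns of `eigenvectors` are the eigenvectors

for lam, v in zip(eigenvalues, eigenvectors.T):
    # A only scales v by lam; there is no rotation
    print(lam, np.allclose(A @ v, lam * v))
```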

Probability Theory

Like linear algebra, probability theory represents a way of looking at the world, one focused on the possibilities that surround us everywhere. The formal mathematical description of the probability of random events is the axiomatization of probability theory, and the axiomatic structure of probability reflects an understanding of its nature.

The approach of understanding probability through the frequency of events is known as frequentist probability. What the frequentist school calls "probability" is actually the limit of the frequency with which a single outcome occurs in an independently repeatable random experiment. Since a stable frequency is the embodiment of statistical regularity, it is a reasonable idea to compute the frequency from a large number of independent repeated trials and use it to characterize how likely an event is.

For the quantitative calculation of probability, the frequentist school relies on the classical probability model. In the classical model, the outcomes of an experiment comprise only a finite number of elementary events, and each elementary event is equally likely. Thus, if the total number of elementary events is n, and the random event A of interest contains k of them, the probability of the event under the classical model is

P(A) = k / n
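A toy instance of the classical model (a fair six-sided die; this example is mine, not from the original text):

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}  # n = 6 equally likely elementary events
event_A = {2, 4, 6}            # "roll an even number" contains k = 3 of them

print(Fraction(len(event_A), len(outcomes)))  # P(A) = k/n = 1/2
```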

Conditional probability is the new probability distribution obtained after adjusting the sample space according to known information. Given two random events A and B, the conditional probability is the probability that event A occurs given that event B has already occurred, expressed by the following formula:

P(A∣B) = P(AB) / P(B)

Here P(AB) is called the joint probability, the probability that events A and B occur together. If the joint probability equals the product of the two events' individual probabilities, i.e., P(AB) = P(A)⋅P(B), then the two events do not affect each other: they are independent. For independent events, the conditional probability equals the event's own probability: P(A∣B) = P(A).
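Continuing the die example (again a toy illustration of my own), joint and conditional probability can be computed by counting elementary events:

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # "roll is even"
B = {4, 5, 6}  # "roll is greater than 3"

p_B = Fraction(len(B), len(sample_space))       # P(B) = 1/2
p_AB = Fraction(len(A & B), len(sample_space))  # joint P(AB) = 2/6 = 1/3
p_A_given_B = p_AB / p_B                        # P(A|B) = 2/3

# P(A|B) = 2/3 differs from P(A) = 1/2, so A and B are not independent
print(p_A_given_B)
```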

Bayes' theorem:

P(H∣D) = P(D∣H) ⋅ P(H) / P(D)

Here P(H) is called the prior probability, the probability, set in advance, that the hypothesis holds; P(D∣H) is called the likelihood, the probability of observing the data given that the hypothesis holds; P(H∣D) is called the posterior probability, the probability that the hypothesis holds given the observed data.
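A worked Bayes'-theorem example with made-up numbers (a hypothetical diagnostic test, not from the original post):

```python
p_H = 0.01              # prior P(H): 1% of patients have the condition
p_D_given_H = 0.95      # likelihood P(D|H): test detects it 95% of the time
p_D_given_not_H = 0.05  # false-positive rate P(D|not H)

# total probability of observing a positive test D
p_D = p_D_given_H * p_H + p_D_given_not_H * (1 - p_H)

p_H_given_D = p_D_given_H * p_H / p_D  # posterior P(H|D)
print(round(p_H_given_D, 3))           # ≈ 0.161: the data updates the weak prior
```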

Frequentists believe that the hypothesis exists objectively and does not change, i.e., there is a fixed prior distribution; we, as observers, simply have no way of knowing it.

The Bayesian school holds instead that no fixed prior distribution exists and that the parameters themselves are random. In other words, the hypothesis itself depends on the observed results: it is uncertain and can be revised. The role of the data is to continually correct the hypothesis, bringing the observer's subjective understanding of the probability closer to objective reality.

There are two methods for estimating probabilities: maximum likelihood estimation and maximum a posteriori estimation, which embody the frequentist and Bayesian understandings of probability respectively.

The idea of maximum likelihood estimation is to maximize the probability of the training data appearing, and to determine the unknown parameters of the probability distribution accordingly; the estimated distribution then best matches the distribution of the training data. The idea of maximum a posteriori estimation is, given the training data and other known conditions, to maximize the probability of the unknown parameters and to take the most probable parameter values as the estimate.
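A coin-flip sketch contrasting the two estimators (the data and the Beta prior are assumptions of mine for illustration):

```python
# Data: 7 heads in 10 flips of a coin with unknown p = P(heads)
heads, flips = 7, 10

# MLE: the sample frequency maximizes the likelihood of the observed data
p_mle = heads / flips  # 0.7

# MAP with a Beta(2, 2) prior pulling p toward 0.5; for a Beta-Bernoulli
# model the posterior mode is (heads + alpha - 1) / (flips + alpha + beta - 2)
alpha, beta = 2, 2
p_map = (heads + alpha - 1) / (flips + alpha + beta - 2)  # 8/12 ≈ 0.667

print(p_mle, p_map)
```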

An important application of probability theory is describing random variables. Depending on their value space, random variables fall into two classes: discrete random variables and continuous random variables.

Each possible value of a discrete random variable carries a probability greater than 0, and the one-to-one correspondence between values and probabilities is the discrete random variable's distribution law, also called its probability mass function. The counterpart of the probability mass function for continuous random variables is the probability density function.
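Two small illustrations, one of each (the distributions chosen here are standard textbook examples, not from the original text):

```python
import math

# PMF of a fair die: six values, each with probability 1/6, summing to 1
die_pmf = {k: 1 / 6 for k in range(1, 7)}
print(sum(die_pmf.values()))  # 1.0

# PDF of the standard normal distribution, evaluated at a point;
# a density value is not itself a probability
def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(normal_pdf(0.0))  # ≈ 0.3989
```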

Summary

Whether it is machine learning or artificial intelligence, these lofty terms can finally be connected with the mathematics I studied for so many years, which makes me genuinely happy. Although I was not a math major, I have always been confident in mathematics: Linear Algebra in college, Probability Theory in graduate school. I still remember these fundamentals, but deep understanding still requires practice and consolidation.

