# 2 PAC learning framework (pages 23-24)

2022-01-26 23:58:32 DoomNuo

### Theorem 2.2 Learning bound: finite H, inconsistent case

Let $H$ be a finite hypothesis set. Then, for any $\delta>0$, with probability at least $1-\delta$, the following inequality holds:

$\forall h\in H,\ R(h)\le \widehat R(h)+\sqrt {\frac{\log|H|+\log\frac{2}{\delta}}{2m}}.\quad(2.20)$

**Proof** Let $h_1,\dots,h_{|H|}$ be the elements of $H$. Applying the union bound and Corollary 2.2 to each hypothesis gives:

\begin{aligned} &Pr\Big[\exists h\in H:\ |\widehat R(h)-R(h)|>\epsilon \Big]\\ &=Pr\Big[(|\widehat R(h_1)-R(h_1)|>\epsilon)\vee\dots\vee (|\widehat R(h_{|H|})-R(h_{|H|})|>\epsilon)\Big]\\ &\le\sum_{h\in H}Pr\big[|\widehat R(h)-R(h)|>\epsilon\big]\\ &\le 2|H|\exp(-2m\epsilon^2). \end{aligned}
Setting the right-hand side equal to $\delta$ completes the proof.
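As a quick numerical sketch of the bound just proved (the values of the empirical risk, $|H|$, $m$, and $\delta$ below are illustrative assumptions, not taken from the text), the right-hand side of (2.20) can be evaluated directly:

```python
import math

def generalization_bound(emp_risk, H_size, m, delta):
    """Right-hand side of inequality (2.20):
    R_hat(h) + sqrt((log|H| + log(2/delta)) / (2m))."""
    return emp_risk + math.sqrt(
        (math.log(H_size) + math.log(2 / delta)) / (2 * m)
    )

# Illustrative values (assumed for this sketch):
bound = generalization_bound(emp_risk=0.05, H_size=1000, m=5000, delta=0.05)
```

Note that the complexity term shrinks as $m$ grows and grows only logarithmically with $|H|$, exactly as the discussion below observes.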

Thus, for a finite hypothesis set $H$,

$R(h)\le\widehat R(h)+O\bigg(\sqrt{\frac{\log_2|H|}{m}}\bigg).$

As pointed out earlier, $\log_2|H|$ can be interpreted as the number of bits needed to represent $H$. Remarks similar to those made on the generalization bound in the consistent case apply here: a larger sample size $m$ guarantees better generalization, and the bound increases with $|H|$, but only logarithmically. However, the bound is now a less favorable function of $\frac{\log_2|H|}{m}$: it varies as the square root of that term. This is not a minor price: for a fixed $|H|$, obtaining the same guarantee as in the consistent case requires a quadratically larger labeled sample.
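The quadratic price can be made concrete by inverting the two bounds for the sample size $m$ needed to guarantee accuracy $\epsilon$ (a sketch; the consistent-case form $m\ge\frac{1}{\epsilon}(\log|H|+\log\frac{1}{\delta})$ is taken from the consistent-case bound discussed earlier, and the numeric inputs are illustrative assumptions):

```python
import math

def m_consistent(H_size, eps, delta):
    # Consistent case: m >= (1/eps) * (log|H| + log(1/delta))
    return math.ceil((math.log(H_size) + math.log(1 / delta)) / eps)

def m_inconsistent(H_size, eps, delta):
    # Inverting (2.20): m >= (log|H| + log(2/delta)) / (2 eps^2)
    return math.ceil((math.log(H_size) + math.log(2 / delta)) / (2 * eps**2))

# Illustrative comparison at eps = 0.01, delta = 0.05, |H| = 1000:
m_c = m_consistent(1000, 0.01, 0.05)
m_i = m_inconsistent(1000, 0.01, 0.05)
```

The $1/\epsilon^2$ dependence makes `m_i` far larger than `m_c` for small $\epsilon$.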

Note that the bound suggests seeking a trade-off between reducing the empirical error and controlling the size of the hypothesis set: a larger hypothesis set is penalized by the second term, but could help reduce the empirical error, that is, the first term. For similar empirical errors, however, the bound suggests using a smaller hypothesis set. This can be viewed as an instance of the so-called Occam's razor principle, named after the theologian William of Occam: plurality should not be posited without necessity, which can also be rephrased as, the simplest explanation is the best. In this context, it can be stated as: all other things being equal, a simpler (smaller) hypothesis set is better.
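The trade-off can be sketched as a toy model-selection procedure that minimizes the right-hand side of (2.20) over nested hypothesis sets (the set sizes and empirical errors below are hypothetical values chosen for illustration, not computed from data):

```python
import math

def bound(emp_risk, H_size, m, delta):
    # Right-hand side of (2.20).
    return emp_risk + math.sqrt(
        (math.log(H_size) + math.log(2 / delta)) / (2 * m)
    )

# Hypothetical candidates: larger sets achieve lower empirical error
# but pay a larger complexity term. Each entry is (|H|, empirical risk).
candidates = [
    (10,    0.20),
    (10**4, 0.10),
    (10**8, 0.09),
]
m, delta = 500, 0.05
best = min(candidates, key=lambda c: bound(c[1], c[0], m, delta))
```

With these illustrative numbers, neither the smallest nor the largest set minimizes the bound: the middle candidate wins, showing the trade-off at work.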

## 2.4 Generalities

In this section, we consider several important questions related to the learning scenario which, for simplicity, were omitted from the earlier discussion.

### 2.4.1 Deterministic versus stochastic scenarios

In the most general scenario of supervised learning, the distribution $D$ is defined over $X\times Y$, and the training data is a labeled sample $S$ drawn i.i.d. according to $D$:

$S=((x_1,y_1),\dots,(x_m,y_m)).$

The learning problem then consists of finding a hypothesis $h\in H$ with small generalization error,

$R(h)=\underset {(x,y)\sim D}{Pr}[h(x)\neq y]=\underset {(x,y)\sim D}{E}[1_{h(x)\neq y}].$
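Since $R(h)$ is the expectation of the indicator $1_{h(x)\neq y}$, it can be approximated by sampling from $D$. Below is a minimal Monte Carlo sketch, assuming a toy distribution (uniform $x$ with the label $1_{x>0.5}$ flipped with probability 0.1; all of these choices are illustrative assumptions):

```python
import random

def sample_D(rng):
    # Toy stochastic distribution: x ~ U[0,1], and the "clean" label
    # 1_{x > 0.5} is flipped with probability 0.1.
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    return x, y

def h(x):
    # A fixed hypothesis: threshold at 0.5.
    return int(x > 0.5)

def estimate_risk(h, sample_D, n=100_000, seed=0):
    """Monte Carlo estimate of R(h) = E_{(x,y)~D}[1_{h(x) != y}]."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n):
        x, y = sample_D(rng)
        errors += (h(x) != y)
    return errors / n

risk = estimate_risk(h, sample_D)
```

Here the estimate concentrates near the label-flip probability 0.1, the smallest error any deterministic hypothesis can achieve under this toy distribution.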

This most general setting is referred to as the stochastic scenario. In this setting, the output label is a probabilistic function of the input. The stochastic scenario captures many real-world problems where the label of an input point is not uniquely determined. For example, if we seek to predict gender based on input pairs formed by a person's height and weight, then the label is typically not unique: for most pairs, both male and female are possible genders. For each fixed pair, there is a probability distribution over the label being male.
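The gender example above can be sketched as follows (the height/weight pairs and conditional probabilities are purely hypothetical values for illustration): for each fixed input pair, the label is drawn from a conditional distribution rather than being determined.

```python
import random

# Hypothetical conditional probabilities P[male | (height cm, weight kg)]:
p_male = {
    (170, 65): 0.55,
    (185, 90): 0.92,
    (160, 50): 0.12,
}

def draw_label(pair, rng):
    # The label is a probabilistic function of the input.
    return "male" if rng.random() < p_male[pair] else "female"

rng = random.Random(0)
labels = [draw_label((170, 65), rng) for _ in range(10_000)]
frac_male = labels.count("male") / len(labels)
# For (170, 65), even the best possible predictor errs with
# probability min(p, 1 - p) = 0.45.
```

Because the label is not a deterministic function of the input, no hypothesis can achieve zero error here, which motivates the agnostic extension discussed next.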
The natural extension of the PAC-learning framework to this setting is known as *agnostic PAC-learning*.