# Data structure Huffman tree

2022-01-26 23:37:56

## One . The basic concept of Huffman tree

1. route : The branch from one node to another in the tree constitutes the path between the two nodes .

2. The path length of the node : The number of branches on the path between two nodes .

3. The path length of the tree : The sum of the path length from the root of the tree to each node . Write it down as : TL

In a binary tree with the same number of nodes , A complete binary tree is a binary tree with the shortest path length .

4. power (weight): Assign a node in the tree to a value with some meaning , Then this value is called the weight of the node .

5. The weighted path length of the node : The product of the path length from the root node to the node and the weight of the node .

6. Weighted path length of tree ： All the trees leaf node The sum of the weighted path length of .  Huffman tree ： The best tree   --  Weighted path length WPL The shortest tree

Be careful ：“ Shortest weighted path length ” Is in " The same degree ” The result of comparison in the tree , Therefore, there is an optimal binary tree 、 The best trigeminal tree and so on .

Huffman tree ： The best binary tree -- Weighted path length (WPL) The shortest binary tree

Two . The construction method of Huffman tree

## 1. Huffman's algorithm

(1) Structural forests are all roots ： according to n A given weight {W, W2, .... wn} constitute n A forest of binary trees

F={T, T2.... T branch }, among T; There is only one with the right to W; Root node of .

(2) Choose two small trees to build new trees ： stay F Select the tree with the smallest weight of the two root nodes as the left and right subtrees , Construction one A new one

Binary tree , And set the weight of the root node of the new binary tree as the weight of the root node on its left and right subtrees

The sum of the .

(3) Delete two small additions ： stay F Delete these two trees , At the same time, the newly obtained binary tree is added to the forest .

(4) repeat (2) and (3), Until there is only one tree in the forest , This tree is called Huffman tree .

## ​​2. Huffman algorithm formula :

1、 Structural forests are all roots ;

2、 Choose two small trees to build new trees ;

3、 Delete two small additions ;

4、 repeat 2、3 Single root left . 3、 ... and . Implementation of Huffman tree construction algorithm
• A sequential storage structure is adopted - One dimensional structure array
• Node type definition
```

typedef struct {

int weight;

int parent, Ich, rch;

}HTNode,*HuffmanTree;

1.

2.

3.

4.

```

Huffman tree 2n-1 Nodes , Don't use 0 Subscript , The array size is 2n ```

void CreatHuffmanTree (HuffmanTree HT, int n){ // Constructing Huffman tree - Huffman algorithm

if(n<= 1)return;

m=2*n-1; // Array total 2n-1 Elements

HT=malloc (sizeof((HTNode)*(m+1)); //Q Unit No. is not used , HT[m] Represents the root node

for(i=1;i<=m;++i){ // take 2n-1 An element of Ich. rch. parent Set as 0

HT[].Ich=0; HT[i].rch=0; HT[i].parent=0;

}

for(i=1;i<=n;++i) scanf(&HT[i].weight) ; // Before input n An element of weight value

// Initialization finished , Let's start building a Huffman tree

for( i=n+1i<=m;i++){ // Merger n-1 One node, one   structure Huffman Trees

Select(HT, i-1,s1, s2); // stay HT[k](1≤k≤i-1) Select two parents whose parent domains are 0,

// And the node with the smallest weight , And return them in HT The serial number in s1 and s2

HT[s1].parent=i; HT[s2] .parent=i; // From F Delete in s1,s2

HT[ij.Ich=s1; HT[i].rch=s2; //s1,s2 Respectively as i Right and left

HT[i].weight=HT[s1].weight + HT[s2] .weight; //i The weight of is the sum of the weight of the left and right children

}

1.

2.

3.

4.

5.

6.

7.

8.

9.

10.

11.

12.

13.

14.

15.

16.

17.

```
Four . Huffman code

The key : To design codes with different lengths , The code of any character must not be the prefix of the code of another character, which is called Prefix code

What prefix code can make the total length of the message shortest ? ----- Huffman code

### ​1. Method :

1、 Count the average probability of each character in the character set in the message ( The greater the probability ,

The shorter the code is required ).

2、 Using the characteristics of Huffman tree : The greater the weight, the closer the leaf is to the root ; Put each character of

The probability value is used as the weight , Construct the Huffman tree . Then the node with greater probability , The shorter the path .

3、 Mark each branch of the Huffman tree with 0 or 1:

The left branch of a node 0, Right branch mark 1

Connect the labels on the path from the root to each leaf , The encoding of the character represented by the leaf .

### 2. Two questions ：

#### （1）. Why Huffman coding can guarantee prefix coding ?

Because no leaf is another The ancestor of a leaf , Therefore, the code of each leaf node cannot be the prefix of other leaf node codes .

#### （2）. Why Huffman coding can ensure the shortest total length of character coding ?

Because the weighted path length of Huffman tree is the shortest , Therefore, the total length of character coding is the shortest .

● nature 1 Huffman code is prefix code

● nature 2 Huffman code is the optimal prefix code

From table to Huffman coding algorithm ：

```

void CreatHuffmanCode(HuffmanTree HT, HuffmanCode HC, int n)

{

// Reverse Huffman code of each character from leaf to root , Stored in the encoding table HC in

HC =  malloc(sizeof(char) *(n + 1));

// Distribute n A character encoded header pointer vector

cd = malloc(sizeof(char)*n);

// Allocate dynamic array space for temporarily storing codes

cd[n - 1] = '\0' ;

// End of code

int i = 0;

int start,c,f;

for (i = 1; i <= n; ++i){

// Find Huffman code character by character

start = n - 1; c = i; f = HT[i].parent;

while (f != O){

//  Start from the leaf node and go back up , Until the root node

--start;

// to flash back - - Time start Point forward one position

if (HT[f].Ichild = = c) cd[start] = '0';

// node c yes f The left child , Then generate code 0

else

cd[start] = '1';

// node c yes f The right child , Then generate code 1

c = f; f = HT[f].parent;

// Go on to . Go back

}

// Find out the 1 Character encoding

HC[i] = malloc(sizeof(char)*(n - start));

// For the first time i A string encoding allocates space

strcpy(HC[i], &cd[start]);

// The obtained code is extracted from the temporary space cd complex   Control to HC In the current line of

}

free(cd);

// Free up temporary space

} // CreatHuffanCode

1.

2.

3.

4.

5.

6.

7.

8.

9.

10.

11.

12.

13.

14.

15.

16.

17.

18.

19.

20.

21.

22.

23.

24.

25.

26.

27.

28.

29.

30.

31.

32.

33.

34.

35.

```

## 5、 ... and . Encoding and decoding of files

• Plaintext ：A computer program is a sequence of instructions written to perform a

specified task with a computer. The program has an executable form that the

computer can use directly to execute the instructions. Computer source code

is often written by computer programmers. Source code may be converted

into an executable file by a compiler and later executed by a central

processing unit.

378 Characters , Storage ASCI code , Each character 8 position ,378*8 bit=3024bit

• Huffman code

0011110101100101100110101100100010110111111010001010110111001110110101010110001110111011101011001111100000110010010101111010010000011000010010111011111010000011111000110010UTU1T10111000111111011000111111110110010110111110011101000101110111000001001101110101110010011011101100010110100000000000110100011111010011111101001001111100111100011100100110010110011010100010000111111011001100110111001001100001011110100011011100111100111011010010101110100001010011101110010000101001100110101101010001111010000011000111101000001001101110101110111110000101001111111110000101111001011001101010001000011110111011110010100001010000011010111101010010001101101101011111111000001110111110011100110011010110101000011110111101111100001011110000100101110111111011000011100011001001011101001100110111001010100110101100001111110111111010010000101101010111101011010010011110001110111010010000111101010110111111000111111011001011010100000111011001011001101011000100001111111110100110111011110011101101001010110101011101111101001100110111001011100100001011010101111001011001101001011110101010100011101101000101110010110010010001111000111011111011101001110000100101111100111001000010110010011010110101000011110100101000110000111110000000100011101010000011011001001001101101011000000110001101111001000010101001110110000111101111110011001101011010100001111011101001101010000011101100100110010101100111110110100111000110100011011100101010111111110100010010111001110000000011111001100

​1 596bit

1 596/3024=52.8%​

### 1. code ：

① Enter each character and its weight

② Construct Huffman tree 1 HT[i]

③ Huffman coding a HC[i]

④ check HC[i] , Get the Huffman code of each character

### 2. decode :

① Construct the Huffman tree ( Huffman tree constructed corresponding to the translated characters )

② Read in the binary code in turn

③ Read in 0, Then go to the left child ; Read in 1, Then go to the right child

④ Once you reach a leaf , You can translate characters

⑤ Then continue decoding from the root , End of guidance

Example ： ■ Building the Huffman tree HT ■ Find out the original code message OC

uvxwxxxvywyuyyvxxyxxuxuvvyvxy