
A HashMap talked with the interviewer for half an hour

"Angela and the Interviewer" series:
A HashMap talked with the interviewer for half an hour
A synchronized talked with the interviewer for half an hour
A volatile talked with the interviewer for half an hour

"Angela Teaches Luban Algorithms" series:

Angela teaches Luban dynamic programming

Preface

HashMap is pretty much a must-ask question in Java back-end engineer interviews. It covers so many knowledge points that it is ideal for probing a candidate's Java fundamentals.

Start

Interviewer: Please introduce yourself first!

Angela: I'm Angela, one of the three rascals in the bushes, the strongest mid-laner (Zhong Kui disagrees)! Oh, wait, wrong script. Long time no see. I'm **, currently doing – system development at – company.

Interviewer: Your resume says you're familiar with Java collections. Have you used HashMap?

Angela: I have. (Here we go again — the same familiar flavor.)

Interviewer: Then tell me about HashMap's internal data structure.

Angela: In the JDK 1.8 version I currently use, HashMap internally uses an array + linked list + red-black tree.

Angela: Let me draw you a diagram of the data structure (an array of buckets, where each bucket holds a linked list that converts to a red-black tree once it grows long enough):

Interviewer: Then do you know how HashMap inserts data?

Angela: Uh [thinking]. I think it's easier to draw a picture, as follows:

[Figure: HashMap put() flowchart]

  1. Check whether the array is empty; if so, initialize it.
  2. If not empty, compute the hash of the key, then compute the array index to store at via (n - 1) & hash.
  3. Check whether table[index] already holds data; if not, build a Node and store it at table[index].
  4. If there is data, a hash collision has occurred (two keys have the same hash value). Check whether the keys are equal; if they are, replace the original value with the new value (when onlyIfAbsent is false).
  5. If the keys are not equal, check whether the current node is a tree node; if so, create a tree node and insert it into the red-black tree (a tree node means the bin is already a red-black tree).
  6. If it is not a tree node, create an ordinary Node and append it to the linked list; if the list length is then greater than 8 and the array length is greater than 64, convert the list into a red-black tree.
  7. After insertion, check whether the current number of entries exceeds the threshold; if so, resize to double the original array size.
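To make steps 3 and 4 concrete, here is a minimal sketch from the caller's side (the keys and values are made up for illustration):

import java.util.HashMap;
import java.util.Map;

public class PutDemo {
  public static void main(String[] args) {
    Map<String, Integer> map = new HashMap<>();
    System.out.println(map.put("k", 1)); // null: the slot was empty, a new Node was stored (step 3)
    System.out.println(map.put("k", 2)); // 1: equal key, the old value was replaced and returned (step 4)
    System.out.println(map.get("k"));    // 2
  }
}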

Interviewer: [Falls silent.] That was clear — have you also been following the WeChat official account 【Angela's Blog】? Continuing with the usual routine: you just mentioned HashMap's initialization, so how does HashMap set its initial capacity?

Angela: [That's a question?] Generally, if you call new HashMap() without arguments, the default capacity is 16 and the load factor is 0.75. If you pass in an initial size k, the initial capacity becomes the smallest power of 2 that is greater than or equal to k. For example, pass 10 and the capacity is 16. (Additional explanation: the implementation is as follows.)

static final int tableSizeFor(int cap) {
  int n = cap - 1;  // subtract 1 first so an exact power of two is not doubled
  n |= n >>> 1;     // copy the highest 1 bit into the bit below it
  n |= n >>> 2;
  n |= n >>> 4;
  n |= n >>> 8;
  n |= n >>> 16;    // by now every bit below the highest 1 bit is also 1
  return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

Additional explanation: here is the detailed process. The algorithm right-shifts the value's binary representation by 1, 2, 4, 8, and 16 bits, OR-ing each shifted value back into itself. Starting from the highest 1 bit, the repeated right shifts turn every bit below it into 1, and the final +1 then carries over: 111111 + 1 = 1000000 = 2^6 = 64 (which satisfies "greater than or equal to 50 and a power of 2").

[Figure: tableSizeFor bit-shift walkthrough, starting from an initial value of 50]
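To verify this rounding behavior without touching JDK internals, here is a minimal standalone copy of the method (MAXIMUM_CAPACITY is the same 1 << 30 constant as in the HashMap source):

public class TableSizeDemo {
  static final int MAXIMUM_CAPACITY = 1 << 30;

  static int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
  }

  public static void main(String[] args) {
    System.out.println(tableSizeFor(10)); // 16
    System.out.println(tableSizeFor(16)); // 16 (already a power of two)
    System.out.println(tableSizeFor(50)); // 64 = 2^6, matching the walkthrough above
  }
}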

Interviewer: You mentioned the hash function. Do you know how HashMap's hash function is designed?

Angela: [That's a rather detailed question.] The hash function first takes the key's hashCode, which is a 32-bit int, and then XORs the high 16 bits of that hashCode with its low 16 bits.
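For reference, this is the perturbation function as it appears in the JDK 1.8 HashMap source:

static final int hash(Object key) {
  int h;
  // XOR the high 16 bits into the low 16 bits; a null key hashes to 0 (bucket 0)
  return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}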

Interviewer: Do you know why it's designed that way?

Angela: [They even ask this?] This is also known as the perturbation function. There are two reasons for the design:

  1. It must reduce hash collisions as much as possible — the more dispersed, the better.
  2. It must be as efficient as possible — this is a high-frequency operation, so bit operations are used.

Interviewer: Why does XOR-ing the high 16 bits of the hashCode with the low 16 bits reduce hash collisions? Couldn't the hash function just use the key's hashCode directly?

[This question is a bit tricky.] Angela almost froze on the spot and itched to fire off a biubiubiu 2-1-3 combo.

Angela: Because key.hashCode() calls the hash function of the key's own type and returns an int hash value. An int ranges from **-2147483648 to 2147483647** — roughly a 4-billion-value mapping space. As long as the hash function maps values fairly evenly and loosely, collisions are hard to come by in ordinary applications. The problem is that a 4-billion-slot array cannot fit in memory. HashMap's initial array length is only 16, so before use, the hash must be reduced modulo the array length, and the remainder serves as the array index. (From Zhihu user Chujun)

In the source code, this modulo is implemented as a bitwise AND of the hash value with (array length - 1), since bit operations compute faster than the remainder operator %.

bucketIndex = indexFor(hash, table.length);

static int indexFor(int h, int length) {
  // equivalent to h % length when length is a power of two
  return h & (length - 1);
}

By the way, this also explains why HashMap's array length must be a power of 2: (array length - 1) is then exactly a "low-bit mask". The AND operation zeroes all the high bits of the hash value and keeps only the low bits, which are used as the array index. Take the initial length 16 as an example: 16 - 1 = 15, which in binary is 00000000 00000000 00001111. AND-ing it with a hash value proceeds as follows, and the result is just the lowest four bits:

  10100101 11000100 00100101
& 00000000 00000000 00001111
----------------------------
  00000000 00000000 00000101    // all high bits zeroed; only the last four bits remain

But then the problem arises: even if my hash values are loosely distributed, if only the last few bits are used, collisions can still be severe. Worse still, if the hash function itself is poorly written — say its distribution has flaws that make the low bits repeat in a regular pattern — that would really hurt.

This is where the value of the "perturbation function" comes in — by now you can probably guess. Look at the figure below:

[Figure: the high 16 bits of the hash XOR-ed into the low 16 bits]

A right shift by 16 bits is exactly half of 32 bits, so XOR-ing the top half with the bottom half mixes the high and low bits of the original hash code, increasing the randomness of the low bits. The mixed-in low bits are also doped with characteristics of the high bits, so the high-bit information is preserved in disguise.
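A small sketch of the effect (the sample hash value and table size are made up for illustration):

public class PerturbDemo {
  public static void main(String[] args) {
    int h = 0x7ABC0000; // a hash whose low 16 bits are all zero
    int n = 16;         // table length

    int plainIndex = h & (n - 1);                // high bits are masked away entirely
    int mixedIndex = (h ^ (h >>> 16)) & (n - 1); // high bits now influence the index

    System.out.println(plainIndex); // 0
    System.out.println(mixedIndex); // 12 (the low 4 bits of 0x7ABC)
  }
}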

Finally, consider an experiment from Peter Lawrey's article "An introduction to optimising a hashing strategy": he randomly selected 352 strings whose hash values had no collisions among them, masked off their low bits, and took array indexes.

[Figure: collision counts for the 352 string keys, with and without the perturbation function]

The results show that when the HashMap array length is 512 (2^9), i.e., when the mask keeps the low 9 bits, there were 103 collisions without the perturbation function — nearly 30%. With the perturbation function, there were only 92 collisions — a reduction of roughly 10%. So the perturbation function really does have an effect.

In addition, Java 1.8 adjusted this compared with 1.7: 1.7 performed four shifts and four XORs, but Java 8 evidently decided that one round of perturbation is enough — doing it 4 times adds little marginal benefit — so, in the name of efficiency, it was reduced to a single round.

Here is the 1.7 hash code:

static int hash(int h) {
    // JDK 1.7: four shifts and four XORs to mix the bits
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

Interviewer: You seem to have done your homework — there's some real material here! Have you been secretly reading the official account "Angela's Blog"? You just said 1.8 optimized the hash function. Did 1.8 make any other optimizations?

Angela: 1.8 made four major optimizations:

  1. The underlying structure changed from array + linked list to array + linked list or red-black tree.
  2. Linked-list insertion changed from head insertion to tail insertion. In short, when inserting, if the array slot already has elements, 1.7 puts the new element at the head of the list, making the original node the new node's successor; 1.8 traverses the list and appends the element at the tail.
  3. On resize, 1.7 re-hashes every element of the original array to find its position in the new array; 1.8 uses simpler judgment logic — the position either stays unchanged or becomes index + old capacity.
  4. On insertion, 1.7 first checks whether a resize is needed and then inserts; 1.8 inserts first and checks whether a resize is needed afterwards.

Interviewer: Then tell me why each of these optimizations was needed.

Angela: [Cough — a whole volley of follow-ups, as expected.]

  1. To prevent long linked lists when hash collisions occur: the lookup time complexity drops from O(n) to O(log n).

  2. Because 1.7's head insertion during a resize reverses the linked list, which can create a ring in a multithreaded environment:

    Thread A is suspended mid-transfer while thread B runs a full resize of its own; since head insertion reverses the list order, when thread A resumes and re-inserts the remaining nodes at the head, two nodes end up pointing at each other, forming a ring, as shown below:

     [Figure: ring formed in the linked list during a concurrent 1.7 resize]

    1.7's resize calls transfer, as shown below:

    void transfer(Entry[] newTable, boolean rehash) {
      int newCapacity = newTable.length;
      for (Entry<K,V> e : table) {
        while (null != e) {
          Entry<K,V> next = e.next;
          if (rehash) {
            e.hash = null == e.key ? 0 : hash(e.key);
          }
          int i = indexFor(e.hash, newCapacity);
          e.next = newTable[i]; // if thread A is suspended right here, thread B may run its own resize
          newTable[i] = e;      // head insertion into the new bucket
          e = next;
        }
      }
    }
    
  3. Why does 1.8 not need to re-hash on resize, and can locate a node's position in the new array directly?

    Because the capacity doubles, the mask used to compute the array position simply gains one more 1 bit at the top. How to understand that?

    If the length before resizing is 16, the n-1 used in (n - 1) & hash is 0000 1111 in binary. After resizing to 32, the binary gains one more high 1 bit: 0001 1111.

    Since this is an & operation, and AND-ing with a 1 bit yields the original bit, there are exactly two cases, depending on whether the hashcode's bit at that new position is 0 or 1 (see the figure and the sketch below):

    If that extra high bit of the hash is 0, the re-computed index is unchanged; if it is 1, the re-computed index is the old index plus 16 (the old array's capacity).

     [Figure: new index calculation when the extra hash bit is 0 vs. 1]
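A minimal sketch of that rule (the variable names are mine, not the JDK's; JDK 1.8's resize() tests the equivalent condition (hash & oldCap) == 0 when splitting each bucket):

public class ResizeIndexDemo {
  public static void main(String[] args) {
    int oldCap = 16;
    int[] hashes = {0b00101, 0b10101}; // same low 4 bits; only the 5th bit differs

    for (int h : hashes) {
      int oldIndex = h & (oldCap - 1); // index in the 16-slot table
      // after doubling, a node either stays put or moves by exactly oldCap
      int newIndex = ((h & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
      System.out.println(oldIndex + " -> " + newIndex); // prints 5 -> 5, then 5 -> 21
    }
  }
}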

Interviewer: So is HashMap thread-safe?

Angela: No. In a multithreaded environment, 1.7 can suffer infinite loops, data loss, and data overwrites; 1.8 still has the data-overwrite problem. Take 1.8 as an example: if thread A is suspended right after finding that the slot at index is empty, and thread B then writes its node into that slot, when thread A resumes and performs its own assignment it overwrites thread B's data. The ++size line can also cause problems under multithreading, such as miscounting and redundant resizes.

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
  Node<K,V>[] tab; Node<K,V> p; int n, i;
  if ((tab = table) == null || (n = tab.length) == 0)
    n = (tab = resize()).length;
  if ((p = tab[i = (n - 1) & hash]) == null)  // two threads can both see null here; the later write overwrites the earlier one
    tab[i] = newNode(hash, key, value, null);
  else {
    Node<K,V> e; K k;
    if (p.hash == hash &&
        ((k = p.key) == key || (key != null && key.equals(k))))
      e = p;
    else if (p instanceof TreeNode) // the bin is already a red-black tree
      e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
    else {
      for (int binCount = 0; ; ++binCount) {
        if ((e = p.next) == null) {
          p.next = newNode(hash, key, value, null);
          if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
            treeifyBin(tab, hash);
          break;
        }
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
          break;
        p = e;
      }
    }
    if (e != null) { // existing mapping for key
      V oldValue = e.value;
      if (!onlyIfAbsent || oldValue == null)
        e.value = value;
      afterNodeAccess(e);
      return oldValue;
    }
  }
  ++modCount;
  if (++size > threshold) // ++size is not atomic; concurrent threads can miscount and call resize() redundantly
    resize();
  afterNodeInsertion(evict);
  return null;
}

Interviewer: Then how do you usually solve this thread-safety problem?

Angela: Java has Hashtable, Collections.synchronizedMap, and ConcurrentHashMap, each of which can provide a thread-safe Map.

Hashtable simply adds the synchronized keyword to its methods, locking the whole table — very coarse-grained. Collections.synchronizedMap uses an inner class of the Collections utility: the Map you pass in is wrapped in a SynchronizedMap object, which defines an internal mutex that every method synchronizes on. ConcurrentHashMap uses segmented locking to reduce the lock granularity, which greatly improves concurrency.
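For completeness, a minimal sketch of how each of the three is constructed:

import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMaps {
  public static void main(String[] args) {
    Map<String, String> table = new Hashtable<>();                              // every method synchronized
    Map<String, String> wrapped = Collections.synchronizedMap(new HashMap<>()); // one shared mutex around each call
    Map<String, String> concurrent = new ConcurrentHashMap<>();                 // fine-grained locking inside

    table.put("k", "v");
    wrapped.put("k", "v");
    concurrent.put("k", "v");
  }
}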

Interviewer: Then do you know how ConcurrentHashMap's segmented locking is implemented?

Angela: [Oh my god! Russian dolls — one question nested inside the next.] ConcurrentHashMap's member variables are declared volatile, which prevents instruction reordering and guarantees memory visibility. On top of that, assignments are implemented with a combination of CAS operations and synchronized, so a multithreaded operation locks only the node at the index it is currently operating on.

Picture it this way: thread A locks the bin headed by node A while thread B locks the bin headed by node B; the two operations don't interfere with each other.
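An abbreviated excerpt of JDK 1.8's ConcurrentHashMap.putVal (bin-handling bodies elided here) shows the CAS-then-synchronized pattern described above:

final V putVal(K key, V value, boolean onlyIfAbsent) {
  if (key == null || value == null) throw new NullPointerException();
  int hash = spread(key.hashCode());
  for (Node<K,V>[] tab = table;;) {
    Node<K,V> f; int n, i, fh;
    if (tab == null || (n = tab.length) == 0)
      tab = initTable();
    else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
      // empty bin: publish the new node with a lock-free CAS, no locking at all
      if (casTabAt(tab, i, null, new Node<K,V>(hash, key, value, null)))
        break;
    }
    else if ((fh = f.hash) == MOVED)
      tab = helpTransfer(tab, f); // a resize is in progress; help move bins
    else {
      synchronized (f) {
        // lock only the head node of this one bin, then walk the linked
        // list or red-black tree beneath it (bodies elided)
      }
      // treeify / return-old-value handling elided
    }
  }
  addCount(1L, 0); // adjust size, possibly triggering a resize (real code passes binCount)
  return null;
}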

Interviewer: You mentioned earlier that a linked list converts to a red-black tree when its length reaches a threshold. What is that threshold?

Angela: The treeify threshold is 8; the threshold for converting a red-black tree back to a linked list is 6.

Interviewer: Why 8 and not 16, 32, or even 7? And why is the tree-to-list threshold 6 and not 8?

Angela: [Ask the author! Oh my god, I really want to biubiubiu a 2-1-3 combo.] Because that's how the author designed it — oh, no, wait. Because, by calculation, when the hash function is reasonably designed, the probability of 8 entries colliding into one bin is vanishingly small (the Poisson analysis in the JDK source puts it at about 0.00000006 — it's all probability), so 8 is more than enough. As for why it converts back at 6: if the collision count hovered right around 8, the bin would keep converting back and forth between linked list and red-black tree; the gap between 8 and 6 prevents that.
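Those probabilities come from the implementation notes in the JDK 8 HashMap source, which model bin sizes with a Poisson distribution (parameter 0.5) under the default 0.75 load factor:

// Excerpt from the implementation-notes comment in java.util.HashMap (JDK 8):
// the expected frequency of bins holding exactly k entries.
// 0:    0.60653066
// 1:    0.30326533
// 2:    0.07581633
// 3:    0.01263606
// 4:    0.00157952
// 5:    0.00015795
// 6:    0.00001316
// 7:    0.00000094
// 8:    0.00000006
// more: less than 1 in ten million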

Interviewer: Are the entries inside a HashMap ordered?

Angela: No, they're unordered; entries are placed at positions determined by their hash values.

Interviewer: Is there any ordered Map?

Angela: LinkedHashMap and TreeMap.

Interviewer: Tell me, how does LinkedHashMap stay ordered?

Angela: LinkedHashMap internally maintains a doubly linked list with head and tail nodes. Its node class, LinkedHashMap.Entry, inherits the fields of HashMap's Node and adds before and after references to identify the preceding and following nodes. It can order entries by insertion order or by access order.

/**
 * The head (eldest) of the doubly linked list.
 */
transient LinkedHashMap.Entry<K,V> head;

/**
 * The tail (youngest) of the doubly linked list.
 */
transient LinkedHashMap.Entry<K,V> tail;

// link the new node p onto the end of the list
private void linkNodeLast(LinkedHashMap.Entry<K,V> p) {
  LinkedHashMap.Entry<K,V> last = tail;
  tail = p;
  if (last == null)
    head = p;
  else {
    p.before = last;
    last.after = p;
  }
}

// LinkedHashMap's node class
static class Entry<K,V> extends HashMap.Node<K,V> {
  Entry<K,V> before, after;
  Entry(int hash, K key, V value, Node<K,V> next) {
    super(hash, key, value, next);
  }
}

Sample code:

public static void main(String[] args) {
  Map<String, String> map = new LinkedHashMap<String, String>();
  map.put("1", "Angela");
  map.put("2", "'s");
  map.put("3", "Blog");

  for (Map.Entry<String, String> item : map.entrySet()) {
    System.out.println(item.getKey() + ":" + item.getValue());
  }
}
// console output
1:Angela
2:'s
3:Blog
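The access-order mode mentioned above is selected with the third constructor argument; a minimal sketch:

import java.util.LinkedHashMap;
import java.util.Map;

public class AccessOrderDemo {
  public static void main(String[] args) {
    // accessOrder = true: iteration follows access order instead of insertion order
    Map<String, String> map = new LinkedHashMap<>(16, 0.75f, true);
    map.put("1", "Angela");
    map.put("2", "'s");
    map.put("3", "Blog");

    map.get("1"); // accessing "1" moves it to the end of the iteration order

    System.out.println(map.keySet()); // [2, 3, 1]
  }
}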

Interviewer: Then tell me, how does TreeMap stay ordered?

Angela: TreeMap orders entries by the natural ordering of their keys or by a Comparator, implemented internally with a red-black tree. So either the key's class implements the Comparable interface, or you define a custom implementation of the Comparator interface and pass it to the TreeMap constructor for key comparison.
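A minimal sketch of both options (the sample keys are made up):

import java.util.Comparator;
import java.util.TreeMap;

public class TreeMapDemo {
  public static void main(String[] args) {
    // natural ordering: String implements Comparable
    TreeMap<String, Integer> natural = new TreeMap<>();
    natural.put("b", 2);
    natural.put("a", 1);
    System.out.println(natural.keySet()); // [a, b]

    // custom ordering: pass a Comparator to the constructor
    TreeMap<String, Integer> reversed = new TreeMap<>(Comparator.reverseOrder());
    reversed.put("b", 2);
    reversed.put("a", 1);
    System.out.println(reversed.keySet()); // [b, a]
  }
}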

Interviewer: You mentioned earlier that combining CAS and synchronized reduces the lock granularity. Can you tell me how CAS and synchronized themselves work?

Angela: Shall we save that for the next installment, OK?

Interviewer: OK — go home and wait for our call!


Replies to a few questions from the comments section:

  1. @A little smile on the palm: In the put method, the flow goes "data exists at the target slot -> no -> store the node -> put into a red-black tree node"? Shouldn't it be "store the node -> is the node count greater than the threshold"? I don't follow this part; could you explain?
    There was indeed a mistake in the diagram here. Thanks for the correction — it has been updated.
  2. @Haidian good boy: "An initial capacity that is not a power of 2 is automatically rounded up to a power of 2" contains some mistakes — the binary of 50 and the unsigned right shift by 4 below it are wrong.
    The demonstration takes 50 as the initial value: it is first decremented by 1, and the bit operations start from that result.

References

  1. Peter Lawrey, "An introduction to optimising a hashing strategy"
  2. "What is the principle of the hash() method in the JDK HashMap source?" (Zhihu)
  3. "The hash function in HashMap" (blog post)


