"Performance optimization series" app startup optimization theory and Practice (Part 2)

2022-01-26 23:50:13 Fu Xi

0. Preface

I wrote an article on startup optimization more than a year ago: "Performance Optimization Series: App Startup Optimization Theory and Practice (Part 1)". Each year brings new insights, so this article supplements the startup optimization approaches covered there. Please read the two together.

The main contents of this article are as follows:

  • Startup-time monitoring in practice: manual instrumentation compared with AspectJ;
  • Startup optimization in practice: a directed-acyclic-graph launcher, an IdleHandler launcher, and other "black technology" solutions;
  • An introduction to optimization tools.

1. Optimization tools

1.1 Traceview (deprecated)

Traceview is an Android platform tool for performance analysis that displays trace logs graphically, but it has been deprecated. In addition, Traceview's own overhead is high, so the timings it reports are distorted.

1.2 CPU Profiler

Traceview has been replaced by the CPU Profiler. It can inspect .trace files captured by instrumenting the application with the Debug class, record new method traces, save .trace files, and show the real-time CPU usage of the application's processes. See the Android documentation "Inspect CPU activity with CPU Profiler".

1.3 Systrace + function instrumentation

Systrace lets you collect and inspect timing information for all processes running on the device. It includes some Android kernel data (such as the CPU scheduler, I/O, and app threads) and generates an HTML report so that the trace can be viewed and analyzed easily. However, it does not measure the time spent in application code. To analyze the execution time of your own code, you need to combine it with function instrumentation. A practical case is given in section 2 below.
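
As a rough sketch of what function instrumentation looks like, the helper below wraps a block of code in a named trace section. On Android the two tracer calls would map to android.os.Trace.beginSection(name) and android.os.Trace.endSection(). The SectionTracer interface and the traced function are hypothetical names introduced here so that the sketch also runs off-device.

```kotlin
// Hypothetical seam so the sketch runs off-device. On Android, an implementation
// would delegate to android.os.Trace.beginSection(name) and android.os.Trace.endSection().
interface SectionTracer {
    fun begin(name: String)
    fun end()
}

// Runs block() inside a named trace section and returns its result.
fun <T> traced(tracer: SectionTracer, name: String, block: () -> T): T {
    tracer.begin(name)
    try {
        return block()
    } finally {
        tracer.end() // always close the section, even if block() throws
    }
}
```

Wrapping each initialization call this way, for example traced(tracer, "initRouter") { initRouter() }, should make that call appear as a labeled section in the trace, provided app-level tracing is enabled for the package.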

2. Startup-time monitoring

There are many ways to measure startup speed, such as manual instrumentation, AOP instrumentation, adb commands, Traceview, and Systrace. They were introduced in "Performance Optimization Series: App Startup Optimization Theory and Practice (Part 1)", so I will not repeat them here. Instead, this section approaches timing from a practical angle.

To monitor startup time, I initialize several third-party frameworks in the Application's onCreate, such as ARouter, Bugly, and LoadSir, to simulate time-consuming operations.

2.1 How do we monitor the execution time of each method?

2.1.1 Approach 1: manual instrumentation

Manual instrumentation can measure the app's overall startup time, so can it be applied to every method? Let's try: we add a timing point before and after each third-party framework's initialization method.

override fun onCreate() {

This certainly yields the time taken by every method. However, it adds duplicate code to each one: two extra lines per method. What if there are a hundred or a thousand methods? Do you type them in one by one?

This approach is too clumsy and highly intrusive to the source code, so we discard it.
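
For reference, the manual approach boils down to something like the sketch below. The helper name timed, its log parameter, and initAll are assumptions made for illustration. The article's own project uses a TimeMonitorManager whose code is not shown here.

```kotlin
// Hypothetical helper: runs block() and reports how long it took.
fun <T> timed(label: String, log: (String) -> Unit = { println(it) }, block: () -> T): T {
    val start = System.nanoTime()
    try {
        return block()
    } finally {
        val elapsedMs = (System.nanoTime() - start) / 1_000_000
        log("$label cost = $elapsedMs ms")
    }
}

// Every call site still has to be wrapped by hand, which is exactly
// the intrusiveness that the manual approach suffers from.
fun initAll(initRouter: () -> Unit, initBugly: () -> Unit) {
    timed("initRouter") { initRouter() }
    timed("initBugly") { initBugly() }
}
```

A helper like this removes the duplicated start/stop lines, but each method must still be wrapped explicitly.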

Is there a more elegant way to measure the execution time of every method? Of course there is.

AOP (aspect-oriented programming) is a technique that adds specific functionality to a program dynamically and uniformly, without modifying the source code, by means of compile-time weaving or runtime dynamic proxies.

Its main purpose is to separate code for logging, performance statistics, security control, transaction handling, exception handling, and the like from the business logic. By isolating these behaviors in methods that do not drive the business logic, we can change them without affecting the business logic.

The manual approach above is tightly coupled to the business code, and AOP solves this problem well. There are many ways to implement AOP on Android. Here we cover a commonly used one: AspectJ.
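
To make the idea concrete, here is a minimal sketch of the runtime dynamic-proxy flavor of AOP using the JDK's java.lang.reflect.Proxy. It is not how AspectJ works (AspectJ weaves bytecode at build time), and JDK proxies only work through interfaces. Initializer, RealInitializer, and withTiming are hypothetical names used for illustration.

```kotlin
import java.lang.reflect.InvocationTargetException
import java.lang.reflect.Proxy

// Hypothetical business interface and implementation; neither contains timing code.
interface Initializer {
    fun initRouter()
    fun initBugly()
}

class RealInitializer : Initializer {
    override fun initRouter() = Thread.sleep(20) // stands in for slow SDK setup
    override fun initBugly() = Thread.sleep(5)
}

// Returns a proxy that times every interface method and then delegates to target.
fun withTiming(target: Initializer, log: (String) -> Unit): Initializer =
    Proxy.newProxyInstance(
        Initializer::class.java.classLoader,
        arrayOf(Initializer::class.java)
    ) { _, method, args ->
        val start = System.nanoTime()
        try {
            method.invoke(target, *(args ?: emptyArray<Any?>()))
        } catch (e: InvocationTargetException) {
            throw e.targetException // surface the original exception to the caller
        } finally {
            log("${method.name} cost = ${(System.nanoTime() - start) / 1_000_000} ms")
        }
    } as Initializer
```

The timing logic lives in one place and the business class stays untouched, which is the separation of concerns that AOP aims for.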

2.1.2 Approach 2: AOP with AspectJ

AspectJ is one concrete implementation of AOP, and it handles crosscutting concerns. It introduces the concept of a join point to Java and adds a few new constructs to the language: pointcuts, advice, inter-type declarations, and aspects. Pointcuts and advice dynamically affect program flow, inter-type declarations statically affect the program's class hierarchy, and an aspect encapsulates all of these new constructs.

Next, let's use AspectJ to do the timing.

Add the dependencies. In the project-level build.gradle:

dependencies {
    classpath 'com.hujiang.aspectjx:gradle-android-plugin-aspectjx:2.0.10'
}

In the app module's build.gradle:

plugins {
    id ''
    id 'kotlin-android'
    id 'android-aspectjx'
}
dependencies {
    implementation 'org.aspectj:aspectjrt:1.8.+'
}

Create a new class and annotate it with @Aspect to mark it as an aspect for the weaver to pick up.

@Aspect
class PerformanceAOP {}

The next step is to write the logic for our requirement. We want to measure the execution time of every method, so we use @Around together with a JoinPoint to handle all the methods uniformly. The advice method goes inside PerformanceAOP:

@Around("call(* com.fuusy.fuperformance.App.**(..))")
fun getMethodTime(joinPoint: ProceedingJoinPoint) {
    val signature = joinPoint.signature
    val time: Long = System.currentTimeMillis()
    joinPoint.proceed() // run the intercepted method
    Log.d(TAG, "${signature.toShortString()} speed time = ${System.currentTimeMillis() - time}")
}

Run it and look at the result:

21:05:44.504 3597-3597/com.fuusy.fuperformance D/PerformanceAOP: App.initRouter() speed time = 2009
21:05:45.104 3597-3597/com.fuusy.fuperformance D/PerformanceAOP: App.initBugly() speed time = 599
21:05:45.112 3597-3597/com.fuusy.fuperformance D/PerformanceAOP: App.initLoadSir() speed time = 8
3. Startup optimization techniques

To optimize startup speed, all the application layer can do is intervene in the business logic of its Application and Activity classes. In the Application, for example, third-party frameworks are often initialized in onCreate, which is undoubtedly time-consuming. So how do we optimize it?

Startup optimization has two main directions: asynchronous execution and delayed execution.

3.1 Asynchronous execution

3.1.1 Starting worker threads

When it comes to asynchronous processing, the first thought is to start a worker thread. Let's try it. We again simulate time-consuming work in the Application, but this time I create a thread pool and run the third-party framework initialization in it.

override fun onCreate() {
    super.onCreate()
    // Asynchronous approach 1: create a thread pool
    val newFixedThreadPool = Executors.newFixedThreadPool(CORE_POOL_SIZE)
    newFixedThreadPool.submit { initRouter() }
    newFixedThreadPool.submit { initBugly() }
    newFixedThreadPool.submit { initLoadSir() }

    TimeMonitorManager.instance?.endMonitor("APP onCreate")
}

Look at the execution time:

// Total time 
com.fuusy.fuperformance D/TimeMonitorManager: APP onCreate: 45
// The execution time of a single method 
com.fuusy.fuperformance D/PerformanceAOP: App.initLoadSir() speed time = 8
com.fuusy.fuperformance D/PerformanceAOP: App.initBugly() speed time = 678
com.fuusy.fuperformance D/PerformanceAOP: App.initRouter() speed time = 1768
On their own, initLoadSir takes 8 ms, initBugly 678 ms, and initRouter 1768 ms. With the thread pool, the total execution time of onCreate is only 45 ms: from roughly 2400 ms down to 45 ms, a reduction of more than 90%. The effect is remarkable.

However, real projects are complex, and a thread pool cannot cover every case. For example, some third-party frameworks can only be initialized on the main thread, and some must finish initializing in onCreate before the app can proceed. How do we handle these cases?

If a method can only run on the main thread, we have to give up on worker threads for it.

If a method must complete by a particular stage, we can use a synchronization aid such as CountDownLatch.

CountDownLatch is a versatile synchronization tool that can be used for many purposes. A CountDownLatch initialized with a count of 1 serves as a simple on/off latch, or gate: all threads that call await wait at the gate until a thread calling countDown opens it. A CountDownLatch initialized to N can be used to make one thread wait until N threads have completed some action, or until some action has been completed N times. A useful property of CountDownLatch is that it does not require threads calling countDown to wait for the count to reach zero before proceeding. It simply prevents any thread from proceeding past await until all threads could pass.

Put more plainly, CountDownLatch is a utility for waiting until worker threads finish and then letting the program continue. Let's see it in practice.

Create a CountDownLatch with a count of 1 to simulate onCreate having to wait for the initBugly method.

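class App : Application() {
    // Create the CountDownLatch
    private val countDownLatch: CountDownLatch = CountDownLatch(1)

    override fun onCreate() {
        super.onCreate()
        val newFixedThreadPool = Executors.newFixedThreadPool(CORE_POOL_SIZE)
        newFixedThreadPool.submit {
            initBugly()
            // Signal that initBugly() has finished
            countDownLatch.countDown()
        }
        // Block onCreate until the latch reaches zero
        countDownLatch.await()
        TimeMonitorManager.instance?.endMonitor("APP onCreate")
    }
}

Restart the app: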

com.fuusy.fuperformance D/PerformanceAOP: App.initBugly() speed time = 642
com.fuusy.fuperformance D/TimeMonitorManager: APP onCreate: 667

The total time is now recorded only after initBugly has finished, so the measured startup time is longer.
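
The same pattern extends to several must-finish tasks by sizing the latch to the number of tasks. The sketch below is plain JVM code. runAndAwait and timeoutMs are hypothetical names, and the timeout is a design choice of this sketch (so that a stuck task cannot block startup forever), not something taken from the article's project.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Runs every task on a small pool and blocks the caller until all of them
// have finished or the timeout expires. Returns true if all tasks finished in time.
fun runAndAwait(blockingTasks: List<() -> Unit>, timeoutMs: Long): Boolean {
    if (blockingTasks.isEmpty()) return true
    val latch = CountDownLatch(blockingTasks.size) // one count per task
    val pool = Executors.newFixedThreadPool(minOf(4, blockingTasks.size))
    try {
        for (task in blockingTasks) {
            pool.execute {
                try {
                    task()
                } finally {
                    latch.countDown() // count down even if the task throws
                }
            }
        }
        return latch.await(timeoutMs, TimeUnit.MILLISECONDS)
    } finally {
        pool.shutdown() // already submitted tasks still run to completion
    }
}
```

Only the tasks that startup truly depends on should go through a helper like this, since every awaited task adds back to the time onCreate spends blocked.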

From the above we can see that a thread pool handles only the ordinary cases and shows its drawbacks once the logic becomes complex. For example, what if one task depends on another? We also find that every method has to be submitted as its own Runnable task, which undoubtedly consumes resources.

Is there a more elegant way that runs asynchronously and also resolves the dependencies between tasks? Of course. Next comes a more elegant asynchronous technique: the directed acyclic graph.

3.1.2 Directed acyclic graph

In real projects, tasks run in a particular order. For example, when the WeChat Pay SDK is initialized, the app must first fetch the corresponding App key from the backend and then initialize payment with that key.

For the problem of task execution order there is a data structure that solves it well: the directed acyclic graph. Let's look at it in detail.

3.1.2.1 Directed acyclic graph (DAG)

Directed acyclic graph: a directed graph that contains no cycle is called a directed acyclic graph, or DAG.

[Figure: a directed acyclic graph]

The figure above is a directed acyclic graph: it contains no cycle, that is, no path that starts at a vertex and follows the edges back to that same vertex. If an edge B->A were added to this figure, there would be a cycle and the graph would no longer be a directed acyclic graph.

So what does startup optimization have to do with this?

As mentioned, what a DAG solves is the dependency between tasks. Solving that problem also involves another concept, the AOV network (Activity On Vertex network).

3.1.2.2 AOV network (Activity On Vertex network)

An AOV network is a network that uses vertices to represent activities, and it is one of the typical applications of a DAG. When a DAG models a project, each vertex represents an activity and a directed edge <Vi, Vj> means that activity Vi must take place before activity Vj. In the directed acyclic graph above, B must come after A and D must come before E, so the vertices have a precedence relationship.

This matches the dependencies between startup tasks exactly. As long as the startup tasks are executed the way an AOV network prescribes, the dependency problem is solved.

To find the order in which the tasks of an AOV network are executed, we use topological sorting.

3.1.2.3 Topological sort

A topological sort is an ordering of the vertices of a directed acyclic graph such that, if there is a path from vertex A to vertex B, then B appears after A in the ordering. Every AOV network has one or more topological orderings. The steps of topological sorting are also simple:

Implementing a topological sort:

  1. Select a vertex with no predecessor (in-degree 0) from the AOV network and output it;
  2. Delete that vertex and all directed edges starting from it from the network;
  3. Repeat steps 1 and 2 until the AOV network is empty or the remaining network contains no vertex without a predecessor (the latter means the graph contains a cycle).

Take making tea as an everyday example.

[Figure: AOV network for making tea]

The figure above is a directed acyclic graph for making tea. To sort it topologically, we follow the steps above:

  1. Find a vertex with in-degree 0. Here the only such vertices are "Prepare the tea set" and "Buy tea". Pick either one, say "Prepare the tea set";
  2. Remove the "Prepare the tea set" vertex and the edges starting from it. The graph becomes the following:

[Figure: the graph after removing "Prepare the tea set"]

  3. Now only the "Buy tea" vertex has in-degree 0, so select it and repeat steps 1 and 2.

Continuing in this way, the final execution order of the vertices is as follows:

[Figure: the resulting execution order]

Of course, a topological sort can have several valid results. For example, choosing "Buy tea", which also has in-degree 0, as the first task gives a different order. I will not discuss this in detail here.

Directed acyclic graphs, AOV networks, and topological sorting have now been explained. The next step is to combine them with the startup tasks: in effect, we execute the tasks in the order given by a topological sort.

/** Topological sort (Kahn's algorithm). */
fun topologicalSort(): Vector<Int> {
    val indegree = IntArray(mVerticeCount)
    for (i in 0 until mVerticeCount) { // Count the in-degree of every vertex
        val temp = mAdj[i] as ArrayList<Int>
        for (node in temp) {
            indegree[node]++
        }
    }
    val queue: Queue<Int> = LinkedList()
    for (i in 0 until mVerticeCount) { // Enqueue every vertex whose in-degree is 0
        if (indegree[i] == 0) {
            queue.add(i)
        }
    }
    var cnt = 0
    val topOrder = Vector<Int>()
    while (!queue.isEmpty()) {
        val u = queue.poll()
        topOrder.add(u)
        for (node in mAdj[u]) { // Visit every vertex adjacent to u (which has in-degree 0)
            if (--indegree[node] == 0) { // Decrement its in-degree and enqueue it once that reaches 0
                queue.add(node)
            }
        }
        cnt++
    }
    check(cnt == mVerticeCount) { // The output count should equal the vertex count; if not, there is a cycle
        "Exists a cycle in the graph"
    }
    return topOrder
}
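
The function above relies on class members (mAdj, mVerticeCount) that are not shown. As a self-contained sketch of the same algorithm applied to named startup tasks, the version below takes a map from each task to the tasks it depends on. The function name sortTasks and the task names in the usage note are hypothetical.

```kotlin
import java.util.ArrayDeque

// Returns the task names in an order where every task comes after its dependencies.
// dependsOn maps a task name to the names of the tasks it depends on.
fun sortTasks(dependsOn: Map<String, List<String>>): List<String> {
    val indegree = mutableMapOf<String, Int>()                    // unmet dependencies per task
    val dependents = mutableMapOf<String, MutableList<String>>()  // dependency -> tasks waiting on it
    for ((task, deps) in dependsOn) {
        indegree.getOrPut(task) { 0 }
        for (dep in deps) {
            indegree.getOrPut(dep) { 0 }
            indegree[task] = indegree.getValue(task) + 1
            dependents.getOrPut(dep) { mutableListOf() }.add(task)
        }
    }
    // Start with every task that has no dependencies (in-degree 0).
    val ready = ArrayDeque(indegree.filterValues { it == 0 }.keys)
    val order = mutableListOf<String>()
    while (ready.isNotEmpty()) {
        val task = ready.poll()
        order += task
        // "Remove" the task's outgoing edges by decrementing its dependents.
        for (next in dependents[task].orEmpty()) {
            val remaining = indegree.getValue(next) - 1
            indegree[next] = remaining
            if (remaining == 0) ready.add(next)
        }
    }
    // If some tasks were never released, they are part of a cycle.
    check(order.size == indegree.size) { "Dependency cycle detected" }
    return order
}
```

For instance, sortTasks(mapOf("WeChatPay" to listOf("FetchAppId"), "FetchAppId" to emptyList())) places "FetchAppId" before "WeChatPay", while two tasks that depend on each other cause an IllegalStateException.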


The launcher that handles the startup tasks can be viewed directly on GitHub in FuPerformance.

Once the launcher is implemented, basic usage in the Application or an Activity is as follows:

  1. Define each task separately. A task that runs on a worker thread extends the Task abstract class, for example initializing ARouter:
class RouterTask : Task() {
    override fun run() {
        if (BuildConfig.DEBUG) {
            ARouter.openDebug()
        }
        ARouter.init(mContext as Application?)
    }
}

If a task must run on the main thread, extend MainTask instead. If the app has to wait for a task to complete before moving on, override needWait and return true.

override fun needWait(): Boolean {
    return true
}

  2. If tasks depend on each other, override dependsOn. For example, WeChat Pay depends on obtaining the AppId:

class WeChatPayTask : Task() {

    /** WeChat Pay depends on the AppId. */
    override fun dependsOn(): List<Class<out Task?>?>? {
        val task = mutableListOf<Class<out Task?>>()
        // Add the task that obtains the AppId
        return task
    }

    override fun run() {
        // Initialize WeChat Pay
    }
}

  3. After defining the tasks, add them to the task queue in the Application's onCreate.

// Approach 2: launcher

That is the implementation and use of the DAG-based launcher. It makes the code more elegant and addresses the pain points mentioned earlier:

  • dependencies between tasks that run on worker threads;
  • tasks on worker threads that must finish before startup continues;
  • tasks that must run on the main thread;
  • tightly coupled code and wasted resources.
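
To illustrate how dependsOn and needWait can be honored, here is a deliberately simplified, plain-JVM sketch. It is not the FuPerformance launcher: StartupTask and MiniDispatcher are hypothetical names, main-thread tasks are left out, and instead of sorting the tasks it lets each task block on its dependencies' latches, which costs a parked thread per waiting task. It also assumes the dependencies contain no cycle, since a cycle would leave the tasks waiting forever.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical stand-in for the article's Task class.
abstract class StartupTask(val name: String) {
    open val dependsOn: List<String> = emptyList() // names of tasks that must run first
    open val needWait: Boolean = false             // true if start() must not return before this task finishes
    val done = CountDownLatch(1)                   // released once this task has run
    abstract fun run()
}

class MiniDispatcher(private val tasks: List<StartupTask>) {
    /** Runs all tasks on worker threads and returns once every needWait task has finished. */
    fun start(timeoutMs: Long = 10_000) {
        val byName = tasks.associateBy { it.name }
        val mustFinish = CountDownLatch(tasks.count { it.needWait })
        val pool = Executors.newCachedThreadPool()
        for (task in tasks) {
            pool.execute {
                try {
                    // Wait for every dependency before running this task.
                    task.dependsOn.forEach { byName.getValue(it).done.await() }
                    task.run()
                } finally {
                    task.done.countDown()
                    if (task.needWait) mustFinish.countDown()
                }
            }
        }
        pool.shutdown() // no new tasks; submitted ones keep running
        mustFinish.await(timeoutMs, TimeUnit.MILLISECONDS)
    }
}
```

A launcher that sorts the tasks topologically first can submit them in dependency order to a small fixed pool instead, which avoids parking a thread for every task that is still waiting.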

3.2 Delayed execution

The second optimization technique is delayed execution. There are many ways to implement it:

  • Thread sleep
object : Thread() {
    override fun run() {
        sleep(3000) // Sleep for 3 seconds
        /** What to do */
    }
}.start()
  • Handler#postDelayed
Handler(Looper.getMainLooper()).postDelayed(
    Runnable {
        /** What to do */
    }, 3000
)
  • TimerTask
val task: TimerTask = object : TimerTask() {
    override fun run() {
        /** What to do */
    }
}
val timer = Timer()
timer.schedule(task, 3000) // The TimerTask's run method is called after 3 seconds

All three can delay an operation, but when applied to startup tasks they share one pain point: there is no way to know how long the delay should be.

How do we solve this pain point?

We can use the IdleHandler mechanism that comes with Handler.

During startup, some tasks do not have to run immediately after the app launches, so we need to find the right moment to run them. How do we find that moment? Android provides a good mechanism: within the Handler mechanism, IdleHandler lets a task run when the message queue is idle.

IdleHandler is invoked when the current thread's message queue is idle. You might ask: what if the message queue is never idle and the IdleHandler never runs? Because the moment at which IdleHandler runs cannot be controlled, it has to be used with the project's business requirements in mind.

Based on these characteristics, we implement an IdleHandler launcher as follows:

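class DelayDispatcher {
    private val mDelayTasks: Queue<Task> = LinkedList<Task>()

    private val mIdleHandler = IdleHandler {
        if (mDelayTasks.size > 0) {
            val task: Task = mDelayTasks.poll()
            task.run()
        }
        // Stay registered while tasks remain; returning false removes the handler
        !mDelayTasks.isEmpty()
    }

    /** Add a delayed task. */
    fun addTask(task: Task): DelayDispatcher? {
        mDelayTasks.add(task)
        return this
    }

    fun start() {
        Looper.myQueue().addIdleHandler(mIdleHandler)
    }
}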

3.3 Other options

  • Load SharedPreferences in advance;
  • Do not start child processes during the startup phase;
  • Class-loading optimization;
  • I/O optimization.

Zhang Shaowen mentions in Android Development Master Class:

When the load is too high, I/O performance degrades faster. Especially on low-end devices, the same I/O operation may take dozens of times longer than on a high-end device. Network I/O is not recommended during startup. To optimize disk I/O, you need to know which files are read during startup, how many bytes are read, how large the buffer is, how long each read takes, on which thread it happens, and so on.

  • Class reordering

The order in which classes are loaded during startup can be obtained by overriding the ClassLoader:

class GetClassLoader extends PathClassLoader {
    public Class<?> findClass(String name) throws ClassNotFoundException {
        // Record name to a file
        return super.findClass(name);
    }
}

Then use ReDex, Facebook's open-source Dex optimization tool, to adjust the order of the classes within the Dex.

ReDex is an Android bytecode (dex) optimizer originally developed at Facebook. It provides a framework for reading, writing, and analyzing .dex files, and a set of optimization passes that use this framework to improve the bytecode.

  • Resource file reordering

For the principle and implementation of resource file reordering, see Alipay App build optimization analysis: optimizing Android startup performance through package rearrangement.

3.4 Black technology

  • Suppress GC during the startup phase

Alipay uses this approach. See Alipay client architecture analysis: Android client startup speed optimization, "garbage collection".

  • CPU frequency locking

The higher the CPU's clock frequency, the faster it computes, but the more energy it consumes. Raising the CPU frequency to speed up startup does make startup faster, but the phone also consumes more power.

4. Summary

The above describes some business-related optimization methods and some business-independent "black technologies" that can effectively improve an app's startup speed. There are many startup optimization schemes, but you still need to evaluate and implement them in light of your actual project.

Finally, performance optimization is a long-term process. I will write a series on performance optimization theory and practice, covering startup, memory, jank, APK size reduction, and network optimization. Stay tuned.

  • Startup optimization

Project address : fuusy/FuPerformance

References:

Alipay client architecture analysis: Android client startup speed optimization, "garbage collection"
Alipay App build optimization analysis: optimizing Android startup performance through package rearrangement
Experts from a top domestic team take you through Android performance analysis and optimization
Android Development Master Class
A lightweight scheme for building app startup information

Recommended reading:

Performance Optimization Series: App Startup Optimization Theory and Practice (Part 1)

copyright notice
author[Fu Xi],Please bring the original link to reprint, thank you.
