Android 14 Watchdog Source Code Analysis

Android 14 source reference: Search

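1. Android Watchdog Overview

Watchdog is a key component of the Android system. It runs inside system_server and detects when the key threads or services of that process stop responding for a long time, which is the system-level counterpart of an application ANR (Application Not Responding). It probes the main system threads (the main thread, Binder threads, and others) to decide whether the system is still running normally. When it finds a thread that has been unresponsive for too long, it logs the event, dumps diagnostic information, and as a last resort kills system_server so that the framework restarts, keeping the system stable.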

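1.1 How It Works

  1. Initialization and start-up
    • Creation: during SystemServer start-up, Watchdog.getInstance() creates the singleton, and its constructor registers a HandlerChecker for each monitored thread.
    • Start: Watchdog.getInstance().start() launches the "watchdog" thread. In Android 14 this call is made in startBootstrapServices (see section 2.1), not after ActivityManagerService (AMS) finishes systemReady.
    • Initialization: Watchdog.getInstance().init(context, mActivityManagerService) then stores the AMS reference and registers the reboot broadcast receiver.
  2. Monitored threads
    • Watchdog monitors a thread through a HandlerChecker object. Each HandlerChecker is bound to a Handler, and therefore to that Handler's Looper and thread, and is used to check whether the thread is still making progress.
    • The monitored threads include the foreground thread (FgThread), the main thread, the UI thread, the IO thread, the display thread, and other key system threads.
  3. Timeout detection
    • Watchdog periodically posts each HandlerChecker to the front of the monitored thread's message queue. If the checker has not finished running within the timeout, the thread is assumed to be deadlocked or blocked.
    • The timeout can be specified per HandlerChecker; the default is 60 seconds.
  4. Failure handling
    • When a thread times out, Watchdog logs the event, dumps stack traces, and finally kills the system_server process. It does not try to repair the blocked thread itself.
    • Killing system_server is the last-resort recovery step: restarting the process clears whatever deadlock or bad state caused the hang.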
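The probing idea described above can be sketched in plain Java without any Android classes. This is only a simplified model: ProbeSketch and isResponsive are hypothetical names, and a single-thread executor stands in for a Looper thread. The real Watchdog uses Handler and HandlerChecker as shown in the following sections.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Plain-Java model of a watchdog probe; hypothetical, not AOSP code. */
final class ProbeSketch {
    /** Returns true if the worker thread ran an empty probe task within timeoutMillis. */
    static boolean isResponsive(ExecutorService worker, long timeoutMillis) {
        Runnable probeTask = () -> { };
        Future<?> probe = worker.submit(probeTask);
        try {
            probe.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The worker is still busy with an earlier task: treat it as blocked.
            probe.cancel(false);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        System.out.println("idle worker responsive: " + isResponsive(worker, 500));
        Runnable slowTask = () -> {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException ignored) {
                // shutdownNow() interrupts the sleep; nothing else to do
            }
        };
        worker.submit(slowTask);
        System.out.println("busy worker responsive: " + isResponsive(worker, 100));
        worker.shutdownNow();
    }
}
```

The probe succeeds only if the worker gets around to running it, which is the same signal Watchdog relies on: a thread that cannot run a trivial task in time is considered hung.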

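2. Watchdog Initialization

2.1 SystemServer.startBootstrapServices

private void startBootstrapServices(@NonNull TimingsTraceAndSlog t) {
    ...
    t.traceBegin("StartWatchdog");
    // Create the Watchdog singleton (see section 2.2)
    final Watchdog watchdog = Watchdog.getInstance();
    // Start the watchdog thread (see section 3.1)
    watchdog.start();
    mDumper.addDumpable(watchdog);
    t.traceEnd();
    ....
    t.traceBegin("InitWatchdog");
    // Register the reboot broadcast receiver (see the init section)
    watchdog.init(mSystemContext, mActivityManagerService);
    t.traceEnd();
}

Watchdog is initialized while the system_server process starts. The main steps are:

  • Create the Watchdog object. In Android 14 the class implements Dumpable and owns a Thread named "watchdog" instead of extending Thread itself;
  • Call start() to begin monitoring;
  • Register the reboot broadcast receiver.

The source shows that, starting with Android 10, Watchdog creation and start-up were moved into startBootstrapServices, with start() called before the reboot broadcast receiver is registered.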

2.2 getInstance

Watchdog.java

public static Watchdog getInstance() {
    if (sWatchdog == null) {
        // Singleton: create the instance on first use (see section 2.3)
        sWatchdog = new Watchdog();
    }
    return sWatchdog;
}

2.3 Creating the Watchdog

public class Watchdog implements Dumpable {
    // List of all HandlerChecker objects; for the HandlerChecker type see section 2.3.1
    /* This handler will be used to post message back onto the main thread */
    private final ArrayList<HandlerCheckerAndTimeout> mHandlerCheckers = new ArrayList<>();
    .....

    private Watchdog() {
        mThread = new Thread(this::run, "watchdog");

        // Initialize handler checkers for each common thread we want to check.  Note
        // that we are not currently checking the background thread, since it can
        // potentially hold longer running operations with no guarantees about the timeliness
        // of operations there.
        //
        // Use a custom thread to check monitors to avoid lock contention from impacted other
        // threads.
        ServiceThread t = new ServiceThread("watchdog.monitor",
                android.os.Process.THREAD_PRIORITY_DEFAULT, true /*allowIo*/);
        t.start();
        mMonitorChecker = new HandlerChecker(new Handler(t.getLooper()), "monitor thread");
        mHandlerCheckers.add(withDefaultTimeout(mMonitorChecker));

        mHandlerCheckers.add(withDefaultTimeout(
                new HandlerChecker(FgThread.getHandler(), "foreground thread")));
        // Add checker for main thread.  We only do a quick check since there
        // can be UI running on the thread.
        mHandlerCheckers.add(withDefaultTimeout(
                new HandlerChecker(new Handler(Looper.getMainLooper()), "main thread")));
        // Add checker for shared UI thread.
        mHandlerCheckers.add(withDefaultTimeout(
                new HandlerChecker(UiThread.getHandler(), "ui thread")));
        // And also check IO thread.
        mHandlerCheckers.add(withDefaultTimeout(
                new HandlerChecker(IoThread.getHandler(), "i/o thread")));
        // And the display thread.
        mHandlerCheckers.add(withDefaultTimeout(
                new HandlerChecker(DisplayThread.getHandler(), "display thread")));
        // And the animation thread.
        mHandlerCheckers.add(withDefaultTimeout(
                new HandlerChecker(AnimationThread.getHandler(), "animation thread")));
        // And the surface animation thread.
        mHandlerCheckers.add(withDefaultTimeout(
                new HandlerChecker(SurfaceAnimationThread.getHandler(),
                        "surface animation thread")));
        // Initialize monitor for Binder threads.
        addMonitor(new BinderThreadMonitor());

        mInterestingJavaPids.add(Process.myPid());

        // See the notes on DEFAULT_TIMEOUT.
        assert DB ||
                DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;

        mTraceErrorLogger = new TraceErrorLogger();
    }
}

The mHandlerCheckers list therefore holds the HandlerChecker objects for the monitor thread and for the foreground, main, UI, I/O, display, animation, and surface animation threads.

2.3.1 HandlerChecker

public final class HandlerChecker implements Runnable {
    private final Handler mHandler;  // the Handler being checked
    private final String mName;  // descriptive thread name
    private final ArrayList<Monitor> mMonitors = new ArrayList<Monitor>();
    private final ArrayList<Monitor> mMonitorQueue = new ArrayList<Monitor>();
    private long mWaitMaxMillis;  // maximum time to wait
    private boolean mCompleted;  // set to false when a check starts
    private Monitor mCurrentMonitor;
    private long mStartTimeMillis;  // time at which the check was scheduled
    private int mPauseCount;

    HandlerChecker(Handler handler, String name) {
        mHandler = handler;
        mName = name;
        mCompleted = true;
    }
}

2.3.2 addMonitor

public class Watchdog implements Dumpable {
    public void addMonitor(Monitor monitor) {
        synchronized (mLock) {
            // mMonitorChecker is a HandlerChecker
            mMonitorChecker.addMonitorLocked(monitor);
        }
    }

    public final class HandlerChecker implements Runnable {
        private final ArrayList<Monitor> mMonitors = new ArrayList<Monitor>();

        void addMonitorLocked(Monitor monitor) {
            // We don't want to update mMonitors when the Handler is in the middle of checking
            // all monitors. We will update mMonitors on the next schedule if it is safe
            mMonitorQueue.add(monitor);
        }
        ...
    }
}

This is how Binder threads come under watch. addMonitorLocked places the monitor in the HandlerChecker's mMonitorQueue, and the next scheduleCheckLocked call moves it into mMonitors once no check is in flight. Here the monitor being added is a BinderThreadMonitor.

private static final class BinderThreadMonitor implements Watchdog.Monitor {
    @Override
    public void monitor() {
        Binder.blockUntilThreadAvailable();
    }
}

blockUntilThreadAvailable ultimately calls into IPCThreadState and waits until a binder thread is free:

void IPCThreadState::blockUntilThreadAvailable()
{
    pthread_mutex_lock(&mProcess->mThreadCountLock);
    mProcess->mWaitingForThreads++;
    while (mProcess->mExecutingThreadsCount >= mProcess->mMaxThreads) {
        ALOGW("Waiting for thread to be free. mExecutingThreadsCount=%lu mMaxThreads=%lu\n",
                static_cast<unsigned long>(mProcess->mExecutingThreadsCount),
                static_cast<unsigned long>(mProcess->mMaxThreads));
        // Wait until the number of executing binder threads drops below
        // the process's limit, mMaxThreads
        pthread_cond_wait(&mProcess->mThreadCountDecrement, &mProcess->mThreadCountLock);
    }
    mProcess->mWaitingForThreads--;
    pthread_mutex_unlock(&mProcess->mThreadCountLock);
}

So addMonitor(new BinderThreadMonitor()) attaches the Binder thread check to mMonitorChecker. In Android 14 that checker runs on the dedicated "watchdog.monitor" thread created in the constructor rather than on android.fg, and it verifies that system_server still has a free binder thread.
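The deferred update in addMonitorLocked (queue first, merge later) can be illustrated with a small single-threaded model. MonitorQueueSketch and its method names are hypothetical, and the mLock synchronization of the real code is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, single-threaded model of mMonitors / mMonitorQueue; not AOSP code. */
final class MonitorQueueSketch {
    interface Monitor {
        void monitor();
    }

    private final List<Monitor> monitors = new ArrayList<>();      // list iterated by a check
    private final List<Monitor> monitorQueue = new ArrayList<>();  // pending additions
    private boolean completed = true;                               // no check in flight

    /** New monitors always go to the pending queue first. */
    void addMonitor(Monitor m) {
        monitorQueue.add(m);
    }

    /** Merges pending monitors only when the previous check has finished. */
    void scheduleCheck() {
        if (completed) {
            monitors.addAll(monitorQueue);
            monitorQueue.clear();
        }
        completed = false;
    }

    /** Marks the in-flight check as finished. */
    void finishCheck() {
        completed = true;
    }

    int activeMonitorCount() {
        return monitors.size();
    }
}
```

Because the active list only changes while no check is running, a check that is iterating over its monitors never sees the list change underneath it.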

2.4 init

[-> Watchdog.java]

public void init(Context context, ActivityManagerService activity) {
    mActivity = activity;
    // Register the reboot broadcast receiver (see section 2.4.1)
    context.registerReceiver(new RebootRequestReceiver(),
            new IntentFilter(Intent.ACTION_REBOOT),
            android.Manifest.permission.REBOOT, null);
}

2.4.1 RebootRequestReceiver

final class RebootRequestReceiver extends BroadcastReceiver {
    @Override
    public void onReceive(Context c, Intent intent) {
        if (intent.getIntExtra("nowait", 0) != 0) {
            // See section 2.4.2
            rebootSystem("Received ACTION_REBOOT broadcast");
            return;
        }
        Slog.w(TAG, "Unsupported ACTION_REBOOT broadcast: " + intent);
    }
}

2.4.2 rebootSystem

void rebootSystem(String reason) {
    Slog.i(TAG, "Rebooting system because: " + reason);
    IPowerManager pms = (IPowerManager)ServiceManager.getService(Context.POWER_SERVICE);
    try {
        // Perform the reboot through PowerManager
        pms.reboot(false, reason, false);
    } catch (RemoteException ex) {
    }
}

The reboot is ultimately carried out by PowerManagerService; the detailed reboot flow will be covered separately.

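3. Watchdog Detection Mechanism

Calling Watchdog.getInstance().start() starts the "watchdog" thread, which executes run(). The method has two parts:

  • The first half (section 3.1) detects whether a timeout has occurred;
  • The second half (section 4.1) dumps diagnostic information once a timeout is detected.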

3.1 run

private void run() {
    boolean waitedHalf = false;

    while (true) {
        List<HandlerChecker> blockedCheckers = Collections.emptyList();
        String subject = "";
        boolean allowRestart = true;
        int debuggerWasConnected = 0;
        boolean doWaitedHalfDump = false;
        // The value of mWatchdogTimeoutMillis might change while we are executing the loop.
        // We store the current value to use a consistent value for all handlers.
        final long watchdogTimeoutMillis = mWatchdogTimeoutMillis;
        final long checkIntervalMillis = watchdogTimeoutMillis / 2;
        final ArrayList<Integer> pids;
        synchronized (mLock) {
            long timeout = checkIntervalMillis;
            // Make sure we (re)spin the checkers that have become idle within
            // this wait-and-check interval
            for (int i=0; i<mHandlerCheckers.size(); i++) {
                HandlerCheckerAndTimeout hc = mHandlerCheckers.get(i);
                // We pick the watchdog to apply every time we reschedule the checkers. The
                // default timeout might have changed since the last run.
                // Schedule every checker; each one records its start time (see section 3.2)
                hc.checker().scheduleCheckLocked(hc.customTimeoutMillis()
                        .orElse(watchdogTimeoutMillis * Build.HW_TIMEOUT_MULTIPLIER));
            }

            if (debuggerWasConnected > 0) {
                debuggerWasConnected--;
            }

            // NOTE: We use uptimeMillis() here because we do not want to increment the time we
            // wait while asleep. If the device is asleep then the thing that we are waiting
            // to timeout on is asleep as well and won't have a chance to run, causing a false
            // positive on when to kill things.
            long start = SystemClock.uptimeMillis();
            // Loop until the full check interval (30s by default) has elapsed
            while (timeout > 0) {
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                try {
                    // If interrupted, the exception is caught and the wait continues
                    mLock.wait(timeout);
                    // Note: mHandlerCheckers and mMonitorChecker may have changed after waiting
                } catch (InterruptedException e) {
                    Log.wtf(TAG, e);
                }
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                timeout = checkIntervalMillis - (SystemClock.uptimeMillis() - start);
            }

            // Evaluate the state of the checkers (see section 3.3)
            final int waitState = evaluateCheckerCompletionLocked();
            if (waitState == COMPLETED) {
                // The monitors have returned; reset
                waitedHalf = false;
                continue;
            } else if (waitState == WAITING) {
                // still waiting but within their configured intervals; back off and recheck
                continue;
            } else if (waitState == WAITED_HALF) {
                if (!waitedHalf) {
                    Slog.i(TAG, "WAITED_HALF");
                    // First time we see the half-timeout state
                    waitedHalf = true;
                    // We've waited half, but we'd need to do the stack trace dump w/o the lock.
                    blockedCheckers = getCheckersWithStateLocked(WAITED_HALF);
                    // See section 3.5
                    subject = describeCheckersLocked(blockedCheckers);
                    pids = new ArrayList<>(mInterestingJavaPids);
                    doWaitedHalfDump = true;
                } else {
                    continue;
                }
            } else {
                // something is overdue!
                blockedCheckers = getCheckersWithStateLocked(OVERDUE);
                subject = describeCheckersLocked(blockedCheckers);
                allowRestart = mAllowRestart;
                pids = new ArrayList<>(mInterestingJavaPids);
            }
        } // END synchronized (mLock)

        // If we got here, that means that the system is most likely hung.
        // First collect stack traces from all threads of the system process.
        // Then, if we reached the full timeout, kill this process so that the system will
        // restart. If we reached half of the timeout, just log some information and continue.
        logWatchog(doWaitedHalfDump, subject, pids);

        if (doWaitedHalfDump) {
            // We have waited for only half of the timeout, we continue to wait for the duration
            // of the full timeout before killing the process.
            continue;
        }

        IActivityController controller;
        synchronized (mLock) {
            controller = mController;
        }
        if (controller != null) {
            Slog.i(TAG, "Reporting stuck state to activity controller");
            try {
                Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                // 1 = keep waiting, -1 = kill system
                int res = controller.systemNotResponding(subject);
                if (res >= 0) {
                    Slog.i(TAG, "Activity controller requested to coninue to wait");
                    waitedHalf = false;
                    continue;
                }
            } catch (RemoteException e) {
            }
        }

        // Only kill the process if the debugger is not attached.
        if (Debug.isDebuggerConnected()) {
            debuggerWasConnected = 2;
        }
        if (debuggerWasConnected >= 2) {
            Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
        } else if (debuggerWasConnected > 0) {
            Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
        } else if (!allowRestart) {
            Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
        } else {
            Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
            WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
            Slog.w(TAG, "*** GOODBYE!");
            if (!Build.IS_USER && isCrashLoopFound()
                    && !WatchdogProperties.should_ignore_fatal_count().orElse(false)) {
                breakCrashLoop();
            }
            // Kill the system_server process (see section 4.5)
            Process.killProcess(Process.myPid());
            System.exit(10);
        }

        waitedHalf = false;
    }
}

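What this method does:

  1. Calls scheduleCheckLocked() on every checker.
    • If the checker has no monitors (true of every checker except the monitor thread's) and its Looper is polling, mCompleted is set to true and nothing is posted;
    • If the previous check has not finished yet, the call returns immediately.
  2. Waits for the check interval (half the timeout, 30s by default), then calls evaluateCheckerCompletionLocked() to evaluate the checkers.
  3. Acts on the resulting waitState:
    • COMPLETED or WAITING: nothing is wrong;
    • WAITED_HALF (blocked for more than 30s) seen for the first time: dumps the traces of system_server and the interesting native processes;
    • OVERDUE: dumps even more information.

So whenever a watchdog fires, StackTracesDumpHelper.dumpStackTraces is called twice: the traces of system_server and the native processes are written once at the half-timeout mark and again at the full timeout, at least 30s apart.

After the information has been collected, system_server is killed. allowRestart defaults to true; running am hang sets it to false, in which case system_server is not killed.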
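The inner wait loop of run() recomputes the remaining time after every wake-up, so an early or spurious wake-up cannot shorten the check interval. A minimal plain-Java sketch of that pattern follows. WaitLoopSketch is a hypothetical name, and System.nanoTime() is used as a stand-in for SystemClock.uptimeMillis(), which is only available on Android.

```java
/** Plain-Java model of the "wait for the full interval" loop; not AOSP code. */
final class WaitLoopSketch {
    /**
     * Blocks on the given lock until at least intervalMillis have elapsed,
     * resuming the wait after early wake-ups. Returns the elapsed milliseconds.
     */
    static long waitFullInterval(Object lock, long intervalMillis) {
        long startNanos = System.nanoTime();
        long remaining = intervalMillis;
        synchronized (lock) {
            while (remaining > 0) {
                try {
                    lock.wait(remaining);
                } catch (InterruptedException e) {
                    // As in Watchdog.run(), an interrupt does not end the wait.
                }
                long elapsed = (System.nanoTime() - startNanos) / 1_000_000;
                remaining = intervalMillis - elapsed;
            }
        }
        return (System.nanoTime() - startNanos) / 1_000_000;
    }
}
```

Calling WaitLoopSketch.waitFullInterval(new Object(), 50) returns only after at least 50 ms, even if another thread calls notifyAll() on the lock in the meantime.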

3.2 scheduleCheckLocked

public void scheduleCheckLocked(long handlerCheckerTimeoutMillis) {
    mWaitMaxMillis = handlerCheckerTimeoutMillis;
    if (mCompleted) {
        // Safe to update monitors in queue, Handler is not in the middle of work
        mMonitors.addAll(mMonitorQueue);
        mMonitorQueue.clear();
    }
    if ((mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling())
            || (mPauseCount > 0)) {
        // Don't schedule until after resume OR
        // If the target looper has recently been polling, then
        // there is no reason to enqueue our checker on it since that
        // is as good as it not being deadlocked.  This avoid having
        // to do a context switch to check the thread. Note that we
        // only do this if we have no monitors since those would need to
        // be executed at this point.
        mCompleted = true;
        // The target looper is polling (idle), so return
        return;
    }
    if (!mCompleted) {
        // we already have a check in flight, so no need
        return;  // a check is already in progress; do not post another one
    }

    mCompleted = false;
    mCurrentMonitor = null;
    // Record the current time
    mStartTimeMillis = SystemClock.uptimeMillis();
    // Post this checker to the front of the message queue; see run() below
    mHandler.postAtFrontOfQueue(this);
}

@Override
public void run() {
    // Once we get here, we ensure that mMonitors does not change even if we call
    // #addMonitorLocked because we first add the new monitors to mMonitorQueue and
    // move them to mMonitors on the next schedule when mCompleted is true, at which
    // point we have completed execution of this method.
    final int size = mMonitors.size();
    for (int i = 0 ; i < size ; i++) {
        synchronized (mLock) {
            mCurrentMonitor = mMonitors.get(i);
        }
        // Call back into the service's monitor() method
        mCurrentMonitor.monitor();
    }

    synchronized (mLock) {
        mCompleted = true;
        mCurrentMonitor = null;
    }
}

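What this method does: it posts the HandlerChecker to the very front of the monitored thread's message queue. When the Looper reaches it, HandlerChecker.run() calls monitor() on each registered monitor and then sets mCompleted = true. If the message the handler is currently processing never finishes, run() never gets a chance to execute, mCompleted stays false, and the watchdog eventually fires.

postAtFrontOfQueue(this) takes a Runnable. Through the message mechanism, the Looper eventually calls back HandlerChecker.run(), which iterates over all Monitor objects; each service supplies its own monitor() implementation.

Possible problems: if other code keeps calling postAtFrontOfQueue(), the checker may never get to run; or each monitor may take a little time and the total may add up to more than one minute. Both are unusual ways to trigger the watchdog.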
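The effect of posting at the front of the queue, and the starvation case mentioned above, can be modelled with a plain deque. FrontOfQueueSketch is a hypothetical stand-in for a message queue and ignores message timestamps and the message that is already executing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Deque-based model of post vs. postAtFrontOfQueue ordering; not AOSP code. */
final class FrontOfQueueSketch {
    private final Deque<String> queue = new ArrayDeque<>();

    /** Normal post: the message goes to the back of the queue. */
    void post(String message) {
        queue.addLast(message);
    }

    /** Front post: the message jumps ahead of everything already queued. */
    void postAtFrontOfQueue(String message) {
        queue.addFirst(message);
    }

    /** Returns the messages in the order a looper would process them. */
    List<String> drain() {
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.pollFirst());
        }
        return order;
    }
}
```

After post("a"), post("b"), postAtFrontOfQueue("checker"), drain() yields checker, a, b, so the checker only has to wait for the message currently being processed. If another caller front-posts after the checker, that message runs first, which is the starvation scenario described in the text.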

3.3 evaluateCheckerCompletionLocked

private int evaluateCheckerCompletionLocked() {
    int state = COMPLETED;
    for (int i=0; i<mHandlerCheckers.size(); i++) {
        HandlerChecker hc = mHandlerCheckers.get(i).checker();
        // See section 3.4
        state = Math.max(state, hc.getCompletionStateLocked());
    }
    return state;
}

Returns the largest (worst) wait state among all checkers in mHandlerCheckers.

3.4 getCompletionStateLocked

public int getCompletionStateLocked() {
    if (mCompleted) {
        return COMPLETED;
    } else {
        long latency = SystemClock.uptimeMillis() - mStartTimeMillis;
        if (latency < mWaitMaxMillis / 2) {
            return WAITING;
        } else if (latency < mWaitMaxMillis) {
            return WAITED_HALF;
        }
    }
    return OVERDUE;
}

  • COMPLETED = 0: the check has finished;
  • WAITING = 1: the wait is shorter than half of mWaitMaxMillis (30s with the default 60s timeout);
  • WAITED_HALF = 2: the wait is between half and all of mWaitMaxMillis (30s to 60s by default);
  • OVERDUE = 3: the wait has reached or exceeded mWaitMaxMillis (60s by default).
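The state computation in sections 3.3 and 3.4 is a pure function of the completion flag, the latency, and the timeout, so it is easy to check the boundaries in isolation. The sketch below restates it in plain Java; CompletionStateSketch and its method names are hypothetical, while the constants and comparisons follow the listing above.

```java
/** Stand-alone restatement of the checker state logic; not AOSP code. */
final class CompletionStateSketch {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /** Mirrors getCompletionStateLocked() for one checker. */
    static int completionState(boolean completed, long latencyMillis, long waitMaxMillis) {
        if (completed) {
            return COMPLETED;
        }
        if (latencyMillis < waitMaxMillis / 2) {
            return WAITING;
        }
        if (latencyMillis < waitMaxMillis) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    /** Mirrors evaluateCheckerCompletionLocked(): the worst state wins. */
    static int evaluate(int... states) {
        int worst = COMPLETED;
        for (int state : states) {
            worst = Math.max(worst, state);
        }
        return worst;
    }
}
```

With a 60000 ms timeout, a latency of 29999 ms is still WAITING, exactly 30000 ms is WAITED_HALF, and exactly 60000 ms is OVERDUE. A single overdue checker is enough to make the overall result OVERDUE.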

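3.5 describeCheckersLocked

describeCheckersLocked builds the subject string from the description of each blocked checker, which comes from HandlerChecker.describeBlockedStateLocked():

String describeBlockedStateLocked() {
    final String prefix;
    // Taken when no monitor is running, which is always the case for checkers without monitors
    if (mCurrentMonitor == null) {
        prefix = "Blocked in handler";
    // Taken by the monitor checker while one of its monitors is running
    } else {
        prefix =  "Blocked in monitor " + mCurrentMonitor.getClass().getName();
    }
    long latencySeconds = (SystemClock.uptimeMillis() - mStartTimeMillis) / 1000;
    return prefix + " on " + mName + " (" + getThread().getName() + ")"
            + " for " + latencySeconds + "s";
}

This records every handler thread or monitor that has been blocked too long: at the half-timeout mark (30s by default) for WAITED_HALF and at the full timeout (1 minute by default) for OVERDUE.

  • "Blocked in handler" means the thread has spent that long processing its current message;
  • "Blocked in monitor" means the monitor thread has been stuck that long inside a monitor() call, usually because the monitor cannot acquire the lock it is waiting for.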

4. Watchdog Handling Flow

4.1 logWatchog

private void logWatchog(boolean halfWatchdog, String subject, ArrayList<Integer> pids) {
    // Get critical event log before logging the half watchdog so that it doesn't
    // occur in the log.
    String criticalEvents =
            CriticalEventLog.getInstance().logLinesForSystemServerTraceFile();
    final UUID errorId = mTraceErrorLogger.generateErrorId();
    if (mTraceErrorLogger.isAddErrorIdEnabled()) {
        mTraceErrorLogger.addProcessInfoAndErrorIdToTrace("system_server", Process.myPid(),
                errorId);
        mTraceErrorLogger.addSubjectToTrace(subject, errorId);
    }

    final String dropboxTag;
    if (halfWatchdog) {
        dropboxTag = "pre_watchdog";
        CriticalEventLog.getInstance().logHalfWatchdog(subject);
        FrameworkStatsLog.write(FrameworkStatsLog.SYSTEM_SERVER_PRE_WATCHDOG_OCCURRED);
    } else {
        dropboxTag = "watchdog";
        CriticalEventLog.getInstance().logWatchdog(subject, errorId);
        EventLog.writeEvent(EventLogTags.WATCHDOG, subject);
        // Log the atom as early as possible since it is used as a mechanism to trigger
        // Perfetto. Ideally, the Perfetto trace capture should happen as close to the
        // point in time when the Watchdog happens as possible.
        FrameworkStatsLog.write(FrameworkStatsLog.SYSTEM_SERVER_WATCHDOG_OCCURRED, subject);
    }

    long anrTime = SystemClock.uptimeMillis();
    StringBuilder report = new StringBuilder();
    report.append(ResourcePressureUtil.currentPsiState());
    ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(false);
    StringWriter tracesFileException = new StringWriter();
    // See section 4.2
    final File stack = StackTracesDumpHelper.dumpStackTraces(
            pids, processCpuTracker, new SparseBooleanArray(),
            CompletableFuture.completedFuture(getInterestingNativePids()), tracesFileException,
            subject, criticalEvents, Runnable::run, /* latencyTracker= */null);
    // Give some extra time to make sure the stack traces get written.
    // The system's been hanging for a whlie, another second or two won't hurt much.
    SystemClock.sleep(5000);
    processCpuTracker.update();
    report.append(processCpuTracker.printCurrentState(anrTime));
    report.append(tracesFileException.getBuffer());

    if (!halfWatchdog) {
        // Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the
        // kernel log
        doSysRq('w');
        doSysRq('l');
    }

    // Try to add the error to the dropbox, but assuming that the ActivityManager
    // itself may be deadlocked.  (which has happened, causing this statement to
    // deadlock and the watchdog as a whole to be ineffective)
    Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
            public void run() {
                // If a watched thread hangs before init() is called, we don't have a
                // valid mActivity. So we can't log the error to dropbox.
                if (mActivity != null) {
                    mActivity.addErrorToDropBox(
                            dropboxTag, null, "system_server", null, null, null,
                            null, report.toString(), stack, null, null, null,
                            errorId);
                }
            }
        };
    dropboxThread.start();
    try {
        dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
    } catch (InterruptedException ignored) { }
}

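When Watchdog detects a hang, it collects information in three steps:

  • dumpStackTraces: writes the stack traces of the Java and native processes (section 4.2);
  • doSysRq: asks the kernel to dump blocked tasks and per-CPU backtraces (section 4.3);
  • dropBox: stores the report in DropBox (section 4.4).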

4.2 StackTracesDumpHelper.dumpStackTraces

/* package */ static File dumpStackTraces(ArrayList<Integer> firstPids,
        ProcessCpuTracker processCpuTracker, SparseBooleanArray lastPids,
        Future<ArrayList<Integer>> nativePidsFuture, StringWriter logExceptionCreatingFile,
        AtomicLong firstPidEndOffset, String subject, String criticalEventSection,
        String memoryHeaders, @NonNull Executor auxiliaryTaskExecutor,
        Future<File> firstPidFilePromise, AnrLatencyTracker latencyTracker) {
    try {
        if (latencyTracker != null) {
            latencyTracker.dumpStackTracesStarted();
        }

        Slog.i(TAG, "dumpStackTraces pids=" + lastPids);

        // Measure CPU usage as soon as we're called in order to get a realistic sampling
        // of the top users at the time of the request.
        Supplier<ArrayList<Integer>> extraPidsSupplier = processCpuTracker != null
                ? () -> getExtraPids(processCpuTracker, lastPids, latencyTracker) : null;
        Future<ArrayList<Integer>> extraPidsFuture = null;
        if (extraPidsSupplier != null) {
            extraPidsFuture =
                    CompletableFuture.supplyAsync(extraPidsSupplier, auxiliaryTaskExecutor);
        }

        final File tracesDir = new File(ANR_TRACE_DIR);

        // NOTE: We should consider creating the file in native code atomically once we've
        // gotten rid of the old scheme of dumping and lot of the code that deals with paths
        // can be removed.
        File tracesFile;
        try {
            tracesFile = createAnrDumpFile(tracesDir);
        } catch (IOException e) {
            Slog.w(TAG, "Exception creating ANR dump file:", e);
            if (logExceptionCreatingFile != null) {
                logExceptionCreatingFile.append(
                        "----- Exception creating ANR dump file -----\n");
                e.printStackTrace(new PrintWriter(logExceptionCreatingFile));
            }
            if (latencyTracker != null) {
                latencyTracker.anrSkippedDumpStackTraces();
            }
            return null;
        }

        if (subject != null || criticalEventSection != null || memoryHeaders != null) {
            appendtoANRFile(tracesFile.getAbsolutePath(),
                    (subject != null ? "Subject: " + subject + "\n" : "")
                    + (memoryHeaders != null ? memoryHeaders + "\n\n" : "")
                    + (criticalEventSection != null ? criticalEventSection : ""));
        }

        long firstPidEndPos = dumpStackTraces(
                tracesFile.getAbsolutePath(), firstPids, nativePidsFuture,
                extraPidsFuture, firstPidFilePromise, latencyTracker);
        if (firstPidEndOffset != null) {
            firstPidEndOffset.set(firstPidEndPos);
        }
        // Each set of ANR traces is written to a separate file and dumpstate will process
        // all such files and add them to a captured bug report if they're recent enough.
        maybePruneOldTraces(tracesDir);

        return tracesFile;
    } finally {
        if (latencyTracker != null) {
            latencyTracker.dumpStackTracesEnded();
        }
    }
}

The call in logWatchog passes fewer arguments, so it goes through a shorter overload; the listing above is the full-parameter version. It writes the traces of system_server (the PIDs in mInterestingJavaPids) and of the native processes returned by getInterestingNativePids(), such as mediaserver, sdcard, and surfaceflinger.

4.3 doSysRq

private void doSysRq(char c) {
    try {
        FileWriter sysrq_trigger = new FileWriter("/proc/sysrq-trigger");
        sysrq_trigger.write(c);
        sysrq_trigger.close();
    } catch (IOException e) {
        Slog.w(TAG, "Failed to write to /proc/sysrq-trigger", e);
    }
}

Writing a character to the /proc/sysrq-trigger node makes the kernel dump diagnostics to the kernel log: 'w' dumps all blocked threads and 'l' dumps the backtraces of all CPUs.

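4.4 dropBox

The report is written to /data/system/dropbox. When a watchdog fires, the resulting DropBox entry is tagged system_server_watchdog (the half-timeout dump uses the pre_watchdog tag set in logWatchog instead). Its content is the traces plus the corresponding blocked information.

4.5 killProcess

Process.killProcess kills the target process by sending it signal 9 (SIGKILL).

Killing system_server causes the zygote process to kill itself, and init then restarts zygote. This is what appears as a framework restart on the device.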

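5. Summary

Watchdog runs in the system_server process on a thread named "watchdog":

  • When a monitored thread stays blocked beyond the timeout (one minute by default), a watchdog fires, system_server is killed, and the framework restarts;
  • mHandlerCheckers holds all HandlerChecker objects, including those for the foreground, main, UI, I/O, display, animation, and surface animation threads, plus the monitor thread;
  • mMonitorChecker.mMonitors holds every Monitor that Watchdog is currently watching; in Android 14 all of these monitors run on the "watchdog.monitor" thread.
  • There are two ways to place something under Watchdog's watch:
    • addThread(): monitors a Handler thread, with a default timeout of 60s. A timeout here usually means the handler thread is slow to process a message;
    • addMonitor(): monitors a service that implements the Watchdog.Monitor interface. A timeout here usually means a monitor cannot acquire its lock, or that the monitor thread itself is slow.

In the following cases system_server is not killed even if the watchdog fires:

  • Activity controller: an IActivityController (set by monkey, for example) intercepts systemNotResponding and asks Watchdog to keep waiting;
  • hang: the am hang command was run, which disables the restart;
  • debugger: a debugger is attached, or was attached recently.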
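The order in which run() applies these exceptions can be condensed into one decision function. The sketch below is a hypothetical restatement in plain Java: KillDecisionSketch, Outcome, and decide are illustrative names, and the controller result is passed in as a value instead of being obtained through a Binder call.

```java
/** Restates the final kill / no-kill decision of Watchdog.run(); not AOSP code. */
final class KillDecisionSketch {
    enum Outcome {
        KEEP_WAITING,
        SKIP_DEBUGGER_CONNECTED,
        SKIP_DEBUGGER_WAS_CONNECTED,
        SKIP_RESTART_NOT_ALLOWED,
        KILL_SYSTEM_SERVER
    }

    /**
     * @param controllerResult value returned by systemNotResponding(), or null when
     *                         no activity controller is set (1 = keep waiting, -1 = kill)
     * @param debuggerWasConnected 2 if a debugger is attached now, 1 if it was attached
     *                             during the previous loop iteration, 0 otherwise
     * @param allowRestart false after "am hang"
     */
    static Outcome decide(Integer controllerResult, int debuggerWasConnected,
            boolean allowRestart) {
        if (controllerResult != null && controllerResult >= 0) {
            return Outcome.KEEP_WAITING;
        }
        if (debuggerWasConnected >= 2) {
            return Outcome.SKIP_DEBUGGER_CONNECTED;
        }
        if (debuggerWasConnected > 0) {
            return Outcome.SKIP_DEBUGGER_WAS_CONNECTED;
        }
        if (!allowRestart) {
            return Outcome.SKIP_RESTART_NOT_ALLOWED;
        }
        return Outcome.KILL_SYSTEM_SERVER;
    }
}
```

The controller is consulted first, so a monkey run that answers "keep waiting" prevents the kill regardless of the debugger and allowRestart state.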
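5.1 Monitored Handler Threads

Watchdog monitors the threads below. DEFAULT_TIMEOUT is 60s by default and drops to 10s only when the DB debug flag is set, which makes it easier to surface potential hangs.

Thread name         | Handler                             | Description         | Timeout
main                | new Handler(Looper.getMainLooper()) | Current main thread | 1 min
android.fg          | FgThread.getHandler                 | Foreground thread   | 1 min
android.ui          | UiThread.getHandler                 | UI thread           | 1 min
android.io          | IoThread.getHandler                 | I/O thread          | 1 min
android.display     | DisplayThread.getHandler            | Display thread      | 1 min
ActivityManager     | AMS.MainHandler                     | AMS thread          | 1 min
PowerManagerService | PMS.PowerManagerHandler             | PMS thread          | 1 min
PackageManager      | PKMS.PackageHandler                 | PKMS thread         | 10 min

The table lists eight system_server threads. The Android 14 constructor in section 2.3 additionally registers the animation thread, the surface animation thread, and the watchdog.monitor thread, all with the default timeout. For the threads in the table:

  • For the first seven, the Looper must not spend more than 1 minute on a message;
  • For the PackageManager thread, the limit is 10 minutes.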
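How a per-thread timeout such as the 10-minute PackageManager value coexists with the default follows from the expression hc.customTimeoutMillis().orElse(watchdogTimeoutMillis * Build.HW_TIMEOUT_MULTIPLIER) in run(). A minimal sketch of that rule, with the hypothetical names CheckerTimeoutSketch and effectiveTimeoutMillis and with the multiplier passed in as a plain parameter:

```java
import java.util.Optional;

/** Restates the per-checker timeout selection from Watchdog.run(); not AOSP code. */
final class CheckerTimeoutSketch {
    /**
     * Returns the custom timeout if one was registered for the checker; otherwise
     * the current watchdog timeout scaled by the hardware timeout multiplier.
     */
    static long effectiveTimeoutMillis(Optional<Long> customTimeoutMillis,
            long watchdogTimeoutMillis, int hwTimeoutMultiplier) {
        return customTimeoutMillis.orElse(watchdogTimeoutMillis * hwTimeoutMultiplier);
    }
}
```

Note that, as written in run(), the multiplier is applied only to the default value; a custom timeout is used as registered.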
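5.2 Monitored Locks

Every system service that Watchdog can monitor implements the Watchdog.Monitor interface and its monitor() method. The monitors run on the monitor thread ("watchdog.monitor" in Android 14; android.fg in older releases). The main implementers are:

  • ActivityManagerService
  • WindowManagerService
  • InputManagerService
  • PowerManagerService
  • NetworkManagementService
  • MountService (renamed StorageManagerService in later releases)
  • NativeDaemonConnector
  • BinderThreadMonitor
  • MediaProjectionManagerService
  • MediaRouterService
  • MediaSessionService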
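A service's monitor() is typically nothing more than acquiring and releasing the service's main lock: if another thread holds that lock for too long, monitor() cannot return and the monitor checker times out. The plain-Java sketch below illustrates the idea. LockMonitorSketch, DemoService, and monitorReturnsWithin are hypothetical names, and the probe runs on a throwaway thread instead of a Handler thread.

```java
/** Plain-Java model of a lock-probing monitor; not AOSP code. */
final class LockMonitorSketch {
    interface Monitor {
        void monitor();
    }

    /** A stand-in for a system service guarded by a single lock. */
    static final class DemoService implements Monitor {
        private final Object lock = new Object();

        /** Runs some work while holding the service lock. */
        void doWorkHoldingLock(Runnable work) {
            synchronized (lock) {
                work.run();
            }
        }

        /** Returns as soon as the service lock can be acquired. */
        @Override
        public void monitor() {
            synchronized (lock) {
                // Nothing to do: acquiring the lock is the whole check.
            }
        }
    }

    /** Returns true if monitor() completed within timeoutMillis. */
    static boolean monitorReturnsWithin(Monitor monitor, long timeoutMillis) {
        Thread probe = new Thread(monitor::monitor, "monitor-probe");
        probe.setDaemon(true);
        probe.start();
        try {
            probe.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !probe.isAlive();
    }
}
```

While no one holds the lock the probe returns immediately; while another thread sits inside doWorkHoldingLock, the probe stays blocked, which is the condition reported as "Blocked in monitor".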
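5.3 Output

When a check stays blocked for one minute, Watchdog produces the following:

  1. StackTracesDumpHelper.dumpStackTraces: writes the traces of system_server and the interesting native processes.
    • It runs twice: first at the 30s half-timeout mark, then at the 1 min timeout;
  2. doSysRq: asks the kernel to dump all blocked threads and the backtraces of all CPUs to the kernel log;
    • Node: /proc/sysrq-trigger
  3. dropBox: writes a file to /data/system/dropbox containing the traces and the blocked information;
  4. Kills system_server, which makes zygote kill itself and restarts the upper-level framework.

That concludes the analysis. Corrections are welcome.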

