Source code based on: Android U
0. Preface
The previous article, 《Android statsd 埋点简析》, gave a brief walkthrough of how Android collects and transports instrumentation data. Building on that, this article parses the instrumentation content itself, to see what memory information Android actually records.
1. Analyzing Google's instrumentation through the code
1.1 PROCESS_MEMORY_STATE
frameworks/base/services/core/java/com/android/server/stats/pull/StatsPullAtomService.java

```java
int pullProcessMemoryStateLocked(int atomTag, List<StatsEvent> pulledData) {
    List<ProcessMemoryState> processMemoryStates =
            LocalServices.getService(ActivityManagerInternal.class)
                    .getMemoryStateForProcesses();
    for (ProcessMemoryState processMemoryState : processMemoryStates) {
        final MemoryStat memoryStat = readMemoryStatFromFilesystem(processMemoryState.uid,
                processMemoryState.pid);
        if (memoryStat == null) {
            continue;
        }
        pulledData.add(FrameworkStatsLog.buildStatsEvent(atomTag, processMemoryState.uid,
                processMemoryState.processName, processMemoryState.oomScore, memoryStat.pgfault,
                memoryStat.pgmajfault, memoryStat.rssInBytes, memoryStat.cacheInBytes,
                memoryStat.swapInBytes, -1 /*unused*/, -1 /*unused*/, -1 /*unused*/));
    }
    return StatsManager.PULL_SUCCESS;
}
```
The getMemoryStateForProcesses function returns the memory state of every app process; readMemoryStatFromFilesystem reads the /proc/<pid>/stat node and parses pgfault (field index 9), pgmajfault (field index 11) and RSS in bytes (field index 23). A minimal parsing sketch follows the field list below.
The collected fields are:
- uid
- processName
- oomScore
- pgfault
- pgmajfault
- rss
- cache (memcg)
- swap (memcg)
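To make the field indexes concrete, here is a minimal sketch of how those /proc/<pid>/stat fields could be pulled out. This is an illustration, not the framework's MemoryStatUtil code; the 4 KiB page size and the split-on-space parsing are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Minimal sketch (not the framework code): pull pgfault/pgmajfault/RSS out of /proc/<pid>/stat.
// Field indexes are zero-based over the whole stat line, matching the indexes quoted above.
final class ProcStatReader {
    private static final int PGFAULT_INDEX = 9;
    private static final int PGMAJFAULT_INDEX = 11;
    private static final int RSS_PAGES_INDEX = 23;
    private static final long PAGE_SIZE = 4096; // assumption: 4 KiB pages

    static long[] read(int pid) throws IOException {
        String stat = new String(Files.readAllBytes(Paths.get("/proc/" + pid + "/stat")));
        // The process name (field 1) may contain spaces; skip past the closing ')'
        // before splitting so the remaining indexes stay stable.
        String afterComm = stat.substring(stat.lastIndexOf(')') + 2);
        String[] fields = afterComm.split(" ");
        // Fields 0 (pid) and 1 (comm) were consumed above, hence the "- 2" offset.
        long pgfault = Long.parseLong(fields[PGFAULT_INDEX - 2]);
        long pgmajfault = Long.parseLong(fields[PGMAJFAULT_INDEX - 2]);
        long rssInBytes = Long.parseLong(fields[RSS_PAGES_INDEX - 2]) * PAGE_SIZE;
        return new long[] {pgfault, pgmajfault, rssInBytes};
    }
}
```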
1.2 PROCESS_MEMORY_HIGH_WATER_MARK
```java
int pullProcessMemoryHighWaterMarkLocked(int atomTag, List<StatsEvent> pulledData) {
    List<ProcessMemoryState> managedProcessList =
            LocalServices.getService(ActivityManagerInternal.class)
                    .getMemoryStateForProcesses();
    for (ProcessMemoryState managedProcess : managedProcessList) {
        final MemorySnapshot snapshot = readMemorySnapshotFromProcfs(managedProcess.pid);
        if (snapshot == null) {
            continue;
        }
        pulledData.add(FrameworkStatsLog.buildStatsEvent(atomTag, managedProcess.uid,
                managedProcess.processName,
                // RSS high-water mark in bytes.
                snapshot.rssHighWaterMarkInKilobytes * 1024L,
                snapshot.rssHighWaterMarkInKilobytes));
    }
    // Complement the data with native system processes
    SparseArray<String> processCmdlines = getProcessCmdlines();
    managedProcessList.forEach(managedProcess -> processCmdlines.delete(managedProcess.pid));
    int size = processCmdlines.size();
    for (int i = 0; i < size; ++i) {
        ...
    }
    // Invoke rss_hwm_reset binary to reset RSS HWM counters for all processes.
    SystemProperties.set("sys.rss_hwm_reset.on", "1");
    return StatsManager.PULL_SUCCESS;
}
```
This function queries the memory information of all app processes and native processes.
The getMemoryStateForProcesses function returns every app process; readMemorySnapshotFromProcfs reads the /proc/<pid>/status node and parses the Uid, VmHWM, VmRSS, RssAnon, RssShmem and VmSwap values. Kernel threads are excluded by checking whether the /proc/<pid>/status node contains the RssAnon, RssShmem and VmSwap entries (a parsing sketch follows the field list below).
Finally, setting the property wakes up the rss_hwm_reset binary, which clears VmHWM for all processes.
The collected fields are:
- uid
- processName / cmdline (cmdline for native processes)
- VmHWM
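For reference, a minimal sketch of how those /proc/<pid>/status fields could be parsed. This is an illustration, not the framework's readMemorySnapshotFromProcfs; it simply collects the kB values of the fields named above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch: parse the kB-valued fields of /proc/<pid>/status.
final class ProcStatusReader {
    static Map<String, Integer> read(int pid) throws IOException {
        Map<String, Integer> valuesKb = new HashMap<>();
        List<String> lines = Files.readAllLines(Paths.get("/proc/" + pid + "/status"));
        for (String line : lines) {
            // Lines look like "VmHWM:      123456 kB".
            for (String key : new String[] {"VmHWM", "VmRSS", "RssAnon", "RssShmem", "VmSwap"}) {
                if (line.startsWith(key + ":")) {
                    String value = line.substring(key.length() + 1).replace("kB", "").trim();
                    valuesKb.put(key, Integer.parseInt(value));
                }
            }
        }
        // Kernel threads have no RssAnon/RssShmem/VmSwap entries, so a map missing
        // those keys is how they get filtered out.
        return valuesKb;
    }
}
```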
1.3 PROCESS_MEMORY_SNAPSHOT
Same as the HWM atom above, this collects a memory snapshot for every app process and native process. The difference is that it additionally records each process's GPU usage, read from /sys/fs/bpf/map_gpuMem_gpu_mem_total_map. A short illustration follows the field list below.
The collected fields are:
- uid
- processName / cmdline (cmdline for native processes)
- pid
- oomScore
- rss
- rss_anon
- swap
- rss_anon + swap
- gpu memory
- hasForegroundServices (false for native processes)
- rss_shmem
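As a small illustration, the combined rss_anon + swap value is simply the sum of two of the snapshot fields. The Map below is the hypothetical result of the status-parsing sketch in section 1.2, not the framework's MemorySnapshot type.

```java
// Illustration only: the "rss_anon + swap" value reported by this atom is just the
// sum of the RssAnon and VmSwap fields read from /proc/<pid>/status.
final class AnonRssAndSwap {
    static int anonRssAndSwapKb(java.util.Map<String, Integer> statusValuesKb) {
        int anonRssKb = statusValuesKb.getOrDefault("RssAnon", 0);
        int swapKb = statusValuesKb.getOrDefault("VmSwap", 0);
        return anonRssKb + swapKb;
    }
}
```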
1.4 SYSTEM_ION_HEAP_SIZE
```java
int pullSystemIonHeapSizeLocked(int atomTag, List<StatsEvent> pulledData) {
    final long systemIonHeapSizeInBytes = readSystemIonHeapSizeFromDebugfs();
    pulledData.add(FrameworkStatsLog.buildStatsEvent(atomTag, systemIonHeapSizeInBytes));
    return StatsManager.PULL_SUCCESS;
}
```
Parses the total entry of the /sys/kernel/debug/ion/heaps/system node.
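A minimal sketch of that parsing, assuming the debugfs heap dump contains a line of the form "total <bytes>" (the framework's readSystemIonHeapSizeFromDebugfs performs the equivalent match):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: read the system ION heap dump from debugfs and return the "total" value in bytes.
final class IonDebugfsReader {
    private static final Pattern TOTAL_LINE = Pattern.compile("^\\s*total\\s+(\\d+)");

    static long readSystemHeapSizeInBytes() throws IOException {
        List<String> lines =
                Files.readAllLines(Paths.get("/sys/kernel/debug/ion/heaps/system"));
        for (String line : lines) {
            Matcher m = TOTAL_LINE.matcher(line);
            if (m.find()) {
                return Long.parseLong(m.group(1));
            }
        }
        return 0;
    }
}
```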
1.5 ION_HEAP_SIZE
```java
int pullIonHeapSizeLocked(int atomTag, List<StatsEvent> pulledData) {
    int ionHeapSizeInKilobytes = (int) getIonHeapsSizeKb();
    pulledData.add(FrameworkStatsLog.buildStatsEvent(atomTag, ionHeapSizeInKilobytes));
    return StatsManager.PULL_SUCCESS;
}
```
Calls Debug.getIonHeapsSizeKb; see android_os_Debug.cpp for the details. It parses the /sys/kernel/ion/total_heaps_kb node.
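A minimal sketch of what that read boils down to, assuming the node holds a single decimal value in kB:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch: read the kernel's total ION heap size (kB) from sysfs.
final class IonTotalReader {
    static long readTotalHeapsKb() throws IOException {
        String value = new String(
                Files.readAllBytes(Paths.get("/sys/kernel/ion/total_heaps_kb"))).trim();
        return Long.parseLong(value);
    }
}
```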
1.6 PROCESS_SYSTEM_ION_HEAP_SIZE
```java
int pullProcessSystemIonHeapSizeLocked(int atomTag, List<StatsEvent> pulledData) {
    List<IonAllocations> result = readProcessSystemIonHeapSizesFromDebugfs();
    for (IonAllocations allocations : result) {
        pulledData.add(FrameworkStatsLog.buildStatsEvent(atomTag, getUidForPid(allocations.pid),
                readCmdlineFromProcfs(allocations.pid),
                (int) (allocations.totalSizeInBytes / 1024), allocations.count,
                (int) (allocations.maxSizeInBytes / 1024)));
    }
    return StatsManager.PULL_SUCCESS;
}
```
readProcessSystemIonHeapSizesFromDebugfs parses the per-process portion of the /sys/kernel/debug/ion/heaps/system node.
1.7 PROCESS_DMABUF_MEMORY
```java
int pullProcessDmabufMemory(int atomTag, List<StatsEvent> pulledData) {
    KernelAllocationStats.ProcessDmabuf[] procBufs =
            KernelAllocationStats.getDmabufAllocations();
    if (procBufs == null) {
        return StatsManager.PULL_SKIP;
    }
    for (KernelAllocationStats.ProcessDmabuf procBuf : procBufs) {
        pulledData.add(FrameworkStatsLog.buildStatsEvent(
                atomTag,
                procBuf.uid,
                procBuf.processName,
                procBuf.oomScore,
                procBuf.retainedSizeKb,
                procBuf.retainedBuffersCount,
                0, /* mapped_dmabuf_kb - deprecated */
                0, /* mapped_dmabuf_count - deprecated */
                procBuf.surfaceFlingerSizeKb,
                procBuf.surfaceFlingerCount));
    }
    return StatsManager.PULL_SUCCESS;
}
```
The getDmabufAllocations function mainly calls the ReadProcfsDmaBufs function in dmabufinfo.cpp to obtain per-process dmabuf information; a simplified sketch of the fdinfo scan follows the field list below.
The collected fields are:
- uid
- cmdline
- oomScore
- total (KB)
- inode count
- surfaceflinger size (KB)
- surfaceflinger inode cnt
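A simplified sketch of the fdinfo side of that scan, under the assumption that each dmabuf fd held by a process shows up in /proc/<pid>/fdinfo/<fd> with "exp_name:" and "size:" (bytes) lines. The real libdmabufinfo code also walks /proc/<pid>/maps and de-duplicates buffers by inode; that part is omitted here.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;

// Sketch: sum the sizes of dmabuf fds retained by one process, found via fdinfo.
final class DmabufFdinfoScanner {
    static long retainedDmabufBytes(int pid) throws IOException {
        long totalBytes = 0;
        File fdinfoDir = new File("/proc/" + pid + "/fdinfo");
        File[] entries = fdinfoDir.listFiles();
        if (entries == null) {
            return 0;
        }
        for (File entry : entries) {
            List<String> lines = Files.readAllLines(entry.toPath());
            // Only dmabuf fds expose an exporter name in their fdinfo entry.
            boolean isDmabuf = lines.stream().anyMatch(l -> l.startsWith("exp_name:"));
            if (!isDmabuf) {
                continue;
            }
            for (String line : lines) {
                if (line.startsWith("size:")) {
                    totalBytes += Long.parseLong(line.substring("size:".length()).trim());
                }
            }
        }
        return totalBytes;
    }
}
```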
1.8 SYSTEM_MEMORY
```java
int pullSystemMemory(int atomTag, List<StatsEvent> pulledData) {
    SystemMemoryUtil.Metrics metrics = SystemMemoryUtil.getMetrics();
    pulledData.add(FrameworkStatsLog.buildStatsEvent(
            atomTag,
            metrics.unreclaimableSlabKb,   // meminfo.SUnreclaim
            metrics.vmallocUsedKb,         // meminfo.VmallocUsed
            metrics.pageTablesKb,          // meminfo.PageTables
            metrics.kernelStackKb,         // meminfo.KernelStack
            metrics.totalIonKb,
            metrics.unaccountedKb,
            metrics.gpuTotalUsageKb,
            metrics.gpuPrivateAllocationsKb,
            metrics.dmaBufTotalExportedKb,
            metrics.shmemKb,               // meminfo.Shmem
            metrics.totalKb,               // meminfo.MemTotal
            metrics.freeKb,                // meminfo.MemFree
            metrics.availableKb,           // meminfo.MemAvailable
            metrics.activeKb,              // meminfo.Active
            metrics.inactiveKb,            // meminfo.Inactive
            metrics.activeAnonKb,          // meminfo.Active(anon)
            metrics.inactiveAnonKb,        // meminfo.Inactive(anon)
            metrics.activeFileKb,          // meminfo.Active(file)
            metrics.inactiveFileKb,        // meminfo.Inactive(file)
            metrics.swapTotalKb,           // meminfo.SwapTotal
            metrics.swapFreeKb,            // meminfo.SwapFree
            metrics.cmaTotalKb,            // meminfo.CmaTotal
            metrics.cmaFreeKb));           // meminfo.CmaFree
    return StatsManager.PULL_SUCCESS;
}
```
totalIonKb: the total size of all exporters under /sys/kernel/dmabuf/buffers that are defined in /dev/dma_heap; on devices without dmabuf heaps it falls back to the /sys/kernel/ion/total_heaps_kb node.
gpuTotalUsageKb: parsed from the /sys/fs/bpf/map_gpuMem_gpu_mem_total_map node.
gpuPrivateAllocationsKb: the GPU private usage.
dmaBufTotalExportedKb: the sum of all dmabuf buffers under /sys/kernel/dmabuf/buffers.
unaccountedKb: meminfo.MemTotal - accountedKb, where accountedKb is:
meminfo.MemFree + zram + meminfo.Buffers + meminfo.Active + meminfo.Inactive + meminfo.Unevictable + meminfo.SUnreclaim + meminfo.KReclaimable + meminfo.VmallocUsed + meminfo.PageTables + meminfo.KernelStack + dmaBufTotalExportedKb + gpuPrivateAllocationsKb
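To illustrate the arithmetic, a sketch of the subtraction; the parameter and map key names are illustrative, not the actual SystemMemoryUtil.Metrics members.

```java
// Illustration of the accounting above: everything meminfo (plus zram, exported dmabuf
// and GPU private usage) can explain is subtracted from MemTotal, and the remainder is
// reported as unaccountedKb.
final class UnaccountedMemory {
    static long unaccountedKb(java.util.Map<String, Long> meminfoKb, long zramKb,
            long dmaBufTotalExportedKb, long gpuPrivateAllocationsKb) {
        long accountedKb = meminfoKb.get("MemFree")
                + zramKb
                + meminfoKb.get("Buffers")
                + meminfoKb.get("Active")
                + meminfoKb.get("Inactive")
                + meminfoKb.get("Unevictable")
                + meminfoKb.get("SUnreclaim")
                + meminfoKb.get("KReclaimable")
                + meminfoKb.get("VmallocUsed")
                + meminfoKb.get("PageTables")
                + meminfoKb.get("KernelStack")
                + dmaBufTotalExportedKb
                + gpuPrivateAllocationsKb;
        return meminfoKb.get("MemTotal") - accountedKb;
    }
}
```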
1.9 VMSTAT
```java
int pullVmStat(int atomTag, List<StatsEvent> pulledData) {
    ProcfsMemoryUtil.VmStat vmStat = ProcfsMemoryUtil.readVmStat();
    if (vmStat != null) {
        pulledData.add(FrameworkStatsLog.buildStatsEvent(
                atomTag,
                vmStat.oomKillCount));
    }
    return StatsManager.PULL_SUCCESS;
}
```
Only the oom_kill count is collected.
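A minimal sketch of the /proc/vmstat read, relying on the file being a list of "<counter> <value>" lines:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Sketch: pull the cumulative oom_kill counter out of /proc/vmstat.
final class VmStatReader {
    static long readOomKillCount() throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("/proc/vmstat"));
        for (String line : lines) {
            if (line.startsWith("oom_kill ")) {
                return Long.parseLong(line.substring("oom_kill ".length()).trim());
            }
        }
        return 0;
    }
}
```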
2. Analyzing Google's instrumentation through the dashboard
2.1 RSS hwm
Combined with the code in section 1.2, this chart appears to track each process's RSS HWM, shown in both ascending and descending order; the plotted value looks like a mean with error bars reflecting the maximum and minimum.
Metric details may expose more percentile information.
From the dashboard data, third-party apps such as com.tencent.ig and com.roblox.client have a large memory footprint. Later memory-health work should account for the pressure third-party apps put on the system, and should also establish how much memory apps hold while in the background. A good first step is to check these processes' anon RSS + swap usage to determine whether they are leaking memory.
2.2 P95 anon RSS + swap
Combined with the code in section 1.3, this appears to chart the distribution of each process's anon RSS + swap at P95 and above.
Metric details may show the distribution at more percentiles.
From the dashboard data, third-party apps hold a large amount of anonymous memory, which may indicate memory leaks; check the details for confirmation.
anon RSS + swap includes leaked and unused memory, all of which ends up swapped out to zram, so a limit needs to be placed on this value.
2.3 ION heap Size
Combined with the code in section 1.8, what is actually being counted here is dmabuf.
Distribution details may show the per-process dmabuf distribution.
From the dashboard data, 1% of processes still use more than 910 MB of DMA-BUF; the details need to be examined closely to pin down which processes are responsible.