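Conclusion First
[Conclusion]
SkyWalking uses bytecode enhancement, combined with the ideas of dependency injection and inversion of control, to weave the trace identifier traceId into the trace context TraceContext in its own way.
Interesting, isn't it?
[Takeaway]
Choose the plugins enabled under the skywalking-agent plugins/ directory deliberately. The more components are enabled, the more complex the traces and topology become, the wider the impact, and the more unknown and uncontrollable factors come into play.
Background
Discovering the Problem
In production, we found that the same traceId appeared on N requests from different time periods. The requests were all strung together, which interfered with trace reconstruction and topology display.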
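import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
// NamedThreadFactory comes from a library on the application's classpath; its import is not shown in the original.

@Configuration
public class ThreadPoolConfig {

    @Bean(name = "eventThreadPool")
    public ThreadPoolExecutor commonThreadPool() {
        // int corePoolSize = Runtime.getRuntime().availableProcessors();
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, // corePoolSize: set to 1 on purpose during the analysis so the problem reproduces 100% of the time
                1, // maximumPoolSize
                1, // keepAliveTime
                TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(50000),
                new NamedThreadFactory("wanda_event"),
                new ThreadPoolExecutor.CallerRunsPolicy());
        return executor;
    }
}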
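Why pinning the pool to a single worker makes the problem reproducible can be seen with the JDK alone. The following is a minimal sketch, not the application's code: `SingleWorkerPoolSketch`, `named`, and `workerNames` are hypothetical names, and the `<prefix>-thread-<n>` naming pattern is an assumption chosen to resemble the thread names in the Arthas output later in this article (the real `NamedThreadFactory` is not shown in the source).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class SingleWorkerPoolSketch {

    /** Hypothetical stand-in for NamedThreadFactory: names threads "<prefix>-thread-<n>". */
    static ThreadFactory named(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return r -> new Thread(r, prefix + "-thread-" + counter.incrementAndGet());
    }

    /** Submits taskCount tasks to a 1/1 pool and returns the worker thread name each task saw. */
    static List<String> workerNames(int taskCount) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 1, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(50000),
                named("wanda_event"),
                new ThreadPoolExecutor.CallerRunsPolicy());
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                futures.add(pool.submit(() -> Thread.currentThread().getName()));
            }
            List<String> names = new ArrayList<>();
            for (Future<String> future : futures) {
                names.add(future.get());
            }
            return names;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Every task reports the same worker thread name.
        System.out.println(workerNames(3));
    }
}
```

With one core thread and a large queue, every task is executed by the same worker, so any state that lives on that thread is visible to each later task. That is what makes the behaviour deterministic during analysis.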
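Analyzing the Problem
We need to find out how the traceId on the thread pool's threads is generated.
[Notes]
- The skywalking-agent.jar in use is version 8.13.0, with the default plugins/ configuration, which includes apm-guava-eventbus-plugin;
- The bootstrap plugins under bootstrap-plugins/ are not enabled (enabling them means copying them into plugins/); these include apm-jdk-threadpool-plugin. SkyWalking leaves bootstrap plugins disabled by default because their impact is broad and they can significantly affect both application performance and trace data;
[Thoughts]
- The traceId is created at the root node of a request, is immutable, and is passed through unchanged for the rest of the request lifecycle. Pinning down where the traceId is generated is therefore key;
- Where is the traceId generated? We need to understand the generation logic at the implementation level;
- An application instance contains many threads, so the name of the thread that generates the traceId must be considered as well;
In summary, the core investigation approach is: generation of a new traceId + thread name.
How traceId Generation Is Implemented
org.apache.skywalking:java-agent:9.1.0
The analysis uses the source code of v9.1.0, the latest version at the time of writing. The code of the two versions (8.13.0 and 9.1.0) is almost identical.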
TraceContext.traceId()
org.apache.skywalking.apm.toolkit.trace.TraceContext#traceId
TraceContext is the trace context of a request. Call TraceContext.traceId() to get the traceId.
package org.apache.skywalking.apm.toolkit.trace;

import java.util.Optional;

/**
 * Try to access the sky-walking tracer context. The context is not existed, always. only the middleware, component, or
 * rpc-framework are supported in the current invoke stack, in the same thread, the context will be available.
 * <p>
 */
public class TraceContext {

    /**
     * Try to get the traceId of current trace context.
     *
     * @return traceId, if it exists, or empty {@link String}.
     */
    public static String traceId() {
        return "";
    }

    /**
     * Try to get the segmentId of current trace context.
     *
     * @return segmentId, if it exists, or empty {@link String}.
     */
    public static String segmentId() {
        return "";
    }

    /**
     * Try to get the spanId of current trace context. The spanId is a negative number when the trace context is
     * missing.
     *
     * @return spanId, if it exists, or empty {@link String}.
     */
    public static int spanId() {
        return -1;
    }

    /**
     * Try to get the custom value from trace context.
     *
     * @return custom data value.
     */
    public static Optional<String> getCorrelation(String key) {
        return Optional.empty();
    }

    /**
     * Put the custom key/value into trace context.
     *
     * @return previous value if it exists.
     */
    public static Optional<String> putCorrelation(String key, String value) {
        return Optional.empty();
    }
}
1. How does the traceId get into the trace context?
Search the GitHub skywalking:java-agent repository for org.apache.skywalking.apm.toolkit.trace.TraceContext:
repo:apache/skywalking-java org.apache.skywalking.apm.toolkit.trace.TraceContext language:Java
Or search the skywalking:java-agent source code in IDEA for org.apache.skywalking.apm.toolkit.trace.TraceContext.
[Conclusion]
SkyWalking uses bytecode enhancement, combined with the ideas of dependency injection and inversion of control, to weave the trace identifier traceId into the trace context TraceContext in its own way.
That is one more way to implement a data update, isn't it...
TraceContextActivation
org.apache.skywalking.apm.toolkit.activation.trace.TraceContextActivation
TraceContextActivation activates the trace context: it registers TraceIDInterceptor to intercept TraceContext.traceId(), so that the traceId is supplied to TraceContext.
package org.apache.skywalking.apm.toolkit.activation.trace;

import net.bytebuddy.description.method.MethodDescription;
import net.bytebuddy.matcher.ElementMatcher;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.ClassStaticMethodsEnhancePluginDefine;
import org.apache.skywalking.apm.agent.core.plugin.match.ClassMatch;
import org.apache.skywalking.apm.agent.core.plugin.match.NameMatch;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.StaticMethodsInterceptPoint;

import static net.bytebuddy.matcher.ElementMatchers.named;

/**
 * Active the toolkit class "TraceContext". Should not dependency or import any class in
 * "skywalking-toolkit-trace-context" module. Activation's classloader is diff from "TraceContext", using direct will
 * trigger classloader issue.
 * <p>
 */
public class TraceContextActivation extends ClassStaticMethodsEnhancePluginDefine {

    // Interceptor class for traceId
    public static final String TRACE_ID_INTERCEPT_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.TraceIDInterceptor";
    public static final String SEGMENT_ID_INTERCEPT_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.SegmentIDInterceptor";
    public static final String SPAN_ID_INTERCEPT_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.SpanIDInterceptor";
    // Class to enhance: the trace context
    public static final String ENHANCE_CLASS = "org.apache.skywalking.apm.toolkit.trace.TraceContext";
    // Name of the static method that returns the traceId
    public static final String ENHANCE_TRACE_ID_METHOD = "traceId";
    public static final String ENHANCE_SEGMENT_ID_METHOD = "segmentId";
    public static final String ENHANCE_SPAN_ID_METHOD = "spanId";
    public static final String ENHANCE_GET_CORRELATION_METHOD = "getCorrelation";
    public static final String INTERCEPT_GET_CORRELATION_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.CorrelationContextGetInterceptor";
    public static final String ENHANCE_PUT_CORRELATION_METHOD = "putCorrelation";
    public static final String INTERCEPT_PUT_CORRELATION_CLASS = "org.apache.skywalking.apm.toolkit.activation.trace.CorrelationContextPutInterceptor";

    /**
     * @return the target class, which needs active.
     */
    @Override
    protected ClassMatch enhanceClass() {
        // The class to enhance
        return NameMatch.byName(ENHANCE_CLASS);
    }

    /**
     * @return the collection of {@link StaticMethodsInterceptPoint}, represent the intercepted methods and their
     * interceptors.
     */
    @Override
    public StaticMethodsInterceptPoint[] getStaticMethodsInterceptPoints() {
        // Static method intercept points
        return new StaticMethodsInterceptPoint[] {
            new StaticMethodsInterceptPoint() {
                @Override
                public ElementMatcher<MethodDescription> getMethodsMatcher() {
                    // Name of the static method that returns the traceId
                    return named(ENHANCE_TRACE_ID_METHOD);
                }

                @Override
                public String getMethodsInterceptor() {
                    // Interceptor class for traceId
                    return TRACE_ID_INTERCEPT_CLASS;
                }

                @Override
                public boolean isOverrideArgs() {
                    return false;
                }
            },
            // ...
        };
    }
}
TraceIDInterceptor
org.apache.skywalking.apm.toolkit.activation.trace.TraceIDInterceptor
TraceIDInterceptor, the traceId interceptor, calls ContextManager.getGlobalTraceId() to obtain the traceId and returns it as the result of TraceContext.traceId().
package org.apache.skywalking.apm.toolkit.activation.trace;

import java.lang.reflect.Method;
import org.apache.skywalking.apm.agent.core.logging.api.ILog;
import org.apache.skywalking.apm.agent.core.logging.api.LogManager;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.StaticMethodsAroundInterceptor;
import org.apache.skywalking.apm.agent.core.context.ContextManager;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.MethodInterceptResult;

public class TraceIDInterceptor implements StaticMethodsAroundInterceptor {

    private static final ILog LOGGER = LogManager.getLogger(TraceIDInterceptor.class);

    @Override
    public void beforeMethod(Class clazz, Method method, Object[] allArguments, Class<?>[] parameterTypes,
        MethodInterceptResult result) {
        // Get the first global traceId and define it as the method's return value
        result.defineReturnValue(ContextManager.getGlobalTraceId());
    }

    @Override
    public Object afterMethod(Class clazz, Method method, Object[] allArguments, Class<?>[] parameterTypes,
        Object ret) {
        // Return the traceId
        return ret;
    }

    @Override
    public void handleMethodException(Class clazz, Method method, Object[] allArguments, Class<?>[] parameterTypes,
        Throwable t) {
        LOGGER.error("Failed to getDefault trace Id.", t);
    }
}
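The interplay between the stub method and its interceptor can be modelled in plain Java. The sketch below is not SkyWalking code: `InterceptResult`, `StaticInterceptor`, and `invokeWithInterceptor` are hypothetical names loosely standing in for `MethodInterceptResult`, `StaticMethodsAroundInterceptor`, and the agent's dispatch logic. The dispatch order (before, then the original method unless a return value was defined, then after) is an assumption inferred from the interceptor above.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for MethodInterceptResult: carries an optional overriding return value. */
final class InterceptResult {
    private boolean continueOriginal = true;
    private Object returnValue;

    /** Overrides the return value and marks the original method as skipped. */
    void defineReturnValue(Object value) {
        this.continueOriginal = false;
        this.returnValue = value;
    }

    boolean shouldContinue() {
        return continueOriginal;
    }

    Object returnValue() {
        return returnValue;
    }
}

/** Hypothetical stand-in for a static-method around-interceptor. */
interface StaticInterceptor {
    void beforeMethod(InterceptResult result);

    Object afterMethod(Object ret);
}

final class InterceptionSketch {

    /** The un-enhanced stub, shaped like TraceContext.traceId(): always returns an empty string. */
    static String stubTraceId() {
        return "";
    }

    /** Assumed dispatch order: before, then the original unless overridden, then after. */
    static Object invokeWithInterceptor(StaticInterceptor interceptor, Supplier<Object> original) {
        InterceptResult result = new InterceptResult();
        interceptor.beforeMethod(result);
        Object ret = result.shouldContinue() ? original.get() : result.returnValue();
        return interceptor.afterMethod(ret);
    }

    public static void main(String[] args) {
        StaticInterceptor traceIdInterceptor = new StaticInterceptor() {
            @Override
            public void beforeMethod(InterceptResult result) {
                result.defineReturnValue("demo-trace-id"); // placeholder, not a real traceId
            }

            @Override
            public Object afterMethod(Object ret) {
                return ret;
            }
        };
        System.out.println(invokeWithInterceptor(traceIdInterceptor, InterceptionSketch::stubTraceId));
    }
}
```

Under this model the caller only ever sees the stub's signature, while the value comes from whatever the interceptor defines. This is why TraceContext.traceId() can return "" in source and still yield a traceId once the agent has enhanced the class.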
ContextManager.getGlobalTraceId()
org.apache.skywalking.apm.agent.core.context.ContextManager#getGlobalTraceId
ContextManager is the trace context manager. ContextManager.getGlobalTraceId() returns the first global traceId; it calls AbstractTracerContext.getReadablePrimaryTraceId() to obtain it.
package org.apache.skywalking.apm.agent.core.context;

import java.util.Objects;
import org.apache.skywalking.apm.agent.core.boot.BootService;
import org.apache.skywalking.apm.agent.core.boot.ServiceManager;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractSpan;
import org.apache.skywalking.apm.agent.core.context.trace.TraceSegment;
import org.apache.skywalking.apm.agent.core.logging.api.ILog;
import org.apache.skywalking.apm.agent.core.logging.api.LogManager;
import org.apache.skywalking.apm.agent.core.sampling.SamplingService;
import org.apache.skywalking.apm.util.StringUtil;

import static org.apache.skywalking.apm.agent.core.conf.Config.Agent.OPERATION_NAME_THRESHOLD;

/**
 * {@link ContextManager} controls the whole context of {@link TraceSegment}. Any {@link TraceSegment} relates to
 * single-thread, so this context use {@link ThreadLocal} to maintain the context, and make sure, since a {@link
 * TraceSegment} starts, all ChildOf spans are in the same context. <p> What is 'ChildOf'?
 * https://github.com/opentracing/specification/blob/master/specification.md#references-between-spans
 *
 * <p> Also, {@link ContextManager} delegates to all {@link AbstractTracerContext}'s major methods.
 */
public class ContextManager implements BootService {
    private static final String EMPTY_TRACE_CONTEXT_ID = "N/A";
    private static final ILog LOGGER = LogManager.getLogger(ContextManager.class);
    // Thread-local holder of the trace context
    private static ThreadLocal<AbstractTracerContext> CONTEXT = new ThreadLocal<AbstractTracerContext>();
    private static ThreadLocal<RuntimeContext> RUNTIME_CONTEXT = new ThreadLocal<RuntimeContext>();
    private static ContextManagerExtendService EXTEND_SERVICE;

    private static AbstractTracerContext getOrCreate(String operationName, boolean forceSampling) {
        AbstractTracerContext context = CONTEXT.get();
        if (context == null) {
            if (StringUtil.isEmpty(operationName)) {
                if (LOGGER.isDebugEnable()) {
                    LOGGER.debug("No operation name, ignore this trace.");
                }
                context = new IgnoredTracerContext();
            } else {
                if (EXTEND_SERVICE == null) {
                    EXTEND_SERVICE = ServiceManager.INSTANCE.findService(ContextManagerExtendService.class);
                }
                context = EXTEND_SERVICE.createTraceContext(operationName, forceSampling);
            }
            CONTEXT.set(context);
        }
        return context;
    }

    /**
     * @return the first global trace id when tracing. Otherwise, "N/A".
     */
    public static String getGlobalTraceId() {
        // The trace context of the current thread
        AbstractTracerContext context = CONTEXT.get();
        // Read the global traceId from it
        return Objects.nonNull(context) ? context.getReadablePrimaryTraceId() : EMPTY_TRACE_CONTEXT_ID;
    }

    /**
     * @return the current segment id when tracing. Otherwise, "N/A".
     */
    public static String getSegmentId() {
        AbstractTracerContext context = CONTEXT.get();
        return Objects.nonNull(context) ? context.getSegmentId() : EMPTY_TRACE_CONTEXT_ID;
    }

    /**
     * @return the current span id when tracing. Otherwise, the value is -1.
     */
    public static int getSpanId() {
        AbstractTracerContext context = CONTEXT.get();
        return Objects.nonNull(context) ? context.getSpanId() : -1;
    }

    // ...
}
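Because the context lives in a ThreadLocal and is created lazily by a get-or-create method, it is worth seeing how such a slot behaves on a pooled thread. The sketch below illustrates a general ThreadLocal property using only the JDK. It is not SkyWalking code and is not offered as a diagnosis of the production incident: `ThreadLocalContextSketch` and its methods are hypothetical, the context is reduced to a string id, and whether and when the agent clears its own slot is not shown in the excerpt above.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class ThreadLocalContextSketch {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();
    // Per-thread slot, analogous in spirit to ContextManager.CONTEXT.
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    /** Returns the current thread's context id, creating one on first use. */
    static String getOrCreate() {
        String context = CONTEXT.get();
        if (context == null) {
            context = "ctx-" + NEXT_ID.incrementAndGet();
            CONTEXT.set(context);
        }
        return context;
    }

    /** Empties the slot so that the next getOrCreate() on this thread starts fresh. */
    static void clear() {
        CONTEXT.remove();
    }

    /** Runs two tasks on a one-thread pool and returns the context id each task observed. */
    static String[] runTwoTasks(boolean clearAfterEachTask) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            String[] seen = new String[2];
            for (int i = 0; i < seen.length; i++) {
                seen[i] = pool.submit(() -> {
                    try {
                        return getOrCreate();
                    } finally {
                        if (clearAfterEachTask) {
                            clear();
                        }
                    }
                }).get();
            }
            return seen;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] kept = runTwoTasks(false);
        String[] cleared = runTwoTasks(true);
        System.out.println("without clear, same id: " + kept[0].equals(kept[1]));
        System.out.println("with clear, same id: " + cleared[0].equals(cleared[1]));
    }
}
```

If the slot is not emptied between tasks, the second task on the same worker thread sees the first task's context. This is one reason why the thread that creates a context, and the point at which it does so, matter for the investigation.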
AbstractTracerContext.getReadablePrimaryTraceId()
org.apache.skywalking.apm.agent.core.context.AbstractTracerContext#getReadablePrimaryTraceId
AbstractTracerContext is the interface that defines a trace context. This method returns the global traceId.
package org.apache.skywalking.apm.agent.core.context;

import org.apache.skywalking.apm.agent.core.context.trace.AbstractSpan;

/**
 * The <code>AbstractTracerContext</code> represents the tracer context manager.
 */
public interface AbstractTracerContext {

    /**
     * Get the global trace id, if needEnhance. How to build, depends on the implementation.
     *
     * @return the string represents the id.
     */
    String getReadablePrimaryTraceId();

    /**
     * Prepare for the cross-process propagation. How to initialize the carrier, depends on the implementation.
     *
     * @param carrier to carry the context for crossing process.
     */
    void inject(ContextCarrier carrier);

    /**
     * Build the reference between this segment and a cross-process segment. How to build, depends on the
     * implementation.
     *
     * @param carrier carried the context from a cross-process segment.
     */
    void extract(ContextCarrier carrier);

    /**
     * Capture a snapshot for cross-thread propagation. It's a similar concept with ActiveSpan.Continuation in
     * OpenTracing-java How to build, depends on the implementation.
     *
     * @return the {@link ContextSnapshot} , which includes the reference context.
     */
    ContextSnapshot capture();

    /**
     * Build the reference between this segment and a cross-thread segment. How to build, depends on the
     * implementation.
     *
     * @param snapshot from {@link #capture()} in the parent thread.
     */
    void continued(ContextSnapshot snapshot);

    /**
     * Get the current segment id, if needEnhance. How to build, depends on the implementation.
     *
     * @return the string represents the id.
     */
    String getSegmentId();

    /**
     * Get the active span id, if needEnhance. How to build, depends on the implementation.
     *
     * @return the string represents the id.
     */
    int getSpanId();

    /**
     * Create an entry span
     *
     * @param operationName most likely a service name
     * @return the span represents an entry point of this segment.
     */
    AbstractSpan createEntrySpan(String operationName);

    /**
     * Create a local span
     *
     * @param operationName most likely a local method signature, or business name.
     * @return the span represents a local logic block.
     */
    AbstractSpan createLocalSpan(String operationName);

    /**
     * Create an exit span
     *
     * @param operationName most likely a service name of remote
     * @param remotePeer the network id(ip:port, hostname:port or ip1:port1,ip2,port, etc.). Remote peer could be set
     *                   later, but must be before injecting.
     * @return the span represent an exit point of this segment.
     */
    AbstractSpan createExitSpan(String operationName, String remotePeer);

    /**
     * @return the active span of current tracing context(stack)
     */
    AbstractSpan activeSpan();

    /**
     * Finish the given span, and the given span should be the active span of current tracing context(stack)
     *
     * @param span to finish
     * @return true when context should be clear.
     */
    boolean stopSpan(AbstractSpan span);

    /**
     * Notify this context, current span is going to be finished async in another thread.
     *
     * @return The current context
     */
    AbstractTracerContext awaitFinishAsync();

    /**
     * The given span could be stopped officially.
     *
     * @param span to be stopped.
     */
    void asyncStop(AsyncSpan span);

    /**
     * Get current correlation context
     */
    CorrelationContext getCorrelationContext();

    /**
     * Get current primary endpoint name
     */
    String getPrimaryEndpointName();
}
AbstractTracerContext has two implementations, IgnoredTracerContext and TracingContext.
IgnoredTracerContext.getReadablePrimaryTraceId()
org.apache.skywalking.apm.agent.core.context.IgnoredTracerContext#getReadablePrimaryTraceId
IgnoredTracerContext is the trace context for traces that should be ignored. This method returns "Ignored_Trace".
package org.apache.skywalking.apm.agent.core.context;

import java.util.LinkedList;
import java.util.List;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractSpan;
import org.apache.skywalking.apm.agent.core.context.trace.NoopSpan;
import org.apache.skywalking.apm.agent.core.profile.ProfileStatusContext;

/**
 * The <code>IgnoredTracerContext</code> represent a context should be ignored. So it just maintains the stack with an
 * integer depth field.
 * <p>
 * All operations through this will be ignored, and keep the memory and gc cost as low as possible.
 */
public class IgnoredTracerContext implements AbstractTracerContext {
    private static final NoopSpan NOOP_SPAN = new NoopSpan();
    private static final String IGNORE_TRACE = "Ignored_Trace";

    private final CorrelationContext correlationContext;
    private final ExtensionContext extensionContext;
    private final ProfileStatusContext profileStatusContext;

    private int stackDepth;

    public IgnoredTracerContext() {
        this.stackDepth = 0;
        this.correlationContext = new CorrelationContext();
        this.extensionContext = new ExtensionContext();
        this.profileStatusContext = ProfileStatusContext.createWithNone();
    }

    // ...

    @Override
    public String getReadablePrimaryTraceId() {
        // The global traceId of an ignored trace is a fixed marker
        return IGNORE_TRACE;
    }

    @Override
    public String getSegmentId() {
        return IGNORE_TRACE;
    }

    @Override
    public int getSpanId() {
        return -1;
    }

    // ...
}
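The excerpts so far show three values that TraceContext.traceId() can yield without identifying a real trace: the stub's empty string, ContextManager's "N/A", and IgnoredTracerContext's "Ignored_Trace". A caller such as a logging layout may want to tell these apart from a genuine id. The helper below is a minimal sketch under that reading of the source; `TraceIdSentinels` and `isRealTraceId` are hypothetical names and are not part of SkyWalking.

```java
final class TraceIdSentinels {
    /** Returned by the un-enhanced TraceContext.traceId() stub. */
    static final String NOT_ENHANCED = "";
    /** Returned by ContextManager.getGlobalTraceId() when the thread has no context. */
    static final String NO_CONTEXT = "N/A";
    /** Returned by IgnoredTracerContext.getReadablePrimaryTraceId(). */
    static final String IGNORED = "Ignored_Trace";

    private TraceIdSentinels() {
    }

    /** True only when the value is none of the three placeholder values above (and not null). */
    static boolean isRealTraceId(String traceId) {
        return traceId != null
                && !NOT_ENHANCED.equals(traceId)
                && !NO_CONTEXT.equals(traceId)
                && !IGNORED.equals(traceId);
    }

    public static void main(String[] args) {
        System.out.println(isRealTraceId("N/A"));            // placeholder value
        System.out.println(isRealTraceId("abc.246.170961")); // made-up id with a non-placeholder shape
    }
}
```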
TracingContext.getReadablePrimaryTraceId()
org.apache.skywalking.apm.agent.core.context.TracingContext#getReadablePrimaryTraceId
TracingContext is the trace context for traced requests. This method returns the id field of DistributedTraceId.
package org.apache.skywalking.apm.agent.core.context;

import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;
import java.util.concurrent.locks.ReentrantLock;
import lombok.AccessLevel;
import lombok.Getter;
import org.apache.skywalking.apm.agent.core.boot.ServiceManager;
import org.apache.skywalking.apm.agent.core.conf.Config;
import org.apache.skywalking.apm.agent.core.conf.dynamic.watcher.SpanLimitWatcher;
import org.apache.skywalking.apm.agent.core.context.ids.DistributedTraceId;
import org.apache.skywalking.apm.agent.core.context.ids.PropagatedTraceId;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractSpan;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractTracingSpan;
import org.apache.skywalking.apm.agent.core.context.trace.EntrySpan;
import org.apache.skywalking.apm.agent.core.context.trace.ExitSpan;
import org.apache.skywalking.apm.agent.core.context.trace.ExitTypeSpan;
import org.apache.skywalking.apm.agent.core.context.trace.LocalSpan;
import org.apache.skywalking.apm.agent.core.context.trace.NoopExitSpan;
import org.apache.skywalking.apm.agent.core.context.trace.NoopSpan;
import org.apache.skywalking.apm.agent.core.context.trace.TraceSegment;
import org.apache.skywalking.apm.agent.core.context.trace.TraceSegmentRef;
import org.apache.skywalking.apm.agent.core.logging.api.ILog;
import org.apache.skywalking.apm.agent.core.logging.api.LogManager;
import org.apache.skywalking.apm.agent.core.profile.ProfileStatusContext;
import org.apache.skywalking.apm.agent.core.profile.ProfileTaskExecutionService;
import org.apache.skywalking.apm.util.StringUtil;

import static org.apache.skywalking.apm.agent.core.conf.Config.Agent.CLUSTER;

/**
 * The <code>TracingContext</code> represents a core tracing logic controller. It build the final {@link
 * TracingContext}, by the stack mechanism, which is similar with the codes work.
 * <p>
 * In opentracing concept, it means, all spans in a segment tracing context(thread) are CHILD_OF relationship, but no
 * FOLLOW_OF.
 * <p>
 * In skywalking core concept, FOLLOW_OF is an abstract concept when cross-process MQ or cross-thread async/batch tasks
 * happen, we used {@link TraceSegmentRef} for these scenarios. Check {@link TraceSegmentRef} which is from {@link
 * ContextCarrier} or {@link ContextSnapshot}.
 */
public class TracingContext implements AbstractTracerContext {
    private static final ILog LOGGER = LogManager.getLogger(TracingContext.class);
    private long lastWarningTimestamp = 0;

    /**
     * @see ProfileTaskExecutionService
     */
    private static ProfileTaskExecutionService PROFILE_TASK_EXECUTION_SERVICE;

    /**
     * The final {@link TraceSegment}, which includes all finished spans.
     * A trace segment covers all calls made within the same thread.
     */
    private TraceSegment segment;

    /**
     * Active spans stored in a Stack, usually called 'ActiveSpanStack'. This {@link LinkedList} is the in-memory
     * storage-structure. <p> I use {@link LinkedList#removeLast()}, {@link LinkedList#addLast(Object)} and {@link
     * LinkedList#getLast()} instead of {@link #pop()}, {@link #push(AbstractSpan)}, {@link #peek()}
     */
    private LinkedList<AbstractSpan> activeSpanStack = new LinkedList<>();

    /**
     * @since 8.10.0 replace the removed "firstSpan"(before 8.10.0) reference. see {@link PrimaryEndpoint} for more details.
     */
    private PrimaryEndpoint primaryEndpoint = null;

    /**
     * A counter for the next span.
     */
    private int spanIdGenerator;

    /**
     * The counter indicates
     */
    @SuppressWarnings("unused") // updated by ASYNC_SPAN_COUNTER_UPDATER
    private volatile int asyncSpanCounter;
    private static final AtomicIntegerFieldUpdater<TracingContext> ASYNC_SPAN_COUNTER_UPDATER =
        AtomicIntegerFieldUpdater.newUpdater(TracingContext.class, "asyncSpanCounter");
    private volatile boolean isRunningInAsyncMode;
    private volatile ReentrantLock asyncFinishLock;

    private volatile boolean running;

    private final long createTime;

    /**
     * profile status
     */
    private final ProfileStatusContext profileStatus;
    @Getter(AccessLevel.PACKAGE)
    private final CorrelationContext correlationContext;
    @Getter(AccessLevel.PACKAGE)
    private final ExtensionContext extensionContext;

    //CDS watcher
    private final SpanLimitWatcher spanLimitWatcher;

    /**
     * Initialize all fields with default value.
     */
    TracingContext(String firstOPName, SpanLimitWatcher spanLimitWatcher) {
        this.segment = new TraceSegment();
        this.spanIdGenerator = 0;
        isRunningInAsyncMode = false;
        createTime = System.currentTimeMillis();
        running = true;

        // profiling status
        if (PROFILE_TASK_EXECUTION_SERVICE == null) {
            PROFILE_TASK_EXECUTION_SERVICE = ServiceManager.INSTANCE.findService(ProfileTaskExecutionService.class);
        }
        this.profileStatus = PROFILE_TASK_EXECUTION_SERVICE.addProfiling(this, segment.getTraceSegmentId(), firstOPName);

        this.correlationContext = new CorrelationContext();
        this.extensionContext = new ExtensionContext();
        this.spanLimitWatcher = spanLimitWatcher;
    }

    /**
     * @return the first global trace id.
     */
    @Override
    public String getReadablePrimaryTraceId() {
        // Return the id field of the distributed trace id
        return getPrimaryTraceId().getId();
    }

    private DistributedTraceId getPrimaryTraceId() {
        // Return the distributed trace id related to this segment
        return segment.getRelatedGlobalTrace();
    }

    @Override
    public String getSegmentId() {
        return segment.getTraceSegmentId();
    }

    @Override
    public int getSpanId() {
        return activeSpan().getSpanId();
    }

    // ...
}
DistributedTraceId
org.apache.skywalking.apm.agent.core.context.ids.DistributedTraceId#id
DistributedTraceId, the distributed trace identifier, represents one distributed call chain.
package org.apache.skywalking.apm.agent.core.context.ids;

import lombok.EqualsAndHashCode;
import lombok.Getter;
import lombok.RequiredArgsConstructor;
import lombok.ToString;

/**
 * The <code>DistributedTraceId</code> presents a distributed call chain.
 * <p>
 * This call chain has a unique (service) entrance,
 * <p>
 * such as: Service : http://www.skywalking.com/cust/query, all the remote, called behind this service, rest remote, db
 * executions, are using the same <code>DistributedTraceId</code> even in different JVM.
 * <p>
 * The <code>DistributedTraceId</code> contains only one string, and can NOT be reset, creating a new instance is the
 * only option.
 */
@RequiredArgsConstructor
@ToString
@EqualsAndHashCode
public abstract class DistributedTraceId {
    @Getter
    private final String id;
}
DistributedTraceId has two subclasses, PropagatedTraceId and NewDistributedTraceId.
PropagatedTraceId
org.apache.skywalking.apm.agent.core.context.ids.PropagatedTraceId
PropagatedTraceId, the propagated trace identifier, represents a DistributedTraceId that was propagated from the peer.
package org.apache.skywalking.apm.agent.core.context.ids;

/**
 * The <code>PropagatedTraceId</code> represents a {@link DistributedTraceId}, which is propagated from the peer.
 */
public class PropagatedTraceId extends DistributedTraceId {
    public PropagatedTraceId(String id) {
        // Pass the received traceId through unchanged
        super(id);
    }
}
NewDistributedTraceId
org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId
NewDistributedTraceId, the new distributed trace identifier, is a DistributedTraceId with a newly generated id.
Its default constructor calls GlobalIdGenerator.generate() to generate a new global id, which becomes the traceId.
package org.apache.skywalking.apm.agent.core.context.ids;

/**
 * The <code>NewDistributedTraceId</code> is a {@link DistributedTraceId} with a new generated id.
 */
public class NewDistributedTraceId extends DistributedTraceId {
    public NewDistributedTraceId() {
        // Generate a new global id, which becomes the traceId
        super(GlobalIdGenerator.generate());
    }
}
GlobalIdGenerator.generate()
org.apache.skywalking.apm.agent.core.context.ids.GlobalIdGenerator#generate
GlobalIdGenerator is the global id generator. This method generates a new global id; it is where the traceId is actually produced.
package org.apache.skywalking.apm.agent.core.context.ids;

import java.util.UUID;

import org.apache.skywalking.apm.util.StringUtil;

public final class GlobalIdGenerator {
    // Id of the application instance (process)
    private static final String PROCESS_ID = UUID.randomUUID().toString().replaceAll("-", "");
    // Per-thread context that holds the id sequence
    private static final ThreadLocal<IDContext> THREAD_ID_SEQUENCE = ThreadLocal.withInitial(
        () -> new IDContext(System.currentTimeMillis(), (short) 0));

    private GlobalIdGenerator() {
    }

    /**
     * Generate a new id, combined by three parts.
     * <p>
     * The first one represents application instance id.
     * <p>
     * The second one represents thread id.
     * <p>
     * The third one also has two parts, 1) a timestamp, measured in milliseconds 2) a seq, in current thread, between
     * 0(included) and 9999(included)
     *
     * @return unique id to represent a trace or segment
     */
    public static String generate() {
        return StringUtil.join(
            '.',
            PROCESS_ID,
            String.valueOf(Thread.currentThread().getId()),
            String.valueOf(THREAD_ID_SEQUENCE.get().nextSeq())
        );
    }

    private static class IDContext {
        private long lastTimestamp;
        private short threadSeq;

        // Just for considering time-shift-back only.
        private long lastShiftTimestamp;
        private int lastShiftValue;

        private IDContext(long lastTimestamp, short threadSeq) {
            this.lastTimestamp = lastTimestamp;
            this.threadSeq = threadSeq;
        }

        private long nextSeq() {
            return timestamp() * 10000 + nextThreadSeq();
        }

        private long timestamp() {
            long currentTimeMillis = System.currentTimeMillis();

            if (currentTimeMillis < lastTimestamp) {
                // Just for considering time-shift-back by Ops or OS. @hanahmily 's suggestion.
                if (lastShiftTimestamp != currentTimeMillis) {
                    lastShiftValue++;
                    lastShiftTimestamp = currentTimeMillis;
                }
                return lastShiftValue;
            } else {
                lastTimestamp = currentTimeMillis;
                return lastTimestamp;
            }
        }

        private short nextThreadSeq() {
            if (threadSeq == 10000) {
                threadSeq = 0;
            }
            return threadSeq++;
        }
    }
}
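Since generate() joins three parts with dots, and the third part is timestamp * 10000 + seq, an id produced by this generator can be taken apart again. This is handy when the investigation hinges on which thread created a traceId. The parser below is a minimal sketch: `TraceIdParts` is a hypothetical helper, the sample id is made up, and the format is assumed to be exactly the one shown above. Ids propagated from other agents, and the placeholder values "N/A" and "Ignored_Trace", may not follow it. After a clock shift-back the timestamp part holds a counter rather than a wall-clock time.

```java
final class TraceIdParts {
    final String processId;
    final long threadId;
    final long timestampMillis;
    final int sequence;

    private TraceIdParts(String processId, long threadId, long timestampMillis, int sequence) {
        this.processId = processId;
        this.threadId = threadId;
        this.timestampMillis = timestampMillis;
        this.sequence = sequence;
    }

    /** Splits "<processId>.<threadId>.<timestamp * 10000 + seq>" into its components. */
    static TraceIdParts parse(String traceId) {
        String[] parts = traceId.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("not a <process>.<thread>.<seq> id: " + traceId);
        }
        long combined = Long.parseLong(parts[2]);
        return new TraceIdParts(
                parts[0],
                Long.parseLong(parts[1]),
                combined / 10000,          // millisecond timestamp (or shift counter)
                (int) (combined % 10000)); // per-thread sequence in [0, 9999]
    }

    public static void main(String[] args) {
        TraceIdParts p = parse("0123456789abcdef.246.17096107650000007"); // made-up id
        System.out.println("thread=" + p.threadId + " ts=" + p.timestampMillis + " seq=" + p.sequence);
    }
}
```

The middle component is the value of Thread.getId() on the thread that called generate(), which ties a traceId back to the thread that created it.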
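Hands-On Cases
Real knowledge comes from practice!
Without understanding the underlying implementation, it would be hard to come up with these interception points.
monitor/watch/trace commands - Arthas command list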
// [Interception point] creation of a new traceId + threads whose name starts with wanda_event
stack org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId <init> '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")'
watch org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId <init> '{target, returnObj}' '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")' -x 6

// [Interception point] reading the global traceId + threads whose name starts with wanda_event
stack org.apache.skywalking.apm.agent.core.context.AbstractTracerContext getReadablePrimaryTraceId '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")'
watch org.apache.skywalking.apm.agent.core.context.AbstractTracerContext getPrimaryTraceId '{target, returnObj}' '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")' -x 6
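[Case 1] Who generates the new traceId on the wanda event thread?
Are these operations reasonable?
Arthas's stack command shows the call stack that leads to the creation of a new global traceId.
The call stack shows that the traceId generation is triggered by the Guava EventBus subscriber method Subscriber.invokeSubscriberMethod.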
[arthas@7]$ stack org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId <init> '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")'
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 432 ms, listenerId: 5
ts=2024-03-05 11:52:45;thread_name=wanda_event-thread-1;id=f6;is_daemon=false;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@8dfe921
    @org.apache.skywalking.apm.agent.core.context.ids.NewDistributedTraceId.<init>()
        at org.apache.skywalking.apm.agent.core.context.trace.TraceSegment.<init>(TraceSegment.java:74)
        at org.apache.skywalking.apm.agent.core.context.TracingContext.<init>(TracingContext.java:122)
        at org.apache.skywalking.apm.agent.core.context.ContextManagerExtendService.createTraceContext(ContextManagerExtendService.java:91)
        at org.apache.skywalking.apm.agent.core.context.ContextManager.getOrCreate(ContextManager.java:60)
        at org.apache.skywalking.apm.agent.core.context.ContextManager.createLocalSpan(ContextManager.java:123)
        // guava-eventbus-plugin
        // the method interceptor is invoked
        at org.apache.skywalking.apm.plugin.guava.eventbus.EventBusSubscriberInterceptor.beforeMethod(EventBusSubscriberInterceptor.java:38)
        at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInterWithOverrideArgs.intercept(InstMethodsInterWithOverrideArgs.java:75)
        // the original method
        at com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:-1)
        at com.google.common.eventbus.Subscriber$SynchronizedSubscriber.invokeSubscriberMethod(Subscriber.java:145)
        at com.google.common.eventbus.Subscriber$1.run(Subscriber.java:73)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
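This behaviour comes from the bytecode changes made by EventBusSubscriberInstrumentation in the apm-guava-eventbus-plugin plugin.
[Case 2] Within the wanda event thread's trace segment, where is the traceId read?
Are these operations reasonable?
The method LeoaoJsonLayout.addCustomDataToJsonMap(LeoaoJsonLayout.java:29) calls TraceContext.traceId().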
[arthas@7]$ stack org.apache.skywalking.apm.agent.core.context.AbstractTracerContext getReadablePrimaryTraceId '@java.lang.Thread@currentThread().getName().startsWith("wanda_event")'
Press Q or Ctrl+C to abort.
Affect(class count: 2 , method count: 1) cost in 423 ms, listenerId: 3
ts=2024-03-04 21:03:59;thread_name=wanda_event-thread-1;id=140;is_daemon=false;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@67fe380b
    @org.apache.skywalking.apm.agent.core.context.TracingContext.getReadablePrimaryTraceId()
        at org.apache.skywalking.apm.agent.core.context.ContextManager.getGlobalTraceId(ContextManager.java:77)
        at org.apache.skywalking.apm.toolkit.activation.trace.TraceIDInterceptor.beforeMethod(TraceIDInterceptor.java:35)
        at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.StaticMethodsInter.intercept(StaticMethodsInter.java:73)
        at org.apache.skywalking.apm.toolkit.trace.TraceContext.traceId(TraceContext.java:-1)
        // the frames above are the SkyWalking core path
        // call to TraceContext.traceId()
        at com.leoao.lpaas.logback.LeoaoJsonLayout.addCustomDataToJsonMap(LeoaoJsonLayout.java:29)
        at ch.qos.logback.contrib.json.classic.JsonLayout.toJsonMap(null:-1)
        at ch.qos.logback.contrib.json.classic.JsonLayout.toJsonMap(null:-1)
        at ch.qos.logback.contrib.json.JsonLayoutBase.doLayout(null:-1)
        at ch.qos.logback.core.encoder.LayoutWrappingEncoder.encode(LayoutWrappingEncoder.java:115)
        at ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:230)
        at ch.qos.logback.core.rolling.RollingFileAppender.subAppend(RollingFileAppender.java:235)
        at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:102)
        at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
        at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
        at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
        at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
        at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
        at ch.qos.logback.classic.Logger.filterAndLog_1(Logger.java:398)
        at ch.qos.logback.classic.Logger.info(Logger.java:583)
        // the log statement being written
        // log.info("receive event persistUserPositionEvent=[{}]", event);
        at com.lefit.wanda.domain.event.listener.PersistUserPositionEventListener.change(PersistUserPositionEventListener.java:23)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.google.common.eventbus.Subscriber.invokeSubscriberMethod$original$ToNcZpNk(Subscriber.java:88)
        at com.google.common.eventbus.Subscriber.invokeSubscriberMethod$original$ToNcZpNk$accessor$utMvob4N(Subscriber.java:-1)
        at com.google.common.eventbus.Subscriber$auxiliary$8fYqzzq0.call(null:-1)
        at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInterWithOverrideArgs.intercept(InstMethodsInterWithOverrideArgs.java:85)
        at com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:-1)
        at com.google.common.eventbus.Subscriber$SynchronizedSubscriber.invokeSubscriberMethod(Subscriber.java:145)
        at com.google.common.eventbus.Subscriber$1.run(Subscriber.java:73)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
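[Takeaway]
Choose the plugins enabled under the skywalking-agent plugins/ directory deliberately. The more components are enabled, the more complex the traces and topology become, the wider the impact, and the more unknown and uncontrollable factors come into play.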
Have fun, everyone! ˇˍˇ
简放, Hangzhou