Preface
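In the client-side implementation of distributed tracing, we obtain Spans one after another through various mechanisms and rules. The server side of the tracing system then has to collect these Spans and assemble them into a request chain. This raises a question: how do the Spans produced by the client reach the server? Typically, each Span is sent to the server when its finish() method is called. The sending can take several forms, for example actively pushing the Span to a Kafka topic, or printing the Span as a log line that Filebeat then collects. In this series, we choose to print each Span as a trace log. How the logs are collected and how the server assembles them are outside the scope of this series.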
Main Text
The trace log format we have defined is given directly below.
{"traceId": "testTraceId", "spanId": "testSpanId", "parentSpanId": "testParentSpanId", "timestamp": "1704038400000", "duration": "10", "httpCode": "200", "host": "127.0.0.1", "requestStacks": [ {"subSpanId": "testSubSpanId", "subHttpCode": "200", "subTimestamp": "1704038401000", "subDuration": "5", "subHost": "192.168.10.5"}]
}
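The requestStacks field deserves a special note. It records information about the Spans of the downstream child nodes that the current node calls, including each child node's spanId, the HTTP status code returned by the call, and the time the call took.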
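To make the format concrete, here is a minimal Python sketch of how a client might produce one such log line. The `Span` class, the `record_downstream_call` helper, and the `end_ms` parameter are hypothetical names introduced only for illustration; they are not part of any real tracing library, and `finish()` here simply returns the JSON string so the caller can print it.

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical in-memory span; the emitted keys mirror the log format above."""
    trace_id: str
    span_id: str
    parent_span_id: str
    host: str
    start_ms: int
    request_stacks: list = field(default_factory=list)

    def record_downstream_call(self, sub_span_id, http_code, start_ms, duration_ms, host):
        """Append one requestStacks entry describing a call to a child node."""
        self.request_stacks.append({
            "subSpanId": sub_span_id,
            "subHttpCode": str(http_code),
            "subTimestamp": str(start_ms),
            "subDuration": str(duration_ms),
            "subHost": host,
        })

    def finish(self, http_code, end_ms=None):
        """Serialize the span as one trace log line (all values are strings)."""
        if end_ms is None:
            end_ms = int(time.time() * 1000)
        return json.dumps({
            "traceId": self.trace_id,
            "spanId": self.span_id,
            "parentSpanId": self.parent_span_id,
            "timestamp": str(self.start_ms),
            "duration": str(end_ms - self.start_ms),
            "httpCode": str(http_code),
            "host": self.host,
            "requestStacks": self.request_stacks,
        })


# Illustrative values only: a 10 ms span that makes one 5 ms downstream call.
span = Span("testTraceId", "testSpanId", "testParentSpanId", "127.0.0.1", 1704038400000)
span.record_downstream_call("testSubSpanId", 200, 1704038400002, 5, "192.168.10.5")
log_line = span.finish(200, end_ms=1704038400010)
print(log_line)
```

Printing the line to standard output (or through a logger) is what would let a collector such as Filebeat pick it up; that part is left out of the sketch.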
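With the trace log format settled, let's walk through a demo to show what the trace logs look like in practice. In the demo, client calls server1 and then server2 in sequence, and server1 in turn calls server3.
Assume that requests spend no time on the network and that the application logic of client and server1 itself takes no time. The trace log printed by client is then as follows.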
{"traceId": "0001","spanId": "01","parentSpanId": "0","timestamp": "1704038400000","duration": "100","httpCode": "200","host": "192.168.10.1","requestStacks": [{"subSpanId": "02","subHttpCode": "200","subTimestamp": "1704038400000","subDuration": "40","subHost": "192.168.10.2"},{"subSpanId": "04","subHttpCode": "200","subTimestamp": "1704038400040","subDuration": "60","subHost": "192.168.10.3"}]
}
For server1, the printed trace log is as follows.
{"traceId": "0001","spanId": "02","parentSpanId": "01","timestamp": "1704038400000","duration": "40","httpCode": "200","host": "192.168.10.2","requestStacks": [{"subSpanId": "03","subHttpCode": "200","subTimestamp": "1704038400000","subDuration": "40","subHost": "192.168.10.4"}]
}
For server2, the printed trace log is as follows.
{"traceId": "0001","spanId": "04","parentSpanId": "01","timestamp": "1704038400040","duration": "60","httpCode": "200","host": "192.168.10.3","requestStacks": []
}
For server3, the printed trace log is as follows.
{"traceId": "0001","spanId": "03","parentSpanId": "02","timestamp": "1704038400000","duration": "40","httpCode": "200","host": "192.168.10.4","requestStacks": []
}
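Although server-side assembly is outside the scope of this series, a small sketch helps show why the four log lines above are enough to recover the call chain. The Python below is one possible approach, not a definitive implementation: the function names are hypothetical, the log lines are reduced to the fields needed for assembly, and it assumes that a parentSpanId of "0" marks the root Span, as in the demo.

```python
import json
from collections import defaultdict

# The four demo log lines, reduced to the fields used here and given in
# arbitrary order, as a collector might deliver them.
LOG_LINES = [
    '{"traceId": "0001", "spanId": "03", "parentSpanId": "02", "timestamp": "1704038400000", "host": "192.168.10.4"}',
    '{"traceId": "0001", "spanId": "04", "parentSpanId": "01", "timestamp": "1704038400040", "host": "192.168.10.3"}',
    '{"traceId": "0001", "spanId": "01", "parentSpanId": "0", "timestamp": "1704038400000", "host": "192.168.10.1"}',
    '{"traceId": "0001", "spanId": "02", "parentSpanId": "01", "timestamp": "1704038400000", "host": "192.168.10.2"}',
]


def build_children_index(log_lines, trace_id):
    """Map each parentSpanId to its child spans (sorted by timestamp) for one trace."""
    children = defaultdict(list)
    for line in log_lines:
        span = json.loads(line)
        if span["traceId"] == trace_id:
            children[span["parentSpanId"]].append(span)
    for siblings in children.values():
        siblings.sort(key=lambda s: int(s["timestamp"]))
    return children


def render_chain(children, parent_id="0", depth=0):
    """Return the call chain as indented lines, walking depth-first from the root."""
    lines = []
    for span in children.get(parent_id, []):
        lines.append("  " * depth + f'{span["spanId"]} ({span["host"]})')
        lines.extend(render_chain(children, span["spanId"], depth + 1))
    return lines


chain = render_chain(build_children_index(LOG_LINES, "0001"))
print("\n".join(chain))
# 01 (192.168.10.1)
#   02 (192.168.10.2)
#     03 (192.168.10.4)
#   04 (192.168.10.3)
```

traceId selects the Spans that belong to one request, and parentSpanId links each Span to its caller, so the order in which the log lines arrive does not matter.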
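Summary
The core purpose of printing trace logs is to record each Span's traceId, spanId, and parentSpanId; with these three fields, a complete chain can be assembled. Beyond that, extra fields can be added as needed, such as the time-related duration and timestamp, which help pinpoint where time is spent along the chain.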
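As an illustration of how duration and subDuration can help locate where time is spent, the following Python sketch computes each node's own processing time as its duration minus the time spent waiting on downstream calls. The `self_time_ms` function is a hypothetical helper, the spans are reduced to the fields it needs, and the subtraction assumes downstream calls run one after another rather than concurrently.

```python
# Demo spans reduced to the fields needed for the calculation.
SPANS = [
    {"spanId": "01", "duration": "100",
     "requestStacks": [{"subSpanId": "02", "subDuration": "40"},
                       {"subSpanId": "04", "subDuration": "60"}]},
    {"spanId": "02", "duration": "40",
     "requestStacks": [{"subSpanId": "03", "subDuration": "40"}]},
    {"spanId": "04", "duration": "60", "requestStacks": []},
    {"spanId": "03", "duration": "40", "requestStacks": []},
]


def self_time_ms(span):
    """Time spent in the node itself: total duration minus sequential downstream calls."""
    downstream = sum(int(sub["subDuration"]) for sub in span["requestStacks"])
    return int(span["duration"]) - downstream


result = {span["spanId"]: self_time_ms(span) for span in SPANS}
print(result)
# {'01': 0, '02': 0, '04': 60, '03': 40}
```

For the demo, client (01) and server1 (02) contribute 0 ms themselves, which matches the assumption that their own logic takes no time, while server2 (04) and server3 (03) account for 60 ms and 40 ms respectively.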
Original: https://juejin.cn/post/7331959792787079177