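Preface
- Background
During Jenkins deployments, API calls always fail for a few seconds. The logs show that traffic is being routed to service instances that have already gone offline. Even after the restart script was changed to call the Nacos deregister-instance API before stopping the application, the problem still appears briefly. The project is built on Spring Cloud Alibaba and uses OpenFeign for calls between microservices, so the suspected cause is the LoadBalancer cache.
- Dependency versions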
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.alibaba.cloud</groupId>
            <artifactId>spring-cloud-alibaba-dependencies</artifactId>
            <version>2021.0.1.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-dependencies</artifactId>
            <version>2.6.3</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>2021.0.1</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>com.alibaba.cloud</groupId>
        <artifactId>spring-cloud-starter-alibaba-nacos-discovery</artifactId>
        <exclusions>
            <exclusion>
                <groupId>org.springframework.cloud</groupId>
                <artifactId>spring-cloud-starter-netflix-ribbon</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-openfeign</artifactId>
        <version>3.1.1</version>
    </dependency>
</dependencies>
- LoadBalancer configuration
spring:
  cloud:
    loadbalancer:
      # Requires the Spring Retry dependency
      retry:
        enabled: true
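How Spring Cloud LoadBalancer caching works
- At application startup, a Caffeine-backed first-level cache is configured first. It caches service instances, which reduces load on the registry and improves performance.
As the auto-configuration source shows, the first-level cache can be turned off with spring.cloud.loadbalancer.cache.enabled; it is enabled by default.
- The first time Feign asks the load balancer for service instances, the logic that assembles the ServiceInstanceListSupplier is triggered.
From then on, service instances are read from the first-level cache.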
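The effect of such a cache on a freshly deregistered instance can be illustrated with a toy model. The sketch below is not Spring Cloud's implementation: `TtlInstanceCache`, the injected clock, the registry map, and the addresses are all hypothetical, and it assumes a simple write-time TTL of 35 seconds. It only shows why a caller keeps seeing an offline instance until the entry expires or is evicted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Toy stand-in for a TTL-based instance cache (hypothetical, not a Spring Cloud class). */
final class TtlInstanceCache {
    private static final class Entry {
        final List<String> instances;
        final long loadedAtMillis;

        Entry(List<String> instances, long loadedAtMillis) {
            this.instances = instances;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;
    private final Function<String, List<String>> registry;

    TtlInstanceCache(long ttlMillis, LongSupplier clock, Function<String, List<String>> registry) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.registry = registry;
    }

    /** Returns the cached list while the entry is fresh; otherwise reloads it from the registry. */
    List<String> get(String serviceId) {
        long now = clock.getAsLong();
        Entry entry = entries.get(serviceId);
        if (entry == null || now - entry.loadedAtMillis >= ttlMillis) {
            entry = new Entry(new ArrayList<>(registry.apply(serviceId)), now);
            entries.put(serviceId, entry);
        }
        return entry.instances;
    }

    /** Drops the entry so the next get() goes back to the registry. */
    void evict(String serviceId) {
        entries.remove(serviceId);
    }
}

final class StaleCacheDemo {
    public static void main(String[] args) {
        long[] now = {0L}; // fake clock in milliseconds
        Map<String, List<String>> registry = new HashMap<>();
        registry.put("product-service", List.of("10.0.0.1:8080", "10.0.0.2:8080"));
        TtlInstanceCache cache = new TtlInstanceCache(35_000L, () -> now[0], registry::get);

        System.out.println(cache.get("product-service")); // first call loads both instances

        // 10.0.0.1 is deregistered, but the cached entry is only 5 seconds old.
        registry.put("product-service", List.of("10.0.0.2:8080"));
        now[0] = 5_000L;
        System.out.println(cache.get("product-service")); // still served from the stale entry

        // Evicting the entry forces a reload on the next lookup.
        cache.evict("product-service");
        System.out.println(cache.get("product-service"));
    }
}
```

In this model the stale window lasts until either the TTL elapses or something calls `evict`, which is the idea behind the solutions discussed next.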
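Solutions
The source analysis above points to the root cause: after an application goes offline in Nacos, the LoadBalancer first-level cache does not remove the offline instance. There are several ways to fix this:
- After the restart script deregisters the instance from Nacos, wait for the first-level cache to expire (35s by default) before restarting the application
- Disable the first-level cache (not recommended)
- Listen for Nacos offline events and remove the instance from the cache manually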
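Implementation
- Chosen solution
Listen for Nacos offline events and remove the instance from the cache manually.
- Code
- Approach
Subscribe in Nacos to the service names (serviceName) whose cache entries need to be removed. When an application that is going offline calls the Nacos deregister-instance API, the Nacos server triggers the custom subscription callback.
- Nacos subscription source analysis
As the Nacos subscription source shows, by default only the current service name is subscribed. This is why the following code did not trigger the callback when other applications took themselves offline.
- Write the Nacos subscription for the specified service and the logic that removes its cached instances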
package com.chimelong.common.feign.listener;

import com.alibaba.cloud.nacos.NacosDiscoveryProperties;
import com.alibaba.cloud.nacos.NacosServiceManager;
import com.alibaba.nacos.api.naming.NamingService;
import com.alibaba.nacos.api.naming.listener.NamingEvent;
import lombok.SneakyThrows;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.boot.autoconfigure.AutoConfigureAfter;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.cache.Cache;
import org.springframework.cloud.loadbalancer.cache.LoadBalancerCacheManager;
import org.springframework.cloud.loadbalancer.cache.LoadBalancerCacheProperties;
import org.springframework.cloud.loadbalancer.core.CachingServiceInstanceListSupplier;
import org.springframework.context.annotation.Configuration;

import javax.annotation.Resource;
import java.util.Arrays;

/**
 * @description Nacos application listener
 * @date 2024/7/29
 */
@Configuration
// Matches only when the property is explicitly set to "true";
// add matchIfMissing = true to also activate when the property is left at its default.
@ConditionalOnProperty(name = "spring.cloud.loadbalancer.cache.enabled", havingValue = "true")
// Only honored when this class is itself registered as an auto-configuration.
@AutoConfigureAfter(LoadBalancerCacheProperties.class)
public class NacosInstanceListener implements InitializingBean {

    @Resource
    private NacosServiceManager nacosServiceManager;

    @Resource
    private NacosDiscoveryProperties properties;

    @Resource
    private LoadBalancerCacheManager caffeineLoadBalancerCacheManager;

    @Override
    @SneakyThrows
    public void afterPropertiesSet() {
        NamingService namingService = nacosServiceManager.getNamingService(properties.getNacosProperties());
        // Subscribe to the service whose cached instances should be removed when it changes.
        namingService.subscribe("xxx-product-xxx", properties.getGroup(),
                Arrays.asList(properties.getClusterName()), event -> {
                    if (event instanceof NamingEvent) {
                        NamingEvent namingEvent = (NamingEvent) event;
                        String svrName = namingEvent.getServiceName();
                        // Look up the LoadBalancer instance cache and evict the entry for this service.
                        Cache cache = caffeineLoadBalancerCacheManager
                                .getCache(CachingServiceInstanceListSupplier.SERVICE_INSTANCE_CACHE_NAME);
                        if (cache != null) {
                            cache.evict(svrName);
                        }
                        System.out.println(event);
                    }
                });
    }
}
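The listener above subscribes to a single hard-coded service name. The subscribe-then-evict pattern, and the reason unsubscribed services never trigger a callback, can be seen in isolation in the following dependency-free sketch. `ToyRegistry`, `SubscribeAndEvictDemo`, and the service names are hypothetical stand-ins; none of this is the Nacos client API, and the cache is modelled as a plain map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy registry that notifies only the listeners subscribed to the changed service name. */
final class ToyRegistry {
    private final Map<String, List<Consumer<String>>> listeners = new HashMap<>();

    void subscribe(String serviceName, Consumer<String> listener) {
        listeners.computeIfAbsent(serviceName, key -> new ArrayList<>()).add(listener);
    }

    /** Simulates an instance of the given service being deregistered. */
    void deregister(String serviceName) {
        for (Consumer<String> listener : listeners.getOrDefault(serviceName, List.of())) {
            listener.accept(serviceName);
        }
    }
}

final class SubscribeAndEvictDemo {
    /** Subscribes to every listed service and removes its cache entry whenever it changes. */
    static void evictOnChange(ToyRegistry registry,
                              Map<String, List<String>> cache,
                              List<String> serviceNames) {
        for (String name : serviceNames) {
            registry.subscribe(name, cache::remove);
        }
    }

    public static void main(String[] args) {
        ToyRegistry registry = new ToyRegistry();
        Map<String, List<String>> cache = new HashMap<>();
        cache.put("product-service", List.of("10.0.0.1:8080"));
        cache.put("order-service", List.of("10.0.0.3:8080"));

        // Only product-service is subscribed.
        evictOnChange(registry, cache, List.of("product-service"));

        registry.deregister("order-service");   // no subscriber, cache entry untouched
        registry.deregister("product-service"); // subscriber fires, cache entry removed

        System.out.println(cache.keySet());
    }
}
```

If several downstream services need the same treatment, one option under this model is to pass all of their names to a helper like `evictOnChange`; in the real listener that would correspond to one `subscribe` call per service name.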
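- Have the service that is going offline call the Nacos deregister-instance API, then observe the result
The test shows that the callback that removes the cached service instances is triggered successfully. Because there is a delay between the Nacos API call and the moment the code above actually runs, the restart script should wait about 1 second after the Nacos API call succeeds before stopping the service.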
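The deregister-then-wait step can be sketched as follows. The original uses a restart script; Java is used here only for illustration. The sketch assumes the Nacos v1 Open API endpoint `DELETE /nacos/v1/ns/instance`, no authentication, and the default group, namespace, and cluster. `GracefulOffline`, the server address, and the instance coordinates are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: deregister an instance from Nacos, then pause before shutdown. */
final class GracefulOffline {

    /** Builds the deregistration URL (assumes the Nacos v1 Open API and default group). */
    static URI deregisterUri(String nacosAddr, String serviceName, String ip, int port) {
        String query = "serviceName=" + URLEncoder.encode(serviceName, StandardCharsets.UTF_8)
                + "&ip=" + URLEncoder.encode(ip, StandardCharsets.UTF_8)
                + "&port=" + port;
        return URI.create("http://" + nacosAddr + "/nacos/v1/ns/instance?" + query);
    }

    /** Sends the DELETE request, then sleeps so subscribers have time to evict their caches. */
    static void deregisterAndWait(String nacosAddr, String serviceName, String ip, int port,
                                  long waitMillis) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest
                .newBuilder(deregisterUri(nacosAddr, serviceName, ip, port))
                .DELETE()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Deregistration failed: HTTP " + response.statusCode());
        }
        // The article suggests roughly 1 second before the service is stopped.
        Thread.sleep(waitMillis);
    }

    public static void main(String[] args) {
        // Prints the URL that would be called; no request is sent here.
        System.out.println(deregisterUri("127.0.0.1:8848", "product-service", "10.0.0.1", 8080));
    }
}
```

A deployment would call something like `deregisterAndWait(addr, name, ip, port, 1000L)` and only then stop the process. If the instance is registered under a non-default group, namespace, or cluster, the corresponding query parameters would need to be added.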