ShardingSphere-JDBC Study Notes

Introduction

A short story about open-source products

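Sharding-JDBC was open-sourced in 2015, and early on it was positioned as distributed database middleware. Before it there was MyCat, which also came out of Alibaba's open-source work. For a long time MyCat was practically synonymous with splitting databases and tables. From the start, MyCat aimed to join Apache (as its name suggests, it hoped to become another Tomcat). Unfortunately it never got there, mainly because its community operation was not mature enough. ShardingSphere fulfilled that wish long ago and is now a top-level Apache project.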

ShardingSphere version evolution

ShardingSphere started in 2015 as a small middleware and has since grown into a very large project.

How does ShardingSphere execute a query that does not use the sharding key?

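In the earlier 4.x versions, such a query is split into multiple SQL statements (one per actual table shard), and each one is executed separately. In newer versions, all statements for the same actual database are combined with UNION into one large SQL statement and executed together.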
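To make the difference concrete, here is a small hypothetical Java sketch that builds the SQL text both ways for `SELECT COUNT(*) FROM user`, which has no sharding key. It is not ShardingSphere's rewrite engine. The class and method names are made up, and table names are substituted with a simple whole-word regex. The sketch joins the per-table statements with UNION ALL rather than plain UNION, because UNION would also drop duplicate rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not ShardingSphere's rewrite engine.
final class RewriteSketch {

    // Replace the logical table name (as a whole word) with an actual table name.
    static String toActual(String logicalSql, String logicalTable, String actualTable) {
        return logicalSql.replaceAll(
                "\\b" + Pattern.quote(logicalTable) + "\\b",
                Matcher.quoteReplacement(actualTable));
    }

    // 4.x style: one statement per actual table, each executed separately.
    static List<String> perTable(String logicalSql, String logicalTable, List<String> actualTables) {
        List<String> sqls = new ArrayList<>();
        for (String table : actualTables) {
            sqls.add(toActual(logicalSql, logicalTable, table));
        }
        return sqls;
    }

    // Newer style: the per-table statements of one actual database glued into a single query.
    static String perDatabase(String logicalSql, String logicalTable, List<String> actualTables) {
        return String.join(" UNION ALL ", perTable(logicalSql, logicalTable, actualTables));
    }
}
```

For example, `perDatabase("SELECT COUNT(*) FROM user", "user", List.of("user_1", "user_2"))` gives `SELECT COUNT(*) FROM user_1 UNION ALL SELECT COUNT(*) FROM user_2`. That is one round trip to the database instead of two.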

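If one actual database has to run several SQL statements, they must be executed concurrently on multiple threads. Any follow-up merging of results, such as sum or max, can then only be done by loading all results into memory and merging them there. This approach is called memory merge: it consumes memory and requires multiple threads.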

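If the statements are combined into one large SQL statement, each actual database needs only one query, and a single thread can run it. Results can then be merged one row at a time as they are read. This approach is called stream merge, and it saves a great deal of memory.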
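Below is a hypothetical sketch of the two merge styles for a SUM over several shard results. Each shard's result is modeled as a plain `Iterator<Long>`, standing in for a JDBC `ResultSet`, and the threading aspect is left out. Both methods return the same total. The difference is that memory merge materializes every row first, while stream merge keeps only the running total.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of memory merge vs. stream merge for SUM.
final class MergeSketch {

    // Memory merge: pull every row from every shard into one list, then aggregate.
    static long memorySum(List<Iterator<Long>> shardResults) {
        List<Long> allRows = new ArrayList<>();
        for (Iterator<Long> rows : shardResults) {
            while (rows.hasNext()) {
                allRows.add(rows.next()); // every row is held in memory at the same time
            }
        }
        long sum = 0;
        for (long value : allRows) {
            sum += value;
        }
        return sum;
    }

    // Stream merge: read one row at a time and fold it into the running total.
    static long streamSum(List<Iterator<Long>> shardResults) {
        long sum = 0;
        for (Iterator<Long> rows : shardResults) {
            while (rows.hasNext()) {
                sum += rows.next(); // only the current row and the total are kept
            }
        }
        return sum;
    }
}
```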

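A query that uses IN may also map to several different shards. In 4.x, ShardingSphere could not determine how many shards the IN values would map to. It therefore did not compute the shard for each value in the IN list and fell back to full routing, sending the query to every shard. That is simple to compute, but query efficiency is inevitably poor. Newer versions compute the exact set of shards.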
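As a sketch of precise IN routing, the snippet below evaluates a table-sharding expression for every value in the IN list and collects the distinct target tables. The expression is taken from the binding-table configuration later in these notes (`user_$->{Math.abs(userid.hashCode()%4).intdiv(2) +1}`) and translated to Java. The class name is hypothetical, and only table-level routing inside one database is shown.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of precise IN routing at table level.
final class InRoutingSketch {

    // Java version of: user_$->{Math.abs(userid.hashCode()%4).intdiv(2) +1}
    static String tableFor(long userid) {
        int suffix = Math.abs(Long.hashCode(userid) % 4) / 2 + 1;
        return "user_" + suffix;
    }

    // Evaluate the algorithm for every IN value and keep the distinct target tables.
    static Set<String> routeIn(List<Long> inValues) {
        Set<String> tables = new TreeSet<>();
        for (long value : inValues) {
            tables.add(tableFor(value));
        }
        return tables;
    }
}
```

For example, `userid IN (1, 4, 5)` routes only to `user_1`, whereas full routing would query both `user_1` and `user_2`.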

Supplement: core concepts behind ShardingSphere's database and table sharding

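Virtual database: At its core, ShardingSphere provides a virtual database with sharding capabilities, which is a ShardingSphereDataSource instance. The application accesses this ShardingSphereDataSource just as it would a single data source.
Actual database: A database that actually stores the data. All actual databases are contained in the ShardingSphereDataSource instance, and ShardingSphere decides which actual database each statement uses.
Logical table: The table the application operates on directly.
Actual table: A table that actually stores the data. Actual tables do not need the same name as the logical table, but they must have the same table structure, and they can be spread across different actual databases. The application can maintain the mapping between a logical table and its actual tables. By default, all actual tables are also mapped as ShardingSphere virtual tables.
Distributed primary key generation algorithm: Generates unique primary keys for a logical table. Because a logical table's data is spread across multiple actual tables, a single table's auto-increment ID cannot guarantee that IDs are unique across the logical table. ShardingSphere integrates several common distributed key generators that run locally on a single machine. SNOWFLAKE and COSID_SNOWFLAKE (snowflake algorithms) generate increasing numeric keys of type long, while UUID and NANOID generate string keys. Applications can also plug in their own key generation algorithm, for example one that obtains keys from a third-party service such as Redis or ZooKeeper.
Sharding strategy: Describes how a logical table is distributed across actual databases and actual tables. It has two parts, a database sharding strategy and a table sharding strategy, and each is made up of a sharding key and a sharding algorithm. The sharding key is the field used to split data horizontally. Without a sharding key, ShardingSphere can only perform full routing, and SQL performance will be very poor. The sharding algorithm determines which actual database and actual table a given sharding key value belongs to. Simple sharding strategies can be configured directly with Groovy inline expressions, and ShardingSphere also supports custom, more complex sharding algorithms.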
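For the SNOWFLAKE generator mentioned above, here is a minimal snowflake-style sketch using the common bit layout:

- 41 bits of milliseconds since a custom epoch
- 10 bits of worker id (compare `key-generator.props.worker.id=1` in the configurations below)
- 12 bits of per-millisecond sequence

The epoch value and class name are assumptions. This is not ShardingSphere's own implementation.

```java
// Minimal snowflake-style generator (sketch, not ShardingSphere's implementation).
final class SnowflakeSketch {
    private static final long EPOCH_MILLIS = 1_600_000_000_000L; // assumed custom epoch (Sept 2020)
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_WORKER_ID = (1L << WORKER_BITS) - 1;
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastMillis = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId must be in [0, " + MAX_WORKER_ID + "]");
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMillis) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastMillis) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // 4096 ids already issued in this millisecond: spin until the next one
                while (now <= lastMillis) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        // timestamp | worker id | sequence, packed into one positive long
        return ((now - EPOCH_MILLIS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

Because the timestamp occupies the high bits, IDs from one generator keep increasing. IDs from different workers are only roughly ordered by time.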

Other ShardingSphere-JDBC strategies

Broadcast tables

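A broadcast table is assumed to be identical in every shard. Its rows are not split between shards, and any sharding rule configured for it has no effect.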

Use case: tables that every shard needs, such as dictionary tables, whose data must be kept identical in all databases.
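A hypothetical sketch of how a broadcast table can be routed. A write is sent unchanged to every data source so all copies stay identical. A read can be answered by any single data source; picking the first one is just a simplification here. The class and method names are made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical routing sketch for a broadcast table.
final class BroadcastSketch {

    // Write: the same SQL goes to every data source (data source name -> SQL).
    static Map<String, String> routeWrite(List<String> dataSources, String sql) {
        Map<String, String> units = new LinkedHashMap<>();
        for (String ds : dataSources) {
            units.put(ds, sql);
        }
        return units;
    }

    // Read: all copies are identical, so one data source is enough.
    static String routeRead(List<String> dataSources) {
        if (dataSources.isEmpty()) {
            throw new IllegalArgumentException("no data sources");
        }
        return dataSources.get(0);
    }
}
```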

Reference configuration:

# Print SQL. The key name varies by version; in some versions it is spring.shardingsphere.props.sql-show (with a hyphen)
spring.shardingsphere.props.sql.show=true
spring.main.allow-bean-definition-overriding=true
spring.shardingsphere.datasource.names=m0,m1
spring.shardingsphere.datasource.m0.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.m0.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.m0.url=jdbc:mysql://localhost:3306/coursedb?serverTimezone=UTC
spring.shardingsphere.datasource.m0.username=root
spring.shardingsphere.datasource.m0.password=123666
spring.shardingsphere.datasource.m1.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.m1.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.m1.url=jdbc:mysql://localhost:3306/coursedb2?serverTimezone=UTC
spring.shardingsphere.datasource.m1.username=root
spring.shardingsphere.datasource.m1.password=123666
spring.shardingsphere.sharding.tables.dict.key-generator.column=dictId
spring.shardingsphere.sharding.tables.dict.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.dict.key-generator.props.worker.id=1
spring.shardingsphere.sharding.tables.dict.actual-data-nodes=m$->{0..1}.dict_$->{1..2}
spring.shardingsphere.sharding.broadcast-tables=dict

Test code:

@Test
public void dict() {
    // dictid is left null; ShardingSphere fills it with a SNOWFLAKE key
    Dict dict = new Dict();
    dict.setDictkey("1");
    dict.setDictval("true");
    dictMapper.insert(dict);

    Dict dict2 = new Dict();
    dict2.setDictkey("2");
    dict2.setDictval("false");
    dictMapper.insert(dict2);
}

// Entity mapped to the logical table "dict"
@TableName("dict")
public class Dict {
    private Long dictid;
    private String dictkey;
    private String dictval;

    @Override
    public String toString() {
        return "Dict{" + "dictId=" + dictid + ", dictkey='" + dictkey + '\'' + ", dictval='" + dictval + '\'' + '}';
    }

    public Long getDictid() { return dictid; }
    public void setDictid(Long dictid) { this.dictid = dictid; }
    public String getDictkey() { return dictkey; }
    public void setDictkey(String dictkey) { this.dictkey = dictkey; }
    public String getDictval() { return dictval; }
    public void setDictval(String dictval) { this.dictval = dictval; }
}

Test result: two rows were added to the dict table in both databases.

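Note: the rows were inserted into the dict table, not into dict_1 and dict_2. This is consistent with the point above that sharding rules configured for a broadcast table have no effect.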

Binding tables

Reference configuration:

spring.shardingsphere.props.sql.show=true
spring.shardingsphere.datasource.names=m0,m1
spring.shardingsphere.datasource.m0.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.m0.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.m0.url=jdbc:mysql://localhost:3306/coursedb?serverTimezone=UTC
spring.shardingsphere.datasource.m0.username=root
spring.shardingsphere.datasource.m0.password=123666
spring.shardingsphere.datasource.m1.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.m1.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.m1.url=jdbc:mysql://localhost:3306/coursedb2?serverTimezone=UTC
spring.shardingsphere.datasource.m1.username=root
spring.shardingsphere.datasource.m1.password=123666
# The following part is new
spring.shardingsphere.sharding.tables.user.key-generator.column=userid
spring.shardingsphere.sharding.tables.user.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.tables.user.key-generator.props.worker.id=1
spring.shardingsphere.sharding.tables.user.actual-data-nodes=m$->{0..1}.user_$->{1..2}
spring.shardingsphere.sharding.tables.user_course_info.actual-data-nodes=m$->{0..1}.user_course_info_$->{1..2}
spring.shardingsphere.sharding.tables.user.table-strategy.inline.sharding-column=userid
spring.shardingsphere.sharding.tables.user.table-strategy.inline.algorithm-expression=user_$->{Math.abs(userid.hashCode()%4).intdiv(2) +1}
spring.shardingsphere.sharding.tables.user_course_info.table-strategy.inline.sharding-column=userid
spring.shardingsphere.sharding.tables.user_course_info.table-strategy.inline.algorithm-expression=user_course_info_$->{Math.abs(userid.hashCode()%4).intdiv(2) +1}
spring.shardingsphere.sharding.binding-tables[0]=user,user_course_info

What are binding tables for?

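A production example is an order table and an order detail table. Related rows in the two tables share the same sharding key value. Under the same sharding rule, related rows are therefore always placed in the same shard, and a join can go straight to that shard. Binding tables are used very often in production. With many shards, say 64, consider how a join would be executed without a binding rule. ShardingSphere would have to route it as the Cartesian product of the two tables' shards: 64 × 64 = 4096 combinations, which no business can accept.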
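A hypothetical sketch that enumerates the actual-table SQL for a join of user and user_course_info when the query carries no sharding key. It shows the result with and without a binding rule. With a binding rule, `user_i` is only joined with `user_course_info_i`, because both tables apply the same algorithm to userid. Without it, every pairing has to be tried. The sketch assumes both logical tables have the same number n of actual tables, as in the configuration above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: which actual-table joins are generated with and without binding.
final class BindingSketch {

    // n = number of actual tables per logical table (same n for both tables here).
    static List<String> joinSqls(int n, boolean bound) {
        List<String> sqls = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= n; j++) {
                if (bound && i != j) {
                    continue; // bound tables: user_i only pairs with user_course_info_i
                }
                sqls.add("SELECT * FROM user_" + i + " u JOIN user_course_info_" + j
                        + " c ON u.userid = c.userid");
            }
        }
        return sqls;
    }
}
```

`joinSqls(64, false)` produces 4096 statements, while `joinSqls(64, true)` produces 64. When the query does include a userid condition, the bound join narrows further to a single pair.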

Master-slave mode

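Early versions call this master-slave mode, and later versions call it read/write splitting. What master-slave mode does is essentially read/write splitting: writes go to the master and reads go to the slaves. Note that ShardingSphere only routes the SQL. Copying data from the master to the slaves is left to the database's own replication, for example MySQL master-slave replication.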
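A minimal sketch of the routing decision behind read/write splitting. It assumes that any statement starting with SELECT is a read and that reads are spread across the slaves round-robin. The class is hypothetical. Real implementations usually also send some reads to the master (for example, reads right after a write, so they can see the new data), which this sketch ignores.

```java
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical read/write splitting router.
final class ReadWriteSketch {
    private final String master;
    private final List<String> slaves;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteSketch(String master, List<String> slaves) {
        this.master = master;
        this.slaves = List.copyOf(slaves);
    }

    // Returns the name of the data source the statement should run on.
    String route(String sql) {
        boolean isRead = sql.trim().toUpperCase(Locale.ROOT).startsWith("SELECT");
        if (!isRead || slaves.isEmpty()) {
            return master; // writes (and reads when there is no slave) go to the master
        }
        // round-robin over the slaves; floorMod keeps the index valid after int overflow
        return slaves.get(Math.floorMod(next.getAndIncrement(), slaves.size()));
    }
}
```

With the configuration below, `new ReadWriteSketch("m0", List.of("m1"))` sends inserts on dict to m0 and selects to m1.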

Master-slave reference configuration:

spring.shardingsphere.props.sql.show=true
spring.shardingsphere.datasource.names=m0,m1
spring.shardingsphere.datasource.m0.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.m0.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.m0.url=jdbc:mysql://localhost:3306/coursedb?serverTimezone=UTC
spring.shardingsphere.datasource.m0.username=root
spring.shardingsphere.datasource.m0.password=123666
spring.shardingsphere.datasource.m1.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.m1.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.m1.url=jdbc:mysql://localhost:3306/coursedb2?serverTimezone=UTC
spring.shardingsphere.datasource.m1.username=root
spring.shardingsphere.datasource.m1.password=123666
# Added part. Expected effect: for the dict table, writes go to m0 and reads come from m1
spring.shardingsphere.sharding.master-slave-rules.gao.master-data-source-name=m0
spring.shardingsphere.sharding.master-slave-rules.gao.slave-data-source-names[0]=m1
spring.shardingsphere.sharding.tables.dict.actual-data-nodes=gao.dict
spring.shardingsphere.sharding.tables.dict.key-generator.column=dictid
spring.shardingsphere.sharding.tables.dict.key-generator.type=snowflake
spring.shardingsphere.sharding.tables.dict.key-generator.props.worker.id=1

Data encryption

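Specified fields are encrypted, and the ciphertext is stored in a designated column of the table. Application code still works with the plaintext column. The SQL log shows, however, that the SQL sent to the actual table is rewritten to operate on the cipher column, which achieves the desired effect.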

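The benefit is that the application needs no extra encryption code. Several algorithms are supported out of the box, such as AES, MD5, SM3 and RC4.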
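A sketch of what the encryptor conceptually does with the password column, using the JDK's `javax.crypto` AES. The following choices are assumptions for illustration:

- The configured `aes.key.value` string is turned into a 128-bit key by taking the first 16 bytes of its SHA-1 digest.
- The cipher is `AES/ECB/PKCS5Padding` with Base64 output.

The ciphertext is not guaranteed to match what ShardingSphere's built-in AES encryptor produces. ECB is used here only because it is deterministic: the same plaintext always yields the same ciphertext. Determinism is what lets a condition like `WHERE password = ?` be rewritten to `WHERE password_cipher = ?`. The trade-off is that equal plaintexts are visible as equal ciphertexts.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Sketch of a deterministic AES column encryptor (not ShardingSphere's code).
final class AesColumnSketch {
    private static final String TRANSFORMATION = "AES/ECB/PKCS5Padding";
    private final SecretKeySpec key;

    AesColumnSketch(String keyValue) {
        this.key = deriveKey(keyValue);
    }

    // Assumption: 128-bit key = first 16 bytes of SHA-1(keyValue).
    private static SecretKeySpec deriveKey(String keyValue) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(keyValue.getBytes(StandardCharsets.UTF_8));
            return new SecretKeySpec(Arrays.copyOf(digest, 16), "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Value that would be stored in the cipher column (e.g. password_cipher).
    String encrypt(String plaintext) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, key);
            byte[] out = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Value handed back to the application when it reads the logical password column.
    String decrypt(String ciphertext) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, key);
            byte[] out = cipher.doFinal(Base64.getDecoder().decode(ciphertext));
            return new String(out, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```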

Reference configuration:

spring.shardingsphere.props.sql.show=true
spring.main.allow-bean-definition-overriding=true
spring.shardingsphere.datasource.names=m0,m1
spring.shardingsphere.datasource.m0.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.m0.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.m0.url=jdbc:mysql://localhost:3306/coursedb?serverTimezone=UTC
spring.shardingsphere.datasource.m0.username=root
spring.shardingsphere.datasource.m0.password=123666
spring.shardingsphere.datasource.m1.type=com.alibaba.druid.pool.DruidDataSource
spring.shardingsphere.datasource.m1.driver-class-name=com.mysql.cj.jdbc.Driver
spring.shardingsphere.datasource.m1.url=jdbc:mysql://localhost:3306/coursedb2?serverTimezone=UTC
spring.shardingsphere.datasource.m1.username=root
spring.shardingsphere.datasource.m1.password=123666
spring.shardingsphere.sharding.tables.user.actual-data-nodes=m0.user_$->{1..2}
spring.shardingsphere.sharding.tables.user.key-generator.column=userid
spring.shardingsphere.sharding.tables.user.key-generator.type=SNOWFLAKE
spring.shardingsphere.sharding.encrypt-rule.encryptors.encryptor_aes.type=aes
spring.shardingsphere.sharding.encrypt-rule.encryptors.encryptor_aes.props.aes.key.value=123456
spring.shardingsphere.sharding.encrypt-rule.tables.user.columns.password.plainColumn=password
spring.shardingsphere.sharding.encrypt-rule.tables.user.columns.password.cipherColumn=password_cipher
#spring.shardingsphere.sharding.encrypt-rule.tables.user.columns.password.assistedQueryColumn=user_assisted
spring.shardingsphere.sharding.encrypt-rule.tables.user.columns.password.encryptor=encryptor_aes

Shadow database

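A shadow database is mainly used for stress testing. Suppose your feature is finished and you need to test its performance. Ideally, the stress-test environment is identical to production. A shadow database has the same structure as the production database but holds different data. When stress-test traffic operates on the production database, ShardingSphere internally routes it to the shadow database to carry out the test. Which requests count as test traffic is decided by the configured shadow rules. Keep in mind that since the stress test runs through the production system, it will inevitably affect production performance.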
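A hypothetical sketch of the routing decision for a shadow database: requests marked as test traffic go to the shadow data source, and everything else stays on production. The marker used here (a column named `shadow` set to true) and the data source names are assumptions for illustration only. The actual rule types and configuration keys differ between ShardingSphere versions, so check the official documentation.

```java
import java.util.Map;

// Hypothetical shadow-database routing decision.
final class ShadowSketch {
    private final Map<String, String> shadowOf; // production data source -> its shadow data source

    ShadowSketch(Map<String, String> shadowOf) {
        this.shadowOf = Map.copyOf(shadowOf);
    }

    // Assumed marker: the statement's "shadow" column value is true for stress-test traffic.
    String route(String dataSource, Map<String, Object> columnValues) {
        boolean isTestTraffic = Boolean.TRUE.equals(columnValues.get("shadow"));
        if (!isTestTraffic) {
            return dataSource; // normal traffic stays on production
        }
        String shadow = shadowOf.get(dataSource);
        if (shadow == null) {
            // refuse rather than let test data leak into production
            throw new IllegalStateException("no shadow data source for " + dataSource);
        }
        return shadow;
    }
}
```

Failing loudly when no shadow data source is configured is a deliberate choice in this sketch. Silently falling back to production would write stress-test data into real tables.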

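I won't list a reference configuration here because I haven't tested it myself. If you're interested, try it out by following the official documentation.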

Summary

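When learning, focus on the idea behind each strategy, and tie it back to core concepts such as virtual databases and actual tables so it sticks. I also strongly recommend working through the official documentation and experimenting on your own. Using ShardingSphere isn't actually hard; the key is to learn it the right way. There are countless articles online covering every imaginable version, and the more configurations a beginner reads, the more confused they may get, because the configuration options change considerably with each major ShardingSphere version. So I suggest understanding why each strategy exists: the problem-solving ideas behind it are what's really valuable.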