Elasticsearch
We need to install both Elasticsearch and Kibana, because Kibana provides a Dev Tools console that makes it easy to send requests to Elasticsearch.
Installing Elasticsearch
Install it with the following docker command:
docker run -d \
  --name es \
  -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
  -e "discovery.type=single-node" \
  -v es-data:/usr/share/elasticsearch/data \
  -v es-plugins:/usr/share/elasticsearch/plugins \
  --privileged \
  --network hm-net \
  -p 9200:9200 \
  -p 9300:9300 \
  elasticsearch:7.12.1
Notes:
- ES_JAVA_OPTS sets the JVM heap; do not set it below 512m, or the container may fail to run.
- discovery.type=single-node runs Elasticsearch as a single node rather than a cluster.
- Port 9200 serves the HTTP API; port 9300 is used for inter-node cluster communication.
Installing Kibana
docker run -d \
--name kibana \
-e ELASTICSEARCH_HOSTS=http://es:9200 \
--network=hm-net \
-p 5601:5601 \
kibana:7.12.1
You can then send HTTP requests to Elasticsearch through Kibana's Dev Tools console.
Inverted Index
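The core mechanism of an inverted index is simple enough to sketch in a few lines of Java. The helper below is a toy illustration, not Elasticsearch's actual implementation: a naive whitespace tokenizer stands in for a real analyzer such as ik_smart, and each term is mapped to the ids of the documents that contain it, so a term query becomes a map lookup.

```java
import java.util.*;

public class InvertedIndexDemo {
    // build term -> sorted list of ids of documents containing the term
    static Map<String, List<Integer>> buildIndex(List<String> docs) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            // naive whitespace tokenization stands in for a real analyzer
            for (String term : docs.get(docId).toLowerCase().split("\\s+")) {
                List<Integer> postings = index.computeIfAbsent(term, k -> new ArrayList<>());
                // avoid duplicate ids when a term occurs twice in one document
                if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                    postings.add(docId);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("java is great", "learning java", "great courses");
        Map<String, List<Integer>> index = buildIndex(docs);
        System.out.println(index.get("java"));  // [0, 1]
        System.out.println(index.get("great")); // [0, 2]
    }
}
```

Searching for a term returns its posting list of document ids; intersecting the posting lists of several terms yields the documents that match all of them.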
IK Analyzer
Installation
Online installation:
docker exec -it es ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.12.1/elasticsearch-analysis-ik-7.12.1.zip
Then restart the container:
docker restart es
Offline installation:
First, inspect the plugins data volume of the Elasticsearch container installed earlier:
docker volume inspect es-plugins
The result is as follows:
[
    {
        "CreatedAt": "2024-11-06T10:06:34+08:00",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/lib/docker/volumes/es-plugins/_data",
        "Name": "es-plugins",
        "Options": null,
        "Scope": "local"
    }
]
As you can see, the Elasticsearch plugins directory is mounted at /var/lib/docker/volumes/es-plugins/_data. We need to upload the IK analyzer to this directory.
Find the IK analyzer plugin in the course materials (a zip of the 7.12.1 version), unzip it, and then upload the extracted folder to /var/lib/docker/volumes/es-plugins/_data on the virtual machine.
Restart the es container:
docker restart es
The IK analyzer provides two modes:
- ik_smart: intelligent, semantics-aware segmentation (coarser)
- ik_max_word: finest-grained segmentation
POST /_analyze
{
  "analyzer": "ik_smart",
  "text": "黑马程序员学习java太棒了"
}
Result:
{
  "tokens" : [
    {
      "token" : "黑马",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "程序员",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "学习",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "java",
      "start_offset" : 7,
      "end_offset" : 11,
      "type" : "ENGLISH",
      "position" : 3
    },
    {
      "token" : "太棒了",
      "start_offset" : 11,
      "end_offset" : 14,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}
Internet vocabulary keeps growing, so the dictionary needs to be extended. For the IK analyzer to tokenize new words correctly, its dictionary must be kept up to date, and IK provides a mechanism for extending its vocabulary.
1) Open the IK analyzer's config directory.
Note: if you used the online installation, there is no config directory by default; upload the config directory from the ik folder in the course materials to the corresponding location.
2) Add the following to the IKAnalyzer.cfg.xml configuration file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- configure your own extension dictionary here -->
    <entry key="ext_dict">ext.dic</entry>
</properties>
3) Create a file named ext.dic in the IK analyzer's config directory (you can copy an existing file in the config directory and modify it), then add the new words, one per line:
传智播客
泰裤辣
4) Restart Elasticsearch:
docker restart es
Test again; 传智播客 and 泰裤辣 are now tokenized correctly:
{
  "tokens" : [
    {
      "token" : "传智播客",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "开设",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "大学",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "真的",
      "start_offset" : 9,
      "end_offset" : 11,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "泰裤辣",
      "start_offset" : 11,
      "end_offset" : 14,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}
Summary
What does an analyzer do?
- Tokenizes documents when the inverted index is created
- Tokenizes the user's input when searching
How many modes does the IK analyzer have?
- ik_smart: intelligent segmentation, coarse-grained
- ik_max_word: finest segmentation, fine-grained
How do you add words to the IK dictionary, and how do you add stopwords?
- Configure an extension dictionary and a stopword dictionary in the IKAnalyzer.cfg.xml file in the config directory
- Add the extension words or stopwords to those dictionary files
Basic Concepts
Mapping Properties
RESTful Conventions
Index Operations
Creating an index:
Basic syntax:
- Request method: PUT
- Request path: /index-name (you choose the name)
- Request body: the mapping
Format:
PUT /index-name
{
  "mappings": {
    "properties": {
      "field1": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "field2": {
        "type": "keyword",
        "index": "false"
      },
      "field3": {
        "properties": {
          "subfield": {
            "type": "keyword"
          }
        }
      }
      // ...
    }
  }
}
Example:
PUT /hmall
{
  "mappings": {
    "properties": {
      "info": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "email": {
        "type": "keyword",
        "index": false
      },
      "name": {
        "properties": {
          "firstName": {
            "type": "keyword"
          },
          "lastName": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
Response:
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "hmall"
}
Querying an index
Basic syntax:
- Request method: GET
- Request path: /index-name
- Request body: none
Format:
GET /index-name
Example:
GET /hmall
Response:
{
  "hmall" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "email" : {
          "type" : "keyword",
          "index" : false
        },
        "info" : {
          "type" : "text",
          "analyzer" : "ik_smart"
        },
        "name" : {
          "properties" : {
            "firstName" : {
              "type" : "keyword"
            },
            "lastName" : {
              "type" : "keyword"
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "hmall",
        "creation_date" : "1731489076840",
        "number_of_replicas" : "1",
        "uuid" : "KJqNTGY9RhiqFHkliRdGcw",
        "version" : {
          "created" : "7120199"
        }
      }
    }
  }
}
Modifying an index
The inverted index structure itself is not complex, but once it changes (for example, a field's analyzer changes), every document would have to be re-indexed, which would be a disaster. For this reason, once an index is created, its existing mapping cannot be modified.
However, adding new fields to the mapping is allowed, because new fields do not affect the existing inverted index. So "modifying" an index really means adding new fields, or updating the index's basic properties.
Syntax:
PUT /index-name/_mapping
{
  "properties": {
    "newField": {
      "type": "integer"
    }
  }
}
Adding a field:
PUT /hmall/_mapping
{
  "properties": {
    "age": {
      "type": "integer"
    }
  }
}
Response:
{
  "acknowledged" : true
}
Deleting an index
Syntax:
- Request method: DELETE
- Request path: /index-name
- Request body: none
Format:
DELETE /index-name
Example:
DELETE /hmall
Response:
{
  "acknowledged" : true
}
Index operations summary:
What index operations are there?
- Create an index: PUT /index-name
- Query an index: GET /index-name
- Delete an index: DELETE /index-name
- Modify an index (add fields): PUT /index-name/_mapping
As you can see, index operations follow the RESTful style, so the API is very uniform and easy to remember.
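That uniformity can be made concrete with a few lines of Java. The helper below is purely illustrative (its names are not an Elasticsearch API); it builds the request line for each of the four operations:

```java
public class IndexRequestLines {
    // request lines for the four index operations; method names are illustrative only
    static String create(String index)   { return "PUT /" + index; }
    static String query(String index)    { return "GET /" + index; }
    static String remove(String index)   { return "DELETE /" + index; }
    static String addField(String index) { return "PUT /" + index + "/_mapping"; }

    public static void main(String[] args) {
        System.out.println(create("hmall"));   // PUT /hmall
        System.out.println(query("hmall"));    // GET /hmall
        System.out.println(remove("hmall"));   // DELETE /hmall
        System.out.println(addField("hmall")); // PUT /hmall/_mapping
    }
}
```

Note that for create, query, and delete only the HTTP verb changes; the path is simply the index name in every case.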
Document operations
Creating a document
Syntax:
POST /index-name/_doc/document-id
{
  "field1": "value1",
  "field2": "value2",
  "field3": {
    "sub1": "value3",
    "sub2": "value4"
  }
}
Example:
POST /hmall/_doc/1
{
  "info": "黑马程序员Java讲师",
  "email": "zy@itcast.cn",
  "name": {
    "firstName": "云",
    "lastName": "赵"
  }
}
Response:
{
  "_index" : "hmall",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 8,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 8,
  "_primary_term" : 1
}
Retrieving a document
Syntax:
GET /{index-name}/_doc/{id}
Example:
GET /hmall/_doc/1
Response:
{
  "_index" : "hmall",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 8,
  "_seq_no" : 8,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "info" : "黑马程序员Java讲师",
    "email" : "zy@itcast.cn",
    "name" : {
      "firstName" : "云",
      "lastName" : "赵"
    }
  }
}
Deleting a document
Syntax:
DELETE /{index-name}/_doc/{id}
Example:
DELETE /hmall/_doc/1
Response:
{
  "_index" : "hmall",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 9,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 9,
  "_primary_term" : 1
}
Updating a document
Full update
Syntax:
PUT /{index-name}/_doc/{document-id}
{
  "field1": "value1",
  "field2": "value2"
  // ...
}
Example:
# full update
PUT /hmall/_doc/1
{
  "info": "黑马程序员Python讲师",
  "email": "ls@itcast.cn",
  "name": {
    "firstName": "四",
    "lastName": "李"
  }
}
Response:
On the first run, because the document was deleted earlier, the result is created:
{
  "_index" : "hmall",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 10,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 10,
  "_primary_term" : 1
}
On the second run, the document already exists, so the result is updated:
{
  "_index" : "hmall",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 11,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 11,
  "_primary_term" : 1
}
Partial update
Syntax:
POST /{index-name}/_update/{document-id}
{
  "doc": {
    "fieldName": "new value"
  }
}
Example:
# partial update
POST /hmall/_update/1
{
  "doc": {
    "email": "ZhaoYuen@itcast.com"
  }
}
Response:
{
  "_index" : "hmall",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 12,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 12,
  "_primary_term" : 1
}
The document now contains:
{
  "_index" : "hmall",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 12,
  "_seq_no" : 12,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "info" : "黑马程序员Python讲师",
    "email" : "ZhaoYuen@itcast.com",
    "name" : {
      "firstName" : "四",
      "lastName" : "李"
    }
  }
}
Basic document operations summary
Bulk document processing
Each operation in a _bulk request begins with an action line describing what to do:

- index: create (or fully replace) a document
  - _index: the target index name
  - _id: the id of the document to operate on
  - { "field1" : "value1" }: the document content, supplied on the following line
- delete: delete a document
  - _index: the target index name
  - _id: the id of the document to operate on
- update: update a document
  - _index: the target index name
  - _id: the id of the document to operate on
  - { "doc" : {"field2" : "value2"} }: the fields to update, supplied on the following line
# bulk create
POST /_bulk
{"index": {"_index":"hmall", "_id": "3"}}
{"info": "黑马程序员C++讲师", "email": "ww@itcast.cn", "name":{"firstName": "五", "lastName":"王"}}
{"index": {"_index":"hmall", "_id": "4"}}
{"info": "黑马程序员前端讲师", "email": "zhangsan@itcast.cn", "name":{"firstName": "三", "lastName":"张"}}

# bulk delete
POST /_bulk
{"delete":{"_index":"hmall", "_id": "3"}}
{"delete":{"_index":"hmall", "_id": "4"}}
JavaRestClient
Purpose: sends requests to Elasticsearch's RESTful API from Java code.
Client initialization
1) Add the Elasticsearch RestHighLevelClient dependency to the item-service module:
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>
2) Because the Elasticsearch version managed by Spring Boot defaults to 7.17.10, we need to override it to match our server:
<properties>
    <maven.compiler.source>11</maven.compiler.source>
    <maven.compiler.target>11</maven.compiler.target>
    <elasticsearch.version>7.12.1</elasticsearch.version>
</properties>
3) Initialize the RestHighLevelClient:
RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
        HttpHost.create("http://192.168.150.101:9200")
));
4) Create a test class to verify the setup:
package com.hmall.item.es;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;

import java.io.IOException;

public class Test {
    private RestHighLevelClient client;

    // initialize the client before each test
    @BeforeEach
    void setUp() {
        this.client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.21.101:9200")
        ));
    }

    // test code
    @org.junit.jupiter.api.Test
    public void test() {
        System.out.println(client);
    }

    // release resources after each test
    @AfterEach
    void tearDown() throws IOException {
        client.close();
    }
}
Product mapping
Analyze which fields need to be included in the documents, then write the mapping accordingly:
PUT /hmall
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "price": {
        "type": "integer"
      },
      "image": {
        "type": "keyword",
        "index": false
      },
      "category": {
        "type": "keyword"
      },
      "brand": {
        "type": "keyword"
      },
      "sold": {
        "type": "integer"
      },
      "commentCount": {
        "type": "integer",
        "index": false
      },
      "isAD": {
        "type": "boolean"
      },
      "updateTime": {
        "type": "date"
      }
    }
  }
}
Index operations
Creating an index:
// create the index
@org.junit.jupiter.api.Test
public void testCreateIndex() throws IOException {
    // create the Request object
    CreateIndexRequest request = new CreateIndexRequest("items");
    // prepare the request body: the mapping JSON; XContentType.JSON declares its format
    request.source(MAPPING_JSON, XContentType.JSON);
    // send the request; RequestOptions.DEFAULT means no custom options
    client.indices().create(request, RequestOptions.DEFAULT);
}

private final String MAPPING_JSON = "{\n" +
        "  \"mappings\": {\n" +
        "    \"properties\": {\n" +
        "      \"id\": {\"type\": \"keyword\"},\n" +
        "      \"name\": {\"type\": \"text\", \"analyzer\": \"ik_smart\"},\n" +
        "      \"price\": {\"type\": \"integer\"},\n" +
        "      \"image\": {\"type\": \"keyword\", \"index\": false},\n" +
        "      \"category\": {\"type\": \"keyword\"},\n" +
        "      \"brand\": {\"type\": \"keyword\"},\n" +
        "      \"sold\": {\"type\": \"integer\"},\n" +
        "      \"commentCount\": {\"type\": \"integer\", \"index\": false},\n" +
        "      \"isAD\": {\"type\": \"boolean\"},\n" +
        "      \"updateTime\": {\"type\": \"date\"}\n" +
        "    }\n" +
        "  }\n" +
        "}";
Deleting an index
// delete the index
@org.junit.jupiter.api.Test
public void testDeleteIndex() throws IOException {
    // create the Request object
    DeleteIndexRequest request = new DeleteIndexRequest("items");
    // send the request
    client.indices().delete(request, RequestOptions.DEFAULT);
}
Querying an index
// query the index
@org.junit.jupiter.api.Test
public void testGetIndex() throws IOException {
    // create the Request object
    GetIndexRequest request = new GetIndexRequest("items");
    // send the request
    // client.indices().get(request, RequestOptions.DEFAULT);
    // a GET returns a lot of data; if you only need to check existence, use exists instead
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    System.out.println("exists = " + exists);
}
Document operations
Creating a document
Because the index structure differs somewhat from the database table structure, we define an entity class that matches the index structure:
package com.hmall.item.domain.po;

import io.swagger.annotations.ApiModel;
import io.swagger.annotations.ApiModelProperty;
import lombok.Data;

import java.time.LocalDateTime;

@Data
@ApiModel(description = "索引库实体")
public class ItemDoc {
    @ApiModelProperty("商品id")
    private String id;
    @ApiModelProperty("商品名称")
    private String name;
    @ApiModelProperty("价格(分)")
    private Integer price;
    @ApiModelProperty("商品图片")
    private String image;
    @ApiModelProperty("类目名称")
    private String category;
    @ApiModelProperty("品牌名称")
    private String brand;
    @ApiModelProperty("销量")
    private Integer sold;
    @ApiModelProperty("评论数")
    private Integer commentCount;
    @ApiModelProperty("是否是推广广告,true/false")
    private Boolean isAD;
    @ApiModelProperty("更新时间")
    private LocalDateTime updateTime;
}
API syntax
POST /{index-name}/_doc/1
{
  "name": "Jack",
  "age": 21
}
As you can see, this is very similar to the index API, with the same three steps:
- 1) Create a Request object, here an IndexRequest, because adding a document is what builds the inverted index
- 2) Prepare the request parameters, here the JSON document
- 3) Send the request
The difference is that we now call client.xxx() directly; client.indices() is no longer needed.
// create a document
@org.junit.jupiter.api.Test
public void test() throws IOException {
    // read the product from the database
    Item item = service.getById(317578);
    // convert the entity to the document type
    ItemDoc itemDoc = BeanUtil.copyProperties(item, ItemDoc.class);
    // serialize to JSON
    String jsonStr = JSONUtil.toJsonStr(itemDoc);
    // create the Request object
    IndexRequest request = new IndexRequest("items").id(itemDoc.getId());
    request.source(jsonStr, XContentType.JSON);
    // send the request
    client.index(request, RequestOptions.DEFAULT);
}
package com.hmall.item.es;

import cn.hutool.core.bean.BeanUtil;
import cn.hutool.json.JSONUtil;
import com.hmall.item.domain.po.Item;
import com.hmall.item.domain.po.ItemDoc;
import com.hmall.item.service.impl.ItemServiceImpl;
import org.apache.http.HttpHost;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

@SpringBootTest(properties = {"spring.profiles.active=local"})
public class DocTest {
    private RestHighLevelClient client;

    @Autowired
    private ItemServiceImpl service;

    // initialize the client before each test
    @BeforeEach
    void setUp() {
        this.client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.21.101:9200")
        ));
    }

    // create a document (also acts as a full update when the id already exists)
    @org.junit.jupiter.api.Test
    public void test() throws IOException {
        // read the product from the database
        Item item = service.getById(317578);
        // convert the entity to the document type
        ItemDoc itemDoc = BeanUtil.copyProperties(item, ItemDoc.class);
        // serialize to JSON
        String jsonStr = JSONUtil.toJsonStr(itemDoc);
        // create the Request object
        IndexRequest request = new IndexRequest("items").id(itemDoc.getId());
        request.source(jsonStr, XContentType.JSON);
        // send the request
        client.index(request, RequestOptions.DEFAULT);
    }

    // retrieve a document
    @org.junit.jupiter.api.Test
    public void testGet() throws IOException {
        // create the Request object
        GetRequest request = new GetRequest("items", "317578");
        // send the request
        GetResponse response = client.get(request, RequestOptions.DEFAULT);
        String json = response.getSourceAsString();
        ItemDoc bean = JSONUtil.toBean(json, ItemDoc.class);
        System.out.println("bean = " + bean);
    }

    // delete a document
    @org.junit.jupiter.api.Test
    public void testDelete() throws IOException {
        // create the Request object
        DeleteRequest request = new DeleteRequest("items", "317578");
        // send the request
        client.delete(request, RequestOptions.DEFAULT);
    }

    // partial update of a document
    @org.junit.jupiter.api.Test
    public void testUpdate() throws IOException {
        // create the Request object
        UpdateRequest request = new UpdateRequest("items", "317578");
        // doc() takes alternating field names and values, not paired objects
        request.doc("price", 10000);
        // send the request
        client.update(request, RequestOptions.DEFAULT);
    }

    // release resources after each test
    @AfterEach
    void tearDown() throws IOException {
        client.close();
    }
}
Bulk document operations
// bulk insert, page by page
@org.junit.jupiter.api.Test
public void testBulk() throws IOException {
    // paging parameters
    int size = 1000, pageNo = 1;
    // keep writing until a page comes back empty
    while (true) {
        // query one page of products that are on sale (status == 1)
        Page<Item> page = service.lambdaQuery()
                .eq(Item::getStatus, 1)
                .page(new Page<Item>(pageNo, size));
        List<Item> records = page.getRecords();
        // stop when there is no more data (check for null before calling isEmpty)
        if (records == null || records.isEmpty()) {
            return;
        }
        // create the BulkRequest
        BulkRequest request = new BulkRequest("items");
        for (Item item : records) {
            // convert to the document type ItemDoc
            ItemDoc itemDoc = BeanUtil.copyProperties(item, ItemDoc.class);
            // add an IndexRequest for each document
            request.add(new IndexRequest()
                    .id(itemDoc.getId())
                    .source(JSONUtil.toJsonStr(itemDoc), XContentType.JSON));
        }
        // send the bulk request
        client.bulk(request, RequestOptions.DEFAULT);
        // move to the next page
        pageNo++;
    }
}
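The BulkRequest built here is ultimately serialized into the same newline-delimited format shown in the Kibana _bulk example earlier: one action line, then (for index actions) one source line. A minimal sketch of that wire format (a hypothetical helper, not the Java client's actual serializer):

```java
import java.util.*;

public class BulkBodyDemo {
    // build a newline-delimited _bulk body: an action line, then the document source line
    static String bulkIndexBody(String index, Map<String, String> docsById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : docsById.entrySet()) {
            body.append("{\"index\": {\"_index\": \"").append(index)
                .append("\", \"_id\": \"").append(e.getKey()).append("\"}}\n");
            body.append(e.getValue()).append('\n'); // document source as one JSON line
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("3", "{\"info\": \"C++ instructor\"}");
        System.out.print(bulkIndexBody("hmall", docs));
    }
}
```

Batching many documents into one request like this is what makes the paged bulk import above far faster than indexing the documents one HTTP call at a time.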