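Table of Contents
- Introduction
- Overview of Elasticsearch Search Methods
- The Two Search Methods
- Method 1: Sending Search Parameters in the REST Request URI
- Method 2: Sending Search Parameters in the REST Request Body
- (1) Basic syntax
- (2) Returning selected fields
- (3) match query
- (4) match_phrase [phrase matching]
- (5) multi_match [multi-field matching]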
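Introduction
In the era of big data, efficient data retrieval is a core requirement for many applications. Elasticsearch, an open-source distributed search and analytics engine, offers two basic ways of sending a search, which help developers pull the information they need out of large volumes of data. Each has its own characteristics and suits different business scenarios and query needs. This article walks through both methods with examples and explanations of the requests and their responses.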
Data preparation: sample data (JSON)
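Overview of Elasticsearch Search Methods
The Two Search Methods
Elasticsearch supports two basic ways of searching: sending the search parameters in the REST request URI, and sending them in the REST request body. Understanding how the two differ and when each applies is the basis for using Elasticsearch efficiently.
Method 1: Sending Search Parameters in the REST Request URI
- Principle: the search parameters are appended to the URI as a query string and passed to the Elasticsearch server. This is simple and direct, and suits simple searches.
- Example: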
GET bank/_search?q=*&sort=account_number:asc
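- Parameters:
q=*: q is the query condition, and * is a wildcard that matches all documents.
sort=account_number:asc: sort specifies the sort order. Here the results are sorted by the account_number field in ascending order; asc means ascending and desc means descending.
- Response analysis: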
{"took" : 235,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 1000,"relation" : "eq"},"max_score" : null,"hits" : [{"_index" : "bank","_type" : "account","_id" : "0","_score" : null,"_source" : {"account_number" : 0,"balance" : 16623,"firstname" : "Bradshaw","lastname" : "Mckenzie","age" : 29,"gender" : "F","address" : "244 Columbus Place","employer" : "Euron","email" : "bradshawmckenzie@euron.com","city" : "Hobucken","state" : "CO"},"sort" : [0]},
// remaining documents omitted
]}
}
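- `took`: the time Elasticsearch spent executing the query, in milliseconds; 235 ms here.
- `timed_out`: whether the search request timed out. `false` means the query completed within the allowed time.
- `_shards`: information about the shards that were searched. `total` is the total number of shards, `successful` the number searched successfully, `skipped` the number skipped, and `failed` the number on which the search failed. Here there is 1 shard in total and it was searched successfully.
- `hits.total.value`: the number of matching documents. It is 1000 here, meaning all 1000 documents in the `bank` index match (the query selects every document). A `relation` of `eq` means the count is exact.
- `max_score`: the highest relevance score among the matching documents. It is `null` here because the results are sorted by `account_number` rather than by relevance, so scores are not computed.
- `hits.hits[].sort`: the sort values of each document (present when sorting by something other than relevance). The results are sorted by `account_number` in ascending order, so each document's `sort` value is its `account_number`.
- `hits.hits[]._score`: the relevance score of each document. It is `null` here for the same reason as `max_score`.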
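The request and the response fields described above can be tied together in a short Python sketch. It uses only the standard library and does not contact a cluster. The host `http://localhost:9200`, the helper names `build_uri_search` and `summarize`, and the trimmed sample response are assumptions made for illustration, not part of Elasticsearch.

```python
from urllib.parse import urlencode


def build_uri_search(base_url, index, query="*", sort=None):
    """Build a URI search URL; all parameters travel in the query string."""
    params = {"q": query}
    if sort is not None:
        params["sort"] = sort
    # safe="*:" keeps the wildcard and the field:order separator unescaped
    return f"{base_url}/{index}/_search?{urlencode(params, safe='*:')}"


def summarize(response):
    """Pull the fields discussed above out of a search response dict."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "total": hits["total"]["value"],
        "max_score": hits["max_score"],
        "sort_values": [hit.get("sort") for hit in hits["hits"]],
    }


# Trimmed copy of the response shown above (only the first hit is kept).
sample = {
    "took": 235,
    "timed_out": False,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 1000, "relation": "eq"},
        "max_score": None,
        "hits": [{"_id": "0", "_score": None, "sort": [0]}],
    },
}

url = build_uri_search("http://localhost:9200", "bank", "*", "account_number:asc")
summary = summarize(sample)
print(url)
print(summary)
```

To run the search for real, the URL could be fetched with any HTTP client; the response would then be parsed as JSON and passed to `summarize`.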
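Method 2: Sending Search Parameters in the REST Request Body
Principle: the search parameters are placed in the body of the HTTP request sent to the Elasticsearch server. The body uses a domain-specific language (DSL), written as JSON, to define complex query conditions, sort rules, pagination settings, and so on. This approach is flexible and can handle complex search requirements.
(1) Basic syntax
Elasticsearch provides a JSON-style DSL for executing queries. It is called the Query DSL, and it is a very comprehensive query language.
Typical structure of a query clause: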
{QUERY_NAME:{ARGUMENT:VALUE,ARGUMENT:VALUE,...}
}
When the query targets a specific field, the structure is:
{QUERY_NAME:{FIELD_NAME:{ARGUMENT:VALUE,ARGUMENT:VALUE,...} }
}
GET bank/_search
{"query": {"match_all": {}},"from": 0,"size": 5,"sort": [{"account_number": {"order": "desc"}}]
}
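// match_all matches all documents; return 5 documents starting from offset 0
query defines how to search;
- match_all is the query type [it matches every document]; many query types can be combined inside query to build complex searches;
- Besides query, other parameters such as sort and size can be passed to change the result;
- from and size together implement pagination;
- sort sets the ordering. With several sort fields, a later field only orders documents that are equal on the earlier fields; otherwise the earlier field decides.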
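As a rough illustration of how from, size, and sort fit together, the following Python sketch builds a request body for a given page. The helper name `search_body` and the 1-based page numbering are assumptions for this example; Elasticsearch itself only sees the resulting `from` and `size` values.

```python
import json


def search_body(page, size, sort=None):
    """Build a match_all request body for a 1-based page number.

    `sort` is an optional list of (field, order) pairs; later pairs only
    break ties left by earlier ones.
    """
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    body = {
        "query": {"match_all": {}},
        "from": (page - 1) * size,  # offset of the first document on the page
        "size": size,               # number of documents per page
    }
    if sort:
        body["sort"] = [{field: {"order": order}} for field, order in sort]
    return body


first_page = search_body(1, 5, [("account_number", "desc")])
third_page = search_body(3, 5, [("age", "asc"), ("account_number", "desc")])
print(json.dumps(first_page))
print(json.dumps(third_page))
```

`first_page` reproduces the body of the request shown above; `third_page` skips the first 10 documents and sorts by age first, using account_number only to order documents of the same age.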
(2) Returning selected fields
GET bank/_search
{"query": {"match_all": {}},"from": 0,"size": 5,"sort": [{"account_number": {"order": "desc"}}],"_source": ["balance","firstname"]}
Query result:
{"took" : 18,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 1000,"relation" : "eq"},"max_score" : null,"hits" : [{"_index" : "bank","_type" : "account","_id" : "999","_score" : null,"_source" : {"firstname" : "Dorothy","balance" : 6087},"sort" : [999]},{"_index" : "bank","_type" : "account","_id" : "998","_score" : null,"_source" : {"firstname" : "Letha","balance" : 16869},"sort" : [998]},{"_index" : "bank","_type" : "account","_id" : "997","_score" : null,"_source" : {"firstname" : "Combs","balance" : 25311},"sort" : [997]},{"_index" : "bank","_type" : "account","_id" : "996","_score" : null,"_source" : {"firstname" : "Andrews","balance" : 17541},"sort" : [996]},{"_index" : "bank","_type" : "account","_id" : "995","_score" : null,"_source" : {"firstname" : "Phelps","balance" : 21153},"sort" : [995]}]}
}
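The effect of `_source` on each hit can be pictured with a small local sketch in Python. The helper `filter_source` is hypothetical and only handles top-level field names; it runs on a plain dict and is not how Elasticsearch implements source filtering, which also accepts wildcard patterns and include/exclude lists.

```python
def filter_source(doc, fields):
    """Keep only the listed top-level fields of a document."""
    wanted = set(fields)
    return {key: value for key, value in doc.items() if key in wanted}


# Account 0 from the sample data shown earlier in this article.
account = {
    "account_number": 0,
    "balance": 16623,
    "firstname": "Bradshaw",
    "lastname": "Mckenzie",
    "age": 29,
    "state": "CO",
}

trimmed = filter_source(account, ["balance", "firstname"])
print(trimmed)
```

Returning fewer fields mainly reduces the size of the response; it does not change which documents match.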
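(3) match query
- Primitive (non-string) types: in "account_number": 20 the value may be written with or without quotes; on a numeric field it is an exact match either way.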
GET bank/_search
{"query": {"match": {"account_number": "20"}}
}
match returns the documents where account_number equals 20.
Query result:
{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 1,"relation" : "eq"},"max_score" : 1.0,"hits" : [{"_index" : "bank","_type" : "account","_id" : "20","_score" : 1.0,"_source" : {"account_number" : 20,"balance" : 16418,"firstname" : "Elinor","lastname" : "Ratliff","age" : 36,"gender" : "M","address" : "282 Kings Place","employer" : "Scentric","email" : "elinorratliff@scentric.com","city" : "Ribera","state" : "WA"}}]}
}
- Strings: full-text search. The query text is analyzed, and documents containing any of the resulting terms match.
GET bank/_search
{"query": {"match": {"address": "kings"}}
}
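This is a full-text search: the search condition is tokenized before matching, and the results are sorted by relevance score.
Query result: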
{"took" : 30,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 5.990829,"hits" : [{"_index" : "bank","_type" : "account","_id" : "20","_score" : 5.990829,"_source" : {"account_number" : 20,"balance" : 16418,"firstname" : "Elinor","lastname" : "Ratliff","age" : 36,"gender" : "M","address" : "282 Kings Place","employer" : "Scentric","email" : "elinorratliff@scentric.com","city" : "Ribera","state" : "WA"}},{"_index" : "bank","_type" : "account","_id" : "722","_score" : 5.990829,"_source" : {"account_number" : 722,"balance" : 27256,"firstname" : "Roberts","lastname" : "Beasley","age" : 34,"gender" : "F","address" : "305 Kings Hwy","employer" : "Quintity","email" : "robertsbeasley@quintity.com","city" : "Hayden","state" : "PA"}}]}
}
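(4) match_phrase [phrase matching]
The query text is matched as a whole phrase: its terms must appear in the field together and in the same order, rather than being matched one by one.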
GET bank/_search
{"query": {"match_phrase": {"address": "mill road"}}
}
Finds all records whose address contains the phrase mill road, and returns a relevance score for each.
Query result:
{"took" : 32,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 1,"relation" : "eq"},"max_score" : 8.926605,"hits" : [{"_index" : "bank","_type" : "account","_id" : "970","_score" : 8.926605,"_source" : {"account_number" : 970,"balance" : 19648,"firstname" : "Forbes","lastname" : "Wallace","age" : 28,"gender" : "M","address" : "990 Mill Road","employer" : "Pheast","email" : "forbeswallace@pheast.com","city" : "Lopezo","state" : "AK"}}]}
}
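The difference between match_phrase and match can be seen in the following examples:
match_phrase performs phrase matching.
match performs term matching after tokenization; for example, 990 Mill matches results that contain either 990 or Mill.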
GET bank/_search
{"query": {"match_phrase": {"address": "990 Mill"}}
}
Query result:
{"took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 1,"relation" : "eq"},"max_score" : 10.806405,"hits" : [{"_index" : "bank","_type" : "account","_id" : "970","_score" : 10.806405,"_source" : {"account_number" : 970,"balance" : 19648,"firstname" : "Forbes","lastname" : "Wallace","age" : 28,"gender" : "M","address" : "990 Mill Road","employer" : "Pheast","email" : "forbeswallace@pheast.com","city" : "Lopezo","state" : "AK"}}]}
}
Using match on the keyword sub-field:
GET bank/_search
{"query": {"match": {"address.keyword": "990 Mill"}}
}
Query result: no documents match.
{"took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 0,"relation" : "eq"},"max_score" : null,"hits" : [ ]}
}
Change the condition to "990 Mill Road":
GET bank/_search
{"query": {"match": {"address.keyword": "990 Mill Road"}}
}
One document is returned:
{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 1,"relation" : "eq"},"max_score" : 6.5032897,"hits" : [{"_index" : "bank","_type" : "account","_id" : "970","_score" : 6.5032897,"_source" : {"account_number" : 970,"balance" : 19648,"firstname" : "Forbes","lastname" : "Wallace","age" : 28,"gender" : "M","address" : "990 Mill Road","employer" : "Pheast","email" : "forbeswallace@pheast.com","city" : "Lopezo","state" : "AK"}}]}
}
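When a text field is queried through its keyword sub-field, the condition must equal the field's entire value; it is an exact match.
match_phrase performs phrase matching: as long as the text contains the phrase given in the condition, the document matches.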
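The three behaviors compared above can be approximated locally in Python. This is a simplified model, not Elasticsearch code: `analyze` is a rough stand-in for the default analyzer (lowercase, split on non-alphanumeric characters), the function names are made up for this sketch, and scoring is left out entirely.

```python
import re


def analyze(text):
    """Rough stand-in for the default analyzer: lowercase and split into terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(field_value, query):
    """match: a hit if any query term occurs in the field (OR semantics)."""
    terms = set(analyze(field_value))
    return any(term in terms for term in analyze(query))


def match_phrase(field_value, query):
    """match_phrase: all query terms must occur consecutively and in order."""
    doc, phrase = analyze(field_value), analyze(query)
    if not phrase:
        return False
    return any(
        doc[i:i + len(phrase)] == phrase
        for i in range(len(doc) - len(phrase) + 1)
    )


def keyword_match(field_value, query):
    """keyword sub-field: the whole value is one term, so only equality hits."""
    return field_value == query


addresses = ["990 Mill Road", "198 Mill Lane", "282 Kings Place"]
by_match = [a for a in addresses if match(a, "990 Mill")]
by_phrase = [a for a in addresses if match_phrase(a, "990 Mill")]
by_keyword_partial = [a for a in addresses if keyword_match(a, "990 Mill")]
by_keyword_full = [a for a in addresses if keyword_match(a, "990 Mill Road")]
print(by_match, by_phrase, by_keyword_partial, by_keyword_full)
```

With the query 990 Mill, the match model also returns 198 Mill Lane because it shares the term mill, the phrase model returns only 990 Mill Road, and the keyword model returns nothing until the full value is supplied.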
(5) multi_match [multi-field matching]
GET bank/_search
{"query": {"multi_match": {"query": "mill","fields": ["state","address"]}}
}
Matches documents whose state or address contains mill; the query text is tokenized during the search.
Query result:
{"took" : 28,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 4,"relation" : "eq"},"max_score" : 5.4032025,"hits" : [{"_index" : "bank","_type" : "account","_id" : "970","_score" : 5.4032025,"_source" : {"account_number" : 970,"balance" : 19648,"firstname" : "Forbes","lastname" : "Wallace","age" : 28,"gender" : "M","address" : "990 Mill Road","employer" : "Pheast","email" : "forbeswallace@pheast.com","city" : "Lopezo","state" : "AK"}},{"_index" : "bank","_type" : "account","_id" : "136","_score" : 5.4032025,"_source" : {"account_number" : 136,"balance" : 45801,"firstname" : "Winnie","lastname" : "Holland","age" : 38,"gender" : "M","address" : "198 Mill Lane","employer" : "Neteria","email" : "winnieholland@neteria.com","city" : "Urie","state" : "IL"}},{"_index" : "bank","_type" : "account","_id" : "345","_score" : 5.4032025,"_source" : {"account_number" : 345,"balance" : 9812,"firstname" : "Parker","lastname" : "Hines","age" : 38,"gender" : "M","address" : "715 Mill Avenue","employer" : "Baluba","email" : "parkerhines@baluba.com","city" : "Blackgum","state" : "KY"}},{"_index" : "bank","_type" : "account","_id" : "472","_score" : 5.4032025,"_source" : {"account_number" : 472,"balance" : 25571,"firstname" : "Lee","lastname" : "Long","age" : 32,"gender" : "F","address" : "288 Mill Street","employer" : "Comverges","email" : "leelong@comverges.com","city" : "Movico","state" : "MT"}}]}
}
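A local Python sketch can make the "any of these fields" behavior concrete. As before, this is a simplified model under stated assumptions: `analyze` only approximates the default analyzer, `multi_match` is a hypothetical helper that ignores scoring, and the three sample accounts are trimmed from the results shown in this article.

```python
import re


def analyze(text):
    """Rough stand-in for the default analyzer: lowercase and split into terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def multi_match(doc, query, fields):
    """A hit if at least one listed field contains at least one query term."""
    query_terms = set(analyze(query))
    return any(
        query_terms & set(analyze(str(doc.get(field, ""))))
        for field in fields
    )


accounts = [
    {"account_number": 970, "address": "990 Mill Road", "state": "AK"},
    {"account_number": 136, "address": "198 Mill Lane", "state": "IL"},
    {"account_number": 20, "address": "282 Kings Place", "state": "WA"},
]

fields = ["state", "address"]
mill_hits = [a["account_number"] for a in accounts if multi_match(a, "mill", fields)]
mill_wa_hits = [a["account_number"] for a in accounts if multi_match(a, "mill wa", fields)]
print(mill_hits)
print(mill_wa_hits)
```

The second query shows the effect of tokenizing the condition: mill wa is split into two terms, so account 20 matches through its state even though its address does not contain mill.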