Searching Documents in Elasticsearch

2015-08-17 20:20:23   Last updated: 2015-11-02 18:09:18




The previous post gave the most basic introduction to searching in Elasticsearch:

Creating and Querying Elasticsearch Documents

This post goes into more detail.

 

The request:

GET /megacorp/employee/1

 

fetches the single document identified by its index, type, and ID:

{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests": [ "sports", "music" ]
  }
}

 

 

The document itself is stored in the _source field of the response; the other fields are metadata.
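As a minimal sketch of how a client might read such a response, the Python below parses a copy of the JSON shown above and pulls the document out of _source. No request is actually sent, and the variable names are illustrative assumptions.

```python
import json

# A copy of the GET /megacorp/employee/1 response shown above.
raw_response = """
{
  "_index": "megacorp",
  "_type": "employee",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"]
  }
}
"""

response = json.loads(raw_response)

# Fields starting with an underscore are metadata; the document is in _source.
employee = response["_source"] if response["found"] else None

print(employee["first_name"], employee["last_name"], employee["age"])
```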

 

The search request:

GET /megacorp/employee/_search

 

returns all employees:

{
  "took": 6,
  "timed_out": false,
  "_shards": { ... },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "3",
        "_score": 1,
        "_source": {
          "first_name": "Douglas",
          "last_name": "Fir",
          "age": 35,
          "about": "I like to build cabinets",
          "interests": [ "forestry" ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 1,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [ "sports", "music" ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 1,
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [ "music" ]
        }
      }
    ]
  }
}

 

The response contains every matching document, along with the number of documents found (hits.total).

 

A query-string search can find the employees whose last name is Smith:

GET /megacorp/employee/_search?q=last_name:Smith

 

The q parameter is shorthand for a query_string query; its value is the query string to match.

 

{
  ...
  "hits": {
    "total": 2,
    "max_score": 0.30685282,
    "hits": [
      {
        ...
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [ "sports", "music" ]
        }
      },
      {
        ...
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [ "music" ]
        }
      }
    ]
  }
}

 

 

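  • Note: the query string is matched against whole terms, not substrings. With the default analyzer, terms are lowercased at both index and search time, so the match is not case-sensitive.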
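When the q parameter is built programmatically, the value has to be URL-encoded. The following is a minimal sketch using only the Python standard library; the helper name query_string_search_path is hypothetical and is not part of any Elasticsearch client.

```python
from urllib.parse import urlencode


def query_string_search_path(index, doc_type, field, value):
    """Build the path of a q= search (hypothetical helper, for illustration)."""
    # urlencode percent-encodes reserved characters such as ':' and spaces.
    params = urlencode({"q": f"{field}:{value}"})
    return f"/{index}/{doc_type}/_search?{params}"


path = query_string_search_path("megacorp", "employee", "last_name", "Smith")
print(path)
```

The colon between the field name and the value is encoded as %3A, which the server decodes back to last_name:Smith.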

 

Query DSL

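Elasticsearch provides a rich, flexible query language called the query DSL, which makes it possible to build complex and powerful queries. The query is sent as a JSON request body: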

GET /megacorp/employee/_search
{
  "query" : {
    "match" : {
      "last_name" : "Smith"
    }
  }
}

 

This is the query DSL equivalent of the earlier query-string search for Smith.
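Because the request body is plain JSON, it can be assembled from ordinary dictionaries. The sketch below builds the same match query in Python; match_query is a hypothetical helper name, and sending the body over HTTP is left out.

```python
import json


def match_query(field, text):
    """Return a search body with a single match query (hypothetical helper)."""
    return {"query": {"match": {field: text}}}


body = match_query("last_name", "Smith")

# Serialize to the JSON that would be sent with GET /megacorp/employee/_search.
print(json.dumps(body, indent=2))
```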

 

Filters (filter)

The following request combines a filter with a query:

GET /megacorp/employee/_search
{
  "query" : {
    "filtered" : {
      "filter" : {
        "range" : {
          "age" : { "gt" : 30 }
        }
      },
      "query" : {
        "match" : {
          "last_name" : "smith"
        }
      }
    }
  }
}

 

The range filter selects all employees older than 30; the match query then finds, among those, the employees whose last_name is smith.
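The filtered query belongs to the Elasticsearch 1.x era of this post; it was deprecated in 2.0 and removed in 5.0, where the same combination is written as a bool query with must and filter clauses. The sketch below builds both bodies side by side. The helper names are hypothetical, and which form applies depends on the version of the cluster being queried.

```python
import json


def filtered_query(query, flt):
    """Body in the 1.x style used above (hypothetical helper)."""
    return {"query": {"filtered": {"filter": flt, "query": query}}}


def bool_query(query, flt):
    """Equivalent body for versions where 'filtered' is no longer available."""
    # 'must' clauses contribute to the score; 'filter' clauses only include or exclude.
    return {"query": {"bool": {"must": query, "filter": flt}}}


age_filter = {"range": {"age": {"gt": 30}}}
name_query = {"match": {"last_name": "smith"}}

old_body = filtered_query(name_query, age_filter)
new_body = bool_query(name_query, age_filter)
print(json.dumps(new_body, indent=2))
```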

 

Full-text search

GET /megacorp/employee/_search
{
  "query" : {
    "match" : {
      "about" : "rock climbing"
    }
  }
}

 

This search returns every document whose about field is relevant to "rock climbing":

{
  ...
  "hits": {
    "total": 2,
    "max_score": 0.16273327,
    "hits": [
      {
        ...
        "_score": 0.16273327,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [ "sports", "music" ]
        }
      },
      {
        ...
        "_score": 0.016878016,
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [ "music" ]
        }
      }
    ]
  }
}

 

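Results are sorted by relevance, and the _score field gives each document's relevance score. John's document matches both "rock" and "climbing" and so ranks first; Jane's matches only "rock".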
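To make the ordering concrete, the sketch below takes the two hits from the response (reduced to the fields needed here) and compares their scores with a crude word overlap against the query. The overlap count is only an illustration of why one document is more relevant; it is not how Elasticsearch computes _score, and shared_terms is a hypothetical helper.

```python
# The two hits from the response above, trimmed to the relevant fields.
hits = [
    {"_score": 0.16273327,
     "_source": {"first_name": "John", "about": "I love to go rock climbing"}},
    {"_score": 0.016878016,
     "_source": {"first_name": "Jane", "about": "I like to collect rock albums"}},
]


def shared_terms(query, text):
    """Lowercased words that appear in both strings (illustration only)."""
    return sorted(set(query.lower().split()) & set(text.lower().split()))


scores = [hit["_score"] for hit in hits]
# Hits arrive already sorted from most to least relevant.
is_sorted_by_score = scores == sorted(scores, reverse=True)

for hit in hits:
    doc = hit["_source"]
    print(doc["first_name"], hit["_score"], shared_terms("rock climbing", doc["about"]))
```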

 

Highlighting search results

Elasticsearch can highlight the matched text in search results:

GET /megacorp/employee/_search
{
  "query" : {
    "match_phrase" : {
      "about" : "rock climbing"
    }
  },
  "highlight": {
    "fields" : {
      "about" : {}
    }
  }
}

 

Because match_phrase requires the words to appear together as a phrase, only one document is returned:

{
  ...
  "hits": {
    "total": 1,
    "max_score": 0.23013961,
    "hits": [
      {
        ...
        "_score": 0.23013961,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [ "sports", "music" ]
        },
        "highlight": {
          "about": [
            "I love to go <em>rock</em> <em>climbing</em>"
          ]
        }
      }
    ]
  }
}

 

The matching text fragments are returned in the highlight field, with each matched word wrapped in <em></em> tags.
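On the client side, the highlighted fragment is just a string containing markup. As a small sketch, the regular expressions below extract the matched words and recover the plain text from the fragment shown above; a real page would more likely render the tags as they are.

```python
import re

# The highlight fragment from the response above.
fragment = "I love to go <em>rock</em> <em>climbing</em>"

# Words that the highlighter wrapped in <em> tags.
matched_words = re.findall(r"<em>(.*?)</em>", fragment)

# The same fragment with the tags removed.
plain_text = re.sub(r"</?em>", "", fragment)

print(matched_words)
print(plain_text)
```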

 

Elasticsearch also supports aggregations over search results, similar to GROUP BY in MySQL:

GET /megacorp/employee/_search
{
  "aggs": {
    "all_interests": {
      "terms": { "field": "interests" }
    }
  }
}

 

This counts how many employees share each interest.

 

Response:

{
  ...
  "hits": { ... },
  "aggregations": {
    "all_interests": {
      "buckets": [
        { "key": "music", "doc_count": 2 },
        { "key": "forestry", "doc_count": 1 },
        { "key": "sports", "doc_count": 1 }
      ]
    }
  }
}
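To show what the buckets mean, the sketch below recomputes them locally from the three sample employees with collections.Counter. It only mimics the result of the terms aggregation on this small data set; it says nothing about how Elasticsearch computes it, and terms_buckets is a hypothetical helper.

```python
from collections import Counter

# The three sample employees, reduced to the fields used here.
employees = [
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "age": 35, "interests": ["forestry"]},
]


def terms_buckets(docs, field):
    """Count documents per distinct value of a list field (hypothetical helper)."""
    counts = Counter()
    for doc in docs:
        # set() so that a document is counted at most once per value.
        counts.update(set(doc[field]))
    # Largest count first; ties broken alphabetically by key.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


buckets = terms_buckets(employees, "interests")
for bucket in buckets:
    print(bucket)
```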

 

 

The following query computes the average age of the employees who share each interest:

GET /megacorp/employee/_search
{
  "aggs" : {
    "all_interests" : {
      "terms" : { "field" : "interests" },
      "aggs" : {
        "avg_age" : {
          "avg" : { "field" : "age" }
        }
      }
    }
  }
}

 

 

Response:

...
"all_interests": {
  "buckets": [
    {
      "key": "music",
      "doc_count": 2,
      "avg_age": { "value": 28.5 }
    },
    {
      "key": "forestry",
      "doc_count": 1,
      "avg_age": { "value": 35 }
    },
    {
      "key": "sports",
      "doc_count": 1,
      "avg_age": { "value": 25 }
    }
  ]
}
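The nested avg aggregation is computed separately inside each bucket. The sketch below reproduces the numbers above from the three sample employees in plain Python, for example (25 + 32) / 2 = 28.5 for music. The function name avg_age_by_interest is a hypothetical helper for illustration.

```python
from collections import defaultdict

# The three sample employees, reduced to the fields used here.
employees = [
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "age": 35, "interests": ["forestry"]},
]


def avg_age_by_interest(docs):
    """Group ages by interest, then average each group (hypothetical helper)."""
    ages = defaultdict(list)
    for doc in docs:
        for interest in set(doc["interests"]):
            ages[interest].append(doc["age"])
    return {interest: sum(values) / len(values) for interest, values in ages.items()}


averages = avg_age_by_interest(employees)
print(averages)
```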

 

 

The _source parameter specifies which fields of the document to return; multiple fields are separated by commas:

GET /website/blog/123?_source=title,text

 

 

The _source in the response contains only the title and text fields:

{
  "_index" : "website",
  "_type" : "blog",
  "_id" : "123",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "title": "My first blog entry",
    "text": "Just trying this out..."
  }
}
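A minimal sketch of building such a request path in Python follows. The helper name document_path is hypothetical, and it assumes the field names are simple identifiers that need no URL escaping.

```python
def document_path(index, doc_type, doc_id, source_fields=None):
    """Build a GET path, optionally limited to some _source fields (hypothetical helper)."""
    path = f"/{index}/{doc_type}/{doc_id}"
    if source_fields:
        # Multiple fields are joined with commas, as in the request above.
        path += "?_source=" + ",".join(source_fields)
    return path


print(document_path("website", "blog", 123, ["title", "text"]))
print(document_path("website", "blog", 123))
```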

 

 

The request:

GET /website/blog/123/_source

 

returns only the contents of the _source field, without any metadata:

{
  "title": "My first blog entry",
  "text": "Just trying this out...",
  "date": "2014/01/01"
}

 

 







