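1. Overview
Tantivy is a full-text search engine library written in Rust and inspired by Apache Lucene.
If you are looking for an alternative to Elasticsearch or Apache Solr, check out Quickwit, our distributed search engine built on top of Tantivy.
Tantivy is closer to Apache Lucene than to Elasticsearch or Apache Solr: it is not an off-the-shelf search engine server, but a library that can be used to build one.
Tantivy performs very well; see the chart below: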
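2. Features
- Full-text search
- Configurable tokenizer (stemming available for 17 Latin languages), with third-party support for Chinese (tantivy-jieba and cang-jie), Japanese (lindera, Vaporetto, and tantivy-tokenizer-tiny-segmenter), and Korean (lindera + lindera-ko-dic-builder)
- Fast (check out the 🐎 ✨ benchmark ✨ 🐎)
- Tiny startup time (<10ms), well suited to command-line tools
- BM25 scoring (the same as Lucene)
- Natural query language (e.g. (michael AND jackson) OR "king of pop")
- Phrase queries (e.g. "michael jackson")
- Incremental indexing
- Multithreaded indexing (indexing English Wikipedia takes less than 3 minutes on my desktop)
- Mmap directory
- SIMD integer compression when the platform/CPU includes the SSE2 instruction set
- Single-valued and multivalued u64, i64, and f64 fast fields (the equivalent of doc values in Lucene)
- &[u8] fast fields
- Text, i64, u64, f64, date, ip, bool, and hierarchical facet fields
- Compressed document store (LZ4, Zstd, None)
- Range queries
- Faceted search
- Configurable indexing (optional term frequency and position indexing)
- JSON field
- Aggregation collector: histogram, range buckets, average, and stats metrics
- LogMergePolicy with deletes
- Searcher Warmer API
- Cheesy logo with a horse
Note: distributed search is out of scope for Tantivy, but if you are looking for this feature, check out Quickwit.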
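To make the BM25 entry in the list above more concrete, here is a minimal sketch of the textbook BM25 formula for a single query term, using only the Rust standard library. It is an illustration under stated assumptions, not Tantivy's or Lucene's actual implementation: the function names, the parameter values k1 = 1.2 and b = 0.75, and the corpus statistics in `main` are all assumptions chosen for the example.

```rust
/// Free parameters of BM25; 1.2 and 0.75 are commonly used defaults (assumed here).
const K1: f64 = 1.2;
const B: f64 = 0.75;

/// Inverse document frequency, with "1 +" inside the logarithm so the value
/// stays non-negative even for terms that appear in every document.
fn idf(doc_count: u64, docs_with_term: u64) -> f64 {
    let n = doc_count as f64;
    let df = docs_with_term as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of one query term to the score of one document.
fn bm25_term_score(
    term_freq: u32,      // occurrences of the term in the document
    doc_len: u32,        // document length in tokens
    avg_doc_len: f64,    // average document length in the corpus
    doc_count: u64,      // number of documents in the corpus
    docs_with_term: u64, // number of documents containing the term
) -> f64 {
    let tf = term_freq as f64;
    // Documents longer than average are penalized, shorter ones are boosted.
    let length_norm = 1.0 - B + B * (doc_len as f64 / avg_doc_len);
    idf(doc_count, docs_with_term) * tf * (K1 + 1.0) / (tf + K1 * length_norm)
}

fn main() {
    // Hypothetical statistics: 3 documents, 40 tokens on average,
    // and the term occurs once in exactly one document.
    let short_doc = bm25_term_score(1, 20, 40.0, 3, 1);
    let long_doc = bm25_term_score(1, 80, 40.0, 3, 1);
    println!("short doc: {short_doc:.4}, long doc: {long_doc:.4}");
}
```

Two properties are worth noticing: a single match counts for more in a short document than in a long one, and repeated matches raise the score with diminishing returns rather than linearly.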
3. A small Tantivy example
use tantivy::collector::TopDocs;
use tantivy::query::QueryParser;
use tantivy::schema::*;
use tantivy::{doc, Index, IndexWriter, ReloadPolicy};
use tempfile::TempDir;

fn main() -> tantivy::Result<()> {
    // Build the index in a temporary directory that is removed when dropped.
    let index_path = TempDir::new()?;

    // Schema: "title" is tokenized, indexed, and stored; "body" is tokenized and indexed only.
    let mut schema_builder = Schema::builder();
    schema_builder.add_text_field("title", TEXT | STORED);
    schema_builder.add_text_field("body", TEXT);
    let schema = schema_builder.build();

    let index = Index::create_in_dir(&index_path, schema.clone())?;

    // The writer gets a memory budget of 50 MB.
    let mut index_writer: IndexWriter = index.writer(50_000_000)?;

    let title = schema.get_field("title").unwrap();
    let body = schema.get_field("body").unwrap();

    // A document can be built field by field...
    let mut old_man_doc = TantivyDocument::default();
    old_man_doc.add_text(title, "The Old Man and the Sea");
    old_man_doc.add_text(
        body,
        "He was an old man who fished alone in a skiff in the Gulf Stream and he had gone \
         eighty-four days now without taking a fish.",
    );
    index_writer.add_document(old_man_doc)?;

    // ...or with the doc! macro.
    index_writer.add_document(doc!(
        title => "Of Mice and Men",
        body => "A few miles south of Soledad, the Salinas River drops in close to the hillside \
                 bank and runs deep and green. The water is warm too, for it has slipped twinkling \
                 over the yellow sands in the sunlight before reaching the narrow pool. On one \
                 side of the river the golden foothill slopes curve up to the strong and rocky \
                 Gabilan Mountains, but on the valley side the water is lined with trees—willows \
                 fresh and green with every spring, carrying in their lower leaf junctures the \
                 debris of the winter’s flooding; and sycamores with mottled, white, recumbent \
                 limbs and branches that arch over the pool"
    ))?;

    // The same field may appear more than once in a document (here: two titles).
    index_writer.add_document(doc!(
        title => "Frankenstein",
        title => "The Modern Prometheus",
        body => "You will rejoice to hear that no disaster has accompanied the commencement of an \
                 enterprise which you have regarded with such evil forebodings. I arrived here \
                 yesterday, and my first task is to assure my dear sister of my welfare and \
                 increasing confidence in the success of my undertaking."
    ))?;

    // Documents become searchable only after a commit.
    index_writer.commit()?;

    let reader = index
        .reader_builder()
        .reload_policy(ReloadPolicy::OnCommitWithDelay)
        .try_into()?;
    let searcher = reader.searcher();

    // Terms without a field prefix are searched in both "title" and "body".
    let query_parser = QueryParser::for_index(&index, vec![title, body]);

    // Terms are OR-ed by default, so this matches documents containing "sea" or "whale".
    let query = query_parser.parse_query("sea whale")?;
    let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;
    for (_score, doc_address) in top_docs {
        let retrieved_doc: TantivyDocument = searcher.doc(doc_address)?;
        println!("{}", retrieved_doc.to_json(&schema));
    }

    // Field-specific terms with boosts; explain() shows how the score of the top hit was computed.
    let query = query_parser.parse_query("title:sea^20 body:whale^70")?;
    let (_score, doc_address) = searcher
        .search(&query, &TopDocs::with_limit(1))?
        .into_iter()
        .next()
        .unwrap();
    let explanation = query.explain(&searcher, doc_address)?;
    println!("{}", explanation.to_pretty_json());

    Ok(())
}
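To compile the example above, the project needs the tantivy and tempfile crates as dependencies. A minimal sketch of a matching Cargo.toml follows. The package name and version numbers are assumptions that do not come from the original text; as far as I know, the `TantivyDocument` type used in the example requires tantivy 0.22 or later.

```toml
[package]
name = "tantivy-example"   # hypothetical package name
version = "0.1.0"
edition = "2021"

[dependencies]
tantivy = "0.22"   # assumed version; the example relies on TantivyDocument
tempfile = "3"     # provides TempDir for the temporary index directory
```

With this manifest and the example saved as src/main.rs, `cargo run --release` should build and run it.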
GitHub: https://github.com/quickwit-oss/tantivy