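Lucene started as a subproject of the Apache Software Foundation's Jakarta project (it is now a top-level Apache project). It is an open-source full-text search toolkit: not a complete full-text search engine, but a framework for building one. It provides a complete query engine and indexing engine, plus part of a text analysis engine (for two Western languages, English and German). Lucene aims to give developers a simple, easy-to-use toolkit for adding full-text search to a target system, or for building a complete full-text search engine on top of it. It is supported and distributed by the Apache Software Foundation and exposes a simple yet powerful API for full-text indexing and search. In the Java world Lucene is a mature, free, open-source tool, and for years it has been the most popular free information retrieval library for Java. An information retrieval library is related to a search engine, but the two should not be confused. (Quoted from: http://baike.baidu.com/link?url=YfcwwNXbNFaYkMNZqNhk9LIyHdrSuIMsMLlO_NNm3ioxHADGUid2JnF1R9znysICj6w83zJmlpZPBJnv1mHYFK)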
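What follows is a first, minimal use of the full-text search toolkit. Unfortunately, Lucene does not segment Chinese text into words out of the box, so an additional analyzer plugin is required; that is covered in a later post.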
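To make the segmentation problem concrete, here is a minimal sketch in plain Java. It is not Lucene's analyzer: `NaiveTokenizerSketch` and `tokenize` are hypothetical names introduced only for illustration, and the tokenizer simply splits on anything that is not a letter or digit. English text falls apart into words, while a Chinese sentence, which has no spaces between words, comes back as a single token that a word-level query would never match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative only: a toy tokenizer, not Lucene's StandardAnalyzer. */
final class NaiveTokenizerSketch {

    /** Lowercases the text and splits it on runs of characters that are neither letters nor digits. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!part.isEmpty()) { // split() yields an empty first element when the text starts with a delimiter
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // English: whitespace marks the word boundaries.
        System.out.println(tokenize("I love this girl")); // prints [i, love, this, girl]

        // The same sentence in Chinese, written with Unicode escapes so the source stays ASCII.
        // There are no spaces, so the whole sentence stays one token.
        String chinese = "\u6211\u7231\u8fd9\u4e2a\u5973\u5b69";
        System.out.println(tokenize(chinese).size()); // prints 1
    }
}
```

A real Chinese analyzer has to decide where one word ends and the next begins, typically with a dictionary or a statistical model, which is why a separate plugin is needed.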
The Lucene example code below is adapted from: http://iluoxuan.iteye.com/blog/1708695
The pom.xml file:
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>cn.firewarm</groupId>
    <artifactId>testLucene</artifactId>
    <packaging>war</packaging>
    <version>0.0.1-SNAPSHOT</version>
    <name>testLucene Maven Webapp</name>
    <url>http://maven.apache.org</url>

    <repositories>
        <repository>
            <id>mine</id>
            <name>public Releases</name>
            <layout>default</layout>
            <url>http://nexus.liuyingguang.cn:8081/nexus/content/groups/public/</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>4.10.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-common</artifactId>
            <version>4.10.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-queryparser</artifactId>
            <version>4.10.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-vfs2</artifactId>
            <version>2.1</version>
        </dependency>
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.4</version>
        </dependency>
        <!-- JUnit 4 is needed by the @Test methods in the example classes
             (place those classes under src/test/java). -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <finalName>testLucene</finalName>
    </build>
</project>
The code that creates the index:
package com.search.lucene;

import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;

public class IndexFile {

    // Sample data: one entry per document, matched up by array position.
    protected String[] ids = {"1", "2"};
    protected String[] content = {"Amsterdam has lost of add cancals", "i love add this girl"};
    protected String[] city = {"Amsterdam", "Venice"};

    private Directory dir;

    /**
     * Adds the initial documents to the index.
     *
     * @throws Exception if the index directory cannot be opened or written
     */
    @Test
    public void init() throws Exception {
        String pathFile = "D://lucene/index";
        dir = FSDirectory.open(new File(pathFile));
        IndexWriter writer = getWriter();
        for (int i = 0; i < ids.length; i++) {
            Document doc = new Document();
            // StringField: indexed as a single token, not analyzed.
            doc.add(new StringField("id", ids[i], Store.YES));
            // TextField: run through the analyzer and indexed word by word.
            doc.add(new TextField("content", content[i], Store.YES));
            doc.add(new StringField("city", city[i], Store.YES));
            writer.addDocument(doc);
        }
        System.out.println("init ok?");
        // Closing the writer commits the documents to the index.
        writer.close();
    }

    /**
     * Creates the IndexWriter.
     *
     * @return an IndexWriter on the index directory, using a StandardAnalyzer
     * @throws Exception if the writer cannot be created
     */
    public IndexWriter getWriter() throws Exception {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40, analyzer);
        return new IndexWriter(dir, iwc);
    }
}
The query code:
package com.search.lucene;

import java.io.File;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

public class IndexSearch {

    /**
     * Runs a term query against the index.
     *
     * @throws Exception if the index cannot be opened or read
     */
    @Test
    public void search() throws Exception {
        String filePath = "D://lucene/index";
        Directory dir = FSDirectory.open(new File(filePath));
        IndexReader reader = DirectoryReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);

        // A TermQuery matches the exact indexed term; the query text is not analyzed.
        Term term = new Term("content", "add");
        TermQuery query = new TermQuery(term);

        // Fetch at most the 5 best-scoring hits.
        TopDocs topdocs = searcher.search(query, 5);
        ScoreDoc[] scoreDocs = topdocs.scoreDocs;
        System.out.println("total hits---" + topdocs.totalHits + " max score--" + topdocs.getMaxScore());

        for (int i = 0; i < scoreDocs.length; i++) {
            // scoreDocs[i].doc is Lucene's internal document number, not the stored "id" field.
            int doc = scoreDocs[i].doc;
            Document document = searcher.doc(doc);
            System.out.println("content====" + document.get("content"));
            System.out.println("docId--" + scoreDocs[i].doc + "---score--" + scoreDocs[i].score
                    + "---index--" + scoreDocs[i].shardIndex);
        }
        reader.close();
    }
}
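The two classes above rest on one idea, the inverted index: a map from each term to the documents that contain it. The sketch below shows that idea in plain Java with no Lucene dependency. It is a toy under stated assumptions, not how Lucene is implemented: `InvertedIndexSketch` and its methods are hypothetical names, there is no scoring, and "analysis" is reduced to lowercasing and splitting on non-word characters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Illustrative only: a toy inverted index, not Lucene's data structure. */
final class InvertedIndexSketch {

    // term -> sorted ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    /** Stores the text and records each lowercased word under the new document id. */
    int addDocument(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    /** Exact single-term lookup; like a TermQuery, the term is used as given, without analysis. */
    List<Integer> search(String term) {
        TreeSet<Integer> hits = postings.get(term);
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }

    /** Returns the stored text of a document. */
    String document(int docId) {
        return documents.get(docId);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument("Amsterdam has lost of add cancals");
        index.addDocument("i love add this girl");

        System.out.println(index.search("add"));       // prints [0, 1]
        System.out.println(index.search("Amsterdam")); // prints [] because terms were lowercased at index time
        System.out.println(index.search("amsterdam")); // prints [0]
    }
}
```

The empty result for "Amsterdam" mirrors a common surprise with the Lucene code above: StandardAnalyzer lowercases the `content` field at index time, while a `TermQuery` is not analyzed, so the query term has to be given in lowercase.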
by Liu Yingguang (刘迎光) @ 萤火虫工作室 (Firefly Studio)
mail: liuyg#liuyingguang.cn
Author's homepage: http://blog.liuyingguang.cn