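Lucene is an open-source full-text search toolkit. It was originally a subproject of the Apache Software Foundation's Jakarta project and has since become a top-level Apache project. It is not a complete full-text search engine but a framework for building one: it provides a complete query engine and indexing engine, plus part of a text-analysis engine (covering two Western languages, English and German). Lucene's goal is to give software developers a simple, easy-to-use toolkit for adding full-text search to a target system, or for building a complete full-text search engine on top of it. Supported and distributed by the Apache Software Foundation, it offers a simple yet powerful API for full-text indexing and searching, and it is a mature, free, open-source tool in the Java ecosystem; at the time the quoted text was written it was the most popular free Java information-retrieval library. Note that an information-retrieval library, although related to a search engine, should not be confused with one. (Excerpted from: http://baike.baidu.com/link?url=YfcwwNXbNFaYkMNZqNhk9LIyHdrSuIMsMLlO_NNm3ioxHADGUid2JnF1R9znysICj6w83zJmlpZPBJnv1mHYFK)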
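To make "indexing engine" and "query engine" a little more concrete, here is a toy inverted index in plain Java (no Lucene involved). It is a simplified sketch of the core idea, mapping each term to the set of documents that contain it, and not Lucene's actual data structures or file format; the class name `ToyInvertedIndex` is hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Toy inverted index (hypothetical, for illustration only): maps each term
 * to the sorted set of document ids containing it.
 */
public class ToyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // "Indexing": split the text into lowercase terms and record docId under each term.
    public void addDocument(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) continue;
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // "Searching": a single-term lookup returns matching ids without scanning every document.
    public SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument(1, "Amsterdam has lost of add cancals");
        index.addDocument(2, "i love add this girl");
        System.out.println(index.search("add"));       // [1, 2]
        System.out.println(index.search("amsterdam")); // [1]
        System.out.println(index.search("venice"));    // []
    }
}
```

The lookup cost depends on the term, not on the number of documents scanned, which is the property that makes an inverted index useful for full-text search.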
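What follows is a first, basic use of the full-text search engine. Unfortunately, Lucene's StandardAnalyzer does not segment Chinese text into words (it splits Chinese into individual characters), so Chinese word segmentation needs an additional analyzer plugin; this will be covered later.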
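For a rough idea of what Chinese word segmentation involves, here is forward maximum matching, a classic dictionary-based technique, sketched in plain Java. It is only illustrative: the dictionary is a tiny hypothetical one, `ForwardMaxMatch` is a made-up name, and this is not necessarily the algorithm any particular Lucene plugin uses. With an empty dictionary it falls back to one token per character, similar to how StandardAnalyzer treats Chinese text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Forward maximum matching segmentation (sketch; assumes BMP characters only). */
public class ForwardMaxMatch {
    public static List<String> segment(String text, Set<String> dict, int maxLen) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Try the longest candidate first and shrink until it is a dictionary word;
            // if nothing matches, emit a single character.
            int end = Math.min(i + maxLen, text.length());
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            words.add(text.substring(i, end));
            i = end;
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("全文", "检索", "引擎", "全文检索")); // hypothetical dictionary
        System.out.println(segment("全文检索引擎", dict, 4));                     // [全文检索, 引擎]
        System.out.println(segment("全文检索引擎", Collections.emptySet(), 4));   // [全, 文, 检, 索, 引, 擎]
    }
}
```

Greedy longest-match is simple but can mis-segment ambiguous text, which is one reason production segmenters use larger dictionaries and more elaborate methods.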
Code based on: http://iluoxuan.iteye.com/blog/1708695
The pom.xml file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>cn.firewarm</groupId>
    <artifactId>testLucene</artifactId>
    <packaging>war</packaging>
    <version>0.0.1-SNAPSHOT</version>
    <name>testLucene Maven Webapp</name>
    <url>http://maven.apache.org</url>
    <repositories>
        <repository>
            <id>mine</id>
            <name>public Releases</name>
            <layout>default</layout>
            <url>http://nexus.liuyingguang.cn:8081/nexus/content/groups/public/</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>4.10.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-common</artifactId>
            <version>4.10.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-queryparser</artifactId>
            <version>4.10.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-vfs2</artifactId>
            <version>2.1</version>
        </dependency>
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.4</version>
        </dependency>
        <!-- JUnit is required by the @Test classes below (put them under src/test/java) -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
    <build>
        <finalName>testLucene</finalName>
    </build>
</project>
Code to create the index:
package com.search.lucene;

import java.io.File;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;

public class IndexFile {
    protected String[] ids = {"1", "2"};
    protected String[] content = {"Amsterdam has lost of add cancals", "i love add this girl"};
    protected String[] city = {"Amsterdam", "Venice"};
    private Directory dir;

    /**
     * Adds the initial documents to the index.
     */
    @Test
    public void init() throws Exception {
        String pathFile = "D://lucene/index";
        dir = FSDirectory.open(new File(pathFile));
        // try-with-resources closes (and thereby commits) the writer even if indexing fails
        try (IndexWriter writer = getWriter()) {
            for (int i = 0; i < ids.length; i++) {
                Document doc = new Document();
                // StringField: indexed as one untokenized term
                doc.add(new StringField("id", ids[i], Store.YES));
                // TextField: tokenized by the analyzer
                doc.add(new TextField("content", content[i], Store.YES));
                doc.add(new StringField("city", city[i], Store.YES));
                writer.addDocument(doc);
            }
        } finally {
            dir.close();
        }
        System.out.println("init ok");
    }

    /**
     * Creates an IndexWriter for the index directory.
     */
    public IndexWriter getWriter() throws Exception {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40, analyzer);
        // CREATE replaces any existing index, so re-running init() does not duplicate documents
        iwc.setOpenMode(OpenMode.CREATE);
        return new IndexWriter(dir, iwc);
    }
}
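The indexing code stores `id` and `city` as `StringField` and `content` as `TextField`. The plain-Java sketch below roughly simulates the difference in the terms each produces. Assumption: the simulated analyzer only lowercases and splits on non-alphanumeric characters; the real StandardAnalyzer also applies Unicode word-break rules and removes English stop words such as "of". `FieldTermsDemo` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Rough simulation of the terms produced by the two field types (illustration only). */
public class FieldTermsDemo {
    // TextField-like: the value is analyzed into several lowercase terms.
    public static List<String> textFieldTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // StringField-like: the whole value is one term, case preserved.
    public static List<String> stringFieldTerms(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        // [amsterdam, has, lost, of, add, cancals] (StandardAnalyzer would also drop "of")
        System.out.println(textFieldTerms("Amsterdam has lost of add cancals"));
        // [Amsterdam]
        System.out.println(stringFieldTerms("Amsterdam"));
        // false: an exact-term lookup for "amsterdam" on the city field finds nothing
        System.out.println(stringFieldTerms("Amsterdam").contains("amsterdam"));
    }
}
```

This is why a TermQuery on `content` should use the lowercase form "add", while a TermQuery on `city` must use the exact stored value "Amsterdam".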
Query code:
package com.search.lucene;

import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

public class IndexSearch {
    /**
     * Runs a term query against the index built by IndexFile.
     */
    @Test
    public void search() throws Exception {
        String filePath = "D://lucene/index";
        // try-with-resources closes the reader and then the directory
        try (Directory dir = FSDirectory.open(new File(filePath));
             IndexReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // TermQuery matches the exact indexed term; the analyzer lowercased "content"
            Term term = new Term("content", "add");
            TermQuery query = new TermQuery(term);
            // Return at most the 5 best-scoring hits
            TopDocs topdocs = searcher.search(query, 5);
            ScoreDoc[] scoreDocs = topdocs.scoreDocs;
            System.out.println("total hits: " + topdocs.totalHits + ", max score: " + topdocs.getMaxScore());
            for (ScoreDoc sd : scoreDocs) {
                Document document = searcher.doc(sd.doc);
                System.out.println("content: " + document.get("content"));
                // sd.doc is Lucene's internal document number, not the stored "id" field
                System.out.println("id: " + document.get("id") + ", docId: " + sd.doc
                        + ", score: " + sd.score + ", shardIndex: " + sd.shardIndex);
            }
        }
    }
}
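`searcher.search(query, 5)` returns at most 5 hits in `scoreDocs`, while `totalHits` counts every matching document. The plain-Java sketch below illustrates that contract only; the scores are hypothetical inputs rather than Lucene's similarity formula, and `TopNDemo` and `Hit` are made-up names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy top-N selection: all matches count, only the n best are returned. */
public class TopNDemo {
    public static final class Hit {
        public final int doc;
        public final float score;
        public Hit(int doc, float score) { this.doc = doc; this.score = score; }
        @Override public String toString() { return "doc=" + doc + " score=" + score; }
    }

    /** Returns at most n hits, highest score first; ties broken by lower doc id. */
    public static List<Hit> topN(List<Hit> allMatches, int n) {
        List<Hit> sorted = new ArrayList<>(allMatches);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparingInt(h -> h.doc));
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        List<Hit> matches = new ArrayList<>();
        for (int doc = 0; doc < 8; doc++) {
            matches.add(new Hit(doc, (doc % 3) + 0.5f)); // hypothetical scores
        }
        List<Hit> top = topN(matches, 5);
        System.out.println("totalHits=" + matches.size() + ", returned=" + top.size()); // totalHits=8, returned=5
        System.out.println(top);
    }
}
```

With only two documents in the example index, both limits coincide, but on a larger index `totalHits` can exceed the length of `scoreDocs`, so loops should iterate over `scoreDocs` rather than up to `totalHits`.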
by 刘迎光 (Liu Yingguang) @ 萤火虫工作室 (Firefly Studio)
OpenBI discussion group: 495266201
MicroService discussion group: 217722918
mail: liuyg#liuyingguang.cn
Author's blog (link included to deter content scrapers): http://blog.liuyingguang.cn