Tag - Full-Text Search

lucene java full-text-search 2017-07-19 20:51:52 818

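Lucene is a subproject of the Apache Software Foundation's Jakarta project. It is an open-source full-text search toolkit: not a complete full-text search engine, but a full-text search engine architecture that provides a complete query engine and indexing engine, plus part of a text analysis engine (for two Western languages, English and German). Lucene aims to give software developers a simple, easy-to-use toolkit for adding full-text search to a target system, or for building a complete full-text search engine on top of it. Lucene is an open-source library for full-text indexing and search, supported and provided by the Apache Software Foundation, and it offers a simple yet powerful API for both. In the Java world Lucene is a mature, free, open-source tool, and it has been the most popular free Java information retrieval library of recent years. Information retrieval libraries are related to search engines, but the two should not be confused. (Excerpted from: http://baike.baidu.com/link?url=YfcwwNXbNFaYkMNZqNhk9LIyHdrSuIMsMLlO_NNm3ioxHADGUid2JnF1R9znysICj6w83zJmlpZPBJnv1mHYFK)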

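Below is a first application of the full-text search engine. Unfortunately, stock Lucene does not support Chinese word segmentation, so a plugin is needed; a later post covers this.

Code taken from: http://iluoxuan.iteye.com/blog/1708695

POM.xml file: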

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>cn.firewarm</groupId>
    <artifactId>testLucene</artifactId>
    <packaging>war</packaging>
    <version>0.0.1-SNAPSHOT</


Lucene Study (2): Chinese Word Segmentation with IK Analyzer

lucene java full-text-search chinese-word-segmentation 2017-07-19 20:50:21 869

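As the previous post said, Lucene is powerful out of the box, but unfortunately the official distribution does not support Chinese word segmentation, so another plugin is needed. Here I chose IK Analyzer for Chinese word segmentation.

Chinese word segmentation means splitting a sequence of Chinese characters into individual words: the process of regrouping a continuous character sequence into a word sequence according to some convention. In English text, spaces act as natural delimiters between words. In Chinese, only characters, sentences, and paragraphs can be separated by obvious delimiters; words have no formal delimiter. English also has the problem of delimiting phrases, but at the word level Chinese is far more complex and difficult than English.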
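To make the idea of "regrouping characters into words" concrete, here is a minimal sketch of forward maximum matching, one of the simplest dictionary-based segmentation strategies. It is a toy written for illustration only, not IK Analyzer's actual algorithm; the class name `ForwardMaxMatch` and the tiny dictionary are assumptions of mine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy dictionary-based segmenter using forward maximum matching (hypothetical example class). */
final class ForwardMaxMatch {
    private final Set<String> dictionary;
    private final int maxWordLength;

    ForwardMaxMatch(Set<String> dictionary) {
        this.dictionary = dictionary;
        int longest = 1;
        for (String word : dictionary) {
            longest = Math.max(longest, word.length());
        }
        this.maxWordLength = longest;
    }

    /** Splits text into words, always taking the longest dictionary match at the current position. */
    List<String> segment(String text) {
        List<String> words = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(text.length(), start + maxWordLength);
            // Shrink the candidate until it is a dictionary word or a single character.
            while (end > start + 1 && !dictionary.contains(text.substring(start, end))) {
                end--;
            }
            words.add(text.substring(start, end));
            start = end;
        }
        return words;
    }

    public static void main(String[] args) {
        // Assumed sample dictionary; a real segmenter ships a much larger one.
        Set<String> dict = new HashSet<>(Arrays.asList("中文", "分词", "全文", "检索", "全文检索"));
        ForwardMaxMatch segmenter = new ForwardMaxMatch(dict);
        System.out.println(segmenter.segment("中文分词和全文检索")); // [中文, 分词, 和, 全文检索]
    }
}
```

Characters that are not covered by any dictionary word (here 和) fall out as single-character tokens. The greedy choice of the longest match is also why 全文检索 is kept whole instead of being split into 全文 and 检索.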

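Comparison of Chinese word segmentation plugins: http://blog.csdn.net/chs_jdmdr/article/details/7359773

Note: IK Analyzer must be the IK Analyzer 2012FF_hf1.zip build from its download list; otherwise errors occur when it is used together with Lucene 4.10.

File location: http://pan.baidu.com/s/1hrXIeQ4

The downloaded package contains no source code, so I used only the jar; it tested without problems.

Code taken from the following link, with minor modifications: http://my.oschina.net/letiantian/blog/323887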

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import


Lucene Study (3): Searching for Multiple Keywords in One (or More) Fields

java lucene full-text-search 2017-07-19 20:48:41 922

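The previous post implemented Chinese word segmentation. In practice, the search boxes of many search engines treat a space between keywords as "or", and a keyword may appear in either the title or the content. So we now need to "search for multiple keywords in one (or more) fields". Here is how to do it.

The code below is modified from the previous post's code (key points are marked in red):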
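Before turning to Lucene itself, the intended semantics can be spelled out with a small plain-JDK sketch: every space-separated keyword is expanded across all fields, and everything is joined with OR. The output follows Lucene's classic query syntax (`field:term`, `OR`), but the class `MultiFieldOrQuery` and its `build` method are hypothetical helpers of mine, not part of Lucene and not this post's code. Keywords are not escaped here, so this is only safe for input without query-syntax special characters.

```java
import java.util.ArrayList;
import java.util.List;

/** Expands whitespace-separated keywords into an OR query over several fields (hypothetical helper). */
final class MultiFieldOrQuery {

    /** Builds a query string such as "(title:a OR content:a) OR (title:b OR content:b)". */
    static String build(String input, String... fields) {
        List<String> groups = new ArrayList<>();
        for (String keyword : input.trim().split("\\s+")) {
            if (keyword.isEmpty()) {
                continue; // blank input yields no clauses
            }
            List<String> clauses = new ArrayList<>();
            for (String field : fields) {
                clauses.add(field + ":" + keyword);
            }
            groups.add("(" + String.join(" OR ", clauses) + ")");
        }
        return String.join(" OR ", groups);
    }

    public static void main(String[] args) {
        // prints (title:lucene OR content:lucene) OR (title:java OR content:java)
        System.out.println(build("lucene  java", "title", "content"));
    }
}
```

A document matches if any keyword occurs in any of the listed fields, which is the "or" behavior described above.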

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
import o


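Enabling Cross-Origin Access in elasticsearch

elasticsearch full-text-search cross-origin cors 2017-07-19 20:47:25 951

Official documentation for reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-http.html

Add the following configuration to elasticsearch.yml: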

http.cors.enabled: true

http.cors.allow-origin: /http?:\/\/192.168.10.139(:[0-9]+)?/

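http.cors.enabled enables cross-origin access support; the default is false.

http.cors.allow-origin sets the origins allowed for cross-origin access. The value above is a regular expression; I replaced the domain name with an IP address.

Then restart the service.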
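To see which origins the regular expression above actually admits, it can be tried out with `java.util.regex`. This is a rough sketch under two assumptions of mine: that the text between the slashes is treated as a Java regular expression, and that the whole Origin value must match. The class name `CorsOriginCheck` is hypothetical.

```java
import java.util.regex.Pattern;

/** Checks Origin values against a slash-delimited regex like the one in http.cors.allow-origin. */
final class CorsOriginCheck {

    /** Strips the surrounding slashes and requires the whole origin to match (an assumption). */
    static boolean allows(String setting, String origin) {
        String regex = setting.substring(1, setting.length() - 1);
        return Pattern.compile(regex).matcher(origin).matches();
    }

    public static void main(String[] args) {
        String setting = "/http?:\\/\\/192.168.10.139(:[0-9]+)?/";
        System.out.println(allows(setting, "http://192.168.10.139"));      // true
        System.out.println(allows(setting, "http://192.168.10.139:8080")); // true
        System.out.println(allows(setting, "https://192.168.10.139"));     // false
        System.out.println(allows(setting, "http://192.168.10.140"));      // false
    }
}
```

Note that `http?` only makes the letter `p` optional, so `https://` origins are rejected. If pages served over HTTPS should be allowed too, the pattern would need `https?` instead.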

by 刘迎光@萤火虫工作室
OpenBI discussion group: 495266201
MicroService discussion group: 217722918
mail: liuyg#liuyingguang.cn
Author's homepage (==anti-crawler==): http://blog.liuyingguang.cn


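elasticsearch: Log of Creating an index and a type

elasticsearch full-text-search 2017-07-19 20:46:15 1186

To learn elasticsearch, I worked through the official documentation and some examples from the web, and wrote the steps down here for reference.

Test tool used in this post: the Firefox add-on HttpRequester.

elasticsearch: 2.3

----------------------------------------Create index---------------------------------------------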

POST http://192.168.10.139:9200/megacorp1/

Content-Type: application/json

-- response --

200 OK

Content-Type: application/json; charset=UTF-8

Content-Length: 21

{"acknowledged":true}

----------------------------------------Query index---------------------------------------------

GET http://192.168.10.139:9200/megacorp1/?pretty

-- response --

200 OK

Content-Type: application/json; charset=UTF-8

Content-Length: 361

{
  "megacorp1" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index" : {
        "creation_date" : "1467844960627",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "3luuvHNGSf2RBYJHxRGUNg",
        "version" : {
          "created" : "2030399"
        }
      }
    },
    "warmers" : { }
  }
}
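
The same two requests (create the index, then read it back) could also be sent from Java instead of HttpRequester. The following is a minimal sketch using the JDK's built-in HTTP client (Java 11 or later). The host, port, and index name come from the log above; the class `IndexRequests` and its methods are hypothetical, and the code assumes an elasticsearch node is reachable at that address.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Replays the create-index and query-index requests with the JDK HTTP client (hypothetical class). */
final class IndexRequests {

    /** Builds the index URL, e.g. http://192.168.10.139:9200/megacorp1/ */
    static URI indexUri(String host, int port, String index) {
        return URI.create("http://" + host + ":" + port + "/" + index + "/");
    }

    /** Sends one request with an empty body and returns "status body" as text. */
    static String send(String method, URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .method(method, HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + response.body();
    }

    public static void main(String[] args) throws Exception {
        URI index = indexUri("192.168.10.139", 9200, "megacorp1");
        System.out.println(send("POST", index));                         // create the index
        System.out.println(send("GET", URI.create(index + "?pretty"))); // read it back
    }
}
```

Running the POST a second time against an existing index should be expected to return an error rather than `{"acknowledged":true}`.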

----------------------------------------Create type--------------------


