Full-text search based on Elasticsearch
2022-01-27 01:30:41 【IT1124】
Catalog
- Abstract
- 1 Technology selection
- 1.1 Elasticsearch
- 1.2 Spring Boot
- 1.3 ik word segmenter
- 2 Environment preparation
- 3 Project framework
- 4 Result
- 4.1 Search page
- 4.2 Search results page
- 5 Code implementation
- 5.1 The entity for full-text retrieval
- 5.2 Client configuration
- 5.3 Business code
- 5.4 External interface
- 5.5 Pages
- 6 Summary
Abstract
Every company accumulates more and more data, and finding information in it quickly becomes a hard problem. Computer science has a dedicated field, IR (Information Retrieval), that studies how to obtain information. Search engines such as Baidu belong to this field. Implementing a search engine from scratch is very difficult, yet information search matters to every company, so developers can build their own site search engine on top of open source projects. This article uses Elasticsearch to build such an information retrieval project.
1 Technology selection
- The search engine service uses Elasticsearch
- The externally facing web service uses Spring Boot Web
1.1 Elasticsearch
Elasticsearch is a search server based on Lucene. It provides a distributed, multi-tenant full-text search engine with a RESTful web interface. Elasticsearch is developed in Java and released as open source under the Apache License, and it is a popular enterprise search engine. It is designed for cloud computing and aims at real-time search that is stable, reliable, fast, and easy to install and use.
Official clients are available in Java, .NET (C#), PHP, Python, Apache Groovy, Ruby, and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene. [1]
The most common open source search engines today are Elasticsearch and Solr, both built on Lucene. Elasticsearch is the more heavyweight of the two and also performs better in a distributed environment. Choosing between them depends on the specific business scenario and data volume. When the amount of data is small, there is no need for a Lucene-based search engine service at all; searching through a relational database is enough.
1.2 Spring Boot
Spring Boot makes it easy to create stand-alone, production-grade Spring based Applications that you can “just run”. [2]
Spring Boot is now the mainstream choice for web development in Java. Its advantage is not limited to development: it also does well in deployment and operations, and the Spring ecosystem is so influential that mature solutions exist for almost everything.
1.3 ik word segmenter
Elasticsearch does not support Chinese word segmentation out of the box, so a Chinese word segmentation plugin must be installed. Word segmentation is the foundation of Chinese information retrieval. Here we choose ik: after downloading it, place it in the plugins directory of the Elasticsearch installation.
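Once the plugin is installed, one way to check that it is loaded is to ask Elasticsearch to segment a sample sentence through the `_analyze` API, using the two ik analyzers that the mapping in section 5.1 refers to (`ik_max_word` and `ik_smart`). The sketch below only builds such a request; it does not send it. The class name, the base URL `http://localhost:9200`, and the use of `java.net.http` (Java 11 or later) are assumptions for illustration, not part of the original project.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: builds (but does not send) an _analyze request for an ik analyzer.
final class IkAnalyzeRequest {

    // Escape the characters that would break a JSON string literal.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Body of POST /_analyze: which analyzer to run and the text to segment.
    static String body(String analyzer, String text) {
        return "{\"analyzer\":\"" + jsonEscape(analyzer)
                + "\",\"text\":\"" + jsonEscape(text) + "\"}";
    }

    // baseUrl (for example http://localhost:9200) is an assumption.
    static HttpRequest build(String baseUrl, String analyzer, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(analyzer, text)))
                .build();
    }

    public static void main(String[] args) {
        // Print the two request bodies; compare the token lists ES returns for each.
        System.out.println(body("ik_max_word", "全文检索"));
        System.out.println(body("ik_smart", "全文检索"));
    }
}
```

The same two bodies can also be pasted into the Kibana console. If the plugin is missing, Elasticsearch should reject the request because the analyzer name is unknown.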
2 Environment preparation
You need to install Elasticsearch and Kibana (optional), plus the ik word segmentation plugin.
- Install Elasticsearch from the official Elasticsearch website. I used 7.5.1.
- Download the ik plugin from its GitHub repository. Make sure the ik plugin version matches the Elasticsearch version you downloaded.
- In the plugins folder of the Elasticsearch installation directory, create a new folder named ik and unzip the downloaded plugin into it. The plugin is loaded automatically when ES starts.
- Create the Spring Boot project: IDEA -> New Project -> Spring Initializr
3 Project framework
- Segment the incoming data with the ik plugin
- Store the data in the ES engine
- Retrieve the stored data using ES search
- Provide an external service through the ES Java client
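To make the steps above concrete, here is a toy in-memory version of the same pipeline in plain Java. It is only an illustration of the idea behind a full-text index: a whitespace tokenizer stands in for ik, a `HashMap` stands in for the ES index, and all names are hypothetical. It says nothing about how Elasticsearch or ik are actually implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration only: segment, store, then retrieve.
final class ToyIndex {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    // Step 1: split text into lowercase terms (ik does this for Chinese).
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Step 2: store the document and record which terms it contains.
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Step 3: return the ids of documents containing any term of the query.
    Set<Integer> search(String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : tokenize(query)) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }
}
```

The fourth step, exposing `search` to the outside world, is what the Spring Boot controller in section 5.4 does for the real index.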
4 Result
4.1 Search page
A simple search box similar to Baidu's.
4.2 Search results page
The first search result is one of my own blog posts. To avoid data copyright issues, the ES engine holds only my personal blog data.
5 Code implementation
5.1 The entity for full-text retrieval
The following entity class is defined from the basic information of a blog post. The most important field is the url of each post, so that clicking a retrieved article jumps to that url.
package com.lbh.es.entity;
import com.fasterxml.jackson.annotation.JsonIgnore;
import javax.persistence.*;
/**
* PUT articles
* {
* "mappings":
* {"properties":{
* "author":{"type":"text"},
* "content":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_smart"},
* "title":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_smart"},
* "createDate":{"type":"date","format":"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"},
* "url":{"type":"text"}
* } },
* "settings":{
* "index":{
* "number_of_shards":1,
* "number_of_replicas":2
* }
* }
* }
* ---------------------------------------------------------------------------------------------------------------------
* Copyright(c)[email protected]
* @author liubinhao
* @date 2021/3/3
*/
@Entity
@Table(name = "es_article")
public class ArticleEntity {
@Id
@JsonIgnore
@GeneratedValue(strategy = GenerationType.IDENTITY)
private long id;
@Column(name = "author")
private String author;
@Column(name = "content",columnDefinition="TEXT")
private String content;
@Column(name = "title")
private String title;
@Column(name = "createDate")
private String createDate;
@Column(name = "url")
private String url;
public String getAuthor() {
return author;
}
public void setAuthor(String author) {
this.author = author;
}
public String getContent() {
return content;
}
public void setContent(String content) {
this.content = content;
}
public String getTitle() {
return title;
}
public void setTitle(String title) {
this.title = title;
}
public String getCreateDate() {
return createDate;
}
public void setCreateDate(String createDate) {
this.createDate = createDate;
}
public String getUrl() {
return url;
}
public void setUrl(String url) {
this.url = url;
}
}
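The entity keeps `createDate` as a plain `String`, while the mapping in the comment declares the field as a date with the format `yyyy-MM-dd HH:mm:ss||yyyy-MM-dd`. A value that matches neither pattern would likely be rejected at index time, so it can help to produce the string through a formatter. The helper below is a hypothetical sketch (the class and method names are not from the original project) that formats and parses values using the same two patterns.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper: keeps createDate strings in line with the mapping's date format.
final class CreateDateFormat {
    static final DateTimeFormatter DATE_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    static final DateTimeFormatter DATE_ONLY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Produce the string that would be stored in ArticleEntity.createDate.
    static String format(LocalDateTime time) {
        return time.format(DATE_TIME);
    }

    // Accept either pattern, mirroring the "||" alternative in the mapping.
    static LocalDateTime parse(String value) {
        try {
            return LocalDateTime.parse(value, DATE_TIME);
        } catch (DateTimeParseException e) {
            // Fall back to the date-only pattern; time defaults to midnight.
            return LocalDate.parse(value, DATE_ONLY).atStartOfDay();
        }
    }
}
```

For example, `article.setCreateDate(CreateDateFormat.format(LocalDateTime.now()))` would yield a value in the first of the two accepted patterns.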
5.2 Client configuration
Configure the ES client in Java.
package com.lbh.es.config;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.util.ArrayList;
import java.util.List;
/**
* Copyright(c)[email protected]
* @author liubinhao
* @date 2021/3/3
*/
@Configuration
public class EsConfig {
@Value("${elasticsearch.schema}")
private String schema;
@Value("${elasticsearch.address}")
private String address;
@Value("${elasticsearch.connectTimeout}")
private int connectTimeout;
@Value("${elasticsearch.socketTimeout}")
private int socketTimeout;
@Value("${elasticsearch.connectionRequestTimeout}")
private int tryConnTimeout;
@Value("${elasticsearch.maxConnectNum}")
private int maxConnNum;
@Value("${elasticsearch.maxConnectPerRoute}")
private int maxConnectPerRoute;
@Bean
public RestHighLevelClient restHighLevelClient() {
// Split address
List<HttpHost> hostLists = new ArrayList<>();
String[] hostList = address.split(",");
for (String addr : hostList) {
String host = addr.split(":")[0];
String port = addr.split(":")[1];
hostLists.add(new HttpHost(host, Integer.parseInt(port), schema));
}
// Convert the list to an HttpHost array
HttpHost[] httpHost = hostLists.toArray(new HttpHost[]{});
// Building connection objects
RestClientBuilder builder = RestClient.builder(httpHost);
// Timeout settings for the underlying async HTTP client
builder.setRequestConfigCallback(requestConfigBuilder -> {
requestConfigBuilder.setConnectTimeout(connectTimeout);
requestConfigBuilder.setSocketTimeout(socketTimeout);
requestConfigBuilder.setConnectionRequestTimeout(tryConnTimeout);
return requestConfigBuilder;
});
// Connection pool limits for the underlying async HTTP client
builder.setHttpClientConfigCallback(httpClientBuilder -> {
httpClientBuilder.setMaxConnTotal(maxConnNum);
httpClientBuilder.setMaxConnPerRoute(maxConnectPerRoute);
return httpClientBuilder;
});
return new RestHighLevelClient(builder);
}
}
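EsConfig reads its settings from the application configuration under the keys `elasticsearch.schema`, `elasticsearch.address`, `elasticsearch.connectTimeout`, `elasticsearch.socketTimeout`, `elasticsearch.connectionRequestTimeout`, `elasticsearch.maxConnectNum` and `elasticsearch.maxConnectPerRoute`. The address value is expected to be a comma-separated list of `host:port` pairs. The loop in `restHighLevelClient()` throws an `ArrayIndexOutOfBoundsException` if an entry has no port, so a slightly more defensive parser could look like the sketch below. The class name, the `HostPort` holder and the sample addresses are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, dependency-free variant of the address parsing in EsConfig.
final class EsAddressParser {

    // Minimal holder for one host:port pair.
    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    // Parse "host1:9200,host2:9200" into a list, failing with a clear message.
    static List<HostPort> parse(String address) {
        List<HostPort> result = new ArrayList<>();
        for (String entry : address.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate trailing commas
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got: " + trimmed);
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            result.add(new HostPort(host, port));
        }
        return result;
    }

    public static void main(String[] args) {
        for (HostPort hp : parse("127.0.0.1:9200, es-node2:9201")) {
            System.out.println(hp.host + " -> " + hp.port);
        }
    }
}
```

Each parsed pair would then be passed to `new HttpHost(host, port, schema)` exactly as in the configuration class above.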
5.3 Business code writing
The service covers creating and deleting the index, adding articles, and searching them. Articles can be searched by title, content, and author.
package com.lbh.es.service;
import com.google.gson.Gson;
import com.lbh.es.entity.ArticleEntity;
import com.lbh.es.repository.ArticleRepository;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.stereotype.Service;
import javax.annotation.Resource;
import java.io.IOException;
import java.util.*;
/**
* Copyright(c)[email protected]
* @author liubinhao
* @date 2021/3/3
*/
@Service
public class ArticleService {
private static final String ARTICLE_INDEX = "article";
@Resource
private RestHighLevelClient client;
@Resource
private ArticleRepository articleRepository;
public boolean createIndexOfArticle(){
Settings settings = Settings.builder()
.put("index.number_of_shards", 1)
.put("index.number_of_replicas", 1)
.build();
// {"properties":{"author":{"type":"text"},
// "content":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_smart"},
// "title":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_smart"},
// "createDate":{"type":"date","format":"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"},
// "url":{"type":"text"}
// }}
String mapping = "{\"properties\":{\"author\":{\"type\":\"text\"},\n" +
"\"content\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"},\n" +
"\"title\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"},\n" +
"\"createDate\":{\"type\":\"date\",\"format\":\"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd\"},\n" +
"\"url\":{\"type\":\"text\"}\n" +
"}}";
CreateIndexRequest indexRequest = new CreateIndexRequest(ARTICLE_INDEX)
.settings(settings).mapping(mapping,XContentType.JSON);
CreateIndexResponse response = null;
try {
response = client.indices().create(indexRequest, RequestOptions.DEFAULT);
} catch (IOException e) {
e.printStackTrace();
}
if (response!=null) {
System.err.println(response.isAcknowledged() ? "success" : "failure");
return response.isAcknowledged();
} else {
return false;
}
}
public boolean deleteArticle(){
DeleteIndexRequest request = new DeleteIndexRequest(ARTICLE_INDEX);
try {
AcknowledgedResponse response = client.indices().delete(request, RequestOptions.DEFAULT);
return response.isAcknowledged();
} catch (IOException e) {
e.printStackTrace();
}
return false;
}
public IndexResponse addArticle(ArticleEntity article){
Gson gson = new Gson();
String s = gson.toJson(article);
// Create the index request for the target index
IndexRequest indexRequest = new IndexRequest(ARTICLE_INDEX);
// Set the document content
indexRequest.source(s,XContentType.JSON);
// Send the HTTP request through the client
IndexResponse re = null;
try {
re = client.index(indexRequest, RequestOptions.DEFAULT);
} catch (IOException e) {
e.printStackTrace();
}
return re;
}
public void transferFromMysql(){
articleRepository.findAll().forEach(this::addArticle);
}
public List<ArticleEntity> queryByKey(String keyword){
// Restrict the search to the article index instead of searching every index
SearchRequest request = new SearchRequest(ARTICLE_INDEX);
/*
* Create the search parameter object: SearchSourceBuilder.
* Compared with matchQuery, multiMatchQuery targets multiple fields. When only one field name is passed, it behaves like matchQuery.
* When several field names are passed, such as field1 and field2, a document matches if either field1 or field2 contains the text.
*/
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders
.multiMatchQuery(keyword, "author","content","title"));
request.source(searchSourceBuilder);
List<ArticleEntity> result = new ArrayList<>();
try {
SearchResponse search = client.search(request, RequestOptions.DEFAULT);
for (SearchHit hit:search.getHits()){
Map<String, Object> map = hit.getSourceAsMap();
ArticleEntity item = new ArticleEntity();
item.setAuthor((String) map.get("author"));
item.setContent((String) map.get("content"));
item.setTitle((String) map.get("title"));
item.setUrl((String) map.get("url"));
result.add(item);
}
return result;
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
public ArticleEntity queryById(String indexId){
GetRequest request = new GetRequest(ARTICLE_INDEX, indexId);
GetResponse response = null;
try {
response = client.get(request, RequestOptions.DEFAULT);
} catch (IOException e) {
e.printStackTrace();
}
if (response!=null&&response.isExists()){
Gson gson = new Gson();
return gson.fromJson(response.getSourceAsString(),ArticleEntity.class);
}
return null;
}
}
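For readers who want to try the same search from the Kibana console or with curl, the `multiMatchQuery(keyword, "author", "content", "title")` call in `queryByKey` corresponds to a `multi_match` query in the REST query DSL. The sketch below builds that request body with plain string handling so it runs without the Elasticsearch client on the classpath. The class name and the hand-written escaping are assumptions for illustration; the service itself uses `QueryBuilders`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: the JSON body of a multi_match search over several fields.
final class MultiMatchBody {

    // Wrap a value in double quotes, escaping backslashes and quotes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Build {"query":{"multi_match":{"query":...,"fields":[...]}}}.
    static String build(String keyword, List<String> fields) {
        String fieldArray = fields.stream()
                .map(MultiMatchBody::quote)
                .collect(Collectors.joining(","));
        return "{\"query\":{\"multi_match\":{\"query\":" + quote(keyword)
                + ",\"fields\":[" + fieldArray + "]}}}";
    }

    public static void main(String[] args) {
        // Same three fields as in queryByKey.
        System.out.println(build("elasticsearch", Arrays.asList("author", "content", "title")));
    }
}
```

Sending the printed body to `POST /article/_search` should return hits comparable to those of `queryByKey`, assuming the index has been created and populated.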
5.4 External interface
This works the same way as any other web application developed with Spring Boot.
package com.lbh.es.controller;
import com.lbh.es.entity.ArticleEntity;
import com.lbh.es.service.ArticleService;
import org.elasticsearch.action.index.IndexResponse;
import org.springframework.web.bind.annotation.*;
import javax.annotation.Resource;
import java.util.List;
/**
* Copyright(c)[email protected]
* @author liubinhao
* @date 2021/3/3
*/
@RestController
@RequestMapping("article")
public class ArticleController {
@Resource
private ArticleService articleService;
@GetMapping("/create")
public boolean create(){
return articleService.createIndexOfArticle();
}
@GetMapping("/delete")
public boolean delete() {
return articleService.deleteArticle();
}
@PostMapping("/add")
public IndexResponse add(@RequestBody ArticleEntity article){
return articleService.addArticle(article);
}
@GetMapping("/transfer")
public String transfer(){
articleService.transferFromMysql();
return "successful";
}
@GetMapping("/query")
public List<ArticleEntity> query(String keyword){
return articleService.queryByKey(keyword);
}
}
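When calling the `/article/query` endpoint from another program, the keyword has to be URL-encoded, which matters in particular for Chinese keywords. The sketch below builds such a URL with the standard library only. The class name and the base URL `http://localhost:8080` are assumptions, and `URLEncoder.encode(String, Charset)` requires Java 10 or later.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: build the URL for GET /article/query?keyword=...
final class QueryUrl {

    // baseUrl is wherever the Spring Boot application is listening (assumed).
    static URI build(String baseUrl, String keyword) {
        // Form-style encoding: spaces become '+', non-ASCII becomes %XX bytes.
        String encoded = URLEncoder.encode(keyword, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/article/query?keyword=" + encoded);
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8080", "spring boot"));
        System.out.println(build("http://localhost:8080", "全文检索"));
    }
}
```

The resulting URI can be passed to any HTTP client; the controller receives the decoded keyword in its `keyword` parameter.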
5.5 page
The pages use Thymeleaf, mainly because I don't really know front-end development. I only know basic HTML5, so I just made pages that can display the results.
Search page
<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>YiyiDu</title>
<!--
input:focus sets the blue outer border that appears when the input box is focused
text-indent: 11px; and padding-left: 11px; set the distance between the start of the typed text and the left border
-->
<style>
input:focus {
border: 2px solid rgb(62, 88, 206);
}
input {
text-indent: 11px;
padding-left: 11px;
font-size: 16px;
}
</style>
<!-- Initial state of the input -->
<style class="input/css">
.input {
width: 33%;
height: 45px;
vertical-align: top;
box-sizing: border-box;
border: 2px solid rgb(207, 205, 205);
border-right: 2px solid rgb(62, 88, 206);
border-bottom-left-radius: 10px;
border-top-left-radius: 10px;
outline: none;
margin: 0;
display: inline-block;
background: url(/static/img/camera.jpg) no-repeat 0 0;
background-position: 565px 7px;
background-size: 28px;
padding-right: 49px;
padding-top: 10px;
padding-bottom: 10px;
line-height: 16px;
}
</style>
<!-- Initial state of the button -->
<style class="button/css">
.button {
height: 45px;
width: 130px;
vertical-align: middle;
text-indent: -8px;
padding-left: -8px;
background-color: rgb(62, 88, 206);
color: white;
font-size: 18px;
outline: none;
border: none;
border-bottom-right-radius: 10px;
border-top-right-radius: 10px;
margin: 0;
padding: 0;
}
</style>
</head>
<body>
<!-- div containing the table -->
<!-- div containing the input and button -->
<div style="font-size: 0px;">
<div align="center" style="margin-top: 0px;">
<img src="../static/img/yyd.png" th:src = "@{/static/img/yyd.png}" alt="YiyiDu" width="280px" class="pic" />
</div>
<div align="center">
<!-- action performs the jump to the results page -->
<form action="/home/query">
<input type="text" class="input" name="keyword" />
<input type="submit" class="button" value="Search" />
</form>
</div>
</div>
</body>
</html>
Search results page
<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
<link rel="stylesheet" href="https://cdn.staticfile.org/twitter-bootstrap/4.3.1/css/bootstrap.min.css">
<meta charset="UTF-8">
<title>xx-manager</title>
</head>
<body>
<header th:replace="search.html"></header>
<div class="container my-2">
<ul>
<li th:each="article : ${articles}"><a th:href="${article.url}" th:text="${article.author}+${article.content}"></a></li>
</ul>
</div>
<footer th:replace="footer.html"></footer>
</body>
</html>
6 Summary
I write code at work, and after work I keep writing code and blog posts. I spent two days studying ES, and it turned out to be quite interesting. The foundations of IR are still statistical, so a search engine like ES performs well when there is a lot of data. Every time I write a hands-on article I find it hard to get started, because I don't know what to build. So I hope to hear some interesting ideas that I can turn into hands-on projects.
copyright notice
Author: IT1124. Please include the original link when reprinting, thank you.
https://en.cdmana.com/2022/01/202201270130375022.html