Elasticsearch feature overview
Elasticsearch is a real-time distributed search engine. It is used for full-text search, structured search, analytics, and all three in combination.
Here are some use cases.
- Wikipedia uses Elasticsearch to provide full-text search with highlighted search snippets, plus search-as-you-type and did-you-mean suggestions.
- Stack Overflow combines full-text search with geolocation queries and uses more-like-this to find related questions and answers.
- GitHub uses Elasticsearch to query 130 billion lines of code.
Elasticsearch is an open-source search engine built on top of Apache Lucene™, a full-text search-engine library.
The distributed document store enables it to process large volumes of data in parallel, quickly finding the best matches for queries.
RESTful API with JSON over HTTP
Clients communicate with Elasticsearch through a RESTful API, exchanging JSON over HTTP.
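As a minimal sketch of what such a call looks like, the snippet below builds (but does not send) an HTTP request that stores a JSON document. The node address, the index name "website", the document itself, and the helper name build_index_request are all assumptions made for illustration; the `/_doc/` path applies to recent Elasticsearch versions, and older versions use a type name in that position.

```python
import json
import urllib.request

# Hypothetical values for illustration: a local node on the default port.
BASE_URL = "http://localhost:9200"


def build_index_request(index: str, doc_id: str, document: dict) -> urllib.request.Request:
    """Build (but do not send) a request that stores a JSON document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request("website", "1", {"title": "My first post"})
print(request.get_method(), request.full_url)

# To send it against a running cluster (not executed here):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Because the interface is plain HTTP and JSON, any language with an HTTP client can be used the same way.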
Elasticsearch supports aggregations to generate sophisticated analytics.
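To illustrate, an aggregation request is a JSON body sent to the search endpoint. The sketch below assumes hypothetical field names ("tags" and "price") and a hypothetical helper, build_aggregation_body; it groups documents by one field and computes an average of another within each group.

```python
import json


def build_aggregation_body(group_field: str, metric_field: str) -> dict:
    """Build a search body that buckets by one field and averages another."""
    return {
        "size": 0,  # skip the individual hits, return only aggregation results
        "aggs": {
            "by_group": {  # arbitrary name chosen for this sketch
                "terms": {"field": group_field},
                "aggs": {
                    "average_value": {"avg": {"field": metric_field}}
                },
            }
        },
    }


# Hypothetical field names; adjust to the real mapping.
body = build_aggregation_body("tags", "price")
print(json.dumps(body, indent=2))
```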
Elasticsearch hides the complexity of distributed systems.
Search across entities
A search can span all documents in the cluster. Elasticsearch forwards the search request in parallel to a primary or replica of every shard in the cluster, gathers the results to select the overall top 10, and returns them to the client.
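The gather step can be pictured with a small simulation. This is a simplified model, not Elasticsearch's actual implementation: each shard is an in-memory list of (score, document id) pairs, and the function names search_shard and gather_top_hits are made up for this sketch.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (relevance score, document id)


def search_shard(shard_hits: List[Hit], size: int) -> List[Hit]:
    """Each shard returns only its own top `size` hits."""
    return heapq.nlargest(size, shard_hits)


def gather_top_hits(shards: Dict[str, List[Hit]], size: int = 10) -> List[Hit]:
    """Merge the per-shard results and keep the overall top `size`."""
    per_shard = [search_shard(hits, size) for hits in shards.values()]
    merged = [hit for result in per_shard for hit in result]
    return heapq.nlargest(size, merged)


# Toy data standing in for three shards.
shards = {
    "shard_0": [(1.2, "a"), (0.4, "b")],
    "shard_1": [(2.5, "c"), (0.9, "d")],
    "shard_2": [(1.7, "e")],
}
top_hits = gather_top_hits(shards, size=3)
print(top_hits)  # [(2.5, 'c'), (1.7, 'e'), (1.2, 'a')]
```

Each shard only needs to return its own best candidates, because no document outside a shard's local top hits can appear in the overall top hits.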
Sorting and relevance
Elasticsearch uses the query domain-specific language, or query DSL, to expose most of the power of Lucene.
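A query DSL request is a JSON body. The sketch below assumes hypothetical fields "title" and "date" and a hypothetical helper, build_search_body; it combines a full-text match with a date filter and sorts by relevance first, then by date.

```python
import json


def build_search_body(text: str, min_date: str) -> dict:
    """Build a search body: full-text match, date filter, explicit sort."""
    return {
        "query": {
            "bool": {
                # Scored clause: contributes to the relevance _score.
                "must": {"match": {"title": text}},
                # Non-scored clause: only includes or excludes documents.
                "filter": {"range": {"date": {"gte": min_date}}},
            }
        },
        "sort": [
            {"_score": {"order": "desc"}},  # most relevant first
            {"date": {"order": "desc"}},    # newest first among equal scores
        ],
    }


body = build_search_body("distributed search", "2014-01-01")
print(json.dumps(body, indent=2))
```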
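Index settings are used to configure existing analyzers or to create new custom analyzers specific to an index. An analyzer is assembled from character filters, a tokenizer, and token filters, such as the following: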
- The standard tokenizer, which splits the input text on word boundaries
- The standard token filter, which is intended to tidy up the tokens emitted by the tokenizer (but currently does nothing)
- The lowercase token filter, which converts all tokens into lowercase
- The stop token filter, which removes stopwords
- Character filters are used to “tidy up” a string before it is tokenized. For instance, the html_strip character filter can remove all HTML tags and convert HTML entities like &Aacute; into the corresponding Unicode character Á.
- The keyword tokenizer outputs exactly the same string as it received, without any tokenization. The whitespace tokenizer splits text on whitespace only. The pattern tokenizer can be used to split text on a matching regular expression.
- Stemming token filters “stem” words to their root form. The asciifolding filter removes diacritics, converting a term like “très” into “tres”. The ngram and edge_ngram token filters can produce tokens suitable for partial matching or autocomplete. The synonym token filter makes it easy to handle synonyms.
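The pieces listed above can be combined into a custom analyzer through the index settings. The following is a minimal sketch: the names my_analyzer and my_stopwords, the stopword list, and the helper build_analyzer_settings are hypothetical, while html_strip, standard, lowercase, asciifolding (the registered name of the ASCII folding filter), and stop are built-in components.

```python
import json


def build_analyzer_settings() -> dict:
    """Build index settings that define one custom analyzer."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Hypothetical custom stop filter with a tiny word list.
                    "my_stopwords": {"type": "stop", "stopwords": ["the", "a"]}
                },
                "analyzer": {
                    "my_analyzer": {
                        "type": "custom",
                        "char_filter": ["html_strip"],  # runs before tokenizing
                        "tokenizer": "standard",        # splits on word boundaries
                        # Token filters run in the order listed.
                        "filter": ["lowercase", "asciifolding", "my_stopwords"],
                    }
                },
            }
        }
    }


settings = build_analyzer_settings()
print(json.dumps(settings, indent=2))
```

A body like this would be sent when the index is created. The order of the token filters matters, since each one receives the output of the previous one.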
Elasticsearch supports fuzzy queries, which treat two words that are “fuzzily” similar as if they were the same word. It also supports phonetic matching, which can find words that sound similar even if their spelling differs.
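"Fuzzily similar" is usually measured as an edit distance: the number of single-character changes needed to turn one word into another. The sketch below uses plain Levenshtein distance as a simplification (Elasticsearch's fuzzy query also counts a swap of two adjacent characters as one edit by default) and then builds a fuzzy query body. The field name "text" and the helper names are assumptions for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def build_fuzzy_query(field: str, value: str, fuzziness="AUTO") -> dict:
    """Build a fuzzy query body for one field."""
    return {"query": {"fuzzy": {field: {"value": value, "fuzziness": fuzziness}}}}


print(levenshtein("surprize", "surprise"))  # 1
query = build_fuzzy_query("text", "surprize", fuzziness=2)
print(query)
```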