Application Development Blog Posts
jason_xia
Employee
Elasticsearch is a real-time distributed search engine. It is used for full-text search, structured search, analytics, and all three in combination.

Here are some use cases.

  • Wikipedia uses Elasticsearch to provide full-text search with highlighted search snippets, and search-as-you-type and did-you-mean suggestions.

  • Stack Overflow combines full-text search with geolocation queries and uses more-like-this to find related questions and answers.

  • GitHub uses Elasticsearch to query 130 billion lines of code.


Elasticsearch is an open-source search engine built on top of Apache Lucene™, a full-text search-engine library.

High Performance


Its distributed document store lets Elasticsearch process large volumes of data in parallel and quickly find the best matches for a query.

RESTful API with JSON over HTTP


Clients communicate with Elasticsearch through a RESTful API, exchanging JSON over HTTP.

Elasticsearch provides official clients for several languages—Groovy, JavaScript, .NET, PHP, Perl, Python, and Ruby—and there are numerous community-provided clients and integrations, all of which can be found in Elasticsearch Clients.
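To make the request format concrete, here is a minimal Python sketch that builds (without sending) a search request using only the standard library. It assumes a node listening on localhost at the default HTTP port 9200 and a hypothetical index named megacorp; the /<index>/_search endpoint accepts a JSON query body.

```python
import json
import urllib.request


def build_search_request(base_url: str, index: str, body: dict) -> urllib.request.Request:
    """Build, but do not send, a search request: a JSON body POSTed over HTTP."""
    url = f"{base_url.rstrip('/')}/{index}/_search"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "megacorp" is a hypothetical index name; 9200 is Elasticsearch's default HTTP port.
req = build_search_request("http://localhost:9200", "megacorp", {"query": {"match_all": {}}})
print(req.get_method(), req.full_url)

# Sending it requires a running cluster, e.g.:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["hits"]["total"])
```

Any of the official clients wraps this same exchange; the raw form is shown only to make the JSON-over-HTTP contract visible.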


Phrase search


Elasticsearch supports matching exact sequences of words or phrases. For instance, we could perform a query that will match only employee records that contain the phrase “rock climbing”.
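A phrase search is expressed with the match_phrase query. The sketch below builds that request body in Python; about is a hypothetical free-text field on each employee document. The contains_phrase helper is only a local illustration of what "exact sequence" means (tokens adjacent and in order); Elasticsearch itself answers phrase queries from term positions stored in the index rather than by splitting strings at query time.

```python
import json


def phrase_query(field: str, phrase: str) -> dict:
    """Query DSL body matching only documents that contain the exact phrase."""
    return {"query": {"match_phrase": {field: phrase}}}


def contains_phrase(text: str, phrase: str) -> bool:
    """Local illustration: do the phrase's tokens appear adjacent and in order?"""
    tokens = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


# "about" is a hypothetical field name.
body = phrase_query("about", "rock climbing")
print(json.dumps(body))
print(contains_phrase("I love to go rock climbing", "rock climbing"))      # True
print(contains_phrase("I like climbing and rock music", "rock climbing"))  # False
```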



Highlighting Searches





Elasticsearch supports highlighting snippets of text from each search result so the user can see why the document matched the query.
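Highlighting is requested by adding a highlight section to the search body that names the fields to highlight. A minimal sketch, again assuming the hypothetical about field. By default the matched terms inside each returned fragment are wrapped in <em> tags, and each hit in the response carries a highlight object keyed by field name.

```python
import json


def highlighted_phrase_search(field: str, phrase: str) -> dict:
    """Phrase query plus a highlight section for the same field."""
    return {
        "query": {"match_phrase": {field: phrase}},
        # An empty object means "use the default highlighter settings" for this field.
        "highlight": {"fields": {field: {}}},
    }


body = highlighted_phrase_search("about", "rock climbing")
print(json.dumps(body, indent=2))
```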

Analytics


Elasticsearch provides aggregations for generating sophisticated analytics over your data.

Aggregations allow hierarchical rollups too. For example, you can find the average age of employees who share a particular interest.
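The rollup described above nests an avg aggregation inside a terms aggregation. In the sketch below, interests and age are hypothetical fields, and interests is assumed to be mapped as a keyword field (aggregating on an analyzed text field requires fielddata to be enabled). The local_rollup function computes the same numbers in plain Python over made-up sample documents, to show what the aggregation returns per bucket.

```python
from collections import defaultdict


def interests_with_avg_age() -> dict:
    """terms bucket per interest, with an avg sub-aggregation on age inside each bucket."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "aggs": {
            "all_interests": {
                "terms": {"field": "interests"},
                "aggs": {"avg_age": {"avg": {"field": "age"}}},
            }
        },
    }


def local_rollup(employees: list) -> dict:
    """Plain-Python equivalent: bucket employees by interest, then average age per bucket."""
    ages = defaultdict(list)
    for emp in employees:
        for interest in emp["interests"]:
            ages[interest].append(emp["age"])
    return {interest: sum(v) / len(v) for interest, v in ages.items()}


# Hypothetical sample documents.
sample = [
    {"age": 25, "interests": ["sports", "music"]},
    {"age": 32, "interests": ["music"]},
    {"age": 35, "interests": ["forestry"]},
]
print(local_rollup(sample))  # {'sports': 25.0, 'music': 28.5, 'forestry': 35.0}
```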

Scalability


Elasticsearch can scale out to hundreds (or even thousands) of servers and handle petabytes of data.

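Elasticsearch hides much of the complexity of distributed systems: it partitions documents into shards, spreads those shards across the nodes of the cluster, and keeps replica copies for redundancy.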
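Scaling out is driven by how many primary shards an index is split into and how many replica copies of each shard exist. A minimal sketch of the index-creation body (sent with PUT /<index>); the counts are arbitrary examples, not recommendations. number_of_shards is a static setting chosen when the index is created, while number_of_replicas can be updated on a live index.

```python
def index_settings(shards: int, replicas: int) -> dict:
    """Body for PUT /<index>: explicit primary-shard and replica counts."""
    if shards < 1 or replicas < 0:
        raise ValueError("need at least one primary shard and a non-negative replica count")
    return {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}


def total_shard_copies(shards: int, replicas: int) -> int:
    """Primaries plus replicas: the shard copies the cluster can spread across nodes."""
    return shards * (1 + replicas)


settings = index_settings(shards=3, replicas=1)
print(settings)
print(total_shard_copies(3, 1))  # 6
```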

Search across entities


A single search can span all documents in the cluster. Elasticsearch forwards the search request in parallel to a primary or replica copy of every shard in the cluster, gathers the results to select the overall top 10, and returns them.
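The gather step can be modeled as a scatter/gather merge. This is a simplified in-memory sketch, not Elasticsearch's code: each shard is a list of hypothetical (score, doc_id) pairs, each shard contributes its own top k, and the coordinating node keeps the global top k. Every shard has to return a full k candidates, because the overall best 10 might all live on one shard. In the default query_then_fetch flow, full documents are fetched only after this merge.

```python
import heapq
from typing import List, Tuple

Hit = Tuple[float, str]  # (relevance score, document id); both hypothetical


def shard_top_k(hits: List[Hit], k: int) -> List[Hit]:
    """Query phase on one shard: that shard's k highest-scoring hits."""
    return heapq.nlargest(k, hits)


def coordinate(shards: List[List[Hit]], k: int = 10) -> List[Hit]:
    """Coordinating node: merge every shard's local top k into the global top k."""
    candidates = [hit for shard in shards for hit in shard_top_k(shard, k)]
    return heapq.nlargest(k, candidates)


shards = [
    [(0.9, "a1"), (0.2, "a2"), (0.5, "a3")],
    [(0.8, "b1"), (0.7, "b2")],
    [(0.95, "c1"), (0.1, "c2")],
]
print(coordinate(shards, k=3))  # [(0.95, 'c1'), (0.9, 'a1'), (0.8, 'b1')]
```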

Sorting and relevance


By default, search results are sorted by relevance, with the most relevant documents first. Each hit's relevance is reported as a score in its _score field.
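Relevance ordering needs no extra parameters, but a sort clause can override it. A minimal sketch; about and date are hypothetical fields, and listing _score as a second sort key keeps relevance as the tie-breaker.

```python
def relevance_search(text: str) -> dict:
    """No sort clause: hits come back ordered by _score, highest first."""
    return {"query": {"match": {"about": text}}}


def sorted_by_field(text: str, field: str, order: str = "desc") -> dict:
    """Explicit sort on a field, with _score as the tie-breaker."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return {
        "query": {"match": {"about": text}},
        "sort": [{field: {"order": order}}, "_score"],
    }


print(sorted_by_field("rock climbing", "date"))
```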

Query DSL


Elasticsearch exposes most of the power of Lucene through its query domain-specific language, or query DSL.
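As an illustration of the query DSL, a bool query combines clauses: must clauses contribute to the relevance score, while filter clauses only include or exclude documents and do not affect scoring. The field names last_name and age are hypothetical.

```python
import json


def employees_query(last_name: str, min_age: int) -> dict:
    """Full-text match on last_name, filtered to employees older than min_age."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"last_name": last_name}}],     # scored
                "filter": [{"range": {"age": {"gt": min_age}}}],  # yes/no only
            }
        }
    }


print(json.dumps(employees_query("smith", 30), indent=2))
```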

Configuring Analyzers


Analyzers are configured in the analysis section of an index's settings, either by configuring existing analyzers or by creating new custom analyzers specific to that index.

The standard analyzer, which is the default analyzer for full-text fields, is a good choice for most Western languages. It consists of the following:


  • The standard tokenizer, which splits the input text on word boundaries

  • The standard token filter, which is intended to tidy up the tokens emitted by the tokenizer (but does nothing; it was removed in Elasticsearch 7.0)

  • The lowercase token filter, which converts all tokens into lowercase

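  • The stop token filter, which removes stopwords, common words such as “a”, “the”, and “is” that have little impact on relevance. It is disabled by default.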
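The pipeline above can be approximated locally to see what each stage does. This is a rough sketch, not Lucene's implementation: the real standard tokenizer follows Unicode text segmentation rules, whereas the regular expression \w+ below is a crude stand-in (for example, it splits "don't" into "don" and "t"). The stopword set passed in the example is a tiny hypothetical list; the default empty set mirrors the stop filter being disabled. To see the real output, the _analyze API accepts a body such as {"analyzer": "standard", "text": "..."}.

```python
import re


def standard_like_analyze(text: str, stopwords: frozenset = frozenset()) -> list:
    """Approximate the standard analyzer: tokenize, lowercase, then drop stopwords."""
    tokens = re.findall(r"\w+", text)                 # stand-in for word-boundary tokenizing
    tokens = [t.lower() for t in tokens]               # lowercase token filter
    return [t for t in tokens if t not in stopwords]   # stop filter; empty set = disabled


print(standard_like_analyze("The QUICK Brown-Foxes jumped!"))
# ['the', 'quick', 'brown', 'foxes', 'jumped']
print(standard_like_analyze("The QUICK Brown-Foxes jumped!", frozenset({"the"})))
# ['quick', 'brown', 'foxes', 'jumped']
```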


A custom analyzer combines the following three kinds of functions into a single package, which are executed in sequence:


Character filters

  • Character filters are used to “tidy up” a string before it is tokenized. For instance, the html_strip character filter can remove all HTML tags and convert HTML entities like &Aacute; into the corresponding Unicode character Á.
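A crude local approximation of the html_strip character filter, for illustration only: it deletes anything that looks like a tag and then decodes HTML entities with Python's html.unescape. The real filter handles more cases and will not necessarily produce identical output.

```python
import html
import re


def strip_html(text: str) -> str:
    """Remove tag-like spans, then decode entities such as &Aacute; to Á."""
    without_tags = re.sub(r"<[^>]*>", "", text)
    # Decode after removing tags, so escaped markup like &lt;b&gt; survives as literal text.
    return html.unescape(without_tags)


print(strip_html("<p>&Aacute;lvaro is <b>here</b></p>"))  # Álvaro is here
```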


Tokenizers

  • The keyword tokenizer outputs exactly the same string as it received, without any tokenization. The whitespace tokenizer splits text on whitespace only. The pattern tokenizer can be used to split text on a matching regular expression.
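The three tokenizers can be mimicked in a few lines of Python to compare their output on the same input. These are approximations for illustration; pattern_tokenize defaults to \W+, which is also the default pattern of Elasticsearch's pattern tokenizer.

```python
import re


def keyword_tokenize(text: str) -> list:
    """keyword tokenizer: the whole input becomes a single token."""
    return [text]


def whitespace_tokenize(text: str) -> list:
    """whitespace tokenizer: split on whitespace only; punctuation stays attached."""
    return text.split()


def pattern_tokenize(text: str, pattern: str = r"\W+") -> list:
    """pattern tokenizer: split wherever the regular expression matches."""
    return [t for t in re.split(pattern, text) if t]


sample_text = "New York-based co."
print(keyword_tokenize(sample_text))     # ['New York-based co.']
print(whitespace_tokenize(sample_text))  # ['New', 'York-based', 'co.']
print(pattern_tokenize(sample_text))     # ['New', 'York', 'based', 'co']
```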





Token filters

  • Stemming token filters “stem” words to their root form. The asciifolding filter removes diacritics, converting a term like “très” into “tres”. The ngram and edge_ngram token filters can produce tokens suitable for partial matching or autocomplete. The synonym token filter makes it easy to handle synonyms.
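Putting the three stages together, a custom analyzer is declared in the index settings. The sketch below builds such a body with a hypothetical analyzer name, my_folding_analyzer: strip HTML, tokenize with the standard tokenizer, then lowercase and fold to ASCII. The two helpers are local simplifications of what asciifolding and edge n-grams do to a single token (the unicodedata approach only handles characters that decompose into a base letter plus combining marks).

```python
import unicodedata


def custom_analyzer_settings() -> dict:
    """Index settings declaring a custom analyzer (the name is hypothetical)."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "my_folding_analyzer": {
                        "type": "custom",
                        "char_filter": ["html_strip"],            # 1. character filters
                        "tokenizer": "standard",                  # 2. exactly one tokenizer
                        "filter": ["lowercase", "asciifolding"],  # 3. token filters, in order
                    }
                }
            }
        }
    }


def fold_ascii(token: str) -> str:
    """Approximate asciifolding: decompose, then drop combining diacritics."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(token: str, min_gram: int, max_gram: int) -> list:
    """Prefixes of length min_gram..max_gram, as used for search-as-you-type."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


print(fold_ascii("très"))          # tres
print(edge_ngrams("quick", 1, 3))  # ['q', 'qu', 'qui']
```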


Fuzzy Query


Elasticsearch supports fuzzy queries, which treat two words that are “fuzzily” similar as if they were the same word. It also supports phonetic matching (through the analysis-phonetic plugin), which finds words that sound similar even if their spelling differs.
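Fuzzy matching is based on edit distance. The sketch below builds a match query with "fuzziness": "AUTO" on a hypothetical name field and, for illustration, computes the optimal string alignment distance, in which insertions, deletions, substitutions, and adjacent transpositions each count as one edit. It is a plain dynamic-programming version of the idea, not Lucene's automaton-based implementation. auto_max_edits mirrors the AUTO rule: terms shorter than three characters must match exactly, three to five characters allow one edit, and longer terms allow two.

```python
def fuzzy_match_query(field: str, text: str) -> dict:
    """match query that tolerates typos; AUTO scales allowed edits with term length."""
    return {"query": {"match": {field: {"query": text, "fuzziness": "AUTO"}}}}


def auto_max_edits(term: str) -> int:
    """AUTO fuzziness: 0 edits below 3 chars, 1 edit for 3-5 chars, 2 edits beyond."""
    if len(term) < 3:
        return 0
    return 1 if len(term) < 6 else 2


def osa_distance(a: str, b: str) -> int:
    """Edits between a and b, counting an adjacent transposition as one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def fuzzily_similar(query_term: str, indexed_term: str) -> bool:
    """True when the terms are within the AUTO edit budget of the query term."""
    return osa_distance(query_term, indexed_term) <= auto_max_edits(query_term)


print(osa_distance("surprize", "surprise"))  # 1
print(fuzzily_similar("ab", "ba"))           # False: 2-char terms must match exactly
print(fuzzily_similar("fox", "fxo"))         # True: one transposition
```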



