ElasticSearch Setting

Index Templates

Before data are loaded into ElasticSearch, an index template should be present, so that the proper data type is assigned to every field.

This is especially important for time-based fields, which would not work without an index template and could not be used for sorting or for creating index patterns in Kibana.

The ElasticSearch index template should be present in the site- repository under the name es_index_template.json.

To insert the index template through PostMan or Kibana, send the following HTTP request to the instance of ElasticSearch you are using:

PUT _template/lm_
{
  //Deploy to <SPECIFY_WHERE_TO_DEPLOY_THE_TEMPLATE>
  "index_patterns" : ["lm_*"],
  "version": 200721, // Increase this with every release
  "order" : 9999998, // Decrease this with every release
  "settings": {
    "index": {
      "lifecycle": {
        "name": "lm_",
        "rollover_alias": "lm_"
      }
    }
  },
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date", "format": "strict_date_optional_time||epoch_millis" },
      "rt": { "type": "date", "format": "strict_date_optional_time||epoch_second" },
      ...
    }
  }
}

The body of the request is the content of the es_index_template.json.
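
For scripted deployments, the same request can be prepared outside of PostMan or Kibana. The following is a minimal sketch using only the Python standard library. The URL http://localhost:9200, the helper name build_template_request and the trimmed-down template dictionary are assumptions made for illustration; in practice the body would be the content of es_index_template.json.

```python
import json
import urllib.request


def build_template_request(es_url, template_name, template):
    """Build (but do not send) a PUT _template/<name> request."""
    body = json.dumps(template).encode("utf-8")
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/_template/{template_name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Trimmed-down template mirroring the fields shown above (illustrative only).
template = {
    "index_patterns": ["lm_*"],
    "version": 200721,
    "order": 9999998,
    "mappings": {
        "properties": {
            "@timestamp": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_millis",
            },
            "rt": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_second",
            },
        }
    },
}

# The URL is an assumption; replace it with your ElasticSearch instance.
request = build_template_request("http://localhost:9200", "lm_", template)
print(request.get_method(), request.full_url)
# To actually send the request: urllib.request.urlopen(request)
```

The request is only constructed here, so the sketch can be inspected without a running cluster.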

Index Lifecycle Management

Index Lifecycle Management (ILM) in ElasticSearch automatically closes or deletes old indices (e.g. those with data older than three months), so that search performance is maintained and storage remains available for current data. These settings are defined in a so-called ILM policy.

The ILM policy should be set up before data are pumped into ElasticSearch, so that each new index finds and associates itself with the proper ILM policy. For more information, please refer to the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-index-lifecycle-management.html

LogMan.io components such as Dispatcher then use a specified ILM alias (lm_), and ElasticSearch automatically puts the data into the proper index associated with the ILM policy.

The setup should be done in the following way:

Create the ILM policy

Kibana

Kibana version 7.x can be used to create an ILM policy in ElasticSearch.

1.) Open Kibana

2.) Click Management in the left menu

3.) In the ElasticSearch section, click on Index Lifecycle Policies

4.) Click the blue Create policy button

5.) Enter the policy name, which should be the same as the index prefix, e.g. lm_

6.) Set the maximum index size to the desired rollover size, e.g. 25 GB (size rollover)

7.) Set the maximum age of the index, e.g. 10 days (time rollover)

8.) Enable the Delete phase using the switch at the bottom of the screen, and enter the time after which the index should be deleted, e.g. 120 days from rollover

9.) Click the green Save policy button
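
The same policy can also be expressed as a request body for the ElasticSearch ILM API (PUT _ilm/policy/lm_) instead of clicking through Kibana. The sketch below builds that body in Python using the example values from the steps above (25 GB, 10 days, 120 days). The function name build_ilm_policy is hypothetical.

```python
import json


def build_ilm_policy(max_size="25gb", max_age="10d", delete_after="120d"):
    """Return an ILM policy body with size/time rollover and a delete phase."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over when either limit is reached first.
                        "rollover": {"max_size": max_size, "max_age": max_age}
                    }
                },
                "delete": {
                    # Counted from rollover, as in the Kibana form.
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


policy = build_ilm_policy()
print("PUT _ilm/policy/lm_")
print(json.dumps(policy, indent=2))
```

The printed body can be pasted into PostMan or the Kibana console after the PUT line.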

Use the policy in index template

Modify index template(s)

Add the following lines to the JSON index template:

"settings": {
  "index": {
    "lifecycle": {
      "name": "lm_",
      "rollover_alias": "lm_"
    }
  }
},
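
When the template is maintained as a file, the lifecycle settings can be merged in programmatically so that other index settings are preserved. This is a minimal sketch; the helper with_lifecycle and the number_of_shards value in the sample template are assumptions used only for illustration.

```python
import copy


def with_lifecycle(template, policy_name, rollover_alias):
    """Return a copy of the template with ILM lifecycle settings added."""
    result = copy.deepcopy(template)
    # Create "settings" and "index" only if they are missing.
    index_settings = result.setdefault("settings", {}).setdefault("index", {})
    index_settings["lifecycle"] = {
        "name": policy_name,
        "rollover_alias": rollover_alias,
    }
    return result


# Hypothetical template that already carries an unrelated index setting.
base = {
    "index_patterns": ["lm_*"],
    "settings": {"index": {"number_of_shards": 1}},
}
updated = with_lifecycle(base, "lm_", "lm_")
print(updated["settings"])
```

Working on a deep copy keeps the original template untouched, which makes the change easy to review before it is sent to ElasticSearch.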

Kibana

Kibana version 7.x can be used to link the ILM policy with the ElasticSearch index template.

1.) Open Kibana

2.) Click Management in the left menu

3.) In the ElasticSearch section, click on Index Management

4.) At the top, select Index Template

5.) Select your desired index template, e.g. lm_

6.) Click on Edit

7.) On the Settings screen, add:

{
  "index": {
    "lifecycle": {
      "name": "lm_",
      "rollover_alias": "lm_"
    }
  }
}

8.) Click on Save

Create a new index which will utilize the latest index template

Through PostMan or Kibana, send the following HTTP request to the instance of ElasticSearch you are using:

PUT lm_tenant-000001
{
  "aliases": {
    "lm_": {
      "is_write_index": true
    }
  }
}

The ILM policy then uses the alias to route data to the proper ElasticSearch index, so the pumps do not need to know the current index number.

//Note: The index prefix and the index number for ILM rollover must be separated with a hyphen (-000001), not an underscore (_000001)!//
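
The naming rule can be checked before the index is created. ElasticSearch rollover expects the index name to match the pattern ^.*-\d+$, so that the number can be incremented. The sketch below builds the first index name and the alias body from the request above; the helper names bootstrap_index and is_rollover_compatible are hypothetical.

```python
import re

# Pattern that rollover expects: anything, a hyphen, then digits.
ROLLOVER_NAME = re.compile(r"^.*-\d+$")


def bootstrap_index(prefix, alias, number=1):
    """Return the name and request body of the first rollover index."""
    name = f"{prefix}-{number:06d}"
    body = {"aliases": {alias: {"is_write_index": True}}}
    return name, body


def is_rollover_compatible(index_name):
    """Check whether the index name ends with a hyphen and a number."""
    return ROLLOVER_NAME.match(index_name) is not None


name, body = bootstrap_index("lm_tenant", "lm_")
print("PUT", name, body)
print(is_rollover_compatible("lm_tenant-000001"))
print(is_rollover_compatible("lm_tenant_000001"))
```

The last call returns False, because the underscore variant has no hyphen before the number.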

Configure other LogMan.io components

The pumps may now use the ILM policy through the created alias, which in the case above is lm_tenant. The configuration file should then look like this:

[pipeline:<PIPELINE>:ElasticSearchSink]
index_prefix=lm_tenant
doctype=_doc

The pump will always write data to the lm_tenant alias, and ILM will take care of assigning the data to the proper index, e.g. lm_-000001.

//Note: Make sure the index prefix is not also configured in the source code, for example in the ElasticSearchSink of the pipeline. A value configured in code would override the one in the configuration file.//
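
To verify which prefix a pump will pick up from its configuration file, the section can be read with Python's standard configparser module. This is only a sketch under the assumption that the file is INI-style as shown above; the pipeline name MyPipeline and the helper sink_index_prefix are hypothetical.

```python
import configparser

# Sample configuration; "MyPipeline" stands in for the real pipeline name.
CONFIG_TEXT = """
[pipeline:MyPipeline:ElasticSearchSink]
index_prefix=lm_tenant
doctype=_doc
"""


def sink_index_prefix(config_text, pipeline_id):
    """Return index_prefix of the ElasticSearchSink in the given pipeline."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = f"pipeline:{pipeline_id}:ElasticSearchSink"
    return parser.get(section, "index_prefix")


prefix = sink_index_prefix(CONFIG_TEXT, "MyPipeline")
print(prefix)
```

This only inspects the file; a prefix set in the source code would not be visible here.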

Hot-Warm-Cold architecture (HWC)

HWC is an extension of the standard index rotation provided by ElasticSearch ILM and a good tool for managing time series data. The HWC architecture enables us to allocate specific nodes to each of the phases. When used correctly, along with the cluster architecture, this allows for maximum performance, using the available hardware to its fullest potential.

Hot

There is usually some period of time (a week, a month, etc.) during which we want to query the indices heavily, aiming for speed rather than conservation of memory and other resources. That is where the “Hot” phase comes in handy: it allows the index to have more replicas, spread out and accessible on more nodes, for an optimal user experience.

Hot nodes

Hot nodes should use the fastest parts of the available hardware, with most of the CPUs and the fastest I/O.

Warm

Once this period is over and the indices are no longer queried as often, we benefit from moving them to the “Warm” phase, which allows us to reduce the number of nodes (or move to nodes with fewer resources available) and index replicas, lessening the hardware load while still retaining the option to search the data reasonably fast.

Warm nodes

Warm nodes, as the name suggests, stand at the crossroads: they serve mainly for storage, while still retaining some CPU power to handle occasional queries.

Cold

Sometimes, there are reasons to store data for extended periods of time (dictated by law, or some internal rule). The data are not expected to be queried, but at the same time, they cannot be deleted just yet.

Cold nodes

This is where the Cold nodes come in. There may be only a few of them, with limited CPU resources, and they have no need for SSD drives, being perfectly fine with slower (and optionally larger) storage.
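
The three phases can be sketched as a single ILM policy body that moves indices between node groups using the ILM allocate action. This is an illustration only: it assumes that nodes are tagged with a custom attribute named data (values hot, warm and cold), and the phase ages and replica counts are hypothetical values, not recommendations. The function name build_hwc_policy is also hypothetical.

```python
import json


def build_hwc_policy(warm_after="30d", cold_after="90d", delete_after="120d"):
    """Return an ILM policy body with hot, warm, cold and delete phases."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_size": "25gb", "max_age": "10d"}
                    }
                },
                "warm": {
                    "min_age": warm_after,
                    "actions": {
                        # Assumes warm nodes carry the attribute data=warm.
                        "allocate": {
                            "require": {"data": "warm"},
                            "number_of_replicas": 1,
                        }
                    },
                },
                "cold": {
                    "min_age": cold_after,
                    "actions": {
                        # Assumes cold nodes carry the attribute data=cold.
                        "allocate": {
                            "require": {"data": "cold"},
                            "number_of_replicas": 0,
                        }
                    },
                },
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


hwc_policy = build_hwc_policy()
print(json.dumps(hwc_policy, indent=2))
```

Reducing the replica count in the later phases reflects the idea described above: older data are queried less often, so they can live on fewer and slower nodes.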

Conclusion

Using the HWC ILM feature to its full effect requires some preparation, so it should be considered when building the production ElasticSearch cluster. The added value, however, can be very high, depending on the specific use cases.