Elasticsearch Main Blog

Hi readers!

This blog post aims to help anyone who runs into an index write disk space issue in Elasticsearch.

Let’s start with the first question: what do we mean by an index write disk space issue?

When dealing with a huge amount of data, it is not unusual to run into disk-related errors. If too many documents are indexed, the disk can fill up, leading to a BulkIndexError. This is the log message we obtained:

[ERROR] BulkIndexError: ('500 document(s) failed to index.', [{'index': {'_index': 
'your_index_name', '_type': '_doc', '_id': 'vAAt1H8BpDNDRA5qgPEv', 'status': 403, 'error': 
{'type': 'cluster_block_exception', 'reason': 'index [your_index_name] blocked by: 
[FORBIDDEN/8/index write (api)];'}

What is this error telling us?
The error reports that write operations are blocked, so no new documents can be indexed into the your_index_name index. In our setup (the Amazon-managed Elasticsearch 7.10 described later in this post), this block is set automatically when disk usage reaches 80% of the total disk size.

You can check the current index block status through the GET settings API:

GET https://localhost:9200/your_index_name/_settings

Here is the setting to look at in the response:

{
    "your_index_name": {
        "settings": {
            "index": {
                ...
                "blocks": {
                    "write": "true"
                },
                "number_of_shards": "2",
                "provided_name": "your_index_name",
                "creation_date": "1650529887595",
                "number_of_replicas": "1",
                "uuid": "2FPWsd-LMHyQSwXaM523GA",
                "version": {
                    "created": "7100299"
                }
            }
        }
    }
}
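If you want to check the block from a script rather than by eye, here is a minimal Python sketch. It assumes the JSON body of the GET settings response has already been fetched with an HTTP client of your choice, and that the default nested format is used (not flat_settings). is_write_blocked is a hypothetical helper, not part of any Elasticsearch client.

```python
import json


def is_write_blocked(settings_response: dict, index: str) -> bool:
    """Return True if index.blocks.write is "true" in a GET _settings response.

    Hypothetical helper. Assumes the default (nested) response format, in which
    Elasticsearch returns setting values as strings, e.g. "true".
    """
    index_settings = (
        settings_response.get(index, {}).get("settings", {}).get("index", {})
    )
    write_block = index_settings.get("blocks", {}).get("write", "false")
    return str(write_block).lower() == "true"


# A trimmed copy of the response shown above.
response = json.loads(
    """
    {
      "your_index_name": {
        "settings": {
          "index": {
            "blocks": {"write": "true"},
            "number_of_shards": "2"
          }
        }
      }
    }
    """
)

print(is_write_blocked(response, "your_index_name"))  # True
```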

The aim of this blog post is to answer these two questions:

    1. How to solve this problem when it arises?
    2. How to prevent the problem from happening again?

TEMPORARY SOLUTION

This is meant as a temporary solution for freeing disk space and removing the index block.

We describe an approach that helps you remove unneeded documents, in order to free part of the disk and get rid of the index block.

First of all, we manually remove the index block through this request:

PUT /[_all|<your_index_name>]/_settings
{
  "index.blocks.write": null
}
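If you script this step, the request can be built as sketched below. This is a minimal sketch under our own assumptions: remove_write_block_request is a hypothetical helper that only builds the (method, path, body) triple, leaving the actual HTTP call to your client. It illustrates that Python's None serializes to JSON null, which resets index.blocks.write to its default (no block).

```python
import json


def remove_write_block_request(index: str = "_all") -> tuple[str, str, str]:
    """Build (method, path, body) for the settings update that clears the write block.

    Hypothetical helper: it does not send anything. None becomes JSON null,
    and setting a dynamic index setting to null resets it to its default.
    """
    body = json.dumps({"index.blocks.write": None})
    return "PUT", f"/{index}/_settings", body


method, path, body = remove_write_block_request("your_index_name")
print(method, path, body)  # PUT /your_index_name/_settings {"index.blocks.write": null}
```

The triple can then be passed to any HTTP client, for example requests.request(method, base_url + path, data=body, headers={"Content-Type": "application/json"}), where base_url is your cluster endpoint.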

This is necessary because deleting documents is also a write operation.

At this point, we can delete some documents. Be careful to start with a small number of them, since deleting temporarily increases disk consumption (until a segment merge happens and the space used by the deleted documents is reclaimed).
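One way to delete documents in small batches is the delete by query API with its max_docs parameter, which caps how many documents a single request removes. The sketch below only builds such a request; delete_batch_request is a hypothetical helper, and the timestamp field name and the "older than a cutoff date" criterion are assumptions to replace with whatever identifies unneeded documents in your own mapping.

```python
import json
from datetime import date


def delete_batch_request(index: str, cutoff: date, batch_size: int = 1000,
                         date_field: str = "timestamp") -> tuple[str, str, str]:
    """Build a _delete_by_query request removing at most batch_size documents
    whose date_field is older than cutoff.

    Hypothetical helper; "timestamp" is an assumed field name.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    body = json.dumps({"query": {"range": {date_field: {"lt": cutoff.isoformat()}}}})
    # max_docs limits the number of documents deleted by this single request.
    path = f"/{index}/_delete_by_query?max_docs={batch_size}"
    return "POST", path, body


print(delete_batch_request("your_index_name", date(2022, 1, 1), batch_size=500))
# ('POST', '/your_index_name/_delete_by_query?max_docs=500', '{"query": {"range": {"timestamp": {"lt": "2022-01-01"}}}}')
```

Running a small batch, checking disk usage, and only then running the next batch keeps the temporary increase in disk consumption under control.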

After the deletion, we can check the current disk usage through:

GET https://localhost:9200/_cat/allocation?v&pretty

Here is the obtained response:

shards disk.indices disk.used disk.avail disk.total disk.percent host  ip    node
    72       11.9gb      17gb     81.2gb     98.3gb           17 x.x.. x.x.. 645784hwe...
    72       11.9gb      17gb     81.2gb     98.3gb           17 x.x.. x.x.. 374562gfi...
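To automate this check, the text output of _cat/allocation?v can be parsed as in the Python sketch below. nodes_over_threshold is a hypothetical helper; the 80% default simply mirrors the threshold discussed in this post. Since _cat output is meant for humans, requesting _cat/allocation?format=json is a more robust option in real scripts.

```python
def nodes_over_threshold(cat_output: str, threshold: int = 80) -> list[str]:
    """Return the nodes whose disk.percent is at or above threshold.

    Hypothetical helper that parses the text output of `_cat/allocation?v`.
    Assumes "node" is the last column and node names contain no spaces.
    """
    lines = [line for line in cat_output.strip().splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    percent_col = header.index("disk.percent")
    node_col = header.index("node")
    flagged = []
    for line in lines[1:]:
        fields = line.split()
        # Rows such as "UNASSIGNED" have fewer columns: skip them.
        if len(fields) < len(header):
            continue
        percent = fields[percent_col]
        if percent.isdigit() and int(percent) >= threshold:
            flagged.append(fields[node_col])
    return flagged


# The response shown above.
SAMPLE = """
shards disk.indices disk.used disk.avail disk.total disk.percent host  ip    node
    72       11.9gb      17gb     81.2gb     98.3gb           17 x.x.. x.x.. 645784hwe...
    72       11.9gb      17gb     81.2gb     98.3gb           17 x.x.. x.x.. 374562gfi...
"""

print(nodes_over_threshold(SAMPLE))  # []
print(nodes_over_threshold(SAMPLE, threshold=15))  # ['645784hwe...', '374562gfi...']
```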

If we still need more free space, we can repeat this process.
If the deletion pushes disk usage above 80% again, the write block will have to be removed again.

We call this a “temporary solution” because it frees the disk and removes the index block, but it doesn’t prevent the error from arising again in the future.
To do that, we recommend managing the index automatically.

INDEX LIFECYCLE - ROLLOVER SOLUTION

From Elasticsearch documentation:
“The index lifecycle management (ILM) [1] feature of the Elastic Stack provides an integrated and streamlined way to manage time-based data, making it easier to follow best practices for managing your indices. Compared to index curation, migrating to ILM gives you more fine-grained control over the lifecycle of each index.”

“You can configure index lifecycle management (ILM) [2] policies to automatically manage indices according to your performance, resiliency, and retention requirements. For example, you could use ILM to:

    • Spin up a new index when an index reaches a certain size or number of documents
    • Create a new index each day, week, or month and archive previous ones
    • Delete stale indices to enforce data retention standards

You can create and manage index lifecycle policies through Kibana Management or the ILM APIs.”

This is exactly what we need to avoid our disk issues.
We would like to spin up a new index when an index reaches a certain size (I) and delete stale indices (II).

The first requirement (I) can be met with the rollover strategy, which: “Rolls over a target to a new index when the existing index meets one or more of the rollover conditions” [3]; the second requirement (II) can be met by defining a policy. In the policy, we can define states, actions, and transitions.
The indexes associated with the policy start in the default state, run the actions defined for that state, and then evaluate its transition conditions. If a condition is true, the index moves to the new state, whose actions and transitions are executed in turn.

From the Elasticsearch documentation: “An index’s lifecycle policy specifies which phases are applicable, what actions are performed in each phase, and when it transitions between phases” [4].

Let’s walk through the steps needed to create a policy that automates index rollover and deletion. For our example, we are using Elasticsearch 7.10 managed by Amazon, which includes the Open Distro plugins.
We use Kibana for some of these steps.


1 – Create an index template for Rollover

This template is the one used at rollover time: it defines the settings that each newly created index will have.

    • index_patterns defines which index names the template applies to. In this case, it matches every index whose name starts with index_name- followed by an incremental number.
    • index.opendistro.index_state_management.rollover_alias is the name of the rollover alias. This is associated with the current active index (the one used to index new documents) and will be moved to the new index after the rollover is done.

Here is the request to create the index template:

PUT /_index_template/template-name
{
  "index_patterns": ["index_name-*"],
  "template": {
    "settings": {
      "index.opendistro.index_state_management.rollover_alias": "rollover-alias-name"
    }
  }
}
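A quick way to check which index names the template will pick up is to test them against index_patterns locally. The sketch below is an approximation under our own assumptions: template_applies is a hypothetical helper, and Python's fnmatchcase also understands ?, [ and ], which Elasticsearch patterns do not, so it is only meaningful for patterns built with *.

```python
from fnmatch import fnmatchcase

# The index template above, as a Python dict.
TEMPLATE = {
    "index_patterns": ["index_name-*"],
    "template": {
        "settings": {
            "index.opendistro.index_state_management.rollover_alias": "rollover-alias-name"
        }
    },
}


def template_applies(template: dict, index_name: str) -> bool:
    """Approximate check: does index_name match one of the template's index_patterns?

    Hypothetical helper; only reliable for patterns that use the * wildcard.
    """
    return any(fnmatchcase(index_name, pattern) for pattern in template["index_patterns"])


print(template_applies(TEMPLATE, "index_name-000002"))  # True
print(template_applies(TEMPLATE, "index-name-000002"))  # False: hyphen instead of underscore
```

The second call shows why the exact spelling of the index name matters: a single character difference means the template (and its rollover alias setting) is not applied.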

N.B. If you are not using Open Distro, these are the equivalent index settings:
   Elasticsearch: index.lifecycle.rollover_alias
   OpenSearch: index.plugins.index_state_management.rollover_alias


2 – Create a policy

The second step is the policy definition.
You can add it in Kibana under:

Kibana → Index Management → State management policies → Create

At this point, a Policy ID is required, together with the policy definition.

Here is an example:

{
    "policy_id": "rollover_policy",
    "description": "Rollover policy for index_name-* indexes.",
    "last_updated_time": 1650529799079,
    "schema_version": 1,
    "error_notification": null,
    "default_state": "hot_state",
    "states": [
        {
            "name": "hot_state",
            "actions": [
                {
                    "rollover": {
                        "min_doc_count": 100
                    }
                }
            ],
            "transitions": [
                {
                    "state_name": "warm_state"
                }
            ]
        },
        {
            "name": "warm_state",
            "actions": [],
            "transitions": [
                {
                    "state_name": "delete_state",
                    "conditions": {
                        "min_index_age": "60d"
                    }
                }
            ]
        },
        {
            "name": "delete_state",
            "actions": [
                {
                    "delete": {}
                }
            ],
            "transitions": []
        }
    ],
    "ism_template": [
        {
            "index_patterns": [
                "index_name-*"
            ],
            "priority": 0,
            "last_updated_time": 1650470897111
        }
    ]
}
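Before pasting a policy like this into Kibana, it can be useful to run a local sanity check that the default state exists and that every transition points to a defined state. The Python sketch below does exactly that on a trimmed copy of the policy above; policy_problems is a hypothetical helper of ours, not part of Open Distro.

```python
# Trimmed copy of the policy above (metadata fields omitted).
POLICY = {
    "policy_id": "rollover_policy",
    "default_state": "hot_state",
    "states": [
        {"name": "hot_state",
         "actions": [{"rollover": {"min_doc_count": 100}}],
         "transitions": [{"state_name": "warm_state"}]},
        {"name": "warm_state",
         "actions": [],
         "transitions": [{"state_name": "delete_state",
                          "conditions": {"min_index_age": "60d"}}]},
        {"name": "delete_state",
         "actions": [{"delete": {}}],
         "transitions": []},
    ],
}


def policy_problems(policy: dict) -> list[str]:
    """Return a list of structural problems found in a state-based policy.

    Hypothetical helper: checks only that default_state and every
    transition target refer to states defined in the policy.
    """
    known = {state["name"] for state in policy.get("states", [])}
    problems = []
    if policy.get("default_state") not in known:
        problems.append(f"default_state {policy.get('default_state')!r} is not defined")
    for state in policy.get("states", []):
        for transition in state.get("transitions", []):
            target = transition.get("state_name")
            if target not in known:
                problems.append(f"{state['name']} transitions to undefined state {target!r}")
    return problems


print(policy_problems(POLICY))  # []
```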

This policy has 3 states: hot_state, warm_state, and delete_state.
Each new index starts with the default state which is hot_state.
Inside hot_state, the rollover action is defined. It is executed when the min_doc_count condition is met, i.e., when the index contains at least 100 documents.
After all the actions in hot_state have completed, the transitions are evaluated. The hot_state transition has no condition, so the index moves straight to warm_state.
At this point, the newly created index (in hot_state) is the one that receives new documents, while the previous index (now in warm_state) evaluates the actions and transitions of its newly assigned state.
warm_state defines no actions, so we go directly to its transitions: the index moves to delete_state once min_index_age is reached, i.e., 60 days after the index was created.
Once in delete_state, the index is deleted.
ism_template defines the index name pattern that identifies the indexes the policy is automatically applied to; this policy will therefore be attached to every newly created index whose name matches index_name-*.
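The life of a single index under this policy can be pictured as a small state machine. The Python sketch below is a simplified model of the example policy only, not how the ISM plugin is implemented (ISM evaluates managed indexes periodically in a background job); next_state and the two threshold constants are our own hypothetical names, with values copied from the policy.

```python
# Thresholds copied from the example policy (hypothetical constant names).
ROLLOVER_MIN_DOC_COUNT = 100  # hot_state rollover condition (min_doc_count)
DELETE_MIN_AGE_DAYS = 60      # warm_state -> delete_state condition (min_index_age "60d")


def next_state(state: str, doc_count: int, age_days: float) -> str:
    """Return the state an index is in after one evaluation of the example policy.

    Simplified model: each call runs the current state's actions and then
    evaluates its transitions once.
    """
    if state == "hot_state":
        # The rollover action must succeed before transitions are evaluated;
        # the transition to warm_state itself has no condition.
        if doc_count >= ROLLOVER_MIN_DOC_COUNT:
            return "warm_state"
        return "hot_state"
    if state == "warm_state":
        # No actions: only the min_index_age transition is evaluated.
        if age_days >= DELETE_MIN_AGE_DAYS:
            return "delete_state"
        return "warm_state"
    if state == "delete_state":
        # The delete action removes the index; there are no transitions.
        return "delete_state"
    raise ValueError(f"unknown state: {state!r}")


# Follow one index: it fills up, rolls over, ages, and is finally deleted.
state = "hot_state"
for doc_count, age_days in [(40, 1), (120, 2), (120, 30), (120, 61)]:
    state = next_state(state, doc_count, age_days)
    print(age_days, state)
```

With these inputs the index stays in hot_state on day 1, rolls over into warm_state on day 2, is still warm on day 30, and reaches delete_state on day 61.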


3 – Create the index (or reindex an existing one)

We can now create the first index. In order to automatically attach the policy to this index, it is important that its name matches the index pattern defined in the ism_template part of the policy.
In general, here is the name pattern an index should have in order to apply rollover:

^.*-\d+$

The name must end with a hyphen followed by digits because, at each rollover, the automatically created index takes the same name as the previous one, with the numeric part incremented by 1.
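The naming rule can be sketched in a few lines of Python. next_rollover_name is a hypothetical helper; the six-digit zero padding follows the Elasticsearch rollover API documentation, which states that the incremented number is always six characters long, and we assume the ISM rollover action produces names the same way.

```python
import re

# Same pattern as above: any prefix, a hyphen, then one or more digits.
ROLLOVER_NAME = re.compile(r"^(.*)-(\d+)$")


def next_rollover_name(index_name: str) -> str:
    """Return the name a rollover would give the next index.

    Hypothetical helper; assumes the number is zero-padded to six digits.
    """
    match = ROLLOVER_NAME.fullmatch(index_name)
    if match is None:
        raise ValueError(f"{index_name!r} does not match ^.*-\\d+$")
    prefix, number = match.groups()
    return f"{prefix}-{int(number) + 1:06d}"


print(next_rollover_name("index_name-000001"))  # index_name-000002
print(next_rollover_name("my-index-3"))  # my-index-000004
```

Names without a numeric suffix, such as index_name, raise a ValueError here, which mirrors why such indexes cannot be rolled over with this naming scheme.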

Here is an example of the index creation:

PUT /index_name-000001

4 – Create an alias

We can now associate an alias with the new index.
The alias is used during rollover, so it must be the same as the rollover_alias defined in the index template.

POST /_aliases
{
    "actions" : [
        { "add" : { "index" : "index_name-000001", "alias" : "rollover-alias-name" } }
    ]
}
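Because a mismatch between the alias and the template's rollover_alias is an easy mistake to make, a local consistency check can help. The Python sketch below is a hypothetical helper of ours: it compares the add action in the _aliases request body with the rollover_alias setting of the template from step 1.

```python
ROLLOVER_ALIAS_SETTING = "index.opendistro.index_state_management.rollover_alias"

# The index template from step 1 and the _aliases request body above.
TEMPLATE = {
    "index_patterns": ["index_name-*"],
    "template": {"settings": {ROLLOVER_ALIAS_SETTING: "rollover-alias-name"}},
}
ALIASES_BODY = {
    "actions": [
        {"add": {"index": "index_name-000001", "alias": "rollover-alias-name"}}
    ]
}


def alias_matches_template(template: dict, aliases_body: dict, index: str) -> bool:
    """Return True if aliases_body adds the template's rollover alias to index.

    Hypothetical helper for a local consistency check.
    """
    expected = template["template"]["settings"][ROLLOVER_ALIAS_SETTING]
    for action in aliases_body.get("actions", []):
        add = action.get("add", {})
        if add.get("index") == index and add.get("alias") == expected:
            return True
    return False


print(alias_matches_template(TEMPLATE, ALIASES_BODY, "index_name-000001"))  # True
```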

The alias can also be specified when creating the index, using the related parameter in the request body:

PUT /index_name-000001
{
  "settings": {
    ...
  },
  "aliases": {
    "rollover-alias-name": {}
  },
  "mappings": {
    ...
  }
}

5 – Attach the policy to the index (if necessary)

As the final step, we need to attach the policy to the newly created index.
This happens automatically if the policy is created before the index; otherwise, we need to attach the policy manually.
This step can also be done in Kibana, through the Index Management section:

Kibana → Index Management → Indices → Select the index from the list → Apply policy → Select the policy from the list
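The manual attachment can also be done through the ISM API instead of Kibana. The Python sketch below only builds the request for the add policy endpoint (POST _opendistro/_ism/add/<index> with a policy_id body, or _plugins/_ism/add/<index> on OpenSearch); add_policy_request and the flavor switch are hypothetical names of ours, and the actual HTTP call is left to your client.

```python
import json

# ISM endpoint prefixes for Open Distro and OpenSearch.
PLUGIN_PREFIXES = {
    "opendistro": "_opendistro/_ism",
    "opensearch": "_plugins/_ism",
}


def add_policy_request(index: str, policy_id: str,
                       flavor: str = "opendistro") -> tuple[str, str, str]:
    """Build (method, path, body) to attach an ISM policy to an index.

    Hypothetical helper: it does not send the request.
    """
    prefix = PLUGIN_PREFIXES[flavor]
    body = json.dumps({"policy_id": policy_id})
    return "POST", f"/{prefix}/add/{index}", body


print(add_policy_request("index_name-000001", "rollover_policy"))
# ('POST', '/_opendistro/_ism/add/index_name-000001', '{"policy_id": "rollover_policy"}')
```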

Summary

In this blog post, we have seen what an index write disk space issue is and how to solve it.

We presented two solutions:

    1. Temporary solution: we explained how to free space by manually removing the index block and deleting unneeded documents. This solution does not prevent the error from appearing again.
    2. Index management solution (with rollover): we explained how to manage the index automatically so that the error does not appear again. We defined a policy that manages a set of indexes (identified by an index name pattern) and rolls them over when they reach a certain size (in our example, a document count). The policy also deletes older indexes to free space.

Thanks for reading and see you in the next blog post!


// STAY ALWAYS UP TO DATE

Subscribe to our newsletter

Did you like this post about Elasticsearch Disk Space Issue and Rollover Solution? Don’t forget to subscribe to our Newsletter to stay always updated on the Information Retrieval world!

Author

Anna Ruggero

Anna Ruggero is a software engineer passionate about Information Retrieval and Data Mining. She loves to find new solutions to problems, suggesting and testing new ideas, especially those that concern the integration of machine learning techniques into information retrieval systems.
