Neo4j Optimization Tips

This ‘tips and tricks’ blog post is the result of our collaboration with the University of Padua, in which student Riccardo Forzan played a significant role by selecting the topic and contributing a major portion of the content.

Neo4j is a popular graph database management system designed to store, manage, and query a growing volume of connected data using graph-based structures.

It implements native (“bare metal”) graph storage and uses index-free adjacency, so every node directly references its adjacent nodes.

This is critical for high-performance graph queries: because traversals follow these direct references, native graph queries and processing perform at a nearly constant rate per hop, regardless of the total size of the data.

This blog post covers optimizations for Neo4j, with a specific focus on two aspects:

  1. improving the efficiency of the queries issued to the database;
  2. optimizing the system on which Neo4j is executed to enhance its overall performance.

Optimizing queries

Query optimization is a crucial aspect of improving the performance and efficiency of a database system. Here you can find three strategies to achieve this:

1) Use functions provided by the DBMS to analyze your queries

To better understand your queries and make sure they run efficiently, you can analyze them using two keywords provided by Neo4j, EXPLAIN and PROFILE:

EXPLAIN

Prepend it to a query to see the execution plan without actually executing the query.

PROFILE

Prepend it to a query to execute it and track how the rows are passed through the operators.
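
For example, a minimal sketch assuming a hypothetical Person label and KNOWS relationship type (any Cypher query can be prefixed this way):

    // Show the planned operators without running the query
    EXPLAIN
    MATCH (p:Person {name: 'Alice'})-[:KNOWS]->(friend)
    RETURN friend.name

    // Execute the query and report rows and db hits per operator
    PROFILE
    MATCH (p:Person {name: 'Alice'})-[:KNOWS]->(friend)
    RETURN friend.name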

Leveraging these two Cypher (Neo4j’s query language) keywords can provide valuable insights into query execution, helping us understand possible performance problems and identify areas for improvement.

2) Reduce the working set of a query as soon as possible

To speed up queries you should aim to reduce the query working set as soon as possible.
How could this be done?

  • Move DISTINCT and LIMIT as early as possible in the query
  • Where possible, use collect() to aggregate intermediate results and reduce the number of rows processed during the rest of the execution (see the sketch after this list)
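
As an illustration of the second point, here is a minimal sketch assuming a hypothetical City/Person schema with LIVES_IN relationships (the labels and relationship type are illustrative assumptions):

    // Without aggregation: one row per (city, person) pair flows through
    // the rest of the query
    MATCH (c:City)<-[:LIVES_IN]-(p:Person)
    RETURN c.name AS city, p.name AS person

    // With collect(): one row per city, with inhabitants gathered into a list,
    // so later clauses have far fewer rows to process
    MATCH (c:City)<-[:LIVES_IN]-(p:Person)
    WITH c, collect(p.name) AS inhabitants
    RETURN c.name AS city, inhabitants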

Consider a query that accesses the name property for all the nodes found by the MATCH operation, and only at the end of the query returns the top 20 results in the result set.
How could this be optimized?

An optimized version accesses the name property only for the top 20 results. The benefits achieved through this optimization vary based on the size of the result set.
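
A minimal sketch of the two variants, assuming a hypothetical Person label with a name property (recent versions of the Cypher planner may already apply part of this rewrite automatically):

    // Unoptimized: the name property is read for every matched node,
    // and the LIMIT is applied only at the end
    MATCH (p:Person)
    WITH p.name AS name
    RETURN name
    LIMIT 20

    // Optimized: the LIMIT is applied first, so the property is read
    // for only 20 nodes
    MATCH (p:Person)
    WITH p
    LIMIT 20
    RETURN p.name AS name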

3) Use parameterized queries to leverage caching systems

When running a query in a database, if the requested data is not already in the cache, it must be loaded from the disk, making the initial execution of a new query relatively slower.
In subsequent executions of the same query, if data are available in the cache, response times are significantly reduced because reading from the cache is much faster than retrieving data from the disk.
In addition, any other query that requires the same data as the cached query can benefit from the data already in the cache.

When the Cypher engine receives a query string, it compiles an execution plan and stores this plan in the query cache.

Using parameterized queries instead of literal values allows the Cypher engine to reuse the precompiled execution plan, whereas with literal values each distinct value produces a new query string that has to be parsed and planned again.

This makes parameterized queries a more efficient choice, as they take advantage of the caching system to avoid unnecessary reprocessing of queries and improve overall query performance.
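
For example, a minimal sketch assuming a hypothetical Person label (in Neo4j Browser or cypher-shell, parameters can be set with :param, while drivers pass them alongside the query string):

    // Literal value: every distinct name produces a different query string,
    // so each variant is parsed and planned separately
    MATCH (p:Person {name: 'Alice'})
    RETURN p

    // Parameter: the query string never changes, so the cached plan is reused
    MATCH (p:Person {name: $name})
    RETURN p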

Optimizing the system

Optimizing the system on which the Database Management System (DBMS) runs involves various adjustments that improve the efficiency of the database, such as how random access memory (RAM) is used and how storage is configured.

Optimizing the usage of RAM

1) Optimizing the size of the page cache

As the name suggests, the page cache is a mechanism used to reduce the need for disk access by storing frequently accessed data in memory.

In an ideal scenario, the goal is to have the page cache large enough to hold the entire database, so as to significantly reduce disk reads. However, for very large databases, storing the entire dataset in memory may not be feasible due to memory constraints.

To determine the appropriate size for the page cache, you can inspect the actual memory usage of the system by running the command:

    neo4j-admin server memory-recommendation

This command provides recommendations regarding the configuration of memory parameters for your Neo4j DBMS, based on the available system resources and the characteristics of the database.
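
The recommended value can then be applied in neo4j.conf. A minimal sketch with a placeholder size (use the figure reported for your own system; setting name as in Neo4j 5.x):

    # neo4j.conf
    # Amount of memory dedicated to the page cache holding graph data and indexes
    server.memory.pagecache.size=8g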

2) Optimize the settings of the JVM

Neo4j (as the name may suggest) is developed in Java, which is a garbage-collected language. The garbage collector is responsible for freeing memory occupied by unused objects.

In the Neo4j configuration you can set two parameters that control the size of the heap:

    server.memory.heap.initial_size
    server.memory.heap.max_size

It’s generally recommended to set these two parameters to the same value, so that the heap is never resized at runtime, which would otherwise trigger stop-the-world garbage-collection pauses.

You can determine the value of those two parameters by looking at the history of resource usage.
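
A minimal sketch of the corresponding neo4j.conf entries, with placeholder values (base the actual figures on your own usage history or on the memory-recommendation output shown earlier):

    # neo4j.conf
    # Fixed-size JVM heap: initial and maximum set to the same value
    server.memory.heap.initial_size=4g
    server.memory.heap.max_size=4g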

Optimizing the configuration of storage

1) Use EXT4 or XFS, and avoid NFS or NAS file systems

NFS and NAS file systems do not provide reliable file locking, which can allow concurrent accesses to the store files and lead to corruption.

2) Store data and transaction logs on separate drives

Storing data and transaction logs on physically different drives is a recommended practice in database management to improve performance and reduce contention.
In this way, data read and write operations, along with transaction recording, can occur simultaneously, resulting in increased I/O efficiency and system responsiveness.
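
A minimal sketch of how this can be expressed in neo4j.conf, with placeholder paths (setting names as in Neo4j 5.x; check them against your version):

    # neo4j.conf
    # Keep store files and transaction logs on physically separate drives
    server.directories.data=/mnt/disk1/neo4j/data
    server.directories.transaction.logs.root=/mnt/disk2/neo4j/transactions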

Do You Want To Be Published?

This blog post is part of our collaboration with the University of Padua. If you are a University student or professor and want to collaborate, contact us through e-mail.
