
Introduction to Property Graphs Using Python With Neo4j

This blog post is a result of our collaboration with the University of Padua, wherein the student Jean Xavier Marie Lechevalier played a significant role in selecting the topic and contributing a major portion of the content.

This blog post introduces property graphs and gives readers the understanding and practical knowledge needed to use them for data modelling in Python.

A property graph is a data model that represents a graph structure with nodes (also called vertices) and directed relationships (also called edges), where each node and relationship can carry a set of properties (key-value pairs) that provide additional information.
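To make the definition concrete, here is a minimal sketch of a property graph in plain Python, with no graph library involved. The `Node` and `Relationship` class names and their fields are illustrative assumptions made for this example, not part of any library used later in this post.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A vertex identified by a name, carrying key-value properties."""
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed edge from `source` to `target`, with a type and properties."""
    source: str
    target: str
    rel_type: str
    properties: dict = field(default_factory=dict)


# Two nodes, each with its own properties
nodes = {
    "Max": Node("Max", {"age": 20, "gender": "male"}),
    "Alice": Node("Alice", {"age": 22, "gender": "female"}),
}

# Direction matters: Max -> Alice and Alice -> Max are two distinct relationships
relationships = [
    Relationship("Max", "Alice", "knows"),
    Relationship("Alice", "Max", "knows"),
]

print(nodes["Alice"].properties["age"])  # 22
outgoing = [r.target for r in relationships if r.source == "Max"]
print(outgoing)  # ['Alice']
```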

Property graphs are useful because they can represent complex relationships between entities in an intuitive way. They are often used in applications that need to model and analyze connections between entities, such as social networks, recommendation systems, and supply chain networks.

Additionally, they can represent many-to-many relationships directly, whereas a traditional relational database needs join tables to model them. As they are often easy to understand and work with, they are a popular choice for data modelling.

In Python, property graphs can be built with both NetworkX and Pyprograph, two libraries for manipulating graphs.

NetworkX is a more general-purpose library that allows the manipulation of graphs, including property graphs, whereas Pyprograph was specifically designed for property graphs. In this post, we will use NetworkX, but keep in mind that both libraries work very similarly.

Let’s create a simple property graph of three people, Max, Alice and Bob, connected by “knows” relationships.

This can be achieved in 3 simple steps:

1. Creating an empty graph
2. Adding nodes (and their properties) to the graph
3. Adding relationships (and their properties) to the graph

import networkx as nx

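# Create a directed graph object (relationships in a property graph have a direction,
# so Max -> Alice and Alice -> Max must be kept as two separate edges)
G = nx.DiGraph()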

# Add the nodes to the graph, with properties
G.add_node("Max", age=20, gender="male")
G.add_node("Alice", age=22, gender="female")
G.add_node("Bob", age=21, gender="male")

# Add the edges to the graph
G.add_edge("Max", "Alice", label="knows")
G.add_edge("Alice", "Max", label="knows")
G.add_edge("Alice", "Bob", label="knows")

The Python code snippet above uses the NetworkX library to create a graph object called G.
It adds nodes to the graph representing individuals named “Max”, “Alice” and “Bob”, with associated properties such as age and gender. The add_edge method is then used to connect the nodes with “knows” relationships: from Max to Alice, from Alice to Max, and from Alice to Bob.
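One detail worth checking is whether the graph class keeps the direction of an edge. The sketch below, which only assumes that NetworkX is installed, adds the same three edges to an undirected `nx.Graph` and to a directed `nx.DiGraph` and compares what each one stores.

```python
import networkx as nx

edges = [("Max", "Alice"), ("Alice", "Max"), ("Alice", "Bob")]

undirected = nx.Graph()
directed = nx.DiGraph()
for source, target in edges:
    undirected.add_edge(source, target, label="knows")
    directed.add_edge(source, target, label="knows")

# nx.Graph treats Max-Alice and Alice-Max as the same edge
print(undirected.number_of_edges())  # 2
# nx.DiGraph keeps them as two separate directed edges
print(directed.number_of_edges())    # 3

# Edge properties are read back with the (source, target) pair
print(directed.edges["Alice", "Bob"]["label"])  # knows
```

Since relationships in a property graph, and in Neo4j, always have a direction, the directed class is the closer match for the model described in this post.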

Now that our property graph is created, we could do some interesting manipulations like finding the neighbour(s) of a specific node. This can be achieved in the following way:

# Find Alice's neighbors
neighbors = G.neighbors("Alice")

# Print the names of Alice's neighbors
for neighbor in neighbors:
    print(neighbor)  # Outputs Max and Bob
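On a directed NetworkX graph, `neighbors` follows outgoing edges only. As a small sketch (rebuilding the same three edges so that it runs on its own), `successors` and `predecessors` make the direction explicit:

```python
import networkx as nx

G = nx.DiGraph()
G.add_edge("Max", "Alice", label="knows")
G.add_edge("Alice", "Max", label="knows")
G.add_edge("Alice", "Bob", label="knows")

# People Alice knows (outgoing edges); same nodes as G.neighbors("Alice")
knows = sorted(G.successors("Alice"))
# People who know Alice (incoming edges)
known_by = sorted(G.predecessors("Alice"))

print(knows)     # ['Bob', 'Max']
print(known_by)  # ['Max']
```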

Although Python libraries like NetworkX offer many useful methods for querying a graph, they work on in-memory data and have no dedicated query language. For more demanding work, a graph database management system such as Neo4j, together with its query language Cypher, is a better fit.
Using such tools allows us to run more complex queries against our Python property graph.

To connect to a local Neo4j database from Python, you can use the py2neo library.
Once connected, you will be able to run Cypher queries directly from your Python code.
(Note that we assume here that you have already created an empty database in Neo4j.)

from py2neo import Graph

# You should replace "bolt://localhost:7687" with the correct Bolt URI for your database (DB)
# You should replace "neo4j" and "password" with the correct credentials for your DB

graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
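Hard-coding credentials is fine for a local test, but a common alternative is to read them from environment variables. The following is a minimal sketch; the variable names `NEO4J_URI`, `NEO4J_USER` and `NEO4J_PASSWORD` and the helper `load_neo4j_settings` are hypothetical choices made for this example, not something py2neo looks up by itself.

```python
import os


def load_neo4j_settings(env=None):
    """Return (uri, (user, password)) read from the environment.

    Falls back to the local defaults used in this post when a variable
    is unset. The variable names are illustrative assumptions.
    """
    env = os.environ if env is None else env
    uri = env.get("NEO4J_URI", "bolt://localhost:7687")
    user = env.get("NEO4J_USER", "neo4j")
    password = env.get("NEO4J_PASSWORD", "password")
    return uri, (user, password)


# Pass an explicit mapping here so the example does not depend on the real environment
uri, auth = load_neo4j_settings({"NEO4J_PASSWORD": "s3cret"})
print(uri)   # bolt://localhost:7687
print(auth)  # ('neo4j', 's3cret')
```

The two values can then be handed to the connection call shown above, as in `Graph(uri, auth=auth)`.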

At this point, we have our graph in memory in our Python script and have established the connection to Neo4j.
What we need now is to create the graph's nodes and relationships in the Neo4j database. This can also be done with the py2neo library.

We can access the node’s name and properties in this way:

for node_id, node_data in G.nodes(data=True):
    print(node_id, node_data)

# Will output:
#Max {'age': 20, 'gender': 'male'}
#Alice {'age': 22, 'gender': 'female'}
#Bob {'age': 21, 'gender': 'male'}

Similarly for edges:

for source, target, edge_data in G.edges(data=True):
    print(source, target, edge_data)

# With a directed graph (nx.DiGraph), this will output:
#Max Alice {'label': 'knows'}
#Alice Max {'label': 'knows'}
#Alice Bob {'label': 'knows'}

We are now ready to create the nodes and edges in the Neo4j database.
First we create the nodes, then the edges, looking up the two endpoints of each edge with py2neo's node matching. Note that the first() method evaluates the match and returns the first matching node:

from py2neo import Node, Relationship

for node_id, node_data in G.nodes(data=True):
    # Create a Neo4j node object
    node = Node("Person", name=node_id, **node_data)
    
    # Create the node in the Neo4j db
    graph.create(node)

for source, target, edge_data in G.edges(data=True):
    # Look up the nodes in the Neo4j db
    a = graph.nodes.match("Person", name=source).first()
    b = graph.nodes.match("Person", name=target).first()
    
    # Create a relationship between the nodes
    rel = Relationship(a, "knows", b, **edge_data)
    
    # Create the relationship in the Neo4j database
    graph.create(rel)

This will create a property graph in the Neo4j database with three nodes (Max, Alice, Bob) and three relationships (Max-knows-Alice, Alice-knows-Max, Alice-knows-Bob). Each node has properties (name, age, gender) that can be used to store additional data about the node. Indeed, executing MATCH (n) RETURN n (the Cypher equivalent of SQL's SELECT *) returns this whole graph.
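Note that running the two loops shown earlier a second time would create every node again, because each run builds fresh Node objects and creates them. One possible way around this, sketched below under assumptions, is to send parameterised Cypher MERGE statements instead. The helper names `node_merge_statement` and `edge_merge_statement` are hypothetical, and the sketch only builds the query text and parameters; it does not talk to a database.

```python
def node_merge_statement(name, properties):
    """Build a parameterised MERGE for one Person node."""
    query = "MERGE (p:Person {name: $name}) SET p += $props"
    return query, {"name": name, "props": dict(properties)}


def edge_merge_statement(source, target, rel_type="knows"):
    """Build a parameterised MERGE for one directed relationship.

    The relationship type is placed in the query text rather than passed
    as a parameter, so only trusted values should be used for rel_type.
    """
    query = (
        "MATCH (a:Person {name: $source}), (b:Person {name: $target}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"source": source, "target": target}


query, params = node_merge_statement("Max", {"age": 20, "gender": "male"})
print(query)   # MERGE (p:Person {name: $name}) SET p += $props
print(params)  # {'name': 'Max', 'props': {'age': 20, 'gender': 'male'}}

query, params = edge_merge_statement("Max", "Alice")
print(query)
print(params)  # {'source': 'Max', 'target': 'Alice'}
```

With py2neo, each pair could then be passed on as `graph.run(query, params)`. MERGE matches an existing node or relationship before creating one, which is what makes a repeated run harmless.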

We can then execute Cypher queries to retrieve a specific set of data from the graph.
As an example, let's find the people in the database who know exactly one person, i.e. the people who are the start node of exactly one “knows” relationship:

query = """
MATCH (p:Person)-[:knows]->(q:Person)
WITH p, COUNT(q) AS num_friends
WHERE num_friends = 1
RETURN p.name
"""
results = graph.run(query)

# Print the names of the people who know exactly one person
for record in results:
    print(record[0])  # Outputs "Max"

This code snippet demonstrates how to perform a simple graph query using Cypher within a Python script, allowing you to interact with the graph database and retrieve specific information based on your requirements.
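As a sanity check, the same question can be answered without a database. The sketch below recomputes the expected answer in plain Python from the three directed relationships described in this post; the `edges` list simply restates them and is not read from Neo4j.

```python
from collections import Counter

# The three directed "knows" relationships, as (source, target) pairs
edges = [("Max", "Alice"), ("Alice", "Max"), ("Alice", "Bob")]

# Count how many people each person knows (outgoing relationships only)
num_friends = Counter(source for source, _ in edges)

# Keep the people who know exactly one person
knows_exactly_one = sorted(
    name for name, count in num_friends.items() if count == 1
)
print(knows_exactly_one)  # ['Max']
```

Bob is absent from the result because he is never the start of a “knows” relationship, so neither the counter here nor the MATCH pattern in the Cypher query ever sees him.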

With the help of this simple example, you can now have fun creating more complex networks and executing more complex Cypher queries directly from your Python script!

As said before, property graphs are a powerful tool for storing and manipulating graph data, and the Python libraries cited in this post make it simple to create, store and query them. This post is above all an introduction; if you wish to explore the topic further, I strongly recommend the resources listed in the references below.

[1] Oracle Property Graph, “What are Property Graphs?”, Oracle, https://docs.oracle.com/en/database/oracle/property-graph/22.2/spgdg/what-are-property-graphs.html
[2] NetworkX, “Plot Simple Graph”, NetworkX, https://networkx.org/documentation/stable/auto_examples/basic/plot_simple_graph.html
[3] Py2neo, “Py2neo v2021.1 documentation”, https://py2neo.org/2021.1/
