Time Series Databases: A Hands-On Introduction With InfluxDB

This is one of several posts written in collaboration with the students participating in Sease’s Scientific Blog Posting seminar at the University of Padova. This post is written in collaboration with Mohsen Kordi.

Introduction to Time Series Data

Time series data management covers storing, analyzing, and visualizing time-stamped data. Time series data is collected and stored over time; examples include sensor, financial, and log data. It is often gathered at regular intervals, and the time dimension is essential to its analysis and visualization.

To work with time series data, we need a database that meets its specific needs, such as high-performance writes and reads and data retention policies. One popular option is InfluxDB, a robust open-source time series database that lets you easily store, query, and analyze data points collected at frequent intervals. You can therefore use it for monitoring systems, industrial automation, or IoT devices.

Time Series Data: Time Series DB or Relational DB Management System?

When it comes to time series data management, it is important to understand the difference between a time series database (TSDB) and a relational database management system (RDBMS). InfluxDB, as a time series database, excels in handling and analyzing time-stamped data, supporting high write loads, real-time data collection and analysis, and time-based querying. In contrast, traditional relational databases, such as MySQL or PostgreSQL, are not optimized for time series data. They are designed to handle structured data with fixed schemas and are not well suited to large volumes of time-stamped data.

Time Series DB Advantages

Here are a few reasons why time series databases are better suited for handling time series data than traditional relational databases:

High write loads: Time series databases are optimized for handling high write loads. Depending on hardware and configuration, they can ingest up to millions of data points per second. This makes them well-suited for use cases where real-time data collection is required, such as monitoring IoT devices.
Time-based querying: Time series databases provide powerful time-based querying capabilities, such as retrieving data by time range and performing time-based aggregations. This makes it easy to analyze and visualize time series data.
Scalability: Time series databases are designed to be horizontally scalable, allowing them to handle large amounts of data. This makes them well-suited for use cases where data volume is expected to grow over time.
Performance: Time series databases are optimized for time-based queries. Thus they can handle large amounts of data and return results quickly. This makes them well-suited for use cases where real-time data analysis is required.
Retention policies: Time series databases provide support for retention policies. Consequently, you can automatically expire old data and keep only the data that you need. This helps to keep the size of the database manageable and optimize query performance.
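
To make two of these ideas concrete, here is a minimal pure-Python sketch of a time-based aggregation (averaging values into one-minute windows) and of a retention rule (dropping points older than a cutoff). It only illustrates the concepts: the function names downsample_mean and apply_retention and the sample values are made up for this example, and a real time series database performs the same operations far more efficiently.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def apply_retention(points, now, keep):
    """Drop (timestamp, value) pairs older than `keep` relative to `now`."""
    cutoff = now - keep
    return [(ts, value) for ts, value in points if ts >= cutoff]


def downsample_mean(points, window_seconds):
    """Average (timestamp, value) pairs into fixed, epoch-aligned windows."""
    windows = defaultdict(list)
    for ts, value in points:
        epoch = int(ts.timestamp())
        window_start = epoch - epoch % window_seconds
        windows[window_start].append(value)
    return {
        datetime.fromtimestamp(start, tz=timezone.utc): sum(vals) / len(vals)
        for start, vals in sorted(windows.items())
    }


# Hypothetical sample data: five readings, one of them more than an hour old
now = datetime(2023, 1, 1, 12, 1, 0, tzinfo=timezone.utc)
points = [(now - timedelta(seconds=s), float(s)) for s in (5, 15, 65, 75, 4000)]

recent = apply_retention(points, now, keep=timedelta(hours=1))
per_minute = downsample_mean(recent, window_seconds=60)

for window_start, mean in per_minute.items():
    print(window_start.isoformat(), mean)
```

The reading that is 4000 seconds old falls outside the one-hour retention window and is discarded before the remaining four readings are averaged per minute.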

You can further explore the benefits of time series data management and discover why time series databases are a superior option for handling time-stamped data by reading here and here.

Setting Up InfluxDB for Time Series Data Management

Installing InfluxDB

In this blog post, we will focus on a popular time series database: InfluxDB. InfluxDB is an open-source database optimized for storing and querying time series data. It has a SQL-like query language and supports multiple data types, including strings, integers, and floats.

Here we would like to show you how to store CPU and RAM usage using InfluxDB. Before we can start storing and analyzing data, we need to set up the database. First, download the appropriate installer for your system. Then install InfluxDB by following the steps outlined in the InfluxDB documentation.

After installing InfluxDB, you will need to start the InfluxDB service. In PowerShell on Windows 11 machines, navigate to the installation folder (e.g. C:\Program Files\InfluxData\influxdb) and start InfluxDB by running the influxd daemon:

				
					./influxd

Installing InfluxDB-Python Library

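Next, you need to install the InfluxDB Python client library, which provides a Python interface for interacting with InfluxDB. The code in this post imports influxdb_client, which is provided by the influxdb-client package; the influxdb package is the older client for InfluxDB 1.x. You can install both using pip: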

				
					pip install influxdb
pip install influxdb-client

What is the InfluxDB Structure?

InfluxDB organizes data around three key concepts: measurements, tags, and fields.

Measurements: A measurement is a collection of data points that are stored together and can be queried as a group. Each measurement has a name, and the data points within a measurement are typically related in some way. For example, a measurement called “temperature” might contain data points representing the temperature at different locations and times.
Tags: Tags are indexed metadata associated with a data point. Thus, you can use tags to filter, search, and group data in an efficient way. Each data point in a measurement can have zero or more tags, and each tag has a key and a value. For example, a data point in the “temperature” measurement might have a tag called “location” with the value “New York”.
Fields: Fields are the actual data and contain the numerical or string values. Each data point in a measurement must have at least one field, and each field has a key and a value. For example, a data point in the “temperature” measurement might have a field called “value” with the value “72”. InfluxDB stores fields in a columnar format for fast data retrieval. Unlike tags, fields are not indexed, so filtering on tags is generally more efficient than filtering on field values.

With InfluxDB, you can not only store measurements, tags, and fields, but also timestamp each data point with nanosecond precision. Timestamps let you consistently organize and evaluate your data by the time it was generated or received, which streamlines analysis.

InfluxDB supports SQL-like querying and has a built-in HTTP API for easy data ingestion. It uses a data retention policy to automatically expire old data from the database, which can be specified at the time of database creation, or later on.
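
The pieces described above (measurement, tags, fields, and timestamp) come together in InfluxDB's line protocol, the text format used to ingest data points. The following sketch builds one such line by hand. The helper to_line_protocol is a hypothetical name introduced here for illustration, not part of any InfluxDB library, and it covers only the common escaping rules.

```python
def _escape(text, chars):
    """Backslash-escape each of the given special characters in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    """Render a field value using line protocol type conventions."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    # Anything else is written as a double-quoted string
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Build one line: measurement,tag=value field=value [timestamp]."""
    if not fields:
        raise ValueError("a data point needs at least one field")
    head = _escape(measurement, ", ")
    for key, value in sorted(tags.items()):
        head += f",{_escape(key, ',= ')}={_escape(str(value), ',= ')}"
    body = ",".join(
        f"{_escape(key, ',= ')}={_format_field(value)}"
        for key, value in fields.items()
    )
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"  # nanoseconds since the Unix epoch
    return line


line = to_line_protocol(
    "temperature",
    {"location": "New York"},
    {"value": 72.0},
    timestamp_ns=1672574400000000000,
)
print(line)
# temperature,location=New\ York value=72.0 1672574400000000000
```

Note how the space in the tag value is escaped, while the field value is written as a plain float.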

How to Interact With InfluxDB?

There are several ways to connect to and interact with InfluxDB. All of them provide a simple and easy-to-use API for connecting to InfluxDB, writing data, running queries, and managing databases and measurements:

InfluxDB-Python library: This is a Python library that allows you to interact with InfluxDB using Python code.
InfluxDB-JavaScript library: This JavaScript library allows you to interact with InfluxDB using JavaScript code.
InfluxDB REST API: The InfluxDB REST API allows you to interact with InfluxDB using HTTP requests.
InfluxDB CLI: The InfluxDB command-line interface (CLI) allows you to interact with InfluxDB using the command line.
InfluxDB UI: InfluxDB also provides a web-based UI that offers a graphical interface for interacting with InfluxDB. You can also use it to create visualizations of your data.
Other client libraries: InfluxDB also provides libraries for different programming languages such as Go, Java, .Net, and more for interacting with InfluxDB using those languages.
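
As an illustration of the REST API option listed above, the sketch below builds a write request for the InfluxDB 2.x /api/v2/write endpoint using only the Python standard library. The function name build_write_request and the token placeholder are assumptions made for this example; the organization and bucket names are the ones used later in this post. The request is only constructed here, since sending it requires a running InfluxDB instance.

```python
import urllib.parse
import urllib.request


def build_write_request(base_url, org, bucket, token, lines, precision="ns"):
    """Build (but do not send) a POST request carrying line protocol data."""
    query = urllib.parse.urlencode(
        {"org": org, "bucket": bucket, "precision": precision}
    )
    return urllib.request.Request(
        url=f"{base_url}/api/v2/write?{query}",
        data="\n".join(lines).encode("utf-8"),  # one data point per line
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
    )


request = build_write_request(
    base_url="http://localhost:8086",
    org="Unipd",
    bucket="cpu_ram_usage",
    token="<your-api-token>",  # placeholder, replace with a real token
    lines=["status,host=myLaptop cpu_usage=12.5"],
)
print(request.get_method(), request.full_url)

# To actually send it (needs a running InfluxDB instance):
# urllib.request.urlopen(request)
```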

In InfluxDB 2.x, the user interface is built into the server (Chronograf is the separate UI used with InfluxDB 1.x). To use it, simply navigate to http://localhost:8086 in your web browser.

Creating a Bucket

InfluxDB uses a structure called a bucket to store time series data. A bucket is a container that holds all the data, metadata, and indexes for a specific retention policy. In InfluxDB 2.0 and later versions, you must create a bucket before you can write data to it. You can use InfluxDB CLI, UI, or API to manage buckets.
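
For the API route, a bucket is created by sending a JSON document to the /api/v2/buckets endpoint. The sketch below only assembles that request, so it can be inspected without a server. The function name build_create_bucket_request, the placeholder organization ID and token, and the 30-day retention period are assumptions chosen for this example. Note that this endpoint expects the organization's ID rather than its name.

```python
import json
import urllib.request


def build_create_bucket_request(base_url, token, org_id, name, retention_days):
    """Build (but do not send) a request that creates a bucket with an expiry rule."""
    payload = {
        "orgID": org_id,
        "name": name,
        "retentionRules": [
            {"type": "expire", "everySeconds": retention_days * 24 * 3600}
        ],
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v2/buckets",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
    )


bucket_request = build_create_bucket_request(
    base_url="http://localhost:8086",
    token="<your-api-token>",  # placeholder
    org_id="<your-org-id>",    # placeholder, shown in the UI and CLI
    name="cpu_ram_usage",
    retention_days=30,
)
print(bucket_request.data.decode("utf-8"))

# To actually send it (needs a running InfluxDB instance):
# urllib.request.urlopen(bucket_request)
```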

Writing Real-Time Data to InfluxDB

Now that you have a database set up, you can start writing data to it. In this tutorial CPU and RAM usage are written to the cpu_ram_usage bucket. To do this, the psutil library is used. It provides an easy way to retrieve system information such as CPU and RAM usage in Python. First, you will need to install it by running the following command:

				
					pip install psutil

You can then use the following Python code to continuously write the current CPU and RAM usage percentages to the “cpu_ram_usage” bucket every 3 seconds:

				
import psutil
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS
import msvcrt
import time

# You can generate an API token from the "API Tokens Tab" in the UI
myToken = "ZuO6b6Km4B0FdM-iciPWghLftxIs5li7E7QMgGrNNZWznEyeSvRspocs57WKftejGnJ4uxPwZr5Cz7YKooufQQ=="
myOrg = "Unipd"
myBucket = "cpu_ram_usage"

# Connect to the InfluxDB client
client = InfluxDBClient(url="http://localhost:8086", token=myToken, org=myOrg)

write_api = client.write_api(write_options=SYNCHRONOUS)

while True:
    # Get the real-time CPU usage
    cpu_usage = psutil.cpu_percent()

    # Get the real-time RAM usage
    ram_usage = psutil.virtual_memory().percent

    # Prepare the data in InfluxDB line protocol format
    cpu_data = f'status,host=myLaptop cpu_usage={cpu_usage}'
    ram_data = f'status,host=myLaptop ram_usage={ram_usage}'

    print([cpu_data, ram_data])
    
    # Write the data to InfluxDB
    write_api.write(myBucket, myOrg, [cpu_data, ram_data])

    # Wait for 3 seconds before writing the next set of data
    time.sleep(3)

    # Press 'q' to stop
    if msvcrt.kbhit() and msvcrt.getch().decode() == 'q':
        break
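
The loop above relies on msvcrt, which is only available on Windows. As a hedged alternative, the sketch below separates the sampling loop from both psutil and the InfluxDB client, so it can run a fixed number of iterations on any platform and be tried without a database. The function collect and its parameters are hypothetical names introduced for this example.

```python
import time


def collect(read_sample, write_lines, host, samples,
            interval_seconds=3.0, sleep=time.sleep):
    """Take `samples` readings and hand each one to `write_lines` as line protocol."""
    for i in range(samples):
        cpu_usage, ram_usage = read_sample()
        write_lines([
            f"status,host={host} cpu_usage={cpu_usage}",
            f"status,host={host} ram_usage={ram_usage}",
        ])
        if i < samples - 1:  # no need to wait after the last sample
            sleep(interval_seconds)


# Dry run with stand-ins for psutil and the InfluxDB write API
written = []
collect(
    read_sample=lambda: (12.5, 48.0),  # fixed fake readings
    write_lines=written.append,        # collect the lines instead of sending them
    host="myLaptop",
    samples=2,
    sleep=lambda seconds: None,        # skip the real delay in the dry run
)
print(written)
```

In the script above, read_sample could be `lambda: (psutil.cpu_percent(), psutil.virtual_memory().percent)` and write_lines could be `lambda lines: write_api.write(myBucket, myOrg, lines)`.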

Reading Data From InfluxDB

To read data from InfluxDB, you can use the InfluxDBClient query API to execute Flux queries (Flux is the query language used by InfluxDB 2.x). Here is an example of how to retrieve the last 15 minutes of saved data:

				
from influxdb_client import InfluxDBClient

# You can generate an API token from the "API Tokens Tab" in the UI
myToken = "ZuO6b6Km4B0FdM-iciPWghLftxIs5li7E7QMgGrNNZWznEyeSvRspocs57WKftejGnJ4uxPwZr5Cz7YKooufQQ=="
myOrg = "Unipd"
myBucket = "cpu_ram_usage"

# Connect to the InfluxDB client
client = InfluxDBClient(url="http://localhost:8086", token=myToken, org=myOrg)

# Execute the query
result = client.query_api().query('from(bucket:"cpu_ram_usage") |> range(start: -15m) |> filter(fn: (r) => r._measurement=="status")', myOrg)

# Print the results
for table in result:
    for record in table.records:
        print(str(record["_time"]) + "\t" + record.get_measurement() + "\t" +  str(record["host"]) + "\t" + record.get_field() + "\t" + str(record.get_value()) )
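
Flux queries are pipelines of stages joined by the |> operator. When the time range or the field of interest varies, the query string can be assembled from its parts, as in the sketch below. build_flux_query is a hypothetical helper written for this post, not part of the client library, and it interpolates values directly into the string, so it should only be given trusted input.

```python
def build_flux_query(bucket, measurement, minutes, field=None):
    """Assemble a Flux query for the last `minutes` minutes of a measurement."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    stages = [
        f'from(bucket: "{bucket}")',
        f"range(start: -{minutes}m)",
        f'filter(fn: (r) => r._measurement == "{measurement}")',
    ]
    if field is not None:
        # Optionally narrow the result to a single field, e.g. cpu_usage
        stages.append(f'filter(fn: (r) => r._field == "{field}")')
    return "\n  |> ".join(stages)


query = build_flux_query("cpu_ram_usage", "status", minutes=15, field="cpu_usage")
print(query)
```

The resulting string can be passed to the query API in place of the literal query, for example `client.query_api().query(query, myOrg)`.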

Visualization Using InfluxDB Dashboard

One of the advantages of InfluxDB is that it provides a web-based UI (User Interface) that allows you to interact with your data and create live graphs and dashboards.

The InfluxDB UI has a built-in visualization tool that lets users easily create and customize their charts and dashboards. You can use the Data Explorer section of the UI to explore your data, write and run queries, and create visualizations of your data. A variety of chart types are available, such as graph, heatmap, histogram, scatter, table, and band. It is also possible to customize the appearance of the charts using various options such as colors, labels, and axis ranges.

Once you have created a visualization, you can add it to a dashboard, which is a collection of one or more visualizations that you can view together. Dashboards are useful for monitoring and analyzing real-time data, as they allow you to view multiple charts at the same time and quickly switch between different views of your data.

Additionally, you can set alerts and notifications based on a specific condition in the data. This feature can be very useful in real-time monitoring and analytics.

Overall, the InfluxDB UI provides a convenient way to interact with your data and create live graphs and dashboards, which is particularly useful for monitoring and analyzing real-time data.

Summary of Time Series Data Management Using InfluxDB

In this tutorial, we have seen how to use InfluxDB to store and analyze real-time CPU and RAM usage data. This is just a taste of what is possible with InfluxDB and time series data. You can also use InfluxDB to store and analyze data from other sources, such as IoT devices, and use that data to power real-time analytics and monitoring applications.

Do You Want To Be Published?

This blog post is part of our collaboration with the University of Padua. If you are a University student or professor and want to collaborate, contact us through e-mail.
