
Build a Text Search API from a Postgres Database

This is one of several posts written by the students participating in Sease’s Scientific Blog Post seminar at the University of Padova. This post is written by Irfan Ullah Khan, Harjot Singh, Borwoei Huang, Andrea Ceccato, Louis Lions and Luca Fabbian.

Apache Solr, Elasticsearch, OpenSearch and Vespa are excellent technologies for building information retrieval systems, delivering impressive search results while providing ample opportunities for fine-tuning. However, using them requires setting up dedicated storage, making you responsible for managing and safeguarding your data, including replication and backup procedures. This can get rather complicated, especially when the data must stay in sync with your main database.

If you’re seeking a robust yet easy-to-set-up alternative, you’ve come to the right place. In this tutorial, we will guide you through building a web app that performs text searches within a database. We will first build the backend API, and then show how to call it from any web app you already have. Keep in mind that you can use any client in place of a web app, as long as it is able to perform HTTP requests. If you do not have a client yet, do not worry: you can download a full working demo from here.

In any case, the best part is that you won’t need to write a single line of code to build the backend and search API. All thanks to a tool called Postgraphile, which automatically generates an API from your Postgres database.

Postgraphile is an incredibly powerful tool. It enables you to automatically derive entire backends based on your database structure. With the postgraphile-plugin-fulltext-filter, you can extend Postgraphile’s functionalities and perform a full-text search on your textual data.
Following the instructions below, you will be able to set up your text search in three easy steps.

1) Prepare your database

Searching documents without preprocessing is inefficient. We first need to index them. This may be done directly inside Postgres with a few SQL statements.

First, set up a Postgres database and log in using the psql command line tool.
Then, make sure to load the uuid-ossp and pg_trgm extensions using the following statements:

				
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS "pg_trgm";

The uuid-ossp extension is required to work with Universally Unique Identifiers (UUIDs), while pg_trgm is needed to perform text comparison.

For this tutorial, we will create a simple table called “documents”.
Each document will have an id (a UUID generated through the uuid-ossp extension), a title, and a data field where we will store the full text of the document.
To index a text, we need to split it into normalized words and store them as a vector of lexemes (tsvector), so we will also add a tsvector_column column.

				
CREATE TABLE documents (
    id UUID DEFAULT uuid_generate_v4() PRIMARY KEY,
    title TEXT,
    data TEXT,
    tsvector_column TSVECTOR
);

Each time we add or edit a document, we should call the to_tsvector() function to populate the tsvector_column. This may be achieved automatically with Postgres triggers:

				
CREATE OR REPLACE FUNCTION update_tsvector_column()
  RETURNS TRIGGER AS $$
BEGIN
  NEW.tsvector_column := to_tsvector(NEW.data);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE TRIGGER documents_tsvector_trigger
  BEFORE INSERT OR UPDATE ON documents
  FOR EACH ROW
  EXECUTE FUNCTION update_tsvector_column();

Finally, to make our text searchable, let’s ask Postgres to create an index on the tsvector_column field, using the default Generalized Inverted Index (GIN):

				
CREATE INDEX full_text_idx ON documents USING gin(tsvector_column);

Now it’s time to add some data. For our demo, we used a Python script to scrape 10,000 pages from Wikipedia and load them into the table. Indexing this collection took about 30 seconds, even on a modest laptop with 4 GB of RAM and a 2.1 GHz octa-core CPU.
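Before loading thousands of pages, it is worth running a quick sanity check that the trigger works. A minimal sketch (the sample text below is our own, not part of the demo dataset):

```sql
-- Insert a row; the trigger should populate tsvector_column automatically.
INSERT INTO documents (data) VALUES ('Hello world, a tiny sample document.');

-- Inspect the generated vector and try a match against it.
SELECT tsvector_column,
       tsvector_column @@ to_tsquery('hello') AS matches
FROM documents
WHERE data LIKE 'Hello world%';
```

With the default text search configuration, the matches column should come back true.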

2) Set up the API

As promised, this step will require no code at all.

First, install Node.js. Node is a runtime for the JavaScript programming language and is required to run Postgraphile and the other JavaScript packages. It also comes with npm and npx, two tools for installing and running JavaScript libraries.
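You can check the installation from a terminal (the exact version numbers will vary; any reasonably recent release should work):

```shell
# Verify that Node.js and its package tools are available.
node --version
npm --version
npx --version
```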

Install Postgraphile and some required extensions with:

				
npm i -g postgraphile @graphile-contrib/pg-simplify-inflector postgraphile-plugin-connection-filter @pyramation/postgraphile-plugin-fulltext-filter

Finally, let’s start it:

				
npx postgraphile -n 0.0.0.0 --cors --legacy-relations omit --no-ignore-indexes --simple-collections omit --append-plugins @graphile-contrib/pg-simplify-inflector,postgraphile-plugin-connection-filter,@pyramation/postgraphile-plugin-fulltext-filter --enhance-graphiql --disable-default-mutations -w -c postgres://postgres:example@db:5432/postgres

Replace the connection string in the last argument, -c postgres://postgres:example@db:5432/postgres, with your own database credentials.

Now you have a server running on your local machine. You can point your browser to the local endpoint localhost:5000/graphiql to explore and debug your API. Postgraphile generates the API following GraphQL, a query language specification originally developed by Facebook. You don’t need to understand GraphQL to use it, as we will provide a query example in the next section.

Since we passed the --disable-default-mutations option, users won’t be able to modify our documents, only read them.
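You can also exercise the endpoint without a browser. The request below is a sketch, assuming the server from this step is listening on localhost:5000 and that the schema was generated from the documents table created in step 1:

```shell
# Ask the API how many documents match "hello" (requires the server to be running).
curl -s http://localhost:5000/graphql \
  -H 'Content-Type: application/json' \
  -d '{"query":"query($matches: String) { documents(filter: {tsvectorColumn: {matches: $matches}}, first: 3) { totalCount } }","variables":{"matches":"hello"}}'
```

The response is a JSON object mirroring the shape of the query, with the result under the data key.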

 

3) Connect your web app (or any client) to the API

To connect your web app to the API, you just need to perform an HTTP POST request. You can do so by defining the following function inside your website:

				
// GraphQL endpoint URL - adjust to your own endpoint
const url = `${window.location.protocol}//${window.location.hostname}:5000/graphql`;

window.search = (query) => fetch(url, {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
    },
    body: JSON.stringify({
        query: `
            query($matches: String) {
                documents(
                    filter: {tsvectorColumn: {matches: $matches}}
                    orderBy: TSVECTOR_COLUMN_RANK_DESC
                    first: 10
                ) {
                    nodes {
                        title
                        id
                        data
                    }
                    totalCount
                }
            }
        `,
        variables: {
            matches: query
        }
    })
}).then(response => response.json());

The function performs an API call and returns the first 10 results in order of relevance.

For example, let’s say you want to search for the term “hello” and log the results. You can just type:

				
search("hello").then(r => console.log(r));

If you are looking for a complete web page demo, check the index.html file.

 

Conclusion

In this post, we set up a text search using a Postgres database. As you’ve seen, the process is pretty straightforward and, once you get familiar with the tools involved, may be done in minutes.

We encourage you to explore what Postgraphile has to offer. Text search is just the tip of the iceberg. You may improve your API with token authentication, mutations, live updates, and much more.

You could also improve your text search by leveraging advanced indexing techniques such as stop words and stemming. Even though it does not give you the same freedom and tooling that come with Apache Solr, Postgres supports a great number of ways to customize your index for better search results. The purpose of this tutorial is not to delve into an exhaustive explanation of all the available tools; instead, we recommend referring to the official Postgres guide for a comprehensive overview that covers each tool in detail.

Do You Want To Be Published?

This blog post is part of our collaboration with the University of Padua. If you are a University student or professor and want to collaborate, contact us through e-mail.
