SDMX GLOBAL CONFERENCE

Sease at SDMX Global Conference

SDMX Global Conference 2023

The Global Conference is a biennial event for the official statistics community worldwide to share information on recent and upcoming SDMX developments.

Location: The Art Hotel | Kingdom of Bahrain
Date: 29 October to 02 November 2023

// our talk

When SDMX meets AI: Leveraging open source LLMs to make official statistics more accessible and discoverable

29th of October, 11:45 AM (LOCAL TIME)

This intervention draws on ongoing experiments within the OECD-led Statistical Information System Collaboration Community (SIS-CC) to enable AI applications with SDMX. One important use case is using AI to make the data more accessible and discoverable: whilst UX techniques, lexical search improvements and data harmonisation can take statistical organisations to a good level of accessibility, a structural (or “cognitive”) gap remains between the needs of data users and the constraints of data producers. That is where AI, and most importantly NLP and LLM techniques, could make a difference. The “StatsBot” could be this natural-language, conversational engine that facilitates access to and use of the data, leveraging the semantics of any SDMX source.

The objective of the presentation is to propose a technical approach and a way forward to achieve this goal, and to create the StatsBot as a universal, open asset usable by all statistical organisations. In a first step, the concept being tested is to use Large Language Models together with the Apache Solr index of SDMX objects to transform natural language queries into SDMX queries. In a second step, the results could be framed as a natural language statement complementing the top-k search results. For the initial PoCs, which aim to demonstrate functional features and feasibility, a commercial LLM (such as OpenAI GPT-4) will be used; at a later stage, substitution with an open source LLM will be analysed. The presentation will include the results of the first experimental work and the lessons learnt, and will scope the future work that should define the path towards a production-grade, fully open source and universal StatsBot.
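
To make the first step more concrete, here is a minimal sketch (not the approach presented in the talk) of one piece such a pipeline might need: turning a dimension selection proposed by an LLM, for a dataflow retrieved from the search index, into an SDMX REST data query, while rejecting codes that do not exist in the dataflow's codelists. It assumes the SDMX 2.1 REST pattern `/data/{agency},{flowId},{version}/{key}`, where dimension values appear in DSD order separated by `.`, multiple codes for one dimension are joined with `+`, and an empty position acts as a wildcard. The `Dataflow` class, the `build_sdmx_data_url` helper, the dataflow and dimension ids and the `example.org` endpoint are all hypothetical; the actual structure of the SIS-CC Solr index is not described here.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class Dataflow:
    """Hypothetical, minimal view of a dataflow as it might be retrieved from a search index."""
    agency: str
    flow_id: str
    version: str
    # (dimension id, allowed codes) pairs, listed in the order defined by the DSD.
    dimensions: list[tuple[str, set[str]]]


def build_sdmx_data_url(
    base_url: str,
    flow: Dataflow,
    selection: dict[str, list[str]],
    start_period: str | None = None,
    end_period: str | None = None,
) -> str:
    """Build an SDMX REST data query URL from a dimension selection (e.g. parsed from an LLM answer).

    Dimensions absent from `selection` are left empty in the key, i.e. wildcarded.
    Unknown dimensions or codes raise ValueError instead of producing a query
    that would silently match nothing.
    """
    known_dims = {dim_id for dim_id, _ in flow.dimensions}
    unknown_dims = set(selection) - known_dims
    if unknown_dims:
        raise ValueError(f"unknown dimensions: {sorted(unknown_dims)}")

    key_parts = []
    for dim_id, allowed_codes in flow.dimensions:
        codes = selection.get(dim_id, [])
        invalid = [code for code in codes if code not in allowed_codes]
        if invalid:
            raise ValueError(f"invalid codes for {dim_id}: {invalid}")
        key_parts.append("+".join(codes))  # '+' means OR within one dimension
    key = ".".join(key_parts)  # '.' separates dimensions, in DSD order

    url = f"{base_url.rstrip('/')}/data/{flow.agency},{flow.flow_id},{flow.version}/{key}"
    params = {name: value for name, value in (("startPeriod", start_period), ("endPeriod", end_period)) if value}
    return f"{url}?{urlencode(params)}" if params else url


if __name__ == "__main__":
    flow = Dataflow(
        agency="OECD",
        flow_id="DF_EXAMPLE",  # hypothetical dataflow id
        version="1.0",
        dimensions=[("REF_AREA", {"FRA", "DEU", "ITA"}), ("MEASURE", {"UNEMP", "GDP"}), ("FREQ", {"A", "Q"})],
    )
    # e.g. "unemployment in France and Germany since 2020", after an LLM mapped it to codes
    selection = {"REF_AREA": ["FRA", "DEU"], "MEASURE": ["UNEMP"]}
    print(build_sdmx_data_url("https://example.org/sdmx/rest", flow, selection, start_period="2020"))
    # https://example.org/sdmx/rest/data/OECD,DF_EXAMPLE,1.0/FRA+DEU.UNEMP.?startPeriod=2020
```

Checking the LLM's proposed codes against the codelists before calling the API is a cheap way to make model errors explicit: an invented code fails loudly instead of returning an empty dataset that the bot might then misreport.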

// our speaker

Alessandro Benedetti

Founder @ Sease
APACHE LUCENE/SOLR COMMITTER
APACHE SOLR PMC MEMBER

Author

Lisa Biella

Lisa Biella is a creative digital marketer and a geek at heart, enthusiastic about technology and how it affects people’s lives.
