
Our Berlin Buzzwords 2025

We were back at Berlin Buzzwords this month, and we enjoyed it so much that we can’t wait to share our experience with you! Here are the video recordings of our talks from the conference.

Thanks to our speakers Alessandro Benedetti, Anna Ruggero, Ilaria Petreti, and our friend Edward Lambe.

This blog post summarises their experiences from the June trip to Berlin for Berlin Buzzwords 2025, “Germany’s most exciting conference on storing, processing, streaming, and searching large amounts of digital data, with a focus on open source software projects.”

Alessandro Benedetti

DIRECTOR @ SEASE

I can’t deny Berlin Buzzwords rapidly climbed the ladder of my favorite conferences, representing a balanced mix of interesting talks, ideas, networking, and vibe.

The location is superb, the people you meet are world-renowned experts, the talks are thought-provoking, and the organisation is as close to perfection as a conference could aspire to.

We presented two talks:
Anna and Ilaria (+ Edward from the BIS client side) gave an overview of how to use LLMs to parse natural language queries, while I updated the community on a new Apache Solr feature that offers end-to-end semantic search (with vectorisation happening behind the scenes).
More on that will come with a separate blog post.

I won’t spend much time bragging about our own talks, but I have to say that I’m proud of the improvements we have made year after year; we now have a solid presence at the conference, and I hope our contributions are helping people (both in the audience and after the conference).
I’m grateful for the opportunity Berlin Buzzwords gives us every year and I’m honored to share it with my dear colleagues.

In collaboration with the Apache Solr PMC, we also presented a short survey on open source search technologies, which proved quite successful (results will be published soon).

If you haven’t filled it in yet, there’s still time:

So which talks did I like the most? Let’s see a short list!

Which GPU for Local LLMs?

This short talk from our friends Radu and Rafal explored how different GPUs perform for local LLM self-hosting. It’s a nice survey with many interesting insights, definitely worth a replay if you are passionate about the topic!

Performance Tuning Apache Solr for Dense Vectors

Another short talk entered my favorite list, and for a good reason: our friends from Bloomberg did an amazing job exploring an iterative approach to performance tuning for dense vector search, offering a pragmatic guide that covers their journey.

Kevin’s talk was smooth, and the slides were clear and concise: the perfect recipe for a resource that will be used by many as a reference for their dense vector Solr installation for the years to come.

Contexts & Machines: How Document Parsing Shapes RAG Results

Retrieval Augmented Generation was all over the place at the conference, showing strong interest from businesses and practitioners alike.

This talk was quite interesting, especially for its focus on chunking strategies and RAG evaluation; definitely worth a watch if you are exploring this new field.

Streamlining Search Quality: Search Relevance Workbench

Our friends Stavros and Eric are putting a lot of effort into providing the community with new tools to collect user interactions and explore/compare queries and results.
Measuring is fundamental to starting your journey toward search relevance improvements, and additional monitoring/exploration tools are only going to benefit the community.
On top of that, the Search Relevance Workbench promises not only to monitor and explore but also to act on any insights gathered from observation, with future machine learning and optimiser integrations.

Keep an eye on future OpenSearch releases!

miniCOIL: Sparse Neural Retrieval Done Right

Last but not least, my favorite talk from the conference.
Evgeniya did a brilliant job in portraying the idea and potential benefits of miniCOIL: an approach that encodes each term of a text into a mini vector embedding (4 dimensions in the talk).
The overall idea is that low-dimensional vectors can be enough to model shades of meaning for a single term in a sparse retrieval scenario, and this can bring interesting insights, performance benefits, and better out-of-domain adaptability.
I’ll definitely explore it more, stay tuned!
More on that: Website and GitHub

Anna Ruggero

R&D SOFTWARE ENGINEER @ SEASE

It’s that time of year! Berlin Buzzwords was here!

Once again this year, I had the privilege of attending this fantastic conference as a speaker. As usual, the location and organisation were wonderful. Berlin Buzzwords is always an opportunity for discussions with other experts in the field on innovative and challenging topics; it features new faces, as well as familiar ones that I look forward to seeing again.

This year, my colleague Ilaria, our client Edward from the Bank for International Settlements, and I had the chance to present our talk on how to exploit JSON structured outputs to implement a filter assistant.
The main idea is to exploit Large Language Models’ capabilities to select relevant information from the client’s available data, depending on the user’s natural language query. This is actually a work in progress with our BIS customer, which has proved very promising and which we are looking forward to bringing into production!
We had great involvement from the audience, and the presentation was followed by many interesting questions. We were very happy to share our work and gather suggestions and alternative solutions with those who attended!
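To give a feel for the idea, here is a minimal, purely illustrative sketch of the structured-output pattern: the LLM is asked to reply with JSON, and the application validates that reply against a whitelist of allowed filter fields and values before anything reaches the search engine. The field names and values below are hypothetical; the real SDMX schema used in the BIS project is not public.

```python
import json

# Hypothetical filter schema for SDMX-style statistical data;
# the actual fields and values in the BIS project differ.
FILTER_SCHEMA = {
    "country": {"US", "DE", "JP"},
    "frequency": {"monthly", "quarterly", "annual"},
}

def parse_llm_filters(llm_reply: str) -> dict:
    """Parse the model's JSON reply and keep only filters that are valid
    under the schema, so a hallucinated field or value can never reach
    the search engine."""
    try:
        candidate = json.loads(llm_reply)
    except json.JSONDecodeError:
        # The model did not return valid JSON: fall back to no filters.
        return {}
    return {
        field: value
        for field, value in candidate.items()
        if field in FILTER_SCHEMA and value in FILTER_SCHEMA[field]
    }

# The kind of reply a structured-output LLM might produce for
# "show me German monthly statistics":
reply = '{"country": "DE", "frequency": "monthly", "made_up": "x"}'
```

Here `parse_llm_filters(reply)` keeps the two valid filters and silently drops the invented `made_up` field, which is the safety property that makes this pattern production-friendly.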

Now, let’s see an overview of the talks I enjoyed the most 😀

I will be brief this year, but a looooot more would be worth mentioning in this list…

End-to-End Semantic Search with Apache Solr 9.8 LLM Module

Our colleague Alessandro had the opportunity to present his work on a Solr contribution introducing a module that allows Solr users to connect to external Large Language Model services to generate embeddings from text. Before this, users needed to generate vectors outside Solr and then index them directly. Now they can delegate this transformation to Solr, which will call the desired external service for this purpose.
Thank you, Alessandro, for your work and dedication to the community!
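For the curious, the flow looks roughly like the sketch below. The host, collection name, and model payload are hypothetical placeholders; the exact model-store JSON and parameter names should be verified against the Solr Reference Guide for 9.8+.

```shell
# Hypothetical sketch of the Solr 9.8 llm module flow; verify payloads and
# parameter names against the Solr Reference Guide before using.

# 1. Start Solr with the llm module enabled
SOLR_MODULES=llm bin/solr start

# 2. Register an external embedding model in the text-to-vector model store
#    (the "class" and "params" depend on the embedding provider you use)
curl -X PUT "http://localhost:8983/solr/mycollection/schema/text-to-vector-model-store" \
     -H "Content-Type: application/json" \
     --data '{"name": "a-model", "class": "...", "params": {}}'

# 3. Search with the query text vectorised on the fly
curl "http://localhost:8983/solr/mycollection/select?q={!knn_text_to_vector model=a-model f=vector topK=10}semantic search in solr"
```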

Contexts & Machines: How Document Parsing Shapes RAG Results

“Contexts & Machines: How Document Parsing Shapes RAG Results” from Alessio Vertemati and Andrea Ponti. In their talk, they presented a RAG pipeline developed to make information retrieval possible for complex structured documents such as PDFs. They compared different chunking strategies and showed how these influence RAG performance. It was great to see how chunking is used in different scenarios and how it performs, especially when moving into a production environment!

miniCOIL: Sparse Neural Retrieval Done Right

“miniCOIL: Sparse Neural Retrieval Done Right” from Evgeniya Sukhodolskaya. Here, Evgeniya presented her work at Qdrant on the implementation of a new lightweight model called miniCOIL. This is a sparse neural embedding model that creates 4-dimensional embeddings for each word stem; yes, you read that right, “4-dimensional”!
The idea is to use this model to weight terms in the BM25 computation, gaining the benefits of term disambiguation while still relying on the most widely used, high-performance scoring function.
I look forward to seeing how this work develops and gets applied to out-of-domain scenarios!
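The core idea can be illustrated with a toy sketch (this is not the actual miniCOIL model, just an illustration of the scoring principle): each term carries a small 4-dimensional “meaning” vector, and a term’s standard BM25 contribution is scaled by the similarity between its query-side and document-side vectors, so a match on the wrong sense of a word contributes little.

```python
import math

def bm25_weight(tf, doc_len, avg_len, df, n_docs, k1=1.2, b=0.75):
    """Standard BM25 contribution of one term in one document."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    return idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_len))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(x * x for x in v)))

def sense_weighted_score(query_terms, doc, stats, meaning):
    """BM25 where each matching term is scaled by the similarity of its
    query-side and document-side 4-d meaning vectors (toy illustration)."""
    score = 0.0
    for term in query_terms:
        tf = doc["tf"].get(term, 0)
        if tf == 0:
            continue
        w = bm25_weight(tf, doc["len"], stats["avg_len"],
                        stats["df"][term], stats["n_docs"])
        sim = cosine(meaning[("query", term)],
                     meaning[("doc", doc["id"], term)])
        score += w * sim
    return score

# "bank" in the financial sense (query) vs. river sense (d1) vs. financial (d2)
stats = {"avg_len": 10, "n_docs": 2, "df": {"bank": 2}}
meaning = {
    ("query", "bank"): [1, 0, 0, 0],
    ("doc", "d1", "bank"): [0, 1, 0, 0],  # river bank
    ("doc", "d2", "bank"): [1, 0, 0, 0],  # financial bank
}
d1 = {"id": "d1", "tf": {"bank": 2}, "len": 10}
d2 = {"id": "d2", "tf": {"bank": 2}, "len": 10}
```

With identical term frequencies and lengths, plain BM25 would score both documents equally; here `sense_weighted_score` ranks the financial-sense document above the river-sense one.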

How [not] to evaluate your RAG

“How [not] to evaluate your RAG” from Roman Grebennikov. His talks are always a guarantee! Finally, someone who presents what could go wrong when applying Retrieval Augmented Generation to real data. It was great to see the difficulties they encountered and how they managed to solve them! Lots of insights that we can all reflect on and draw inspiration from.

Thank you to all the Berlin Buzzwords staff and participants. I hope to see you again next year!

Ilaria Petreti

R&D SOFTWARE ENGINEER @ SEASE

I’m not dreaming… fourth year in a row speaking at Berlin Buzzwords, and I still can’t believe it.
We all knew the competition was even higher this year, with so many submissions. Honestly, I didn’t expect to be selected again… but it happened!

Being back in Berlin was amazing. The conference was, as always, super well-organised: great talks, good vibes, and the perfect setup to meet people and share ideas. I felt like part of a big family. You reconnect with the same group of people you’ve shared great moments with before, while also meeting new ones along the way. So grateful, and already looking forward to the next one!

Sease submitted two talks, and once again, both were accepted.
My colleague Anna and I presented a project we recently worked on: “AI-Powered Search Results Navigation with LLMs & JSON Schema”. We explored an AI Filter Assistant for statistical data (SDMX), showing how LLMs can be leveraged to suggest the best filters for a natural language query. This year, it was a real pleasure to be on stage together with our client (BIS), specifically with Edward Lambe.
During the talk, seeing the room gradually fill up was truly rewarding. We’re thankful to everyone who attended, asked questions, and showed interest in our work. It’s moments like these that motivate us to keep improving and make us feel part of a real community.

End-to-End Semantic Search with Apache Solr 9.8 LLM Module

Alessandro presented his talk “End-to-End Semantic Search with Apache Solr 9.8 LLM Module”, where he explored one of his latest open-source contributions. He showed how to configure Apache Solr to connect with external services for text embedding, enabling semantic encoding of both queries and documents directly within the Solr workflow. Atita Arora, our friend and the host of the sessions that day, introduced him with some lovely words. And honestly, he truly deserved them. If Solr is making real progress, a big part of it is thanks to his work, and it’s something everyone in the community can genuinely appreciate.

Performance Tuning Apache Solr for Dense Vectors

One of the talks I enjoyed the most was the one from Kevin Liang: “Performance Tuning Apache Solr for Dense Vectors”. He was excellent at communicating his points: clear, concise, and easy to follow. I took home several practical takeaways that I’m sure will come in handy for future projects and consulting work.

What you see is what you mean: intent-based e-commerce search

Another talk I appreciated was the one by the three guys from Otto, titled “What you see is what you mean: intent-based e-commerce search”. They covered various aspects of their journey, from early prototypes to a production-ready implementation, and shared valuable insights and challenges they faced in a high-volume e-commerce environment. Their use of LLMs followed an approach quite similar to ours, which made the session even more interesting. Talking to them afterward and realising we’re tackling similar problems was helpful; it’s always good to understand where you stand, whether you’re on the right track or if there’s something you could improve.

Most of the other talks I attended focused on hybrid search and RAG, clearly hot topics in the current search landscape. Some of them were truly engaging and full of valuable insights.
That said, one thing that stood out to me — and something to consider for future editions — is that many sessions spent quite a bit of time on the theoretical part, often repeating what had already been said in other talks. As a result, the most exciting parts, like real-world implementations, technical challenges, and production insights, were sometimes rushed or not explored in enough depth.
Of course, given the uncertainty about the audience’s background, it’s understandable that speakers chose to begin with high-level overviews to ensure everyone could follow. Still, finding a better balance between the introduction and the deep dive could make these talks even more impactful, especially for those eager to get into the details.

Did you attend the Berlin Buzzwords conference?

We would love to hear your thoughts on the conference! Leave a comment below and tell us about your experience and your favourite talks.

