This year at Berlin Buzzwords we scored a hat trick! Here are the video recordings of our three talks at the conference. Thanks to our speakers Alessandro Benedetti, Anna Ruggero, Ilaria Petreti and Daniele Antuzi.
Hybrid Search With Apache Solr Reciprocal Rank Fusion
From Natural Language to Structured Solr Queries using LLMs
Blazing-Fast Serverless MapReduce Indexer for Apache Solr
This blog post summarises their experiences of visiting Berlin in June for Berlin Buzzwords 2024, “Germany’s most exciting conference on storing, processing, streaming, and searching large amounts of digital data, with a focus on open source software projects.”
Anna Ruggero
R&D SOFTWARE ENGINEER @ SEASE
My favourite event of the year!
It was the second time for me at Berlin Buzzwords and as usual, it exceeded my expectations!
I attended some great talks and had the opportunity to co-present one with my colleague Ilaria, “From Natural Language to Structured Solr Queries using LLMs”, but the part that I liked the most was networking. I had the pleasure of meeting and talking to a lot of people working on many different and interesting projects, and I had the chance to share approaches and points of view on several strategies and trends. Thank you!
Moving to my favourite talks, I cannot fail to mention those of my colleagues Alessandro and Daniele, who both gave amazing presentations. Alessandro presented his implementation of the Reciprocal Rank Fusion approach for hybrid search in Solr, while Daniele presented his implementation of a custom serverless map-reduce indexer to speed up the indexing process in Solr. Great work!
From Natural Language to Structured Solr Queries using LLMs
Starting with the talk Ilaria and I gave: what an unexpected turnout! A lot of people came and we received several very interesting questions. I am very grateful to have had the opportunity to present our work and hope it has provided interesting insights for the participants.
The Paradox of Open: Can Digital Commons Offer a Way Forward?
This was the opening presentation of the conference, by Zuzanna Warso. It was a really interesting talk that highlighted how the term “open source” has recently been misused, and the issues surrounding the use of users’ personal data for training language models. This is a very important topic that is often overlooked nowadays.
Improve LLM-based Applications with Fallback Mechanisms
Bilge Yücel presented different fallback strategies for RAG applications. This talk was very close to the topics and strategies we presented. It was interesting to see how Bilge addressed the issues we also encountered, and it provided good food for thought for further improvements.
Advanced Retrieval-Augmented Generation Techniques
Great presentation from Zain Hasan about advanced RAG techniques! A very useful in-depth overview of several approaches like chunking, filtering with metadata, hybrid search, query rewriting, fine-tuning and re-ranking. Something I certainly intend to explore further and test in the future.
Robust AI Search Ranking for Radical C2C Marketplace Growth
A really nice talk from Teo Narboneta Zosa and Chingis Oinar about one of my favourite topics: learning to rank. They covered interesting aspects of building the training dataset from implicit feedback and of evaluation metrics, with a very detailed and in-depth analysis of how relevance labels are generated. What I enjoyed the most, however, was the section on handling position bias, where several valuable techniques were presented.
Cracking the Code: Deciphering Evaluation Essentials for RAG
Finally, a big thank you to Atita Arora for her overview of available RAG evaluation tools and frameworks. This is an interesting starting point for choosing the most appropriate system for your use case.
ILARIA PETRETI
R&D SOFTWARE ENGINEER @ SEASE
Never two without three!
I still find it hard to believe this is my third consecutive year speaking at Europe’s biggest search conference, but it’s true! 🙂
Attending Berlin Buzzwords is always an incredible experience, and this year was no exception.
The conference started as usual on Sunday with the Barcamp, where the beautiful view from the KulturBrauerei’s terrace added charm to the evening, creating a perfect mix of professional and social interactions. The speakers’ dinner that followed was also enjoyable, providing an excellent opportunity to continue networking and engaging with many passionate search enthusiasts.
I really appreciated Xata organizing the Women’s Speaker Breakfast event on Monday morning. It was a fantastic opportunity to meet with other female speakers and “celebrate the brilliance and diversity of women in tech”.
For the first time, three talks from our company were accepted. What an honour!
From Natural Language to Structured Solr Queries using LLMs
This year, Anna and I had the pleasure of presenting on a different stage, Maschinenhaus, which was also very beautiful and intimate. We presented a project we worked on for some of our clients, focused on how to leverage large language models (LLMs) to transform natural language queries into structured Apache Solr queries. It’s always a joy to collaborate with Anna: we have established a good relationship and work very well together. I was thrilled with the audience’s engagement and their insightful questions. Thank you to everyone who attended and engaged with us!
Hybrid Search With Apache Solr Reciprocal Rank Fusion
Unfortunately, since Alessandro’s talk was scheduled immediately after ours, we arrived late and I couldn’t fully appreciate it. I am looking forward to watching the recording. I found the implementation and the discussion on the challenges of combining sets from different ranking systems very interesting. While it may not be the ultimate solution, it’s the simplest and quickest way to introduce this new feature in Apache Solr. For some clients, I had to manually implement hybrid search using Reciprocal Rank Fusion (RRF), so having this functionality integrated soon will be very beneficial. Thank you very much for your contribution!
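For readers unfamiliar with the idea, here is a minimal Python sketch of how Reciprocal Rank Fusion merges two ranked result lists. It is only an illustration, not Alessandro’s Solr implementation: the function name, the document ids and the two example lists are hypothetical, and `k=60` is simply the constant commonly used with RRF. Each document receives `1 / (k + rank)` from every list it appears in, and the contributions are summed.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores the sum of 1 / (k + rank) over the lists that
    contain it, where rank starts at 1. `k` is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a lexical query and a vector query.
lexical_results = ["doc1", "doc2", "doc3"]
vector_results = ["doc3", "doc1", "doc4"]

fused = reciprocal_rank_fusion([lexical_results, vector_results])
print(fused)
```

Because only rank positions are used, the lexical and vector scores never need to be normalised against each other, which is a large part of why this approach is simple to adopt.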
Blazing-Fast Serverless MapReduce Indexer for Apache Solr
Daniele also did an excellent job presenting his project on a blazing-fast serverless MapReduce indexer for Apache Solr. Taking a practical approach, he illustrated the entire pipeline he implemented, which the audience followed with enthusiasm, as it can be very useful for those working in similar scenarios and facing similar challenges.
At the end of our talks, we received numerous questions. The participants’ keen interest and their ongoing feedback made us realize that our presentations are consistently well received and appreciated. Good job guys!
Apart from our talks, on both the first and second days of the conference, there were other engaging presentations that I followed with great attention.
The Paradox of Open: Can Digital Commons Offer a Way Forward?
I enjoyed the opening keynote, which emphasized the importance of genuine openness in addressing digital challenges. It highlighted the problem of “open washing,” where companies falsely claim to be open and transparent while their practices do not reflect these values. The talk stressed the need for real transparency and regulation in the digital space, and introduced the concept of “rewilding the Internet” to create more open and public digital commons.
Robust AI Search Ranking for Radical C2C Marketplace Growth
At the conference, the focus on Learning To Rank (LTR) remained lively. This topic continues to be crucial and fascinating, as demonstrated by many talks. In particular, the one by Teo Narboneta Zosa and Chingis Oinar was highly informative. They discussed practical insights into dataset construction, custom metrics, model building, and de-biasing techniques, all aimed at keeping the system effective and performant over time.
Then I attended a series of talks on LLMs to see how others used them and what they did differently from us. Integrating large language models (LLMs) with search functionality is clearly a popular trend. I was pleased to see presentations showcasing implementations similar to ours, confirming that we are on the right track. The limitations and findings they shared also mirrored our own observations.
However, while the ideas are promising, it was evident that these projects are still in the prototype phase (like ours) and the systems are not yet robust enough for production.
Improve LLM-based Applications with Fallback Mechanisms
For example, as presented by Bilge Yücel, some fallback mechanisms, while innovative, still seem like expensive patches rather than comprehensive solutions to make the systems fully production-ready. This highlights the need for further refinement and more cost-effective approaches.
Retrieval-Augmented Generation (RAG) was also a buzzword this year. I attended interesting talks that explored advanced techniques and innovative integrations aimed at enhancing the effectiveness and capabilities of RAG systems.
In addition, the evaluation of LLMs and RAG systems has recently become a crucial area of focus. Two insightful talks (by Atita Arora and Petr Polezhaev) provided in-depth overviews of these topics, covering methodologies, tools, and the importance of robust evaluation processes.
Can ChatGPT build a Data Platform faster than a developer?
Finally, I also enjoyed the cool presentation by Chloé Caron. In a competition to build a data visualisation platform, she showed how a developer outperformed ChatGPT, completing the task in half the time. This demonstrates that although ChatGPT is a powerful tool, it lacks consistency when used alone, and that combining it with a developer’s expertise gives the best of both worlds. Chloé emphasised that ChatGPT is not yet ready to replace humans, but can significantly improve our daily work.
Last but not least, I want to highlight that during these four days, I thoroughly enjoyed working and spending quality time with my colleagues Alessandro, Anna, and Daniele, whom I usually only interact with through a webcam. It was a fantastic experience, allowing us to bond and do team building in person. I hope we get the chance to do it again soon.
I am sincerely grateful for the opportunity and happy to have participated in such extraordinary and cool events. I look forward to seeing you next year, hopefully!
DANIELE ANTUZI
R&D SOFTWARE ENGINEER @ SEASE
After two years, I had my second talk accepted at Berlin Buzzwords, making this my second experience at the conference.
Let’s go day by day
Unfortunately, on the first day I had some calls with our clients and missed a few talks, but I managed to get to the conference just in time to enjoy the talks of my colleagues Anna, Ilaria and Alessandro. I don’t want to judge their talks because, you know, they are my colleagues and I risk not being fair, but in both cases the audience showed their interest by asking questions.
After lunch I did some work for clients, but I managed to attend a couple of talks and spend some time with the other conference attendees.
Can ChatGPT build a Data Platform faster than a developer?
That afternoon I liked the talk by Chloé Caron, who proved to us that the developer’s job is not ready to die. She implemented a web page “manually”, using the tools she knows from experience, and then tried to do the same using ChatGPT without writing a single line of code. It turned out that using ChatGPT was a really frustrating experience, and it took her about double the time to achieve the same goal.
Under the hood of vector search with JVector
The day after, I loved the talk by Joel Knighton, who showed us how a fairly new field such as vector search has evolved, from the point of view of a JVector contributor. I took the opportunity to learn more about ANN (approximate nearest neighbor) vector search and compression techniques.
Lessons learned writing 10+ Kubernetes Operators
Another interesting talk was “Lessons learned writing 10+ Kubernetes Operators” by Lars Francke and Jannik Heyl. I found it interesting because it followed the opposite approach to a normal presentation: they didn’t tell us what their achievements were, but instead tried to teach us what not to do with Kubernetes operators.
Blazing-Fast Serverless MapReduce Indexer for Apache Solr
Finally it was time for my talk. I presented a project I implemented for one of our clients and showed the high-level idea behind designing an Apache Solr indexer using the MapReduce approach, implemented with serverless technologies on AWS. Many people came to me afterwards with questions about my implementation, which means they were interested in what I recently did.
Overall, I found the experience very positive because, apart from the interesting talks, I had the opportunity to meet people from all over the world. It was a pleasure to talk with them, because everybody has a different experience and you can always learn something from that.
Did you attend the Berlin Buzzwords conference?
We would love to hear your thoughts on the conference! Leave a comment below and tell us about your experience and your favourite talks.





