
Bridging the Gap: How LLMs are Transforming Traditional Enterprise Search


Our vision is clear: we want to talk to documents. Ask a simple question and get an accurate answer. No more typing keywords and scrolling through hit lists.

It has only been a year since ChatGPT blew us away. Its capabilities are astonishing, and we wonder how soon traditional enterprise search systems will be replaced by conversational systems.

In this article we will identify some challenges that must be solved before traditional enterprise search becomes a thing of the past, and we will point out a promising way to start experimenting with these new technologies early.

I have a strong background in information retrieval and have helped build several search engines in the past, so I have solid experience in this field.

When I first heard about chatbots, I did not immediately make the connection to search systems. Then I heard about Retrieval Augmented Generation (RAG), and of course I was curious about the new possibilities.

I explored whether RAG and LLM systems are capable of replacing current search engines, particularly in enterprise search.
What I found is a whole new world that unfortunately uses the same vocabulary as the traditional one. This makes it very challenging to explain traditional enterprise search to younger specialists who have a strong data science background.

And vice versa, it is challenging to explain to traditional search specialists how retrieval augmented generation really works.

Bag-of-Words vs Word Embeddings

Traditional systems follow the bag-of-words paradigm. Text is broken down into words, which are normalized into tokens to optimize search results. A very simple example: the user searches for ‘dog’ and expects documents containing ‘dogs’ to be found as well.
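
To make this concrete, here is a toy sketch of such token normalization in Python. It assumes the NLTK library and its Porter stemmer – a choice made purely for illustration, not a statement about any particular search product:

```python
# Toy sketch of bag-of-words token normalization (assumption: NLTK is installed).
# Real engines do this with configurable analyzer chains.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def tokenize(text: str) -> list[str]:
    # lowercase, split on whitespace, reduce each word to its stem
    return [stemmer.stem(word) for word in text.lower().split()]

print(tokenize("My dogs love other dogs"))  # ['my', 'dog', 'love', 'other', 'dog']
print(tokenize("dog"))                      # ['dog'] -> matches documents about 'dogs'
```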

What about multilingual documents? Do we expect all documents containing “Hunde” to be found when the user searches with the keyword “dog”?

A good traditional search engine may even provide this functionality, but to get there, a lot of configuration and evaluation work is required.

However, the bag-of-words approach only allows us to search for keywords – sometimes exact phrases. Good traditional systems go even further and find documents that do not contain the keyword itself, but related terms. This is achieved with knowledge graphs and similar knowledge structures. Still, it remains a bag of words.

The new LLM-based approach relies on word embeddings, which are able to capture much more context than a single word. To achieve this effect, we need a language model – an LLM.
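
As an illustration of what this looks like in code – a sketch assuming the sentence-transformers library and one of its freely available multilingual models, neither of which the article prescribes:

```python
# Sketch: turning text into dense vectors with a pre-trained language model.
# Assumptions: the sentence-transformers library and a multilingual MiniLM model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

sentences = [
    "The dog sleeps in the garden.",
    "Der Hund schläft im Garten.",   # the same meaning in German
    "The stock market fell sharply.",
]
embeddings = model.encode(sentences)         # one dense vector per sentence

# The English and German dog sentences end up close together in vector space,
# while the stock market sentence is far away.
print(util.cos_sim(embeddings[0], embeddings[1]))
print(util.cos_sim(embeddings[0], embeddings[2]))
```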

The creators of Eliza tried to build an intelligent chatbot back in 1964. From today’s perspective, it would be an exaggeration to call the approach used there an LLM. At the time, it seemed revolutionary.

Since then, a lot of work has been done on natural language processing, and today’s LLMs show previously unimaginable results. BERT was one of the first transformer-based language models to capture even long-range dependencies in text. BERT stands for Bidirectional Encoder Representations from Transformers and was released in October 2018. Only five years have passed since then.

I think it is fair to say that it is a very young technology. But it is developing very fast. This young age also implies that we lack reliable experience.

Similarity, Retrieval and Ranking

Retrieval describes the process of finding all documents that match the query, i.e. the user’s question.

Traditional systems are based on keyword search and the bag-of-words model.

  • First, all documents matching the query keywords are found, supported by optimized design features such as spell checking, knowledge graphs, etc.
  • Second, these documents are ranked using a specific algorithm to produce an ordered list – the best document on top.
  • Third, the list is presented to the user, at best with some text snippets and highlighted keywords.

BM25 is one of the ranking algorithms often implemented in traditional search engines. It is based on the frequency of terms, both in the retrieved documents and in the entire collection.
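
The core of BM25 fits into a few lines. The following sketch uses the common default parameters k1 = 1.5 and b = 0.75, which are an assumption for illustration, not the configuration of any particular engine:

```python
# Sketch of BM25 scoring; k1 and b are the usual defaults, chosen here as an assumption.
import math

def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_doc_len, k1=1.5, b=0.75):
    """Score one tokenized document against a tokenized query."""
    score = 0.0
    doc_len = len(doc_terms)
    for term in query_terms:
        tf = doc_terms.count(term)                 # term frequency in this document
        if tf == 0:
            continue
        df = doc_freq.get(term, 0)                 # number of documents containing the term
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return score

# Documents are ranked by this score; the highest-scoring document goes to the top of the list.
```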

The concept of cosine similarity has been around for a long time, even in traditional search.

  • We represent both as vectors: The query and all documents.
  • Then we compute the cosine of the angle between the query vector and each document vector.
  • The document with the largest cosine (i.e. the smallest angle) is the most relevant document with respect to the query.

Computing the cosine in a vector space is quite straightforward. However, computing it against every document in a long hit list in a sparse, high-dimensional vector space is time-consuming. For this reason, cosine similarity is not widely used in traditional search systems.

Dense vectors, however, have a much lower dimensionality, allowing vector products to be computed in reasonable time. Cosine similarity is therefore commonly used for dense vector retrieval.
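
With dense vectors, the computation is only a few lines of NumPy – a sketch, not tied to any particular engine:

```python
# Sketch: cosine similarity between one query vector and many document vectors.
import numpy as np

def cosine_similarity(query_vec: np.ndarray, doc_vecs: np.ndarray) -> np.ndarray:
    """Return the cosine of the angle between the query and every document vector."""
    query_norm = query_vec / np.linalg.norm(query_vec)
    doc_norms = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return doc_norms @ query_norm               # one cosine value per document

query = np.random.rand(384)                     # e.g. a 384-dimensional embedding
docs = np.random.rand(1000, 384)                # 1000 document (or snippet) embeddings
scores = cosine_similarity(query, docs)
top_10 = np.argsort(-scores)[:10]               # indices of the 10 most similar documents
```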

Vector Databases vs Inverted Index

We use a database to store our documents, index them, and search this document collection.

Traditional search systems rely on an inverted index and a forward index: they are the foundation of the bag-of-words model and of keyword search. Good data structures and retrieval algorithms have been developed over the past decades, and they power high-quality search engines even for millions of documents.
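
Conceptually, an inverted index is simply a mapping from token to the documents that contain it. A toy version might look like the sketch below; real engines add positions, field information, compression, and much more:

```python
# Toy inverted index: token -> set of document ids.
from collections import defaultdict

documents = {
    1: "the dog sleeps",
    2: "dogs love to play",
    3: "the cat sleeps",
}

inverted_index = defaultdict(set)
for doc_id, text in documents.items():
    for token in text.lower().split():          # in practice: the field's analyzer chain
        inverted_index[token].add(doc_id)

print(inverted_index["sleeps"])                 # {1, 3}
```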

Unfortunately, these index structures do not work with dense vectors. We need a new generation of databases: vector database management systems (VDBMS) are popping up like mushrooms, both as completely new systems and as extensions of existing databases.

New approximate nearest-neighbor algorithms allow vector search in roughly O(log n) time, n being the number of vectors in the database.
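
As a sketch of what such a vector index looks like in practice – here assuming the FAISS library with an HNSW graph index, which is only one of many possible choices:

```python
# Sketch: approximate nearest-neighbor search over dense vectors (assumption: FAISS is installed).
import numpy as np
import faiss

dim = 384
vectors = np.random.rand(100_000, dim).astype("float32")   # e.g. snippet embeddings

index = faiss.IndexHNSWFlat(dim, 32)        # HNSW graph index with 32 neighbors per node
index.add(vectors)                          # build the graph

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 10)    # 10 approximate nearest neighbors
print(ids[0])
```

This particular index uses Euclidean distance; for cosine similarity, the vectors would be normalized and an inner-product index used instead.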

Data Design for a Search System

High-quality traditional search engines allow for very sophisticated indexing: They look at the structure of the text in a collection and identify common elements as “fields”.
Examples of typical fields: Title, Authors, Date, Text, and a rich collection of metadata to enhance the quality of the retrieved documents.

For each field, we design the best processing chain to create an index for that field. This might include a tokenizer that splits text into tokens, removes stop words, expands synonyms, builds word stems, and enriches documents with metadata. And we define the corresponding query and retrieval processing to find the best results for the user’s query. In the end, the tokens in the documents are matched against the tokens in the query.
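
A much simplified, hypothetical processing chain for a text field might look like this; the stop word list and the synonym map are invented purely for illustration:

```python
# Much simplified per-field processing chain (stop words and synonyms are invented examples).
from nltk.stem import PorterStemmer

STOP_WORDS = {"the", "a", "of", "and"}
SYNONYMS = {"auto": "car"}                  # hypothetical synonym mapping

stemmer = PorterStemmer()

def analyze_text_field(text: str) -> list[str]:
    tokens = text.lower().split()                           # 1. tokenize
    tokens = [t for t in tokens if t not in STOP_WORDS]     # 2. remove stop words
    tokens = [SYNONYMS.get(t, t) for t in tokens]           # 3. map synonyms
    return [stemmer.stem(t) for t in tokens]                # 4. stem

# The query runs through the same chain, and its tokens are matched against the index.
print(analyze_text_field("The red auto of the dogs"))       # ['red', 'car', 'dog']
```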

With LLM-based retrieval, this seems to be a very different process, as the similarity between query and documents is calculated on the basis of dense vectors, which consist of numbers rather than tokens.

The question remains: Do we still rely on fields in a database for LLM-based retrieval? Or will we use a much more finely structured one, as described in the following chapter on Retrieval Augmented Generation?

If so, what are the advantages of LLM over traditional search?

The field is too young to answer this question definitively.

Strengths of LLM-Based Systems

There is one thing that LLM-based techniques allow that traditional systems do not: conversation with documents. No keywords and hit lists, but natural language questions and answers.

Together with speech-to-text and text-to-speech systems, we will be able to literally talk to documents, even in foreign languages, such as Chinese.

Retrieval Augmented Generation

How is this conversation being implemented? Let us take a look at what is currently being discussed as new solutions to the old retrieval problem.

  • The user enters a query in natural language and also provides a handful of documents, e.g. ten documents taken from some manual.
  • The system embeds the query to obtain a vector.
  • The system also reads the ten documents and splits them into appropriate snippets.
  • Each snippet is embedded to obtain a vector.
  • All these vectors are stored in a vector database.
  • The query vector is used to find the most similar snippets with some similarity measure, such as cosine similarity.
  • With the help of a generative pretrained transformer (GPT), such as ChatGPT, the retrieved snippets are used to generate a natural language answer that is presented to the user.
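
Putting these steps together, a minimal end-to-end sketch of such a RAG pipeline could look like the following. The model names, the in-memory “vector store”, and the prompt are assumptions made for illustration, not a recommendation:

```python
# Minimal RAG sketch: embed snippets, retrieve by cosine similarity, generate an answer.
# Assumptions: sentence-transformers for embeddings, the OpenAI client for generation.
import numpy as np
from sentence_transformers import SentenceTransformer
from openai import OpenAI

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = OpenAI()                                   # expects OPENAI_API_KEY in the environment

snippets = [
    "The pump must be vented before first use.",
    "Error E42 indicates a blocked filter.",
    # ... more snippets extracted from the user's documents
]
snippet_vecs = embedder.encode(snippets, normalize_embeddings=True)   # our toy "vector database"

def answer(question: str, top_k: int = 3) -> str:
    query_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = snippet_vecs @ query_vec               # cosine similarity (vectors are normalized)
    context = "\n".join(snippets[i] for i in np.argsort(-scores)[:top_k])
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer only from the given context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content

print(answer("What does error E42 mean?"))
```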

This leaves us with the question of what the appropriate snippet is. To generate a natural language answer, we need very concise information. ‘Garbage in, garbage out’ is more valid than ever.

Let us look at the simplest approach. We take snippets of three sentences, forming a sliding window of three sentences that scrolls over the text. Each triplet is embedded into its own vector. Sometimes five-sentence windows work better – or a more sophisticated method is needed.
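
Such a sliding window is easy to sketch. The naive sentence split on full stops below is an assumption made for brevity; a real pipeline would use a proper sentence splitter:

```python
# Sketch: split a text into overlapping snippets of `window` consecutive sentences.
def sliding_window_snippets(text: str, window: int = 3) -> list[str]:
    sentences = [s.strip() for s in text.split(".") if s.strip()]   # naive sentence split
    if len(sentences) <= window:
        return [". ".join(sentences) + "."]
    return [
        ". ".join(sentences[i:i + window]) + "."
        for i in range(len(sentences) - window + 1)
    ]

text = ("Vent the pump before first use. Check the filter weekly. "
        "Error E42 indicates a blocked filter. Contact support if the error persists.")
snippets = sliding_window_snippets(text, window=3)
print(len(snippets))        # 2 overlapping snippets; each one is embedded into its own vector
```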

We do not embed a whole document or a field of a document as a vector – we embed much finer structures, e.g. sentences or a hierarchy of text chunks.

At this point, we cannot answer the question of what granularity works best. We simply lack experience – remember, it was only one year ago that we were overwhelmed by ChatGPT.

Selecting the Best LLM and the Best GPT

To implement RAG, we use a large language model (LLM) for the embeddings and a generative pretrained transformer (GPT) for the answer. We face an emerging market – both open source and commercial – for these new technologies. Which is the best choice? Digging into Hugging Face soon shows that there are models trained by a single person as well as models trained by big companies.

To obtain the best results, we need a model trained on data from a context similar to ours. A model trained on medical data will deliver better results for a medical RAG engine than a model trained on, say, computer science data.

We still lack experience in evaluating which models are best for our purpose.

Quality

A good search engine finds all the documents we are looking for and presents the best ones at the top of a list.

Implementing a traditional search engine is far from trivial and is often underestimated, resulting in a low-quality engine.

During the development of a traditional search engine, we measure the progress of quality. To do this, we generate or procure a gold standard and measure the system against it in terms of precision, recall, and F1 score, or we even look at the ranked list and measure MAP, MRR, NDCG, or similar metrics.
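
For the retrieval part, this is standard practice. Here is a sketch of precision, recall, and MRR against a hand-made gold standard; the result lists and relevance judgments are, of course, invented:

```python
# Sketch: measuring retrieval quality against a gold standard (invented example data).
def precision_recall_at_k(retrieved: list[int], relevant: set[int], k: int) -> tuple[float, float]:
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k, hits / len(relevant)

def mean_reciprocal_rank(results: list[tuple[list[int], set[int]]]) -> float:
    total = 0.0
    for retrieved, relevant in results:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(results)

# Gold standard: per query, the ranked result list and the set of truly relevant documents.
gold = [
    ([3, 7, 1, 9], {3, 1}),     # query 1
    ([5, 2, 8, 4], {8}),        # query 2
]
print(precision_recall_at_k([3, 7, 1, 9], {3, 1}, k=3))   # (0.67, 1.0)
print(mean_reciprocal_rank(gold))                          # (1/1 + 1/3) / 2 = 0.67
```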

The same could be done for the retrieval part in RAG. But what is the gold standard?

And how do we measure the quality of the generation part?

Does the system generate additional information? Does it really take into account just the snippets we found in our own documents or does it take into account information from its original training on other data?  How much truth is actually in the answer?

The system’s answer will sound perfect, but can we believe it and even make important business decisions based on it?

Can we actually use a generative AI to evaluate the response of a generative AI, as some are proposing? Or is this equivalent to evaluating a student by another student instead of evaluating a student by an expert teacher?

These are still open questions to be answered by science.

Conclusion

There are no obvious advantages of LLM-based vector retrieval over traditional search systems. This may change in the near future as we gain more insight into the emerging search paradigm.

As far as search is concerned, we have a clear idea of what might be useful in an enterprise context. But we are still a long way from systems that provide reliable answers for enterprise use.

However, an experimental use of RAG in addition to an existing traditional search system could provide valuable insights into the productive use of these new technologies in an enterprise context:

Using RAG to talk to the top-ranked documents in a traditional enterprise search seems to be a promising approach to gain experience with these technologies.
