Is RAG Still Needed? Choosing the Best Approach for LLMs

A quick preview of the first subtitles so you know what the video covers.

There's a fundamental truth about LLMs, large language models: they are frozen in time. They know everything about our world up until their training cutoff date and absolutely nothing about what happened 5 minutes ago. Nor do they know anything about your private data, your internal wikis, your proprietary codebase. And if we do want an LLM to know any of that stuff, we have to solve the problem of context injection: how do we get the right data into the model at the right time? There have been two very different ways to handle this.

The first is really what we can think of as the engineering approach. It's RAG, retrieval augmented generation. Here we've got an LLM, and we've also got an input prompt from the user. Ahead of time, we take the documents that we want to give to this LLM (these could be PDFs or code files or entire books) and we chunk them. We break them up into smaller chunks and pass them through an embedding model, which turns those chunks into vectors, and those vectors are then stored in a dedicated vector database. When a user asks a question, the system performs a semantic search to retrieve the most relevant chunks and injects them into the context window. So now the context window has the user prompt, but it also has all of these chunks that we have taken from the vector database, and together this forms the context window. This works, but it does rely on something: it relies on the hope that your retrieval logic actually found the right information in the vector database.

The second approach is really a bit more of a brute-force approach, and that one is called long context. This is really the model-native solution, because you skip the database and you skip the embedding model. All you do is take your documents and put them straight into the context window, and then you let the model's attention mechanism do the heavy lifting of finding the answer.

For a long time this kind of brute-force method wasn't really much of an option, because initially context windows were tiny. Early LLMs had context windows that could maybe store something like 4K tokens. You couldn't fit a novel in there, let alone a corporate knowledge base. You basically had to use RAG. But today's models have much larger context windows, some of them a million tokens plus.
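To make the two approaches concrete, here is a minimal Python sketch of the workflow the video describes, RAG on one side and long context on the other. The toy bag-of-words `embed()` function, the in-memory `index` list standing in for a dedicated vector database, and the sample documents and query are all illustrative assumptions rather than anything from the video; a real pipeline would use an actual embedding model and a proper vector store.

```python
import math

def embed(text: str) -> dict[str, float]:
    """Toy embedding: a word-count vector (a real system would call an embedding model)."""
    vec: dict[str, float] = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def chunk(doc: str, size: int = 40) -> list[str]:
    """Break a document into fixed-size word chunks."""
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Ahead of time: chunk the documents, embed each chunk, and store the vectors.
# (Hypothetical sample documents; the list below stands in for a vector database.)
documents = [
    "The internal wiki says the deploy pipeline runs nightly at 02:00 UTC.",
    "The proprietary codebase uses a custom ORM and keeps migrations in db/.",
]
index: list[tuple[dict[str, float], str]] = []
for doc in documents:
    for piece in chunk(doc):
        index.append((embed(piece), piece))

# At query time (RAG): semantic search for the most relevant chunks,
# then inject only those chunks into the context window alongside the prompt.
def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

user_prompt = "When does the deploy pipeline run?"
rag_context = "\n\n".join(retrieve(user_prompt))
rag_prompt = f"{rag_context}\n\nQuestion: {user_prompt}"

# Long-context alternative: skip retrieval entirely and put every document
# straight into the context window, letting the model's attention find the answer.
long_context_prompt = "\n\n".join(documents) + f"\n\nQuestion: {user_prompt}"
```

Even in this sketch the trade-off the video points at is visible: `rag_prompt` stays small but depends on `retrieve()` surfacing the right chunks, while `long_context_prompt` is only viable as long as all the documents actually fit in the model's context window.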
