Why does knowledge base Q&A keep giving answers that miss the question? Usually something has gone wrong in chunking, retrieval, or context assembly

AI Q&A | Admin

When knowledge base Q&A gives answers that miss the question, nine times out of ten the model has not suddenly become stupid; the material was fed to it crookedly somewhere earlier in the retrieval pipeline. The three most common problems are chunks that are too fragmented or cut in the wrong places, inaccurate retrieval results, and context whose ordering and boundaries are wrong. The model can only answer from the material you give it, so a biased answer is not necessarily a problem in the generation layer.

First, check whether chunking cuts through the meaning

Many knowledge bases start by cutting documents into chunks of a fixed word count, which makes it very easy to split titles, definitions, table descriptions, and conclusions away from each other. The model then receives a "half paragraph", and wrong answers follow naturally. For policy documents, product descriptions, FAQs, or contracts, prefer cutting along natural paragraphs, heading levels, and question-answer units; this is usually more stable than cutting by word count alone.
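As a rough illustration of structure-aware chunking, here is a minimal Python sketch. It assumes Markdown-style `#` headings and blank-line-separated paragraphs (real documents may need a different parser), and the function name `chunk_by_structure` and the `max_chars` budget are hypothetical. Instead of cutting at a fixed count, it packs whole paragraphs into chunks and carries the current heading into every chunk of that section.

```python
import re
from typing import List

# Assumption: sections are introduced by Markdown-style headings.
HEADING = re.compile(r"^#{1,6}\s+")


def chunk_by_structure(text: str, max_chars: int = 800) -> List[str]:
    """Pack whole paragraphs into chunks of roughly max_chars characters,
    prefixing each chunk with its section heading (hypothetical helper)."""
    chunks: List[str] = []
    heading = ""
    buf: List[str] = []

    def flush() -> None:
        if buf:
            body = "\n\n".join(buf)
            chunks.append(f"{heading}\n{body}" if heading else body)
            buf.clear()

    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        if lines and HEADING.match(lines[0]):
            flush()                      # a new section always starts a new chunk
            heading = lines[0].strip()
            lines = lines[1:]
        para = "\n".join(lines).strip()
        if not para:
            continue
        size = sum(len(p) for p in buf) + len(para)
        if buf and size > max_chars:
            flush()                      # cut between paragraphs, never inside one
        buf.append(para)
    flush()
    return chunks
```

A paragraph longer than `max_chars` is kept whole rather than split, on the assumption that an oversized chunk does less harm than a definition cut in half; very long paragraphs may still need a secondary split.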

Next, check whether retrieval treats "looks related" as "actually related"

  1. If the user asks about a very specific condition, version, department, or restriction and retrieval keeps returning general introductory paragraphs, the retrieval granularity is too coarse.
  2. If the top few results all come from the same, only loosely relevant document, the ranking signal is drowning out the real business keywords.
  3. If the user is clearly asking about scenario A but the system keeps pulling material about scenario B, the usual cause is that the two are close in embedding space while nothing filters on the business boundary.
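The three symptoms can be checked mechanically on a sample query. The sketch below is a hypothetical diagnostic, not part of any specific framework: it assumes each hit carries a `doc_id`, its text, a relevance score, and a metadata dict, and that you know which specific terms and which scenario metadata the query should match.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Hit:
    doc_id: str
    text: str
    score: float
    meta: Dict[str, str] = field(default_factory=dict)


def diagnose_recall(query_terms: List[str], hits: List[Hit],
                    expected_meta: Dict[str, str], k: int = 5,
                    max_share: float = 0.6) -> Dict[str, object]:
    """Flag the three recall symptoms for one query (hypothetical helper)."""
    top = sorted(hits, key=lambda h: h.score, reverse=True)[:k]
    if not top:
        return {"missing_terms": list(query_terms), "dominant_doc": None,
                "off_scenario": []}
    # Symptom 1: specific terms (version, department, ...) absent from every
    # top hit, a hint that only generic introductory chunks came back.
    missing_terms = [t for t in query_terms
                     if not any(t.lower() in h.text.lower() for h in top)]
    # Symptom 2: one document occupies more than max_share of the top-k slots.
    doc_id, count = Counter(h.doc_id for h in top).most_common(1)[0]
    dominant_doc = doc_id if count / len(top) > max_share else None
    # Symptom 3: hits whose metadata contradicts the scenario asked about.
    off_scenario = [h.doc_id for h in top
                    if any(h.meta.get(key) != val
                           for key, val in expected_meta.items())]
    return {"missing_terms": missing_terms, "dominant_doc": dominant_doc,
            "off_scenario": off_scenario}
```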

At this point, don't start by tuning the model. First add metadata filtering and keyword-based reranking, or at the very least bring conditions such as document type, date, and product line into the search.
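One simple way to combine these two fixes is sketched below: a hard metadata filter followed by a rerank that blends the vector score with the fraction of query tokens found in the chunk. This is a toy hybrid score, not BM25 or a learned reranker; the field names, the `alpha` weight, and the assumption that vector scores lie in [0, 1] are all illustrative.

```python
import re
from typing import Dict, List


def tokens(text: str) -> set:
    # Lowercased word tokens; dotted versions such as "v2.3" stay together.
    return set(re.findall(r"\w+(?:\.\w+)*", text.lower()))


def filter_and_rerank(query: str, hits: List[dict], filters: Dict[str, str],
                      alpha: float = 0.5, k: int = 3) -> List[dict]:
    # 1. Hard metadata filter: drop chunks outside the requested business scope.
    kept = [h for h in hits
            if all(h["meta"].get(name) == value for name, value in filters.items())]
    q = tokens(query)

    def hybrid(h: dict) -> float:
        # 2. Keyword signal: share of query tokens that appear in the chunk.
        overlap = len(q & tokens(h["text"])) / len(q) if q else 0.0
        # Blend with the (assumed 0..1) vector similarity from embedding search.
        return alpha * h["vector_score"] + (1 - alpha) * overlap

    return sorted(kept, key=hybrid, reverse=True)[:k]
```

With this blend, a general overview that merely resembles the query can be outranked by a chunk containing the exact version or product terms, and chunks from the wrong product line never reach the ranking step at all.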

Finally, check whether the context is "assembled, but assembled wrong"

Some systems cram every retrieved fragment into the prompt. It looks like plenty of material, but the model has a hard time telling which fragment is the primary evidence. A more stable approach is to put the most relevant snippet first, keep each snippet's title and source, and tell the model explicitly: "Answer only based on the following material; if you can't find the answer, say so." If the context mixes old and new versions, explanatory passages, and promotional passages all at once, the model will waver no matter how capable it is.
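A minimal sketch of these assembly rules, under stated assumptions: each snippet is a dict with hypothetical `doc_id`, integer `version`, `score`, `title`, `source`, `text`, and `type` fields, and promotional content has been tagged `type == "ad"` at ingestion. It keeps only the newest version of each document, drops ad snippets, orders the rest by relevance, and labels every snippet with its title and source under an explicit answer-only-from-context instruction.

```python
from typing import Dict, List

INSTRUCTION = ("Answer only based on the reference material below. "
               "If the answer is not there, say that you could not find it.")


def build_prompt(question: str, snippets: List[Dict], max_snippets: int = 4) -> str:
    # Newest version per document, so old and new rules never appear side by side.
    newest: Dict[str, int] = {}
    for s in snippets:
        newest[s["doc_id"]] = max(newest.get(s["doc_id"], s["version"]), s["version"])
    # Assumption: promotional snippets were tagged type == "ad" at ingestion.
    usable = [s for s in snippets
              if s["version"] == newest[s["doc_id"]] and s.get("type") != "ad"]
    # Most relevant evidence first.
    usable.sort(key=lambda s: s["score"], reverse=True)
    parts = [INSTRUCTION]
    for i, s in enumerate(usable[:max_snippets], start=1):
        # Numbered, with title and source, so each piece of evidence has a clear boundary.
        parts.append(f"[{i}] {s['title']} (source: {s['source']})\n{s['text']}")
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)
```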

A simple troubleshooting order: check the chunking first, then retrieval, then context assembly, and only then suspect the model itself. In most knowledge bases that give off-topic answers, getting the first three steps right brings accuracy back noticeably.
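The same order can be applied to a single failing question. The sketch below assumes you have a test question whose correct answer contains a known short fact (a hypothetical `answer_fact` string), plus the full chunk list, the retrieved chunks, and the final prompt from one run; exact substring matching is a deliberate simplification.

```python
from typing import List


def locate_failure(answer_fact: str, chunks: List[str],
                   retrieved: List[str], prompt: str) -> str:
    """Return the first stage at which the known-correct fact is lost
    (hypothetical helper following the chunk -> recall -> splice -> model order)."""
    fact = answer_fact.lower()
    if not any(fact in c.lower() for c in chunks):
        return "chunking"          # fact split across chunks or dropped at ingestion
    if not any(fact in r.lower() for r in retrieved):
        return "retrieval"         # an intact chunk exists but was not recalled
    if fact not in prompt.lower():
        return "context assembly"  # recalled, but lost while building the prompt
    return "model"                 # evidence reached the model; now suspect generation
```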
