Generative AI Product Problems #1: Accuracy and Hallucinations

LLMs are notoriously fickle and prone to making up information.

According to ChatGPT, LLMs make up 73.6% of statistics.

Even when hallucinations happen rarely, that’s enough to cast doubt on 100% of an LLM’s responses. Without an easy way to separate fact from fiction, your only option is to be skeptical of everything the model says.

That becomes a hard blocker to deploying an LLM for any serious business application. How are you supposed to trust AI to process insurance claims or handle customer support if it won’t always tell the truth?

Step 1 in solving a problem is to measure it. How your LLM fares against general-purpose benchmarks isn’t that relevant. What really matters is how your product performs for your customers’ use cases.

Track what topics and questions your users are most frequently asking about. If you haven’t launched yet, you can test your LLM offline. If you’ve already launched, check your logs.
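As a rough illustration, topic tracking can start out very simple. The sketch below tallies the most common question topics from chat logs; the log format, the keyword-to-topic mapping, and the `label_topic` helper are all hypothetical placeholders (a real system might use an embedding-based classifier instead of keyword matching).

```python
from collections import Counter

# Hypothetical log format: each entry holds the user's raw message.
chat_logs = [
    {"user_message": "How do I file an insurance claim?"},
    {"user_message": "What's the status of my claim?"},
    {"user_message": "Can I change my billing date?"},
]

# Hypothetical keyword-to-topic mapping for illustration only.
TOPIC_KEYWORDS = {
    "claims": ["claim"],
    "billing": ["billing", "invoice", "payment"],
}

def label_topic(message: str) -> str:
    """Assign a coarse topic label based on keyword matches."""
    text = message.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return topic
    return "other"

# Count how often each topic shows up so you know where to focus accuracy checks.
topic_counts = Counter(label_topic(entry["user_message"]) for entry in chat_logs)
print(topic_counts.most_common())
```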

Inaccuracies come in many forms:

  • Is your LLM incorrectly interpreting users’ inputs?
  • Is it making false assumptions about your business’s offerings?
  • Is it making promises it can’t keep?

Then, respond accordingly. You might need to refine your prompts, augment the model by connecting it to your company’s knowledge base (retrieval-augmented generation, or RAG), or implement guardrails for what your LLM can do or say.
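For instance, a bare-bones version of the RAG approach might look like the sketch below. The knowledge base contents, the word-overlap retrieval, and the prompt wording are assumptions for illustration, not any particular library’s API; production systems would use embeddings or a search index instead.

```python
# Minimal RAG sketch: retrieve relevant facts, then constrain the prompt to them.

KNOWLEDGE_BASE = [
    "Refunds are available within 30 days of purchase.",
    "Claims are typically processed within 5 business days.",
    "Support is available Monday through Friday, 9am to 5pm.",
]

def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the question."""
    question_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(question_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question, KNOWLEDGE_BASE))
    # Restrict the model to the retrieved facts and give it an explicit
    # escape hatch, so it is less tempted to invent an answer.
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

print(build_prompt("How long do claims take to process?"))
```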

Insight into what your users are asking and how your LLM responds is the key to building a more reliable AI product. At Context.ai, we’re building the platform to give you that visibility.

To learn more, request a demo at context.ai/demo.
