Generative AI Product Problems #3: Latency

Intelligence isn't everything. Sometimes, LLMs are just too slow for good user experiences.

Sundar Solai

Oct 26, 2023 — 1 min read

LLMs generate tokens one…

at…

a…

time.

That starts to feel slow pretty quickly.

It seems like a no-brainer that LLMs will replace each and every 1st generation customer support chatbot. LLMs surely are smarter, but they lose big in one category—speed.

GPT-3.5’s latencies start in the ballpark of 0.5 seconds. For GPT-4, expect even the shortest tasks to take at least 1 second.

LLM latency grows linearly with the number of tokens in the output. Asking for a multi-paragraph response from GPT-4? You could be waiting on the order of minutes or more.

The quick solution is to set a cap on the maximum number of tokens you request from the LLM. Output length is the most important factor in latency; reducing your input size won’t make a difference.

You could also try using a different model. Do you really need GPT-4 in all parts of your product? You could reduce latency and cost by switching to GPT-3.5.

You may not even need an LLM to handle all parts of your application. Sometimes, old-fashioned business logic works perfectly fine and is much faster.

Test these tradeoffs before committing to them. Speed is important, but understand what else you’re sacrificing to achieve it.

With Context.ai, you can run A/B tests on your LLM-powered products. Experiment with strategies to reduce latency and measure the end-user impact. That way, you can find the right balance between speed and quality.

To learn more, request a demo at context.ai/demo.

What product experiences are enabled by multi-agent LLM frameworks?

It feels like everyone is excited about multi-agent frameworks - even though their performance isn’t yet ready for prime-time. These performance problems are improving with increasingly powerful models like Claude 3 and GPT-4o - and great things are expected from GPT-5, a launch that will likely make agentic workflows

Launching Custom Conversion Events - Product Update | July 2024

Today we’re launching support for custom conversion events 🧾 This addresses one of the biggest challenges in the LLM ecosystem - proving ROI 📈 Context.ai users can now log custom conversion events with their LLM conversation transcripts, indicating where users completed an action: a purchase, a link click, or even

Are your LLM Products Guardrails working?

How do you know if the guardrails on your LLM product are working? 🛡️🎯 Some people wait until they show up in the The New York Times - like McDonald's, Air Canada, or Chevrolet Conversational LLM products are a challenging consumer experience as users can ask an infinite number

Is LLM progress slowing?

LLMs haven’t significantly improved since GPT4: is progress slowing? 🐢 Dramatically more powerful model training clusters are being built: 15 of them, with 31 times more power than trained GPT4 This means models much more powerful than GPT4 are coming 🐇 SemiAnalysis did a phenomenal deep dive into this topic -

Read more

What product experiences are enabled by multi-agent LLM frameworks?

Launching Custom Conversion Events - Product Update | July 2024

Are your LLM Products Guardrails working?

Is LLM progress slowing?