Why you need Product Analytics to build great LLM products

You wouldn’t launch a product or marketing site without analytics in place to track its performance with real users.

Why not? Because you need to understand how real people use your product and how well it meets their needs, so that you can identify what is working well and where things need to improve. Yet a staggering number of businesses have LLM-powered experiences in production without understanding how people use them or how well they perform.

MLOps is an increasingly mature ecosystem, yet it is missing a crucial step: observing real user behavior and measuring product performance with real people.

No product is perfect at launch, especially one as complex and unbounded as a natural language interface, where any user can provide any input and the product must respond to even the strangest and most unanticipated queries.

Product analytics shows businesses how people are engaging with their products and where product development cycles should be focused. This lets them improve the product as it either iterates towards product-market fit or scales to the moon.

Why are LLMs different?

The importance of analytics for product development is universal. But unlike marketing websites and graphical user interface (GUI) products, natural language interfaces have a huge surface area of possible user interactions. There are only so many pages on your marketing website, and so many features in your GUI product, but with an LLM the number of possible use cases and interactions is unbounded, and large numbers of users may be engaging in behavior you did not anticipate.

For LLMs, the natural language inputs and outputs must be evaluated, and this is hard to do with today’s analytics products. They will tell you how many users you have, but not what your users are actually using your product for or how well it handles their queries. These are Natural Language Processing (NLP) problems, whereas existing analytics tools are event-driven, analyzing button clicks and page views rather than large quantities of text.

Dedicated analytics tooling is required for text interfaces, and it is hard to build well. Many developers I speak with are still reading pages and pages of chat transcripts, but this takes a lot of time and doesn’t give a systematic understanding of users. Building dedicated tooling in house requires investing significant machine learning engineering resources to build out NLP features.

Our point of view at Context is that you wouldn’t reimplement Stripe or Datadog when great solutions exist, so why reimplement LLM product analytics tools? That’s why we’re building Context. We give businesses visibility into how users are engaging with LLM-powered products, and we track how those products are performing.

What analytics should I be looking at, specifically?

Some important metrics include:

  • Overall product performance, and how it trends over time. This includes signals like user input sentiment, thumbs up/down ratios, message regeneration rates, and conversation volumes. Together they give a snapshot of the health of your product and the product-level impact of your updates.
  • Frequent topics of conversation: what are users talking about most? You can detect these by clustering conversation transcripts into groups, by manually configuring topics to track, or ideally both. This lets you understand how people are using the product.
  • Performance per topic: which topics does the product handle well vs poorly? This shows you where product performance is relatively stronger or weaker.
  • User retention: are my users churning, or are they increasingly engaged? Retention tells you how sticky your product is, and whether your users try it once and never come back or keep coming back day after day.
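
To make these metrics concrete, here is a minimal, hypothetical Python sketch (not Context’s implementation) computing several of them over a small conversation log. It assumes each record carries a user ID, a date, the user’s messages, an optional thumbs up/down rating, and a regeneration count; the `Conversation` fields, the `TOPICS` keyword map, and every function name are illustrative assumptions. Topics here are manually configured keyword lists rather than clusters.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical conversation record; the field names are assumptions, not a real schema.
@dataclass
class Conversation:
    user_id: str
    day: date
    user_messages: list[str]
    rating: Optional[bool]  # True = thumbs up, False = thumbs down, None = unrated
    regenerations: int      # number of times the user regenerated a response


# Manually configured topics (hypothetical): topic name -> keywords to look for.
TOPICS = {
    "returns": ["return", "refund"],
    "shipping": ["shipping", "delivery", "parcel"],
}


def assign_topics(conv: Conversation) -> set[str]:
    """Tag a conversation with every topic whose keywords appear in the user's messages."""
    text = " ".join(conv.user_messages).lower()
    return {topic for topic, words in TOPICS.items() if any(w in text for w in words)}


def thumbs_up_ratio(convs: list[Conversation]) -> Optional[float]:
    """Share of rated conversations that got a thumbs up (None if nothing was rated)."""
    rated = [c.rating for c in convs if c.rating is not None]
    return sum(rated) / len(rated) if rated else None


def regeneration_rate(convs: list[Conversation]) -> float:
    """Share of conversations in which the user regenerated at least one response."""
    return sum(c.regenerations > 0 for c in convs) / len(convs) if convs else 0.0


def daily_volume(convs: list[Conversation]) -> Counter:
    """Conversations per day, for tracking how usage trends over time."""
    return Counter(c.day for c in convs)


def performance_per_topic(convs: list[Conversation]) -> dict:
    """Volume and thumbs-up ratio for each configured topic."""
    by_topic = defaultdict(list)
    for c in convs:
        for topic in assign_topics(c):
            by_topic[topic].append(c)
    return {t: {"volume": len(cs), "thumbs_up_ratio": thumbs_up_ratio(cs)}
            for t, cs in by_topic.items()}


def retention(convs: list[Conversation]) -> float:
    """Share of users who were active on more than one distinct day."""
    days_by_user = defaultdict(set)
    for c in convs:
        days_by_user[c.user_id].add(c.day)
    if not days_by_user:
        return 0.0
    return sum(len(d) > 1 for d in days_by_user.values()) / len(days_by_user)


# Tiny made-up log to exercise the functions.
logs = [
    Conversation("u1", date(2024, 1, 1), ["How do I get a refund?"], False, 1),
    Conversation("u1", date(2024, 1, 3), ["Where is my delivery?"], True, 0),
    Conversation("u2", date(2024, 1, 2), ["Can I return these shoes?"], None, 2),
]
print(thumbs_up_ratio(logs))  # 0.5
print(performance_per_topic(logs))
print(retention(logs))        # 0.5
```

Substring keyword matching is deliberately crude. Automatic topic clustering and input sentiment scoring need an NLP model, which is exactly the part that is hard to build in house, so both are left out of this sketch.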

What do I do with this information?

Make your product better!

You can fine-tune your model, improve your training data set, or even change the underlying facts the model refers to. Perhaps queries about your returns policies are getting low satisfaction because of poor training data, or perhaps customers are frustrated with the returns policies themselves! Going deep into the analytics and then debugging specific conversation transcripts is key to understanding these issues.
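
Analytics can also narrow down which transcripts are worth reading. As a minimal, hypothetical Python sketch (the record fields, the keyword filter, and the `transcripts_to_review` helper are illustrative assumptions, not part of Context or any library), you might pull the thumbs-down conversations that touch a weak topic such as returns:

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record; field names are assumptions for illustration.
@dataclass
class Conversation:
    conversation_id: str
    transcript: list[str]
    rating: Optional[bool]  # True = thumbs up, False = thumbs down, None = unrated


def transcripts_to_review(convs: list[Conversation], keywords: list[str],
                          limit: int = 20) -> list[Conversation]:
    """Return up to `limit` thumbs-down conversations that mention any keyword."""
    keywords = [k.lower() for k in keywords]
    flagged = [
        c for c in convs
        if c.rating is False
        and any(k in " ".join(c.transcript).lower() for k in keywords)
    ]
    return flagged[:limit]


# Made-up transcripts to show the drill-down.
logs = [
    Conversation("c1", ["User: Can I return a sale item?",
                        "Bot: Sale items cannot be returned."], False),
    Conversation("c2", ["User: How do I get a refund?",
                        "Bot: Start a return from your orders page."], True),
    Conversation("c3", ["User: Where is my parcel?", "Bot: I don't know."], False),
]

for conv in transcripts_to_review(logs, ["return", "refund"]):
    print(conv.conversation_id)
    print("\n".join(conv.transcript))
```

Reading the flagged transcripts, rather than the aggregate numbers alone, is what tells you whether the fix belongs in the training data or in the policy itself.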

These analytics enable you to build a better user experience that meets user needs and better drives business outcomes, which is the ultimate goal of all product development.

If you’re building a product using LLMs, you’re playing in an increasingly competitive market with increasingly demanding users. We’re building Context to help builders develop better LLM products that delight their users. You can visit context.ai or reach out at henry@context.ai to get started with product analytics for your LLM applications.
