Venture Capitalist at Theory

2 minute read / Sep 5, 2024

The Challenge of the AI Demo

The AI demo isn’t easy. Many of the major AI companies have demoed their AI systems, starting with pre-recorded demos & now pushing into live ones. They don’t always work.

Multiply Murphy’s Law by a non-deterministic system & it’s not unreasonable to expect AI demos to nearly always hiccup.

Demo disruptions aren’t a disaster. These systems are early & changing rapidly. A hiccup might suggest the system requires work & tuning, not that it faces a fundamental challenge.

But they can be problematic in proofs of concept.

Proofs of concept are extended demonstrations of the software. Well-structured PoCs align on success criteria at the outset. These criteria enable vendors & customers to agree on what success looks like.

Workflow proofs of concept are relatively straightforward. They are deterministic. Can I process a loan application in 5 minutes? Yes or no.

But as AI applications shift to selling outcomes implicitly or explicitly, the PoC becomes a testing ground of those outcomes. Non-determinism means sometimes the PoC won’t produce the required wow moment. This also means the PoC criteria must be more flexible.
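One way to make a PoC criterion flexible without making it vague is to state it as a success rate over repeated trials rather than a single yes or no. The sketch below is only an illustration, not a prescribed method: it assumes each trial can be judged pass or fail, & it uses the Wilson score interval (a standard statistical technique, swapped in here as one reasonable choice) to ask whether the observed rate is convincingly above the agreed bar. The function names & the 80% bar are hypothetical.

```python
import math


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for an observed success rate.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if trials == 0:
        return 0.0
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = p_hat + z**2 / (2 * trials)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
    return (centre - margin) / denom


def poc_passes(successes: int, trials: int, required_rate: float) -> bool:
    """Hypothetical PoC criterion: pass only if the lower bound clears the bar."""
    return wilson_lower_bound(successes, trials) >= required_rate


if __name__ == "__main__":
    # Same 90% observed rate, very different amounts of evidence.
    print(poc_passes(90, 100, required_rate=0.80))
    print(poc_passes(9, 10, required_rate=0.80))
```

Under these assumptions, 90 successes in 100 trials clears an 80% bar, while 9 in 10 does not: the observed rate is the same, but ten runs aren’t enough evidence. Agreeing on the number of trials at the outset matters as much as agreeing on the threshold.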

How does a buyer evaluate a probabilistic system?

Do we compare it to human performance? Practitioners have shared with us that human labelers typically agree with one another 60-70% of the time. Does an AI robot need to be as accurate as a human, assuming it will be much less expensive? Or will we expect more, as we do with self-driving cars?
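To make the human baseline concrete, a buyer could measure how often human labelers agree with each other on a sample, then measure how often the AI system agrees with those same humans. The sketch below uses simple percent agreement; the labels, the three labelers & the function names are all made up for illustration, & a real evaluation might prefer a chance-corrected measure.

```python
from itertools import combinations


def agreement_rate(a: list[str], b: list[str]) -> float:
    """Fraction of items on which two label lists match."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_pairwise_agreement(label_sets: list[list[str]]) -> float:
    """Average agreement across every pair of human labelers."""
    pairs = list(combinations(label_sets, 2))
    if not pairs:
        raise ValueError("need at least two labelers")
    return sum(agreement_rate(a, b) for a, b in pairs) / len(pairs)


def model_vs_humans(model: list[str], label_sets: list[list[str]]) -> float:
    """Average agreement between the model and each human labeler."""
    return sum(agreement_rate(model, h) for h in label_sets) / len(label_sets)


# Hypothetical data: three human labelers & one model on five loan applications.
HUMANS = [
    ["approve", "deny", "approve", "approve", "deny"],
    ["approve", "deny", "deny", "approve", "deny"],
    ["approve", "approve", "approve", "approve", "deny"],
]
MODEL = ["approve", "deny", "approve", "approve", "approve"]

if __name__ == "__main__":
    print(f"human-human agreement: {mean_pairwise_agreement(HUMANS):.2f}")
    print(f"model-human agreement: {model_vs_humans(MODEL, HUMANS):.2f}")
```

If the model agrees with humans about as often as humans agree with each other, demanding more from it means holding it to a higher standard than the people it would assist or replace, which may or may not be the right call for a given use case.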

If AI systems require human assistance, then the ROI of the system must include some human operating expense - whether explicit or implicit.
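A back-of-the-envelope version of that ROI calculation might look like the sketch below. Every number & name in it is a hypothetical placeholder; the only point is that the human review rate belongs in the cost of the AI system.

```python
def effective_ai_cost(ai_cost: float, review_rate: float, review_cost: float) -> float:
    """Per-task cost of the AI system once human review is included.

    review_rate is the fraction of tasks a human must check or redo.
    """
    if not 0.0 <= review_rate <= 1.0:
        raise ValueError("review_rate must be between 0 and 1")
    return ai_cost + review_rate * review_cost


def net_savings_per_task(
    human_cost: float, ai_cost: float, review_rate: float, review_cost: float
) -> float:
    """Savings versus a fully human process; negative means the AI costs more."""
    return human_cost - effective_ai_cost(ai_cost, review_rate, review_cost)


if __name__ == "__main__":
    # Hypothetical inputs: $5.00 per task by hand, $0.50 per task by the AI,
    # with a human reviewing 20% of tasks at $5.00 per review.
    savings = net_savings_per_task(
        human_cost=5.00, ai_cost=0.50, review_rate=0.20, review_cost=5.00
    )
    print(f"net savings per task: ${savings:.2f}")
```

With these made-up inputs the AI still saves money, but the savings shrink as the review rate climbs, & at a high enough rate they disappear entirely.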

Some teams will want to benchmark systems in parallel to determine the relative performance. With most startups building atop existing models & setting aside differences in fine-tuning, the ultimate performance should be relatively comparable, provided they use the same data sets. Will startups compete on access to different data sets?

Today, there are more questions than answers about how to sell AI agent systems. We’re hosting an event on the evening of Sep 10th in San Francisco, moderated by Dave Morse, former CRO at Hebbia & VPS/VPCS at ScaleAI, to interview leaders in the space & talk through some of these questions.

If you’re interested in attending, see the details here.
