3 minute read / Jul 11, 2023 /

The Fracking of Information

Large language models enable fracking of documents. Historically, extracting value from unstructured text files has been difficult. But LLMs do this beautifully, pumping value from one of the hardest places to mine.

We have a collection of thousands of notes researching startups. We are tinkering with deploying large language models on top of them.

Here are some quick observations about our initial experiments:

The Future is Constellations of Models. When faced with a search box, a user might ask quantitative questions. For example, how many people from Google have I met in the last month?

Unfortunately, large language models, at least the ones that we have tested, do not answer quantitative questions of this kind.

That’s problematic because users don’t stop to think about the type of query (quantitative, classification, segmentation, prediction, etc.) before they type it into a search box.

To solve this, knowledge management systems will likely employ a constellation of different models. Perhaps the first model will classify the query, then route it to the right machine learning model to answer.
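
To make the classify-then-route idea concrete, here is a minimal sketch in Python. It is not what we run: the keyword cues stand in for a real classification model, & the handler names (`answer_quantitative`, `answer_summarization`, `answer_general`) are hypothetical placeholders for whatever downstream models a system would actually use.

```python
from typing import Callable, Dict

# Assumption: simple keyword cues stand in for a trained query classifier.
QUANTITATIVE_CUES = ("how many", "how much")
SUMMARY_CUES = ("summarize", "summary", "overview")


def classify_query(query: str) -> str:
    """Label a query so it can be routed to a suitable model."""
    text = query.lower()
    if any(cue in text for cue in QUANTITATIVE_CUES):
        return "quantitative"
    if any(cue in text for cue in SUMMARY_CUES):
        return "summarization"
    return "general"


# Hypothetical handlers: each would wrap a different model or engine.
def answer_quantitative(query: str) -> str:
    return f"[structured query engine] {query}"


def answer_summarization(query: str) -> str:
    return f"[summarization model] {query}"


def answer_general(query: str) -> str:
    return f"[general language model] {query}"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "quantitative": answer_quantitative,
    "summarization": answer_summarization,
    "general": answer_general,
}


def route(query: str) -> str:
    """First classify the query, then hand it to the matching handler."""
    return HANDLERS[classify_query(query)](query)
```

With this sketch, the Google question above would be labeled quantitative & sent to a structured engine rather than to a language model, while a request for a summary would go to the summarization handler.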

Summarization works out of the box. We have been researching the robotic process automation (RPA) space. Here is a subset of the model's output. One could imagine replacing the background or introduction sections in an investment memo or producing a blog post from this in about 2 minutes on a laptop. Editing remains essential.

Question: write a summary of the RPA space & the opportunities & challenges within it

Answer (took 119.74 s.): The context mentions that there are problems with maintaining consistency and quality in process discovery documents, which can cause issues for business continuity. There may be multiple factors contributing to these difficulties such as high attrition rates or lack of experience among personnel involved during development phase due diligence on large enterprises. However the most important factor…

source_documents/redacted.txt the Rpa market is converging with the process discovery market, so understanding which process is to automate and then automating them are really important.Rpa is a bottoms-up business and process discovery is tops down

Source Identification Matters. LLMs are now linking to the source text. In the example above, the model cites the file (whose name I’ve redacted) & the location of the contributing source.

This behavior matters for two reasons. First, it builds trust & credibility in the model. Questions will inevitably arise from summaries. Drilling down to the root answer should assuage those doubts.

Second, this pattern should limit hallucinations, in which models “invent” answers without basis in the source or training data.
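
One way to picture source identification is a retrieval step that never separates a passage from its origin. The sketch below is an assumption about how such a step could look, not a description of the tool we tested: it ranks lines by simple word overlap with the query, & the `Passage` type & `retrieve` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Passage:
    source: str  # file the passage came from
    line: int    # 1-based line number within that file
    text: str    # the passage itself


def retrieve(query: str, documents: Dict[str, str], top_k: int = 2) -> List[Passage]:
    """Rank lines by word overlap with the query, keeping file & line."""
    query_words = set(query.lower().split())
    scored = []
    for source, body in documents.items():
        for line_no, line in enumerate(body.splitlines(), start=1):
            overlap = len(query_words & set(line.lower().split()))
            if overlap:
                scored.append((overlap, Passage(source, line_no, line)))
    # Sort on the overlap score only; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:top_k]]
```

Because every returned passage carries its file & line, a summary built from these passages can point the reader back to the exact text that supports each claim.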

Ubiquity means being everywhere. Our business maintains a single knowledge repository but outputs will appear in email, presentations, investment memos, blog posts, & search results.

New knowledge management systems will find a way to be integrated into all those outputs while respecting permissions, governance, & other policies that matter to a business.

If data is the new oil, then LLMs are the environmentally friendly fracking rigs, blasting value from unstructured text shale formations.
