4 minute read / Nov 9, 2023 /
Top 10 Trends for Data in 2024
At the IMPACT Summit yesterday, I shared our Top 10 Trends for Data in 2024.
- LLMs Transform the Stack : Large language models transform data in many ways. First, they have driven an increased demand for data and are causing a complete re-architecture inside companies. Second, they change the way that we manipulate data. Analysts will use automated data analysis, and it will be an expected tool in every product : notebooks, BI, databases, etc.
If you’re curious about the evolution of the LLM stack or the requirements to build a product with LLMs, please see Theory’s series on the topic here called From Model to Machine.
- Data Teams are Becoming Software Teams : DevOps created a movement within software development that empowers developers to run the software they wrote. The same thing is happening in data. Products have filled those needs by mapping each of the core functions and responsibilities in the DevOps movement to DataOps. The most sophisticated data teams run like software engineering teams with product requirement documents, ticketing systems, & sprints.
- Data Products : The combination of large language models and data teams becoming software teams has led to data products. Whether it’s data being used inside applications, feeding machine learning models, or downstream analysis, companies are increasingly reliant on this data, and that’s not changing. 80% of data is unstructured within organizations. LLMs are fantastic first-pass filters and phenomenal classifiers that extract insight or build machine learning features from unstructured data like customer support conversations or sales calls.
- The Semantic Model Becomes a Must-Have: Semantic models unify a single definition across an organization for a particular metric. Looker did this within the context of a BI system. But organizations need this layer across the stack. In addition to the reusability of definitions, composability - creating complex analysis with simple building blocks - will define this layer, both for humans who find it easier to understand and for large language models that synthesize semantics.
- Instrumentation and Governance Enable Many New Use Cases : Today’s data leaders are struggling. Executive teams and boards are demanding innovation with LLMs and data. Meanwhile, regulation and compliance mean the governance burden only increases. Software startups are rising to meet the need. Data contracts encode the data interchange between two different departments (Gable). BI systems marry the centralized control of data teams with the ability to define and promote metrics at the edge of an organization (Omni). Observability systems measure the uptime of pipelines and detect anomalies (Monte Carlo). Semantic understanding of code and ephemeral developer environments enable data engineers to reduce costs and work more fluidly together (SQLMesh).
- The Pendulum Swings to Small Data : Modern Mac laptops have the same computational power as the AWS servers Snowflake used to launch the company. Since most workloads are small, data teams will use in-process, in-memory databases to analyze and move data. They are faster to get started (no account creation), they can scale very quickly, and they can rise to enterprise levels with commercial cloud offerings.
- Cost Pressures Continue : The dominant theme of 2023 is doing more with less. Looking at Snowflake’s net dollar retention over the last few years, it’s clear exactly when the office of the CFO became an important voice within the data world. This is leading to a trifurcation of workloads : offloading work from the most expensive queries to less expensive query engines (in-memory & data lakehouses) where slightly higher latencies and different performance characteristics work well.
- Juggernauts Dueling : Whether it’s Snowflake vs Databricks competing over structured data workloads, or Microsoft Fabric and Databricks competing over unstructured large-scale data processing, or Google and Amazon competing over LLM deployment technologies, or Microsoft and OpenAI cooperating/competing in the enterprise, 2024’s data landscape will be shaped by these battles.
- Consolidation : Data companies have produced a huge amount of consolidation in the last few years. Given the competitive dynamics, the rapid growth rates within the ecosystem, which are significantly faster than overall software spend, and the higher multiples afforded to these businesses, we should expect to see a lot of M&A in 2024.
- The Decade of Data Continues : The pace of innovation within the data world continues to accelerate due to data. And so the decade of data continues.
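
To make the small-data trend above concrete, here is a minimal sketch of an in-process, in-memory database at work. It assumes SQLite, through Python's standard `sqlite3` module, as a stand-in engine; the post does not name a specific product. The `orders` table, its columns, and the `top_customers` helper are hypothetical and exist only for illustration.

```python
import sqlite3


def top_customers(rows, limit=3):
    """Rank customers by total spend using an in-memory SQLite database.

    `rows` is an iterable of (customer, amount) tuples. The table name and
    columns are hypothetical, chosen only for this illustration.
    """
    # ":memory:" keeps the whole database inside this process:
    # no server to start, no account to create.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT customer, SUM(amount) AS total "
            "FROM orders "
            "GROUP BY customer "
            "ORDER BY total DESC, customer ASC "  # name breaks ties deterministically
            "LIMIT ?",
            (limit,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


orders = [("acme", 120.0), ("globex", 75.5), ("acme", 30.0), ("initech", 75.5)]
print(top_customers(orders, limit=2))
```

The point of the sketch is the startup cost: the database lives and dies with the function call, which is what makes this style of engine quick to adopt for small workloads. An analytical in-process engine would follow the same pattern with a different driver.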