Data is the new oil: how to avoid wasted AI investment

Executive overview

Most enterprise AI projects fail not because of the model but because of poor data strategy and undefined user intent. The model is a small part of the system; the data, the user intent, and the architecture around the model determine success.

Start with the problem: what does the end user actually need to accomplish? Then understand what data you have, what you can access, and what you might need to buy. Only then choose a model.

Your proprietary operational data, not the latest model, is your real competitive advantage.

The three biggest mistakes corporate AI leaders make

  • Declaring "we're going to use AI" before defining the human-centred outcome
  • Building without governance, creating a wild west of incompatible architectures
  • Treating data access as equivalent to data availability: having data and being able to use it are not the same thing

Why data quality determines project outcomes

  • Inadequate, inaccessible, or poorly scoped data is the root cause of most AI project failures
  • Overfitting and underfitting in classical ML are often symptoms of a poor understanding of the operating domain
  • If your model is built on a misrepresentation of your data set, you will have to redo the work
  • Ask vendors: what pre-training data was used? How was it fine-tuned? How was it aligned? Inability to answer is a red flag
  • Demand data set samples and lineage documentation before committing to a solution
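One inexpensive way to catch inadequate or poorly scoped data is to audit it before any modelling begins. The sketch below is a minimal illustration under stated assumptions: the record fields (`asset_id`, `temp_c`, `failed`) and the `audit` helper are hypothetical, and a real audit would also check lineage and how well the data represents the operating domain.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class QualityReport:
    rows: int
    missing_by_field: dict[str, int]
    label_counts: dict[str, int]

    @property
    def minority_share(self) -> float:
        """Fraction of labelled rows in the rarest class (0.0 if nothing is labelled)."""
        total = sum(self.label_counts.values())
        return min(self.label_counts.values()) / total if total else 0.0


def audit(records: list[dict], required: list[str], label: str) -> QualityReport:
    """Count missing required fields and the label distribution (hypothetical helper)."""
    missing = {f: sum(1 for r in records if r.get(f) in (None, "")) for f in required}
    labels = Counter(r[label] for r in records if r.get(label) not in (None, ""))
    return QualityReport(len(records), missing, dict(labels))


# Hypothetical maintenance records: one missing asset id, one missing reading.
records = [
    {"asset_id": "P-101", "temp_c": 71.0, "failed": "no"},
    {"asset_id": "P-102", "temp_c": None, "failed": "no"},
    {"asset_id": "P-103", "temp_c": 88.5, "failed": "yes"},
    {"asset_id": "", "temp_c": 69.2, "failed": "no"},
]
report = audit(records, required=["asset_id", "temp_c"], label="failed")
print(report.missing_by_field, report.label_counts, report.minority_share)
```

A low minority share (here one failure in four rows) is a prompt to ask whether the data covers the cases the model is meant to handle, not an automatic reason to stop.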

Your proprietary data is your differentiator

  • Large general models trained on trillions of tokens cannot answer highly specific operational questions; your data can
  • Data accumulated over decades (spreadsheets, databases, legacy systems) is now exploitable via generative AI in ways that were not possible before
  • Example: IBM Maximo used a GenAI model to generate synthetic training data from a small number of labelled operational data points, enabling a classical ML model to deliver operational value
  • Combining internal operational data with commercially available external data (e.g., weather patterns for fleet dispatch) creates signals that unlock new insights
  • Document the data you cannot get today; that list becomes a roadmap for future capability
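At its simplest, combining internal and external data is a join on shared keys. This sketch assumes hypothetical dispatch records and a daily precipitation feed keyed by date and region; the 10 mm "wet day" threshold is arbitrary. The point is only that the joined signal (late rate on wet versus dry days) exists in neither source alone.

```python
from collections import defaultdict

# Hypothetical internal records: one row per fleet dispatch.
dispatches = [
    {"date": "2024-01-08", "region": "north", "late": True},
    {"date": "2024-01-08", "region": "north", "late": True},
    {"date": "2024-01-08", "region": "south", "late": False},
    {"date": "2024-01-09", "region": "north", "late": False},
    {"date": "2024-01-09", "region": "south", "late": False},
]

# Hypothetical external feed: daily precipitation in mm, keyed by (date, region).
weather = {
    ("2024-01-08", "north"): 32.0,
    ("2024-01-08", "south"): 0.0,
    ("2024-01-09", "north"): 1.5,
    ("2024-01-09", "south"): 0.0,
}


def late_rate_by_weather(dispatches, weather, wet_threshold_mm=10.0):
    """Join each dispatch to its day's weather and compare late rates on wet vs dry days."""
    totals = defaultdict(lambda: [0, 0])  # condition -> [late count, total count]
    for d in dispatches:
        rain = weather.get((d["date"], d["region"]))
        if rain is None:
            continue  # no external signal for this dispatch; skip rather than guess
        condition = "wet" if rain >= wet_threshold_mm else "dry"
        totals[condition][0] += int(d["late"])
        totals[condition][1] += 1
    return {cond: late / n for cond, (late, n) in totals.items()}


rates = late_rate_by_weather(dispatches, weather)
print(rates)
```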

How to pick the right model

  • The model is a small component; what matters is the surrounding system (data pipelines, deployment environment, feedback loops)
  • Consider where inference runs: offline on a device or hosted in the cloud; the choice changes the entire architecture
  • Design for model swappability: as better models emerge, the rest of the system should remain stable
  • Different data types (time series, image, structured text) require different models; "just chat" is rarely the right answer
  • Use thumbs up/down feedback mechanisms to create signals for ongoing improvement
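Swappability and feedback capture can be sketched together: the application depends only on a small `Model` interface, and every thumbs up/down vote is logged against the model that produced the output, so the signal survives a model swap. `KeywordModel` and `LengthModel` are deliberately trivial, hypothetical stand-ins; in practice each would wrap an on-device or cloud-hosted model.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Model(Protocol):
    """Anything the system can call; the rest of the pipeline depends only on this."""
    name: str

    def predict(self, text: str) -> str: ...


@dataclass
class KeywordModel:  # hypothetical toy model
    name: str = "keyword-v1"

    def predict(self, text: str) -> str:
        return "urgent" if "leak" in text.lower() else "routine"


@dataclass
class LengthModel:  # hypothetical toy model
    name: str = "length-v2"

    def predict(self, text: str) -> str:
        return "urgent" if len(text) > 40 else "routine"


@dataclass
class Assistant:
    model: Model
    feedback: list[tuple[str, str, str, bool]] = field(default_factory=list)

    def answer(self, text: str) -> str:
        return self.model.predict(text)

    def record_feedback(self, text: str, output: str, thumbs_up: bool) -> None:
        # Store the model name with each vote so feedback can be compared across models.
        self.feedback.append((self.model.name, text, output, thumbs_up))

    def approval_rate(self, model_name: str) -> float:
        votes = [up for name, _, _, up in self.feedback if name == model_name]
        return sum(votes) / len(votes) if votes else 0.0


assistant = Assistant(KeywordModel())
ticket = "Hydraulic leak on pump 3"
out = assistant.answer(ticket)
assistant.record_feedback(ticket, out, thumbs_up=True)

assistant.model = LengthModel()  # swap the model; nothing else in the system changes
out2 = assistant.answer(ticket)
assistant.record_feedback(ticket, out2, thumbs_up=False)
print(assistant.approval_rate("keyword-v1"), assistant.approval_rate("length-v2"))
```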

Avoiding architectural lock-in and parlour tricks

  • Fixating on a specific technology (e.g., "we need a chatbot") without closing a real gap is a warning sign
  • Design an MVP that delivers "good enough": good enough beats a coin flip, and you can iterate from there
  • "What's the intent behind the outcome?" is the most useful design question; it separates genuine value from demos that impress but don't help
  • Evolutionary improvement (1% better, repeated) and revolutionary capability (genuinely new behaviour) are both valid, but revolutionary capability must deliver truly differentiated value for the specific user
  • AI is more than chat: over-rotation toward chat systems causes teams to miss use cases suited to image, sensor, or time series data
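"Better than a coin flip" can be checked directly on a labelled holdout set. In this sketch, the labels, predictions, and the rule that an MVP must beat both baselines are illustrative assumptions. It also compares against always predicting the majority class, because on imbalanced data that trivial guess can already beat a coin flip.

```python
def accuracy(preds: list[str], labels: list[str]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def compare_to_baselines(preds: list[str], labels: list[str]) -> dict:
    classes = sorted(set(labels))
    coin_flip = 1 / len(classes)  # expected accuracy of guessing uniformly at random
    majority = max(classes, key=labels.count)
    majority_acc = accuracy([majority] * len(labels), labels)
    mvp_acc = accuracy(preds, labels)
    return {
        "mvp": mvp_acc,
        "coin_flip": coin_flip,
        "majority_class": majority_acc,
        "good_enough": mvp_acc > max(coin_flip, majority_acc),
    }


# Hypothetical holdout set: 2 failures out of 10 assets.
labels = ["fail", "ok", "ok", "ok", "fail", "ok", "ok", "ok", "ok", "ok"]
preds = ["fail", "ok", "ok", "fail", "fail", "ok", "ok", "ok", "ok", "ok"]
result = compare_to_baselines(preds, labels)
print(result)
```

Here a coin flip scores 0.5 but always answering "ok" scores 0.8, so the MVP's 0.9 is the number to beat in the next iteration, not the 0.5.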

How to understand user intent in practice

  • Include all stakeholders: line-of-business users, architects, and decision-makers, not just the IT team
  • Run over-the-shoulder UX research: observe the workarounds users create (e.g., a note file kept open alongside the software) to surface missing workflow steps
  • Those observations directly inform requirements such as versioning, data lineage, and lifecycle management
  • AI is always part of a broader workflow; no one uses AI for its own sake
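Versioning, lineage, and lifecycle requirements like these can start as a very small record. In this hypothetical sketch, the `DatasetVersion` fields, the 12-character hash prefix, and the single-parent lineage walk are illustrative choices, not a standard. Each version is identified by a hash of its content and points at the version it was derived from, so any output can be traced back to its raw source.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class DatasetVersion:
    name: str
    content_hash: str
    parents: tuple[str, ...] = ()  # content hashes of the versions this was derived from
    transform: str = "ingest"      # human-readable description of the derivation step
    status: str = "active"         # lifecycle stage, e.g. "active", "deprecated", "retired"
    created: date = field(default_factory=date.today)


def fingerprint(rows: list[dict]) -> str:
    """Stable content hash: any change to the data yields a new version id."""
    payload = json.dumps(rows, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]


def lineage(version: DatasetVersion, registry: dict[str, DatasetVersion]) -> list[str]:
    """Walk parent links back to the raw source; returns names oldest first."""
    chain = [version.name]
    while version.parents:
        version = registry[version.parents[0]]  # follows the first parent only, for brevity
        chain.append(version.name)
    return chain[::-1]


raw_rows = [{"asset": "P-101", "note": "seal replaced"}]
raw = DatasetVersion("work_orders_raw", fingerprint(raw_rows))

clean_rows = [{"asset": "P-101", "note": "seal replaced", "category": "seal"}]
clean = DatasetVersion(
    "work_orders_clean", fingerprint(clean_rows),
    parents=(raw.content_hash,), transform="categorise notes",
)

registry = {v.content_hash: v for v in (raw, clean)}
print(lineage(clean, registry))
```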

The IBM Design for AI framework

IBM formalised a five-layer approach (available at ibm.com/design/ai):

  1. Intent — what human-centred outcome are you targeting?
  2. Data — what do you have, what can you access, what do you need?
  3. Model — chosen based on data types and deployment context, designed to be swappable
  4. Insights — signals extracted from the model output
  5. Action — business value delivered to the end user
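To make the layering concrete, the five layers can be read as pipeline stages in which each layer consumes only the output of the previous one. This is an illustrative sketch, not IBM's implementation: the `Intent` fields, the `MODELS` catalogue keyed by data type and deployment context, the z-score scorer, and the 2.0 threshold are all assumptions chosen to keep it short.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Intent:  # 1. Intent: stated before anything else
    outcome: str
    data_type: str   # e.g. "time_series", "image", "text"
    deployment: str  # e.g. "device" or "cloud"


def select_data(intent: Intent, sources: list[dict]) -> list[dict]:
    # 2. Data: keep only sources whose type matches the intent
    return [s for s in sources if s["type"] == intent.data_type]


def zscore_model(values: list[float]) -> list[float]:
    # 3. Model: a trivial anomaly scorer standing in for a real, swappable model
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical catalogue: model chosen by (data type, deployment context).
MODELS = {("time_series", "device"): zscore_model}


def insights(scores: list[float], threshold: float = 2.0) -> list[int]:
    # 4. Insights: indices whose anomaly score crosses the threshold
    return [i for i, z in enumerate(scores) if abs(z) > threshold]


def action(intent: Intent, flagged: list[int]) -> list[str]:
    # 5. Action: something the end user can act on
    return [f"Inspect reading {i} ({intent.outcome})" for i in flagged]


intent = Intent("catch pump overheating early", "time_series", "device")
sources = [
    {"type": "time_series", "values": [70, 71, 69, 70, 72, 71, 70, 95]},
    {"type": "image", "values": []},
]
data = select_data(intent, sources)[0]["values"]
model = MODELS[(intent.data_type, intent.deployment)]
flagged = insights(model(data))
print(action(intent, flagged))
```

Because the model is looked up rather than hard-coded, replacing `zscore_model` leaves the intent, data, insight, and action stages untouched.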

Three recommendations for corporate AI leaders

  1. Get in the game: even a data inventory exercise, done before any project starts, unlocks value and surfaces data that can be retired
  2. Design for evolution: AI architecture patterns are still maturing, much as microprocessors were decades ago; build in room for uncertainty and plan for the system to change within a quarter or a year
  3. Look beyond text: image data, IoT/sensor/time series data, and structured databases each unlock different use cases; knowing what data types you hold expands the range of problems you can address
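A first data inventory can be as simple as a list of sources tagged with data type, whether the organisation holds the data, and whether it is actually accessible. The sketch below is hypothetical: the field names, example sources, and the "zero consumers" retirement heuristic are all assumptions. It groups usable data by type, separates data that is held from data that is usable, and lists what is not held as a roadmap.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    data_type: str     # e.g. "text", "image", "time_series", "structured"
    held: bool         # does the organisation have this data at all?
    accessible: bool   # can a project use it today (format, permissions, pipeline)?
    consumers: int = 0 # systems or reports that still read it


def summarise(inventory: list[DataSource]) -> dict:
    by_type = defaultdict(list)
    for s in inventory:
        if s.held and s.accessible:
            by_type[s.data_type].append(s.name)
    return {
        "usable_by_type": dict(by_type),
        "held_not_accessible": [s.name for s in inventory if s.held and not s.accessible],
        "roadmap_not_held": [s.name for s in inventory if not s.held],
        "retire_candidates": [s.name for s in inventory if s.held and s.consumers == 0],
    }


inventory = [
    DataSource("maintenance_tickets", "text", held=True, accessible=True, consumers=3),
    DataSource("inspection_photos", "image", held=True, accessible=False, consumers=1),
    DataSource("pump_vibration", "time_series", held=True, accessible=True, consumers=2),
    DataSource("legacy_fleet_sheet", "structured", held=True, accessible=True),
    DataSource("supplier_lead_times", "structured", held=False, accessible=False),
]
summary = summarise(inventory)
print(summary)
```

A source with zero consumers is only a prompt for review: decades-old data with no current reader may be exactly the proprietary asset described earlier.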
