Developing AI Products Part 4: Getting Data to Start Development


Dear friends,

In a recent letter, I mentioned some challenges to building AI products. These problems are distinct from the issues that arise in building traditional software. They include unclear technical feasibility and complex product specification. A further challenge is the need for data to start development.

To develop a traditional software product, interviews with potential users might be sufficient to scope out a desirable product, after which you can jump into writing the code. But AI systems require both code and data. If you have an idea for, say, automating the processing of medical records or optimizing logistics networks, you need medical records data or logistics data to train a model. Where can you get it?

I see different answers for consumer-facing and business-facing AI products. For consumer-facing (B2C) products, it is generally easier to ask a small group of alpha testers to try out a product and provide data. This may be sufficient to bootstrap the development process. If the data you need is generic to many users — for example, photos on smartphones — it’s also more likely that a team will be able to find or acquire enough data to get started.

For business-facing (B2B) AI projects, it’s often difficult to get the data necessary to build a prototype because a lot of highly specialized data is locked up within the companies that produce it. I’ve seen a couple of general ways in which AI teams get around this problem.

  • Some AI teams start by doing NRE (non-recurring engineering, or consulting) work, in which they build highly customized solutions for a handful of customers. This approach doesn’t scale, but you can use it to obtain enough data to learn the lessons or train the models needed to build a repeatable business. Given their need for data, AI startups seem to take this path more often than traditional software startups.
  • Some AI entrepreneurs have worked with multiple companies in a vertical market. For example, someone who has worked for a large public cloud company may have been exposed to data from multiple companies in a given industry and seen similar issues play out across them. I’ve also had friends in academia who consulted for multiple companies, which enabled them to recognize patterns and come up with general solutions. Experience like this puts entrepreneurs in a better position to build a nascent product that helps them approach companies that can provide data.

If you lack data to get started on an AI project, these tactics can help you get an initial dataset. Once you’ve built a product, it becomes easier to find customers, get access to even more data, and scale up from there.

Keep learning!

Andrew
