This is Part II of a two-part series on «How to assess Artificial Intelligence (AI) startups». Check out the previous Part I here.

How does it work?

While teaching the intricacies of Machine Learning models is beyond the scope of this article, it’s worth exploring some of the necessary steps. Knowing them is essential to understand the needs and limitations of a startup.

Source: MLPerf Inference Benchmark, Nov 2019


Machine Learning models, like their biological inspiration, are based on learning skills. The first step is to provide the system with enough information to learn the patterns. You can think of them as little children. Before a kid can operate safely, you need to teach them different patterns until they start ‘inferring’ how the world around them works. The more examples you show them, the faster and better your child will learn. 

Machine Learning algorithms are like children, and that stage of teaching is what we call the «training» of the model. This first stage is the foundation of any model. Depending on the algorithm, the amount of data (examples) and the time needed for successful training will vary dramatically. In other words, some models learn faster than others, as kids do. 

Most of the well-known machine learning models require massive amounts of data to achieve excellent performance. Beyond data, this training takes time. The actual training is, in reality, a gigantic algebraic operation. Depending on your computer’s capabilities (or the kid’s brain), this processing can take anywhere from a long time to practically forever. As the field has matured, so has the hardware used to train such models. The difference between training a model on a regular computer and on one equipped with purpose-built machine learning chips is striking.
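To make the «gigantic algebraic operation» a bit more concrete, here is a minimal sketch of what training looks like, in plain Python. It is a toy under stated assumptions, not how production frameworks are implemented: the model is a single straight line, and the function name `train_linear_model`, the learning rate, and the number of passes are illustrative choices of mine. Real models repeat the same kind of arithmetic over millions or billions of parameters, which is why the hardware matters so much.

```python
def train_linear_model(examples, learning_rate=0.01, epochs=5000):
    """Fit y ~ w * x + b to (x, y) examples with plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        # Nudge the parameters a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy "training data": ten examples that follow y = 3x + 1 exactly.
examples = [(x, 3.0 * x + 1.0) for x in range(10)]
w, b = train_linear_model(examples)
print(f"learned w={w:.3f}, b={b:.3f}")
```

Every pass over the data costs a fixed amount of arithmetic, so more examples, more parameters, or more passes all translate directly into more compute time.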

Tensorflow Unit Rig. Source: Now you can train TensorFlow machine learning models faster and at lower cost on Cloud TPU Pods, Dec. 2018
Cost per hour of Machine Learning model training in a Cloud provider, against time taken to train it. Source: Best Deals in Deep Learning Cloud Providers, Oct 2018.
Evolution of Floating Point Operations (complexity of the model) of Image recognition Machine Learning models against their accuracy. Source: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, Nov 2019.

Questions to ask: Two questions are important here. Is the startup using an already trained machine learning model, or are they doing the training themselves? Using someone else’s model is cost-effective and fast. The drawback is the lack of flexibility, as you can’t teach it anything new. Training your own model is very powerful, but then the question is how, and with what resources, the startup is training it. Big models incur higher training costs, which will impact the startup’s burn rate.
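The link between training costs and burn rate is simple arithmetic, and a back-of-the-envelope sketch can help when talking to founders. Everything below is hypothetical: the function names, the hourly price, the number of training runs, and the cash figures are made-up inputs for illustration, not data from any provider or startup.

```python
def monthly_training_cost(hours_per_run, price_per_hour, runs_per_month):
    """Cloud bill for training: hours per run x hourly price x runs."""
    return hours_per_run * price_per_hour * runs_per_month


def runway_months(cash, base_monthly_burn, training_cost):
    """Months of cash left once training is added to the rest of the burn."""
    return cash / (base_monthly_burn + training_cost)


# Hypothetical startup: 48-hour training runs at $8/hour, 10 runs a month.
training = monthly_training_cost(hours_per_run=48, price_per_hour=8.0, runs_per_month=10)
print(f"training cost per month: ${training:,.0f}")
print(f"runway: {runway_months(500_000, 40_000, training):.1f} months")
```

Changing a single input, such as the model size (and with it the hours per run), shows quickly how sensitive the runway is to training choices.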

Data acquisition & curation

As I mentioned before, each machine learning model requires a different set of teachings or datasets. Like our children, these systems learn either under strict supervision (supervised learning) or on their own, in a self-taught way (unsupervised learning). Supervised models require that we provide learning examples, with the expected solution. Unsupervised models don’t need specific answers, but an overall goal. They’ll figure out the right solution on their own.
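The difference is easier to see on a toy problem. The sketch below, which is my own illustration and not any particular library’s method, separates small numbers from large ones in two ways. The supervised version is handed the answers (labels 0 and 1) and learns a threshold from them. The unsupervised version gets only the raw values plus an overall goal, «split these into two tight groups», and works out the groups with a simple two-cluster loop. The function names and the data are hypothetical.

```python
def fit_supervised(labeled):
    """Learn a threshold from (value, label) pairs, where label is 0 or 1."""
    zeros = [v for v, label in labeled if label == 0]
    ones = [v for v, label in labeled if label == 1]
    # Put the threshold halfway between the two class averages.
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def fit_unsupervised(values, iterations=10):
    """Split unlabeled values into two groups with a 1-D two-means loop."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not near_high:  # all values identical: nothing to split
            break
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return (low + high) / 2


values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = [0, 0, 0, 1, 1, 1]
print(fit_supervised(list(zip(values, labels))))  # uses the answers
print(fit_unsupervised(values))                   # never sees the answers
```

On clean data like this both approaches land on roughly the same threshold. The catch is that the supervised one needed someone to produce the labels first.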

As incredible as it sounds, unsupervised learning models are few and don’t have great performance. Self-taught systems are still being developed, but some recent advances, like those demonstrated by Google’s DeepMind unit, are very promising. The truth is, most AI systems out there use either supervised learning or semi-supervised learning.

Now, acquiring enough data to train supervised models is one of the biggest challenges for any company. Depending on the complexity of the problem, we could be talking about anything from 10,000 data points all the way to millions.

«Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text.

GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages.»

Better Language Models and Their Implications. OpenAI, Feb 2019.

The datasets need to be normalized, cleaned, and completed with the expected correct answer. Most times we don’t have enough data, or it’s incomplete or riddled with errors. Other times we don’t have enough correct answers (a «labeled dataset»). The performance of a machine learning model depends heavily on the size and quality of the training data. Acquiring useful data is probably one of the hardest parts of applying AI in any company.
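As a rough illustration of what «normalized, cleaned, and completed» means in practice, here is a minimal sketch in plain Python. It assumes a hypothetical dataset where each row is a dictionary with a numeric `value` and an optional `label`, and that at least one row has a value. Real pipelines deal with many columns and messier errors, but the steps are the same in spirit: drop what is unusable, put the numbers on a common scale, and separate what has an answer from what still needs one.

```python
def prepare_dataset(rows):
    """Drop incomplete rows, min-max normalize values, split by label."""
    # Cleaning: discard rows whose feature value is missing.
    complete = [r for r in rows if r.get("value") is not None]
    low = min(r["value"] for r in complete)
    high = max(r["value"] for r in complete)
    span = (high - low) or 1.0  # avoid dividing by zero on constant data
    # Normalizing: rescale every value into the 0..1 range.
    cleaned = [
        {"value": (r["value"] - low) / span, "label": r.get("label")}
        for r in complete
    ]
    # Rows without a label cannot be used for supervised training yet.
    labeled = [r for r in cleaned if r["label"] is not None]
    unlabeled = [r for r in cleaned if r["label"] is None]
    return labeled, unlabeled


rows = [
    {"value": 10, "label": "a"},
    {"value": None, "label": "b"},  # incomplete: dropped
    {"value": 20, "label": None},   # usable, but still needs a label
    {"value": 30, "label": "b"},
]
labeled, unlabeled = prepare_dataset(rows)
print(len(labeled), "labeled,", len(unlabeled), "waiting for a label")
```

The size of the unlabeled pile is a useful thing to ask about, since someone (often a paid human) has to supply those answers.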

Questions to ask: There are several essential questions when it comes to training data, but perhaps the critical one concerns its origin. If the training datasets are public or easily accessible, then, despite the use of AI, the startup has a lower degree of competitive advantage. But if the company is capable of producing its own proprietary training data, its position is much stronger.


Once the model is trained, we can then use it to solve the problem we have. This stage is what’s called ‘inference.’ It is comparable to hiring someone for our company right out of university. We assume the academic system (the training stage of our model) has prepared them well for the realities of the market (the inference stage). The truth is that sometimes the student is well trained, and at other times we need to teach them some more. The same happens with a Machine Learning model. For years, the most computationally expensive stage of the model was the training. But the size of the models has exploded, and executing them is now as costly as the training phase, or even more so.

«Inference workloads are increasing dramatically, mirroring the increase in our training workloads, and the standard CPU servers we use today can not scale to keep up.»

Accelerating Facebook’s infrastructure with application-specific hardware. Mar. 2019.

The inference stage has an added complexity that the training step doesn’t have. On most occasions, the model is deployed in applications to provide solutions (inferences) in quasi-real-time. This means that any delay or overload is unacceptable. There is a stark difference with the training phase, where we are much more flexible. As long as it takes a reasonable amount of time, we don’t mind an extra hour. An hour’s delay in a production system is unthinkable.
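Since delays are what matter at this stage, inference is usually judged by its latency, and in particular by the slow tail rather than the average. The sketch below shows one simple way to measure it, assuming a hypothetical `predict` function standing in for the deployed model. The helper name `measure_latency` and the choice of the median and the 99th percentile are my own illustrative picks, not a standard benchmark procedure.

```python
import time


def measure_latency(predict, inputs):
    """Return (median, p99) latency in milliseconds, one call per input."""
    timings = []
    for item in inputs:
        start = time.perf_counter()
        predict(item)
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    # Simple rank-based picks: the middle timing and the 99th-percentile one.
    median = timings[len(timings) // 2]
    p99 = timings[min(len(timings) - 1, int(len(timings) * 0.99))]
    return median, p99


def predict(x):
    """Stand-in for a real model: just burns a little CPU time."""
    return sum(i * x for i in range(1000))


median_ms, p99_ms = measure_latency(predict, list(range(200)))
print(f"median: {median_ms:.3f} ms, p99: {p99_ms:.3f} ms")
```

A startup that can quote its p99 latency, and not just an average, has probably looked at this seriously.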

«Despite its apparent simplicity relative to training, the task of balancing latency, throughput, and accuracy for real-world applications makes optimizing inference difficult.»

MLPerf Inference Benchmark. Nov. 2019.
Latency of executing a Machine Learning model in different AWS server types against the cost per 100k inferences. Source: Reducing deep learning inference cost with MXNet and Amazon Elastic Inference. Mar. 2019.


As the number of AI startups grows, so do the computational needs. The bigger the scale of the company, the more complex the inference stage becomes. Scaling any startup is a challenge. Scaling an AI startup is even more so. This is why selecting the right tools can have a significant impact on efficiency and costs.

«Via AI models deployed at scale, we make more than 200 trillion predictions and over 6 billion language translations every day. We use more than 3.5 billion public images to create, or train, our AI models, which allows them to better recognize and tag content.»

Accelerating Facebook’s infrastructure with application-specific hardware. Mar. 2019.

Precisely because of the increasing hardware needs, many startups are turning to cloud computing as a way to offset costs. At the same time, cloud providers are scrambling to offer the best bang for the buck. The competition is so fierce that providers like Google have developed their own chips (TPUs) for that purpose.

«Estimates indicate that over 100 companies are producing or are on the verge of producing optimized inference chips. By comparison, only about 20 companies target training.»

MLPerf Inference Benchmark. Nov. 2019.
Evolution of performance of Alphabet’s TPU chips. Source: Now you can train TensorFlow machine learning models faster and at lower cost on Cloud TPU Pods, Dec. 2018

Questions to ask: Where is the startup executing their Machine Learning model? Most startups will probably use cloud computing resources. What provider and hardware are they using? What are the associated costs of both training and inference? There might be situations where it’s cost-effective to run parts of the model in-house, so it’s also worth exploring.
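A quick way to compare hardware options is to turn latency and hourly price into a cost per 100,000 inferences. The sketch below is a simplification under stated assumptions: the server is kept fully busy, requests are handled one at a time unless `concurrency` says otherwise, and the latency and price figures are invented for the example rather than taken from any provider’s price list.

```python
def cost_per_100k_inferences(latency_ms, price_per_hour, concurrency=1):
    """Cost of 100,000 inferences on one always-busy server."""
    # 3,600,000 ms in an hour, divided by the time each inference takes.
    inferences_per_hour = concurrency * 3_600_000 / latency_ms
    return 100_000 * price_per_hour / inferences_per_hour


# Hypothetical comparison: a cheap slow server vs. a pricier fast one.
slow = cost_per_100k_inferences(latency_ms=50, price_per_hour=0.90)
fast = cost_per_100k_inferences(latency_ms=10, price_per_hour=3.00)
print(f"slow server: ${slow:.2f} per 100k inferences")
print(f"fast server: ${fast:.2f} per 100k inferences")
```

In this made-up case the more expensive server ends up cheaper per inference, because it gets through the work five times faster. Which option wins in practice depends on real prices and on how well the startup keeps its servers busy.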

The Machine Learning Stack

Source: MLPerf Inference Benchmark, Nov 2019

The final performance of a Machine Learning model depends on many aspects: the model used, the size of the model, the amount and quality of the data, how it was programmed, and the hardware the inference runs on.

While hardware choice has a significant impact, the way developers code the model also matters. Due to the mathematical complexity of these Machine Learning models, developers rarely code the hard math. Instead, data scientists have built a series of abstractions, called frameworks, that allow developers to build their models quickly on top of standard Machine Learning operations.

Overall score for Machine Learning frameworks. Source: Which Deep Learning Framework is Growing Fastest? Jeff Hale, Apr. 2019.

Each framework has pros and cons, but the choice can have a substantial impact on the overall performance of the system. For applications that don’t require real-time inferences, the speed of training is critical. For those that need to return an answer in real-time (e.g., autonomous vehicles, voice assistants, etc.), inference speed is essential.

Inference speed of Computer Vision Machine Learning models by underlying framework. Source: TensorFlow, PyTorch or MXNet? A comprehensive evaluation on NLP & CV tasks with Titan RTX. Apr. 2019
Training speed of Computer Vision Machine Learning models by underlying framework. Source: TensorFlow, PyTorch or MXNet? A comprehensive evaluation on NLP & CV tasks with Titan RTX. Apr. 2019

Questions to ask: As pointed out, the machine learning framework a startup uses might have a significant impact on the bottom line. It’s worth asking what framework they use and why. Early-stage startups will gravitate to the most abstract, well-documented, and fastest-to-deploy ones (e.g., PyTorch). More advanced ones will trade convenience for performance, showing domain expertise (e.g., TensorFlow).

As a side note, some startups will choose a specific framework because it’s the one used by the research team that published the model. You want to be able to replicate the same performance, so you’ll stick to what the inventors used. In that sense, the recent move by OpenAI towards PyTorch might have a significant impact in the future.

Use of Machine Learning frameworks in arXiv research papers. Source: Which Deep Learning Framework is Growing Fastest? Jeff Hale, Apr. 2019.

The Team

Despite all the math and computational resources, the core of an AI startup is, as always, the team. And of course, hiring a skilled Data Scientist is hard and expensive because there are so few of them.

Source: The Most In Demand Tech Skills for Data Scientists. Jeff Hale, Dec. 2019.

It’s interesting to see that while languages like Python or R have been the bread and butter of most data scientists, developers with machine learning framework knowledge are on the rise. 

Despite the moniker, real AI startups, those that develop or adjust complex Machine Learning models, are rare. The main reason is talent. It’s not easy to find a good Ph.D. researcher willing to go and work for a startup. A good sign of a quality AI startup is the number of Ph.D.s they have on the team. While having researchers doesn’t guarantee success, it gives an edge.

Questions to ask: When it comes to the team, understanding what languages and frameworks they work with is essential. It will give us a measure of their current technical capabilities. The same applies to Ph.D.s in the startup. While it’s not bulletproof, it at least shows an active link back to academia and reliable research. 

AI Startup Checklist

Assessing the future success of a startup is closer to fortune-telling than to science. When it comes to analyzing Deep Tech companies, like some AI startups, the process becomes even harder. Without domain knowledge, it’s difficult to understand the challenges and limitations of the product or service. I hope that this (long) introduction to AI startups can serve as a quick manual for investors or anyone interested in understanding this space.

Checklist Questions

    1. Are they developing their own AI model or using an existing one?
    2. What AI models are they using, and for what exactly?
    3. Are they using a pre-trained model, or are they doing their own training?
    4. What data does the model require, and where/how are they acquiring it?
    5. What’s the performance of the model compared to other solutions?
    6. Does the model require constant training? If so, how long does it take, at what cost, and where are they executing that training?
    7. What’s the model’s inference latency, what’s the cost, and where are they executing the inference?
    8. What Machine Learning frameworks, if any, do they use, and why?
    9. Who in the team is in charge of the AI models, and what’s their expertise?
    10. What’s the roadmap and strategy to develop their own core AI?