How to assess Artificial Intelligence (AI) Startups (Part I)

This is Part I of a two-part series on «How to assess Artificial Intelligence (AI) startups». Check out Part II here. During the past few years, we’ve witnessed a massive growth in the number of so-called Artificial Intelligence (AI) startups. Both sides of the table, investors and entrepreneurs, have jumped into the frenzy with intent.

This new trend responds to significant advancements in AI research. As real as these are, it’s essential to question the ongoing avalanche of companies using «AI» that has followed.

Comparison of different AI systems for automatic speech generation, and how they rank against human speech quality. Source: DeepMind, WaveNet: A Generative Model for Raw Audio.

What do we mean by AI Startup?

It’s easy to throw the word «AI» around, but it’s much harder to pin down what we mean by it. Artificial Intelligence, or AI, is an umbrella term for many varied mathematical approaches that try to emulate intelligent behavior.

When a company tells us that they’re an AI startup, we should ask ourselves what exactly they mean by that. Each AI algorithm is useful for a specific set of problems. There isn’t, as of this writing, a general AI tool that delivers magical answers for any challenge.

Questions to ask: Because each algorithm is better at a specific set of problems, the first question we should ask a startup is what problem they are trying to solve with AI.

These algorithms aren’t easy to develop and require extensive mathematical knowledge. Real breakthroughs are few and far between. So much so that the core math concepts behind most of the well-known AI algorithms are decades old.

The reason why this math didn’t produce the expected «intelligence» was that, at the time, we lacked computational scale. Recent advances in both data and computational power have finally provided what AI needed for a breakthrough.

Precisely because these algorithms aren’t easy to develop, most current «AI» practitioners are really «AI» operators, not researchers. Deploying these algorithms, though, isn’t easy either. In most cases, they demand sophisticated expertise and extensive data and infrastructure. As complicated as deployment might be, there is still a big difference between designing an AI algorithm and executing it.

«We individually reviewed the activities, focus, and funding of 2,830 purported AI startups in the 13 EU countries most active in AI – Austria, Denmark, Finland, France, Germany, Ireland, Italy, the Netherlands, Norway, Portugal, Spain, Sweden, and the United Kingdom. Together, these countries also comprise nearly 90% of EU GDP. In approximately 60% of the cases – 1,580 companies – there was evidence of AI material to a company’s value proposition.» The State of AI: Divergence 2019, MMC Ventures Report.

When assessing an AI startup, it’s critical to understand the difference. Using a tool and designing a tool are two very different things. Sometimes the tool designers have an edge; other times, it’s the efficient use of it that matters.

Questions to ask: Are they a core AI company (i.e., developing their own algorithms and math), or are they using an existing algorithm and applying it to a vertical problem?

The rise of Deep Learning

Nowadays, when a company mentions AI, chances are, they’re referring to either Machine Learning or Deep Learning algorithms. People use these terms interchangeably, which isn’t wrong, but it’s inaccurate. Deep Learning is a class of Machine Learning algorithms. To be more precise, it’s a type of algorithm closely related to what’s known as Artificial Neural Networks (ANN). The goals, algorithms, requirements, and subtleties of each Machine Learning approach can vary.

Questions to ask: When evaluating a startup, it’s important to pinpoint what type of Machine Learning algorithms they are using. The goal is to understand the needs and limitations of those algorithms.

But why Deep Learning? With so many AI algorithms, startups have a vast toolkit at their disposal. However, during the last decade, the most popular type has been Deep Learning, and there is a reason for it. In September of 2012, a Deep Learning model called AlexNet achieved, for the first time in history, quasi-human image recognition abilities. AlexNet demonstrated not only that Deep Learning was more powerful than other methods, but that its performance could rival that of a human.

The combination of AlexNet’s exceptional breakthrough and advances in specialized hardware (e.g., GPUs and FPGAs) has generated a boom of research using Deep Learning methods. Since then, innovation and investment in the field have dramatically accelerated.

Evolution and proliferation of Deep Learning models for image recognition purposes. Source:

What is Deep Learning good at?

As I mentioned before, knowing what model a startup is using is essential. One reason is that, so far, every AI algorithm is good at only a limited range of problems. This ‘limitation’ means that to solve complex problems, researchers need more than one AI model. While Deep Learning is the new black, it’s rarely the ‘only’ algorithm in use. Nonetheless, it’s often at the core of most modern AI architectures. So the big question is, what is Deep Learning good at?

In general, Artificial Neural Networks, and by extension, Deep Neural Networks, are excellent at pattern matching. These algorithms are designed to learn, detect, and infer many different types of patterns. Several categories make good use of these properties.

Image Classification

AlexNet, the Deep Learning model that inspired the current AI wave, was designed for image recognition. It shouldn’t be too surprising that most Deep Learning models out there are also in the same category.

There are many groundbreaking applications for computer vision. Some are straightforward, like classifying images. Think about all the current ‘Cancer detecting’ models trying to distinguish a normal MRI scan from a problematic one.

In other cases, it’s about recognizing objects in an image. A good example is the growing crop of facial recognition companies. Many of these systems use a concatenation of Deep Learning models, each designed for a different purpose.

Facial Recognition example: the first step of a facial recognition system is to identify a human in a picture. Most images (e.g., from traffic cameras) show many objects, many of which aren’t human. Once the system detects a human, it needs to isolate the face. The next step is to analyze and process facial expressions. These cues are then compared against a database in search of patterns. Each of these stages uses a different Machine Learning algorithm.
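To make the idea of chained models concrete, here is a minimal sketch of such a pipeline. Every function is a hypothetical stand-in for a separately trained model, the «image» is a plain dictionary rather than real pixels, and the names (detect_person, isolate_face, KNOWN_FACES, etc.) are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Optional

Context = Dict[str, object]
Stage = Callable[[Context], Optional[Context]]

# Toy "database" of known feature vectors (hypothetical values).
KNOWN_FACES = {"alice": [2.0, 2.0], "bob": [9.0, 1.0]}


def detect_person(ctx: Context) -> Optional[Context]:
    # Stand-in for an object-detection model: stop if no human is present.
    if "person" not in ctx["image"]["objects"]:
        return None
    return {**ctx, "person_found": True}


def isolate_face(ctx: Context) -> Optional[Context]:
    # Stand-in for a face-localization model: crop the face region.
    return {**ctx, "face": ctx["image"]["face_pixels"]}


def extract_features(ctx: Context) -> Optional[Context]:
    # Stand-in for a feature-extraction model: mean and range of the pixels.
    face = ctx["face"]
    return {**ctx, "features": [sum(face) / len(face), max(face) - min(face)]}


def match_identity(ctx: Context) -> Optional[Context]:
    # Stand-in for the matching step: nearest known face by squared distance.
    def distance(name: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(KNOWN_FACES[name], ctx["features"]))

    return {**ctx, "identity": min(KNOWN_FACES, key=distance)}


def run_pipeline(image: dict, stages: List[Stage]) -> Optional[Context]:
    # Feed the output of each stage into the next; abort when a stage returns None.
    ctx: Optional[Context] = {"image": image}
    for stage in stages:
        ctx = stage(ctx)
        if ctx is None:
            return None
    return ctx


STAGES = [detect_person, isolate_face, extract_features, match_identity]
street = {"objects": ["car", "person"], "face_pixels": [1, 2, 3]}
empty_road = {"objects": ["car"], "face_pixels": []}

result = run_pipeline(street, STAGES)
print(result["identity"] if result else "no person found")
print(run_pipeline(empty_road, STAGES))
```

The point for an investor is structural: a weak stage early in the chain limits everything downstream, so it’s worth asking how each stage is built and evaluated.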

Face Recognition Based on Deep Learning. Source: Yurii Pashchenko, Technology Stream

Natural Language Processing

Another big field, pioneered by Google, is the use of Deep Learning to help computers understand language. One of the primary use cases is deciding what a text is talking about. Text classification is a big problem, and Deep Learning has made it somewhat easier. It shouldn’t be a surprise that Google uses such models to run their Search Engine (Google BERT).

Language-related problems, though, can get much more complicated, like translations from and to different languages. Such systems make extensive use of Deep Learning to perform equivalences between words.

Sketch of how Google’s Translator model does the pattern matching. Source: A Neural Network for Machine Translation, at Production Scale. Sep. 2016.
Google’s Deep Learning network’s invented language used for translating. Source: Zero-Shot Translation with Google’s Multilingual Neural Machine Translation System. Nov. 2016.

Speech Recognition

Talking robots have populated our imagination since the beginning of science fiction. Some will remember HAL, the T-800 or, more recently, Her. Improvements in audio equipment and data bandwidth got us one step closer to that dream. But it’s the use of Deep Learning algorithms that is making it a reality. We finally have a growing crop of voice AI assistants (Amazon’s Alexa, Apple’s Siri, Google Now, etc.) that can comprehend and speak our language. Some systems, like Google’s Duplex AI assistant, are borderline creepy.

In these cases, the system uses Deep Learning to match audio to text (Automatic Speech Recognition, ASR) and then the generated text to a correct answer to give to the user. Each module, again, will have its independent Deep Learning model.

Google Duplex AI demo. Google I/O 2018

Generation of content

While pattern matching is mostly used for classification purposes, it can also be used to generate entirely new content. This category is becoming quite relevant as it enables machine-written articles, images, and videos.

While there are valid use cases like having computers write news or even music, the fastest-growing use is that of fake news. Having the capacity to create phony content at scale is at the core of any disinformation campaign.

This category is appealing, especially for the creative arts, but it also holds deep ethical and moral questions.

SYSTEM PROMPT (HUMAN-WRITTEN)
Miley Cyrus was caught shoplifting from Abercrombie and Fitch on Hollywood Boulevard today.

MODEL COMPLETION (MACHINE-WRITTEN, SECOND TRY)
The 19-year-old singer was caught on camera being escorted out of the store by security guards. The singer was wearing a black hoodie with the label ‘Blurred Lines’ on the front and ‘Fashion Police’ on the back. Scroll down for video Shoplifting: Miley Cyrus was caught shoplifting from Abercrombie and Fitch on Hollywood Boulevard today (pictured) The singer was wearing a black hoodie with the label ‘Blurred Lines’ on the front and ‘Fashion Police’ on the back The singer was also wearing a pair of black-rimmed glasses, a black jacket, black jeans and black sandals. She was carrying a pair of black and white striped gloves and a small black bag.

OpenAI GPT-2 Language Model Text Generator examples. Feb. 2019.

Questions to ask: Once we understand what algorithms a startup uses, we need to ask what they are using them for. Chances are we won’t be able to assess if this use is the best. Nonetheless, if we have a broad understanding, we can detect glaring misuses. We can ask the reverse question too. If we’ve identified a potential use for machine learning, and the startup isn’t using any, it might show their lack of commitment to AI.

How to assess Artificial Intelligence (AI) Startups (Part II)

This is Part II of a two-part series on «How to assess Artificial Intelligence (AI) startups». Check out the previous Part I here.

How does it work?

While the scope of this article isn’t to teach the intricacies of Machine Learning models, it’s worth exploring some of the necessary steps. Knowing them is essential to grasp the needs and limitations of a startup.

Source: MLPerf Inference Benchmark, Nov 2019


Machine Learning models, like their biological inspiration, are based on learning skills. The first step is to provide the system with enough information to learn the patterns. You can think of them as little children. Before a kid can operate safely, you need to teach them different patterns until they start ‘inferring’ how the world around them works. The more examples you show them, the faster and better your child will learn. 

Machine Learning algorithms are like children, and that stage of teaching is what we call the «training» of the model. This first stage is the foundation of any model. Depending on the algorithm, the amount of data (examples) and the time needed for successful training will vary dramatically. In other words, some models learn faster than others, as kids do. 

Most of the well-known machine learning models require massive amounts of data to achieve excellent performance. Beyond data, this training takes time. The actual training is, in reality, a gigantic algebraic operation. Depending on your computer’s capabilities (or the kid’s brain), this processing will take a long time or forever. As the field has matured, so has the hardware used to train such models. The difference between training a model on a regular computer and on one equipped with purpose-built machine learning chips is striking.
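As a rough illustration of why training is «just» repeated algebra, the sketch below fits a straight line to a handful of examples with gradient descent. It is a toy under stated assumptions (a two-parameter model, five made-up examples, a hand-picked learning rate), not how any particular startup or framework trains its models; real models repeat the same kind of update over millions or billions of parameters.

```python
def train(examples, epochs=1000, learning_rate=0.05):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        # Nudge both parameters a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up training examples that follow y = 3x + 1.
examples = [(x, 3 * x + 1) for x in range(5)]
w, b = train(examples)
print(f"learned w={w:.4f}, b={b:.4f}")
```

Each pass over the data is one more round of the same arithmetic, which is why both the size of the dataset and the number of parameters translate directly into training time.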

Tensorflow Unit Rig. Source: Now you can train TensorFlow machine learning models faster and at lower cost on Cloud TPU Pods, Dec. 2018
Cost per hour of Machine Learning model training in a Cloud provider, against time taken to train it. Source: Best Deals in Deep Learning Cloud Providers, Oct 2018.
Evolution of Floating Point Operations (complexity of the model) of Image recognition Machine Learning models against their accuracy. Source: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, Nov 2019.

Questions to ask: Two issues are important here. Is the startup using an already trained machine learning model, or are they doing the training themselves? Using someone else’s model is cost-effective and fast. The drawback is the lack of flexibility, as you can’t teach it anything new. Training your own model is very powerful, but then the question is how the startup trains it and with what resources. Big models will incur higher training costs, which will impact the startup’s burn rate.
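A back-of-the-envelope calculation helps to frame the burn-rate question. The sketch below is only arithmetic; every figure in it (hours per run, hourly price, number of accelerators, runs per month, other monthly costs) is a made-up assumption to be replaced with the startup’s real numbers.

```python
def monthly_training_cost(hours_per_run, hourly_price_usd, devices, runs_per_month):
    """Cloud bill for training: each run rents `devices` accelerators for `hours_per_run` hours."""
    return hours_per_run * hourly_price_usd * devices * runs_per_month


def share_of_burn(training_cost_usd, other_monthly_costs_usd):
    """Fraction of the total monthly spend that goes to training."""
    return training_cost_usd / (training_cost_usd + other_monthly_costs_usd)


# Hypothetical startup: 24-hour runs on 8 accelerators at $3/hour, 10 runs a month.
training = monthly_training_cost(hours_per_run=24, hourly_price_usd=3.0,
                                 devices=8, runs_per_month=10)
share = share_of_burn(training, other_monthly_costs_usd=50_000)
print(f"training: ${training:,.0f}/month ({share:.1%} of monthly burn)")
```

Even rough inputs are enough to see whether training is a rounding error or one of the main cost lines.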

Data acquisition & curation

As I mentioned before, each machine learning model requires a different set of teachings or datasets. Like our children, these systems learn either under strict supervision (supervised learning) or on their own, in a self-taught way (unsupervised learning). Supervised models require that we provide learning examples together with the expected solution. Unsupervised models don’t need specific answers, only an overall goal. They’ll figure out the right solution on their own.
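The difference is easier to see with a toy example. In the sketch below, the supervised function receives the measurements together with their labels, while the unsupervised one receives only the measurements and has to discover two groups by itself (a tiny one-dimensional version of k-means clustering). The data and the function names are illustrative assumptions, not a reference to any specific product.

```python
def fit_centroids(points, labels):
    """Supervised: use the provided labels to compute one average per class."""
    sums = {}
    for x, label in zip(points, labels):
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + x, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}


def predict(centroids, x):
    """Assign a new point to the class with the closest average."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def two_means(points, iterations=10):
    """Unsupervised: split the points into two groups without any labels."""
    low, high = min(points), max(points)
    for _ in range(iterations):
        near_low = [x for x in points if abs(x - low) <= abs(x - high)]
        near_high = [x for x in points if abs(x - low) > abs(x - high)]
        if not near_high:  # all points look alike, nothing to split
            break
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return low, high


points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["small", "small", "small", "large", "large", "large"]

centroids = fit_centroids(points, labels)
print(predict(centroids, 4.5))   # uses what the labels taught it
print(two_means(points))         # finds the two groups on its own
```

Note that the unsupervised version finds the groups but has no idea what to call them; naming them «small» and «large» still requires a human or a labeled dataset.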

As incredible as it sounds, unsupervised learning models are few and don’t perform particularly well. Self-taught systems are still being developed, but some recent advances, like those demonstrated by Google’s DeepMind unit, are very promising. The truth is, most AI systems out there use either supervised learning or semi-supervised learning.

Now, acquiring enough data to train supervised models is one of the biggest challenges for any company. Depending on the complexity of the problem, we could be talking about anywhere from 10,000 data points to millions.

«Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. 


GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages.»

Better Language Models and Their Implications. OpenAI, Feb 2019.

The datasets need to be normalized, cleaned, and completed with the expected correct answer. Most times we don’t have enough data, or the data is incomplete or riddled with errors. Other times we don’t have enough correct answers («labeled dataset»). The performance of a machine learning model depends heavily on the size and quality of the training data. Acquiring useful data is probably one of the hardest parts of applying AI in any company.
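As a small illustration of what cleaning and normalizing can mean in practice, the sketch below drops rows with missing values or missing labels and then rescales every feature to the 0–1 range (min-max normalization, one common choice among several). The row layout and the sample values are assumptions made up for the example.

```python
def clean(rows):
    """Keep only rows that have a label and no missing feature values."""
    return [
        row for row in rows
        if row["label"] is not None and all(v is not None for v in row["features"])
    ]


def normalize(rows):
    """Rescale each feature column to the 0-1 range (min-max normalization)."""
    columns = list(zip(*(row["features"] for row in rows)))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        features = [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row["features"], lows, highs)
        ]
        scaled.append({"features": features, "label": row["label"]})
    return scaled


raw = [
    {"features": [10, 200], "label": "cat"},
    {"features": [None, 150], "label": "dog"},   # incomplete: missing value
    {"features": [20, 100], "label": None},      # unlabeled: no correct answer
    {"features": [30, 100], "label": "dog"},
    {"features": [20, 150], "label": "cat"},
]

usable = normalize(clean(raw))
print(f"{len(usable)} of {len(raw)} rows usable")
for row in usable:
    print(row)
```

In this toy dataset, two of the five rows are lost before training even starts, which is the kind of attrition that makes data acquisition so expensive.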

Questions to ask: There are essential questions when it comes to training data, but maybe the critical one is to ask about the origin of the training data. If the training datasets are public or easily accessible, then, despite the use of AI, the startup has a lower degree of competitive advantage. But, if the company is capable of producing its proprietary training data, its strength is much more significant. 


Once the model is trained, we can then use it to solve the problem we have. This stage is what’s called ‘inference.’ It’s like hiring someone for our company right out of university. We assume the academic system (the training stage of our model) has prepared them well for the realities of the market (the inference stage). The truth is that sometimes the student is well trained; other times we need to teach them some more. The same happens with a Machine Learning model. For years, the most computationally expensive stage of a model was the training. But as the size of the models has exploded, executing them has become as costly as the training phase, or even more so.

«Inference workloads are increasing dramatically, mirroring the increase in our training workloads, and the standard CPU servers we use today can not scale to keep up.»

Accelerating Facebook’s infrastructure with application-specific hardware. Mar. 2019.

The inference stage has an added complexity that the training step doesn’t have. On most occasions, the model is deployed in applications to provide solutions (inferences) in quasi-real-time. This means that any delay or overload is unacceptable. That’s a stark difference from the training phase, where we are much more flexible. As long as training takes a reasonable amount of time, an extra hour doesn’t matter. An hour’s delay in a production system is unimaginable.
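Latency claims are easy to check when a startup reports percentiles instead of averages. The sketch below times repeated calls and reports the median and the 99th percentile; fake_model is a hypothetical stand-in for a real model call, and the nearest-rank percentile used here is one of several common definitions.

```python
import math
import time


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latencies_ms(predict, inputs):
    """Time each call to `predict` and return the latencies in milliseconds."""
    latencies = []
    for item in inputs:
        start = time.perf_counter()
        predict(item)
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


def fake_model(x):
    # Hypothetical stand-in for a real inference call.
    return sum(i * x for i in range(1000))


latencies = measure_latencies_ms(fake_model, range(200))
print(f"p50={percentile(latencies, 50):.3f} ms, p99={percentile(latencies, 99):.3f} ms")
```

The gap between the median and the 99th percentile is often more telling than either number alone, because the slowest requests are the ones users notice.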

«Despite its apparent simplicity relative to training, the task of balancing latency, throughput, and accuracy for real-world applications makes optimizing inference difficult.»

MLPerf Inference Benchmark. Nov. 2019.

Latency of executing a Machine Learning model in different AWS server types against the cost per 100k inferences. Source: Reducing deep learning inference cost with MXNet and Amazon Elastic Inference. Mar. 2019.


As the number of AI startups grows, so do the computational needs. The bigger the scale of the company, the more complex the inference stage becomes. Scaling any startup is a challenge. Scaling an AI startup is even more so. This is why selecting the right tools can have a significant impact on efficiency and costs.

«Via AI models deployed at scale, we make more than 200 trillion predictions and over 6 billion language translations every day. We use more than 3.5 billion public images to create, or train, our AI models, which allows them to better recognize and tag content.»

Accelerating Facebook’s infrastructure with application-specific hardware. Mar. 2019.

Precisely because of the increasing hardware needs, many startups are turning to cloud computing as a way to offset costs. At the same time, cloud providers are scrambling to offer the best bang for the buck. The competition is so fierce that providers like Google have developed their own chips (TPUs) to that effect.

«Estimates indicate that over 100 companies are producing or are on the verge of producing optimized inference chips. By comparison, only about 20 companies target training.»

MLPerf Inference Benchmark. Nov. 2019.

Evolution of performance of Alphabet’s TPU chips. Source: Now you can train TensorFlow machine learning models faster and at lower cost on Cloud TPU Pods, Dec. 2018

Questions to ask: Where is the startup executing their Machine Learning model? Most startups will probably use cloud computing resources. What provider and hardware are they using? What are the associated costs of both training and inference? There might be situations where it’s cost-effective to run parts of the model in-house, so that’s also worth exploring.
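To compare hardware options on equal terms, it helps to convert an hourly price and a latency into a cost per 100k inferences. The sketch below does that under simplifying assumptions: requests are served one at a time and the machine is fully utilized. The instance names, prices, and latencies are made up for illustration.

```python
def cost_per_100k(hourly_price_usd, latency_ms):
    """Cost of 100,000 inferences on one machine serving requests back to back."""
    inferences_per_hour = 3_600_000 / latency_ms
    return hourly_price_usd / inferences_per_hour * 100_000


# Hypothetical options: (hourly price in USD, latency per inference in ms).
options = {
    "cpu_instance": (0.20, 200.0),
    "gpu_instance": (0.90, 50.0),
}

for name, (price, latency) in options.items():
    print(f"{name}: {latency:.0f} ms, ${cost_per_100k(price, latency):.2f} per 100k inferences")
```

In this made-up comparison the cheaper machine wins on cost but is four times slower, so the right choice depends on how much latency the product can tolerate.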

The Machine Learning Stack

Source: MLPerf Inference Benchmark, Nov 2019

The final performance of a Machine Learning model depends on many aspects: the model used, the size of the model, the amount and quality of the data, how it was programmed, and the hardware the inference runs on.

While hardware choice has a significant impact, the way developers code the model also matters. Due to the mathematical complexity of these Machine Learning models, developers rarely code the hard math. Instead, data scientists have built a series of abstractions, called frameworks, that allow developers to build their models quickly on top of standard Machine Learning operations.

Overall score for Machine Learning frameworks. Source: Which Deep Learning Framework is Growing Fastest? Jeff Hale, Apr. 2019.

Each framework has pros and cons, but the choice can have a substantial impact on the overall performance of the system. For applications that don’t require real-time inferences, the speed of training is critical. For those that need to return an answer in real-time (e.g., autonomous vehicles, voice assistants, etc.), inference speed is essential.

Inference speed of Computer Vision Machine Learning models by underlying framework. Source: TensorFlow, PyTorch or MXNet? A comprehensive evaluation on NLP & CV tasks with Titan RTX. Apr. 2019
Training speed of Computer Vision Machine Learning models by underlying framework. Source: TensorFlow, PyTorch or MXNet? A comprehensive evaluation on NLP & CV tasks with Titan RTX. Apr. 2019

Questions to ask: As pointed out, the machine learning framework a startup uses might have a significant impact on the bottom line. It’s worth asking what framework and why. Early-stage startups will gravitate to the most abstract, well-documented, and fastest-to-deploy ones (e.g., PyTorch). More advanced ones will trade convenience for performance, showing domain expertise (e.g., TensorFlow).

As a side note, some startups will choose a specific framework because it’s the one used by the research team that published the model. You want to be able to replicate the same performance, so you’ll stick to what the inventors used. In that sense, the recent move by OpenAI towards PyTorch might have a significant impact in the future.

Use of Machine Learning frameworks in arXiv research papers. Source: Which Deep Learning Framework is Growing Fastest? Jeff Hale, Apr. 2019.

The Team

Despite all the math and computational resources, the core of an AI startup, as always, is the team. And of course, hiring a skilled Data Scientist is expensive and hard because there are so few of them.

Source: The Most In Demand Tech Skills for Data Scientists. Jeff Hale, Dec. 2019.

It’s interesting to see that while languages like Python or R have been the bread and butter of most data scientists, developers with machine learning framework knowledge are on the rise. 

Despite the moniker, real AI startups, those that develop or adjust complex Machine Learning models, are rare. The main reason is talent. It’s not easy to find a good Ph.D. researcher willing to go and work for a startup. A good sign of a quality AI startup is the number of Ph.D.s they have on the team. While having researchers doesn’t guarantee success, it gives an edge.

Questions to ask: When it comes to the team, understanding what languages and frameworks they work with is essential. It will give us a measure of their current technical capabilities. The same applies to Ph.D.s in the startup. While it’s not bulletproof, it at least shows an active link back to academia and reliable research. 

AI Startup Checklist

Assessing the future success of a startup is closer to fortune-telling than to science. When it comes to analyzing Deep Tech companies, like some AI startups, the process becomes even harder. Without domain knowledge, it’s difficult to understand the challenges and limitations of the product or service. I hope that this (long) introduction to AI startups can serve as a quick manual for investors or anyone interested in understanding this space.

Checklist Questions

    1. Are they developing their own AI model or using an existing one?
    2. What AI models are they using, and for what exactly?
    3. Are they using a pre-trained model, or are they doing their own training?
    4. What data does the model require, and where/how are they acquiring it?
    5. What’s the performance of the model compared to other solutions?
    6. Does the model require constant training? If so, how long does it take, at what cost, and where are they executing that training?
    7. What’s the model’s inference latency, what’s the cost and where are they executing the inference?
    8. What Machine Learning frameworks, if any, do they use, and why?
    9. Who in the team is in charge of the AI models, and what’s their expertise?
    10. What’s the roadmap and strategy to develop their own core AI?