White Paper: Enhancing Series B Investment Outcomes

SignalRank's dual approach with heuristic and machine learning algorithms

Sep 30, 2024

Executive Summary

In the highly competitive venture capital landscape, identifying high-potential Series B investments is crucial for maximizing returns. Traditional methods often fall short in navigating the complexity and dynamic nature of these investments: the market is too large and too complex for any individual to track everything happening in venture capital. SignalRank offers a data-driven dual approach that pairs heuristic algorithms with machine learning algorithms, combining expert knowledge with machine intelligence to optimize investment outcomes.

This white paper outlines our methodology, presents the results of comprehensive backtesting and Monte Carlo simulations, and demonstrates how our platform can significantly enhance investment outcomes in Series B rounds.

Introduction

SignalRank is a systematic investment platform with access to top tier startups at Series B:

Selection: SignalRank is a set of live algorithms and models trained on 50m+ data points from 1m+ fundraising rounds to select the highest potential Series Bs.
Access: SignalRank invests in these Series Bs via the pro rata of partners, seed investors who use the platform to fund qualifying follow-ons.
Returns: c. 30% of SignalRank-selected companies in our backtest attracted a $1bn+ valuation (vs. less than 10% in the overall market).

We are building this platform to evaluate Series B rounds. Rounds that rank highly and pass our algorithms' dynamic threshold qualify for investment.

To achieve the best possible Multiple on Invested Capital (MOIC), we started by developing heuristic algorithms based on in-house venture capital expertise. These algorithms currently serve as SignalRank’s default selection mechanism.

In parallel, we are investing in machine learning-based algorithms, which we believe can take performance to the next level. In our initial evaluation, they show a consistent improvement in MOIC over the heuristic algorithms.

Before we dive deeper into SignalRank’s algorithms and analysis, it is important to note that access is also a critical factor. It is impossible to get into every promising Series B round that the algorithms select. According to recent statistics from our deal flow team, the current access rate is around 22%. We believe we can increase this to more than 50% at scale as we build more seed partner relationships. In the results analysis, we use Monte Carlo simulation to better estimate the MOIC we are likely to achieve in practice.

SignalRank Investment Platform

System Overview

The diagram above shows a high level overview of SignalRank’s investment platform.

Data and Infrastructure

The data sources for this study are curated from specialized providers in the startup and venture capital space, such as Crunchbase and Preqin. Additionally, external data sources and expert knowledge have been utilized for valuation correction and estimation.

Conceptually, there are two primary data flows: one supporting the heuristic solution and the other supporting the machine learning (ML) solution. Both are powered by in-house data and ML pipelines deployed on Snowflake and AWS. Task management across different rerun schedules is handled using Apache Airflow and Snowflake’s native task scheduling capabilities.

The output from the heuristic data flow includes essential company information, statistics on funding rounds, investor details, and heuristic performance metrics for each company across different funding rounds. In contrast, the ML data flow produces feature data, trained models, and other modeling-related outputs. Both data flows also generate dynamic thresholding information for company selection (also known as qualification).

It is important to note that these data flows are not isolated. In fact, some ML modeling features are derived from heuristic performance metrics. This intentional design allows us to effectively combine human intelligence (expert knowledge) with machine intelligence. Our study demonstrates that heuristic features significantly enhance model performance.

Applications

The outputs from both data flows power various applications and tasks, including:

Live Candidate Qualification: This process involves scoring live Series B rounds and comparing them against a dynamic qualification threshold, which is calculated based on recent Series B rounds. The threshold represents the score a company must achieve at Series B to qualify as a SignalRank index candidate.
Backtesting: This process evaluates the performance metrics of investment algorithms by applying them to historical data over a specified time window. The key performance metrics of interest include:
- Average and Median 5-Year MOIC: The 5-year MOICs aggregated across an algorithm’s selections. “5-Year” refers to the return obtained by the end of year 5 after the investment. In this paper, unless otherwise specified, the term 'MOIC' refers specifically to the 5-year MOIC.
- Number of Qualifiers: Ensuring that there are sufficient candidates to produce at least 100 investable Series B rounds per year.
- Unicorn Percentage: The ratio of unicorns to the total number of qualifiers.
- Absolute Number of Unicorns: The total count of unicorns identified.
Reporting: Presenting useful insights to SignalRank’s internal and external users.
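As an illustration of the backtesting metrics listed above, the following minimal sketch computes them for a group of qualifiers. The record keys (`moic_5y`, `peak_valuation`) and the function name are hypothetical and chosen for this example only; the $1bn unicorn cutoff follows the valuation figure quoted in the Introduction.

```python
from statistics import mean, median

# A unicorn is taken here to be a company that reached a $1bn+ valuation.
UNICORN_VALUATION_USD = 1_000_000_000


def backtest_metrics(qualifiers):
    """Summarize a list of qualifier records.

    Each record is a dict with two hypothetical keys:
      'moic_5y'        - 5-year MOIC of the Series B investment
      'peak_valuation' - highest known valuation in USD
    """
    moics = [q["moic_5y"] for q in qualifiers]
    unicorns = sum(
        1 for q in qualifiers if q["peak_valuation"] >= UNICORN_VALUATION_USD
    )
    return {
        "avg_moic": mean(moics),
        "median_moic": median(moics),
        "num_qualifiers": len(qualifiers),
        "num_unicorns": unicorns,
        "unicorn_pct": unicorns / len(qualifiers),
    }
```

Reporting the average and the median side by side matters because returns are heavily skewed: a handful of very large outcomes can lift the average well above what a typical selection returns.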

Methodology

Heuristic Approach

Scoring

To evaluate a candidate raising a Series B round using heuristic algorithms, we begin by calculating its Round Scores. Typically, a Series B candidate will have multiple Round Scores, one for each round up to and including the current one: Pre-Seed/Seed, Series A, and Series B. These Round Scores are derived from the Investor Scores associated with each funding round. Specifically, we consider each investor’s average MOIC over the last 5 years, investment efficiency (the percentage of unicorns among the investor’s investments), and the number of unicorns in which they have invested. We analyze investors’ performance over the 5 years leading up to the Series B round date to avoid look-ahead bias.

These individual Round Scores are then aggregated to generate a comprehensive Company Score.
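The scoring chain described above (investor metrics, then Investor Scores, then Round Scores, then a Company Score) can be sketched as follows. This is not SignalRank's actual scoring function, which the paper does not specify: the equal-weight linear blend, the use of simple averages for aggregation, and all function and field names are assumptions made for illustration. Only the three investor inputs and the 5-year look-back window come from the text.

```python
from datetime import date
from statistics import mean


def years_before(d, years):
    """Return the date `years` years before `d` (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def investor_metrics(investments, series_b_date, window_years=5):
    """Compute the three investor inputs from investments made in the
    look-back window strictly before the Series B date.

    Each investment is a dict with hypothetical keys 'date', 'moic' and
    'is_unicorn'; the values are assumed to be as known at the Series B date.
    """
    start = years_before(series_b_date, window_years)
    past = [i for i in investments if start <= i["date"] < series_b_date]
    if not past:
        return {"avg_moic": 0.0, "efficiency": 0.0, "unicorns": 0}
    unicorns = sum(1 for i in past if i["is_unicorn"])
    return {
        "avg_moic": mean([i["moic"] for i in past]),
        "efficiency": unicorns / len(past),
        "unicorns": unicorns,
    }


def investor_score(metrics, weights=(1.0, 1.0, 1.0)):
    """Hypothetical linear blend of the three inputs (weights are placeholders)."""
    w_moic, w_eff, w_uni = weights
    return (
        w_moic * metrics["avg_moic"]
        + w_eff * metrics["efficiency"]
        + w_uni * metrics["unicorns"]
    )


def round_score(investor_scores):
    """Assumed aggregation: average the scores of a round's investors."""
    return mean(investor_scores)


def company_score(round_scores):
    """Assumed aggregation: average the Round Scores keyed by round name."""
    return mean(round_scores.values())
```

Filtering on the Series B date inside `investor_metrics` is what keeps the score point-in-time: an investor's later successes cannot leak into the score of an earlier round.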

Qualification Mechanism

The algorithm maintains a dynamic threshold for candidate qualification, determined as the top nth percentile of Company Scores for Series B rounds occurring within a defined look-back window (e.g., 3 months).

For live qualification, the anchor date for the look-back window is set as the current date. For backtesting, the anchor date is set as the Series B round date of the investment candidate.
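A minimal sketch of the dynamic threshold, assuming scores are available as `(round_date, score)` pairs. The function names, the 90-day default, the 10% default, and the choice to exclude rounds dated on the anchor date itself are illustrative assumptions, not SignalRank's production settings.

```python
from datetime import date, timedelta


def dynamic_threshold(scored_rounds, anchor_date, top_pct=10.0, lookback_days=90):
    """Return the lowest score that still falls in the top `top_pct` percent
    of Series B rounds dated within the look-back window before `anchor_date`.

    `scored_rounds` is an iterable of (round_date, score) pairs.
    Returns None when the window contains no rounds.
    """
    start = anchor_date - timedelta(days=lookback_days)
    window = sorted(
        (score for round_date, score in scored_rounds
         if start <= round_date < anchor_date),
        reverse=True,
    )
    if not window:
        return None
    # Number of rounds inside the top slice; always keep at least one.
    k = max(1, int(len(window) * top_pct / 100))
    return window[k - 1]


def qualifies(score, threshold):
    """A candidate qualifies when its score meets or exceeds the threshold."""
    return threshold is not None and score >= threshold
```

For live qualification, `anchor_date` would be today's date; for backtesting, it would be the candidate's own Series B round date, so that the threshold only reflects rounds that had already happened.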

Machine Learning Approach

Motivation

In heuristic algorithms, we manually design a scoring function to evaluate candidates. This approach offers clear interpretability and demonstrates strong backtesting performance, making it an effective initial baseline.

Recent studies have highlighted the potential of machine learning applications in the venture capital space. In parallel with our heuristic approach, SignalRank has also invested in exploring the machine learning route.

Data and Features

We have accumulated over 50 million data points across various entity types from venture capital data providers, such as Crunchbase and Preqin, including detailed information on over 1 million funding rounds over the years.

The raw data is cleaned, transformed, and aggregated into modeling features through machine learning pipelines. These features capture fundamental company information, funding round dynamics, investor profiles, and more. Additionally, as previously mentioned, features derived from the heuristic pipeline, specifically round and company performance metrics, are incorporated into this process.

Modeling

With these features, we developed various models tailored to different tasks. Classification models are employed to predict company success rates (e.g., unicorn probability or the probability of achieving MOIC > N), while regression models are used to predict investment returns (e.g., MOIC predictions).

Given that SignalRank's primary focus is on optimizing average MOIC over time, we initially concentrated on experimenting with the regression model for MOIC prediction. The training target for this model is the log-scaled MOIC of a company post-Series B round.

It is well established that startup successes follow a power-law distribution, where only a small percentage of startups achieve significant returns, and among those that succeed, 5-year returns can reach 10x or even 100x. Reflecting this pattern, the 5-year MOIC of startups exhibits a similar distribution.

The histogram above shows this power-law distribution. As we can see, it is heavily skewed: most companies have a 5-year return between zero and 8x.

Training with this type of target distribution can pose significant challenges for many machine learning models in terms of effective learning. To reduce this skewness and stabilize the variance, we apply a logarithmic transformation to MOIC as the target variable, which significantly improves model performance.
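To make the transformation concrete, here is a minimal sketch. The paper states only that a logarithmic transformation is applied; using `log(1 + MOIC)` is an assumption made here so that a MOIC of zero (a total loss) remains well defined. The sample values are invented for illustration.

```python
import math


def to_log_target(moic):
    """Map a raw MOIC to the training target log(1 + MOIC)."""
    if moic < 0:
        raise ValueError("MOIC cannot be negative")
    return math.log1p(moic)


def from_log_target(y):
    """Invert the transform to turn a model prediction back into a MOIC."""
    return math.expm1(y)


# Invented, heavily skewed sample: most values are small, one is 100x.
moics = [0.0, 0.5, 1.0, 2.0, 8.0, 100.0]
targets = [to_log_target(m) for m in moics]
# Raw values span 0 to 100; transformed targets span 0 to roughly 4.6.
```

The transform is monotonic, so ranking candidates by predicted log-MOIC gives the same order as ranking by predicted MOIC, while the compressed scale keeps a few extreme outcomes from dominating the training loss.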

We experimented with a variety of model types, from linear and tree-based models to neural networks. Among these, gradient boosted trees trained with XGBoost demonstrated the best performance.

Qualification Mechanism

Once the model is trained, it enables us to generate prediction scores for each candidate, applicable both in live qualification scenarios and in backtesting. Given the inherent risks of data leakage, we exercise stringent caution to exclude any events or data points that are either future-related or unknown at the time of the Series B round we aim to predict.

In the machine learning workflow, similar to the qualification process in the heuristic approach, we maintain a dynamic threshold for candidate selection. This threshold is calculated using the prediction scores of all Series B rounds within a defined look-back window. The strictness of our candidate selection can be adjusted by setting the threshold at varying levels, such as the top 5%, top 10%, or top 15%, allowing us to control the trade-off between selectivity and inclusiveness.

Backtesting Results and Monte-Carlo Analysis

Setup

In this section, we present the results of backtesting, which was conducted to evaluate the performance of our investment algorithms over historical data. The primary objective of this analysis is to assess the potential returns, measured as Multiple on Invested Capital (MOIC), that could have been achieved if the algorithms had been applied in past investment scenarios.

Backtesting was performed using historical data from before 2020. We focused specifically on Series B rounds, as our algorithms are tailored to optimize outcomes at this stage of funding. We did not use more recent data because our primary metric is 5-year MOIC, which can only be measured for Series B rounds prior to 2020.

It is imperative to emphasize that look-ahead bias must be avoided when scoring a historical Series B round. Only data available before the date of this funding round should be used. This applies to both the heuristic calculation of investor performance and the training and scoring of machine learning models. For example, we should not use data that is future (and unknown) relative to a Series B round to train a model and make predictions about that round with it.
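One practical consequence for model training can be sketched as follows, under the assumption that the training label is the 5-year MOIC: a historical round can only serve as a training example for a given target round if its own 5-year outcome was already observable on the target's round date. The function and field names are hypothetical, and this is one possible way to enforce the rule rather than a description of SignalRank's pipeline.

```python
from datetime import date


def years_before(d, years):
    """Return the date `years` years before `d` (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def training_rows_known_at(rounds, target_round_date, horizon_years=5):
    """Keep only rounds whose `horizon_years` outcome was already known
    on `target_round_date`.

    Each round is a dict with a hypothetical 'series_b_date' key.
    """
    cutoff = years_before(target_round_date, horizon_years)
    return [r for r in rounds if r["series_b_date"] <= cutoff]
```

For a target round dated 1 June 2019, this keeps only training rounds dated on or before 1 June 2014.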

The performance metrics evaluated during backtesting include the average and median 5-year MOIC, the number of qualifiers identified by the algorithms, the unicorn percentage (ratio of unicorns to total qualifiers), and the absolute number of unicorns. These metrics provide a comprehensive view of the potential success rates and return profiles of the investment strategies. Among these, average and median 5-year MOIC currently serve as the primary metrics.

Heuristic Results

The heuristic algorithms, developed using in-house venture capital expertise, demonstrated strong performance across the backtesting period. As illustrated in the "MOIC Ratio vs. Market" analysis below, the heuristic algorithm consistently outperformed the market average each year. The average 5-Year MOIC over the testing period was 5.52, with a unicorn percentage of 30%. In total, 259 unicorns were identified.

To account for the variability in access to promising Series B rounds, we conducted a Monte Carlo analysis. This simulation-based approach allows us to model the impact of our current access rate, approximately 22%, on the estimated MOIC. By simulating a large number of possible scenarios, we can better estimate the true distribution of potential returns and reduce the uncertainty associated with real-world investment constraints.

In this analysis, we randomly sampled 22% of selections from each year (2012-2019) and calculated the average 5-Year MOIC for the selected group. We performed this sampling process 100,000 times to obtain the distribution. The table below shows the average 5-Year MOIC for each percentile, with the median (50%) across the years averaging 4.99.
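The per-year sampling procedure can be sketched as below. Sampling without replacement, the nearest-rank percentile rule, and all function names are assumptions of this example; the default number of trials is also set lower than the 100,000 used in the study so that the sketch runs quickly.

```python
import random
from statistics import fmean


def simulate_yearly_access(moics_by_year, access_rate=0.22, n_trials=10_000, seed=0):
    """For each year, repeatedly draw `access_rate` of that year's selections
    (without replacement) and record the average MOIC of each draw.

    `moics_by_year` maps a year to the list of 5-year MOICs of its selections.
    Returns a dict mapping each year to its sorted list of simulated averages.
    """
    rng = random.Random(seed)
    results = {}
    for year, moics in moics_by_year.items():
        k = max(1, round(len(moics) * access_rate))
        results[year] = sorted(
            fmean(rng.sample(moics, k)) for _ in range(n_trials)
        )
    return results


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list (0 <= pct <= 100)."""
    idx = min(len(sorted_values) - 1, int(pct / 100 * len(sorted_values)))
    return sorted_values[idx]


def average_percentile_across_years(results, pct):
    """Average a given percentile over all years, as in the summary table."""
    return fmean(percentile(values, pct) for values in results.values())
```

Calling `average_percentile_across_years(results, 50)` gives the cross-year average of the yearly medians, while `pct=0` and `pct=100` give the averages of the yearly minimums and maximums.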

Understanding the distribution of these outcomes is essential, as it shows the full range of possibilities from the worst-case to the best-case scenarios. Notably, even in the worst-case scenario, the heuristic algorithm achieves a MOIC of 1.42. However, this worst-case estimate is likely an underestimation, as it assumes that every year is at the lower end of performance, which is extremely unlikely in practice. The same applies to the best-case scenario.

Additionally, the calculation above was based on a 22% sample rate, but as SignalRank continues to develop its partner networks, the access rate is expected to improve. To address both points, we conducted another Monte Carlo analysis. In this analysis, we first aggregated selections across all 8 years before sampling, which naturally mitigates the previous issue where the average of the minimum MOIC for each year was an underestimation. By varying sample rates, we were able to estimate the effect of different access rates.

As shown in the table above, this approach yields tighter and more realistic bounds for the minimum and maximum MOIC. With a 20% sample rate, the minimum MOIC increases to 3.3 (compared to 1.42 in the prior analysis), while the maximum decreases to 9.45 (from 13.88). By increasing the sample rate, we observe several trends that highlight the importance of achieving a higher access rate:

The minimum 5-Year MOIC increases steadily as the sample rate increases, while the maximum decreases.
The median MOIC converges towards the mean.
The standard deviation decreases, indicating reduced volatility in potential outcomes.
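The pooled variant of the simulation, with a varying sample rate, can be sketched as follows. As before, sampling without replacement, the particular sample rates, the reduced trial count, and the function name are illustrative assumptions.

```python
import random
from statistics import fmean, median, stdev


def simulate_pooled_access(moics_by_year, sample_rates=(0.2, 0.5, 0.8),
                           n_trials=10_000, seed=0):
    """Pool the selections of all years, then for each sample rate repeatedly
    draw that fraction of the pool (without replacement) and summarize the
    distribution of the resulting average MOICs.
    """
    rng = random.Random(seed)
    pooled = [m for moics in moics_by_year.values() for m in moics]
    summary = {}
    for rate in sample_rates:
        k = max(1, round(len(pooled) * rate))
        draws = [fmean(rng.sample(pooled, k)) for _ in range(n_trials)]
        summary[rate] = {
            "min": min(draws),
            "median": median(draws),
            "mean": fmean(draws),
            "max": max(draws),
            "std": stdev(draws),
        }
    return summary
```

Because each draw averages over a larger share of the same pool as the rate rises, the simulated averages cluster more tightly around the pool mean, which is the mechanism behind the three trends listed above.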

Machine Learning Results

We ran the same set of analyses for the machine learning approach, with the results shown in the tables below.

Comparing the performance of heuristic and machine learning algorithms reveals several key observations:

The machine learning algorithms improve the average MOIC by approximately 16% (from 5.52 to 6.43) while qualifying more candidates (854 to 1088).
In the Monte Carlo analysis, which accounts for access rates, the machine learning algorithms improve the median MOIC by approximately 12% (e.g., for a sample rate of 0.20, from 5.69 to 6.40).
The heuristic algorithms achieve a higher unicorn percentage (the ratio of unicorns to qualifiers) at around 30%, compared to 25% for the machine learning algorithms.
The analysis with varying sample rates shows the same pattern as for the heuristic algorithms.
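The percentages quoted in these observations can be reproduced from the reported figures with a few lines of arithmetic. The helper name `pct_change` is introduced here for illustration; all input numbers are taken from the results in this paper.

```python
def pct_change(old, new):
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100


avg_moic_lift = pct_change(5.52, 6.43)     # average MOIC, heuristic vs. ML: about 16.5%
median_moic_lift = pct_change(5.69, 6.40)  # median MOIC at a 0.20 sample rate: about 12.5%
qualifier_lift = pct_change(854, 1088)     # number of qualifiers: about 27.4%
heuristic_unicorn_pct = 259 / 854 * 100    # heuristic unicorn percentage: about 30.3%
```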

Conclusion

At SignalRank, we developed and evaluated investment algorithms designed to score and rank Series B funding rounds, leveraging both heuristic approaches grounded in venture capital expertise and advanced machine learning models. Through rigorous backtesting and Monte Carlo simulations, we were able to assess the performance of these algorithms.

Our heuristic algorithms demonstrated consistent outperformance against market averages, with an average 5-Year MOIC of 5.52 and a unicorn rate of 30%. To account for the inherent variability in access to these investment opportunities, we applied Monte Carlo simulations, which revealed that even in worst-case scenarios, our heuristics maintained a solid baseline MOIC. By aggregating selections across all years before sampling, we effectively addressed the earlier underestimation of minimum returns, providing a more realistic range of potential outcomes.

On the machine learning front, the algorithms showed a marked improvement over heuristic methods, boosting the average 5-Year MOIC by approximately 16% to 6.43, while also qualifying more investment candidates. Despite a slightly lower unicorn percentage (25%), the machine learning models excelled in scenarios accounting for varying access rates, demonstrating a 12% increase in median 5-Year MOIC when access rates were considered.

Comparing the two approaches, it is evident that while heuristic methods offer strong interpretability and a reliable baseline, machine learning algorithms have the potential to push performance to the next level. The integration of human expertise with machine intelligence not only enhances the predictive power but also provides a more comprehensive approach to venture capital investment strategies.

As SignalRank continues to expand its partner networks and improve access rates, these models will become even more effective. Future work will focus on further refining the machine learning models, exploring additional features, and enhancing our strategies to maximize returns while maintaining a robust risk management framework.

In conclusion, the combination of heuristic and machine learning algorithms represents a powerful toolset for optimizing venture capital investments. Our findings also underscore the importance of access, and the integration of diverse analytical approaches in achieving superior investment outcomes.

Appendix

Alternative Minimum MOIC Estimation

Besides the access-rate-based Monte Carlo analysis above, we can run a similar analysis under a different assumption. Assuming SignalRank invests in 30 Series B rounds per year, what is the minimum estimated MOIC if we keep that investment pace for 1 year, 2 years, and so on?

As in the varying sample rate case, we first aggregate all qualifiers from all 8 years and then sample from this group. The reason is that we do not know in advance whether an average year will look more like 2012, 2016, or 2019, so randomly sampling from the whole group approximates the performance of an average year. We use the ML results as an example.

As we can see from the table above, similar to the varying sample rate case, increasing the number of investment years from 1 to 8 has the following effects:

The minimum MOIC increases from 1.48 to 3.50.
The median MOIC gets closer to the mean, with a smaller standard deviation (from 3.29 to 1).
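A minimal sketch of this fixed-pace simulation, assuming investments are drawn without replacement from the pooled qualifiers. The function name and the reduced default trial count are illustrative choices.

```python
import random
from statistics import fmean, median, stdev


def simulate_investment_pace(pooled_moics, years, deals_per_year=30,
                             n_trials=10_000, seed=0):
    """Draw `deals_per_year * years` investments from the pooled qualifiers
    (without replacement) and summarize the simulated average MOICs.
    """
    k = deals_per_year * years
    if k > len(pooled_moics):
        raise ValueError("not enough qualifiers for this investment pace")
    rng = random.Random(seed)
    draws = [fmean(rng.sample(pooled_moics, k)) for _ in range(n_trials)]
    return {
        "min": min(draws),
        "median": median(draws),
        "mean": fmean(draws),
        "std": stdev(draws),
    }
```

Running it for `years` from 1 to 8 against the same pool shows how a longer investment horizon, and therefore a larger portfolio, narrows the spread of outcomes.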

A guest post by

Yi Ma
