I’ve been impressed by the recent improvements in the capability of machine learning models. As it becomes increasingly probable that we are entering a decade of historically extraordinary progress in ML, it becomes important to think about who is most likely to capture the value this progress creates. Will it be the companies that train machine learning models, who can then demand any price? Or will it be some other, new type of player?
To frame our thinking, it’s worth keeping in mind two general, if somewhat vague, lessons that we can draw from the recent history of ML progress.
The first lesson - better known as the “bitter lesson” - is that progress in machine learning is driven primarily by the availability of compute, and not by the discovery of special tricks or novel designs. The best-performing recent models bear this out: their designs are, if anything, less interesting than usual, and what sets them apart is that they have benefited from far more computing power.
If this remains true, the lack of any “secret herbs and spices” will prevent true differentiation - and will instead tie a model provider’s ability to compete almost directly to its ability to access capital. It is, after all, capital that buys the compute used to push the capabilities of ML. People are far less of a bottleneck than they are in tech more broadly, where the ability to innovate is tied more directly to an organization’s ability to hire.
In that way, the business of model providers looks similar to that of cloud providers. Cloud providers like AWS must purchase hardware - a lot of it - and then sell access to that hardware at a margin. And at first glance, AWS has made Amazon a lot of money, so being the next AWS is not necessarily a bad place to be.
But there’s a difference hidden by this analogy. Cloud services like AWS are better thought of as vast platforms of many interconnected products. Each product makes the others more valuable - you can lure a developer in by selling cheap and easy-to-use storage, for example, and then all the other products become progressively easier to sell once the developer is already set up on the platform. This also has the effect of tying the user to the platform itself: migrating every single obscure product onto a competitor is expensive enough to deter all but the most dissatisfied customers.
In contrast, unless they manage to differentiate themselves, model providers might not even have that ability. This ties into a second lesson: as models get bigger, and more general data is used to train them, they begin to display more general capability.
We have seen this with language models like GPT-3, which display a growing competence in general text generation of any kind – whether writing poetry, translating French, or doing math, if poorly. More interestingly, Google appears to think this will continue to be true in the future, hence their bet on the Pathways approach, a strategy to combine what were traditionally several distinct models into one.
This power of the general model is bad news for the companies that provide it. If better models display more general capabilities, then they will become increasingly interchangeable. Highly general models, each of which can do some task of interest like accounting reasonably well, become easily replaceable, and their providers lose the ability to impose the switching costs that businesses like AWS enjoy.
Without switching costs, the remaining dimensions on which providers can compete are capability and price. And, as we saw earlier, capability is mostly downstream of capital investment, the size of which dictates the pricing required to yield a return. So if model providers cannot offer capability that is very specific and hard to copy, their choices whittle down to competing on price alone.
If model providers can compete only on price, then power shifts to the companies that build on top of them - who take this commodity and shape it into something valuable and unique.
This might seem like small fry compared to actually creating the ML models. But it’s important not to forget that there are many economic niches which remain underserved, and consist mostly of tedious and unfulfilling mental labor. These niches are not best served by giving them access to an API endpoint and some credentials, but rather by discovering, iterating, and designing around their use case. This takes focus and human skill to execute, as startups more generally do today.
In the long run, even this will be best done by some future, super capable ML model. But the mental exercise of imagining such a long run is of limited usefulness when very little is knowable. And the long run is, in the end, dependent on the path taken in the near term, on time scales of mere months and years.
Lacking the ability to make sense of a hypothetical far-off future, the only thing left to do is to build and worry about the rest later. It remains early days for ML. But even now, we are entering a period where general-purpose models are good enough to solve an increasingly broad set of seemingly mundane problems. Working on these solutions may not be fashionable, but it is something even better: it is useful. And as such, I suspect that there are opportunities to build businesses of this kind today – if only one goes looking.