Why This Article?
Houston, we have a problem. The volume of data is exploding, software is more complicated, and all that’s happening while processing power is reaching its limits. I’ve argued that our civilization is at an inflection point, where the value to be gained from the data bomb can compound at unprecedented rates with Artificial Intelligence. But to unleash Artificial Intelligence in all its splendor, the hardware behind this stuff must make a breakthrough. The hardware, at the moment, is the bottleneck in the path towards a new information revolution. This is big.
It’s amazing that all this has happened in the last 3-5 years. And the next 3-5 years will look very different. The innovators of the world in California, Seattle, Korea, Japan and China are working very hard to resolve the bottleneck. They must, because they know what the world wants: Autonomous cars, Robot factory workers, Virtual Reality that’s actually useful, Her, …and so on.
This whole discussion is my attempt at trying to find investment opportunities in hardware innovations that will enable the new information revolution that is Cloud-AI-IoT-5G-Edge-VR.
So, what are they working on? And how can I invest in it?
Pop Open the Hood
There are 4 main parts inside “an AI machine” (or any computer, really):
- Memory
- Processing
- Software
- Network
Of these four, I’ll focus on the first two – Memory and Processing. In the AI realm, I’ve already taken a first crack at the software part of it. The Network part of it is a bit beyond my reach now. Memory and Processing is where a lot of the work is being done to deal with the double-whammy of the Data Bomb and AI.
Data needs to be stored, and it needs to be processed. Simple enough. But the problem is that the dominant forms of Memory and Processing in use today no longer suffice. As we keep moving towards the new paradigm with more data and more AI, we need more memory, and faster processing. The movers and shakers of those worlds are obliging, but at the moment they are being outpaced by the volume of data and the proliferation of “intelligent” programs. But with every problem, there is an opportunity.
I’ll dive right in. First, we need to split Memory into two types:
- Hot Memory
- Cold Memory
I don’t know if these are official terms, but I’ll borrow them from a magazine I read. Cold Memory should actually be called Storage. This is your Hard Disk Drive (HDD), if you will. Lately, you may have heard the term “Solid State Drive” (SSD), which companies like Apple tout as a cool new thing. So, Cold Memory is where you might store your files, while they’re not being used. Think of Cold Memory as Permanent Memory.
Hot Memory refers to a more temporary kind of memory. This type of memory is used in conjunction with the processor (where all the computing is actually done). When you open an application and do some math in Excel, for example, the processor needs this type of temporary memory as it does its computation. Hot Memory is where the bulk of the innovation is taking place. Companies like Micron and Samsung are spending billions to figure out the best type of Hot Memory technology that can unleash AI on the Cloud. Currently, our choices are not ideal, because they weren’t originally built for AI and the Data Bomb.
There are two main types of Hot Memory in use today:
- Dynamic Random Access Memory (DRAM)
- NAND Flash (Flash)
Each one has advantages and disadvantages within the context of AI and Big Data. DRAM is more popular today, because it’s orders of magnitude faster than Flash Memory. A large part of the reason is that DRAM has “random access” to memory: any location can be read or written directly. NAND Flash, by contrast, is read and written a whole page at a time and erased in even larger blocks, so its access is closer to “sequential”. Basically, “random” is much faster. But speed is not the only consideration.
Flash has many things going for it. I’ll touch upon each advantage briefly:
- Price: Flash is much cheaper than DRAM measured by cost/bit.
- Non-Volatility: This may be Flash’s biggest selling point. DRAM is volatile, which means that it doesn’t store data if power is switched off. Flash is non-volatile – it doesn’t lose data if power is switched off.
- Power Efficiency: Flash is much more power efficient. DRAM has to keep refreshing its memory cells just to hold on to its data, which makes it very power-hungry. In a Cloud data center, this is a massive consideration.
There is no easy answer. DRAM is more popular because of speed. But its volatility, cost, and power needs are far from ideal. Flash has its problems too, mainly that it’s too slow. You and I may not notice the difference in speed, but in a Cloud environment that runs thousands of AI programs, every nanosecond counts.
Progress is being made to mesh the two technologies together, so we have the best of both worlds. Intel and Micron jointly developed something called 3D XPoint, which is supposed to offer features of both DRAM and Flash. I’m not sure whether this is the holy grail – it’s unclear to me whether it is the “best of both worlds” or “neither here nor there”.
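To make the speed gap a little more concrete, here is a minimal Python sketch of the access-pattern difference. It is a toy model of my own, not a benchmark: it assumes one kind of memory can fetch exactly the bytes requested, while the other must fetch a whole fixed-size page for every byte it touches. The 4096-byte page size and both function names are illustrative assumptions.

```python
# Toy model only: count how many bytes must be moved to read a handful of
# byte offsets. PAGE_SIZE is an assumed, illustrative value.
PAGE_SIZE = 4096  # bytes per page (assumption)


def bytes_moved_random_access(offsets):
    """Byte-addressable memory: fetch exactly the distinct bytes requested."""
    return len(set(offsets))


def bytes_moved_page_access(offsets, page_size=PAGE_SIZE):
    """Page-granular memory: fetch every whole page that holds a requested byte."""
    pages = {offset // page_size for offset in offsets}
    return len(pages) * page_size


scattered = [0, 5000, 10000, 20000]  # four bytes that sit on four different pages
clustered = [0, 1, 2, 3]             # four bytes that sit on the same page

print(bytes_moved_random_access(scattered))  # bytes moved with random access
print(bytes_moved_page_access(scattered))    # bytes moved with page access
print(bytes_moved_page_access(clustered))    # page access is cheaper when data is clustered
```

The point of the sketch is only that scattered reads are where page-at-a-time access hurts most; real devices add caching, bursts and controllers that this model ignores.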
The ideal scenario for AI and Big Data computing is this: Non-Volatile memory that’s random-access (so it’s fast), all while being power-efficient and cheap. Tough ask, but the 800-pound gorillas are tirelessly working towards it. More on them in the last section.
Now onto the more fun part of the AI brain.
This is a meaty topic, so I’ll need to curb my enthusiasm and keep it as brief as possible. Remember that the goal is to find investment opportunities, not to attain a Computer Engineering Diploma. They would, of course, be highly correlated.
Before I start processing, well, processing, let me just lay out the landscape as it exists today. This is my understanding of it:
Let’s take them one by one.
CPUs: Most computers, including your laptop, have at least one Central Processing Unit (CPU). These are general-purpose integrated circuits that perform a variety of functions for us. For our most pressing needs – like Facebook, Cat Videos, Email and (occasionally) Work – CPUs get the job done. Most of us don’t run convoluted Neural Network Computing logic on a regular basis. For those that do – CPUs just don’t cut it.
GPUs: Over the last 3 years or so, AI has gone mainstream. And it’s only just begun. With the Data Bomb and AI, logic is rarely sequential. If you try to visualize it, it looks less like a Decision Tree and more like a convoluted lattice. It turned out that GPUs (Graphics Processing Units), used mostly in videogames, were far superior to CPUs at this type of “Parallel Computing”. For the last couple of years, GPUs have been immensely popular as Machine Learning was becoming mainstream. GPUs excel at performing many computations simultaneously, while consuming less power than CPUs. The problem is that while they might be good at heavy-duty Matrix Algebra, they’re not particularly good at more convoluted logic structures beyond Machine Learning. Besides, they were really built for graphics, not for “intelligence”.
FPGAs: Field-Programmable Gate Arrays are a relatively new thing in the AI world. These are integrated circuits that are programmable by a customer after the chip has already been made. This gives AI programmers a whole new level of flexibility while they write their programs, and while they “train” their programs. The customizability of FPGAs makes processing fast and efficient, especially for more convoluted logic structures. If an engineer can tailor the “logic gates” to whatever algorithm is the need of the hour, an FPGA can outperform GPUs by a big margin.
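A small Python sketch may help show why Matrix Algebra suits “Parallel Computing”. In a matrix-vector product, every output value depends on only one row of the matrix, so the rows can be handed to separate workers. The function names and the thread pool are my own illustrative choices; Python threads will not actually deliver a GPU-style speedup, so treat this purely as a picture of the independence between rows.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(row, vector):
    """Multiply one matrix row by the vector and add up the products."""
    return sum(a * b for a, b in zip(row, vector))


def matvec_sequential(matrix, vector):
    """One row after another, the way a single core would step through it."""
    return [dot(row, vector) for row in matrix]


def matvec_parallel(matrix, vector, workers=4):
    """Each row goes to a worker; no row needs the result of any other row."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input rows.
        return list(pool.map(lambda row: dot(row, vector), matrix))


matrix = [[1, 2], [3, 4], [5, 6]]
vector = [10, 1]

print(matvec_sequential(matrix, vector))
print(matvec_parallel(matrix, vector))
```

Both calls give the same answer; the only difference is that the second one never has to wait for one row before starting the next.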
ASICs: Application-specific Integrated Circuits are the ultimate in “tailor-made” chips for AI. These chips are built for specific applications, as the name suggests. In other words, if the logic structure and its application are known in advance, nothing beats ASICs in terms of speed and efficiency; not even FPGAs. Of course, that level of customization comes at a cost. ASICs are decidedly more expensive than FPGAs, GPUs and CPUs.
It seems to me (and other commentators who know much more than I do) that GPUs have played a superb role in being the “bridge chip”. While the big chip manufacturers took their own sweet time to wake up to the reality of the Big Data + AI computing paradigm, GPUs did the job reliably, especially in Machine-Learning applications. The future, however, is more complicated. And the big chip guys have now woken up and smelled the coffee.
Based on most of the stuff I’ve read, and upon scuttlebutting around by talking to two experts in the world of semiconductors (one heavily involved in FPGAs, and the other who works at a Research Firm that some of the big chip manufacturers depend upon), it seems that the AI world will include both FPGAs and ASICs in some form or the other. FPGAs are flexible, which means that they are clearly the logical choice for “training” an AI system. Let’s put it this way: the more general and “all purpose” an AI program is, the better it is to use an FPGA architecture. That’s because the end-use of that AI program may not be known with absolute surety. However, if the end-use is known with absolute surety, ASICs are clearly the right choice. They may be more expensive, but let’s remember that hardware costs are mostly fixed. As demand for AI increases, margins and profits improve dramatically.
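The “costs are mostly fixed” point can be sketched with some back-of-the-envelope Python. Every number here is invented for illustration and the function names are hypothetical; nothing below is a real chip price or a real company’s cost structure.

```python
def profit(units, price, variable_cost, fixed_cost):
    """Profit after paying per-chip costs and the one-off fixed cost."""
    return units * (price - variable_cost) - fixed_cost


def break_even_units(price, variable_cost, fixed_cost):
    """Smallest whole number of chips that covers the fixed cost (integer inputs)."""
    margin_per_unit = price - variable_cost
    if margin_per_unit <= 0:
        raise ValueError("price must exceed variable cost per unit")
    return -(-fixed_cost // margin_per_unit)  # ceiling division


def profit_margin(units, price, variable_cost, fixed_cost):
    """Profit as a fraction of revenue."""
    return profit(units, price, variable_cost, fixed_cost) / (units * price)


# Invented figures: a $10 chip, $4 to make each one, $6,000,000 of fixed cost.
PRICE, VARIABLE_COST, FIXED_COST = 10, 4, 6_000_000

print(break_even_units(PRICE, VARIABLE_COST, FIXED_COST))
for units in (1_000_000, 2_000_000, 4_000_000):
    print(units, profit_margin(units, PRICE, VARIABLE_COST, FIXED_COST))
```

With these made-up inputs the margin goes from nothing at break-even to a healthy share of revenue as volume doubles and doubles again, which is the operating leverage I have in mind.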
A more Perfect Union
Memory and Processing are two facets of computing that are going through dramatic changes now, forced by the sheer volume of data and the proliferation of “intelligent programs”. As you would expect, these changes are happening in unison, more or less. Each type of company – Memory or Processor – keeps tabs on what the other is doing. After all, one cannot function without the other. At this point, I thought it would be entertaining to imagine a perfect scenario that might shake out of all this scrambling.
So, let’s imagine a company like Amazon, Microsoft or Google, which sell Cloud Services and AI applications all rolled into one neat platform. What would they (the biggest buyers of Memory and Processors) like to have in their massive data centers from which all this good stuff is distributed?
Here’s a cool scenario: 70% FPGAs and 30% ASICs, with Non-Volatile Random-Access Memory, both integrated with chips for Hot Memory and separated from chips for Cold Memory. Easy-peasy.
We may be a few years away from this scenario. On the Memory side, some of the big guns are also involved in Processors, which may prove to be an advantage in the long run.
You may be wondering why I chose a 70/30 ratio for the FPGA/ASIC split. It’s just a guess. My thinking is that the big vendors of Cloud Services and AI tools will need to offer more flexibility to the buyers of these tools. Imagine a Tesla or BMW using tools on Amazon AWS or Microsoft Azure to build an Autonomous Car Operating System. They may need more flexibility to “train” the system based on a variety of data structures that a car might face in a given day. An FPGA-type architecture allows for this type of flexibility. Once the AI logic-structure is determined, clearly an ASIC would be more efficient within the actual car. It would be faster and less power-hungry, which would be nice in a car that runs on a battery.
This brings me to Edge.
Cloud vs. Edge
Not all computing will be done on the Cloud; some of it will be done on an “Edge Device”. An Edge Device is anything that moves around – cell phones, cars, robots, IoT devices. They would use AI and Data from the Cloud but some of that information will probably be processed right where the action is happening – at the Edge.
IoT, which will just exacerbate the Data Bomb, hasn’t even taken flight yet. Once that happens, I would imagine that ASICs will play a bigger role. Our smartphones today already have powerful processors. But they will need to be AI-ready as well. If all the computing is done on the cloud, latency will be a big issue as AI-infused Apps become the norm. In these cases, the hardware is known, and the use-case is known, which means that an efficient ASIC may be the best solution. FPGAs may be unnecessarily customizable and power-hungry for Edge devices.
On the Edge, that 70/30 assumption should be flipped – 70% ASICs and 30% FPGAs.
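The latency argument for the Edge is simple arithmetic, sketched below in Python. The deadline, compute times and network round trip are all invented for illustration, and the function names are hypothetical.

```python
def response_time_ms(compute_ms, network_round_trip_ms=0):
    """Total wait: time spent computing plus any trip over the network."""
    return compute_ms + network_round_trip_ms


def meets_deadline(total_ms, deadline_ms):
    """True if the answer arrives within the time the application can tolerate."""
    return total_ms <= deadline_ms


# All figures below are assumptions, not measurements.
DEADLINE_MS = 50
cloud_ms = response_time_ms(compute_ms=5, network_round_trip_ms=80)  # fast chip, far away
edge_ms = response_time_ms(compute_ms=20)                            # slower chip, on the device

print(cloud_ms, meets_deadline(cloud_ms, DEADLINE_MS))
print(edge_ms, meets_deadline(edge_ms, DEADLINE_MS))
```

Under these assumptions the slower chip on the device still beats the faster chip in the data center, because the network trip dominates the total.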
So, what should I do?
Memory is a hard nut to crack. There seems to be a general level of dissatisfaction with the way things are now – the choices being a speedy but expensive and volatile DRAM, or a slow but cheap and non-volatile Flash technology. The biggest memory companies in the world are scrambling to make something better. Intel and Micron’s 3D XPoint is one such example, but there is more work to be done.
The 3 big Memory guys – Samsung, SK Hynix and Micron – must make some tough Capital Expenditure decisions: keep funding production of DRAM and NAND Flash products or spend on products that are marginally better. It seems to me that unless they can produce something that’s dramatically better than either DRAM or NAND Flash, a massive multi-billion-dollar investment may be too risky for them. Remember that much of the cost is fixed, so they need to sell massive amounts of chips to realize a decent profit. The entropy in this space has certainly piqued my interest. I will be digging into them to see how they deal with their Catch-22.
I feel a bit more confident about the prospects of FPGAs and ASICs. Of the two, ASICs for AI are newer in the arena. The biggest ASIC-for-AI player is Google. They recently released the third-generation Tensor Processing Unit (TPU), which they claim is miles better than GPUs and FPGAs in terms of efficiency. I have no doubts. Many startups like Graphcore (not public) are following in their footsteps to create their own ASICs for AI. This space is just heating up. But as I mentioned earlier, there is a need for customizability. FPGAs seem to be the logical choice for AI-training and for building AI-Apps that don’t pre-suppose a certain data structure or end-use-case. In the FPGA arena, the big players are Intel (after their acquisition of Altera), Microsoft (with their partnership with Intel), and a company called Xilinx that specializes in FPGAs.
Intel seems to be all over the place – with fingers in both Memory and Processing. I mentioned that they’re active in FPGAs with their acquisition of Altera. They’re working with Microsoft on FPGAs as well. But they also acquired another company called Nervana, which put them in the ASICs-for-AI arena. And then they’re also playing in the Hot Memory space along with Micron. On the surface, it seems like they could be a good proxy-bet for what may eventually shake out in the AI-hardware world. But the question is: are they able to shift away from their core business – CPUs – towards FPGAs, ASICs and Hot Memory in a perfectly synchronized way? Or will there be some growing pains after having missed the GPU boat?
There is one other “sector” worth looking at – companies that make equipment for these Processor and Memory companies. Two companies popped up on my quantitative screen: Applied Materials and Lam Research. They seem to have all the 800-pound gorillas of Processing and Memory as their clients. Their revenues seem to be following the upward trajectory of Big Data + AI. And it seems logical (and safer) to bet on companies that will most likely gain from whatever shakes out at the far end of these early innings in the AI-Hardware scramble. The question is: Are the revenue growth rates sustainable, and are they cyclical?
So, there we go – I have a full plate and a lot to chew on. Here is the list of companies I’ll take a closer look at:
- SK Hynix
- Applied Materials
- Lam Research
You may have noticed that I left out Samsung and Google. Mostly, that’s because they are not exactly “pure plays” in these industries. Samsung is a giant convoluted conglomerate that seems like it makes almost every type of machine in the world – from TVs to phones to ships to chips. I feel like I need an AI system to dig through their numbers.
And let’s be honest, Google is primarily a venture capital firm with Advertising as its base business. On the other hand, they are decidedly the only company at the vortex of AI, Cloud and Hardware. If I have some time, I will dig into them.
Ok. Time to work.