Google recently announced a special-purpose integrated chip designed for its deep learning framework TensorFlow, named the Tensor Processing Unit (TPU). As everyone knows, compared with Google's software, its hardware products have usually arrived with far less fanfare and have occasionally ended with a whimper. This time, however, the new hardware targets deep learning, a field in which Google is both strong and leading. Combined with Google's recent moves, and those of the other big tech giants, in the field of AI, it is not hard to guess that the chip matters a great deal to Google and to AI developers, and that it will have a profound impact on the industry. But why would Google, at a time when general-purpose hardware is already so powerful, still develop dedicated hardware for deep learning algorithms, and what does the chip mean for the other giants? Explaining that is not an easy thing.
How deep learning computes
Replacing old hardware with new hardware, for any kind of computation, pursues two goals: higher speed and lower power consumption. That may sound abstract, but deep learning is, at its core, simply a very large amount of computation. We all know the old saying: a general-purpose tool is never as efficient as a specialized one. The CPU, the GPU, and even the FPGA are by nature general-purpose tools, because they can handle many different kinds of tasks. A dedicated chip like the TPU should therefore, in principle, be more efficient than all of the above, where "efficient" means not only faster but also lower in energy consumption.
But claims alone are not convincing, so let us add some data. Xilinx has in fact stated that an FPGA developed specifically for deep learning can reach roughly 25 times the efficiency of a CPU/GPU architecture. Not two times, 25 times! Take out a pen and paper, and let us use a concrete example to explain where that efficiency comes from. Consider image recognition with a deep neural network (DNN); the overall structure of such a network looks roughly like this:
[Figure: rough structure of a deep neural network. Image source: gitbooks.io]
Apart from the input layer, which receives the image, and the output layer, which produces the result, the hidden layers in between are all used to identify and analyze features of the image. When a picture is fed in, the first hidden layer analyzes it pixel by pixel. At this stage the analysis extracts some general, low-level features of the image, such as rough lines and colors. If the input is a face image, the first layer might pick up, say, some coarse color transitions.
Each node in the first layer then decides, based on the signal obtained from its analysis, whether to pass an output down to the next layer. Mathematically, this "analysis" means that every hidden-layer node applies a specific function to the weighted data it receives from the nodes connected to it, and then decides whether to emit an output to the next layer. After a layer finishes its analysis, some nodes usually do not pass anything on. The next layer, receiving the previous layer's data, can then recognize somewhat more complex features, such as eyes, a mouth, or a nose.
As recognition becomes progressively more refined layer by layer, the algorithm completes the recognition of all facial features at the topmost layer, and the output layer gives a final judgment. Depending on the application, that result can take different forms, for example identifying whose face this is.
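As a minimal sketch of the weighted-sum-then-decide step described above (this is not Google's implementation; the layer size, the ReLU activation, and the random weights are all assumptions chosen purely for illustration), one hidden layer's work looks roughly like this in Python:

```python
import numpy as np

def hidden_layer(inputs, weights, biases):
    # Each node combines the weighted data coming from the previous layer...
    z = weights @ inputs + biases
    # ...and the activation (here ReLU) decides whether the node "fires",
    # i.e. passes a signal on to the next layer (zero means it stays silent).
    return np.maximum(z, 0.0)

# Hypothetical sizes: a 28x28 grayscale image flattened to 784 values,
# feeding a 128-node hidden layer.
rng = np.random.default_rng(0)
image = rng.random(784)
w1 = rng.standard_normal((128, 784)) * 0.01
b1 = np.zeros(128)

features = hidden_layer(image, w1, b1)  # low-level features of the image
print(features.shape)                   # (128,)
```

Stacking several such layers, each feeding the next, is all the "analysis" amounts to mathematically.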
[Figure: layer-by-layer recognition of facial features. Image source: 机器之心 (Machine Heart)]
It is not hard to imagine that the analysis in each layer produces an enormous amount of arithmetic, which calls for a processor with high raw computing performance. This is where the CPU shows its weakness: over years of evolution, the CPU has been strengthened, in line with its role, in logic and control (operations such as if/else), while its pure arithmetic capability has not improved nearly as much. Faced with such a volume of computation, the CPU inevitably struggles, so people naturally turn to the GPU and the FPGA for the arithmetic instead.
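To put a rough number on "enormous" (the layer sizes below are illustrative assumptions, not figures from the article):

```python
# One fully connected layer from a 224x224x3 input to 4,096 hidden nodes:
# every connection needs one multiply and one add.
inputs = 224 * 224 * 3        # 150,528 input values
nodes = 4_096
macs = inputs * nodes         # one multiply-accumulate per connection
print(f"{macs:,} multiply-accumulates")   # 616,562,688
```

That is over six hundred million multiply-accumulates for a single layer processing a single image, before any of the other layers have run.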
FPGAs and GPUs
Besides raw arithmetic capability, a piece of hardware's computation speed also depends on which instructions it supports. We know that breaking a higher-order operation down into many lower-order operations causes a drop in efficiency. But if the hardware itself supports the higher-order operation, there is no need to break it down at all, which saves a great deal of time and resources.
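For instance (a hand-written Python sketch, not any vendor's actual instruction set), a matrix-vector product that could be a single "higher-order" operation decomposes into a long chain of scalar multiplies and adds when only those primitives are available:

```python
import numpy as np

def matvec_from_primitives(matrix, vector):
    # Emulating one matrix-vector multiply using only scalar multiply
    # and add: many small operations instead of one big one.
    rows, cols = matrix.shape
    result = [0.0] * rows
    for i in range(rows):
        for j in range(cols):
            result[i] = result[i] + matrix[i, j] * vector[j]
    return np.array(result)

A = np.arange(12, dtype=float).reshape(3, 4)
x = np.ones(4)
# Same answer as A @ x, but A @ x can map to one native operation
# on hardware that supports it directly.
assert np.allclose(matvec_from_primitives(A, x), A @ x)
```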
Both the FPGA and the GPU contain large numbers of compute units, so their raw arithmetic capability is strong, and in neural network computations both are much faster than the CPU; but there is still a difference between the two. Because the GPU's hardware architecture is fixed, the instructions it supports natively are fixed as well. If a neural network needs an operation the GPU does not support, for example if a GPU only supports addition, subtraction, multiplication and division while our algorithm needs a matrix-vector product or a convolution, the GPU cannot execute it directly; it can only simulate it in software through loops of additions and multiplications, which ends up slower than a suitably programmed FPGA. For an FPGA, even if there is no standard "convolution" instruction, the developer can "program the field", that is, reconfigure the FPGA's hardware circuits. This is equivalent to changing the hardware so that the FPGA natively supports convolution, so its efficiency is higher than the GPU's.
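Here is what "simulating it in software through loops of additions and multiplications" means, in a rough Python sketch (the sizes and data are made up; this is exactly the kind of loop that dedicated hardware can replace with a single native operation):

```python
import numpy as np

def conv2d_from_primitives(image, kernel):
    # A 2D convolution built only from multiplies and adds:
    # every output pixel is a small sum of products.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            for i in range(kh):
                for j in range(kw):
                    out[y, x] += image[y + i, x + j] * kernel[i, j]
    return out

img = np.arange(36, dtype=float).reshape(6, 6)
k = np.array([[1.0, 0.0], [0.0, -1.0]])
print(conv2d_from_primitives(img, k).shape)  # (5, 5)
```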
In fact, at this point we are already very close to Google's TPU. The TPU is an ASIC, an application-specific chip that is similar in spirit to an FPGA but has none of its reconfigurability; as Google describes it, it is designed specifically for its deep learning framework TensorFlow. Because it is built for TensorFlow alone, Google has no need for it to be customizable at all; it only has to support, and support perfectly, every instruction TensorFlow needs. At the same time, the TPU will without doubt run TensorFlow more efficiently than any other device. That is the most obvious purpose behind Google's TPU: the pursuit of perfect efficiency.
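As a small, hedged illustration (written against present-day TensorFlow APIs rather than the version discussed here, and with nothing TPU-specific about it), the operations a TensorFlow program is built from, such as matrix multiplies, additions and activations, are exactly the fixed set of "instructions" such a chip has to implement natively:

```python
import tensorflow as tf

x = tf.random.normal([1, 784])   # one flattened input image (made-up size)
w = tf.random.normal([784, 128])
b = tf.zeros([128])

# Nothing here but matmul + add + relu: a small, fixed vocabulary of
# operations that dedicated hardware can be built around.
hidden = tf.nn.relu(tf.matmul(x, w) + b)
print(hidden.shape)              # (1, 128)
```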
What impact on other vendors?
Many manufacturers have invested a great deal of energy in the AI field and shown real ambition. Before the TPU was announced, most of them were using FPGAs and GPUs together to speed up the training of their own neural networks. NVIDIA is a special case: it is one of the world's largest GPU makers and has spared no effort in pushing its products into deep learning. In fact, though, the GPU was designed mainly for graphics processing, not for neural network arithmetic; it is largely a coincidence that its particular architecture also suits neural network algorithms. NVIDIA has introduced some deep learning work of its own, but because of the nature of the GPU itself it has always been under pressure from the FPGA. The TPU now conjures one more competitor onto the market out of thin air, so if I had to vote, I would say the TPU affects NVIDIA most of all. Google says it will not sell the chip to other companies, which means it will not compete with them directly. But if Google's chip performs well enough, more and more developers and service providers will turn to Google's services rather than other vendors' services and hardware, which will still indirectly hurt the business of those other manufacturers, NVIDIA in particular.
Next come the "middle tier" vendors represented by Intel. Intel bought the world-renowned FPGA company Altera at the end of last year, so it clearly has not ignored the FPGA's potential in deep learning. But so far Intel has not explicitly shown a determination to enter the artificial intelligence field, nor said it will launch any AI product of its own; it has only expressed an intention to apply some related features to its existing products. For vendors like this, unless of course they have secretly been developing AI products for a long time, a more competitive market really just hands them better tools and does not damage their interests. Of course, if Google also intends to design its own CPUs in the future, that is a different matter; but Google says it has no such desire, and that is another story.
Companies that have already reached a level of achievement in AI comparable to Google's, such as Microsoft and Apple, are not expected to be affected much. Microsoft has long been exploring FPGAs for accelerating artificial intelligence and has algorithms of its own. After a long period of tuning, an FPGA-based approach will not necessarily lose much to Google in final performance. And if Microsoft wanted to, it could start developing its own AI chip at any time; after all, Microsoft has developed plenty of dedicated hardware of its own.
The plight of NVIDIA
NVIDIA vigorously promotes the accelerating effect its GPUs have on deep learning algorithms while saying little about FPGAs, knowing full well, of course, that its products not only lack a clear advantage over the FPGA but fall short of it in some respects. The TPU's entry onto the battlefield will no doubt add to NVIDIA's pressure. The GPU's characteristics cannot change dramatically in a short time, so in the near term the most sensible thing NVIDIA can do in AI may be to find the scenarios where the GPU genuinely excels and build on them. As for competing head-on with the FPGA and the TPU, that may have to wait several generations.