Google TensorFlow ASIC and its impact


Dec 3, 2015
I don't think the TPU is as revolutionary as it may seem, since a large chunk of the performance increase comes from computing with 8-bit integers rather than 32- or 64-bit floating point. That alone should give you at least a 4x speedup over 32-bit floats. Nvidia's Pascal also moves in this direction with support for FP16 (reportedly 20 TFLOP/s at FP16 vs. 10 TFLOP/s at SP). The rest of the alleged performance increase (around 2x) would come from the optimized ASIC implementation and from using integer rather than floating-point arithmetic.
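To make the 8-bit point concrete, here is a rough sketch in plain Python of a dot product done with 8-bit integers and rescaled at the end. It assumes a simple symmetric scheme with one scale per vector; the helper names (`quantize`, `quantized_dot`) and the sample numbers are made up for illustration, and I'm not claiming this is how the TPU actually does it.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers sharing one scale (symmetric scheme, an assumption)."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantized_dot(a, b, num_bits=8):
    """Dot product using integer multiply-accumulate, rescaled to float once at the end."""
    qa, scale_a = quantize(a, num_bits)
    qb, scale_b = quantize(b, num_bits)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer-only inner loop
    return acc * scale_a * scale_b


# Hypothetical sample data, not taken from any real model.
weights = [0.12, -0.53, 0.98, 0.40]
inputs = [1.50, 0.25, -0.75, 2.00]

exact = sum(w * x for w, x in zip(weights, inputs))
approx = quantized_dot(weights, inputs)
bytes_ratio = 32 // 8                          # 4x fewer bits moved per value than FP32

print("float result:    ", exact)
print("8-bit result:    ", approx)
print("absolute error:  ", abs(exact - approx))
print("bits saved ratio:", bytes_ratio)
```

The inner loop only touches small integers, and each value takes a quarter of the bits of an FP32 value, which is where the "at least 4x" estimate comes from. The error on this toy example is small, but how much precision a real network tolerates depends on the model.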

The interesting question for me is why other large players have not jumped on the opportunity of an easy 2x or 4x speedup by getting rid of SP/DP, since it's commonly known in the deep learning community that there's often no need for more than 8 or 16 bits of precision. Only Nvidia is moving slowly in that direction, basically sacrificing DP performance in many of their GPUs and providing FP16 support in the upcoming Pascal. Could it be that Google, with all their applications, is in the perfect position to establish the exact requirements for deep learning at the moment?


Feb 7, 2014
It's the form factor here that makes this incredible and very interesting.

We know cloud providers and similar compute companies have custom hardware developed and manufactured to make use of every inch of a physical data center. The push towards modular units that can be quickly swapped out when they die (similar to HDDs) seems like the move for many of these companies. This is just an evolution in blade server technology that's bridging the gap between architectures.

I can't wait until we can play with these units via cloud providers. I wouldn't mind paying $/h to run some experiments!