With 512-bit vectors and 8x8x4 matrices, each Dojo core comes close to a full TFLOP of BF16 throughput. The result looks more like a microprocessor, but one that is as wide as a modern desktop CPU.
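
The per-core figure can be sanity-checked with some quick arithmetic. The sketch below is a rough estimate only, and it leans on assumptions the text does not state: a 2 GHz clock, one 8x8x4 matrix operation completed per cycle, and each multiply-accumulate counted as two floating point operations. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope peak throughput for a single core.
# ASSUMPTIONS (not stated in the text above):
#   - the core runs at 2.0 GHz
#   - one 8x8x4 matrix operation completes every cycle
#   - one multiply-accumulate (MAC) counts as 2 FLOPs

BF16_BITS = 16
VECTOR_BITS = 512
MATRIX_SHAPE = (8, 8, 4)      # M x N x K
ASSUMED_CLOCK_HZ = 2.0e9      # assumption
FLOPS_PER_MAC = 2             # one multiply plus one add


def bf16_lanes(vector_bits=VECTOR_BITS):
    """Number of BF16 elements that fit in one vector register."""
    return vector_bits // BF16_BITS


def macs_per_matrix_op(shape=MATRIX_SHAPE):
    """An MxNxK matrix multiply performs M*N*K multiply-accumulates."""
    m, n, k = shape
    return m * n * k


def peak_matrix_flops(clock_hz=ASSUMED_CLOCK_HZ):
    """Peak FLOP/s if one matrix operation retires every cycle."""
    return macs_per_matrix_op() * FLOPS_PER_MAC * clock_hz


if __name__ == "__main__":
    print("BF16 lanes per 512-bit vector:", bf16_lanes())
    print("MACs per 8x8x4 matrix op:", macs_per_matrix_op())
    print("Estimated peak TFLOP/s:", peak_matrix_flops() / 1e12)
```

Under those assumptions a 512-bit vector holds 32 BF16 values, an 8x8x4 operation works out to 256 multiply-accumulates, and the estimate lands right around one TFLOP per second. A different clock speed or issue rate would scale the result proportionally.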