TensorFloat-32
TensorFloat-32 (TF32) is a numeric floating-point format designed for the Tensor Cores in certain Nvidia GPUs.
Format
The binary format is:
1 sign bit
8 exponent bits
10 fraction bits (also called mantissa, or precision bits)
The 19 bits in total fit within a double word (32 bits). While TF32 lacks precision compared with a normal 32-bit IEEE 754 floating-point number, it provides much faster computation: up to 8 times faster on an A100, compared with FP32 on a V100.
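The bit layout above can be illustrated with a small sketch that emulates TF32 precision in software: starting from a 32-bit IEEE 754 single-precision value, keep the sign bit and all 8 exponent bits, and drop the low 13 of the 23 fraction bits, leaving the 10 fraction bits TF32 retains. This is a simplification for illustration (it truncates, whereas the hardware conversion rounds); the function name `to_tf32_precision` is ours, not an Nvidia API.

```python
import struct

def to_tf32_precision(x: float) -> float:
    """Approximate a float32 value at TF32 precision by truncation.

    TF32 keeps the sign bit, all 8 exponent bits, and the top 10 of
    float32's 23 fraction bits, so we zero the low 13 fraction bits.
    Note: real Tensor Core hardware rounds rather than truncates.
    """
    # Reinterpret the float32 value as its raw 32-bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Zero the 13 low fraction bits (23 - 10 = 13).
    tf32_bits = bits & ~((1 << 13) - 1)
    # Reinterpret the masked pattern as a float again.
    return struct.unpack("<f", struct.pack("<I", tf32_bits))[0]

# Values whose fraction fits in 10 bits survive unchanged; others lose
# their low-order fraction bits.
print(to_tf32_precision(1.5))   # exactly representable, unchanged
print(to_tf32_precision(0.1))   # low fraction bits discarded
```

Because the exponent field is the same width as in float32, TF32 covers the same range of magnitudes as float32; only the fraction is shortened, which is where the speedup on Tensor Cores comes from.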
See also
IEEE 754