Quantizer width notation

In quantization, we often see the notation WmAn, for instance W8A16. It gives the bit width of the weights (W) and of the activations (A): m bits for the weights and n bits for the activations. For example, a W4A16 quantizer stores the weights of a linear layer in 4 bits but keeps the layer's inputs (activations) in 16 bits.
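To make the notation concrete, here is a minimal sketch that splits a WmAn tag into its two bit widths and estimates how much memory the weights of a linear layer would need. The function names `parse_width_notation` and `weight_bytes` are hypothetical and chosen only for illustration. The estimate assumes densely packed weights and ignores scales, zero points, and any other quantizer metadata.

```python
import re


def parse_width_notation(tag):
    """Split a tag such as "W4A16" into (weight_bits, activation_bits)."""
    match = re.fullmatch(r"W(\d+)A(\d+)", tag)
    if match is None:
        raise ValueError(f"not a WmAn tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))


def weight_bytes(tag, in_features, out_features):
    """Bytes for the weight matrix alone (assumption: dense packing, no metadata)."""
    weight_bits, _ = parse_width_notation(tag)
    return in_features * out_features * weight_bits // 8


print(parse_width_notation("W4A16"))       # (4, 16)
print(weight_bytes("W4A16", 4096, 4096))   # 8388608
print(weight_bytes("W16A16", 4096, 4096))  # 33554432
```

Under these assumptions, a 4096 x 4096 linear layer takes a quarter of the weight memory at W4A16 compared with W16A16, while the activations stay at 16 bits in both cases.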

Usually the weights are no wider than the activations. Each configuration has its own benefits and downsides: