Model Parallelism

There are several forms for modern parallelism. Two commonly-used are:

Tensor Parallelism distributes the weights within a layer between GPUs.
Intra-layer parallelism distributes full layers between GPUs.

Model parallelism makes it possible to use models that would be too large for a single GPU’s VRAM by using the VRAM of multiple GPUs.

Model parallelism can also speed up models by performing work in parallel. This is clearly the case for tensor parallelism, but the same can be achieved for layer parallelism through pipelining.

🥝 Daniël's Garden

Explorer

Recent Updates

Socials

Hardware

OPNsense KPN Fiber

Model Parallelism

Graph View

Backlinks