How many GPUs do I need to serve Llama 70B? To answer that, you first need to know how much GPU memory the Large Language Model will require.
The formula is simple:
$$ M=\frac{(P * 4B)}{(32/Q)} * 1.2 $$

Here, $M$ is the required GPU memory in gigabytes, $P$ is the number of parameters in the model (70 billion for Llama 70B), $4B$ is the four bytes used per parameter, $32$ is the number of bits in those four bytes, $Q$ is the bit width the model is loaded at (e.g. 16, 8, or 4 bits), and the factor of $1.2$ adds roughly 20% overhead for additional memory needed at serving time.
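As a quick sanity check, here is a minimal Python sketch of the formula applied to Llama 70B. The function name `serving_memory_gb` and the choice of an 80 GB card (e.g. an A100 80GB) are illustrative assumptions, not anything prescribed by the formula itself.

```python
import math

def serving_memory_gb(params_billions: float, quant_bits: int, overhead: float = 1.2) -> float:
    """GPU memory (GB) needed to serve a model, per the formula above."""
    bytes_per_param = 4 / (32 / quant_bits)  # 4-byte params, scaled down by quantization
    return params_billions * bytes_per_param * overhead

# Llama 70B loaded at 16-bit precision:
mem = serving_memory_gb(70, 16)        # (70 * 4) / (32/16) * 1.2 = 168.0 GB
gpus = math.ceil(mem / 80)             # assuming hypothetical 80 GB cards
print(f"{mem:.0f} GB -> {gpus} x 80 GB GPUs")  # 168 GB -> 3 x 80 GB GPUs
```

At 16-bit precision the model alone needs 168 GB, so two 80 GB GPUs fall just short and you would provision three; dropping to 8-bit or 4-bit quantization halves or quarters that figure.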