What is Mixture of Experts (MoE)? Why do many popular models have a large total parameter count but far fewer activated parameters?
Mixture of Experts (MoE) is a model architecture that "doesn't use the whole model every time". Its most important feature is that some layer...
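To make the gap between total and activated parameters concrete, here is a minimal sketch. It assumes a common setup that the text above does not spell out: each expert is a two-matrix feed-forward block, and a small router sends each token to the `top_k` highest-scoring experts. All names (`moe_layer_params`, `top_k_route`) and all dimensions are hypothetical, chosen only for illustration, and do not describe any specific model.

```python
import math

def expert_params(d_model: int, d_ff: int) -> int:
    # Assumption: one expert = two weight matrices
    # (d_model x d_ff and d_ff x d_model); biases are ignored.
    return 2 * d_model * d_ff

def moe_layer_params(d_model: int, d_ff: int, num_experts: int, top_k: int):
    # Assumption: the router is a single linear layer that scores every expert.
    router = d_model * num_experts
    per_expert = expert_params(d_model, d_ff)
    total = router + num_experts * per_expert   # parameters that must be stored
    active = router + top_k * per_expert        # parameters used for one token
    return total, active

def top_k_route(router_logits, top_k: int):
    # Pick the top_k highest-scoring experts (ties keep the lower index first),
    # then normalize their scores with a softmax over the chosen experts only.
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    chosen = order[:top_k]
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    norm = sum(exps)
    return [(i, e / norm) for i, e in zip(chosen, exps)]

# Illustrative sizes only, not taken from any real model.
total, active = moe_layer_params(d_model=1024, d_ff=4096, num_experts=8, top_k=2)
print(f"total={total:,} active={active:,} ratio={active / total:.2%}")

# Each token gets its own small set of (expert index, mixing weight) pairs.
print(top_k_route([0.1, 2.0, -1.0, 2.0], top_k=2))
```

Under these assumptions the layer has to store all eight experts, but a single token only passes through two of them, which is why the activated parameter count stays well below the total.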