Tagged "moe-models"
- Dynamic Expert Cache in llama.cpp Achieves 27% Faster Inference on Large MoE Models
- Homelab Consolidation: Replacing 3 Models with Single 122B MoE Model on AMD Ryzen AI MAX+
- Liquid AI's LFM2-24B Achieves 50 Tokens/Second in Web Browser via WebGPU
- New Open-Weight Models Released: GigaChat-3.1-Ultra and Lightning Variants
- Comprehensive MoE Backend Benchmarks for Qwen3.5-397B: Real Numbers vs Hype
- Krasis Hybrid MoE Runtime Achieves 3,324 tok/s Prefill on Single RTX 5080