Tagged "model-quantization"
- Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers
- Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment
- Kioxia Sampling UFS 5.0 Embedded Flash Memory for Next-Generation Mobile Applications
- Anthropic Has Never Open-Sourced an LLM: Implications for Local Deployment Strategy
- Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
- How Do You Know Which SKILL.md Is Good?
- Open-Source llama.cpp Finds Long-Term Home at Hugging Face
- Future of Mobile AI: What On-Device Intelligence Means for App Developers
- The Complete Stack for Local Autonomous Agents: From GGML to Orchestration
- Ouro 2.6B Thinking Model GGUFs Released with Q8_0 and Q4_K_M Quantization
- GGML Joins Hugging Face: What This Means for Local Model Optimization
- DietPi Releases Version 10.1
- CPU-Trained Language Model Outperforms GPU Baseline After 40 Hours
- Strix Halo Performance Benchmarks: Minimax M2.5, Step 3.5 Flash, Qwen3 Coder
- Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels
- [Release] Ouro-2.6B-Thinking: ByteDance's Recurrent Model Now Runnable Locally
- I Thought I Needed a GPU to Run AI Until I Learned About These Models
- Running Local LLMs and VLMs on Arduino UNO Q with yzma
- Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
- Local Vision-Language Models for Document OCR and PII Detection in Privacy-Critical Workflows
- Hardware Economics Shift: DDR5 RDIMM Pricing Now Comparable to GPUs for Local Inference
- Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantization
- Ask HN: What is the best bang for buck budget AI coding?
- Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
- Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
- New Header-Only C++ Benchmark Tool for Predictive Models on Raw Binary Streams
- GLM-5 Released: 744B Parameter MoE Model Targeting Complex Tasks
- Community Member Builds 144GB VRAM Local LLM Powerhouse
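Several items above hinge on GGUF quantization levels (Q8_0, Q4_K_M) and how they map to VRAM budgets. As a rough back-of-envelope, weight memory scales with parameter count times bits per weight. The sketch below uses approximate bits-per-weight figures (about 8.5 for Q8_0, about 4.8 for Q4_K_M); the helper name and constants are illustrative, not from any of the linked posts:

```python
# Rough weight-memory estimate for quantized GGUF models.
# Bits-per-weight values are approximate: real llama.cpp files also
# carry metadata and mix precisions across tensors.
APPROX_BPW = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8}

def quant_size_gb(params_billions: float, quant: str) -> float:
    """Estimated weight size in decimal gigabytes."""
    bpw = APPROX_BPW[quant]
    return params_billions * 1e9 * bpw / 8 / 1e9

# A 27B model (cf. the Qwen3.5-27B item) at Q4_K_M keeps the weights
# alone well under a 24 GB card; Q8_0 does not.
print(f"{quant_size_gb(27, 'Q4_K_M'):.1f} GB")
print(f"{quant_size_gb(27, 'Q8_0'):.1f} GB")
```

Note that KV cache and activations come on top of this, so the estimate is a floor on memory, not a full budget.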