Tagged "model-quantization"
- Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers
- Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment
- Kioxia Sampling UFS 5.0 Embedded Flash Memory for Next-Generation Mobile Applications
- Anthropic Has Never Open-Sourced an LLM: Implications for Local Deployment Strategy
- Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
- How Do You Know Which SKILL.md Is Good?
- Open-Source llama.cpp Finds Long-Term Home at Hugging Face
- Future of Mobile AI: What On-Device Intelligence Means for App Developers
- The Complete Stack for Local Autonomous Agents: From GGML to Orchestration
- Ouro 2.6B Thinking Model GGUFs Released with Q8_0 and Q4_K_M Quantization
- GGML Joins Hugging Face: What This Means for Local Model Optimization
- DietPi Releases Version 10.1
- CPU-Trained Language Model Outperforms GPU Baseline After 40 Hours
- Strix Halo Performance Benchmarks: Minimax M2.5, Step 3.5 Flash, Qwen3 Coder
- Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels
- [Release] Ouro-2.6B-Thinking: ByteDance's Recurrent Model Now Runnable Locally
- I Thought I Needed a GPU to Run AI Until I Learned About These Models
- Running Local LLMs and VLMs on Arduino UNO Q with yzma
- Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
- Local Vision-Language Models for Document OCR and PII Detection in Privacy-Critical Workflows
- Hardware Economics Shift: DDR5 RDIMM Pricing Now Comparable to GPUs for Local Inference
- Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantization
- Ask HN: What is the best bang for buck budget AI coding?
- Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
- Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
- New Header-Only C++ Benchmark Tool for Predictive Models on Raw Binary Streams
- GLM-5 Released: 744B Parameter MoE Model Targeting Complex Tasks
- Community Member Builds 144GB VRAM Local LLM Powerhouse
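Several items above hinge on GGUF quantization levels (Q8_0, Q4_K_M) and how they map to VRAM budgets. As a rough back-of-envelope, weight memory scales with parameter count times bits per weight. The sketch below uses approximate bits-per-weight figures (about 8.5 for Q8_0, about 4.8 for Q4_K_M); the helper name and constants are illustrative, not from any of the linked posts:

```python
# Rough weight-memory estimate for quantized GGUF models.
# Bits-per-weight values are approximate: real llama.cpp files also
# carry metadata and mix precisions across tensors.
APPROX_BPW = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8}

def quant_size_gb(params_billions: float, quant: str) -> float:
    """Estimated weight size in decimal gigabytes."""
    bpw = APPROX_BPW[quant]
    return params_billions * 1e9 * bpw / 8 / 1e9

# A 27B model (cf. the Qwen3.5-27B item) at Q4_K_M keeps the weights
# alone well under a 24 GB card; Q8_0 does not.
print(f"{quant_size_gb(27, 'Q4_K_M'):.1f} GB")
print(f"{quant_size_gb(27, 'Q8_0'):.1f} GB")
```

Note that KV cache and activations come on top of this, so the estimate is a floor on memory, not a full budget.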