Tagged "quantisation"
- New Era of On-Device AI Driven by High-Speed UFS 5.0 Storage
- Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers
- Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment
- PyTorch Foundation Announces New Members as Agentic AI Demand Grows
- Mirai Announces $10M to Advance On-Device AI Performance for Consumer Devices
- Show HN: 100% LLM Accuracy–No Fine-Tuning, JSON Only
- Advanced Quantization Techniques Show Surprising Performance Gains Over Standard Methods
- How AI is Redefining Price and Performance in Modern Laptops
- What Breaks When AI Agent Frameworks Are Forced Into <1MB RAM and Sub-ms Startup
- No, Local LLMs Can't Replace ChatGPT or Gemini — I Tried
- Kioxia Sampling UFS 5.0 Embedded Flash Memory for Next-Generation Mobile Applications
- Enhanced Interface Speed Enables High-Performance On-Device AI Features in Smartphones
- Elastic Introduces Best-in-Class Embedding Models for High Performance Semantic Search
- Show HN: Dypai – Build Backends from Your IDE Using AI and MCP
- Enterprise Infrastructure Guide: Running Local LLMs for 70-150 Developers
- Anthropic Has Never Open-Sourced an LLM: Implications for Local Deployment Strategy
- Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
- How Do You Know Which SKILL.md Is Good?
- Qwen3-Code-Next Proves Practical for Local Development: Real-World Coding Tasks on Mac Studio
- Custom Portable Workstation Optimized for Local AI Inference Builds
- Open-Source Framework Achieves Gemini 3 Deep Think Level Performance Through Local Model Scaffolding
- Nvidia Could Launch Its First Laptops With Its Own Processors
- nanollama: Open-Source Framework for Training Llama 3 from Scratch with One-Command GGUF Export
- Open-Source llama.cpp Finds Long-Term Home at Hugging Face
- Future of Mobile AI: What On-Device Intelligence Means for App Developers
- The Complete Stack for Local Autonomous Agents: From GGML to Orchestration
- Ouro 2.6B Thinking Model GGUFs Released with Q8_0 and Q4_K_M Quantization
- At India AI Impact Summit, Intel Showcases AI PCs and Cost-Efficient Frugal AI
- GGML Joins Hugging Face: What This Means for Local Model Optimization
- DietPi Released a New Version v10.1
- CPU-Trained Language Model Outperforms GPU Baseline After 40 Hours
- AI PCs Explained: 7 Critical Truths About NPUs and Privacy
- Taalas Etches AI Models onto Transistors to Rocket Boost Inference
- Strix Halo Performance Benchmarks: Minimax M2.5, Step 3.5 Flash, Qwen3 Coder
- Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels
- [Release] Ouro-2.6B-Thinking: ByteDance's Recurrent Model Now Runnable Locally
- I Thought I Needed a GPU to Run AI Until I Learned About These Models
- GGML.AI Acquired by Hugging Face
- Sarvam Brings AI to Feature Phones, Cars, and Smart Glasses
- Running Local LLMs and VLMs on Arduino UNO Q with yzma
- Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
- Local Vision-Language Models for Document OCR and PII Detection in Privacy-Critical Workflows
- Hardware Economics Shift: DDR5 RDIMM Pricing Now Comparable to GPUs for Local Inference
- Qualcomm Ventures Positions India as Blueprint for Affordable On-Device AI Infrastructure
- Same INT8 Model Shows 93% to 71% Accuracy Variance Across Snapdragon Chipsets
- Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
- Ask HN: What is the best bang for buck budget AI coding?
- Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
- GitHub Announces Support for Open Source AI Project Maintainers
- Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
- New Header-Only C++ Benchmark Tool for Predictive Models on Raw Binary Streams
- GLM-5 Released: 744B Parameter MoE Model Targeting Complex Tasks
- Community Member Builds 144GB VRAM Local LLM Powerhouse