Tagged "quantisation"
- New Era of On-Device AI Driven by High-Speed UFS 5.0 Storage
- Qwen3.5 Series Releases Comprehensive Model Lineup Across All Tiers
- Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment
- PyTorch Foundation Announces New Members as Agentic AI Demand Grows
- Mirai Announces $10M to Advance On-Device AI Performance for Consumer Devices
- Show HN: 100% LLM Accuracy–No Fine-Tuning, JSON Only
- Advanced Quantization Techniques Show Surprising Performance Gains Over Standard Methods
- How AI is Redefining Price and Performance in Modern Laptops
- What Breaks When AI Agent Frameworks Are Forced Into <1MB RAM and Sub-ms Startup
- No, Local LLMs Can't Replace ChatGPT or Gemini — I Tried
- Kioxia Sampling UFS 5.0 Embedded Flash Memory for Next-Generation Mobile Applications
- Enhanced Interface Speed Enables High-Performance On-Device AI Features in Smartphones
- Elastic Introduces Best-in-Class Embedding Models for High Performance Semantic Search
- Show HN: Dypai – Build Backends from Your IDE Using AI and MCP
- Enterprise Infrastructure Guide: Running Local LLMs for 70-150 Developers
- Anthropic Has Never Open-Sourced an LLM: Implications for Local Deployment Strategy
- Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
- How Do You Know Which SKILL.md Is Good?
- Qwen3-Code-Next Proves Practical for Local Development: Real-World Coding Tasks on Mac Studio
- Custom Portable Workstation Optimized for Local AI Inference Builds
- Open-Source Framework Achieves Gemini 3 Deep Think Level Performance Through Local Model Scaffolding
- Nvidia Could Launch Its First Laptops With Its Own Processors
- nanollama: Open-Source Framework for Training Llama 3 from Scratch with One-Command GGUF Export
- Open-Source llama.cpp Finds Long-Term Home at Hugging Face
- Future of Mobile AI: What On-Device Intelligence Means for App Developers
- The Complete Stack for Local Autonomous Agents: From GGML to Orchestration
- Ouro 2.6B Thinking Model GGUFs Released with Q8_0 and Q4_K_M Quantization
- At India AI Impact Summit, Intel Showcases AI PCs and Cost-Efficient Frugal AI
- GGML Joins Hugging Face: What This Means for Local Model Optimization
- DietPi Released a New Version v10.1
- CPU-Trained Language Model Outperforms GPU Baseline After 40 Hours
- AI PCs Explained: 7 Critical Truths About NPUs and Privacy
- Taalas Etches AI Models onto Transistors to Rocket Boost Inference
- Strix Halo Performance Benchmarks: Minimax M2.5, Step 3.5 Flash, Qwen3 Coder
- Qwen3 Coder Next Remains Effective at Aggressive Quantization Levels
- [Release] Ouro-2.6B-Thinking: ByteDance's Recurrent Model Now Runnable Locally
- I Thought I Needed a GPU to Run AI Until I Learned About These Models
- GGML.AI Acquired by Hugging Face
- Sarvam Brings AI to Feature Phones, Cars, and Smart Glasses
- Running Local LLMs and VLMs on Arduino UNO Q with yzma
- Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
- Local Vision-Language Models for Document OCR and PII Detection in Privacy-Critical Workflows
- Hardware Economics Shift: DDR5 RDIMM Pricing Now Comparable to GPUs for Local Inference
- Qualcomm Ventures Positions India as Blueprint for Affordable On-Device AI Infrastructure
- Same INT8 Model Shows 93% to 71% Accuracy Variance Across Snapdragon Chipsets
- Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantisation
- Ask HN: What is the best bang for buck budget AI coding?
- Alibaba Unveils Major AI Model Upgrade Ahead of DeepSeek Release
- GitHub Announces Support for Open Source AI Project Maintainers
- Running Mistral-7B on Intel NPU Achieves 12.6 Tokens/Second
- New Header-Only C++ Benchmark Tool for Predictive Models on Raw Binary Streams
- GLM-5 Released: 744B Parameter MoE Model Targeting Complex Tasks
- Community Member Builds 144GB VRAM Local LLM Powerhouse