Tagged "llama-cpp"
- Qwen3.5 Thinking Mode Can Be Disabled for Production Inference Optimization
- Qwen3.5-27B Identified as Sweet Spot for Mid-Range Local Deployment
- Mirai Announces $10M to Advance On-Device AI Performance for Consumer Devices
- Advanced Quantization Techniques Show Surprising Performance Gains Over Standard Methods
- How AI is Redefining Price and Performance in Modern Laptops
- Show HN: A Ground-Up TLS 1.3 Client Written in C
- Enterprise Infrastructure Guide: Running Local LLMs for 70-150 Developers
- Apple Accelerates U.S. Manufacturing with Mac Mini Production
- nanollama: Open-Source Framework for Training Llama 3 from Scratch with One-Command GGUF Export
- Open-Source llama.cpp Finds Long-Term Home at Hugging Face
- Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference
- Ouro 2.6B Thinking Model GGUFs Released with Q8_0 and Q4_K_M Quantization
- Strix Halo Performance Benchmarks: Minimax M2.5, Step 3.5 Flash, Qwen3 Coder
- I Thought I Needed a GPU to Run AI Until I Learned About These Models
- Open-Source + AI: ggml Joins Hugging Face, llama.cpp Stays Open—Local AI's Long-Term Home
- GGML.AI Acquired by Hugging Face
- Enhanced Quantization Visualization Methods for Understanding LLM Compression Trade-offs
- Kitten TTS V0.8 Released: State-of-the-Art Super-Tiny Text-to-Speech Model Under 25MB
- Self-Hosted AI: A Complete Roadmap for Beginners
- Meet Sarvam Edge: India's AI Model That Runs on Phones and Laptops With No Internet
- Qwen 3.5-397B-A17B Now Available for Local Inference with Aggressive Quantization
- Open-Source Models Now Comprise 4 of Top 5 Most-Used Endpoints on OpenRouter
- Switching From Ollama and LM Studio to llama.cpp: Performance Benefits
- Optimal llama.cpp Settings Found for Qwen3 Coder Next Loop Issues
- GitHub Announces Support for Open Source AI Project Maintainers
- Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues
- New Header-Only C++ Benchmark Tool for Predictive Models on Raw Binary Streams