AI & ML

All about AI and machine learning: the latest articles and advances in the field.

All articles

AI & ML

Optimizing LLM Serving Goodput: A Guide to ChunkSize Tuning

Tuning ChunkSize, the segment size used for prefill processing, lets engineers trade time to first token (TTFT) against overall system throughput: smaller chunks prioritize user responsiveness, while larger chunks saturate GPU compute kernels, provided the scheduler is configured to avoid memory-bandwidth contention.

16 min read
AI & ML

Integrating HiPPO-Initialized SSM Subsystems into LLM Architectures

With HiPPO-initialized SSM side-car modules, engineers can in theory achieve O(1) state inference latency and persistent memory, at the cost of significantly more integration complexity than traditional Transformer-only architectures.

15 min read
AI & ML

Qwen2-VL GPTQ and AWQ benchmarks: what quantization does to multimodal accuracy

On Qwen2-VL-2B-Instruct, GPTQ-Int4 preserves most multimodal quality but still shows measurable drops versus BF16 on harder vision-language tasks — for example, MMMU falls from 41.88 to 39.22 and MathVista from 44.40 to 41.69 — while DocVQA stays comparatively stable, implying task sensitivity matters more than the bit-width label alone.

18 min read
AI & ML

Agentic RAG with knowledge graphs: how multi-hop retrieval works under the hood

Knowledge-graph agentic RAG works by using entity links and graph traversal to expand the evidence frontier beyond nearest-neighbor chunk retrieval — this improves multi-hop recall when relationships matter — but it depends on strong entity resolution and graph quality, so noisy extraction can amplify wrong paths rather than fix them.

26 min read
