AxiomLogica
Category

AI & ML

Everything about AI and machine learning: the latest articles and advances in the field.

All articles

How to extend a Llama or Qwen context window with YaRN in vLLM: a step-by-step deployment guide
AI & ML

vLLM’s Qwen deployment docs explicitly recommend RoPE scaling for context lengths beyond the pretrained 32,768-token limit and validate YaRN for length extrapolation. The scaling parameters must match the model’s original max position embeddings and the sampling and runtime settings, though; otherwise the model can silently degrade even while it accepts longer prompts.

18 min read
S-LoRA vs LoRAX vs vLLM PEFT: which multi-adapter serving stack fits your workload?
AI & ML

S-LoRA is optimized for high-scale multi-adapter serving through unified paging and heterogeneous batching, LoRAX is designed for thousands of adapters with dynamic loading and production features, and vLLM PEFT is the lighter-weight option when you want vLLM’s serving stack with adapter support but not the most aggressive multi-adapter specialization.

20 min read
Should teams buy curated preference data or build an in-house curation pipeline?
AI & ML

Buying curated preference data reduces internal labeling and curation labor, at the cost of vendor dependency and less control over sampling and rubric design. In practice, buying is usually the cheapest path for experimentation, while building is the better path when teams need domain-specific preference signals, auditability, or iterative rubric changes.

24 min read
How to merge multiple fine-tuned LLMs with mergekit: a practical tutorial
AI & ML

mergekit can run entirely on CPU or with as little as 8 GB of VRAM and still perform multi-model merges out of core, which makes low-cost experimentation feasible. Quality still depends on choosing compatible checkpoints and the right merge method, not just on averaging weights.

19 min read
How to build a fine-tuning dataset filtering pipeline with Setu and Hugging Face Datasets
AI & ML

Setu combines Spark-based document preparation, cleaning, flagging and filtering, and MinHashLSH deduplication with Hugging Face Datasets-style dataset handling, which is enough to scale noisy web, PDF, and speech corpora into SFT-ready training data. It still depends on a Linux- or WSL-friendly setup, Java, and Spark, and on a multi-stage quality gate before deduplication pays off.

20 min read
