AI & ML

All about AI and machine learning: the latest articles and advances in the field.

All articles

Should you adopt FlashAttention-3 now, or stay on FlashAttention-2? A Hopper-era migration decision

FlashAttention-3 can deliver 1.5-2.0x speedups on Hopper GPUs and much higher FP8 throughput, but the migration only pays off if your workload runs on H100/H800-class GPUs and you can absorb the beta risk, validation effort, and rollout complexity of leaving the stable FlashAttention-2 path.

15 min read

ORPO vs DPO vs KTO vs SimPO: which preference-optimization method should you choose in 2026?

SimPO removes the dependency on a reference model and its log-ratio, and the SimPO README reports that it can outperform DPO and its latest variants on AlpacaEval 2, MT-Bench, and Arena-Hard. The gains are hyperparameter-sensitive, though, especially to learning rate, beta, and gamma/beta tuning.

24 min read

Benchmark contamination in 2026: what inference-time decontamination changes for evaluation

DeconIEP shifts decontamination from dataset filtering to inference-time embedding perturbation — preserving the benchmark while reducing leakage-driven inflation — but its effectiveness is bounded by the perturbation budget and it trades off against benign utility, so it is not a free fix for contaminated evaluation.

24 min read

How to merge multiple fine-tuned LLMs with mergekit: a practical tutorial

mergekit can run entirely on CPU or with as little as 8 GB of VRAM and still perform multi-model merges out of core, which makes low-cost experimentation feasible. Quality still depends on choosing compatible checkpoints and the right merge method, not just averaging weights.

19 min read

RAGchain internals: how multiple retrievers, rerankers, and HyDE fit into one workflow

RAGchain’s core design is to compose retrieval and reranking as interchangeable modules around a shared workflow layer, letting teams mix BM25, vector search, HyDE, OCR loaders, and multiple rerankers so they can improve recall and ordering without rewriting the whole pipeline.

26 min read

Should teams merge fine-tuned checkpoints instead of retraining or serving multiple models?

Model merging can capture the value of multiple fine-tunes without paying for full retraining or multi-model serving — reducing experimentation waste and inference duplication — but the ROI only works when the organization already has several compatible checkpoints and enough evaluation discipline to avoid shipping a bad merge.

23 min read

TIES-Merging under the hood: how sign conflicts and parameter interference are resolved

TIES-Merging improves over naive averaging by trimming low-magnitude delta weights, electing a dominant sign across models, and then merging only sign-aligned parameters, which directly targets both redundancy and sign interference. It still assumes, however, that the component models remain sufficiently compatible in weight space.

22 min read

How to build a fine-tuning dataset filtering pipeline with Setu and Hugging Face Datasets

Setu combines Spark-based document preparation, cleaning, flagging/filtering, and MinHashLSH deduplication with Hugging Face Datasets-style data handling, which is enough to scale noisy web, PDF, and speech corpora into SFT-ready training data. It still depends on a Linux/WSL-friendly setup, Java, Spark, and a multi-stage quality gate before deduplication pays off.

20 min read

Model merging at scale: what the latest benchmarks say about base-model quality and expert count

Recent large-scale merging results suggest that stronger base models and larger model sizes make merging easier, and that merging more expert checkpoints can improve zero-shot generalization — but the gains flatten across methods at larger scales, so method choice matters less than base quality and expert count.

19 min read

Unsloth's low-VRAM training stack: what its kernels and workflow change for single-GPU fine-tuning

Unsloth claims its custom Triton kernels plus smart packing can deliver up to 5× faster training and 30%–90% lower VRAM use with no accuracy loss — but the benefit is workload-dependent, strongest when sequences are short enough that packing removes real padding waste rather than merely shifting it around.

21 min read

DeepSpeed vs Megatron-LM: which stack fits pre-training, fine-tuning, and checkpoint portability?

Megatron-LM is the stronger research and pre-training substrate, while DeepSpeed is the broader optimization layer with more turnkey distributed features and integrations. The real business cost difference lies in checkpoint portability and operational complexity: Megatron Bridge and DeepSpeed↔Megatron integration reduce migration friction only if you standardize on compatible formats and workflows.

23 min read

Chat templates and alignment failures: how ChatBug turns formatting into a safety vulnerability

ChatBug arises because chat templates impose a rigid format on the model but not on the user. Attackers can exploit that mismatch to bypass safety alignment, and the paper reports the issue across eight SOTA LLMs. Adversarial training lowers the vulnerability, but at a meaningful performance cost.

29 min read
