Decoding Test-Time Scaling: Reasoning Chains vs. Inference Computation
Increasing test-time computation through longer reasoning chains improves performance on complex logical tasks along a power-law curve. The gains saturate once the token count per reasoning step exceeds the model's effective context window capacity, so production deployments need dynamic pruning or halting mechanisms to stay efficient.
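
The relationship described above can be illustrated with a small toy model. This is a minimal sketch under stated assumptions, not a fitted or published scaling law: the function names, the constants `a` and `b`, and the `min_gain` threshold are all hypothetical, and the rule that tokens beyond the effective window contribute nothing is a deliberate simplification of "saturation".

```python
from typing import Sequence


def useful_tokens(step_lengths: Sequence[int], effective_window: int) -> int:
    """Count tokens assumed to help; each step is capped at the effective window.

    Assumption: tokens in a step beyond the window add no benefit at all.
    """
    return sum(min(n, effective_window) for n in step_lengths)


def power_law_error(tokens: float, a: float = 1.0, b: float = 0.3) -> float:
    """Toy error curve a * tokens**(-b); a and b are illustrative, not fitted."""
    return a * max(tokens, 1.0) ** (-b)


def steps_before_halt(
    step_lengths: Sequence[int],
    effective_window: int,
    min_gain: float = 0.01,
) -> int:
    """Return how many reasoning steps to keep under a simple halting rule.

    Stops at the first step whose predicted error reduction is below min_gain.
    """
    kept = 0
    used = 0
    prev_error = power_law_error(used)
    for n in step_lengths:
        candidate = used + min(n, effective_window)
        error = power_law_error(candidate)
        if prev_error - error < min_gain:
            break  # marginal gain too small: halt here
        used, prev_error, kept = candidate, error, kept + 1
    return kept
```

Under these toy constants, `steps_before_halt([100, 100, 100], effective_window=50, min_gain=0.5)` keeps only the first step, because every step is capped at 50 useful tokens and the second step's predicted gain falls below the threshold. A real system would replace the toy curve with a measured one.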