Blog — Aman Sharma on LLM Reasoning, Continual Learning & Multi-Agent Systems

May 28, 2026 · 16 min read · Essay · Inverse RL (Part 0)

In Search of a Verifier

Frontier models make real discoveries in math and code and almost none in robotics, biology, or open-ended science. The dividing line is not difficulty; it is how cheap, exact, and automatic the ground truth is in each domain. Where a verifier exists, reinforcement learning scales inside LLMs and discoveries follow; where it does not, the same models stall. This is the conceptual opener to my inverse RL series, on why recovering the reward from demonstrations is the missing piece for the physical world.

verifiers · reward-functions · inverse-rl · agentic-ai
May 26, 2026 · 20 min read · Technical Blog Post · Inverse RL (Part 1)

Watching the Path, Recovering the Goal: Classical Inverse RL

Show me what you do, and I will tell you what you want. That is the bet of inverse reinforcement learning, and it is also the bet behind every modern LLM alignment pipeline, every robot learning from human demonstration, and every recommendation system trying to figure out what users actually care about. This post is about how the field first learned to take that bet seriously, in 2004 and 2006, on examples small enough to fit in a 12x12 gridworld. Interactive widgets accompany every key step.

inverse-rl · imitation-learning · abbeel-ng · mmp
Apr 16, 2026 · 10 min read · Technical Blog Post · LLM Art Auctions (Part 1)

Can Frontier AI Models Read a Painting?

Four frontier multimodal models (Gemini 3.1 Pro, Claude Sonnet 4.6, GPT-5.4, Qwen 3.6 Plus) appraised fifteen paintings worth $1.46 billion in two conditions: image only, and image plus a four-word metadata label. Visual recognition is largely solved at the frontier; what separates the models is what they do with that recognition. The first post in my LLM Art Auctions research series.

multimodal-llms · art-valuation · vision-language · recognition-vs-commitment
Mar 19, 2026 · 10 min read · Technical Blog Post · Lossfunk Letters

The Reasoning Illusion: Why LLMs Fail When the Training Data Runs Out

We present EsoLang-Bench, a benchmark using esoteric programming languages where training data is virtually nonexistent. Five frontier models scored 85-95% on standard benchmarks but reached at most 11.2% on EsoLang-Bench, with most below 5%. All models scored exactly 0% beyond the "Easy" difficulty tier, a uniform failure suggesting fundamental limitations rather than gradual degradation.

esolang-bench · reasoning · evaluation
Technical Blog Post

The Beauty Behind Engram Module

A deep-dive into the Engram module, exploring its architecture, memory mechanisms, and the elegant design principles behind continual learning in neural networks.

continual-learning · memory · architecture
Technical Blog Post

Importance Sampling: Sample from Any Distribution

A technical walkthrough of importance sampling, covering how to estimate expectations under one distribution using samples from another, and why it matters for reinforcement learning and probabilistic inference.
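
As a rough illustration of the idea (not code from the post itself), the sketch below estimates E_p[x^2] under a standard normal p using only samples drawn from a wider normal q, reweighting each sample by p(x)/q(x). The choice of p, q, the test function, and the helper names `normal_pdf` and `importance_estimate` are assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_estimate(f, n_samples, rng):
    """Estimate E_p[f(x)] for p = N(0, 1) using samples from q = N(0, 2^2)."""
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 2.0)  # sample from the proposal q
        # Importance weight: target density over proposal density.
        weight = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)
        total += f(x) * weight
    return total / n_samples


rng = random.Random(0)  # fixed seed so the run is repeatable
estimate = importance_estimate(lambda x: x * x, 100_000, rng)
print(estimate)  # should land near the true value E_p[x^2] = 1
```

The proposal is deliberately wider than the target here, which keeps the weights bounded; a proposal narrower than the target can make the weights, and the variance of the estimate, blow up.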

importance-sampling · rl · probability
Nov 6, 2025 · 8 min read · Technical Blog Post · Lossfunk Letters

Sequential Scaling Outperforms Parallel Scaling for LLMs

Sequential reasoning wins in 95.6% of configurations at matched compute, with accuracy gains of up to 46.7 percentage points. On AIME-2025 with Qwen3-235B, sequential reaches 76.7% versus 30.0% for parallel. We introduce inverse-entropy weighted voting, a training-free aggregation method that achieved optimal performance in 97% of sequential runs.
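
The summary above does not spell out the voting formula, so the following is only a guess at what inverse-entropy weighting could look like: each candidate answer gets a vote of 1 / entropy, so confident (low-entropy) runs count for more. The function name `inverse_entropy_vote`, the `eps` smoothing term, and the sample numbers are all hypothetical.

```python
from collections import defaultdict


def inverse_entropy_vote(candidates, eps=1e-8):
    """Pick the answer with the largest total inverse-entropy weight.

    candidates: list of (answer, entropy) pairs, one per reasoning run.
    eps: small constant (an assumption) to avoid division by zero.
    """
    scores = defaultdict(float)
    for answer, entropy in candidates:
        scores[answer] += 1.0 / (entropy + eps)  # low entropy => large vote
    return max(scores, key=scores.get)


# Hypothetical runs: two uncertain votes for "17", one confident vote for "42".
runs = [("42", 0.2), ("17", 1.5), ("17", 1.8)]
winner = inverse_entropy_vote(runs)
print(winner)
```

With these made-up numbers a plain majority vote would choose "17", while the weighted vote favors the single low-entropy answer "42", which is the behavior the weighting is meant to capture.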

sequential-scaling · inference · entropy
Oct 29, 2025 · 6 min read · Technical Blog Post · Lossfunk Letters

Do LLMs Know When They've Gotten a Correct Answer?

We demonstrate that post-trained models can recognize correct solutions through output entropy analysis. Sequence-level entropy cleanly separates correct from incorrect reasoning, but only in reward-trained models, not instruction-tuned ones. This enables 25-50% token reduction without sacrificing accuracy.

entropy · confidence · efficient-reasoning
Technical Blog Post · Medium

Flux.jl: A Simplified Way to Build Custom ML Models with Ease

An introduction to Flux.jl, Julia's machine learning library, covering how to build custom neural network architectures from scratch with a clean, composable API that makes deep learning in Julia intuitive and flexible.

julia · flux · deep-learning · ml

My World of Thoughts