LLMs+: 10 Things That Matter in AI Right Now


To get there, a few things need to happen. First, LLMs must become more efficient and cheaper to run. Some of the biggest advances are on this front. One approach, called mixture-of-experts, splits an LLM into smaller parts and gives each one expertise in a different type of task. That means only some parts of the model need to be switched on at any given time.
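The routing idea can be sketched in a few lines. This is a minimal toy version, not any production architecture: the dimensions, the linear router, and the two-layer experts are all illustrative assumptions, but it shows the key property that only the top-k experts run for a given input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, chosen for illustration only.
d_model, d_hidden, n_experts, top_k = 8, 16, 4, 2

# Each "expert" is a small two-layer MLP; the router is a single linear layer.
experts = [
    (rng.standard_normal((d_model, d_hidden)), rng.standard_normal((d_hidden, d_model)))
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    """Route a token vector to its top-k experts; the others stay switched off."""
    scores = x @ router                       # one routing score per expert
    top = np.argsort(scores)[-top_k:]         # indices of the k best-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                  # softmax over the chosen experts only
    out = np.zeros_like(x)
    for w, i in zip(weights, top):
        w_in, w_out = experts[i]
        out += w * (np.maximum(x @ w_in, 0) @ w_out)  # ReLU MLP expert
    return out

y = moe_forward(rng.standard_normal(d_model))
print(y.shape)
```

The saving comes from the loop: only `top_k` of the `n_experts` MLPs are evaluated per token, so compute grows with `top_k` rather than with total model size.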

Another way to make LLMs more efficient could be to ditch transformers—the type of neural network underpinning almost all of them today—in favor of diffusion models, an alternative type of neural network more typically used for image and video generation. There are more experimental approaches, too. Last year, the Chinese AI firm DeepSeek showed off a way to encode text in images, which cuts computation costs.

Another crucial area of progress has to do with what’s known as an LLM’s context window. This is the amount of text (or video) that a model can take in at once, equivalent to its working memory. A couple of years ago, LLMs could process several thousand tokens (words or parts of words) in one go, or a few dozen pages of text. The latest models now have context windows up to a million tokens long—a whole stack of books. But the bigger the context window and the longer the task, the more likely models are to go off the rails or forget what they were doing. There are breakthroughs happening there, too. One recent paper by researchers at MIT CSAIL introduced what they call recursive LLMs. Instead of taking in a vast context window at once, recursive LLMs break their input up into chunks and send each chunk to a copy of itself, which in turn might break those chunks up again and send the results to even more copies. Multiple LLMs processing smaller pieces of information seem to be far more reliable for long, hard tasks. The result is an LLM, but not as we know it.
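The divide-and-recurse scheme described above can be sketched as follows. This is a hedged illustration of the general idea, not the MIT CSAIL implementation; `call_llm` is a placeholder standing in for a real model API (here it just keeps the first few words of its input), and the token budget is an invented number.

```python
# Hypothetical per-call context budget, measured here in words for simplicity.
MAX_TOKENS = 8

def call_llm(prompt: str) -> str:
    # Placeholder "model call": pretend to summarize by keeping the first few words.
    words = prompt.split()
    return " ".join(words[: MAX_TOKENS // 2])

def recursive_llm(text: str) -> str:
    """Break input into chunks, process each with a copy of the model, then combine."""
    words = text.split()
    if len(words) <= MAX_TOKENS:
        return call_llm(text)                 # small enough for a single call
    mid = len(words) // 2                     # split the oversized input in half...
    left = recursive_llm(" ".join(words[:mid]))
    right = recursive_llm(" ".join(words[mid:]))
    # ...then hand the combined partial results to another copy of the model
    return call_llm(left + " " + right)

result = recursive_llm(
    "one two three four five six seven eight nine ten eleven twelve"
)
print(result)
```

No single call ever sees more than the budget: long inputs are halved until each chunk fits, and the partial outputs are merged by further calls, which is why the approach stays reliable on tasks far longer than any one context window.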


