Close Menu
UK Daily: Tech, Science, Business & Lifestyle News UpdatesUK Daily: Tech, Science, Business & Lifestyle News Updates
    What's Hot

    Bitget Bolsters Stock+ Platform With U.S. Stock Options Trading

    July 4, 2026

    Eastbourne: Trains delayed after vehicle hits level crossing

    July 4, 2026

    NASA’s Hubble Captures Crimson Cloud Sparkling with White, Blue Stars

    July 4, 2026
    Facebook X (Twitter) Instagram
    Trending
    • Bitget Bolsters Stock+ Platform With U.S. Stock Options Trading
    • Eastbourne: Trains delayed after vehicle hits level crossing
    • NASA’s Hubble Captures Crimson Cloud Sparkling with White, Blue Stars
    • Jason Heigl Foundation Approves $425,000 to Fund 6,000+ Free Spay/Neuter Surgeries
    • Do Investors Care How Old Startup Founders Are?
    • Gillingham sign former Rochdale and Charlton Athletic goalkeeper Lennon MacLorg
    • The only AI glossary you’ll need this year
    • Nottingham Forest owner Marinakis announces £210m stadium plans
    • London
    • Kent
    • Glasgow
    • Cardiff
    • Belfast
    Facebook X (Twitter) Instagram YouTube
    UK Daily: Tech, Science, Business & Lifestyle News UpdatesUK Daily: Tech, Science, Business & Lifestyle News Updates
    Subscribe
    Saturday, July 4
    • Home
    • News
      1. Kent
      2. London
      3. Belfast
      4. Birmingham
      5. Cardiff
      6. Edinburgh
      7. Glasgow
      8. Liverpool
      9. Manchester
      10. Newcastle
      11. Nottingham
      12. Sheffield
      13. West Yorkshire
      Featured

      ‘Miniature’ mountain creature with ‘squeaker’-like call discovered as new species

      Science November 9, 2023
      Recent

      Bitget Bolsters Stock+ Platform With U.S. Stock Options Trading

      July 4, 2026

      Eastbourne: Trains delayed after vehicle hits level crossing

      July 4, 2026

      NASA’s Hubble Captures Crimson Cloud Sparkling with White, Blue Stars

      July 4, 2026
    • Lifestyle
      1. Celebrity
      2. Fashion
      3. Food
      4. Leisure
      5. Social Good
      6. Trending
      7. Wellness
      8. Event
      Featured

      Are Ice Spice & Tobey Maguire Dating? Why Fans Thought They Were Kissing

      Celebrity July 3, 2026
      Recent

      Are Ice Spice & Tobey Maguire Dating? Why Fans Thought They Were Kissing

      July 3, 2026

      Tobey Maguire Ex-Wife & Girlfriends: Inside the ‘Spider-Man’ Star’s Dating History

      July 3, 2026

      Are Ice Spice & Tobey Maguire Dating? What to Know About Their Kiss

      July 3, 2026
    • Science
    • Business
    • Sports

      Gillingham sign former Rochdale and Charlton Athletic goalkeeper Lennon MacLorg

      July 3, 2026

      Lee Martin at Whitstable Town and Steve Watt at Faversham Town handed home starts

      July 3, 2026

      Deal Town and Herne Bay handed home ties

      July 3, 2026

      Newboys Minster handed a home tie, Lordswood to face Corinthian

      July 3, 2026

      Goalkeeper Ollie Wright signs a three-year deal with Southampton before completing a season-long loan move to League Two Gillingham

      July 3, 2026
    • Politics
    • Tech
    • Property
    • Press Release
    UK Daily: Tech, Science, Business & Lifestyle News UpdatesUK Daily: Tech, Science, Business & Lifestyle News Updates
    Home » Enabling small language models to solve complex reasoning tasks | MIT News

    Enabling small language models to solve complex reasoning tasks | MIT News

    bibhutiBy bibhutiDecember 12, 2025 Tech No Comments7 Mins Read
    Facebook Twitter LinkedIn WhatsApp Telegram
    Share
    Facebook Twitter LinkedIn Telegram WhatsApp



    As language models (LMs) improve at tasks like image generation, trivia questions, and simple math, you might think that human-like reasoning is around the corner. In reality, they still trail us by a wide margin on complex tasks. Try playing Sudoku with one, for instance, where you fill in numbers one through nine in such a way that each appears only once across the columns, rows, and sections of a nine-by-nine grid. Your AI opponent will either fail to fill in boxes on its own or do so inefficiently, although it can verify if you’ve filled yours out correctly.

    Whether an LM is trying to solve advanced puzzles, design molecules, or write math proofs, the system struggles to answer open-ended requests that have strict rules to follow. The model is better at telling users how to approach these challenges than attempting them itself. Moreover, hands-on problem-solving requires LMs to consider a wide range of options while following constraints. Small LMs can’t do this reliably on their own; large language models (LLMs) sometimes can, particularly if they’re optimized for reasoning tasks, but they take a while to respond, and they use a lot of computing power.

    This predicament led researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) to develop a collaborative approach where an LLM does the planning, then divvies up the legwork of that strategy among smaller ones. Their method helps small LMs provide more accurate responses than leading LLMs like OpenAI’s GPT-4o, and approach the precision of top reasoning systems such as o1, while being more efficient than both. Their framework, called “Distributional Constraints by Inference Programming with Language Models” (or “DisCIPL”), has a large model steer smaller “follower” models toward precise responses when writing things like text blurbs, grocery lists with budgets, and travel itineraries.

    The inner workings of DisCIPL are much like contracting a company for a particular job. You provide a “boss” model with a request, and it carefully considers how to go about doing that project. Then, the LLM relays these instructions and guidelines in a clear way to smaller models. It corrects follower LMs’ outputs where needed — for example, replacing one model’s phrasing that doesn’t fit in a poem with a better option from another.

    The LLM communicates with its followers using a language they all understand — that is, a programming language for controlling LMs called “LLaMPPL.” Developed by MIT’s Probabilistic Computing Project in 2023, this program allows users to encode specific rules that steer a model toward a desired result. For example, LLaMPPL can be used to produce error-free code by incorporating the rules of a particular language within its instructions. Directions like “write eight lines of poetry where each line has exactly eight words” are encoded in LLaMPPL, queuing smaller models to contribute to different parts of the answer.

    MIT PhD student Gabriel Grand, who is the lead author on a paper presenting this work, says that DisCIPL allows LMs to guide each other toward the best responses, which improves their overall efficiency. “We’re working toward improving LMs’ inference efficiency, particularly on the many modern applications of these models that involve generating outputs subject to constraints,” adds Grand, who is also a CSAIL researcher. “Language models are consuming more energy as people use them more, which means we need models that can provide accurate answers while using minimal computing power.”

    “It’s really exciting to see new alternatives to standard language model inference,” says University of California at Berkeley Assistant Professor Alane Suhr, who wasn’t involved in the research. “This work invites new approaches to language modeling and LLMs that significantly reduce inference latency via parallelization, require significantly fewer parameters than current LLMs, and even improve task performance over standard serialized inference. The work also presents opportunities to explore transparency, interpretability, and controllability of model outputs, which is still a huge open problem in the deployment of these technologies.”

    An underdog story

    You may think that larger-scale LMs are “better” at complex prompts than smaller ones when it comes to accuracy and efficiency. DisCIPL suggests a surprising counterpoint for these tasks: If you can combine the strengths of smaller models instead, you may just see an efficiency bump with similar results.

    The researchers note that, in theory, you can plug in dozens of LMs to work together in the DisCIPL framework, regardless of size. In writing and reasoning experiments, they went with GPT-4o as their “planner LM,” which is one of the models that helps ChatGPT generate responses. It brainstormed a plan for several “Llama-3.2-1B” models (smaller systems developed by Meta), in which those LMs filled in each word (or token) of the response.

    This collective approach competed against three comparable ones: a follower-only baseline powered by Llama-3.2-1B, GPT-4o working on its own, and the industry-leading o1 reasoning system that helps ChatGPT figure out more complex questions, such as coding requests and math problems.

    DisCIPL first presented an ability to write sentences and paragraphs that follow explicit rules. The models were given very specific prompts — for example, writing a sentence that has exactly 18 words, where the fourth word must be “Glasgow,” the eighth should be “in”, and the 11th must be “and.” The system was remarkably adept at handling this request, crafting coherent outputs while achieving accuracy and coherence similar to o1.

    Faster, cheaper, better

    This experiment also revealed that key components of DisCIPL were much cheaper than state-of-the-art systems. For instance, whereas existing reasoning models like OpenAI’s o1 perform reasoning in text, DisCIPL “reasons” by writing Python code, which is more compact. In practice, the researchers found that DisCIPL led to 40.1 percent shorter reasoning and 80.2 percent cost savings over o1.

    DisCIPL’s efficiency gains stem partly from using small Llama models as followers, which are 1,000 to 10,000 times cheaper per token than comparable reasoning models. This means that DisCIPL is more “scalable” — the researchers were able to run dozens of Llama models in parallel for a fraction of the cost.

    Those weren’t the only surprising findings, according to CSAIL researchers. Their system also performed well against o1 on real-world tasks, such as making ingredient lists, planning out a travel itinerary, and writing grant proposals with word limits. Meanwhile, GPT-4o struggled with these requests, and with writing tests, it often couldn’t place keywords in the correct parts of sentences. The follower-only baseline essentially finished in last place across the board, as it had difficulties with following instructions.

    “Over the last several years, we’ve seen some impressive results from approaches that use language models to ‘auto-formalize’ problems in math and robotics by representing them with code,” says senior author Jacob Andreas, who is an MIT electrical engineering and computer science associate professor and CSAIL principal investigator. “What I find most exciting about this paper is the fact that we can now use LMs to auto-formalize text generation itself, enabling the same kinds of efficiency gains and guarantees that we’ve seen in these other domains.” 

    In the future, the researchers plan on expanding this framework into a more fully-recursive approach, where you can use the same model as both the leader and followers. Grand adds that DisCIPL could be extended to mathematical reasoning tasks, where answers are harder to verify. They also intend to test the system on its ability to meet users’ fuzzy preferences, as opposed to following hard constraints, which can’t be outlined in code so explicitly. Thinking even bigger, the team hopes to use the largest possible models available, although they note that such experiments are computationally expensive.

    Grand and Andreas wrote the paper alongside CSAIL principal investigator and MIT Professor Joshua Tenenbaum, as well as MIT Department of Brain and Cognitive Sciences Principal Research Scientist Vikash Mansinghka and Yale University Assistant Professor Alex Lew SM ’20 PhD ’25. CSAIL researchers presented the work at the Conference on Language Modeling in October and IVADO’s “Deploying Autonomous Agents: Lessons, Risks and Real-World Impact” workshop in November.

    Their work was supported, in part, by the MIT Quest for Intelligence, Siegel Family Foundation, the MIT-IBM Watson AI Lab, a Sloan Research Fellowship, Intel, the Air Force Office of Scientific Research, the Defense Advanced Research Projects Agency, the Office of Naval Research, and the National Science Foundation.



    Source link

    Featured Just In Top News
    Share. Facebook Twitter LinkedIn Email
    Previous ArticleTunbridge Wells council dodges higher housing targets while approving blueprints for thousands of homes
    Next Article Who Plays Rose Landry in ‘Heated Rivalry’? Meet Actress Sophie Nélisse – Hollywood Life
    bibhuti
    • Website

    Keep Reading

    Eastbourne: Trains delayed after vehicle hits level crossing

    NASA’s Hubble Captures Crimson Cloud Sparkling with White, Blue Stars

    Gillingham sign former Rochdale and Charlton Athletic goalkeeper Lennon MacLorg

    The only AI glossary you’ll need this year

    Nottingham Forest owner Marinakis announces £210m stadium plans

    Beloved Broadway musical Hairspray announces five-night run at Glasgow theatre

    Add A Comment
    Leave A Reply Cancel Reply

    Editors Picks

    89th Utkala Dibasa Celebration Brings Odisha’s Vibrant Culture to London

    April 8, 2024

    US and EU pledge to foster connections to enhance research on AI safety and risk.

    April 5, 2024

    Holi Celebrations Across Various Locations in Kent Attract a Diverse Range of Community Participation

    March 25, 2024

    Plans for new Bromley tower blocks up to 14-storeys tall refused

    December 4, 2023
    Latest Posts

    Subscribe to News

    Get the latest sports news from NewsSite about world, sports and politics.

    Advertisement

    Recent Posts

    • Bitget Bolsters Stock+ Platform With U.S. Stock Options Trading
    • Eastbourne: Trains delayed after vehicle hits level crossing
    • NASA’s Hubble Captures Crimson Cloud Sparkling with White, Blue Stars
    • Jason Heigl Foundation Approves $425,000 to Fund 6,000+ Free Spay/Neuter Surgeries
    • Do Investors Care How Old Startup Founders Are?

    Recent Comments

    1. Register on Anycubic users say their 3D printers were hacked to warn of a security flaw
    2. Pembuatan Akun Binance on Braiins Becomes First Mining Pool To Introduce Lightning Payouts
    3. tadalafil tablets sale on The market is forcing cloud vendors to relax data egress fees
    4. cerebrozen reviews on Kent director of cricket Simon Cook adapting to his new role during the close season
    5. Glycogen Review on The little-known town just 5 miles from Kent border with stunning beaches and only 600 residents
    The News Times Logo
    Facebook X (Twitter) Pinterest Vimeo WhatsApp TikTok Instagram

    News

    • UK News
    • US Politics
    • EU Politics
    • Business
    • Opinions
    • Connections
    • Science

    Company

    • Information
    • Advertising
    • Classified Ads
    • Contact Info
    • Do Not Sell Data
    • GDPR Policy
    • Media Kits

    Services

    • Subscriptions
    • Customer Support
    • Bulk Packages
    • Newsletters
    • Sponsored News
    • Work With Us

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    © 2026 The News Times. Designed by The News Times.
    • Privacy Policy
    • Terms
    • Accessibility

    Type above and press Enter to search. Press Esc to cancel.

    Manage Cookie Consent
    To provide the best experiences, we use technologies like cookies to store and/or access device information. Consenting to these technologies will allow us to process data such as browsing behavior or unique IDs on this site. Not consenting or withdrawing consent, may adversely affect certain features and functions.
    Functional Always active
    The technical storage or access is strictly necessary for the legitimate purpose of enabling the use of a specific service explicitly requested by the subscriber or user, or for the sole purpose of carrying out the transmission of a communication over an electronic communications network.
    Preferences
    The technical storage or access is necessary for the legitimate purpose of storing preferences that are not requested by the subscriber or user.
    Statistics
    The technical storage or access that is used exclusively for statistical purposes. The technical storage or access that is used exclusively for anonymous statistical purposes. Without a subpoena, voluntary compliance on the part of your Internet Service Provider, or additional records from a third party, information stored or retrieved for this purpose alone cannot usually be used to identify you.
    Marketing
    The technical storage or access is required to create user profiles to send advertising, or to track the user on a website or across several websites for similar marketing purposes.
    • Manage options
    • Manage services
    • Manage {vendor_count} vendors
    • Read more about these purposes
    View preferences
    • {title}
    • {title}
    • {title}