Microsoft has claimed that its new medical AI system can diagnose difficult cases with far more accuracy than experienced doctors. In tests involving 304 cases from the New England Journal of Medicine, the Microsoft AI Diagnostic Orchestrator, or MAI-DxO, correctly solved 85.5% of the cases when paired with OpenAI’s o3 model. That is roughly four times the accuracy of a group of 21 doctors from the US and UK, who averaged just 20%.
According to Microsoft, these doctors had between 5 and 20 years of experience, but in the tests, they had no access to colleagues, books or AI tools. This allowed for a more direct comparison with the AI system.
The company also said the system reached its diagnoses at a lower cost than the human doctors. The researchers assigned a virtual cost to each test ordered and found that the AI made more cost-efficient decisions overall.
How Does The AI Actually Work?
The system Microsoft used is not just one AI model but a kind of digital conductor. It is designed to work with a range of language models, including GPT, Gemini, Claude and others, and has them work together like a team of doctors. This virtual panel asks follow-up questions, orders tests and double-checks its reasoning before coming to a final decision.
The cases used for testing are real and highly complex. Each one is drawn from the New England Journal of Medicine’s weekly “Case Records” series, which is known for its difficulty and detail. These cases often take multiple specialists to solve in real life.
To simulate a more realistic medical process, the system follows a method called sequential diagnosis. This means it starts with an initial patient presentation and then works step by step, ordering tests or asking questions to narrow down possible illnesses. The AI system’s answers are then compared to the actual diagnosis from the journal.
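To make the idea of sequential diagnosis more concrete, here is a minimal toy sketch of the loop described above. It is not Microsoft’s implementation: the case, the test names, the virtual prices and the simple rule-out table are all hypothetical, and a real orchestrator would use language models, not a lookup table, to decide what to ask or order next. It only shows the shape of the process: start from a presentation, order tests one at a time, pay a virtual cost for each, narrow the list of possible illnesses, and compare the final answer with the known diagnosis.

```python
from dataclasses import dataclass

# Hypothetical toy case (NOT MAI-DxO data): findings are revealed only when a test is ordered.
@dataclass
class Case:
    presentation: str
    hidden_findings: dict[str, str]  # test name -> result
    true_diagnosis: str

# Hypothetical virtual prices per test, in arbitrary units.
TEST_COSTS = {"history": 0, "blood_count": 30, "chest_xray": 100, "biopsy": 900}

# Hypothetical rule-out table: a (test, result) pair removes these candidate diagnoses.
RULE_OUTS = {
    ("blood_count", "normal"): {"leukaemia"},
    ("chest_xray", "clear"): {"pneumonia"},
    ("biopsy", "granulomas"): {"lymphoma"},
}

def sequential_diagnosis(case, candidates, test_order):
    """Order tests one at a time, narrowing the candidates until one diagnosis remains."""
    remaining = set(candidates)
    spent = 0
    ordered = []
    for test in test_order:
        if len(remaining) == 1:
            break  # stop early rather than paying for tests that cannot change the answer
        result = case.hidden_findings.get(test, "unavailable")
        spent += TEST_COSTS[test]
        ordered.append((test, result))
        remaining -= RULE_OUTS.get((test, result), set())
    # Commit to a diagnosis only when exactly one candidate is left.
    final = next(iter(remaining)) if len(remaining) == 1 else None
    return final, spent, ordered

case = Case(
    presentation="fever and cough",
    hidden_findings={"history": "recent travel", "blood_count": "normal",
                     "chest_xray": "clear", "biopsy": "granulomas"},
    true_diagnosis="sarcoidosis",
)
final, spent, ordered = sequential_diagnosis(
    case,
    ["leukaemia", "pneumonia", "lymphoma", "sarcoidosis"],
    ["history", "blood_count", "chest_xray", "biopsy"],
)
# Score the answer against the known diagnosis, as the benchmark does with the journal's answer.
print(final, spent, final == case.true_diagnosis)  # sarcoidosis 1030 True
```

The early `break` is a crude stand-in for the cost awareness the researchers measured: once only one candidate is left, the sketch stops ordering tests instead of running up the virtual bill.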
Why Is This Relevant For MedTech And Health Startups?
This level of accuracy and efficiency could change the way health startups work. Instead of trying to build their own diagnostic tools from scratch, many new health companies may soon rely on orchestrator technology like MAI-DxO. Startups focused on digital health already use AI for chatbots and basic symptom checkers. This new system could offer more advanced support in the background while still keeping doctors in charge.
Dr. Shravan Verma, CEO of a Singapore-based startup, told Business Insider that while AI is helpful, it still needs to hand off complex or uncertain cases to real professionals. He said that AI tools are good for the “first mile of care” but should escalate cases when needed. This kind of structure could suit startups that want to automate basic queries while maintaining clinical safety.
Tech leaders such as Bill Gates have also spoken about the shortage of doctors globally. He said in a podcast that AI could help fill the gap in medical knowledge, allowing more people access to advice and guidance without waiting for in-person care.
Is It Safe To Use Yet?
Microsoft has made it clear that the system is still under research and is not approved for clinical use. It will need to pass safety testing, get regulatory approval, and be validated in real healthcare settings before going public.
The company has started working with hospitals and health organisations to test the system further. It is also reviewing how the system performs with more everyday health concerns, not just the most complicated ones.
Importantly, Microsoft is focusing on building trust. The company said any real-world rollout will need clear safeguards and full transparency. It believes AI is not a replacement for human doctors but a powerful assistant that can help them make better decisions.
What Will Medtech Startups Build Around AI Diagnostic Systems?
Experts have shared their views on how startups will build around systems such as Microsoft’s. Here’s what they said…
Dr Marta G. Zanchi, Founder & Managing Partner, Nina Capital
“Startups are essential to the future of AI in diagnostics, not just for developing intelligent algorithms, but for building the interfaces and integrations that make these algorithms usable in real-world clinical settings. We’ve seen firsthand how value is unlocked not just through diagnostic accuracy, but also through workflow fit, clinician trust, and true health data flows with inherently high interoperability and utility (data liquidity), as well as clear alignment with the financial ROIs that healthcare as a system desperately needs to see before it adopts new innovation, given its present state of unsustainability and waste.
“The rise of diagnostic orchestrators will demand new infrastructure like hardware-agnostic interfaces, real-time data pipelines, explainability layers, and modular dashboards tuned for multi-specialty environments, with clear proof of enterprise business value for the organisations that ultimately adopt them.
“Companies in our portfolio like Mindpeak (Germany) and Magentiq Eye (Israel) are using AI to assist pathologists and endoscopists with decision support tools that improve detection and consistency. Contextflow (Austria) applies AI to streamline complex radiology workflows. Methinks (Spain) enables real-time stroke triage through AI-based neuroimaging interpretation, and Promptly (Portugal) is building the longitudinal data layer that will power orchestration across diagnostics and treatment. They may not have started in the UK, but they are relevant to the market. For example, Promptly has been selected for an NHS Wales-wide initiative, and Mindpeak works with global lab networks like Unilabs, which operates in the UK.
“Together, these startups illustrate that the future of AI in diagnostics won’t be driven by single-point algorithms alone, but by orchestrators built around smart dashboards, real-time interfaces, federated data access, and clinical validation from the start, with clear financial incentives making their adoption not just possible, but imperative for the sustainability of healthcare systems.”
Nick Davidov, Co-founder and Managing Partner, Davidovs Venture Collective
“Medical systems have a lot of momentum and are very slow to change, for all sorts of good and bad reasons. What we see right now is that computer vision systems for diagnostics have been around for quite some time. I’d say around 2015-16 was when people first started actually using computer vision for X-ray interpretation. Later on, people added MRIs and other stuff.
“What is happening now is the rise of decision support systems that are actually being implemented by clinicians, because they understand how much more effective and efficient it makes them. A lot of times they can overlook a rare disease they only read about in a textbook 20 years ago.
“These decision support systems combine parts of computer vision-based systems, or parts of interpretation of blood work and different tests, and they try to look at the picture holistically. The traditional problem of a doctor, especially in the US, is that — for example — a heart surgeon looks at a body and only sees the heart and blood vessels. And for them, the body stops at the throat. Because everything above the throat is for dentists, right? So what AI can actually help do is assist these doctors in looking at the problem holistically and assessing it in terms of what is happening to the patient in general.
“Now, this is a very risky thing to do, because AI in general is a black box, and we don’t quite know how it comes to a certain conclusion. So some of our portfolio companies, like Qualified Health, are working specifically on that: making the AI decision-making process transparent, explainable, and also compliant with all of the limitations that doctors have to comply with — especially around insurance claims and stuff like that.
“To sum it up: AI diagnostic solutions have been in the market for quite some time. They’ve been used by radiologists and people doing tests. Now AI is moving towards being the general practitioner’s assistant, or a decision support system, absorbing all of this data and surfacing ideas, concerns, things to follow up with — and also trying to be explainable and transparent in the process.”
“The general problem of the healthcare system is how much admin work has to exist on top of the doctor’s work. So, in addition to doctors spending a ridiculous amount of time on dealing with this, each doctor on average has like five or six admin personnel working with them in the clinic: medical scribes transcribing stuff, coders who code transcriptions into insurance claims, people who deal with insurance claims and denials, people who try to manage clinical excellence — like how triage works, how they distribute resources, how they assign shifts.
“All of this stuff can easily be automated by AI. That is what the technology, and the companies DVC is backing, are doing: Collectly, which helps doctors manage their revenues; Red Sky Health, which helps reduce the number of insurance denials; and Denti, which is like a dentist’s assistant.
“What unites them all? They are all trying to enable doctors to be doctors and put more focus on what matters most: spending time with the patient, focusing on treatment, and making sure that the patient trusts them, follows the treatment plan, and can actually see the results. Less of the filling-in-forms nonsense, and less typing on the keyboard. Doctors shouldn’t be touching the keyboard at all. That’s not what they spent 12 years studying for.
“Another big thing that is happening in healthcare, and should probably happen more, is value-based care or preventative care. Insurance companies actually spend the most money on end-of-life treatment, when they’re trying to save somebody’s life at the very end. But they could have delayed that for a long time by focusing on prevention, focusing on checkups, focusing on mental health, physical fitness, and wellness. And a lot of startups are now exploring that.
“So there’s a myriad of startups offering checkups and looking at your blood work and trying to give you lifestyle advice that’s going to fix something for you and offset your heart attack by another 10 years.”