Midjourney has always lived in the realm of pixels—turning text prompts into images that look like they came from a camera, a studio, or a dream. So when the company announced a medical imaging concept that involves dunking a person into a vat of water to generate ultrasound scans, it landed with a particular kind of surprise: not just “AI is moving into healthcare,” but “AI is proposing a new way to do imaging, and it’s talking about competing with MRI.”
That last part is where the quiet questions start.
Midjourney’s pitch, as described publicly, is built around an experience that sounds almost deliberately non-clinical. The scanner would use ultrasound, and the company frames the interaction as casual—more like a spa than a hospital visit—while aiming for performance that could be “as powerful as MRI.” CEO David Holz has gone further, suggesting the system could one day be better than MRI. Midjourney also positions the effort as a long-term health project: helping people live longer, healthier lives.
On paper, the ambition is easy to understand. Ultrasound is widely used, relatively safe, and portable compared with MRI. It’s also familiar to clinicians and patients. But ultrasound has historically struggled with certain limitations: image quality can vary depending on operator skill, patient anatomy, and the specific hardware setup; deep structures can be harder to resolve; and translating raw ultrasound data into clinically reliable diagnostic information is a complex pipeline.
MRI, by contrast, is expensive and slow, but it has a reputation for producing high-contrast images across many tissues and for enabling a broad range of diagnostic applications. If Midjourney’s system truly aims to match MRI-level capability, it isn’t just improving ultrasound—it’s trying to close a gap that radiology has spent decades managing with different tools, different physics, and different clinical workflows.
And that’s why experts’ reactions, as reported, have been cautious rather than dismissive. The skepticism isn’t about whether ultrasound can be improved. It’s about what Midjourney has shown so far—and what it hasn’t.
The missing piece isn’t imagination. It’s evidence.
A concept that looks like science fiction still needs to behave like medicine
The most striking element of Midjourney’s announcement is the “vat of water” framing. Water is not a gimmick in ultrasound; it is a practical coupling medium. Sound travels through water with little loss, and water’s acoustic impedance is close to that of soft tissue, so little energy is reflected at the skin. Coupling the transducer to the body is a major factor in image quality, which is why conventional ultrasound relies on gel and direct contact. A water-based environment could, in theory, standardize coupling and reduce variability between scans.
But standardization is only one part of the story. Medical imaging isn’t just about getting a picture. It’s about getting a picture that is reproducible, interpretable, and diagnostically accurate across diverse patients and conditions. It’s about safety, calibration, and validation. It’s about how the system performs when things are messy—when anatomy is unusual, when patients move, when there are implants, when the clinical question is subtle, when the scan is done under real-world constraints rather than controlled demonstrations.
When companies propose new imaging modalities, the first wave of credibility usually comes from technical details: what frequencies are used, how the transducer array is designed, how the system handles beamforming, how it reconstructs images, what the training data looks like, and how the model is evaluated. The second wave comes from clinical studies: comparisons against established standards, sensitivity and specificity metrics, inter-reader agreement, and evidence that the system improves outcomes rather than just producing visually impressive images.
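To make the clinical-study metrics named above concrete, here is a minimal sketch of how sensitivity, specificity, and inter-reader agreement (Cohen's kappa, one common choice) are computed from binary labels. The function names and the toy labels are illustrative assumptions; nothing here reflects data or methods Midjourney has published.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for paired binary labels (1 = disease)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # disease, flagged
    fn = sum(1 for t, p in pairs if t and not p)      # disease, missed
    tn = sum(1 for t, p in pairs if not t and not p)  # healthy, cleared
    fp = sum(1 for t, p in pairs if not t and p)      # healthy, flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


def cohen_kappa(reader_a, reader_b):
    """Agreement between two readers on binary calls, corrected for chance."""
    n = len(reader_a)
    observed = sum(1 for a, b in zip(reader_a, reader_b) if a == b) / n
    rate_a = sum(reader_a) / n
    rate_b = sum(reader_b) / n
    # Agreement expected if both readers labeled independently at their own rates.
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: reference-standard labels vs. a new scanner's calls.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scanner = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(truth, scanner)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Numbers like these only mean something when the reference standard, the patient population, and the reading protocol are reported alongside them, which is the kind of detail the public record currently lacks.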
Midjourney’s public materials, at least as characterized in early reporting, have not yet provided enough of that to satisfy the standards clinicians rely on. That doesn’t mean the work isn’t real. It means the public record is thin.
And in medicine, thin evidence is not a minor issue. It’s the difference between a promising prototype and a tool that can be trusted with diagnoses.
Why “as powerful as MRI” is a high bar—and a slippery claim
Claims like “as powerful as MRI” sound straightforward, but they hide a lot of complexity. MRI is not one thing. It’s a family of sequences and protocols tuned to different tissues and diagnostic goals. “Powerful” could mean spatial resolution, contrast-to-noise ratio, ability to detect small lesions, robustness across body types, speed, or the range of conditions it can evaluate.
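As one illustration of how narrow any single reading of “powerful” is, the sketch below computes contrast-to-noise ratio using one common definition: the absolute difference in mean intensity between a region of interest and the background, divided by the background's standard deviation. Other definitions exist, and the pixel values are made up for the example.

```python
from statistics import mean, pstdev


def contrast_to_noise(region, background):
    """CNR = |mean(region) - mean(background)| / std(background).

    This is one common convention; some papers pool the noise of both regions.
    """
    noise = pstdev(background)
    if noise == 0:
        raise ValueError("background has zero variance; CNR is undefined")
    return abs(mean(region) - mean(background)) / noise


# Hypothetical pixel intensities sampled from a lesion and nearby tissue.
lesion = [10, 12, 14]
tissue = [4, 5, 6]
print(round(contrast_to_noise(lesion, tissue), 2))
```

A scanner could score well on this measure and still fall short on resolution, depth, or robustness across body types, so a single figure cannot settle a comparison with MRI.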
Ultrasound also isn’t one thing. There are many ultrasound techniques—conventional B-mode imaging, Doppler, elastography, contrast-enhanced ultrasound, and more. Each has different strengths and weaknesses. Even within ultrasound, the path from raw signals to clinically meaningful images can involve sophisticated reconstruction and interpretation steps.
So when a company says it wants to match MRI, the key question becomes: match it for what?
Is the goal to replicate MRI’s soft-tissue contrast for specific organs? Is it to detect tumors at comparable sizes? Is it to provide functional information comparable to what MRI derives from diffusion, perfusion, or spectroscopy techniques? Or is it primarily about producing images that look MRI-like to the human eye?
If the claim is about diagnostic performance, then the evaluation must be diagnostic. Visual similarity is not enough. A system can produce images that resemble MRI while failing to detect clinically relevant differences. Conversely, a system might not look like MRI but could still perform well for specific tasks if it captures the right signal features.
This is where the “quiet questions” become loud in practice: without clear benchmarks and transparent methodology, it’s hard for clinicians to know whether the comparison is apples-to-apples or apples-to-oranges.
The phantom body: a useful step, but not the finish line
One of the most concrete elements in Midjourney’s public narrative is the use of an imaging phantom. Phantoms are common in imaging research: they are controlled objects designed to mimic tissue properties and known structures. Researchers can segment the phantom and validate how cleanly structures separate under controlled conditions.
Phantoms matter because they help isolate variables. They allow teams to test whether the system can reconstruct boundaries, maintain geometry, and produce consistent outputs when the ground truth is known. Segmentation of phantom structures can show whether the system is stable and whether the reconstruction pipeline behaves as expected.
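Because a phantom's geometry is known in advance, a segmentation of the scan can be scored directly against it. A widely used overlap score is the Dice coefficient; the sketch below assumes flattened binary masks and uses invented values, not any result Midjourney has reported.

```python
def dice_coefficient(truth_mask, predicted_mask):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for flat binary masks of equal length."""
    if len(truth_mask) != len(predicted_mask):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for t, p in zip(truth_mask, predicted_mask) if t and p)
    total = sum(1 for t in truth_mask if t) + sum(1 for p in predicted_mask if p)
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2 * overlap / total


# Hypothetical one-row phantom: known structure vs. a segmentation shifted by one pixel.
known_structure = [0, 1, 1, 1, 0, 0]
segmented = [0, 0, 1, 1, 1, 0]
print(round(dice_coefficient(known_structure, segmented), 3))
```

A score of 1.0 means perfect overlap and 0.0 means none. A high score on a phantom shows geometric fidelity under controlled conditions and nothing more.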
But phantoms are not bodies. They don’t capture the full complexity of human anatomy, including variations in tissue composition, the presence of air, the effects of motion, and the unpredictable ways pathology changes signal characteristics. A system that performs well on a phantom can still struggle in real patients. And a system that struggles on a phantom might still succeed clinically if the failure modes don’t matter for the intended diagnostic tasks.
In other words, phantom results are a starting point for engineering confidence, not a substitute for clinical validation.
What makes the phantom step feel “off” to some observers is not that it exists but that it may be the most visible proof so far. When a company makes bold claims about outperforming MRI, observers expect more than controlled-condition demonstrations. They expect evidence that the system works where it counts: in diverse patients, across realistic scanning conditions, with clinically meaningful endpoints.
The AI angle: promise and risk in equal measure
Midjourney’s background is image generation, which naturally raises a question: is the system doing something fundamentally different from traditional ultrasound reconstruction, or is it using AI to enhance images after the fact?
AI can help in multiple ways. It can improve reconstruction quality, denoise signals, correct artifacts, estimate missing information, and accelerate processing. It can also help interpret images—turning reconstructed images into diagnostic predictions.
But AI introduces its own set of concerns. Models can be sensitive to distribution shifts: if the training data doesn’t represent the full variety of real patients, performance can degrade. Models can also produce plausible-looking outputs that are wrong in subtle ways. In imaging, subtle errors can be clinically significant.
Clinicians are trained to trust imaging pipelines that have been validated and audited. They want to know how errors behave, what the failure modes are, and whether the system provides uncertainty estimates or confidence measures. They also want to know whether the system is robust to different operators, different machines, different patient positions, and different clinical contexts.
If Midjourney’s approach relies heavily on AI reconstruction or AI interpretation, then transparency about training data, evaluation protocols, and error analysis becomes even more important. Without that, the system’s performance may be hard to verify independently.
There’s also a deeper philosophical issue. Ultrasound and MRI are both physical measurements, each with its own reconstruction and interpretation pipeline. When AI enters that pipeline, it can blur the line between what was measured and what was inferred. That’s not inherently bad, since medicine already relies on inference constantly, but it changes what “ground truth” means and how validation should be structured.
The “spa” experience: patient comfort is real, but workflow matters more
The idea of a casual, spa-like scan is appealing. Patient anxiety around medical procedures is real, and comfort can improve compliance. If a scanner reduces discomfort, shortens time, or simplifies preparation, that could increase the number of people who get screened or monitored.
However, comfort alone doesn’t determine clinical value. Workflow integration does. A scanner that is comfortable but slow, difficult to schedule, or incompatible with existing clinical systems may not deliver the promised impact. A scanner that produces images quickly but requires extensive post-processing or specialized interpretation may also face adoption barriers.
Clinicians and health systems care about throughput, cost, maintenance, calibration, and training. They care about how the results fit into existing diagnostic pathways. They care about regulatory approval timelines and reimbursement.
So the “spa” framing is best understood as a potential advantage in patient experience—not as proof of diagnostic superiority. It’s a design goal. The clinical question remains: does it produce reliable diagnostic information?
What would credible proof look like?
If Midjourney wants to convert curiosity into credibility, the next steps are likely to be the unglamorous ones: detailed technical documentation and rigorous evaluation.
At
