By Dave DeFusco
Diffusion models have become the artistic and scientific darlings of artificial intelligence. They power image generators like DALL·E and Stable Diffusion, producing stunning, lifelike pictures from simple text prompts. But a recent study led by researchers at the Katz School of Science and Health asks a fundamental question: Are these models really creating something new, or just rearranging what they've already seen?
That question lies at the heart of a study published in the journal Information Fusion by Lakshmikar Polamreddy, a Ph.D. student in mathematics at the Katz School, and Jialu Li, a student in the M.S. in Artificial Intelligence program. Their research challenges a popular belief that diffusion models imagine in the same way humans do.
"Diffusion models have been the state of the art for image and video generation," said Polamreddy. "We wanted to test whether they really generate new data or not. My assumption was that they don't, that they just replicate the existing content in different forms."
Diffusion models work by gradually turning random noise into a detailed image, learning from large datasets of real pictures. Because their results can be impressively realistic, it's easy to assume they're generating novel ideas. But Polamreddy's team found otherwise. When they asked a model trained on tens of thousands of images to produce new ones, almost all the results were variations of existing data.
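The denoising process described above can be sketched in a few lines. This is a minimal, illustrative DDPM-style sampling loop, not the authors' code: `predict_noise` is a hypothetical stand-in for a trained noise-prediction network, and the schedule values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 50                              # number of denoising steps (assumed)
betas = np.linspace(1e-4, 0.02, T)  # noise schedule (assumed values)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x, t):
    """Hypothetical stand-in for a trained noise-prediction network."""
    return np.zeros_like(x)

# Start from pure Gaussian noise and denoise step by step.
x = rng.standard_normal((8, 8))     # an 8x8 "image"
for t in reversed(range(T)):
    eps = predict_noise(x, t)
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    x = (x - coef * eps) / np.sqrt(alphas[t])
    if t > 0:                       # add fresh noise except at the final step
        x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
```

In a real system the network's noise predictions, learned from the training set, are what steer the random start toward a realistic image; that learned steering is exactly why the outputs tend to echo the training data.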
"If I generate 10,000 images," he said, "maybe only 10 of them contain truly new features not seen in the training data. Those 10 are what we call diverse samples."
These diverse samples are special. They contain elements that are different but relevant to the original data. Polamreddy distinguishes them from so-called out-of-distribution samples, which are completely unrelated.
"If I give the model brain images and ask it to generate more, but it produces a heart image, that's out of distribution," he said. "We discard those. But if it gives a new kind of brain image with a slightly different structure, that's a diverse sample and it's valuable."
The team's most striking finding came from applying their method to medical images, where data scarcity is a real problem. Hospitals often can't share patient scans because of privacy concerns, and collecting new images for training AI diagnostic systems is expensive and time-consuming.
That's where data augmentation, the practice of creating additional training images artificially, comes in. Most augmentation techniques, like flipping or rotating existing images, don't add new information. Polamreddy's study suggests that even a small number of truly diverse samples can significantly improve diagnostic models.
"Data augmentation is especially critical in the medical field," said Polamreddy. "Because of privacy concerns, we don't have enough data. Generating diverse samples with new content helps counter that scarcity and improves downstream tasks, like image classification and disease diagnosis."
Using chest X-rays and breast ultrasound images, the researchers trained an image-classification model with and without diverse samples. The results were striking: adding diverse samples improved classification accuracy by several percentage points, sometimes more than five points higher than models trained on standard generated images.
"Even a few diverse samples can make a big difference," said Jialu Li, co-author of the study. "They diversify the training data and help the model generalize better, which means it performs more accurately on real-world medical images."
To measure novelty, the team turned to information theory, a mathematical framework that studies how information is stored and transmitted. They used metrics like entropy and mutual information to see whether the generated images truly contained new data.
"If there's no relationship between the training and generated images, entropy will be high," said Polamreddy. "These measurements help us see whether there's really new information or just a repetition of what the model already knows."
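To make the idea concrete, here is a small sketch of how entropy and mutual information can be estimated from pixel histograms. The metric names come from the article; the authors' actual estimators may differ, and the images below are synthetic stand-ins.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(img_a, img_b, bins=16):
    """Mutual information between two images via a joint pixel histogram."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    joint = joint / joint.sum()
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint.ravel())

rng = np.random.default_rng(1)
train = rng.random((32, 32))                        # stand-in training image
near_copy = train + 0.01 * rng.standard_normal((32, 32))  # slight variation
unrelated = rng.random((32, 32))                    # independent image
```

A near-duplicate of a training image shares far more information with it than an unrelated image does, which is the kind of signal the team used to separate replication from genuinely diverse samples.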
Their conclusion is that ideal diffusion models don't create new information at all. Any new content comes from small imperfections in how the model reverses the diffusion process: essentially, a lucky byproduct of noise and complexity.
To find diverse samples, the researchers had to take a brute-force approach. They generated thousands of images repeatedly, filtering each batch to identify the rare few that contained novel features.
"If I want 100 diverse samples," said Polamreddy, "I might have to run the model many times. Each iteration gives me one or two, so I keep going until I get what I need."
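The generate-and-filter loop described above can be sketched as follows. Everything here is a hypothetical stand-in: in the study, `generate_batch` would be the diffusion model and `novelty_score` an information-theoretic diversity criterion, neither of which is specified in code in the article.

```python
import random

random.seed(42)

def generate_batch(n):
    """Stand-in generator: each 'sample' is just a random score here."""
    return [random.random() for _ in range(n)]

def novelty_score(sample):
    """Placeholder: a real score would compare a sample against training data."""
    return sample

def collect_diverse(target, batch_size=1000, threshold=0.999):
    """Keep generating batches, retaining only the rare high-novelty samples."""
    diverse = []
    while len(diverse) < target:
        for s in generate_batch(batch_size):
            if novelty_score(s) > threshold:  # roughly 1 in 1,000 passes
                diverse.append(s)
    return diverse[:target]

samples = collect_diverse(target=5)
```

With a pass rate of roughly one in a thousand, the loop must generate thousands of candidates per accepted sample, which is why the authors describe the approach as slow.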
That method, while effective, is slow. The team's next goal is to design diversity-aware diffusion models, ones that can produce semantically rich, varied images in a single pass.
"We need better conditioning in the diffusion process," said Polamreddy. "That's how we can teach models to generate more diverse samples automatically instead of relying on brute force."