Thursday, October 23, 2025

UNM Scientist Joins $152 Million Initiative to Develop Transparent AI in Science


Tackling the Data Quality Challenge in AI: The OMAI Project

As artificial intelligence (AI) continues to reshape industries, one critical hurdle has emerged: the quality of the data used to train models. Much of the training for current AI systems relies on vast swathes of information from the internet, which can be rife with inaccuracies. The problem is particularly acute in scientific fields, which demand high-quality data for reliable results. Sarah Dreier, an assistant professor of political science at the University of New Mexico, is set to contribute to a new initiative designed to improve data quality in AI.

The Open Multimodal AI Infrastructure to Accelerate Science (OMAI)

Dreier is collaborating with a dedicated team working on the Open Multimodal AI Infrastructure to Accelerate Science (OMAI) project. With the mission to create open AI models that can significantly boost scientific discovery, this ambitious project is being spearheaded by the Allen Institute for AI. According to Dreier, the challenge is a tough one, but the outcomes promise to yield models that are “more transparent, more open, and more flexible.”

The Role of Data in AI Training

A significant point of concern, as highlighted by Dreier, is that engineers training AI models often lack insight into the quality of the data being used. "They’re not reading unfathomably large amounts of text to feed into their model," she notes, pointing to a gap in accountability that can skew results. By participating in OMAI, Dreier seeks to ensure that robust methodologies underpin the data used in training, particularly data that meets the stringent requirements of scientific inquiry.

Funding and Vision Behind OMAI

The OMAI initiative is backed by a substantial $152 million investment from both the U.S. National Science Foundation and Nvidia Corp., with contributions of $75 million and $77 million, respectively. This financial commitment underscores the goal of creating a suite of open AI models that not only serve the U.S. scientific community but also align with broader national objectives regarding global AI leadership. Noah Smith, senior director of natural language processing research at the Allen Institute and a professor at the University of Washington, plays a pivotal role in guiding this vision.

Transforming Scientific Progress with Open Models

In recent years, the popularity of AI has skyrocketed, with tech giants like OpenAI and Google reporting hundreds of millions of monthly users. However, the vast majority of large language models remain "closed," meaning the data and methodologies used in their development are not publicly accessible. This lack of transparency poses a significant barrier to scientific advancement, stifling innovation by restricting users’ ability to scrutinize, adapt, or build upon existing models. Smith emphasizes that open models are vital for fostering collaboration, transparency, and reproducibility, all of which are essential for effective scientific progress.

Contributions from Social Science to Data Curation

Dreier is the sole social scientist involved in the OMAI project, bringing a different perspective to a team otherwise focused primarily on computer science. With a funding allocation of $600,000, she will explore the types of data suitable for training models that support scientific tasks such as research analysis and code generation. "I’m going to be thinking most immediately about the kinds of data that could be useful to political scientists, sociologists," Dreier explains, underscoring the interdisciplinary nature of the initiative.

Collaborative Efforts and Continued Research

Dreier’s work is not isolated; it builds on a collaboration established during her time as a post-doctoral research fellow in Smith’s lab. Since joining UNM, she has maintained a working relationship with Smith, focusing on related projects that align with OMAI’s goals. The partnership bridges the social sciences and technical AI research.

Overcoming Challenges for Broader Benefits

Smith says that developing open AI models addresses two intertwined challenges: advancing AI technology and applying it to enhance discoveries across various scientific domains. The overarching goal is to establish a more transparent and trustworthy pathway for AI research. With OMAI’s resources, researchers can process vast quantities of research data, generate code, and visualize insights more effectively.

Future Implications for Research and Innovation

In practical terms, OMAI aims to deliver tools that enable scientists to make significant breakthroughs in critical fields such as energy research, materials science, and protein function prediction. Smith suggests that the project’s impact could be monumental, allowing for faster advances and a more collaborative approach to scientific inquiry, and underscoring the essential role of high-quality data in driving innovation.
