Division of Research

February 26, 2025

Whose data is it, anyway?

By Sherri Miles

Data ownership, health data, and AI among Love Data Week 2025 events

PROVIDENCE, R.I. [Brown University] — Every year during the week of Valentine’s Day, Brown University celebrates the love of data in all its forms and functions. This year marked Brown’s sixth annual Love Data Week, attracting over 450 attendees to 16 virtual, hybrid, and in-person events, including faculty lightning talks, panel discussions, and presentations.

Kate Wells, Curator of Rhode Island Collections, Providence Public Library, *right,* and Chloe Jazzy Lau, Bonner Community Fellow at the Swearer Center for Public Service, *center,* speak as part of the Love Data Week panel, *Designing maps for empowerment.*

Love Data Week is an international celebration of data that raises awareness about research data management, sharing, preservation, and reuse across disciplines. Universities, nonprofit organizations, government agencies, and corporations are encouraged to host data-related events and activities.

At Brown, Love Data Week began in the summer of 2019 when Brown science librarians met with staff from the Office of Research Integrity to discuss participating in the event hosted by the Inter-university Consortium for Political and Social Research at the University of Michigan.

“The motivation was the opportunity to highlight the research taking place at Brown and the many staff in our offices behind the scenes that support the research ecosystem,” said Andrew Creamer, open science librarian. “We saw it as a time when the Library and Division of Research could also offer various informational trainings related to our units and services.”

The planning committee solicited proposals for the 2025 theme, “Whose Data Is It, Anyway?” through Today@Brown and direct contact with researchers. “This year, we planned two keynotes around the theme based on our experience working with the researchers and knowledge of their research areas (Dr. Sheldon Holder and Stephen Buka),” said Kelsey Lubin, assistant director of research integrity and Love Data Week planning committee chair.

Dr. Sheldon Holder speaks during his keynote presentation.

Data disparity

The week started with Dr. Sheldon Holder’s keynote presentation, *Cancer: Unlocking the Code in the Cell, Clinic, and Community: Who’s doing cancer research anyway?* Holder is a physician-scientist at Brown’s Legorreta Cancer Center.

“We’re trying to cure cancer,” said Holder. “Cancer is a large problem, and large problems require large solutions.” He shared statewide cancer data with the audience. “Rhode Island has the third highest incidence rate of bladder cancer in the country. And lung cancer is the number one cause of cancer deaths in Rhode Island across all populations,” he said, adding that some counties in Rhode Island have much higher cancer rates than others.

Holder, a board-certified medical oncologist, maintains an active oncology clinic and is interested in cancer disparities and destigmatizing cancer. “As researchers, we need to think about who is doing the cancer research and whose data we are gathering,” he said. “As we plan clinical trials, are we doing our best to reach all the communities we can to have the best conglomerate of data possible?”

Holder’s research includes clinical trials and a focus on community engagement. He created the Color of Cancer storytelling website for cancer patients, the OPEN Outreach and Participatory Engagement Knowledge Board, and the Cancer Talk Café for community and cancer center members.

“We need to engage with the communities that don’t engage with us,” said Holder. “We need to be the ones to go out and initiate that engagement.”

Data governance

As clinicians care for their patients, information from those visits is often recorded within electronic health record (EHR) systems. More than just data points, these records include patient demographics and private health information that needs protection.

In a session on *The URSA Initiative: Navigating EHR Data Sharing and Access in Rhode Island*, presenters from Brown’s compliance, legal, information technology, and research offices discussed ensuring appropriate management and sharing of health data.

Brown’s Unified Research data Sharing and Access (URSA) Initiative makes electronic health records (EHR) and other health data accessible and usable for research purposes across Rhode Island, following standard policies, procedures, and protocols for appropriate sharing and use of health data.

“What’s core to this is data governance. There are four essential pillars when we think about the health data governance framework,” said Liz Chen, interim director of the Brown Center for Biomedical Informatics (BCBI). “We want to make sure we’re getting quality data. Is it accurate? Is it complete? Who is responsible and accountable for that data?”

Regulatory Advising, a new unit in the Division of Research, provides guidance and support to researchers engaged in human subjects and data research. “We’re trying to cover the whole project cycle from the beginning,” said Ximena Levy, director of Regulatory Advising. “That’s why the URSA Initiative is so important. We will have the guidelines and governance for research involving data coming from the EHR.”

“From data discovery and IRB approvals to extracting and de-identifying your data to getting it where it needs to go, there’s a lot of work to do before you can dig into the data, answer your research questions, and share your results,” said Karen Crowley, manager of health data science in BCBI. “We must maintain compliance with all legal and regulatory standards and requirements. We must act with the highest of ethics. We must collaborate. Practicing team science will ensure our research is methodologically sound and clinically meaningful.”

Stephen Buka speaks during his presentation.

Longitudinal data

A unique research study at Brown has accumulated health data spanning generations. In his keynote panel, *Participants not Subjects: Furthering Engagement and Respect in Long-term Research Projects*, epidemiology professor Stephen Buka described the New England Family Study (NEFS), a longitudinal research study he started in the 1960s that continues today.

Nearly sixty years ago, NEFS enrolled 17,000 pregnant women in a study to understand conditions during pregnancy that might contribute to neurodevelopmental problems in children, such as epilepsy, cerebral palsy, learning problems, motor problems, and blindness. Researchers were interested in exposures that might alter the development of the fetus, such as infections, toxicological exposure, cigarette smoking, PFAS chemicals, or other conditions early in pregnancy that could change the physiology of the fetus and young person.

Researchers collected blood samples from each pregnant mother, along with information about her health and medications. They also collected samples from placental tissue and from the infants, as well as information about the parents’ social circumstances.

“It was a time when there was just the glimmer of thought that maybe something that happened in the woman’s pregnancy might be relevant for a child’s health,” said Buka.

The study followed the children through time: at birth, at 4, 8, and 12 months, and at seven years. Recording growth, height, weight, perceptual and motor skills, and neurological behavior and condition, researchers examined how children developed in the first seven years of life. They continued to follow up as study participants reached adulthood, then extended the study to include the next generation.

“We decided to not just look at the babies themselves and how they’re doing at age 42, but also look at how their kids were doing with the third generation,” said Buka. “We went back to some of the original moms who had enrolled in the study, now in their seventies and eighties.”

“As the cohort is older, we’ve changed our focus to conditions that occur as people age in their forties and sixties. We’re doing cardiovascular measures with body mass index, lipid levels, and blood levels, focusing on the brain, heart, and general wellbeing.”

“No one has been doing this as long as we have,” said Buka. “What we have with this cohort is unique in the world. Nobody has the early pre-birth information, the bank of maternal information. As science has been evolving, we can do more and more from these bank samples, increasing our ability to understand factors that happen in pregnancy and how that relates to heart disease and cancer and schizophrenia and adult diseases.”

The NEFS research program has been funded by the National Institute of Mental Health, the National Institute of Neurological Disease, the National Cancer Institute, and the National Institute on Aging.

Without the funds to follow all 17,000 study participants into adulthood, NEFS researchers have seen more than 3,000 cohort members, some multiple times at different ages. “We’ve had 5,000 assessments of many, many thousands of people and sometimes their parents and sometimes their kids,” said Buka. “So we love data.”

The keynote panel included Jason, who joined the study as a baby. “I had faint memories of participating even as a young boy,” he said. “When I was contacted again in the nineties, I jumped at the opportunity. At the time, I didn’t know the impact the study was having globally, but I knew that what they were discovering was of serious value.”

Data science and AI

Several events during the week discussed artificial intelligence (AI) and data. A hands-on workshop, *Mojo: the Programming Language for Artificial Intelligence*, introduced participants to basic data science and AI programming skills.

In *Elevating the Labor of Data*, organized by the Community-Engaged Data and Evaluation Collaborative (CEDEC), participants learned about the North Burial Ground Documentation Project, a collaboration between Brown Anthropology and the City of Providence Parks Department (North Burial Ground) funded by the Data Science Institute. Leveraging artificial intelligence, a team of student researchers examines 300+ years of historical archaeological data to understand Providence’s history, including changes in religious beliefs, life expectancy, and public health through time.

In the presentation *Brown Center for Biomedical Informatics: Leveraging EHRs and AI to Advance Biomedical Discovery and Healthcare Delivery*, Liz Chen described biomedical informatics as an interdisciplinary field using data, information, and knowledge to improve human health.

Kelsey Lubin, assistant director of research integrity and Love Data Week planning committee chair, *right,* attends a Love Data Week event.

“Looking at layering AI with the learning health system, there are a number of health priorities, and we can use data, knowledge, and practice to address them,” said Chen. “The role of AI is we can take health data, and we can transform EHR data into knowledge using different AI methods like natural language processing and machine learning.”

BCBI provides training and education for researchers to gain skills in informatics, data science for health, and AI. “One of the things we do is provide consults,” said Chen. “We help them think about best practices for extracting data, managing it, and protecting the data with data privacy and security.”

In his lightning talk, *Navigating the Climate Science Deluge*, Stephen Bach, assistant professor of computer science at Brown, explained the need to expedite the review of climate science literature for scientists writing comprehensive assessments. These assessments summarize the current state of knowledge and uncertainty about various topics in climate science, drawing on all relevant research.

“As climate and climate change become essential subjects of study, there are increasingly more and more data about climate being collected and papers and research being written about climate,” said Bach. “The exponential growth of climate literature is a bottleneck.”

“Can AI help with this? I’m not a climate scientist,” said Bach. “I work in the computer science department. My interest is in machine learning and using those tools to help people in specialized domains like climate science.”

Bach is part of the team developing a solution to this bottleneck using Bonito, an open-source AI tool they will build based on large language models, or LLMs, to help find primary sources relevant to author queries. This project is part of the NSF’s Collaborations in Artificial Intelligence and Geosciences program.

“Our team spans computer science, earth, environmental and planetary sciences, and the University Library,” said Bach. “Our goal is to co-design a system that can aid in navigating this deluge of critical data.”

Data partners

Other Love Data Week 2025 events included a discussion of China’s Great Firewall, clinical data sharing using Vivli, managing research and citations with Zotero, strengthening pandemic preparedness through the Global Health Security Index, designing maps for empowerment, and more. For a complete list of presentations, visit the Love Data Week website.

“We were very excited to have representation from many different departments this year,” said Lubin.

The rise in University participation has become a noticeable trend. “Our distinctive number of events has set Brown apart from others over these last years,” said Creamer. “Kelsey has continued this tradition and expanded offerings by diversifying the disciplines and opening proposals to the Brown community, allowing more opportunities for humanities and social science topics.”

A video recap of the events of Love Data Week 2025: https://www.youtube.com/embed/S38XTsobCGY

— — —

Participating campus partners included the Department of Biostatistics, Center for Statistical Sciences, Brown University Library Data Management and Sharing Services, Research Integrity Research Data Team, Brown Arts Institute, Center for Computation & Visualization (CCV), Information Security Group (ISG), Data Science Institute, Brown Center for Biomedical Informatics (BCBI), Advance-CTR, Brown University Library Center for Digital Scholarship (CDS), Brown Multimedia Labs, Carney Institute for Brain Science, and the Swearer Center.