Ninth Annual Maternal and Child Health Epidemiology Conference / December 10-12, 2003

Approaches to Using Data Linkage

MILTON KOTELCHUCK:  I was asked today to speak on one aspect of linked data systems, which is really the political side of data linkages.  There are technical sides; you've heard Jane Lazar from our Project, among others.  You can see how technically advanced I am.  I still do things with those cards with the punch-holes.  I just want to first introduce our team.  I'm going to talk about PELL, the Pregnancy to Early Life Longitudinal Database, and this is a three-way partnership, a public/private partnership.  You can see the names of the various people from Boston University School of Public Health.  The next speaker, Angela, will speak from the Mass Department of Public Health, and I think some of you know Wanda Barfield, who's not here in the audience at this moment, and Kay Thomashek, who's both our Project Officer and Collaborator on this project from CDC.  Let me tell you a little bit about PELL and then where this talk is supposed to go.

As I mentioned, it's a public/private partnership, meaning this is an effort to do a data linkage project that is not solely within State government, but takes advantage of the strengths of having an academic partner.  We'll talk a little bit about that.  Angela's really going to talk a little bit more about this than I am.  We were funded originally by CDC and one of the ASPH part activities, and we were funded to assess the impact of the prenatal environment on subsequent child health.  As our own group's interests have merged, we also look at the impact of the prenatal environment on subsequent child and maternal health.  Both of those things are in our system.  PELL utilizes a broad range of public health data.  That is to say, we have a lot of breadth to our data system, and it also has a longitudinal component.  So both breadth and longitudinality are key aspects of our study.  And the PELL system allows for many different kinds of linkages and analyses, including what we like to think of as dyadic linkages, or mother/infant linkages, as well as maternal linkages, child linkages, and multiple sibling and family linkages.

The data system I'm going to describe to you is expandable and we believe that it offers some conceptual and practical experiences for other States that are considering this.  I'm just going to give you a couple quick slides so you have some idea of what we're talking about and then go into the talk of the day.  This is our core data set.  The center of our data system is really the linkage of birth certificates and the hospital discharge database.  This is actually richer than just the birth certificate, allowing us to take advantage of both of these data sets that are available to us, and it allows us to link both mothers and children in our system.  We also include all fetal deaths.  We started this system in Massachusetts in 1998 and currently have 320,000-plus infant records and 280,000 maternal records, since many women have more than one birth in this period of time, and there are also multiple births.  To the core of our system, we've added a whole series of different databases.  You can think of them this way: the blue ones have to do with health status, so we've added the birth defects registry.  I once made this presentation (inaudible) and we had a slide that showed that we stopped at infant deaths at the end of a year, so we've added in all the subsequent child deaths in our system and also maternal death files.  We also happen to have linked to the linked infant birth/death file, and to Mass. pro-quality, which was a Medicaid effort looking at the effectiveness of Medicaid, and that allowed us to add some clinical data.

And the Hospital Discharge data in Massachusetts also includes observational stay data and emergency room data.  We just got the emergency room data, really, a few months ago and are getting it up and running.  In addition, we have a series of programmatic databases.  Early Intervention is one of the key ones that a lot of people have been interested in.  There are our Children With Special Healthcare Needs Programs in the State, and Healthy Start, which is a special Healthy Start Program, a wrap-around Program in Massachusetts.  It's different than the Federal Healthy Start Program.  Childcare Coordination is a Title V Program, and we're in the midst of working out our details with WIC to add WIC to our database.  And the ones in orange are contextual data sets; if you think about it, it would be possible to add them to our system, although we haven't yet done this.  Just for your own edification, I listed some of the kinds of data sets that are linkable and linked in our system.  We tried to figure out a slide to show longitudinality, although this slide doesn't do it justice.

So if you take a birth in a given year, let's say a 1999 birth, we can follow either the mother or the child after the delivery for subsequent hospitalizations and subsequent emergency room usage during that year.  Not only in that year, but in the subsequent year or any succeeding year, because once we've got an identifier we can follow these people all the way through.  And, likewise, we could go backwards.  So once we know a link, we can look at what happened during the course of the mother's pregnancy, and look at whether there were antenatal hospitalizations and emergency room visits.  And also whether you participated in programs prior to the delivery or after.  I don't think this slide does justice to longitudinality, but you get the idea.
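As a rough illustration of the forward and backward follow-up described above, here is a minimal Python sketch.  It is not the PELL team's code; the link IDs, field names, and sample events are all hypothetical, and it assumes each event already carries the generated link identifier.

```python
from datetime import date

# Hypothetical data: link_id -> delivery date (e.g. a 1999 birth).
deliveries = {"M001": date(1999, 6, 15)}

# Hypothetical events from other data sets, already tagged with the link ID.
events = [
    {"link_id": "M001", "kind": "hospitalization", "date": date(1999, 3, 2)},
    {"link_id": "M001", "kind": "emergency_room", "date": date(1999, 9, 20)},
    {"link_id": "M001", "kind": "hospitalization", "date": date(2001, 1, 5)},
    {"link_id": "M002", "kind": "emergency_room", "date": date(1999, 7, 1)},
]


def follow(link_id, deliveries, events):
    """Split one person's events into those before and those on or after delivery."""
    delivery = deliveries[link_id]
    mine = sorted(
        (e for e in events if e["link_id"] == link_id),
        key=lambda e: e["date"],
    )
    before = [e for e in mine if e["date"] < delivery]
    after = [e for e in mine if e["date"] >= delivery]
    return before, after


before, after = follow("M001", deliveries, events)
```

Once the identifier exists, the same lookup works for any succeeding year, which is the point made in the talk; the split at the delivery date is only one possible way to organize the events.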

And, likewise, we can do maternally linked databases, which is what most people have been talking about at this conference, although not exclusively, by linking from one birth to another for the same mother.  And, Jane, did you already present on the maternally linked database?

JANE LAZAR:  Tomorrow.

MILTON KOTELCHUCK:  Jane Lazar from our team is going to talk to you about the maternal linkage.  I think we have some really clever ideas, and Jane, in particular, has developed some really clever ideas about how to do maternal linkages, especially if you don't have a unique identifier, like a social security number.  And, again, this is just a slide to give you a sense of the longitudinality of our dataset.  And our hope is that we'll be able to follow from birth through age five or six, through the entrance into school, although in principle you could go farther.

In thinking about developing a linked data system, there really are seven big areas that you need to think about.  There may be more, but these are the ones we conceptualize.  It doesn't just happen; there are a lot of things that you have to do to make a data system work.

And the first thing you really need to do is you need to have some conceptualization of what a linked data system is.  And as I've listened to people talking in this meeting, you know, there are various ideas people have for whether you have a medical record, or whether you have a database, whether it's on time, whether it's retrospective.  But you need a vision of what you're actually doing to give guidance.  Most of us though, to be fair, started by linking one data set with another and kind of got into this business.  And only now that we've started linking a lot, are we starting to have a vision of where we're going.  But that's kind of important.  I'm not going to speak a lot about that today.  The second issue is a series of technical linkage issues.  Again, you spent two days in this conference and yesterday's meeting had a lot of discussion about that.  I'm going to pass on that in my talk today and move onto the third topic, which is how do you actually get access to the data? 

How do you deal with confidentiality issues?  Because, in addition to the technical side, there is the political side of getting linked data systems; both of these are critically important, and that's my task.  And there are all these other issues you have to worry about in making a system look good.  So moving on to access to databases and confidentiality issues, let me just start by saying it is either the major issue or one of the two major issues in constructing linked databases.  It takes a lot of time.  You have to acknowledge that linked databases require confidential data.  You can't avoid confidential data.  If you don't have it, you can't link your databases, so we're stuck in this world.  Getting access involves both political and professional concerns.  It's not just asking for the data.  You have to work through a lot of politics.

People have all those pictures of silos, and getting people to relax and allow this data to be used and to feel comfortable about its use is really a critical issue.  In addition, in a public/private partnership, it's a more complex issue.  If you're only in State government, you can sometimes finesse things within your agency, but when you're talking about working outside, and even within an agency, it's not such an easy thing to deal with.  HIPAA further complicates access to databases.  I'm going to talk a little bit more about that later.  And in my opinion, technical issues are no longer the principal barrier to linked data systems.  That may have been true 10 years ago, but the capacities and the really great analytic insights that people are having have allowed us to make great progress on linkage issues.  They still are important, but I believe the political ones are the ones that take the greatest amount of time.  Our core linkage is working with the Mass Department of Public Health, the holder of many of these data sets.  I want to acknowledge that the Mass Department of Public Health had extensive experience and expertise in addressing confidentiality concerns and requirements; they didn't come to this for the first time.  They had been struggling with this for many, many years by the time this Program entered into their lives.  They have a Committee called the RADAR Committee, the Research and Data Access Review Committee.  That's a nice little name they have.  It provides an institutional mechanism to review research and confidentiality requests using State Public Health databases.

In my opinion, anybody who has a linked data system has got to develop some mechanism for decision rules about who can have access to it, to assure that things are done right.  And in Massachusetts such a Committee actually existed prior to the development of this Project, and it's through that Committee that much of the work has actually taken place.  And as I'll explain, the PELL Project, with the RADAR Committee, developed and codified an extensive set of procedures about how to facilitate implementation and analysis.  Now I think the reason that this Project worked is that we made three critical decisions right in the beginning.  And I think they really serve us well.  First of all, all data linkage activities using confidential data are done and held at the Mass Department of Public Health.  So although I'm at Boston University, all the work, all the linkage activities, all the names, where the data resides, is at the State.  I know that CDC often has fairly similar arrangements for some of the data sets that it has, where it'll allow people to work with them, but you have to be at the site.  So work with confidential data is done at the State.

Second, in our particular case, we did not create a full duplicate database, with all of a sudden all these data elements and data sets linked together on the side.  So there really was not exactly a permanent linked database.  So when people talked about the PELL data system, really the PELL data system was kind of a virtual database.  We used a series of linker programs with generated ID numbers and a series of computer programs that can extract linked data from different data sets.  It is true that we did put an identifier on the existing data sets, so in that respect, maybe it isn't so virtual.  But, nonetheless, in many ways, if you went and said, "Let me see what the PELL database looks like," you can't see it, you know.  It is really a series of programs that extract data and allow linkages across different data sets.  And finally, only what we used to call, prior to HIPAA, de-identified data sets can then be extracted from this linked database for analysis.  And that analysis can be done either at the Mass Department of Public Health or elsewhere.
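The "virtual database" idea above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not the actual PELL linker programs: the data set contents, record keys, generated IDs, and the list of identifying fields are all hypothetical.  The source data sets stay separate, a crosswalk holds only generated IDs and record keys, and an extract function assembles linked rows on request while refusing to release identifying fields.

```python
# Hypothetical source data sets, each keyed by its own record number.
birth_certificates = {
    "BC-1": {"mother_name": "Jane Doe", "birth_weight_g": 3200},
    "BC-2": {"mother_name": "Ann Roe", "birth_weight_g": 2100},
}
hospital_discharges = {
    "HD-9": {"patient_name": "Jane Doe", "length_of_stay_days": 2},
    "HD-7": {"patient_name": "Ann Roe", "length_of_stay_days": 6},
}

# Crosswalk: generated link ID -> record key in each source data set.
# No merged copy of the data is stored anywhere.
crosswalk = {
    "P0001": {"birth": "BC-1", "discharge": "HD-9"},
    "P0002": {"birth": "BC-2", "discharge": "HD-7"},
}

# Fields assumed to be identifying; these never leave the holding agency.
IDENTIFYING_FIELDS = {"mother_name", "patient_name"}


def extract_linked(fields):
    """Build an analytic extract on demand, keyed only by the generated ID."""
    blocked = IDENTIFYING_FIELDS.intersection(fields)
    if blocked:
        raise ValueError(f"identifying fields cannot be extracted: {sorted(blocked)}")
    rows = []
    for link_id, keys in crosswalk.items():
        merged = {
            **birth_certificates[keys["birth"]],
            **hospital_discharges[keys["discharge"]],
        }
        row = {"link_id": link_id}
        row.update({f: merged[f] for f in fields})
        rows.append(row)
    return rows


extract = extract_linked(["birth_weight_g", "length_of_stay_days"])
```

In this sketch the only persistent linkage artifact is the crosswalk, which mirrors the remark that you cannot "see" the PELL database as a single file.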

In this particular case, at Boston University.  So, basically, linkages are done at the State.  It's a virtual system, and the analysis only uses de-identified, or now in HIPAA terms, limited use data sets.  This kind of structural thinking allowed us to make a fair amount of progress at working together.  Getting approval required many different steps.  And just to list them for you: Massachusetts has, which was really helpful, and a few other States have this, a section of the Massachusetts general law, known as Section 24AB of Massachusetts General Law 111, which allows the Commissioner of the Mass Department of Public Health to authorize confidential studies to look at mortality and morbidity of the citizens of the State.  This allowed the Commissioner to give us a writ of confidentiality, first, to construct the database, to allow these data elements to be linked together.  Secondly, we, the whole PELL team and many others, worked to develop the series of inter- and intra-agency agreements to access the databases.  These are the sort of famous memos of understanding between different groupings to get data from one place to another.  I have to say one of the hardest things for a person on the outside is to be negotiating among agencies in the same department or between departments where I have no standing whatsoever and I'm trying to get people to stop being in their silos.  People protect their data very strongly, and this took a lot of work and continues to take a lot of work.  We all signed confidentiality pledges that we would not release any of the data by name.

It's really part of the 24AB process.  Then we went back to the RADAR Committee a second time with our first analytic request.  So we had one permission to do the linkage, and then for each and every major analysis we go back to the RADAR Committee, which is that internal body that can approve things, to get permission for a specific study.  And in our case, the first analysis we did was a set of analyses concerning twins.  Only after we got the approval of the second IRB did I then go to my own University and get IRB and HIPAA approval for doing an analysis with what, from that point of view, was a de-identified data file.  This was like a simple secondary data analysis; there was no problem getting approval in my institution.  And then, if that wasn't enough, we had to go to CDC and get approval from CDC to do this study.  So every study has a lot of steps, which is why this takes a fair amount of time to get through.  And, I believe CDC had a typo or something; you know, crazy things like that slow you down.  It takes many months to do all this.  Talking on HIPAA for a second, one of the nice things is that there's a clause in HIPAA that says if State laws are stronger than HIPAA, then the State law overrides HIPAA.  And if you think about the P in HIPAA as "portability," the State law in Massachusetts essentially says, "You can do a confidential data study, but you cannot port, you cannot transmit that data any further," and portability is one of the key elements in the HIPAA regulation.

HIPAA allows data to move, but if you can't move it any farther, HIPAA is overridden in Massachusetts, and other States might have this capability.  The fact that PELL was sort of a virtual linked database made this not such a big issue for HIPAA compliance.  The actual construction of the database was seen as a government activity.  However, once you develop an analytic file, those analytic files are subject to the HIPAA law, because then you are doing analysis with the data.  In addition, we converted a fair amount of our data to try to be more HIPAA sensitive, so we switched a lot of specific dates to time intervals, like how many days post-delivery an event happened, as opposed to an actual date, or switched from a birth date to an age.  Things like that allow you to be a little more HIPAA sensitive.  But we are getting HIPAA waivers for all of our analytic studies.  If you listen to this, this is a lot of activity you have to do, getting seven approvals.  What do you do if you just want to check whether this variable and this variable go together?  So we worked out a series of procedures with the RADAR Committee to make this living and breathable, as opposed to too regulated.
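The date conversions mentioned above (days post-delivery instead of an event date, age instead of a birth date) can be sketched as follows.  This is an assumed, simplified illustration in Python with hypothetical field names; it is not the PELL team's procedure, and by itself it does not establish HIPAA compliance.

```python
from datetime import date


def days_post_delivery(event_date, delivery_date):
    """Days from delivery to the event (negative if the event came first)."""
    return (event_date - delivery_date).days


def age_in_years(birth_date, as_of):
    """Completed years of age on the as_of date."""
    years = as_of.year - birth_date.year
    # Subtract one if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def to_analytic_record(record):
    """Replace exact dates with relative intervals before analysis."""
    return {
        "link_id": record["link_id"],
        "days_post_delivery": days_post_delivery(
            record["event_date"], record["delivery_date"]
        ),
        "maternal_age_at_delivery": age_in_years(
            record["mother_birth_date"], record["delivery_date"]
        ),
    }


# Hypothetical linked record that still contains exact dates.
raw = {
    "link_id": "P0001",
    "delivery_date": date(1999, 6, 15),
    "event_date": date(1999, 7, 5),
    "mother_birth_date": date(1970, 8, 1),
}
analytic = to_analytic_record(raw)
```

The analytic record keeps the timing information needed for longitudinal analysis while dropping the calendar dates themselves.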

So we worked out a series of agreements on how to deal with major PELL initiative grants, for which we get a 24AB, but also the ability to do pilot analyses, and to do analyses that the Department of Public Health initiated.  You know, here we have this partnership, but there are many, in the audience too, who just say, "Gee, once the Department discovers this, they would like to do this analysis and that analysis."  So we worked out a series of ways of getting quick approval from the RADAR Committee, the essence of which is there's a person or two who reviews this, and if they think there's a major issue then it comes back to the RADAR Committee, but otherwise you can do these activities.  We also worked out a means for amending approved analytic agreements.  We worked on operational procedures for the dissemination or release of data.  This is a really important issue: what data can be released.  And, again, we go through both the RADAR Committee and a person or two from the RADAR Committee who works with us on this.

We also work to ensure that any agency or program that contributed data will have some prior approval of analytic uses of their data.  This is to say that if there's a major project, if WIC gave us their data, then somebody can't just go off and do an analysis of WIC without consulting the Program.  We have been trying to develop a series of simple marker variables, whether you were in WIC, whether you were in EI, that would be more a part of a core data set that didn't need approval every time they were used.  And there are procedures that we really haven't developed.  We haven't figured out how to deal with third party requests, people from the outside who say, "Gee, I now know you have this great data set, will you do this for us?"  So, this is an evolving activity and Angela will talk, I think, about this.  We also worked out a series of procedures for how to deal with confidentiality issues within our own project, so we have a method for approving internal pilot data requests from our project, so people don't just go off and start asking for everything.

We have a system for doing that.  We give formal approval to all major studies that come through PELL.  For each analytic project we always identify a lead person and the team members, and we have a mechanism for adding an external researcher for a specific analytic project.  Like our project on twins: there's a person in the State who specializes in this who's working with us on this particular study.  The bottom line on confidentiality and access issues is this: working through confidentiality and access issues takes at least as much time as the technical linkage activities.  And I see a lot of nodding heads, so I think you just have to be prepared and know about this.  If you're aware of it, you don't get quite as frustrated as you would if you think it's going to happen fast, because it isn't.  Respecting and working constructively with the Mass Department of Public Health confidentiality procedures was critical.  I think you have to be very respectful of what existing confidentiality procedures there are.  I do think our success to date reflects a culmination of years of prior working together.

I would normally recommend people start on one project, build some confidence and trust, and get people working with each other before you expand to what seems like a massive system.  But in Massachusetts there's been a long history of projects and people working together that has facilitated this activity.  And I think other States and Cities could adopt some of the ideas that we have for their own work.  Database access and confidentiality can be successfully addressed.  We exist.  PELL exists.  Really, we're doing good work.  It's getting used more and more by more and more people.  And I believe we provide a platform for expanded (inaudible) longitudinal and programmatic research on maternal and infant health services in the State.  Since this is a University/State partnership, I'm going to say two or three things on the University perspective, and Angela's going to talk a little on the State perspective in a minute.

These are some advantages from a University perspective.  Going back to our partnership, it does allow for increased access to linked public health data for academic researchers, as well as for the people in the Mass Department of Public Health.  This linked data system is a benefit to both groups.  It particularly strengthens our relationship with the Mass Department of Public Health.  First of all, from a University and faculty point of view, it enhances our understanding of what the current public health issues are.  What are the issues that the Department is really worrying about, so that we are working on those?  It permits us to provide much better technical assistance and consultation now that we know what the issues are and we blend our analyses with the current issues, and it allows us to be involved in evidence-based MCH policy development.  And in another talk I can give you some examples of how we've done all of these things.  It also, on a nice note, allows us to be involved in cutting edge MCH epidemiology projects.  This is really good stuff.  We're really enjoying it.  We're learning a lot of new things.  And it's really good to be involved in innovative longitudinal research, which is what this database allows.  And, as I say at the end, it's kind of fun.  It's fun to forge new cooperative relationships, partnerships and all of these different kinds of things.  On the other hand, there are challenges that remain.  PELL is still not fully institutionalized in this State and, Angela and I have opposite slides on this.  I see it as not fully institutionalized.  She'll tell you her perspective in a minute.

I think that the Mass Department of Public Health is only beginning to appreciate the capacity that this new linkage system allows for them.  We are still vulnerable to organization and personnel shifts.  Massachusetts, like everybody else, has lost a lot of money in this particular period of time, and a lot of people have moved jobs and other things.  You know, you work with people and then they disappear; that's a difficult kind of thing to deal with.  Getting annual permissions for access to some of the PELL data sets is another challenge.  Generally we do this on a year-by-year basis.  You saw how many data sets we have.  It's almost a full-time job just making sure that all of our agreements are current.  And sometimes authority for resolution of PELL problems is a bit diffuse, although the RADAR Committee is where things should be resolved.  Some of these issues are political and it's not always clear how to get resolution.  Brokering inter-agency, inter-governmental data sharing agreements, especially with provisions for PELL usage, can be kind of difficult.  And sometimes the boundaries between linking data, which is what we do, and actually performing the analyses for the Mass Department of Public Health are unclear; our relationship is still evolving.  We do many runs for them.  We have the personnel sometimes.

If people are short, they say, "Well, why don't you do the run?  You've got the data."  It's not always clear whose responsibility it is.  But in the end, we think that PELL and its development have allowed us to develop a partnership that's really important for improving MCH epidemiology, however defined, research, practice and policy in the State.