Ninth Annual Maternal and Child Health Epidemiology Conference / December 10-12, 2003

Pregnancy Associated Crashes and Birth Outcomes:
Linking Birth/Fetal Death to Motor Vehicle Data

LAWRENCE COOK:  Hi.  I’d like to thank Hank for his help with this work and also my other colleagues at the University of Utah, Lisa *Hyde, *Lenora *Olson, and Mike Dean.  And again, we’re just looking at motor vehicle crashes here with our study.  It’s been discussed quite heavily, I think, at the beginning of this session that prior fetal and injury research has been pretty limited, and probably the main reasons for that are coding issues.  There’s a lack of pregnancy information on crash records as well as a lack of injury information or motor vehicle crash history information on birth certificates or fetal death certificates.  So the objective of our study was to assess the effect of involvement in a motor vehicle crash on the likelihood of adverse fetal events, and we used probabilistic linkage to combine motor vehicle crash data with birth certificates and fetal death certificates.  And since this conference had linking as a theme, I thought I would take a moment to try to dispel the myth that deterministic linkage is best, and that only if you can’t do deterministic should you go to probabilistic linkage, because then you’ll only be probably right.  So if you’ll just bear with me, I’m a statistician and I kind of get excited about this.

Anyway, probabilistic record linkage is simply a method that uses statistical properties of your variables to determine whether or not two records refer to the same person and event.  The two properties that it looks at are the discriminating power and the reliability; it uses these to build an odds ratio, which can in turn be converted into a probability.  The reliability, which if you read the literature is often denoted by the letter M, is simply this: given that you have two records that you know are a true match, what’s the probability that a given field agrees?  So you know these two records match; what’s the probability that their names are the same?  What’s the probability their dates of birth are the same?  If these records really are a true match, then really the only way they should be different is if there was an error.  So you as the programmer or the researcher can actually estimate this reliability as one minus the probability of an error, and you would enter that into the program.  The discriminating power, which is often denoted by the letter U, is this: given that you know two records don’t match, what’s the probability that they agree on a given field?
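A minimal sketch of how M and U turn into an odds multiplier, assuming the standard Fellegi-Sunter style formulation (the talk does not name a specific formulation): agreement on a field multiplies the odds of a true match by M/U, and disagreement multiplies them by (1 - M)/(1 - U).  The function names and the 2 percent error rate below are illustrative assumptions, not values from the slides.

```python
def reliability_from_error_rate(error_rate: float) -> float:
    """M: probability a field agrees, given the pair is a true match."""
    return 1.0 - error_rate


def agreement_factor(m: float, u: float) -> float:
    """Multiplier applied to the odds when the field agrees."""
    return m / u


def disagreement_factor(m: float, u: float) -> float:
    """Multiplier applied to the odds when the field disagrees."""
    return (1.0 - m) / (1.0 - u)


# Hypothetical inputs: a 2% recording error rate for gender, and two
# equally common gender values, so unrelated records agree half the time.
m_gender = reliability_from_error_rate(0.02)
u_gender = 0.5

print(agreement_factor(m_gender, u_gender))     # a bit under 2: odds roughly double
print(disagreement_factor(m_gender, u_gender))  # well below 1: odds drop sharply
```

A field with many possible values has a small U, so agreement on it moves the odds much more than agreement on a two-valued field like gender.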

So you’ve got two records that refer to completely different events or two different people; what’s the probability their names are the same?  What’s the probability their genders are the same?  And this can be calculated just based on what you see in the data.  You don’t have to enter this yourself.  So what I want to do is go through that record pair that I had up there previously and just show you how probabilistic linkage will eventually arrive at a decision on whether or not these are the same person.  Mostly I work with crash and ambulance records, so my example is a crash record and an ambulance record.  It works exactly the same way no matter what your databases are.  I’m going to assume I have one ambulance record and 100,000 crash records, and I know that the true match is somewhere in those 100,000 cases.  So if I picked one crash record at random, the odds of finding that true match are 1 to 99,999, because there’s one correct way and 99,999 incorrect ways to do it.  So now you just simply step through the fields that you have.  And I do have some reliability and discriminating power information up here, just in case you want to make the calculations yourself, but I’m not going to go into it.  Don’t worry about where the numbers came from; they’re completely made up to make the example easy.
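The two quantities described here can be sketched as follows.  This assumes one common way of estimating U from the data, namely the chance that two records drawn at random share a value, approximated by the sum of squared value frequencies; the talk does not say which estimator its software uses.  The function names and the toy gender list are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def estimate_u(values):
    """Approximate chance that two unrelated records agree on this field."""
    counts = Counter(values)
    total = len(values)
    return sum((count / total) ** 2 for count in counts.values())


def prior_odds(n_candidates: int) -> Fraction:
    """Odds of picking the one true match at random: 1 to (n - 1)."""
    return Fraction(1, n_candidates - 1)


# Toy data: two equally common gender values.
genders = ["F", "M"] * 50
print(estimate_u(genders))   # 0.5
print(prior_odds(100_000))   # 1/99999
```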

Anyway, let’s look at the first names.  The first names agree; what does that mean in terms of the odds that these are a true match?  Well, they’ve improved; it’s now 1 to 1,111.  I’m going to treat the middle and last name as the same field, so does one of these agree with the other one?  Well, Smith matches Smith Sanchez, so what happens now?  My odds have improved to 1 to 51.  What does this mean intuitively?  If you could find the set of record pairs where the first and last names agree, you’d find one true match for every 51 false matches.  Next, the genders agree.  There are two outcomes for gender, so agreement happens to cut the odds against a match in half; that’s a nice property.  Now I’m going to be very conservative.  I see that the month and day of birth agree, but the year of birth disagrees.  It only disagrees by one, and you can actually build in a tolerance for this if you want, but I’m going to be conservative and penalize myself.  Even so, my odds are now down to 1 to 6; people go to Vegas on worse odds than this.

Looking at the date of the event, we see that the month and the day both agree.  If I assume I only have one year of data, it doesn’t really help to know the year agreed.  But the odds are now drastically in my favor, 60 to 1.  So among all pairs of records that look like this, there’d be 60 true matches for every one false match.  Adding more information, we have the time: the hour agrees, the minute disagrees; again, you can build in a tolerance to handle this.  But anyway, my odds are now much better, 1,699 to 1.  Noticing it happened in the same location, they’re even better, 16,990 to 1.  So at the end of the day you ask yourself, do I have the one false match, or is it more likely that I have one of the 16,990 true matches?  These odds can be converted into a probability: 16,990 divided by 16,991, which is about .99994 that these are a match.  Yeah, they’re probably a match, but it’s a lot better than just saying it’s probably correct; you can say exactly how correct you think it is.  And when we do linkages we set a sort of threshold or alpha level, so anything that’s above .9, meaning a probability of .9 of being correct, we keep as a true match, and everything else gets rejected as a false match.  It gives you a way to standardize your process, make it repeatable, and feel pretty good about it, I think.  So back to the linkage of motor vehicle crashes with birth and fetal death certificates.
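The walk-through above can be retraced in a short sketch that takes the running odds quoted in the talk, converts each to a probability with odds / (1 + odds), and applies the .9 cutoff.  The per-field M and U values from the slide are not in the transcript, so the odds are taken as quoted rather than recomputed; the labels and function names are illustrative.

```python
from fractions import Fraction

# Running odds in favor of a true match, as quoted in the talk.
# (The gender step falls between the last-name and date-of-birth entries.)
running_odds = [
    ("random crash record", Fraction(1, 99_999)),
    ("first name agrees", Fraction(1, 1_111)),
    ("last name agrees", Fraction(1, 51)),
    ("gender and date of birth compared", Fraction(1, 6)),
    ("event date agrees", Fraction(60, 1)),
    ("event time compared", Fraction(1_699, 1)),
    ("location agrees", Fraction(16_990, 1)),
]

THRESHOLD = 0.9  # cutoff described in the talk


def odds_to_probability(odds):
    """Convert odds in favor (e.g. 60 for '60 to 1') to a probability."""
    return odds / (1 + odds)


def is_match(odds, threshold=THRESHOLD):
    """Keep the pair only if its match probability reaches the threshold."""
    return odds_to_probability(odds) >= threshold


for label, odds in running_odds:
    p = float(odds_to_probability(odds))
    print(f"{label:<36} p = {p:.5f}  keep = {is_match(odds)}")
```

Only the last three rows clear the threshold, which matches the point in the talk where the odds swing in favor of a match once the event date is compared.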

So from Utah we had the statewide motor vehicle crash data, the birth certificate data, and the Utah fetal death certificate data for 1992 to 1999.  Because identifiers on the occupants were limited, we had to limit our study to drivers only.  We also limited the birth certificates to single live births, and in our state fetal deaths are only reported after 20 weeks gestation and exclude elective abortions.  The variables that we had for our linkage included the mother’s first and last name, the mother’s date of birth, the date of the infant birth or fetal death, and the date of the crash.  And we were able to compare the date of the crash with the gestational age and the date of last menses to make sure the crash really happened during the pregnancy and not just prior to it.  We did several descriptive analyses as well as some logistic regression modeling to try to get at the relationship between motor vehicle crashes and seatbelt use and adverse fetal events.  And we looked at several adverse outcomes, including low birth weight, excessive maternal bleeding, fetal distress, and placental abruption.  Our logistic regression models could control for a lot of things.
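The check that a crash fell within the pregnancy can be sketched as below.  This is an assumption about how such a check might look, not the study’s actual code: the function and parameter names are hypothetical, and the fallback that estimates the date of last menses from gestational age is one simple option.

```python
from datetime import date, timedelta
from typing import Optional


def estimate_last_menses(delivery_date: date, gestational_age_weeks: float) -> date:
    """Back-calculate the date of last menses from gestational age at delivery."""
    return delivery_date - timedelta(weeks=gestational_age_weeks)


def crash_during_pregnancy(
    crash_date: date,
    delivery_date: date,
    last_menses: Optional[date] = None,
    gestational_age_weeks: Optional[float] = None,
) -> bool:
    """True if the crash falls between last menses and the birth or fetal death."""
    if last_menses is None:
        if gestational_age_weeks is None:
            raise ValueError("need last_menses or gestational_age_weeks")
        last_menses = estimate_last_menses(delivery_date, gestational_age_weeks)
    return last_menses <= crash_date <= delivery_date


# Hypothetical record: delivery on 1 Sept 1995 at 39 weeks gestation.
delivery = date(1995, 9, 1)
print(crash_during_pregnancy(date(1995, 5, 10), delivery, gestational_age_weeks=39))
print(crash_during_pregnancy(date(1994, 10, 1), delivery, gestational_age_weeks=39))
```

The second call describes a crash before the estimated start of the pregnancy, so a linked pair like that would be dropped from the analysis.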

I simply put this slide up here so you could see what they are, but also to point out that because of our linkage we could control for factors from the birth certificate as well as from the crash database, so we were able to mix information across the two databases.  Looking at our results from the linkage to the birth certificates, we found that roughly three percent of the women were a driver in a motor vehicle crash during their pregnancy.  Comparing some of the demographic information between pregnant women in crashes and pregnant women not in crashes, there’s not much difference, although it seems that women who were in crashes were slightly younger and slightly more likely to smoke.  All these factors were controlled for in the logistic regression model.  Looking at the trimester in which the crash occurred, it does look like there might be a slight increasing trend, but really all these percentages are between about 30 and 35 percent, so not wildly different.

So, the results from our logistic regression model: not even looking at seatbelt usage, just looking at crash risk, we found that there really was no difference in the risk for any of the adverse outcomes that we looked at between women who were in crashes and women who were not in crashes.  But then we started to look at seatbelt usage, and this shows that pregnant women in crashes who were wearing seatbelts are really quite a bit different from pregnant women who were not wearing seatbelts.  In fact, the women who were not wearing seatbelts are younger, more likely to smoke, more likely to have drunk alcohol during the pregnancy, less likely to have completed high school, and less likely to have received care in the first trimester.  So this is a big risk group for lots of other reasons than just their seatbelts.

But going back to the logistic regression model, what do we find?  And these results are after controlling for all those other bad things that these women happen to do.  We do find that, comparing women who were not wearing their seatbelts at the time of the crash to women who were not in crashes at all, the women not wearing seatbelts were at an increased risk of having low birth weight babies.  Also, when we compared them to women who were wearing seatbelts at the time of the crash, we found women not wearing seatbelts were twice as likely to experience excessive maternal bleeding during the delivery.  Looking at the results from the fetal death certificates, there were 2,645 fetal deaths reported in Utah during the study period, and we linked to 45, just under two percent.  And since we only had 45 outcomes, it didn’t give us a whole lot of outcomes to build a gigantic logistic regression model.

This is just a crude odds ratio for fetal death, and we found that pregnant women who were not wearing their seatbelts were nearly three times more likely to experience a fetal death compared to pregnant women who were wearing their seatbelts at the time of the crash.  I’m just trying to get out of here quick so we’ll have time at the end.  The conclusions from this study: we found that probabilistic linkage really is a feasible method for combining the crash and birth records, and it gives you the ability to compare and use data across both of those data sets in your analyses.  It also provides you with a population-based comparison group that was not involved in crashes, and it kind of limits the effect of recall bias, because the women don’t have to remember if they were in a crash or not and *(inaudible) follow up; we just have the data there.  And also it appears that failure to wear a seatbelt may increase the likelihood of some adverse fetal outcomes.  Thank you.
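For readers who want to see what a crude (unadjusted) odds ratio like the one mentioned for fetal death involves, here is a minimal sketch from a two-by-two table, with a Wald-type 95 percent confidence interval on the log scale.  The counts below are made-up placeholders, not the study’s data, which the transcript does not give; the function names are hypothetical.

```python
import math


def crude_odds_ratio(exposed_cases, exposed_noncases,
                     unexposed_cases, unexposed_noncases):
    """Odds of the outcome among the exposed divided by odds among the unexposed."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


def wald_confidence_interval(a, b, c, d, z=1.96):
    """Approximate 95% interval for the odds ratio, computed on the log scale."""
    log_or = math.log(crude_odds_ratio(a, b, c, d))
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * standard_error), math.exp(log_or + z * standard_error)


# Placeholder counts only (NOT from the study):
# unbelted: 8 with the outcome, 92 without; belted: 4 with, 96 without.
a, b, c, d = 8, 92, 4, 96
print(crude_odds_ratio(a, b, c, d))
print(wald_confidence_interval(a, b, c, d))
```

With only a few dozen outcomes the interval around a crude odds ratio tends to be wide, which fits the remark that 45 linked fetal deaths were too few for a large regression model.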