MCHB/EPI Miami Conference — December 7 - 9, 2005
Using the Population Attributable Fraction (PAF) for Public Health Assessment — Transcript
CRAIG MASON: Let me get started. What I'm talking about is a strategy; the jargon will be a little tricky because we're using the term sequential, though we're talking about different methods. The strategy I'm talking about is for when you have a specific causal model that your data corresponds to. And let's kind of skip through this a little bit. We talked about this; the key point, again, is that the total aggregate PAF, when you look at both factors together, is valid, though there's a problem in looking at the individual ones if they're confounded, but that's another story. I've got all these extra slides, but we're running out of time, so they're in the handouts, but don't worry about them--I can't take us through all of them.
Anyway, the strategy that I'm talking about, you can think of it as somewhat parallel to partitioning an R squared in general linear modeling or multiple regression. We are entering variables hierarchically in a specific order, looking at the effect of an exogenous variable, that first variable that you enter that's outside of the influence of the other variables, and then sequentially adding subsequent variables, each time controlling for the effect of the variables earlier in the model. It's kind of interesting. It has a lot of parallels with R squared in the types of data and situations where you can use it. And it's fundamentally different from approaches that you can think of as partialling out the beta in a regression or other model. That's actually led to some problems, I think, where we see some of the strange results people have noted when they tried to calculate an adjusted PAF by only looking at some adjusted beta coefficient.
To give you kind of an illustration of how this might be used, I'll talk a little bit about a model where we have both smoking and low birth weight impacting mild mental retardation, and smoking also has an effect, we know, on low birth weight: if the mother smokes during pregnancy, it leads to low birth weight. So smoking has a direct effect on mild mental retardation, but it also has an effect by increasing rates of low birth weight, which then lead to increased rates of mild mental retardation. So the total effect of smoking, its true impact, can go through multiple pathways. The total effect of low birth weight, on the other hand, part of that is its unique effect on mild mental retardation. But part of the effect you see when looking at a PAF for low birth weight is, in fact, the result of increased rates of low birth weight due to maternal smoking.
You can visually see that as well through Venn diagrams, which didn't quite show as well. We have the effect of smoking and the total effect of birth weight influencing mild mental retardation, and the total effect of smoking is this area A plus this area C. The total effect of birth weight is the area B plus that area C. And that went past. Let's do this really quick. The area C is that effect of birth weight that is, in fact, due to or associated with smoking. The total effect of smoking and birth weight together is A plus B plus C, plus, as we'll see, there's some additional complex interaction that may come into play that's also part of that effect. The strategy that I'm talking about, again, and you can see that I've actually included some variance partitioning formulas on the bottom, really parallels this approach, in that we're first looking at the total effect of smoking. So smoking gets credit for its direct effect. It gets credit for any indirect effect it has through birth weight. The birth weight effect, on the other hand, is what's left even after we statistically partition out or remove the effect of smoking, so it's the effect of birth weight that is not actually caused or driven by smoking. And you could successively add additional variables, each again looking at the effect after controlling for the ones before it.
Looking at the model, the sorts of questions you can ask go beyond simply what's the proportion of the cases of mild MR in the population that's related to smoking. You can start to ask to what degree the effect of smoking is passing through low birth weight, and to what degree the effect of smoking is independent of birth weight, or goes through some mechanism other than low birth weight.
I'll refer in a minute to this concept: smoking is what we can call an exogenous variable in this sort of model, in that it has nothing impacting it in the model. Of course, things impact it in real life and we can carry the model on backwards, but in this case, it's kind of a starting point, that exogenous variable, the first step in our sequential model.
I'm going to go through a couple of different ways of calculating this to illustrate different types of data and how the right answer naturally evolves out of the data, regardless of whether you've got confounding or interactions in the data. To start with, I'm going to use a very simple example where we have confounding between two risk factors predicting mild MR, risk factors A and B, and by the way, this is also made-up data and that's why I didn't call them anything, just to avoid someone misquoting me somewhere, getting me in trouble. We have two risk factors, A and B, predicting mild MR. There's confounding between the two: the likelihood that you've experienced B will vary based on whether you've experienced A. But there's no interaction in the risk additivity sense. I'm focusing on risk ratios, not odds ratios, hence the focus on this risk (inaudible) model. The first step, thinking back to the Venn diagram and that overall model, is to calculate an unadjusted PAF for your exogenous variable. And again, smoking should get the credit for its total effect. What's the effect of smoking? It's not just its direct, unique effect; it's also the effect that it has by influencing other subsequent factors. So we first calculate the unadjusted PAF for smoking, risk factor A. And we found that's 26.667 percent. We then estimate the effect that A has on B. So we do a very simple calculation of the risk ratio. We see that if you've experienced A, the probability that you'll experience B is about 1.20 times the probability if you've not experienced A. And that's an important number we'll come back to in just a minute.
We then statistically remove the effect of A on B. And in essence, kind of the key to what we're doing here is I'm adjusting the PE, the proportion of the population that's exposed to the risk factor. This is, conceptually, I think, a key point in that if you've got a causal model, if smoking is influencing birth weight, removing the effect of smoking doesn't just involve statistically looking at, you know, what's the change in my beta in some logistic regression when I've controlled for smoking. You've also got to recognize the effect of smoking wasn't just in its beta coefficient. Smoking also has an effect on the number of people exposed, the number of babies exposed to low birth weight. So you've also got to adjust that PE estimate as well as your actual risk ratio or odds ratio. So what we're doing here is the--got the numbers down here.
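For reference, the textbook unadjusted PAF formula (usually attributed to Levin) shows why PE matters as much as the risk ratio. The talk does not say which of the several PAF formulas appears on the slides, so this is offered only as context, with PE the proportion of the population exposed and RR the risk ratio:

```latex
\mathrm{PAF} = \frac{PE\,(RR - 1)}{1 + PE\,(RR - 1)}
```

Holding RR fixed, a smaller PE gives a smaller PAF, which is why removing an upstream factor's influence on how many people are exposed changes the answer even when the risk ratio itself is untouched.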
You take the actual, the original number that experienced both A and B and then multiply by the inverse of that risk ratio that we just saw for B predicted by A. And you'll end up with 416.667 of those exposed to A also experiencing B, and the balance, now that we've removed that effect on the PE, which is again how we're operationalizing this, would go up to 833.33. The probability of having mild MR for each of these groups over here remains unchanged. So we're not actually changing the relationship within the groups; all we're doing again is saying what happens if smoking does not lead--let's remove the effect that smoking had on increasing the number of kids up here. Let's remove the effect that that risk factor had on bumping up the numbers here. And if we remove that effect, we'd only really see 416 up there. We then calculate a PAF, an adjusted PAF for risk factor B, and in this case, we have to use a different formula.
As Deb pointed out, there's a zillion of these floating around. And in this case, this is a little different because we want to refer back to the original number of cases, not the number of cases in this adjusted table. So once we do that, we end up with 18.33 percent related to B adjusted for A, or low birth weight adjusted for smoking. And if we add those up, voila, we get 45 percent, which was the original aggregate PAF, which we know is the number we should be getting in the end. So in the end, those two add up nicely, if there's no interaction.
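The PE adjustment in this first example can be checked with a few lines of Python. This is a minimal sketch, not the speaker's worksheet: the starting counts of 500 and 1,250 are not stated in the talk and are back-calculated here from the quoted 416.667, 833.33, and the risk ratio of 1.20, and the formula used for the adjusted PAF for B is not reproduced because the transcript does not give it.

```python
# Assumed inputs, back-calculated from figures quoted in the talk.
rr_b_given_a = 1.20        # quoted: risk ratio for B predicted by A
both_original = 500.0      # assumption: 416.667 * 1.20
exposed_to_a = 1250.0      # assumption: 416.667 + 833.333

# Remove A's effect on exposure to B: multiply by the inverse risk ratio.
both_adjusted = both_original * (1.0 / rr_b_given_a)
a_only_adjusted = exposed_to_a - both_adjusted

# After adjustment, the share of A-exposed children who also have B equals
# the original share divided by the risk ratio.
share_before = both_original / exposed_to_a
share_after = both_adjusted / exposed_to_a

# Quoted PAFs (percent) and their sum.
paf_a_unadjusted = 26.667
paf_b_adjusted = 18.33
paf_sum = paf_a_unadjusted + paf_b_adjusted

print(round(both_adjusted, 3), round(a_only_adjusted, 3))
print(round(share_before, 4), round(share_after, 4))
print(round(paf_sum, 1))
```

Only the exposure counts move in this step; the probability of mild MR within each cell is left alone, as the talk describes.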
Let's look at a second example, where now we've got our risk factors uncorrelated or unconfounded, depending on which way you're coming at it. There's no association between A and B. But there is an interaction, again, in that risk additivity sense, that we can see in the risk ratios.
Very quickly, I'm just going to skip through this because we're running out of time, and you can see the calculations, kinda cool. As you'd expect, there's no association between the two. So the adjustment process actually does nothing in this case, and we end up with an adjusted PAF for B of 4.76 percent. If we add these up, what we get is the unadjusted PAF for A was 28.57--I'm on slide 29. The unadjusted PAF for A, for smoking, that total effect it's got any which way possible, is 28.57. The effect for B, low birth weight, after we've controlled for, after removing the impact that A had on increasing its rates, is only 4.76, which gives us a total of 33.33, which is not equal anymore to that aggregate PAF, so we're off, in fact, by 19.05 percent. They're not adding up anymore. But the issue is, we've not considered the interaction. In this case, we know that there's an interaction between the two. How big is that interaction effect? Well, the expected risk ratio for the A-B group, if there was no interaction, would be the risk ratio for A plus the risk ratio for B minus one, which would give us a risk ratio of four; that's what we should have seen if there was no interaction. Well, we have five out of a thousand cases in our referent group that have mild MR. If the risk ratio should have been four, out of this thousand cases we should have seen 20 kids, if there was no interaction. In fact, we only saw 12. So the difference between the 20 that we would have seen with no interaction and the 12 that we actually observed is eight cases. That's how big the interaction effect is.
Well, eight cases out of the 42 is 19.05 percent. So once we then add in the interaction, it adds up again perfectly, the way it should. You might wonder what do we mean by an interaction, but at the very end, I've got an example kind of illustrating how considering this interaction actually can help answer some paradoxical stuff people have noticed previously in the literature. So again, the right answer, the correct value naturally emerges out of this process, once we take into consideration the interaction as an additional effect that needs to be modeled.
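The arithmetic of this second example can be reproduced from a small table. The sketch below is an illustration under assumptions: the talk gives only the referent risk (5 per 1,000), the 12 observed cases in the A-B group, the 42 total cases, and the PAFs, so the other cell counts (15 and 10 cases, with 1,000 children per cell) are back-solved to match those figures rather than taken from the slides. The stratified formula used for the adjusted PAF for B is likewise an assumption that happens to reproduce the quoted 4.76 percent.

```python
# Hypothetical cell counts, back-solved from the quoted figures.
N = 1000  # assumption: children per exposure cell
cases = {"neither": 5, "a_only": 15, "b_only": 10, "a_and_b": 12}
total_cases = sum(cases.values())  # 42, as quoted

# Step 1: unadjusted PAF for A (Levin's formula, half the population exposed).
risk_a = (cases["a_only"] + cases["a_and_b"]) / (2 * N)
risk_not_a = (cases["neither"] + cases["b_only"]) / (2 * N)
rr_a_overall = risk_a / risk_not_a
pe_a = 0.5
paf_a = pe_a * (rr_a_overall - 1) / (1 + pe_a * (rr_a_overall - 1))

# Step 2: PAF for B adjusted for A. A and B are unrelated here, so the
# exposure adjustment changes nothing; excess cases are taken within each
# stratum of A and divided by the ORIGINAL number of cases.
excess_b = (cases["b_only"] - cases["neither"]) + (cases["a_and_b"] - cases["a_only"])
paf_b_adjusted = excess_b / total_cases

# Step 3: interaction, from the additive expectation RR_A + RR_B - 1.
rr_a = cases["a_only"] / cases["neither"]
rr_b = cases["b_only"] / cases["neither"]
expected_rr_ab = rr_a + rr_b - 1
expected_ab_cases = expected_rr_ab * cases["neither"]
interaction_cases = expected_ab_cases - cases["a_and_b"]
paf_interaction = interaction_cases / total_cases

# Aggregate PAF: cases beyond what the referent risk alone would produce.
expected_if_unexposed = 4 * N * cases["neither"] / N
paf_aggregate = (total_cases - expected_if_unexposed) / total_cases

for name, value in [("A", paf_a), ("B adjusted", paf_b_adjusted),
                    ("interaction", paf_interaction), ("aggregate", paf_aggregate)]:
    print(f"{name}: {100 * value:.2f}%")
```

With these assumed cells the three pieces sum to the aggregate PAF, which is the additivity the talk describes.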
Finally, what if we've got a complex, very messy model, in which we now have an interaction, those risk ratios minus one do not add up to three, and we've got confounding among our risk factors, so we've got everything going on. Without going through all the calculations, what we get is our PAF for A is 14.29. The adjusted PAF for B is 29.12. There's no test afterwards. Well, except for Scott Gross, but the rest of you don't have a test. Your expected risk ratio for the A-B group, had there been no interaction, would have been those two added together minus one, or 3.667. The expected number of cases of mild MR would therefore have been 18.333, which is different from the 15 that we observed by 3.333, or 5.13 percent. So we're thinking, uh, we add that back and that's going to solve the problem. No, it still doesn't add up. It's only 48.53, so what's the issue? Well, there's also the correlation between the two risk factors, between A and B. We also have to consider the confounding: what impact does that have on the interaction? We can do that by taking the number that we observed and also multiplying that by the inverse of the risk ratio--let's remove the effect of that association--and we get the number of cases associated with the interaction. What we're also doing now is taking into consideration the correlation between our two risk factors. And we get 4.286 cases, which translates to 6.59 percent. And now, again, it all adds up to the aggregate number that we should see. Now, in reality, if you're doing this, you always assume that they're correlated and confounded. You would just calculate these numbers all the time. And what will happen, if there's no interaction, is your interaction effect is just going to be zero. But I wanted to kind of illustrate, going from a simple model to a very complex sort of relationship, that the right answer inevitably always just emerges out of this.
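The interaction arithmetic for this third example can be followed step by step. This sketch leans on several values that are not stated in the talk and are back-calculated from the quoted figures: 5 referent cases (18.333 divided by 3.667), 65 total cases (3.333 cases being 5.13 percent), and a risk ratio of 7/9 between the two risk factors (the factor that turns 3.333 into 4.286). Treat all three as assumptions.

```python
# Quoted in the talk.
paf_a = 14.29              # percent
paf_b_adjusted = 29.12     # percent
observed_ab_cases = 15.0

# Assumptions, back-calculated from the quoted figures.
expected_rr_ab = 11 / 3    # quoted as 3.667
referent_cases = 5.0       # assumption: 18.333 / 3.667
total_cases = 65.0         # assumption: 3.333 cases == 5.13 percent
rr_between_factors = 7 / 9 # assumption: implied by 3.333 -> 4.286

# Interaction before accounting for the association between A and B.
expected_ab_cases = expected_rr_ab * referent_cases
interaction_raw = expected_ab_cases - observed_ab_cases
pct_raw = 100 * interaction_raw / total_cases

# Remove the effect of the association: multiply by the inverse risk ratio.
interaction_adjusted = interaction_raw * (1 / rr_between_factors)
pct_adjusted = 100 * interaction_adjusted / total_cases

total_pct = paf_a + paf_b_adjusted + pct_adjusted

print(round(expected_ab_cases, 3), round(interaction_raw, 3), round(pct_raw, 2))
print(round(interaction_adjusted, 3), round(pct_adjusted, 2))
print(round(total_pct, 1))
```

Under these assumptions the adjusted interaction share closes the gap that the unadjusted 5.13 percent leaves open.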
Here's kind of an application to get at that sort of question. This helps to answer an issue that's been raised in the literature on some paradoxical results we sometimes see when we're looking at multi-risk-factor models, and attributable fractions in particular.
For example, one case that's been discussed is the impact of high altitude and low birth weight on infant mortality. Babies born at higher altitudes tend to be born at lower birth weight. We know that lower birth weight babies tend to have higher rates of infant mortality; therefore, you would naturally assume that babies born in higher altitude areas are going to have higher infant mortality. No, you're right. That's not the case. In fact, we don't see that. Low birth weight babies born at high altitudes have lower mortality rates than low birth weight babies born at sea level. And you get kind of this odd opposite effect where high birth weight babies at high altitudes in fact have higher mortality rates than high birth weight babies born at sea level, so you've got something strange going on. So it seems like altitude is a protective factor, you know, offsetting, decreasing the effect of low birth weight.
What's actually happening is we have a population shift. If you look at kind of a standardized mortality curve for low altitude babies, at the higher altitudes the mortality distribution just shifts down a little bit. It stays the same, the curve looks the same, because I made it up, but in essence, what's happening is your whole standardized mortality curve just shifts. Altitude is not having any effect on infant mortality, positive or negative. It's just shifting the distribution down. And what's happening is, for the high altitude babies, if we set some cutoff for what's a low birth weight kid, you're just getting more of the high altitude kids misclassified as low birth weight because of that shift in their population.
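The population shift can be illustrated with a toy calculation. This is not the speaker's simulation; every number below (the means, the standard deviation, the 2,500 g cutoff, and the shape of the risk curve) is a made-up assumption. The only structural idea taken from the talk is that mortality risk depends on where a baby sits within its own population's birth weight distribution, so altitude itself has no effect on overall mortality. The sketch uses a simple decreasing risk curve and does not try to reproduce the pattern described for heavier babies.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
CUTOFF_G = 2500.0   # assumed fixed low-birth-weight cutoff, in grams
SD_G = 500.0        # assumed standard deviation of birth weight
MEANS_G = {"sea_level": 3400.0, "high_altitude": 3200.0}  # assumed means

def death_risk(z):
    """Hypothetical risk curve: depends only on the within-population z-score."""
    return 0.002 * math.exp(-z)

def summarize(mean_g, step=0.001, z_max=6.0):
    """Numerically integrate over z-scores for one population."""
    z_cut = (CUTOFF_G - mean_g) / SD_G  # the fixed cutoff on this population's z scale
    mass = deaths = lbw_mass = lbw_deaths = 0.0
    for i in range(round(2 * z_max / step)):
        z = -z_max + (i + 0.5) * step
        weight = STD_NORMAL.pdf(z) * step
        mass += weight
        deaths += weight * death_risk(z)
        if z < z_cut:
            lbw_mass += weight
            lbw_deaths += weight * death_risk(z)
    return {
        "lbw_share": lbw_mass / mass,
        "overall_mortality": deaths / mass,
        "lbw_mortality": lbw_deaths / lbw_mass,
    }

results = {name: summarize(mean_g) for name, mean_g in MEANS_G.items()}
for name, row in results.items():
    print(name, {key: round(value, 5) for key, value in row.items()})
```

In this toy setup overall mortality is identical in the two populations, yet more high-altitude babies fall under the fixed cutoff and the babies labeled low birth weight there have lower mortality than their sea-level counterparts, which is the apparent "protective" effect the talk attributes to misclassification.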
What ends up happening, if you use normal or previously existing procedures for looking at this with a PAF, is we find that altitude appears to serve as a protective factor for low birth weight. If you're going to have a low birth weight baby, have them in Denver, not in San Francisco. The problem is, that same argument, like 45 years ago or 40 years ago, was used with smoking. If you're a low birth weight baby, it's better that your mom smoked. If you do the calculations, it works out that there's a reduced--but it's wrong. The problem is, it's not taking into consideration the sequential effects. Smoking leads to birth weight. Birth weight doesn't lead to smoking. Because you're born low birth weight, it doesn't inspire your mother to run up to Denver or wherever to have you. It doesn't work that way. In fact, if we calculate this using this sequential causal sort of model, taking into consideration the causal sequence, that smoking leads to birth weight which leads to mortality or mild MR, we see that the effect of birth weight is overstated because these normal birth weight, high altitude kids are being misclassified as low birth weight. Within their population, they're not low birth weight, but we're applying this arbitrary cutoff that makes them misclassified as low birth weight.
I created some simulated data. With a million babies, born at high altitude and low altitude, here's kind of the equation for coming up with infant mortality. You don't need to worry about that. But again, it's just to make it very clean, instead of using real data where there are going to be other confounds. And what you see is, if we do the sequential partitioning with this causal model, we find that altitude, as our first risk factor, has no PAF. There's no association between altitude and infant mortality, exactly what we should see, because altitude had no effect on infant mortality, so the PAF is, in fact, zero. The PAF for birth weight is 28.88, and the interaction, 7.41. If we, on the other hand, do this using other partitioning techniques, and I won't go through the calculations, we get an adjusted PAF for altitude that would be negative. Altitude's a protective factor. Well, we know that's wrong. There was no impact on infant mortality. If we look at the effect for birth weight, the adjusted PAF for birth weight would be higher. In fact, depending on which way you did it, it could be higher than the 28.88. It could be higher than the 36.29. It's inflating what the effect of birth weight is, in part because of the screwed-up altitude effect.
With the sequential approach that I've described, we get an effect for altitude of zero, which is exactly the correct conclusion. Altitude does not have--it's not a protective factor, it's not a risk factor. It's not having any effect. And we get the PAF for birth weight decreased, in that the effect is, in fact, less once we adjust for altitude. Once we stop misclassifying those normal birth weight babies in Denver, the effect of birth weight does go down, which again is the correct conclusion.
The interaction, that seven point whatever percent, quantifies the degree that that aggregate PAF is capitalizing on how the low birth weight definition has been misapplied to the high altitude births, because of that population shift. So it gives us a way of kind of quantifying that effect within the model.
In summary, with apologies to Dr. Seuss: If your model be one causal, or your population shifts, if the factors that you're after put your data in a twist, think sequential, it's fundamental and you will see the light. Partitioning your PAF will get the answer right. Don't just adjust your risk ratio; your PE matters too. You leave it out and you will pout, when one plus one ain't two. So invert that RR, weight those P's; it's like slicing up a pie. Each PAF will now add up, which beats a poke in the eye. Thank you.