Using Geographic Information System (GIS) to Analyze MCH EPI Data

MCHB/EPI Miami Conference — December 7 - 9, 2005

Using the Population Attributable Fraction (PAF) for Public Health Assessment — Transcript

 

DEB ROSENBERG: So this is the outline for the whole workshop. And what we're going to do is I'll give a brief overview of population attributable fraction, go through conceptually a bit--some of the methodological issues, then I'm going to give a very simple example with two variables so, you know, not, I mean, I guess two variables is multi-variable, but not really. But to get us to understand in a very basic way what's going on. And then Kristen is going to give a more elaborate example, but still, only with three variables, so, you know, you're going to have to extrapolate. It gets kind of complicated and we're not presenting anything beyond three variables today, three independent variables, I should say. And then Craig Mason is going to talk about a special case that he's done extensive work on, looking at direct and indirect effects, so it's a special case of when variables are in a causal pathway. Our example, our simple and our two-variable and three-variable example, the variables are not in the causal pathway so you'll get to see another case of how to handle PAF under those circumstances.

And then we're counting on our colleague, Juan, and all of you, but we know that Juan isn't shy, so we want to be held accountable for, you know, yeah, this is an interesting method and we're going to try and elaborate it for you, but what does it really mean when you're sitting in a state health department and you wanted to do priority setting or use it somehow to help you inform policy, program planning, et cetera? Because it's a workshop, we ideally would like to be able to take questions on a whenever basis, but again, because of time, we don't think we're going to be able to do that. So each of us is going to try and stop one or two times in our presentations and see if you have questions you really need to ask us at that time so that we can move forward with everybody understanding where we are. But I would just ask again, it's because of time constraints we're going to fly through this, and hopefully, we'll get to do some more on this at a later date and you can ask more questions, or you can grab any of us at any time too.

Okay. So I'm going to go into the first part, just a quick overview. I've just put on this slide what I consider the four basic attributable risk measures. I'm really assuming that everyone in this room has seen these and understands these in their crude form, so I'm not spending a lot of time here, but just to know that these are, of course, the measures based on risk differences, as opposed to risk ratios that we're used to working with, relative risks and odds ratios. And the one we're focusing on today, of course, is the last one on this list, the population attributable fraction, which is really the difference between the marginal proportion of risk in a population, compared to the proportion of risk in the unexposed. And just so you know, I always set up two by two tables like this. There's not a right or a wrong here, but throughout the presentation I always have the exposure or the risk factor as the row variable and outcome as the column variable. It's not, you know, you can do it the other way too, but just as you follow through, that's the way we're going to be presenting it today, at least Kristen and I.

So there are lots of algebraic formulations for the PAF. And you're probably going to see more than one of them today so I just want you to be aware, it's annoying to me, you know, when you read a textbook or you look at papers or whatever and people use different formulas and they're the same thing, but with different symbols, et cetera. So this is the formula that we'll be using for the way we calculate the population attributable fraction. So you notice that it's a function of the relative risk. And one reason we've chosen this formulation is, as you'll see later, we'd like to, in a multi-variable context, to go on and be able to take advantage of modeling procedures that are typically, you know, those beta coefficients give us the ratio measures and not the difference measures, so this is going to be a function of the relative risk and then multiplied times the exposure prevalence. So the number, when I say, "total exposed cases," that means the number with the outcome who are exposed over the total number of those with the outcome in the whole population.
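The formula just described can be sketched in a few lines of Python. This is only an illustration of the relationship as stated in the talk (the proportion of all cases that are exposed, multiplied by a function of the relative risk); the function name `paf` and the example numbers are assumptions made up for this sketch and are not from the slides.

```python
def paf(rr, exposed_cases, total_cases):
    """Population attributable fraction as described in the talk:
    (exposed cases / all cases) * (RR - 1) / RR.
    """
    if rr <= 0 or total_cases <= 0:
        raise ValueError("rr and total_cases must be positive")
    return (exposed_cases / total_cases) * (rr - 1.0) / rr


# Hypothetical numbers, chosen only to make the arithmetic easy to follow:
# 30 of 100 cases are exposed, and the relative risk is 2.0.
example = paf(rr=2.0, exposed_cases=30, total_cases=100)
print(round(example, 3))  # 0.3 * (1 / 2) = 0.15
```

Writing the PAF this way keeps it a function of a ratio measure, which is the reason given for choosing this formulation.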

In the multi-variable context, why does this get important? We would like to generate PAF for every factor we're interested in, however many we have. And we would like there to be those PAFs to have a relationship such that we can add them up and they will sum to the total of what we're going to be calling the aggregate PAF for all of them combined, and you'll see how that goes. It's not so easy.

So generally then, getting these mutually exclusive and mutually adjusted PAFs isn't so easy. And, in fact, I mean, you might be saying why aren't we just using adjustment methods like we use for those ratio measures, use either stratified analysis or modeling to get adjusted PAFs. It actually doesn't work quite that easily because we have overlapping distributions of the exposure groups and so we actually won't get the result we want if we just use regular adjustment procedures, and you'll see that computationally in a minute.

So we have to make decisions about how variables are going to be handled in this multi-variable context. And, of course, we have to do this in every kind of modeling or multi-variable work that we do. But it's going to be a little different in this world so just to say we're going to talk to you about ways you're going to have to make decisions about how you handle what we're calling modifiable or unmodifiable risk factors, risk factors, risk markers, whatever terminology you feel comfortable with. Confounding and effect modification will be dealt with slightly differently than we're typically used to doing it. And also again, Craig will talk about handling variables in the causal path.

Always having this strong conceptual framework or a logic model is important when we do model building, so we don't want to just do exploratory model building. We want to do it in some systematic fashion. And in the context of population attributable fraction, it's actually particularly important because in addition to having issues around how you're going to interpret the results that you get or what you think how you build a model that's--you want to make sure you have the--it measures as much as you can about risk factors for a particular outcome. Here, there's actually going to be calculational things that change, depending on how you build that model, so there's more than just interpretation involved here.

Okay. Here's some terminology. And again, I think we're mostly consistent across all of our presentations in this terminology, but these words get used differently. So I'm going to tell you how I'm going to use them. The aggregate PAF is, again, the total PAF for many factors, more than one factor. But when they're considered as part of what I'm calling a risk system--you'll see more of what I mean by that in a minute--and then the component population attributable fraction. These are separate PAFs for each combination of all the exposure levels in that risk system that we've just created to get an aggregate.

Then we have what's called a sequential population attributable fraction. This is the special case when you've got a PAF where you've said, "What's the one order that I'm interested in for removing or eliminating factors?" So as you know, with the population attributable fraction, the interpretation is what percent of the outcome can be attributed to factor A? Well, now I've got factors A, B, C, D, E or however many of them that I have, and I'm going to get many sequential PAFs because I could say, "Well, I want to focus on factor A, then factor B, then factor C, but I might want to do it in another order also."

Then we have the average PAF. And the average PAF is a method for summarizing over all those possible sequences that we're going to have, when we have lots of factors involved.

So here, my two by two tables again. These are the crude tables. My example of two variables is smoking and cocaine as independent variables, looking at their association with low birth weight. This is contrived data, fake data, but purposely chosen because everybody in here is really familiar, I think, with these variables so I don't want us to get caught up in the particular associations. I want you to be able to focus on the method. So the crude relative risks, even though it's fake data, I don't think are too far off from what we see. Crude relative risk for the smoking, low birth weight association is 1.6, crude relative risk for cocaine and low birth weight, 4.77. Notice, obviously, the cocaine low birth weight association is stronger than the smoking low birth weight association. How about when we go on and then calculate the crude population attributable fractions? Again, a function of the relative risks, but also then having to multiply by the number of exposed cases over the total number of cases. You're going to see these numbers a lot throughout the presentation. So there are a total of 700 low birth weight deliveries here, and you're going to start watching the smoking and cocaine strata, if you will, and how those play out in the calculations.

So these are just crude. And when you look at the crude, you see, even though cocaine had a much greater relative risk, the crude PAFs, at least, are a lot closer; .107 for smoking, .102 for cocaine. 10.7 percent of low birth weight births might be attributed to smoking, 10.2 percent to cocaine. Only in this very crude analysis.
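One way to see why the crude PAFs end up so close despite very different relative risks is to turn the formula around and ask what share of all low birth weight cases must be exposed for each factor. The sketch below uses only the rounded relative risks and PAFs quoted in the talk; the function name is an assumption, and because the inputs are rounded the results are approximate.

```python
def implied_exposed_case_share(paf, rr):
    """Invert PAF = share * (RR - 1) / RR to recover the share of cases exposed."""
    return paf * rr / (rr - 1.0)


# (relative risk, crude PAF) as quoted in the talk, both rounded.
quoted = {
    "smoking": (1.6, 0.107),
    "cocaine": (4.77, 0.102),
}

shares = {}
for factor, (rr, crude_paf) in quoted.items():
    shares[factor] = implied_exposed_case_share(crude_paf, rr)
    print(f"{factor}: RR={rr}, PAF={crude_paf}, "
          f"implied share of cases exposed ~ {shares[factor]:.2f}")
```

Under these quoted figures, smoking is implied to be present among well over twice as large a share of the cases as cocaine, which offsets cocaine's larger relative risk.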

Now let's move towards thinking of smoking and cocaine as part of what I've called the risk system. So now I've created a table here that shows on the left-hand side all of the levels of those risk factors, so you could be both a smoker and a cocaine user. You could just be a cocaine user, just a smoker, or neither. And let's collapse those categories; really treat both smoking and cocaine. When I say a risk system, everything in the risk system now at this point is going to get treated as a single variable. So let's create a new two-by-two table that just looks at you're either a smoker or a cocaine user or both, versus neither. And here you can see that that aggregate relative risk considering both variables combined is 1.88.

And now let's look at the components of the smoking and cocaine risk system, and I know in your handouts you've got black and white. You can try and--if you look at the slides, I've tried to color code to help make it a little easier to follow what's going on. So the components of the risk system are, for those women who are both smokers and cocaine users, we've got a relative risk of 5.89. For just the cocaine use only, the relative risk is 4.3, and for the smoking-only women, a relative risk of 1.36.

Those are the relative risks. Now, let's look at PAFs for that risk system, considering both smoking and cocaine. The aggregate PAF then for that combination, that new two-by-two table that I created couple of slides back, times that new aggregate relative risk, 1.88 there, and I get then an aggregate population attributable fraction of .16. This is another important number to--just file that away because we're going to refer back to that a lot. So this is both smoking and cocaine combined, together what is the population attributable fraction for low birth weight?

And then on the right-hand side, you'll see that with the component PAFs, I have a component for both smoking and cocaine, a component for the cocaine users only, those who have used cocaine but didn't smoke, and a component for the women who smoked, and this is now the color coding. So you've got red, blue and green following those rows in the table on the previous slide, and these components sum to that aggregate PAF.

So we have achieved one thing we wanted to do which--we're trying to achieve a system of population attributable fractions, such that they will sum to the aggregate PAF. And we've achieved that here with the components, but as you can see in the pie chart, I still can't really separate out, here's the PAF for smoking, regardless of cocaine use. Here's the PAF for cocaine, regardless of smoking. I still have this overlap. One of the components is that you were both a smoker and a cocaine user. So, yeah, they sum to the aggregate PAF, but I've still got overlap. I haven't managed to get rid of that overlap.

So let's go through some of the adjustment procedures, and I've put adjusted in quotes because as I said before, we start out using procedures that you're all going to be totally familiar with, but they're not going to be enough, so I am going to start out using the stratified approach. And I've given you here the formulas again for PAFs, both when there is effect modification--and this is on the multiplicative scale--and when there isn't. We've made a decision as we proceed that we're going to use the top formula, which is actually using the stratum-specific estimates as though there might be effect modification. In these data, it turns out there's not effect modification between smoking and cocaine, but it's assumption-free and we feel like it's the most, I don't know, honest, for lack of a better word, approach because we're not making any assumptions about whether there is or isn't effect modification here. So we're going to be using that top formula, which is a summation over the stratum-specific relative risks times the number of exposed cases in each stratum, as a proportion of all of the cases in the population.
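The slide's formulas are not reproduced in the transcript, so the block below is a reconstruction from the verbal description rather than a copy of the slide. The notation is assumed for this sketch: a_i is the number of exposed cases in stratum i, n is the total number of cases across all strata, RR_i is the stratum-specific relative risk, and RR_adj is a single common adjusted relative risk.

```latex
% "Top" formula: stratum-specific relative risks (allows for effect modification)
\[
\mathrm{PAF}_{\text{adj}} \;=\; \sum_{i} \frac{a_i}{n}\cdot\frac{RR_i - 1}{RR_i}
\]

% Alternative when one common relative risk is assumed (no effect modification)
\[
\mathrm{PAF}_{\text{adj}} \;=\; \frac{\sum_{i} a_i}{n}\cdot\frac{RR_{\text{adj}} - 1}{RR_{\text{adj}}}
\]
```

When every stratum has the same relative risk the two expressions coincide, which is why using the first one costs nothing if effect modification turns out to be absent.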

So here's what you're all used to looking at. Here are two-by-two tables adjusted, so the first one is the smoking/low birth weight relationship adjusted for cocaine use, so you see the strata for cocaine is yes, and cocaine is no. And you can see that you can believe me now that there is no effect modification here because the stratum-specific relative risks, 1.37 and 1.36, right? There's no effect modification.

And then I'm going to calculate the stratum-specific PAFs, and here's where you've got to take your pen because we noticed actually just this morning that there's an error in this slide. So at the top--

I'm sorry; I thought I heard somebody say something. Do you see the--the top table here on the--here, I'm just going to go--this is easier for me. Can you hear me? Okay. So on this table; I need the stratum-specific (inaudible). Did I do this right? I want to make sure I get this right. It's the next slide, you're right. This is (inaudible). Forget it; put those pencils away (inaudible). So this is stratum-specific, if you will, PAFs that are going to get summed together to get down here to the adjusted, in our lingo, if we were just using our regular way of thinking about adjusting using the stratified approach. This is the PAF for smoking/low birth weight relationship adjusted for cocaine use. So remember, we had the crude PAF was .107, 10.7 percent. And now we're adjusting it and as we often see with adjustment, we've now (inaudible) diminished the PAF for smoking.

UNKNOWN SPEAKER: (Inaudible).

DEB ROSENBERG: Great question. So you are going through the stratum-specific approach, we've got the stratum specific enumerated. In fact, the stratum specific number of exposed cases, but it's always over the total number of cases across all strata, so that's why you're going to see this number 700 a lot. Great, great question.

I'm sorry. Okay. So this is the other way around. And this is where the error is. The error is up here. This was cutting and pasting so I've got it, the stratum-specific relative risk, this is the cocaine/low birth weight relationship stratified by smoking. I need this relative risk then. So this, you've got to substitute 4.23 minus one over 4.238.

UNKNOWN SPEAKER: That's right then?

DEB ROSENBERG: No, this is not the right answer. You'll see that in the next page. This is right, okay? So .099 minus .042 is that this comes out to be if we do the calculations. I'm going to actually correct these. These slides supposedly are going to be up on the conference website at some point, so you can get a corrected version at some juncture.

So now look at these pie charts now. And supposedly, we have a smoking low birth weight relationship controlling for cocaine and a cocaine low birth weight relationship controlling for smoking. But, and this is well known in the literature, this has been discussed a lot, when we try and add them together to meet that criterion of equaling the aggregate or the total PAF, we've failed. So we have -- here's from before, this is the same pie chart from before, this is that .16, the aggregate PAF not only modified after stratification, smoking PAF controlling for cocaine, cocaine PAF controlling for smoking. I've got more than the total of the aggregate. And you can imagine, this gets compounded if you really have a multi-variable world that you're thinking about. So as we get more factors, this isn't enough, so we have to move beyond these adjustment procedures, and I go back to this one.
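The failure to add up can be checked directly from figures quoted in this talk: the aggregate PAF of .16, the smoking PAF adjusted for cocaine of .076, and the cocaine PAF adjusted for smoking of .099. The variable names below are assumptions for this sketch.

```python
# Values as quoted in the talk (rounded to three decimals).
aggregate_paf = 0.16
smoking_adj_for_cocaine = 0.076
cocaine_adj_for_smoking = 0.099

sum_of_adjusted = smoking_adj_for_cocaine + cocaine_adj_for_smoking
overshoot = sum_of_adjusted - aggregate_paf

print(f"sum of adjusted PAFs: {sum_of_adjusted:.3f}")  # 0.175
print(f"aggregate PAF:        {aggregate_paf:.3f}")    # 0.160
print(f"overshoot:            {overshoot:.3f}")        # 0.015
```

With only two factors the excess is small, but it is already enough to show that the two adjusted PAFs cannot be drawn as non-overlapping slices of the aggregate.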

So what does this mean then? We've got to go beyond these procedures. This is a useful--we didn't waste our time doing that stratified approach. We're going to use those PAFs, but now they're going to have a different interpretation than what we usually mean when we say, "adjusted." And it's going to become what we call one part of a sequence, a strategy, when you think about it, of eliminating risk factors from a population. And these are going to be--those PAFs we just calculated are going to be what we call sequential PAFs, one part of a sequence, depending on how we decide to focus on different variables.

For the smoking cocaine risk system, there's two possible sequences; a simple example: You can eliminate smoking first, controlling for cocaine use, and then go on to try and eliminate cocaine use, or the other way around. And with each sequence though, we're going to have two pieces, two sequential PAFs. So now, you've got notation again. We're going to start calling this the PAF subscript "SEQ" for sequential. The sequential PAF for eliminating smoking controlling for cocaine use is what we just calculated, the adjusted PAF for smoking. So remember now, it's for eliminating smoking, controlling for cocaine use. But then the next part of the sequence is eliminating cocaine use after smoking has already been eliminated, and that's just the remainder between that first PAF we calculated and the total. And you can see the subtraction here, so the total is .16. And when you subtract that .076 for eliminating smoking first, we get another result, another PAF for eliminating cocaine, second in the sequence. And then this next slide is exactly the same process, but we're going to do it for cocaine. So here, it's for eliminating cocaine use first and you see the .099 that we got from doing our stratified analysis, and then again, we do the subtraction from the aggregate PAF to get the two sequential PAFs, important for this sequence. By definition, because we've just subtracted from the aggregate to get that other sequential PAF, luckily, these sequential PAFs within each of the two possible sequences sum to the aggregate. So now we actually can create some pie charts, which is always the tendency with these PAFs. We want pie charts that parse out all the factors we're interested in and their contribution to the outcome. So the smoking first sequence has these two sequential PAFs, .076 and .084. They add up to .16. And then we have the other sequence, cocaine use first, .099 plus .061. That also equals .16.
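The subtraction described for the two sequences can be written out as a short sketch. The inputs are the rounded values quoted in the talk; the helper name `sequential_pafs` is an assumption for illustration.

```python
def sequential_pafs(first_paf, aggregate_paf):
    """Two-factor case: the second sequential PAF is whatever remains of the
    aggregate once the first factor's PAF has been removed."""
    return first_paf, aggregate_paf - first_paf


aggregate_paf = 0.16

# Sequence 1: eliminate smoking first (controlling for cocaine), then cocaine.
smoking_first = sequential_pafs(0.076, aggregate_paf)
# Sequence 2: eliminate cocaine first (controlling for smoking), then smoking.
cocaine_first = sequential_pafs(0.099, aggregate_paf)

for label, (first, second) in [("smoking first", smoking_first),
                               ("cocaine first", cocaine_first)]:
    print(f"{label}: {first:.3f} + {second:.3f} = {first + second:.3f}")
```

Because the second piece is defined as a remainder, each sequence sums to the aggregate by construction, which matches the point made about the pie charts.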

So that's great. We've achieved the goal of having mutually exclusive pieces of a pie that we can say, "Here's something for one variable being eliminated first. Here's something for another variable being eliminated first." But we still haven't achieved being able to say, "Regardless of how we eliminate risk factors, what can we say? What's our estimate of the impact of eliminating smoking, for instance? Or what's our estimate, on average, if we eliminate cocaine?" So we have different sequences, but we don't have some overall summary. The average PAF is a way to achieve that summary. To calculate the average, the sequential PAFs, are rearranged, leaving the two for smoking together and the two for cocaine together. So now we have eliminating smoking first, averaged with the other possibility, eliminating smoking second, and eliminating cocaine use first, averaged with eliminating cocaine use second, and you can imagine, just think about a multi-variable system where you've got eight variables you're interested in, or six variables you're interested in. Starts getting really complicated.

Here's the way these averages work, and what we've done is very simple arithmetic means, so the average PAF for smoking--and we've just written it notationally like this so that you can see it's the two sequential PAFs--the first one is smoking controlling for cocaine. And the second one is smoking after cocaine has been eliminated, and we're just dividing them by two. We have two sequential PAFs; we're taking a simple arithmetic mean. When I do that, I get a PAF for smoking that's an average that's .07, or seven percent. I do that for cocaine, rearranging those sequential PAFs again, take an average, and I get .09.

The averages also, thankfully, add up to the total, the aggregate PAF. And in the pie chart they're rounded, but you can see at the bottom here, for those of you who care about the fourth decimal place, you know, you can see what they really came out to be. So just think back for a moment, and then I'm going to turn it over to Kristen, remember the crude PAFs: 10.7 percent for smoking, 10.2 percent for cocaine, and now I'm saying, on average, 6.85 percent or seven percent, if you will, for smoking. 9.2 percent for cocaine, after accounting for the other factor in the system.
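The averaging step generalizes to any number of factors by looping over every elimination order. The sketch below is one possible way to organize that calculation, not the presenters' own code. It assumes you already have the PAF for eliminating each subset of factors; here those are the rounded values quoted in the talk (.076, .099, and .16), and the names `paf_of` and `average_pafs` are made up for this example.

```python
from itertools import permutations
from math import factorial

# PAF for eliminating each set of factors, using the talk's rounded values.
paf_of = {
    frozenset(): 0.0,
    frozenset({"smoking"}): 0.076,             # smoking, controlling for cocaine
    frozenset({"cocaine"}): 0.099,             # cocaine, controlling for smoking
    frozenset({"smoking", "cocaine"}): 0.16,   # aggregate PAF
}


def average_pafs(factors, paf_of):
    """Average each factor's sequential PAF over all elimination orders."""
    totals = {f: 0.0 for f in factors}
    orderings = list(permutations(factors))
    for order in orderings:
        removed = frozenset()
        for f in order:
            after = removed | {f}
            # Sequential PAF: gain from removing f given what is already removed.
            totals[f] += paf_of[after] - paf_of[removed]
            removed = after
    return {f: totals[f] / len(orderings) for f in factors}


averages = average_pafs(["smoking", "cocaine"], paf_of)
for factor, value in averages.items():
    print(f"average PAF for {factor}: {value:.4f}")
print(f"sum of averages: {sum(averages.values()):.4f}")

# The number of elimination orders grows as k factorial.
for k in range(2, 9):
    print(k, "factors ->", factorial(k), "orderings")
```

With two factors there are only two orderings, so this reduces to the simple mean of two sequential PAFs. The factorial growth in the last loop is why the calculation becomes unwieldy with six or eight factors.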

I think you can tell that this is my bias, and I don't know, Kristen and Craig will say what they think. My bias is, I like the average PAF, I think is the important measure that I want to focus on because in the real world, you know, we don't get to say, "Let's eliminate factor A first and then factor B and then factor C." We barely have control over any of that. And so in the real world, things are happening in tandem and we have control over some and more control over some and less control over others, so I would like to know, on average, some summary measure of what I might be able to attribute to one factor or another. So just to show you briefly how quickly it will get too complicated as we add more variables to the model, I've just written for you here, how many sequences, and it starts, if you add more variables than five, it starts going up really, really, really fast. So I want to just stop for a minute, pause, and see if you want to talk just for a few minutes about this before Kristen goes on and does an even more complicated example that adds another variable to the system. Does anybody have anything?

UNKNOWN SPEAKER: Just one point. (Inaudible). Sometimes you have a mathematical solution, an elegant mathematical solution, but doesn't make sense (inaudible) how do you explain that to people who are not (inaudible)?

DEB ROSENBERG: Well, we think, and we're going to hear from Juan and hopefully we'll talk about--

UNKNOWN SPEAKER: There is a question.

DEB ROSENBERG: Oh, I'm sorry. Again, he's challenging us appropriately about, well, this is nice statistically or mathematically, but hey, what difference does it make? Is that a fair summary of what you just said? And we agree with that, and so we want to talk about that more, but I'll just give my quick response and right now, I do think you saw how things--it's like anything else, you know, someone could say to you, oh, your adjusted relative risk, you barely tweaked that compared to what the crude was, what difference does it make? And so I think we have to think about it in that way. And we did see some differences going from the crude to the average PAF for smoking and cocaine. And I really think you're going to be able to look at those and think through in the context of politics and funding and everything else, that it should be able to inform decisions you make out in the field, but unless Juan wants to comment, I think you should let--well, first, we should see if there's other questions.

UNKNOWN SPEAKER: Just one quick question and quick comments on these stats. I remember when we started this whole discussion. And I think that we've been talking now in this session for probably two or three years. And it started with work that I will not refer to in particular. But where we are so, at least, just turn the volume because you almost (inaudible).

DEB ROSENBERG: You got to really hold it close, yeah.

UNKNOWN SPEAKER: I talk pretty loud, so, yeah. So basically, if you would add all the factors, population, growth, that they calculate it, you would be able to just wipe from the face of the earth, (inaudible) that wasn't a big policy (inaudible). So if you see the change in the factors, where you plug in ten (inaudible) of course, the change would be more dramatic, or could be more dramatic. What are the long-term implications? And I'm going to talk a little bit more later. But if you are going to invest a million dollars on something to convince somebody that you use (inaudible) said about ratios and you have to manage the investments, it won't work. It won't work because it's not correct. So that was the foundation. And that's coming forward now.

UNKNOWN SPEAKER: Somebody just asked a dumb question, so I'm going to ask (inaudible). I'm glad what math adds up, but I'm also worrying about (inaudible) you were saying, if I stop smoking, the rate will become--does not the crude population attributable risk actually answer that question, and it's adjusted only when I'm trying to adjust and reduce the fact, multiple factors at one time. So I guess trying to (inaudible) math not only adds up, but conceptually, which math goes with the answer to which question?

DEB ROSENBERG: Actually, that's a great question. Yeah, we're not saying that either the crude or even the simple adjusted doesn't have use on its own. I said though, I'm biased. I like these averages because I think in a multi-variable context. I think we live in a multi-variable context, but you're right. Depending on the purpose, you might want to use one of the simpler measures. What I thought you were going at, which I'll let go is thinking--yeah.

UNKNOWN SPEAKER: The reason I say that (inaudible) smoking cessation program, and I'm really trying to say this program works at this level, what (inaudible) do I have as great as the multiple-factor context for low birth weight, but I'm really only interested in, in this focus (inaudible) falls in, what will be the factor (inaudible)?

DEB ROSENBERG: Right. And you might want to use that sequential PAF that's based on eliminating smoking first. We're going--these are great questions. We were hoping that we would stimulate all of this, but I want to let Kristen go and leave plenty of time for Craig too. And hopefully, we'll have time to get into more of that kind of stuff afterwards. Thank you.