MCHB Conference Webcasts
Using Geographic Information System (GIS) to Analyze MCH EPI Data

MCHB/EPI Miami Conference — December 7 - 9, 2005

Using the Population Attributable Fraction (PAF) for Public Health Assessment — Transcript

 

KRISTIN RANKIN: I'm actually going to give you a chance to make some decisions about the analysis that we carry out here. And I am going to go quickly through this, and there are a lot more slides than we'll cover, but I'm hoping that you'll be able to take these home. There's SAS code in them. Try it out with your own data sets and don't be too worried if I go too quickly. Please ask me questions afterwards.

I'm going to pick up where Deb left off. She was just saying before the question session that as the number of variables in the risk system increases, the number of possible sequences increases almost exponentially, because each factor you add means that many more possible orders in which the risk factors can be eliminated from the system. So we realized really quickly that we wanted to stop doing this with paper and pencil and stratified two-by-two tables and try to get to a more automated method. We're only part of the way there, but we're starting to use SAS, or you can use any statistical program, to generate these prevalences and these stratified relative risks, and to at least try this method a bit, because we know that part of the reason it's not used very much is that there isn't a standard statistical program that will take care of all these steps for you at once. And when I say "take care of," I don't mean plug and chug. You still, of course, need your conceptual framework and what you're working on, but there's not a really elegant way yet in the statistical software to do this. So I'm just going to show some tips for ways you can possibly do this with statistical software, but there are many possible ways, so feel free to play around on your own with that too.
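As a rough illustration of how fast the number of removal sequences grows, here is a minimal Python sketch (not the SAS code from the slides). The factor names "A", "B" and "C" are placeholders. For k factors the count is k factorial, which is the growth being described here.

```python
from itertools import permutations
from math import factorial

# Placeholder factor names, used only for illustration.
factors = ["A", "B", "C"]

# Every order in which the factors could be removed from the risk system.
sequences = list(permutations(factors))
for sequence in sequences:
    print(" -> ".join(sequence))

# The number of sequences is k! for k factors, so it grows very quickly.
sequence_counts = {k: factorial(k) for k in range(2, 7)}
print(sequence_counts)
```

With three factors this lists six sequences, which matches the six sequences worked through later in the talk.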

I wanted to talk a little bit about modifiable and unmodifiable risk factors. The way that Deb and I have been thinking about this, there are modifiable risk factors that we can actually try to address, and then there are unmodifiable risk factors. The most common one I can think of is race/ethnicity. When I look at a pie chart of the population attributable fraction, it's more satisfying to me if all the pieces of the pie are something I can do something to address. If I see race/ethnicity as part of the pie of what's causing low birth weight, and we know it's not really causing it, it's really a risk marker more than anything, then I really can't think of an intervention to target it. So what we've been doing is just controlling for unmodifiable risk factors and including the modifiable risk factors in our risk system.

It's not always totally clear whether a variable is modifiable or unmodifiable. It really depends on your conceptual framework, where you're coming from and what you want to accomplish. One example of such a variable is maternal age. You may just want to control for the differences in maternal age within your sample. But you also might want to target teen pregnancy or poor outcomes due to teen pregnancy, in which case you might want to look at age as a modifiable risk factor, as a dichotomous variable: teen pregnancy versus pregnancy after the teen years.

So we're going to do a quick case study here, and I'll just touch on parts of it. We want to start working with three variables, building from Deb's approach with two variables. But I want to ask you some important questions to drive this analysis. The scenario is that you're asked to prioritize funding for interventions to target the high rate of low birth weight in your jurisdiction, and you have a data set with relatively reliable data on smoking during pregnancy, cocaine use during pregnancy and poverty level. Now, we all know that this reliable data set does not exist, so you know that this is contrived data. I'm using the same data set as Deb did so we can later compare the different methods and the different results they come up with. So I just wanted to quickly show you some descriptive statistics. We have 20 percent smoking in this sample, three percent cocaine use, and 42 percent under the federal poverty level, so it's a rather high-risk group. But we have seven percent low birth weight overall, and that's that 700 number that kept coming up in Deb's examples. That's the total number of cases we have in this sample out of 10,000.

Just quickly, I wanted to give you an idea of the component PAFs and all of the overlaps. This is just three variables, but you have the overlap between smoking and cocaine, the overlap among smoking, cocaine and poverty, and every other possible combination. So we're going to do the following data analysis to try to address that. I don't know if anybody remembers those Choose Your Own Adventure books, where you would get to the end of a segment and it would ask you where you wanted the story to go next, and it would tell you which page to go to. So we're going to do a choose-your-own-adventure data analysis, and I'm counting on you to raise your hands. There are two decisions I'm going to have you make: whether each risk factor is modifiable or unmodifiable, and what method we want to use to calculate the PAF. So can I get a show of hands: who would consider smoking a modifiable risk factor? Thank you. And there's no right or wrong answer to any of these; well, it depends on your perspective. Would you consider cocaine a modifiable risk factor? Show of hands. Okay. Those were pretty straightforward. Depending on your perspective, you might or might not consider poverty a modifiable risk factor. So can I have a show of hands of who would consider it modifiable? Oh, it's about half-and-half. Well, we'll go with modifiable because I like an optimistic crowd. We can challenge poverty. We can challenge poverty right here, right now, with this fake data set.

Because we're going to go with that example, I want you now in your handouts to skip to page 28, slide 55. You'll see that the slides we're skipping are the example considering poverty as unmodifiable, and you can go back and look at that later. And if you consider poverty a modifiable factor, you can just think of the variable in that example as race/ethnicity instead of poverty, where you're just controlling for it.

So I've included SAS code here, but I think I'm going to skip over a lot of it and just try to give you an idea of what we're trying to accomplish. Basically, I've used a combination of PROC FREQ and PROC GENMOD. GENMOD can be used to model relative risks in cross-sectional or cohort data, using the log link and the binomial distribution, and I'll let you get into that later. The first step here is to calculate the aggregate PAF for the modifiable exposures, and we've considered all three of our variables modifiable here. So I've created a new dichotomous variable, like when Deb collapsed the different levels into a dichotomous variable for any risk versus no risk. I've done that and called this variable Mod X. And here, I'm just looking at the frequency table of low birth weight versus Mod X.
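The collapsing step can be sketched in a few lines of Python. This is only an illustration of the idea, not the SAS code from the handout. The handful of records and the names `mod_x` and `lbw` are hypothetical.

```python
from collections import Counter

def any_modifiable_risk(smoking, cocaine, poverty):
    """Return 1 if at least one modifiable exposure is present, else 0."""
    return int(bool(smoking or cocaine or poverty))

# Hypothetical records for illustration only; these are not the workshop data.
records = [
    {"smoking": 1, "cocaine": 0, "poverty": 0, "lbw": 1},
    {"smoking": 0, "cocaine": 0, "poverty": 1, "lbw": 0},
    {"smoking": 0, "cocaine": 0, "poverty": 0, "lbw": 0},
    {"smoking": 1, "cocaine": 1, "poverty": 1, "lbw": 1},
    {"smoking": 0, "cocaine": 0, "poverty": 0, "lbw": 1},
]

# Add the collapsed "any risk versus no risk" indicator to each record.
for record in records:
    record["mod_x"] = any_modifiable_risk(
        record["smoking"], record["cocaine"], record["poverty"]
    )

# Frequency table of (mod_x, lbw) pairs, like a two-by-two table.
table = Counter((record["mod_x"], record["lbw"]) for record in records)
print(table)
```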

In the next slide, I run the very simple model of low birth weight equals Mod X. The goal here is just to get a relative risk out of this model. And in the next slide, you'll actually see the results. We have 700 total cases, as we keep seeing this number come up, and 525 of those cases have at least one of the risk factors mentioned. Then if you look down to the next table, these are SAS results from the ESTIMATE statement in PROC GENMOD, where the relative risk for this relationship is 2.56. So you see the familiar computation down there, where we take the proportion of all cases that are exposed and multiply it by the relative risk minus one over the relative risk, to get an aggregate PAF of .46. That just means that if we eliminated poverty, cocaine and smoking in this population, we would eliminate .46 of low birth weight.
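The arithmetic for the aggregate PAF can be checked with a short Python sketch. The function name `paf_from_cases` is made up for this illustration. The inputs are the figures quoted in the talk: 525 exposed cases, 700 total cases, and a relative risk of 2.56.

```python
def paf_from_cases(exposed_cases, total_cases, relative_risk):
    """PAF = (proportion of cases exposed) * (RR - 1) / RR."""
    proportion_of_cases_exposed = exposed_cases / total_cases
    return proportion_of_cases_exposed * (relative_risk - 1) / relative_risk

# Figures quoted in the talk for the any-risk versus no-risk comparison.
aggregate_paf = paf_from_cases(exposed_cases=525, total_cases=700, relative_risk=2.56)
print(round(aggregate_paf, 2))  # 0.46
```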

For the next part, we want to look at every possible sequence, and if you remember, with three variables there will be six possible sequences. The first part of that is looking at each factor removed from the risk system first. So first, we're going to look at smoking removed from the risk system first, and this is just a frequency table and a model that will help us get at that. You can see that the model gets significantly more challenging when you're looking at all three of these risk factors in a partitioned way. You have not only smoking, cocaine and poverty, but every second-level and third-level interaction term in the model. The nice thing is that you can use one model for all of these different interactions, no matter how many variables you have, because SAS has something called the ESTIMATE statement, which is a really nice feature that allows you to exponentiate the betas in a way that lets you calculate every stratum-specific relative risk for as many interaction terms as you have.

So I've just commented out here where the estimate statements would go in this code, and I'm going to show you the estimate statements in the next slides. And they might be a little overwhelming right now, but if you take them back and start using them maybe with your own data, you can see that you can specify every level of every variable in your model to get the appropriate stratum-specific relative risks. So you can see I've commented, smoking, where cocaine equals yes and poverty equals yes. So we're going to have four stratum-specific PAFs for smoking because there's two strata in each of the other two variables.

So here are the results. These are the stratum-specific prevalences. What this table is showing you is that of the 700 low birth weight cases, 200 are smokers. Well, we already knew that, but how do the smokers parse out by cocaine and poverty status? You'll see the colors on the slides; they match the colors on the following slides, which are from the model. So those are the four stratum-specific relative risks.

So then we're going to calculate the stratum-specific PAFs for smoking removed from the system first. As we saw in the equation that Deb showed earlier, we can create a summary PAF here, which is the sequential PAF for removing smoking from the risk system first, by adding all the stratum-specific PAFs, and that's .07. The following slides go through this for cocaine removed first and then for poverty removed first, so I'm just going to fly by because it's the exact same thing, only you'll see the prevalences have changed; we're looking only among cocaine users in that case. For cocaine, we end up with a sequential PAF, if removed first, of .10. And for poverty, we get .28.
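Adding up stratum-specific PAFs can be sketched as follows. This assumes that each stratum's PAF is the share of all cases that are exposed and fall in that stratum, multiplied by that stratum's (RR - 1) / RR. The function name and all numbers below are hypothetical and chosen for easy arithmetic; they are not the values from the slides.

```python
def summary_paf(strata, total_cases):
    """Sum stratum-specific PAFs.

    strata: list of (exposed_cases_in_stratum, stratum_relative_risk) pairs.
    total_cases: all cases in the sample, exposed or not.
    """
    total = 0.0
    for exposed_cases, relative_risk in strata:
        share_of_all_cases = exposed_cases / total_cases
        total += share_of_all_cases * (relative_risk - 1) / relative_risk
    return total

# Hypothetical strata for one exposure, split by two other dichotomous factors
# (2 x 2 = 4 strata). These counts and relative risks are made up.
hypothetical_strata = [(40, 2.0), (30, 1.5), (20, 1.25), (10, 1.0)]
hypothetical_summary = summary_paf(hypothetical_strata, total_cases=1000)
print(round(hypothetical_summary, 3))  # 0.034
```

A stratum with a relative risk of 1.0 contributes nothing to the sum, which is why the last stratum in this made-up example adds zero.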

So we're going to be on slide 73 for this next one. We also need to calculate the sequential PAF for each of these factors removed second from the risk system, and then for each of these factors removed third. In order to do that, we need these sub-aggregate PAFs. That's what I'm calling them right now for lack of a better word. Basically, we want the PAF for smoking and cocaine before we've gotten poverty out of the risk system. Whether cocaine came first or smoking came first, we need that aggregate PAF for smoking and cocaine, controlling for poverty, in order to know what poverty is going to be when it's removed third. And if that's a little complicated, please ask me questions later. But anyway, these are the six possible sequences I'm going to go through now. Again, I'll show you two of them and then you can get the idea and take a look for yourself. In the slides on the screen, yellow is always going to be smoking, blue is going to be cocaine and red is going to be poverty.

So in sequence one, we remove smoking, then cocaine, then poverty. As we saw before, we already calculated the .07 for smoking. That's when it's removed from the risk system first, controlling for the other factors. But then we want to subtract the .07 from the aggregate of smoking and cocaine together, controlling for poverty, since we want to know the effect of cocaine after already removing smoking from the risk system. So you can see in PAF sequence 1B, you're just removing cocaine from the risk system after removing smoking, controlling for poverty the whole time, and that becomes .08. Then for poverty, after you've removed both smoking and cocaine, the portion of the pie that's left is .31. This notation STP here is the same as our aggregate PAF, which we mentioned was .46 for the entire system. So we know there's .46 in the whole system, and what are we going to have left for poverty once we've already eliminated smoking and cocaine? So here are the pie charts for these sequences, and what's interesting here is that it doesn't matter if you remove cocaine or poverty from the risk system first; they actually add up to the same in sequence one and sequence two, but that's not always the case.
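The subtraction for sequence one can be written out as a small Python sketch. The function name is made up for this illustration. The .07 and .46 are the values quoted for this sequence; the .15 is the smoking-plus-cocaine aggregate controlling for poverty, which is the figure reported on the summary slide and also equals .07 plus .08.

```python
def sequential_pafs(cumulative_pafs):
    """Turn cumulative (sub-aggregate) PAFs into one PAF per removal step."""
    steps = []
    previous = 0.0
    for cumulative in cumulative_pafs:
        steps.append(cumulative - previous)
        previous = cumulative
    return steps

# Sequence one: remove smoking, then cocaine, then poverty.
# Cumulative PAFs after each removal: smoking alone, smoking + cocaine, all three.
sequence_one = sequential_pafs([0.07, 0.15, 0.46])
print([round(step, 2) for step in sequence_one])  # [0.07, 0.08, 0.31]
```

By construction the steps always sum back to the aggregate PAF for the whole system, which is what makes the pieces of the pie mutually exclusive.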

So I'm going to fly by the third, fourth, fifth and sixth sequences and get to the average PAF on slide 80. You can go back and look at that later. Now, the interesting thing is that we do have six sequences, but for each risk factor we actually only have four sequential PAFs to average. And the reason for that is because you saw in the sequences that I showed you, one and two, smoking was removed first both times. And then there's also going to be other sequences where smoking is removed third, twice. So we're actually only going to come up with four unique sequential PAFs because of that overlap there. And if there's questions about that, please ask me later too. So we're going to average, and when you take the slides home later and look at them, you can see exactly which sequences I've taken these PAFs from, and I'm going to average the four PAFs for each risk factor. And I get .08 for smoking, .09 for cocaine and .29 for poverty. Now, these are the average PAFs for the six possible sequences.
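One way to see why six sequences give only four unique sequential PAFs per factor is to list, for each sequence, which factors were removed before the factor of interest. The sketch below assumes that a factor's sequential PAF depends only on that set of earlier removals, not on their order. The function name is made up for this illustration.

```python
from collections import Counter
from itertools import permutations

factors = ("smoking", "cocaine", "poverty")

def predecessor_counts(target, factors):
    """Count how often each set of earlier-removed factors precedes the target."""
    counts = Counter()
    for sequence in permutations(factors):
        removed_before = frozenset(sequence[:sequence.index(target)])
        counts[removed_before] += 1
    return counts

smoking_counts = predecessor_counts("smoking", factors)
for removed_before, n in smoking_counts.items():
    print(sorted(removed_before), n)
```

For smoking, this gives four distinct cases: removed first (in two sequences), removed second after cocaine (one), removed second after poverty (one), and removed third (two). Averaging over all six sequences is then the same as a weighted average of those four values with weights 2, 1, 1 and 2.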

And this summary slide just sums up what would have happened if we had handled this differently. The first pie shows the example that Deb showed, and this is before poverty was even in our data set. In the second pie, you'll see the results if you had chosen poverty as unmodifiable, which some of you did, and you can go back and look and see how that was calculated. Basically, we're not including poverty in our pie if we just want to control for it. You can see the aggregate PAF actually gets depressed by one percentage point, from the .16 that Deb was presenting to .15, when we control for poverty but don't consider it a modifiable risk factor. And that's what we'd usually expect when we control for something: the measure of effect gets depressed. And then in the third example, I don't know if you can see it in the red, the third component is poverty, and that's been added in. So the aggregate PAF for the example that we just went through is .46.

Just in summary, these partitioning methods allow for precise estimation of the population attributable fraction while controlling for any other factors you want to look at. They also give you mutually exclusive estimates, which make it possible to compare the potential impact of intervention strategies across factors. So you can take a look at all three of these risk factors in any sequence in which they may be eliminated and then see their potential impact.

So the last slide is just to let you know that these are not merely our own ideas; we've synthesized a lot of previous work to put this workshop together. These are all further reading if you're interested in this topic. Thank you.