MCHB Conference Webcasts
Using Geographic Information System (GIS) to Analyze MCH EPI Data

MCHB/EPI Miami Training — December 5 - 6, 2005

Q & A and Closing Remarks — Transcript

 

RAVI SHARMA: All right. I think everybody is done with the exercises. So why don't we go on to a little bit of question and answer. So Russ and I and Diane and Z are here. So I understand, let's see if you have any questions on, people have different questions that they were asking Russ. Shall we — grab the mic.

UNKNOWN SPEAKER: How do you bring a new data set into Arc?

RAVI SHARMA: I think the question was can you, if you have health service areas.

RUSSELL KIRBY: One of the questions that was raised, which we didn't have a specific exercise about was let's say you've got data by county, okay? And your health department has created health districts. And say, for example, your state has seven health districts and they're aggregates of counties. But how do you use Arc to define those districts and then pull the data together so that — without having to recalculate everything individually that it will do that calculation. So that was the first question. So do you want to — I think Ravi has a way that he can actually show you how to do this.

RAVI SHARMA: I was trying to see if I can find an assignment here. An assignment we can use your data set.

Diane: I can use anything that has a polygon.

RAVI SHARMA: Let's just use —

UNKNOWN SPEAKER: Pennsylvania County.

RAVI SHARMA: Okay. I think it's F —

UNKNOWN SPEAKER: (Inaudible) I think we can use — why is it not visible here?

UNKNOWN SPEAKER: Is that the state?

RAVI SHARMA: It's for New Jersey.

There are two ways to do this. Let's assume you all work in different states and different states have these what we call, for example in Pennsylvania we have what's known as the Health Service Areas, am I right? And these are comprised of different counties. Like, for example, the Southwestern Pennsylvania Health Service Area is made up of Allegheny, Westmoreland, Washington County. I think Indiana and a few more. It's a collection of about ten counties. The question is what if you want to create a polygon, you don't want separate counties, right? You want all those counties merged. So one way to do that is for you to create another attribute file. A variable. An attribute — an attribute file in which you will have a variable that would specify, for example, if there are ten service, health service areas in Pennsylvania. Service area one would be made up of Allegheny, Pittsburgh, you know, whatever, the ten counties. It will have an attribute value of one.

And the second health service area that you might create might have an attribute value of three — two. Three, four, five. Until you exhaust all the different, you know, the whole set of health service area. And that would be a column of values that you will have. You can create that. Either you can create it in — I was trying to see if I can do this — we can create it both in ArcGIS by making the table editable. And when it's editable, you can add values. Otherwise you have to go into, you can do that in Excel and then bring the table back into ArcGIS. The other way to do that is you can use the feature, it's called — you just click the counties you want to merge and you can simply use a union command to unite all the four counties together by simply clicking on the four counties. So that's another way to do it.

So that's the geoprocessing function. You can use clipping. I think we use one feature, we use buffer. You can use a clip feature. The clip allows you to use a much larger geographic data set. Let's say you have a data set for Pennsylvania, 67 counties, and you're working with ten counties. So what you can do is you have two layers. Your input layer is the Pennsylvania county layer. And your clip layer is the ten counties you're going to work with as your working file. And you can use that to clip your bigger file and get your smaller file. So that's a clipping. The other function that you can use in GIS is called union. It simply unites the different polygons together in — can't do it. It's taking a long time.

So that's really the answer in a nutshell is you can do it using either — you can either by developing a numerical code, which will be another attribute variable list in your attribute file that will give the numerical number for coding scheme. So you essentially what you do is develop a coding scheme. If you have 67 counties and they're grouped into ten health service areas, you will have ten different values for a variable called health service. HSA, let's say, and then you use that command. Then one, two, three, four can then be used to merge the counties. And actually it's a very simple function in ArcGIS to merge the counties together.
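The coding-scheme idea described above can be sketched outside ArcGIS in plain Python. This is only an illustration under stated assumptions: the county-to-HSA assignments and the counts are made up, and the function name `aggregate_by_hsa` is hypothetical. It shows the table side of the operation (summing county values within each HSA code); merging the polygons themselves is left to the GIS.

```python
from collections import defaultdict

# Hypothetical coding scheme: county name -> health service area (HSA) code.
COUNTY_TO_HSA = {
    "Allegheny": 1,
    "Westmoreland": 1,
    "Washington": 1,
    "Philadelphia": 2,
    "Delaware": 2,
}

# Hypothetical county counts: (births, low-birth-weight births).
COUNTY_COUNTS = {
    "Allegheny": (1000, 80),
    "Westmoreland": (300, 21),
    "Washington": (200, 15),
    "Philadelphia": (2000, 190),
    "Delaware": (500, 40),
}


def aggregate_by_hsa(counts, lookup):
    """Sum county counts within each HSA code."""
    totals = defaultdict(lambda: [0, 0])
    for county, (births, lbw) in counts.items():
        hsa = lookup[county]  # raises KeyError if a county has no HSA code
        totals[hsa][0] += births
        totals[hsa][1] += lbw
    return {hsa: (births, lbw) for hsa, (births, lbw) in totals.items()}


hsa_totals = aggregate_by_hsa(COUNTY_COUNTS, COUNTY_TO_HSA)
for hsa, (births, lbw) in sorted(hsa_totals.items()):
    # Rates are recomputed from the summed counts, not averaged across counties.
    print(hsa, births, lbw, round(100 * lbw / births, 1))
```

Because every county must appear in the lookup, a county that was left out of the coding scheme raises an error instead of silently dropping out of its service area.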

UNKNOWN SPEAKER: What else are we running?

RAVI SHARMA: It takes a long time. I have a lot of junk on my laptop, that's why.

UNKNOWN SPEAKER: Can you use it for later use?

RAVI SHARMA: Yes. Now, one thing you have to remember, whenever you join anything to your map, to your shape file that is a temporary join. Okay? I don't know whether you realize that. Whenever you join an attribute file. For example, in an exercise, the exercise here to join, right? If you want to make that permanent, you have to export it as a DBF file, or you have to, so there's two ways. You export the table as a DBF file. Or you can go to the layer command, right click, and export it as a shape file.

So everything that you joined is now permanent.

So two ways to do it. Two ways to make your joins permanent. One, export it as a DBF file in your attribute file. Options. Drop down. Export. And export to the DBF file. The second, if you want to shape, save it as a shape file, go to the table of contents. The layer that where you join the files, right click and export. That will be exported as a shape file with the maps and all the attributes joined together.

Go to the — can you go to the shape file on the main — on the desktop?

UNKNOWN SPEAKER: The what?

RAVI SHARMA: Desktop.

I was going to go to the desktop. Let's see. Actually, let me just see if I can show some of these commands to you. Otherwise — do you see this analysis tool here? This analysis tool actually is a very useful set of tools. It has extract tools, overlay tools, the proximity tools, which you used, and then there are different statistics. So if you click on extract, extract has a clipping function, a select function and a table function. The select and the table function is very similar to what you use in attribute. If you go to overlay you see the intersect and the union. Can you click on the union?

The union allows you to combine — so with the union, you see the drop-down box here join attributes. Here when you create a new attribute that says Health System, with an attribute value from 1 to 10, if you specify that and the attribute, in the attribute file it will automatically take those 67 counties and create, merge the polygons to create 10 reduced number of polygons for you.

So that's the second way to create and that to me is a better way to create an area that you might be interested in in creating the health service areas using the union command. Okay?

UNKNOWN SPEAKER: Why is it better?

RAVI SHARMA: For one thing, you are doing it by looking at all the counties with you know giving them numerical order. When you do point and click you might miss them and the trouble is once you create a new polygon, you really cannot dissolve it easily. Sorry, recreate it, go back to the original very easily. So the best thing to do is to make sure that you create, don't use point and click because most of the time when I point and click sometimes I miss a county. And you end up creating an output you're not interested in. So the best thing to do is to first work with an Excel spreadsheet or SPSS or SAS, with a DBF file. You already realize the dBASE file that's part of a shape file can be read in Excel, right?

For example, let's see if I can find here (inaudible) where are the exercises? That doesn't make sense. This is nuts.

I apologize. I am unable to get to my — okay, where are we?

Let's go up there, yeah.

UNKNOWN SPEAKER: You need to change this.

RAVI SHARMA: No. That should be okay, because it's — okay. Let's just go up here. Let's see if we can go to workshop.

UNKNOWN SPEAKER: Maybe you can — do you have your (inaudible).

RAVI SHARMA: This one doesn't even have the data sets. It doesn't have any of the data set we're looking at. So let's add the workshop folder.

Let me see if I can get to my —

UNKNOWN SPEAKER: (Inaudible).

RAVI SHARMA: Why is it going to a different — I don't understand why it goes to a different —

UNKNOWN SPEAKER: You should have it set.

RAVI SHARMA: Finally.

UNKNOWN SPEAKER: (Inaudible).

RAVI SHARMA: This is not even my folder anymore. Something is wrong.

UNKNOWN SPEAKER: Use mine.

RAVI SHARMA: We'll figure this problem out. We don't want to spend too much time on it. But the short answer is we can do it so maybe we'll give you a demo of why you would unite all the different polygons together, okay? So let's go on forward.

Anymore questions? Otherwise, we will — Russ did you say there were more questions or —

UNKNOWN SPEAKER: Hi. I am Lacota cruise from New Jersey. I have a question about distributing GIS data, and if you have any recommendations or anecdotes about distributing projects that have been done in GIS. Maps are nice. They're very hard to distribute.

RUSSELL KIRBY: Right. What you're asking about really is a cutting edge area in terms of this. And the whole issue in terms of distributing the results of your GIS — there's a couple of different things. One is you could actually create a GIS application that could be distributed across the web such that a person, you know, in another state or somewhere outside of your agency, could actually log into the application and you know create a map that they wish to see. Or you could create an application that has a series of already created maps that could be potentially served on the web. And there are actually — we could actually spend our whole day talking about this, because there are a variety of different models that have been developed that enable basically serving maps and Atlases across the web and there are about four different very specific kinds of arrangements that you can use in terms of your hardware or your servers and whether the end user is in client server mode or logging on directly from a browser. But there's a variety of different issues around that. And I don't know that I'd want to actually make a specific recommendation about the best way to do that, because I think the technology is still evolving in terms of that.

Diane, what do you guys do in terms of that?

Diane: Well, we — we serve things over the web that for the most part we only want to look at choropleth maps that's mostly what we distribute. We do several publications which contain mostly choropleth maps. So I'm not sure what you're interested in doing as far as distributing data. If it's just the zoom in and out function, we've actually found that PDF documents work really well. If you have a statewide map with county names but the county names aren't particularly clear, you can export the PDF and they can still zoom in and read what county it is.

But as far as like having functions where you can click on the identify tool and get tables up behind your map or look at the information, I haven't really found a need to do that yet. But there are things for the Internet which do include another piece of software, which is ESRI ArcIMS, the Internet Map Server. Other than that, you could — I'm not sure where they're going with it now but they used to have ArcExplorer where you could bundle up some data and ArcExplorer together which is freeware and you had limited function. You couldn't edit anything but you could distribute data for people to view it. Now they have a new product called ArcVoyager which they distribute to teachers or recommend it for education purposes. But I haven't even used that myself yet. So I'm not sure what functionality that has and what their future of ArcExplorer, ArcVoyager, the combined tool will be in the future.

RAVI SHARMA: That's good. The only thing I can add is there are some very specific special — a number of different — I see problems that one needs to first take into consideration. One is privacy. You know the HIPAA regulation, making sure all the concerns are met and that we make sure that there's no statistical disclosure problem with your data set. And if we do so, there is some interesting applications that serve. Actually you can create these maps on the fly using vital statistic data. You may want to look at the State of Washington. They have one. It's called Epi. Geo Epi.

UNKNOWN SPEAKER: (Inaudible).

RAVI SHARMA: Epi QMS, right. Very interesting application, because you can actually first determine what database and you have limited control over variables like, for example, you can't do too many cross tabulations but if you're interested for example in number of low birth weights by counties. You can create a map. It will produce for you confidence intervals and accessible to all users anywhere. That actually is a really good — and I know the state of Pennsylvania uses it, too. It's called Epi QMS, right?

RUSSELL KIRBY: Yeah. How many of you are — maybe you don't know the answer to this. But a lot of the states are in the community health assessment program that CDC has. I know Washington is one of the states that's in that. But there's probably about 15 or 20 states that are in that. And most of the states that are in that program have been experimenting in a variety of ways with making data more accessible at the community level. I would have to say that, you know, in looking at these, the applications that we have to date, it's actually more interesting to view each of them that you see with a high degree of skepticism and think about some of the issues that might be involved in terms of doing this test of making data more accessible, because I don't think anybody has really got a complete handle on all these issues yet. And by studying three or four of them you'll be able to see different approaches and perhaps come away with some ideas that might be useful in developing one yourself. But I think the science is really not completely there yet.

UNKNOWN SPEAKER: Last night a very simple question —

RUSSELL KIRBY: I remembered another one — go ahead.

UNKNOWN SPEAKER: This is a real simple question that I had a little ways back. Once we have made these beautiful maps and it's labeled and so how do we — how do we get it into a Word document, for example, as an image of some kind?

RAVI SHARMA: Like a —

RUSSELL KIRBY: You made a map.

UNKNOWN SPEAKER: You made a map that's as pretty. His is so pretty over there. That blue one with the toxic river and what have you.

RUSSELL KIRBY: It is nice.

UNKNOWN SPEAKER: How do we save it and make it —

RAVI SHARMA: So what you can do is, as you know, when you create a map, I just want to add a map so I can talk with something, otherwise it sounds so much (inaudible) so I'm going to find something here I can add.

The new version of ArcGIS 9.1 has actually very good features for creating beautiful layouts and then what you want to do is when you want to actually create — so what you will do, you want to go to — you want to go to view and you want to create a layout view.

So a layout view will look something like that, when you actually — you can print this or you can export it as a JPEG. Actually, there are different formats to export it. But first you want to create a pretty map. You want to create a layout and then you want to insert a title. So you will go to insert, title, and then you will say PA — I'm going to put something here LBW Map. Okay.

And I want to — let me just put something in here so it doesn't look nice at the moment. So I'm going to add some data to the map. Oh, man. Russ, we may have to get this poison control working tomorrow otherwise —

Still waiting for that to right click. And then what you can do is you know from file — once you have a map, you can insert a legend so it tells you exactly what the map is showing, the legend can be inserted. Then you go to file, export. And you figure out how you want to export your file, what format. One of the format is JPEG. JPEG is actually a nice format.

This is going to take another few seconds. Geez. The JPEG is a good format. You can export it. Save it somewhere and then as you know in your PowerPoint you open your PowerPoint presentation and you insert and go to quantities here and I'm going to pick a variable here to map. This is really interesting. That's really interesting here.

UNKNOWN SPEAKER: (Inaudible).

RAVI SHARMA: If it was data that — you can see it go from two to —

UNKNOWN SPEAKER: (Inaudible).

RAVI SHARMA: So did this one. So I'm kind of a little surprised here. This one doesn't. Okay. All right. Then what you want to do is you want to insert a legend. And here is your legend. Now that you have a nice map, you can even insert, if you like, a north arrow and pick either — any one that is interesting. This sounds good. Okay. And then you can export this. Export map. And the drop-down here, I usually use JPEG but you can use TIF. The TIF files are very high resolution, but they also consume a lot of space. GIF files are smaller, but every time you open and close you lose data. So try the JPEG.

Give it a title. And figure out where you want to save it. And save the file. So that's the way you would save it. It's saved as a JPEG file and you can insert it in your PowerPoint presentation. Okay?

RUSSELL KIRBY: I remembered another question that somebody asked. It just took a while. I suffer from a condition it's called PSMD. Premature senior moment disorder. So I'm allowed to forget things for a while. But somebody asked me, okay, so we've got data, say, by county, and our state has a rule that we can't provide statistics when the number of events is less than N. You know, three or five or whatever it is. So here we've got this database, which is based on individual records. And now we're making this map by county where we classified the data. How do we use Arc or can we use Arc to build in a suppression rule that will make it so that whatever our rule is that it will be applied to the map. So that was the question that was asked.

RAVI SHARMA: ArcGIS, the VB script, you know in the VB script you can write a script that says if — so it's very similar — if rate or whatever is less than 10, the number of — if birth, variable birth is less than 10, you can select them out. And what you do then is you actually end up creating another map in which there are no counties with less than 10 events. So the rule of thumb is less than 10. So that's one way to do that. I can think of off the top of my head. I normally — what I normally do is I'm trying to see if I've ever suppressed data. I'm just opposite. I try not to suppress data.

RUSSELL KIRBY: The thing is a lot of times in state and local health departments, whether they make sense or not, there are rules that have been established that have to be followed.

RAVI SHARMA: Right. The state of Pennsylvania is less than 10. Actually, they won't even calculate a rate with less than 10. So what you do is essentially create in ArcGIS you can do that, less than 10. You simply suppress that and you can then — what it does, it actually or another way to do that, by the way, if it's less than 10, you can insert NA.
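The "less than 10, insert NA" rule can be written down as a short Python sketch. It is not the VB script mentioned in the session; the function name `suppressed_rate`, the default threshold of 10, and the example counts are assumptions chosen for illustration, and each agency would substitute its own rule.

```python
def suppressed_rate(events, population, threshold=10, per=100):
    """Return a rate per `per` population, or "NA" when the event count
    falls below the suppression threshold (assumed here to be 10)."""
    if events < threshold:
        return "NA"  # suppressed: too few events to release a rate
    return round(events / population * per, 1)


# Hypothetical counties: (low-birth-weight births, total births).
counties = {
    "County A": (8, 400),
    "County B": (25, 500),
}

for name, (lbw, births) in counties.items():
    print(name, suppressed_rate(lbw, births))
```

Returning a marker such as "NA" keeps the county in the table, so the map legend can show it as a separate "suppressed" class instead of leaving a hole.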

RUSSELL KIRBY: Okay. What they're asking is, would it be difficult for you to show how that would actually work?

RAVI SHARMA: Which one?

RUSSELL KIRBY: The VB script, just so —

RAVI SHARMA: If I can find the data.

RUSSELL KIRBY: Visual Basic script. VB script. It's one of those attachments if you get it emailed to you it will probably (inaudible).

RAVI SHARMA: So the open — . Open that attribute file. So let's see here if I can find — so what I go to, I go to select by attribute and what I need to do first is select, let's see, births. Less than or equal to — you make your own rule. Less than or equal to 10. Okay. That doesn't sound — look like 10. Less than or equal to 10. Okay. And do I need to do anything else? So I'm selecting all births that are less than or equal to 10. Apply. Let me see what happened here. Let's do this again.

Select by attribute. So drop down to — ah. That's what happened. Less than or equal to — and by the way, you can, if you click on this one up here, this will give you unique values. Now, I am looking at here, I don't see —

UNKNOWN SPEAKER: If you select —

RAVI SHARMA: Let's select LBW. I don't see anything less than —

UNKNOWN SPEAKER: Say less than 50.

RAVI SHARMA: Yeah — no, let's look at this LBW here. Now you can see quite a few that are less than. So I'm going to select those less than — okay. And another thing I can do here is when you select less than, you can see the counties are automatically, the ones in blue, these are the counties with less than 10 number of low birth weight babies. So that's the first thing that happens when you run the selection by attribute. The next thing you can do is you can — let's see here. I can use the switch selection here. You see what the switch selection has done? The switch selection command switches around and picks those that were not less than 10, right? So now I can save that as a shape file. So — or another way to do it is I'm just going to show you different techniques. I don't know which one will work for you. Once I do that, I can now go to options command and I can export this. I can export that as a DBF file. So here's the DBF file. So that's one thing I can do. You can change the name here and you can change where you want to put it.

Another thing I can do, I can go to the layer, click on the layer that is selected and export that data. This will be a shape file while the first procedure is exporting a DBF file. So when you export a DBF file it's just a table. But if you want the shape file with the table and the whole works, you need to export it and it will do that.

And you can add it to your — so here we go. So you see the holes? So these now they are holes for — when you actually — and what I can do now and another thing you need to worry about, whenever you do selection, you need to go and unselect. Otherwise it will create problems for you. So you go to selection. Clear selection. And now what we can do here is we can map. Now we can map the low birth weights. Sorry, for the low birth weight for what year was it?

RUSSELL KIRBY: 1990 or 19 — (inaudible).

RAVI SHARMA: So this way what it does it excludes it. So any county with less than 10. Now, the other way to do that is simply wherever the number is less than 10, you put a — give it a number. 9999 and then make sure the 999 is in the legend appears as not —

RUSSELL KIRBY: You can have the legend reflect — (inaudible).

RAVI SHARMA: There's different ways to do this. Another way to do that is you simply do not release data with less than 10.

Diane: Or one thing we'd actually put an asterisk in the county and then put a note, footnote on the map that these rates may be unstable because they're based on small numbers.

RAVI SHARMA: Right. You can do that. That's like putting a 999 or whatever —

Diane: You may still want to see where the rate falls.

RAVI SHARMA: You can certainly do that.

Okay. I think we need — shall we go forward? I can certainly do the — I don't know which one it was. Yeah, here we go. So you see the holes here. I'm not sure what happened to these other counties.

RUSSELL KIRBY: I think you're getting some degrading of the process.

RAVI SHARMA: Absolutely.

RUSSELL KIRBY: The point is that there's a great deal of flexibility that you have for both selecting particular map features that you want to include in your map and as well you can minute in terms of the data. Personally, I'm not a great proponent of, particularly when you're doing county-level aggregates, of applying suppression rules, because I mean I think the more important thing is to show the people the amount of variability or the precision of the estimate is probably more important than just not providing the estimate. But, again, each state has its own policies that have been developed. And you know you have to follow what the rules are.

RAVI SHARMA: Tomorrow what we'll do is we'll look at the empirical Bayes procedure for smoothing data so if you're less than 10 what you do is borrow strength from neighbors so you have spatial smoothing or you borrow strength from counties with larger numbers. So empirical Bayes smoothing, strictly statistical, or you have spatial empirical Bayes which borrows both from the distribution of the county for the state as a whole but also distributions from your neighboring counties to enhance the stability of your data. So you can do both.
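As a rough illustration of "borrowing strength", here is a Python sketch of one common non-spatial empirical Bayes smoother, a method-of-moments form in which each county's raw rate is pulled toward the statewide rate, with small counties pulled harder. This is an assumption about the general technique, not the procedure taught in the workshop; the function name `eb_smooth` and the data are made up, and the spatial variant (which uses neighboring counties as the reference) is not shown.

```python
def eb_smooth(events, populations):
    """Global empirical Bayes smoothing of rates (method of moments).

    Returns (raw_rates, smoothed_rates, overall_rate).
    """
    total_pop = sum(populations)
    overall = sum(events) / total_pop  # statewide rate (prior mean)
    raw = [e / p for e, p in zip(events, populations)]

    # Population-weighted variance of raw rates around the overall rate.
    s2 = sum(p * (r - overall) ** 2 for r, p in zip(raw, populations)) / total_pop
    mean_pop = total_pop / len(populations)

    # Estimated between-area variance; truncated at zero if negative.
    a = max(s2 - overall / mean_pop, 0.0)

    smoothed = []
    for r, p in zip(raw, populations):
        # Weight on the county's own rate: near 1 for large populations,
        # near 0 for small ones.
        w = a / (a + overall / p) if a > 0 else 0.0
        smoothed.append(overall + w * (r - overall))
    return raw, smoothed, overall


# Hypothetical data: one very small county and three large ones.
events = [3, 80, 120, 40]
populations = [20, 1000, 1000, 1000]
raw, smoothed, overall = eb_smooth(events, populations)
for r, s in zip(raw, smoothed):
    print(round(r, 4), round(s, 4))
```

In this made-up example the small county's rate moves most of the way toward the statewide rate, while the large counties' rates barely change.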

UNKNOWN SPEAKER: I do have another question that relates to creating figure legends. When you do that, perhaps you know I've noticed I've gotten frustrated sometimes because I've wanted to go in and just edit slightly what the figure legends say you know like 1 to 10. I'd like it to say 1 to 10 people or something. And I have to do that with text boxes and it really bugs me. Do you know of another way in which I can go in there?

RAVI SHARMA: I'm going to have Diane answer.

Diane: You have to do it with text boxes. Either in your symbology or —

RUSSELL KIRBY: (Inaudible).

Diane: Either in your symbology window do it with either typing in your labels or you can type in directly on your symbology form on the TOC as you see it.

RAVI SHARMA: So you wanted some, like a free form way to do that.

UNKNOWN SPEAKER: Yes. Edit that range, something.

RAVI SHARMA: Yeah. So maybe we can — Diane is very close to ESRI, Chapel Hill, maybe you can make a suggestion for next version, that the user —

Diane: In the next version you'll be able to directly add an Excel table, 9.2. Won't that be great? It's supposed to come out in the spring.

RAVI SHARMA: Yeah, but I don't — as a really good heavy user, I just don't like working with Excel, because it's so unstable when you start adding data and then linking it to Excel. I had so much problem, when you tried to figure out — because when you link you have to have a text and creating text sometimes doesn't work well in — as you know to create text in Excel, you can go to (inaudible) say text. But also concatenate, you will see here, it's good. Most people use Excel. It will be great if they can have added functionality with Excel. That would be terrific. But you will see here in a few minutes I'm going to show you in ArcGIS we can do the same thing using the concatenate function.

UNKNOWN SPEAKER: Excel (inaudible).

RAVI SHARMA: What's that?

UNKNOWN SPEAKER: Excel is — (inaudible).

RAVI SHARMA: Number of rows, the limitation. How much, 20, yeah, there are limitations.

All right. Further questions? I would like some questions to be directed to Dr. Kirby.

RUSSELL KIRBY: And he's getting off really easy now. We need some questions for him, too.

Did you have a question?

UNKNOWN SPEAKER: Not a question, just a comment. Previous item which is probably self-evident in terms of small numbers and the statistical ramifications of that. There's also the data privacy aspect of it, which many of us are very sensitive to in which there are a lot of laws about when you're dealing with things like mental health, mental illness, teen pregnancy and so forth and you might be able to locate people in a particular county if there was only one or two.

RAVI SHARMA: Absolutely.

UNKNOWN SPEAKER: So that's an important factor which probably everybody knows about.

RAVI SHARMA: Yeah. That's a huge concern. Absolutely.

RUSSELL KIRBY: So there was a question just asked a minute ago, which I guess we need to confer about. But that was are we going to actually show the exercise for how you draw down the census data and what was our plan for how we were going to do that?

RAVI SHARMA: We are going to go — it's 15 after. I'm going to zip through this and if you find that your heads are swirling, you know, put your hands up, tell me to stop, okay, I'll do that.

RUSSELL KIRBY: And the recommendation that's made to the slides that are in the binders are printed four to the page. And some of them have a great — you need a magnifying glass to read them. So when this gets posted on the web we need to make sure they're available so you can actually read everything that's on them.

RAVI SHARMA: We will redo those and we actually have e-mail listings for all. We'll send you updated versions of those, you know, without any problem. So don't worry.

Okay. So shall we go on to the next? Are we ready, Russ? Okay.

What I would like to show in the next 45 minutes or so, I can't have you work along with me because it will take terribly long, because some of these data sets, as you know, census data sets are huge. So here is what I would like to demonstrate.

If you are working with one or two counties, you can simply use the American FactFinder and just download data at whatever level of geography you like for a single county. But most of you work at state level and you're interested in all the counties that are part of your state. Some counties have, you know, some states have a lot of different counties. Pennsylvania has 67 counties. And therefore it becomes very messy and very time-consuming to use the American FactFinder and just download them one county at a time. So the FTP process that has been standardized by the Bureau of the Census is actually, works very well. It's a little tricky to use it. But you know it's not beyond — if I can do it you can do it. But it's a little tricky because the steps that are involved are, first of all, you need to know a little bit about the census geography. And as you know, the census geography is very — we use what's called a census hierarchy. And hierarchy imposes very strict relationships between aggregations at different levels. So, for example, the census blocks make up the census, the blocks make up block groups, which make up census tracts, which make up counties, which make up states, which make up the entire U.S. So that's the census hierarchy. You need to understand and appreciate the census hierarchy. And you will all at some point or other be using census data, because that's where the socioeconomic demographic data to characterize your communities in terms of socioeconomic, poverty, economic, education, employment, that's where it comes from and you need to be able to download all of this data and link it to your map file.

The added problem is that when you download the data for the census tract, it gives you the state, the county and the tract, the FIPS code but it doesn't put them together. And the census tracts are not unique in the sense, the only way to make them unique is to create a FIPS code. And a FIPS code for a county tract is made up of FIPS code for the county, for the state, for the county and for the tract. Joined together.
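Joining the pieces into one unique tract identifier can be sketched in Python as follows. The sketch assumes the usual layout of a 2-digit state code, a 3-digit county code, and a 6-digit tract code, concatenated into an 11-character string; the function name `tract_fips` is hypothetical, and the tract numbers in the example are made up.

```python
def tract_fips(state, county, tract):
    """Build an 11-character tract identifier: state (2) + county (3) + tract (6).

    Each part may be an int or a string of digits; parts are zero-padded
    on the left so that leading zeros lost in a spreadsheet are restored.
    """
    parts = []
    for value, width in ((state, 2), (county, 3), (tract, 6)):
        text = str(value).strip().zfill(width)
        if not text.isdigit() or len(text) != width:
            raise ValueError(f"cannot use {value!r} as a {width}-digit code")
        parts.append(text)
    return "".join(parts)


# New Jersey is state 34 and Camden County is county 007;
# the tract number below is made up for illustration.
print(tract_fips(34, 7, 600100))
# Pennsylvania is state 42 and Allegheny County is county 003.
print(tract_fips("42", "003", "101"))
```

Keeping the result as text matters when the table is later joined to the shape file: if either side stores the code as a number, leading zeros are dropped and the join fails for those records.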

So you need to be able to figure those out. So those are some of the problems. So what I'm going to do is show you — it's detailed enough so that you know where to go and look for resources. I thought we lost it. So here are the prerequisites. You need to go before you begin downloading, I strongly recommend you go to the Census 2000 summary file three ASCII text data file. This explains in considerable detail about, let me see if I can click this for you. This will provide you with all the information that you need to figure out which variables you want to download from which of the several different data sets and then you will be able to configure the Census Bureau has done actually a wonderful job. It provides us with a template, a Microsoft Access template for all the different tables that it has created. And these are dummy tables. And all we need to do is we need to populate these dummy tables with the data set for a state and a geography of our choice. So my — since I've been talking about Pennsylvania , I'm going to change my tune. I'm going to talk about a neighboring county, New Jersey .

So anybody from New Jersey? Oh, you're from — you live in New Jersey and you work in Philadelphia.

UNKNOWN SPEAKER: (Inaudible).

RAVI SHARMA: Cherry Hill?

UNKNOWN SPEAKER: (Inaudible).

RAVI SHARMA: Okay. All right. So well you know the neighboring — so what we're going to do is we're going to talk about — so I'm going to use New Jersey and what I'm going to do is download just one variable. But that will give you some idea. I'm going to download the median household income by census tract for New Jersey, link it to the New Jersey shapefile, and show you the process of putting those two together. So this is the file you need to look at. And it says by the way you can click for live help between 10 and noon. There are actually people on duty, and from 2 to 4. You can do that. And you have several different choices. You can use Microsoft Access as a way to download your data set or you can use SAS, or you can use some other database program. I'm going to use Microsoft Access. But if you are used to using SAS, by all means, use SAS to do that. There's a data file directory and there's a file documentation directory.

The first thing you need to learn is there is, as you know, if you've seen the census data, it's huge. They have so many different tables. There is STF 1, which is as you know complete count data. STF 2. STF 3 is the data set most of you will be using. The STF 3, sorry, is the one that has socioeconomic data. And that's the one you'll normally use. So that's the one I'm going to be talking about.

So this is where you should go and read the documentation to get familiar with the data sets and the way the data sets are arranged.

So I'm going to go back to the presentation here. And then I'll directly show — you need to read, as I pointed out, the Summary File 3 technical documentation. And then once you do that, you need to download what's called the Microsoft Access template from the Census 2000. So let me show you what that looks like. I downloaded — I downloaded the — here we are.

I created a folder called NJ. And when you download that file it will look like this. This is — these are Microsoft Access template files that you download from up here. So you want to download this onto your drive. And there are some technical notes. You don't need to read them. And by the way, even if you have never used Microsoft Access, the instructions are very, very clear and very simple. And Microsoft Access, as you know, has got a lot of good features and it's not that difficult to learn all the different functions that you need to know to be able to link the data sets. So don't get intimidated by the fact that this is a database that you might feel, you know, you can't use. You can use it.

So I'm going to show you how you can use some of these features, very easy but yet very powerful. Okay. So once I do that — so the first thing I need to do is download this, and the second thing I need to know is what kind of — what data variables, what variables I want to download and from where. Right? If I have the answers to those two questions, I'm then ready. So I'm going to show you where you will go to find what kind of data is available from the SF 3 files and where it is located, because it's relatively — you have to do a lot of pointing and clicking to get to the database. So let me just show you on this slide here. I am going to make that a — this is just to make sure you all understand the census geography I was talking about. These numbers here are very important. Now, what I would like to do is to download the data at the census tract level, and you can see the census tract level has a code of 140. This one up here. That's important to know. If you're downloading at the census tract level, that's the code in the census database that signifies a state-county-census tract. You can download, you can actually do any of these. Once the data set is downloaded, it actually downloads the entire data set. You can actually make a selection at any level of geography. That's another great thing about downloading the whole data set. Once you download it you can extract the data set for any level of geography.
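To make the summary level codes concrete, here is a small sketch of selecting one level of geography from a downloaded set of records. The four codes listed are the standard census summary levels for state, county, tract and block group; the record layout and the sample records are assumptions made up for illustration.

```python
# Summary level codes from the census hierarchy (a partial list).
SUMMARY_LEVELS = {
    "040": "state",
    "050": "county",
    "140": "state-county-census tract",
    "150": "state-county-census tract-block group",
}


def records_at_level(records, sumlev):
    """Keep only the records whose SUMLEV field matches the requested code."""
    if sumlev not in SUMMARY_LEVELS:
        raise ValueError(f"unknown summary level {sumlev!r}")
    return [r for r in records if r["SUMLEV"] == sumlev]


# Hypothetical records; a real geography file has many more fields.
records = [
    {"SUMLEV": "040", "NAME": "New Jersey"},
    {"SUMLEV": "050", "NAME": "Atlantic County"},
    {"SUMLEV": "140", "NAME": "Census Tract 1"},
    {"SUMLEV": "140", "NAME": "Census Tract 2"},
]
tracts = records_at_level(records, "140")
print([r["NAME"] for r in tracts])
```

Because the whole file is downloaded once, changing the code passed to `records_at_level` is all it takes to pull counties or block groups instead of tracts.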

We've already talked about that. This is the template that I showed you in my New Jersey folder. And I'm just, you know, showing you, giving you some graphics on how you need to uncompress it. And as you — so this is SF3. That's what it will say. And MDB is the extension. And you simply unzip it. And we will get to this stage in a minute because I'm going to show it to you.

The next thing we need to do, there is a place, I'm going to show it to you, too. It's called Summary File 3. And Summary File 3 is where all the files for all the states are, each in a folder. And there is a New Jersey folder. So what I will do now is actually —

Now we'll go through the process of selecting the data set and figuring out its location. Essentially, the way we determine the location of the data set is by figuring out the table. So you know my example is household income for New Jersey by census tract. And as soon as this thing is — here it is. I will show you in a few minutes from the documentation how you would get to the variable of your choice. I'm choosing household median income. You could be — you can get any data set as you know from the census. Tons of information. So what we need to do is go to this matrix here. The table matrix. And the table matrix will show you the data set. And we are already at you know 55. So you can see this variable here, median household income in 1999, and this is what you need to be looking at. This is a data dictionary reference number. This one is P053001. That's the median household income in 1999. And that's what I'm going to download for New Jersey. So now that I have located the table, the next big puzzle that we need to solve is where is this table in the scheme of things, right? So I know it's getting late in the evening. This is not the kind of thing we should be talking about, right?

So if you feel a little, like yawning a little bit, by all means, go ahead. I won't be offended. But as you know I'm alert.

So what I would do is, you never know. The number here is P053001. As I said, we need to figure out where this data set is. So let's try to figure that thing out now.

UNKNOWN SPEAKER: You said data dictionary number (inaudible).

RAVI SHARMA: That's the data reference number, yeah.

Okay. So the next thing we need to do is we have a data file directory here. And I'm going to click here. Open up — actually, I may have opened up the wrong documentation. What we're doing next is we need to figure out where in the scheme, where in the scheme of things is table P053001. So that's what I'm going to show you next.

That's not good. Okay. So let me see if I can find — if I don't find it, what I'm going to do while it's opening I'm going to go to my presentation and I'm going to show you exactly where — if that table ever comes — is ever downloaded, this is what's called a file segmentation. If you recall P053001, that is — you can see the matrix here. And what we want is you can see here this is P51 to P67. So the P53 is somewhere in this file structure and the file name is ST, state. So if you're working with Minnesota, what will that be? ST. Pennsylvania will be PA. New Jersey will be NJ. Right? MN. Okay.

So and Arizona AZ. So when we get to the file structure, each of the databases, which are organized by state, will have, for Pennsylvania it will have the PA in front and New Jersey will have NJ. So what we want to do is we want to download from the Pennsylvania folder a data set called PA006UF3 and that will contain — I'm sorry, for New Jersey. For New Jersey, so but the file structure is the same for all the states, including the territories. So it doesn't really make any difference.

But we're going to — I'm going to show you how to download the New Jersey household median income and link it to the census tract. So this table will help you to figure out which file you want to download. So let's see if I can summarize what we have so far. We have first to download the Access template. That's very important. The Access templates are then what we'll populate with our data set. Second, you need to determine the variables that you're interested in. I'm interested in the household income. And I showed you the table matrix from which, from where you determine what the table number is. So in my case it was P053001 and then I go to the table distribution across data files, what's called the file table segmentation, and the data file that I'm interested in is this one up here. The ST0006UF3. That contains the median household income and of course it has a lot of other data along with it. But I'm going to simply extract the data set that I'm interested in. And the ST will be replaced by the abbreviation of the state. In my case NJ. So now what we're going to do is actually go through the process quickly of downloading, extracting and quickly linking. Hopefully we still have about 20 minutes. I think we can do it.
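The naming rule just summarized (state abbreviation, then the file segment number, then the UF3 suffix) can be sketched as a small helper. The exact pattern used here, a lowercase abbreviation followed by a five-digit segment number and `_uf3.zip`, is an assumption about how the files are listed on the download site, so check it against the actual directory listing. The function names are hypothetical.

```python
def data_file_name(state_abbr: str, segment: int) -> str:
    """Build the assumed name of one data segment file for a state."""
    if len(state_abbr) != 2 or not state_abbr.isalpha():
        raise ValueError("state_abbr must be a two-letter abbreviation")
    # Assumed pattern: <st><5-digit segment>_uf3.zip, for example segment 6.
    return f"{state_abbr.lower()}{segment:05d}_uf3.zip"


def geo_file_name(state_abbr: str) -> str:
    """Build the assumed name of the geography file for a state."""
    if len(state_abbr) != 2 or not state_abbr.isalpha():
        raise ValueError("state_abbr must be a two-letter abbreviation")
    return f"{state_abbr.lower()}geo_uf3.zip"


# Segment 6 is the one identified in the talk as holding table P53.
print(data_file_name("NJ", 6), geo_file_name("NJ"))
```

Swapping "NJ" for another abbreviation gives the corresponding names for that state, since the file structure is described as the same for every state.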

If not, you may be here a little late. Okay. I want to close this here. All right. So let's go to the data file directory and we're going to go to New Jersey. And do you recall what table we want to download? Right? We want to download this particular file here. The NJ0006_UF3.zip, because it contains the variable median household income, and we determined that it is the one that contains it from the table file segmentation, and we also found out the variable name for median household income from the table matrix. It was P053001. So we're going to now — click on this. Save.

And as you can see, I've already saved the file here. It's NJ 006 underscore UF3. I'm not going to save it again but it's already been saved. Once you save it, while we're here — while we're here we need to also download another file, in addition to the NJ 006 we also need to download what we call the geography file. The geography file is a file that will give us the census tract, the county and other boundary information for various levels of aggregation. And that's typically the last file in your folder. So in this case it's this file here, it's called NJgeo_UF3.zip. And I'm going to click on that and I'm going to save it. And I'm going to save it right here. As you can see, I've already saved it. It's right there. It's a zipped file. Compressed file. It has to be uncompressed. So now I have — I think I have all the data I need. Actually I have — I've done this for you already.

So let's go to — and now we can actually go to — I want to show you the data sets. So up here. So I've used the zip program to uncompress the template file. I've uncompressed also, and I have to show you something here. NJ 006. When you download the files from the website, they come as you saw with the UF3 extension. You need to replace the UF3 extension with TXT. You want to make that into a text file. So all you need to do is rename the file as TXT. So I used — I simply went up here. You can see up here. You know all you need to do is simply rename this. But I've uncompressed it and I renamed it so now it's a text file. And I did the same thing with — I did the same thing with the geo file. The geo file has a UF3 extension. I simply did, you know, right click. And you rename the extension to TXT from UF3. So now this is a text file. So we have two text files. We have the Microsoft Access template, and we are ready now to actually start the process of extracting the data on median household income for New Jersey.
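If there are many files to rename, the right-click step can be scripted. This is a sketch only, using Python's standard library; the function name and the folder layout are assumptions, and it expects the files to have been unzipped already.

```python
from pathlib import Path


def rename_uf3_to_txt(folder):
    """Rename every *.uf3 file in a folder to *.txt and return the new paths."""
    renamed = []
    for path in sorted(Path(folder).glob("*.uf3")):
        target = path.with_suffix(".txt")
        if target.exists():
            continue  # a converted copy is already there, so leave both alone
        path.rename(target)
        renamed.append(target)
    return renamed
```

Calling `rename_uf3_to_txt("NJ")` on a folder holding the unzipped data and geography files would leave two text files ready to import. Note that the pattern match is case-sensitive on some systems, so files ending in `.UF3` would need a second pattern.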

Okay? So let's do that. So to do that what we will do is I am going to actually — I've actually done it here. So I'm going to close this. I'm going to redo this again. So let me go to — I apologize for rushing you through all this. But I know you're all with me, right? More or less? So I just want to — so we'll redo this. I've already done the applications here. So let me — so all you need to do is simply right click — click on the application, the template file and simply open it.

And I am actually going to get rid of this here. I've been told to tell you that the ESRI sample data include most basic population variables, but those are mostly demographic variables like age, race and sex. But if you're interested in income, education and stuff like that, that's not on the — you actually have to buy it from ESRI. But if you are interested only in the basic demographic variables you can easily get those.

You can actually also download it. I'm just deleting this so we can redo this again. All right. The first thing you need to do is open your templates, and the templates as you all know are in the New Jersey folder here. So here are the tables. Now, what I've done actually is already created some of these applications for you. The first thing you want to do is these tables from STF 1, 2, 3, 4, these are your dummy tables. These are all the dummy tables. These are the templates. There's nothing in there. You have to populate them with the data for your own state and the geographic area of your choice. All right. So basically empty. So what you need to do, the first thing you need to do is to bring in and specify your attribute level file, which is NJ 006. That's the one that has the data for the median household income. To do that, we simply go to File, Get External Data, Import. And drop down, select Text. And do you see this one up here? So this one has all the variables from what was it P 56 to whatever that we downloaded from the census website. I'm going to see if it's going to give me trouble. Because I already — that's good. So far so good.

Okay. So here's, this is the data set that I downloaded, FTPed, and saved on my disk. You can see New Jersey. That's really good. And now what we need to do is specify the format, the way it has to drop into those templates. The data is going to drop down into SF 3006 but we need to specify exactly the template. So to do that it's really very simple. You click on Advanced. So if I click on Advanced here, how many of you have done this? Okay. So you can help me out. Am I doing it right? Let me know if I'm missing something here. I'm going to go fast. So here is — so two of you have done it. So what next? I thought you said you did it. You haven't done it. Oh. I was asking how many of you have done it. And two of you said — okay. All right. So I can't see it clearly here. Okay. You see the specs here? Right? So click on Specs. And we want specs for this here. The STF 3006, because that's the table that we are importing. So I'm going to click Open and do you see it automatically puts, that looks good. You're doing good here. I don't see any problems. And so then OK. And Next. Finish. Yes. Oops. Okay. Now it won't overwrite. It will not overwrite the table, but once it works, I'm going to show you what it looks like. This is what the table looks like. Once everything is imported, this is the way the table looks. Simple, right? You have taken a massive amount of census data. You've put them into different columns for New Jersey, and let's scroll down to the variable, what was it, five — five 0 — five 3.

UNKNOWN SPEAKER: 50311.

RAVI SHARMA: Stop me when you see it. Oops. I've gone too far. Here we go. Right? So this is the median income for New Jersey by census tracts. We still have to link this table to the geography. We haven't done this yet, right? So the next thing we need to do is once we have finished importing this table, you can, you know, usually the best thing to do is make sure you save it. I've already saved it. So I'm not going to save it again. So the next thing we need to do is go and import the geography. The geography is what will give us the different levels of geography of the data. So that we can then select — we're going to select, and you will see this in a few minutes, where the sum level equals 140, because that's the one that will give us the census tracts for New Jersey. So I'm going to do that quick here.

So, again, I'm going to go to advanced. I'm going to go to specs here and I'm going to use the geography import. You see SF 3. Import specs. Open. Okay. Finish.

Now, I'm not going to overwrite the table. The table is already here. And this is what the table looks like. So you can see the table has a sum level 040 and log — this one up here, the logical record number, that's the unique — that's the unique number that we're going to use to link with. So everything else looks good here. So the next thing we need to do here is now go into Tools, Relationships, and the way normally you would do it is once — when you go into Tools and the Relationships, you simply go in and click this number. Click. Logical record number here and logical record number in geography, like this. Do you want to edit the relationship? I don't want to. This looks okay. So now what we have is we have created a relationship between two tables. One is the attribute table with household median income. The other one is the geography file that has the census tract numbers in it, which we want to extract. So once we have this, all we need to do now is simply go to Tools. Actually, I've already run this. Tools. It will not run because I think I have already run this table. But what happens is when you run this for the first time, Tools will ask you — you can then use the tools to run this. And when you run it, it will create for you a table where the two are linked. So these two are linked now.

And what we need to do is simply go to Query. We're going to create a query. And we are now going to select variables from two different tables; the tables are linked. So let's select variables from the geography table first. We'll select sum level. Summary level. Then we'll select state. County. And we want to select tract. All right. Then from the table — from table NJ 0006, I am simply interested in the median household income, which, as you know, is P053001, am I right? Right there. Right? Click that. Next. I'm not done with this yet, because while the tables are linked, I'm only interested in census tracts. So I need to extract only household incomes by census tract for New Jersey. So I have one more step to go. So it's okay to go with the default, Next. And I'm going to ask it to modify the query and I'm going to finish. And the query is modified and I want to now specify here the criteria. Because I want geography 140, which is census tract data. Okay?

So now I can run my query. So what do you see? You have now exactly the data you want. Two tables joined together at the census tract level. This is census tract, 140. This is the state, 34. This is the county and this is the tract. And this is your household median income. So it takes a little bit, you know, it's a little work, but now you have all the data that you need. You know, state, county, tract, and the median income.
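The same relationship and query can be sketched outside Access. The example below uses Python's built-in sqlite3 module rather than the Access template, so it is an illustration of the join, not the procedure shown in the session. The table names `geo` and `seg06` and every data value are invented; the field names LOGRECNO, SUMLEV, STATE, COUNTY, TRACT and P053001 follow the ones mentioned in the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE geo (LOGRECNO TEXT, SUMLEV TEXT, STATE TEXT, COUNTY TEXT, TRACT TEXT);
    CREATE TABLE seg06 (LOGRECNO TEXT, P053001 INTEGER);
    """
)
# Invented sample rows: one state record, one county record, two tract records.
conn.executemany(
    "INSERT INTO geo VALUES (?, ?, ?, ?, ?)",
    [
        ("0000001", "040", "34", "", ""),
        ("0000002", "050", "34", "001", ""),
        ("0000003", "140", "34", "001", "000100"),
        ("0000004", "140", "34", "001", "000200"),
    ],
)
conn.executemany(
    "INSERT INTO seg06 VALUES (?, ?)",
    [("0000001", 55000), ("0000002", 44000), ("0000003", 31000), ("0000004", 42000)],
)


def tract_income(connection):
    """Link the two tables on the logical record number, keeping tracts only."""
    query = """
        SELECT g.SUMLEV, g.STATE, g.COUNTY, g.TRACT, d.P053001
        FROM geo AS g
        JOIN seg06 AS d ON d.LOGRECNO = g.LOGRECNO
        WHERE g.SUMLEV = '140'
        ORDER BY g.LOGRECNO
    """
    return connection.execute(query).fetchall()


for row in tract_income(conn):
    print(row)
```

The `WHERE g.SUMLEV = '140'` clause plays the role of the criteria row in the Access query, and the `JOIN ... ON` clause plays the role of the relationship drawn between the two logical record number fields.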

I know all of you are with me to this point.

If not, the question is where did I lose you.

But I'll be happy to go through this again. The next thing we need to do is, it's already 6:00. I don't want to keep you too late. But tomorrow — Russ, should we do this tomorrow? All I need to do — all I need to do is add this, join it to the map file.

RUSSELL KIRBY: (Inaudible).

RAVI SHARMA: Do it now? Okay.

RUSSELL KIRBY: Do you want to just do it now?

RAVI SHARMA: We'll do it tomorrow?

RUSSELL KIRBY: How much longer do you think it will take to do that?

RAVI SHARMA: About ten minutes.

RUSSELL KIRBY: Want to do it just now, ten minutes?

RAVI SHARMA: I'm happy to do it tomorrow. I don't want to keep people — you know simply — you know how people are — this is not something that is exciting, but it's something that — we can finish it now. Let's do it now and finish it, right? What the heck.

RUSSELL KIRBY: We can do it three or four times.

RAVI SHARMA: Shall we go forward? Okay. So I'll be — I'll go a little quick here. So what I'm going to do now is I want to add the New Jersey shapefile layer and then I would like to add the data file, the one that we just created for household income. And then link those together. Not a really straightforward process, but we'll see how fast I can do it for you guys.

Actually, you guys got off lucky because we had actually an assignment for you to download, process and link the data set for a state of your choice and a variable of your choice, right, Russ? So we're cutting that out.

RUSSELL KIRBY: (Inaudible).

RAVI SHARMA: (Chuckling).

Let me see if it will allow me to add this. So here we have — we have two files. I'm going to open the attribute file first for New Jersey, and as you can see we have state, county and tract. That's going to give us a problem. And New Jersey, let's see, up here we have open file and we have, as you can see the tract here is 10123456. So we have a tract that is six digits. The state and the county is fine. Right? The state and the county is fine. But we have a tract which is six digits, and this is one of the most common problems, and then here we have a tract that is only four digits. So we have six and we have four. So what do you think we should do? We've got a serious problem here. So this is five digits. One, two, three, four, five. And so let's see. And this is — this tract here is six — so what we're going to do here is we are going to create a variable, another variable, and we're going to pad these with zeros because this is not good.

I'm going to — what I'm going to do is actually I'm going to take a shortcut here so as not to keep you waiting. I'm going to go to my slides and show you exactly the process of how you would do it. But you see the problem here is we have very uneven fields. We have some tracts with four and some with six. So what we need to do is those tracts with four we need to pad those with zeros. And if there are no zeros we leave them as is. And so what we need to do is we need to create a script in ArcGIS. It's a VBScript, and I'm going to show you exactly how you do it on my slides. I want to go through the exercise and then we'll be done. Okay?

These are all the steps, by the way, we've been through. And they're all documented in this slide. So you will have no problem following the — and you can see same — these should be familiar. We've been through this. And this is also familiar. We went through this. And this is the final data set we ended up with. And now I'm going to show you joining the attribute data to shapefiles. The problem is when we have uneven fields, and what we need to do is what we call concatenate, using VBScript. In some cases the state, county and tract FIPS codes can be given in three separate fields, as shown in the figure below. And so you can use — this shows how you can go and actually create — this is called concatenation, you can concatenate the three fields, and I described to you here in red text exactly how you would do it. So this is in English to explain what is up here. And then you can calculate this.

Now, what do you do when you have fields that are not compliant? This is where you have some fields that are four and some that are six. So here is the script. So what this is saying is if the length of the tract field is six digits long, then all you need to do is concatenate state and county and tract. We use the ampersand, "&", because these are text, they're not numbers. We're simply putting them together, concatenating.

Now if the length of the tract is not 6, X equals state and county and tract and 00. That means add two zeros at the end.

And if — and then put an X here and that will create for you a compliant field with six digits and then you're in business.

And then once you run the script, if the VBScript is executed properly, the new field, called STFIP, will be populated with an 11-character code as shown here, and you can use this to link your data. Okay?
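The rule described for the script (keep six-digit tracts as they are, add two zeros to the end of four-digit tracts, then join state, county and tract) can be sketched in Python for checking results outside ArcGIS. This is not the VBScript from the slides. The function name `stfip` is borrowed from the field name mentioned above, and rejecting any other tract length is an added assumption that is stricter than the rule as stated.

```python
def stfip(state: str, county: str, tract: str) -> str:
    """Return the 11-character state + county + tract code as text."""
    if len(tract) == 6:
        padded = tract            # already compliant, leave as is
    elif len(tract) == 4:
        padded = tract + "00"     # add two zeros at the end
    else:
        # Assumption: any other length is treated as a data problem.
        raise ValueError(f"unexpected tract length for {tract!r}")
    code = state + county + padded
    if len(code) != 11:
        raise ValueError(f"expected 11 characters, got {len(code)}")
    return code


print(stfip("34", "001", "0101"))
print(stfip("34", "001", "010102"))
```

Keeping every piece as text, not as a number, matters here, because a numeric field would drop the leading zeros that make up part of the code.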

Now, this is for when the VBScript is not executed: check the brackets for Len, which are different from the brackets for state, county and tract. They use different brackets. This is to let you know if it doesn't work.

Now, take this reference down. This one is an excellent reference for those of you who are going to work with census data and you want to characterize your neighborhoods or tracts in terms of socioeconomic and demographic characteristics. Unlocking the Census With GIS is a beautiful, excellent book. I use it in my class. The other one — this is a little dated — reference is by Dowell Myers. Analysis with Local Census Data. It's still okay, '92. But this is the one I would recommend for all of you. All of the stuff I talked about today he does it, of course. He doesn't — it's kind of a lot of the stuff is missed. But it has some of the stuff on how you can use VBScript, how you can compose scripts for attacking different kinds of problems. Common GIS problems in linking census data with the map data.

I'm going to stick here for a few minutes for those who are specifically interested in this. Okay? So I can talk to you personally.

Otherwise, we're done. Russ.

RUSSELL KIRBY: Okay. Anyway, we've had a pretty long day, and I want to thank everybody for still being awake, I think it's remarkable, actually. Henry, was there some discussion about possibly a group going out for dinner that we need to talk about?

UNKNOWN SPEAKER: (Inaudible).

RUSSELL KIRBY: Okay. Yeah. If you were out here last night you'll probably realize that while we're in this beautiful resort community, we're in the middle of nowhere in relation to any place that anybody might want to actually go other than here in the hotel. And it's generally necessary to get a group of several people and take a taxicab to get to restaurants and it's not that expensive if you have three or four people in a taxi. It costs $2 or $3 a person to do that. Okay. There's something else. We can make arrangements for this room to be open in the evening if anybody wants to come and do some additional work as well. That's certainly no problem at all. We just have to make sure that the last person out makes sure the room is locked by security. So that everything stays here.

I think it's probably fine if you want to leave things on the tables. I think they're just going to clean up the dishes and so on, but they should leave everything else here. And then tomorrow we've got a couple other tools we wanted to show you. One is the tool called GeoDa, which does exploratory spatial data analysis. And it happens to be — we've loaded the installation files on each of the computers. And so we'll have to run that before we can use it. It's software that you can all download for free at the website of the University of Illinois at Urbana-Champaign. And also if we have time we'll also look at SaTScan a little bit. We want to spend most of the time looking at exploratory data analysis and issues around smoothing and local indicators of spatial association and those kinds of techniques. So anybody have any questions?

Okay. And there will be a few surprises tomorrow, but I'm not at liberty to tell you what they might be.

UNKNOWN SPEAKER: (Inaudible).

RUSSELL KIRBY: I was hoping to get him. But somebody said they saw Gloria Estefan yesterday so maybe we can get her to come instead.

Okay. So we'll call it a day then.