Understanding Fire Inspections

Last week's visit with the fire inspectors was enlightening, as it gave us the perspective of the people who will be using our final deliverables. For our first deliverable, we are putting together a list of properties that require permits, based on a set of criteria in the City of Atlanta Fire Ordinance code. Some of those properties (roughly 2,600) are already being inspected and have already been issued permits, but many other businesses in the city are not being inspected, for a variety of reasons. We wanted to find other businesses in the city of the same type as those currently being inspected: if motor vehicle repair shops need inspection, for example, then AFRD would want to know how many motor vehicle repair businesses they are and aren’t inspecting in the city. Below is a bar chart of the top 20 currently inspected business types, shown in blue, with the uninspected businesses of the same type shown in orange.



These are grouped by SIC (Standard Industrial Classification) code, which gives us a consistent way to classify business types across multiple datasets. We obtained the classifications from a database of Business Licenses in the City of Atlanta, which has the geo-coordinates, names, and addresses of 20,000 businesses in the city (among other information). It does not include buildings such as schools and day cares, which are inspected but do not have business licenses. To find the SIC code for each inspected business, we first matched the Inspection database to the Business License database by geo-coordinates. That produced too many matches (5,000, even though only 2,600 businesses have been inspected), because many businesses can share the same address (if they are in a mall, for instance). We then filtered by business name to get a more reliable set of matches, using a string-matching method that tolerates “fuzzy” differences between strings (e.g., MCDONALDS vs. MCDONALD’S).
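The two-step match described above (coordinates first, then fuzzy name matching) can be sketched roughly as follows. This is only an illustration, not the code we ran: it uses Python's standard `difflib.SequenceMatcher` as a stand-in for the fuzzy matcher, and the record fields (`name`, `xy`, `sic`), the `normalize` helper, the 0.85 threshold, and the sample records are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Uppercase and drop punctuation, so MCDONALD'S and MCDONALDS compare equal."""
    return re.sub(r"[^A-Z0-9 ]", "", name.upper()).strip()


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized business names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_sic(inspected, licenses, threshold=0.85):
    """For each inspected business, look at license records with the same
    coordinates and keep the SIC code of the most similar name, provided
    the similarity clears the (assumed) threshold."""
    by_xy = {}
    for lic in licenses:
        by_xy.setdefault(lic["xy"], []).append(lic)

    matches = {}
    for biz in inspected:
        candidates = by_xy.get(biz["xy"], [])
        best = max(
            candidates,
            key=lambda lic: name_similarity(biz["name"], lic["name"]),
            default=None,
        )
        if best is not None and name_similarity(biz["name"], best["name"]) >= threshold:
            matches[biz["name"]] = best["sic"]
    return matches


# Made-up records: two licensed businesses share one mall location.
inspected = [{"name": "MCDONALDS", "xy": (33.75, -84.39)}]
licenses = [
    {"name": "McDonald’s", "xy": (33.75, -84.39), "sic": "5812"},
    {"name": "Mall Shoe Repair", "xy": (33.75, -84.39), "sic": "7251"},
]
print(match_sic(inspected, licenses))
```

In real data the coordinates would rarely be identical to the last decimal place, so a practical version would round them or compare within a small distance. The threshold would also need tuning against hand-checked matches.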


Below is the spatial distribution of the top 5 inspected business types, alongside businesses of the same types from the Business License database that have NOT been inspected.



Top 5 inspected business types.




Non-inspected buildings, of the same types as the top 5 inspected types.



Next, we mapped the number and percentage of inspected businesses of these types, aggregated by location in the city, using the Neighborhood Planning Unit (NPU) as the unit of analysis for visualization purposes.



# of Top 5 Inspected Business Types, by Neighborhood (NPU).



# of Non-inspected Businesses, of same types as top 5 most Inspected, grouped by Neighborhood (NPU).




Percentage of Top 5 Inspected Business Types, grouped by Neighborhood (NPU).
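The counts and percentages behind maps like these come from a simple aggregation by NPU. Here is a minimal sketch, assuming each business record carries an `npu` label and an `inspected` flag (both field names are hypothetical, and the sample records are made up):

```python
from collections import defaultdict


def inspection_rate_by_npu(businesses):
    """Return {npu: (inspected, not_inspected, percent_inspected)}."""
    counts = defaultdict(lambda: [0, 0])  # [inspected, not inspected]
    for biz in businesses:
        counts[biz["npu"]][0 if biz["inspected"] else 1] += 1
    return {
        npu: (yes, no, 100.0 * yes / (yes + no))
        for npu, (yes, no) in counts.items()
    }


# Made-up records for two NPUs.
sample = [
    {"npu": "M", "inspected": True},
    {"npu": "M", "inspected": True},
    {"npu": "M", "inspected": True},
    {"npu": "M", "inspected": False},
    {"npu": "E", "inspected": True},
    {"npu": "E", "inspected": False},
]
for npu, (yes, no, pct) in sorted(inspection_rate_by_npu(sample).items()):
    print(npu, yes, no, f"{pct:.0f}%")
```

Showing the raw counts alongside the percentage matters, because an NPU with very few businesses of these types can show an extreme percentage based on only one or two records.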

These maps and visualizations can help the Atlanta Fire Rescue Department make more informed decisions about which businesses in the city should be inspected, and see gaps in their inspection process, both in the types of businesses they inspect and in the locations where they inspect.



On the other front (our second deliverable), we are creating a fire risk model using the AFRD data on where fires have occurred in Atlanta over the last five years, combined with the CoStar real estate property assessment data for the commercial properties in the city. For the last several weeks, we have been joining data from various sources, cleaning the 240 variables we have for each building, and beginning to build a regression model to determine which characteristics of a building are most predictive of fires. Below is a visualization of the intersections of our various datasets:



Currently, we are building our model using the 371 CoStar properties which are in the AFRD Fire Incidents database (meaning, they had fires, shown above as #1 and #3) as our positive examples, and using the remaining 6,604 CoStar properties as the negative examples of buildings with similar information known about them, which did not have fires. After we build this model, we will be joining the FSAF Inspection dataset with the CoStar dataset, so that we can use the businesses from Deliverable #1, which are already being inspected (or which need to be), and run them through the fire risk model to prioritize their inspections by their fire risk score.
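To make the positive/negative labeling and the later prioritization step concrete, here is a minimal sketch. The model itself is deliberately left out: the risk scores below are made-up placeholders standing in for whatever the fitted model would produce, and the property IDs and function names are hypothetical.

```python
def label_properties(costar_ids, fire_incident_ids):
    """Label each CoStar property 1 if it appears in the fire incidents
    data (a positive example) and 0 otherwise (a negative example)."""
    fires = set(fire_incident_ids)
    return {pid: int(pid in fires) for pid in costar_ids}


def prioritize(inspection_ids, risk_scores):
    """Order properties for inspection from highest to lowest risk score.
    Properties with no score yet are placed last."""
    return sorted(
        inspection_ids,
        key=lambda pid: risk_scores.get(pid, float("-inf")),
        reverse=True,
    )


# Made-up IDs: P2 had a fire; P9 is a fire at a non-CoStar property.
labels = label_properties(["P1", "P2", "P3", "P4"], ["P2", "P9"])
print(labels)

# Placeholder scores standing in for model output.
scores = {"P1": 0.10, "P2": 0.80, "P3": 0.35}
print(prioritize(["P1", "P3", "P2", "P4"], scores))
```

With 371 positives against 6,604 negatives, the classes are heavily imbalanced, so whichever model is fit on these labels would need to be evaluated with that in mind.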



Other blog posts from our team:

Week 1 – Hello from the DSSG-ATL fire team!

Week 2 – Update from the Fire Team

Week 3 – A Day with Fire Inspectors

Week 4 – Understanding Fire Inspections

Update from the Fire Team

To recap, our project team is working with the Atlanta Fire Rescue Department (AFRD) to help them understand more clearly which factors are most predictive of fires, so they can make more informed decisions about conducting fire inspections. Over the past week, our team has been grappling with how best to understand, clean, and merge the various data we have about the many buildings in the city of Atlanta. Our contact at AFRD has been very helpful in providing a large number of datasets about fire incidents, fire inspections, and building details. However, before we can build any sort of predictive model, we need to know that the buildings referred to in one database, such as the Fire Incidents in Atlanta, are the same buildings referred to in another database, which may hold specific building information such as year built, building material, occupancy, usage, zoning, etc. This process has been spearheaded by Wenwen Zhang, a PhD student conducting research on Geographic Information Systems (GIS) at Georgia Tech. The diagram below shows the different datasets and the method we are using to join them together.



Data Aggregation process

We have been working to join the datasets using 3 types of keys: building addresses, X/Y GPS coordinates, and a piece of GIS information known as the “parcel ID.” Parcels are divisions of land used by the Tax Commissioner’s office for tax valuation, but for our purposes they are a useful way to join location data that is sometimes inconsistent, vague, or incomplete. Once this process is complete, we will have a unified dataset with all of the buildings where fire incidents have occurred, from which we can build our model of predictive factors of fire incidents. The datasets in the diagram above are: “AFRD”, a database of Fire Incidents in Atlanta from 2011-2015; “FSAF”, a database of Fire Inspections in Atlanta; “CoStar”, a property assessment of 7,000 commercial properties in Atlanta, with specific details of building construction; “SCI”, another building information assessment; and “CO”, or Certificate of Occupancy, which businesses are required to obtain before allowing people inside.
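One way to picture a join across several key types is a lookup that tries the most reliable key first and falls back to the next. The sketch below is a simplified illustration under assumptions, not our actual GIS workflow: it tries the parcel ID, then a normalized address, and leaves out coordinate matching for brevity. All field names, helper names, and sample records are hypothetical.

```python
def normalize_address(addr):
    """Uppercase, drop periods and commas, and collapse repeated spaces."""
    return " ".join(addr.upper().replace(".", "").replace(",", " ").split())


def build_indexes(buildings):
    """Index building records by parcel ID and by normalized address."""
    by_parcel = {b["parcel_id"]: b for b in buildings if b.get("parcel_id")}
    by_address = {normalize_address(b["address"]): b for b in buildings if b.get("address")}
    return by_parcel, by_address


def join_record(incident, by_parcel, by_address):
    """Return (building, key_used), or (None, None) if nothing matches."""
    parcel = incident.get("parcel_id")
    if parcel and parcel in by_parcel:
        return by_parcel[parcel], "parcel_id"
    addr = incident.get("address")
    if addr:
        building = by_address.get(normalize_address(addr))
        if building is not None:
            return building, "address"
    return None, None


# Made-up building and incident records.
buildings = [
    {"parcel_id": "14-0001", "address": "100 Main St.", "year_built": 1965},
    {"parcel_id": "14-0002", "address": "200 Oak Ave", "year_built": 1988},
]
by_parcel, by_address = build_indexes(buildings)
print(join_record({"parcel_id": "14-0002"}, by_parcel, by_address))
print(join_record({"address": "100  main st"}, by_parcel, by_address))
```

Recording which key produced each match makes it possible to audit the weaker address-based matches separately later.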


Below is an example of the method of joining parcels (in tan), with geocoordinates and addresses (in red), with fire incidents (in green). However, the most useful immediate output for our purposes is not a map, but a large database, or CSV file, with all of the commercial properties, their building information (from CoStar), and whether they had a fire or not (from AFRD).

Parcel image
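Assigning a point such as a fire incident to the parcel that contains it is a point-in-polygon test, which GIS software performs as part of a spatial join. As an illustration of the idea only (we are not claiming this is what our GIS tools do internally), here is the standard ray-casting test in plain Python. The parcel IDs and coordinates are made up, and the test does not treat points lying exactly on a boundary specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a horizontal ray from
    (x, y) crosses. An odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_parcel(x, y, parcels):
    """Return the ID of the first parcel containing the point, else None."""
    for parcel_id, polygon in parcels.items():
        if point_in_polygon(x, y, polygon):
            return parcel_id
    return None


# Two made-up square parcels side by side.
parcels = {
    "14-0001": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "14-0002": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
print(find_parcel(5, 5, parcels))
print(find_parcel(15, 5, parcels))
print(find_parcel(25, 5, parcels))
```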

Also, in the process of understanding what data we actually have in these 6 different (and very large) datasets, we have made a codebook for each of them, explaining what the various attributes mean, since many of the attribute names are highly abbreviated or use highly specific or non-obvious terminology. Additionally, we have been doing some data cleaning and exploratory data analysis, making sure that, for instance, the mean year of building construction doesn’t appear to be 1432 because many missing entries contain zeroes instead of NA. (1432 wasn’t the average year of building construction in Atlanta, it turns out.) This is all part of the necessary process of data cleaning before we can begin building our predictive model.
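The zero-as-missing problem is easy to reproduce on a toy example. The sketch below uses made-up years (not our data) to show how a single zero drags the mean down, and how treating zeros as missing fixes it. The function name and the choice to treat both `0` and `None` as missing are assumptions for the illustration.

```python
from statistics import mean


def mean_year_built(years, missing=(0, None)):
    """Mean construction year, ignoring values that encode 'missing'."""
    valid = [y for y in years if y not in missing]
    return mean(valid) if valid else None


years = [1950, 1970, 1990, 0]  # the 0 is a missing entry, not a real year

print(mean(years))             # naive mean, pulled down by the zero
print(mean_year_built(years))  # mean of the three real years
```

Here the naive mean is 5910 / 4 = 1477.5, while the mean of the three real years is 1970. The same effect, across many more missing entries, is what produced an implausible average like 1432.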


We are meeting today with a group of fire inspectors at the AFRD Central Office, and we’ll be working with them to make sense of the City of Atlanta Fire Ordinance Codes, to better understand which buildings in the city need inspection permits, based on the materials kept or processes carried out in each business. We will be trying to draw on their hard-earned experience and tacit knowledge about fire risk factors to help inform our predictive model.


