A Predictive Model for Fire Risk in Atlanta

As part of our deliverables to the Atlanta Fire Rescue Department (AFRD), we are giving them a list of potential properties to inspect. However, we needed to prioritize this list by fire risk so that AFRD can best allocate their inspection resources. To do so, we created a model that predicts fire risk based on certain characteristics of properties in Atlanta. The model was built in the R statistical programming language using an SVM (Support Vector Machine) algorithm, with 58 independent variables predicting fire as the outcome variable. Data sources for features in the model include the CoStar properties dataset, parcel data and SCI data from the City of Atlanta, demographic data from the U.S. Census Bureau, and fire incident and inspection data from AFRD. Features were based on property location, land or property use, financial factors, time-based factors such as year built, condition, occupancy, size, building details, owner information, demographics of the property's location, and inspection data.

Prediction Model Validation

Our model proved to be highly predictive of fires. We validated it in two ways:

First, we validated our model using a time-based approach. Ideally, we would run the model, predict which buildings will catch fire in the coming year, and then look into the future to see which ones actually did. Since we can't look into the future, we simulated this approach by training on data from 2011 – 2014 and predicting fires in the last year of data, 2014 – 2015. We drew 10 bootstrapped random samples and averaged the results across them. The model did very well, with an average accuracy of 0.77 and an average area under the curve (AUC) of 0.75. Here is a confusion matrix of the results:


Figure 1: Confusion matrix for time-based model validation approach.
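The bootstrap-averaging step described above can be sketched as follows. This is an illustrative Python sketch (our model was actually built in R): `metric_fn` stands in for refitting and scoring the SVM, and the toy records are made up, not our real data.

```python
import random

def bootstrap_average(metric_fn, data, n_samples=10, seed=42):
    """Average a metric over bootstrapped (with-replacement) resamples."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_samples):
        # Draw a resample of the held-out records, with replacement.
        resample = [data[rng.randrange(len(data))] for _ in range(len(data))]
        scores.append(metric_fn(resample))
    return sum(scores) / n_samples

# Toy records: (predicted_fire, actual_fire) pairs; the metric is accuracy.
records = [(1, 1), (0, 0), (1, 0), (0, 0), (1, 1), (0, 1), (0, 0), (1, 1)]
accuracy = lambda rows: sum(p == a for p, a in rows) / len(rows)
avg_acc = bootstrap_average(accuracy, records)
print(round(avg_acc, 3))
```

Averaging across resamples in this way gives a more stable estimate of the metric than a single train/test split would.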

The most important metric in this case is true positives – that is, how many properties the model predicted to have a fire actually did have a fire. Of the properties in the last year of data that did have a fire, our model predicted 73.31% of them. This means that for every 10 fires, our model would have predicted approximately seven. Considering how few fires occur (only about 6% of properties have fires), this is far better than guessing at random which properties would catch on fire.
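For concreteness, here is how the true-positive rate (and the specificity discussed further below) falls out of confusion-matrix counts. The counts in this Python sketch are hypothetical, not the actual values from Figure 1.

```python
def recall(tp, fn):
    """True-positive rate: fires the model caught / all actual fires."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True-negative rate: non-fires correctly dismissed / all actual non-fires."""
    return tn / (tn + fp)

# Hypothetical confusion-matrix counts (not the real Figure 1 values).
tp, fn, tn, fp = 733, 267, 6000, 2000
print(f"recall = {recall(tp, fn):.2%}")        # 733 / 1000 = 73.30%
print(f"specificity = {specificity(tn, fp):.2%}")  # 6000 / 8000 = 75.00%
```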

Second, we validated our model using 10-fold cross-validation, a more standard machine learning validation approach. This model also did quite well, with an average accuracy of 0.78 and an average AUC of 0.73. Here is a confusion matrix of the results:


Figure 2: Confusion matrix for 10-fold cross-validation approach.

In this validation, the model identified true positives 67.56% of the time. This means that for every 10 fires, our model would have predicted almost seven of them.
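The k-fold machinery behind this approach can be sketched in a few lines. This is an illustrative Python version (the real analysis was done in R), with a toy metric standing in for actually fitting and scoring the SVM on each fold.

```python
def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous, near-equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(evaluate, n, k=10):
    """Average a per-fold metric over k train/test splits."""
    folds = k_fold_indices(n, k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on all folds except fold i; test on fold i.
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k

# Toy metric: fraction of records held out per fold (stands in for accuracy).
avg = cross_validate(lambda tr, te: len(te) / (len(tr) + len(te)), n=103, k=10)
print(round(avg, 3))  # → 0.1
```

In the real pipeline, `evaluate` would fit the SVM on the training indices and return its accuracy (or AUC) on the test indices.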

It is worth briefly discussing the implications of the false positives in this model. In both validation approaches, we had a substantial number of false positives – that is, properties that our model predicted would have a fire, but did not actually have one. Many predictive models try to maximize specificity (the ratio of true negatives to all actual negatives) by reducing false positives, but in the context of deciding which properties to inspect, false positives are actually quite valuable. They represent properties that share many characteristics with the properties that did catch on fire, and thus may themselves be at high risk of fire – exactly the properties AFRD should inspect. Additionally, because our training set and the data set we ultimately apply the model to are in a sense the same (the list of commercial properties in Atlanta), a perfect model with no false positives would do nothing more than tell us which buildings had previously caught on fire. While this is useful to know, it is data AFRD already has. False positives add value by flagging properties that have not caught on fire, but are at risk due to their characteristics.

We want to give the caveat that this particular model is not necessarily the best possible fit for the data. Although we tried many other algorithms and configurations of features and found this model to be the most predictive, further experimentation could well yield an even more predictive model. We encourage AFRD or others to build upon our methods to improve the model if they wish.

Applying the predictive model to potential inspections

After we built the predictive model, we applied it to the list of current and potential inspections so that AFRD could prioritize inspections of the properties most at risk of fire. To do this, we first computed the raw output of the prediction model on this list of properties, generating a score between 0 and 1 for each property (see Figure 3 below). To make these scores more interpretable, we translated them to a 1-10 scale, and then divided them into low risk (1), medium risk (2-5), and high risk (6-10).


Figure 3: Transforming model output to risk scores.
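The transformation in Figure 3 can be sketched as follows. This Python sketch is illustrative (the pipeline was built in R), and the ceiling-based 0-1 to 1-10 mapping is our assumption about the translation step; the low/medium/high bands follow the text above.

```python
import math

def risk_score(raw):
    """Map a raw model output in [0, 1] to a 1-10 risk score.

    The mapping here (ceiling of raw * 10, floored at 1) is an assumed
    implementation of the described 0-1 -> 1-10 translation.
    """
    return max(1, min(10, math.ceil(raw * 10)))

def risk_band(score):
    """Bucket a 1-10 score into the bands described in the post."""
    if score == 1:
        return "low"
    if score <= 5:
        return "medium"
    return "high"

for raw in (0.05, 0.32, 0.87):
    s = risk_score(raw)
    print(raw, s, risk_band(s))
```

Collapsing the continuous output into three bands like this trades precision for something an inspector can act on at a glance.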

We then applied these risk scores to the list of current and potential properties to inspect, and included them on the interactive map.

As a result of this work, AFRD will be able to focus their inspection efforts on those commercial properties in Atlanta that are most at risk of fire. We hope that this focused inspection will result in fewer fires, fewer fire-related injuries, and fewer fire-related deaths in Atlanta.

Thanks for following our blog posts this summer! It’s been a pleasure to work with Dr. Matt Hinds-Aldrich and the rest of our contacts at AFRD. Please feel free to contact me at ohaimson@uci.edu with any questions about this blog post or the project in general.

– Oliver Haimson

Hello from the DSSG-ATL Fire team!

Today we got our project assignments for summer 2015, and I was excited to be assigned to the Fire team! This is a project with the Atlanta Fire Rescue Department (AFRD) using all kinds of interesting datasets to predict the fire risk of different buildings throughout Atlanta. This way, AFRD can better allocate their building inspection resources. The DSSG Fire team includes myself (Oliver Haimson), Wenwen Zhang, Michael Madaio, and temporarily Xiang (Sean) Cheng. Our faculty advisors are Polo Chau and Bistra Dilkina.

I was drawn to this project because of its potential for real world impact in reducing fire risk, which can save lives if successful. I’m excited to build statistical and machine learning models to predict fire risk, not only using the data provided to us by AFRD, but also using social media data. I think that we may be able to predict fire risk by looking at the ways that people talk about particular businesses on sites like Yelp and Twitter.

After we got our data this morning, we got to work familiarizing ourselves with the different datasets. Wenwen, who is a GIS and spatial modeling wizard, wasted no time in creating some initial geospatial visualizations. Here's an example of a heat map of fires in Atlanta.

I spent the day looking through the variables included in the different datasets, and thinking about which ones may be useful as predictor variables in a fire risk model. I think one of the hardest parts of this project will be matching up the buildings across the different datasets: many buildings appear in one dataset but not others, address formats differ from dataset to dataset, and the fire data is often recorded at intersections or imprecise addresses. We remain hopeful that we can identify the buildings using geo-coordinates and parcel data.
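One common first step toward matching buildings across datasets is normalizing addresses to a canonical form before joining. Here is a minimal Python sketch; the abbreviation table is a small illustrative sample, not the mapping we actually used.

```python
import re

# Illustrative abbreviation map; a real one would cover many more variants.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd",
                 "northwest": "nw", "northeast": "ne",
                 "southwest": "sw", "southeast": "se"}

def normalize_address(addr):
    """Lowercase, strip punctuation, collapse whitespace, abbreviate suffixes."""
    addr = re.sub(r"[^\w\s]", "", addr.lower())
    tokens = [ABBREVIATIONS.get(t, t) for t in addr.split()]
    return " ".join(tokens)

a = normalize_address("123 Peachtree Street N.W.")
b = normalize_address("123 peachtree st NW")
print(a == b)  # → True
```

Normalized strings like these can then serve as join keys, with geo-coordinates and parcel IDs as a fallback where addresses still disagree.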

We then met with Matt Hinds-Aldrich, AFRD’s Senior Management Analyst. Matt was extremely generous with his time and spent several hours going through the datasets with us and helping us to understand the data better. He also articulated AFRD’s goals to be able to better predict fire risk in order to inspect the most at-risk buildings. As he describes it, this is a process to discover the “unknown unknowns” – buildings that AFRD doesn’t know that they don’t know they need to inspect. We set up weekly meetings for the rest of the summer (including a visit to their headquarters next week!), and talked about potential opportunities for our team to tag along with fire inspectors to better understand the everyday details of fire inspection.

Our team is very excited to keep working on this fascinating problem, and we hope that our work this summer will eventually make an impact on fire prevention in Atlanta!