UN Data for Climate Action – Predicting and Alleviating Road Flooding in Senegal

After ten busy weeks, we have completed our work for the UN Data for Climate Action Challenge. Here is a wrap-up of the research questions we focused on and the solutions we developed.

Climate change has the potential to raise the risk of flooding for coastal countries like Senegal. Given the large proportion of unpaved roads in Senegal, flooding could damage the road network and reduce residents' access to services. Because infrastructure development in African countries faces a funding deficit, it is critical to identify which roads should be prioritized to prepare for the possible damage brought by climate change. We propose a two-step approach to identify these roads.

First, we evaluate the probability of flooding under climate change for the areas that roads pass through. To do this, we build a flood risk model based on topographic features and historical weather data for the study area.

The second step is to analyze the contribution of each road segment to regional connectivity. Roads that are critical to accessibility and exposed to flood risk should be prioritized for weatherproofing.

Applying optimization techniques, we can then determine explicit plans for allocating road maintenance funds. Multiple sustainable development objectives can be explored within this framework, such as maximizing rural connectivity or minimizing the expected number of people isolated due to flooding. This approach has the potential to minimize the long-term cost of establishing a reliable road network while helping to buffer vulnerable populations from extreme weather events.

Flood Risk Prediction Model

For flood risk prediction, we collected data from multiple sources: flood maps of Senegal from NASA, daily weather data from NOAA, land cover data from the Food and Agriculture Organization of the UN, and several map layers from OpenStreetMap. With this rich information on topography, hydrology, and weather, we can build machine learning models to evaluate flood risk at a 1 km × 1 km analysis unit. The framework below shows the features we use, the targets, the algorithms we use to build the models, and the evaluation methods.

A critical preprocessing step is joining the target flooded areas and all the features so that they share the same spatial scale. For raster files, we mainly use zonal statistics to get values for each grid cell. For land cover and water area data, we calculate the intersection area between feature polygons and each grid cell. For daily weather, we use a weighted average of readings from two weather stations, with weights determined by each station's distance to the grid cell.
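The distance-weighted averaging of station readings can be sketched as inverse-distance weighting. This is a minimal illustration, assuming two stations per cell; the distances and rainfall values here are hypothetical.

```python
# Hedged sketch: inverse-distance weighting of daily readings from the two
# weather stations nearest a grid cell. Values below are hypothetical.

def idw_average(value_a, dist_a, value_b, dist_b):
    """Weight each station's reading by the inverse of its distance to the cell."""
    w_a = 1.0 / dist_a
    w_b = 1.0 / dist_b
    return (w_a * value_a + w_b * value_b) / (w_a + w_b)

# A cell 10 km from station A (20 mm rain) and 30 km from station B (8 mm):
rain = idw_average(20.0, 10.0, 8.0, 30.0)  # the closer station dominates: 17.0 mm
```

The closer station receives three times the weight of the farther one, so the interpolated value sits nearer its reading.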

First, we train regression models using the proportion of each grid cell that was flooded during each biweekly period as the target. We train three machine learning models on the data: Support Vector Machines (SVM), Random Forest (RF), and XGBoost. The best RF model achieves promising performance, with an R-squared (how closely the data fit the regression line) of about 0.7056 on the test set and a root mean square error (RMSE) of about 0.1041. The model's top 10 important features show that dynamic historical weather features, especially historical temperature and precipitation, drive changes in the flooded area.
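The RF regression setup can be sketched with scikit-learn as below. The feature matrix and target here are synthetic stand-ins; the real pipeline uses the joined topography, land cover, and weather features described above.

```python
# Minimal sketch of the regression step, assuming a feature matrix X
# (one row per grid cell per biweekly period) and a target y in [0, 1]
# giving the flooded proportion of the cell. Data below are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((500, 8))                                     # 8 joined features
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.05 * rng.random(500)   # toy target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

pred = model.predict(X_te)
r2 = r2_score(y_te, pred)                       # fraction of variance explained
rmse = mean_squared_error(y_te, pred) ** 0.5    # error in flooded-proportion units
```

Feature importances from the fitted model (`model.feature_importances_`) are what surface the dominant weather variables in the real analysis.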

However, the regression results do not reflect how adversely a road passing through an area may be affected by a flood. This is challenging to quantify, as the change in a grid cell's flooded area is not directly related to the probability of a road becoming flooded. Therefore, we set a threshold to determine whether a grid cell is flooded during a particular biweekly period, turning the task into a classification problem. Each sample is labeled as flooded or not based on the percentage of the grid cell's area that is flooded. To be conservative, the threshold is set at 0.5: if at least 50% of a grid cell's area is flooded during a biweekly period, the sample is labeled as flooded, and otherwise as not flooded. The table below shows the model evaluation and performance on the test dataset.
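The labeling rule above amounts to a simple threshold on the flooded proportion:

```python
# Sketch of the labeling rule: a cell-period sample is labeled flooded (1)
# when at least 50% of the cell's area was flooded in that biweekly window.
FLOOD_THRESHOLD = 0.5

def label_sample(flooded_proportion):
    """Map a flooded-area proportion in [0, 1] to a binary class label."""
    return 1 if flooded_proportion >= FLOOD_THRESHOLD else 0

labels = [label_sample(p) for p in (0.05, 0.48, 0.50, 0.92)]  # → [0, 0, 1, 1]
```

The same features then feed a classifier instead of a regressor, and standard classification metrics replace R-squared and RMSE.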

A visualization of the historical flood risk map and the predicted map shows that we can accurately capture areas with high flood risk, such as #1, #2, and #3. Meanwhile, for some areas with historically low flood risk (#4), our model may overestimate the risk. Such areas may have flooded infrequently in the past but, according to our model, face a risk of flooding in the future. These predictions offer suggestive information for future preparation.

Road Network Optimization

We use telecommunication data from Orange to estimate traffic flow on road segments. We first generate a Voronoi tessellation of the cellular network towers by computing the Delaunay triangulation of the towers and assigning road intersections to each Voronoi region. We then assign population flow to the edges by checking whether a user was in transition. We say a user is in transition if the tower corresponding to their cell phone use changed from one timestamp to the next. If a user is in transition, we calculate the shortest path between two randomly chosen roads in the origin and destination regions. Once the path is calculated, we increment the population count of every edge in the path by one for the date of the destination's timestamp.
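The flow-assignment step can be sketched with networkx as below. The toy road graph and the origin/destination nodes are hypothetical stand-ins for the intersections assigned to the Voronoi regions of the Orange towers.

```python
# Hedged sketch of assigning one user transition to road edges, assuming a
# road graph whose nodes are intersections. Graph and trip are hypothetical.
from collections import defaultdict

import networkx as nx

G = nx.Graph()  # road network: nodes are intersections, edges are segments
G.add_edges_from([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])

edge_flow = defaultdict(int)  # population count per road segment

def assign_transition(origin_node, dest_node):
    """Increment flow on every edge of the shortest origin->destination path."""
    path = nx.shortest_path(G, origin_node, dest_node)
    for u, v in zip(path, path[1:]):
        edge_flow[frozenset((u, v))] += 1  # undirected edge key

# A user's tower changed between consecutive timestamps: one trip a -> d.
assign_transition("a", "d")  # shortest route a-c-d gets +1 on both edges
```

Summing these increments over all users and days yields the per-segment traffic estimate used downstream.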

The second task was to determine which edges in our graph were at most risk of being flooded. Using the 14-day composite flood maps from NASA, we calculate the amount of flooding on a road during a particular time period: the sum of the flooded areas along one road segment in that period, divided by the length of the entire segment. The assumption is that if a road segment is frequently flooded, or a large proportion of it is flooded, then the segment has a higher risk of failing. We therefore define the flood risk of a road as the sum of its flooded proportions over all time periods.

The third task was to determine the overall importance of each road segment, so that repairs or preemptive fortifications can be made based on the value of the road. We define road importance as the impact a segment's removal would have on accessibility to the surrounding regions. This is computed by finding the distance traveled by all inhabitants on two separate paths and taking their difference. The first path is the original intact path; the second is the alternate route taken if one of the roads in the original path is damaged. The bigger the difference, the worse the new route, and thus the greater the impact the flooding of the chosen road will have on accessibility. We calculated the importance of the top 20 riskiest roads.

In conclusion, we address the road prioritization problem by building a flood risk model, evaluating road traffic based on mobility behaviors extracted from cell phone records, and combining the two to assess road importance. We hope these models can help decision makers craft more efficient climate mitigation strategies for transportation.

We thank our mentors Bistra Dilkina, Caleb Robinson, and Amrita Gupta for their useful advice.
