Seeing Like a Bike: Iterations for Better Data Quality

Our team has been refining the sensor boxes as we collect data. Our friendly colleagues volunteered to ride a bike, and each time, they came back with handful of data along with points that need to be fixed in hardware, software, and data itself. Even though technical challenges that come from physical shocks and vibrations can be hardly in perfection due to the nature of electrical parts used, many issues have been resolved through making small changes in a iterative way.

Try-and-Error: Data Collection and Sensor Box Refinement

Without help of our awesome colleagues, this would have not been possible.

Whenever there are LED indications of sensor malfunctions or wrong data found, we unpacked the box and examined the flaws. Minor (?) issues that we identified and fixed are as follows:

  • Occasional hiccups in the communication between the Pi and Arduino -> resolved by implementing the timeout and reset functionality.
  • Impedance issue: all of sudden, an Arduino board stoped working and sent out “NACK” signals, and never came back to normal after resetting the board -> By removing some amount of solders from our custom bridge, this could be resolved. Too much solder on PCB prevents weak signals coming in and out of Arduino due to the low impedance.
  • Cable order: some cables for the Sonar sensor were found that the order of pins was reversed. This did not raise an error, but data was wrong -> We examined all the cables, bridges, and sensor pins.
  • Broken wires: some cables looked fine by its appearance but we found that a wire was broken inside the socket. This can be prevented by using stronger cables and sockets (that are tenable to bike shocks) in the future.
  • Hardware errors: some Arduino boards, USB-to-TTL connectors, and sensors were found that they were damaged and out of order -> this is the hardest part to identify. After finding them, we had to replace these parts.

Gas Calibration Data Collected

With help of Raj, a Ph.D. student from the department of Environmental Science, we were able to co-locate our gas sensors at the official gas sensing station that is 10 minutes away from Georgia Tech. By comparing data between sensor data and official data from the sensing station, we expect to adjust gas sensors to some degree. Since the temporal resolution of the official data is one hour, it would be hard to adjust them very precisely. Even though, this would increase our gas sensor accuracy greatly.


Environmental Signatures and Ground Truth Data

If we can identify what objects are around the bike only by looking at the sensor data, it is possible to use sensor data for semantic-level analyses. Without guaranteeing the connection between the sensory data and real-world objects, modeling environmental factors using sensory data would be hardly convincing to audience due to the noisy nature of sensors. Our strategy to analyze the sensory data begins with creating semantic-level signatures and classifying each segment of streaming data from bikes. In order to do that, we recorded environmental information in videos and voices using GoPro and voice recorders. These qualitative data provides ground-truth information for the sensor data.

Based on the Level of Transportation Stress (LTS) model, we listed possible obstacles and objects in the biking routes. After aligning the GoPro video and sensor streams by time, we qualitatively tagged each segment of the video (only when the circumstance was not too complex). For example, when a vehicle passes by the bike and there is no other objects around in a video segment, we assume that its corresponding sensory data is a typical classifier for a vehicle passing-by. After a test riding in downtown Atlanta, we collected a ground-truth data.


Here are some examples for creating signatures: (1) a narrow street with cars parked in parallel, and (2) a city road with a car passing by the rider.


The temporal pattern of the corresponding Lidar data to this video segment is as follows:


Since the frequency of the proximity values might provide better indications for objects rather than the temporal pattern of it, we converted this signal into the frequency domain using the Discrete Cosine Transform.

This frequency signature can be used to classify similar environmental factors in the data. Similar to this, the case where a car is passing by the rider is as follows.

These two different cases show their unique patterns to some degree. The graph of a street with cars parked in parallel shows a regular change of Lidar values which resulted in a high middle-level frequency (around 4 to 7). Meanwhile,  the case where a car is passing by the rider shows a higher value in a low frequency (around 2-3) since the Lidar value changes radically at one time. Of course, these are exploratory signatures, and more ground-truth data and other sensors need to be aggregated/merged to provide robust signatures.

We are working on generating more ground-truth data. The classification performance for data segments depends on (1) the quality of signatures, (2) the quality of ground-truth data, and (3) the prediction model (feature engineering). We hope to finish the first-round classifications of sensory data in a few days with a high prediction performance.

We are reporting our final results at the DSSG final presentation on Monday (July 24th, 2017).

Seeing Like a Bike: Towards Integrating the Sensor System

The seeing like a bike team is now in the stage of wrapping up the sensor box and integrating parts as a system. Each level of the sensor system design – from hardware to software – is under the iterative refinement process to better collect data as well as to provide a seamless experience to end users.

Box Design

The sensor box needs to be tenable from external pressure and shocks. Also, it needs to provide an easy-to-use interfaces for users. In order for allowing shock/pressure/vibration tolerance, we aim to make boxes using a sturdy ABS. Before working on actual ABS boxes, we first started testing our designs using wooden plates since it was more efficient in terms of time and cost. After several iterations, we were able to come up with a best design that can house complex structures of sensors, a battery, and an Arduino board. After finalizing the structure of the box, we tried to laser-cut an ABS box. However, using a laser cutter to make required holes and ventilation slits was tricky due to the characteristic of ABS – it was burning easily when exposed to lasers. After several experiments, we were able to find a good way to cut ABS boxes. Using a fast speed laser with low power, the burning effect decreased. By cutting the ABS multiple times with this weak laser, it was possible to make relatively neat holes and slits.

The front case, i.e., the server case, is also being redesigned and constructed using a 3D printer. Since the GPS device has been moved to the front, we redesigned it, and now, it is being slowly constructed from the printer. We are still in the process of refining the locations of slits and holes, but we are almost there!

Data Quality and Computing Efficiency

Even though data collection functionalities and high-level board optimizations were completed last week, we have been continuously working on optimizing the code structure and data formats. This optimization process is not only for data quality in the acquisition stage, but also for the power efficiency (so to allow the longevity of the device). This process involves (1) running the code in the command-line mode; (2) minimizing the use of REST APIs; (3) using GPS timestamps for entire sensory data; and (4) setting a time interval correctly for each sensor. In addition, we also redesigned the LED operations to make the indicators simple.

The original Raspberry Pi server was running in the Windows mode due to the convenience for setting the auto-running mode. However, running the Windows mode on Raspberry Pi with a battery is inefficient from the longevity perspective. This led us to dig into the Linux settings, and we finally changed the mode to the command line. Also, we tried to minimize the use of REST APIs. While they provide powerful interfaces for other applications to communicate to the server, it also creates another layer of software, i.e., network layer, which each sensor needs to go through. This is not critical in the power consumption, but still we moved several sensor communications to simple socket applications so to minimize the script executions.

Timestamp was another challenge in the original system as the Raspberry Pi’s time was not universal one. Due to the inconsistency and inaccuracy of times among devices, synchronization techniques had to be used to maintain the timestamps consistent across different datasets. To make it simple and accurate, we came up with a solution: GPS’s timestamp that comes from the satellite is used for the entire dataset. Since GPS data is collected every 200ms, we update the global time every 200ms, and other sensors use this global time when logging their times.

Finally, we set the data collection interval for each sensor. For example, air quality does not change very quickly, it is okay to collect data every second. Meanwhile, the acceleration of the bike can change very quickly depending on the movement of the bike, so the accelerometer needs to collect data more frequently then air quality sensors. As part of this effort, ultra sonar sensors’ interval has been set to 200ms based on a simple physics model.  We believe this series of setting and tuning parameters and refining software structures will allow better raw data quality and energy efficiency.

Next Steps

The next steps are to connect everything with a new box and case, to calibrate some sensors (for making sure the sensory data is correct), to deploy the system on a bike, and to start collecting pilot data. Once pilot data looks good, we will make same boxes more and deploy them to passionate riders’ bikes. We cannot wait heading out to ride a bike with our system.