Sharpening the Future of Occupancy Grid Map Prediction Methods: An Investigation into Loss Functions and Semantic Segmentation Multi-Task learning for More Accurate OGM Predictions

Dirks, Rutger

Title

Sharpening the Future of Occupancy Grid Map Prediction Methods: An Investigation into Loss Functions and Semantic Segmentation Multi-Task learning for More Accurate OGM Predictions

Author

Dirks, Rutger (TU Delft Mechanical, Maritime and Materials Engineering)

Contributor

Boekema, H.J. (mentor)
Gavrila, D. (graduation committee)
Pool, E.A.I. (graduation committee)
Kooij, J.F.P. (graduation committee)

Degree granting institution

Delft University of Technology

Programme

Mechanical Engineering | Vehicle Engineering | Cognitive Robotics

Date

2022-06-13

Abstract

For an Autonomous Vehicle (AV) to traverse traffic safely, it is vital that it can anticipate the behavior of surrounding traffic participants using motion prediction. Current motion prediction approaches are primarily based on deep learning and can be categorized into object-centered and object-agnostic methods. The former rely on a human-engineered pipeline of object detection and tracking, whose errors can accumulate in the motion predictions. The latter do not rely on this pipeline; however, they lack the ability to learn object representations, which causes blurriness and object disappearances in longer-term predictions and thereby poses a safety hazard. This thesis proposes two methods to improve the performance of object-agnostic sequence-to-sequence Occupancy Grid Map (OGM) prediction networks, trained on the Waymo Open Perception dataset. The first method uses inter-pixel loss functions, i.e. the SSIM and Sinkhorn losses, instead of the ubiquitous per-pixel losses, to train the PredRNN++ OGM prediction network. Inter-pixel losses take into account the spatial relations between grid cells when evaluating OGMs, whereas per-pixel losses evaluate each grid cell's value independently. The quantitative results demonstrate that using inter-pixel losses can improve short-term predictions, with a prediction horizon of T = 5, for the Mean Squared Error (MSE) by 4.3%, Image Similarity (IS) by 7.8%, and Average Precision (AP) by 0.3%. For the longer term, T = 15, the predictions improve for the MSE by 5.0%, IS by 20.5%, AP by 0.1%, and Accuracy by 0.6%. Furthermore, the use of inter-pixel losses reduces blurriness and object disappearances. The second method is based on multi-task learning. By training the PredRNN++ to perform the prediction task together with a semantic segmentation task on the predicted OGMs, the network is expected to learn object representations that it can use to improve the prediction quality. The quantitative results show that multi-task learning does not improve the OGM predictions. However, some qualitative results show that multi-task learning reduces blurriness and object disappearances.
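
The distinction between per-pixel and inter-pixel losses drawn in the abstract can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the function names, the toy 8x8 grid, the non-overlapping 4x4 windows, and the stabilising constants (the commonly used SSIM defaults for values in [0, 1]) are all assumptions made for illustration. The sketch applies the same column shuffle to a target grid and its prediction: the MSE, which treats each cell independently, is unchanged, while the block-wise SSIM loss changes because the spatial arrangement of cells inside each window is different.

```python
import numpy as np

# Commonly used SSIM stabilisers for values in [0, 1] (assumed, not from the thesis).
C1, C2 = 0.01 ** 2, 0.03 ** 2


def mse_loss(pred, target):
    """Per-pixel loss: every grid cell contributes independently."""
    return float(np.mean((pred - target) ** 2))


def block_ssim_loss(pred, target, block=4):
    """Inter-pixel loss: 1 - mean SSIM over non-overlapping block x block windows."""
    h, w = target.shape

    def windows(grid):
        # Split the grid into rows of shape (num_windows, block * block).
        g = grid.reshape(h // block, block, w // block, block)
        return g.transpose(0, 2, 1, 3).reshape(-1, block * block)

    p, t = windows(pred), windows(target)
    mu_p, mu_t = p.mean(axis=1), t.mean(axis=1)
    var_p, var_t = p.var(axis=1), t.var(axis=1)
    cov = ((p - mu_p[:, None]) * (t - mu_t[:, None])).mean(axis=1)
    ssim = ((2 * mu_p * mu_t + C1) * (2 * cov + C2)) / (
        (mu_p ** 2 + mu_t ** 2 + C1) * (var_p + var_t + C2)
    )
    return float(1.0 - ssim.mean())


# Toy 8x8 OGM: left half occupied, right half free.
target = np.zeros((8, 8))
target[:, :4] = 1.0
# Hypothetical low-contrast prediction: 0.8 where occupied, 0.2 where free.
pred = 0.2 + 0.6 * target

# Apply the same column shuffle to both grids.
order = [0, 4, 1, 5, 2, 6, 3, 7]
target_shuf, pred_shuf = target[:, order], pred[:, order]

mse_a, mse_b = mse_loss(pred, target), mse_loss(pred_shuf, target_shuf)
ssim_a = block_ssim_loss(pred, target)
ssim_b = block_ssim_loss(pred_shuf, target_shuf)
print(f"MSE loss:  {mse_a:.4f} (original) vs {mse_b:.4f} (shuffled)")
print(f"SSIM loss: {ssim_a:.4f} (original) vs {ssim_b:.4f} (shuffled)")
```

A practical SSIM loss would typically use overlapping (often Gaussian-weighted) windows and a differentiable framework; the non-overlapping windows here only keep the sketch short.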

Subject

Motion Prediction
Deep Learning
Loss Functions
Multi-Task Learning
Occupancy Grid Map
Sinkhorn Loss
Structural Similarity Index Measure
Semantic Segmentation

To reference this document use:

http://resolver.tudelft.nl/uuid:efb50fdd-c246-4e55-8193-5cf78072ec29

Part of collection

Student theses

Document type

master thesis

Files

PDF

Sharpening_the_Future_of_ ... _Dirks.pdf

40.89 MB