Deep learning practitioners have used the Siamese network architecture to measure similarity between images. In autonomous driving, this configuration has recently gained attention for the change detection task: identifying changes in a previously known map of a vehicle's environment. The task is vital because such changes can compromise the accuracy and reliability of the map, on which the vehicle depends to localize itself and navigate effectively. In this paper, we present a set of experiments that evaluate state-of-the-art deep learning architectures, both convolutional (AlexNet, GoogLeNet, VGG, ResNet) and attention-based (Vision Transformer, Shifted Windows Transformer), as candidates for the feature extractor backbone of the Siamese architecture, with the goal of detecting changes caused by the appearance and disappearance of construction zones. We also evaluate these architectures under fine-tuning, i.e., initializing the convolutional layers with pre-trained weights. In our experiments, VGG16 (CNN) gave the best results, especially when initialized with weights pre-trained on the ImageNet-1K dataset. In particular, VGG16 achieved an average F1 score of 92% on highway datasets, outperforming the baseline residual network built from ResNet18 convolutions by about 13.5%.
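The core idea of the Siamese configuration, one shared feature extractor applied to both inputs followed by a comparison of the two embeddings, can be illustrated with a minimal sketch. This is not the implementation evaluated in the paper: `toy_backbone` is a hypothetical stand-in for a real backbone such as VGG16, the images are flat lists of pixel intensities, and the Euclidean distance and the `threshold` value are assumptions chosen only for illustration.

```python
import math
from typing import Callable, List, Sequence

Image = Sequence[float]
Backbone = Callable[[Image], List[float]]


def toy_backbone(image: Image) -> List[float]:
    """Hypothetical stand-in for a learned backbone (e.g. VGG16).

    Returns a 2-D embedding: mean and standard deviation of the pixels.
    """
    mean = sum(image) / len(image)
    variance = sum((p - mean) ** 2 for p in image) / len(image)
    return [mean, math.sqrt(variance)]


def siamese_distance(backbone: Backbone, image_a: Image, image_b: Image) -> float:
    """Embed both images with the SAME backbone and compare the embeddings."""
    features_a = backbone(image_a)  # branch 1
    features_b = backbone(image_b)  # branch 2 shares the extractor (and its weights)
    return math.dist(features_a, features_b)  # Euclidean distance (assumed metric)


def detect_change(backbone: Backbone, image_a: Image, image_b: Image,
                  threshold: float = 0.1) -> bool:
    """Flag a change when the embedding distance exceeds an assumed threshold."""
    return siamese_distance(backbone, image_a, image_b) > threshold


# Toy usage: a stored map view versus an unchanged and a changed observation.
map_view = [0.2, 0.2, 0.2, 0.2]
unchanged_view = [0.2, 0.2, 0.2, 0.2]
changed_view = [0.2, 0.9, 0.9, 0.2]  # e.g. a construction zone has appeared

print(detect_change(toy_backbone, map_view, unchanged_view))
print(detect_change(toy_backbone, map_view, changed_view))
```

In this sketch, swapping the backbone means passing a different callable to `siamese_distance`, which mirrors how the experiments treat the feature extractor as an interchangeable module.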