The Reproduction of “Automated Pavement Crack Segmentation Using U-Net-Based Convolutional Neural Network”

The goal of the paper

Proposed network architecture

Original results of the paper

The results we want to reproduce are Tables 1 and 2 of the paper, which show the precision, recall, and F1 score on the test images. In these tables the authors compare their results with those of other papers, and their method outperforms the others on almost all scores. We will compare our results with these tables to see how well our model performed on the datasets.

Table 1 & 2 of the paper

The Model

We started by figuring out how to work with the ResNet code. We studied its structure to understand how the layers were built and which PyTorch modules were used. This is important because the U-shaped ResNet34 network requires skip connections that pass the output of each encoder stage to the decoder layer at the same level. Searching the internet yielded no way to add these skip connections to the existing ResNet code, so we chose to implement the entire decoder inside the ResNet code first, which let us continue with other important parts of the reproduction project. Later we learned of a more elegant solution but were unable to implement it due to time constraints.

Layers in the network
Forward in the network

Dataset, batching & transforming

Images and segmentation of the image

Training the model

With the data loaded and pre-processed, we can start training our model. This is done in the last block of code in the notebook. In the block before it we initialize the hyperparameters and the values used by our custom learning rate scheduler. In the training loop block we initialize the optimizer, which is AdamW. The authors of the paper used the default hyperparameters for this optimizer. For the custom learning rate (LR) scheduler we first set a base LR. Following the paper, the LR increases until 40% of the epochs have passed; from there we let the LR decrease linearly to zero. The paper states that the LR should approach near zero at the last epoch using a cyclic scheduler, but a cyclic scheduler would converge to the base LR instead. So, we made our own scheduler. Although the authors are very clear on this part, they did not mention which base learning rate they used to achieve the results we wanted to reproduce. The layers were also divided into three layer groups, where the first layer group was frozen for the first 15 epochs. In the training loop we used the dice loss function to compare the predicted output with the mask. After training, we set the PyTorch model to eval mode and test it.
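The shape of such a schedule can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the scheduler from our notebook: the function name `lr_at`, the linear warm-up, and the starting fraction `start_frac` are all hypothetical choices, since the paper gives neither the base LR nor the exact warm-up curve.

```python
def lr_at(step, total_steps, base_lr, warmup_frac=0.4, start_frac=0.1):
    """Learning rate for a 0-indexed step: warm up, then decay linearly to zero.

    Assumptions (not from the paper): the warm-up is linear and starts at
    start_frac * base_lr; the peak equals base_lr.
    """
    if total_steps < 2:
        return base_lr
    last = total_steps - 1
    peak_step = int(warmup_frac * last)  # step at which the LR peaks
    if step <= peak_step:
        if peak_step == 0:
            return base_lr
        t = step / peak_step  # progress through the warm-up, 0..1
        return base_lr * (start_frac + (1.0 - start_frac) * t)
    t = (step - peak_step) / (last - peak_step)  # progress through the decay, 0..1
    return base_lr * (1.0 - t)


if __name__ == "__main__":
    # Print the schedule for a short run of 11 steps with a base LR of 1e-3.
    for s in range(11):
        print(s, lr_at(s, total_steps=11, base_lr=1e-3))
```

Dividing the returned value by `base_lr` gives a multiplicative factor, which is the form PyTorch's `torch.optim.lr_scheduler.LambdaLR` expects from its `lr_lambda` argument, so a function like this could be plugged into an AdamW training loop.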


In this part, we show the results of the model we trained and compare them with the original results from the paper. To measure how well the model performs, we used the same measures as the paper. We wrote functions to calculate the precision, recall, and F1 score. The paper states that a True Positive (TP) is any predicted pixel within a distance of 2 of a crack pixel in the mask, but it does not define which distance is meant. The paper they referenced was also not very clear on this. Most likely they meant the Manhattan (city-block) distance. Since this was unclear, we wrote two implementations: one using the Manhattan distance, which gives a diamond-shaped neighborhood, and one in which a diagonal step also counts as distance 1 (Chebyshev distance), which gives a square. This ambiguity made it difficult to reproduce the results. We chose to implement both and vary the learning rate to get more insight into which one the authors could have used. The Manhattan distance seemed the most obvious choice, so we ran it more often.
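The difference between the two neighborhoods can be made concrete with a small pure-Python sketch. It is illustrative only and not the code from our notebook: the names `dilate` and `tolerant_scores` are hypothetical, masks are plain nested lists of 0 and 1, and counting a False Negative as a mask crack pixel that was not predicted (without tolerance) is an assumption, because the paper only defines the True Positive.

```python
def dilate(mask, radius=2, shape="diamond"):
    """Mark every pixel within `radius` of a crack pixel (value 1) in `mask`.

    shape="diamond": |dy| + |dx| <= radius
    shape="square":  max(|dy|, |dx|) <= radius (a diagonal step counts as 1)
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if shape == "diamond":
                        dist = abs(dy) + abs(dx)
                    else:
                        dist = max(abs(dy), abs(dx))
                    ny, nx = y + dy, x + dx
                    if dist <= radius and 0 <= ny < h and 0 <= nx < w:
                        out[ny][nx] = 1
    return out


def tolerant_scores(pred, mask, radius=2, shape="diamond"):
    """Return (precision, recall, f1) with a distance tolerance on true positives."""
    near_mask = dilate(mask, radius, shape)
    tp = fp = fn = 0
    for y in range(len(mask)):
        for x in range(len(mask[0])):
            if pred[y][x]:
                if near_mask[y][x]:
                    tp += 1  # predicted pixel lies close enough to a crack
                else:
                    fp += 1
            elif mask[y][x]:
                fn += 1  # assumption: missed crack pixels are counted strictly
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For a single crack pixel in the center of a 5x5 mask and a single predicted pixel in the corner, the diamond rejects the prediction (the corner is four steps away) while the square accepts it (two diagonal steps), so the choice of distance can change the scores noticeably.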

Table 3: Our results


When we compare our results with Table 1 of the paper, we have to conclude that the reproduction for the CFD dataset was unsuccessful: our model does not perform the same as theirs. Our test scores show that the precision of our model is very close to that of the paper, but the recall is far lower (even compared with the other papers), which pulls down the F1 score. A recall that is too low means that too many pixels were classified as non-crack when they were actually crack. This looks like underfitting, but when we compared the predicted images with the masks it was not really visible. Changing the base learning rate didn't significantly increase the recall either, which is something you would expect to see if the model were overfitting (which sometimes happens with a high learning rate). A recall this low is highly unlikely to be caused only by the random split of the data into train and test sets or by the random transformations applied to the data.


Reproduction of this paper was difficult. The paper appears very clear and concise, but it turned out to be too concise. Very important aspects were mentioned only briefly, which mostly left us to figure things out on our own. All in all, we did manage to create a pavement crack segmentation algorithm, but we were unable to reproduce the recall reported in the paper. Despite this, reproducing the paper gave us a lot of room to explore the functionality and limitations of PyTorch and a lot more insight into the actual application of deep learning algorithms. For example, the hours spent looking through the documentation on optimizers, only to find out that we had to write the learning rate scheduler ourselves, taught us a lot about optimizers and learning rate manipulation schemes. It also confronted us with the fact that there is far more to a deep learning paper than it seems on the surface; it is not simply a matter of building a network. More importantly, it showed us what is important to include if you want to make a paper reproducible. Having experienced the difficulties of reproducing a paper, we can learn to be concise yet complete in our own work, so that people in the future are able to reproduce our deep learning algorithms.



Aakaash Radhoe


MSc Student Robotics at the Technical University of Delft.