Comparing Depth-wise Separable Convolution and Ordinary Convolution

Convolutional neural networks revolutionized computer vision, including image classification and recognition, with the introduction of networks such as AlexNet. However, in the pursuit of higher performance in competitions such as ImageNet, the complexity and depth of convolutional neural networks have grown significantly and made the computation cost impractical. Many solutions have been proposed to reduce the computation cost; one of them is the use of depth-wise separable convolutions. Models such as MobileNet that use depth-wise separable convolutions have shown promising results, significantly reducing model size with a relatively small reduction in performance. This project compares depth-wise separable convolution and ordinary convolution, in networks of similar structure, on performance and efficiency on smaller data sets using a MobileNet-like network.
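To make the size reduction concrete, the following is a minimal sketch of the standard parameter and multiply-add counts for a single layer. It is not taken from the project code: the function names and the example layer shape are hypothetical, and it assumes a square kernel, stride 1, "same" padding (so the output keeps the input's spatial size), and no bias terms.

```python
def standard_conv_cost(k, c_in, c_out, h, w):
    """Parameters and multiply-adds for an ordinary k x k convolution.

    Assumes stride 1, 'same' padding, and no bias (illustrative only).
    """
    params = k * k * c_in * c_out
    mult_adds = params * h * w  # every weight is used once per output position
    return params, mult_adds


def depthwise_separable_cost(k, c_in, c_out, h, w):
    """Parameters and multiply-adds for a depth-wise separable convolution.

    Depth-wise step: one k x k filter per input channel.
    Point-wise step: a 1 x 1 convolution mixing c_in channels into c_out.
    """
    depthwise_params = k * k * c_in
    pointwise_params = c_in * c_out
    params = depthwise_params + pointwise_params
    mult_adds = params * h * w
    return params, mult_adds


# Hypothetical layer: 3 x 3 kernel, 64 -> 128 channels, 32 x 32 feature map.
std_params, std_ops = standard_conv_cost(3, 64, 128, 32, 32)
sep_params, sep_ops = depthwise_separable_cost(3, 64, 128, 32, 32)

print(std_params, sep_params)             # 73728 8768
print(round(sep_params / std_params, 4))  # 0.1189
```

Under these assumptions the ratio of separable to ordinary cost is 1/c_out + 1/k^2, so for 3 x 3 kernels the separable layer needs roughly one eighth to one ninth of the parameters and multiply-adds. Fewer arithmetic operations do not guarantee proportionally shorter wall-clock time, which is part of what the comparison in this project measures.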

We reproduced MobileNet and trained it on CIFAR-10 and CIFAR-100, and found that depth-wise separable convolution and ordinary convolution provide a similar level of classification accuracy at test time. This differs from the result reported for MobileNet, where ordinary convolution provides better classification accuracy. In training, the depth-wise separable network, DepthwiseConvNet, takes less time per epoch but longer to train the whole model, because it needs more epochs than the ordinary-convolution network, ConvNet. At test time, DepthwiseConvNet takes less time when the batch size is reasonably large, but the study also found a case where DepthwiseConvNet takes longer to run when the input is extremely small.

The full paper can be accessed here and the code can be accessed here.