Hi Navneet,
The per-frame inference time for feature extraction with pretrained networks such as GoogLeNet, MobileNetv2, and EfficientNet-b0 depends on several factors, and network depth is only one of them. Here are some reasons why GoogLeNet might be faster than MobileNetv2 and EfficientNet-b0, and why MobileNetv2 might be faster than EfficientNet-b0:
1. Network Depth:
- GoogLeNet has a depth of 22 layers, MobileNetv2 has 53 layers, and EfficientNet-b0 has 82 layers. Inference uses only the forward pass, and the layers in that pass run largely one after another, so a deeper network executes more sequential operations per frame. Each layer also carries some fixed overhead (scheduling the operation, reading and writing activations), which can dominate the inference time when the individual layers are small.
2. Model Complexity:
- Model Architecture: Beyond just the depth, the architecture of these networks plays a crucial role. GoogLeNet introduces the inception module, which, despite increasing the depth and width of the network, was designed to keep the computational budget constant. MobileNetv2 uses depthwise separable convolutions inside inverted residual blocks, which significantly reduce the number of parameters and arithmetic operations compared to standard convolutions, but a lower operation count does not guarantee a shorter run time. EfficientNet-b0 is the baseline network of the EfficientNet family. The compound scaling method, which scales width, depth, and resolution together, is what produces the larger variants (b1 to b7) from b0. EfficientNet-b0 itself is built from mobile inverted bottleneck blocks similar to those in MobileNetv2, with squeeze-and-excitation added, so each block performs more operations.
- Parameter Count: A higher number of parameters can lead to longer inference times, but on its own it is a weak predictor. EfficientNet-b0 has about 5.3 million parameters, which is more than MobileNetv2 (about 3.5 million) but fewer than GoogLeNet (about 7.0 million), yet GoogLeNet is the fastest of the three here. The parameter count therefore cannot explain the ordering you observed; the structure of the network matters more.
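To make the point about depthwise separable convolutions concrete, here is a small Python sketch that counts multiply-accumulate operations (MACs) for a single layer. The layer size (56-by-56 feature map, 64 input channels, 128 output channels, 3-by-3 kernel) and the function names are my own choices for illustration and are not taken from any of the three networks.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a k-by-k standard convolution (stride 1, output size h-by-w)."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs for a k-by-k depthwise convolution followed by a 1-by-1 pointwise one."""
    depthwise = h * w * c_in * k * k   # one k-by-k filter per input channel
    pointwise = h * w * c_in * c_out   # 1-by-1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer size (an assumption, not a layer from a real network).
h, w, c_in, c_out, k = 56, 56, 64, 128, 3

std = standard_conv_macs(h, w, c_in, c_out, k)
dws = depthwise_separable_macs(h, w, c_in, c_out, k)
ratio = std / dws

print(f"standard:            {std:,} MACs")
print(f"depthwise separable: {dws:,} MACs")
print(f"reduction factor:    {ratio:.2f}")
```

For this layer the separable version needs roughly 8 times fewer MACs. Keep in mind that this is an operation count, not a run time: it says nothing about how efficiently the hardware executes each kind of operation.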
3. Computational Efficiency:
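- Operation Types: Different types of operations (e.g., depthwise separable convolutions in MobileNetv2, squeeze-and-excitation blocks in EfficientNet) have different computational requirements, and some run more efficiently than others on a given hardware architecture. Depthwise convolutions, for example, perform few arithmetic operations per byte of data they read, so on GPUs they are often limited by memory bandwidth rather than by compute and can run slower than their low operation count suggests.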
- Optimization and Implementation: The way these models are implemented and optimized for specific hardware (GPUs, CPUs, TPUs) can also affect inference time. Some architectures might be more optimized for parallel processing on GPUs, while others might not fully leverage the hardware's capabilities.
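Because so much depends on the hardware and the implementation, the most reliable approach is to measure rather than reason from the layer count. Below is a minimal timing sketch in Python. The names `time_per_frame` and `dummy_extractor` are hypothetical, and `dummy_extractor` is only a stand-in for a real feature extraction call; the warm-up count and number of repeats are arbitrary choices.

```python
import statistics
import time


def time_per_frame(extract_features, frames, warmup=5, repeats=3):
    """Return the median time in seconds that extract_features takes per frame."""
    if not frames:
        raise ValueError("frames must not be empty")

    # Warm-up runs: the first calls often include one-time costs
    # (memory allocation, caching, library initialization).
    for frame in frames[:warmup]:
        extract_features(frame)

    per_frame = []
    for _ in range(repeats):
        start = time.perf_counter()
        for frame in frames:
            extract_features(frame)
        # Note: if the real extractor runs asynchronously on a GPU, wait for
        # the device to finish here before reading the clock.
        elapsed = time.perf_counter() - start
        per_frame.append(elapsed / len(frames))

    return statistics.median(per_frame)


def dummy_extractor(frame):
    """Stand-in workload (assumption): replace with the real network call."""
    return sum(frame)


frames = [[float(i)] * 1000 for i in range(50)]
seconds = time_per_frame(dummy_extractor, frames)
print(f"{seconds * 1e3:.4f} ms per frame")
```

Running the same harness once per network, on the same frames and the same device, gives numbers that can be compared directly.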
4. Input Resolution:
- The input resolution to the network can also affect inference time, because the cost of the convolutional layers grows roughly in proportion to the number of input pixels. The EfficientNet family scales resolution along with depth and width, but that applies to the larger variants; EfficientNet-b0, GoogLeNet, and MobileNetv2 all use a 224-by-224 input by default. Resolution would therefore explain the increased inference time only if your pipeline feeds EfficientNet-b0 larger images than the other two networks.
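The effect of resolution can be estimated with a little arithmetic. The Python sketch below assumes the convolutional cost is proportional to the pixel count, which is only an approximation (it ignores fully connected layers and any fixed per-layer overhead). The function name and the list of resolutions are illustrative.

```python
def relative_conv_cost(resolution, base_resolution=224):
    """Approximate convolutional cost relative to a square base resolution."""
    return (resolution / base_resolution) ** 2


for res in (224, 299, 448):
    print(f"{res}x{res}: about {relative_conv_cost(res):.2f}x the cost of 224x224")
```

Doubling the side length of the input roughly quadruples the convolutional work, so even a modest difference in image size between networks can distort a timing comparison.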
Summary:
While the depth of the network affects the inference time, it is not the only factor. The architectural decisions, the types of operations used, the overall model complexity, and how well the model is optimized for the hardware it runs on all play a role. GoogLeNet being faster than MobileNetv2, and MobileNetv2 being faster than EfficientNet-b0, is best explained by a combination of these factors, with depth being only one of them.