Comparing MobileNet v1 and v2 for inference on the CPU, I have observed some surprising numbers:
For v1, my inference time was about 148 ms on average. For v2, the average was 185 ms (about 25% slower).
The max_rss of the process increased by about 160 MB for each copy of MobileNet v1 loaded in Caffe, measured after initializing the Net and running one forward pass. For v2, the increase was about 300 MB per copy.
I am using BVLC Caffe with Intel MKL, taking both sets of measurements on the same system (Intel Xeon CPU E5-2658 v2 @ 2.40 GHz) back to back, and discarding the first few timings of each model to warm up any caching.
From the paper I expected v2's inference time and memory usage to be lower than v1's... am I missing something?
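For reference, the measurements were gathered roughly along these lines (a minimal pycaffe sketch; the file names and the `data` input blob name are placeholders for the actual deploy files, and `ru_maxrss` is the Linux high-water mark in kilobytes):

```python
import resource
import time

import numpy as np
import caffe

# Placeholder file names -- substitute the actual deploy prototxt / caffemodel.
MODEL_DEF = 'mobilenet_v2_deploy.prototxt'
MODEL_WEIGHTS = 'mobilenet_v2.caffemodel'

caffe.set_mode_cpu()

rss_before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # KB on Linux

# Load the net and run one forward pass so all buffers get allocated.
net = caffe.Net(MODEL_DEF, MODEL_WEIGHTS, caffe.TEST)
net.blobs['data'].data[...] = np.random.rand(*net.blobs['data'].data.shape)
net.forward()

rss_after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print('max_rss increase: %.1f MB' % ((rss_after - rss_before) / 1024.0))

# Discard a few warm-up passes, then average the rest.
for _ in range(5):
    net.forward()
times = []
for _ in range(50):
    t0 = time.time()
    net.forward()
    times.append(time.time() - t0)
print('mean forward time: %.1f ms' % (1000 * np.mean(times)))
```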
I am comparing v1 and v2, both from this repo, and the performance of v2 in terms of both speed and memory usage is worse on my CPU. So the extent to which convolutions are optimized by Caffe is constant across the comparison. I also count total MACC ops of 573M for v1 vs 438M for v2, so v2 is actually doing fewer convolution ops.
Perhaps the size of certain blobs is causing many CPU cache misses? This processor has a 25 MB cache.
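A quick way to check that (a sketch, assuming a deploy prototxt name; sizes assume float32 activations) is to sum the activation blob sizes and look at the largest one:

```python
import caffe

caffe.set_mode_cpu()
# Placeholder file name -- substitute the actual deploy prototxt.
net = caffe.Net('mobilenet_v2_deploy.prototxt', caffe.TEST)

# Total size of the intermediate activation blobs (float32 = 4 bytes each).
act_bytes = sum(blob.data.size * 4 for blob in net.blobs.values())
print('activation blobs: %.1f MB' % (act_bytes / 1024.0 ** 2))

# Largest single blob -- if it approaches the 25 MB LLC, the layers that
# produce and consume it are likely to be memory-bound.
name, blob = max(net.blobs.items(), key=lambda kv: kv[1].data.size)
print('largest blob: %s, %.1f MB' % (name, blob.data.size * 4 / 1024.0 ** 2))
```

Per-layer timings from `caffe time -model mobilenet_v2_deploy.prototxt -iterations 50` would also show whether the slowdown is concentrated in particular layers.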