
Mean and scaling, C++ Caffe vs Python wrapper #525

Closed
Denominator opened this issue Jun 20, 2014 · 12 comments
@Denominator

Hi!

I noticed in caffe.proto (around line 209):

// For data pre-processing, we can do simple scaling and subtracting the
// data mean, if provided. Note that the mean subtraction is always carried
// out before scaling.
optional float scale = 2 [default = 1];

But then in python file pycaffe.py (around line 266):

if input_scale:
    caffe_in *= input_scale

and a few lines later:

if mean is not None:
    caffe_in -= mean

Is that right?

@Denominator Denominator changed the title Mean scaling, C++ Caffe vs Python wrapper Mean and scaling, C++ Caffe vs Python wrapper Jun 20, 2014
@longjon
Contributor

longjon commented Jun 20, 2014

Yes. The quote from caffe.proto refers to DataLayer, which reads from a database stored on disk and has some preprocessing options. The networks used with the Python layer usually don't have DataLayers; they are used with ndarray inputs that are optionally preprocessed by calling Net.preprocess.

@Denominator
Author

Sure, but I mean the situation where you train a network from a leveldb and end up with a pretrained model file (since training takes a long time). There the unscaled mean is used, because the mean is subtracted before scaling.

Then you want to use Python for predictions later on. Naturally you would reuse the same mean, which is wrong in this situation: on the Python side you should use the scaled mean.
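To make the mismatch concrete, here is a toy numpy sketch (the pixel, mean, and scale values are made up, not from any real model):

```python
import numpy as np

x = np.array([100.0, 150.0, 200.0])     # raw pixel values
mean = np.array([104.0, 117.0, 123.0])  # per-channel mean in the raw range
scale = 1.0 / 255.0

# C++ DataLayer: subtract the mean first, then scale
cpp_in = (x - mean) * scale

# pycaffe (as quoted above): scale first, then subtract the unscaled mean
py_in = x * scale - mean

# The two disagree...
assert not np.allclose(cpp_in, py_in)
# ...but agree once the Python side subtracts the *scaled* mean
assert np.allclose(cpp_in, x * scale - mean * scale)
```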

@longjon
Contributor

longjon commented Jun 21, 2014

Oh, good point. That does seem incorrect. We'll look into it (submit a PR if you like).

@Denominator
Author

Can't submit PR right now. If someone else could do it, that would be great.

@shelhamer
Member

If I recall correctly, this configuration is actually right for the reference ImageNet model, although this should certainly be better documented.

On the C++ side the image is represented as uint8 in [0, 255], while on the Python side scikit-image represents the image data as float in [0, 1]. For this reason the Python wrapper scales first, so that it can re-map the data to [0, 255] to match the precomputed mean.
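In other words (a toy numpy sketch; the mean values here are invented, the real ones come from the ImageNet mean file):

```python
import numpy as np

img_uint8 = np.array([26, 57, 49], dtype=np.uint8)  # C++ side: uint8 in [0, 255]
img_float = img_uint8 / 255.0                       # Python side: float in [0, 1]
mean = np.array([104.0, 117.0, 123.0])              # precomputed in [0, 255]

cpp_in = img_uint8.astype(np.float64) - mean        # DataLayer with scale = 1
py_in = img_float * 255.0 - mean                    # wrapper: scale to 255 first

# Scaling first re-maps the [0, 1] image onto the mean's [0, 255] range,
# so both pipelines produce the same network input.
assert np.allclose(cpp_in, py_in)
```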

@longjon I see two options: either 1. make the Python preprocessing pipeline the same as the C++ data layer and change the stored mean, or 2. keep the current method with an explanatory comment.

To me 2. is a little more flexible and less work, but might be slightly confusing.

@Denominator
Author

Actually, I am working on a CIFAR-10 Python classifier using the ImageNet example as my template, and it is very confusing :)

@longjon
Contributor

longjon commented Jun 21, 2014

@shelhamer, I'm confused about your statement that scikit-image represents image data as float in [0, 1]. As far as I can tell, it just invokes PIL, much like matplotlib:

$ ipython --no-banner

In [1]: from skimage import io

In [2]: io.imread('examples/images/cat.jpg').flat[:10]
Out[2]: array([26, 57, 49, 27, 58, 50, 25, 55, 47, 28], dtype=uint8)

Are you reading images differently?

Looking at pycaffe.py, my guess is that you're doing the rescaling yourself so that the call to skimage.transform.resize works. Perhaps we should change caffe.io.resize_image to do precisely what is already done in Net.set_mean; then we could lose the undocumented precondition on caffe.io.resize_image (and Net.preprocess), simplify the code in set_mean, use [0, 255] images as input in Python, and match the transformation ordering to DataLayer.

@shelhamer
Member

@longjon, your read on the current state of the caffe.io code is right. I'll try the alternative you've outlined in my next pass over pycaffe. That's due soon, since I need to verify and fix the speed issues Ross mentioned in #482.

@shelhamer
Member

Another TODO for some brave soul: the mean subtraction assumes a 3-channel input and an elementwise mean only because skimage.transform has that limitation. Is there a better way to do resizing/interpolation in Python? I tried scipy.ndimage.zoom once, but remember it being quite slow.
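For whoever picks this up, one possible shape for it (a sketch only, not benchmarked; function name and sizes are made up) is to zoom just the spatial axes so the channel count is arbitrary:

```python
import numpy as np
from scipy.ndimage import zoom

def resize_k_channel(im, new_h, new_w):
    """Resize an (H, W, K) image for any number of channels K using
    scipy.ndimage.zoom with order=1 (bilinear). The channel axis gets
    a zoom factor of 1, so it is left untouched."""
    h, w = im.shape[:2]
    return zoom(im, (new_h / h, new_w / w, 1), order=1)

im = np.random.rand(10, 12, 5)  # a made-up 5-channel image
out = resize_k_channel(im, 20, 24)
assert out.shape == (20, 24, 5)
```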

@Denominator
Author

As far as I remember, with PIL, as long as your image pixels are of an integer type the scale is [0, 255], but if the pixels are floats the scale has to be [0, 1].
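scikit-image makes the same convention explicit via its conversion helpers (a small sketch):

```python
import numpy as np
from skimage import img_as_float, img_as_ubyte

u = np.array([[0, 128, 255]], dtype=np.uint8)
f = img_as_float(u)  # uint8 in [0, 255] -> float in [0, 1]
assert 0.0 <= f.min() and f.max() <= 1.0
assert np.allclose(img_as_ubyte(f), u)  # round-trips back to uint8
```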

@sayanghosh

The interesting fact is that I am having a similar problem trying to predict a class label for a raw image in Python. I have 100 image files in a folder, and I try the following alternative scenarios: (1) pack them into a leveldb and confirm that the test accuracy (from C++) is around 72%; (2) run them through Python using the Classifier.predict function, and the accuracy drops to 42%! Note that I have around 200 target classes.

x = caffe.io.load_image(filename, color=False)  # load as grayscale
scores = net.predict([x])
filters = net.blobs['prob'].data                # softmax output blob
f = filters.reshape(200)                        # 200 target classes
return scores.reshape(200).argsort()[-1]        # index of the top class

@shelhamer
Member

@Denominator @sayanghosh check out #816. Let's sort out the input preprocessing in that PR.

shelhamer added a commit to shelhamer/caffe that referenced this issue Aug 6, 2014
- load an image as [0,1] single / np.float32 according to Python convention
- fix input scaling during preprocessing:
  - scale input for preprocessing by `raw_scale` e.g. to map an image
    to [0, 255] for the CaffeNet and AlexNet ImageNet models
  - scale feature space by `input_scale` after mean subtraction
  - switch examples to raw scale for ImageNet models
  - fix BVLC#525
- preserve type after resizing.
- resize 1, 3, or K channel images with special casing between
  skimage.transform (1 and 3) and scipy.ndimage (K) for speed
mitmul pushed a commit to mitmul/caffe that referenced this issue Sep 30, 2014
RazvanRanca pushed a commit to RazvanRanca/caffe that referenced this issue Nov 4, 2014