Out of memory #2

Hi, I have 4x12GB GPUs, but it seems only the first one works. An out-of-memory error is encountered after a few seconds of training.

Comments
@xingjici A batch size of 2 takes approximately 18GB of memory on Cityscapes, and 2 is the default. If I remember correctly, batch size 1 should take approximately 12GB, maybe a bit more. Keep me posted on your progress.
@NoamRosenberg My batch size is 4. If a batch size of 2 takes approximately 18GB of memory, each GPU should only need about 9GB when nn.DataParallel is on. I have 4x12GB GPUs, but it doesn't work when the batch size equals 4.
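For reference, a minimal sketch of how nn.DataParallel splits a batch, assuming four visible GPUs; the tiny Conv2d model and the 321-pixel input below are placeholders, not the repo's actual search network:

```python
import torch
import torch.nn as nn

# Placeholder model; the real one here is the AutoDeepLab search network.
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
model = nn.DataParallel(model, device_ids=[0, 1, 2, 3]).cuda()

# With batch size 4 on 4 GPUs, each replica sees one image per step,
# but GPU 0 also hosts the master weights, scatters the inputs, and
# gathers the outputs and losses, so its memory footprint is
# noticeably higher than a quarter of the total.
x = torch.randn(4, 3, 321, 321).cuda()  # lands on GPU 0, then scattered
out = model(x)                          # outputs gathered back on GPU 0
```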
@xingjici In practice memory usage isn't linear, and one GPU will take more than 9GB. I suggest shrinking the model input for now as a test. It's easy to do; just adjust args.base_size.
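A hedged sketch of that test; the flag names follow the comment above, but the defaults and the extra `--crop_size` flag are illustrative, not necessarily the repo's exact ones:

```python
import argparse

# Shrinking the input side shrinks every intermediate feature map, so
# activation memory drops roughly quadratically with the side length.
parser = argparse.ArgumentParser()
parser.add_argument('--base_size', type=int, default=320)  # placeholder default
parser.add_argument('--crop_size', type=int, default=320)  # placeholder default

# e.g. halve the input side as a quick memory test:
args = parser.parse_args(['--base_size', '160', '--crop_size', '160'])
print(args.base_size, args.crop_size)
```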
@NoamRosenberg I found that only GPU 0 works during training, and nn.DataParallel may crash. Could you check the memory usage via nvidia-smi? I think the reason may be that the entire computational burden is taken by GPU 0.
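Complementary to watching `nvidia-smi -l 1` in a shell, a short snippet like this prints what PyTorch itself has allocated on each visible GPU (note `memory_reserved` needs a reasonably recent PyTorch; older releases called it `memory_cached`):

```python
import torch

# If only GPU 0 shows usage here, the model was never replicated
# across devices and DataParallel is effectively a no-op.
for i in range(torch.cuda.device_count()):
    alloc = torch.cuda.memory_allocated(i) / 1024 ** 3
    reserved = torch.cuda.memory_reserved(i) / 1024 ** 3
    print(f"GPU {i}: {alloc:.2f} GiB allocated, {reserved:.2f} GiB reserved")
```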
@xingjici This is very odd. Could you elaborate on what you have tried so far and what errors you get with nn.DataParallel? I won't have access to a computer until Monday, but I will do my best to help you figure this out then. Please keep me updated. By the way, I'm looking for contributors to this project and would be happy to have you join forces.
@NoamRosenberg
@xingjici Thanks for your ideas. I wonder if you wouldn't mind committing them. Specifically, self.architect receives the self.model object, which has just recently been distributed. So I'm not quite sure what you mean, but if you commit this idea I can check it more carefully.
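One common pitfall when an architect object receives an already-distributed model is that nn.DataParallel only forwards `forward()`, not custom methods, which then live on `.module`. A minimal sketch of that failure mode, where `SearchNet` and `arch_parameters` are hypothetical stand-ins for this repo's actual classes:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the search network; `arch_parameters`
# mirrors the kind of custom method an architect object would call.
class SearchNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.alphas = nn.Parameter(torch.zeros(14, 8))

    def arch_parameters(self):
        return [self.alphas]

    def forward(self, x):
        return x

model = nn.DataParallel(SearchNet()).cuda()

# model.arch_parameters()  # AttributeError: the wrapper does not
#                          # expose the wrapped module's custom methods
arch_params = model.module.arch_parameters()  # unwrap first: this works
```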