Facebook open-sourced its Deepmask and Sharpmask codebases last week for object segmentation in a scene. In the past I’ve tried traditional CV methods to do object masking, and it is kind of a disaster, you end up with a very finicky codebase full of heuristics and hacks, which fails as soon as it sees a new kind of image. Seems like an archetypical use case for neural nets. So seemed worth giving it a try.

Deepmask and Sharpmask come with pretrained models based on the COCO dataset, it looks laborious to label a new dataset without benefit of some mechanical turk-like process, so I decided to stick with the pretrained models.

To avoid polluting my system, and to share with the rest of the team, I decided to bring this all up in a docker container. Because our build environment has some unique characteristics, I needed to author my own container, but these were excellent dockerfile guides: Torch with CUDA and it’s dependencies. Also Torch install is a good reference.


First results in the picture. Did a nice job on this image for the large objects. The biggest problem I had was running out of GPU memory. The machine I was using only had 4G of video ram and this constrains how large an image you can feed in. We had a bunch of 4K x 3K street images, I found that scaling them down to 756×567 allowed me to get them thru the classifier. My modified classifier that implements a size limit is here

The classifier is slow, seconds to tag an image. It would be interesting to keep it resident as a demon, and to just emit the metadata instead of a modified image, and see if I could get a higher framerate. Also I want to play around with both a faster video card, as well as maybe an opencl implementation to get to non-cuda platforms. This is also my first lua code ever, I have no idea which part of this code is fast or slow.

This was a fun little project. It points out that, if software is eating the world, then ML is eating software. For a certain class of problems, a fairly generic neural net plus a dataset is going to be competitive with the best laboriously hand-coded algorithm.