Simple Dust Busting: Cleaning up white dirt on scanned negatives

On the left, a dirty scanned negative. On the right, a "cleaned up" version.

Scanning negatives with a cheap transparency scanner, then converting them into positives using the process described here, often brings another problem: dirt!

Over the period of a few days, I fed about 1400 negative frames through the scanner and (not surprisingly, I suppose) I started to see hairs, dots and general grot appearing in the results. These are caused by dirt falling onto the illuminating screen at the bottom of the scanner. When the negative is digitized, the dirt stops light reaching the negative and those dirty bits come out as black. Once the negative is converted into a positive, they become white.

The scanner came with a brush to clean up such grot, but (a) I couldn't be bothered using it often enough (after the first few hundred frames, this all gets very old and you just want to get it done ASAP), and (b) the brush seemed to add as much dirt as it removed much of the time! Using isopropyl alcohol with it helped ... but then you have to wait for that to dry ... At least with my level of patience, it just wasn't feasible to remove all the dirt all the time (or even most of it most of the time!). So ... some sort of post processing was called for to reduce the visibility of the grot to more or less acceptable levels.

Given the quantity of frames to deal with, a largely automatic process was indicated. After a bit of thought, there seemed to be two parts to this:
  • Try to locate which pixels are covered by dirt. Create a "mask" which is white where the dirt is and black where it isn't.
  • Given the mask, try to intelligently fill the pixels inside the mask using nearby image pixels outside the mask (which we must assume are clean).
Since the dirt appears as bright white in the positive images, given a run of images with the same dirt pattern in each frame, if we add up the frames and then scale the sum so that its maximum is 1 (say), the dirt will "emerge" as the brightest parts of this scaled sum of frames. If we then threshold this (make all values above a manually chosen level white and the rest black) we should have the desired dirt mask.
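As a rough illustration of the "add up, scale, threshold" idea, here is a minimal NumPy sketch. It is not the code from the program described on this page; the function names `scaled_sum` and `threshold` are made up for the example, and the frames are assumed to be already loaded as equally sized arrays.

```python
import numpy as np

def scaled_sum(frames):
    """Add up equally sized frames and scale the sum so its maximum is 1.0."""
    total = None
    for frame in frames:
        grey = np.asarray(frame, dtype=np.float64)
        if grey.ndim == 3:
            # Collapse colour channels to a single brightness value.
            grey = grey.mean(axis=2)
        total = grey.copy() if total is None else total + grey
    if total is None:
        raise ValueError("no frames given")
    peak = total.max()
    return total / peak if peak > 0 else total

def threshold(image, level):
    """Return a boolean mask: True (white) where image is above level."""
    return np.asarray(image) > level
```

With, say, 50 frames of varying content and one pixel that is fully white in every frame, that pixel ends up at exactly 1.0 in the scaled sum while everything else stays lower, so a threshold a little below 1.0 picks it out.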

This more or less works. However, it works much more reliably if we try to estimate the "background" of the scaled sum image. We can do this just by blurring it fairly heavily (which will filter out all reasonable -- i.e. smallish -- bits of dirt). If we subtract the background estimate from the scaled sum, take the absolute value of the result and threshold that manually, it turns out we can get a rather satisfactory mask given, say, 50 or more input frames with the same(ish) dirt pattern to sum together. At least, that is what I found with my frames. An example of a processed scaled sum image before thresholding is:

The "background removed" scaled sum image needs to be thresholded manually (e.g. in Gimp) to get the final "binary" dirt mask. The threshold should be adjusted down to capture (most of) the dirt while avoiding including accidental "bright bits" that have accumulated due to aligned image features other than dirt. Below is an example of a "dirt mask".

Yes ... there is rather a lot of dirt, I'm afraid.
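The background removal and thresholding steps described above might be sketched like this. It is only an illustration under assumptions: a plain box blur stands in for whatever "heavy blur" the real program uses, and the names `box_blur`, `remove_background` and `dirt_mask`, the `radius` default and the threshold `level` are all invented for the example. In the workflow on this page the threshold is chosen by hand in Gimp rather than in code.

```python
import numpy as np

def box_blur(image, radius):
    """Mean filter over a (2*radius+1) square window, with edges replicated."""
    image = np.asarray(image, dtype=np.float64)
    size = 2 * radius + 1
    padded = np.pad(image, radius, mode="edge")
    # Integral image with a leading row and column of zeros.
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = image.shape
    window_sum = (integral[size:size + h, size:size + w]
                  - integral[:h, size:size + w]
                  - integral[size:size + h, :w]
                  + integral[:h, :w])
    return window_sum / (size * size)

def remove_background(summed, radius=15):
    """Subtract a heavily blurred copy and keep the absolute difference."""
    summed = np.asarray(summed, dtype=np.float64)
    return np.abs(summed - box_blur(summed, radius))

def dirt_mask(summed, level, radius=15):
    """Boolean mask: True where the background-removed image exceeds level."""
    return remove_background(summed, radius) > level
```

The point of the subtraction is that a slowly varying brightness across the summed image no longer pushes clean areas over the threshold; only small features that differ sharply from their surroundings survive.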

The next step is to "fill in" the bits of the image inside the mask in some sensible way using parts of the image that are close by but outside the mask. The process used to do this works as follows:
  1. Expand (dilate) the white bits of the mask a bit to be on the safe side.
  2. Make a copy of the dilated mask.
  3. Scan the mask until a white pixel is found.
  4. Find the average colour of the image in a box centered on the location of the white mask pixel, including in the average only those pixels where the mask is black. The size of the box is some suitable value such as 7x7.
  5. If at least one black mask pixel was found in the box, set the result image at the location of the white mask pixel to this conditional average value, and set the pixel at this location in the copy of the mask to black.
  6. If no black mask pixels were found in the box, no conditional average can be formed: note that fact as a "failure", leave the pixel for a later pass, and keep going.
  7. Continue from Step 3 with the next pixel and repeat until the entire image has been processed.
  8. If there were no "failures", then all the dirt pixels have been filled and we are finished.
  9. Otherwise, make the "active" mask the (modified) "copy mask", make the result image the input image and repeat the process. Keep repeating until there are no "failures" or a maximum number of "passes" have been carried out.
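The steps above can be sketched in NumPy roughly as follows. This is a slow, pure Python illustration of the idea, not the compiled code from the actual program; the names `dilate` and `fill_masked` and the defaults for `box`, `radius` and `max_passes` are assumptions made for the example.

```python
import numpy as np

def dilate(mask, radius=1):
    """Step 1: grow the white (True) region by `radius` pixels."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    h, w = mask.shape
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def fill_masked(image, mask, box=7, max_passes=50):
    """Steps 2-9: fill masked pixels with the mean of nearby unmasked pixels.

    image is (H, W) or (H, W, C); mask is a boolean (H, W) array.
    Returns the filled image and whatever part of the mask is still unfilled.
    """
    half = box // 2
    result = np.array(image, dtype=np.float64)
    active = np.array(mask, dtype=bool)
    height, width = active.shape
    for _ in range(max_passes):
        source = result.copy()     # input image for this pass
        next_mask = active.copy()  # the "copy mask"
        failures = 0
        for y, x in zip(*np.nonzero(active)):
            y0, y1 = max(0, y - half), min(height, y + half + 1)
            x0, x1 = max(0, x - half), min(width, x + half + 1)
            clean = ~active[y0:y1, x0:x1]
            if clean.any():
                # Conditional average over unmasked pixels in the box.
                result[y, x] = source[y0:y1, x0:x1][clean].mean(axis=0)
                next_mask[y, x] = False
            else:
                failures += 1
        active = next_mask
        if failures == 0:
            break
    return result, active
```

A blob wider than the box needs more than one pass: the first pass fills only its rim, and each later pass works one ring further in.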
(This is actually implementing a sort of diffusion process in which valid image data is diffused from the edges of the dirt mask inwards until the mask is completely filled. A great deal of work has been published on this sort of in-filling which is much more advanced than this. On the other hand, the process above is straightforward and can run quite quickly with fairly decent results -- on a good day.)

We now have an image in which every non-black dilated mask pixel is filled with the local "outside mask" colour, as shown below:

Next, we need to put this back into the original image to fill in the "holes". To do that, we simply blend the "filled mask" image with the original using (a slightly blurred version of) the dirt mask to control the blend. The sort of result this produces can be seen at the top of this page.
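The blend itself is just a per-pixel weighted mix. A minimal sketch, assuming the softened mask has already been scaled to the range 0 to 1 (the function name `blend` is made up for the example, and the blurring of the mask is not shown):

```python
import numpy as np

def blend(original, filled, soft_mask):
    """Mix two images: mask 1.0 takes `filled`, mask 0.0 keeps `original`."""
    original = np.asarray(original, dtype=np.float64)
    filled = np.asarray(filled, dtype=np.float64)
    alpha = np.clip(np.asarray(soft_mask, dtype=np.float64), 0.0, 1.0)
    if original.ndim == 3:
        # Let the single-channel mask broadcast over the colour channels.
        alpha = alpha[..., np.newaxis]
    return alpha * filled + (1.0 - alpha) * original
```

Because the mask is slightly blurred, pixels near the edge of a filled region get a mixture of both images, which hides the seam between filled and untouched areas.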

The process (in my experience) greatly improves the appearance of the scanned images. However, it is far from perfect. Here are some issues:
  • You need to manually identify runs of input images with the same dirt pattern on them. Or more or less the same pattern, anyway. This is itself somewhat tedious. It doesn't matter all that much if there is some variation in the dirt pattern. But you need to stop when the pattern changes significantly -- for obvious reasons.
  • Unless the pattern is perfectly stable in the run you have chosen (which is unlikely), there will be no threshold which captures all the dirt without including too much "non dirt". Some remnants of the dirt will probably remain after processing.

This process has been implemented as a mixed Python and C program for Ubuntu Linux. For performance reasons, a pure Python program wasn't really acceptable. Low level pixel operations on large images really need to be done in a compiled language to get decent speed.

This mix of Python and C (or C++) is a very effective approach, in my opinion, to many problems. It gives you the flexibility and power of Python with the speed of C/C++ where needed. I now believe that compiled languages should mostly be restricted to implementing compute bound functions which are then called from interpreted languages, with as much as possible of the complexity of the overall program in the interpreted code. The productivity benefits of this approach are often very great, I think.

You can download the software here. To build it, proceed as follows:
$ tar xzf dustbust.tgz
$ cd dustbust
$ ./
To run dustbust to create a mask use:
$ cd dustbust # If not already there
$ python -m first_input_file.jpg last_input_file.jpg mask_file.jpg
$ gimp mask_file.jpg
This will create a mask using all the scanned image files from first_input_file.jpg up to, but not including, last_input_file.jpg, then run Gimp on that mask. In Gimp, select Colours->Threshold, adjust the threshold, then Save to save the thresholded mask.

To apply the mask to remove dirt on the same set of input files, use:
$ python first_input_file.jpg last_input_file.jpg mask_file.jpg
This will write cleaned images to a folder called: cleanedphotos. Here is a complete example of an actual run:
$ python -m ../posscans/positive-2014-04-27-175937.jpg ../posscans/positive-2014-04-27-181041.jpg group10.jpg
=========================================================================== - Very simple dust removal program for scanned positive images.
Estimating a dust mask.
INFO: Adding in: ../posscans/positive-2014-04-27-175937.jpg
INFO: Adding in: ../posscans/positive-2014-04-27-180954.jpg
INFO: Added in: 56 images.
INFO: Max value before scaling is: 13419.0
INFO: Estimating background (slow).
INFO: Removing background.
INFO: Trimming edges.
INFO: Wrote mask: group10.jpg
Please threshold this manually using GIMP etc.
INFO: Processing took: 1 minutes 8.9 seconds.
$ gimp group10.jpg
$ python  ../posscans/positive-2014-04-27-175937.jpg ../posscans/positive-2014-04-27-181041.jpg group10.jpg
=========================================================================== - Very simple dust removal program for scanned positive images.
Applying a dust mask.
INFO: Create cleanedphotos folder.
INFO: Dilating mask.
INFO: Softening mask.
INFO: Trimming mask edges.
INFO: Processing: ../posscans/positive-2014-04-27-175937.jpg
INFO: Processing: ../posscans/positive-2014-04-27-180954.jpg
INFO: Processed 56 image(s).
INFO: Processing took: 0 minutes 50.9 seconds.

You will need a working C++ development environment and a Python environment with Numpy and PIL installed to run it.
