Simple Dust Busting: Cleaning up white dirt on scanned negatives
On the left, a dirty scanned negative. On the right, a "cleaned up" version.
Scanning negatives with a cheap transparency scanner, then converting them into
positives using the process described here, often brings
another problem: dirt!
Over the period of a few days, I fed about 1400 frames of negative through the
scanner and (not surprisingly, I suppose) I started to see hairs, dots and general
grot appearing in the results. These are caused by dirt falling onto the illuminating
screen at the bottom of the scanner. When the negative is digitized, the dirt stops
light reaching the negative and those dirty bits come out as black. Once the negative
is converted into a positive, they become white.
The scanner came with a brush to clean up such grot, but (a) I couldn't be bothered
using it often enough (after the first few hundred frames, this all gets very old
and you just want to get it done ASAP), and (b) the brush seemed to add as much dirt
as it removed much of the time! Using isopropyl alcohol with it helped ... but then
you have to wait for that to dry ... At least with my level of patience, it just
wasn't feasible to remove all the dirt all the time (or even most of it most
of the time!). So ... some sort of post processing was called for to reduce the
visibility of the grot to more or less acceptable levels.
Given the quantity of frames to deal with, a largely automatic process was indicated.
After a bit of thought, there seemed to be two parts to this:
- Try to locate which pixels are covered by dirt. Create a "mask" which is white
where the dirt is and black where it isn't.
- Given the mask, try to intelligently fill the pixels inside the mask using
close by image pixels outside the mask (which we must assume are clean image).
Since the dirt appears as bright white in the positive images, given a run of
images with the same dirt pattern in each frame, if we add up the frames, then
scale them so the maximum is 1 (say), the dirt will "emerge" as the brightest parts
of this scaled sum of frames. If we then threshold this (make all values above a manually
chosen level white and the rest black) we should have the desired dirt mask.
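The "add up the frames, scale, threshold" idea can be sketched in a few lines of NumPy. This is only an illustration on synthetic data, not the code in dustbust.py: the function names `scaled_sum` and `threshold_mask`, the frame size and the threshold level of 0.9 are all assumptions made for the example.

```python
import numpy as np

def scaled_sum(frames):
    """Add up equally sized greyscale frames and scale so the maximum is 1."""
    total = np.zeros_like(frames[0], dtype=np.float64)
    for frame in frames:
        total += frame
    peak = total.max()
    return total / peak if peak > 0 else total

def threshold_mask(image, level):
    """True (white) where the image exceeds the chosen level, else False."""
    return image > level

# Synthetic run: 50 frames of random "picture" content sharing three dirt pixels.
rng = np.random.default_rng(0)
frames = [rng.uniform(0.0, 0.8, size=(32, 32)) for _ in range(50)]
for frame in frames:
    frame[10, 12] = 1.0      # a dot of dirt, white in every frame
    frame[20:22, 5] = 1.0    # a short "hair", also in every frame
summed = scaled_sum(frames)
mask = threshold_mask(summed, 0.9)   # level picked by hand for this toy data
print(int(mask.sum()), "pixels flagged as dirt")
```

Because the picture content changes from frame to frame while the dirt stays put, the dirt pixels end up well above everything else in the scaled sum.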
This more or less works. However, it works much more reliably if we try to estimate
the "background" of the scaled sum image. We can do this just by blurring it fairly
heavily (which will filter out all reasonable -- i.e. smallish -- bits of dirt). If
we subtract the background estimate from the scaled sum, take the absolute value of the
result and threshold that manually, it turns out we can get a rather
satisfactory mask given, say, 50 or more input frames with the same(ish) dirt pattern
to sum together. At least, that is what I found with my frames. An example of a
processed scaled sum image before thresholding is:
The "background removed" scaled sum image needs to be thresholded manually (e.g. in Gimp)
to get the final "binary" dirt mask. The threshold should be adjusted down to capture
(most of) the dirt while avoiding including accidental "bright bits" that have
accumulated due to aligned image features other than dirt. Below is an example of a
final dirt mask:
Yes ... there is rather a lot of dirt, I'm afraid.
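As a rough sketch of the background removal step, the following NumPy code blurs a scaled sum image heavily, subtracts the blur and takes the absolute value. It is an illustration under assumptions, not the dustbust.py implementation: the box blur, the names `box_blur` and `remove_background`, the radius and the threshold level are all hypothetical choices, and the real program may well use a different kind of blur.

```python
import numpy as np

def box_blur(image, radius):
    """Mean over a (2*radius+1) square window, with the edges replicated."""
    size = 2 * radius + 1
    padded = np.pad(image, radius, mode="edge")
    # Integral image with a leading row and column of zeros.
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = image.shape
    window_sum = (integral[size:size + h, size:size + w]
                  - integral[:h, size:size + w]
                  - integral[size:size + h, :w]
                  + integral[:h, :w])
    return window_sum / (size * size)

def remove_background(summed, radius):
    """Absolute difference between the image and its heavily blurred version."""
    return np.abs(summed - box_blur(summed, radius))

# Toy scaled sum: a left-to-right brightness ramp plus one speck of dirt that
# is dimmer than the bright side of the ramp, so a plain threshold cannot
# separate it from the picture content.
summed = np.tile(np.linspace(0.2, 0.8, 64), (64, 1))
summed[30, 5] += 0.3
residue = remove_background(summed, radius=5)
mask = residue > 0.1   # stands in for the manual threshold chosen in Gimp
print(int(mask.sum()), "pixel(s) left after background removal")
```

The point of the toy data is that the speck only stands out once the slowly varying background has been taken away.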
The next step is to "fill in" the bits of the image inside the mask in some sensible
way using parts of the image that are close by but outside the mask. The process used
to do this works as follows:
1. Expand (dilate) the white bits of the mask a bit to be on the safe side.
2. Make a copy of the dilated mask.
3. Scan the mask until a white pixel is found.
4. Find the average colour of the image in a box centered on the location of
the white mask pixel but only add pixels where the mask is black into
the average. The size of the box is some suitable value such as 7x7.
5. Set the result image at the location of the white mask pixel to this conditional
average value. Set the pixel at this location in the copy of the mask to black.
6. If no black mask pixels were found while trying to form a conditional
average, note that fact as a "failure", but otherwise keep going.
7. Continue from Step 3 with the next pixel and repeat until the entire image
has been processed.
8. If there were no "failures", then all the dirt pixels have been filled and
we are finished.
9. Otherwise, make the "active" mask the (modified) "copy mask", make the result
image the input image and repeat the process. Keep repeating until there are
no "failures" or a maximum number of "passes" have been carried out.
(This is actually implementing a sort of diffusion process in which valid image
data is diffused from the edges of the dirt mask inwards until the mask is completely
filled. A great deal of work has been published on this sort of in-filling which is
much more advanced than this. OTOH, the process above is straightforward and can
run quite quickly with fairly decent results -- on a good day.)
We now have an image in which every non-black dilated mask pixel is filled with the
local "outside mask" colour, as shown below:
Next, we need to put this back into the original image to fill in the "holes". To
do that, we simply blend the "filled mask" image with the original using (a slightly
blurred version of) the dirt mask to control the blend. The sort of result this
produces can be seen at the top of this page.
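To make the fill passes and the final blend concrete, here is a minimal pure NumPy sketch. It follows the steps described above but is not the dustbust.py code (which does this work in C for speed): the names `dilate`, `soften`, `fill_pass`, `fill_dirt` and `blend`, the 3x3 softening and the default parameters are assumptions for illustration only.

```python
import numpy as np

def dilate(mask, iterations=1):
    """Grow the True region of a boolean mask by one pixel per iteration."""
    out = mask.copy()
    h, w = mask.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + h, dx:dx + w]
        out = grown
    return out

def soften(mask):
    """3x3 mean of a boolean mask, giving blend weights between 0 and 1."""
    padded = np.pad(mask.astype(np.float64), 1, mode="edge")
    h, w = mask.shape
    total = sum(padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))
    return total / 9.0

def fill_pass(image, mask, half):
    """Fill each masked pixel with the mean of the unmasked pixels in its box."""
    result, remaining, failures = image.copy(), mask.copy(), 0
    h, w = mask.shape
    for y, x in zip(*np.nonzero(mask)):
        y0, y1 = max(y - half, 0), min(y + half + 1, h)
        x0, x1 = max(x - half, 0), min(x + half + 1, w)
        clean = ~mask[y0:y1, x0:x1]
        if clean.any():
            result[y, x] = image[y0:y1, x0:x1][clean].mean(axis=0)
            remaining[y, x] = False      # this pixel now counts as clean
        else:
            failures += 1                # nothing clean in reach on this pass
    return result, remaining, failures

def fill_dirt(image, mask, half=3, max_passes=50):
    """Repeat fill passes until no failures remain or max_passes is reached."""
    for _ in range(max_passes):
        image, mask, failures = fill_pass(image, mask, half)
        if failures == 0:
            break
    return image

def blend(original, filled, weights):
    """Take the filled image where weights are 1 and the original where 0."""
    if original.ndim == 3:
        weights = weights[..., np.newaxis]
    return weights * filled + (1.0 - weights) * original

# Toy example: a flat grey image with a 3x3 white blob of "dirt".
image = np.full((16, 16), 0.5)
image[6:9, 6:9] = 1.0
mask = dilate(image > 0.9)
filled = fill_dirt(image, mask, half=3)   # half=3 gives the 7x7 box
cleaned = blend(image, filled, soften(mask))
```

Each pass reads only from the input image and the active mask and writes to copies, so pixels filled during a pass do not feed into other averages until the next pass. A Python loop like this is slow on full-size scans, which is one reason to push this part into compiled code.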
The process (in my experience) greatly improves the appearance of the scanned
images. However, it is far from perfect. Here are some issues:
- You need to manually identify runs of input images with the same dirt pattern
on them. Or more or less the same pattern, anyway. This is itself somewhat
tedious. It doesn't matter all that much if there is some variation in the
dirt pattern. But you need to stop when the pattern changes significantly -- for
- Unless the pattern is perfectly stable in the run you have chosen (which is
unlikely), there will be no threshold which captures all the dirt without
including too much "non dirt". Some remnants of the dirt will probably remain after
This process has been implemented as a mixed Python and C program called dustbust.py
for Ubuntu Linux. For performance reasons, a pure Python program wasn't really acceptable. Low
level pixel operations on large images really need to be done in a compiled language to get
decent speed. This mix of Python and C (or C++) is a very effective approach, in my opinion,
to many problems. It gives you the flexibility and power of Python with the speed of C/C++ where
needed. I now believe that compiled languages should mostly be restricted to implementing compute
bound functions which are then called from interpreted languages, with as much as possible
of the complexity of the overall program in the interpreted code. The productivity benefits of
this approach are often very great, I think.
You can download the software here. To build it, proceed as follows:
$ tar xzf dustbust.tgz
$ cd dustbust
To run dustbust to create a mask use:
$ cd dustbust # If not already there
$ python dustbust.py -m first_input_file.jpg last_input_file.jpg mask_file.jpg
$ gimp mask_file.jpg
This will create a mask using all the scanned image files between first_input_file.jpg and
last_input_file.jpg (exclusive), then run Gimp on that mask.
In Gimp, select Colours->Threshold, adjust the threshold, then Save to save the
thresholded mask.
To apply the mask to remove dirt on the same set of input files, use:
$ python dustbust.py first_input_file.jpg last_input_file.jpg mask_file.jpg
This will write cleaned images to a folder called: cleanedphotos.
Here is a complete example of an actual run:
$ python dustbust.py -m ../posscans/positive-2014-04-27-175937.jpg ../posscans/positive-2014-04-27-181041.jpg group10.jpg
dustbust.py - Very simple dust removal program for scanned positive images.
Estimating a dust mask.
INFO: Adding in: ../posscans/positive-2014-04-27-175937.jpg
INFO: Adding in: ../posscans/positive-2014-04-27-180954.jpg
INFO: Added in: 56 images.
INFO: Max value before scaling is: 13419.0
INFO: Estimating background (slow).
INFO: Removing background.
INFO: Trimming edges.
INFO: Wrote mask: group10.jpg
Please threshold this manually using GIMP etc.
INFO: Processing took: 1 minutes 8.9 seconds.
$ gimp group10.jpg
$ python dustbust.py ../posscans/positive-2014-04-27-175937.jpg ../posscans/positive-2014-04-27-181041.jpg group10.jpg
dustbust.py - Very simple dust removal program for scanned positive images.
Applying a dust mask.
INFO: Create cleanedphotos folder.
INFO: Dilating mask.
INFO: Softening mask.
INFO: Trimming mask edges.
INFO: Processing: ../posscans/positive-2014-04-27-175937.jpg
INFO: Processing: ../posscans/positive-2014-04-27-180954.jpg
INFO: Processed 56 image(s).
INFO: Processing took: 0 minutes 50.9 seconds.
You will need a working C++ development environment and a Python environment with NumPy and PIL
installed to run it.