Python Quick Tip #3: Thresholding with NumPy

Updated: Jun 16

TL;DR

Here is the full script to load an image, binarize it with a threshold, then save a copy of the binary image:

import numpy as np
from skimage import io
from skimage.util import img_as_ubyte

# You need to change this to a valid image file on your computer
input_image = 'C:/FILES/leaf.tif'
# Set this to a valid threshold for your image
threshold = 165

gray_image = io.imread(input_image, as_gray=True)
gray_image = img_as_ubyte(gray_image)

binary_where = np.where(gray_image > threshold, 255, 0)
binary_where = img_as_ubyte(binary_where)

io.imshow(binary_where); io.show()

io.imsave('C:/FILES/leaf_thresholded.tif', binary_where)

You can also access this script on GitHub.


Line by Line

Import the necessary functionality first:

import numpy as np
from skimage import io
from skimage.util import img_as_ubyte

Set a variable to a valid image file on your computer:

# You need to change this to a valid image file on your computer
input_image = 'C:/FILES/leaf.tif'

Set a variable for the threshold you want to use. In this script we are only using it once, but if you want to continue with some of the exercises in the Additional Thoughts section it will come in handy to have this reference to it.

# Set this to a valid threshold for your image
threshold = 165

Load the image as grayscale. See our first Python Quick Tip for a description of this.

gray_image = io.imread(input_image, as_gray=True)
gray_image = img_as_ubyte(gray_image)

Here is the main functionality of our script. NumPy has a powerful function where() that provides a highly readable syntax (in my opinion) for masking an array. This function accepts one required and two optional arguments:

  • The first argument is the condition you wish to satisfy. In this case, we wish to mask all pixels within our image that are above our chosen threshold.

  • x (optional) is the value you wish to insert into the array everywhere the condition is TRUE

  • y (optional, but required if x is provided) is the value you wish to insert into the array everywhere the condition is FALSE

binary_where = np.where(gray_image > threshold, 255, 0)

If you double-check the type of the binary image using binary_where.dtype, you’ll see that NumPy returns an int32 image. This is a 32 bit integer, which is unnecessary for storing only two values! For most of our purposes an 8 bit image will work better, so let’s convert it:

binary_where = img_as_ubyte(binary_where)

Show the result to make sure it looks like what you expect:

io.imshow(binary_where); io.show()

Finally, save the result to disk as we discussed in Python Quick Tip 1:

io.imsave('C:/FILES/leaf_thresholded.tif', binary_where)

Additional Thoughts

Note that if you do not provide the x and y arguments for where(), it returns an array indicating for which indices the condition is true. This can be confusing the first time you see it! Consider an example array:

my_arr = np.array([1, 2, 3, 4, 5, 6])
np.where(my_arr > 3)

This will return a new array:

(array([3, 4, 5], dtype=int64),)

This doesn’t mean that Python thinks that 3, 4, 5 are greater than three! This means that the 3rd, 4th, and 5th indices of my_arr are greater than three. In fact, if you save the result of where() to a variable, you can use that to extract the values from the original array satisfying the condition:

mask = np.where(my_arr > 3)
my_arr[mask]

This will give you:

array([4, 5, 6])

The syntax of where() is probably easily readable to you if you have experience with Excel’s IF function because they are practically the same!

  • Excel: =IF(condition, return this if true, return this if false)

  • NumPy: numpy.where(condition, return this if true, return this if false)

  • Plain English: “test this condition and give me this value if it’s true, but otherwise give me that other value”

You may have also seen other ways to threshold arrays in Python using simple masking:

binary_image = gray_image > threshold

which returns a boolean array, or:

binary_image = (gray_image > threshold) * value

which returns an array where everywhere the boolean is true, value is inserted. These are perfectly valid ways to threshold images, too! Depending on what you find readable, you may even prefer one of these methods. You can see a script here that will plot these methods versus the where() function described above to show that the output is the same. The only difference is that the boolean method has a different type:

import matplotlib.pyplot as plt
from skimage import io
from skimage.util import img_as_ubyte

# You need to change this to a valid image file on your computer
input_image = 'C:/FILES/leaf.tif'
# Set this to a valid threshold for your image
threshold = 165

# Read the image
gray_image = io.imread(input_image, as_gray=True)
gray_image = img_as_ubyte(gray_image)

# Thresholding is as easy as performing a comparison on an array
binary_bool = gray_image > threshold

# To get int output instead of bool, multiply by a mask value
binary_multiple = (gray_image > threshold) * 255

# np.where produces an equivalent result, but is slightly faster
binary_where = np.where(gray_image > threshold, 255, 0)


# Show all the images to see that they are essentially the same!
fig, ax = plt.subplots(3, 1)

ax[0].imshow(binary_bool)
ax[0].set_title(f"Arr>t  ->  {binary_bool.dtype}")

ax[1].imshow(binary_multiple)
ax[1].set_title(f"(Arr>t)*255  ->  {binary_multiple.dtype}")

ax[2].imshow(binary_where)
ax[2].set_title(f"np.where(Arr>t,255,0)  ->  {binary_where.dtype}")

for a in ax:
    a.axis('off')
    
plt.show()

You may also wonder about which of these methods has faster performance. It turns out that the where() function is faster, but is negligible for smaller images. For a 1920 x 1080 grayscale image, my machine averaged 0.00784 seconds to compute the mask using where(), and 0.00912 seconds to compute without NumPy:

For an image with dimensions (1080, 1920); size 2073600:
np.where(array > threshold, val, 0) averages 0.00784 s to run
(array * threshold) * val averages 0.00912 s to run

This approximately 15% speedup doesn't seem significant in this context, but might add up for larger images. See the difference instead with a 1043 x 458 x 785 3D image:

For an image with dimensions (1043, 458, 785); size 374989790:
np.where(array > threshold, val, 0) averages 1.51479 s to run
(array * threshold) * val averages 1.88872 s to run

I used the following script for this timing comparison. Give it a try yourself!

from skimage import io
from skimage.util import img_as_ubyte
import numpy as np
from timeit import timeit

# You need to change this to a valid image file on your computer
input_image = 'C:/FILES/leaf.tif'
# Set this to a valid threshold for your image
threshold = 165

# Function for the (Arr>t)*val method
def mask_to_bool(array, threshold, val):
    return (array > threshold) * val

# Function for the np.where(Arr>t,val,0) method
def mask_w_where(array, threshold, val):
    return np.where(array>threshold, val, 0)

# If you do this on your own, point to valid file paths!
gray_image = io.imread(input_image, as_gray=True)

# Time both methods
t_where = timeit('mask_w_where(gray_image, 150, 255)',
                 'from __main__ import mask_w_where, gray_image, np', number=10) / 10.0
t_bool = timeit('mask_to_bool(gray_image, 150, 255)',
                'from __main__ import mask_to_bool, gray_image', number=10) / 10.0

# Print results
print(f"\nFor an image with dimensions {gray_image.shape}; size {gray_image.size}:")
print(f"np.where(array > threshold, val, 0) averages {round(t_where, 5)} s to run")
print(f"(array * threshold) * val averages {round(t_bool, 5)} s to run")


In the Context of Aivia

Our Skeletonize recipe uses where() to threshold Aivia channels before performing skeletonization. See the following line in the recipe from the parameter declaration section (just before the run() function):

# [INPUT Name:threshold Type:int DisplayName:'Threshold' Default:128 Min:0 Max:65535]

This tells Aivia to prompt the user to input the grayscale threshold they want to use:

The main program calls the run() function when the script is executed, and Aivia automatically inserts the user's chosen threshold into a dictionary called params. Since the declared name of this threshold is simply 'threshold' in the line above, the script can grab this variable:

threshold=int(params['threshold'])

And then use it with where():

temp_array=np.where(image_data>threshold, 1, 0)