Python Quick Tip #4: Batching Over Files in a Directory

TL;DR

Here is the full script to load all PNGs in a given folder, convert them to 8-bit grayscale TIFFs, then save them to a different folder:

import os
from skimage import io
from skimage.util import img_as_ubyte

# You need to change these to valid directories on your computer
input_dir = os.path.dirname('C:/FILES/leaves/')
output_dir = os.path.dirname('C:/FILES/leaves converted/')

for f in os.listdir(input_dir):
    if f.lower().endswith('.png'):
        image_gray = io.imread(os.path.join(input_dir, f),
                               as_gray=True)
        image_gray = img_as_ubyte(image_gray)
        output_file = f.replace('.png', '_gray.tiff')
        io.imsave(os.path.join(output_dir, output_file), image_gray)

You can also access this script on our GitHub.


Line by Line

Let's import the functionality we’ll need:

import os
from skimage import io
from skimage.util import img_as_ubyte

Let’s store the paths to our input and output folders in variables for easy access. My input files are in a folder at C:/FILES/leaves/, and I have an empty folder at C:/FILES/leaves converted/ for the output. The os.path submodule helps path-handling code work across operating systems, which may use different characters to separate the parts of a path. Here we use dirname, which returns everything before the final separator in a path, so the trailing slash in each path below matters:

# You need to change these to valid directories on your computer
input_dir = os.path.dirname('C:/FILES/leaves/')
output_dir = os.path.dirname('C:/FILES/leaves converted/')
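
One detail worth checking for yourself: dirname returns everything before the last separator, so leaving off the trailing slash gives you the parent folder instead. A minimal sketch, reusing the placeholder paths from above (they do not need to exist for this to run):

```python
import os

# Placeholder paths from the example above; dirname only manipulates the
# string and never touches the file system.
with_slash = os.path.dirname('C:/FILES/leaves/')
without_slash = os.path.dirname('C:/FILES/leaves')

print(with_slash)     # C:/FILES/leaves
print(without_slash)  # C:/FILES
```

If your script suddenly looks in the wrong folder, a missing trailing slash is a likely culprit.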

We need to loop over all the files in our input directory to perform our operation. The os module has a nifty function listdir, which returns a list of the names of all files and subdirectories within the given directory. Let’s do this now and refer to each element of that list in turn as f:

for f in os.listdir(input_dir):

In my case, I want to perform this operation only on the PNGs in my input directory. This if statement restricts the code that follows to PNG files:

    if f.lower().endswith('.png'):

The two string methods used on each file name f do the following:

  • lower converts f to all lower case characters so that we don’t have to explicitly check for both "PNG" and "png"

  • endswith checks whether a string ends with a given substring (e.g. ".png") and returns True or False
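
To see how the two methods combine, here is a small sketch run on a handful of made-up file names (the names are hypothetical and chosen only to show the edge cases):

```python
# Hypothetical file names, chosen to show how the check behaves.
names = ['leaf01.png', 'LEAF02.PNG', 'leaf03.tiff', 'notes.png.txt']

matches = []
for name in names:
    # Lower-case first so that ".PNG" and ".png" are treated the same.
    if name.lower().endswith('.png'):
        matches.append(name)

print(matches)  # ['leaf01.png', 'LEAF02.PNG']
```

Note that 'notes.png.txt' is skipped: ".png" appears in the name, but not at the end.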

Next, we open each PNG image in the directory using scikit-image’s imread with the as_gray parameter set to True, then ensure the array that is returned is scaled and converted to 8 bit using img_as_ubyte from the same package. This was covered in Python Quick Tip 1.

        image_gray = io.imread(os.path.join(input_dir, f),
                               as_gray=True)
        image_gray = img_as_ubyte(image_gray)

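What’s new here is the join function from the os.path submodule. We provide the path to a directory and a file name as arguments to join, and it concatenates them “intelligently”: it inserts the path separator for your operating system between the two, but only if the directory path does not already end with one.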
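
As a quick illustration of join, the sketch below builds a full path with and without a trailing separator on the folder. The file name leaf01.png is hypothetical:

```python
import os

input_dir = 'C:/FILES/leaves'  # placeholder folder from the example above

# join adds the separator for the current operating system (os.sep).
full_path = os.path.join(input_dir, 'leaf01.png')

# If the folder already ends with a separator, join does not add another.
trailing = os.path.join('C:/FILES/leaves/', 'leaf01.png')

print(full_path)
print(trailing)
```

The first result depends on your operating system (a backslash is inserted on Windows, a forward slash elsewhere), which is exactly the detail join saves you from handling by hand.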

Next is a trick I use all the time when doing file conversions. Before saving, let’s build an output file name that depends on the input file name. The replace method returns a copy of the string with one substring swapped for another. Since this code only operates on PNGs, we can simply replace that extension with a descriptor, plus the extension that we want to save to. One caveat: replace is case-sensitive, so a file named with an upper-case ".PNG" passes the check above but keeps its original name here.

        output_file = f.replace('.png', '_gray.tiff')
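
If your folder mixes upper- and lower-case extensions, one possible alternative to replace is os.path.splitext, which splits a file name into its stem and extension regardless of case. This is a sketch rather than part of the original script, and gray_tiff_name is a hypothetical helper name:

```python
import os

def gray_tiff_name(file_name):
    """Hypothetical helper: swap whatever extension is present for '_gray.tiff'."""
    stem, _extension = os.path.splitext(file_name)
    return stem + '_gray.tiff'

print(gray_tiff_name('leaf01.png'))  # leaf01_gray.tiff
print(gray_tiff_name('LEAF02.PNG'))  # LEAF02_gray.tiff

# For comparison, replace leaves the upper-case name untouched:
print('LEAF02.PNG'.replace('.png', '_gray.tiff'))  # LEAF02.PNG
```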

Finally, we save the converted image as a TIFF in our output directory, using os.path.join just as we did to open the original image:

        io.imsave(os.path.join(output_dir, output_file), image_gray)

Try adapting this code to run for images of your own and let us know how it goes on the Aivia Forum!


Additional Thoughts

The listdir function is not the only way to get lists of files, of course. For example, os.walk is better suited to searching through subdirectories because it descends into them for you, though you may find unpacking what it returns (a folder path, a list of subfolder names, and a list of file names for every folder it visits) a little more confusing than listdir. See more here.
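
As a rough sketch of how os.walk could be used for the same PNG search, including subfolders. The function name find_pngs is hypothetical:

```python
import os

def find_pngs(root_dir):
    """Return full paths of all PNG files in root_dir and its subfolders."""
    matches = []
    # os.walk yields one (dirpath, dirnames, filenames) tuple per folder visited.
    for dirpath, dirnames, filenames in os.walk(root_dir):
        for f in filenames:
            if f.lower().endswith('.png'):
                matches.append(os.path.join(dirpath, f))
    return matches
```

You would call it as find_pngs(input_dir) and loop over the returned paths. Because each path already includes its folder, no further join is needed before opening the file.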


What if you need to exclude certain files from your search depending on more than just the file extension? Consider the case where you acquire a tiled Z-stack, and for every tile you also save a max intensity projection. You want to perform some pre-processing only on the 3D images. If you use only TIFF, your directory may look something like this:

MySample_3D_Tile000.tiff
MySample_MIP_Tile000.tiff
MySample_3D_Tile001.tiff
MySample_MIP_Tile001.tiff
MySample_3D_Tile002.tiff
MySample_MIP_Tile002.tiff
etc…

Pre-processing this mixture of 3D and 2D files with the same algorithm doesn’t really make sense. You could add some logic that checks the shape of each array after it loads and skips 2D images, but this means that your script wastes time loading those 2D images into memory. Over hundreds (or thousands) of files this could be a significant waste of processing time. It makes more sense to only load 3D files. Enter glob. This module builds a list of paths that match a given pattern, using wildcards such as *, and it can even search recursively.
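
A minimal sketch of how glob might be used for the tiled example above, assuming the 3D files can be recognized by "_3D_" in their names. The function name find_3d_tiles is hypothetical:

```python
import glob
import os

def find_3d_tiles(directory):
    """Return sorted full paths of TIFF files with '_3D_' in their names."""
    # * matches any run of characters within a file name.
    pattern = os.path.join(directory, '*_3D_*.tiff')
    return sorted(glob.glob(pattern))

# To include subfolders as well, glob accepts '**' together with recursive=True:
# glob.glob(os.path.join(directory, '**', '*_3D_*.tiff'), recursive=True)
```

Only the matching paths are returned, so the MIP files are never opened.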


The extensive use of the os module when handling file paths (os.path.dirname, os.listdir, os.path.join, etc.) confused me when I first learned Python. Using simple strings for things like full file paths just feels simpler. But if you get in the habit now of handling your file paths with these functions, your code is less dependent on your OS, easier to share with colleagues, and more likely to keep running if you change your development environment.


In the Context of Aivia

It doesn't make sense to use this trick in the context of Aivia directly since Aivia handles file IO using its own mechanisms to link Python to channels and time steps. We do have a utility script, however, that loops over a directory of DICOM files and combines them into a single 3D TIFF:

file_list = [f for f in os.listdir(dicom_directory) if 'dcm' in f.lower() or 'dicom' in f.lower()]

This list comprehension is a prime example of when "Pythonic" goes a bit far. Let's decode it a bit:

  1. os.listdir(dicom_directory) is the iterable; it returns a list of the names of all files and folders inside the given directory

  2. f for f in is the loop: each name from that list is assigned to the variable f in turn, and the leading f is the value that goes into the new list

  3. if 'dcm' in f.lower() is a condition checked on each iteration of the loop; it is true when the lower-cased name contains "dcm"

  4. 'dicom' in f.lower() is a second condition checked on each iteration of the loop; it is true when the lower-cased name contains "dicom"

  5. or is a Boolean operator that combines the two conditions; the result is true if either one is true, and only then is f added to the list

  6. All of this is wrapped in brackets [ ] to build a list, which is assigned to the variable file_list
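
If the one-liner is hard to read, the same filtering can be written as an ordinary loop. This is an equivalent sketch rather than the code from the utility script, and list_dicom_files is a hypothetical function name:

```python
import os

def list_dicom_files(dicom_directory):
    """Return names in dicom_directory that contain 'dcm' or 'dicom' (any case)."""
    file_list = []
    for f in os.listdir(dicom_directory):
        name = f.lower()
        # Keep the name if either substring appears anywhere in it.
        if 'dcm' in name or 'dicom' in name:
            file_list.append(f)
    return file_list
```

Note that the check matches the substrings anywhere in the name, not only in the extension.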