I need to automate some image transformations to do the following:
- read in 16,000+ images that are short and wide; the sizing is not the same.
- rescale each image to 90 pixels in height, with the width scaling proportionally.
- split each resized image into 90-pixel-wide tiles.
ImageMagick is a great approach, but if you also want to perform some content analysis on the images, here is a solution with R. R provides some pretty handy tools, and images are "nothing" but matrices, which R handles really well. The EBImage package works on images as plain arrays of pixel values and, for better or for worse, drops some of the metadata attached to each image. So here's an R solution with EBImage; again, though, Mark's solution may be better for really big production runs.
The solution is structured around a large "for" loop, and it would be prudent to add error checking at several steps. The code takes advantage of EBImage to handle both color and grayscale images.
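For instance, here is a quick sketch of that array-like view (a hypothetical example, assuming EBImage is installed and a file such as the sentence.png used in the other answer is present):
library(EBImage)
img <- readImage("sentence.png") # an Image object, essentially a numeric array
dim(img) # width x height (a third dimension appears for colour images)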
Here, the final image is centered in an extended image by adding pixels of the desired background color, and the extended image is then cropped into tiles. The logic that determines the value of pad can be adjusted to simply crop the image, or to left- or right-justify it instead, if desired; a short sketch of those alternatives follows the code.
The code assumes you start in the working directory with the source files in ./source and the destination in ./dest. It also creates a new subdirectory for each "tiled" image; that could be changed so that a single directory receives all the images, along with other protective coding. The images are assumed to be PNG files with an appropriate extension, and the desired tile size (90), applied to both height and width, is stored in the variable size.
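One small caveat: the loop below creates a subdirectory per image inside ./dest, but it does not create ./dest itself, so make sure that directory exists before running it, for example:
dir.create("dest", showWarnings = FALSE) # run once, before the loop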
# EBImage needs to be available
if (!require(EBImage)) {
  source("https://bioconductor.org/biocLite.R")
  biocLite("EBImage")
  library(EBImage)
}

# From the working directory, select image files
size <- 90
bg.col <- "transparent" # or any other color specification for R
ff <- list.files("source", full.names = TRUE,
                 pattern = "png$", ignore.case = TRUE)

# Walk through all files with a 'for' loop
for (f in ff) {
  # Extract the base name, even from names like "foo.bar.1.png"
  txt <- unlist(strsplit(basename(f), ".", fixed = TRUE))
  len <- length(txt)
  base <- ifelse(len == 1, txt[1], paste(txt[-len], collapse = "."))

  # Read one image and resize it to a height of 'size' pixels
  img <- readImage(f)
  img <- resize(img, h = size) # options allow for antialiasing

  # Determine the number of tiles and the padding needed
  nx <- ceiling(dim(img)[1]/size)
  newdm <- c(nx * size, size)   # dimensions of the extended image
  pad <- newdm[1] - dim(img)[1] # pixels needed to extend the width

  # Center the image in the extended canvas with the given background fill
  img <- translate(img, c(pad%/%2, 0), output.dim = newdm, bg.col = bg.col)

  # Split the image into appropriately sized tiles with 'untile'
  img <- untile(img, c(nx, 1), lwd = 0) # see the help file

  # Create a new directory for each image
  dpath <- file.path("dest", trimws(base)) # Windows doesn't like " "
  if (!dir.create(dpath))
    stop("unable to create directory: ", dpath)

  # Create new image file names for each frame
  fn <- sprintf("%s_%03d.png", base, seq_len(nx))
  fpaths <- file.path(dpath, fn)

  # Save individual tiles (as PNG) and collect the names of the saved files
  saved <- mapply(writeImage, x = getFrames(img, type = "render"),
                  files = fpaths)

  # Check on the results from 'mapply'
  print(saved)
}
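As mentioned above, centering is only one choice for placing the resized image in the extended canvas. Here is a small sketch of the alternatives, reusing the pad, newdm and bg.col variables from the loop; swap the chosen offset in for the translate() call above:
off <- pad %/% 2 # centered, as in the loop above
# off <- 0       # left-justified: all padding added on the right
# off <- pad     # right-justified: all padding added on the left
img <- translate(img, c(off, 0), output.dim = newdm, bg.col = bg.col)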
I don't speak R, but I hope to be able to help with the ImageMagick aspects and getting 16,000 images processed.
As you are on a Mac, you can install two very useful packages easily with homebrew, using:
brew install imagemagick
brew install parallel
So, your original sentence image is 1850x105 pixels; you can see that in the Terminal like this:
magick identify sentence.png
sentence.png PNG 1850x105 1850x105+0+0 8-bit Gray 256c 51626B 0.000u 0:00.000
If you resize the height to 90px, leaving the width to follow proportionally, it will become 1586x90px:
magick sentence.png -resize x90 info:
sentence.png PNG 1586x90 1586x90+0+0 8-bit Gray 51626B 0.060u 0:00.006
So, if you resize and then crop into 90px wide chunks:
magick sentence.png -resize x90 -crop 90x chunk-%03d.png
you will get 18 chunks, each 90 px wide except the last, as follows:
-rw-r--r-- 1 mark staff 5648 6 Jun 08:07 chunk-000.png
-rw-r--r-- 1 mark staff 5319 6 Jun 08:07 chunk-001.png
-rw-r--r-- 1 mark staff 5870 6 Jun 08:07 chunk-002.png
-rw-r--r-- 1 mark staff 6164 6 Jun 08:07 chunk-003.png
-rw-r--r-- 1 mark staff 5001 6 Jun 08:07 chunk-004.png
-rw-r--r-- 1 mark staff 6420 6 Jun 08:07 chunk-005.png
-rw-r--r-- 1 mark staff 4726 6 Jun 08:07 chunk-006.png
-rw-r--r-- 1 mark staff 5559 6 Jun 08:07 chunk-007.png
-rw-r--r-- 1 mark staff 5053 6 Jun 08:07 chunk-008.png
-rw-r--r-- 1 mark staff 4413 6 Jun 08:07 chunk-009.png
-rw-r--r-- 1 mark staff 5960 6 Jun 08:07 chunk-010.png
-rw-r--r-- 1 mark staff 5392 6 Jun 08:07 chunk-011.png
-rw-r--r-- 1 mark staff 4280 6 Jun 08:07 chunk-012.png
-rw-r--r-- 1 mark staff 5681 6 Jun 08:07 chunk-013.png
-rw-r--r-- 1 mark staff 5395 6 Jun 08:07 chunk-014.png
-rw-r--r-- 1 mark staff 5065 6 Jun 08:07 chunk-015.png
-rw-r--r-- 1 mark staff 6322 6 Jun 08:07 chunk-016.png
-rw-r--r-- 1 mark staff 4848 6 Jun 08:07 chunk-017.png
Now, if you have 16,000 sentences to process, you can use GNU Parallel to get them all done in parallel and also get sensible names for all the files. Let's do a dry-run first so it doesn't actually do anything, but just shows you what it would do:
parallel --dry-run magick {} -resize x90 -crop 90x {.}-%03d.png ::: sentence*
Sample Output
magick sentence1.png -resize x90 -crop 90x sentence1-%03d.png
magick sentence2.png -resize x90 -crop 90x sentence2-%03d.png
magick sentence3.png -resize x90 -crop 90x sentence3-%03d.png
That looks good, so remove the --dry-run and run the same command again for real:
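parallel magick {} -resize x90 -crop 90x {.}-%03d.png ::: sentence*
For the three identical copies of your sentence that I made, that produces the following output: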
-rw-r--r-- 1 mark staff 5648 6 Jun 08:13 sentence1-000.png
-rw-r--r-- 1 mark staff 5319 6 Jun 08:13 sentence1-001.png
-rw-r--r-- 1 mark staff 5870 6 Jun 08:13 sentence1-002.png
-rw-r--r-- 1 mark staff 6164 6 Jun 08:13 sentence1-003.png
-rw-r--r-- 1 mark staff 5001 6 Jun 08:13 sentence1-004.png
-rw-r--r-- 1 mark staff 6420 6 Jun 08:13 sentence1-005.png
-rw-r--r-- 1 mark staff 4726 6 Jun 08:13 sentence1-006.png
-rw-r--r-- 1 mark staff 5559 6 Jun 08:13 sentence1-007.png
-rw-r--r-- 1 mark staff 5053 6 Jun 08:13 sentence1-008.png
-rw-r--r-- 1 mark staff 4413 6 Jun 08:13 sentence1-009.png
-rw-r--r-- 1 mark staff 5960 6 Jun 08:13 sentence1-010.png
-rw-r--r-- 1 mark staff 5392 6 Jun 08:13 sentence1-011.png
-rw-r--r-- 1 mark staff 4280 6 Jun 08:13 sentence1-012.png
-rw-r--r-- 1 mark staff 5681 6 Jun 08:13 sentence1-013.png
-rw-r--r-- 1 mark staff 5395 6 Jun 08:13 sentence1-014.png
-rw-r--r-- 1 mark staff 5065 6 Jun 08:13 sentence1-015.png
-rw-r--r-- 1 mark staff 6322 6 Jun 08:13 sentence1-016.png
-rw-r--r-- 1 mark staff 4848 6 Jun 08:13 sentence1-017.png
-rw-r--r-- 1 mark staff 5648 6 Jun 08:13 sentence2-000.png
-rw-r--r-- 1 mark staff 5319 6 Jun 08:13 sentence2-001.png
-rw-r--r-- 1 mark staff 5870 6 Jun 08:13 sentence2-002.png
-rw-r--r-- 1 mark staff 6164 6 Jun 08:13 sentence2-003.png
-rw-r--r-- 1 mark staff 5001 6 Jun 08:13 sentence2-004.png
-rw-r--r-- 1 mark staff 6420 6 Jun 08:13 sentence2-005.png
-rw-r--r-- 1 mark staff 4726 6 Jun 08:13 sentence2-006.png
-rw-r--r-- 1 mark staff 5559 6 Jun 08:13 sentence2-007.png
-rw-r--r-- 1 mark staff 5053 6 Jun 08:13 sentence2-008.png
-rw-r--r-- 1 mark staff 4413 6 Jun 08:13 sentence2-009.png
-rw-r--r-- 1 mark staff 5960 6 Jun 08:13 sentence2-010.png
-rw-r--r-- 1 mark staff 5392 6 Jun 08:13 sentence2-011.png
-rw-r--r-- 1 mark staff 4280 6 Jun 08:13 sentence2-012.png
-rw-r--r-- 1 mark staff 5681 6 Jun 08:13 sentence2-013.png
-rw-r--r-- 1 mark staff 5395 6 Jun 08:13 sentence2-014.png
-rw-r--r-- 1 mark staff 5065 6 Jun 08:13 sentence2-015.png
-rw-r--r-- 1 mark staff 6322 6 Jun 08:13 sentence2-016.png
-rw-r--r-- 1 mark staff 4848 6 Jun 08:13 sentence2-017.png
-rw-r--r-- 1 mark staff 5648 6 Jun 08:13 sentence3-000.png
-rw-r--r-- 1 mark staff 5319 6 Jun 08:13 sentence3-001.png
-rw-r--r-- 1 mark staff 5870 6 Jun 08:13 sentence3-002.png
-rw-r--r-- 1 mark staff 6164 6 Jun 08:13 sentence3-003.png
-rw-r--r-- 1 mark staff 5001 6 Jun 08:13 sentence3-004.png
-rw-r--r-- 1 mark staff 6420 6 Jun 08:13 sentence3-005.png
-rw-r--r-- 1 mark staff 4726 6 Jun 08:13 sentence3-006.png
-rw-r--r-- 1 mark staff 5559 6 Jun 08:13 sentence3-007.png
-rw-r--r-- 1 mark staff 5053 6 Jun 08:13 sentence3-008.png
-rw-r--r-- 1 mark staff 4413 6 Jun 08:13 sentence3-009.png
-rw-r--r-- 1 mark staff 5960 6 Jun 08:13 sentence3-010.png
-rw-r--r-- 1 mark staff 5392 6 Jun 08:13 sentence3-011.png
-rw-r--r-- 1 mark staff 4280 6 Jun 08:13 sentence3-012.png
-rw-r--r-- 1 mark staff 5681 6 Jun 08:13 sentence3-013.png
-rw-r--r-- 1 mark staff 5395 6 Jun 08:13 sentence3-014.png
-rw-r--r-- 1 mark staff 5065 6 Jun 08:13 sentence3-015.png
-rw-r--r-- 1 mark staff 6322 6 Jun 08:13 sentence3-016.png
-rw-r--r-- 1 mark staff 4848 6 Jun 08:13 sentence3-017.png
A word of explanation about the parameters to parallel:
- {} refers to "the current file"
- {.} refers to "the current file without its extension"
- ::: separates the parameters meant for parallel from those meant for your magick command
One note of warning: PNG images can "remember" where they came from, which can be useful, or very annoying. If you look at the last chunk from above, you will see it is 56x90, but also that it "remembers" it came from a canvas of 1586x90 at offset 1530,0:
identify sentence3-017.png
sentence3-017.png PNG 56x90 1586x90+1530+0 8-bit Gray 256c 4848B 0.000u 0:00.000
This can sometimes upset subsequent processing, which is annoying, or it can be very useful in re-assembling images that have been chopped up! If you want to remove it, you need to repage, so the command above becomes:
magick input.png -resize x90 -crop 90x +repage output.png
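And if you want to do that for all 16,000 images in one go, the same GNU Parallel pattern from above works with +repage added (a sketch, assuming the same sentence*.png naming as before):
parallel magick {} -resize x90 -crop 90x +repage {.}-%03d.png ::: sentence*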