Wednesday, September 12, 2012

Act 11. Life is not black and white: color segmentation.

My nieces are wriggly creatures... especially when we go out. One is liable to take off without sparing me a second glance, and one likes to hide behind bushes and yell "Boo!". My sister tends to dress them up in bright colors to make them stand out even in a crowd. Now, we'll try to let the machine find them for us.
Adorable nieces Bea (left) and Annika (right) (filename test.png)
To teach the machine what colors to look for, we crop out a representative sample, which I'm calling spot. For humans, it's easy to tell which colors are nearly the same, regardless of their shade. For the computer, colors are simply RGB values, so it can only treat two colors as similar when their RGB values are close.
sample cloth (spot.png)



Colors that are close together would appear as a connected area in the normalized chromaticity space (NCS), where each channel is divided by the sum R + G + B so that brightness drops out. Since it's easier to interpret data that's in two dimensions, instead of explicitly stating all three values, the normalized R is mapped on the x axis, the normalized G on the y axis, and the normalized B is determined by the equation
B = 1 - R - G
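To make the mapping concrete, here is a minimal Python sketch (not the Scilab/SIP code used for this post) that converts an RGB pixel to its NCS coordinates. The function name to_ncs and the choice to send pure black to the neutral point (1/3, 1/3) are my assumptions for illustration.

```python
def to_ncs(rgb):
    """Map an (R, G, B) pixel to normalized chromaticity coordinates (r, g)."""
    R, G, B = rgb
    total = R + G + B
    if total == 0:
        # Assumption: pure black has no defined chromaticity, so use the neutral point.
        return (1 / 3, 1 / 3)
    # b = 1 - r - g, so only r and g need to be stored.
    return (R / total, G / total)


# A bright red and a darker shade of the same red land on the same NCS point.
bright = to_ncs((200, 40, 60))
dark = to_ncs((100, 20, 30))
print(bright, dark)
```

This is why the NCS is handy here: a jacket in shadow and the same jacket in sunlight differ in brightness but sit at nearly the same (r, g) location.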
To determine the areas that are connected in the NCS, we take a spread along the R axis, a separate spread along the G axis, and multiply these two spreads together to make a symmetrical blob. Each pixel of the test picture is then kept if its color has a high enough value under this blob. Here's the code that does that, using a Gaussian function as the probability spread.

SIP code

Extracted picture using a gaussian distribution
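For readers without SIP, the parametric step can be sketched in plain Python. This is only an illustration under my own assumptions: images are nested lists of (R, G, B) tuples, the Gaussians are left unnormalized so the score peaks at 1, and the cutoff of 0.1 is a made-up value, not the one used for the picture above.

```python
import math


def chromaticity(pixel):
    """Return normalized (r, g) for an (R, G, B) pixel."""
    R, G, B = pixel
    total = R + G + B
    if total == 0:
        return (1 / 3, 1 / 3)  # assumption: treat black as neutral
    return (R / total, G / total)


def mean_and_std(values):
    """Mean and population standard deviation, with a floor to avoid sigma = 0."""
    mu = sum(values) / len(values)
    variance = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, max(math.sqrt(variance), 1e-6)


def gaussian_score(x, mu, sigma):
    """Unnormalized Gaussian: equals 1 at the mean and decays with distance."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


def segment_parametric(image, patch, threshold=0.1):
    """Keep pixels whose joint score p(r) * p(g) is at least `threshold`.

    `image` is a list of rows of pixels; `patch` is a flat list of sample pixels.
    """
    patch_rg = [chromaticity(p) for p in patch]
    mu_r, sigma_r = mean_and_std([r for r, _ in patch_rg])
    mu_g, sigma_g = mean_and_std([g for _, g in patch_rg])
    mask = []
    for row in image:
        mask_row = []
        for pixel in row:
            r, g = chromaticity(pixel)
            score = gaussian_score(r, mu_r, sigma_r) * gaussian_score(g, mu_g, sigma_g)
            mask_row.append(score >= threshold)
        mask.append(mask_row)
    return mask
```

Multiplying the two one-dimensional Gaussians is what produces the symmetrical blob: it assumes the spreads along the two axes are independent.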

It successfully extracted the jackets, but it also detected the lips and part of the chin (presumably because some color from the jacket reflected off their skin, giving it a similar color). I added the following lines to the code:
A little cleaning

Connected areas smaller than 300 pixels were ignored, giving us a cleaner image.
Cleaner picture
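The cleaning step can be sketched like this in plain Python. It is an illustration rather than the lines I actually added: the function name remove_small_blobs is hypothetical, and I'm assuming 4-connectivity (pixels touching only at a corner count as separate blobs).

```python
from collections import deque


def remove_small_blobs(mask, min_size):
    """Return a copy of a binary mask keeping only blobs with >= min_size pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    cleaned = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill collects one connected blob.
            blob = []
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                blob.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Only blobs that are large enough are copied to the output.
            if len(blob) >= min_size:
                for r, c in blob:
                    cleaned[r][c] = True
    return cleaned
```

For the picture above, the call would be remove_small_blobs(mask, 300).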

Histogram backprojection

Histogram backprojection works by taking the colors of your sample, and checking how often they appear. The test image is then evaluated - if a pixel color appears often in the sample, then it should be extracted, otherwise it is ignored.

I used the code given by Ma'am Jing to create a two-dimensional histogram, then added the following lines at the end.


Line 20 basically says that as long as a color appeared once in the sample, it should be extracted in the test image. x and y map the red and green channels of the test image to the range 1-255 and bin them into integers. The for loops get the RG values of each pixel, look them up in hist, and place the hist value in a blank image.
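The same lookup can be sketched in plain Python. This is not Ma'am Jing's histogram code: the function names are hypothetical, images are nested lists of (R, G, B) tuples, and the 32 bins per axis is an arbitrary choice of mine.

```python
def rg_bin(pixel, bins):
    """Return the (r, g) histogram bin of a pixel in normalized chromaticity."""
    R, G, B = pixel
    total = R + G + B
    r = R / total if total else 0.0
    g = G / total if total else 0.0
    # min() keeps r = 1.0 or g = 1.0 inside the last bin.
    return (min(int(r * bins), bins - 1), min(int(g * bins), bins - 1))


def build_histogram(patch, bins=32):
    """Count how often each (r, g) bin occurs in the sample patch."""
    hist = [[0] * bins for _ in range(bins)]
    for pixel in patch:
        i, j = rg_bin(pixel, bins)
        hist[i][j] += 1
    return hist


def backproject(image, hist):
    """Replace every pixel by the histogram count of its (r, g) bin."""
    bins = len(hist)
    result = []
    for row in image:
        out_row = []
        for pixel in row:
            i, j = rg_bin(pixel, bins)
            out_row.append(hist[i][j])
        result.append(out_row)
    return result


def extract(image, hist):
    """Keep a pixel if its color appeared at least once in the sample."""
    return [[count > 0 for count in row] for row in backproject(image, hist)]
```

Fewer bins make the method more forgiving of nearby shades, at the cost of letting in more unrelated colors.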

It was able to get the rough area of both jackets, but the extraction using the parametric method was much better. Notice that Bea's jacket (from which spot was cropped) was detected much more densely than her sister's jacket. The detection could be improved by cropping a larger area next time.
Comparison

In terms of speed, histogram backprojection should run faster, but it was actually slower in my code because of the indexing. In terms of segmentation success, backprojection depends heavily on the representative sample you get: if a nearby shade does not appear in the sample, it will not be detected. On the other hand, overdetection was an initial problem with the parametric (probability-based) method.

Histogram backprojection would work better if the selected representative cropped image is non-uniform or has two colors, since the parametric method just takes the average and basically restricts the colors to a symmetrical blob in the NCS.

Fun stuff to do

I'm looking forward to using similar code to keep track of my nieces in a video!

-----
The code used for the 2D histogram was created by Ma'am Jing. I thank Rommel Bartolome for a discussion that reaffirmed my understanding of what to do.

I'm giving myself a grade of 9/10 for being able to do both methods. The -1 point is because the histogram backprojection was not entirely successful.

**Figure 3 comes from the Activity 11 manual prepared by Dr. Maricor Soriano.

Monday, September 3, 2012

Act 10. Feeling like a robot? Morphological operations 3/3: Looping through images

Our objective for this week is to be able to perform a simple task - specifically tagging objects with irregular features - and automating it for multiple pictures.

The given simple task is to isolate the bigger circles from this picture. This is similar to the detection of possible cancer cells in a sample of normal cells.

Punched out circles with a few larger circles mixed in.
The first step to figuring out which ones are irregular is to create a standard for regular cells using this picture which has no "cancer cells".

Calibration picture

To make it easier to process, we first transform it to a black and white image. Looking at its histogram, it's easy to choose 0.85 to separate the blacks and whites.
(left) histogram (right) black and white image
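As a minimal Python sketch of this step (not the Scilab code I ran), thresholding a grayscale image stored as nested lists of values in [0, 1] looks like this. I'm assuming here that pixels brighter than the threshold are the ones to keep as white.

```python
def binarize(gray, threshold=0.85):
    """Return a mask that is True where a pixel is brighter than the threshold."""
    return [[value > threshold for value in row] for row in gray]


print(binarize([[0.2, 0.9], [0.85, 1.0]]))
```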
Using the same command as in the singing Scilab activity,
[L, n] = bwlabel(image);
and by imshow-ing L, we can visually see which dots are connected (they have exactly the same color).
Original picture with each blob tagged with a different integer
Connected blobs present a problem, as the only difference between "cancer cells" and "normal cells" is their area. Looking at the histogram of blob sizes, 

(code for blob size)

Blob sizes histogram
A lot of particles have blob sizes of roughly 500 pixels. Treating each blob as a circle of area pi*r^2, we can hypothesize that the circles have radii of around 12 pixels.
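The arithmetic behind that estimate can be checked with a short Python sketch. The helper names are hypothetical, and the labeled image is assumed to be a nested list of integers with 0 as the background, like the L returned by bwlabel.

```python
import math
from collections import Counter


def blob_areas(labels):
    """Count the pixels belonging to each nonzero label."""
    counts = Counter(value for row in labels for value in row if value != 0)
    return dict(counts)


def radius_from_area(area):
    """Radius of a circle with the given area, from area = pi * r**2."""
    return math.sqrt(area / math.pi)


print(blob_areas([[0, 1, 1], [2, 0, 1]]))
print(round(radius_from_area(500), 1))
```

An area of 500 pixels gives a radius between 12 and 13 pixels, which is where the "around 12" comes from.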

Going back to the cancerous image, I eroded it with a circle of radius 10... and the cancerous cells were successfully identified!

(left) eroded image (right) tagged cells
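For anyone curious what erosion with a circular structuring element does under the hood, here is a brute-force Python sketch. It is not the Scilab call I used, and treating everything outside the image as background is my assumption.

```python
def disk_offsets(radius):
    """Offsets (dr, dc) of all pixels inside a disk of the given radius."""
    return [(dr, dc)
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)
            if dr * dr + dc * dc <= radius * radius]


def erode(mask, radius):
    """Binary erosion: a pixel survives only if the whole disk around it is set."""
    rows, cols = len(mask), len(mask[0])
    offsets = disk_offsets(radius)
    eroded = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Pixels outside the image count as background (assumption).
            eroded[r][c] = all(
                0 <= r + dr < rows and 0 <= c + dc < cols and mask[r + dr][c + dc]
                for dr, dc in offsets
            )
    return eroded
```

Every blob shrinks by about the radius of the disk, so the larger circles keep noticeably more pixels than the regular ones. The nested loops also hint at why my laptop complained: the work grows with the image size times the area of the disk.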

Yeah, so I'd consider that a success. :D The advantage of the method I employed is that it would be able to identify the big circles even if they touch the smaller circles.

The only bad thing was, it took a couple of seconds of my laptop sounding like a fan to produce this result. The load on the machine can be reduced by processing smaller segments at a time.

I automated the cutting of the original image.
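A rough Python sketch of the cutting, assuming the image is a nested list and using a hypothetical helper split_into_tiles (the tile size is up to you; the values below are only for the demo):

```python
def split_into_tiles(image, tile_rows, tile_cols):
    """Cut an image into tiles, row by row; edge tiles may be smaller."""
    tiles = []
    for top in range(0, len(image), tile_rows):
        for left in range(0, len(image[0]), tile_cols):
            tile = [row[left:left + tile_cols]
                    for row in image[top:top + tile_rows]]
            tiles.append(tile)
    return tiles


# Demo on a 4 x 6 "image" whose pixel values encode their position.
demo = [[r * 10 + c for c in range(6)] for r in range(4)]
demo_tiles = split_into_tiles(demo, 2, 3)
print(len(demo_tiles))
```

One thing to watch out for is that a circle sitting on a tile border gets cut in two, so its pieces would look smaller than they really are.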

Then, looping through the second element of dir(filepath of pictures), the identification was done automatically. :)
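The Python analogue of that loop could look like the sketch below. The function process_folder and the handler argument are hypothetical names; the handler stands in for whatever identification routine is run on each cut-up picture, and I'm assuming the pieces are saved as .png files.

```python
from pathlib import Path


def process_folder(folder, handler, pattern="*.png"):
    """Apply `handler` to every matching file in `folder`, in sorted order."""
    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        results[path.name] = handler(path)
    return results
```

Usage would be something like process_folder("pictures", identify), where identify is a function that loads one picture and returns the tagged cells.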

I'd give myself a grade of 9 for not working entirely on the cut-up images.