Can small pixel CMOS image sensors be useful in Machine Vision?

Posted by Gretchen Alper on Wed, Oct 5, 2011

Why is it that the tiny cameras in consumer electronic products such as smart phones, which have more than 5 megapixels and cost next to nothing, are not used for machine vision?

Image sensors with larger pixels (greater than 5.5 um) allow the best accuracy in terms of full well capacity and read noise, but they are also the most expensive because of their large sensor sizes (silicon real estate consumed) and the more expensive optics they require. Larger pixels are still used in the industrial and scientific market, but the trend in other markets has been towards much smaller pixel sizes.  This is especially so with CMOS image sensors.  These new image sensors are enabling better cameras for our smart phones and web cameras, with pixel sizes down to 1.4 um and extremely low cost.  For machine vision, CMOS sensors with smaller pixels (even 2 to 3 um) may not be acceptable, especially in high-end inspection applications such as semiconductor inspection, flat panel display inspection, or electronic metrology. Smaller pixels should either reduce the cost of the camera, because the sensor and camera can be smaller, or put more pixels behind the same camera and optics, giving higher resolution.
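
The resolution side of this trade-off is simple arithmetic. As a rough sketch (the function name is made up for illustration, and square pixels with no gaps are assumed), the number of pixels that fit on an unchanged sensor area grows with the square of the pitch ratio:

```python
def resolution_gain(old_pitch_um: float, new_pitch_um: float) -> float:
    """Factor by which the pixel count grows when the pixel pitch shrinks
    on an unchanged sensor area (assumes square pixels and no gaps)."""
    if old_pitch_um <= 0 or new_pitch_um <= 0:
        raise ValueError("pixel pitch must be positive")
    return (old_pitch_um / new_pitch_um) ** 2


if __name__ == "__main__":
    # Compare the 5.5 um pixel mentioned above with two smaller pitches.
    for new_pitch in (2.75, 1.4):
        gain = resolution_gain(5.5, new_pitch)
        print(f"5.5 um -> {new_pitch} um: about {gain:.1f}x the pixels")
```

Halving the pitch gives four times the pixels on the same silicon, which is why small pixels look so attractive on paper.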

These benefits are all appealing for machine vision too, so what is given up?  Our conclusion, based on a thorough analysis, is that with pixels smaller than 4.5 um too much functionality and performance is sacrificed for many machine vision applications.  Here are some more details to help you decide for yourself.

On the Function Level – give up global shutter, get slower frame rates and more noise

CMOS pixels are complex, containing a lot of transistors that enable sophisticated functionality such as global shutter, integrating and reading out at the same time, and correlated double sampling.  For example, a high-performance CMOS global shutter sensor with a pixel size of 5.5 um x 5.5 um can have up to 8 transistors per pixel in a 180 nm fab. To make the pixels smaller, either fewer transistors can be used, which translates to rolling shutter, slower frame speeds, and more noise, or a fab with a smaller technology node can be used.

Intermezzo: Where are image sensors fabricated?

Right now the most advanced CMOS image sensors are still being fabricated at 180 nm fabs, which, compared to microchips, is old technology. Because of the fab's design rules, only a limited number of transistors can be packed into the pixel design. Despite the minefield of pixel patents over which the image sensor companies are competing, they all use the same fabs (most use Tower-Jazz in Israel) unless they are big enough to have their own fab (like Sony and Kodak). These fab design rules limit the image sensor designers. They would love to switch to fabs with smaller technology nodes such as 130 nm or 90 nm in order to design “better” pixels. The lack of volume in the machine vision market is preventing this.

On the Performance Level – reduced full well capacity, lower MTF, electrical and optical crosstalk, and poor color reproduction

On a performance level, the sensitivity of smaller pixels is still acceptable.  Because the pixel area is smaller, there are fewer electrons per pixel, but the read noise also goes down so the signal to noise ratio is still maintained. In other words, the sensitivity of the image sensor does not scale down with the pixel size.

The real sacrifice is with full well capacity.  In smaller pixels, there is much less space to store charge.  This is not a big issue with viewing applications such as web cams, cell phones, and point-and-shoot cameras.  In machine vision, however, frames are often frozen for analysis or measurement.  Low full well can result in a noisier image, which directly decreases the accuracy of the measurement. Full well capacity scales with the area of the pixel, so it goes down quadratically as the pixel pitch shrinks.
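
To get a feel for the numbers, here is a minimal sketch of that area scaling. The reference full well of 20,000 electrons for a 5.5 um pixel is a hypothetical placeholder, not a measured value for any sensor, and the function names are invented for this example. The peak signal-to-noise ratio uses the standard shot-noise limit (the square root of the number of collected electrons), which is an assumption added here rather than something stated above:

```python
import math

REF_PITCH_UM = 5.5        # reference pixel pitch
REF_FULL_WELL_E = 20_000  # hypothetical placeholder, not a datasheet value


def scaled_full_well(pitch_um: float,
                     ref_pitch_um: float = REF_PITCH_UM,
                     ref_full_well_e: float = REF_FULL_WELL_E) -> float:
    """Full well capacity in electrons, assuming it scales with pixel area."""
    return ref_full_well_e * (pitch_um / ref_pitch_um) ** 2


def peak_snr_db(full_well_e: float) -> float:
    """Shot-noise-limited SNR at saturation: SNR = sqrt(N), expressed in dB."""
    return 20 * math.log10(math.sqrt(full_well_e))


if __name__ == "__main__":
    for pitch in (5.5, 2.75, 1.4):
        fw = scaled_full_well(pitch)
        print(f"{pitch} um: ~{fw:.0f} e- full well, "
              f"peak SNR ~{peak_snr_db(fw):.1f} dB")
```

Under these assumptions, every halving of the pixel pitch cuts the full well by a factor of four and costs about 6 dB of peak signal-to-noise ratio, whatever the reference value is.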

Typically, sensors use micro-lenses to focus the light onto the active part of the pixel (the part without transistors).  With a 2 to 3 um pixel, only about 1 um is sensitive.  This is also referred to as the “pixel straw”.  It is very difficult to make micro-lenses (elevated above the active region) that are good enough.  The quantum efficiency depends on the angle of the light.  If the light comes straight in, the micro-lenses will function well enough.  If the light comes in at an angle, most micro-lenses are not as effective (Figure 1).


Figure 1.  The “Pixel Straw” Source:  Matt Whitcombe, Omni Vision, March 26, 2009

Incoming light can end up in the wrong pixel (optical crosstalk), and the charge it generates can end up in the wrong photodiode (electrical crosstalk).  Both result in a worse modulation transfer function, or MTF, and therefore a less sharp image.


Figure 2.  Optical Crosstalk on a Bayer Patterned Sensor

Source:  DxO Labs presentation at Image Sensors Europe 2011

Crosstalk also ruins color reproduction when, say, some red light goes into a green pixel (shown in Figure 2).  This can be seen when analyzing an image from a commonly used cell phone camera (shown in Figure 3).  Looking at the whole sensor together with its optics, the pixels in the middle get better illumination than those at the edges, because light reaches the edges at a steeper angle.  This results in shading on the sides, which is hard to correct for and especially damaging to color reproduction.  The effect is also present in Figure 3: the white paper does not appear white and there is shading at the edges.


Figure 3.  Color Image from a Cell Phone Camera

Source:  DxO Labs presentation at Image Sensors Europe 2011

Again, the lower MTF and poor color reproduction may reduce the details in the image beyond what is required for demanding inspection and metrology applications.
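
The effect of crosstalk on color can be illustrated with a toy model. This is a deliberately simplified sketch, not the analysis behind the figures above: it assumes every color channel leaks the same fraction of its signal into each of the other two, and both the function name and the 10% leak value are hypothetical.

```python
def apply_crosstalk(rgb, leak):
    """Mix color channels with a symmetric toy crosstalk model.

    rgb  -- (red, green, blue) signal levels before crosstalk
    leak -- fraction of each channel's signal that ends up in EACH of the
            other two channels (hypothetical; real sensors are not symmetric)
    """
    if not 0 <= leak <= 0.5:
        raise ValueError("leak must be between 0 and 0.5")
    r, g, b = rgb
    keep = 1 - 2 * leak  # share of the signal that stays in its own channel
    return (
        keep * r + leak * (g + b),
        keep * g + leak * (r + b),
        keep * b + leak * (r + g),
    )


if __name__ == "__main__":
    pure_red = (1.0, 0.0, 0.0)
    print(apply_crosstalk(pure_red, leak=0.10))
```

Even in this simple model, a pure red patch picks up green and blue signal and loses some of its own, so saturated colors wash out. The total signal is unchanged, which is why the problem shows up as wrong colors rather than as a darker image.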

Back-side illumination (BSI) technology and very sophisticated and costly process development could be used to eliminate the electrical and optical crosstalk.  This is only available in the very large consumer market or the very expensive scientific market.  This will come to the machine vision market eventually, but for now it is not an option and still the domain of researchers.

So to summarize, small pixels (2-3 um) offer the following pros and cons (based on a comparison to larger pixels in the same process):

Pros:

  • Lower costs through smaller sensors, optics, and cameras
  • Increased resolution with same sensor size and optics
  • Sensitivity in dim light is acceptable
  • Color reproduction is good enough for viewing applications

Cons:

  • Give up global shutter (get rolling shutter)
  • Slower frame rates
  • More noise
  • Lower full well capacity
  • Lower MTF
  • Optical/Electrical crosstalk
  • Poor color reproduction

In conclusion: There is a trend towards using smaller pixels for machine vision applications, leveraging technology developed for high-volume consumer camera applications. However, the requirements for machine vision are very different, and it will take some time before small image sensor pixels have high enough performance to make it into machine vision cameras.  To quote Vladimir Koifman from the Image Sensors World blog, “All in all, it's a matter of investment. There is a huge investment in small pixel development vs relatively low investment in large pixels.”

Topics: Sensor Technology, CCD vs. CMOS
