Document detection using EmguCV
In this blog post I would like to go through a simple technique that could be used for detecting documents in an image. We will be using EmguCV for the image processing part. It is a .NET wrapper for OpenCV.
Prerequisites
.NET Framework
EmguCV (I am using version 3.1)
Camera (you can use an external camera or even develop on mobile devices using Xamarin) or a static image of a document or a sheet of paper
Intro
If you have ever wondered how applications such as Office Lens or Scanbot work, then you are in a good place to learn the basics!
There are multiple ways to approach document detection. In this blog post I will focus on document detection using the Canny Edge Detector, which was developed by an Australian computer scientist, John F. Canny. Another possible approach is the Hough transform, which is nicely explained in Dropbox's blog post Fast and Accurate Document Detection for Scanning. They also use machine learning, so if you are interested in machine learning, give the Dropbox blog post a try.
Overview
As I mentioned, we will use Canny Edge Detection to find a document's contours in an image. Canny Edge Detection extracts or "highlights" important structural information from objects in the image. In the following images you can see a raw image and the same image with Canny Edge Detection applied. We want to achieve something similar for our document detector.
There are 3 required steps for Canny Edge Detection to work correctly and smoothly:
Convert image to Grayscale
Apply Gaussian Blur
Apply Canny algorithm
After applying Canny Edge Detection to the image, we will find the contour that is most likely to be the document's contour and highlight it.
Detection process
In the following steps I will show you how to implement the detection in C#.
Converting image to Grayscale
The Canny algorithm requires a Grayscale input, so we have to start by converting our image to Grayscale. First of all, we have to load our image from a file.
var image = new Image<Bgr, byte>("C:/Projects/DocumentDetection/document.jpg");
Now there are two ways to convert a Bgr image into a Gray image. You can either use
var grayScaleImage = image.Convert<Gray, byte>();
or
using (var grayScaleImage = new UMat())
    CvInvoke.CvtColor(image, grayScaleImage, typeof(Bgr), typeof(Gray));
So we will end up with something like
using (var image = new Image<Bgr, byte>("C:/Projects/DocumentDetection/document.jpg"))
{
    var grayScaleImage = image.Convert<Gray, byte>();
}
which loads an image from "C:/Projects/DocumentDetection/document.jpg" and converts it into Grayscale.
Applying GaussianBlur
After the conversion into Grayscale, we have to apply a Gaussian Blur to smooth the image and remove noise that would make the edge detection worse.
Similar to the color conversion, there are two ways to blur the image. The first is Image.SmoothGaussian(int kernelWidth, int kernelHeight, double sigma1, double sigma2), where kernelWidth and kernelHeight are the width and the height of the Gaussian kernel and sigma1 and sigma2 are its standard deviations. I found that the values in SmoothGaussian(5, 5, 0, 0) work quite well for learning purposes:
using (var image = new Image<Bgr, byte>("C:/Projects/DocumentDetection/document.jpg"))
using (var grayScaleImage = image.Convert<Gray, byte>())
{
    var blurredImage = grayScaleImage.SmoothGaussian(5, 5, 0, 0);
}
or we can use CvInvoke.GaussianBlur(IInputArray src, IOutputArray dst, Size ksize, double sigmaX) with the same values
using (var image = new Image<Bgr, byte>("C:/Projects/DocumentDetection/document.jpg"))
using (var grayScaleImage = image.Convert<Gray, byte>())
CvInvoke.GaussianBlur(grayScaleImage, grayScaleImage, new Size(5,5), 0);
Now we have a Blurred and Grayscale image.
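To make the kernel size and sigma parameters more concrete, here is a minimal sketch in plain C# (no EmguCV needed) of how a normalized 1D Gaussian kernel can be built. The class name GaussianKernelSketch and the method Build are hypothetical and exist only for this illustration. The rule used for a non-positive sigma is the one given in the OpenCV getGaussianKernel documentation; the library itself may use precomputed tables for small kernels, so treat the numbers as approximate.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of EmguCV.
public static class GaussianKernelSketch
{
    // Builds a normalized 1D Gaussian kernel of odd length 'size'.
    public static double[] Build(int size, double sigma)
    {
        if (size <= 0 || size % 2 == 0)
            throw new ArgumentException("Kernel size must be positive and odd.", nameof(size));

        // Rule from the OpenCV getGaussianKernel documentation:
        // a non-positive sigma is derived from the kernel size.
        if (sigma <= 0)
            sigma = 0.3 * ((size - 1) * 0.5 - 1) + 0.8;

        var center = (size - 1) / 2.0;
        var kernel = new double[size];
        for (var i = 0; i < size; i++)
        {
            var distance = i - center;
            kernel[i] = Math.Exp(-(distance * distance) / (2 * sigma * sigma));
        }

        // Normalize so that the weights sum to 1 and the blur keeps overall brightness.
        var sum = kernel.Sum();
        return kernel.Select(weight => weight / sum).ToArray();
    }
}
```

Calling GaussianKernelSketch.Build(5, 0) corresponds to the 5 and 0 values used above: the sigma becomes 1.1 and the middle weight is the largest. A 2D Gaussian kernel is the product of a horizontal and a vertical 1D kernel, which is why width and height (and the two sigmas) can be given separately.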
Applying Canny algorithm
The Canny algorithm is the last algorithm we will use that modifies the image. We will use CvInvoke.Canny(IInputArray image, IOutputArray edges, double threshold1, double threshold2). The two thresholds are used for the hysteresis procedure. You can read more about that at Feature Detection - Canny.
using (var image = new Image<Bgr, byte>("C:/Projects/DocumentDetection/document.jpg"))
using (var grayScaleImage = image.Convert<Gray, byte>())
using (var blurredImage = grayScaleImage.SmoothGaussian(5, 5, 0, 0))
using (var cannyImage = new UMat())
CvInvoke.Canny(blurredImage, cannyImage, 50, 150);
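The hysteresis step can be sketched in plain C# to show what the two thresholds do. This is a simplified illustration, not how EmguCV or OpenCV implement it: the real Canny algorithm also computes gradients and thins edges with non-maximum suppression first. The class HysteresisSketch and its Apply method are hypothetical names. The idea is that pixels above the high threshold are always kept, pixels at or below the low threshold are always dropped, and pixels in between are kept only when they are connected to a kept pixel.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of EmguCV.
public static class HysteresisSketch
{
    // Returns a mask that is true where a pixel is kept as an edge.
    // 'magnitude' holds gradient magnitudes indexed as [row, column].
    public static bool[,] Apply(double[,] magnitude, double low, double high)
    {
        var rows = magnitude.GetLength(0);
        var cols = magnitude.GetLength(1);
        var edges = new bool[rows, cols];
        var pending = new Queue<int>();

        // Seed the search with strong pixels (magnitude above the high threshold).
        for (var r = 0; r < rows; r++)
        {
            for (var c = 0; c < cols; c++)
            {
                if (magnitude[r, c] > high)
                {
                    edges[r, c] = true;
                    pending.Enqueue(r * cols + c);
                }
            }
        }

        // Grow from kept pixels into 8-connected weak pixels (above the low threshold).
        while (pending.Count > 0)
        {
            var index = pending.Dequeue();
            var row = index / cols;
            var col = index % cols;
            for (var dr = -1; dr <= 1; dr++)
            {
                for (var dc = -1; dc <= 1; dc++)
                {
                    var nr = row + dr;
                    var nc = col + dc;
                    if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                    if (edges[nr, nc] || magnitude[nr, nc] <= low) continue;
                    edges[nr, nc] = true;
                    pending.Enqueue(nr * cols + nc);
                }
            }
        }

        return edges;
    }
}
```

With the values 50 and 150 from the call above, a weak pixel of magnitude 100 survives only if it touches a chain that leads to a pixel above 150. Lowering the thresholds keeps more faint edges but also more noise, so the right values depend on your images.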
So now we have a Canny image in which we can look for contours.
Finding largest contours
Currently we have the following code
using (var image = new Image<Bgr, byte>("C:/Projects/DocumentDetection/document.jpg"))
using (var grayScaleImage = image.Convert<Gray, byte>())
using (var blurredImage = grayScaleImage.SmoothGaussian(5, 5, 0, 0))
using (var cannyImage = new UMat())
CvInvoke.Canny(blurredImage, cannyImage, 50, 150);
In cannyImage we have the original image after applying the algorithms mentioned above. In this image we have to find the contours and return only those that are likely to be the document's contours. To find contours we will use CvInvoke.FindContours(IInputOutputArray image, IOutputArray contours, IOutputArray hierarchy, RetrType mode, ChainApproxMethod method).
Our code will look like this
using (var image = new Image<Bgr, byte>("C:/Projects/DocumentDetection/document.jpg"))
using (var grayScaleImage = image.Convert<Gray, byte>())
using (var blurredImage = grayScaleImage.SmoothGaussian(5, 5, 0, 0))
using (var cannyImage = new UMat())
{
    CvInvoke.Canny(blurredImage, cannyImage, 50, 150);
    using (var contours = new VectorOfVectorOfPoint())
        CvInvoke.FindContours(cannyImage, contours, null, RetrType.Tree, ChainApproxMethod.ChainApproxSimple);
}
Now, in the contours variable, we have all the contours found in the image. You can implement a method that keeps only the contours with at least some minimum area and returns the 5 with the largest areas. I will call the method RetrieveTopContours(...). How you implement it depends on you and on your images.
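One possible shape for such a method is sketched below. It is only an assumption about what RetrieveTopContours could look like: the parameter names minArea and count, their default values, and the wrapping class ContourSelection are hypothetical, and the area threshold in particular needs tuning for your images. The sketch uses CvInvoke.ContourArea to measure each contour and copies the points so the results remain valid after the original VectorOfVectorOfPoint is disposed.

```csharp
using System.Collections.Generic;
using System.Drawing;
using System.Linq;
using Emgu.CV;
using Emgu.CV.Util;

// Hypothetical helper for illustration only.
public static class ContourSelection
{
    // Returns up to 'count' contours with an area of at least 'minArea',
    // ordered from the largest area to the smallest.
    public static VectorOfPoint[] RetrieveTopContours(
        VectorOfVectorOfPoint contours, double minArea = 1000, int count = 5)
    {
        var candidates = new List<KeyValuePair<double, Point[]>>();

        for (var i = 0; i < contours.Size; i++)
        {
            using (var contour = contours[i])
            {
                var area = CvInvoke.ContourArea(contour);
                if (area >= minArea)
                {
                    // Copy the points so the result does not depend on 'contours'.
                    candidates.Add(new KeyValuePair<double, Point[]>(area, contour.ToArray()));
                }
            }
        }

        return candidates
            .OrderByDescending(candidate => candidate.Key)
            .Take(count)
            .Select(candidate => new VectorOfPoint(candidate.Value))
            .ToArray();
    }
}
```

The caller owns the returned VectorOfPoint instances and should dispose them when they are no longer needed. Sorting by area first means the later checks run on the most promising candidates before the small, noisy ones.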
Finding the most probable contour - document's contour
Now we have the top 5 contours with the largest areas. From these contours we have to select the one that is most likely to be the document's contour.
The following code will do this:
1. For each contour in contours (VectorOfPoint[])
2. Calculate the contour's perimeter
3. Approximate a polygonal curve with the specified precision
4. If the contour exists AND the contour has 4 corners AND the contour is convex, then return the contour
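foreach (var contourVector in contours)
{
    // Not wrapped in 'using': a matching contour is returned to the caller, who then owns it.
    var contour = new VectorOfPoint();
    var peri = CvInvoke.ArcLength(contourVector, true);
    CvInvoke.ApproxPolyDP(contourVector, contour, 0.1 * peri, true);
    if (contour.Size == 4 && CvInvoke.IsContourConvex(contour))
        return contour;
    // Not a convex quadrilateral, so release it and try the next candidate.
    contour.Dispose();
}
// No candidate looked like a document.
return null;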
Highlighting contour
You can draw the contour using CvInvoke.DrawContours or using platform-specific APIs. I have used UIBezierPath in iOS.
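For a desktop test, a minimal sketch of the highlighting step could look like the following. Note that it uses CvInvoke.Polylines instead of CvInvoke.DrawContours, because Polylines accepts a single VectorOfPoint directly, while DrawContours expects a collection of contours. The class ContourHighlighter, the method Highlight, the green color, and the default thickness are all hypothetical choices for this illustration.

```csharp
using Emgu.CV;
using Emgu.CV.Structure;
using Emgu.CV.Util;

// Hypothetical helper for illustration only.
public static class ContourHighlighter
{
    // Draws the closed outline of 'contour' onto 'image' in green (B = 0, G = 255, R = 0).
    public static void Highlight(Image<Bgr, byte> image, VectorOfPoint contour, int thickness = 3)
    {
        CvInvoke.Polylines(image, contour, true, new MCvScalar(0, 255, 0), thickness);
    }
}
```

After calling ContourHighlighter.Highlight(image, contour) on the original color image, you can save or display the image to check that the outline follows the edges of the document.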