Extracting Data From Images Using OpenCV

9 June 2021


What is OpenCV?

OpenCV (Open Source Computer Vision Library) is an open-source library for building real-time computer vision applications on a variety of platforms. With it you can detect faces or objects in images and videos, and perform operations such as converting images to grayscale, applying thresholding, detecting edges, removing vertical and horizontal lines, and detecting tables.

The main features of OpenCV are:

Read and write images
Capture and save videos
Process images (filter, transform)
Perform feature detection
Detect specific objects such as faces, eyes, or cars in videos or images
Analyze videos, for example estimate motion, subtract the background, and track objects
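The grayscale conversion, thresholding, and edge detection mentioned above can be sketched in C# as follows. This is a minimal sketch, assuming the Emgu CV wrapper that this article uses later on; the class name PreprocessingSketch and the method names are hypothetical and chosen only for illustration.

```csharp
using Emgu.CV;
using Emgu.CV.Structure;

// Hypothetical helper class, not part of Emgu CV.
static class PreprocessingSketch
{
    // Converts a colour image to grayscale and then to a pure black-and-white image.
    public static Image<Gray, byte> ToBinary(Image<Bgr, byte> input, double threshold)
    {
        using (Image<Gray, byte> gray = input.Convert<Gray, byte>())
        {
            // Pixels brighter than the threshold become 255 (white); all others become 0 (black).
            return gray.ThresholdBinary(new Gray(threshold), new Gray(255));
        }
    }

    // Runs Canny edge detection on the grayscale version of the image.
    public static Image<Gray, byte> DetectEdges(Image<Bgr, byte> input, double thresh, double threshLinking)
    {
        using (Image<Gray, byte> gray = input.Convert<Gray, byte>())
        {
            return gray.Canny(thresh, threshLinking);
        }
    }
}
```

Both methods return a new image that the caller should dispose. Suitable threshold values depend on the input image, so treat any numbers you pass in as starting points to tune.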

You can download OpenCV from here.

After extracting the library to a specific path, we import the library into our project as a reference.

You can also configure it in Visual Studio and use it in C++ projects.

OpenCV is written in C++, so we cannot call it directly from C# in .NET.

For that, we have to use a wrapper.

Several wrappers are available as NuGet packages:

OpenCVSharp
EmguCV

Below is the screenshot of the EmguCV NuGet packages.

Create a new C# console application using Visual Studio.

Go to References -> Manage NuGet Packages and add the NuGet package mentioned above.
OpenCV allows you to perform various operations on images. To extract text from an image, use the Tesseract library. Download the Tesseract Library here
Install Tesseract and locate it at one of the paths below, depending on whether you installed the 64-bit or the 32-bit version.
64-bit: C:\Program Files\Tesseract-ocr
32-bit: C:\Program Files (x86)\Tesseract-ocr

Pass the path of the Tesseract executable to the function that performs the image OCR.
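Rather than hard-coding one of the two install folders, the program could check both at run time. The following is a minimal sketch under that assumption; TesseractLocator and its methods are hypothetical names, and the two folders are simply the default locations listed above.

```csharp
using System;
using System.IO;

// Hypothetical helper class for illustration only.
static class TesseractLocator
{
    // Returns the first existing tesseract.exe among the given install folders,
    // or null if none of them contains it.
    public static string FindTesseractExe(params string[] installDirs)
    {
        foreach (string dir in installDirs)
        {
            if (string.IsNullOrEmpty(dir))
            {
                continue;
            }
            string candidate = Path.Combine(dir, "tesseract.exe");
            if (File.Exists(candidate))
            {
                return candidate;
            }
        }
        return null;
    }

    // Checks the two default install locations (64-bit first, then 32-bit).
    public static string FindDefaultTesseractExe()
    {
        return FindTesseractExe(
            @"C:\Program Files\Tesseract-ocr",
            @"C:\Program Files (x86)\Tesseract-ocr");
    }
}
```

A caller would use the result of TesseractLocator.FindDefaultTesseractExe() as the executable path and report an error if it is null, for example when Tesseract was installed to a custom folder.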

Sample Code,

using Emgu.CV;
using Emgu.CV.Structure;
using System;
using System.IO;

namespace OpenCVDemo
{
    class Program
    {
        public static void Main(string[] args)
        {
            GetDataFromImage();
        }

        private static void GetDataFromImage()
        {
            string imageBKPath = @"D:\ImageData\image1BK.jpg";

            // Load the original image (replace the path with your own file).
            using (Image<Bgr, byte> inputImg = new Image<Bgr, byte>(@"D:\ImageData\image1.jpg"))
            // Convert it to grayscale. Convert<,>() is used because the Bitmap
            // property is not available in some newer Emgu CV releases.
            using (Image<Gray, byte> result = inputImg.Convert<Gray, byte>())
            {
                // Save the grayscale copy under a different name.
                result.Save(imageBKPath);
            }

            // Run OCR on the new image; the recognized text is written to txtFilePath.
            string tesseractExePath = @"C:\Program Files (x86)\Tesseract-ocr\tesseract.exe";
            string txtFilePath = @"D:\ImageData\readFile.txt";
            if (!Process(imageBKPath, txtFilePath, tesseractExePath))
            {
                Console.Error.WriteLine("OCR failed.");
                return;
            }

            // Print the contents of the text file.
            foreach (string line in File.ReadLines(txtFilePath))
            {
                Console.WriteLine(line);
            }
        }

Tesseract command,

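        public static bool Process(string filePath, string ocrFilePath, string tesseractExePath)
        {
            try
            {
                // Tesseract appends ".txt" itself, so pass the output path without its extension.
                string outputBase = Path.ChangeExtension(ocrFilePath, null);
                // Quote both paths so that spaces do not split them into separate arguments.
                // Tesseract 4 and later expect "--psm"; older 3.x builds used "-psm".
                // The leading space keeps the arguments apart from the program name
                // if the caller concatenates the two.
                string arguments = " \"" + filePath + "\" \"" + outputBase + "\" -l eng --psm 3";
                ExecuteCommandSync(tesseractExePath, arguments);
                return true;
            }
            catch (Exception)
            {
                // Any failure while running Tesseract is reported as "false".
                return false;
            }
        }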

        private static void ExecuteCommandSync(string programPath, string command)
        {
            // Start tesseract.exe directly; going through "cmd /c" is not needed.
            var procStartInfo = new System.Diagnostics.ProcessStartInfo(programPath, command)
            {
                // Only stderr is redirected, and it is read to the end below. Redirecting a
                // stream without reading it can block the child once the pipe buffer fills up.
                RedirectStandardError = true,
                UseShellExecute = false,
                CreateNoWindow = true,
                WorkingDirectory = Path.GetDirectoryName(programPath)
            };

            using (System.Diagnostics.Process proc = System.Diagnostics.Process.Start(procStartInfo))
            {
                string errors = proc.StandardError.ReadToEnd();
                proc.WaitForExit();
                if (proc.ExitCode != 0)
                {
                    throw new InvalidOperationException(
                        "Tesseract exited with code " + proc.ExitCode + ": " + errors);
                }
            }
        }
    }
}
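The sample above writes the recognized text to a file and then reads it back. As an alternative, the Tesseract command line accepts the special output base name stdout, which prints the text to standard output instead. The following is a minimal sketch of that variation, not the approach used in the sample; TesseractStdout, BuildArguments, and RunOcr are hypothetical names.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper class for illustration only.
static class TesseractStdout
{
    // Builds the argument string: quoted image path, "stdout" as output base, English language.
    public static string BuildArguments(string imagePath)
    {
        return "\"" + imagePath + "\" stdout -l eng";
    }

    // Runs Tesseract and returns the recognized text without creating a text file.
    public static string RunOcr(string tesseractExePath, string imagePath)
    {
        var startInfo = new ProcessStartInfo(tesseractExePath, BuildArguments(imagePath))
        {
            RedirectStandardOutput = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process proc = Process.Start(startInfo))
        {
            // Read all output before waiting so the child cannot block on a full pipe.
            string text = proc.StandardOutput.ReadToEnd();
            proc.WaitForExit();
            if (proc.ExitCode != 0)
            {
                throw new InvalidOperationException("Tesseract exited with code " + proc.ExitCode);
            }
            return text;
        }
    }
}
```

With this variant, a call such as TesseractStdout.RunOcr(tesseractExePath, imageBKPath) would replace both the Process call and the file-reading loop, at the cost of not keeping the text file on disk.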

Input Image,