Image & Signal Processing

Automated data acquisition from the medical paper records using Optical Character Recognition (OCR) for COVID-19 patients

  • From September 2020

  • Project Status: On-Going

H-E-Y P-E-O-P-L-E! Machines have taken up reading and are now copying word by word, character by character, using the language of artificial intelligence.

Documenting a script, whether it holds empowering thoughts, creative ideas, impactful inventions, progressive plans, or preventive measures, preserves knowledge for present and future generations. For ages, these priceless pieces of information were recorded on perishable media: first on rocks and papyrus, and later on paper. Among many such scripts, medical records capture the finer details of the patient, the infection, the progression of the disease, the treatment measures, and the success or failure of the treatment. These records help doctors and researchers understand the cause of a disease (diagnosis) and predict its outcome (prognosis). Most hospitals, in both rural and urban areas, do not have access to computer facilities and continue to maintain medical records manually.

Technologists and medical professionals at Somaiya Vidyavihar University are collaborating to secure COVID-19 case files by converting manually kept records into digitized medical records. “The handwritten documents can lead to miscommunication and can prove to be fatal in case of medical records,” says Dr. Ninad Mehendale, the lead investigator of this initiative. His team of students, skilled in developing machine learning algorithms, is electronically reading and storing printed, semi-printed, and handwritten medical records through optical character recognition (OCR). The documents include patient medical histories, consultation reports, laboratory reports, and discharge or death reports.

OCR involves two significant steps: pre-processing (cleaning up the background by removing grids and lines, reducing blurriness, and increasing sharpness) and processing (recognizing each character and reproducing it digitally). The team has digitized printed and semi-printed formats with 97% and 71% accuracy respectively; however, training the machine to interpret doctors' cursive handwriting remains a considerable challenge. “We are up for the challenge,” says Vrushabh Gada, a third-year student from the Department of Electronics, K J Somaiya College of Engineering.
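To make the pre-processing idea concrete, here is a minimal sketch in plain Python. It is not the team's pipeline, which the article does not describe in detail. The function names `binarize` and `remove_ruled_lines`, the threshold values, and the tiny sample image are all assumptions chosen for illustration. The sketch separates ink from background and then blanks any row or column that is almost entirely ink, which is one simple way a ruled or grid line might be detected.

```python
def binarize(gray, threshold=128):
    """Convert a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background.

    The fixed threshold is an illustrative assumption; real scans usually
    need an adaptive threshold.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def remove_ruled_lines(binary, min_fraction=0.9):
    """Blank rows and columns whose ink fraction reaches min_fraction.

    A row or column that is almost entirely ink is treated as a ruled or
    grid line. This is a crude heuristic: it also erases any character
    pixels that happen to sit on the removed line.
    """
    height, width = len(binary), len(binary[0])
    line_rows = {y for y, row in enumerate(binary)
                 if sum(row) >= min_fraction * width}
    line_cols = {x for x in range(width)
                 if sum(binary[y][x] for y in range(height)) >= min_fraction * height}
    return [[0 if (y in line_rows or x in line_cols) else px
             for x, px in enumerate(row)]
            for y, row in enumerate(binary)]


# Hypothetical 5x7 scan: an "H"-like stroke written above a ruled line.
sample = [
    [255, 255, 255, 255, 255, 255, 255],
    [255,  40, 255, 255,  40, 255, 255],
    [255,  40,  40,  40,  40, 255, 255],
    [255,  40, 255, 255,  40, 255, 255],
    [ 30,  30,  30,  30,  30,  30,  30],  # the ruled line
]

cleaned = remove_ruled_lines(binarize(sample))
for row in cleaned:
    print("".join("#" if px else "." for px in row))
```

In this sketch the bottom row is recognized as a ruled line and cleared, while the handwritten stroke above it is left untouched. The cleaned image would then be handed to the processing step, where each character is recognized.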


A library of digital medical records for COVID-19 will help researchers and health agencies across the globe develop a cure for this global pandemic. Through artificial intelligence, these otherwise incomprehensible words can be made comprehensible and then stored, searched, copied, and referenced digitally.


Coordinator Research Promotion-SIRAC

Somaiya Vidyavihar University

Principal Investigator


  • Student Team

    Viraj Thakkar (Team Leader), Aditya Panchal & Aditya Vedpathak

    Python development, ML code development, verification and testing of the algorithm

  • Student Team

    Vrushabh Gada (Student team Leader), Sanika Bagwe, Jugal Chauhan

    Python Programmer

  • Gaurav Khatwani, Madhura Shegaonkar, Vartika Gupta

    Assistant Python Programmer

  • Urvi Bheda

    Deep Learning Expert

  • Durva Raikar, Purvi Harniya

    Technical Writer

  • Vruddhi Shah

    Project Manager
