Automated data acquisition from the medical paper records using Optical Character Recognition (OCR) for COVID-19 patients

H-E-Y P-E-O-P-L-E! Machines have taken up reading and are now copying word by word, character by character, using the language of artificial intelligence.

Documenting a script, whether it holds empowering thoughts, creative ideas, impactful inventions, progressive plans, or preventive measures, preserves that knowledge for present and future generations. For ages, these priceless pieces of information were recorded on perishable media: first on rocks and papyrus, later on paper. Among such scripts, medical records capture the finer details of the patient, the infection, the progression of the disease, the treatment measures, and the success or failure of the treatment. These records help doctors and researchers understand the cause of a disease (diagnosis) and predict its outcome (prognosis). Most hospitals, in both rural and urban areas, do not have access to computer facilities and continue to maintain medical records manually.

Technologists and medical professionals at Somaiya Vidyavihar University are collaborating to secure COVID-19 case files by converting manually maintained records into digitized medical records. “The handwritten documents can lead to miscommunication and can prove to be fatal in case of medical records,” says Dr. Ninad Mehendale, the lead investigator of this initiative. His team of students, skilled in developing machine learning algorithms, is electronically reading and storing printed, semi-printed, and handwritten medical records through the optical character recognition (OCR) process. The documents include patient medical histories, consultation reports, laboratory reports, and discharge or death reports.

OCR involves two significant steps: pre-processing (clearing the background of grids, lines, and blurriness, and increasing sharpness) and processing (recognizing each character and re-scripting it digitally). The team has digitized printed and semi-printed formats with 97% and 71% accuracy, respectively; however, training the machine to interpret the cursive handwriting of doctors remains a considerable challenge. “We are up for the challenge,” says Vrushabh Gada, a third-year student from the Department of Electronics, K J Somaiya College of Engineering.
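To make the two steps more concrete, here is a minimal, illustrative sketch in plain Python. It is not the team's code: the function names (`binarize`, `remove_ruled_lines`, `character_accuracy`), the fixed threshold, and the 80% fill rule are all assumptions chosen for illustration. The article also does not say how the 97% and 71% figures were measured; an edit-distance-based character accuracy is shown here only as one common way to score OCR output.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels: 0 = black, 255 = white


def binarize(image: Image, threshold: int = 128) -> Image:
    """Map each pixel to 1 (ink) or 0 (background) with a fixed threshold."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def remove_ruled_lines(binary: Image, max_fill: float = 0.8) -> Image:
    """Blank every row or column whose ink fraction exceeds max_fill.

    Assumption: ruled grid lines fill almost a whole row or column,
    while written characters do not. This is a crude filter; it also
    erases any stroke pixels that lie on a removed line.
    """
    height, width = len(binary), len(binary[0])
    line_rows = {r for r in range(height) if sum(binary[r]) / width > max_fill}
    line_cols = {
        c for c in range(width)
        if sum(binary[r][c] for r in range(height)) / height > max_fill
    }
    return [
        [0 if r in line_rows or c in line_cols else binary[r][c]
         for c in range(width)]
        for r in range(height)
    ]


def character_accuracy(expected: str, recognized: str) -> float:
    """Return 1 - (Levenshtein distance / len(expected)), floored at 0."""
    if not expected:
        return 1.0 if not recognized else 0.0
    previous = list(range(len(recognized) + 1))
    for i, exp_ch in enumerate(expected, start=1):
        current = [i]
        for j, rec_ch in enumerate(recognized, start=1):
            cost = 0 if exp_ch == rec_ch else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return max(0.0, 1.0 - previous[-1] / len(expected))


# A tiny 5x5 "scan": one ruled row, one ruled column, two ink pixels.
page = [
    [10, 255, 255, 255, 255],
    [10, 255,  20, 255, 255],
    [10,  10,  10,  10,  10],
    [10, 255, 255,  30, 255],
    [10, 255, 255, 255, 255],
]
cleaned = remove_ruled_lines(binarize(page))
score = character_accuracy("fever", "fevar")  # one substituted character
```

In a real pipeline the cleaned image would be handed to a recognition model, and the recognized text would be compared against a manually transcribed reference to obtain a score like the one computed here.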

A library of digital medical records for COVID-19 will help researchers and health agencies across the globe develop a cure for this pandemic. Through artificial intelligence, these incomprehensible words could be made comprehensible, then stored, searched, copied, and referenced digitally.

Dr. Parvathi JR

Coordinator Research Promotion-SIRAC

Somaiya Vidyavihar University

Principal Investigator

Dr. Ninad Dileep Mehendale

ninad@somaiya.edu

Associate Professor, Department of Electronics

K J Somaiya School of Engineering

Co-Investigators

Student Team

Viraj Thakkar (Team Leader), Aditya Panchal & Aditya Vedpathak

Python development, ML code development, verification and testing of the algorithm
Student Team

Vrushabh Gada (Student team Leader), Sanika Bagwe, Jugal Chauhan

Python Programmer
Gaurav Khatwani, Madhura Shegaonkar, Vartika Gupta

Assistant Python Programmer
Urvi Bheda

Deep Learning Expert
Durva Raikar, Purvi Harniya

Technical Writer
Vruddhi Shah

Project Manager
