OCR-Based Safety Check System of Packaged Food for Food Inconvenience Patients
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 신옥근 | - |
dc.contributor.author | NURUL AZZAHRA PUTRI BINTI KAMIS | - |
dc.date.accessioned | 2020-07-22T04:17:48Z | - |
dc.date.available | 2020-07-22T04:17:48Z | - |
dc.date.issued | 2020 | - |
dc.identifier.uri | http://repository.kmou.ac.kr/handle/2014.oak/12342 | - |
dc.identifier.uri | http://kmou.dcollection.net/common/orgView/200000283919 | - |
dc.description.abstract | OCR and digital image processing technologies are developing rapidly and have many application areas in research and industry. This thesis presents a method for producing better and more reliable recognition by manipulating the output of the OCR process in domain-specific word recognition tasks. The OCR output is improved by two post-processing steps: tokenization and the extraction of correct words using dictionaries. Tokenization is the process in which the text retrieved by OCR is separated into word tokens. The tokens are then compared with an English dictionary and a proprietary dictionary, in that order. The English dictionary is used to convert the word tokens into correct word candidates, while the proprietary dictionary is used as a guide to select only the words that are meaningful in the domain-specific task. The practicality of the proposed approach was demonstrated on the task of recognizing the ingredients list printed on the packaging of packaged foods. Given an uploaded image of a packaged food, the system performs OCR to obtain editable text. The editable text is then tokenized into word tokens before the post-processing steps. The word tokens are then converted into correct words by the dictionary-based processes. The results of these combined approaches are reliable: the system returns an accurate list of ingredients, free of useless characters and nonessential ingredients. | - |
dc.description.tableofcontents | Abstract; List of Abbreviations; List of Tables; List of Figures; Chapter I: Introduction; 1.1 Background of Research; 1.2 Research Objectives; Chapter II: Literature Review; 2.1 Review of Research Topics; 2.1.1 Review of Optical Character Recognition (OCR); 2.1.2 Review of The Tesseract OCR Engine; 2.1.3 Review of OCR Post-Processing; 2.1.4 Review of Food Intolerance, Allergies and Auto-immune Diseases; 2.2 Review of Related Work; 2.2.1 Eatable; 2.2.2 Food Allergy Scanner; Chapter III: System Design; 3.1 Overall Architecture of the System; 3.2 Get Image of Product’s Ingredients; 3.2.1 Client-Server Architecture; 3.3 Perform OCR; 3.3.1 OCR and Its Pre-Processing; 3.4 Post-Processing of OCR; 3.4.1 Tokenization; 3.4.2 Extract Correct Ingredients Using Dictionaries; 3.5 Search Harmful Ingredients for The User using Database; 3.6 Notify Result to the user; Chapter IV: System Implementation; 4.1 Overall Explanation; 4.1.1 Pre-Processing of OCR; 4.1.2 Post-Processing of OCR; 4.2 Database of the System; 4.3 System Prototype using Android Studio; Chapter V: Conclusion; References; Acknowledgement | - |
dc.format.extent | 61 | - |
dc.language | eng | - |
dc.publisher | Korea Maritime and Ocean University, Graduate School | - |
dc.rights | Theses of Korea Maritime and Ocean University are protected by copyright. | - |
dc.title | OCR-Based Safety Check System of Packaged Food for Food Inconvenience Patients | - |
dc.type | Dissertation | - |
dc.date.awarded | 2020. 2 | - |
dc.contributor.department | Graduate School, Department of Computer Engineering | - |
dc.description.degree | Master | - |
dc.identifier.bibliographicCitation | NURUL AZZAHRA PUTRI BINTI KAMIS. (2020). OCR-Based Safety Check System of Packaged Food for Food Inconvenience Patients. | - |
dc.subject.keyword | OCR, Post-Processing, Proprietary Dictionary, Tokenization, Tesseract, Food allergy, packaged food, OCR post-processing | - |
dc.title.translated | 식품 불내성 환자를 위한 포장 식품의 OCR 기반 안전 확인시스템 | - |
dc.identifier.holdings | 000000001979▲200000001565▲200000283919▲ | - |
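
The two post-processing steps described in the abstract (tokenization, then dictionary lookup in sequence) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the record does not say how tokens are matched against the dictionaries, so fuzzy matching with the standard-library `difflib` is an assumption, and the word lists `ENGLISH_WORDS` and `INGREDIENT_WORDS`, the function names, and the `cutoff` value are all hypothetical.

```python
import re
from difflib import get_close_matches

# Hypothetical stand-ins for the English and proprietary dictionaries;
# the actual dictionaries used in the thesis are not given in this record.
ENGLISH_WORDS = ["ingredients", "contains", "wheat", "flour",
                 "sugar", "milk", "salt", "water", "peanut"]
INGREDIENT_WORDS = {"wheat", "flour", "sugar", "milk", "salt", "water", "peanut"}


def tokenize(ocr_text):
    """Split raw OCR text into lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", ocr_text.lower())


def correct(token, cutoff=0.7):
    """Return the closest English dictionary word, or None if nothing is close.

    Assumption: similarity is measured with difflib's ratio; the thesis may
    use a different matching method.
    """
    matches = get_close_matches(token, ENGLISH_WORDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def extract_ingredients(ocr_text):
    """Tokenize, correct each token, then keep only domain-relevant words."""
    result = []
    for token in tokenize(ocr_text):
        word = correct(token)
        # The proprietary dictionary filters out correct but irrelevant words.
        if word in INGREDIENT_WORDS and word not in result:
            result.append(word)
    return result


if __name__ == "__main__":
    noisy = "INGREDlENTS: Wheat fl0ur, suger, m1lk, sa1t |~"
    print(extract_ingredients(noisy))
    # ['wheat', 'flour', 'sugar', 'milk', 'salt']
```

In this sketch the misread heading `INGREDlENTS` is corrected to a valid English word but then dropped because it is not in the ingredient list, which mirrors the role the abstract assigns to the proprietary dictionary.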