自然场景文本检测技术研究综述

Text detection in natural scenes: a literature review

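  • Summary: Text detection is widely used in autonomous driving and cross-modal image retrieval. It is also an important preliminary step in optical character-based text recognition tasks. At present, text detection in complex scenes remains highly challenging. This paper surveys natural scene text detection, reviews the main techniques and related research progress on the problem, and analyzes the current state of research. It first gives an overview of the problem and analyzes the main characteristics of text detection in natural scenes; it then introduces the classical natural scene text detection techniques based on connected component analysis and on sliding detection windows; building on this, it surveys the deep learning text detection techniques that have been commonly used in recent years; finally, it discusses possible future research directions for natural scene text detection.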

     

    Abstract: Text detection is widely applied in autonomous driving and cross-modal image retrieval. It is also an important preprocessing step in optical character-based text recognition tasks. At present, text detection in complex natural scenes remains a challenging topic. Because the distribution and orientation of text vary across scenes and domains, existing computer vision-based text detection methods still have room for improvement. To complicate matters, natural scene texts, such as those on guideposts and shop signs, often contain words in different languages, and some are even missing characters. These circumstances make feature extraction and feature description more difficult, thereby weakening the detection performance of existing computer vision and image processing methods. In this context, this paper summarizes text detection applications in natural scenes, reviews the classical and newly presented techniques, and analyzes the research progress and status. First, natural scene text detection and its associated concepts are defined based on an analysis of the main characteristics of the problem. Next, the classical natural scene text detection techniques, such as connected component analysis-based methods and sliding detection window-based methods, are introduced comprehensively, compared, and discussed. Common deep learning models for scene text detection from the past decade are then reviewed. These models are divided into two main categories: region proposal-based models and segmentation-based models. Accordingly, the deep learning methods reviewed here build on typical detection and segmentation frameworks, including Faster R-CNN, SSD, Mask R-CNN, FCN, and FCIS. Hybrid algorithms that combine region proposal ideas with segmentation strategies are also analyzed. As a supplement, several end-to-end text recognition strategies that can automatically identify characters in natural scenes are described. Finally, possible research directions and prospects in this field are discussed.

     
