OpenVINO Scene Text Detection and Text Recognition Tutorial

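Author: Jia Zhigang, Intel Edge Computing Innovation Ambassador
01 OpenVINO Scene Text Detection
OpenVINO is Intel's framework for deploying deep learning models; at the time of writing, the latest release is OpenVINO 2023. It ships with a model zoo of pretrained models covering common vision tasks. The model zoo entry for scene text detection is text-detection-0003, a network based on the PixelLink architecture.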
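Figure 1: PixelLink network architecture
Input and output formats of the PixelLink scene text detection model in Figure 1:
Input format:
1x3x768x1280, BGR color image
Output format:
name: model/link_logits_/add, [1x16x192x320]: PixelLink link output
name: model/segm_logits/add, [1x2x192x320]: per-pixel text/no-text classification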
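The segmentation output can be turned into a binary text mask with a per-pixel softmax over its two channels. The sketch below is a minimal illustration that uses NumPy only. It assumes that channel 1 is the "text" class and uses a hypothetical threshold of 0.5; the function name is made up for this example, and the link-based grouping of pixels into boxes that a full PixelLink decoder performs is left out.

```python
import numpy as np

def text_mask_from_logits(segm_logits, threshold=0.5):
    """Convert [1, 2, H, W] text/no-text logits into a boolean [H, W] mask.

    Assumption: channel 0 is "no text" and channel 1 is "text".
    """
    logits = np.asarray(segm_logits, dtype=np.float32)
    if logits.ndim != 4 or logits.shape[0] != 1 or logits.shape[1] != 2:
        raise ValueError("expected logits with shape [1, 2, H, W]")
    # Numerically stable softmax over the channel axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    # Keep pixels whose "text" probability exceeds the threshold.
    return probs[0, 1] > threshold

# Synthetic logits with the output shape listed above: 1x2x192x320.
demo = np.zeros((1, 2, 192, 320), dtype=np.float32)
demo[0, 1, 10:20, 30:60] = 5.0  # a block of confident "text" pixels
mask = text_mask_from_logits(demo)
```

The mask has the 192x320 resolution of the network output, so any coordinates derived from it need to be scaled by 4 to map back onto the 768x1280 input.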
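02 OpenVINO Text Recognition
The OpenVINO model for text recognition (digits and English letters) is text-recognition-0012 from the model zoo. It is a typical CRNN: a VGG-like convolutional backbone followed by a bidirectional LSTM encoder-decoder head.
Figure 2: CRNN network architecture
Input and output formats of the text recognition model in Figure 2:
Input format: 1x1x32x120
Output format: 30, 1, 37
The output is decoded with CTC greedy decoding. The last dimension, 37, is the size of the character set, which is: 0123456789abcdefghijklmnopqrstuvwxyz#
# denotes the blank symbol.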
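CTC greedy decoding takes the highest-scoring class at each step, collapses consecutive repeats, and drops the blank. The following is a minimal pure-Python sketch of that idea. It assumes the output has already been squeezed from [30, 1, 37] to 30 rows of 37 scores; the function names and the one-hot test data are hypothetical.

```python
CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz#"  # '#' (index 36) is the CTC blank

def ctc_greedy_decode(step_scores, charset=CHARSET):
    """Decode per-step class scores (T rows of len(charset) values) into text."""
    blank = len(charset) - 1
    chars = []
    previous = blank
    for scores in step_scores:
        # Index of the highest-scoring class at this step.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a character only when the class changes and is not the blank.
        if best != previous and best != blank:
            chars.append(charset[best])
        previous = best
    return "".join(chars)

def one_hot(index, size=len(CHARSET)):
    """Build a synthetic score row with a single winning class."""
    row = [0.0] * size
    row[index] = 1.0
    return row

# Hypothetical 30-step best path: repeats collapse, blanks are dropped.
best_path = [CHARSET.index(c) for c in "oo#pp#e#nn##"]
best_path += [CHARSET.index("#")] * (30 - len(best_path))
decoded = ctc_greedy_decode([one_hot(i) for i in best_path])
```

A blank between two identical characters keeps both of them, which is how a doubled letter survives the collapse step.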
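03 MediaPipe Hand Gesture Recognition
Google's MediaPipe toolkit, released in 2020, integrates landmark detection and tracking algorithms for hands, body pose, and more. Hand gesture recognition is implemented with two models: one detects the palm, and the other marks the hand landmarks.
Figure 3: Hand landmark positions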
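MediaPipe reports each hand landmark with x and y normalized to the frame size, so a fingertip has to be scaled to pixel coordinates before it can be used for region selection. The sketch below shows that conversion for landmark 8, the index fingertip. The `Landmark` tuple is only a stand-in for the MediaPipe landmark object, and the helper name and the clamping to the frame bounds are assumptions made for this example.

```python
from collections import namedtuple

# Stand-in for a MediaPipe landmark, which exposes normalized x and y fields.
Landmark = namedtuple("Landmark", ["x", "y"])

INDEX_FINGER_TIP = 8  # landmark index of the index fingertip

def fingertip_pixel(landmarks, width, height, index=INDEX_FINGER_TIP):
    """Scale a normalized landmark to pixel coordinates, clamped to the frame."""
    lm = landmarks[index]
    x = min(max(int(lm.x * width), 0), width - 1)
    y = min(max(int(lm.y * height), 0), height - 1)
    return x, y

# A synthetic hand with 21 landmarks; only the index fingertip is meaningful.
hand = [Landmark(0.0, 0.0)] * 21
hand[INDEX_FINGER_TIP] = Landmark(0.25, 0.5)
tip = fingertip_pixel(hand, width=1920, height=1080)
```

With real MediaPipe results, the list passed in would be `hand_landmarks.landmark` for one detected hand.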
04 Installing OpenVINO and MediaPipe
pip install openvino==2023.0.2
pip install mediapipe
Install the opencv-python package first, as it is a required dependency.
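05 How the Application Works
OpenCV first opens a USB camera or a laptop webcam and reads video frames. Hand landmark detection runs on every frame, and from the detected landmarks the index fingertip positions of the left and right hands are taken (point 8 in Figure 3). The two fingertips define the ROI selected by the gesture. The current frame is also passed to the OpenVINO scene text recognition module, which detects and recognizes the text in the scene. Finally, the gesture-selected region is compared with each recognized text region by computing their intersection over union (IoU). For any region whose IoU exceeds the 0.5 threshold, the corresponding OCR result is returned and shown on screen. The overall flow is:
Figure 4: Program flow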
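The matching step described above can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: boxes are axis-aligned (x1, y1, x2, y2) tuples, detections are a hypothetical list of (box, text) pairs, and the helper names are made up here rather than taken from the author's text_detector module.

```python
def roi_from_fingertips(p1, p2):
    """Build an (x1, y1, x2, y2) box from two fingertip points in any order."""
    (xa, ya), (xb, yb) = p1, p2
    return min(xa, xb), min(ya, yb), max(xa, xb), max(ya, yb)

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def select_text(roi, detections, threshold=0.5):
    """Return the texts of detections whose IoU with the ROI exceeds threshold."""
    return [text for box, text in detections if iou(roi, box) > threshold]

# Two fingertips given in arbitrary order, and two hypothetical OCR results.
roi = roi_from_fingertips((300, 220), (100, 100))
detections = [
    ((110, 105, 290, 215), "hello"),   # lies inside the selected region
    ((400, 400, 500, 440), "world"),   # does not overlap the region
]
selected = select_text(roi, detections)
```

Because IoU penalizes size mismatch as well as offset, a selection much larger than the text box can fall below 0.5 even when it fully contains the text.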
06 Code Implementation
Following the flow in Figure 4, scene text detection and recognition are wrapped in a class, TextDetectAndRecognizer. The main program is as follows:

import cv2 as cv
import numpy as np
import mediapipe as mp
from text_detector import TextDetectAndRecognizer

# Character set of the recognition model; '#' is the CTC blank.
digit_nums = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
              'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
              'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '#']
mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands
x0 = 0
y0 = 0
detector = TextDetectAndRecognizer()

# For webcam input:
cap = cv.VideoCapture(0)
cap.set(cv.CAP_PROP_FRAME_HEIGHT, 1080)
cap.set(cv.CAP_PROP_FRAME_WIDTH, 1920)
height = cap.get(cv.CAP_PROP_FRAME_HEIGHT)
width = cap.get(cv.CAP_PROP_FRAME_WIDTH)
# out = cv.VideoWriter("d:/test777.mp4", cv.VideoWriter_fourcc('D', 'I', 'V', 'X'), 15, (int(width), int(height)), True)

with mp_hands.Hands(
        min_detection_confidence=0.75,
        min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        success, image = cap.read()
        if not success:
            break
        image.flags.writeable = False
        h, w, c = image.shape
        # MediaPipe expects RGB input; convert back to BGR for drawing.
        image = cv.cvtColor(image, cv.COLOR_BGR2RGB)
        results = hands.process(image)
        image = cv.cvtColor(image, cv.COLOR_RGB2BGR)

        # Index fingertips of the first and second detected hands (-1 = not found).
        x1 = -1
        y1 = -1
        x2 = -1
        y2 = -1
        if results.multi_hand_landmarks:
            for hand_landmarks in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(
                    image, hand_landmarks, mp_hands.HAND_CONNECTIONS)
                for idx, landmark in enumerate(hand_landmarks.landmark):
                    # Landmarks are normalized; scale to pixel coordinates.
                    x0 = int(landmark.x * w)
                    y0 = int(landmark.y * h)
                    cv.circle(image, (x0, y0), 4, (0, 0, 255), 4, cv.LINE_AA)
                    if idx == 8 and x1 == -1 and y1 == -1:
                        x1 = x0
                        y1 = y0
                        cv.circle(image, (x1, y1), 4, (0, 255, 0), 4, cv.LINE_AA)
                    if idx == 8 and x1 > 0 and y1 > 0:
                        x2 = x0
                        y2 = y0
                        cv.circle(image, (x2, y2), 4, (0, 255, 0), 4, cv.LINE_AA)

        # Run OCR only when the two fingertips span a region of usable size.
        if abs(x1 - x2) > 10 and abs(y1 - y2) > 10 and x1 > 0 and x2 > 0:
            if x1 < x2:
                cv.rectangle(image, (x1, y1), (x2, y2), (255, 0, 0), 2, 8)
                text = detector.inference_image(image, (x1, y1, x2, y2))
                cv.putText(image, text, (x1, y1), cv.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 255), 2)
            else:
                cv.rectangle(image, (x2, y2), (x1, y1), (255, 0, 0), 2, 8)
                text = detector.inference_image(image, (x2, y2, x1, y1))
                cv.putText(image, text, (x2, y2), cv.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 255), 2)

        # Show the annotated frame; press Esc to quit.
        cv.imshow('MediaPipe Hands', image)
        # out.write(image)
        if cv.waitKey(1) & 0xFF == 27:
            break
cap.release()
# out.release()
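07 Porting to the AIxBoard Development Board
On the AIxBoard, only MediaPipe needs to be installed. OpenVINO does not, because the board ships with OpenCV and OpenVINO. Copy the Python files to the board, plug in a USB camera, and run the script from the command line. The result is a scene text recognition application on the AIxBoard in which the region is selected by hand gesture.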
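08 Next Steps
Install the text-to-speech package:
pip install pyttsx
The AIxBoard has a 3.5 mm headphone/microphone jack and supports speech output. Passing the text recognized in the selected region to pyttsx for playback would connect gesture selection to speech output, for example reading flashcard words aloud for early English learning. We plan to implement this in a follow-up, so stay tuned.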
