GitHub: https://github.com/charlesw/tesseract
Samples: https://github.com/charlesw/tesseract-samples
Language data files (GitHub): https://github.com/tesseract-ocr/tessdata
Language data files (download): https://github.com/tesseract-ocr/tessdata/archive/refs/heads/main.zip
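Steps:
(1) Add the "Tesseract" package to the project through NuGet.
(2) Download the language data files: https://github.com/tesseract-ocr/tessdata/archive/refs/heads/main.zip
(3) Create a new folder named "tessdata" in the project.
(4) Find "chi_sim.traineddata" in the downloaded language data, copy it into the "tessdata" folder, and set its "Copy to Output Directory" property to "Copy always".
(5) Copy the sample code below into your program and point it at the image you want to recognize.
Sample code: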
// Requires "using Tesseract;". Server.MapPath and Content assume an ASP.NET MVC controller action.
// Language code: "eng" for English, "chi_sim" for Simplified Chinese.
using (var engine = new TesseractEngine(Server.MapPath(@"~/tessdata"), "chi_sim", EngineMode.Default))
{
    // Have to load Pix via a bitmap since Pix doesn't support loading a stream.
    var path = @"ocr/xx.png"; // the image to recognize
    using (var image = new System.Drawing.Bitmap(path))
    using (var pix = PixConverter.ToPix(image))
    using (var page = engine.Process(pix))
    {
        var meanConfidence = String.Format("{0:P}", page.GetMeanConfidence());
        var resultText = page.GetText();
        return Content("meanConfidence: " + meanConfidence + ", resultText: " + resultText);
    }
}
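For use outside a web project, the same flow can be sketched as a small helper class. This is only a sketch under stated assumptions: the class name OcrSketch and the helper TrainedDataPath are hypothetical, the "tessdata" folder is assumed to sit next to the executable, and Pix.LoadFromFile is used instead of going through a Bitmap. It also checks that the language data file exists before creating the engine, which gives a clearer error when the file was not copied to the output directory.

```csharp
using System;
using System.IO;
using Tesseract;

public static class OcrSketch
{
    // Hypothetical helper: builds <baseDir>/tessdata/<language>.traineddata.
    public static string TrainedDataPath(string baseDir, string language)
    {
        return Path.Combine(baseDir, "tessdata", language + ".traineddata");
    }

    // Runs OCR on one image file and returns the confidence and text.
    // Assumes the "tessdata" folder is in the application's base directory.
    public static string Recognize(string imagePath, string language)
    {
        var baseDir = AppDomain.CurrentDomain.BaseDirectory;
        var trained = TrainedDataPath(baseDir, language);
        if (!File.Exists(trained))
        {
            throw new FileNotFoundException(
                "Language data file not found; check its \"Copy to Output Directory\" setting.",
                trained);
        }

        using (var engine = new TesseractEngine(Path.Combine(baseDir, "tessdata"), language, EngineMode.Default))
        using (var pix = Pix.LoadFromFile(imagePath))
        using (var page = engine.Process(pix))
        {
            return string.Format("meanConfidence: {0:P}, resultText: {1}",
                page.GetMeanConfidence(), page.GetText());
        }
    }
}
```

A call such as Console.WriteLine(OcrSketch.Recognize(@"ocr/xx.png", "chi_sim")); would print the result for the sample image path used above.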
Note: every language data file under the "tessdata" folder must have "Copy to Output Directory" set to "Copy always".
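The same setting can also be written directly in the project file. The fragment below is a sketch that assumes a classic (non-SDK) .csproj and the file name from step (4). SDK-style projects usually include such files automatically, in which case Update is typically used in place of Include.

```xml
<ItemGroup>
  <!-- Copies the language data file to the output directory on every build -->
  <Content Include="tessdata\chi_sim.traineddata">
    <CopyToOutputDirectory>Always</CopyToOutputDirectory>
  </Content>
</ItemGroup>
```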