Tesseract OCR tutorial: how to make tesseract-ocr match only digits and uppercase letters

① How to do Japanese OCR with Tesseract

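Download the chi_sim.traineddata language data file.
Download tesseract-ocr-setup-3.02.02.exe.
Download URL: http://code.google.com/p/tesseract-ocr/downloads/list
Download jTessBoxEditor, which is used to edit box files.
Download URL: http://download.csdn.net/detail/a443475601/5896893 (it bundles a Java runtime; after installing, run java -jar jTessBoxEditor.jar from the command line to open it)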

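For convenience, name the tif file in the format [lang].[fontname].exp[num].tif
where lang is the language and fontname is the font.
For example, to train a custom language called image with a font called MyFont,
rename the tif file to image.MyFont.exp0.tif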
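
The naming convention above can be sketched as a pair of small Python helpers. The function names are hypothetical and are not part of Tesseract; they only build and check file names of the form [lang].[fontname].exp[num].tif.

```python
import re

# Pattern for the [lang].[fontname].exp[num].tif convention described above.
_NAME_RE = re.compile(r"^(?P<lang>[^.]+)\.(?P<font>[^.]+)\.exp(?P<num>\d+)\.tif$")


def training_image_name(lang, fontname, num=0):
    """Build a training image file name (hypothetical helper)."""
    return f"{lang}.{fontname}.exp{num}.tif"


def parse_training_image_name(filename):
    """Split a training image file name into (lang, fontname, num)."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a [lang].[fontname].exp[num].tif name: {filename!r}")
    return match.group("lang"), match.group("font"), int(match.group("num"))
```

For the example in the text, training_image_name("image", "MyFont", 0) gives image.MyFont.exp0.tif.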

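Now start training the language data:
1. tesseract image.MyFont.exp0.tif image.MyFont.exp0 -l chi_sim batch.nochop makebox
This step generates the file image.MyFont.exp0.box.
Put the tif file and the box file in the same directory, open the tif file with jTessBoxEditor.jar, and correct the box file where the detected characters or boxes are wrong.
2. tesseract image.MyFont.exp0.tif image.MyFont.exp0 nobatch box.train
This step generates the file image.MyFont.exp0.tr.
3. unicharset_extractor image.MyFont.exp0.box
This step generates a unicharset file.
4. Create a new file named font_properties
containing the line MyFont 0 0 0 0 0, which marks the font as a plain regular font.
5. Run the commands
shapeclustering -F font_properties -U unicharset image.MyFont.exp0.tr
mftraining -F font_properties -U unicharset -O image.unicharset image.MyFont.exp0.tr
cntraining image.MyFont.exp0.tr
6. Add the prefix image. to the five files unicharset, inttemp, pffmtable, shapetable and normproto in the directory (for example, inttemp becomes image.inttemp).
7. Run combine_tessdata image. (the trailing dot is part of the argument)
Then copy image.traineddata into the tessdata directory.
8. Recognize an image with the new language data:
tesseract test.tif output -l image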
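
The command sequence above can be generated for any language and font name. The following Python sketch only builds the argument lists and does not run anything. The function names are hypothetical. The manual steps (correcting the box file, renaming the five output files) are deliberately left out, and the five numeric columns of font_properties are assumed to be italic, bold, fixed, serif and fraktur flags.

```python
def font_properties_line(font, italic=0, bold=0, fixed=0, serif=0, fraktur=0):
    """One line of the font_properties file: name followed by five 0/1 flags."""
    return f"{font} {italic} {bold} {fixed} {serif} {fraktur}"


def training_commands(lang, font, num=0, bootstrap_lang="chi_sim"):
    """Return the training commands from the steps above as argument lists.

    Not included: editing the box file after the first command, and
    prefixing the output files with '<lang>.' before combine_tessdata.
    """
    base = f"{lang}.{font}.exp{num}"
    return [
        ["tesseract", f"{base}.tif", base, "-l", bootstrap_lang,
         "batch.nochop", "makebox"],
        ["tesseract", f"{base}.tif", base, "nobatch", "box.train"],
        ["unicharset_extractor", f"{base}.box"],
        ["shapeclustering", "-F", "font_properties", "-U", "unicharset",
         f"{base}.tr"],
        ["mftraining", "-F", "font_properties", "-U", "unicharset",
         "-O", f"{lang}.unicharset", f"{base}.tr"],
        ["cntraining", f"{base}.tr"],
        ["combine_tessdata", f"{lang}."],
    ]
```

Each list could be passed to subprocess.run(cmd, check=True) in a directory that holds the tif file, provided the Tesseract training tools are on the PATH.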

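② How to learn the text image recognition library Tesseract-ocr

Register an account on the YunMai OCR SDK developer platform and try integrating one of its OCR APIs to see whether it gives you some ideas.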

③ I want to use tesseract ocr for Chinese character recognition on Windows with VS2010. Does anyone have a good tutorial? Much appreciated.

http://blog.csdn.net/zhymax/article/details/8435303

④ How to make tesseract-ocr match only digits and uppercase letters

In the folder C:\Program Files (x86)\Tesseract-OCR\tessdata\configs, copy the file digits, name the copy yours, and edit it with Notepad++:

tessedit_char_

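If you are using 3.04, just follow the tutorial and run

tesseract C:\1.jpg C:\1 yours

and you will easily get the result you want.

But if you are using 4.00, you will find that the whitelist has no effect at all.

Strange. Is the environment misconfigured? Is there a typo somewhere? Neither.

Tesseract offers several OCR engine modes (OEM):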

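0 = Original Tesseract only
1 = Neural nets LSTM only
2 = Tesseract + LSTM
3 = Default, based on what is available

In 4.00 the LSTM engine does not support whitelists, and at the time of writing the Tesseract team did not seem inclined to change that.

So select the original Tesseract engine, i.e. --oem 0:

tesseract --oem 0 C:\1.jpg C:\1 yours

Now the whitelist works, but recognition accuracy will be lower than before.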
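
As a rough illustration, the config file and the command line above could be produced with the Python sketch below. It assumes the truncated config line refers to the tessedit_char_whitelist variable used by the stock digits config. The helper names are hypothetical, and the paths are the ones from the example.

```python
import string


def whitelist_config(chars=string.digits + string.ascii_uppercase):
    """Contents of a config file restricting output to the given characters.

    Assumption: the variable is tessedit_char_whitelist, as in the
    stock 'digits' config that the text says to copy.
    """
    return f"tessedit_char_whitelist {chars}\n"


def tesseract_command(image, output_base, config, oem=None):
    """Build the tesseract argument list, optionally forcing an engine mode."""
    cmd = ["tesseract"]
    if oem is not None:
        cmd += ["--oem", str(oem)]  # 0 selects the original (non-LSTM) engine
    cmd += [image, output_base, config]
    return cmd
```

Writing whitelist_config() to the file yours in the configs folder and running tesseract_command(r"C:\1.jpg", r"C:\1", "yours", oem=0) corresponds to the 4.00 invocation shown above.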

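⑤ How to build an Android OCR app with the open-source Tesseract OCR engine

To compile Tesseract for Android, use tesseract-android-tools, provided by Google.

Get the code:

git clone https://code.google.com/p/tesseract-android-tools/
Open the README and run the following steps in a command-line tool: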

cd <project-directory>
curl -O https://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.02.tar.gz
curl -O http://leptonica.googlecode.com/files/leptonica-1.69.tar.gz
tar -zxvf tesseract-ocr-3.02.02.tar.gz
tar -zxvf leptonica-1.69.tar.gz
rm -f tesseract-ocr-3.02.02.tar.gz
rm -f leptonica-1.69.tar.gz
mv tesseract-3.02.02 jni/com_googlecode_tesseract_android/src
mv leptonica-1.69 jni/com_googlecode_leptonica_android/src
ndk-build -j8
android update project --target 1 --path .
ant debug (or ant release)
Note: with NDK r9 the build fails with this error:

format not a string literal and no format arguments [-Werror=format-security]
The fix is to add one line to Application.mk:

APP_CFLAGS += -Wno-error=format-security
The build produces class.jar and several *.so files.

Android OCR Application

Create an Android app and import the generated jar and so files.

Create TessOCR:

import java.io.File;

import android.graphics.Bitmap;
import android.os.Environment;

import com.googlecode.tesseract.android.TessBaseAPI;

public class TessOCR {
    private TessBaseAPI mTess;

    public TessOCR() {
        mTess = new TessBaseAPI();
        // init() requires <datapath>/tessdata/ to exist, so create it first.
        String datapath = Environment.getExternalStorageDirectory() + "/tesseract/";
        String language = "eng";
        File dir = new File(datapath + "tessdata/");
        if (!dir.exists()) {
            dir.mkdirs();
        }
        mTess.init(datapath, language);
    }

    // Runs OCR on the bitmap and returns the recognized text.
    public String getOCRResult(Bitmap bitmap) {
        mTess.setImage(bitmap);
        return mTess.getUTF8Text();
    }

    // Releases the native Tesseract resources.
    public void onDestroy() {
        if (mTess != null) {
            mTess.end();
        }
    }
}
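The constructor has to create a tessdata directory on the storage card; without it the program fails at runtime, because the library source checks for this directory and throws an exception if it is missing: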

public boolean init(String datapath, String language) {
    if (datapath == null) {
        throw new IllegalArgumentException("Data path must not be null!");
    }
    if (!datapath.endsWith(File.separator)) {
        datapath += File.separator;
    }

    File tessdata = new File(datapath + "tessdata");
    if (!tessdata.exists() || !tessdata.isDirectory()) {
        throw new IllegalArgumentException("Data path must contain subfolder tessdata!");
    }

    return nativeInit(datapath, language);
}
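
The same precondition can be checked from a desktop script before any data is copied to a device. The Python sketch below mirrors the logic of init shown above and additionally reports whether the language file is present. The function name and the extra check are assumptions for illustration, not part of the Android library.

```python
import os


def check_datapath(datapath, language="eng"):
    """Mirror the directory check done by init(), outside Android.

    Raises ValueError when datapath is None or has no tessdata subfolder.
    Returns True if <datapath>/tessdata/<language>.traineddata exists.
    """
    if datapath is None:
        raise ValueError("Data path must not be null!")
    tessdata = os.path.join(datapath, "tessdata")
    if not os.path.isdir(tessdata):
        raise ValueError("Data path must contain subfolder tessdata!")
    return os.path.isfile(os.path.join(tessdata, f"{language}.traineddata"))
```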
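That is all there is to it. There are three ways to get an image for OCR:

Pick an image in the gallery, choose Send or Share, and select the OCR app

Add an IntentFilter to AndroidManifest.xml so that the OCR app appears in the gallery's share list: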

<intent-filter>
    <action android:name="android.intent.action.SEND" />

    <category android:name="android.intent.category.DEFAULT" />
    <data android:mimeType="text/plain" />
    <data android:mimeType="image/*" />
</intent-filter>
After obtaining the URI, decode it into a bitmap:

if (Intent.ACTION_SEND.equals(intent.getAction())) {
    Uri uri = (Uri) intent.getParcelableExtra(Intent.EXTRA_STREAM);
    uriOCR(uri);
}

private void uriOCR(Uri uri) {
    if (uri != null) {
        InputStream is = null;
        try {
            is = getContentResolver().openInputStream(uri);
            Bitmap bitmap = BitmapFactory.decodeStream(is);
            mImage.setImageBitmap(bitmap);
            doOCR(bitmap);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } finally {
            if (is != null) {
                try {
                    is.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
Launch the OCR app and pick an image from the gallery for OCR

Send an Intent to open the gallery, then take the URI returned in onActivityResult and run OCR on it:

Intent intent = new Intent(Intent.ACTION_PICK, android.provider.MediaStore.Images.Media.EXTERNAL_CONTENT_URI);
startActivityForResult(intent, REQUEST_PICK_PHOTO);
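Launch the OCR app and run OCR after taking a photo

To get a high-quality image, put the image path into the Intent. Once the camera returns, the image can be decoded directly from that path: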

private void dispatchTakePictureIntent() {
    Intent takePictureIntent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
    // Ensure that there's a camera activity to handle the intent
    if (takePictureIntent.resolveActivity(getPackageManager()) != null) {
        // Create the File where the photo should go
        File photoFile = null;
        try {
            photoFile = createImageFile();
        } catch (IOException ex) {
            // Error occurred while creating the File
        }
        // Continue only if the File was successfully created
        if (photoFile != null) {
            takePictureIntent.putExtra(MediaStore.EXTRA_OUTPUT,
                    Uri.fromFile(photoFile));
            startActivityForResult(takePictureIntent, REQUEST_TAKE_PHOTO);
        }
    }
}
Finally, do not forget to download the language data and push it to the tessdata directory on the storage card.
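
One way to push the language data is adb push. The Python sketch below only builds the command. The helper name is hypothetical, and the device path /sdcard/tesseract is an assumption: the real location is whatever Environment.getExternalStorageDirectory() returns on the device, plus /tesseract/ as in the TessOCR constructor.

```python
import os
import posixpath


def adb_push_command(language="eng", local_dir=".", device_root="/sdcard/tesseract"):
    """Build an 'adb push' command for <language>.traineddata.

    device_root is an assumed mount point; adjust it to the device's
    actual external storage directory.
    """
    filename = f"{language}.traineddata"
    local_path = os.path.join(local_dir, filename)
    # Device paths always use forward slashes, whatever the host OS is.
    remote_path = posixpath.join(device_root, "tessdata", filename)
    return ["adb", "push", local_path, remote_path]
```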

⑥ How to use tesseract-ocr

It's open source?? Not that easy to use… the recognition rate isn't great!

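⑦ How to install the tesseract ocr library

The tesseract-ocr I previously installed with sudo apt-get install tesseract-ocr had a problem: the psm parameter could not be used. So I decided to compile and install it manually. The steps below follow someone else's installation notes.
Install the required libraries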

sudo apt-get install libpng12-dev
sudo apt-get install libjpeg62-dev
sudo apt-get install libtiff4-dev

sudo apt-get install gcc
sudo apt-get install g++
sudo apt-get install automake

pytesser calls tesseract, so tesseract must be installed, and tesseract in turn needs leptonica; otherwise compiling tesseract fails with "configure: error: leptonica not found".

Both packages use the usual unpack, compile and install routine:
./configure
make -j4
sudo make install

Download and install leptonica
http://www.leptonica.org/download.html or
http://code.google.com/p/leptonica/downloads/list

The latest version at the time of writing is leptonica-1.69.tar.bz2

Download and install tesseract
http://code.google.com/p/tesseract-ocr/
The latest version at the time of writing is tesseract-ocr-3.02.02.tar.gz
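
The build order matters: leptonica has to be installed before tesseract is configured. A minimal Python sketch of the sequence follows. The function name is hypothetical, and the directory names are assumptions about where the two archives unpack.

```python
def build_steps(source_dirs=("leptonica-1.69", "tesseract-ocr"), jobs=4):
    """Return (directory, command) pairs: leptonica first, then tesseract.

    The directory names are assumptions about the unpacked archives.
    """
    steps = []
    for directory in source_dirs:
        for command in (["./configure"],
                        ["make", f"-j{jobs}"],
                        ["sudo", "make", "install"]):
            steps.append((directory, command))
    return steps
```

Each pair could be executed with subprocess.run(command, cwd=directory, check=True), which stops at the first failing step.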

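⑧ How to use the tesseract-ocr source files

If you are an end user, download the exe installer instead.

The source package is meant for developers building on top of Tesseract and for DIY users.

If you want to call it from your own program, look directly at api.cpp in the api directory.

That file is the entry point for callers and has detailed comments, though they are in English.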

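⑨ How to install tesseract-ocr-setup-3.02.02

tesseract-ocr (open-source image recognition engine)
http://www.ddooo.com/softdown/94968.htm
1. Open the downloaded archive, find "tesseract-ocr-setup-3.02.02.exe", double-click it to start the setup wizard, and click "next".

2. Check "I accept....", then click "next".

3. Choose which users may use the software. Here we pick the first option, so that anyone using this computer can use it, then click "next".

4. Choose the installation path, then click "next".

5. Select the components to install. Language components are not checked by default, so check the languages you want to recognize. Checking Simplified Chinese makes it possible to recognize images containing Simplified Chinese; the same goes for other languages.

6. Wait for the installation to finish.

7. After installation, open cmd and type "tesseract"; if the command is found and Tesseract prints its usage information, the installation succeeded.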
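
The same check can be scripted. This minimal Python sketch looks the executable up on the PATH with the standard library's shutil.which; the function name is hypothetical.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the executable if it is on the PATH, else None."""
    return shutil.which(executable)
```

A None result usually means the installation directory has not been added to the PATH environment variable.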
