Azure Cognitive Service – Computer Visionを使って一瞬でOCRアプリケーションを作成する

こんにちは。今日は、Azure Cognitive ServiceのComputer Visionを少し触ってみたので、それをもとに簡単にOCRアプリケーションを作成する方法のメモを残しておきたいと思います。

それではまいります。

Contents

Computer Visionとは？

Computer Vision とは – Azure Cognitive Services | Microsoft Docs

超簡単な概要ですが、以下の機能を提供するAPIの集合体で、Azure Cognitive Serviceの中の１サービスと位置付けられています。

光学文字認識（OCR）
画像分析
Face（顔認識）
空間分析

各機能のコアとなるAIエンジンはMicrosoftによって開発されており、利用者はそれをAPI経由で呼び出して分析結果を得ることができるスグレモノです。

以下に詳細がありますが、Freeプランがあるので、お試しで使ってみる分にはタダで使えてしまう点も魅力です。

価格 – Computer Vision API | Microsoft Azure

今回は、その中でもOCRの機能を提供するAPIを使って、簡単なアプリケーションを作ってみたので、その方法を残しておきます。

といっても、コード自体は公式ドキュメントに親切なチュートリアルがありそちらを利用しています。この記事では、APIの戻り値など、チュートリアルでは詳しく言及されていないですが気になった部分を少し掘り下げて確認してみた結果をメモとして残しています。

シンプルなOCRアプリケーションの作成手順

Computer Visionリソースを作成する

はじめにMicrosoft Azure上でComputer Visionリソースを作成します。

アプリケーションコードの実装（JavaScript）

コードの実装は、以下の公式ドキュメントのサンプルコードで実装することとします。

クイックスタート: 光学式文字認識 (OCR) – Azure Cognitive Services | Microsoft Docs

ローカル環境等の開発環境上で以下を実行。

mkdir myapp && cd myapp

npm init

npm install @azure/cognitiveservices-computervision

npm install async

その上でプロジェクトフォルダ内にindex.jsを作成して、以下コードを実装します。

ここで、コード中のkeyとendpointは、作成したComputer Visionリソースのものを設定します。

このコードは、ハードコードした画像のURLを読み込んで、OCRの解析をかけた結果を返すシンプルな処理になっています。

読み込む画像（printed_text.jpg (1254×704) (raw.githubusercontent.com)

## index.js

'use strict';

const async = require('async');
const fs = require('fs');
const https = require('https');
const path = require("path");
const createReadStream = require('fs').createReadStream
const sleep = require('util').promisify(setTimeout);
const ComputerVisionClient = require('@azure/cognitiveservices-computervision').ComputerVisionClient;
const ApiKeyCredentials = require('@azure/ms-rest-js').ApiKeyCredentials;

/**
 * AUTHENTICATE
 * This single client is used for all examples.
 */
const key = '★Computer Visionエンドポイントのキー★';
const endpoint = '★Computer VisionエンドポイントのURL★';

const computerVisionClient = new ComputerVisionClient(
  new ApiKeyCredentials({ inHeader: { 'Ocp-Apim-Subscription-Key': key } }), endpoint);
/**
 * END - Authenticate
 */

function computerVision() {
  async.series([
    async function () {

      /**
       * OCR: READ PRINTED & HANDWRITTEN TEXT WITH THE READ API
       * Extracts text from images using OCR (optical character recognition).
       */
      console.log('-------------------------------------------------');
      console.log('READ PRINTED, HANDWRITTEN TEXT AND PDF');
      console.log();

      // URL images containing printed and/or handwritten text. 
      // The URL can point to image files (.jpg/.png/.bmp) or multi-page files (.pdf, .tiff).
//      const printedTextSampleURL = 'https://raw.githubusercontent.com/Azure-Samples/cognitive-services-sample-data-files/master/ComputerVision/Images/printed_text.jpg';

      // Recognize text in printed image from a URL
      console.log('Read printed text from URL...', printedTextSampleURL.split('/').pop());
      const printedResult = await readTextFromURL(computerVisionClient, printedTextSampleURL);
      printRecText(printedResult);

      // Perform read and await the result from URL
      async function readTextFromURL(client, url) {
        // To recognize text in a local image, replace client.read() with readTextInStream() as shown:
        let result = await client.read(url);
        //console.log(result);
        // Operation ID is last path segment of operationLocation (a URL)
        let operation = result.operationLocation.split('/').slice(-1)[0];

        // Wait for read recognition to complete
        // result.status is initially undefined, since it's the result of read
        while (result.status !== "succeeded") { await sleep(1000); result = await client.getReadResult(operation); }
        return result.analyzeResult.readResults; // Return the first page of result. Replace [0] with the desired page if this is a multi-page file such as .pdf or .tiff.
      }

      // Prints all text from Read result
      function printRecText(readResults) {
        console.log('Recognized text:');
        for (const page in readResults) {
          if (readResults.length > 1) {
            console.log(`==== Page: ${page}`);
          }
          const result = readResults[page];
          if (result.lines.length) {
            for (const line of result.lines) {
              console.log(line.words.map(w => w.text).join(' '));
            }
          }
          else { console.log('No recognized text.'); }
        }
      }

      /**
       * 
       * Download the specified file in the URL to the current local folder
       * 
       */
      function downloadFilesToLocal(url, localFileName) {
        return new Promise((resolve, reject) => {
          console.log('--- Downloading file to local directory from: ' + url);
          const request = https.request(url, (res) => {
            if (res.statusCode !== 200) {
              console.log(`Download sample file failed. Status code: ${res.statusCode}, Message: ${res.statusMessage}`);
              reject();
            }
            var data = [];
            res.on('data', (chunk) => {
              data.push(chunk);
            });
            res.on('end', () => {
              console.log('   ... Downloaded successfully');
              fs.writeFileSync(localFileName, Buffer.concat(data));
              resolve();
            });
          });
          request.on('error', function (e) {
            console.log(e.message);
            reject();
          });
          request.end();
        });
      }

      /**
       * END - Recognize Printed & Handwritten Text
       */
      console.log();
      console.log('-------------------------------------------------');
      console.log('End of quickstart.');

    },
    function () {
      return new Promise((resolve) => {
        resolve();
      })
    }
  ], (err) => {
    throw (err);
  });
}

computerVision();

以下を実行すると、プログラムが実行されます。

node index.js

＃実行結果
-------------------------------------------------
READ PRINTED, HANDWRITTEN TEXT AND PDF

Read printed text from URL... printed_text.jpg
Recognized text:
Nutrition Facts Amount Per Serving
Serving size: 1 bar (40g)
Serving Per Package: 4
Total Fat 13g
Saturated Fat 1.5g
Amount Per Serving
Trans Fat 0g
Calories 190
Cholesterol 0mg
ories from Fat 110
Sodium 20mg
nt Daily Values are based on Vitamin A 50%
calorie diet.

-------------------------------------------------
End of quickstart.

実行結果をもうちょっとみてみる

上のサンプルコードを眺めてみると、ざっくり

まずCustom Visionのread APIで画像URLを投げて

let result = await client.read(url);

その結果からoperation IDを抜き出してきてgetReadResult APIを呼び出して

result = await client.getReadResult(operation);

その結果をちまちまと出力（笑）しているプログラムであることが分かります。

function printRecText(readResults) {
        console.log('Recognized text:');
        for (const page in readResults) {
          if (readResults.length > 1) {
            console.log(`==== Page: ${page}`);
          }
          const result = readResults[page];
          if (result.lines.length) {
            for (const line of result.lines) {
              console.log(line.words.map(w => w.text).join(' '));
            }
          }
          else { console.log('No recognized text.'); }
        }
      }

なるほど、以下にも書いていますが、たしかにread apiの戻り値自体には、OCR解析の結果は含まれていなくて、それを取得するには別途Operation IDをキーにgetReadResult APIをCALLする必要があるようです。

Read API を呼び出す方法 – Azure Cognitive Services | Microsoft Docs

一応、read apiの結果と、getReadResult APIの戻り値も確認してみるとこんな感じ。たしかにoperationLocationの末尾がoperation IDになっていますね。

#read APIの戻り値

{
  operationLocation: 'https://computervision013.cognitiveservices.azure.com/vision/v3.2/read/analyzeResults/1c7960db-ff32-4e70-8ccb-cfbfa26339a3',
  'apim-request-id': '1c7960db-ff32-4e70-8ccb-cfbfa26339a3',
  connection: 'close',
  'content-length': '0',
  'csp-billing-usage': 'CognitiveServices.ComputerVision.Transaction=1',
  date: 'Thu, 14 Jul 2022 09:57:06 GMT',
  'strict-transport-security': 'max-age=31536000; includeSubDomains; preload',
  'x-content-type-options': 'nosniff',
  'x-envoy-upstream-service-time': '329',
  body: undefined
}

#getReadResult APIの戻り値
{
  status: 'succeeded',
  createdDateTime: '2022-07-14T09:57:07Z',
  lastUpdatedDateTime: '2022-07-14T09:57:07Z',
  analyzeResult: {
    version: '3.2.0',
    modelVersion: '2022-04-30',
    readResults: [ [Object] ]
  }
}

# readResultsの中身をみてみると・・・
[
  {
    page: 1,
    angle: 12.8499,
    width: 1254,
    height: 704,
    unit: 'pixel',
    lines: [
      [Object], [Object],
      [Object], [Object],
      [Object], [Object],
      [Object], [Object],
      [Object], [Object],
      [Object], [Object],
      [Object], [Object]
    ]
  }
]

#さらにlinesの中身を見てみると・・
{
  boundingBox: [
    147,    0, 1233,
    213, 1221,  274,
    134,   57
  ],
  appearance: { style: { name: 'other', confidence: 0.972 } },
  text: 'Nutrition Facts Amount Per Serving',
  words: [
    { boundingBox: [Array], text: 'Nutrition', confidence: 0.993 },
    { boundingBox: [Array], text: 'Facts', confidence: 0.997 },
    { boundingBox: [Array], text: 'Amount', confidence: 0.997 },
    { boundingBox: [Array], text: 'Per', confidence: 0.998 },
    { boundingBox: [Array], text: 'Serving', confidence: 0.997 }
  ]
}
{
  boundingBox: [
    112,  65, 597,
    160, 589, 201,
    104, 106
  ],
  appearance: { style: { name: 'other', confidence: 0.972 } },
  text: 'Serving size: 1 bar (40g)',
  words: [
    { boundingBox: [Array], text: 'Serving', confidence: 0.956 },
    { boundingBox: [Array], text: 'size:', confidence: 0.997 },
    { boundingBox: [Array], text: '1', confidence: 0.997 },
    { boundingBox: [Array], text: 'bar', confidence: 0.998 },
    { boundingBox: [Array], text: '(40g)', confidence: 0.994 }
  ]
}
{
  boundingBox: [
     83, 108, 555, 205,
    544, 258,  72, 162
  ],
  appearance: { style: { name: 'other', confidence: 0.972 } },
  text: 'Serving Per Package: 4',
  words: [
    { boundingBox: [Array], text: 'Serving', confidence: 0.997 },
    { boundingBox: [Array], text: 'Per', confidence: 0.998 },
    { boundingBox: [Array], text: 'Package:', confidence: 0.996 },
    { boundingBox: [Array], text: '4', confidence: 0.995 }
  ]
}
{
  boundingBox: [
    685, 215, 1000,
    284, 988,  336,
    674, 263
  ],
  appearance: { style: { name: 'other', confidence: 0.972 } },
  text: 'Total Fat 13g',
  words: [
    { boundingBox: [Array], text: 'Total', confidence: 0.998 },
    { boundingBox: [Array], text: 'Fat', confidence: 0.998 },
    { boundingBox: [Array], text: '13g', confidence: 0.994 }
  ]
}
{
  boundingBox: [
    696,  299, 1120,
    395, 1107,  450,
    684,  349
  ],
  appearance: { style: { name: 'other', confidence: 0.972 } },
  text: 'Saturated Fat 1.5g',
  words: [
    { boundingBox: [Array], text: 'Saturated', confidence: 0.996 },
    { boundingBox: [Array], text: 'Fat', confidence: 0.998 },
    { boundingBox: [Array], text: '1.5g', confidence: 0.981 }
  ]
}
{
  boundingBox: [
     29, 218, 486, 314,
    476, 360,  19, 259
  ],
  appearance: { style: { name: 'other', confidence: 0.972 } },
  text: 'Amount Per Serving',
  words: [
    { boundingBox: [Array], text: 'Amount', confidence: 0.997 },
    { boundingBox: [Array], text: 'Per', confidence: 0.998 },
    { boundingBox: [Array], text: 'Serving', confidence: 0.997 }
  ]
}
{
  boundingBox: [
    670, 371, 952,
    439, 941, 483,
    659, 412
  ],
  appearance: { style: { name: 'other', confidence: 0.972 } },
  text: 'Trans Fat 0g',
  words: [
    { boundingBox: [Array], text: 'Trans', confidence: 0.993 },
    { boundingBox: [Array], text: 'Fat', confidence: 0.998 },
    { boundingBox: [Array], text: '0g', confidence: 0.927 }
  ]
}
{
  boundingBox: [
     10, 294, 263, 349,
    253, 397,   0, 341
  ],
  appearance: { style: { name: 'other', confidence: 0.972 } },
  text: 'alories 190',
  words: [
    { boundingBox: [Array], text: 'alories', confidence: 0.995 },
    { boundingBox: [Array], text: '190', confidence: 0.998 }
  ]
}
{
  boundingBox: [
    595, 429, 1006,
    528, 992,  581,
    583, 478
  ],
  appearance: { style: { name: 'other', confidence: 0.972 } },
  text: 'Cholesterol Omg',
  words: [
    { boundingBox: [Array], text: 'Cholesterol', confidence: 0.994 },
    { boundingBox: [Array], text: 'Omg', confidence: 0.621 }
  ]
}
{
  boundingBox: [
      1, 377, 397, 463,
    386, 512,   0, 421
  ],
  appearance: { style: { name: 'other', confidence: 0.972 } },
  text: 'ories from Fat 110',
  words: [
    { boundingBox: [Array], text: 'ories', confidence: 0.994 },
    { boundingBox: [Array], text: 'from', confidence: 0.989 },
    { boundingBox: [Array], text: 'Fat', confidence: 0.998 },
    { boundingBox: [Array], text: '110', confidence: 0.998 }
  ]
}
{
  boundingBox: [
    562, 502, 911,
    590, 898, 641,
    549, 550
  ],
  appearance: { style: { name: 'other', confidence: 0.972 } },
  text: 'Sodium 20mg',
  words: [
    { boundingBox: [Array], text: 'Sodium', confidence: 0.997 },
    { boundingBox: [Array], text: '20mg', confidence: 0.989 }
  ]
}
{
  boundingBox: [
     10, 479, 460, 586,
    451, 623,   2, 518
  ],
  appearance: { style: { name: 'other', confidence: 0.972 } },
  text: 'nt Daily Values are based on',
  words: [
    { boundingBox: [Array], text: 'nt', confidence: 0.958 },
    { boundingBox: [Array], text: 'Daily', confidence: 0.998 },
    { boundingBox: [Array], text: 'Values', confidence: 0.997 },
    { boundingBox: [Array], text: 'are', confidence: 0.998 },
    { boundingBox: [Array], text: 'based', confidence: 0.998 },
    { boundingBox: [Array], text: 'on', confidence: 0.998 }
  ]
}
{
  boundingBox: [
    525, 602, 770,
    659, 762, 698,
    516, 643
  ],
  appearance: { style: { name: 'other', confidence: 0.972 } },
  text: 'Vitamin A 50%',
  words: [
    { boundingBox: [Array], text: 'Vitamin', confidence: 0.991 },
    { boundingBox: [Array], text: 'A', confidence: 0.997 },
    { boundingBox: [Array], text: '50%', confidence: 0.955 }
  ]
}
{
  boundingBox: [
     12, 535, 194, 577,
    186, 614,   3, 572
  ],
  appearance: { style: { name: 'other', confidence: 0.972 } },
  text: 'calorie diet',
  words: [
    { boundingBox: [Array], text: 'calorie', confidence: 0.992 },
    { boundingBox: [Array], text: 'diet', confidence: 0.909 }
  ]
}

なるほど、boundingBox（見方はこちらが参考になりそう？）という座標情報に加えて、読み取った文字とその信頼度スコアの情報が含まれていますね。

このあたりの情報も使えば、活用の幅を広げる余地もありそうですね。

さて、今回はチュートリアルに沿って簡単なアプリケーションを作ってみましたが、実際に活用するなら、この処理をWebアプリケーションのバックエンドに仕込んだり、あるいはAzure Functionsにこのコードを実装して、Storageに画像がアップロードされる都度トリガーして裏で解析するような活用方法になりますかね。

少しでも参考になりましたら幸いです。

おしまい

Azure Cognitive Service – Computer Visionを使って一瞬でOCRアプリケーションを作成する

Computer Visionとは？

シンプルなOCRアプリケーションの作成手順

Computer Visionリソースを作成する

アプリケーションコードの実装（JavaScript）

実行結果をもうちょっとみてみる

関連

コメントを残す

Profile

Categories

yutaro013_scenery

Trending Posts

【2023年度最新版】Azure Administrator Associate資格 (AZ-104)を3日で取得した話。勉強方法は？取ってよかった？

【詳解】クライアント証明書認証を実装しながら理解する – 前編：概要～証明書作成編

【VSCodeで開発】コミットしようとしたら「Git の ‘user.name’ と ‘user.email’ を構成していることを確認してください」エラーが発生する

【自然言語処理】PythonとTwitter APIでデータ分析

【2023年最新版】Microsoft Cybersecurity Architect Expert資格 (SC-100)を1週間で取得した話。勉強方法は？取ってよかった？

【2023年最新版】Azure Solution Architect Expert資格 (AZ-305)を1週間で取得した話。勉強方法は？取ってよかった？

超便利＆簡単！VS Codeの設定を複数PC間で同期する

【機械学習】Scikit-Learnで交差検証(Cross-Validation)を一瞬で実装する【Python】

AzureのSAS（共有アクセス署名）を分かりやすく解説する

Azure Developer Associate資格(AZ-204)を10日間で取得した話。勉強方法は？取ってよかった？

Microsoft Power Platform Fundamental資格 (PL-900)を取得した話。勉強方法は？取ってよかった？

【Python×自然言語処理】テキストデータを極性辞書で感情分析してみる

【2023年最新版】Azureの認定資格を1ヶ月で全部とる。資格一覧と対策方法・体験記まとめ

Azure Data Fundamentals資格 (DP-900)を2日で取得した話。勉強方法は？取ってよかった？

【データ分析】MacOSで複数のPython/Anacondaバージョンを使い分ける方法【pyenv】

【機械学習】決定木モデルの変数重要度をわかりやすく解説する

CPU使用率とCPU時間について分かりやすく解説する

機械学習における転移学習とファインチューニング

Azure AI Fundamentals (AI-900)を2日で取得した話。勉強方法は？取ってよかった？

3日間集中勉強でAWSソリューションアーキテクトアソシエイト資格試験(SAA-C02)に合格した話

アーカイブ