Pou's IT Life: UWP - Project Oxford's Emotion API

Project Oxford has released many new and interesting services, such as the Face API, the Emotion API, LUIS, and speech services. This post introduces the Emotion API.

According to the <Emotion API> documentation, there are two ways to use it:

Emotion Recognition

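Given an uploaded image, or an image/video on the web, it analyzes the number of people in it (using the Face API) and each person's emotion scores (happiness, sadness, surprise, anger, fear, contempt, disgust or neutral).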

An app can interpret these scores according to its own criteria for recognizing an emotion.

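If you have already used the Face API to get the positions of the people in the image, you can pass those positions to the Emotion API to analyze just those faces; see <Emotion Recognition with Face Rectangles>.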

To analyze the whole image instead, use <Emotion Recognition>.

[
  {
    "faceRectangle": {
      "left": 68,
      "top": 97,
      "width": 64,
      "height": 97
    },
    "scores": {
      "anger": 0.00300731952,
      "contempt": 5.14648448E-08,
      "disgust": 9.180124E-06,
      "fear": 0.0001912825,
      "happiness": 0.9875571,
      "neutral": 0.0009861537,
      "sadness": 1.889955E-05,
      "surprise": 0.008229999
    }
  }
]

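The response is an array with one entry per detected face. faceRectangle is the position of the face in the image, and scores holds the emotion scores (very small values are written in scientific notation). The larger a value, the closer the face is to that emotion.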
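Since the highest score marks the closest emotion, an app usually reduces the scores object to one label. The following is a minimal sketch of that step, not part of the original sample: the EmotionScores class, the Dominant method and the minScore threshold are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the Emotion API or the original sample.
public static class EmotionScores
{
    // Returns the name of the emotion with the highest score, or null when
    // there are no scores or the best score is below minScore.
    public static string Dominant(IDictionary<string, double> scores, double minScore = 0.0)
    {
        if (scores == null || scores.Count == 0)
        {
            return null;
        }

        KeyValuePair<string, double> best = scores.OrderByDescending(s => s.Value).First();
        return best.Value >= minScore ? best.Key : null;
    }
}
```

With the scores from the JSON above, Dominant returns "happiness". Passing a minScore lets the app ignore faces whose best score is too weak to act on.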

Notes on using Emotion Recognition:

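Supported image formats: JPEG, PNG, GIF (the first frame), BMP. The image size must not exceed 4MB.
If you already have the face rectangles from the Face API, you can send them along when calling the Emotion API, and the Emotion API analyzes those areas first.
Detectable face sizes range from 36x36 to 4096x4096 pixels. Faces outside this range are not detected.
At most 46 faces can be detected per image, ordered by face rectangle size in descending order. If no face is detected, an empty array is returned.
Some faces may not be detected, for example because of a very large face angle (head pose), an unstable network, or occlusion.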
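The limits above can be checked in the app before uploading, which saves a round trip for files the service would reject. This is a minimal sketch under stated assumptions: ImageLimits and its members are hypothetical names, and 4MB is taken here to mean 4 * 1024 * 1024 bytes, which the notes above do not specify.

```csharp
using System;
using System.IO;

// Hypothetical pre-upload checks based on the notes above.
public static class ImageLimits
{
    // Assumption: "4MB" is interpreted as 4 * 1024 * 1024 bytes.
    public const long MaxImageBytes = 4L * 1024 * 1024;
    public const int MinFaceSize = 36;
    public const int MaxFaceSize = 4096;

    private static readonly string[] SupportedExtensions =
        { ".jpg", ".jpeg", ".png", ".gif", ".bmp" };

    // True when the file extension and size fit the documented limits.
    public static bool IsUploadable(string fileName, long sizeInBytes)
    {
        string extension = Path.GetExtension(fileName ?? string.Empty).ToLowerInvariant();
        bool supported = Array.IndexOf(SupportedExtensions, extension) >= 0;
        return supported && sizeInBytes > 0 && sizeInBytes <= MaxImageBytes;
    }

    // True when a face rectangle is within the detectable size range.
    public static bool IsDetectableFace(int width, int height)
    {
        return width >= MinFaceSize && height >= MinFaceSize
            && width <= MaxFaceSize && height <= MaxFaceSize;
    }
}
```

The extension check is only a quick filter; the service still decides from the actual file content.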

Emotion in Video

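Analyzes an uploaded video and returns the emotions of the people in it as playback progresses. You can use this to track how one person or a group of people reacts over time and show the result in your own app. The detectable emotions are: anger, contempt, disgust, fear, happiness, neutral, sadness, and surprise.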

For video, the API first detects the faces in every frame. The response contains two kinds of results:

windowMeanScores

The mean score of each emotion across the faces detected in a frame. The detected emotion should be read as the one with the highest score, because the scores are normalized to sum to one. An app can define its own score ranges, or set a higher confidence threshold, to trigger events as it needs.

windowFaceDistribution

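The distribution of faces by dominant emotion, where each face's dominant emotion is the emotion with the highest score for that face.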

Because the emotion of each detected face changes as the video plays, it is best to judge expressions relative to the playback time. For more details, see <API Reference>.
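To make the two result types concrete, the sketch below computes the same kind of aggregates from a list of per-face scores. It illustrates the definitions above and is not the service's actual computation; the WindowAggregates class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of the two aggregate result types.
public static class WindowAggregates
{
    // Mean of each emotion's score over all faces.
    public static Dictionary<string, double> MeanScores(
        IReadOnlyList<IDictionary<string, double>> faces)
    {
        var sums = new Dictionary<string, double>();
        foreach (IDictionary<string, double> face in faces)
        {
            foreach (KeyValuePair<string, double> pair in face)
            {
                // current is 0 when the emotion has not been seen yet.
                sums.TryGetValue(pair.Key, out double current);
                sums[pair.Key] = current + pair.Value;
            }
        }
        return sums.ToDictionary(p => p.Key, p => p.Value / faces.Count);
    }

    // Fraction of faces whose highest-scoring emotion is each emotion.
    public static Dictionary<string, double> FaceDistribution(
        IReadOnlyList<IDictionary<string, double>> faces)
    {
        return faces
            .Where(f => f.Count > 0)
            .GroupBy(f => f.OrderByDescending(p => p.Value).First().Key)
            .ToDictionary(g => g.Key, g => (double)g.Count() / faces.Count);
    }
}
```

For two faces, one mostly happy and one mostly neutral, the mean happiness score sits between the two individual scores, while the face distribution reports half of the faces for each of the two emotions.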

Notes on using Emotion Recognition in Video:

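Supported video formats: MP4, MOV, and WMV. The file size must not exceed 100MB.
Detectable face sizes range from 24x24 to 2048x2048 pixels. Faces outside this range are not detected.
At most 46 faces can be detected per video.
Uploaded files are deleted after 24 hours.
Some faces may not be detected, for example because of a very large face angle (head pose), an unstable network, or occlusion.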

[Examples]
Source code: EmotionAPISample.

Example 1: upload an image and analyze the number of people in it and each person's emotions:

1. First sign up for Cognitive Services and get a token.

2. As described in <Emotion Recognition>, two upload methods are supported: a URL or an image file.

2-1. Create a button handler that reads a local image:

private async void MenuItem_Click(object sender, RoutedEventArgs e)
{
    StorageFile file;
    var item = sender as MenuFlyoutItem;
    if (item.Tag.ToString() == "photo")
    {
        FileOpenPicker openPicker = new FileOpenPicker();
        openPicker.ViewMode = PickerViewMode.Thumbnail;
        openPicker.SuggestedStartLocation = PickerLocationId.PicturesLibrary;
        openPicker.FileTypeFilter.Add(".jpg");
        openPicker.FileTypeFilter.Add(".jpeg");
        openPicker.FileTypeFilter.Add(".png");

        file = await openPicker.PickSingleFileAsync();
    }
    else
    {
        CameraCaptureUI captureUI = new CameraCaptureUI();
        captureUI.PhotoSettings.Format = CameraCaptureUIPhotoFormat.Jpeg;
        captureUI.PhotoSettings.CroppedSizeInPixels = new Size(200, 200);

        file = await captureUI.CaptureFileAsync(CameraCaptureUIMode.Photo);
    }
    if (file != null)
    {
        // Show the picked or captured image, then keep a stream for the upload.
        var img = await LoadImage(file);
        imgSource.Source = img;
        // The stream is only read when uploading, so open it for reading.
        fileStream = await file.OpenStreamForReadAsync();
    }
}

Remember to declare the required capabilities: <uap:Capability Name="picturesLibrary" /> and <DeviceCapability Name="webcam" />.

2-2. Call the Emotion API and mark the face areas based on the response:

private async void OnAnalysisByEmotionAPI(object sender, RoutedEventArgs e)
{
    // Fill in your own subscription key.
    string key = "";
    using (HttpClient httpClient = new HttpClient())
    {
        httpClient.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", key);

        // Optional query string, for example the faceRectangles parameter.
        string queryString = "";
        string uri = $"https://api.projectoxford.ai/emotion/v1.0/recognize?{queryString}";

        HttpContent content = null;

        if (fileStream != null)
        {
            // Convert the file stream to a byte array and send it as binary data.
            byte[] data = StreamToByteAraray(fileStream);
            content = new ByteArrayContent(data);
            content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        }
        else
        {
            // For an image URL, send a JSON body with Content-Type application/json.
            // That path is not implemented in this sample, so there is nothing to post.
            return;
        }

        var response = await httpClient.PostAsync(uri, content);
        if (response.StatusCode == System.Net.HttpStatusCode.OK)
        {
            string jsonContent = await response.Content.ReadAsStringAsync();
            var emotionResponse = JsonConvert.DeserializeObject<List<EmotionData>>(jsonContent);
            AddFaceRectangles(emotionResponse);
        }
        else
        {
            MessageDialog dialog = new MessageDialog("failed");
            await dialog.ShowAsync();
        }
    }
}

Remember to add Ocp-Apim-Subscription-Key = subscription key to the HTTP headers, and to change the Content-Type to match the content you send.

In the code above, I turned the JSON response into a separate data model so that the custom FaceRectangle control can use data binding.

When run, the app marks each detected face on the image.

If you already have the face positions, you can use the sample below to assemble them and send them to the Emotion API:

private string GetFaceRectangleQueryString(List<FacerectangleData> rectangles)
{
    // FacerectangleData overrides ToString() to return "left,top,width,height".
    IEnumerable<string> param = rectangles.Select(x => x.ToString());

    // Multiple face rectangles are delimited with ";".
    return $"faceRectangles={string.Join(";", param)}";
}
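The method above relies on a ToString override that is not shown in the post. The following is a minimal, self-contained sketch of what such a model could look like; FaceRect and FaceRectQuery are hypothetical stand-ins for the sample's own types, whose real definitions may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the sample's face rectangle data model.
public class FaceRect
{
    public int Left { get; set; }
    public int Top { get; set; }
    public int Width { get; set; }
    public int Height { get; set; }

    // Formats the rectangle as "left,top,width,height".
    public override string ToString()
    {
        return $"{Left},{Top},{Width},{Height}";
    }
}

public static class FaceRectQuery
{
    // Joins the rectangles with ";" into a faceRectangles query parameter.
    public static string Build(IEnumerable<FaceRect> rectangles)
    {
        return "faceRectangles=" + string.Join(";", rectangles.Select(r => r.ToString()));
    }
}
```

For the rectangle in the JSON response shown earlier, Build produces faceRectangles=68,97,64,97, which can be used as the queryString of the recognize request.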

Example 2: upload a video and analyze the number of people in it and each person's expressions:

1. Add a button that reads a video file:

private async void MenuItem_Click(object sender, RoutedEventArgs e)
{
    StorageFile file;
    FileOpenPicker openPicker = new FileOpenPicker();
    openPicker.ViewMode = PickerViewMode.Thumbnail;
    openPicker.SuggestedStartLocation = PickerLocationId.VideosLibrary;
    openPicker.FileTypeFilter.Add(".mp4");
    openPicker.FileTypeFilter.Add(".wmv");
    openPicker.FileTypeFilter.Add(".mov");

    file = await openPicker.PickSingleFileAsync();
    if (file != null)
    {
        fileStream = await file.OpenAsync(FileAccessMode.Read);
        // Use the file's own MIME type instead of a hard-coded value.
        videoPlayer.SetSource(fileStream, file.ContentType);
    }
}

2. Call the Emotion API:
Emotion for Video is handled differently from Emotion for Image. On success, the response carries an Operation-Location HTTP header whose value is a URL for the processing status of the uploaded video, for example: https://api.projectoxford.ai/emotion/v1.0/operations/EF217D0C-9085-45D7-AAE0-2B36471B89B5 .

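The trailing EF217D0C-9085-45D7-AAE0-2B36471B89B5 is the operation id; every uploaded video gets one. It is needed because video processing takes a while, so after getting the operation id you have to write a loop that periodically asks whether the analysis has finished.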
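If the app wants to store or log the operation id on its own, it can be taken from the last path segment of the Operation-Location value. The sketch below shows one way to do that; OperationLocation and GetOperationId are hypothetical names, not part of the sample or the API.

```csharp
using System;

// Hypothetical helper for reading the operation id from Operation-Location.
public static class OperationLocation
{
    // Returns the last path segment of the URL, or null when the value
    // is not an absolute URL or has no path segments.
    public static string GetOperationId(string operationLocation)
    {
        if (!Uri.TryCreate(operationLocation, UriKind.Absolute, out Uri uri))
        {
            return null;
        }

        string[] segments = uri.AbsolutePath.Split(
            new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        return segments.Length > 0 ? segments[segments.Length - 1] : null;
    }
}
```

The polling itself can keep using the full Operation-Location URL, as the sample does; the id is mainly useful for bookkeeping.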

So the overall flow of the Emotion API for Video looks like this:

private async void OnAnalysisByEmotionAPI(object sender, RoutedEventArgs e)
{
    VideoOperationResult videoResult = null;
    byte[] data = UtilityHelper.StreamToByteAraray(fileStream.AsStreamForRead());
    string key = "";
    EmotionAPIService api = new EmotionAPIService(key);
    api.VideoAnalysisProgressChanged += Api_VideoAnalysisProgressChanged;
    api.VideoAnalysisSuccessed += Api_VideoAnalysisSuccessed;
    videoResult = await api.RecognizeVideo(data.AsBuffer(), string.Empty);
}

private void Api_VideoAnalysisSuccessed(object sender, VideoProcessingResult e)
{
    if (e != null)
    {
        videoProgressResult = e;
    }
}

private void Api_VideoAnalysisProgressChanged(object sender, double e)
{
    Debug.WriteLine($"video progress ... : {e}");
}

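The code above wraps the Emotion API for Video logic and exposes two events, so callers can follow the processing progress and react when processing completes. Video analysis really does take time; for large files, consider writing a separate Background Task to do the waiting.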

A closer look at the wrapped logic:

2-1. Upload the file to the Emotion API:

private async Task<VideoOperationResult> InvokeEmotionAPIForVideo(IBuffer buffer, string queryString)
{
    VideoOperationResult result = null;
    Uri uri = new Uri($"{RecognitionUrl[RecognitionType.Video]}?{queryString}");
    HttpBufferContent content = new HttpBufferContent(buffer);
    content.Headers.ContentType = new HttpMediaTypeHeaderValue("application/octet-stream");
    
    var response = await httpClient.PostAsync(uri, content);
    if (response.StatusCode == HttpStatusCode.Accepted)
    {
        // The status URL of the upload is returned in the Operation-Location header.
        string location = string.Empty;
        response.Headers.TryGetValue("Operation-Location", out location);
        if (string.IsNullOrEmpty(location) == false)
        {
            Uri operationUri = new Uri(location);
            var locationResponse = await httpClient.GetAsync(operationUri);
            string jsonResult = await locationResponse.Content.ReadAsStringAsync();
            result = JsonConvert.DeserializeObject<VideoOperationResult>(jsonResult);
            ProcessVideoResult(result);
            // Keep polling until the operation reaches a final state.
            if (result.Status != VideoOperationStatus.Succeeded &&
                result.Status != VideoOperationStatus.Failed)
            {
                // Not awaited on purpose: progress is reported through the events.
                var task = MonitorVideoProgress(operationUri);
            }
        }
    }
    return result;
}

2-2. If Emotion reports that the video is still being processed, the app needs a loop that asks Emotion for the current progress:

private async Task MonitorVideoProgress(Uri location)
{
    int delaySecond = 20;

    while (true)
    {
        var response = await httpClient.GetStringAsync(location);
        var videoResult = JsonConvert.DeserializeObject<VideoOperationResult>(response);
        ProcessVideoResult(videoResult);
        // Stop on either final state; otherwise a failed operation would loop forever.
        if (videoResult.Status == VideoOperationStatus.Succeeded ||
            videoResult.Status == VideoOperationStatus.Failed)
        {
            break;
        }
        await Task.Delay(TimeSpan.FromSeconds(delaySecond));
    }
}

private void ProcessVideoResult(VideoOperationResult result)
{
    switch (result.Status)
    {
        case VideoOperationStatus.Succeeded:
            // ProcessingResult is itself a JSON string holding the analysis result.
            var progressResult = JsonConvert.DeserializeObject<VideoProcessingResult>(result.ProcessingResult);
            VideoAnalysisSuccessed?.Invoke(this, progressResult);
            break;
        case VideoOperationStatus.Running:
            // Use ?.Invoke so a missing subscriber does not throw.
            VideoAnalysisProgressChanged?.Invoke(this, result.Progress);
            break;
        case VideoOperationStatus.Uploading:
        case VideoOperationStatus.Failed:
        case VideoOperationStatus.NotStarted:
            break;
    }
}
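The polling loop above waits for as long as the operation takes. A bounded variant can be useful when the app should give up after a while. The following is a generic sketch, independent of the Emotion API: Polling, WaitUntilAsync and its parameters are hypothetical names, and the check delegate stands in for "ask the service whether the operation has finished".

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical generic polling helper with an upper bound on attempts.
public static class Polling
{
    // Calls checkAsync until it returns true or maxAttempts is used up.
    // Returns true when the check succeeded, false when attempts ran out.
    public static async Task<bool> WaitUntilAsync(
        Func<Task<bool>> checkAsync, TimeSpan delay, int maxAttempts)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            if (await checkAsync())
            {
                return true;
            }

            // Wait between attempts, but not after the last one.
            if (attempt < maxAttempts)
            {
                await Task.Delay(delay);
            }
        }
        return false;
    }
}
```

In the sample, the check would fetch the operation URL and report whether the status is final; a false return value then tells the app that the analysis did not finish within the allowed number of attempts.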

That is a simple example. If you run into questions about the HTTP response content during development, see the <Emotion API> documentation.

[Note]

The Emotion API is not entirely free. If you need more from the service, see <Pricing options>.

======
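The Emotion API has many possible uses, for example: an app that analyzes the expressions of passers-by in real time and compiles daily statistics for the people who walk down a street, or one that shows a matching color or plays matching music based on the expression.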

Give it a try.

2016/6/3
