One of the big trends in technology is Augmented Reality. Devices such as Google Project Tango or Microsoft HoloLens enable you to place 3D objects in the real world.
In this blog post we show how you can integrate Microsoft HoloLens with services like the Google Cloud Vision API or Microsoft Cognitive Services to annotate the real world around you. We will create a Unity project that uses the HoloLens camera to take pictures, which are then sent to one of the cloud services. The returned annotations are positioned in the real world based on the spatial mapping information provided by the HoloLens.

About the authors: David Dornseifer and Christoph Kraemer are developers in the SAP Engineering team, which is part of the SAP Innovation Center Network (ICN) based out of Palo Alto, CA. In our team we explore new technologies and bring them into a business context.


Before we get into the code, here is a short sketch of what we are building:

  1. Initialize the HoloLens camera and start taking pictures at a fixed interval. (Note: The HoloLens Camera API only allows you to either take pictures and process them or record a video and store it. There is no way to stream the images. Because of that, the final application only supports a low frame rate of about 2 annotation updates per second.)
  2. Take a photo. HoloLens automatically remembers the position in the scene where the photo was taken. Copy the image data into a buffer.
  3. Prepare the request for the selected cloud provider and send it. Then parse the JSON response.
  4. Use the position where the photo was taken to do a raycast against the spatial mapping mesh, and place the annotation marker at the hit point.


You can check out the source code of this project here.

Setting up the Unity Scene

  1. Import the HoloToolkit into your Unity project. You will only need the Main Camera prefab and the Spatial Mapping Collider, so it is fine if you only import those specific assets.
  2. Import SimpleJSON.cs into your assets.
  3. Add a custom layer SpatialMapping to your Unity project.
  4. Replace the default Main Camera with the one from the HoloToolkit.
  5. Create an empty GameObject, rename it to “SpatialMapping”, and add the Spatial Mapping Collider component to it. Then change its Physics Layer to SpatialMapping, set the Level of Detail to High, and decrease the Time Between Updates to 1 second.
  6. Create another empty GameObject which we will use later to add the custom script for the backend communication and annotation handling. Rename it to “AnnotationHandling”.
  7. Create another empty GameObject which we will use later to act as a parent for the GameObjects we generate to show the annotations. Rename it to “Annotations”.
  8. Save the scene.


Project Setup

Creating the Main Script

Create a new C# script and name it AnnotationHandling.cs. Add this script to the AnnotationHandling GameObject.
First we need to set up some public properties. We need to specify the cloud provider we want to use as well as the corresponding API key. The other settings are the layer we want to use for raycasting, the parent under which we manage all annotations, the GameObject we instantiate for each annotation we show, and the picture interval.

using UnityEngine;
using UnityEngine.Networking;
using UnityEngine.VR.WSA.WebCam;
using SimpleJSON;
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;

/// <summary>
/// Handle taking pictures, send them to the backend and process the
/// results to display annotations in the real world.
/// </summary>
public class AnnotationHandling : MonoBehaviour {
    //GoogleCloudPlatform is used later in this post; the name of the
    //second member is assumed here, check the project source for the original
    public enum CloudProvider { GoogleCloudPlatform, MicrosoftCognitiveServices }
    public CloudProvider m_cloudProvider;
    public string m_apiKey;
    public float m_pictureInterval;
    public LayerMask m_raycastLayer;
    public GameObject m_annotationParent;
    public GameObject m_annotationTemplate;

    //The methods from the following sections go here
}


Properties set in the inspector pane

Creating an Annotation Visualization

We want to visualize annotations in the real world, so we need an object that represents them in the scene. Most annotations come with coordinates that describe where they are positioned in the image. In this demo we simply show a frame around these coordinates. We created a simple frame in Blender and imported it into our scene. You can find the frame in the source code.


Frame we use to visualize the annotations

Create a prefab with this frame and drag it onto the annotation template property of our annotation handling script.

Taking Photos

Next we want to use the camera to take pictures. First we add two private fields to our class to hold the camera's resolution and the photo capture object. Then we can implement the Start method.

//Camera resolution and capture object shared by the methods below
private Resolution m_cameraResolution;
private PhotoCapture m_photoCapture;

void Start () {
    //Get the highest resolution
    m_cameraResolution = PhotoCapture.SupportedResolutions.OrderByDescending((res) => res.width * res.height).First();
    PhotoCapture.CreateAsync(false, delegate (PhotoCapture captureObject) {
          //Assign capture object
          m_photoCapture = captureObject;
          //Configure camera
          CameraParameters cameraParameters = new CameraParameters();
          cameraParameters.hologramOpacity = 0.0f;
          cameraParameters.cameraResolutionWidth = m_cameraResolution.width;
          cameraParameters.cameraResolutionHeight = m_cameraResolution.height;
          cameraParameters.pixelFormat = CapturePixelFormat.JPEG;
          //Start the photo mode and start taking pictures
          m_photoCapture.StartPhotoModeAsync(cameraParameters,  false, delegate(PhotoCapture.PhotoCaptureResult result) {
              Debug.Log("Photo Mode started");
              InvokeRepeating("ExecutePictureProcess", 0, m_pictureInterval);
          });
    });
}

We also need to make sure to clean up everything when the application exits.

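void OnDestroy() {
    //Nothing to stop if the capture object was never created
    if (m_photoCapture == null) return;
    m_photoCapture.StopPhotoModeAsync(delegate (PhotoCapture.PhotoCaptureResult res) {
          m_photoCapture = null;
          Debug.Log("Photo Mode stopped");
    });
}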

In the code above, the call InvokeRepeating("ExecutePictureProcess", ...) schedules the function ExecutePictureProcess to run at a fixed interval. So, let’s implement it.

void ExecutePictureProcess() {
    if (m_photoCapture != null) {
          //Take a picture
          m_photoCapture.TakePhotoAsync(delegate (PhotoCapture.PhotoCaptureResult result, PhotoCaptureFrame photoCaptureFrame) {
              List<byte> buffer = new List<byte>();
              Matrix4x4 cameraToWorldMatrix;
              //Copy the image data into the buffer
              photoCaptureFrame.CopyRawImageDataIntoBuffer(buffer);
              //Check if we can receive the position
              //where the photo was taken
              if (!photoCaptureFrame.TryGetCameraToWorldMatrix(out cameraToWorldMatrix)) {
                  return;
              }
              //Start a coroutine to handle the server request
              StartCoroutine(UploadAndHandlePhoto(buffer.ToArray(), cameraToWorldMatrix));
          });
    }
}

Sending the Server Request

Now we need to send the picture to one of our cloud providers and prepare the request accordingly. Google Cloud Platform (GCP) expects the image as a base64-encoded string inside the JSON request. Microsoft Cognitive Services expects us to send the request as multipart/form-data. We create a function that builds the UnityWebRequest based on the user settings. This example shows the preparation for GCP. You can find the code for Microsoft in the source code.

UnityWebRequest CreateRequest(byte[] photo) {
    DownloadHandler download = new DownloadHandlerBuffer();
    if (m_cloudProvider == CloudProvider.GoogleCloudPlatform) {
          string base64image = Convert.ToBase64String(photo);
          string json = "{\"requests\": [{\"image\": {\"content\": \"" + base64image + "\"},\"features\": [{\"type\": \"FACE_DETECTION\",\"maxResults\": 5}]}]}";
          byte[] content = Encoding.UTF8.GetBytes(json);
          UploadHandler upload = new UploadHandlerRaw(content);
          string url = "https://vision.googleapis.com/v1/images:annotate?key=" + m_apiKey;
          UnityWebRequest www = new UnityWebRequest(url,"POST", download, upload);
          www.SetRequestHeader("Content-Type", "application/json");
          return www;
    }  else  {
          //Prepare Microsoft request (see the source code)
          //Placeholder so that this excerpt compiles on its own
          throw new NotImplementedException("Microsoft request is omitted in this excerpt");
    }
}
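
Step 3 of the sketch also includes parsing the JSON response, which this post does not show. The following is a minimal sketch of how that could look with SimpleJSON, assuming the Cloud Vision face detection response layout (responses[0].faceAnnotations[i].boundingPoly.vertices, each vertex with x and y in image pixels). The class FaceResponseParser and the method ParseFaceBounds are hypothetical names for illustration and are not taken from the project source.

```csharp
using System.Collections.Generic;
using SimpleJSON;
using UnityEngine;

// Hypothetical helper, not part of the original project.
public static class FaceResponseParser
{
    // Returns one array of corner points (in image pixels) per detected face.
    public static List<Vector2[]> ParseFaceBounds(string json)
    {
        List<Vector2[]> faces = new List<Vector2[]>();
        JSONNode root = JSON.Parse(json);
        if (root == null)
        {
            return faces; // Empty or unparsable response
        }

        // Assumed layout: responses[0].faceAnnotations[i].boundingPoly.vertices[j]
        JSONArray annotations = root["responses"][0]["faceAnnotations"].AsArray;
        if (annotations == null)
        {
            return faces; // No faces reported
        }

        for (int i = 0; i < annotations.Count; i++)
        {
            JSONArray vertices = annotations[i]["boundingPoly"]["vertices"].AsArray;
            if (vertices == null || vertices.Count == 0)
            {
                continue;
            }

            Vector2[] corners = new Vector2[vertices.Count];
            for (int j = 0; j < vertices.Count; j++)
            {
                // A coordinate that is missing from the JSON is read as 0
                corners[j] = new Vector2(vertices[j]["x"].AsFloat, vertices[j]["y"].AsFloat);
            }
            faces.Add(corners);
        }
        return faces;
    }
}
```

Inside the UploadAndHandlePhoto coroutine, the response text (www.downloadHandler.text) would be passed to this helper once the request has completed without an error.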

Creating the Annotation Frames

We always need to create the frames that visualize the annotations relative to the location where the photo was taken. In our project we use a second camera instance, which we move to the position where the photo was taken and then use for the raycast.
First we need to create the camera. In Unity, use the Create button in the Hierarchy to create a new Camera element as a child of the AnnotationHandling GameObject. Do not tag it as MainCamera.
Using the cameraToWorldMatrix we can position this camera.

Vector3 position = cameraToWorldMatrix.MultiplyPoint(Vector3.zero);
Quaternion rotation = Quaternion.LookRotation(-cameraToWorldMatrix.GetColumn(2), cameraToWorldMatrix.GetColumn(1));
Camera raycastCamera = this.gameObject.GetComponentInChildren<Camera>();
raycastCamera.transform.position = position;
raycastCamera.transform.rotation = rotation;

The next thing we do is find the center of each detected annotation and do a raycast at this position. This tells us the distance of the object from the camera. We use the ScreenPointToRay function for that. Be aware that the coordinates used for ScreenPointToRay are based on the screen size, and the image does not necessarily have the same pixel size. We need to scale the coordinates accordingly.

For each of the four corners of the rectangle reported by the cloud service a Vector is created. Then the center is calculated. We use this to create the ray.

Vector3 topLeft = CalcTopLeftVector(face);
Vector3 topRight = CalcTopRightVector(face);
Vector3 bottomRight = CalcBottomRightVector(face);
Vector3 bottomLeft = CalcBottomLeftVector(face);
Vector3 raycastPoint = (topLeft + topRight + bottomRight + bottomLeft) / 4;
Ray ray = raycastCamera.ScreenPointToRay(raycastPoint);
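
The Calc...Vector helpers are not shown in this post. The following is a minimal sketch of the scaling step they need to perform, under two assumptions: the cloud service reports pixel coordinates with the origin in the top-left corner of the photo, and ScreenPointToRay expects screen coordinates with the origin in the bottom-left corner. The class and method names are hypothetical, and the sketch uses plain floats so that it can be tried outside of Unity. It also assumes that the raycast camera covers the same field of view as the photo.

```csharp
using System;

// Hypothetical helper, not part of the original project.
public static class ImageToScreen
{
    // Maps a pixel in the photo (origin top-left, y pointing down) to
    // screen coordinates (origin bottom-left, y pointing up).
    public static void ToScreenPoint(
        float imageX, float imageY,
        int imageWidth, int imageHeight,
        int screenWidth, int screenHeight,
        out float screenX, out float screenY)
    {
        if (imageWidth <= 0 || imageHeight <= 0)
        {
            throw new ArgumentException("Image size must be positive.");
        }

        // Scale horizontally from image width to screen width
        screenX = imageX * screenWidth / imageWidth;
        // Flip the vertical axis, then scale to the screen height
        screenY = (imageHeight - imageY) * screenHeight / imageHeight;
    }
}
```

In the script, the image size would come from m_cameraResolution and the screen size from the raycast camera's pixelWidth and pixelHeight. The two results would then be wrapped in a Vector3 with z set to 0 before being passed to ScreenPointToRay.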

With this information we can execute the raycast and use the information to position the annotation:

RaycastHit centerHit;
if (Physics.Raycast(ray, out centerHit, 15.0f, m_raycastLayer)) {
    GameObject go = Instantiate(m_annotationTemplate) as GameObject;
    go.transform.rotation = Quaternion.LookRotation(Camera.main.transform.forward, Vector3.up);
    go.transform.position = centerHit.point;
    //The scaling code from the next step also belongs inside this block
}

There is one more thing we have to add. At the moment the annotation always has the same size. We want to adjust it based on the result reported by the backend. Therefore we create three more rays through the corners of the rectangle and measure the distances between them at the distance of the center hit point.

Ray topLeftRay = raycastCamera.ScreenPointToRay(topLeft);
Ray topRightRay = raycastCamera.ScreenPointToRay(topRight);
Ray bottomLeftRay = raycastCamera.ScreenPointToRay(bottomLeft);
float distance = centerHit.distance;
float goScaleX = Vector3.Distance(topLeftRay.GetPoint(distance), topRightRay.GetPoint(distance));
float goScaleY = Vector3.Distance( topLeftRay.GetPoint(distance), bottomLeftRay.GetPoint(distance));
go.transform.localScale = new Vector3(0.1f, goScaleX, goScaleY);

Wrap Up

That’s it. Build the application and deploy it to your HoloLens. When you look at somebody, a frame is drawn directly around their face.

The code for this project is available to download. We would appreciate it if you reported any bugs you find.
