Localizing the AR Cloud with Replay Data

Hiroyuki Makino
Feb 24, 2021 · 5 min read

- Restoring position with recorded AR data for use in testing/debugging -

In my previous article, I introduced an approach to testing AR by playing back recorded AR-related data in the editor to make testing and debugging more efficient. The other day, x garden, a creative co-creation company in the XR industry that supports XR media, new businesses, and the creator community, tried out this technology and evaluated how much it improved testing efficiency in AR development. They reported that AR verification, which used to take about 270 seconds per item, now takes about 20 seconds, a reduction in man-hours of more than 90%. Because they had many kinds of AR items to handle and the number of test items kept growing, being able to run many tests without building and deploying to a device made a real difference.
Now, I’m going to take a closer look at replay data to see whether we can test AR at a larger scale, for example when AR content is permanently placed at a specific location.

To permanently place AR content at a specific location, we use AR cloud technology.

What is AR Cloud?
To quote Ori Inbar, a well-known AR expert, the definition of an AR cloud is:
1) A persistent point cloud aligned with real world coordinates — a shared soft-copy of the world
2) The ability to instantly localize (align the world’s soft-copy with the world itself) from anywhere and on multiple devices
3) The ability to place virtual content in the world’s soft-copy and interact with it in real-time, on-device and remotely

In other words, the AR cloud lets us collect and store information about a location, share it across multiple devices, and tie virtual content to it for management.

An AR application that uses the AR cloud is developed and used in the following flow (see the sketch after this list):
- Creating a map — gathering spatial information about the location
- Placing contents — placing content in the map
- Localization — matching incoming spatial information against the map to align the device with it
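As a rough illustration of this flow, here is a minimal sketch in pseudo-C#. The arCloud object and all of its methods are hypothetical stand-ins, not actual Immersal SDK calls:

// Hypothetical pseudo-API illustrating the three-phase AR cloud flow.
// None of these calls are real Immersal SDK methods.

// 1. Creating a map: scan the space and upload the gathered point cloud.
var map = arCloud.CreateMap(scannedPointCloud); // returns a map id

// 2. Placing contents: anchor virtual content in map coordinates.
arCloud.PlaceContent(map.id, contentPrefab, poseInMapCoordinates);

// 3. Localization: match the current camera image against the stored map
//    to recover the device pose in map coordinates, then show the content.
if (arCloud.Localize(map.id, cameraImage, out Pose devicePoseInMap))
{
    ShowContentRelativeTo(devicePoseInMap);
}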

In applications that use the AR cloud, content is tied to a location, so normally you would visit that location to check the AR behavior. For remote locations, going on site every time the code changes is not practical, so it is far more efficient to be able to test with data recorded in advance.
In this test, I feed replay data into the localization step and check whether localization succeeds and the placed content is restored.

I used the Immersal SDK for the AR cloud.
The Immersal SDK realizes an AR cloud with computer vision based on camera images, and it is built on AR Foundation, so I expected it to fit well with the recorded replay data.

Comparing localization with camera images on the device and with replay data in the editor

Implementation

I will implement it in the following steps.

Replay the recorded camera images and AR-related data (CameraIntrinsics) → Send requests to the Immersal REST API → Synchronize the pose (camera position and rotation)

The OnCameraFrameReceived() function of Reproducer.cs, which handles the AR data during playback as introduced in the previous article, is modified to send that data to Immersal’s Localizer API. The localizer is implemented to process the replayed images instead of retrieving frames from the device’s camera. (Note: after a successful localization, the coordinate transformation process is the same as in the ServerLocalizer of ARLocalizer in the Immersal SDK.)

void OnCameraFrameReceived(ARCameraFrameEventArgs args)
{
    // Look up the recorded camera image for this frame by its timestamp
    var cameraImagePath = dirPath + args.timestampNs.Value + ".jpg";
    if (File.Exists(cameraImagePath))
    {
        FileStream stream = File.OpenRead(cameraImagePath);
        var data = new byte[stream.Length];
        stream.Read(data, 0, (int)stream.Length);
        tex.LoadImage(data);

        // Throttle localization requests to one per m_LocalizationInterval
        float curTime = Time.unscaledTime;
        if (!arcloud.isLocalizing && (curTime - m_LastLocalizeTime) >= m_LocalizationInterval)
        {
            m_LastLocalizeTime = curTime;
            arcloud.LocalizeServer(data);
        }
        stream.Close();
    }
}

using UnityEngine;
using Immersal;
using Immersal.AR;
using Immersal.REST;
using static Immersal.AR.ARLocalizer;

public class ARCloud : LocalizerBase
{
    public event MapChanged OnMapChanged = null;
    public event PoseFound OnPoseFound = null;

    GameObject arcontents;
    MeshRenderer maprenderer;
    int mapId;

    public override void Start()
    {
        m_Sdk = ImmersalSDK.Instance;
        mapId = GameObject.Find("AR Map").GetComponent<ARMap>().serverMapId;

        // Hide the AR contents and the map mesh until localization succeeds
        arcontents = GameObject.Find("AR Contents");
        arcontents.SetActive(false);

        var armapmesh = GameObject.Find("AR MapMesh");
        if (armapmesh != null)
        {
            armapmesh.SetActive(false);
        }

        maprenderer = GameObject.Find("AR Map").GetComponent<ARMap>().m_MeshRenderer;
        maprenderer.enabled = false;
    }

    public async void LocalizeServer(byte[] pixels)
    {
        if (pixels != null)
        {
            JobLocalizeServerAsync j = new JobLocalizeServerAsync();
            j.mapIds = new SDKMapId[1];
            j.mapIds[0] = new SDKMapId();
            j.mapIds[0].id = mapId;

            Camera cam = Camera.main;
            Vector3 camPos = cam.transform.position;
            Quaternion camRot = cam.transform.rotation;
            j.rotation = camRot;
            j.position = camPos;
            ARHelper.GetIntrinsics(out j.intrinsics);

            // Enter the replayed camera image data in the parameter j.image of request j
            j.image = pixels;

            // [The request and response processing is the same as in the
            //  ServerLocalizer of the Immersal SDK, so it is omitted.]

            // Show the contents once localization has succeeded
            arcontents.SetActive(true);
            maprenderer.enabled = true;
        }
    }
}
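For reference, here is a minimal sketch of what the omitted step conceptually does. responsePos and responseRot are hypothetical names for the camera pose in map coordinates returned by the server; the Immersal SDK’s ServerLocalizer performs the equivalent transformation internally:

// Sketch of the coordinate alignment after a successful localization.
// camPos/camRot: the replayed (tracker-space) camera pose sent in the request.
// responsePos/responseRot: the same camera pose in map coordinates
// (hypothetical names for values taken from the server response).
Matrix4x4 trackerPose = Matrix4x4.TRS(camPos, camRot, Vector3.one);
Matrix4x4 mapPose = Matrix4x4.TRS(responsePos, responseRot, Vector3.one);

// Transform that carries map coordinates into tracker coordinates; applying
// it to the content root makes the contents appear where they were registered.
Matrix4x4 offset = trackerPose * mapPose.inverse;
arcontents.transform.SetPositionAndRotation(
    offset.GetColumn(3),
    Quaternion.LookRotation(offset.GetColumn(2), offset.GetColumn(1)));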

The camera intrinsics (the camera’s internal parameters, focalLength and principalPoint) required by the Localizer API vary from device to device, so they are recorded together with the rest of the AR data at recording time.

// Get camera intrinsics
cameraManager.TryGetIntrinsics(out cameraIntrinsics);

var packet = new ARKitRemotePacket()
{
    cameraFrame = new ARKitRemotePacket.CameraFrameEvent()
    {
        timestampNs = args.timestampNs.Value,
        projectionMatrix = args.projectionMatrix.Value,
        displayMatrix = args.displayMatrix.Value
    },
    cameraIntrinsics = new ARKitRemotePacket.CameraIntrinsics()
    {
        focalLength = cameraIntrinsics.focalLength,
        principalPoint = cameraIntrinsics.principalPoint,
        resolution = cameraIntrinsics.resolution
    }
};

On the playback side, TryGetIntrinsics() should return the camera’s focal length and principal point from the replayed AR data frame instead of from the device camera.

public override bool TryGetIntrinsics(out XRCameraIntrinsics cameraIntrinsics)
{
    var remote = ARKitReceiver.Instance;
    if (remote == null)
    {
        cameraIntrinsics = default(XRCameraIntrinsics);
        return false;
    }

    var remoteFrame = remote.CameraIntrinsics;
    if (remoteFrame == null)
    {
        cameraIntrinsics = default(XRCameraIntrinsics);
        return false;
    }

    // Return the intrinsics recorded with the replayed frame
    cameraIntrinsics = new XRCameraIntrinsics(
        remoteFrame.focalLength,
        remoteFrame.principalPoint,
        remoteFrame.resolution);

    return true;
}
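With this override in place, the intrinsics that the localizer sends come from the recording rather than from the editor. As a rough sketch of why this matters (assuming, as in the Immersal SDK, that ARHelper.GetIntrinsics reads through AR Foundation’s ARCameraManager):

// Sketch: the localizer's intrinsics query goes through the camera
// subsystem, which after the override serves the recorded device values.
ARCameraManager cameraManager = Camera.main.GetComponent<ARCameraManager>();
if (cameraManager.TryGetIntrinsics(out XRCameraIntrinsics intr))
{
    // Focal length (fx, fy) and principal point (ox, oy) of the recording
    // device, packed the way a localize request expects them
    Vector4 intrinsics = new Vector4(
        intr.focalLength.x, intr.focalLength.y,
        intr.principalPoint.x, intr.principalPoint.y);
}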

Result

When run in Play mode in the Unity Editor, the recorded AR data was replayed, localization succeeded within seconds, and the AR contents placed on the map were restored.
Building and running on the device produces similar results.

Localization results in the editor (left) and on the device after build and deploy (right)

Now we can use replay data to run unit tests, for example with the Unity Test Framework to check that content appears in the right place, without building and going on site every time the code changes. It also becomes possible to automate testing of large AR cloud applications with CI/CD tools.
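As an illustration, here is a minimal PlayMode test sketch using the Unity Test Framework. It relies on the fact that ARCloud enables the map’s MeshRenderer only after a successful localization; the scene name "ReplayLocalizationScene" and the expected position are hypothetical values:

using System.Collections;
using NUnit.Framework;
using UnityEngine;
using UnityEngine.SceneManagement;
using UnityEngine.TestTools;

public class ReplayLocalizationTests
{
    [UnityTest]
    public IEnumerator ReplayLocalizesAndRestoresContent()
    {
        // Hypothetical scene containing Reproducer, ARCloud, and the AR Map
        yield return SceneManager.LoadSceneAsync("ReplayLocalizationScene");

        // ARCloud enables the map's renderer only after localization succeeds
        var mapRenderer = GameObject.Find("AR Map").GetComponent<MeshRenderer>();
        float deadline = Time.realtimeSinceStartup + 30f;
        while (!mapRenderer.enabled && Time.realtimeSinceStartup < deadline)
            yield return null;

        Assert.IsTrue(mapRenderer.enabled, "Localization did not succeed in time");

        // The contents are re-activated on success and can be checked for pose
        var contents = GameObject.Find("AR Contents");
        Assert.IsNotNull(contents);

        // Expected position captured from a known-good run (hypothetical value)
        var expected = new Vector3(0f, 0f, 1.5f);
        Assert.Less(Vector3.Distance(contents.transform.position, expected), 0.1f);
    }
}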

Summary
By playing back recorded AR data, I confirmed that localization against the AR cloud works. This makes it possible to check, in the editor, the behavior of content tied to real-world positions, and it should make developing applications that interact with real space much more approachable.

In the future, to make it easier to author and develop content tied to the AR cloud, it will be worth combining deep learning inference, generating data for robust testing, and examining how the AR appears in order to improve test scenarios. Thanks for reading; I welcome your comments and ideas for creating the future of XR.


Hiroyuki Makino

XR Metaverse Researcher, R&D Engineer at NTT, Japan. Excited for the future of AR and what amazing people create.