How to Convert a Screen Point to Real-World Position Using Depth

Lightship's depth map output allows for dynamically placing objects in an AR scene without the use of planes or a mesh. This How-To covers the process of choosing a point on the screen and placing an object in 3D space by using the depth output.

[Figure: Placing Cubes with Depth]

Prerequisites

You will need a Unity project with Lightship AR enabled. For more information, see Installing ARDK 3.

Tip: If this is your first time using depth, Accessing and Displaying Depth Information provides a simpler use case for depth and is easier to start with.

Steps

If the main scene is not AR-ready, set it up:

  1. Remove the Main Camera.

  2. Add an ARSession and XROrigin to the Hierarchy, then add an AR Occlusion Manager Component to either of them. If you want higher quality occlusions, see How to Set Up Real-World Occlusion to learn how to use the LightshipOcclusionExtension.

    [Figures: AR Session and XR Origin; AR Occlusion Manager]
  3. Create a script that will handle depth picking and placing prefabs. Name it Depth_ScreenToWorldPosition.

  4. Add Camera Update Events:

    1. This step stores the most recent DisplayMatrix so that the depth texture can later be aligned with screen space.

    2. Add a serialized ARCameraManager and a private Matrix4x4 field to the top of the script.

      [SerializeField]
      private ARCameraManager _arCameraManager;

      private Matrix4x4 _displayMatrix;
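    3. Add an OnCameraFrameEventReceived handler and subscribe it to the ARCameraManager's frameReceived event to receive camera updates.

      private void OnEnable()
      {
          _arCameraManager.frameReceived += OnCameraFrameEventReceived;
      }

      private void OnDisable()
      {
          _arCameraManager.frameReceived -= OnCameraFrameEventReceived;
      }

      private void OnCameraFrameEventReceived(ARCameraFrameEventArgs args)
      {
          // Cache the screen to image transform
          if (args.displayMatrix.HasValue)
          {
      #if UNITY_IOS
              _displayMatrix = args.displayMatrix.Value.transpose;
      #else
              _displayMatrix = args.displayMatrix.Value;
      #endif
          }
      }
      Note: You must transpose the DisplayMatrix on iOS.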

  5. Collect Depth Images on Update

    1. Add a serialized AROcclusionManager and a private nullable XRCpuImage field.

      [SerializeField]
      private AROcclusionManager _occlusionManager;

      private XRCpuImage? _depthImage;
      Note: Depth images are surfaced from the machine learning model rotated 90 degrees, so they need to be sampled with respect to the display transform, which maps screen space to the ML coordinate system. XRCpuImage is used instead of a GPU texture so that the Sample(Vector2 uv, Matrix4x4 transform) method can run on the CPU.

    2. In the Update method:

      1. Check that the XROcclusionSubsystem is valid and running.
      2. Call _occlusionManager.TryAcquireEnvironmentDepthCpuImage to retrieve the latest depth image from the AROcclusionManager.
      3. Dispose the old depth image and cache the new value.
      void Update()
      {
          // Skip this frame if the occlusion subsystem is missing or not running
          if (_occlusionManager.subsystem == null || !_occlusionManager.subsystem.running)
          {
              return;
          }
          if (_occlusionManager.TryAcquireEnvironmentDepthCpuImage(out var image))
          {
              // Dispose the old image
              _depthImage?.Dispose();
              _depthImage = image;
          }
          else
          {
              return;
          }
      }
  6. Set up code to handle touch inputs:

    1. Create a private method named HandleTouch.
    2. In the editor, use Input.GetMouseButtonDown to detect mouse clicks.
    3. On a phone, use Input.GetTouch to detect touches.
    4. Then, get the 2D screenPosition coordinates from the device.
    private void HandleTouch()
    {
        // In the editor we want to use mouse clicks; on phones we want touches.
    #if UNITY_EDITOR
        if (Input.GetMouseButtonDown(0) || Input.GetMouseButtonDown(1) || Input.GetMouseButtonDown(2))
        {
            var screenPosition = new Vector2(Input.mousePosition.x, Input.mousePosition.y);
    #else
        // If there is no touch, do nothing
        if (Input.touchCount <= 0)
            return;
        var touch = Input.GetTouch(0);

        // Only count touches that just began
        if (touch.phase == UnityEngine.TouchPhase.Began)
        {
            var screenPosition = touch.position;
    #endif
            // do something with touches
        }
    }
  7. Convert touch points from the screen to 3D Coordinates using Depth

    1. In the HandleTouch method, check for a valid depth image when a touch is detected.

      // do something with touches
      if (_depthImage.HasValue)
      {
          // 1. Sample eye depth

          // 2. Get world position

          // 3. Spawn a thing on the depth map
      }
    2. Sample the depth image at the screenPosition to get the z-value

      // 1. Sample eye depth
      var uv = new Vector2(screenPosition.x / Screen.width, screenPosition.y / Screen.height);
      var eyeDepth = (float) _depthImage.Value.Sample(uv, _displayMatrix);
    3. Add a Camera field to the top of script:

      [SerializeField]
      private Camera _camera;
    4. In HandleTouch, call Unity's Camera.ScreenToWorldPoint function to convert screenPosition and eyeDepth to a world position.

      // 2. Get world position
      var worldPosition =
          _camera.ScreenToWorldPoint(new Vector3(screenPosition.x, screenPosition.y, eyeDepth));
    5. Spawn a GameObject at this location in world space:

    6. Add a GameObject field to the top of the script:

      [SerializeField]
      private GameObject _prefabToSpawn;
    7. Instantiate a copy of this prefab at this position:

      // 3. Spawn a thing on the depth map
      Instantiate(_prefabToSpawn, worldPosition, Quaternion.identity);
  8. Call HandleTouch at the end of the Update method. The completed HandleTouch method should look like this:

    private void HandleTouch()
    {
        // In the editor we want to use mouse clicks; on phones we want touches.
    #if UNITY_EDITOR
        if (Input.GetMouseButtonDown(0) || Input.GetMouseButtonDown(1) || Input.GetMouseButtonDown(2))
        {
            var screenPosition = new Vector2(Input.mousePosition.x, Input.mousePosition.y);
    #else
        // If there is no touch, do nothing
        if (Input.touchCount <= 0)
            return;
        var touch = Input.GetTouch(0);

        // Only count touches that just began
        if (touch.phase == UnityEngine.TouchPhase.Began)
        {
            var screenPosition = touch.position;
    #endif
            // do something with touches
            if (_depthImage.HasValue)
            {
                // Sample eye depth
                var uv = new Vector2(screenPosition.x / Screen.width, screenPosition.y / Screen.height);
                var eyeDepth = (float) _depthImage.Value.Sample(uv, _displayMatrix);

                // Get world position
                var worldPosition =
                    _camera.ScreenToWorldPoint(new Vector3(screenPosition.x, screenPosition.y, eyeDepth));

                // Spawn a thing on the depth map
                Instantiate(_prefabToSpawn, worldPosition, Quaternion.identity);
            }
        }
    }
  9. Add the Depth_ScreenToWorldPosition script as a Component of the XROrigin in the Hierarchy:

    1. In the Hierarchy window, select the XROrigin, then click Add Component in the Inspector.
    2. Search for the Depth_ScreenToWorldPosition script, then select it.
  10. Create a Cube to use as the object that will spawn into the scene:

    1. In the Hierarchy, right-click, then, in the Create menu, mouse over 3D Object and select Cube.
    2. Drag the new Cube object from the Hierarchy to the Assets window to create a prefab of it, then delete it from the Hierarchy. (The Cube in the Assets window should remain.)
  11. Assign the fields in the Depth_ScreenToWorldPosition script:

    1. In the Hierarchy window, select the XROrigin, then expand the Depth_ScreenToWorldPosition Component in the Inspector window.
    2. Assign the XROrigin (or whichever GameObject holds the AR Occlusion Manager Component) to the AROcclusionManager field.
    3. Assign the Main Camera to the Camera field.
    4. Assign the Main Camera to the ARCameraManager field.
    5. Assign your new Cube prefab to the Prefab to Spawn field.
    [Figure: Depth_ScreenToWorldPosition editor properties]
  12. Try running the scene in-editor using Playback or open Build Settings, then click Build and Run to build to device.

    [Figure: Placing Cubes with Depth]
  13. If something did not work, double check the steps above and compare your script to the one below.

The Depth_ScreenToWorldPosition script:

using Niantic.Lightship.AR.Utilities;
using UnityEngine;
using UnityEngine.XR.ARFoundation;
using UnityEngine.XR.ARSubsystems;

public class Depth_ScreenToWorldPosition : MonoBehaviour
{
    [SerializeField]
    private AROcclusionManager _occlusionManager;

    [SerializeField]
    private ARCameraManager _arCameraManager;

    [SerializeField]
    private Camera _camera;

    [SerializeField]
    private GameObject _prefabToSpawn;

    private Matrix4x4 _displayMatrix;
    private XRCpuImage? _depthImage;

    private void OnEnable()
    {
        _arCameraManager.frameReceived += OnCameraFrameEventReceived;
    }

    private void OnDisable()
    {
        _arCameraManager.frameReceived -= OnCameraFrameEventReceived;
    }

    private void OnCameraFrameEventReceived(ARCameraFrameEventArgs args)
    {
        // Cache the screen to image transform
        if (args.displayMatrix.HasValue)
        {
#if UNITY_IOS
            _displayMatrix = args.displayMatrix.Value.transpose;
#else
            _displayMatrix = args.displayMatrix.Value;
#endif
        }
    }

    void Update()
    {
        // Skip this frame if the occlusion subsystem is missing or not running
        if (_occlusionManager.subsystem == null || !_occlusionManager.subsystem.running)
        {
            return;
        }
        if (_occlusionManager.TryAcquireEnvironmentDepthCpuImage(out var image))
        {
            // Dispose the old image
            _depthImage?.Dispose();
            _depthImage = image;
        }
        else
        {
            return;
        }

        HandleTouch();
    }

    private void HandleTouch()
    {
        // In the editor we want to use mouse clicks; on phones we want touches.
#if UNITY_EDITOR
        if (Input.GetMouseButtonDown(0) || Input.GetMouseButtonDown(1) || Input.GetMouseButtonDown(2))
        {
            var screenPosition = new Vector2(Input.mousePosition.x, Input.mousePosition.y);
#else
        // If there is no touch, do nothing
        if (Input.touchCount <= 0)
            return;
        var touch = Input.GetTouch(0);

        // Only count touches that just began
        if (touch.phase == UnityEngine.TouchPhase.Began)
        {
            var screenPosition = touch.position;
#endif
            // do something with touches
            if (_depthImage.HasValue)
            {
                // Sample eye depth
                var uv = new Vector2(screenPosition.x / Screen.width, screenPosition.y / Screen.height);
                var eyeDepth = (float) _depthImage.Value.Sample(uv, _displayMatrix);

                // Get world position
                var worldPosition =
                    _camera.ScreenToWorldPoint(new Vector3(screenPosition.x, screenPosition.y, eyeDepth));

                // Spawn a thing on the depth map
                Instantiate(_prefabToSpawn, worldPosition, Quaternion.identity);
            }
        }
    }
}
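
For intuition about what Camera.ScreenToWorldPoint does with the sampled eye depth, the following is a minimal sketch in plain C# (System.Numerics types rather than Unity types) that unprojects a screen point into camera space. It assumes an ideal pinhole camera with a symmetric frustum and no lens distortion. The PinholeUnprojection class, its ScreenToCameraPoint method, and all parameter names are hypothetical and are not part of Unity or Lightship. Unity's own method additionally applies the camera's pose, so it returns a world-space point rather than a camera-space one.

```csharp
using System;
using System.Numerics;

// Hypothetical illustration only; not a Unity or Lightship API.
public static class PinholeUnprojection
{
    // Returns the camera-space point seen at a screen pixel, given the eye depth
    // (distance along the camera's forward axis, in the same units as the result).
    // Assumes the screen origin is the bottom-left corner, as in Unity screen space.
    public static Vector3 ScreenToCameraPoint(
        Vector2 screenPosition,
        float eyeDepth,
        float screenWidth,
        float screenHeight,
        float verticalFovDegrees)
    {
        // Map pixel coordinates to [-1, 1] with the origin at the screen center.
        float ndcX = 2f * screenPosition.X / screenWidth - 1f;
        float ndcY = 2f * screenPosition.Y / screenHeight - 1f;

        // Half-height of the frustum at unit distance; width follows from the aspect ratio.
        float tanHalfFov = MathF.Tan(verticalFovDegrees * MathF.PI / 360f);
        float aspect = screenWidth / screenHeight;

        // Scale the unit-distance offsets by the eye depth; z is the eye depth itself.
        return new Vector3(
            ndcX * tanHalfFov * aspect * eyeDepth,
            ndcY * tanHalfFov * eyeDepth,
            eyeDepth);
    }
}
```

In this sketch, the center of a 1080 x 1920 screen at an eye depth of 2 maps to (0, 0, 2) in camera space regardless of the field of view, and points farther from the center are pushed outward in proportion to the depth. This is why the z component passed to ScreenToWorldPoint needs to be eye depth rather than the length of the ray to the surface.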

More Information

You can also try combining this How-To with ObjectDetection or Semantics to know where things are in 3D space.

[Figure: Placing Cubes with Depth and Object Detection]