VR Multicopter Teleoperation

Third-person VR control system for aerial robots with real-time SLAM

Lead Engineer 2025
WebXR Three.js ROS2 SLAM Jetson Orin ZED

First-person-view (FPV) drone piloting suffers from limited spatial awareness - operators can't see the drone's relationship to nearby obstacles. This leads to collisions, especially in confined indoor environments.

The question: can a third-person virtual reality perspective improve obstacle clearance without slowing the operator down?

Real-time 3D virtual environment mirroring the physical space

Built a system that creates a real-time 3D virtual environment mirroring the physical space. The operator sees the drone from a third-person camera angle in VR, gaining spatial context that FPV cannot provide.

Architecture: ZED stereo camera → real-time SLAM → 3D environment reconstruction → ROS2 bridge → WebXR on Meta Quest 3. A low-latency WebXR-to-ROS2 control bridge runs on an NVIDIA Jetson Orin NX, which handles SLAM, ROS2, and WebSocket streaming simultaneously.
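The control leg of this pipeline can be sketched as a JSON message sent from the headset to the ROS2 side over WebSocket. This is a minimal illustration assuming a rosbridge-style protocol; the topic name and message shape are assumptions, not taken from the project:

```javascript
// Hypothetical WebXR→ROS2 control message, assuming a rosbridge-style
// JSON protocol over WebSocket. "/cmd_vel" and the Twist-like layout
// are illustrative, not the project's actual topic.
function makeTwistPublish(topic, linear, angular) {
  return JSON.stringify({
    op: "publish",
    topic,
    msg: {
      linear:  { x: linear[0],  y: linear[1],  z: linear[2]  },
      angular: { x: angular[0], y: angular[1], z: angular[2] },
    },
  });
}

// Usage (ws is a WebSocket connected to the bridge on the Jetson):
// ws.send(makeTwistPublish("/cmd_vel", [0.5, 0, 0.2], [0, 0, 0.1]));
```

Keeping the wire format to small JSON messages like this is one way a single Jetson can serve SLAM, ROS2, and streaming at once without the control path becoming the bottleneck.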

VR teleoperation system architecture diagram showing ZED camera, SLAM pipeline, ROS2 bridge, and WebXR interface
Multicopter hardware platform with ZED stereo camera and Jetson Orin NX onboard computer
WebXR over native Unity VR

Native VR applications require app store submissions, per-headset SDK integration, and build pipelines. WebXR runs in any headset browser - the same codebase works on Meta Quest 3, Quest Pro, and future devices without modification.

WHY → runs on any headset with a browser, no app store deployment
Third-person perspective with adjustable offset

A fixed third-person camera forces operators to accept a single vantage point that may not suit their spatial reasoning style. An adjustable camera offset - configurable in real time via controller input - lets each operator dial in the viewing angle that feels most natural.

WHY → lets operators find their preferred viewing angle
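The adjustable offset reduces to placing the camera on a sphere around the drone. A minimal sketch of that math, assuming the operator tunes azimuth, elevation, and distance via the controller (parameter names are illustrative):

```javascript
// Position a third-person camera on a sphere around the drone.
// azimuth/elevation in radians, distance in meters; all three are
// assumed to be driven by controller input in real time.
function thirdPersonCamera(dronePos, azimuth, elevation, distance) {
  return {
    x: dronePos.x + distance * Math.cos(elevation) * Math.sin(azimuth),
    y: dronePos.y + distance * Math.sin(elevation),
    z: dronePos.z + distance * Math.cos(elevation) * Math.cos(azimuth),
  };
}

// thirdPersonCamera({x: 0, y: 0, z: 0}, 0, 0, 2) → camera 2m behind
// the drone along +z; raising elevation lifts the vantage point.
```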
NVIDIA Jetson Orin NX as bridge computer

The Jetson Orin NX provides enough GPU compute to run ZED SDK SLAM, process point clouds, serve the WebSocket stream, and host the ROS2 control stack concurrently - tasks that would overwhelm a standard embedded board and require an external workstation if done separately.

WHY → handles SLAM, ROS2, and WebSocket streaming simultaneously

Key features

Operator wearing Meta Quest 3 headset piloting the multicopter via the VR teleoperation system
Real-Time SLAM Pipeline
ZED stereo camera generates dense 3D point clouds and tracks drone pose at 30Hz. The environment mesh is progressively refined as the drone explores and is streamed live to the VR client so the virtual world stays in sync with the physical one.
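Streaming a dense point cloud at 30Hz to a headset typically requires thinning it first. A hedged sketch of one common approach, voxel-grid decimation (one point kept per voxel; the voxel size is an assumption, not the project's value):

```javascript
// Voxel-grid decimation before streaming points to the VR client:
// bucket each point by its voxel index and keep the first point seen
// in each voxel. Points are [x, y, z] arrays in meters.
function voxelDownsample(points, voxel = 0.05) {
  const seen = new Map();
  for (const p of points) {
    const key =
      Math.floor(p[0] / voxel) + "," +
      Math.floor(p[1] / voxel) + "," +
      Math.floor(p[2] / voxel);
    if (!seen.has(key)) seen.set(key, p);
  }
  return [...seen.values()];
}
```

Decimating on the Jetson keeps both the WebSocket bandwidth and the headset's render load bounded as the reconstructed mesh grows.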
Low-Latency VR Control
The WebXR application renders the virtual environment and drone model using Three.js. Controller inputs map to velocity commands sent back through the ROS2 bridge via WebSocket. End-to-end latency is kept under 50ms - imperceptible to most operators.
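The controller-to-velocity mapping can be illustrated with a simple deadzone-and-scale function; the deadzone and max-speed values here are illustrative assumptions, not the project's tuning:

```javascript
// Map a thumbstick axis value in [-1, 1] to a velocity command.
// A deadzone suppresses stick drift; output ramps smoothly from 0
// just past the deadzone up to maxSpeed (m/s). Values are assumed.
function axisToVelocity(axis, deadzone = 0.1, maxSpeed = 1.5) {
  if (Math.abs(axis) < deadzone) return 0;
  return Math.sign(axis) * maxSpeed *
    (Math.abs(axis) - deadzone) / (1 - deadzone);
}
```

Sampling the axes each XR animation frame and sending the resulting command over the already-open WebSocket keeps the control path's contribution to the sub-50ms latency budget small.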
Comparative Evaluation
Formal user study comparing FPV vs TPV across identical obstacle courses. Measured clearance distances, task completion time, and collision frequency - providing quantitative evidence for the spatial awareness advantage of the third-person perspective.
Physical test environment with obstacle course used for the comparative FPV vs TPV user study

By the numbers

Bar chart comparing minimum obstacle clearance distance between FPV and TPV conditions
Bar chart comparing task completion time between FPV and TPV conditions
+0.20m Obstacle Clearance Improvement
0 Increase in Task Time
<50ms End-to-End Latency
6-DOF Full Control
arXiv Published Research Paper

The 0.20m clearance improvement may sound modest, but for a micro aerial vehicle in a confined lab, it's the difference between a clean pass and a collision. What surprised me most was that operators didn't slow down - the spatial context from TPV let them plan paths more efficiently, compensating for the overhead of processing a richer visual stream.

The WebXR decision was controversial early on but proved right - it made the system headset-agnostic and dramatically simplified deployment compared to building native VR applications.