VR Multicopter Teleoperation

Third-person VR control system for aerial robots with real-time SLAM

Lead Engineer 2025
WebXR Three.js ROS2 SLAM Jetson Orin ZED

First-person-view (FPV) drone piloting suffers from limited spatial awareness - operators can't see the drone's relationship to nearby obstacles. This leads to collisions, especially in confined indoor environments.

The question: can a third-person virtual reality perspective improve obstacle clearance without slowing the operator down?

Real-time 3D virtual environment mirroring the physical space

Built a system that creates a real-time 3D virtual environment mirroring the physical space. The operator sees the drone from a third-person camera angle in VR, gaining spatial context that FPV cannot provide.

Architecture: ZED stereo camera → real-time SLAM → 3D environment reconstruction → ROS2 bridge → WebXR on Meta Quest 3. A low-latency WebXR-to-ROS2 control bridge runs on an NVIDIA Jetson Orin NX, which handles SLAM, ROS2, and WebSocket streaming simultaneously.
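The control leg of this pipeline can be sketched as a JSON message sent from the headset to the ROS2 side over WebSocket. This is a minimal illustration assuming a rosbridge-style protocol; the topic name and message shape are assumptions, not taken from the project:

```javascript
// Hypothetical WebXR→ROS2 control message, assuming a rosbridge-style
// JSON protocol over WebSocket. "/cmd_vel" and the Twist-like layout
// are illustrative, not the project's actual topic.
function makeTwistPublish(topic, linear, angular) {
  return JSON.stringify({
    op: "publish",
    topic,
    msg: {
      linear:  { x: linear[0],  y: linear[1],  z: linear[2]  },
      angular: { x: angular[0], y: angular[1], z: angular[2] },
    },
  });
}

// Usage (ws is a WebSocket connected to the bridge on the Jetson):
// ws.send(makeTwistPublish("/cmd_vel", [0.5, 0, 0.2], [0, 0, 0.1]));
```

Keeping the wire format to small JSON messages like this is one way a single Jetson can serve SLAM, ROS2, and streaming at once without the control path becoming the bottleneck.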

VR teleoperation system architecture diagram showing ZED camera, SLAM pipeline, ROS2 bridge, and WebXR interface
Multicopter hardware platform with ZED stereo camera and Jetson Orin NX onboard computer
WebXR over native Unity VR

Native VR applications require app store submissions, per-headset SDK integration, and build pipelines. WebXR runs in any headset browser - the same codebase works on Meta Quest 3, Quest Pro, and future devices without modification.

WHY → runs on any headset with a browser, no app store deployment
Third-person perspective with adjustable offset

A fixed third-person camera forces operators to accept a single vantage point that may not suit their spatial reasoning style. An adjustable camera offset - configurable in real time via controller input - lets each operator dial in the viewing angle that feels most natural.

WHY → lets operators find their preferred viewing angle
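The adjustable offset reduces to placing the camera on a sphere around the drone. A minimal sketch of that math, assuming the operator tunes azimuth, elevation, and distance via the controller (parameter names are illustrative):

```javascript
// Position a third-person camera on a sphere around the drone.
// azimuth/elevation in radians, distance in meters; all three are
// assumed to be driven by controller input in real time.
function thirdPersonCamera(dronePos, azimuth, elevation, distance) {
  return {
    x: dronePos.x + distance * Math.cos(elevation) * Math.sin(azimuth),
    y: dronePos.y + distance * Math.sin(elevation),
    z: dronePos.z + distance * Math.cos(elevation) * Math.cos(azimuth),
  };
}

// thirdPersonCamera({x: 0, y: 0, z: 0}, 0, 0, 2) → camera 2m behind
// the drone along +z; raising elevation lifts the vantage point.
```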
NVIDIA Jetson Orin NX as bridge computer

The Jetson Orin NX provides enough GPU compute to run ZED SDK SLAM, process point clouds, serve the WebSocket stream, and host the ROS2 control stack concurrently - tasks that would overwhelm a standard embedded board and require an external workstation if done separately.

WHY → handles SLAM, ROS2, and WebSocket streaming simultaneously

Key features

Operator wearing Meta Quest 3 headset piloting the multicopter via the VR teleoperation system
Real-Time SLAM Pipeline
ZED stereo camera generates dense 3D point clouds and tracks drone pose at 30Hz. The environment mesh is progressively refined as the drone explores and is streamed live to the VR client so the virtual world stays in sync with the physical one.
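Streaming a dense point cloud at 30Hz to a headset typically requires thinning it first. A hedged sketch of one common approach, voxel-grid decimation (one point kept per voxel; the voxel size is an assumption, not the project's value):

```javascript
// Voxel-grid decimation before streaming points to the VR client:
// bucket each point by its voxel index and keep the first point seen
// in each voxel. Points are [x, y, z] arrays in meters.
function voxelDownsample(points, voxel = 0.05) {
  const seen = new Map();
  for (const p of points) {
    const key =
      Math.floor(p[0] / voxel) + "," +
      Math.floor(p[1] / voxel) + "," +
      Math.floor(p[2] / voxel);
    if (!seen.has(key)) seen.set(key, p);
  }
  return [...seen.values()];
}
```

Decimating on the Jetson keeps both the WebSocket bandwidth and the headset's render load bounded as the reconstructed mesh grows.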
Low-Latency VR Control
The WebXR application renders the virtual environment and drone model using Three.js. Controller inputs map to velocity commands sent back through the ROS2 bridge via WebSocket. End-to-end latency is kept under 50ms - imperceptible to most operators.
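The controller-to-velocity mapping can be illustrated with a simple deadzone-and-scale function; the deadzone and max-speed values here are illustrative assumptions, not the project's tuning:

```javascript
// Map a thumbstick axis value in [-1, 1] to a velocity command.
// A deadzone suppresses stick drift; output ramps smoothly from 0
// just past the deadzone up to maxSpeed (m/s). Values are assumed.
function axisToVelocity(axis, deadzone = 0.1, maxSpeed = 1.5) {
  if (Math.abs(axis) < deadzone) return 0;
  return Math.sign(axis) * maxSpeed *
    (Math.abs(axis) - deadzone) / (1 - deadzone);
}
```

Sampling the axes each XR animation frame and sending the resulting command over the already-open WebSocket keeps the control path's contribution to the sub-50ms latency budget small.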
Comparative Evaluation
Formal user study comparing FPV vs TPV across identical obstacle courses. Measured clearance distances, task completion time, and collision frequency - providing quantitative evidence for the spatial awareness advantage of the third-person perspective.
Physical test environment with obstacle course used for the comparative FPV vs TPV user study

By the numbers

Bar chart comparing minimum obstacle clearance distance between FPV and TPV conditions
Bar chart comparing task completion time between FPV and TPV conditions
+0.20m Obstacle Clearance Improvement
0 Increase in Task Time
<50ms End-to-End Latency
6-DOF Full Control
arXiv Published Research Paper

The 0.20m clearance improvement may sound modest, but for a micro aerial vehicle in a confined lab, it's the difference between a clean pass and a collision. What surprised me most was that operators didn't slow down - the spatial context from TPV let them plan paths more efficiently, compensating for the overhead of processing a richer visual stream.

The WebXR decision was controversial early on but proved right - it made the system headset-agnostic and dramatically simplified deployment compared to building native VR applications.