Autonomous Rover

experimental

A mecanum-wheeled rover built to test one question: can an LLM fuse multiple camera feeds to navigate an indoor space — and skip the usual SLAM and LIDAR stack entirely?

  • Robotics
  • Computer Vision
  • Agentic AI
  • Navigation
The rover on the workbench — mecanum wheels, onboard compute, and a camera bar

My rover is a 4WD robot I designed and built — but really it’s a test bench for a bigger question: how much of the traditional autonomy stack can an LLM replace?

Most indoor robots navigate with SLAM (simultaneous localization and mapping) on ROS2. I want to find out if I can skip that path entirely and go straight to autonomous navigation — using cameras, an LLM, and the spatial reasoning that already lives in CorTex.

The inspiration

I saw a robot in a hospital making a scheduled drop-off of samples to the lab. Hospitals are busy, chaotic places, with hallways full of people, carts, and commotion, and this thing was threading through all of it, safely and on schedule. The question stuck with me: how does it do that?

The question

Indoor robots usually lean on SLAM. But building a map of a space isn’t the same as understanding it. Self-driving has split into two camps chasing the harder version of this problem: LIDAR (measure the world in 3D) and vision (read the world the way a person does). Tesla famously bet on vision; others bet on LIDAR; some use both.

So here’s my version: can a rover skip the typical ROS2 / SLAM path and go straight to autonomous navigation — with an LLM as the thing that ties the sensors together?

The experiment

To test it without sinking months into it, I’m building a controlled experiment in my shop:

  • A taped-off box on the floor — a known, bounded world to start from.
  • An overhead camera piped through CorTex, which has vision capabilities — a stationary, god’s-eye view of the rover and the space.
  • The rover’s forward-facing camera — the first-person view.
  • (Planned) a LIDAR module for object detection, to put vision and LIDAR head to head.

The core test: can an LLM stitch together the forward-facing and overhead camera feeds to control the rover more precisely than either view could alone?
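
One way to picture that fusion step is as a loop that packs both views into a single prompt and then sanity-checks whatever comes back. The sketch below is an illustration only, not the rover's actual code: the `OverheadFix` and `ForwardView` types, the box size, the speed limits, and the JSON reply format are all assumptions, and the LLM call itself is left out.

```python
import json
import math
from dataclasses import asdict, dataclass

# All names and numbers here are hypothetical, chosen for illustration.
BOX_SIZE_M = (2.0, 2.0)                     # assumed size of the taped-off box
LIMITS = {"vx": 0.3, "vy": 0.3, "wz": 1.0}  # assumed caps: m/s, m/s, rad/s
STOP = {"vx": 0.0, "vy": 0.0, "wz": 0.0}


@dataclass
class OverheadFix:
    """Rover pose inside the box, as estimated from the overhead camera."""
    x_m: float
    y_m: float
    heading_deg: float


@dataclass
class ForwardView:
    """Short text summary of what the forward-facing camera sees."""
    description: str


def build_prompt(overhead: OverheadFix, forward: ForwardView, goal: str) -> str:
    """Pack both views and the goal into one prompt for the LLM."""
    state = {
        "box_size_m": BOX_SIZE_M,
        "overhead": asdict(overhead),
        "forward": asdict(forward),
        "goal": goal,
    }
    return (
        "You are steering a mecanum rover inside a taped-off box.\n"
        'Reply with JSON only: {"vx": m/s, "vy": m/s, "wz": rad/s}.\n'
        + json.dumps(state)
    )


def parse_command(reply: str) -> dict:
    """Turn the LLM reply into a clamped velocity command; stop if malformed."""
    try:
        data = json.loads(reply)
        cmd = {key: float(data[key]) for key in LIMITS}
    except (ValueError, KeyError, TypeError, OverflowError):
        return dict(STOP)
    if not all(math.isfinite(value) for value in cmd.values()):
        return dict(STOP)
    # Clamp each axis to its assumed limit so a wild reply cannot overdrive the rover.
    return {k: max(-LIMITS[k], min(LIMITS[k], v)) for k, v in cmd.items()}


prompt = build_prompt(
    OverheadFix(x_m=0.5, y_m=1.2, heading_deg=90.0),
    ForwardView(description="clear floor, tape line about 0.5 m ahead"),
    goal="drive to the center of the box",
)
# Stand-in for a real LLM reply; vx is deliberately out of range.
command = parse_command('{"vx": 5, "vy": -0.1, "wz": 0}')
```

The clamp and the stop-on-garbage fallback are the point of the sketch: an LLM reply is free text, so anything that reaches the motors should pass through a check like this first.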

The hardware: mecanum wheels

I designed the rover with mecanum wheels, which give it some unusual moves — it can strafe sideways and crawl on a diagonal, not just drive and turn.

Mecanum wheels in action — strafing and diagonal moves, not just drive-and-turn.
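
The strafing and diagonal moves fall out of the standard mecanum inverse kinematics, where each wheel speed is a signed mix of forward, sideways, and rotational velocity. The sketch below assumes the common X-pattern roller layout; the geometry values and wheel names are placeholders, not measurements from this rover, and the signs may need flipping depending on how the motors are mounted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MecanumGeometry:
    """Placeholder dimensions (assumed, not measured from this rover)."""
    wheel_radius_m: float = 0.04
    half_length_m: float = 0.10  # center to axle, front-to-back
    half_width_m: float = 0.09   # center to wheel, side-to-side


def wheel_speeds(vx: float, vy: float, wz: float,
                 geom: MecanumGeometry = MecanumGeometry()) -> dict:
    """Convert a body velocity into wheel angular speeds in rad/s.

    vx: forward speed (m/s), vy: leftward speed (m/s),
    wz: counter-clockwise rotation rate (rad/s).
    """
    k = geom.half_length_m + geom.half_width_m
    r = geom.wheel_radius_m
    return {
        "front_left":  (vx - vy - k * wz) / r,
        "front_right": (vx + vy + k * wz) / r,
        "rear_left":   (vx + vy - k * wz) / r,
        "rear_right":  (vx - vy + k * wz) / r,
    }


forward = wheel_speeds(0.2, 0.0, 0.0)   # all four wheels turn the same way
strafe = wheel_speeds(0.0, 0.2, 0.0)    # pure sideways motion
diagonal = wheel_speeds(0.1, 0.1, 0.0)  # 45-degree crawl
```

With these signs, a pure sideways command spins the two diagonal wheel pairs in opposite directions, and a 45-degree diagonal command leaves two of the four wheels stopped.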

That freedom comes with a real tradeoff. In motion, mecanum wheels vibrate so much that footage from the onboard cameras is too shaky for reliable computer vision (CV): fine when the rover is parked, but not while it’s navigating. That limitation is exactly why the overhead camera earns its place: it’s stationary, so its image stays sharp while the rover moves.
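
One common way to put a number on "too shaky for CV" is the variance of the Laplacian: a blurred frame has weak edges, so its Laplacian response varies very little. The sketch below is a plain-Python illustration with assumed function names and an assumed threshold. The write-up doesn't say how sharpness was actually judged, and a real pipeline would more likely use an image library.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of a 4-neighbor Laplacian over the interior pixels.

    gray: 2D list of grayscale intensities (rows of equal length, at least 3x3).
    Higher values mean stronger edges, which usually means a sharper frame.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("need at least a 3x3 image")
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


def usable_for_cv(gray, threshold=100.0):
    """Gate a frame on sharpness; the threshold is an assumed placeholder."""
    return laplacian_variance(gray) >= threshold


# Synthetic frames: a crisp checkerboard and a featureless gray image.
sharp = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
flat = [[128] * 8 for _ in range(8)]
```

A gate like this could decide, frame by frame, whether to trust the forward camera or fall back on the overhead view alone. The threshold would have to be tuned on real footage.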

Building it

I designed and built the rover from the chassis up — wheels, wiring, compute, and all. Here’s a bit of that process, including a moment that didn’t go entirely to plan.

Putting the rover together.
The rover on the floor with one mecanum wheel detached a few feet away

Not every test drive ends with all four wheels still attached.

What’s next

The fun part will be testing the forward and overhead views in tandem in a real indoor environment.

I don’t have a LIDAR module on the rover yet. Maybe I’ll add one for the head-to-head comparison, or maybe vision and an LLM turn out to be enough and I skip LIDAR and SLAM completely. That’s what the experiment is for.