Can my rover skip SLAM?

There’s a robot I can’t stop thinking about.

I saw it in a hospital, making a scheduled drop-off of samples to the lab. Hospitals are some of the busiest indoor spaces there are — hallways full of people, carts, and constant commotion — and this thing was threading through all of it, calmly and on schedule. My first thought wasn’t “neat robot.” It was: how is it doing this safely?

That question turned into an experiment running in my shop.

The usual answer: SLAM

Most indoor robots navigate with SLAM — simultaneous localization and mapping. The robot builds a map of the space and works out where it sits inside that map, all at once. It works. But building a map of a room isn’t the same as understanding it, and going down that road usually means committing to the full ROS2 stack before the robot does anything useful.

Self-driving cars are fighting the harder version of the same problem, and they’ve split into two camps: LIDAR, which measures the world in 3D, and vision, which reads the world the way a person does. Tesla famously bet on cameras. Others bet on LIDAR. Some hedge and use both.

Which got me to my own version of the question.

My bet

Can a robot skip the typical ROS2 / SLAM path entirely and go straight to autonomous navigation — with an LLM as the thing that ties the sensors together?

I don’t know yet. That’s the honest answer, and it’s exactly why I’m running an experiment instead of writing a manifesto.

The experiment

I’m keeping it small and controlled so I can actually learn something:

A box taped out on my shop floor — a known, bounded world to start in.
An overhead camera, piped through CorTex, for a stationary, god’s-eye view of the rover and the space.
The rover’s own forward-facing camera — the first-person view.
And maybe a LIDAR module for object detection, so I can put vision and LIDAR side by side.

The question at the center of it: can an LLM stitch together the forward and overhead camera feeds and drive the rover more precisely than either view could on its own?

The mecanum wrinkle

There’s a twist baked into the hardware. I built the rover on mecanum wheels, so it can strafe sideways and crawl on a diagonal — great for tight, precise moves.

Close-up of the rover's mecanum wheels with the chassis lifted off — Mecanum wheels buy precise, sideways moves — but in motion they shake the onboard camera useless.

The catch: in motion, those wheels shake the chassis hard enough that the onboard camera can’t do reliable CV. Fine parked, useless while driving.

That one annoyance is why the overhead camera matters so much. It doesn’t move, so it stays sharp exactly when the rover’s own eyes go blurry. The two views aren’t redundant — they cover for each other.

What I expect to learn

I genuinely don’t know which way this goes. Maybe vision plus an LLM is enough and I never bolt a LIDAR module onto the rover at all. Maybe I hit a wall and discover SLAM was load-bearing for reasons I haven’t appreciated yet. Either way I learn something real — and that beats assuming.

The fun part is next: testing the forward and overhead views in tandem, in a messier space than a taped box on the floor. I’ll report back.