Robotics From Zero
Module: Map The Unknown

SLAM Explained

Understand the chicken-and-egg problem of simultaneous localization and mapping, explore graph-based and feature-based SLAM approaches, and see how modern algorithms solve this fundamental robotics challenge.

12 min read

The Chicken-and-Egg Problem

Here's a riddle: to build a map, you need to know where you are. To know where you are, you need a map. So if you have neither... where do you start?

That's the SLAM problem — Simultaneous Localization And Mapping. The robot must build a map of an unknown environment while simultaneously figuring out its position on that incomplete map. It's like drawing a floor plan of a dark building while blindfolded, using only your footsteps and a flashlight to sense nearby walls.

SLAM chicken-and-egg problem — circular dependency between mapping (needs position) and localization (needs map)
SLAM's fundamental challenge: you need your position to build the map, but you need the map to find your position. SLAM algorithms break this circular dependency by jointly estimating both, updating each as new sensor data arrives.

SLAM is considered one of the fundamental problems in mobile robotics, and solving it robustly is what separates toy robots from serious autonomous systems.

Why SLAM is Hard

Let's break down the difficulty:

Circular Dependency

You need your position to add sensor data to the map in the right place. You need the map to figure out your position. Every measurement depends on every previous measurement — errors compound.

Sensor Drift

Odometry (wheel encoders, IMU) accumulates error over time. Walk 100 meters counting steps, and you might be off by several meters. The map you're building is based on these drifting position estimates.
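A quick simulation makes the drift concrete. This is a toy example — the 2% distance noise and 0.5°-per-step heading noise are illustrative values, not figures from any particular robot:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a robot driving straight in 1 m steps for 100 m, dead-reckoning
# its position from noisy step lengths and a slowly drifting heading.
true_pos = np.zeros(2)
est_pos = np.zeros(2)
est_heading = 0.0

for _ in range(100):
    true_pos += np.array([1.0, 0.0])
    step = 1.0 + rng.normal(0, 0.02)               # ~2% distance noise
    est_heading += rng.normal(0, np.radians(0.5))  # heading random walk
    est_pos += step * np.array([np.cos(est_heading), np.sin(est_heading)])

drift = np.linalg.norm(est_pos - true_pos)
print(f"Dead-reckoning error after 100 m: {drift:.2f} m")
```

The heading noise is the killer: it accumulates as a random walk, so small per-step errors turn into metres of lateral displacement — and every map cell painted from those poses is displaced with them.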

Data Association

When the robot sees a corner, is it a new corner to add to the map, or one it saw 30 seconds ago? Getting this wrong creates duplicate landmarks or destroys your map.
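The simplest data-association strategy is a nearest-neighbour match with a distance gate. This is a minimal sketch — the gate value is arbitrary, and real systems use probabilistic gates (e.g. Mahalanobis distance against the landmark's uncertainty) rather than raw Euclidean distance:

```python
import math

def match_to_landmarks(observation, landmarks, gate=0.5):
    """Nearest-neighbour data association with a distance gate.

    Returns the index of the closest known landmark, or None if the
    nearest one is farther than `gate` metres -- in which case the
    caller should treat the observation as a brand-new landmark.
    """
    best_id, best_dist = None, gate
    for i, (lx, ly) in enumerate(landmarks):
        d = math.hypot(observation[0] - lx, observation[1] - ly)
        if d < best_dist:
            best_id, best_dist = i, d
    return best_id

landmarks = [(0.0, 0.0), (5.0, 0.0)]
print(match_to_landmarks((0.1, 0.2), landmarks))  # matches landmark 0
print(match_to_landmarks((2.5, 0.0), landmarks))  # None: new landmark
```

Note the failure modes: too tight a gate spawns duplicate landmarks; too loose a gate merges distinct ones and corrupts the map — exactly the trade-off described above.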

Scale

A building might have thousands of landmarks. Tracking the relationships between all of them requires sophisticated data structures.

Note

The breakthrough that made SLAM practical was realizing you don't need to maintain a perfect map constantly — you just need to track uncertainty and occasionally correct accumulated errors through "loop closure" (which we'll cover in the next lesson).

Two Main Approaches

SLAM algorithms fall into two broad categories:

1. Feature-Based SLAM

The robot extracts distinctive landmarks (corners, edges, specific objects) from sensor data and tracks them as it moves. The map is a list of landmark positions.

Feature-based SLAM — showing how the robot tracks distinctive landmarks to simultaneously estimate its trajectory and the map
Feature-based SLAM extracts distinctive landmarks (corners, edges, objects) from sensor data and tracks them across frames. By triangulating against known landmarks, the robot estimates its position while adding new landmarks to the map.

How it works:

  1. Detect features in sensor data (camera: corners, edges; LiDAR: lines, planes)
  2. Match features across consecutive observations to track them
  3. Use feature positions to estimate robot motion
  4. Refine both robot trajectory and feature positions together

Pros:

  • Compact maps (just landmark coordinates)
  • Fast for sparse environments
  • Good for visual SLAM with cameras

Cons:

  • Requires distinctive features (fails in featureless hallways)
  • Matching features across views is hard
Feature-Based SLAM Outline (pseudocode — detect_features, match_to_landmarks, and the update helpers stand in for real implementations)

class FeatureBasedSLAM:
    def __init__(self):
        self.landmarks = []                # list of (x, y) map positions
        self.robot_pose = Pose(0, 0, 0)    # x, y, heading

    def process_scan(self, sensor_data):
        # 1. Extract features from the raw sensor data
        observed_features = detect_features(sensor_data)

        for feature in observed_features:
            # 2. Try to match the observation to a known landmark
            landmark_id = match_to_landmarks(feature, self.landmarks)
            if landmark_id is None:
                # 3a. Unmatched: add a new landmark, converted into the
                #     map frame using the current pose estimate
                self.landmarks.append(self.to_map_frame(feature))
            else:
                # 3b. Matched: use the known landmark to correct the pose
                self.robot_pose = correct_pose_estimate(
                    self.robot_pose, self.landmarks[landmark_id], feature)

        # 4. Refine landmark positions using the corrected pose
        self.refine_landmark_positions()

2. Graph-Based SLAM

Instead of maintaining a single map, the algorithm builds a graph where nodes are robot poses at different times, and edges are spatial constraints (odometry measurements, landmark observations).

Graph-based SLAM — pose graph with nodes at robot positions and edges representing odometry and loop closure constraints
Graph-based SLAM builds a graph of poses (nodes) connected by spatial constraints (edges). Odometry edges link consecutive poses. Loop closure edges connect revisited locations. Optimization adjusts all poses to satisfy every constraint as well as possible.

The magic happens during graph optimization — periodically, the algorithm adjusts all poses to make the constraints as consistent as possible. Think of the graph as a network of springs: each constraint pulls on the poses it connects, and optimization settles the whole trajectory into the configuration with the least total tension.

How it works:

  1. Add a node for each robot pose as it moves
  2. Add edges between consecutive poses (from odometry)
  3. When recognizing a previous location, add a "loop closure" edge
  4. Periodically optimize the graph to minimize constraint violations
  5. Build the map from the optimized trajectory
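The steps above can be sketched in one dimension. In this toy pose graph (the measurements and weights are made up for illustration) the robot drives out 2 m and back with drifting odometry, a loop-closure edge says it ended where it started, and a weighted least-squares solve distributes the accumulated error over the trajectory:

```python
import numpy as np

# Tiny 1D pose graph: poses x0..x4. Drifting odometry reports each leg
# slightly wrong; a high-weight loop-closure edge ties pose 4 to pose 0.
# Edge format: (i, j, measured x_j - x_i, weight)
edges = [
    (0, 1,  1.02, 1.0),    # odometry
    (1, 2,  1.02, 1.0),
    (2, 3, -0.98, 1.0),
    (3, 4, -0.98, 1.0),
    (4, 0,  0.00, 100.0),  # loop closure: we're back at the start
]

n = 5
# Anchor x0 = 0 and solve weighted least squares for x1..x4.
A = np.zeros((len(edges), n - 1))
b = np.zeros(len(edges))
for row, (i, j, meas, w) in enumerate(edges):
    if j > 0:
        A[row, j - 1] += w
    if i > 0:
        A[row, i - 1] -= w
    b[row] = w * meas

x = np.concatenate(([0.0], np.linalg.lstsq(A, b, rcond=None)[0]))
print(np.round(x, 3))  # → [0. 1. 2. 1. 0.]
```

Pure dead reckoning would end at x4 ≈ 0.08 m; the loop closure pulls it back to ~0 and spreads the 8 cm of drift evenly across the four odometry edges. Real systems (g2o, GTSAM, Ceres) solve the same kind of problem, but nonlinearly, in SE(2) or SE(3), with thousands of poses.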

Pros:

  • Handles large-scale environments
  • Naturally incorporates loop closure
  • Can produce occupancy grids or feature maps

Cons:

  • Periodic optimization can be slow
  • Requires good loop closure detection
Tip

Modern SLAM systems often combine both approaches — use features for tracking frame-to-frame, but maintain a pose graph for global optimization.

The Role of Filtering vs. Smoothing

Filtering vs smoothing — comparing EKF-SLAM (filters forward only) with graph-based SLAM (smoothes entire trajectory)
Filtering processes data forward-only — once a pose is estimated, it's locked in. Smoothing (graph-based) can revise past poses when new evidence arrives, like loop closure. This makes smoothing produce better maps but at higher computational cost.

There are two philosophies for handling SLAM uncertainty:

Filtering (EKF-SLAM, FastSLAM)

Maintain a probability distribution over the current robot pose and map. Update this distribution incrementally with each new measurement. Once a decision is made about past poses, it's locked in.

  • Fast per-update
  • Can't revise history
  • Errors accumulate

Smoothing (Graph-Based SLAM)

Maintain a history of poses and constraints. Periodically re-optimize the entire trajectory, revising past poses if new evidence (like loop closure) suggests they were wrong.

  • Slower periodic optimization
  • Can correct past errors
  • Better final map quality
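The difference shows up clearly on a toy 1D out-and-back run. Note the smoothing step here just spreads the end-point error evenly over the trajectory — a crude stand-in for real graph optimization, used only to show that past poses get revised:

```python
# Drifting odometry for an out-and-back run; ground truth ends at 0.
odometry = [1.02, 1.02, -0.98, -0.98]

# --- Filtering: integrate forward; past poses are frozen forever. ---
filtered = [0.0]
for step in odometry:
    filtered.append(filtered[-1] + step)
# A loop closure at the end ("you are at 0.0") can only fix the LAST pose:
filtered[-1] = 0.0
print("filter:", [round(p, 2) for p in filtered])
# earlier poses still carry the drift: [0.0, 1.02, 2.04, 1.06, 0.0]

# --- Smoothing: revise the whole trajectory once the loop closes. ---
drift = sum(odometry)                  # 0.08 m of accumulated error
correction = drift / len(odometry)     # spread it over every step
smoothed = [0.0]
for step in odometry:
    smoothed.append(smoothed[-1] + step - correction)
print("smooth:", [round(p, 2) for p in smoothed])
# past poses corrected too: [0.0, 1.0, 2.0, 1.0, 0.0]
```

The filter's intermediate poses (1.02, 2.04, 1.06) keep their error even after the loop closes — and so does any map built from them. The smoother rewrites history, which is exactly why graph-based systems produce better final maps.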

Most modern SLAM systems use smoothing because loop closure (next lesson) is critical for long-term accuracy, and smoothing naturally incorporates it.

Visual SLAM vs. LiDAR SLAM

SLAM can use different sensors:

Sensor Type   | Pros                                   | Cons                                         | Common Use
Camera        | Rich features, cheap, passive          | Struggles in darkness, affected by lighting  | Drones, AR/VR headsets
LiDAR         | Works in any lighting, accurate depth  | Expensive, can't read text/color             | Self-driving cars, indoor robots
Both (Fusion) | Best of both worlds                    | Complexity, sensor synchronization           | High-end autonomous systems

Visual SLAM (using cameras) is called vSLAM and typically relies on feature matching (ORB-SLAM, PTAM). LiDAR SLAM typically relies on scan matching — aligning consecutive point clouds with algorithms like ICP — as in systems such as Cartographer.

What's Next?

SLAM works surprisingly well for short-term mapping, but there's a catch: errors slowly accumulate as the robot explores. The solution is loop closure — recognizing when you've returned to a previously visited place. That's our next lesson, and it's the secret ingredient that makes long-term SLAM possible.

Got questions? Join the community

Discuss this lesson, get help, and connect with other learners on r/softwarerobotics.


Frequently Asked Questions

How does SLAM work in simple terms?

SLAM builds a map and tracks the robot's location at the same time. As the robot moves, it takes sensor readings and matches them against what it has seen before to estimate both where it is and what the environment looks like. When it recognizes a place it visited earlier (loop closure), it corrects any accumulated drift in both the map and its trajectory.

What sensors are needed for SLAM?

SLAM can work with many sensor types. 2D LiDAR is the most common for indoor ground robots. 3D LiDAR is used for outdoor and autonomous vehicle SLAM. Cameras enable visual SLAM (vSLAM), which is cheaper but more sensitive to lighting. Many systems fuse multiple sensors — for example, LiDAR plus IMU, or stereo camera plus wheel odometry — for greater robustness.

Can SLAM work in real time?

Yes, modern SLAM algorithms are designed for real-time operation. 2D LiDAR SLAM (such as GMapping or Cartographer) runs comfortably at 10 to 30 Hz on a standard laptop CPU. Visual SLAM systems like ORB-SLAM3 also achieve real-time performance. However, large-scale mapping with dense 3D point clouds may require GPU acceleration or careful tuning to maintain real-time rates.
