LiDAR vs Camera for Robots: Which Sensor Should You Use?
Every robot that interacts with the physical world needs to perceive its environment. The two dominant sensing modalities in robotics are LiDAR (Light Detection and Ranging) and cameras. Each has devoted advocates, and the question of which to use is one of the most frequent debates in robotics engineering.
The short answer is: it depends on your application, budget, environment, and what information you need to extract. The long answer is this article. By the end, you will understand how each sensor works at a physical level, where each excels and struggles, what they cost, and when you should use both.
How LiDAR Works
A LiDAR sensor measures distance by emitting laser pulses and timing how long they take to bounce back from surfaces. The fundamental equation is:
distance = (speed_of_light * time_of_flight) / 2
The division by 2 accounts for the round trip.
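The equation above can be sketched directly in code. This is a minimal illustration; the function name and the example pulse time are invented for demonstration:

```python
# Speed of light in meters per second
SPEED_OF_LIGHT = 299_792_458.0

def tof_to_distance(time_of_flight_s: float) -> float:
    """Convert a round-trip laser time of flight to a one-way distance in meters."""
    return SPEED_OF_LIGHT * time_of_flight_s / 2.0

# A pulse that returns after ~66.7 nanoseconds hit a surface about 10 meters away
print(tof_to_distance(66.7e-9))
```

Note the timescales involved: resolving distances at the centimeter level means timing pulses to within tens of picoseconds, which is why LiDAR receivers need specialized high-speed electronics.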
A single laser beam measures distance along one direction. To build a map of the environment, LiDAR sensors use one of several strategies:
Mechanical spinning LiDAR (e.g., Velodyne, Ouster) rotates one or more laser emitters 360 degrees around a vertical axis. This produces a point cloud -- a set of 3D points representing surfaces in the environment. A 64-channel spinning LiDAR produces more than a million points per second.
Solid-state LiDAR (e.g., Livox, Hesai) uses MEMS mirrors or optical phased arrays to steer the beam electronically, with no moving parts. These sensors are more durable and cheaper than mechanical LiDARs but typically have a narrower field of view.
2D LiDAR (e.g., RPLIDAR, Hokuyo) scans a single horizontal plane with a rotating laser. These are the workhorses of indoor mobile robot navigation: affordable, reliable, and sufficient for obstacle avoidance and SLAM in structured environments.
The output of a LiDAR sensor is a point cloud: a collection of (x, y, z) coordinates, often with an intensity value indicating how strongly the laser reflected off the surface.
```python
import numpy as np

# A LiDAR point cloud is fundamentally an Nx3 (or Nx4) array
# Each row is a 3D point: [x, y, z] in meters, optionally with intensity
point_cloud = np.array([
    [1.23, 0.45, 0.02],   # Point on a nearby wall
    [3.56, 1.22, 0.05],   # Point on a farther surface
    [0.89, -0.34, 1.50],  # Point on a shelf above
    # ... hundreds of thousands more
])

# Basic filtering: remove points beyond 10 meters
max_range = 10.0
distances = np.linalg.norm(point_cloud, axis=1)
filtered = point_cloud[distances < max_range]
print(f"Kept {len(filtered)} of {len(point_cloud)} points within {max_range}m")
```

How Cameras Work
A camera captures light reflected from the environment and projects it onto a 2D image sensor (CCD or CMOS). Each pixel records color intensity (RGB) and possibly depth information.
Monocular cameras produce 2D color images. They capture rich visual information -- color, texture, edges, shapes -- but do not directly measure depth. Depth can be estimated from a single image using deep learning (monocular depth estimation) or by tracking feature points across multiple frames (structure from motion).
Stereo cameras (e.g., ZED, Intel RealSense D400 series) use two cameras separated by a known baseline distance. By matching features between the left and right images, the system computes a disparity map, which converts to a depth map using triangulation:
depth = (focal_length * baseline) / disparity
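The triangulation formula is a one-liner in code. The rig parameters below (700-pixel focal length, 12 cm baseline) are hypothetical values chosen for illustration:

```python
def disparity_to_depth(disparity_px: float, focal_length_px: float, baseline_m: float) -> float:
    """Triangulate depth in meters from stereo disparity in pixels."""
    if disparity_px <= 0:
        raise ValueError("Disparity must be positive; zero disparity means infinite depth.")
    return focal_length_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, 12 cm baseline
# A feature with 20 px of disparity sits at 700 * 0.12 / 20 = 4.2 m
print(disparity_to_depth(20.0, 700.0, 0.12))
```

Because depth is inversely proportional to disparity, depth resolution degrades quadratically with distance: a one-pixel matching error matters far more at 10 meters than at 1 meter, which is why stereo cameras are most accurate at close range.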
Structured light cameras (e.g., Intel RealSense SR300, early Microsoft Kinect) project a known pattern (dots or stripes) onto the scene and analyze the deformation of the pattern to compute depth. These work well indoors but fail in sunlight, which overwhelms the projected pattern.
Time-of-flight (ToF) cameras (e.g., Microsoft Azure Kinect, PMD sensors) emit modulated infrared light and measure the phase shift of the returned signal to compute depth. These are technically a form of LiDAR but are packaged as camera-like devices with depth maps rather than point clouds.
The output of a camera is a 2D image, typically 640x480 to 1920x1080 pixels, at 30-60 frames per second:
```python
import numpy as np

# An RGB image is a Height x Width x 3 array of uint8 values
image = np.zeros((480, 640, 3), dtype=np.uint8)

# A depth image from a stereo/ToF camera is Height x Width of float32 (meters)
depth_image = np.zeros((480, 640), dtype=np.float32)

# Access a specific pixel's depth
row, col = 240, 320  # Center of the image
center_depth = depth_image[row, col]
print(f"Depth at image center: {center_depth:.2f} meters")
```

Strengths and Weaknesses
LiDAR Strengths
Accurate range measurement. LiDAR measures distance directly with centimeter or millimeter precision. There is no estimation, no inference, no model required. A point at 5.23 meters is at 5.23 meters.
Works in any lighting. LiDAR uses its own light source (a laser) and is unaffected by ambient lighting conditions. It works equally well in pitch darkness, blinding sunlight, and everything in between.
Direct 3D geometry. A 3D LiDAR produces a point cloud that directly represents the geometry of the environment. No processing is needed to get spatial structure -- it is the raw output.
Long range. High-end LiDARs can measure distances beyond 200 meters. This is critical for autonomous vehicles operating at highway speeds, where the stopping distance can exceed 100 meters.
High angular accuracy. The laser beam is narrow (typically 0.1-0.4 degree divergence), allowing precise angular measurement of where objects are.
LiDAR Weaknesses
No color or texture. LiDAR sees geometry but not appearance. It cannot tell a red sign from a green one, read text, recognize faces, or identify object types from appearance alone.
Sparse data. Even a high-end 128-channel LiDAR produces far fewer data points per frame than a camera produces pixels. At 50 meters, the vertical spacing between scan lines can be 30-50 cm, meaning small objects (pedestrians, cyclists) are represented by only a handful of points.
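The scan-line spacing claim follows from simple trigonometry. Assuming roughly 0.4 degrees between adjacent channels (a plausible figure for a 64-channel spinning LiDAR; actual spacing varies by model and is often non-uniform):

```python
import math

def scan_line_spacing(range_m: float, vertical_res_deg: float) -> float:
    """Vertical gap in meters between adjacent LiDAR scan lines at a given range."""
    return range_m * math.tan(math.radians(vertical_res_deg))

# At 50 m with ~0.4 degrees between channels, scan lines are ~35 cm apart
print(scan_line_spacing(50.0, 0.4))
```

At that spacing, a pedestrian 1.7 meters tall spans only four or five scan lines, so a 50-meter-distant person may be just a few dozen points in the cloud.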
Cost. A 2D LiDAR suitable for indoor mobile robots costs $100 to $500. A 3D LiDAR suitable for outdoor autonomous vehicles costs $1,000 to $10,000 or more. Solid-state LiDARs are bringing prices down, but cameras are still an order of magnitude cheaper for equivalent coverage.
Reflective and transparent surfaces. LiDAR struggles with mirrors, glass windows, and highly reflective surfaces. The laser beam reflects away from the sensor, producing either no return or a false return. This is a significant issue for indoor robots operating near glass doors and windows.
Weather sensitivity. Rain, snow, fog, and dust scatter the laser beam, reducing range and creating noise. Heavy rain can halve the effective range of a LiDAR sensor.
Camera Strengths
Rich visual information. Cameras capture color, texture, patterns, and fine detail. This enables tasks that LiDAR cannot do: reading signs, recognizing objects by appearance, detecting lane markings, identifying people.
Dense data. A 1080p camera produces over 2 million pixels per frame. Every pixel contains information. This density makes cameras excellent for detecting and classifying objects, even small or distant ones.
Low cost. A high-quality global shutter industrial camera costs $50 to $300. A webcam or Raspberry Pi camera module costs $5 to $25. For budget-constrained projects, cameras are often the only viable sensor.
Mature software ecosystem. Computer vision has decades of research, and deep learning has made camera-based perception remarkably capable. Object detection (YOLO, SSD), semantic segmentation (DeepLab, U-Net), visual odometry (ORB-SLAM), and depth estimation all have mature, open-source implementations.
Small and lightweight. A camera is a few grams. A LiDAR can be several hundred grams to kilograms, with significant size. For drones and small robots, this matters.
Camera Weaknesses
No direct depth. A monocular camera does not measure depth. Stereo cameras and depth cameras add depth capability, but with limited range (typically 0.5 to 10 meters) and reduced accuracy compared to LiDAR.
Lighting dependent. Cameras need ambient light to function. They struggle in darkness (unless using infrared), are blinded by direct sunlight or headlights, and produce inconsistent images across different lighting conditions. Dynamic range limitations mean that a scene with both bright and dark regions may lose detail in one or both.
Computationally expensive. Extracting useful information from images requires significant processing. Running a deep learning object detector at 30 fps requires a GPU. LiDAR point clouds are geometrically simple and can be processed on a CPU.
Scale ambiguity (monocular). A monocular camera cannot distinguish between a small object nearby and a large object far away without additional context. This fundamental ambiguity requires either stereo vision, known object sizes, or additional sensors to resolve.
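The ambiguity falls directly out of the pinhole projection model: projected size depends only on the ratio of object size to distance. A minimal sketch, with a made-up 600-pixel focal length:

```python
def apparent_size_px(real_size_m: float, distance_m: float, focal_length_px: float) -> float:
    """Projected size in pixels under a pinhole camera model."""
    return focal_length_px * real_size_m / distance_m

f = 600.0  # hypothetical focal length in pixels
# A 0.5 m object at 5 m and a 2 m object at 20 m both project to 60 px --
# a single image cannot tell them apart
print(apparent_size_px(0.5, 5.0, f), apparent_size_px(2.0, 20.0, f))
```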
Cost Comparison
Here is a realistic cost breakdown for sensors commonly used in robotics as of early 2026:
| Sensor | Type | Approximate Cost | Typical Use Case |
|---|---|---|---|
| Raspberry Pi Camera v3 | Monocular | $25 | Education, hobby projects |
| Intel RealSense D435i | Stereo depth | $300 | Indoor robots, drones |
| ZED 2i | Stereo depth | $450 | Mobile robots, mapping |
| Oak-D Pro | Stereo + AI | $300 | Edge AI, object detection |
| RPLIDAR A1 | 2D LiDAR | $100 | Indoor SLAM |
| RPLIDAR A3 | 2D LiDAR | $300 | Indoor SLAM (larger range) |
| Livox Mid-360 | 3D LiDAR (solid state) | $1,000 | Outdoor robots, drones |
| Ouster OS1-64 | 3D LiDAR (spinning) | $6,000 | Autonomous vehicles |
| Velodyne VLP-16 | 3D LiDAR (spinning) | $4,000 | Autonomous vehicles, survey |
The cost gap is narrowing as solid-state LiDARs mature, but cameras remain significantly cheaper, especially for applications where depth precision is not critical.
Use Cases: When to Use What
Use LiDAR When:
- You need accurate, long-range distance measurement. Autonomous vehicles, drone surveying, and outdoor robots that need to detect obstacles at 50+ meters.
- You are doing SLAM in an unstructured environment. LiDAR SLAM (e.g., Cartographer, LOAM, LIO-SAM) is more robust than visual SLAM in environments with poor visual features (plain walls, repetitive patterns).
- Lighting is uncontrolled. Underground mines, outdoor robots operating day and night, and industrial environments with variable lighting.
- You need reliable obstacle detection for safety. LiDAR provides direct, accurate range measurements that do not depend on lighting or visual features. Safety-critical systems often require LiDAR.
Use Cameras When:
- You need to recognize or classify objects. Identifying products on a shelf, reading QR codes, detecting people, classifying terrain types.
- Budget is tight. For educational, hobby, or research projects where a $25 camera is acceptable but a $1,000 LiDAR is not.
- You need dense spatial information at close range. A depth camera provides a complete depth map at 30+ fps, which is more information-dense than a LiDAR point cloud for tabletop manipulation or close-range navigation.
- Weight and size are constrained. Small drones and micro-robots cannot carry a LiDAR.
- The environment has good visual features and controlled lighting. An indoor office with distinct furniture, posters, and lighting fixtures is an excellent environment for visual SLAM.
Use Both (Sensor Fusion) When:
- You need the best of both worlds. Self-driving cars, advanced mobile robots, and outdoor autonomous systems typically use both LiDAR and cameras. LiDAR provides accurate geometry, and cameras provide appearance and classification. Fusing both produces richer, more reliable perception than either alone.
- Safety requirements demand redundancy. If one sensor fails or is degraded, the other provides backup. A robot that depends solely on cameras is blind in the dark. A robot that depends solely on LiDAR cannot read a warning sign.
Sensor Fusion in Practice
Sensor fusion combines data from multiple sensors into a unified representation. For LiDAR-camera fusion, the most common approaches are:
Early fusion (point-level): Project LiDAR points into the camera image using a calibrated extrinsic transformation. Each LiDAR point gets a corresponding RGB color. This creates a colored point cloud that has both geometry and appearance.
```python
import numpy as np

def project_lidar_to_camera(
    points_3d: np.ndarray,
    extrinsic: np.ndarray,
    intrinsic: np.ndarray,
    image_width: int,
    image_height: int,
) -> np.ndarray:
    """
    Project 3D LiDAR points into 2D camera pixel coordinates.

    Args:
        points_3d: Nx3 array of 3D points in LiDAR frame.
        extrinsic: 4x4 transform from LiDAR frame to camera frame.
        intrinsic: 3x3 camera intrinsic matrix.
        image_width: Image width in pixels.
        image_height: Image height in pixels.

    Returns:
        Mx3 array of [u, v, depth] for points that project into the image.
    """
    n = points_3d.shape[0]
    # Convert to homogeneous coordinates
    ones = np.ones((n, 1))
    points_h = np.hstack([points_3d, ones])  # Nx4
    # Transform to camera frame
    points_cam = (extrinsic @ points_h.T).T  # Nx4
    points_cam = points_cam[:, :3]  # Nx3
    # Filter points behind the camera
    valid = points_cam[:, 2] > 0
    points_cam = points_cam[valid]
    # Project to pixel coordinates
    projected = (intrinsic @ points_cam.T).T  # Nx3
    u = projected[:, 0] / projected[:, 2]
    v = projected[:, 1] / projected[:, 2]
    depth = points_cam[:, 2]
    # Filter points outside the image
    in_image = (
        (u >= 0) & (u < image_width) &
        (v >= 0) & (v < image_height)
    )
    result = np.stack([u[in_image], v[in_image], depth[in_image]], axis=1)
    return result
```

Late fusion (decision-level): Run independent perception pipelines on each sensor and combine the results. For example, detect objects in the camera image using YOLO, detect objects in the LiDAR point cloud using PointPillars, and match detections from both sensors. Detections confirmed by both sensors have higher confidence.
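The matching step of late fusion is often done with intersection-over-union (IoU) in image space, after projecting LiDAR detections into the camera frame. A minimal sketch, with made-up bounding boxes; real systems add class matching, confidence weighting, and temporal tracking:

```python
def iou(box_a: list, box_b: list) -> float:
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def confirm_detections(camera_boxes: list, lidar_boxes: list, iou_threshold: float = 0.5) -> list:
    """Keep camera detections that overlap a LiDAR detection projected into image space."""
    return [
        cam for cam in camera_boxes
        if any(iou(cam, lid) >= iou_threshold for lid in lidar_boxes)
    ]

camera_dets = [[100, 100, 200, 300], [400, 50, 450, 120]]
lidar_dets = [[110, 90, 210, 310]]  # only the first object was seen by both sensors
print(confirm_detections(camera_dets, lidar_dets))  # [[100, 100, 200, 300]]
```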
Mid fusion (feature-level): Extract features from both sensors and feed them into a shared neural network. This is the approach used by many modern autonomous driving systems (e.g., BEVFusion, TransFusion).
Depth Cameras: The Middle Ground
Depth cameras (stereo cameras, structured light sensors, ToF cameras) occupy an interesting middle ground. They provide both RGB images and depth maps, giving you the visual richness of a camera and (limited) 3D geometry.
Advantages over pure cameras: Direct depth measurement eliminates the scale ambiguity of monocular vision. You get a dense depth map at every frame.
Advantages over LiDAR: Lower cost (a RealSense D435 costs a fraction of a 3D LiDAR), higher spatial density at close range, and RGB data in the same device.
Limitations compared to LiDAR: Shorter range (typically 0.5 to 10 meters), lower depth accuracy (centimeter-level vs millimeter-level), sensitivity to sunlight (structured light sensors fail outdoors entirely), and lower reliability in featureless or transparent scenes.
For indoor mobile robots and robot arms operating in structured environments, a depth camera is often the best single sensor: affordable, information-rich, and sufficient for navigation and manipulation within a few meters.
Choosing Your Sensor Stack
Here is a decision framework:
Step 1: Define your range requirements. If you need accurate sensing beyond 10 meters, you need LiDAR. Depth cameras top out at about 10 meters, and accuracy degrades significantly beyond 5 meters.
Step 2: Define your perception tasks. If you need object classification, text reading, or color-based detection, you need a camera. LiDAR alone cannot do these tasks.
Step 3: Consider your lighting environment. If the robot operates outdoors in variable lighting or in darkness, a camera alone is insufficient. You need LiDAR or at least an active depth sensor (ToF or structured light with appropriate wavelength filtering).
Step 4: Check your budget. A camera plus a 2D LiDAR costs $150 to $400 total. A depth camera alone costs $300 to $500. A 3D LiDAR starts at $1,000. Match your sensor spend to your project requirements.
Step 5: Evaluate weight and power constraints. A camera draws under 1 watt. A 2D LiDAR draws 2 to 5 watts. A 3D LiDAR draws 10 to 20 watts. For battery-powered robots, this matters.
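The five steps above can be condensed into a toy decision function. This is a rough sketch of the framework, not a substitute for engineering judgment; the thresholds and sensor names are illustrative:

```python
def recommend_sensors(max_range_m: float, needs_classification: bool,
                      uncontrolled_lighting: bool, budget_usd: float) -> list:
    """Toy version of the decision framework; thresholds are illustrative only."""
    sensors = []
    # Steps 1 and 3: long range or uncontrolled lighting calls for LiDAR
    if max_range_m > 10 or uncontrolled_lighting:
        sensors.append("3D LiDAR" if max_range_m > 10 and budget_usd >= 1000 else "2D LiDAR")
    # Step 2: classification, text, or color tasks require a camera
    if needs_classification:
        sensors.append("camera")
    # Short range, controlled lighting: a depth camera alone often suffices
    if not sensors:
        sensors.append("depth camera")
    return sensors

# Outdoor delivery robot: long range, must classify obstacles, variable lighting
print(recommend_sensors(50, True, True, 2000))  # ['3D LiDAR', 'camera']
# Indoor tabletop arm: short range, no classification needed
print(recommend_sensors(3, False, False, 300))  # ['depth camera']
```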
Summary
LiDAR and cameras are complementary, not competing, technologies. LiDAR excels at accurate geometric measurement in any lighting condition. Cameras excel at visual understanding, object recognition, and dense close-range perception at low cost. Depth cameras split the difference for indoor, short-range applications.
The trend in robotics is toward multi-sensor systems. Even budget hobby robots increasingly combine a cheap 2D LiDAR for SLAM with a camera for object recognition. As sensor costs continue to fall and fusion algorithms improve, the question is not "LiDAR or camera?" but "how do I best combine them?"
Understanding the physics, trade-offs, and practical implications of each sensor type is fundamental to designing effective robot perception systems.
Want to experiment with sensor data? Our Sensors and Perception lesson lets you visualize LiDAR point clouds and camera images side by side, project one onto the other, and see how sensor fusion works in an interactive simulator.