Map Representations in SLAM Architecture: Point Clouds, Occupancy Grids, and Meshes
The map representation chosen within a Simultaneous Localization and Mapping system shapes every downstream decision: how much memory the pipeline consumes, how fast the robot can query navigable space, and how precisely localization estimates converge. Three representations dominate deployed systems — point clouds, occupancy grids, and surface meshes — each with distinct structural properties, computational costs, and applicability to specific sensor modalities and task requirements. Understanding the boundaries between these formats is essential for engineers designing SLAM system architectures that must operate under real hardware constraints.
Definition and Scope
A map representation in SLAM is the data structure used to encode the geometry (and optionally the semantics) of an environment as the robot simultaneously estimates its own pose within that environment. The representation is not merely a storage choice; it directly couples to the localization front-end, the loop closure mechanism, and the motion planner.
The three canonical types are:
- Point clouds — unordered or ordered sets of 3D coordinates (and optionally intensity, color, or surface normals), each representing a measured surface sample. LiDAR-based SLAM systems such as those built on the Velodyne HDL-64E sensor generate dense point clouds at rates exceeding 1.3 million points per second, per Velodyne Lidar's published sensor datasheets.
- Occupancy grids — discretized, probabilistic grid structures in which each cell stores the estimated probability that the corresponding region of space is occupied. The framework originates from work published by Moravec and Elfes (1985) in the Proceedings of the IEEE International Conference on Robotics and Automation, and remains a reference structure in robotics middleware such as ROS (Robot Operating System), maintained by Open Robotics.
- Surface meshes — polygonal tessellations, typically triangulated, that approximate continuous surfaces extracted from depth data. Mesh representations carry explicit geometric connectivity absent from point clouds, enabling direct use in physics simulation, augmented reality rendering, and finite-element analysis.
These three types span a spectrum from raw measurement fidelity (point clouds) to probabilistic spatial reasoning (occupancy grids) to continuous geometric abstraction (meshes). Core components of a SLAM architecture — front-end odometry, back-end graph optimization, and loop closure — interact differently with each representation type.
How It Works
Point Clouds
A point cloud map accumulates sensor returns across successive scans. In LiDAR-based pipelines, algorithms such as ICP (Iterative Closest Point) align new scans to an existing cloud by minimizing point-to-point or point-to-plane distances. The map grows as a concatenated set of transformed point batches. Without additional processing, raw accumulated clouds exhibit drift artifacts wherever pose estimation error compounds; loop closure in SLAM architecture corrects this by retroactively adjusting the global point positions after a pose graph optimization pass.
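The scan-to-map alignment step above can be sketched as a minimal point-to-point ICP in NumPy/SciPy. This is illustrative only (the function name is ours): it uses a fixed iteration count with no outlier rejection or convergence test, whereas production LiDAR pipelines typically prefer point-to-plane residuals and robust kernels.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_point_to_point(source, target, iters=20):
    """Align source (N,3) onto target (M,3); return a 4x4 rigid transform.

    Sketch of point-to-point ICP: nearest-neighbor correspondences followed
    by a closed-form (Kabsch/SVD) rigid alignment, repeated a fixed number
    of times.
    """
    src = source.copy()
    tree = cKDTree(target)
    T = np.eye(4)
    for _ in range(iters):
        # 1. Correspondence: nearest target point for each source point.
        _, idx = tree.query(src)
        tgt = target[idx]
        # 2. Closed-form rigid alignment of the matched sets (Kabsch / SVD).
        mu_s, mu_t = src.mean(0), tgt.mean(0)
        H = (src - mu_s).T @ (tgt - mu_t)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:          # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s
        # 3. Apply the incremental transform and accumulate it into T.
        src = src @ R.T + t
        step = np.eye(4)
        step[:3, :3] = R
        step[:3, 3] = t
        T = step @ T
    return T
```

Loop closure then corrects the accumulated drift by re-optimizing the pose graph and re-transforming the stored point batches with the updated poses.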
Storage cost scales linearly with scan count. A 64-beam LiDAR spinning at 10 Hz produces roughly 640,000 points per second (64 beams × ~1,000 azimuth samples per revolution × 10 revolutions per second). Over a 10-minute indoor survey, uncompressed storage requirements reach hundreds of megabytes without voxel downsampling. Voxelization — binning points into a 3D voxel grid at a fixed resolution such as 5 cm — reduces density while preserving structural geometry.
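The voxel filter described above can be sketched in a few lines of NumPy (the function name is ours; production filters such as PCL's VoxelGrid add bounds handling and streaming support):

```python
import numpy as np

def voxel_downsample(points, voxel_size=0.05):
    """Bin points (N,3) into cubic voxels and keep one centroid per voxel.

    Minimal sketch of a 5 cm voxel filter: each point is assigned an
    integer voxel key, and all points sharing a key are replaced by
    their centroid.
    """
    keys = np.floor(points / voxel_size).astype(np.int64)   # voxel index per point
    _, inverse, counts = np.unique(keys, axis=0,
                                   return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)
    # Accumulate per-voxel coordinate sums, then divide by counts -> centroids.
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]
```

The output size depends only on how many voxels are occupied, so map memory stops growing once the robot revisits already-mapped space.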
Occupancy Grids
Each cell in a 2D occupancy grid (or its 3D volumetric extension, often called an occupancy voxel map) stores a log-odds value updated via Bayes' rule as sensor rays pass through or terminate in that cell. The update rule is:
l(s | z_{1:t}) = l(s | z_{1:t-1}) + l(s | z_t) - l(s)
where l(s | z_t) is the log-odds inverse sensor model for the new measurement and l(s) is the prior log-odds (zero for a 0.5 occupancy prior). Standard implementations in ROS use the nav_msgs/OccupancyGrid message type, which encodes a 2D grid at configurable resolution — commonly 5 cm per cell for indoor navigation tasks.
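The log-odds update above can be sketched per cell as follows (function names are ours; a real implementation would update every cell along each sensor ray, not just the endpoint):

```python
import numpy as np

def logodds_update(grid, cell, p_meas, p_prior=0.5):
    """One Bayesian log-odds update for a single occupancy-grid cell.

    Sketch of the additive update: new log-odds = old log-odds
    + measurement log-odds - prior log-odds. `grid` stores log-odds
    values; `p_meas` is the inverse sensor model probability for this
    measurement.
    """
    l = lambda p: np.log(p / (1.0 - p))
    grid[cell] += l(p_meas) - l(p_prior)
    return grid

def occupancy_probability(grid):
    """Recover occupancy probabilities from a log-odds grid (sigmoid)."""
    return 1.0 - 1.0 / (1.0 + np.exp(grid))
```

Storing log-odds rather than probabilities makes each update a single addition and avoids numerical saturation near 0 and 1.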
3D extensions such as OctoMap (OctoMap library documentation, University of Freiburg) use octree structures to achieve adaptive resolution, storing only cells that contain information and compressing homogeneous regions. OctoMap at 5 cm resolution for a 50 m × 50 m × 5 m indoor environment requires approximately 40–100 MB depending on scene complexity.
Surface Meshes
Mesh generation from SLAM output typically follows a multi-stage pipeline:
- Accumulate and denoise a point cloud map.
- Estimate surface normals at each point using local neighborhood analysis (e.g., PCA over a 20-nearest-neighbor radius).
- Apply a surface reconstruction algorithm — Poisson Surface Reconstruction (Kazhdan et al., 2006, Eurographics Symposium on Geometry Processing) is a widely cited method — to fit a watertight mesh to the oriented point set.
- Simplify and decimate the mesh to a target polygon budget compatible with downstream rendering or planning requirements.
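The normal-estimation step of the pipeline above can be sketched with plain NumPy/SciPy (the function name is ours). Note one caveat the sketch omits: Poisson reconstruction needs consistently oriented normals, and PCA alone leaves each normal's sign ambiguous.

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_normals(points, k=20):
    """Per-point surface normals via PCA over k nearest neighbors.

    The normal is the eigenvector of the local covariance matrix with
    the smallest eigenvalue (the direction of least variance in the
    neighborhood). Sketch only: no orientation consistency, so normals
    may flip sign between neighboring points.
    """
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    normals = np.empty_like(points)
    for i, nbrs in enumerate(idx):
        nbhd = points[nbrs] - points[nbrs].mean(0)
        cov = nbhd.T @ nbhd
        _, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
        normals[i] = eigvecs[:, 0]         # smallest-variance direction
    return normals
```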
Mesh representations support direct ray–triangle intersection queries, making them efficient substrates for visual SLAM architecture pipelines that require dense photometric consistency checks.
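The ray–triangle query mentioned above is commonly implemented with the Möller–Trumbore algorithm; a minimal sketch (function name ours, and a real mesh query would be accelerated with a BVH over all triangles):

```python
import numpy as np

def ray_triangle_intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore ray-triangle intersection.

    Returns the ray parameter t such that origin + t * direction lies
    inside the triangle (v0, v1, v2), or None on a miss.
    """
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = e1 @ p
    if abs(det) < eps:                  # ray parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = origin - v0
    u = (s @ p) * inv                   # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = np.cross(s, e1)
    v = (direction @ q) * inv           # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = (e2 @ q) * inv
    return t if t > eps else None       # reject hits behind the origin
```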
Common Scenarios
Different deployment contexts favor different representations:
| Scenario | Preferred Representation | Primary Reason |
|---|---|---|
| Warehouse autonomous mobile robots (AMRs) | 2D occupancy grid | Fast path planning via A* or Dijkstra on grid |
| Outdoor autonomous vehicles | 3D point cloud (HDMap) | Centimeter-level lane and curb geometry |
| Augmented reality headsets | Surface mesh | GPU rasterization pipeline compatibility |
| Search-and-rescue drones | OctoMap (3D occupancy) | Compact volumetric free-space for 3D path planning |
| Indoor building documentation | Dense mesh | Architectural dimensioning and BIM integration |
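As a sketch of the grid-based search the warehouse-AMR row assumes, here is a minimal 4-connected A* over a binary occupancy grid (the function name and the 0/1 grid encoding are illustrative; real planners add obstacle inflation and diagonal or kinodynamic motion):

```python
import heapq

def astar(grid, start, goal):
    """A* over a 2D occupancy grid (0 = free, 1 = occupied), 4-connected.

    Uses a Manhattan-distance heuristic (admissible for unit-cost,
    4-connected moves). Returns the path as a list of (row, col) cells,
    or None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    h = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])
    open_set = [(h(start), 0, start)]       # (f = g + h, g, cell)
    came_from = {start: None}
    g_best = {start: 0}
    while open_set:
        _, g, cell = heapq.heappop(open_set)
        if cell == goal:                    # reconstruct path via parents
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > g_best[cell]:                # stale heap entry
            continue
        r, c = cell
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nb
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < g_best.get(nb, float("inf")):
                    g_best[nb] = ng
                    came_from[nb] = cell
                    heapq.heappush(open_set, (ng + h(nb), ng, nb))
    return None
```

The same query pattern runs directly on a `nav_msgs/OccupancyGrid` payload once cells above an occupancy threshold are marked as 1.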
SLAM architecture for autonomous vehicles commonly stores map data in a layered format combining a sparse point cloud for localization with a separate occupancy grid for dynamic obstacle detection. Robotics deployments in warehouses and logistics more frequently use 2D occupancy grids because those environments are effectively 2.5D — a planar floor with fixed ceiling height.
SLAM architecture for drones and UAVs imposes the tightest memory and power budgets; OctoMap or truncated signed distance field (TSDF) volumes are preferred because they encode free space explicitly, enabling 3D trajectory planning without collision-checking an entire point cloud.
Decision Boundaries
Selecting a map representation requires evaluating five criteria against deployment constraints:
- Memory budget — Point clouds scale with scan count and require active management (voxelization, keyframe pruning). OctoMap scales with scene complexity, not time. Meshes are compact post-reconstruction but expensive to update incrementally.
- Update rate — Occupancy grids support incremental Bayesian updates at sensor frame rates (10–30 Hz typical). Point clouds append directly at sensor rate. Meshes require batch reconstruction and are rarely updated online without specialized incremental algorithms such as KinectFusion (Newcombe et al., 2011, IEEE ISMAR).
- Localization query type — Point-to-point or point-to-plane ICP localization favors point clouds. Monte Carlo Localization (particle filter) favors occupancy grids. Photometric or dense visual localization favors meshes.
- Downstream consumer — Motion planners using grid-based search (A*, Dijkstra) require occupancy grids. Renderers and physics engines consume meshes. Machine learning pipelines for semantic SLAM architecture often operate directly on point clouds with per-point feature vectors.
- Sensor modality — LiDAR naturally produces point clouds; radar-based SLAM outputs sparse Doppler point sets poorly suited to mesh reconstruction; RGB-D cameras feed dense depth images appropriate for TSDF-to-mesh pipelines.
The evaluation and testing framework for SLAM architecture used in research benchmarks such as TUM RGB-D (Technical University of Munich Computer Vision Group) and KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago) measures localization accuracy independently of map representation, but map format directly constrains which algorithms can be benchmarked on those datasets. Engineers evaluating SLAM architecture scalability should note that point cloud maps used in large-scale outdoor deployments commonly exceed 10 GB before compression, whereas equivalent OctoMap representations at 10 cm resolution occupy under 500 MB for kilometer-scale environments.
For GPS-denied environments, the choice between these representations compounds with sensor availability: environments with no LiDAR access rely on monocular or stereo depth estimation, which produces semi-dense point clouds requiring mesh reconstruction to achieve planning-grade geometric accuracy. Industry standards and benchmarks for SLAM architecture do not yet mandate a single map format, leaving the choice as a systems engineering decision governed by the constraints above.
References
- Open Robotics — ROS nav_msgs/OccupancyGrid documentation
- OctoMap: An Efficient Probabilistic 3D Mapping Framework — University of Freiburg