Planar Reconstruction - Piece-wise Planar Reconstruction with Deep Learning
0x00 Datasets
- ScanNet [1,3,4]
- SYNTHIA [2,3]
- Cityscapes [2]
- NYU Depth Dataset [1,3,4]
- Labeling method
ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. Annotated with 3D camera poses, surface reconstructions, and instance-level semantic segmentations.
SYNTHIA: The SYNTHetic collection of Imagery and Annotations. 8 RGB cameras forming a binocular 360° camera, plus 8 depth sensors.
Cityscapes: Benchmark suite and evaluation server for pixel-level and instance-level semantic labeling.
video frames / stereo / GPS / vehicle odometry
NYU Depth Dataset: recorded by both the RGB and depth cameras of the Microsoft Kinect.
- Dense multi-class labels with instance number (cup1, cup2, cup3, etc).
- Raw: The raw rgb, depth and accelerometer data as provided by the Kinect.
- Toolbox: Useful functions for manipulating the data and labels.
Obtaining ground-truth plane annotations: it is difficult to detect planes directly from the 3D point cloud using the J-Linkage method, so each dataset is labeled with its own procedure.
Labeling method:
| ScanNet: |
|---|
| 1. Fit planes to a consolidated mesh (merge two planes if normal difference < 20° && distance < 5 cm) |
| 2. Project planes back to individual frames |
| SYNTHIA: |
|---|
| 1. Manually draw a quadrilateral region |
| 2. Obtain the plane parameters and variance of the distance distribution |
| 3. Find all pixels that belong to the plane by using the plane parameters and the variance estimate |
| Cityscapes: |
|---|
| 1. “planar” = {ground, road, sidewalk, parking, rail track, building, wall, fence, guard rail, bridge, and terrain} |
| 2. Manually label the boundary of each plane using polygons |
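The ScanNet steps above (fit planes to the mesh, then merge near-duplicates) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are mine, and the merge test assumes consistently oriented normals.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through 3D points: returns (unit normal n, offset d) with n·x = d."""
    centroid = points.mean(axis=0)
    # The normal is the direction of least variance: the last right-singular vector.
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    return normal, normal @ centroid

def should_merge(n1, d1, n2, d2, max_angle_deg=20.0, max_dist=0.05):
    """Merge criterion from the ScanNet labeling step: normal difference < 20° and offset difference < 5 cm."""
    angle = np.degrees(np.arccos(np.clip(abs(n1 @ n2), 0.0, 1.0)))
    return angle < max_angle_deg and abs(d1 - d2) < max_dist
```

For example, two patches 1 cm apart with parallel normals merge, while a floor patch and a wall patch do not.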
0x01 PlaneNet
[CVPR 2018] Liu, Chen, et al. Washington University in St. Louis, Adobe.
The first deep neural architecture for piece-wise planar depthmap reconstruction from a single RGB image.
Pipeline
DRN: Dilated Residual Networks (2096 channels)
CRF: Conditional Random Field Algorithm
| Step | Loss |
|---|---|
| Plane parameters | order-agnostic L2: each ground-truth plane is matched to its nearest predicted plane |
| Plane segmentation | softmax cross entropy |
| Non-planar depth | per-pixel loss between the ground-truth and predicted depthmaps |
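The plane-parameter loss must be order-agnostic because the network's K predicted planes come in no fixed order. A minimal numpy sketch of the matching idea (function name is mine; PlaneNet's exact weighting may differ):

```python
import numpy as np

def plane_parameter_loss(pred, gt):
    """Order-agnostic L2: each ground-truth plane is matched to the closest predicted plane.

    pred: (K, 3) predicted plane parameters; gt: (M, 3) ground-truth parameters, M <= K.
    """
    # Pairwise squared distances between every ground-truth and predicted plane.
    d2 = ((gt[:, None, :] - pred[None, :, :]) ** 2).sum(axis=-1)  # (M, K)
    # Sum, over ground-truth planes, of the distance to the nearest prediction.
    return d2.min(axis=1).sum()
```

Because of the `min`, permuting the predicted planes leaves the loss unchanged.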
0x02 Plane Recover
[ECCV 2018] Fengting Yang and Zihan Zhou, Pennsylvania State University.
Recovering 3D Planes from a Single Image. Proposes a novel plane structure-induced loss.
| Step | Loss |
|---|---|
| Plane prediction | plane structure-induced loss: pixels assigned to a plane should satisfy that plane's equation |
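The structure-induced loss supervises planes with depth rather than plane annotations: a pixel's 3D point Q, back-projected from ground-truth depth, should satisfy the plane equation n·Q = 1 of the plane it is softly assigned to. A hedged numpy sketch of that idea (names and normalization are mine, not the paper's exact formulation):

```python
import numpy as np

def structure_induced_loss(plane_params, probs, points):
    """Probability-weighted deviation of each 3D point from its assigned plane.

    plane_params: (K, 3) per-plane parameters n, with plane equation n·Q = 1.
    probs: (N, K) soft assignment of each pixel to each plane.
    points: (N, 3) 3D points back-projected from ground-truth depth.
    """
    # Residual |n_j · Q_i - 1| for every pixel/plane pair.
    residual = np.abs(points @ plane_params.T - 1.0)  # (N, K)
    # Weight by the segmentation probability and average over pixels.
    return (probs * residual).sum(axis=1).mean()
```

Points lying exactly on their assigned plane give zero loss, so minimizing it drives segmentation and plane parameters jointly.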
0x03 PlaneRCNN
[CVPR 2019] Liu, Chen, et al. NVIDIA, Washington University in St. Louis, SenseTime, Simon Fraser University.
0x04 PlanarReconstruction
[CVPR 2019] Yu, Zehao, et al. ShanghaiTech University, The Pennsylvania State University
Single-Image Piece-wise Planar 3D Reconstruction via Associative Embedding
| Step | Loss |
|---|---|
| Segmentation | balanced cross entropy |
| Embedding | discriminative loss |
| Per-pixel plane parameters | L1 loss |
| Instance parameters | deviation of each pixel's 3D point from its instance plane |
Embedding:
associative embedding (End-to-End Learning for Joint Detection and Grouping);
Discriminative loss function
- An image can contain an arbitrary number of instances
- The labeling is permutation-invariant: it does not matter which specific label an instance gets, as long as it is different from all other instance labels.
Here, the discriminative loss pulls pixel embeddings toward their instance mean and pushes the means of different instances apart; schematically (following De Brabandere et al., up to squaring of the hinge terms):

$$L_{pull} = \frac{1}{C}\sum_{c=1}^{C}\frac{1}{N_c}\sum_{i=1}^{N_c}\max\bigl(\lVert \mu_c - x_i \rVert - \delta_v,\ 0\bigr)$$

$$L_{push} = \frac{1}{C(C-1)}\sum_{c_A \neq c_B}\max\bigl(\delta_d - \lVert \mu_{c_A} - \mu_{c_B} \rVert,\ 0\bigr)$$

where $x_i$ are pixel embeddings, $\mu_c$ is the mean embedding of instance $c$, and $\delta_v, \delta_d$ are the pull/push margins.
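A minimal numpy sketch of the pull/push discriminative loss. All names and the margin defaults are mine for illustration; this is the unsquared-hinge variant, not necessarily the paper's exact form:

```python
import numpy as np

def discriminative_loss(embeddings, labels, delta_v=0.5, delta_d=1.5):
    """Pull pixel embeddings toward their instance mean; push different instance means apart.

    embeddings: (N, D) per-pixel embedding vectors; labels: (N,) integer instance ids.
    """
    ids = np.unique(labels)
    means = np.stack([embeddings[labels == c].mean(axis=0) for c in ids])
    # Pull term: hinged distance of each pixel to its own instance mean.
    pull = 0.0
    for mu, c in zip(means, ids):
        dist = np.linalg.norm(embeddings[labels == c] - mu, axis=1)
        pull += np.maximum(dist - delta_v, 0.0).mean()
    pull /= len(ids)
    # Push term: hinged pairwise distance between instance means.
    push = 0.0
    if len(ids) > 1:
        for a in range(len(ids)):
            for b in range(len(ids)):
                if a != b:
                    push += max(delta_d - np.linalg.norm(means[a] - means[b]), 0.0)
        push /= len(ids) * (len(ids) - 1)
    return pull + push
```

Note the permutation-invariance property from the list above: relabeling the instances permutes the clusters but leaves the loss unchanged.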
Instance Parameter Loss:
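The equation that belongs here appears to have been dropped; as a sketch (my notation, and the exact normalization may differ from the paper): the instance parameter $\hat{n}_j$ is the segmentation-weighted mean of the per-pixel parameters, and the loss measures how far each pixel's 3D point $Q_i$ (back-projected from ground-truth depth) lies from its instance plane $n^{\top} Q = 1$:

$$\hat{n}_j = \frac{\sum_i \tilde{S}_{ij}\, n_i}{\sum_i \tilde{S}_{ij}}, \qquad L_{IP} = \frac{1}{C}\sum_{j=1}^{C} \frac{1}{N_j}\sum_{i} \tilde{S}_{ij}\,\bigl|\hat{n}_j^{\top} Q_i - 1\bigr|$$

where $\tilde{S}_{ij}$ is the soft assignment of pixel $i$ to instance $j$ and $C$ is the number of plane instances.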