Planar Reconstruction - Deep Learning for Plane Reconstruction

(Figure: input image → piece-wise planar segmentation → reconstructed depthmap → texture-mapped 3D model)

0x00 Datasets

  • ScanNet [1,3,4]
  • SYNTHIA [2,3]
  • Cityscapes [2]
  • NYU Depth Dataset [1,3,4]
  • Labeling method

ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes, annotated with 3D camera poses, surface reconstructions, and instance-level semantic segmentations.

SYNTHIA: The SYNTHetic collection of Imagery and Annotations. 8 RGB cameras forming a binocular 360º camera, plus 8 depth sensors.

Cityscapes: Benchmark suite and evaluation server for pixel-level and instance-level semantic labeling.
video frames / stereo / GPS / vehicle odometry

NYU Depth Dataset: recorded by both the RGB and depth cameras of the Microsoft Kinect.

  • Dense multi-class labels with instance number (cup1, cup2, cup3, etc).
  • Raw: The raw rgb, depth and accelerometer data as provided by the Kinect.
  • Toolbox: Useful functions for manipulating the data and labels.

Obtaining ground-truth plane annotations:

It is difficult to detect planes from the 3D point cloud using the J-Linkage method.

(Figure c-d: plane fitting results generated by J-Linkage with δ = 0.5 and δ = 2, respectively.)

Labeling method:

ScanNet:
1. Fit planes to a consolidated mesh (merge two planes if their normal difference < 20° and offset distance < 5 cm)
2. Project the planes back to individual frames
SYNTHIA:
1. Manually draw a quadrilateral region
2. Obtain the plane parameters and variance of the distance distribution
3. Find all pixels that belong to the plane by using the plane parameters and the variance estimate
Cityscapes:
1. “planar” = {ground, road, sidewalk, parking, rail track, building, wall, fence, guard rail, bridge, and terrain}
2. Manually label the boundary of each plane using polygons
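The ScanNet merging rule in step 1 can be sketched as a simple predicate. This is a minimal illustration, not the authors' code; the plane parameterization (unit normal n and offset d, plane equation n·x = d) and the function name are my own assumptions:

```python
import numpy as np

def should_merge(n1, d1, n2, d2, angle_thresh_deg=20.0, dist_thresh_m=0.05):
    """Return True when two fitted planes are close enough to merge.

    Thresholds follow the labeling rule quoted above:
    normal difference < 20 degrees and offset distance < 5 cm.
    """
    n1 = n1 / np.linalg.norm(n1)
    n2 = n2 / np.linalg.norm(n2)
    # angle between the two unit normals, in degrees
    angle = np.degrees(np.arccos(np.clip(np.dot(n1, n2), -1.0, 1.0)))
    return angle < angle_thresh_deg and abs(d1 - d2) < dist_thresh_m

# two nearly coplanar floor patches -> merged
print(should_merge(np.array([0.0, 0.0, 1.0]), 0.00,
                   np.array([0.0, 0.05, 1.0]), 0.03))  # True
```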

0x01 PlaneNet

[CVPR 2018] Liu, Chen, et al. Washington University in St. Louis, Adobe.

The first deep neural architecture for piece-wise planar depthmap reconstruction from a single RGB image.

Pipeline

DRN: Dilated Residual Networks (2096 channels)

CRF: Conditional Random Field Algorithm

| Step | Loss |
| --- | --- |
| Plane parameter | $L^P = \sum_{i=1}^{K^*} \min_{j \in [1,K]} \lVert P_i^* - P_j \rVert_2^2$ $(K = 10)$ |
| Plane segmentation (softmax cross entropy) | $L^M = -\sum_{i=1}^{K+1} \sum_{p \in I} \mathbb{1}\big(M^*(p) = i\big) \log M_i(p)$ |
| Non-planar depth (ground truth vs. predicted depthmap) | $L^D = \sum_{i=1}^{K+1} \sum_{p \in I} M_i(p) \big(D_i(p) - D^*(p)\big)^2$ |

- $M_i(p)$: probability of pixel $p$ belonging to the $i$-th plane; $D(p)$: depth value at pixel $p$; superscript $*$: ground truth.
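The plane-parameter loss is a one-sided Chamfer-style distance: each ground-truth plane is matched to its nearest predicted plane. A minimal NumPy sketch (function name and array shapes are my assumptions):

```python
import numpy as np

def plane_param_loss(gt_planes, pred_planes):
    """Sum over ground-truth planes of the squared L2 distance
    to the nearest predicted plane parameter.

    gt_planes:   (K*, 3) ground-truth plane parameters
    pred_planes: (K, 3)  predicted plane parameters (K = 10 in the paper)
    """
    # pairwise squared distances, shape (K*, K)
    d2 = ((gt_planes[:, None, :] - pred_planes[None, :, :]) ** 2).sum(-1)
    # nearest predicted plane per ground-truth plane, then sum
    return d2.min(axis=1).sum()

gt = np.array([[0.0, 0.0, 1.0]])
pred = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
print(plane_param_loss(gt, pred))  # 0.0 — an exact match exists
```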

0x02 Plane Recover

[ECCV 2018] Fengting Yang and Zihan Zhou, Pennsylvania State University.

Recovering 3D Planes from a Single Image. Proposes a novel plane structure-induced loss.

| Step | Loss |
| --- | --- |
| Plane probability (binary cross entropy) | $L_{reg}(S_i) = -\sum_q \big[ z(q) \log p_{plane}(q) + (1 - z(q)) \log(1 - p_{plane}(q)) \big]$ |
| Total loss | $L = \sum_{i=1}^{n} \sum_{j=1}^{m} \Big( \sum_q S_{ij}(q)\, \big\lvert (n_{ij})^T Q(q) - 1 \big\rvert \Big) + \alpha \sum_{i=1}^{n} L_{reg}(S_i)$ |
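A rough NumPy sketch of the two terms: the probability-weighted geometric deviation of per-pixel 3D points from a plane (plane equation $n^T Q = 1$), and the planar/non-planar cross entropy. Function names, array shapes, and the use of a mean rather than a sum in the cross entropy are my own choices:

```python
import numpy as np

def plane_structure_loss(S, n, Q):
    """Geometric term for one predicted plane: accumulate the
    membership-weighted deviation |n^T Q(q) - 1| over all pixels q.

    S: (H*W,)   soft membership of each pixel in this plane
    n: (3,)     plane parameter (plane: n^T Q = 1)
    Q: (H*W, 3) per-pixel 3D points (from depth + camera intrinsics)
    """
    return float(np.sum(S * np.abs(Q @ n - 1.0)))

def plane_prob_loss(z, p_plane, eps=1e-7):
    """Binary cross entropy between the ground-truth planar/non-planar
    mask z and the predicted plane probability p_plane."""
    p = np.clip(p_plane, eps, 1.0 - eps)
    return float(-np.mean(z * np.log(p) + (1.0 - z) * np.log(1.0 - p)))

# points lying exactly on the plane z = 1 give zero geometric loss
n = np.array([0.0, 0.0, 1.0])
Q = np.array([[0.0, 0.0, 1.0], [2.0, 3.0, 1.0]])
print(plane_structure_loss(np.ones(2), n, Q))  # 0.0
```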

0x03 PlaneRCNN

[CVPR 2019] Liu, Chen, et al. NVIDIA, Washington University in St. Louis, SenseTime, Simon Fraser University.

PlaneRCNN: 3D Plane Detection and Reconstruction from a Single Image.

0x04 PlanarReconstruction

[CVPR 2019] Yu, Zehao, et al. ShanghaiTech University, The Pennsylvania State University

Single-Image Piece-wise Planar 3D Reconstruction via Associative Embedding

| Step | Loss |
| --- | --- |
| Segmentation (balanced cross entropy) | $L_S = -(1 - w) \sum_{i \in F} \log p_i - w \sum_{i \in B} \log(1 - p_i)$ |
| Embedding (discriminative loss) | $L_E = L_{pull} + L_{push}$ |
| Per-pixel plane (L1 loss) | $L_{PP} = \frac{1}{N} \sum_{i=1}^{N} \lVert n_i - n_i^* \rVert_1$ |
| Instance parameter | $L_{IP} = \frac{1}{N\tilde{C}} \sum_{j=1}^{\tilde{C}} \sum_{i=1}^{N} S_{ij}\, \lvert n_j^T Q_i - 1 \rvert$ |
| Total loss | $L = L_S + L_E + L_{PP} + L_{IP}$ |

Embedding:
associative embedding (End-to-End Learning for Joint Detection and Grouping);

Discriminative loss function

  • An image can contain an arbitrary number of instances
  • The labeling is permutation-invariant: it does not matter which specific label an instance gets, as long as it is different from all other instance labels.

$L_E = L_{pull} + L_{push}$

where

$$L_{pull} = \frac{1}{C} \sum_{c=1}^{C} \frac{1}{N_c} \sum_{i=1}^{N_c} \max\big(\lVert \mu_c - x_i \rVert - \delta_v,\ 0\big)$$

$$L_{push} = \frac{1}{C(C-1)} \sum_{c_A=1}^{C} \sum_{\substack{c_B=1 \\ c_B \neq c_A}}^{C} \max\big(\delta_d - \lVert \mu_{c_A} - \mu_{c_B} \rVert,\ 0\big)$$

Here, $C$ is the number of clusters (planes) in the ground truth, $N_c$ is the number of elements in cluster $c$, $x_i$ is the pixel embedding, $\mu_c$ is the mean embedding of cluster $c$, and $\delta_v$ and $\delta_d$ are the margins for the "pull" and "push" losses, respectively.
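The pull/push terms can be sketched as follows. This is a non-vectorized NumPy illustration using a plain hinge without squaring, matching the formulas above; all names and shapes are my assumptions:

```python
import numpy as np

def discriminative_loss(x, labels, delta_v=0.5, delta_d=1.5):
    """Pull embeddings toward their cluster mean, push cluster means apart.

    x:      (N, D) pixel embeddings
    labels: (N,)   ground-truth plane id per pixel
    """
    ids = np.unique(labels)
    C = len(ids)
    # per-cluster mean embeddings mu_c
    mus = np.stack([x[labels == c].mean(axis=0) for c in ids])

    # pull: penalize embeddings farther than delta_v from their mean
    pull = 0.0
    for mu, c in zip(mus, ids):
        d = np.linalg.norm(x[labels == c] - mu, axis=1)
        pull += np.maximum(d - delta_v, 0.0).mean()
    pull /= C

    # push: penalize cluster means closer than delta_d to each other
    push = 0.0
    if C > 1:
        for a in range(C):
            for b in range(C):
                if a != b:
                    gap = np.linalg.norm(mus[a] - mus[b])
                    push += max(delta_d - gap, 0.0)
        push /= C * (C - 1)
    return pull + push

# two tight, well-separated clusters incur zero loss
x = np.array([[0.0, 0.0], [0.0, 0.0], [5.0, 5.0], [5.0, 5.0]])
print(discriminative_loss(x, np.array([0, 0, 1, 1])))  # 0.0
```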

Instance Parameter Loss:

$$L_{IP} = \frac{1}{N\tilde{C}} \sum_{j=1}^{\tilde{C}} \sum_{i=1}^{N} S_{ij}\, \lvert n_j^T Q_i - 1 \rvert$$

- $S$: instance segmentation map; $n_j$: predicted plane parameter; $Q_i$: the 3D point at pixel $i$.

$n = \tilde{n}/d$, where $\tilde{n} \in S^2$ and $d$ denote the surface normal and the plane distance to the origin, respectively.
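A small sketch of this parameterization and the instance-parameter loss (function names and array shapes are assumptions):

```python
import numpy as np

def encode_plane(n_tilde, d):
    """Fold unit normal n~ and offset d into the single parameter
    n = n~/d, so the plane equation becomes n^T Q = 1."""
    return n_tilde / d

def instance_param_loss(S, n, Q):
    """Membership-weighted |n_j^T Q_i - 1|, averaged over N pixels
    and C~ plane instances.

    S: (N, C) soft assignment of pixel i to instance j
    n: (C, 3) per-instance plane parameters
    Q: (N, 3) per-pixel 3D points
    """
    N, C = S.shape
    residual = np.abs(Q @ n.T - 1.0)  # (N, C) deviations from each plane
    return float((S * residual).sum() / (N * C))

# plane z = 2: n~ = (0, 0, 1), d = 2  ->  n = (0, 0, 0.5)
n = encode_plane(np.array([0.0, 0.0, 1.0]), 2.0)
Q = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 2.0]])  # points on that plane
print(instance_param_loss(np.ones((2, 1)), n.reshape(1, 3), Q))  # 0.0
```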

0xFF Results

PlaneNet

PlaneRecover

PlaneRCNN

PlanarReconstruction