MoCo-Flow: Neural Motion Consensus Flow for Dynamic Humans in Stationary Monocular Cameras

Accepted to Eurographics 2022

Xuelin Chen

Weiyu Li

Tencent AI Lab

Shandong University

Tencent AI Lab

Daniel Cohen-Or

Niloy J. Mitra

Baoquan Chen

Tel Aviv University

University College London

CFCS, Peking University

Adobe Research


Abstract

Synthesizing novel views of dynamic humans from stationary monocular cameras is a popular setting. It is particularly attractive because it requires neither static scenes, controlled environments, nor specialized hardware. Compared with techniques that exploit multi-view observations to constrain the modeling, having only a single fixed viewpoint makes modeling the dynamic scene significantly more under-constrained and ill-posed. In this paper, we introduce Neural Motion Consensus Flow (MoCo-Flow), a representation that models the dynamic scene using a 4D continuous time-variant function. The representation is learned by an optimization that finds the dynamic scene minimizing the rendering error over all observed images. At the heart of our work lies a novel optimization formulation constrained by a motion consensus regularization on the motion flow. We extensively evaluate MoCo-Flow on several datasets containing human motions of varying complexity, and compare it, both qualitatively and quantitatively, with several baseline methods and variants of our method.
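As a hedged sketch, the optimization described above can be written with the notation of the method overview (canonical NeRF \(F\), appearance codes \(l_i\), backward flow \(\mathcal{M}^{bw}\), forward flow \(\mathcal{M}^{fw}\)). We assume a squared per-ray color error for the rendering term, a cycle-consistency form for the motion consensus term, and a weight \(\lambda\) combining them; these are our assumptions, and the paper's exact terms and weighting may differ:

\[
\min_{F,\,\mathcal{M}^{bw},\,\mathcal{M}^{fw},\,\{l_i\}} \;
\underbrace{\sum_{i} \sum_{\mathbf{r} \in I_i} \big\| \hat{C}(\mathbf{r}, t_i) - C_i(\mathbf{r}) \big\|_2^2}_{\mathcal{L}_{photo}}
\;+\; \lambda \,
\underbrace{\sum_{i} \mathbb{E}_{x} \big\| \mathcal{M}^{fw}\big(\mathcal{M}^{bw}(x, t_i), t_i\big) - x \big\|_2^2}_{\mathcal{L}_{moco}}
\]

Here \(\hat{C}(\mathbf{r}, t_i)\) is the color volume-rendered along ray \(\mathbf{r}\) from \(F(\mathcal{M}^{bw}(x, t_i), l_i)\), \(C_i(\mathbf{r})\) is the observed pixel color in frame \(I_i\), and the expectation runs over sample points \(x\) along the traced rays.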

Figure 1: (Left) Multi-view camera setup for full observation of the dynamic scene; (middle) single free-viewpoint camera setup that captures the dynamics from varying viewpoints; (right, ours) stationary monocular camera that observes the dynamic scene from a single fixed viewpoint only.

Method overview

Figure 2: MoCo-Flow architecture.

The dynamic scene is represented by a shared canonical NeRF and motion flows. We trace rays in the observation space at time \(t_i\) and transform the samples \(x\) along each ray to 3D samples \(x'\) in the canonical space via the neural backward motion flow \(\mathcal{M}^{bw}: (x, t_i) \rightarrow x'\). We then evaluate the color and density of \(x\) at \(t_i\) by querying the canonical NeRF, conditioned on an appearance code \(l_i\): \(F(x', l_i) \rightarrow (c, \sigma)\). The networks are initialized from a rough human mesh estimate and then optimized to minimize the error \(\mathcal{L}_{photo}\) of rendering the captured images. An auxiliary neural forward motion network \(\mathcal{M}^{fw}\) is introduced to constrain the optimization with a motion consensus regularization \(\mathcal{L}_{moco}\) (see the loop formed by the blue arrows in Figure 2).
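The per-sample pipeline described above (backward warp, canonical query, volume rendering) and the two losses can be sketched in NumPy as a toy illustration, not the authors' implementation. The functions `backward_flow`, `forward_flow`, and `canonical_nerf` are hypothetical closed-form stand-ins for the learned networks \(\mathcal{M}^{bw}\), \(\mathcal{M}^{fw}\), and \(F\); the rendering step uses the standard NeRF quadrature; and the motion consensus term is assumed to be a cycle-consistency penalty requiring \(\mathcal{M}^{fw}(\mathcal{M}^{bw}(x, t), t) \approx x\).

```python
import numpy as np

# Hypothetical stand-ins for the learned networks (MLPs in the actual method).
def backward_flow(x, t):
    """Toy M^bw: observation-space points x (N, 3) at time t -> canonical space.
    Assumed to be a pure time-dependent translation."""
    offset = np.array([0.1 * t, 0.0, 0.0])
    return x - offset

def forward_flow(x_canon, t):
    """Toy M^fw: canonical points -> observation space at time t."""
    offset = np.array([0.1 * t, 0.0, 0.0])
    return x_canon + offset

def canonical_nerf(x_canon, appearance_code):
    """Toy F(x', l_i) -> (c, sigma): a Gaussian density blob at the origin whose
    color is scaled by a scalar appearance code (hypothetical simplification)."""
    sigma = 10.0 * np.exp(-np.sum(x_canon ** 2, axis=-1) / 0.1)
    base = np.array([0.8, 0.5, 0.3])
    color = np.clip(base * appearance_code, 0.0, 1.0)
    return np.broadcast_to(color, x_canon.shape), sigma

def render_ray(origin, direction, t, appearance_code, near=0.0, far=2.0, n_samples=64):
    """Standard NeRF quadrature along one ray traced in observation space at time t."""
    depths = np.linspace(near, far, n_samples)
    x = origin + depths[:, None] * direction          # samples x along the ray
    x_canon = backward_flow(x, t)                     # x' = M^bw(x, t)
    color, sigma = canonical_nerf(x_canon, appearance_code)
    deltas = np.append(np.diff(depths), 1e10)         # last interval treated as unbounded
    alpha = 1.0 - np.exp(-sigma * deltas)
    trans = np.cumprod(np.append(1.0, 1.0 - alpha[:-1]))  # T_i = prod_{j<i} (1 - alpha_j)
    weights = trans * alpha
    return (weights[:, None] * color).sum(axis=0), x

def photometric_loss(pred_rgb, gt_rgb):
    """Mean squared color error between rendered and observed pixels."""
    return float(np.mean((pred_rgb - gt_rgb) ** 2))

def motion_consensus_loss(x, t):
    """Assumed cycle-consistency form: x -> M^bw -> M^fw should return to x."""
    x_cycle = forward_flow(backward_flow(x, t), t)
    return float(np.mean(np.sum((x_cycle - x) ** 2, axis=-1)))

# Usage: one ray at time t = 1.0 that passes through the translated blob.
origin = np.array([0.1, 0.0, -1.0])
direction = np.array([0.0, 0.0, 1.0])
rgb, samples = render_ray(origin, direction, t=1.0, appearance_code=1.0)
l_moco = motion_consensus_loss(samples, t=1.0)
```

Because the toy forward flow is the exact inverse of the backward flow, the consensus loss is numerically zero here; with learned networks, such a penalty pushes \(\mathcal{M}^{bw}\) and \(\mathcal{M}^{fw}\) toward agreeing with each other.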


Results

Results on the People-Snapshot, AIST, and ZJU-MoCap datasets.

Comparison

Columns: GT, D-NeRF*, NSFF, NeuralBody, MoCo-Flow.

* Note: D-NeRF failed on this task and output blank images. NSFF only supports reconstruction in NDC space, and it is non-trivial to adapt it to non-NDC space.
