Creating 3D Videos from RGB Videos | by Berkan Zorlubas | Aug, 2023

The training part is straightforward. To train the neural network with your custom dataset, simply run the following command in the terminal:

python --net scene_flow_motion_field ^
--dataset custom_sequence --track_id custom ^
--log_time --epoch_batches 2000 --epoch 10 ^
--lr 1e-6 --html_logger --vali_batches 150 ^
--batch_size 1 --optim adam --vis_batches_vali 1 ^
--vis_every_vali 1 --vis_every_train 1 ^
--vis_batches_train 1 --vis_at_start --gpu 0 ^
--save_net 1 --workers 1 --one_way ^
--loss_type l1 --l1_mul 0 --acc_mul 1 ^
--disp_mul 1 --warm_sf 5 --scene_lr_mul 1000 ^
--repeat 1 --flow_mul 1 --sf_mag_div 100 ^
--time_dependent --gaps 1,2,4,6,8 --midas ^
--use_disp --logdir 'logdir/' ^
--suffix 'track_{track_id}' ^

After training the neural network for 10 epochs, I observed that the loss began to saturate, and as a result, I decided not to continue training for additional epochs. Below is the loss curve graph of my training:

Loss vs. epoch curve — Image by the author

Throughout the training, all checkpoints are stored in the directory ./logdir/nets/. Furthermore, after every epoch, the training script generates test visualizations in the directory ./logdir/visualize. These visualizations can be particularly helpful in identifying any potential issues that might have occurred during training, in addition to monitoring the loss.

Using the latest checkpoint, we now generate the depth map of each frame with script. Simply run the following command in the terminal:

python --net scene_flow_motion_field ^
--dataset custom_sequence --workers 1 ^
--output_dir .test_resultscustom_sequence ^
--epoch 10 --html_logger --batch_size 1 ^
--gpu 0 --track_id custom --suffix custom ^
--checkpoint_path .logdir

This will generate one .npz file for each frame(a dictionary file consisting of RGB frame, depth, camera pose, flow to next image, and so on), and three depth renders (ground truth, MiDaS, and trained network’s estimation) for each frame.

In the last step, we load the batched .npz files frame-by-frame and create colored point clouds by using the depth and RGB information. I use the open3d library to create and render point clouds in Python. It is a powerful tool with which you can create virtual cameras in 3D space and take captures of your point clouds with them. You can also edit/manipulate your point clouds; I applied open3d’s built-in outlier removal functions to remove flickery and noisy points.

While I won’t delve into the specific details of my open3d usage to keep this blog post succinct, I have included the script, render_pointcloud_video.pywhich should be self-explanatory. If you have any questions or require further clarification, please don’t hesitate to ask.

Here is what the point cloud and depth map videos look like for the video I processed.

(Left) Stock footage provided by Videvo, downloaded from | (Right) Depthmap video created by the author | (Bottom) Colorized point cloud video created by the author

A higher-resolution version of this animation is uploaded to YouTube.

Source link

Leave a Comment