How to add bounding box coordinates to find depth from Monodepth2 model

April 02, 2022, at 11:50 PM

BACKGROUND

The Monodepth2 code predicts a disparity map for an image. From the disparity map it derives depth values bounded by a minimum and a maximum depth (0.1 and 100 in the call below). The code block for that is as follows:

        # PREDICTION
        input_image = input_image.to(device)
        features = encoder(input_image)
        outputs = depth_decoder(features)
        disp = outputs[("disp", 0)]
        disp_resized = torch.nn.functional.interpolate(
            disp, (original_height, original_width), mode="bilinear", align_corners=False)
        # Saving numpy file
        output_name = os.path.splitext(os.path.basename(image_path))[0]
        scaled_disp, depth = disp_to_depth(disp, 0.1, 100)
        if args.pred_metric_depth:
            name_dest_npy = os.path.join(output_directory, "{}_depth.npy".format(output_name))
            metric_depth = STEREO_SCALE_FACTOR * depth.cpu().numpy()
            np.save(name_dest_npy, metric_depth)
        else:
            name_dest_npy = os.path.join(output_directory, "{}_disp.npy".format(output_name))
            np.save(name_dest_npy, scaled_disp.cpu().numpy())
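
For reference, the conversion done by `disp_to_depth` can be sketched in NumPy. This is a minimal sketch, assuming the network output is a disparity in [0, 1] that is mapped linearly onto [1/max_depth, 1/min_depth] and then inverted. The name `disp_to_depth_np` is hypothetical and is not part of the repository.

```python
import numpy as np


def disp_to_depth_np(disp, min_depth, max_depth):
    """Map a normalized disparity in [0, 1] to scaled disparity and depth.

    Hypothetical NumPy helper for illustration only.
    """
    disp = np.asarray(disp, dtype=np.float64)
    min_disp = 1.0 / max_depth   # smallest disparity <-> farthest point
    max_disp = 1.0 / min_depth   # largest disparity <-> nearest point
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    depth = 1.0 / scaled_disp    # depth is the inverse of scaled disparity
    return scaled_disp, depth
```

Under this assumption, a disparity of 0 maps to the maximum depth and a disparity of 1 maps to the minimum depth, so every value in the depth map stays between the two bounds.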

WHAT I AM TRYING TO DO

I am trying to pass in bounding box coordinates (which I already have in the form x1, y1, x2, y2) and predict the depth values at the center of that bounding box only. I have amended the code as follows.

I added an argument to the script:

    parser.add_argument("--depth_pos", nargs='+', type=int,
                        help = 'set the position within the image where to find the depth')

I then pass the values as top, left, 10, 10, which gives me a 10x10 box starting at the top-left corner of the bbox:

            # Saving numpy file
            output_name = os.path.splitext(os.path.basename(image_path))[0]
            scaled_disp, depth = disp_to_depth(disp, 0.1, 100)
            if args.pred_metric_depth:
                name_dest_npy = os.path.join(output_directory, "{}_depth.npy".format(output_name))
                metric_depth = STEREO_SCALE_FACTOR * depth.cpu().numpy()
                np.save(name_dest_npy, metric_depth)
                #print('max',depth.cpu().numpy().max(), 'min', depth.cpu().numpy().min())
                # Modify start
                top = args.depth_pos[0]
                left = args.depth_pos[1]
                w = args.depth_pos[2]
                h = args.depth_pos[3]
                print("{}, {}, {}, {}".format(top, left, w, h))
        
                disp_resized = torch.nn.functional.interpolate(
                    disp, (original_height, original_width), mode="bilinear", align_corners=False)
                _, depth_resized = disp_to_depth(disp_resized, 0.1, 100)
                metric_depth_resized = STEREO_SCALE_FACTOR * depth_resized.cpu().numpy()
                sub_metric_depth_resized = metric_depth_resized[:, :, top:top+h, left:left+w]
                print("target depth: {}".format(np.mean(sub_metric_depth_resized)))
                # Modify end
            else:
                name_dest_npy = os.path.join(output_directory, "{}_disp.npy".format(output_name))
                np.save(name_dest_npy, scaled_disp.cpu().numpy())

PROBLEM WITH MY APPROACH

This code block does not predict the values at the center of the bounding box. Instead, it only takes the top and left values and creates a 10x10 box from them. I do get a depth value, but it is not the one I want. The output looks like this:

(detectron2) xubuntu@4fe306cfd731:/homelocal/monodepth2$ python test_simple.py --image_path assets/home_image.jpg --model_name mono+stereo_640x192 --pred_metric_depth --depth_pos 737 545 10 10
-> Loading model from  models/mono+stereo_640x192
   Loading pretrained encoder
   Loading pretrained decoder
-> Predicting on 1 test images
737, 545, 10, 10
target depth: 29.860118865966797
   Processed 1 of 1 images - saved predictions to:
   - assets/home_image_disp.jpeg
   - assets/home_image_depth.npy
-> Done!

CONCLUSION

I would like to pass the bbox coordinates into the code block so that the depth is predicted only at the center of the bbox.
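
One possible way to do this is sketched below. It assumes the depth map has already been resized to the original image resolution (like `metric_depth_resized` above) and that the box is given in pixel coordinates as x1, y1, x2, y2. The helper name `bbox_center_depth` and its `window` parameter are hypothetical, introduced only for illustration.

```python
import numpy as np


def bbox_center_depth(depth_map, x1, y1, x2, y2, window=10):
    """Mean depth in a small window around the center of a bounding box.

    Hypothetical helper: depth_map may have shape (H, W) or (1, 1, H, W).
    """
    depth_map = np.squeeze(np.asarray(depth_map))
    if depth_map.ndim != 2:
        raise ValueError("expected a single-image depth map")
    height, width = depth_map.shape

    # Center of the box in pixel coordinates.
    cx = (x1 + x2) // 2
    cy = (y1 + y2) // 2
    if not (0 <= cx < width and 0 <= cy < height):
        raise ValueError("bbox center lies outside the depth map")

    # Window around the center, clipped to the image borders.
    left = max(cx - window // 2, 0)
    top = max(cy - window // 2, 0)
    right = min(cx - window // 2 + window, width)
    bottom = min(cy - window // 2 + window, height)

    # NumPy indexes rows (y) first, then columns (x).
    patch = depth_map[top:bottom, left:right]
    return float(patch.mean())
```

In the script above, the four `--depth_pos` integers could then be read as x1, y1, x2, y2 and passed together with `metric_depth_resized`. Setting `window=1` would return the depth of the single center pixel, while a larger window averages out some of the noise.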
