How to add bounding box coordinates to find depth from Monodepth2 model

April 02, 2022, at 11:50 PM

BACKGROUND

The Monodepth2 code predicts a disparity map for an image and then converts that disparity into depth, bounded by the min and max depth values passed to disp_to_depth (0.1 and 100 below). The code block for that is as follows:

        # PREDICTION
        input_image = input_image.to(device)
        features = encoder(input_image)
        outputs = depth_decoder(features)
        disp = outputs[("disp", 0)]
        disp_resized = torch.nn.functional.interpolate(
            disp, (original_height, original_width), mode="bilinear", align_corners=False)
        # Saving numpy file
        output_name = os.path.splitext(os.path.basename(image_path))[0]
        scaled_disp, depth = disp_to_depth(disp, 0.1, 100)
        if args.pred_metric_depth:
            name_dest_npy = os.path.join(output_directory, "{}_depth.npy".format(output_name))
            metric_depth = STEREO_SCALE_FACTOR * depth.cpu().numpy()
            np.save(name_dest_npy, metric_depth)
        else:
            name_dest_npy = os.path.join(output_directory, "{}_disp.npy".format(output_name))
            np.save(name_dest_npy, scaled_disp.cpu().numpy())
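
For reference, here is a minimal standalone NumPy sketch of what disp_to_depth(disp, 0.1, 100) computes. To my understanding it mirrors the formula in Monodepth2's own disp_to_depth: the sigmoid disparity (values in [0, 1]) is mapped linearly into [1/max_depth, 1/min_depth] and then inverted. It is a reimplementation for illustration, not the library function itself; the "metric" depth in the code above is this depth multiplied by STEREO_SCALE_FACTOR.

```python
import numpy as np


def disp_to_depth(disp, min_depth, max_depth):
    """Convert a sigmoid disparity map (values in [0, 1]) to depth.

    Illustrative reimplementation: disparity is rescaled to the range
    [1 / max_depth, 1 / min_depth] and then inverted to get depth.
    """
    min_disp = 1.0 / max_depth
    max_disp = 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    depth = 1.0 / scaled_disp
    return scaled_disp, depth


# Disparity 0 maps to max_depth and disparity 1 maps to min_depth.
disp = np.array([0.0, 0.5, 1.0])
scaled, depth = disp_to_depth(disp, 0.1, 100)
print(depth)
```

So depth is never outside [0.1, 100] before the stereo scale factor is applied.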

WHAT I AM TRYING TO DO

I am trying to input bounding box coordinates (which I already have in the form x1, y1, x2, y2) and get the depth value at the center of that bounding box only. I have amended the code as follows:
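
As a small sketch of the first step, the center of an x1, y1, x2, y2 box can be computed as below, assuming the box is in pixel coordinates of the original image (the same original_width x original_height size the disparity is resized to) and that x1 < x2, y1 < y2. The helper name bbox_center is hypothetical. The key point is that x runs along columns and y along rows, so the center pixel of an array shaped (..., H, W) is indexed as [..., cy, cx].

```python
def bbox_center(x1, y1, x2, y2):
    """Return the integer (cx, cy) pixel center of an x1, y1, x2, y2 box.

    x runs along image columns (width) and y along rows (height),
    so in a depth array of shape (..., H, W) the center pixel is
    depth[..., cy, cx].
    """
    cx = (x1 + x2) // 2
    cy = (y1 + y2) // 2
    return cx, cy


# A box from (500, 700) to (600, 780) has its center at (550, 740).
print(bbox_center(500, 700, 600, 780))
```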

First, I added an argument to the script:

    parser.add_argument("--depth_pos", nargs='+', type=int,
                        help='set the position within the image where to find the depth')
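
If the input is always a box in x1, y1, x2, y2 form, one option is an argument that takes exactly four integers. This is a sketch only: the flag name --bbox is hypothetical and not part of Monodepth2's test_simple.py. With nargs=4, argparse itself rejects any other number of values, which nargs='+' does not.

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical flag: the bounding box corners in x1 y1 x2 y2 order.
# nargs=4 makes argparse require exactly four integers.
parser.add_argument("--bbox", nargs=4, type=int, metavar=("X1", "Y1", "X2", "Y2"),
                    help="bounding box (x1 y1 x2 y2) whose center depth is reported")

# Simulate a command line here instead of reading sys.argv.
args = parser.parse_args(["--bbox", "500", "700", "600", "780"])
x1, y1, x2, y2 = args.bbox
print(x1, y1, x2, y2)
```

When the flag is omitted, args.bbox is None, so the script can skip the bbox step in that case.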

I then pass the values as top, left, 10, 10, which gives me a 10x10 box starting at the top-left corner of the bbox:

            # Saving numpy file
            output_name = os.path.splitext(os.path.basename(image_path))[0]
            scaled_disp, depth = disp_to_depth(disp, 0.1, 100)
            if args.pred_metric_depth:
                name_dest_npy = os.path.join(output_directory, "{}_depth.npy".format(output_name))
                metric_depth = STEREO_SCALE_FACTOR * depth.cpu().numpy()
                np.save(name_dest_npy, metric_depth)
                #print('max',depth.cpu().numpy().max(), 'min', depth.cpu().numpy().min())
                # Modify start
                top = args.depth_pos[0]
                left = args.depth_pos[1]
                w = args.depth_pos[2]
                h = args.depth_pos[3]
                print("{}, {}, {}, {}".format(top, left, w, h))
        
                disp_resized = torch.nn.functional.interpolate(
                    disp, (original_height, original_width), mode="bilinear", align_corners=False)
                _, depth_resized = disp_to_depth(disp_resized, 0.1, 100)
                metric_depth_resized = STEREO_SCALE_FACTOR * depth_resized.cpu().numpy()
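                # Indexing is [batch, channel, row, column]: this takes an h x w patch
                # whose top-left corner is (top, left), not one centered on the bbox
                sub_metric_depth_resized = metric_depth_resized[:, :, top:top+h, left:left+w]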
                print("target depth: {}".format(np.mean(sub_metric_depth_resized)))
                # Modify end
            else:
                name_dest_npy = os.path.join(output_directory, "{}_disp.npy".format(output_name))
                np.save(name_dest_npy, scaled_disp.cpu().numpy())

PROBLEM WITH MY APPROACH

This code does not take the depth from the center of the bounding box. It only uses the top and left values and builds a 10x10 box starting from that corner. I do get a depth value, but it is not the one I want. The output looks like this:

    (detectron2) xubuntu@4fe306cfd731:/homelocal/monodepth2$ python test_simple.py --image_path assets/home_image.jpg --model_name mono+stereo_640x192 --pred_metric_depth --depth_pos 737 545 10 10
    -> Loading model from  models/mono+stereo_640x192
       Loading pretrained encoder
       Loading pretrained decoder
    -> Predicting on 1 test images
    737, 545, 10, 10
    target depth: 29.860118865966797
       Processed 1 of 1 images - saved predictions to:
       - assets/home_image_disp.jpeg
       - assets/home_image_depth.npy
    -> Done!

CONCLUSION

I would like to pass the bbox coordinates into this code so that it reports the depth only at the center of the bbox.
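
A minimal sketch of one way to do this, assuming metric_depth_resized has shape (1, 1, H, W) as in the modified code above and the bbox is in pixels of the original image. The helper name depth_at_bbox_center and its half parameter are hypothetical. It takes a small square window centered on the bbox center (half=0 gives just the center pixel, half=5 an 11x11 window), clamps it at the image borders, and returns the median; using the median rather than np.mean is my own choice, to reduce the influence of depth edges near object boundaries. It assumes the bbox center lies inside the image.

```python
import numpy as np


def depth_at_bbox_center(metric_depth, bbox, half=5):
    """Median depth in a (2*half+1) x (2*half+1) window at the bbox center.

    metric_depth: array shaped (1, 1, H, W), e.g. metric_depth_resized.
    bbox: (x1, y1, x2, y2) in pixels of the same H x W image.
    half: window half-size; 0 returns the single center pixel.
    """
    _, _, height, width = metric_depth.shape
    x1, y1, x2, y2 = bbox
    cx = (x1 + x2) // 2  # column index
    cy = (y1 + y2) // 2  # row index
    # Clamp the window so it stays inside the image near the borders.
    row0, row1 = max(cy - half, 0), min(cy + half + 1, height)
    col0, col1 = max(cx - half, 0), min(cx + half + 1, width)
    # Rows (y) come before columns (x) in the array index.
    window = metric_depth[0, 0, row0:row1, col0:col1]
    return float(np.median(window))


# Synthetic 20 x 30 "depth map" whose value at (row, col) is row * 30 + col.
depth = np.arange(20 * 30, dtype=np.float32).reshape(1, 1, 20, 30)
print(depth_at_bbox_center(depth, (10, 5, 20, 15), half=2))  # prints 315.0
```

In the question's code, this would replace the top/left slicing, e.g. depth_at_bbox_center(metric_depth_resized, (x1, y1, x2, y2)) with the four bbox values read from the command line.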
