get_sliced_prediction giving different results with videoframes vs images #1102
Replies: 1 comment
Problem is located and solved.
Hi!
I have a small sports-detection project where I am trying to detect a small ball on a sports pitch (floorball).
I have a custom model that contains only a ball category. The dataset has approximately 2500 images, and the model is trained for 300 epochs.
To try to work my way out of the problem with missing detections, I have added images and trained for more epochs, but I have been unable to brute-force a solution.
The problem is that I get a very low hit rate on predictions when using video as input, and I notice that there are some differences between using get_sliced_prediction on video frames and on still images.
The same problem was raised by another user in a previous thread in the ultralytics repository (https://github.com/orgs/ultralytics/discussions/8121#discussioncomment-8871506). The reply to that thread suggests the problem is caused by not using the same prediction code for images and video.
To verify that this is not the issue here, I am using the same code for both predictions (frames and images), based on the examples/YOLOv8-SAHI-Inference-Video/yolov8_sahi.py code.
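For reference, the shared code path looks roughly like this (a simplified sketch using SAHI's public API; the model path, device, and slice parameters below are placeholders, not my exact settings):

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Sketch of the shared prediction path -- model path and parameters
# are placeholders, not the exact values used in my project.
detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov8",
    model_path="runs/detect/train/weights/best.pt",  # hypothetical path
    confidence_threshold=0.3,
    device="cuda:0",
)

def predict(image):
    # Identical call for both inputs: get_sliced_prediction accepts
    # either an image path (str) or a decoded frame (numpy array).
    return get_sliced_prediction(
        image,
        detection_model,
        slice_height=512,
        slice_width=512,
        overlap_height_ratio=0.2,
        overlap_width_ratio=0.2,
    )
```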
If I split the video into images (200 frames, extracted as sketched below):
- When I run a sliced prediction on these images, I get a hit with confidence in the high 80s and above for 80% of the frames. (The ball is occluded in some frames, so this hit rate is fine.)
- When I run a sliced prediction on the video directly, I get a hit for only 2% of the frames, with confidence in the low 40s.
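The split into images is done along these lines (a sketch; the filenames are illustrative). Note that OpenCV both decodes and writes frames as BGR, so the saved files come back with correct colors when reloaded from a path:

```python
import cv2

cap = cv2.VideoCapture("match.mp4")  # illustrative filename
idx = 0
while True:
    ok, frame = cap.read()  # frame is a BGR numpy array
    if not ok:
        break
    # imwrite expects BGR, so the file on disk has correct channel order
    cv2.imwrite(f"frames/frame_{idx:04d}.png", frame)
    idx += 1
cap.release()
```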
But if I drop the model's confidence_threshold to 0.001 and run the prediction on the video again, I get a very interesting result when I compare the predictions on images (confidence_threshold=0.3) against the predictions on video frames (confidence_threshold=0.001).
Filtering out all the other objects that come up as false positives at the low confidence threshold (random frame selection):
The results here may be coincidental, but to me it looks like there might be a basic math problem in the get_sliced_prediction method when using video frames as input.
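One concrete mismatch worth ruling out (my assumption, not something confirmed in this thread): OpenCV decodes video frames as BGR, while SAHI treats an in-memory numpy array as RGB, so a raw cv2 frame reaches the model with swapped channels, whereas frames saved to disk and reloaded from a path go through PIL in the correct order. A quick check:

```python
import cv2

cap = cv2.VideoCapture("match.mp4")  # illustrative filename
ok, frame = cap.read()
if ok:
    # Assumption: SAHI expects RGB for numpy input, while cv2 yields BGR.
    # Converting makes the video path match the saved-image path.
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    result = predict(rgb_frame)  # predict() from the sketch above
cap.release()
```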
I have tested on both Windows and Linux, and the same result exists on both platforms.
My environment (relevant versions):
On Windows I am using Python 3.9.13 and CUDA 12.2.
On Linux I am using Python 3.10.11 and CUDA 12.4 in an Anaconda environment.