get_sliced_prediction giving different results with videoframes vs images #1102
Replies: 1 comment
Problem is located and solved.
Hi!
I have a small sports-detection project where I am trying to detect a small ball on a sports pitch (floorball).
I have a custom model that contains only a ball category. The dataset has approximately 2500 images, and the model is trained for 300 epochs.
To try to work my way out of the problem with missing detections, I have added images and trained for more epochs, but I have been unable to brute-force a solution.
The problem is that I get a very low hit rate on predictions when using video as input, and I notice that there are some differences between using get_sliced_prediction on video frames and on still images.
The same problem was raised by another user in a previous thread in the ultralytics repository (https://github.com/orgs/ultralytics/discussions/8121#discussioncomment-8871506). The reply to that thread suggests the problem is caused by not using the same prediction code for images and video.
To verify that this is not the issue here, I am using the same code for both predictions (frames and images), based on the examples/YOLOv8-SAHI-Inference-Video/yolov8_sahi.py code.
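For reference, the shared code path looks roughly like this (a simplified sketch using SAHI's public API; the model path, device, and slice parameters below are placeholders, not my exact settings):

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Sketch of the shared prediction path -- model path and parameters
# are placeholders, not the exact values used in my project.
detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov8",
    model_path="runs/detect/train/weights/best.pt",  # hypothetical path
    confidence_threshold=0.3,
    device="cuda:0",
)

def predict(image):
    # Identical call for both inputs: get_sliced_prediction accepts
    # either an image path (str) or a decoded frame (numpy array).
    return get_sliced_prediction(
        image,
        detection_model,
        slice_height=512,
        slice_width=512,
        overlap_height_ratio=0.2,
        overlap_width_ratio=0.2,
    )
```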
If I split the video into images (200 frames, extracted as sketched below):
- When I run a sliced prediction on these images, I get a hit with confidence in the high 80s and above for 80% of the frames. (The ball is occluded in some frames, so this hit rate is fine.)
- When I run a sliced prediction on the video directly, I get a hit for only 2% of the frames, with confidence in the low 40s.
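The split into images is done along these lines (a sketch; the filenames are illustrative). Note that OpenCV both decodes and writes frames as BGR, so the saved files come back with correct colors when reloaded from a path:

```python
import cv2

cap = cv2.VideoCapture("match.mp4")  # illustrative filename
idx = 0
while True:
    ok, frame = cap.read()  # frame is a BGR numpy array
    if not ok:
        break
    # imwrite expects BGR, so the file on disk has correct channel order
    cv2.imwrite(f"frames/frame_{idx:04d}.png", frame)
    idx += 1
cap.release()
```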
But if I drop the model's confidence_threshold to 0.001 and run the prediction on the video again, I get a very interesting result when I compare the predictions on images (confidence_threshold=0.3) against the predictions on video frames (confidence_threshold=0.001).
Filtering out all the other objects that come up as false positives at the low confidence threshold (random frame selection):
The results here may be coincidental, but to me it looks like there might be a basic math problem in the get_sliced_prediction method when using video frames as input.
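One concrete mismatch worth ruling out (my assumption, not something confirmed in this thread): OpenCV decodes video frames as BGR, while SAHI treats an in-memory numpy array as RGB, so a raw cv2 frame reaches the model with swapped channels, whereas frames saved to disk and reloaded from a path go through PIL in the correct order. A quick check:

```python
import cv2

cap = cv2.VideoCapture("match.mp4")  # illustrative filename
ok, frame = cap.read()
if ok:
    # Assumption: SAHI expects RGB for numpy input, while cv2 yields BGR.
    # Converting makes the video path match the saved-image path.
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    result = predict(rgb_frame)  # predict() from the sketch above
cap.release()
```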
I have tested on both Windows and Linux, and the same result exists on both platforms.
My environment (relevant versions):
On Windows I am using Python 3.9.13 and CUDA 12.2.
On Linux I am using Python 3.10.11 and CUDA 12.4 in an Anaconda environment.