edit to clarify a misconception in the comments: this is an Instagram post, so “caption” refers to the description under the image or video
as an example, this text i am typing now is also a “caption”
just saying because someone started a debate after mistaking this for a post about subtitles (aka “closed captions”), and that’s just not the case 👍
Automatic subtitles like the ones on YouTube use speech-recognition Machine Learning, NOT a Large Language Model.
I used YouTube only as a basic comparison, as that’s the one everybody has some experience with.