edit to clarify a misconception in the comments: this is an instagram post, so “caption” refers to the description under the image or video

as an example, this text i am typing now is also a “caption”

just saying because someone started a debate misunderstanding this to be about subtitles (aka “closed captions”) and that’s just not the case 👍

  • vzqq@lemmy.blahaj.zone
    5 days ago

    Yes and no. There are specialized models that perform better than general-purpose LLMs with vastly lower resource use. But… the output part is essentially a language model too, so it’s prone to a lot of the same issues.

    They perform A LOT better than traditional models though. So much better it’s not even funny.