  • An LLM is, fundamentally, an equation: map each word to a number, run the numbers through the equation, and map the result back to words, and you have an LLM. If you’re curious, write a name generator using torch with an RNN (plenty of tutorials online) and you’ll get a good feel for how it works.
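
    A minimal sketch of that exercise, assuming PyTorch; the name list, hidden size, and training loop here are toy placeholders, not a serious setup:

    ```python
    # Character-level RNN "name generator": words (chars) in, numbers,
    # equation, numbers out, chars again. That's the whole pattern.
    import torch
    import torch.nn as nn

    names = ["anna", "ben", "carla", "dimitri", "elena"]  # toy dataset
    chars = sorted(set("".join(names))) + ["."]           # "." = end-of-name
    stoi = {c: i for i, c in enumerate(chars)}            # char -> number
    itos = {i: c for c, i in stoi.items()}                # number -> char

    class NameRNN(nn.Module):
        def __init__(self, vocab, hidden=64):
            super().__init__()
            self.embed = nn.Embedding(vocab, hidden)
            self.rnn = nn.RNN(hidden, hidden, batch_first=True)
            self.head = nn.Linear(hidden, vocab)

        def forward(self, x, h=None):
            out, h = self.rnn(self.embed(x), h)
            return self.head(out), h

    model = NameRNN(len(chars))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(200):                       # tiny training loop
        for name in names:
            seq = [stoi[c] for c in name + "."]
            x = torch.tensor(seq[:-1]).unsqueeze(0)   # input chars
            y = torch.tensor(seq[1:])                 # next-char targets
            logits, _ = model(x)
            loss = loss_fn(logits.squeeze(0), y)
            opt.zero_grad(); loss.backward(); opt.step()

    # Sampling: feed each predicted char back in until "." appears.
    x, h, out = torch.tensor([[stoi["a"]]]), None, "a"
    for _ in range(12):
        logits, h = model(x, h)
        probs = torch.softmax(logits[0, -1], dim=0)
        i = torch.multinomial(probs, 1).item()
        if itos[i] == ".":
            break
        out += itos[i]
        x = torch.tensor([[i]])
    print(out)
    ```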

    The parameters of the equation are referred to as weights. They release the weights but may not have released:

    • source code for training
    • source code for inference / validation
    • training data
    • cleaning scripts
    • logs, git history, development notes, etc.

    Open source is typically more concerned with the open nature of the code base, to foster community engagement, than with the price of the resulting software.

    Curiously, open-weight LLM development has somewhat flipped this on its head: the resulting software is freely accessible and distributable, but the source code and training material are less accessible.


  • The energy use isn’t that extreme: a forward pass on a 7B model can be run on a MacBook.
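
    Rough numbers: at 4-bit quantisation, 7B weights take about 3.5 GB, and a forward pass costs roughly 2 FLOPs per parameter, so around 14 GFLOPs per generated token. Both are comfortably within laptop territory.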

    If it’s code and you RAG over some docs, you could probably get away with a 4B tbh (toy sketch below).
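
    To make “RAG over some docs” concrete, here’s a toy sketch: retrieve the most relevant doc by plain bag-of-words cosine similarity and prepend it to the prompt. The `generate()` call is a hypothetical stand-in for whatever serves your local 4B (llama.cpp, Ollama, etc.):

    ```python
    # Toy retrieval-augmented generation (RAG): find the best-matching
    # doc chunk, stuff it into the prompt, hand it to a small local model.
    from collections import Counter
    import math

    docs = [
        "The config file lives in ~/.app/config.toml and uses TOML syntax.",
        "Run `app serve --port 8080` to start the development server.",
        "Logs are written to /var/log/app/ unless APP_LOG_DIR is set.",
    ]

    def bow(text):
        # Bag-of-words term counts; a real setup would use embeddings.
        return Counter(text.lower().split())

    def cosine(a, b):
        num = sum(a[w] * b[w] for w in set(a) & set(b))
        den = math.sqrt(sum(v * v for v in a.values())) \
            * math.sqrt(sum(v * v for v in b.values()))
        return num / den if den else 0.0

    def retrieve(query, k=1):
        q = bow(query)
        return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

    def generate(prompt):
        # Hypothetical stand-in for a local 4B model call
        # (e.g. llama-cpp-python or an Ollama HTTP request).
        return f"[model output for prompt of {len(prompt)} chars]"

    question = "How do I start the dev server?"
    context = "\n".join(retrieve(question))
    print(generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"))
    ```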

    ML models use more energy than simpler models, but not that much more.

    The reason large companies are using so much energy is that they are running absolutely massive models for everything so they can market a product. If individuals used the right model to solve the right problem (right size, right training, fed the right context, etc.) there would be no real issue.

    It’s important we don’t conflate the excellent progress we’ve made with transformers over the last decade with an unregulated market, bad company practices, and limited consumer tech literacy.

    TL;DR: LLM != search engine