  • An LLM is, fundamentally, an equation: map each word to a number, run the numbers through the equation, and map the result back to words, and you have an LLM. If you’re curious, write a name generator using torch with an RNN (plenty of tutorials online) and you’ll get a good feel for how it works.
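
    A minimal sketch of that exercise, assuming PyTorch; the name list, hidden size, and training loop here are toy placeholders, not a serious setup:

    ```python
    # Character-level RNN "name generator": words (chars) in, numbers,
    # equation, numbers out, chars again. That's the whole pattern.
    import torch
    import torch.nn as nn

    names = ["anna", "ben", "carla", "dimitri", "elena"]  # toy dataset
    chars = sorted(set("".join(names))) + ["."]           # "." = end-of-name
    stoi = {c: i for i, c in enumerate(chars)}            # char -> number
    itos = {i: c for c, i in stoi.items()}                # number -> char

    class NameRNN(nn.Module):
        def __init__(self, vocab, hidden=64):
            super().__init__()
            self.embed = nn.Embedding(vocab, hidden)
            self.rnn = nn.RNN(hidden, hidden, batch_first=True)
            self.head = nn.Linear(hidden, vocab)

        def forward(self, x, h=None):
            out, h = self.rnn(self.embed(x), h)
            return self.head(out), h

    model = NameRNN(len(chars))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(200):                       # tiny training loop
        for name in names:
            seq = [stoi[c] for c in name + "."]
            x = torch.tensor(seq[:-1]).unsqueeze(0)   # input chars
            y = torch.tensor(seq[1:])                 # next-char targets
            logits, _ = model(x)
            loss = loss_fn(logits.squeeze(0), y)
            opt.zero_grad(); loss.backward(); opt.step()

    # Sampling: feed each predicted char back in until "." appears.
    x, h, out = torch.tensor([[stoi["a"]]]), None, "a"
    for _ in range(12):
        logits, h = model(x, h)
        probs = torch.softmax(logits[0, -1], dim=0)
        i = torch.multinomial(probs, 1).item()
        if itos[i] == ".":
            break
        out += itos[i]
        x = torch.tensor([[i]])
    print(out)
    ```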

    The parameters of the equation are referred to as weights. They release the weights but may not have released:

    • source code for training
    • source code for inference / validation
    • training data
    • cleaning scripts
    • logs, git history, development notes, etc.

    Open source is typically more concerned with the open nature of the code base, to foster community engagement, than with the price of the resulting software.

    Curiously, open-weight LLM development has somewhat flipped this on its head: the resulting software is freely accessible and distributable, but the source code and training material are less accessible.


  • The energy use isn’t that extreme: a forward pass on a 7B model can be run on a MacBook.
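
    Rough numbers: at 4-bit quantisation, 7B weights take about 3.5 GB, and a forward pass costs roughly 2 FLOPs per parameter, so around 14 GFLOPs per generated token. Both are comfortably within laptop territory.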

    If it’s code and you RAG over some docs, you could probably get away with a 4B tbh (toy sketch below).
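
    To make “RAG over some docs” concrete, here’s a toy sketch: retrieve the most relevant doc by plain bag-of-words cosine similarity and prepend it to the prompt. The `generate()` call is a hypothetical stand-in for whatever serves your local 4B (llama.cpp, Ollama, etc.):

    ```python
    # Toy retrieval-augmented generation (RAG): find the best-matching
    # doc chunk, stuff it into the prompt, hand it to a small local model.
    from collections import Counter
    import math

    docs = [
        "The config file lives in ~/.app/config.toml and uses TOML syntax.",
        "Run `app serve --port 8080` to start the development server.",
        "Logs are written to /var/log/app/ unless APP_LOG_DIR is set.",
    ]

    def bow(text):
        # Bag-of-words term counts; a real setup would use embeddings.
        return Counter(text.lower().split())

    def cosine(a, b):
        num = sum(a[w] * b[w] for w in set(a) & set(b))
        den = math.sqrt(sum(v * v for v in a.values())) \
            * math.sqrt(sum(v * v for v in b.values()))
        return num / den if den else 0.0

    def retrieve(query, k=1):
        q = bow(query)
        return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

    def generate(prompt):
        # Hypothetical stand-in for a local 4B model call
        # (e.g. llama-cpp-python or an Ollama HTTP request).
        return f"[model output for prompt of {len(prompt)} chars]"

    question = "How do I start the dev server?"
    context = "\n".join(retrieve(question))
    print(generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"))
    ```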

    ML models use more energy than simpler models, but not that much more.

    The reason large companies are using so much energy is that they are running absolutely massive models for everything so they can market a product. If individuals used the right model to solve the right problem (right size, right training, fed the right context, etc.) there would be no real issue.

    It’s important we don’t conflate the excellent progress we’ve made with transformers over the last decade with an unregulated market, bad company practices, and limited consumer tech literacy.

    TL;DR: LLM != search engine