Hungry Hungry Llamas: Resource Consumption
I think we’ve all accepted by now that we can’t talk about small data. Small data is insulting, right up there with the urge to grow a beard and buy a race car. Big data, we only talk about big data. Your CSV? It’s the biggest data you or I have ever seen. We’ll ignore the truth that if you actually had big data you’d probably be using something like Avro or Parquet. LLMs are BIG data. Unfathomable data. It’s genuinely mind blowing what beefcakes these things are in the data world. Given that we’re all building some future Skynet, it makes sense that LLMs would require the same beefiness we know Schwarzenegger for.
To give some perspective: your contact database or customer database is likely less than a few gigs. Steam games take more disk space than your company. You wait longer to play vidya games than it takes to transfer the fundamentals of your business. So when you go to run an LLM and see 150 GB+ requirements for disk space and RAM, it should make you want to change your bank passwords. It’s expensive. So much so that it’s hard to fathom the sheer amount of computing required. And this is for dev use. Commercially, handling all of the hardware requirements must be eye watering and wallet burning; somewhere around Bitcoin mining, maybe less. Not that I’m an expert here, but last I knew Bitcoin mining had mostly moved onto ASICs (chips purpose-built for mining and useless for anything else).
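Where does a number like 150 GB even come from? Nothing fancier than parameter count times bytes per parameter. Here’s a back-of-envelope sketch (my own illustration, and it ignores KV cache, activations, and runtime overhead, which all stack on top):

```python
# Rough model footprint: parameters x bits per parameter / 8.
# Deliberately ignores KV cache and runtime overhead, which add more on top.
def weights_gib(params_billion: float, bits_per_param: int) -> float:
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 2**30  # bytes -> GiB

for bits, label in [(16, "fp16"), (8, "int8"), (4, "4-bit")]:
    print(f"70B model at {label:>5}: ~{weights_gib(70, bits):.0f} GiB of weights")
# fp16: ~130 GiB, int8: ~65 GiB, 4-bit: ~33 GiB, before any overhead
```

Put the fp16 number next to the 24 GB of VRAM on a top-end consumer GPU and the scale of the problem is obvious; quantization (coming up next) is how hobbyists squeeze under the bar.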
Not saying this is what you should buy, or even that it’s specced correctly, but here’s an example build for just getting your feet wet running a decently quantized model (think truncating numbers so they take less space; 1.234 is now 1.23).
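If “truncating numbers” sounds hand-wavy, here’s a toy sketch of the idea: symmetric int8 quantization with NumPy. The names here are mine, and real schemes (like the block-wise 4-bit formats llama.cpp uses) are cleverer, but the trade is the same: fewer bits per weight, slightly fuzzier numbers.

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Store weights as int8 plus one shared float scale factor."""
    scale = np.abs(weights).max() / 127.0   # largest magnitude maps to +/-127
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct approximate float32 weights from the int8 copy."""
    return q.astype(np.float32) * scale

weights = np.random.randn(1024).astype(np.float32)
q, scale = quantize_int8(weights)
print(f"float32: {weights.nbytes} bytes, int8: {q.nbytes} bytes (4x smaller)")
print(f"worst-case error: {np.abs(weights - dequantize(q, scale)).max():.4f}")
```

Four times less memory in exchange for a little noise in the weights; that’s the whole bargain.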
Ballpark for that build: about $5,000, and it should get you running Llama 3 70B quantized. Look at the power supply, though. That thing is 1500 W! And guess what, you still can’t run the full unquantized model on those specs. No apologies for the tangent, but this should give us all pause on the idea of free and open computing. For something that has encoded a substantial portion of human knowledge, we should be free to run, modify, and transmit these models with impunity. If you want to move up and get a better taste of the full model, you’ll need to start by doubling those specs. At a certain point you’ll have to start looking into multiple PSUs and possibly running 240 V, the same voltage as your dryer (quick math below). Now there are options to reduce your costs, like buying used datacenter GPUs or old server hardware. But if there’s an axiom here, it’s that you will still need to provision a massive amount of hardware and power. And as a reminder, this isn’t even commercial usage, where you’d also be worried about things like hardware redundancy and failover.
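The 240 V bit isn’t a joke, it’s arithmetic. Assuming a standard US 120 V, 15 A branch circuit (which you shouldn’t load past roughly 12 A continuously):

```python
# Current draw is just watts / volts; a typical US outlet circuit is 120 V / 15 A.
def amps(watts: float, volts: float) -> float:
    return watts / volts

print(f"1500 W at 120 V: {amps(1500, 120):.1f} A")  # 12.5 A: already at the edge
print(f"3000 W at 120 V: {amps(3000, 120):.1f} A")  # 25.0 A: trips the breaker
print(f"3000 W at 240 V: {amps(3000, 240):.1f} A")  # 12.5 A: dryer-circuit territory
```

Double the build and a regular wall outlet simply can’t feed it; hence the dryer plug.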