https://www.ebay.com/p/116332559 lga2011 motherboards quite cheap, insert 2 xeon 2696v4 44 threads each totalling at 88 threads and 8 ddr4 32gb sticks, it comes quite cheap actually, you can also install Nvidia p40 with 24gb each, you can max out this build for ai for under 2000$
Typically you need about 1GB graphics RAM for each billion parameters (i.e. one byte per parameter). This is a 405B parameter model. Ouch.
Or you could run it via cpu and ram at a much slower rate.
Yeah uh let me just put in my 512GB ram stick…
Samsung do make them.
Goodluck finding 512gb of VRAM.
https://www.ebay.com/p/116332559 lga2011 motherboards quite cheap, insert 2 xeon 2696v4 44 threads each totalling at 88 threads and 8 ddr4 32gb sticks, it comes quite cheap actually, you can also install Nvidia p40 with 24gb each, you can max out this build for ai for under 2000$
Finally! My dumb dumb 1TB ram server (4x E5-4640 + 32x32GB DDR3 ECC) can shine.
At work we habe a small cluster totalling around 4TB of RAM
It has 4 cooling units, a m3 of PSUs and it must take something like 30 m2 of space
When the 8 bit quants hit, you could probably lease a 128GB system on runpod.
Can you run this in a distributed manner, like with kubernetes and lots of smaller machines?
According to huggingface, you can run a 34B model using 22.4GBs of RAM max. That’s a RTX 3090 Ti.
Hmm, I probably have that much distributed across my network… maybe I should look into some way of distributing it across multiple gpu.
Frak, just counted and I only have 270gb installed. Approx 40gb more if I install some of the deprecated cards in any spare pcie slots i can find.
Ypu mean my 4090 isn’t good enough 🤣😂