• Naz@sh.itjust.works
      5 months ago

      Llama-3 (open weights) at 70B is pretty capable if you can manage to run it. I'd say it's somewhere between GPT-3.5 and GPT-4.

      In second place is WizardLM-2 at 8B parameters, if you are memory-constrained.

      You should run the largest model you can fit completely in VRAM for maximum speed. Higher precision is better (FP32 > FP16 > FP8 > FP4 > FP2), but 8-bit is more than enough for most consumer/local LLM deployments, and 4-bit is worth trying if you want to trade a little accuracy for size.
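      To get a feel for what "fits in VRAM" means, here's a rough back-of-the-envelope sketch (my own hypothetical helper, not from any library) that estimates the memory the model weights alone would take at each precision. It ignores the KV cache and activation overhead, which add more on top:

```python
# Rough VRAM estimate for model weights only (ignores KV cache/activations).
def weight_gib(params_billion: float, bits: int) -> float:
    """Memory in GiB for `params_billion` billion parameters at `bits` per weight."""
    total_bytes = params_billion * 1e9 * bits / 8
    return total_bytes / 2**30

for bits in (32, 16, 8, 4, 2):
    print(f"70B @ {bits:>2}-bit: {weight_gib(70, bits):7.1f} GiB")
```

      A 70B model at 4-bit is still around 33 GiB of weights, which is why the "largest model that fits" for most single consumer GPUs (24 GB or less) ends up being a quantized 7B–13B model rather than 70B.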

      LLM Arena is a good place to benchmark the different models on a personal A/B basis. Everyone has different needs for what a model should do well, whether that's help with coding, translation, medical questions, and so on.

      They all have various strengths and weaknesses at the moment, since optimizing a model for a specific domain or task seems to (not guaranteed, but seems to) make it weaker at other tasks.