Trying to Build My First (Ever) Deep Learning Rig

Hi All,

My first forum post ever, so please go easy on me!

I'm a computer scientist (not specialised in ML/DL) trying to ride this recent AI wave. I'm at the stage of building my own (big) DL models, and infrastructure is the biggest challenge nowadays. I want to build a solid, future-extensible rig that might survive the next 3 to 5 years. I don't know much about parts, but my requirements (which you can add to or help shape) are:
1. Motherboard with room to add multiple GPUs: I'm assuming a board with SLI support is the way to go, but that's all I know;
2. CPU: I don't know much, but I'd guess more cores/threads is better;
3. GPU: I'd like to start with one for now; I'm assuming the best bang for the buck right now is the 4070 Ti (please correct me if I'm wrong);
4. RAM: I don't know the technical details, but again, bigger is better (I'd like to start with 2x32GB);
5. PSU and cooling: I don't know how many watts are needed for future-proofing, or which coolers to use for the CPU and GPU (I'm assuming liquid cooling is the way to go);
6. A case that's great for future-proofing and offers good airflow.

That'll do it for now but please feel free to share your expertise.

ALSO: Would you recommend going with a CLOUD PROVIDER instead of building your own rig? Which one is more cost effective?

Happy weekend y'all!

Comments

  • +2

    None of these questions matter until you figure out what you're trying to run and what the optimal hardware for it is. Also, what your budget is… No point suggesting a Titan RTX if you have a $3k total budget.

    You have assumed a gaming GPU, but is that the right choice? Wouldn't maximising CUDA cores on a Tesla or Quadro card be your main criterion? Or are you dependent on other libraries like ROCm or TensorFlow? Is GPU memory capacity more important, or memory bandwidth? And so on…

    Once again, the comparison between a dedicated machine and a cloud provider depends on your specific usage requirements and budget. No one can give you a generic answer to that.

    This might not be the best place to ask & answer these questions. There's bound to be more focused forums or subreddits around where you can get much more technical expertise.

    • Hi @Hybroid, thanks for the feedback. What I meant is definitely a starting-point build that I can extend and upgrade over time (future-proofing). So, to be more specific:
      1) What motherboard is multi-GPU capable and has future-proofing headroom;
      2) And the same goes for a compatible CPU;

      The rest (GPU, RAM, etc.) I think is more budget-dependent.

  • +5

    They have the internet on computers now?

  • +2

    I'm trying to build a solid, future-extensible rig that might survive the next 3 to 5 years

    Based on your post, you are in the "hobbyist" tier so start off small and build from there

    Work out what models you are actually playing with and what libraries you need to use before deciding on the GPU hardware

    For NVIDIA/CUDA, from RTX 4060 Ti 16GB (available July) to RTX 4090 24GB. RTX 3060 12GB can be an ultra-budget choice for getting your hands dirty

    For AMD/ROCm, from RX 6800 16GB to RX 7900 XTX 24GB

    For Intel/OpenVINO, Arc A770 16GB can also be an ultra-budget choice

    Once you have experimented with local hardware and start hitting limits, then price out hardware upgrades or cloud compute costs
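    Whichever card you land on, a quick way to see what PyTorch can actually use is to probe the device at runtime. A minimal sketch (assumes PyTorch is installed; the function name is mine, and it simply falls back to reporting the CPU when no CUDA device is present):

```python
import torch

def describe_device() -> str:
    """Report the first CUDA GPU PyTorch sees, or note the CPU fallback."""
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        vram_gb = props.total_memory / 1024**3
        return f"{props.name}: {vram_gb:.1f} GB VRAM"
    return "no CUDA device found, falling back to CPU"

print(describe_device())
```

    The same check works unchanged on NVIDIA cards of any tier, so it's a handy first cell in any experiment notebook.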

  • +1

    ai will end us all….

    • +1

      ai is a fad, just like fidget spinners

    • +3

      This guy’s wallet will be the first casualty by the sounds of it.

  • +2

    Just buy a 3700X and a 3060 and see how often you run into limitations. You're far better off buying small amounts of time on Azure, though, rather than building a beefy machine that's only really working a fraction of the time.
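    One way to sanity-check the "cloud instead of rig" advice is a quick break-even estimate. The figures below are illustrative assumptions, not real quotes; plug in actual prices before deciding:

```python
def breakeven_hours(rig_cost: float, cloud_rate_per_hour: float) -> float:
    """GPU-hours of cloud rental that would cost as much as buying the rig."""
    return rig_cost / cloud_rate_per_hour

# Assumed figures: ~$1500 for a 3700X + RTX 3060 build, ~$0.90/hr for a
# comparable cloud GPU instance. Both numbers are placeholders.
hours = breakeven_hours(1500.0, 0.90)
print(f"break-even at roughly {hours:.0f} GPU-hours")
```

    If your machine would sit idle most of the week, the rental side of that equation usually wins.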

  • Would you recommend going with a CLOUD PROVIDER instead of building your own rig? Which one is more cost effective?

    Depends on the ML model you're building / running… Isn't it just most cost-effective to run it on the cloud instead? E.g. Google Colab provides GPU compute for Stable Diffusion models. Even if you run out of memory on the free tier, you can switch to a paid tier.

  • <script src="ChatGPT.com"></script>

  • +2

    If you want to try a free cloud setup to dip your toe in - https://www.oracle.com/au/cloud/free/
    They have an always free tier with 50GB of storage - check the page for more details:

    Always Free services
    Services you can use for an unlimited time.

    • Two Oracle Autonomous Databases with powerful tools like Oracle APEX and Oracle SQL Developer
    • Two AMD Compute VMs
    • Up to 4 instances of ARM Ampere A1 Compute with 3,000 OCPU hours and 18,000 GB hours per month
    • Block, Object, and Archive Storage; Load Balancer and data egress; Monitoring and Notifications

  • +2

    If you are only starting, I'd probably suggest not investing so much in an expensive rig. You can train most basic DL models on a CPU; the training will just be a bit slower. I've trained a deep learning model on a 10-year-old MacBook Pro before (on its CPU, too), and it only took a few hours. That said, the model was a simple DL architecture that I built from scratch with PyTorch.

    Some of the advice I was given initially was to start small. Don't go trying to fine-tune a million parameters on billions of images. If you keep your model small, with a reasonably sized training set, even a basic rig can get you started. And once you have a solid baseline model, you can bring it to a commercial cloud to scale up training (utilising the miscellaneous free credits).
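    As an illustration of how small "start small" can be, here is a minimal PyTorch training loop that fits a tiny network to synthetic data entirely on the CPU. The architecture and data are made up for the example and train in seconds:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny synthetic dataset: learn y = 2x + 1 with a little noise.
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2 * x + 1 + 0.05 * torch.randn_like(x)

# A deliberately small model: one hidden layer of 16 units.
model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Full-batch training loop; no GPU needed at this scale.
for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.4f}")
```

    Scaling this same loop up to a bigger model and a real dataset is exactly the point where GPU (or cloud) questions start to matter.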

    Something like Google Colab is also worth exploring. Very easy to set up and free, though the wait times can be painful sometimes.

    After some experience, you can then decide which path you want to take. Cloud providers give you access to the latest and greatest hardware without the expensive capital outlay. But I wouldn't go with the cloud without knowing the details first. Training DL models on cloud instances can get very expensive very quickly if the model isn't properly optimised. It might be fine for inference, though.

    • Thanks @richrichie. Very rich advice!!
