This content provides a practical blueprint for building a $2,000 local AI server capable of running the massive Deepseek R1 671b model. Learn the specific hardware choices, like cost-efficient AMD Epics, and critical BIOS settings needed to achieve a respectable 4 tokens per. In the next few tables I will lay out the testing I did not only on Deepseek R1 671b Q4 but also Gemma 3, QwQ and Cogito on CPU only. Building a dedicated home server that can handle more then 16 pcie lanes falls well into my favorite category of systems, servers and workstations. 5 to 4 tokens per second on CPU, which is considered respectable for its price range. Priced at approximately ¥149,000 RMB (around $20,000 USD. In the quest to understand the complexities of running deep learning models, it's important to explore the intricacies and challenges presented by running these models locally. The 671b model, in particular, presents a unique set of challenges and opportunities. (Please check the video. The challenge with the 671B parameter LLM is that it takes a lot of hardware to run in the data center, given the FP16 model takes almost 1. 128TB), but it can fit into an 8x GPU AMD.
[PDF Version]