The Nvidia A100 32G SXM2 PG199+FA2 can run local agentic coding tools such as Roo, Cline, and Kilo at acceptable output speeds. Taking Qwen3 Coder 30b as an example, plain chat reaches 80-90 tokens per second (TPS), but this drops to 40-50 TPS when the model is driven by an agent, because of the much longer context. Once the context exceeds 60K tokens, speed decreases by approximately 5 TPS.
The motherboard, CPU, and RAM also affect agent output speeds. For instance, a newer HP EliteBook 840 laptop with DDR5 memory and a PG199 card can achieve over 50 TPS, while an X99 motherboard with an E5 processor and DDR4 memory only reaches about 40 TPS.
Here are the speeds achieved with different models on the X99 platform, all using the same prompt, "generate python snakegame with pygame". A context length above 128K was guaranteed by switching the KV cache between FP16 and Q4 as needed:
No. 1, Single PG199+Roo with Qwen3 Coder 30b: ~40 TPS
No. 2, Dual PG199+Roo with GLM4.5 Air 106b: ~18 TPS
No. 3, Single PG199+Roo with Seed OSS 36b: ~15 TPS
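
To illustrate why the KV cache precision matters for reaching a 128K context on a 32G card, here is a rough back-of-the-envelope estimate in Python. The model shape below (48 layers, 4 KV heads, head dimension 128) is a hypothetical example chosen for illustration and is not taken from the measurements above; the Q4 figure uses a nominal 4 bits per value and ignores the per-block scaling overhead that real quantized caches add.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bits_per_value):
    """Estimate KV cache size in bytes.

    One key vector and one value vector are stored per layer,
    per KV head, and per token in the context.
    """
    n_values = 2 * n_layers * n_kv_heads * head_dim * context_tokens  # 2 = K and V
    return n_values * bits_per_value / 8


# Hypothetical model shape, for illustration only (an assumption, not a measured spec).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 48, 4, 128
CONTEXT_TOKENS = 128 * 1024  # 128K tokens

GIB = 2 ** 30
fp16_gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT_TOKENS, 16) / GIB
q4_gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT_TOKENS, 4) / GIB

print(f"FP16 KV cache at 128K context: {fp16_gib:.1f} GiB")
print(f"Q4 KV cache at 128K context (nominal): {q4_gib:.1f} GiB")
```

Under these assumptions the FP16 cache alone would take about 12 GiB on top of the model weights, while a nominal Q4 cache needs about a quarter of that. This is why a larger model may only fit a 128K context with a Q4 cache, while a smaller one can stay at FP16.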