K-AI 64 Rome 5090 3352TOPS — 2x RTX 5090 Entry Blackwell AI Server

K-AI 64 Rome 5090 3352TOPS — 2x RTX 5090 Entry Blackwell AI Server

Brand: Kentino s.r.o.
13836.00 USD In stock Buy at Merchant

K-AI 64 Rome 5090 3352TOPS Entry Blackwell 2-GPU Server 2x RTX 5090 | EPYC Milan | 3 352 TOPS INT8 3 352 TOPS INT8 64 GB VRAM GDDR7 fp8 native tensor rack ready Entry Blackwell 2-GPU server — 64 GB pooled VRAM, 3 352 INT8 TOPS, native fp8. The Ada-to-Blackwell step-up from 2x4090. A two-GPU Blackwell AI server built on ROMED8-2T / EPYC Milan. Two RTX 5090 deliver a 64 GB pooled VRAM envelope with native fp8 tensor math — roughly double the raw TOPS of 2x RTX 4090 in the same chassis footprint, and the first 2-GPU tier that comfortably runs Llama 3.3 70B Q4, Qwen3.5-122B-A10B Q4, and HunyuanVideo at bf16 / fp8 with headroom. Hardware Component Detail GPUs 2x NVIDIA GeForce RTX 5090 32 GB GDDR7 (575 W, PCIe 5.0 x16, Blackwell) VRAM pool 64 GB CPU AMD EPYC 7643 Milan (48C/96T, 225 W, 128x PCIe 4.0 lanes) Motherboard ASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI) System RAM 128 GB DDR4-2666 ECC RDIMM (2x 64 GB) Boot / storage 1 TB NVMe M.2 (PCIe 4.0 x4) Power supply Single 2 kW ATX PSU Chassis 4U rack-mount, passive Gen4 x16 risers Cooling SP3 tower cooler, 3x 120 mm front intake + 1x 120 mm rear exhaust (industrial fans) Network Onboard dual 10 GbE (Intel X550) + IPMI Power envelope GPU draw: 2 x 575 W = 1 150 W System total at full load: ~1 475 W PSU total: 2 000 W (single 2 kW ATX) — 26.25 % headroom Workable single-PSU margin; dual-PSU upgrade available for extra headroom Lane topology ROMED8-2T fans out 2x16 Gen4 from CPU root complex. 5090 is Gen5 silicon running Gen4 x16 without bandwidth penalty for inference. No PCIe switch. No NVLink on GeForce 5090 — tensor-parallel 2-way P2P uses PCIe. What you can run With 64 GB of pooled GDDR7 VRAM across 2 Blackwell cards, this server handles 70B Q4 tensor-parallel, MoE flagships, native fp8 image generation, video AI, and multi-model concurrent serving. LLMs — text / reasoning / coding Chinese frontier Qwen3-32B Q8 / bf16 (near-fp16 quality) (~40-55 tok/s single-stream on Blackwell fp8, published reference) QwQ-32B bf16; Qwen3-30B-A3B / Coder-30B-A3B bf16 (~60 GB fits) Qwen3.5-122B-A10B Q4 (~70-75 GB with RAM spill) — MoE flagship at Q4 fits Hunyuan-A13B fp8 (~80 GB tight) or Q6 (~36 GB comfortable) Seed-OSS-36B bf16 (~72 GB tight — prefer fp8 ~36 GB) DeepSeek-R2 32B sparse MoE bf16 GLM-4.5-Air 106B/12B Q4_K_M (~60 GB) — MoE with headroom ERNIE-4.5-47B-A3B Q6-Q8 Western frontier Llama 3.3 70B Q4_K_M (~43 GB) — the headline workload for this tier (~20-28 tok/s single-stream on 2x 5090, published reference) Hermes 3 70B / Tulu 3 70B Q4 — open post-training Llama derivatives Mistral Small 3 / Magistral / Devstral Small 2 24B bf16; Mixtral 8x7B bf16 Gemma 3 27B multimodal bf16 + reasoning headroom Phi-4 14B bf16; Nemotron-Super 49B Q6-Q8 gpt-oss-20b MXFP4 (16 GB) + gpt-oss-120b MXFP4 (80 GB — fits tight with short ctx) OLMo 2 32B / OLMo 3.1-32B-Think bf16 Vision-Language Qwen3-VL-32B / Qwen3-VL-30B-A3B / Qwen3-Omni-30B-A3B bf16; InternVL3.5-38B bf16; Llama 3.2 90B Vision Q4 (~52 GB); Pixtral 12B bf16; Pixtral Large 124B Q3 (~58 GB tight); Gemma 3 27B multimodal bf16; PaliGemma 2 28B bf16; Molmo 72B Q4 (~45 GB). Image generation 5090 native fp8 is the speed story — FLUX.1 / SD 3.5 / HunyuanImage run materially faster than on Ada: FLUX.1 [dev] / [schnell] fp8 native (~12 GB) with 2x parallel across cards (~8-12 seconds per 1024x1024 image on Blackwell, published reference); FLUX.1 Kontext [dev]; SD 3.5 Large (18 GB fp16 or 11 GB fp8); SDXL 1.0; HunyuanImage-2.1 bf16 (~34 GB); HunyuanImage-3.0 NF4; AuraFlow v0.3 / OmniGen v1 / Kolors 2.0. Video generation Wan 2.2 T2V-A14B / I2V-A14B bf16 (~54 GB total) — MoE two-expert at full precision; Wan 2.2 TI2V-5B bf16 per-card, 2 parallel tenants; HunyuanVideo 13B Q4-Q5 (~30 GB), fp8 tight; HunyuanVideo 1.5 (8.3B) bf16 per-card; Open-Sora 2.0 (11B) bf16; CogVideoX-5B / 1.5 bf16; Mochi-1 bf16 (~42 GB fits); LTX-Video 2B; NVIDIA Cosmos Predict 2. Audio / Speech / TTS Same full Chinese + Western speech stack as the 4090 tier fits with more headroom: Whisper v3 + Parakeet + Canary + Moshi + Step-Audio 2 / R1 + CosyVoice 3.0 + Kokoro + Stable Audio Open + MusicGen + AudioGen + SeamlessM4T v2 + MMS. On fp8-native 5090, Whisper / Parakeet decode at materially higher real-time factor. Whisper v3 turbo runs at ~75x realtime on Blackwell (published reference). Multi-model / multi-tenant Resident stack: Llama 3.3 70B Q4 (~43 GB tensor-parallel 2-way) + FLUX.1 fp8 (~12 GB) + Whisper-turbo + Moshi 2-4 concurrent tenants on 32B class at Q6-Q8 per card LoRA / QLoRA fine-tuning of 7-14B comfortable, 24-32B tight Target workloads Small-team developer workstation with 70B Q4 serving headroom Blackwell step-up from a 2x RTX 4090 box — same chassis, ~2.5x TOPS, fp8 native Image / video generation workstation with FLUX native fp8 speedup Multi-model concurrent box: 70B Q4 + FLUX + Whisper + Moshi resident simultaneously 4-8 concurrent user inference endpoint for 32B class LLMs Published performance references Published reference | 2x RTX 5090 comparable hardware Benchmark Result Llama 3.3 70B Q4_K_M llama.cpp decode ~20-28 tok/s single-stream Qwen3-32B Q8 vLLM single-stream ~45-60 tok/s decode at fp8 FLUX.1 [dev] fp8 native Blackwell ~1.5-1.9 s per 1024x1024 at 20 steps HunyuanVideo 13B Q5 TP-2 5 s 720p in ~5-7 min Published, not measured on Kentino hardware. Kentino measured reference on 4x RTX 4090: 647 TFLOPS fp16, 179 tok/s batch-32 aggregate. Not ideal for 100B+ dense models at bf16 (DeepSeek-V3, Kimi K2, Mistral Large 3 — need 256+ GB pool) Frontier video generation at bf16 long-form full-resolution Warranty and lead time 2 years parts warranty 1 year labor warranty 10-28 days lead time Build includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification. Lead time depends on component availability, confirmed at order. Recommended add-ons NVIDIA ConnectX-5 100 GbE MCX555A-ECAT Upgrade boot drive to 2 TB NVMe — or 4 TB Upgrade RAM to 256 GB (4x 64 GB) — MoE KV cache headroom / multi-model concurrent serving Rack PDU (C13/C19 metered) and 3 kVA online UPS

Variants (1)
  • Default Title — 13836.00 USD — In stock

AI Readiness

Good foundation, but some important product data is still missing.

83%