
GPT OSS 120B

OpenAI · Multilingual · Thinking · Tool Calls · MXFP4 · nvidia-h100-80gb-350gb

Prerequisites

Ensure your GPU nodes have the NVIDIA Container Toolkit installed and configured:

ansible-playbook prositronic.infra.nvidia_container_toolkit

Deploy Command

helmfile --state-values-file <(curl -s https://prositronic.org/values/gpt-oss-120b/mxfp4/nvidia-h100-80gb-350gb.yaml) apply
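Once the release is up, the llama.cpp server behind this chart speaks the OpenAI-compatible chat completion API. A minimal client sketch using only the standard library; the base URL assumes you have port-forwarded the model Service to localhost:8080 (the service name, port, and forwarding step are assumptions, not taken from the chart):

```python
import json
import urllib.request

# Assumed local endpoint, e.g. after `kubectl port-forward` to the
# model Service; adjust to your cluster's actual address.
BASE_URL = "http://localhost:8080"


def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": "gpt-oss-120b",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }


def chat(prompt: str) -> str:
    """POST a chat request and return the first choice's text."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Show the payload shape without requiring a live server:
print(json.dumps(build_chat_request("Say hello in three languages."), indent=2))
```

Call `chat(...)` only against a running deployment; the payload printout above works offline.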

Generated values.yaml

/values/gpt-oss-120b/mxfp4/nvidia-h100-80gb-350gb.yaml

# Prositronic Model Card
# https://prositronic.org/deploy/gpt-oss-120b/mxfp4/nvidia-h100-80gb-350gb
#
# Model: GPT OSS 120B (MXFP4)
# Hardware: nvidia-h100-80gb-350gb

image:
  backend: cuda13
modelDownloads:
  # Split GGUF: all three shards are required, and the original shard
  # filenames must be preserved so llama.cpp can discover shards 2-3
  # from the path of shard 1.
  - name: gpt-oss-120b-00001
    url: https://huggingface.co/ggml-org/gpt-oss-120b-GGUF/resolve/main/gpt-oss-120b-mxfp4-00001-of-00003.gguf
    filename: gpt-oss-120b-mxfp4-00001-of-00003.gguf
  - name: gpt-oss-120b-00002
    url: https://huggingface.co/ggml-org/gpt-oss-120b-GGUF/resolve/main/gpt-oss-120b-mxfp4-00002-of-00003.gguf
    filename: gpt-oss-120b-mxfp4-00002-of-00003.gguf
  - name: gpt-oss-120b-00003
    url: https://huggingface.co/ggml-org/gpt-oss-120b-GGUF/resolve/main/gpt-oss-120b-mxfp4-00003-of-00003.gguf
    filename: gpt-oss-120b-mxfp4-00003-of-00003.gguf
models:
  gpt-oss-120b:
    m: /models/gpt-oss-120b-mxfp4-00001-of-00003.gguf
    ngl: 36
    ctx-size: 131072
    flash-attn: true
    load-on-startup: true
resources:
  requests:
    nvidia.com/gpu: 1
    memory: 80Gi
  limits:
    nvidia.com/gpu: 1
    memory: 350Gi
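For reference, the keys under `models:` presumably map onto llama.cpp server flags. A hedged sketch of the equivalent standalone invocation, where `MODEL_GGUF` stands in for the path configured under `m:` (flag spellings follow upstream llama.cpp and may vary between versions):

```shell
# Sketch of the llama-server flags implied by the values above.
# Set MODEL_GGUF to the path from the card's `m:` key first.
llama-server \
  -m "$MODEL_GGUF" \
  -ngl 36 \
  --ctx-size 131072 \
  --flash-attn
```

Here `-ngl 36` offloads all layers to the H100 and `--ctx-size 131072` requests the full 128k context window, matching the card.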