GPT OSS 120B
OpenAI · Multilingual · Thinking · Tool Calls
MXFP4 · nvidia-h100-80gb-350gb
Prerequisites
Ensure your GPU nodes are prepared with the NVIDIA container toolkit:
ansible-playbook prositronic.infra.nvidia_container_toolkit

Deploy Command
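Before running the deploy, it can help to spot-check that the toolkit actually landed on a node. This is a hedged sketch; none of these commands come from the playbook itself, and the `docker run` smoke test assumes a Docker runtime:

```shell
# Check for the NVIDIA Container Toolkit CLI on the node.
if command -v nvidia-ctk >/dev/null 2>&1; then
  nvidia-ctk --version
  toolkit_present=yes
  # Optional end-to-end smoke test (requires Docker and a GPU):
  # docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
else
  echo "nvidia-ctk not found on this node"
  toolkit_present=no
fi
```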
helmfile --state-values-file <(curl -s https://prositronic.org/values/gpt-oss-120b/mxfp4/nvidia-h100-80gb-350gb.yaml) apply

Generated values.yaml
/values/gpt-oss-120b/mxfp4/nvidia-h100-80gb-350gb.yaml
# Prositronic Model Card
# https://prositronic.org/deploy/gpt-oss-120b/mxfp4/nvidia-h100-80gb-350gb
#
# Model: GPT OSS 120B (MXFP4)
# Hardware: nvidia-h100-80gb-350gb
image:
  backend: cuda13
modelDownloads:
  - name: gpt-oss-120b
    url: https://huggingface.co/ggml-org/gpt-oss-120b-GGUF/resolve/main/gpt-oss-120b-mxfp4-00001-of-00003.gguf
    filename: gpt-oss-120b.gguf
models:
  gpt-oss-120b:
    m: /models/gpt-oss-120b.gguf
    ngl: 36
    ctx-size: 131072
    flash-attn: true
    load-on-startup: true
resources:
  requests:
    nvidia.com/gpu: 1
    memory: 80Gi
  limits:
    nvidia.com/gpu: 1
    memory: 350Gi
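Once the release is up, the llama.cpp server behind it exposes an OpenAI-compatible HTTP API. A minimal smoke-test request against the model name configured above might look like the following sketch; the in-cluster host and port are assumptions (adjust to your Service or Ingress), so the actual request is left commented:

```shell
# Build a minimal chat-completion payload for the model defined in values.yaml.
cat <<'EOF' > /tmp/gpt-oss-120b-request.json
{
  "model": "gpt-oss-120b",
  "messages": [
    {"role": "user", "content": "Say hello in three languages."}
  ],
  "max_tokens": 128
}
EOF

# Assumed endpoint; replace host/port with your cluster's.
# curl -s http://localhost:8080/v1/chat/completions \
#   -H 'Content-Type: application/json' \
#   -d @/tmp/gpt-oss-120b-request.json
cat /tmp/gpt-oss-120b-request.json
```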