
DeepSeek-V3 Weight File Documentation

New Fields in config.json

num_nextn_predict_layers: the number of Multi-Token Prediction (MTP) modules contained in the checkpoint. The open-sourced DeepSeek-V3 weights set this value to 1.
quantization_config (FP8 checkpoints only): describes the quantization method; see the FP8 Weight Documentation section below.

Weight Structure Overview

The DeepSeek-V3 weight file consists of two main components: Main Model Weights and MTP Modules.

1. Main Model Weights

Composition: the input/output embedding layers and the complete set of 61 Transformer hidden layers.

Structural Details

Embedding layer: model.embed_tokens.weight
Hidden layers: model.layers.0 through model.layers.60, matching num_hidden_layers = 61
Output layer: model.norm.weight and lm_head.weight

2. Multi-Token Prediction (MTP) Modules

Composition: additional MTP modules, one per num_nextn_predict_layers; the released weights contain a single MTP module.

Structural Details

The MTP module is stored immediately after the main model's hidden layers; with num_hidden_layers = 61 it occupies model.layers.61. Its embedding layer and output head share parameters with the main model.


Loading Rules

Main model weights: loaded according to the num_hidden_layers field in config.json.
MTP modules: loaded according to the num_nextn_predict_layers field, with layer IDs appended immediately after the main model's hidden layers. For example, with num_hidden_layers = 61 and num_nextn_predict_layers = 1, the MTP module's layer ID is 61.

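The layer-indexing convention can be sketched in a few lines. This is illustrative loader logic, not DeepSeek's actual code; it assumes the config.json fields num_hidden_layers and num_nextn_predict_layers and the model.layers.N tensor-name prefix, and the example tensor names are hypothetical:

```python
# Sketch: map out MTP layer IDs and classify checkpoint tensor names.
# Assumption: MTP modules are appended after the main model's hidden
# layers, so their layer IDs start at num_hidden_layers.
import re

def mtp_layer_ids(num_hidden_layers: int, num_nextn_predict_layers: int) -> list[int]:
    """Layer IDs occupied by MTP modules, appended after the main model."""
    return [num_hidden_layers + i for i in range(num_nextn_predict_layers)]

def is_mtp_weight(name: str, num_hidden_layers: int) -> bool:
    """True if a tensor name like 'model.layers.61.…' belongs to an MTP module."""
    m = re.match(r"model\.layers\.(\d+)\.", name)
    return bool(m) and int(m.group(1)) >= num_hidden_layers
```

With num_hidden_layers = 61 and num_nextn_predict_layers = 1, mtp_layer_ids returns [61], so a loader can route every model.layers.61.* tensor to the MTP module and everything else to the main model.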

FP8 Weight Documentation

DeepSeek-V3 natively supports the FP8 weight format with 128x128 block scaling.

FP8 Configuration

The FP8 weight file introduces a quantization_config field to describe the quantization method. Below is an example configuration:

"quantization_config": {
  "activation_scheme": "dynamic",
  "fmt": "e4m3",
  "quant_method": "fp8",
  "weight_block_size": [128, 128]
}
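A loader can branch on this field to decide whether dequantization is required. A minimal parsing sketch; the function name and the None-for-BF16 convention are illustrative, not part of any official API:

```python
import json

def read_quant_config(config_path: str) -> "dict | None":
    """Return the quantization_config block, or None for non-FP8 checkpoints."""
    with open(config_path) as f:
        config = json.load(f)
    return config.get("quantization_config")

# Parsing the example configuration shown above:
cfg = json.loads("""{
  "quantization_config": {
    "activation_scheme": "dynamic",
    "fmt": "e4m3",
    "quant_method": "fp8",
    "weight_block_size": [128, 128]
  }
}""")
qc = cfg["quantization_config"]
```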

Dequantization Method

The FP8 weight file includes a weight_scale_inv field, which stores the dequantization scale for each 128x128 weight block: the dequantized weight is obtained by multiplying each FP8 weight block by its corresponding weight_scale_inv value.
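The per-block multiply can be sketched with NumPy. This is a reference sketch, not a production kernel: FP8 values are represented as float32 here, and the handling of blocks at ragged matrix edges is an assumption:

```python
import numpy as np

def dequantize(weight: np.ndarray, scale_inv: np.ndarray,
               block: "tuple[int, int]" = (128, 128)) -> np.ndarray:
    """Multiply each weight block by its per-block dequantization scale.

    weight:    [M, N] FP8 weight values (float32 stand-in for this sketch)
    scale_inv: [ceil(M/bm), ceil(N/bn)] per-block scales (weight_scale_inv)
    """
    out = weight.astype(np.float32).copy()
    bm, bn = block
    for i in range(scale_inv.shape[0]):
        for j in range(scale_inv.shape[1]):
            # Slicing past the edge is safe: NumPy clips the range.
            out[i * bm:(i + 1) * bm, j * bn:(j + 1) * bn] *= scale_inv[i, j]
    return out
```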

After the FP8 weights are dequantized, runtime kernels can perform online quantization of activations at per-token, per-128-channel granularity.
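Per-token-per-128-channel means each token's activation row is split into 128-channel groups, each with its own dynamically computed scale. A sketch of the scaling step only, assuming symmetric scaling to the E4M3 maximum of 448; the actual FP8 rounding/cast is hardware-specific and omitted here:

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_act(x: np.ndarray, group: int = 128):
    """Scale [tokens, channels] activations per token per 128-channel group."""
    t, c = x.shape
    g = x.reshape(t, c // group, group)
    # One dynamic scale per (token, channel-group) pair.
    scale = np.abs(g).max(axis=-1, keepdims=True) / E4M3_MAX
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(g / scale, -E4M3_MAX, E4M3_MAX)  # FP8-range values (float here)
    return q.reshape(t, c), scale.squeeze(-1)
```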