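Performance Results
Bow Platform - Training
Preliminary training performance results for the Bow Pod platform are provided here. Throughput is defined here as the number of input data points (sequences, images, or rows) that the model processes per second.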
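As a minimal sketch of the throughput definition above, the following Python computes items per second from a timed loop. The function names and the `step` callable are hypothetical illustrations, not part of any benchmark harness used to produce these results.

```python
import time
from typing import Callable


def throughput_items_per_sec(num_items: int, elapsed_seconds: float) -> float:
    """Throughput as defined in the text: input items processed per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_items / elapsed_seconds


def measure_throughput(step: Callable[[], None], batch_size: int, num_steps: int) -> float:
    """Time num_steps calls of step(), each assumed to process batch_size items."""
    start = time.perf_counter()
    for _ in range(num_steps):
        step()
    elapsed = time.perf_counter() - start
    return throughput_items_per_sec(batch_size * num_steps, elapsed)
```

In practice a measurement like this would normally exclude compilation and warm-up steps; that choice is left to the caller here.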
The results below detail the throughput of each referenced model in the specified configuration.
Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Throughput (items/sec) |
---|---|---|---|---|---|---|---|---|
BERT Large | Ph1 Pre-Training (SL128) - Packed | Bow Pod16 | SDK 3.1 | PopART | Wikipedia | 54,784 | 16.16 | 6044 |
BERT Large | Ph1 Pre-Training (SL128) | Bow Pod16 | SDK 3.1 | TensorFlow2 | Wikipedia | 65,280 | 16.16 | 4528 |
BERT Large | Ph1 Pre-Training (SL128) - Packed | Bow Pod16 | SDK 3.1 | PyTorch | Wikipedia | 56,064 | 16.16 | 5592 |
BERT Large | Ph1 Pre-Training (SL128) - Packed | Bow Pod64 | SDK 3.1 | PopART | Wikipedia | 54,784 | 16.16 | 22759 |
BERT Large | Ph1 Pre-Training (SL128) | Bow Pod64 | SDK 3.0 | TensorFlow2 | Wikipedia | 66,560 | 16.16 | 18199 |
BERT Large | Ph1 Pre-Training (SL128) - Packed | Bow Pod64 | SDK 3.1 | PyTorch | Wikipedia | 56,064 | 16.16 | 18494 |
BERT Large | Ph2 Pre-Training (SL512) - Packed | Bow Pod16 | SDK 3.1 | PopART | Wikipedia | 9,600 | 16.16 | 2126 |
BERT Large | Ph2 Pre-Training (SL512) - Packed | Bow Pod16 | SDK 3.1 | PyTorch | Wikipedia | 8,192 | 16.16 | 1969 |
BERT Large | Ph2 Pre-Training (SL512) - Packed | Bow Pod64 | SDK 3.1 | PopART | Wikipedia | 9,600 | 16.16 | 7789 |
BERT Large | Ph2 Pre-Training (SL512) - Packed | Bow Pod64 | SDK 3.1 | PyTorch | Wikipedia | 8,192 | 16.16 | 6510 |
BERT Large | Fine-Tuning (SL384 - SQuAD) | Bow Pod16 | SDK 3.1 | PopART | SQuAD | 256 | 16.16 | 1183 |
BERT Large | Fine-Tuning (SL384 - SQuAD) | Bow Pod16 | SDK 3.1 | PyTorch | SQuAD | 256 | 16.16 | 989 |
BERT Base | Ph1 Pre-Training (SL128) | Bow Pod16 | SDK 3.1 | PopART | Wikipedia | 65,536 | 16.16 | 16391 |
BERT Base | Ph1 Pre-Training (SL128) | Bow Pod16 | SDK 3.1 | TensorFlow2 | Wikipedia | 65,280 | 16.16 | 15157 |
BERT Base | Ph1 Pre-Training (SL128) | Bow Pod16 | SDK 3.1 | PyTorch | Wikipedia | 65,536 | 16.16 | 15895 |
BERT Base | Ph2 Pre-Training (SL512) | Bow Pod16 | SDK 3.1 | PopART | Wikipedia | 16,384 | 16.16 | 3656 |
BERT Base | Ph2 Pre-Training (SL384) | Bow Pod16 | SDK 3.1 | TensorFlow2 | Wikipedia | 16,320 | 16.16 | 4385 |
BERT Base | Ph2 Pre-Training (SL512) | Bow Pod16 | SDK 3.1 | PyTorch | Wikipedia | 16,384 | 16.16 | 3512 |
Group BERT Base | Ph1 Pre-Training (SL128) | Bow Pod16 | SDK 3.0 | TensorFlow1 | Wikipedia | 65,520 | 16.16 | 7187 |
Group BERT Base | Ph2 Pre-Training (SL384) | Bow Pod16 | SDK 3.0 | TensorFlow1 | Wikipedia | 32,800 | 16.16 | 2288 |
Group BERT Base | Ph1 Pre-Training (SL128) | Bow Pod64 | SDK 3.0 | TensorFlow1 | Wikipedia | 64,800 | 16.16 | 26425 |
Group BERT Base | Ph2 Pre-Training (SL384) | Bow Pod64 | SDK 3.0 | TensorFlow1 | Wikipedia | 32,640 | 16.16 | 7572 |
BERT Base - HuggingFace | Fine-Tuning (SL384 - SQuAD) | Bow Pod16 | SDK 3.0 | TensorFlow2 | SQuAD | 320 | 16.16 | 1014 |
GPT2 | GPT2-Large (SL512) | Bow Pod16 | SDK 3.1 | PyTorch | Wikipedia | 8,192 | 16.16 | 414 |
GPT2 | GPT2-Large (SL512) | Bow Pod64 | SDK 3.1 | PyTorch | Wikipedia | 8,192 | 16.16 | 1592 |
GPT2 | GPT2-Large (SL1024) | Bow Pod16 | SDK 3.1 | PyTorch | Wikipedia | 8,192 | 16.16 | 178 |
GPT2 | GPT2-Medium (SL1024) | Bow Pod16 | SDK 3.1 | PyTorch | Wikipedia | 8,192 | 16.16 | 337 |
GPT2 | GPT2-Medium (SL1024) | Bow Pod64 | SDK 3.1 | PyTorch | Wikipedia | 8,192 | 16.16 | 1317 |
GPT2 | GPT2-Small (SL1024) | Bow Pod16 | SDK 3.1 | PyTorch | Wikipedia | 8,192 | 16.16 | 1065 |
GPT2 | GPT2-Small (SL1024) | Bow Pod64 | SDK 3.1 | PyTorch | Wikipedia | 8,192 | 16.16 | 4096 |
Conformer-Medium | WeNet-Conformer-Medium | Bow Pod16 | SDK 3.1 | PyTorch | AiShell1 | 288 | 16.16 | 1209 |
RNN-T | Transformer Transducer | Bow Pod16 | SDK 3.0 | PopART | Generated | 32 | 32.32 | 703 |
DeepVoice3 | | Bow-2000 | SDK 3.1 | PopART | VCTK Corpus | 128 | 32.32 | 9653 |
FastSpeech2 | | Bow Pod16 | SDK 3.1 | TensorFlow2 | LJ Speech | 64 | 16.16 | 1653 |
FastPitch | frames/s | Bow Pod16 | SDK 3.1 | PyTorch | Generated | 128 | 32.32 | 1504496 |
TGN | Temporal Graph Network | 1x Bow IPU | SDK 3.1 | PyTorch | | | | 31883 |
Cluster-GCN | | Bow-2000 | SDK 3.1 | TensorFlow2 | PPI | | 16.16 | 671344 |
Cluster-GCN | | Bow-2000 | SDK 3.1 | TensorFlow2 | ArXiv | | 16.16 | 3603374 |
Cluster-GCN | | Bow-2000 | SDK 3.1 | TensorFlow2 | | | 16.16 | 1971876 |
Cluster-GCN | | Bow-2000 | SDK 3.1 | TensorFlow2 | Products | | 16.16 | 3330611 |
Cluster-GCN | | Bow-2000 | SDK 3.1 | TensorFlow2 | ogbn-mag | | 16.16 | 2567104 |
MPNN-GIN | MP Graph Isomorphism n/w | Bow-2000 | SDK 3.1 | TensorFlow2 | Generated | 1,024 | 16.16 | 455832 |
ResNet-50 v1.5 | | Bow Pod16 | SDK 3.0 | TensorFlow1 | ImageNet2012 | 3,520 | 16.16 | 44059 |
ResNet-50 v1.5 | | Bow Pod16 | SDK 3.1 | PyTorch | ImageNet2012 | 16,384 | 16.16 | 36225 |
ResNet-50 v1.5 | | Bow Pod64 | SDK 3.0 | TensorFlow1 | ImageNet2012 | 5,120 | 16.16 | 153205 |
ResNet-50 v1.5 | | Bow Pod64 | SDK 3.1 | PyTorch | ImageNet2012 | 16,384 | 16.16 | 111282 |
ResNet-50 v1.5 | | Bow Pod256 | SDK 3.0 | TensorFlow1 | ImageNet2012 | 10,240 | 16.16 | 456906 |
EfficientNet-B4 | G16-EfficientNet | Bow Pod16 | SDK 3.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 9000 |
EfficientNet-B4 | G16-EfficientNet | Bow Pod16 | SDK 3.1 | PyTorch | ImageNet2012 | 1,024 | 16.32 | 8307 |
EfficientNet-B4 | G16-EfficientNet | Bow Pod64 | SDK 3.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 34140 |
EfficientNet-B4 | G16-EfficientNet | Bow Pod256 | SDK 3.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 122665 |
ResNeXt101 | | Bow Pod16 | SDK 3.0 | TensorFlow1 | ImageNet2012 | 768 | 16.16 | 12277 |
ViT | Pre-Training | Bow Pod16 | SDK 3.1 | PyTorch | ImageNet1k | 65,536 | 16.16 | 7607 |
ViT | Pre-Training | Bow Pod64 | SDK 3.1 | PyTorch | ImageNet1k | 65,536 | 16.16 | 28132 |
ViT | Fine-Tuning | Bow Pod16 | SDK 3.1 | PyTorch | ImageNet1k | 2,040 | 16.16 | 7914 |
DINO | Vision Transformer | Bow Pod16 | SDK 3.1 | PyTorch | ImageNet1k | 3,200 | 16.16 | 737 |
DINO | Vision Transformer | Bow Pod64 | SDK 3.1 | PyTorch | ImageNet1k | 3,200 | 16.16 | 3583 |
Swin-Base (224) | Vision Transformer - Pre-Training | Bow Pod16 | SDK 3.1 | PyTorch | ImageNet1k | 512 | 32.32 | 1455 |
Swin-Tiny (224) | Vision Transformer - Pre-Training | Bow Pod16 | SDK 3.1 | PyTorch | ImageNet1k | 1,024 | 32.32 | 3817 |
Swin-Large (224) | Vision Transformer - Fine-Tuning | Bow Pod16 | SDK 3.1 | PyTorch | ImageNet1k | 8,196 | 16.16 | 3276 |
UNet (Medical) | | Bow-2000 | SDK 3.1 | TensorFlow2 | EM segmentation | 24 | 16.16 | 150 |
Mini DALL-E | | Bow Pod16 | SDK 3.1 | PyTorch | COCO 2017 | 6,144 | 16.16 | 1828 |
Mini DALL-E | | Bow Pod64 | SDK 3.1 | PyTorch | COCO 2017 | 24,576 | 16.16 | 6812 |
MAE | Masked Autoencoder for visual representation learning | Bow Pod16 | SDK 3.1 | PyTorch | ImageNet | 4,128 | 16.16 | 7111 |
Frozen In Time | Multimodal - Pre-Training (1 frame) | Bow Pod8 | SDK 3.1 | PyTorch | webvid | 240 | 16.16 | 457 |
CLIP | Multimodal (language/vision) | Bow Pod8 | SDK 3.1 | PyTorch | c3m | 795 | 16.16 | 2653 |
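
Several models in the table above are reported on both Bow Pod16 and Bow Pod64, so a rough scaling efficiency can be derived from the published figures. The sketch below is an illustrative calculation, not a metric reported by the source; it assumes the Pod suffix gives the IPU count (16 and 64) and uses the packed BERT Large Ph1 PopART rows as input.

```python
def scaling_efficiency(base_throughput: float, base_ipus: int,
                       scaled_throughput: float, scaled_ipus: int) -> float:
    """Measured speedup divided by the ideal (linear) speedup."""
    speedup = scaled_throughput / base_throughput
    ideal_speedup = scaled_ipus / base_ipus
    return speedup / ideal_speedup


# BERT Large Ph1 (SL128) packed, PopART: 6044 items/sec on Pod16, 22759 on Pod64.
efficiency = scaling_efficiency(6044, 16, 22759, 64)
print(f"{efficiency:.3f}")  # prints 0.941
```

Rows for the two pod sizes do not always share the same batch size or SDK version, so a comparison like this is only indicative.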
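Bow Platform - Inference
Model inference here means running a trained model on input data to infer an output. Inference performance in real commercial applications is usually measured by two metrics: throughput (as defined above) and latency, which in this context is the time the model takes to produce an output for a given input.
The following are throughput and latency results at the specified batch sizes on the Bow-2000 platform.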
Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Throughput (items/sec) | Latency (ms) |
---|---|---|---|---|---|---|---|---|---|
BERT-Large | SL128 | Bow-2000 | SDK 3.1 | PopART | SQuAD | 4 | 16.16 | 2908 | 1.36 |
BERT-Large | SL128 | Bow-2000 | SDK 3.1 | PopART | SQuAD | 8 | 16.16 | 4096 | 1.94 |
BERT-Large | SL128 | Bow-2000 | SDK 3.1 | PopART | SQuAD | 12 | 16.16 | 4655 | 2.56 |
BERT-Large | SL128 | Bow-2000 | SDK 3.1 | PopART | SQuAD | 16 | 16.16 | 5292 | 3.01 |
BERT-Base | SL128 | Bow-2000 | SDK 3.1 | PopART | SQuAD | 4 | 16.16 | 6508 | 0.6 |
BERT-Base | SL128 | Bow-2000 | SDK 3.1 | PopART | SQuAD | 320 | 16.16 | 28069 | 11.41 |
GPT2 | GPT2-Small | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 1339 | 5.69 |
GPT2 | GPT2-Medium | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 2 | 16.16 | 337 | 11.92 |
GPT2 | GPT2-Large | Bow Pod16 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 2 | 16.16 | 96 | 20.65 |
ResNet-50v1.5 | lowest latency config | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 9517 | 0.4 |
ResNet-50v1.5 | higher throughput config | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 12679 | 1.48 |
ResNet-50v1.5 | | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 256 | 16.16 | 53870 | 21.71 |
EfficientNet-B0 | lowest latency config | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 10994 | 0.34 |
EfficientNet-B0 | higher throughput config | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 15210 | 1.21 |
EfficientNet-B0 | | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 144 | 16.16 | 56548 | 11.39 |
EfficientNet-B4 | lowest latency config | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 4545 | 0.84 |
EfficientNet-B4 | higher throughput config | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 5804 | 3.25 |
EfficientNet-B4 | | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 48 | 16.16 | 18658 | 11.89 |
ResNeXt101 | lowest latency config | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 4653 | 0.83 |
ResNeXt101 | higher throughput config | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 5331 | 3.57 |
ResNeXt101 | | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 64 | 16.16 | 16278 | 19.24 |
Yolo v4 | image 896, bps 5, max det 200 | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 926 | 6.49 |
Yolo v4 | image 896, bps 10, max det 300 | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 991 | 6.75 |
Yolo v4 | image 640, bps 5, max det 200 | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1845 | 6.61 |
Yolo v4 | image 640, bps 10, max det 300 | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1954 | 6.9 |
Yolo v4 | image 512, bps 5, max det 200 | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 2478 | 4.84 |
Yolo v4 | image 512, bps 10, max det 300 | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 2683 | 4.94 |
Yolo v4 | image 416, bps 5, max det 200 | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 3178 | 3.71 |
Yolo v4 | image 416, bps 10, max det 100 | Bow-2000 | SDK 3.1 | PyTorch | Synthetic (host-generated) | 16 | 16.16 | 4277 | 6.28 |
EfficientDet-D0 | | Bow-2000 | SDK 3.1 | TF2 w/Keras | Synthetic (host-generated) | 16 | 16.16 | 5833 | 0.69 |
EfficientDet-D1 | | Bow-2000 | SDK 3.1 | TF2 w/Keras | Synthetic (host-generated) | 12 | 16.16 | 3283 | 1.22 |
EfficientDet-D2 | | Bow-2000 | SDK 3.1 | TF2 w/Keras | Synthetic (host-generated) | 8 | 16.16 | 2165 | 1.85 |
EfficientDet-D3 | | Bow-2000 | SDK 3.1 | TF2 w/Keras | Synthetic (host-generated) | 4 | 16.16 | 1109 | 3.61 |
EfficientDet-D4 | | Bow-2000 | SDK 3.1 | TF2 w/Keras | Synthetic (host-generated) | 4 | 16.16 | 808 | 4.95 |
Unet (Medical) | | Bow-2000 | SDK 3.1 | TensorFlow2 | Synthetic (host-generated) | 4 | 16.16 | 1881 | |
Unet (Medical) | | Bow-2000 | SDK 3.1 | TensorFlow2 | Synthetic (host-generated) | 8 | 16.16 | 2107 | |
FastSpeech2 | | Bow-2000 | SDK 3.1 | TensorFlow2 | Synthetic (host-generated) | 4 | 16.16 | 2610 | 1.53 |
FastSpeech2 | | Bow-2000 | SDK 3.1 | TensorFlow2 | Synthetic (host-generated) | 16 | 16.16 | 4354 | 0.92 |
FastSpeech2 | | Bow-2000 | SDK 3.1 | TensorFlow2 | Synthetic (host-generated) | 32 | 16.16 | 4949 | 0.81 |
FastSpeech2 | | Bow-2000 | SDK 3.1 | TensorFlow2 | Synthetic (host-generated) | 60 | 16.16 | 5201 | 0.77 |
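
Throughput and latency in the table above are related but not interchangeable. The sketch below is an interpretive aid, not the source's methodology: it computes the throughput that strictly serial batches would give, and uses Little's law to estimate how many batches are in flight on average. It assumes the reported latency is per batch; the function names are hypothetical.

```python
def serial_throughput(batch_size: int, latency_ms: float) -> float:
    """Items/sec if batches ran strictly one after another."""
    return batch_size / (latency_ms / 1000.0)


def implied_batches_in_flight(throughput: float, batch_size: int,
                              latency_ms: float) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    batches_per_sec = throughput / batch_size
    return batches_per_sec * (latency_ms / 1000.0)


# BERT-Large, batch 4: 2908 items/sec at 1.36 ms (close to serial execution).
bert = implied_batches_in_flight(2908, 4, 1.36)
# ResNet-50v1.5 "higher throughput config", batch 4: 12679 items/sec at 1.48 ms.
resnet = implied_batches_in_flight(12679, 4, 1.48)
print(f"{bert:.2f} {resnet:.2f}")
```

Under these assumptions the BERT-Large row implies roughly one batch in flight, while the ResNet-50 higher-throughput row implies several, which is consistent with a configuration that overlaps work to raise throughput at the cost of latency.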
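MLPerf Training v2.0 Performance
For our submission to MLPerf Training v2.0, we chose to submit in the popular application benchmark categories of image classification (ResNet-50) and natural language processing (BERT), along with a new entry as an Open submission in the RNN-T speech transcription category.
There are two divisions for submissions. The Closed Division requires submitters to use exactly the same model and optimizer implementation, including the defined hyperparameter state and training epochs. The Open Division fosters and supports innovation by allowing different model implementations that are better suited to different processor capabilities, while ensuring that exactly the same model accuracy and quality as the Closed Division is reached.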


Division | Model | MLPerf Quality Target | Platform | SDK Version | Framework | MLPerf ID | Dataset | Precision | Time to Train (mins) |
---|---|---|---|---|---|---|---|---|---|
Closed | ResNet50 v1.5 | 75.90% classification | Bow Pod16 | SDK 2.5.1 | TensorFlow | 2.0-2047 | ImageNet2012 | 16.16 | 19.64 |
Closed | ResNet50 v1.5 | 75.90% classification | Bow Pod64 | SDK 2.5.1 | TensorFlow | 2.0-2050 | ImageNet2012 | 16.16 | 6.30 |
Closed | ResNet50 v1.5 | 75.90% classification | Bow Pod128 | SDK 2.5.1 | TensorFlow | 2.0-2052 | ImageNet2012 | 16.16 | 4.19 |
Closed | ResNet50 v1.5 | 75.90% classification | Bow Pod256 | SDK 2.5.1 | TensorFlow | 2.0-2054 | ImageNet2012 | 16.16 | 2.67 |
Closed | BERT | 0.72 Mask-LM accuracy | Bow Pod16 | SDK 2.5.1 | PopART | 2.0-2045 | Wikipedia | 16.16 | 20.66 |
Closed | BERT | 0.72 Mask-LM accuracy | Bow Pod16 | SDK 2.5.1 | PaddlePaddle | 2.0-2046 | Wikipedia | 16.16 | 20.75 |
Closed | BERT | 0.72 Mask-LM accuracy | Bow Pod64 | SDK 2.5.1 | PopART | 2.0-2049 | Wikipedia | 16.16 | 6.70 |
Closed | BERT | 0.72 Mask-LM accuracy | Bow Pod64 | SDK 2.5.1 | PaddlePaddle | 2.0-2048 | Wikipedia | 16.16 | 6.77 |
Closed | BERT | 0.72 Mask-LM accuracy | Bow Pod128 | SDK 2.5.1 | PopART | 2.0-2051 | Wikipedia | 16.16 | 4.42 |
Closed | BERT | 0.72 Mask-LM accuracy | Bow Pod256 | SDK 2.5.1 | PopART | 2.0-2053 | Wikipedia | 16.16 | 3.19 |
Open | RNN-T | - | Bow Pod64 | SDK 2.5.1 | PopART | 2.0-2125 | Customer dataset | 16.16 | 109.36 |
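The MLPerf name and logo are trademarks of MLCommons Association in the United States and other countries.
All rights reserved. Unauthorized use strictly prohibited. See www.mlperf.org for more information.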
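IPU-POD Classic - Training
Training a machine learning model involves running an algorithm over an input dataset (the training data) until the model converges, meaning it has learned to produce the desired output to a specified accuracy. In this context, throughput is defined as the number of input data points (sequences, images, or rows) the model processes per second. Throughput is often used as a measure of hardware performance because it is directly related to the time needed to train a model to a specified accuracy.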
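The link between throughput and time to train can be sketched as a simple division. This is a back-of-the-envelope estimate under stated assumptions, not how the figures on this page were produced: it ignores compilation, evaluation, and data-loading stalls, and the epoch count in the example is hypothetical.

```python
def estimated_training_hours(samples_per_epoch: int, epochs: int,
                             throughput_items_per_sec: float) -> float:
    """Total samples processed divided by throughput, converted to hours."""
    total_samples = samples_per_epoch * epochs
    return total_samples / throughput_items_per_sec / 3600.0


# ImageNet2012 has 1,281,167 training images; 40 epochs is a hypothetical
# schedule. 30690 items/sec is the ResNet-50 v1.5 TensorFlow1 figure reported
# for IPU-POD16 on this page.
hours = estimated_training_hours(1_281_167, 40, 30690)
print(f"{hours:.2f} hours")
```

Doubling throughput halves the estimate, which is why throughput is a convenient proxy for time to train when the number of samples needed for convergence is fixed.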
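The results provided below detail the throughput obtained for each reference model in the specified configuration. All configurations running on real data were verified for convergence.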
Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Throughput (items/sec) |
---|---|---|---|---|---|---|---|---|
BERT Large | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | PopART | Wikipedia | 65,536 | 16.16 | 3738 |
BERT Large | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 65,600 | 16.16 | 3704 |
BERT Large | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 3582 |
BERT Large | Ph1 Pre-Training (SL128) | IPU-POD64 | SDK 2.4.0 | PopART | Wikipedia | 65,536 | 16.16 | 14189 |
BERT Large | Ph1 Pre-Training (SL128) | IPU-POD64 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 66,560 | 16.16 | 13917 |
BERT Large | Ph1 Pre-Training (SL128) | IPU-POD64 | SDK 2.4.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 12251 |
BERT Large | Ph1 Pre-Training (SL128) | IPU-POD128 | SDK 2.4.0 | PopART | Wikipedia | 65,536 | 16.16 | 24424 |
BERT Large | Ph1 Pre-Training (SL128) | IPU-POD128 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 66,560 | 16.16 | 24900 |
BERT Large | Ph1 Pre-Training (SL128) | IPU-POD128 | SDK 2.4.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 22402 |
BERT Large | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | PopART | Wikipedia | 16,384 | 16.16 | 1063 |
BERT Large | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 16,400 | 16.16 | 1025 |
BERT Large | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | PyTorch | Wikipedia | 16,384 | 16.16 | 1012 |
BERT Large | Ph2 Pre-Training (SL384) | IPU-POD64 | SDK 2.4.0 | PopART | Wikipedia | 16,384 | 16.16 | 4003 |
BERT Large | Ph2 Pre-Training (SL384) | IPU-POD64 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 16,640 | 16.16 | 3938 |
BERT Large | Ph2 Pre-Training (SL384) | IPU-POD64 | SDK 2.4.0 | PyTorch | Wikipedia | 16,384 | 16.16 | 3611 |
BERT Large | Ph2 Pre-Training (SL384) | IPU-POD128 | SDK 2.4.0 | PopART | Wikipedia | 16,384 | 16.16 | 7127 |
BERT Large | Ph2 Pre-Training (SL384) | IPU-POD128 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 16,640 | 16.16 | 7292 |
BERT Large | Ph2 Pre-Training (SL384) | IPU-POD128 | SDK 2.4.0 | PyTorch | Wikipedia | 16,384 | 16.16 | 6500 |
BERT Large | Fine-Tuning (SL384 - SQuAD) | IPU-POD16 | SDK 2.4.0 | PopART | SQuAD | 256 | 16.16 | 884 |
BERT Large | Fine-Tuning (SL384 - SQuAD) | IPU-POD16 | SDK 2.4.0 | PyTorch | SQuAD | 256 | 16.16 | 744 |
BERT Base | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | PopART | Wikipedia | 65,536 | 16.16 | 11991 |
BERT Base | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 65,280 | 16.16 | 11647 |
BERT Base | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | TensorFlow2 | Wikipedia | 65,280 | 16.16 | 11035 |
BERT Base | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 11184 |
BERT Base | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | PopART | Wikipedia | 16,384 | 16.16 | 3545 |
BERT Base | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 16,320 | 16.16 | 3288 |
BERT Base | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | TensorFlow2 | Wikipedia | 16,320 | 16.16 | 3155 |
BERT Base | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | PyTorch | Wikipedia | 16,384 | 16.16 | 3334 |
BERT Base - HuggingFace | Fine-Tuning (SL384 - SQuAD) | IPU-POD16 | SDK 2.4.0 | TensorFlow2 | SQuAD | 320 | 16.16 | 375 |
GPT2 | GPT2-medium | IPU-POD16 | SDK 2.3.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 2540 |
GPT2 | GPT2-medium | IPU-POD64 | SDK 2.3.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 9870 |
GPT2 | GPT2-medium | IPU-POD128 | SDK 2.3.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 18842 |
GPT2 | GPT2-medium | IPU-POD256 | SDK 2.3.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 31025 |
ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 1,920 | 16.16 | 7864 |
ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | PyTorch | ImageNet2012 | 16,384 | 16.16 | 7303 |
ResNet-50 v1.5 | | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 1,920 | 16.16 | 30690 |
ResNet-50 v1.5 | | IPU-POD16 | SDK 2.4.0 | PyTorch | ImageNet2012 | 16,384 | 16.16 | 25534 |
ResNet-50 v1.5 | | IPU-POD64 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 2,560 | 16.16 | 108566 |
ResNet-50 v1.5 | | IPU-POD128 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 5,120 | 16.16 | 205006 |
ResNet-50 v1.5 | | IPU-POD256 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 10,240 | 16.16 | 365040 |
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 768 | 16.16 | 2514 |
ResNeXt101 | | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 768 | 16.16 | 9023 |
EfficientNet-B4 | G16-EfficientNet | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 800 | 16.16 | 1618 |
EfficientNet-B4 | G16-EfficientNet | IPU-M2000 | SDK 2.4.0 | PyTorch | ImageNet2012 | 1,024 | 16.32 | 1400 |
EfficientNet-B4 | G16-EfficientNet | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 6379 |
EfficientNet-B4 | G16-EfficientNet | IPU-POD16 | SDK 2.4.0 | PyTorch | ImageNet2012 | 1,024 | 16.32 | 4311 |
EfficientNet-B4 | G16-EfficientNet | IPU-POD64 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 24946 |
EfficientNet-B4 | G16-EfficientNet | IPU-POD128 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 48015 |
EfficientNet-B4 | G16-EfficientNet | IPU-POD256 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 87968 |
ViT | Vision Transformer | IPU-POD16 | SDK 2.3.0 | PyTorch | ImageNet1k | 65,536 | 16.16 | 6535 |
ViT | Vision Transformer | IPU-POD64 | SDK 2.3.0 | PyTorch | ImageNet1k | 65,536 | 16.16 | 25080 |
ViT | Vision Transformer | IPU-POD128 | SDK 2.3.0 | PyTorch | ImageNet1k | 65,536 | 16.16 | 46320 |
ViT | Vision Transformer | IPU-POD256 | SDK 2.3.0 | PyTorch | ImageNet1k | 65,536 | 16.16 | 68800 |
UNet (Medical) | | IPU-M2000 | SDK 2.4.0 | TensorFlow2 | EM segmentation | 24 | 16.16 | 139 |
Mini DALL-E | | IPU-M2000 | SDK 2.4.0 | PyTorch | COCO 2017 | 1,536 | 16.16 | 319 |
Mini DALL-E | | IPU-POD16 | SDK 2.4.0 | PyTorch | COCO 2017 | 6,144 | 16.16 | 815 |
DeepVoice3 | | IPU-M2000 | SDK 2.4.0 | PopART | VCTK Corpus | 128 | 32.32 | 8496 |
FastSpeech2 | | IPU-M2000 | SDK 2.4.0 | TensorFlow2 | LJ Speech | 32 | 16.16 | 406 |
FastSpeech2 | | IPU-POD16 | SDK 2.4.0 | TensorFlow2 | LJ Speech | 64 | 16.16 | 1141 |
Conformer | | IPU-M2000 | SDK 2.4.0 | PyTorch | AiShell1 | 96 | 16.16 | 1030 |
Conformer | | IPU-POD16 | SDK 2.4.0 | PyTorch | AiShell1 | 96 | 16.16 | 3395 |
TGN | Temporal Graph Network | GC200 IPU | SDK 2.4.0 | TensorFlow1 | JODIE Wikipedia | 200 | 16.32 | 190472 |
IPU-POD Classic (Time to Result)
Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Time To Result (secs) |
---|---|---|---|---|---|---|---|---|
MCMC TFP | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Proprietary | | 32.32 | 49 |
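IPU-POD Classic - Inference
Model inference in this context means running a model on input data to infer an output. Inference performance in production settings is typically measured by two metrics: throughput (as defined above) and latency, defined as the time taken to execute an inference.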
Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Throughput (items/sec) | Latency (ms) |
---|---|---|---|---|---|---|---|---|---|
BERT-Large | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 4 | 16.16 | 2071 | 1.92 |
BERT-Large | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 8 | 16.16 | 2911 | 2.73 |
BERT-Large | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 12 | 16.16 | 3303 | 3.62 |
BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 4 | 16.16 | 4580 | 0.86 |
BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 8 | 16.16 | 7069 | 1.11 |
BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 16 | 16.16 | 9687 | 1.65 |
BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 32 | 16.16 | 12584 | 2.53 |
BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 64 | 16.16 | 15346 | 4.16 |
BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 128 | 16.16 | 17972 | 7.11 |
BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 256 | 16.16 | 19484 | 13.11 |
BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 320 | 16.16 | 20803 | 15.36 |
ResNet-50v1.5 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 7152 | 1.66 |
ResNet-50v1.5 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 10515 | 2.27 |
ResNet-50v1.5 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 16 | 16.16 | 16207 | 2.95 |
ResNet-50v1.5 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 32 | 16.16 | 22544 | 4.24 |
ResNet-50v1.5 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 64 | 16.16 | 28762 | 6.66 |
ResNet-50v1.5 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 128 | 16.16 | 35155 | 10.91 |
ResNet-50v1.5 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 256 | 16.16 | 40085 | 19.14 |
ResNet-50v1.5 | lowest latency config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 7397 | 0.52 |
ResNet-50v1.5 | higher throughput config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 9404 | 2.04 |
ResNet-50v1.5 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 16 | 16.16 | 14321 | 2.69 |
ResNet-50v1.5 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 32 | 16.16 | 20927 | 3.7 |
ResNet-50v1.5 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 64 | 16.16 | 36193 | 8.62 |
ResNet-50v1.5 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 128 | 16.16 | 43472 | 14.38 |
ResNet-50v1.5 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 256 | 16.16 | 49816 | 25.13 |
ResNet-50v1.5 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 360 | 16.16 | 50883 | 30.68 |
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 4483 | 2.66 |
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 6435 | 3.71 |
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 16 | 16.16 | 9705 | 4.93 |
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 32 | 16.16 | 13693 | 6.99 |
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 64 | 16.16 | 17176 | 11.16 |
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 3395 | 1.14 |
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 4840 | 1.62 |
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 16 | 16.16 | 6483 | 2.43 |
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 64 | 16.16 | 11320 | 27.83 |
EfficientNet-B0 | lowest latency config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 8686 | 0.44 |
EfficientNet-B0 | higher throughput config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 10907 | 1.69 |
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 32 | 16.16 | 50510 | 3.05 |
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 64 | 16.16 | 71839 | 4.26 |
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 128 | 16.16 | 86986 | 6.77 |
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 144 | 16.16 | 69852 | 9.15 |
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 196 | 16.16 | 61714 | 13.38 |
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 8289 | 1.43 |
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 13056 | 1.82 |
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 16 | 16.16 | 22217 | 2.15 |
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 32 | 16.16 | 34448 | 2.77 |
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 64 | 16.16 | 43351 | 4.41 |
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 128 | 16.16 | 53256 | 7.19 |
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 160 | 16.16 | 55169 | 8.68 |
EfficientNet-B4 | lowest latency config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 3539 | 1.09 |
EfficientNet-B4 | higher throughput config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 4081 | 1.85 |
EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 16 | 16.16 | 8299 | 3.5 |
EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 24 | 16.16 | 9874 | 4.37 |
EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 32 | 16.16 | 10753 | 5.3 |
EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 40 | 16.16 | 11578 | 6.22 |
EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 3718 | 3.21 |
EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 5514 | 4.34 |
EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 16 | 16.16 | 7959 | 6.01 |
EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 20 | 16.16 | 8958 | 6.68 |
EfficientNet-B7 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 1407 | 8.52 |
EfficientNet-B7 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 1869 | 12.82 |
Yolo v4 | image 896, bps 5, max det 200 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 690 | 9.4 |
Yolo v4 | image 896, bps 10, max det 300 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 722 | 9.74 |
Yolo v4 | image 640, bps 5, max det 200 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1306 | 10.03 |
Yolo v4 | image 640, bps 10, max det 300 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1364 | 10.39 |
Yolo v4 | image 512, bps 5, max det 200 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1772 | 7.25 |
Yolo v4 | image 512, bps 10, max det 300 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1915 | 7.31 |
Yolo v4 | image 416, bps 5, max det 200 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 2195 | 5.88 |
Yolo v4 | image 416, bps 10, max det 100 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 2994 | 9.42 |
Unet (Medical) | | IPU-M2000 | SDK 2.4.0 | TensorFlow2 | Synthetic (host-generated) | 4 | 16.16 | 1144 | |
Unet (Medical) | | IPU-M2000 | SDK 2.4.0 | TensorFlow2 | Synthetic (host-generated) | 8 | 16.16 | 1190 | |