Performance Results

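This page presents preliminary performance results for Bow Pod platforms, our MLPerf Training v2.0 submission results, and our own benchmark results for a broader range of training and inference models.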

Bow Platforms - Training

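Preliminary training performance results for Bow Pod platforms are given here. Throughput is defined as the number of input data points (sequences, images, or rows) that the model processes per second.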
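As a minimal sketch of this definition, throughput is the number of items processed divided by the elapsed wall-clock time. The function name and the example run below are illustrative assumptions, not part of the benchmark harness.

```python
def throughput(items_processed: int, elapsed_seconds: float) -> float:
    """Return items processed per second (sequences, images, or rows)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return items_processed / elapsed_seconds


# Hypothetical run: 10 training steps at a global batch size of 3,520 images
# that took 0.8 s in total.
steps, batch_size, seconds = 10, 3520, 0.8
print(f"{throughput(steps * batch_size, seconds):.0f} items/sec")
```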

The results below give the throughput obtained for each referenced model in the specified configuration.

| Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Throughput (items/sec) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 v1.5 | | Bow Pod16 | SDK 3.0 | TensorFlow1 | ImageNet2012 | 3,520 | 16.16 | 44059 |
| ResNet-50 v1.5 | | Bow Pod16 | SDK 3.0 | PyTorch | ImageNet2012 | 16,384 | 16.16 | 37061 |
| ResNet-50 v1.5 | | Bow Pod64 | SDK 3.0 | TensorFlow1 | ImageNet2012 | 5,120 | 16.16 | 153205 |
| ResNet-50 v1.5 | | Bow Pod256 | SDK 3.0 | TensorFlow1 | ImageNet2012 | 10,240 | 16.16 | 456906 |
| EfficientNet-B4 | G16-EfficientNet | Bow Pod16 | SDK 3.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 9000 |
| EfficientNet-B4 | G16-EfficientNet | Bow Pod16 | SDK 3.0 | PyTorch | ImageNet2012 | 1,024 | 16.32 | 8263 |
| EfficientNet-B4 | G16-EfficientNet | Bow Pod64 | SDK 3.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 34140 |
| EfficientNet-B4 | G16-EfficientNet | Bow Pod256 | SDK 3.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 122665 |
| ResNeXt101 | | Bow Pod16 | SDK 3.0 | TensorFlow1 | ImageNet2012 | 768 | 16.16 | 12277 |
| ViT | Pre-Training | Bow Pod16 | SDK 3.0 | PyTorch | ImageNet1k | 65,536 | 16.16 | 7615 |
| ViT | Pre-Training | Bow Pod64 | SDK 3.0 | PyTorch | ImageNet1k | 65,536 | 16.16 | 27868 |
| ViT | Fine-Tuning | Bow Pod16 | SDK 3.0 | PyTorch | ImageNet1k | 2,040 | 16.16 | 7319 |
| DINO | Vision Transformer | Bow Pod16 | SDK 3.0 | PyTorch | ImageNet1k | 3,200 | 16.16 | 681 |
| DINO | Vision Transformer | Bow Pod64 | SDK 3.0 | PyTorch | ImageNet1k | 3,200 | 16.16 | 3217 |
| Swin-Base (224) | Vision Transformer - Pre-Training | Bow Pod16 | SDK 3.0 | PyTorch | ImageNet1k | 512 | 32.32 | 1447 |
| Swin-Tiny (224) | Vision Transformer - Pre-Training | Bow Pod16 | SDK 3.0 | PyTorch | ImageNet1k | 1,024 | 32.32 | 3779 |
| Swin-Large (224) | Vision Transformer - Fine-Tuning | Bow Pod16 | SDK 3.0 | PyTorch | ImageNet1k | 8,196 | 16.16 | 3279 |
| UNet (Medical) | | Bow-2000 | SDK 3.0 | TensorFlow2 | EM segmentation | 24 | 16.16 | 151 |
| Mini DALL-E | | Bow Pod16 | SDK 3.0 | PyTorch | COCO 2017 | 6,144 | 16.16 | 1916 |
| Mini DALL-E | | Bow Pod64 | SDK 3.0 | PyTorch | COCO 2017 | 24,576 | 16.16 | 7265 |
| MAE | Masked Autoencoder for visual representation learning | Bow Pod16 | SDK 3.0 | PyTorch | ImageNet | 4,128 | 16.16 | 7211 |
| MAE | Masked Autoencoder for visual representation learning | Bow Pod64 | SDK 3.0 | PyTorch | ImageNet | 4,128 | 16.16 | 17477 |
| Frozen In Time | Multimodal - Pre-Training (1 frame) | Bow Pod16 | SDK 3.0 | PyTorch | webvid | 480 | 16.16 | 639 |
| CLIP | Multimodal (language/vision) | Bow Pod8 | SDK 3.0 | PyTorch | c3m | 795 | 16.16 | 3283 |
| BERT Large | Ph1 Pre-Training (SL128) - Packed | Bow Pod16 | SDK 3.0 | PopART | Wikipedia | 54,784 | 16.16 | 6100 |
| BERT Large | Ph1 Pre-Training (SL128) | Bow Pod16 | SDK 3.0 | TensorFlow2 | Wikipedia | 65,280 | 16.16 | 4523 |
| BERT Large | Ph1 Pre-Training (SL128) - Packed | Bow Pod16 | SDK 3.0 | PyTorch | Wikipedia | 56,064 | 16.16 | 5614 |
| BERT Large | Ph1 Pre-Training (SL128) - Packed | Bow Pod64 | SDK 3.0 | PopART | Wikipedia | 54,784 | 16.16 | 22688 |
| BERT Large | Ph1 Pre-Training (SL128) | Bow Pod64 | SDK 3.0 | TensorFlow2 | Wikipedia | 66,560 | 16.16 | 18199 |
| BERT Large | Ph1 Pre-Training (SL128) - Packed | Bow Pod64 | SDK 3.0 | PyTorch | Wikipedia | 56,064 | 16.16 | 18843 |
| BERT Large | Ph1 Pre-Training (SL128) - Packed | Bow Pod256 | SDK 3.0 | PopART | Wikipedia | 54,784 | 16.16 | 53115 |
| BERT Large | Ph2 Pre-Training (SL512) - Packed | Bow Pod16 | SDK 3.0 | PopART | Wikipedia | 9,600 | 16.16 | 2124 |
| BERT Large | Ph2 Pre-Training (SL512) - Packed | Bow Pod16 | SDK 3.0 | PyTorch | Wikipedia | 8,192 | 16.16 | 1969 |
| BERT Large | Ph2 Pre-Training (SL512) - Packed | Bow Pod64 | SDK 3.0 | PopART | Wikipedia | 9,600 | 16.16 | 7701 |
| BERT Large | Ph2 Pre-Training (SL512) - Packed | Bow Pod64 | SDK 3.0 | PyTorch | Wikipedia | 8,192 | 16.16 | 6796 |
| BERT Large | Ph2 Pre-Training (SL512) - Packed | Bow Pod256 | SDK 3.0 | PopART | Wikipedia | 9,600 | 16.16 | 20320 |
| BERT Large | Fine-Tuning (SL384 - SQuAD) | Bow Pod16 | SDK 3.0 | PopART | SQuAD | 256 | 16.16 | 1167 |
| BERT Large | Fine-Tuning (SL384 - SQuAD) | Bow Pod16 | SDK 3.0 | PyTorch | SQuAD | 256 | 16.16 | 976 |
| BERT Base | Ph1 Pre-Training (SL128) | Bow Pod16 | SDK 3.0 | PopART | Wikipedia | 65,536 | 16.16 | 16620 |
| BERT Base | Ph1 Pre-Training (SL128) | Bow Pod16 | SDK 3.0 | TensorFlow1 | Wikipedia | 65,280 | 16.16 | 16203 |
| BERT Base | Ph1 Pre-Training (SL128) | Bow Pod16 | SDK 3.0 | TensorFlow2 | Wikipedia | 65,280 | 16.16 | 15160 |
| BERT Base | Ph1 Pre-Training (SL128) | Bow Pod16 | SDK 3.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 15824 |
| BERT Base | Ph2 Pre-Training (SL512) | Bow Pod16 | SDK 3.0 | PopART | Wikipedia | 16,384 | 16.16 | 3667 |
| BERT Base | Ph2 Pre-Training (SL384) | Bow Pod16 | SDK 3.0 | TensorFlow1 | Wikipedia | 16,320 | 16.16 | 4617 |
| BERT Base | Ph2 Pre-Training (SL384) | Bow Pod16 | SDK 3.0 | TensorFlow2 | Wikipedia | 16,320 | 16.16 | 4384 |
| BERT Base | Ph2 Pre-Training (SL512) | Bow Pod16 | SDK 3.0 | PyTorch | Wikipedia | 16,384 | 16.16 | 3520 |
| Group BERT Base | Ph1 Pre-Training (SL128) | Bow Pod16 | SDK 3.0 | TensorFlow1 | Wikipedia | 65,520 | 16.16 | 7187 |
| Group BERT Base | Ph2 Pre-Training (SL384) | Bow Pod16 | SDK 3.0 | TensorFlow1 | Wikipedia | 32,800 | 16.16 | 2288 |
| Group BERT Base | Ph1 Pre-Training (SL128) | Bow Pod64 | SDK 3.0 | TensorFlow1 | Wikipedia | 64,800 | 16.16 | 26425 |
| Group BERT Base | Ph2 Pre-Training (SL384) | Bow Pod64 | SDK 3.0 | TensorFlow1 | Wikipedia | 32,640 | 16.16 | 7572 |
| BERT Base - HuggingFace | Fine-Tuning (SL384 - SQuAD) | Bow Pod16 | SDK 3.0 | TensorFlow2 | SQuAD | 320 | 16.16 | 1014 |
| GPT2 | GPT2-Large (SL512) | Bow Pod16 | SDK 3.0 | PyTorch | Wikipedia | 8,192 | 16.16 | 410 |
| GPT2 | GPT2-Large (SL512) | Bow Pod64 | SDK 3.0 | PyTorch | Wikipedia | 8,192 | 16.16 | 1580 |
| GPT2 | GPT2-Large (SL1024) | Bow Pod16 | SDK 3.0 | PyTorch | Wikipedia | 8,192 | 16.16 | 178 |
| GPT2 | GPT2-Medium (SL1024) | Bow Pod16 | SDK 3.0 | PyTorch | Wikipedia | 8,192 | 16.16 | 337 |
| GPT2 | GPT2-Medium (SL1024) | Bow Pod64 | SDK 3.0 | PyTorch | Wikipedia | 8,192 | 16.16 | 1321 |
| GPT2 | GPT2-Small (SL1024) | Bow Pod16 | SDK 3.0 | PyTorch | Wikipedia | 8,192 | 16.16 | 1064 |
| GPT2 | GPT2-Small (SL1024) | Bow Pod64 | SDK 3.0 | PyTorch | Wikipedia | 8,192 | 16.16 | 4093 |
| Conformer-Medium | WeNet-Conformer-Medium | Bow Pod16 | SDK 3.0 | PyTorch | AiShell1 | 288 | 16.16 | 1141 |
| RNN-T | Transformer Transducer | Bow Pod16 | SDK 3.0 | PopART | Generated | 32 | 32.32 | 703 |
| DeepVoice3 | | Bow-2000 | SDK 3.0 | PopART | VCTK Corpus | 128 | 32.32 | 9411 |
| FastSpeech2 | | Bow Pod16 | SDK 3.0 | TensorFlow2 | LJ Speech | 64 | 16.16 | 1635 |
| FastPitch | frames/s | Bow Pod16 | SDK 3.0 | PyTorch | Generated | 128 | 32.32 | 1556824 |
| TGN | Temporal Graph Network | 1x Bow IPU | SDK 3.0 | TensorFlow1 | JODIE Wikipedia | 200 | 16.32 | 28037 |
| Cluster-GCN | | Bow-2000 | SDK 3.0 | TensorFlow2 | PPI | | 16.16 | 622394 |
| Cluster-GCN | | Bow-2000 | SDK 3.0 | TensorFlow2 | ArXiv | | 16.16 | 3725109 |
| Cluster-GCN | | Bow-2000 | SDK 3.0 | TensorFlow2 | Reddit | | 16.16 | 1960430 |
| Cluster-GCN | | Bow-2000 | SDK 3.0 | TensorFlow2 | Products | | 16.16 | 3253605 |
| Cluster-GCN | | Bow-2000 | SDK 3.0 | TensorFlow2 | ogbn-mag | | 16.16 | 2622256 |
| MPNN-GIN | MP Graph Isomorphism n/w | Bow-2000 | SDK 3.0 | TensorFlow2 | Generated | 1,024 | 16.16 | 460179 |
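
One way to read the multi-Pod rows above is as a scaling-efficiency calculation: the measured speedup relative to the smallest system, divided by the increase in IPU count. The sketch below applies this to the ResNet-50 v1.5 TensorFlow1 rows. It assumes that Bow Pod16, Pod64, and Pod256 contain 16, 64, and 256 IPUs respectively. The batch size differs between these rows, so the result is a rough indication rather than a controlled comparison. The helper name is hypothetical.

```python
# Throughput (items/sec) for ResNet-50 v1.5, TensorFlow1, copied from the table above.
# Keys are IPU counts, assuming "Bow PodN" means N IPUs.
resnet50_tf1 = {16: 44059, 64: 153205, 256: 456906}


def scaling_efficiency(results: dict[int, float]) -> dict[int, float]:
    """Speedup over the smallest system divided by the ideal (linear) speedup."""
    base_ipus = min(results)
    base_throughput = results[base_ipus]
    return {
        ipus: (tput / base_throughput) / (ipus / base_ipus)
        for ipus, tput in sorted(results.items())
    }


for ipus, eff in scaling_efficiency(resnet50_tf1).items():
    print(f"{ipus:>3} IPUs: {eff:.0%} of linear scaling")
```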

Bow Platforms - Inference

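Model inference here means running a trained model on input data to infer an output. Inference performance in production applications is typically measured by two metrics: throughput (as defined above) and latency, which in this context is the time the model takes to deliver an output for a given input.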
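The two metrics can be measured together with a simple timing loop. The sketch below is an illustration under stated assumptions: `benchmark` and `toy_model` are hypothetical names, the "model" is a stand-in Python function, and batches are run synchronously one at a time. Real deployments may pipeline or replicate work, in which case throughput is not simply batch size divided by latency.

```python
import statistics
import time
from typing import Callable, Sequence


def benchmark(run_batch: Callable[[Sequence], object], batch: Sequence,
              iterations: int = 100) -> tuple[float, float]:
    """Return (mean latency in ms, throughput in items/sec) for run_batch."""
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_batch(batch)
        latencies.append(time.perf_counter() - start)
    mean_latency = statistics.mean(latencies)  # seconds per batch
    return mean_latency * 1e3, len(batch) / mean_latency


# Stand-in "model" (an assumption for illustration): sums each input row.
def toy_model(batch):
    return [sum(row) for row in batch]


latency_ms, items_per_sec = benchmark(toy_model, [[1.0] * 256] * 4)
print(f"latency {latency_ms:.3f} ms, throughput {items_per_sec:.0f} items/sec")
```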

The results below give throughput and latency on the Bow-2000 platform at the specified batch sizes.

| Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Throughput (items/sec) | Latency (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BERT-Large | SL128 | Bow-2000 | SDK 3.0 | PopART | SQuAD | 4 | 16.16 | 2870 | 1.37 |
| BERT-Large | SL128 | Bow-2000 | SDK 3.0 | PopART | SQuAD | 8 | 16.16 | 4068 | 1.95 |
| BERT-Large | SL128 | Bow-2000 | SDK 3.0 | PopART | SQuAD | 12 | 16.16 | 4631 | 2.57 |
| BERT-Large | SL128 | Bow-2000 | SDK 3.0 | PopART | SQuAD | 16 | 16.16 | 5262 | 3.02 |
| BERT-Base | SL128 | Bow-2000 | SDK 3.0 | PopART | SQuAD | 4 | 16.16 | 6382 | 0.61 |
| BERT-Base | SL128 | Bow-2000 | SDK 3.0 | PopART | SQuAD | 320 | 16.16 | 27362 | 11.7 |
| GPT2 | GPT2-Small | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 1219 | 6.86 |
| GPT2 | GPT2-Medium | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 2 | 16.16 | 338 | 11.72 |
| GPT2 | GPT2-Large | Bow Pod16 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 2 | 16.16 | 96 | 20.7 |
| ResNet-50 v1.5 | lowest latency config | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 9699 | 0.4 |
| ResNet-50 v1.5 | higher throughput config | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 12819 | 1.47 |
| ResNet-50 v1.5 | | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 320 | 16.16 | 61927 | 24.09 |
| EfficientNet-B0 | lowest latency config | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 10889 | 0.34 |
| EfficientNet-B0 | higher throughput config | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 15182 | 1.21 |
| EfficientNet-B0 | | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 192 | 16.16 | 55158 | 15.12 |
| EfficientNet-B4 | lowest latency config | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 4492 | 0.84 |
| EfficientNet-B4 | higher throughput config | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 5760 | 3.26 |
| EfficientNet-B4 | | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 48 | 16.16 | 18795 | 11.76 |
| ResNeXt101 | lowest latency config | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 4577 | 0.84 |
| ResNeXt101 | higher throughput config | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 5363 | 3.57 |
| ResNeXt101 | | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 64 | 16.16 | 16290 | 19.24 |
| Yolo v4 | image 896, bps 5, max det 200 | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 928 | 6.51 |
| Yolo v4 | image 896, bps 10, max det 300 | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 988 | 6.74 |
| Yolo v4 | image 640, bps 5, max det 200 | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1852 | 6.63 |
| Yolo v4 | image 640, bps 10, max det 300 | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1954 | 6.91 |
| Yolo v4 | image 512, bps 5, max det 200 | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 2548 | 4.81 |
| Yolo v4 | image 512, bps 10, max det 300 | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 2687 | 4.93 |
| Yolo v4 | image 416, bps 5, max det 200 | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 3213 | 3.71 |
| Yolo v4 | image 416, bps 10, max det 100 | Bow-2000 | SDK 3.0 | PyTorch | Synthetic (host-generated) | 16 | 16.16 | 4274 | 6.28 |
| EfficientDet-D0 | | Bow-2000 | SDK 3.0 | TF2 w/Keras | Synthetic (host-generated) | 16 | 16.16 | 5779 | 0.7 |
| EfficientDet-D1 | | Bow-2000 | SDK 3.0 | TF2 w/Keras | Synthetic (host-generated) | 12 | 16.16 | 3336 | 1.2 |
| EfficientDet-D2 | | Bow-2000 | SDK 3.0 | TF2 w/Keras | Synthetic (host-generated) | 8 | 16.16 | 2169 | 1.85 |
| EfficientDet-D3 | | Bow-2000 | SDK 3.0 | TF2 w/Keras | Synthetic (host-generated) | 4 | 16.16 | 1125 | 3.56 |
| EfficientDet-D4 | | Bow-2000 | SDK 3.0 | TF2 w/Keras | Synthetic (host-generated) | 4 | 16.16 | 814 | 4.91 |
| Unet (Medical) | | Bow-2000 | SDK 3.0 | TensorFlow2 | Synthetic (host-generated) | 4 | 16.16 | 1839 | |
| Unet (Medical) | | Bow-2000 | SDK 3.0 | TensorFlow2 | Synthetic (host-generated) | 8 | 16.16 | 2008 | |
| FastSpeech2 | | Bow-2000 | SDK 3.0 | TensorFlow2 | Synthetic (host-generated) | 4 | 16.16 | 2618 | 1.53 |
| FastSpeech2 | | Bow-2000 | SDK 3.0 | TensorFlow2 | Synthetic (host-generated) | 16 | 16.16 | 4354 | 0.92 |
| FastSpeech2 | | Bow-2000 | SDK 3.0 | TensorFlow2 | Synthetic (host-generated) | 32 | 16.16 | 4946 | 0.81 |
| FastSpeech2 | | Bow-2000 | SDK 3.0 | TensorFlow2 | Synthetic (host-generated) | 60 | 16.16 | 5202 | 0.77 |

MLPerf Training v2.0 Performance

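For our MLPerf Training v2.0 submission, we chose the popular application benchmark categories of image classification (ResNet-50) and natural language processing (BERT), and added a new entry as an Open division submission in the RNN-T speech transcription category.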

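Submissions fall into two divisions. The Closed division requires submitters to use exactly the same model and optimizer implementation, including the defined hyperparameter state and training epochs. The Open division fosters innovation by allowing different model implementations that are better suited to different processor capabilities, while still requiring exactly the same model accuracy and quality as the Closed division.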

| Division | Model | MLPerf Quality Target | Platform | SDK Version | Framework | MLPerf ID | Dataset | Precision | Time to Train (mins) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Closed | ResNet50 v1.5 | 75.90% classification | Bow Pod16 | SDK 2.5.1 | TensorFlow | 2.0-2047 | ImageNet2012 | 16.16 | 19.64 |
| Closed | ResNet50 v1.5 | 75.90% classification | Bow Pod64 | SDK 2.5.1 | TensorFlow | 2.0-2050 | ImageNet2012 | 16.16 | 6.30 |
| Closed | ResNet50 v1.5 | 75.90% classification | Bow Pod128 | SDK 2.5.1 | TensorFlow | 2.0-2052 | ImageNet2012 | 16.16 | 4.19 |
| Closed | ResNet50 v1.5 | 75.90% classification | Bow Pod256 | SDK 2.5.1 | TensorFlow | 2.0-2054 | ImageNet2012 | 16.16 | 2.67 |
| Closed | BERT | 0.72 Mask-LM accuracy | Bow Pod16 | SDK 2.5.1 | PopART | 2.0-2045 | Wikipedia | 16.16 | 20.66 |
| Closed | BERT | 0.72 Mask-LM accuracy | Bow Pod16 | SDK 2.5.1 | PaddlePaddle | 2.0-2046 | Wikipedia | 16.16 | 20.75 |
| Closed | BERT | 0.72 Mask-LM accuracy | Bow Pod64 | SDK 2.5.1 | PopART | 2.0-2049 | Wikipedia | 16.16 | 6.70 |
| Closed | BERT | 0.72 Mask-LM accuracy | Bow Pod64 | SDK 2.5.1 | PaddlePaddle | 2.0-2048 | Wikipedia | 16.16 | 6.77 |
| Closed | BERT | 0.72 Mask-LM accuracy | Bow Pod128 | SDK 2.5.1 | PopART | 2.0-2051 | Wikipedia | 16.16 | 4.42 |
| Closed | BERT | 0.72 Mask-LM accuracy | Bow Pod256 | SDK 2.5.1 | PopART | 2.0-2053 | Wikipedia | 16.16 | 3.19 |
| Open | RNN-T | - | Bow Pod64 | SDK 2.5.1 | PopART | 2.0-2125 | Customer dataset | 16.16 | 109.36 |

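The MLPerf name and logo are trademarks of MLCommons Association in the United States and other countries.
All rights reserved. Unauthorized use is strictly prohibited. For more information, visit www.mlperf.org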

IPU-POD Classic - Training

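Training a machine learning model involves running an algorithm over an input dataset (the training data) until the model converges, meaning that it has learned to produce the desired output to a specified accuracy. Throughput in this context is defined as the number of input data points (sequences, images, or rows) that the model processes per second. Throughput is often used as a measure of hardware performance because it is directly related to the time needed to train the model to a specified accuracy.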
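To illustrate why throughput relates to training time: the wall-clock time for one pass over a dataset is roughly the dataset size divided by the throughput. The sketch below assumes the ImageNet2012 (ILSVRC-2012) training set size of 1,281,167 images and uses, as an example, a throughput of 30,690 items/sec (the IPU-POD16 TensorFlow1 ResNet-50 v1.5 figure from the table that follows). It ignores evaluation, data-loading stalls, and compilation time, and the epoch count is a placeholder, since the number of epochs needed to converge depends on the training recipe.

```python
IMAGENET_2012_TRAIN_IMAGES = 1_281_167  # ILSVRC-2012 training set size


def seconds_per_epoch(dataset_size: int, throughput_items_per_sec: float) -> float:
    """Idealized time for one pass over the dataset at a steady throughput."""
    return dataset_size / throughput_items_per_sec


def estimated_training_minutes(dataset_size: int, throughput_items_per_sec: float,
                               epochs: int) -> float:
    """Idealized training time; real runs add evaluation and start-up overhead."""
    return epochs * seconds_per_epoch(dataset_size, throughput_items_per_sec) / 60


epoch_s = seconds_per_epoch(IMAGENET_2012_TRAIN_IMAGES, 30690)
print(f"about {epoch_s:.1f} s per epoch")

# The epoch count below is a placeholder, not a value taken from this page.
minutes = estimated_training_minutes(IMAGENET_2012_TRAIN_IMAGES, 30690, epochs=40)
print(f"about {minutes:.1f} min for 40 epochs")
```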

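The results below give the throughput obtained for each referenced model in the specified configuration. All configurations running on real data were verified for convergence.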

| Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Throughput (items/sec) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BERT Large | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | PopART | Wikipedia | 65,536 | 16.16 | 3738 |
| BERT Large | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 65,600 | 16.16 | 3704 |
| BERT Large | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 3582 |
| BERT Large | Ph1 Pre-Training (SL128) | IPU-POD64 | SDK 2.4.0 | PopART | Wikipedia | 65,536 | 16.16 | 14189 |
| BERT Large | Ph1 Pre-Training (SL128) | IPU-POD64 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 66,560 | 16.16 | 13917 |
| BERT Large | Ph1 Pre-Training (SL128) | IPU-POD64 | SDK 2.4.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 12251 |
| BERT Large | Ph1 Pre-Training (SL128) | IPU-POD128 | SDK 2.4.0 | PopART | Wikipedia | 65,536 | 16.16 | 24424 |
| BERT Large | Ph1 Pre-Training (SL128) | IPU-POD128 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 66,560 | 16.16 | 24900 |
| BERT Large | Ph1 Pre-Training (SL128) | IPU-POD128 | SDK 2.4.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 22402 |
| BERT Large | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | PopART | Wikipedia | 16,384 | 16.16 | 1063 |
| BERT Large | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 16,400 | 16.16 | 1025 |
| BERT Large | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | PyTorch | Wikipedia | 16,384 | 16.16 | 1012 |
| BERT Large | Ph2 Pre-Training (SL384) | IPU-POD64 | SDK 2.4.0 | PopART | Wikipedia | 16,384 | 16.16 | 4003 |
| BERT Large | Ph2 Pre-Training (SL384) | IPU-POD64 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 16,640 | 16.16 | 3938 |
| BERT Large | Ph2 Pre-Training (SL384) | IPU-POD64 | SDK 2.4.0 | PyTorch | Wikipedia | 16,384 | 16.16 | 3611 |
| BERT Large | Ph2 Pre-Training (SL384) | IPU-POD128 | SDK 2.4.0 | PopART | Wikipedia | 16,384 | 16.16 | 7127 |
| BERT Large | Ph2 Pre-Training (SL384) | IPU-POD128 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 16,640 | 16.16 | 7292 |
| BERT Large | Ph2 Pre-Training (SL384) | IPU-POD128 | SDK 2.4.0 | PyTorch | Wikipedia | 16,384 | 16.16 | 6500 |
| BERT Large | Fine-Tuning (SL384 - SQuAD) | IPU-POD16 | SDK 2.4.0 | PopART | SQuAD | 256 | 16.16 | 884 |
| BERT Large | Fine-Tuning (SL384 - SQuAD) | IPU-POD16 | SDK 2.4.0 | PyTorch | SQuAD | 256 | 16.16 | 744 |
| BERT Base | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | PopART | Wikipedia | 65,536 | 16.16 | 11991 |
| BERT Base | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 65,280 | 16.16 | 11647 |
| BERT Base | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | TensorFlow2 | Wikipedia | 65,280 | 16.16 | 11035 |
| BERT Base | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 11184 |
| BERT Base | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | PopART | Wikipedia | 16,384 | 16.16 | 3545 |
| BERT Base | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 16,320 | 16.16 | 3288 |
| BERT Base | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | TensorFlow2 | Wikipedia | 16,320 | 16.16 | 3155 |
| BERT Base | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | PyTorch | Wikipedia | 16,384 | 16.16 | 3334 |
| BERT Base - HuggingFace | Fine-Tuning (SL384 - SQuAD) | IPU-POD16 | SDK 2.4.0 | TensorFlow2 | SQuAD | 320 | 16.16 | 375 |
| GPT2 | GPT2-medium | IPU-POD16 | SDK 2.3.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 2540 |
| GPT2 | GPT2-medium | IPU-POD64 | SDK 2.3.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 9870 |
| GPT2 | GPT2-medium | IPU-POD128 | SDK 2.3.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 18842 |
| GPT2 | GPT2-medium | IPU-POD256 | SDK 2.3.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 31025 |
| ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 1,920 | 16.16 | 7864 |
| ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | PyTorch | ImageNet2012 | 16,384 | 16.16 | 7303 |
| ResNet-50 v1.5 | | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 1,920 | 16.16 | 30690 |
| ResNet-50 v1.5 | | IPU-POD16 | SDK 2.4.0 | PyTorch | ImageNet2012 | 16,384 | 16.16 | 25534 |
| ResNet-50 v1.5 | | IPU-POD64 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 2,560 | 16.16 | 108566 |
| ResNet-50 v1.5 | | IPU-POD128 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 5,120 | 16.16 | 205006 |
| ResNet-50 v1.5 | | IPU-POD256 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 10,240 | 16.16 | 365040 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 768 | 16.16 | 2514 |
| ResNeXt101 | | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 768 | 16.16 | 9023 |
| EfficientNet-B4 | G16-EfficientNet | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 800 | 16.16 | 1618 |
| EfficientNet-B4 | G16-EfficientNet | IPU-M2000 | SDK 2.4.0 | PyTorch | ImageNet2012 | 1,024 | 16.32 | 1400 |
| EfficientNet-B4 | G16-EfficientNet | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 6379 |
| EfficientNet-B4 | G16-EfficientNet | IPU-POD16 | SDK 2.4.0 | PyTorch | ImageNet2012 | 1,024 | 16.32 | 4311 |
| EfficientNet-B4 | G16-EfficientNet | IPU-POD64 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 24946 |
| EfficientNet-B4 | G16-EfficientNet | IPU-POD128 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 48015 |
| EfficientNet-B4 | G16-EfficientNet | IPU-POD256 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 87968 |
| ViT | Vision Transformer | IPU-POD16 | SDK 2.3.0 | PyTorch | ImageNet1k | 65,536 | 16.16 | 6535 |
| ViT | Vision Transformer | IPU-POD64 | SDK 2.3.0 | PyTorch | ImageNet1k | 65,536 | 16.16 | 25080 |
| ViT | Vision Transformer | IPU-POD128 | SDK 2.3.0 | PyTorch | ImageNet1k | 65,536 | 16.16 | 46320 |
| ViT | Vision Transformer | IPU-POD256 | SDK 2.3.0 | PyTorch | ImageNet1k | 65,536 | 16.16 | 68800 |
| UNet (Medical) | | IPU-M2000 | SDK 2.4.0 | TensorFlow2 | EM segmentation | 24 | 16.16 | 139 |
| Mini DALL-E | | IPU-M2000 | SDK 2.4.0 | PyTorch | COCO 2017 | 1,536 | 16.16 | 319 |
| Mini DALL-E | | IPU-POD16 | SDK 2.4.0 | PyTorch | COCO 2017 | 6,144 | 16.16 | 815 |
| DeepVoice3 | | IPU-M2000 | SDK 2.4.0 | PopART | VCTK Corpus | 128 | 32.32 | 8496 |
| FastSpeech2 | | IPU-M2000 | SDK 2.4.0 | TensorFlow2 | LJ Speech | 32 | 16.16 | 406 |
| FastSpeech2 | | IPU-POD16 | SDK 2.4.0 | TensorFlow2 | LJ Speech | 64 | 16.16 | 1141 |
| Conformer | | IPU-M2000 | SDK 2.4.0 | PyTorch | AiShell1 | 96 | 16.16 | 1030 |
| Conformer | | IPU-POD16 | SDK 2.4.0 | PyTorch | AiShell1 | 96 | 16.16 | 3395 |
| TGN | Temporal Graph Network | GC200 IPU | SDK 2.4.0 | TensorFlow1 | JODIE Wikipedia | 200 | 16.32 | 190472 |
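
The Bow and classic training tables can be compared row by row to get a rough generational speedup. The sketch below does this for four pairs of rows that share model, framework, and batch size in the two tables. The SDK versions differ (3.0 versus 2.4.0), so the ratios reflect software as well as hardware changes. The dictionary layout is an illustrative choice.

```python
# (Bow Pod16 throughput, IPU-POD16 throughput) in items/sec, copied from the
# two training tables for rows with the same model, framework, and batch size.
pairs = {
    "ResNet-50 v1.5, PyTorch, batch 16,384": (37061, 25534),
    "ResNeXt101, TensorFlow1, batch 768": (12277, 9023),
    "BERT Base Ph1 (SL128), PopART, batch 65,536": (16620, 11991),
    "BERT Large fine-tuning (SQuAD), PopART, batch 256": (1167, 884),
}

# Ratio of Bow Pod16 throughput to IPU-POD16 throughput for each pair.
speedups = {name: bow / classic for name, (bow, classic) in pairs.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```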

IPU-POD Classic - Time to Result

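| Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Time To Result (secs) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MCMC TFP | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Proprietary | | 32.32 | 49 |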

IPU-POD Classic - Inference

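Model inference in this context means running the model on input data to infer an output. Inference performance in production setups is typically measured by two metrics: throughput (as defined previously) and latency, which is defined as the time taken to execute an inference.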

| Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Throughput (items/sec) | Latency (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BERT-Large | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 4 | 16.16 | 2071 | 1.92 |
| BERT-Large | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 8 | 16.16 | 2911 | 2.73 |
| BERT-Large | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 12 | 16.16 | 3303 | 3.62 |
| BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 4 | 16.16 | 4580 | 0.86 |
| BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 8 | 16.16 | 7069 | 1.11 |
| BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 16 | 16.16 | 9687 | 1.65 |
| BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 32 | 16.16 | 12584 | 2.53 |
| BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 64 | 16.16 | 15346 | 4.16 |
| BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 128 | 16.16 | 17972 | 7.11 |
| BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 256 | 16.16 | 19484 | 13.11 |
| BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 320 | 16.16 | 20803 | 15.36 |
| ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 7152 | 1.66 |
| ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 10515 | 2.27 |
| ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 16 | 16.16 | 16207 | 2.95 |
| ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 32 | 16.16 | 22544 | 4.24 |
| ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 64 | 16.16 | 28762 | 6.66 |
| ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 128 | 16.16 | 35155 | 10.91 |
| ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 256 | 16.16 | 40085 | 19.14 |
| ResNet-50 v1.5 | lowest latency config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 7397 | 0.52 |
| ResNet-50 v1.5 | higher throughput config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 9404 | 2.04 |
| ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 16 | 16.16 | 14321 | 2.69 |
| ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 32 | 16.16 | 20927 | 3.7 |
| ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 64 | 16.16 | 36193 | 8.62 |
| ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 128 | 16.16 | 43472 | 14.38 |
| ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 256 | 16.16 | 49816 | 25.13 |
| ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 360 | 16.16 | 50883 | 30.68 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 4483 | 2.66 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 6435 | 3.71 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 16 | 16.16 | 9705 | 4.93 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 32 | 16.16 | 13693 | 6.99 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 64 | 16.16 | 17176 | 11.16 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 3395 | 1.14 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 4840 | 1.62 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 16 | 16.16 | 6483 | 2.43 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 64 | 16.16 | 11320 | 27.83 |
| EfficientNet-B0 | lowest latency config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 8686 | 0.44 |
| EfficientNet-B0 | higher throughput config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 10907 | 1.69 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 32 | 16.16 | 50510 | 3.05 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 64 | 16.16 | 71839 | 4.26 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 128 | 16.16 | 86986 | 6.77 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 144 | 16.16 | 69852 | 9.15 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 196 | 16.16 | 61714 | 13.38 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 8289 | 1.43 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 13056 | 1.82 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 16 | 16.16 | 22217 | 2.15 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 32 | 16.16 | 34448 | 2.77 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 64 | 16.16 | 43351 | 4.41 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 128 | 16.16 | 53256 | 7.19 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 160 | 16.16 | 55169 | 8.68 |
| EfficientNet-B4 | lowest latency config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 3539 | 1.09 |
| EfficientNet-B4 | higher throughput config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 4081 | 1.85 |
| EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 16 | 16.16 | 8299 | 3.5 |
| EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 24 | 16.16 | 9874 | 4.37 |
| EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 32 | 16.16 | 10753 | 5.3 |
| EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 40 | 16.16 | 11578 | 6.22 |
| EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 3718 | 3.21 |
| EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 5514 | 4.34 |
| EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 16 | 16.16 | 7959 | 6.01 |
| EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 20 | 16.16 | 8958 | 6.68 |
| EfficientNet-B7 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 1407 | 8.52 |
| EfficientNet-B7 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 1869 | 12.82 |
| Yolo v4 | image 896, bps 5, max det 200 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 690 | 9.4 |
| Yolo v4 | image 896, bps 10, max det 300 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 722 | 9.74 |
| Yolo v4 | image 640, bps 5, max det 200 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1306 | 10.03 |
| Yolo v4 | image 640, bps 10, max det 300 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1364 | 10.39 |
| Yolo v4 | image 512, bps 5, max det 200 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1772 | 7.25 |
| Yolo v4 | image 512, bps 10, max det 300 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1915 | 7.31 |
| Yolo v4 | image 416, bps 5, max det 200 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 2195 | 5.88 |
| Yolo v4 | image 416, bps 10, max det 100 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 2994 | 9.42 |
| Unet (Medical) | | IPU-M2000 | SDK 2.4.0 | TensorFlow2 | Synthetic (host-generated) | 4 | 16.16 | 1144 | |
| Unet (Medical) | | IPU-M2000 | SDK 2.4.0 | TensorFlow2 | Synthetic (host-generated) | 8 | 16.16 | 1190 | |
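
Precision terminology: X.Y is defined as follows. X is the precision in which activations and gradients are stored, and Y is the precision in which weights are stored. When training with 16.16 weights, we may still use FP32 for other variables (such as norms or momentum), and we include stochastic rounding. The benchmarks were generated using our examples on the Graphcore GitHub.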
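
As a small illustration of the X.Y notation, the sketch below splits a precision label from the tables into its two parts. Reading the numbers as floating-point bit widths (FP16 or FP32) is an assumption based on the FP32 mention above, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Precision:
    activations_and_gradients_bits: int  # X in "X.Y"
    weights_bits: int                    # Y in "X.Y"


def parse_precision(label: str) -> Precision:
    """Parse a precision label such as "16.16" or "16.32" into its two parts."""
    x, y = label.split(".")
    return Precision(int(x), int(y))


for label in ("16.16", "16.32", "32.32"):
    p = parse_precision(label)
    print(f"{label}: activations/gradients FP{p.activations_and_gradients_bits}, "
          f"weights FP{p.weights_bits}")
```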
This page was last updated on 4 October 2022.
