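Performance Results

This page provides preliminary performance results for the Bow Pod platforms, our MLPerf Training v2.0 submission results, and results from our own benchmarking of a wider range of models for training and inference.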

Bow Platforms - Training

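Preliminary training performance results for the Bow Pod platforms are provided here. Throughput is defined as the number of input data points (sequences, images, or rows) that the model processes per second.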
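As a minimal sketch of that definition, throughput can be computed from a count of processed items and the elapsed wall-clock time. The function name and the run figures below (100 steps, 400 seconds) are hypothetical and only illustrate the arithmetic; they are not taken from the benchmark harness.

```python
def throughput(items_processed: int, elapsed_seconds: float) -> float:
    """Return items (sequences, images, or rows) processed per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return items_processed / elapsed_seconds


# Hypothetical run: 100 training steps at a global batch size of 65,536,
# completed in 400 seconds of wall-clock time.
steps = 100
global_batch_size = 65_536
elapsed = 400.0

print(throughput(steps * global_batch_size, elapsed))  # 16384.0
```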

The results below detail the throughput obtained for each referenced model in the specified configuration.

Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Throughput (items/sec)
BERT Large | Ph1 Pre-Training (SL128) - Packed | Bow Pod16 | SDK 3.1 | PopART | Wikipedia | 54,784 | 16.16 | 6044
BERT Large | Ph1 Pre-Training (SL128) | Bow Pod16 | SDK 3.3 | TensorFlow2 | Wikipedia | 65,280 | 16.16 | 4527
BERT Large | Ph1 Pre-Training (SL128) - Packed | Bow Pod16 | SDK 3.3 | PyTorch | Wikipedia | 56,064 | 16.16 | 5600
BERT Large | Ph1 Pre-Training (SL128) - Packed | Bow Pod64 | SDK 3.1 | PopART | Wikipedia | 54,784 | 16.16 | 22759
BERT Large | Ph1 Pre-Training (SL128) | Bow Pod64 | SDK 3.0 | TensorFlow2 | Wikipedia | 66,560 | 16.16 | 18199
BERT Large | Ph1 Pre-Training (SL128) - Packed | Bow Pod64 | SDK 3.2 | PyTorch | Wikipedia | 56,064 | 16.16 | 18442
BERT Large | Ph2 Pre-Training (SL512) - Packed | Bow Pod16 | SDK 3.1 | PopART | Wikipedia | 9,600 | 16.16 | 2126
BERT Large | Ph2 Pre-Training (SL512) - Packed | Bow Pod16 | SDK 3.3 | PyTorch | Wikipedia | 8,192 | 16.16 | 1973
BERT Large | Ph2 Pre-Training (SL512) - Packed | Bow Pod64 | SDK 3.1 | PopART | Wikipedia | 9,600 | 16.16 | 7789
BERT Large | Ph2 Pre-Training (SL512) - Packed | Bow Pod64 | SDK 3.2 | PyTorch | Wikipedia | 8,192 | 16.16 | 6543
BERT Large | Fine-Tuning (SL384 - SQuAD) | Bow Pod16 | SDK 3.1 | PopART | SQuAD | 256 | 16.16 | 1183
BERT Large | Fine-Tuning (SL384 - SQuAD) | Bow Pod16 | SDK 3.3 | PyTorch | SQuAD | 256 | 16.16 | 1009
BERT Base | Ph1 Pre-Training (SL128) | Bow Pod16 | SDK 3.1 | PopART | Wikipedia | 65,536 | 16.16 | 16391
BERT Base | Ph1 Pre-Training (SL128) | Bow Pod16 | SDK 3.3 | TensorFlow2 | Wikipedia | 65,280 | 16.16 | 15173
BERT Base | Ph1 Pre-Training (SL128) | Bow Pod16 | SDK 3.3 | PyTorch | Wikipedia | 65,536 | 16.16 | 15911
BERT Base | Ph2 Pre-Training (SL512) | Bow Pod16 | SDK 3.1 | PopART | Wikipedia | 16,384 | 16.16 | 3656
BERT Base | Ph2 Pre-Training (SL384) | Bow Pod16 | SDK 3.3 | TensorFlow2 | Wikipedia | 16,320 | 16.16 | 4387
BERT Base | Ph2 Pre-Training (SL512) | Bow Pod16 | SDK 3.3 | PyTorch | Wikipedia | 16,384 | 16.16 | 3527
Group BERT Base | Ph1 Pre-Training (SL128) | Bow Pod16 | SDK 3.0 | TensorFlow1 | Wikipedia | 65,520 | 16.16 | 7187
Group BERT Base | Ph2 Pre-Training (SL384) | Bow Pod16 | SDK 3.0 | TensorFlow1 | Wikipedia | 32,800 | 16.16 | 2288
Group BERT Base | Ph1 Pre-Training (SL128) | Bow Pod64 | SDK 3.0 | TensorFlow1 | Wikipedia | 64,800 | 16.16 | 26425
Group BERT Base | Ph2 Pre-Training (SL384) | Bow Pod64 | SDK 3.0 | TensorFlow1 | Wikipedia | 32,640 | 16.16 | 7572
BERT Base - HuggingFace | Fine-Tuning (SL384 - SQuAD) | Bow Pod16 | SDK 3.0 | TensorFlow2 | SQuAD | 320 | 16.16 | 1014
GPT2 | GPT2-Large (SL512) | Bow Pod16 | SDK 3.3 | PyTorch | Wikipedia | 8,192 | 16.16 | 414
GPT2 | GPT2-Large (SL512) | Bow Pod64 | SDK 3.3 | PyTorch | Wikipedia | 8,192 | 16.16 | 1590
GPT2 | GPT2-Large (SL1024) | Bow Pod16 | SDK 3.3 | PyTorch | Wikipedia | 8,192 | 16.16 | 178
GPT2 | GPT2-Medium (SL1024) | Bow Pod16 | SDK 3.3 | PyTorch | Wikipedia | 8,192 | 16.16 | 337
GPT2 | GPT2-Medium (SL1024) | Bow Pod64 | SDK 3.3 | PyTorch | Wikipedia | 8,192 | 16.16 | 1320
GPT2 | GPT2-Small (SL1024) | Bow Pod16 | SDK 3.3 | PyTorch | Wikipedia | 8,192 | 16.16 | 1065
GPT2 | GPT2-Small (SL1024) | Bow Pod64 | SDK 3.3 | PyTorch | Wikipedia | 8,192 | 16.16 | 4094
Conformer-Medium | WeNet-Conformer-Medium | Bow Pod16 | SDK 3.3 | PyTorch | AiShell1 | 288 | 16.16 | 1167
RNN-T | Transformer Transducer | Bow Pod16 | SDK 3.1 | PopART | Generated | 32 | 32.32 | 1442
DeepVoice3 | | Bow-2000 | SDK 3.1 | PopART | VCTK Corpus | 128 | 32.32 | 9653
FastSpeech2 | | Bow Pod16 | SDK 3.1 | TensorFlow2 | LJ Speech | 64 | 16.16 | 1653
FastPitch | frames/s | Bow Pod16 | SDK 3.3 | PyTorch | Generated | 128 | 32.32 | 1489341
TGN | Temporal Graph Network | 1x Bow IPU | SDK 3.3 | PyTorch Geometric | | | | 31617
Cluster-GCN | | Bow-2000 | SDK 3.2 | TensorFlow2 | PPI | | 16.16 | 684439
Cluster-GCN | | Bow-2000 | SDK 3.2 | TensorFlow2 | ArXiv | | 16.16 | 3521863
Cluster-GCN | | Bow-2000 | SDK 3.2 | TensorFlow2 | Reddit | | 16.16 | 1959184
Cluster-GCN | | Bow-2000 | SDK 3.2 | TensorFlow2 | Products | | 16.16 | 3412474
Cluster-GCN | | Bow-2000 | SDK 3.2 | TensorFlow2 | ogbn-mag | | 16.16 | 2586673
MPNN-GIN | MP Graph Isomorphism n/w | Bow-2000 | SDK 3.3 | TensorFlow2 | Generated | 1,024 | 16.16 | 473608
ResNet-50 v1.5 | | Bow Pod16 | SDK 3.0 | TensorFlow1 | ImageNet2012 | 3,520 | 16.16 | 44059
ResNet-50 v1.5 | | Bow Pod16 | SDK 3.3 | PyTorch | ImageNet2012 | 16,384 | 16.16 | 38036
ResNet-50 v1.5 | | Bow Pod64 | SDK 3.0 | TensorFlow1 | ImageNet2012 | 5,120 | 16.16 | 153205
ResNet-50 v1.5 | | Bow Pod64 | SDK 3.2 | PyTorch | ImageNet2012 | 16,384 | 16.16 | 109232
EfficientNet-B4 | G16-EfficientNet | Bow Pod16 | SDK 3.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 9000
EfficientNet-B4 | G16-EfficientNet | Bow Pod16 | SDK 3.3 | PyTorch | ImageNet2012 | 1,024 | 16.32 | 8223
EfficientNet-B4 | G16-EfficientNet | Bow Pod64 | SDK 3.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 34140
ResNeXt101 | | Bow Pod16 | SDK 3.0 | TensorFlow1 | ImageNet2012 | 768 | 16.16 | 12277
ViT | Pre-Training | Bow Pod16 | SDK 3.3 | PyTorch | ImageNet1k | 65,536 | 16.16 | 7608
ViT | Pre-Training | Bow Pod64 | SDK 3.3 | PyTorch | ImageNet1k | 65,536 | 16.16 | 26185
ViT | Fine-Tuning | Bow Pod16 | SDK 3.3 | PyTorch | ImageNet1k | 2,040 | 16.16 | 8148
DINO | Vision Transformer | Bow Pod16 | SDK 3.3 | PyTorch | ImageNet1k | 3,200 | 16.16 | 696
DINO | Vision Transformer | Bow Pod64 | SDK 3.2 | PyTorch | ImageNet1k | 3,200 | 16.16 | 3437
Swin-Base (224) | Vision Transformer - Pre-Training | Bow Pod16 | SDK 3.3 | PyTorch | ImageNet1k | 512 | 32.32 | 1442
Swin-Tiny (224) | Vision Transformer - Pre-Training | Bow Pod16 | SDK 3.3 | PyTorch | ImageNet1k | 1,024 | 32.32 | 3687
Swin-Large (224) | Vision Transformer - Fine-Tuning | Bow Pod16 | SDK 3.3 | PyTorch | ImageNet1k | 8,196 | 16.16 | 3283
UNet (Medical) | | Bow-2000 | SDK 3.3 | TensorFlow2 | EM segmentation | 24 | 16.16 | 152
Mini DALL-E | | Bow Pod16 | SDK 3.3 | PyTorch | COCO 2017 | 6,144 | 16.16 | 1843
Mini DALL-E | | Bow Pod64 | SDK 3.3 | PyTorch | COCO 2017 | 24,576 | 16.16 | 6787
MAE | Masked Autoencoder for visual representation learning | Bow Pod16 | SDK 3.1 | PyTorch | ImageNet | 4,128 | 16.16 | 7111
Frozen In Time | Multimodal - Pre-Training (1 frame) | Bow Pod8 | SDK 3.3 | PyTorch | webvid | 240 | 16.16 | 447
CLIP | Multimodal (language/vision) | Bow Pod8 | SDK 3.3 | PyTorch | c3m | 795 | 16.16 | 2498

Bow Platforms - Inference

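Model inference here refers to running a trained model on input data to infer an output. Inference performance in production applications is typically measured by two metrics: throughput (as defined above) and latency, which in this context is the time the model takes to deliver an output for a given input.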
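The two metrics can be illustrated with a small timing loop. This is a sketch under stated assumptions, not the harness used to produce the published figures: `benchmark` and `dummy_infer` are hypothetical names, the model is a stand-in function, and each call is assumed to run synchronously on one batch.

```python
import statistics
import time


def benchmark(infer, batch, batch_size, iterations=50):
    """Time repeated calls to `infer` and report throughput and mean latency."""
    latencies_ms = []
    start = time.perf_counter()
    for _ in range(iterations):
        t0 = time.perf_counter()
        infer(batch)
        # Latency: time from submitting one input batch to receiving its output.
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    total_seconds = time.perf_counter() - start
    return {
        # Throughput: total items processed divided by total wall-clock time.
        "throughput_items_per_sec": iterations * batch_size / total_seconds,
        "mean_latency_ms": statistics.mean(latencies_ms),
    }


def dummy_infer(batch):
    """Hypothetical stand-in for a trained model."""
    return [x * 2 for x in batch]


result = benchmark(dummy_infer, list(range(4)), batch_size=4)
print(result)
```

Larger batches usually raise throughput at the cost of higher latency, which is why the tables report both figures for each batch size.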

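The results below give throughput and latency at the specified batch sizes, measured on the Bow-2000 unless the Platform column states otherwise.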

Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Throughput (items/sec) | Latency (ms)
GPT2 | GPT2-Small | Bow Pod16 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 1361 | 5.51
GPT2 | GPT2-Medium | Bow Pod16 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 2 | 16.16 | 337 | 11.75
GPT2 | GPT2-Large | Bow Pod16 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 2 | 16.16 | 97 | 20.57
BERT-Large | SL128 | Bow-2000 | SDK 3.1 | PopART | SQuAD | 4 | 16.16 | 2908 | 1.36
BERT-Large | SL128 | Bow-2000 | SDK 3.1 | PopART | SQuAD | 8 | 16.16 | 4096 | 1.94
BERT-Large | SL128 | Bow-2000 | SDK 3.1 | PopART | SQuAD | 12 | 16.16 | 4655 | 2.56
BERT-Large | SL128 | Bow-2000 | SDK 3.1 | PopART | SQuAD | 16 | 16.16 | 5292 | 3.01
BERT-Base | SL128 | Bow-2000 | SDK 3.1 | PopART | SQuAD | 4 | 16.16 | 6508 | 0.6
BERT-Base | SL128 | Bow-2000 | SDK 3.1 | PopART | SQuAD | 320 | 16.16 | 28069 | 11.41
ResNet-50 | v1.5, lowest latency config | Bow-2000 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 9297 | 0.58
ResNet-50 | v1.5, higher throughput config | Bow-2000 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 12282 | 1.66
ResNet-50 | v1.5 | Bow-2000 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 256 | 16.16 | 46174 | 26.16
EfficientNet-B0 | lowest latency config | Bow-2000 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 10866 | 0.39
EfficientNet-B0 | higher throughput config | Bow-2000 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 14789 | 1.29
EfficientNet-B0 | | Bow-2000 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 192 | 16.16 | 46107 | 19.97
EfficientNet-B4 | lowest latency config | Bow-2000 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 4419 | 1.26
EfficientNet-B4 | higher throughput config | Bow-2000 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 5646 | 3.53
EfficientNet-B4 | | Bow-2000 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 48 | 16.16 | 15458 | 14.35
Yolo v4 | image 896, bps 5, max det 200 | Bow-2000 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 924 | 6.49
Yolo v4 | image 896, bps 10, max det 300 | Bow-2000 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 988 | 6.76
Yolo v4 | image 640, bps 5, max det 200 | Bow-2000 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1854 | 6.61
Yolo v4 | image 640, bps 10, max det 300 | Bow-2000 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1948 | 6.92
Yolo v4 | image 512, bps 5, max det 200 | Bow-2000 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 2477 | 4.81
Yolo v4 | image 512, bps 10, max det 300 | Bow-2000 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 2663 | 4.93
Yolo v4 | image 416, bps 5, max det 200 | Bow-2000 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 3202 | 3.7
Yolo v4 | image 416, bps 10, max det 100 | Bow-2000 | SDK 3.3 | PyTorch | Synthetic (host-generated) | 16 | 16.16 | 4268 | 6.28
EfficientDet-D0 | | Bow-2000 | SDK 3.3 | TF2 w/Keras | Synthetic (host-generated) | 16 | 16.16 | 5171 | 0.77
EfficientDet-D1 | | Bow-2000 | SDK 3.3 | TF2 w/Keras | Synthetic (host-generated) | 12 | 16.16 | 2875 | 1.4
EfficientDet-D2 | | Bow-2000 | SDK 3.3 | TF2 w/Keras | Synthetic (host-generated) | 8 | 16.16 | 1869 | 2.14
EfficientDet-D3 | | Bow-2000 | SDK 3.3 | TF2 w/Keras | Synthetic (host-generated) | 4 | 16.16 | 925 | 4.33
EfficientDet-D4 | | Bow-2000 | SDK 3.3 | TF2 w/Keras | Synthetic (host-generated) | 4 | 16.16 | 664 | 6.03
Unet (Medical) | | Bow-2000 | SDK 3.3 | TensorFlow2 | Synthetic (host-generated) | 4 | 16.16 | 1920 |
Unet (Medical) | | Bow-2000 | SDK 3.3 | TensorFlow2 | Synthetic (host-generated) | 8 | 16.16 | 2081 |
FastSpeech2 | | Bow-2000 | SDK 3.1 | TensorFlow2 | Synthetic (host-generated) | 4 | 16.16 | 2610 | 1.53
FastSpeech2 | | Bow-2000 | SDK 3.1 | TensorFlow2 | Synthetic (host-generated) | 16 | 16.16 | 4354 | 0.92
FastSpeech2 | | Bow-2000 | SDK 3.1 | TensorFlow2 | Synthetic (host-generated) | 32 | 16.16 | 4949 | 0.81
FastSpeech2 | | Bow-2000 | SDK 3.1 | TensorFlow2 | Synthetic (host-generated) | 60 | 16.16 | 5201 | 0.77

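MLPerf Training v2.0 Performance

For our submission to MLPerf Training v2.0, we chose the popular application benchmark categories of image classification (ResNet-50) and natural language processing (BERT), plus one new entry as an Open submission in the RNN-T speech transcription category.

Submissions fall into two divisions. The Closed Division requires submitters to use exactly the same model and optimizer implementation, including the defined hyperparameter settings and training epochs. The Open Division fosters innovation by allowing different model implementations that are better suited to the capabilities of different processors, while still requiring exactly the same model accuracy and quality as the Closed Division.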

Division | Model | MLPerf Quality Target | Platform | SDK Version | Framework | MLPerf ID | Dataset | Precision | Time to Train (mins)
Closed | ResNet50 v1.5 | 75.90% classification | Bow Pod16 | SDK 2.5.1 | TensorFlow | 2.0-2047 | ImageNet2012 | 16.16 | 19.64
Closed | ResNet50 v1.5 | 75.90% classification | Bow Pod64 | SDK 2.5.1 | TensorFlow | 2.0-2050 | ImageNet2012 | 16.16 | 6.30
Closed | ResNet50 v1.5 | 75.90% classification | Bow Pod128 | SDK 2.5.1 | TensorFlow | 2.0-2052 | ImageNet2012 | 16.16 | 4.19
Closed | ResNet50 v1.5 | 75.90% classification | Bow Pod256 | SDK 2.5.1 | TensorFlow | 2.0-2054 | ImageNet2012 | 16.16 | 2.67
Closed | BERT | 0.72 Mask-LM accuracy | Bow Pod16 | SDK 2.5.1 | PopART | 2.0-2045 | Wikipedia | 16.16 | 20.66
Closed | BERT | 0.72 Mask-LM accuracy | Bow Pod16 | SDK 2.5.1 | PaddlePaddle | 2.0-2046 | Wikipedia | 16.16 | 20.75
Closed | BERT | 0.72 Mask-LM accuracy | Bow Pod64 | SDK 2.5.1 | PopART | 2.0-2049 | Wikipedia | 16.16 | 6.70
Closed | BERT | 0.72 Mask-LM accuracy | Bow Pod64 | SDK 2.5.1 | PaddlePaddle | 2.0-2048 | Wikipedia | 16.16 | 6.77
Closed | BERT | 0.72 Mask-LM accuracy | Bow Pod128 | SDK 2.5.1 | PopART | 2.0-2051 | Wikipedia | 16.16 | 4.42
Closed | BERT | 0.72 Mask-LM accuracy | Bow Pod256 | SDK 2.5.1 | PopART | 2.0-2053 | Wikipedia | 16.16 | 3.19
Open | RNN-T | - | Bow Pod64 | SDK 2.5.1 | PopART | 2.0-2125 | Customer dataset | 16.16 | 109.36

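The MLPerf name and logo are trademarks of MLCommons Association in the United States and other countries.
All rights reserved. Unauthorized use is strictly prohibited. See www.mlperf.org for more information.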

IPU-POD Classic - Training

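Training a machine learning model involves running an algorithm over an input dataset (the training data) until the model converges, meaning it has learned to produce the desired output to a specified accuracy. In this context, throughput is defined as the number of input data points (sequences, images, or rows) the model processes per second. Throughput is often used as a measure of hardware performance because it is directly related to the time needed to train a model to a specified accuracy.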
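The link between throughput and training time can be made concrete with a rough estimate: total items to process divided by items per second. The sketch below uses the ResNet-50 v1.5 throughput reported for IPU-POD16 with TensorFlow1 on this page and the ImageNet-2012 training set size of 1,281,167 images. The epoch count is a hypothetical assumption, and the estimate ignores evaluation, data loading stalls, and compilation time.

```python
def estimated_training_seconds(dataset_size: int, epochs: int,
                               throughput_items_per_sec: float) -> float:
    """Rough training time: total items processed divided by throughput."""
    if throughput_items_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return dataset_size * epochs / throughput_items_per_sec


IMAGENET_TRAIN_IMAGES = 1_281_167  # ImageNet-2012 training set size
ASSUMED_EPOCHS = 40                # hypothetical; not stated in the source
THROUGHPUT = 30_690                # ResNet-50 v1.5, IPU-POD16, TensorFlow1

seconds = estimated_training_seconds(
    IMAGENET_TRAIN_IMAGES, ASSUMED_EPOCHS, THROUGHPUT)
print(f"{seconds / 60:.1f} minutes")  # 27.8 minutes
```

Doubling the throughput halves this estimate, which is the sense in which throughput serves as a proxy for time to train.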

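The results below detail the throughput obtained for each referenced model in the specified configuration. All configurations running on real data were verified for convergence.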

Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Throughput (items/sec)
BERT Large | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | PopART | Wikipedia | 65,536 | 16.16 | 3738
BERT Large | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 65,600 | 16.16 | 3704
BERT Large | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 3582
BERT Large | Ph1 Pre-Training (SL128) | IPU-POD64 | SDK 2.4.0 | PopART | Wikipedia | 65,536 | 16.16 | 14189
BERT Large | Ph1 Pre-Training (SL128) | IPU-POD64 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 66,560 | 16.16 | 13917
BERT Large | Ph1 Pre-Training (SL128) | IPU-POD64 | SDK 2.4.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 12251
BERT Large | Ph1 Pre-Training (SL128) | IPU-POD128 | SDK 2.4.0 | PopART | Wikipedia | 65,536 | 16.16 | 24424
BERT Large | Ph1 Pre-Training (SL128) | IPU-POD128 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 66,560 | 16.16 | 24900
BERT Large | Ph1 Pre-Training (SL128) | IPU-POD128 | SDK 2.4.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 22402
BERT Large | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | PopART | Wikipedia | 16,384 | 16.16 | 1063
BERT Large | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 16,400 | 16.16 | 1025
BERT Large | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | PyTorch | Wikipedia | 16,384 | 16.16 | 1012
BERT Large | Ph2 Pre-Training (SL384) | IPU-POD64 | SDK 2.4.0 | PopART | Wikipedia | 16,384 | 16.16 | 4003
BERT Large | Ph2 Pre-Training (SL384) | IPU-POD64 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 16,640 | 16.16 | 3938
BERT Large | Ph2 Pre-Training (SL384) | IPU-POD64 | SDK 2.4.0 | PyTorch | Wikipedia | 16,384 | 16.16 | 3611
BERT Large | Ph2 Pre-Training (SL384) | IPU-POD128 | SDK 2.4.0 | PopART | Wikipedia | 16,384 | 16.16 | 7127
BERT Large | Ph2 Pre-Training (SL384) | IPU-POD128 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 16,640 | 16.16 | 7292
BERT Large | Ph2 Pre-Training (SL384) | IPU-POD128 | SDK 2.4.0 | PyTorch | Wikipedia | 16,384 | 16.16 | 6500
BERT Large | Fine-Tuning (SL384 - SQuAD) | IPU-POD16 | SDK 2.4.0 | PopART | SQuAD | 256 | 16.16 | 884
BERT Large | Fine-Tuning (SL384 - SQuAD) | IPU-POD16 | SDK 2.4.0 | PyTorch | SQuAD | 256 | 16.16 | 744
BERT Base | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | PopART | Wikipedia | 65,536 | 16.16 | 11991
BERT Base | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 65,280 | 16.16 | 11647
BERT Base | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | TensorFlow2 | Wikipedia | 65,280 | 16.16 | 11035
BERT Base | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 11184
BERT Base | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | PopART | Wikipedia | 16,384 | 16.16 | 3545
BERT Base | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 16,320 | 16.16 | 3288
BERT Base | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | TensorFlow2 | Wikipedia | 16,320 | 16.16 | 3155
BERT Base | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | PyTorch | Wikipedia | 16,384 | 16.16 | 3334
BERT Base - HuggingFace | Fine-Tuning (SL384 - SQuAD) | IPU-POD16 | SDK 2.4.0 | TensorFlow2 | SQuAD | 320 | 16.16 | 375
GPT2 | GPT2-medium | IPU-POD16 | SDK 2.3.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 2540
GPT2 | GPT2-medium | IPU-POD64 | SDK 2.3.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 9870
GPT2 | GPT2-medium | IPU-POD128 | SDK 2.3.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 18842
GPT2 | GPT2-medium | IPU-POD256 | SDK 2.3.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 31025
ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 1,920 | 16.16 | 7864
ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | PyTorch | ImageNet2012 | 16,384 | 16.16 | 7303
ResNet-50 v1.5 | | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 1,920 | 16.16 | 30690
ResNet-50 v1.5 | | IPU-POD16 | SDK 2.4.0 | PyTorch | ImageNet2012 | 16,384 | 16.16 | 25534
ResNet-50 v1.5 | | IPU-POD64 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 2,560 | 16.16 | 108566
ResNet-50 v1.5 | | IPU-POD128 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 5,120 | 16.16 | 205006
ResNet-50 v1.5 | | IPU-POD256 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 10,240 | 16.16 | 365040
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 768 | 16.16 | 2514
ResNeXt101 | | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 768 | 16.16 | 9023
EfficientNet-B4 | G16-EfficientNet | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 800 | 16.16 | 1618
EfficientNet-B4 | G16-EfficientNet | IPU-M2000 | SDK 2.4.0 | PyTorch | ImageNet2012 | 1,024 | 16.32 | 1400
EfficientNet-B4 | G16-EfficientNet | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 6379
EfficientNet-B4 | G16-EfficientNet | IPU-POD16 | SDK 2.4.0 | PyTorch | ImageNet2012 | 1,024 | 16.32 | 4311
EfficientNet-B4 | G16-EfficientNet | IPU-POD64 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 24946
EfficientNet-B4 | G16-EfficientNet | IPU-POD128 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 48015
EfficientNet-B4 | G16-EfficientNet | IPU-POD256 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 87968
ViT | Vision Transformer | IPU-POD16 | SDK 2.3.0 | PyTorch | ImageNet1k | 65,536 | 16.16 | 6535
ViT | Vision Transformer | IPU-POD64 | SDK 2.3.0 | PyTorch | ImageNet1k | 65,536 | 16.16 | 25080
ViT | Vision Transformer | IPU-POD128 | SDK 2.3.0 | PyTorch | ImageNet1k | 65,536 | 16.16 | 46320
ViT | Vision Transformer | IPU-POD256 | SDK 2.3.0 | PyTorch | ImageNet1k | 65,536 | 16.16 | 68800
UNet (Medical) | | IPU-M2000 | SDK 2.4.0 | TensorFlow2 | EM segmentation | 24 | 16.16 | 139
Mini DALL-E | | IPU-M2000 | SDK 2.4.0 | PyTorch | COCO 2017 | 1,536 | 16.16 | 319
Mini DALL-E | | IPU-POD16 | SDK 2.4.0 | PyTorch | COCO 2017 | 6,144 | 16.16 | 815
DeepVoice3 | | IPU-M2000 | SDK 2.4.0 | PopART | VCTK Corpus | 128 | 32.32 | 8496
FastSpeech2 | | IPU-M2000 | SDK 2.4.0 | TensorFlow2 | LJ Speech | 32 | 16.16 | 406
FastSpeech2 | | IPU-POD16 | SDK 2.4.0 | TensorFlow2 | LJ Speech | 64 | 16.16 | 1141
Conformer | | IPU-M2000 | SDK 2.4.0 | PyTorch | AiShell1 | 96 | 16.16 | 1030
Conformer | | IPU-POD16 | SDK 2.4.0 | PyTorch | AiShell1 | 96 | 16.16 | 3395
TGN | Temporal Graph Network | GC200 IPU | SDK 2.4.0 | TensorFlow1 | JODIE Wikipedia | 200 | 16.32 | 190472

IPU-POD Classic (Time to Result)

Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Time To Result (secs)
MCMC TFP | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Proprietary | | 32.32 | 49

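IPU-POD Classic - Inference

Model inference in this context refers to running a model on input data to infer an output. Inference performance in production settings is typically measured by two metrics: throughput (as defined previously) and latency, defined as the time taken to execute an inference.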

Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Throughput (items/sec) | Latency (ms)
BERT-Large | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 4 | 16.16 | 2071 | 1.92
BERT-Large | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 8 | 16.16 | 2911 | 2.73
BERT-Large | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 12 | 16.16 | 3303 | 3.62
BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 4 | 16.16 | 4580 | 0.86
BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 8 | 16.16 | 7069 | 1.11
BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 16 | 16.16 | 9687 | 1.65
BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 32 | 16.16 | 12584 | 2.53
BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 64 | 16.16 | 15346 | 4.16
BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 128 | 16.16 | 17972 | 7.11
BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 256 | 16.16 | 19484 | 13.11
BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 320 | 16.16 | 20803 | 15.36
ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 7152 | 1.66
ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 10515 | 2.27
ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 16 | 16.16 | 16207 | 2.95
ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 32 | 16.16 | 22544 | 4.24
ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 64 | 16.16 | 28762 | 6.66
ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 128 | 16.16 | 35155 | 10.91
ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 256 | 16.16 | 40085 | 19.14
ResNet-50 | v1.5, lowest latency config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 7397 | 0.52
ResNet-50 | v1.5, higher throughput config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 9404 | 2.04
ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 16 | 16.16 | 14321 | 2.69
ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 32 | 16.16 | 20927 | 3.7
ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 64 | 16.16 | 36193 | 8.62
ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 128 | 16.16 | 43472 | 14.38
ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 256 | 16.16 | 49816 | 25.13
ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 360 | 16.16 | 50883 | 30.68
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 4483 | 2.66
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 6435 | 3.71
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 16 | 16.16 | 9705 | 4.93
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 32 | 16.16 | 13693 | 6.99
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 64 | 16.16 | 17176 | 11.16
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 3395 | 1.14
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 4840 | 1.62
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 16 | 16.16 | 6483 | 2.43
ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 64 | 16.16 | 11320 | 27.83
EfficientNet-B0 | lowest latency config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 8686 | 0.44
EfficientNet-B0 | higher throughput config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 10907 | 1.69
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 32 | 16.16 | 50510 | 3.05
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 64 | 16.16 | 71839 | 4.26
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 128 | 16.16 | 86986 | 6.77
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 144 | 16.16 | 69852 | 9.15
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 196 | 16.16 | 61714 | 13.38
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 8289 | 1.43
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 13056 | 1.82
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 16 | 16.16 | 22217 | 2.15
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 32 | 16.16 | 34448 | 2.77
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 64 | 16.16 | 43351 | 4.41
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 128 | 16.16 | 53256 | 7.19
EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 160 | 16.16 | 55169 | 8.68
EfficientNet-B4 | lowest latency config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 3539 | 1.09
EfficientNet-B4 | higher throughput config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 4081 | 1.85
EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 16 | 16.16 | 8299 | 3.5
EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 24 | 16.16 | 9874 | 4.37
EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 32 | 16.16 | 10753 | 5.3
EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 40 | 16.16 | 11578 | 6.22
EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 3718 | 3.21
EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 5514 | 4.34
EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 16 | 16.16 | 7959 | 6.01
EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 20 | 16.16 | 8958 | 6.68
EfficientNet-B7 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 1407 | 8.52
EfficientNet-B7 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 1869 | 12.82
Yolo v4 | image 896, bps 5, max det 200 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 690 | 9.4
Yolo v4 | image 896, bps 10, max det 300 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 722 | 9.74
Yolo v4 | image 640, bps 5, max det 200 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1306 | 10.03
Yolo v4 | image 640, bps 10, max det 300 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1364 | 10.39
Yolo v4 | image 512, bps 5, max det 200 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1772 | 7.25
Yolo v4 | image 512, bps 10, max det 300 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1915 | 7.31
Yolo v4 | image 416, bps 5, max det 200 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 2195 | 5.88
Yolo v4 | image 416, bps 10, max det 100 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 2994 | 9.42
Unet (Medical) | | IPU-M2000 | SDK 2.4.0 | TensorFlow2 | Synthetic (host-generated) | 4 | 16.16 | 1144 |
Unet (Medical) | | IPU-M2000 | SDK 2.4.0 | TensorFlow2 | Synthetic (host-generated) | 8 | 16.16 | 1190 |
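
Precision terminology: X.Y is defined as follows. X is the precision used for storing activations and gradients, and Y is the precision used for storing weights. When training with 16.16 weights, we may still use FP32 for other variables (such as norms or momentum), and we include stochastic rounding. The benchmarks were generated using our examples on the Graphcore GitHub.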
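
To make the X.Y notation and the idea of stochastic rounding concrete, here is a small illustrative sketch. The names `Precision`, `parse_precision`, and `stochastic_round` are hypothetical, and the rounding function shows only the general technique (round up with probability equal to the fractional distance to the next representable value); it is not a description of how the IPU hardware implements it.

```python
import math
import random
from typing import NamedTuple


class Precision(NamedTuple):
    activations_and_gradients: int  # X: bits for activations and gradients
    weights: int                    # Y: bits for weights


def parse_precision(label: str) -> Precision:
    """Split a label such as '16.32' into its X and Y bit widths."""
    x, y = label.split(".")
    return Precision(int(x), int(y))


def stochastic_round(value: float, step: float, rng: random.Random) -> float:
    """Round to a multiple of `step`, going up with probability equal to
    the fractional position of `value` between its two neighbours."""
    lower = math.floor(value / step) * step
    fraction = (value - lower) / step
    return lower + step if rng.random() < fraction else lower


print(parse_precision("16.32"))

# Averaged over many draws, stochastic rounding is unbiased: rounding 0.3
# onto a grid of spacing 1.0 gives a mean close to 0.3, whereas
# round-to-nearest would always return 0.0.
rng = random.Random(0)
samples = [stochastic_round(0.3, 1.0, rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)
print(mean)
```

This unbiasedness is why stochastic rounding helps low-precision training: small updates that round-to-nearest would discard are preserved on average.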

This page was last updated on April 18, 2023.
