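Performance Results

This page presents preliminary performance results for the new Bow Pod platforms, our MLPerf Training v1.1 submission results, and our own benchmark results across a broader range of models for training and inference.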

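Bow Platforms - Training

Preliminary training performance results for the new Bow Pod platforms are provided here. Throughput is defined as the number of input data points (sequences, images, or rows) that the model processes per second.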
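As a minimal illustration of this definition, the Python sketch below times a workload and divides the number of items processed by the elapsed time. The names `measure_throughput`, `step`, and `dummy_step` are illustrative assumptions and are not part of any Graphcore tooling; the dummy workload merely sleeps.

```python
import time
from typing import Callable


def measure_throughput(step: Callable[[], int], num_steps: int) -> float:
    """Return items processed per second over `num_steps` calls to `step`.

    `step` is a hypothetical callable that runs one batch and returns the
    number of items (sequences, images, or rows) it processed.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    items = 0
    start = time.perf_counter()
    for _ in range(num_steps):
        items += step()
    elapsed = time.perf_counter() - start
    return items / elapsed


def dummy_step(batch_size: int = 64) -> int:
    # Stand-in for one training or inference step on a batch.
    time.sleep(0.001)
    return batch_size


if __name__ == "__main__":
    print(f"{measure_throughput(dummy_step, 100):.0f} items/sec")
```

A real measurement would typically discard warm-up steps and average over many batches; this sketch omits both for brevity.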

The results below give the throughput obtained for each referenced model in the specified configuration.

| Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Throughput (items/sec) |
|---|---|---|---|---|---|---|---|---|
| ResNet-50 v1.5 | | Bow Pod16 | Pre-SDK2.5 | TensorFlow1 | ImageNet2012 | 1,920 | 16.16 | 42029 |
| ResNet-50 v1.5 | | Bow Pod64 | Pre-SDK2.5 | TensorFlow1 | ImageNet2012 | 2,560 | 16.16 | 145287 |
| ResNet-50 v1.5 | | Bow Pod256 | Pre-SDK2.5 | TensorFlow1 | ImageNet2012 | 10,240 | 16.16 | 425514 |
| EfficientNet-B4 | G16-EfficientNet | Bow Pod16 | Pre-SDK2.5 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 8936 |
| EfficientNet-B4 | G16-EfficientNet | Bow Pod64 | Pre-SDK2.5 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 34943 |
| EfficientNet-B4 | G16-EfficientNet | Bow Pod256 | Pre-SDK2.5 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 117641 |
| ResNeXt101 | | Bow Pod16 | Pre-SDK2.5 | TensorFlow1 | ImageNet2012 | 768 | 16.16 | 12221 |
| ViT | Vision Transformer | Bow Pod64 | Pre-SDK2.5 | PyTorch | ImageNet1k | 65,536 | 16.16 | 31200 |
| Mini DALL-E | | Bow Pod16 | Pre-SDK2.5 | PyTorch | COCO 2017 | 6,144 | 16.16 | 1855 |
| GraphSage | | Bow Pod16 | Pre-SDK2.5 | TensorFlow2 | COCO 2017 | | 16.16 | 1.95s epoch time |
| BERT Large | Ph1 Pre-Training (SL128) | Bow Pod16 | Pre-SDK2.5 | PopART | Wikipedia | 65,536 | 16.16 | 5179 |
| BERT Large | Ph1 Pre-Training (SL128) | Bow Pod16 | Pre-SDK2.5 | TensorFlow1 | Wikipedia | 65,600 | 16.16 | 5125 |
| BERT Large | Ph1 Pre-Training (SL128) | Bow Pod64 | Pre-SDK2.5 | PopART | Wikipedia | 65,536 | 16.16 | 19353 |
| BERT Large | Ph1 Pre-Training (SL128) | Bow Pod64 | Pre-SDK2.5 | TensorFlow1 | Wikipedia | 66,560 | 16.16 | 18907 |
| BERT Large | Ph2 Pre-Training (SL384) | Bow Pod16 | Pre-SDK2.5 | PopART | Wikipedia | 16,384 | 16.16 | 1470 |
| BERT Large | Ph2 Pre-Training (SL384) | Bow Pod16 | Pre-SDK2.5 | TensorFlow1 | Wikipedia | 16,400 | 16.16 | 1420 |
| BERT Large | Ph2 Pre-Training (SL384) | Bow Pod64 | Pre-SDK2.5 | PopART | Wikipedia | 16,384 | 16.16 | 5444 |
| BERT Large | Ph2 Pre-Training (SL384) | Bow Pod64 | Pre-SDK2.5 | TensorFlow1 | Wikipedia | 16,400 | 16.16 | 5340 |
| BERT Base | Ph1 Pre-Training (SL128) | Bow Pod16 | Pre-SDK2.5 | PopART | Wikipedia | 65,536 | 16.16 | 16508 |
| GPT2 | GPT2-Large | Bow Pod64 | Pre-SDK2.5 | PyTorch | Wikipedia | | | 1316 |
| GPT2 | GPT2-Medium (SL1024) | Bow Pod64 | Pre-SDK2.5 | PyTorch | Wikipedia | 65,536 | 16.16 | 347 |
| Conformer-Large | | Bow Pod64 | Pre-SDK2.5 | PyTorch | AiShell1 | | 16.16 | 8157 |
| FastSpeech2 | | Bow Pod16 | Pre-SDK2.5 | TensorFlow2 | LJ Speech | 64 | 16.16 | 1569 |
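One way to read the ResNet-50 v1.5 rows above is as a scaling curve. The sketch below compares measured throughput with ideal linear scaling from the Bow Pod16 result. It assumes the number in each platform name is the IPU count, and it ignores the fact that the three configurations use different batch sizes, so the figures are indicative only. The function and variable names are illustrative.

```python
from typing import Dict

# ResNet-50 v1.5 training throughput (items/sec), copied from the table above.
# Keys are the assumed IPU counts taken from the platform names.
RESNET50_BOW: Dict[int, float] = {16: 42029, 64: 145287, 256: 425514}


def scaling_efficiency(results: Dict[int, float], base: int) -> Dict[int, float]:
    """Measured throughput divided by ideal linear scaling from `base` IPUs."""
    base_throughput = results[base]
    return {
        n: throughput / (base_throughput * n / base)
        for n, throughput in sorted(results.items())
    }


if __name__ == "__main__":
    for n, eff in scaling_efficiency(RESNET50_BOW, base=16).items():
        print(f"Bow Pod{n}: {eff:.1%} of linear scaling")
```

The same helper can be applied to any model in the table that is reported on more than one Pod size.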

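Bow Platforms - Inference

Model inference here means running a trained model on input data to infer an output. Inference performance in production applications is usually measured by two metrics: throughput (as defined previously) and latency, which in this context is the time the model takes to deliver an output for a given input.

The following are throughput and latency results on the Bow-2000 platform at the specified batch sizes.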

| Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Throughput (items/sec) | Latency (ms) |
|---|---|---|---|---|---|---|---|---|---|
| BERT-Large | SL128 | Bow-2000 | Pre-SDK2.5 | PopART | SQuAD | 4 | 16.16 | 2877 | 1.37 |
| BERT-Large | SL128 | Bow-2000 | Pre-SDK2.5 | PopART | SQuAD | 16 | 16.16 | 5180 | 3.07 |
| BERT-Base | SL128 | Bow-2000 | Pre-SDK2.5 | PopART | SQuAD | 4 | 16.16 | 6235 | 0.62 |
| BERT-Base | SL128 | Bow-2000 | Pre-SDK2.5 | PopART | SQuAD | 320 | 16.16 | 28892 | 11.07 |
| ResNet-50 | v1.5, lowest latency config | Bow-2000 | Pre-SDK2.5 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 9570 | 0.4 |
| ResNet-50 | v1.5, higher throughput config | Bow-2000 | Pre-SDK2.5 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 12720 | 1.46 |
| ResNet-50 | v1.5 | Bow-2000 | Pre-SDK2.5 | PyTorch | Synthetic (host-generated) | 320 | 16.16 | 63774 | 22.79 |
| EfficientNet-B0 | | Bow-2000 | Pre-SDK2.5 | PyTorch | Synthetic (host-generated) | 192 | 16.16 | 89513 | 9.16 |
| EfficientNet-B4 | lowest latency config | Bow-2000 | Pre-SDK2.5 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 4210 | 0.91 |
| EfficientNet-B4 | higher throughput config | Bow-2000 | Pre-SDK2.5 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 5094 | 1.46 |
| EfficientNet-B4 | | Bow-2000 | Pre-SDK2.5 | PyTorch | Synthetic (host-generated) | 40 | 16.16 | 13807 | 4.96 |
| ResNeXt101 | | Bow-2000 | Pre-SDK2.5 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 5948 | 2 |
| ResNeXt101 | | Bow-2000 | Pre-SDK2.5 | PyTorch | Synthetic (host-generated) | 64 | 16.16 | 23195 | 8.26 |
| Yolo v4 | image size 896 | Bow-2000 | Pre-SDK2.5 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 980 | 6.92 |
| EfficientDet-D3 | | Bow-2000 | Pre-SDK2.5 | TF2 w/Keras | Synthetic (host-generated) | 4 | 16.16 | 1004 | 3.78 |
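Some Bow-2000 rows have a direct counterpart in the IPU-M2000 inference table on this page (same model, framework, and batch size). The sketch below computes the throughput and latency ratios for two such pairs. The SDK versions differ between the two tables (Pre-SDK2.5 versus SDK 2.4.0), so the ratios should not be read as a pure hardware comparison. The type and function names are illustrative.

```python
from typing import Dict, NamedTuple, Tuple


class Result(NamedTuple):
    throughput: float  # items/sec
    latency_ms: float


# (model, batch size) -> (Bow-2000 result, IPU-M2000 result),
# copied from the two PopART inference tables on this page.
PAIRS: Dict[Tuple[str, int], Tuple[Result, Result]] = {
    ("BERT-Large SL128", 4): (Result(2877, 1.37), Result(2071, 1.92)),
    ("BERT-Base SL128", 320): (Result(28892, 11.07), Result(20803, 15.36)),
}


def compare(bow: Result, classic: Result) -> Tuple[float, float]:
    """Return (throughput ratio, latency ratio) of Bow-2000 over IPU-M2000."""
    return bow.throughput / classic.throughput, bow.latency_ms / classic.latency_ms


if __name__ == "__main__":
    for (model, batch), (bow, classic) in PAIRS.items():
        tput_ratio, lat_ratio = compare(bow, classic)
        print(f"{model} (batch {batch}): "
              f"{tput_ratio:.2f}x throughput, {lat_ratio:.2f}x latency")
```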

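MLPerf Training v1.1 Performance

For our submission to MLPerf Training v1.1, we chose to submit to the popular application benchmark categories of image classification (ResNet-50) and natural language processing (BERT).

Submissions fall into two divisions. The Closed Division requires submitters to use exactly the same model and optimizer implementation, including the defined hyperparameter state and training epochs. The Open Division fosters innovation by allowing different model implementations better suited to the capabilities of different processors, while ensuring that exactly the same model accuracy and quality as in the Closed Division is reached.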

| Division | Model | MLPerf Quality Target | Platform | SDK Version | Framework | MLPerf ID | Dataset | Precision | Time to Train (mins) |
|---|---|---|---|---|---|---|---|---|---|
| Closed | ResNet50 v1.5 | 75.90% classification | IPU-POD16 | SDK 2.3.0 | TensorFlow | 1.1-2040 | ImageNet2012 | 16.16 | 28.33 |
| Closed | ResNet50 v1.5 | 75.90% classification | IPU-POD64 | SDK 2.3.0 | TensorFlow | 1.1-2042 | ImageNet2012 | 16.16 | 8.50 |
| Closed | ResNet50 v1.5 | 75.90% classification | IPU-POD128 | SDK 2.3.0 | TensorFlow | 1.1-2044 | ImageNet2012 | 16.16 | 5.67 |
| Closed | ResNet50 v1.5 | 75.90% classification | IPU-POD256 | SDK 2.3.0 | TensorFlow | 1.1-2045 | ImageNet2012 | 16.16 | 3.79 |
| Closed | BERT | 0.72 Mask-LM accuracy | IPU-POD16 | SDK 2.3.0 | PopART | 1.1-2039 | Wikipedia | 16.16 | 32.70 |
| Closed | BERT | 0.72 Mask-LM accuracy | IPU-POD64 | SDK 2.3.0 | PopART | 1.1-2041 | Wikipedia | 16.16 | 10.56 |
| Closed | BERT | 0.72 Mask-LM accuracy | IPU-POD128 | SDK 2.3.0 | PopART | 1.1-2043 | Wikipedia | 16.16 | 6.86 |
| Open | BERT | 0.72 Mask-LM accuracy | IPU-POD16 | SDK 2.3.0 | PopART | 1.1-2088 | Wikipedia | 16.16 | 26.05 |
| Open | BERT | 0.72 Mask-LM accuracy | IPU-POD64 | SDK 2.3.0 | PopART | 1.1-2089 | Wikipedia | 16.16 | 8.25 |
| Open | BERT | 0.72 Mask-LM accuracy | IPU-POD128 | SDK 2.3.0 | PopART | 1.1-2087 | Wikipedia | 16.16 | 5.88 |
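Because the BERT results above appear in both divisions on the same platforms, the effect of the Open Division's implementation freedom can be expressed as a relative reduction in time to train. The short sketch below does that arithmetic on the table values; the dictionary and function names are illustrative.

```python
from typing import Dict

# BERT time to train in minutes, copied from the table above.
CLOSED: Dict[str, float] = {"IPU-POD16": 32.70, "IPU-POD64": 10.56, "IPU-POD128": 6.86}
OPEN: Dict[str, float] = {"IPU-POD16": 26.05, "IPU-POD64": 8.25, "IPU-POD128": 5.88}


def time_reduction(closed_mins: float, open_mins: float) -> float:
    """Fraction by which the Open result is shorter than the Closed result."""
    return 1.0 - open_mins / closed_mins


if __name__ == "__main__":
    for platform, closed_mins in CLOSED.items():
        reduction = time_reduction(closed_mins, OPEN[platform])
        print(f"{platform}: Open is {reduction:.1%} faster to train than Closed")
```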

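The MLPerf name and logo are trademarks of MLCommons Association in the United States and other countries.
All rights reserved. Unauthorized use strictly prohibited. See www.mlperf.org for more information.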

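IPU-POD Classic - Training

Training a machine learning model involves running an algorithm over an input dataset (the training data) until the model converges, meaning that it has learned to produce the desired output to a specified accuracy. In this context, throughput is defined as the number of input data points (sequences, images, or rows) processed by the model per second. Throughput is often used as a measure of hardware performance because it is directly related to the time it takes to train a model to a specified accuracy.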
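The link between throughput and training time can be made concrete with a rough estimate of the time per epoch: dataset size divided by throughput. The sketch below uses the ResNet-50 v1.5 TensorFlow1 result for IPU-POD16 from the table on this page (30690 items/sec). The ImageNet2012 training-set size of 1,281,167 images is an assumption that does not appear on this page, and the estimate ignores evaluation, checkpointing, and the number of epochs needed to converge.

```python
def seconds_per_epoch(dataset_size: int, throughput: float) -> float:
    """Rough time for one pass over the dataset at a steady throughput."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return dataset_size / throughput


# Assumption: commonly cited ImageNet2012 (ILSVRC-2012) training-set size.
IMAGENET2012_TRAIN_IMAGES = 1_281_167
# ResNet-50 v1.5, IPU-POD16, TensorFlow1, from the training table on this page.
RESNET50_POD16_THROUGHPUT = 30690

if __name__ == "__main__":
    secs = seconds_per_epoch(IMAGENET2012_TRAIN_IMAGES, RESNET50_POD16_THROUGHPUT)
    print(f"about {secs:.1f} s per epoch")
```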

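The results provided below detail the throughput obtained for each referenced model in the specified configuration. All configurations running on real data were verified for convergence.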

| Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Throughput (items/sec) |
|---|---|---|---|---|---|---|---|---|
| BERT Large | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | PopART | Wikipedia | 65,536 | 16.16 | 3738 |
| BERT Large | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 65,600 | 16.16 | 3704 |
| BERT Large | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 3582 |
| BERT Large | Ph1 Pre-Training (SL128) | IPU-POD64 | SDK 2.4.0 | PopART | Wikipedia | 65,536 | 16.16 | 14189 |
| BERT Large | Ph1 Pre-Training (SL128) | IPU-POD64 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 66,560 | 16.16 | 13917 |
| BERT Large | Ph1 Pre-Training (SL128) | IPU-POD64 | SDK 2.4.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 12251 |
| BERT Large | Ph1 Pre-Training (SL128) | IPU-POD128 | SDK 2.4.0 | PopART | Wikipedia | 65,536 | 16.16 | 24424 |
| BERT Large | Ph1 Pre-Training (SL128) | IPU-POD128 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 66,560 | 16.16 | 24900 |
| BERT Large | Ph1 Pre-Training (SL128) | IPU-POD128 | SDK 2.4.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 22402 |
| BERT Large | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | PopART | Wikipedia | 16,384 | 16.16 | 1063 |
| BERT Large | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 16,400 | 16.16 | 1025 |
| BERT Large | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | PyTorch | Wikipedia | 16,384 | 16.16 | 1012 |
| BERT Large | Ph2 Pre-Training (SL384) | IPU-POD64 | SDK 2.4.0 | PopART | Wikipedia | 16,384 | 16.16 | 4003 |
| BERT Large | Ph2 Pre-Training (SL384) | IPU-POD64 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 16,640 | 16.16 | 3938 |
| BERT Large | Ph2 Pre-Training (SL384) | IPU-POD64 | SDK 2.4.0 | PyTorch | Wikipedia | 16,384 | 16.16 | 3611 |
| BERT Large | Ph2 Pre-Training (SL384) | IPU-POD128 | SDK 2.4.0 | PopART | Wikipedia | 16,384 | 16.16 | 7127 |
| BERT Large | Ph2 Pre-Training (SL384) | IPU-POD128 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 16,640 | 16.16 | 7292 |
| BERT Large | Ph2 Pre-Training (SL384) | IPU-POD128 | SDK 2.4.0 | PyTorch | Wikipedia | 16,384 | 16.16 | 6500 |
| BERT Large | Fine-Tuning (SL384 - SQuAD) | IPU-POD16 | SDK 2.4.0 | PopART | SQuAD | 256 | 16.16 | 884 |
| BERT Large | Fine-Tuning (SL384 - SQuAD) | IPU-POD16 | SDK 2.4.0 | PyTorch | SQuAD | 256 | 16.16 | 744 |
| BERT Base | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | PopART | Wikipedia | 65,536 | 16.16 | 11991 |
| BERT Base | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 65,280 | 16.16 | 11647 |
| BERT Base | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | TensorFlow2 | Wikipedia | 65,280 | 16.16 | 11035 |
| BERT Base | Ph1 Pre-Training (SL128) | IPU-POD16 | SDK 2.4.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 11184 |
| BERT Base | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | PopART | Wikipedia | 16,384 | 16.16 | 3545 |
| BERT Base | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | Wikipedia | 16,320 | 16.16 | 3288 |
| BERT Base | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | TensorFlow2 | Wikipedia | 16,320 | 16.16 | 3155 |
| BERT Base | Ph2 Pre-Training (SL384) | IPU-POD16 | SDK 2.4.0 | PyTorch | Wikipedia | 16,384 | 16.16 | 3334 |
| BERT Base - HuggingFace | Fine-Tuning (SL384 - SQuAD) | IPU-POD16 | SDK 2.4.0 | TensorFlow2 | SQuAD | 320 | 16.16 | 375 |
| GPT2 | GPT2-medium | IPU-POD16 | SDK 2.3.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 2540 |
| GPT2 | GPT2-medium | IPU-POD64 | SDK 2.3.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 9870 |
| GPT2 | GPT2-medium | IPU-POD128 | SDK 2.3.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 18842 |
| GPT2 | GPT2-medium | IPU-POD256 | SDK 2.3.0 | PyTorch | Wikipedia | 65,536 | 16.16 | 31025 |
| ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 1,920 | 16.16 | 7864 |
| ResNet-50 v1.5 | | IPU-M2000 | SDK 2.4.0 | PyTorch | ImageNet2012 | 16,384 | 16.16 | 7303 |
| ResNet-50 v1.5 | | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 1,920 | 16.16 | 30690 |
| ResNet-50 v1.5 | | IPU-POD16 | SDK 2.4.0 | PyTorch | ImageNet2012 | 16,384 | 16.16 | 25534 |
| ResNet-50 v1.5 | | IPU-POD64 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 2,560 | 16.16 | 108566 |
| ResNet-50 v1.5 | | IPU-POD128 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 5,120 | 16.16 | 205006 |
| ResNet-50 v1.5 | | IPU-POD256 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 10,240 | 16.16 | 365040 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 768 | 16.16 | 2514 |
| ResNeXt101 | | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 768 | 16.16 | 9023 |
| EfficientNet-B4 | G16-EfficientNet | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 800 | 16.16 | 1618 |
| EfficientNet-B4 | G16-EfficientNet | IPU-M2000 | SDK 2.4.0 | PyTorch | ImageNet2012 | 1,024 | 16.32 | 1400 |
| EfficientNet-B4 | G16-EfficientNet | IPU-POD16 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 6379 |
| EfficientNet-B4 | G16-EfficientNet | IPU-POD16 | SDK 2.4.0 | PyTorch | ImageNet2012 | 1,024 | 16.32 | 4311 |
| EfficientNet-B4 | G16-EfficientNet | IPU-POD64 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 24946 |
| EfficientNet-B4 | G16-EfficientNet | IPU-POD128 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 48015 |
| EfficientNet-B4 | G16-EfficientNet | IPU-POD256 | SDK 2.4.0 | TensorFlow1 | ImageNet2012 | 6,144 | 16.16 | 87968 |
| ViT | Vision Transformer | IPU-POD16 | SDK 2.3.0 | PyTorch | ImageNet1k | 65,536 | 16.16 | 6535 |
| ViT | Vision Transformer | IPU-POD64 | SDK 2.3.0 | PyTorch | ImageNet1k | 65,536 | 16.16 | 25080 |
| ViT | Vision Transformer | IPU-POD128 | SDK 2.3.0 | PyTorch | ImageNet1k | 65,536 | 16.16 | 46320 |
| ViT | Vision Transformer | IPU-POD256 | SDK 2.3.0 | PyTorch | ImageNet1k | 65,536 | 16.16 | 68800 |
| UNet (Medical) | | IPU-M2000 | SDK 2.4.0 | TensorFlow2 | EM segmentation | 24 | 16.16 | 139 |
| Mini DALL-E | | IPU-M2000 | SDK 2.4.0 | PyTorch | COCO 2017 | 1,536 | 16.16 | 319 |
| Mini DALL-E | | IPU-POD16 | SDK 2.4.0 | PyTorch | COCO 2017 | 6,144 | 16.16 | 815 |
| DeepVoice3 | | IPU-M2000 | SDK 2.4.0 | PopART | VCTK Corpus | 128 | 32.32 | 8496 |
| FastSpeech2 | | IPU-M2000 | SDK 2.4.0 | TensorFlow2 | LJ Speech | 32 | 16.16 | 406 |
| FastSpeech2 | | IPU-POD16 | SDK 2.4.0 | TensorFlow2 | LJ Speech | 64 | 16.16 | 1141 |
| Conformer | | IPU-M2000 | SDK 2.4.0 | PyTorch | AiShell1 | 96 | 16.16 | 1030 |
| Conformer | | IPU-POD16 | SDK 2.4.0 | PyTorch | AiShell1 | 96 | 16.16 | 3395 |
| TGN | Temporal Graph Network | GC200 IPU | SDK 2.4.0 | TensorFlow1 | JODIE Wikipedia | 200 | 16.32 | 190472 |

IPU-POD Classic (Time to Result)

| Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Time To Result (secs) |
|---|---|---|---|---|---|---|---|---|
| MCMC TFP | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Proprietary | | 32.32 | 49 |

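IPU-POD Classic - Inference

Model inference in this context refers to running the model on input data to infer an output. Inference performance in production setups is usually measured by two metrics: throughput (as defined previously) and latency, defined as the time taken to execute an inference.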

| Model | Variant | Platform | SDK Version | Framework | Dataset | Batch Size | Precision | Throughput (items/sec) | Latency (ms) |
|---|---|---|---|---|---|---|---|---|---|
| BERT-Large | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 4 | 16.16 | 2071 | 1.92 |
| BERT-Large | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 8 | 16.16 | 2911 | 2.73 |
| BERT-Large | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 12 | 16.16 | 3303 | 3.62 |
| BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 4 | 16.16 | 4580 | 0.86 |
| BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 8 | 16.16 | 7069 | 1.11 |
| BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 16 | 16.16 | 9687 | 1.65 |
| BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 32 | 16.16 | 12584 | 2.53 |
| BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 64 | 16.16 | 15346 | 4.16 |
| BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 128 | 16.16 | 17972 | 7.11 |
| BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 256 | 16.16 | 19484 | 13.11 |
| BERT-Base | SL128 | IPU-M2000 | SDK 2.4.0 | PopART | SQuAD | 320 | 16.16 | 20803 | 15.36 |
| ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 7152 | 1.66 |
| ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 10515 | 2.27 |
| ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 16 | 16.16 | 16207 | 2.95 |
| ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 32 | 16.16 | 22544 | 4.24 |
| ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 64 | 16.16 | 28762 | 6.66 |
| ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 128 | 16.16 | 35155 | 10.91 |
| ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 256 | 16.16 | 40085 | 19.14 |
| ResNet-50 | v1.5, lowest latency config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 7397 | 0.52 |
| ResNet-50 | v1.5, higher throughput config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 9404 | 2.04 |
| ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 16 | 16.16 | 14321 | 2.69 |
| ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 32 | 16.16 | 20927 | 3.7 |
| ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 64 | 16.16 | 36193 | 8.62 |
| ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 128 | 16.16 | 43472 | 14.38 |
| ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 256 | 16.16 | 49816 | 25.13 |
| ResNet-50 | v1.5 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 360 | 16.16 | 50883 | 30.68 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 4483 | 2.66 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 6435 | 3.71 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 16 | 16.16 | 9705 | 4.93 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 32 | 16.16 | 13693 | 6.99 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 64 | 16.16 | 17176 | 11.16 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 3395 | 1.14 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 4840 | 1.62 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 16 | 16.16 | 6483 | 2.43 |
| ResNeXt101 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 64 | 16.16 | 11320 | 27.83 |
| EfficientNet-B0 | lowest latency config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 8686 | 0.44 |
| EfficientNet-B0 | higher throughput config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 10907 | 1.69 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 32 | 16.16 | 50510 | 3.05 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 64 | 16.16 | 71839 | 4.26 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 128 | 16.16 | 86986 | 6.77 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 144 | 16.16 | 69852 | 9.15 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 196 | 16.16 | 61714 | 13.38 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 8289 | 1.43 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 13056 | 1.82 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 16 | 16.16 | 22217 | 2.15 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 32 | 16.16 | 34448 | 2.77 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 64 | 16.16 | 43351 | 4.41 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 128 | 16.16 | 53256 | 7.19 |
| EfficientNet-B0 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 160 | 16.16 | 55169 | 8.68 |
| EfficientNet-B4 | lowest latency config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 3539 | 1.09 |
| EfficientNet-B4 | higher throughput config | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 4081 | 1.85 |
| EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 16 | 16.16 | 8299 | 3.5 |
| EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 24 | 16.16 | 9874 | 4.37 |
| EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 32 | 16.16 | 10753 | 5.3 |
| EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 40 | 16.16 | 11578 | 6.22 |
| EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 3718 | 3.21 |
| EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 5514 | 4.34 |
| EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 16 | 16.16 | 7959 | 6.01 |
| EfficientNet-B4 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 20 | 16.16 | 8958 | 6.68 |
| EfficientNet-B7 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 4 | 16.16 | 1407 | 8.52 |
| EfficientNet-B7 | | IPU-M2000 | SDK 2.4.0 | TensorFlow1 | Synthetic (host-generated) | 8 | 16.16 | 1869 | 12.82 |
| Yolo v4 | image 896, bps 5, max det 200 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 690 | 9.4 |
| Yolo v4 | image 896, bps 10, max det 300 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 722 | 9.74 |
| Yolo v4 | image 640, bps 5, max det 200 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1306 | 10.03 |
| Yolo v4 | image 640, bps 10, max det 300 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1364 | 10.39 |
| Yolo v4 | image 512, bps 5, max det 200 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1772 | 7.25 |
| Yolo v4 | image 512, bps 10, max det 300 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 8 | 16.16 | 1915 | 7.31 |
| Yolo v4 | image 416, bps 5, max det 200 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 2195 | 5.88 |
| Yolo v4 | image 416, bps 10, max det 100 | IPU-M2000 | SDK 2.4.0 | PyTorch | Synthetic (host-generated) | 4 | 16.16 | 2994 | 9.42 |
| Unet (Medical) | | IPU-M2000 | SDK 2.4.0 | TensorFlow2 | Synthetic (host-generated) | 4 | 16.16 | 1144 | |
| Unet (Medical) | | IPU-M2000 | SDK 2.4.0 | TensorFlow2 | Synthetic (host-generated) | 8 | 16.16 | 1190 | |
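Since throughput and latency both rise with batch size, a common way to use a table like the one above is to pick the highest-throughput configuration that still meets a latency budget. The sketch below does this for the BERT-Base SL128 rows. The function name and the 5 ms budget are illustrative assumptions, not recommendations from this page.

```python
from typing import List, Optional, Tuple

# (batch size, throughput in items/sec, latency in ms) for BERT-Base SL128
# on IPU-M2000 with PopART, copied from the table above.
Row = Tuple[int, int, float]
BERT_BASE_M2000: List[Row] = [
    (4, 4580, 0.86),
    (8, 7069, 1.11),
    (16, 9687, 1.65),
    (32, 12584, 2.53),
    (64, 15346, 4.16),
    (128, 17972, 7.11),
    (256, 19484, 13.11),
    (320, 20803, 15.36),
]


def best_under_budget(rows: List[Row], max_latency_ms: float) -> Optional[Row]:
    """Return the highest-throughput row whose latency fits the budget."""
    eligible = [row for row in rows if row[2] <= max_latency_ms]
    if not eligible:
        return None
    return max(eligible, key=lambda row: row[1])


if __name__ == "__main__":
    print(best_under_budget(BERT_BASE_M2000, max_latency_ms=5.0))
```

With a 5 ms budget this selects the batch-size-64 row; a budget below 0.86 ms returns `None` because no listed configuration qualifies.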

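Precision terminology: X.Y is defined as follows: X is the precision used to store activations and gradients, and Y is the precision used to store weights. When training with 16.16 weights, we may still use FP32 for other variables (such as norms or momentum), and include stochastic rounding.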
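To make the X.Y convention concrete, the sketch below splits a precision label from the tables into its two parts. The `Precision` class and `parse_precision` function are illustrative helpers and are not part of any Graphcore library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Precision:
    activation_bits: int  # X: precision for activations and gradients
    weight_bits: int      # Y: precision for weights


def parse_precision(label: str) -> Precision:
    """Parse a label such as "16.16" or "16.32" into its X and Y parts.

    Raises ValueError if the label is not two integers separated by a dot.
    """
    x, y = label.split(".")
    return Precision(activation_bits=int(x), weight_bits=int(y))


if __name__ == "__main__":
    for label in ("16.16", "16.32", "32.32"):
        print(label, "->", parse_precision(label))
```

For example, "16.32" denotes 16-bit activations and gradients with 32-bit weights.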

Benchmarks were generated using our examples on the Graphcore GitHub.

This page was last updated on March 3, 2022.
