When exporting the profiler trace to the Chrome trace format with:

```python
with torch.profiler.profile(record_shapes=True) as profile:
    model()
profile.export_chrome_trace(trace1)
```

This is the header of the XPU JSON:

```json
{
"schemaVersion": 1,
"deviceProperties": [
],
"record_shapes": 1,
"trace_id": "DB6AD80A3A384D9CAB8E9D7D8D802327",
"displayTimeUnit": "ms",
"baseTimeNanoseconds": 1759300074000000000,
"traceEvents": [
{
"ph": "X", "cat": "cpu_op", "name": "TorchDynamo Cache Lookup", "pid": 2054252, "tid": 2054252,
"ts": 2493828094901.425, "dur": 2.415,
"args": {
"External id": 1,"Record function id": 0, "Ev Idx": 0
}
},
```

This is the header of the CUDA JSON:

```json
{
"schemaVersion": 1,
"deviceProperties": [
{
"id": 0, "name": "NVIDIA A100-PCIE-40GB", "totalGlobalMem": 42406903808,
"computeMajor": 8, "computeMinor": 0,
"maxThreadsPerBlock": 1024, "maxThreadsPerMultiprocessor": 2048,
"regsPerBlock": 65536, "warpSize": 32,
"sharedMemPerBlock": 49152, "numSms": 108
, "regsPerMultiprocessor": 65536, "sharedMemPerBlockOptin": 166912, "sharedMemPerMultiprocessor": 167936
}
],
"cupti_version": 26,
"cuda_runtime_version": 12080,
"cuda_driver_version": 12080,
"record_shapes": 1,
"trace_id": "9BC4A8D7D1994CA79C61AE6680BB7503",
"displayTimeUnit": "ms",
"baseTimeNanoseconds": 1759300074000000000,
"traceEvents": [
{
"ph": "X", "cat": "cpu_op", "name": "TorchDynamo Cache Lookup", "pid": 582593, "tid": 582593,
"ts": 2578376798799.799, "dur": 1.190,
"args": {
"External id": 1,"Record function id": 0, "Ev Idx": 0
}
},
```

As you can see above, the CUDA trace header carries more information than the XPU one: `deviceProperties` is populated, and `cupti_version`, `cuda_runtime_version`, and `cuda_driver_version` are reported, whereas the XPU trace has an empty `deviceProperties` array and no corresponding runtime/driver version fields.
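For reference, a minimal sketch of how the two exported files can be compared side by side; the file names are hypothetical placeholders, and the only structure assumed is the top-level layout shown in the excerpts above:

```python
# Minimal sketch (not from the report) for inspecting the metadata of an
# exported Chrome trace; the file paths below are hypothetical placeholders.
import json

def summarize_trace_header(path):
    with open(path) as f:
        trace = json.load(f)
    # Everything except the (potentially huge) event list is header metadata.
    meta = {k: v for k, v in trace.items() if k != "traceEvents"}
    print(path)
    print("  metadata keys:", sorted(meta))
    print("  deviceProperties entries:", len(trace.get("deviceProperties", [])))

summarize_trace_header("trace_xpu.json")   # hypothetical path
summarize_trace_header("trace_cuda.json")  # hypothetical path
```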
TODO
The following tests need to be enabled once this issue is fixed:
- There are usages in `test_analysis.py`; the tests are skipped for now in https://github.com/pytorch/pytorch/pull/166840/files#diff-4d6f32e73f44ef5821e4b222e2e08cb4f3e526c589ea88ba1206c3b16a240f91R650.