-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         0.18%       1.556ms        99.56%     873.817ms     873.817ms       0.000us         0.00%      18.070ms      18.070ms             1
                                           aten::linear         0.01%      62.000us        97.61%     856.702ms      57.113ms       0.000us         0.00%      14.598ms     973.200us            15
                                            aten::addmm         0.79%       6.974ms        97.59%     856.501ms      57.100ms      14.598ms        80.79%      14.598ms     973.200us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us      10.120ms        56.00%      10.120ms       1.124ms             9
                                             aten::relu         0.01%     110.000us         0.52%       4.544ms     454.400us       0.000us         0.00%       2.536ms     253.600us            10
                                        aten::clamp_min         0.02%     199.000us         0.51%       4.434ms     443.400us       2.536ms        14.03%       2.536ms     253.600us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.536ms        14.03%       2.536ms     253.600us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       2.155ms        11.93%       2.155ms     143.667us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       2.041ms        11.29%       2.041ms     408.200us             5
                                              aten::cat         0.01%     110.000us         0.04%     361.000us      72.200us     410.000us         2.27%     410.000us      82.000us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 877.640ms
Self CUDA time total: 18.070ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         7.16%     898.000us        82.90%      10.391ms      10.391ms       0.000us         0.00%      10.183ms      10.183ms             1
                                           aten::linear         0.35%      44.000us         7.52%     942.000us      62.800us       0.000us         0.00%       8.230ms     548.667us            15
                                            aten::addmm         4.50%     564.000us         6.29%     789.000us      52.600us       8.230ms        80.82%       8.230ms     548.667us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       5.749ms        56.46%       5.749ms     574.900us            10
                                             aten::relu         0.62%      78.000us        31.28%       3.921ms     392.100us       0.000us         0.00%       1.423ms     142.300us            10
                                        aten::clamp_min         1.19%     149.000us        30.66%       3.843ms     384.300us       1.423ms        13.97%       1.423ms     142.300us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.423ms        13.97%       1.423ms     142.300us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.229ms        12.07%       1.229ms      81.933us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.096ms        10.76%       1.096ms     274.000us             4
                                              aten::cat         0.61%      77.000us         0.88%     110.000us      22.000us     208.000us         2.04%     208.000us      41.600us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 12.534ms
Self CUDA time total: 10.183ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         3.75%     821.000us        82.69%      18.120ms      18.120ms       0.000us         0.00%      19.541ms      19.541ms             1
                                           aten::linear         0.21%      45.000us         7.57%       1.659ms     110.600us       0.000us         0.00%      15.765ms       1.051ms            15
                                            aten::addmm         2.68%     587.000us         6.86%       1.503ms     100.200us      15.765ms        80.68%      15.765ms       1.051ms            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us      10.927ms        55.92%      10.927ms       1.214ms             9
                                             aten::relu         0.38%      84.000us        36.22%       7.937ms     793.700us       0.000us         0.00%       2.750ms     275.000us            10
                                        aten::clamp_min         0.69%     152.000us        35.84%       7.853ms     785.300us       2.750ms        14.07%       2.750ms     275.000us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.750ms        14.07%       2.750ms     275.000us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       2.323ms        11.89%       2.323ms     154.867us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       2.201ms        11.26%       2.201ms     440.200us             5
                                              aten::cat         0.35%      77.000us         0.50%     109.000us      21.800us     464.000us         2.37%     464.000us      92.800us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 21.914ms
Self CUDA time total: 19.541ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         7.60%     823.000us        75.49%       8.177ms       8.177ms       0.000us         0.00%       9.740ms       9.740ms             1
                                           aten::linear         0.41%      44.000us         8.17%     885.000us      59.000us       0.000us         0.00%       7.885ms     525.667us            15
                                            aten::addmm         4.86%     526.000us         6.87%     744.000us      49.600us       7.885ms        80.95%       7.885ms     525.667us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       5.581ms        57.30%       5.581ms     507.364us            11
                                             aten::relu         0.63%      68.000us         2.11%     229.000us      22.900us       0.000us         0.00%       1.360ms     136.000us            10
                                        aten::clamp_min         0.93%     101.000us         1.49%     161.000us      16.100us       1.360ms        13.96%       1.360ms     136.000us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.360ms        13.96%       1.360ms     136.000us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.180ms        12.11%       1.180ms      78.667us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us     971.000us         9.97%     971.000us     323.667us             3
                                              aten::cat         0.68%      74.000us         1.00%     108.000us      21.600us     194.000us         1.99%     194.000us      38.800us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 10.832ms
Self CUDA time total: 9.740ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         3.19%     915.000us        78.59%      22.549ms      22.549ms       0.000us         0.00%      25.206ms      25.206ms             1
                                           aten::linear         0.18%      53.000us        12.84%       3.683ms     245.533us       0.000us         0.00%      20.290ms       1.353ms            15
                                            aten::addmm         1.99%     570.000us        12.27%       3.520ms     234.667us      20.290ms        80.50%      20.290ms       1.353ms            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us      14.081ms        55.86%      14.081ms       1.565ms             9
                                             aten::relu         0.28%      79.000us        13.87%       3.980ms     398.000us       0.000us         0.00%       3.574ms     357.400us            10
                                        aten::clamp_min         0.54%     154.000us        13.60%       3.901ms     390.100us       3.574ms        14.18%       3.574ms     357.400us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       3.574ms        14.18%       3.574ms     357.400us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       2.987ms        11.85%       2.987ms     199.133us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       2.841ms        11.27%       2.841ms     568.200us             5
                                              aten::cat         0.27%      78.000us         0.39%     112.000us      22.400us     628.000us         2.49%     628.000us     125.600us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 28.691ms
Self CUDA time total: 25.206ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         8.64%     848.000us        95.79%       9.399ms       9.399ms       0.000us         0.00%       8.671ms       8.671ms             1
                                           aten::linear         0.43%      42.000us         9.19%     902.000us      60.133us       0.000us         0.00%       7.019ms     467.933us            15
                                            aten::addmm         5.37%     527.000us         7.62%     748.000us      49.867us       7.019ms        80.95%       7.019ms     467.933us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       4.912ms        56.65%       4.912ms     491.200us            10
                                             aten::relu         0.71%      70.000us         2.42%     237.000us      23.700us       0.000us         0.00%       1.207ms     120.700us            10
                                        aten::clamp_min         1.09%     107.000us         1.70%     167.000us      16.700us       1.207ms        13.92%       1.207ms     120.700us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.207ms        13.92%       1.207ms     120.700us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.053ms        12.14%       1.053ms      70.200us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us     921.000us        10.62%     921.000us     230.250us             4
                                              aten::cat         0.77%      76.000us         1.11%     109.000us      21.800us     168.000us         1.94%     168.000us      33.600us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 9.812ms
Self CUDA time total: 8.671ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         4.55%     853.000us        71.92%      13.474ms      13.474ms       0.000us         0.00%      17.657ms      17.657ms             1
                                           aten::linear         0.26%      49.000us         4.79%     897.000us      59.800us       0.000us         0.00%      14.271ms     951.400us            15
                                            aten::addmm         2.86%     535.000us         4.00%     750.000us      50.000us      14.271ms        80.82%      14.271ms     951.400us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       9.904ms        56.09%       9.904ms       1.100ms             9
                                             aten::relu         0.37%      69.000us         1.26%     237.000us      23.700us       0.000us         0.00%       2.474ms     247.400us            10
                                        aten::clamp_min         0.57%     107.000us         0.90%     168.000us      16.800us       2.474ms        14.01%       2.474ms     247.400us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.474ms        14.01%       2.474ms     247.400us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       2.108ms        11.94%       2.108ms     140.533us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.996ms        11.30%       1.996ms     399.200us             5
                                              aten::cat         0.44%      82.000us         0.62%     117.000us      23.400us     403.000us         2.28%     403.000us      80.600us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 18.736ms
Self CUDA time total: 17.657ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         6.02%     850.000us        73.31%      10.355ms      10.355ms       0.000us         0.00%      13.041ms      13.041ms             1
                                           aten::linear         0.30%      42.000us         6.30%     890.000us      59.333us       0.000us         0.00%      10.553ms     703.533us            15
                                            aten::addmm         3.75%     530.000us         5.30%     749.000us      49.933us      10.553ms        80.92%      10.553ms     703.533us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       7.393ms        56.69%       7.393ms     739.300us            10
                                             aten::relu         0.47%      66.000us         1.64%     232.000us      23.200us       0.000us         0.00%       1.829ms     182.900us            10
                                        aten::clamp_min         0.74%     104.000us         1.18%     166.000us      16.600us       1.829ms        14.02%       1.829ms     182.900us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.829ms        14.02%       1.829ms     182.900us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.567ms        12.02%       1.567ms     104.467us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.396ms        10.70%       1.396ms     349.000us             4
                                              aten::cat         0.55%      78.000us         0.79%     112.000us      22.400us     278.000us         2.13%     278.000us      55.600us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 14.125ms
Self CUDA time total: 13.041ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         8.84%     823.000us        76.87%       7.159ms       7.159ms       0.000us         0.00%       8.180ms       8.180ms             1
                                           aten::linear         0.46%      43.000us         9.62%     896.000us      59.733us       0.000us         0.00%       6.630ms     442.000us            15
                                            aten::addmm         5.77%     537.000us         8.07%     752.000us      50.133us       6.630ms        81.05%       6.630ms     442.000us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       4.642ms        56.75%       4.642ms     464.200us            10
                                             aten::relu         0.78%      73.000us         2.57%     239.000us      23.900us       0.000us         0.00%       1.132ms     113.200us            10
                                        aten::clamp_min         1.12%     104.000us         1.78%     166.000us      16.600us       1.132ms        13.84%       1.132ms     113.200us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.132ms        13.84%       1.132ms     113.200us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us     998.000us        12.20%     998.000us      66.533us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us     867.000us        10.60%     867.000us     216.750us             4
                                              aten::cat         0.81%      75.000us         1.16%     108.000us      21.600us     158.000us         1.93%     158.000us      31.600us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 9.313ms
Self CUDA time total: 8.180ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         3.88%     865.000us        71.01%      15.814ms      15.814ms       0.000us         0.00%      21.208ms      21.208ms             1
                                           aten::linear         0.21%      46.000us         4.12%     918.000us      61.200us       0.000us         0.00%      17.106ms       1.140ms            15
                                            aten::addmm         2.46%     547.000us         3.46%     771.000us      51.400us      17.106ms        80.66%      17.106ms       1.140ms            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us      11.861ms        55.93%      11.861ms       1.318ms             9
                                             aten::relu         0.30%      67.000us         1.06%     237.000us      23.700us       0.000us         0.00%       3.001ms     300.100us            10
                                        aten::clamp_min         0.48%     107.000us         0.76%     170.000us      17.000us       3.001ms        14.15%       3.001ms     300.100us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       3.001ms        14.15%       3.001ms     300.100us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       2.520ms        11.88%       2.520ms     168.000us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       2.392ms        11.28%       2.392ms     478.400us             5
                                              aten::cat         0.33%      73.000us         0.49%     109.000us      21.800us     499.000us         2.35%     499.000us      99.800us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 22.271ms
Self CUDA time total: 21.208ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         2.55%     906.000us        75.16%      26.691ms      26.691ms       0.000us         0.00%      33.190ms      33.190ms             1
                                           aten::linear         0.14%      51.000us         7.00%       2.485ms     165.667us       0.000us         0.00%      26.675ms       1.778ms            15
                                            aten::addmm         1.65%     585.000us         6.52%       2.314ms     154.267us      26.675ms        80.37%      26.675ms       1.778ms            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us      18.516ms        55.79%      18.516ms       2.057ms             9
                                             aten::relu         0.24%      86.000us        12.95%       4.600ms     460.000us       0.000us         0.00%       4.714ms     471.400us            10
                                        aten::clamp_min         0.46%     162.000us        12.71%       4.514ms     451.400us       4.714ms        14.20%       4.714ms     471.400us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       4.714ms        14.20%       4.714ms     471.400us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       3.918ms        11.80%       3.918ms     261.200us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       3.729ms        11.24%       3.729ms     745.800us             5
                                              aten::cat         0.23%      80.000us         0.32%     115.000us      23.000us     885.000us         2.67%     885.000us     177.000us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 35.513ms
Self CUDA time total: 33.190ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         5.59%     837.000us        73.00%      10.929ms      10.929ms       0.000us         0.00%      13.903ms      13.903ms             1
                                           aten::linear         0.28%      42.000us         5.98%     895.000us      59.667us       0.000us         0.00%      11.248ms     749.867us            15
                                            aten::addmm         3.61%     541.000us         5.02%     751.000us      50.067us      11.248ms        80.90%      11.248ms     749.867us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       7.866ms        56.58%       7.866ms     786.600us            10
                                             aten::relu         0.45%      67.000us         1.59%     238.000us      23.800us       0.000us         0.00%       1.946ms     194.600us            10
                                        aten::clamp_min         0.75%     112.000us         1.14%     171.000us      17.100us       1.946ms        14.00%       1.946ms     194.600us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.946ms        14.00%       1.946ms     194.600us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.665ms        11.98%       1.665ms     111.000us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.491ms        10.72%       1.491ms     372.750us             4
                                              aten::cat         0.51%      77.000us         0.74%     111.000us      22.200us     300.000us         2.16%     300.000us      60.000us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 14.971ms
Self CUDA time total: 13.903ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         5.61%     860.000us        73.17%      11.209ms      11.209ms       0.000us         0.00%      14.234ms      14.234ms             1
                                           aten::linear         0.25%      39.000us         5.91%     905.000us      60.333us       0.000us         0.00%      11.513ms     767.533us            15
                                            aten::addmm         3.54%     542.000us         4.96%     760.000us      50.667us      11.513ms        80.88%      11.513ms     767.533us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       8.113ms        57.00%       8.113ms     737.545us            11
                                             aten::relu         0.46%      71.000us         1.54%     236.000us      23.600us       0.000us         0.00%       1.993ms     199.300us            10
                                        aten::clamp_min         0.68%     104.000us         1.08%     165.000us      16.500us       1.993ms        14.00%       1.993ms     199.300us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.993ms        14.00%       1.993ms     199.300us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.706ms        11.99%       1.706ms     113.733us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.459ms        10.25%       1.459ms     486.333us             3
                                              aten::cat         0.52%      79.000us         0.73%     112.000us      22.400us     312.000us         2.19%     312.000us      62.400us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 15.320ms
Self CUDA time total: 14.234ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         4.42%     803.000us        75.75%      13.767ms      13.767ms       0.000us         0.00%      14.923ms      14.923ms             1
                                           aten::linear         0.26%      47.000us         5.08%     923.000us      61.533us       0.000us         0.00%      12.070ms     804.667us            15
                                            aten::addmm         3.00%     546.000us         4.26%     774.000us      51.600us      12.070ms        80.88%      12.070ms     804.667us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       8.356ms        55.99%       8.356ms     928.444us             9
                                             aten::relu         0.39%      70.000us         1.31%     238.000us      23.800us       0.000us         0.00%       2.094ms     209.400us            10
                                        aten::clamp_min         0.58%     105.000us         0.92%     168.000us      16.800us       2.094ms        14.03%       2.094ms     209.400us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.094ms        14.03%       2.094ms     209.400us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.787ms        11.97%       1.787ms     119.133us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.687ms        11.30%       1.687ms     337.400us             5
                                              aten::cat         0.43%      78.000us         0.62%     112.000us      22.400us     329.000us         2.20%     329.000us      65.800us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 18.174ms
Self CUDA time total: 14.923ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         2.63%     852.000us        71.74%      23.201ms      23.201ms       0.000us         0.00%      31.305ms      31.305ms             1
                                           aten::linear         0.13%      41.000us         2.72%     879.000us      58.600us       0.000us         0.00%      24.994ms       1.666ms            15
                                            aten::addmm         1.61%     520.000us         2.29%     740.000us      49.333us      24.994ms        79.84%      24.994ms       1.666ms            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us      17.130ms        54.72%      17.130ms       1.903ms             9
                                             aten::relu         0.21%      68.000us         0.73%     236.000us      23.600us       0.000us         0.00%       4.584ms     458.400us            10
                                        aten::clamp_min         0.32%     105.000us         0.52%     168.000us      16.800us       4.584ms        14.64%       4.584ms     458.400us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       4.584ms        14.64%       4.584ms     458.400us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       3.790ms        12.11%       3.790ms     252.667us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       3.619ms        11.56%       3.619ms     723.800us             5
                                              aten::cat         0.24%      78.000us         0.35%     112.000us      22.400us     845.000us         2.70%     845.000us     169.000us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 32.340ms
Self CUDA time total: 31.305ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         9.58%     841.000us        82.70%       7.262ms       7.262ms       0.000us         0.00%       7.643ms       7.643ms             1
                                           aten::linear         0.51%      45.000us        10.19%     895.000us      59.667us       0.000us         0.00%       6.015ms     401.000us            15
                                            aten::addmm         6.09%     535.000us         8.48%     745.000us      49.667us       6.015ms        78.70%       6.015ms     401.000us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       4.173ms        54.60%       4.173ms     417.300us            10
                                             aten::relu         0.81%      71.000us         2.66%     234.000us      23.400us       0.000us         0.00%       1.241ms     124.100us            10
                                        aten::clamp_min         1.21%     106.000us         1.86%     163.000us      16.300us       1.241ms        16.24%       1.241ms     124.100us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.241ms        16.24%       1.241ms     124.100us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us     874.000us        11.44%     874.000us      58.267us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us     834.000us        10.91%     834.000us     208.500us             4
                                              aten::cat         0.88%      77.000us         1.26%     111.000us      22.200us     151.000us         1.98%     151.000us      30.200us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 8.781ms
Self CUDA time total: 7.643ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         7.02%     843.000us        74.44%       8.942ms       8.942ms       0.000us         0.00%      10.962ms      10.962ms             1
                                           aten::linear         0.35%      42.000us         7.43%     893.000us      59.533us       0.000us         0.00%       8.596ms     573.067us            15
                                            aten::addmm         4.44%     533.000us         6.25%     751.000us      50.067us       8.596ms        78.42%       8.596ms     573.067us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       5.974ms        54.50%       5.974ms     597.400us            10
                                             aten::relu         0.60%      72.000us         2.03%     244.000us      24.400us       0.000us         0.00%       1.802ms     180.200us            10
                                        aten::clamp_min         0.92%     110.000us         1.43%     172.000us      17.200us       1.802ms        16.44%       1.802ms     180.200us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.802ms        16.44%       1.802ms     180.200us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.245ms        11.36%       1.245ms      83.000us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.196ms        10.91%       1.196ms     299.000us             4
                                              aten::cat         0.66%      79.000us         0.94%     113.000us      22.600us     243.000us         2.22%     243.000us      48.600us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 12.012ms
Self CUDA time total: 10.962ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference        10.14%     862.000us        85.09%       7.237ms       7.237ms       0.000us         0.00%       7.333ms       7.333ms             1
                                           aten::linear         0.55%      47.000us        10.72%     912.000us      60.800us       0.000us         0.00%       5.765ms     384.333us            15
                                            aten::addmm         6.43%     547.000us         9.02%     767.000us      51.133us       5.765ms        78.62%       5.765ms     384.333us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       3.940ms        53.73%       3.940ms     437.778us             9
                                             aten::relu         0.86%      73.000us         2.79%     237.000us      23.700us       0.000us         0.00%       1.190ms     119.000us            10
                                        aten::clamp_min         1.21%     103.000us         1.93%     164.000us      16.400us       1.190ms        16.23%       1.190ms     119.000us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.190ms        16.23%       1.190ms     119.000us            10
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us     851.000us        11.61%     851.000us     170.200us             5
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us     843.000us        11.50%     843.000us      56.200us            15
                                              aten::cat         0.87%      74.000us         1.26%     107.000us      21.400us     147.000us         2.00%     147.000us      29.400us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 8.505ms
Self CUDA time total: 7.333ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         6.51%     863.000us        73.94%       9.807ms       9.807ms       0.000us         0.00%      12.190ms      12.190ms             1
                                           aten::linear         0.28%      37.000us         6.78%     899.000us      59.933us       0.000us         0.00%       9.543ms     636.200us            15
                                            aten::addmm         4.03%     534.000us         5.69%     755.000us      50.333us       9.543ms        78.29%       9.543ms     636.200us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       6.617ms        54.28%       6.617ms     661.700us            10
                                             aten::relu         0.56%      74.000us         1.82%     241.000us      24.100us       0.000us         0.00%       2.012ms     201.200us            10
                                        aten::clamp_min         0.81%     107.000us         1.26%     167.000us      16.700us       2.012ms        16.51%       2.012ms     201.200us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.012ms        16.51%       2.012ms     201.200us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.381ms        11.33%       1.381ms      92.067us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.342ms        11.01%       1.342ms     335.500us             4
                                              aten::cat         0.58%      77.000us         0.83%     110.000us      22.000us     282.000us         2.31%     282.000us      56.400us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 13.263ms
Self CUDA time total: 12.190ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         7.72%     838.000us        75.78%       8.224ms       8.224ms       0.000us         0.00%       9.768ms       9.768ms             1
                                           aten::linear         0.41%      44.000us         8.37%     908.000us      60.533us       0.000us         0.00%       7.657ms     510.467us            15
                                            aten::addmm         4.96%     538.000us         7.02%     762.000us      50.800us       7.657ms        78.39%       7.657ms     510.467us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       5.240ms        53.64%       5.240ms     582.222us             9
                                             aten::relu         0.66%      72.000us         2.17%     236.000us      23.600us       0.000us         0.00%       1.612ms     161.200us            10
                                        aten::clamp_min         0.95%     103.000us         1.51%     164.000us      16.400us       1.612ms        16.50%       1.612ms     161.200us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.612ms        16.50%       1.612ms     161.200us            10
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.140ms        11.67%       1.140ms     228.000us             5
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.117ms        11.44%       1.117ms      74.467us            15
                                              aten::cat         0.69%      75.000us         1.00%     108.000us      21.600us     213.000us         2.18%     213.000us      42.600us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 10.853ms
Self CUDA time total: 9.768ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         6.05%     805.000us        75.28%      10.020ms      10.020ms       0.000us         0.00%      11.762ms      11.762ms             1
                                           aten::linear         0.34%      45.000us         6.75%     899.000us      59.933us       0.000us         0.00%       9.207ms     613.800us            15
                                            aten::addmm         4.02%     535.000us         5.65%     752.000us      50.133us       9.207ms        78.28%       9.207ms     613.800us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       6.437ms        54.73%       6.437ms     585.182us            11
                                             aten::relu         0.52%      69.000us         1.82%     242.000us      24.200us       0.000us         0.00%       1.938ms     193.800us            10
                                        aten::clamp_min         0.84%     112.000us         1.30%     173.000us      17.300us       1.938ms        16.48%       1.938ms     193.800us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.938ms        16.48%       1.938ms     193.800us            10
void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.334ms 11.34% 1.334ms 88.933us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.227ms 10.43% 1.227ms 409.000us 3 aten::cat 0.57% 76.000us 0.82% 109.000us 21.800us 276.000us 2.35% 276.000us 55.200us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 13.311ms Self CUDA time total: 11.762ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 4.94% 884.000us 72.12% 12.901ms 12.901ms 0.000us 0.00% 16.771ms 16.771ms 1 aten::linear 0.26% 46.000us 5.11% 914.000us 60.933us 0.000us 0.00% 13.562ms 904.133us 15 aten::addmm 3.01% 538.000us 4.24% 759.000us 50.600us 13.562ms 80.87% 13.562ms 904.133us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 9.484ms 56.55% 9.484ms 948.400us 10 aten::relu 0.37% 67.000us 1.36% 243.000us 24.300us 0.000us 0.00% 2.351ms 235.100us 10 aten::clamp_min 0.63% 113.000us 0.98% 176.000us 17.600us 2.351ms 14.02% 2.351ms 235.100us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.351ms 14.02% 2.351ms 235.100us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.002ms 11.94% 2.002ms 133.467us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.805ms 10.76% 1.805ms 451.250us 4 aten::cat 0.44% 78.000us 0.63% 112.000us 22.400us 377.000us 2.25% 377.000us 75.400us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 17.889ms Self CUDA time total: 16.771ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 3.30% 857.000us 70.46% 18.308ms 18.308ms 0.000us 0.00% 24.899ms 24.899ms 1 aten::linear 0.16% 41.000us 3.43% 891.000us 59.400us 0.000us 0.00% 20.072ms 1.338ms 15 aten::addmm 2.02% 524.000us 2.85% 740.000us 49.333us 20.072ms 80.61% 20.072ms 1.338ms 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 13.944ms 56.00% 13.944ms 1.549ms 9 aten::relu 0.28% 74.000us 0.91% 237.000us 23.700us 0.000us 0.00% 3.521ms 352.100us 10 aten::clamp_min 0.39% 101.000us 0.63% 163.000us 16.300us 3.521ms 14.14% 3.521ms 352.100us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 3.521ms 14.14% 3.521ms 352.100us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.949ms 11.84% 2.949ms 196.600us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.804ms 11.26% 2.804ms 560.800us 5 aten::cat 0.30% 79.000us 0.43% 112.000us 22.400us 607.000us 2.44% 607.000us 121.400us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 25.982ms Self CUDA time total: 24.899ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 5.17% 863.000us 73.55% 12.289ms 12.289ms 0.000us 0.00% 15.108ms 15.108ms 1 aten::linear 0.23% 39.000us 5.48% 915.000us 61.000us 0.000us 0.00% 12.225ms 815.000us 15 aten::addmm 3.33% 557.000us 4.62% 772.000us 51.467us 12.225ms 80.92% 12.225ms 815.000us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 8.462ms 56.01% 8.462ms 940.222us 9 aten::relu 0.43% 72.000us 1.42% 237.000us 23.700us 0.000us 0.00% 2.115ms 211.500us 10 aten::clamp_min 0.63% 105.000us 0.99% 165.000us 16.500us 2.115ms 14.00% 2.115ms 211.500us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.115ms 14.00% 2.115ms 211.500us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.808ms 11.97% 1.808ms 120.533us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.708ms 11.31% 1.708ms 341.600us 5 aten::cat 0.51% 85.000us 0.71% 119.000us 23.800us 331.000us 2.19% 331.000us 66.200us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 16.708ms Self CUDA time total: 15.108ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 5.73% 859.000us 73.25% 10.979ms 10.979ms 0.000us 0.00% 13.903ms 13.903ms 1 aten::linear 0.28% 42.000us 6.18% 926.000us 61.733us 0.000us 0.00% 11.246ms 749.733us 15 aten::addmm 3.74% 560.000us 5.18% 776.000us 51.733us 11.246ms 80.89% 11.246ms 749.733us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 7.875ms 56.64% 7.875ms 787.500us 10 aten::relu 0.49% 73.000us 1.58% 237.000us 23.700us 0.000us 0.00% 1.954ms 195.400us 10 aten::clamp_min 0.69% 104.000us 1.09% 164.000us 16.400us 1.954ms 14.05% 1.954ms 195.400us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.954ms 14.05% 1.954ms 195.400us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.666ms 11.98% 1.666ms 111.067us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.494ms 10.75% 1.494ms 373.500us 4 aten::cat 0.51% 76.000us 0.73% 110.000us 22.000us 300.000us 2.16% 300.000us 60.000us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 14.988ms Self CUDA time total: 13.903ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 6.70% 858.000us 74.36% 9.518ms 9.518ms 0.000us 0.00% 11.728ms 11.728ms 1 aten::linear 0.34% 43.000us 6.99% 895.000us 59.667us 0.000us 0.00% 9.484ms 632.267us 15 aten::addmm 4.15% 531.000us 5.85% 749.000us 49.933us 9.484ms 80.87% 9.484ms 632.267us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 6.626ms 56.50% 6.626ms 662.600us 10 aten::relu 0.55% 70.000us 1.89% 242.000us 24.200us 0.000us 0.00% 1.641ms 164.100us 10 aten::clamp_min 0.86% 110.000us 1.34% 172.000us 17.200us 1.641ms 13.99% 1.641ms 164.100us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.641ms 13.99% 1.641ms 164.100us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.410ms 12.02% 1.410ms 94.000us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.264ms 10.78% 1.264ms 316.000us 4 aten::cat 0.59% 76.000us 0.85% 109.000us 21.800us 245.000us 2.09% 245.000us 49.000us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 12.800ms Self CUDA time total: 11.728ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 4.33% 867.000us 71.69% 14.357ms 14.357ms 0.000us 0.00% 18.941ms 18.941ms 1 aten::linear 0.21% 42.000us 4.61% 924.000us 61.600us 0.000us 0.00% 15.003ms 1.000ms 15 aten::addmm 2.78% 556.000us 3.85% 772.000us 51.467us 15.003ms 79.21% 15.003ms 1.000ms 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 10.360ms 54.70% 10.360ms 1.151ms 9 aten::relu 0.36% 72.000us 1.19% 238.000us 23.800us 0.000us 0.00% 2.935ms 293.500us 10 aten::clamp_min 0.52% 105.000us 0.83% 166.000us 16.600us 2.935ms 15.50% 2.935ms 293.500us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.935ms 15.50% 2.935ms 293.500us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.179ms 11.50% 2.179ms 145.267us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.160ms 11.40% 2.160ms 432.000us 5 aten::cat 0.37% 75.000us 0.54% 108.000us 21.600us 469.000us 2.48% 469.000us 93.800us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 20.026ms Self CUDA time total: 18.941ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 6.68% 841.000us 75.82% 9.552ms 9.552ms 0.000us 0.00% 10.983ms 10.983ms 1 aten::linear 0.34% 43.000us 7.30% 920.000us 61.333us 0.000us 0.00% 8.729ms 581.933us 15 aten::addmm 4.41% 555.000us 6.17% 777.000us 51.800us 8.729ms 79.48% 8.729ms 581.933us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 6.019ms 54.80% 6.019ms 668.778us 9 aten::relu 0.56% 71.000us 1.93% 243.000us 24.300us 0.000us 0.00% 1.694ms 169.400us 10 aten::clamp_min 0.83% 105.000us 1.37% 172.000us 17.200us 1.694ms 15.42% 1.694ms 169.400us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.694ms 15.42% 1.694ms 169.400us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.283ms 11.68% 1.283ms 85.533us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.260ms 11.47% 1.260ms 252.000us 5 aten::cat 0.61% 77.000us 0.87% 110.000us 22.000us 238.000us 2.17% 238.000us 47.600us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 12.598ms Self CUDA time total: 10.983ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 6.63% 858.000us 74.27% 9.608ms 9.608ms 0.000us 0.00% 11.839ms 11.839ms 1 aten::linear 0.29% 38.000us 7.10% 918.000us 61.200us 0.000us 0.00% 9.405ms 627.000us 15 aten::addmm 4.30% 556.000us 6.00% 776.000us 51.733us 9.405ms 79.44% 9.405ms 627.000us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 6.476ms 54.70% 6.476ms 719.556us 9 aten::relu 0.59% 76.000us 1.88% 243.000us 24.300us 0.000us 0.00% 1.829ms 182.900us 10 aten::clamp_min 0.82% 106.000us 1.29% 167.000us 16.700us 1.829ms 15.45% 1.829ms 182.900us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.829ms 15.45% 1.829ms 182.900us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.379ms 11.65% 1.379ms 91.933us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.359ms 11.48% 1.359ms 271.800us 5 aten::cat 0.58% 75.000us 0.84% 109.000us 21.800us 257.000us 2.17% 257.000us 51.400us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 12.936ms Self CUDA time total: 11.839ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 5.15% 860.000us 73.47% 12.276ms 12.276ms 0.000us 0.00% 15.117ms 15.117ms 1 aten::linear 0.25% 41.000us 5.36% 896.000us 59.733us 0.000us 0.00% 12.002ms 800.133us 15 aten::addmm 3.23% 539.000us 4.51% 754.000us 50.267us 12.002ms 79.39% 12.002ms 800.133us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 8.357ms 55.28% 8.357ms 835.700us 10 aten::relu 0.41% 68.000us 1.41% 235.000us 23.500us 0.000us 0.00% 2.330ms 233.000us 10 aten::clamp_min 0.62% 104.000us 1.00% 167.000us 16.700us 2.330ms 15.41% 2.330ms 233.000us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.330ms 15.41% 2.330ms 233.000us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.744ms 11.54% 1.744ms 116.267us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.642ms 10.86% 1.642ms 410.500us 4 aten::cat 0.48% 81.000us 0.68% 114.000us 22.800us 359.000us 2.37% 359.000us 71.800us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 16.708ms Self CUDA time total: 15.117ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 4.09% 866.000us 72.09% 15.275ms 15.275ms 0.000us 0.00% 19.580ms 19.580ms 1 aten::linear 0.21% 45.000us 4.29% 910.000us 60.667us 0.000us 0.00% 15.501ms 1.033ms 15 aten::addmm 2.57% 545.000us 3.60% 762.000us 50.800us 15.501ms 79.17% 15.501ms 1.033ms 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 10.919ms 55.77% 10.919ms 992.636us 11 aten::relu 0.32% 68.000us 1.11% 236.000us 23.600us 0.000us 0.00% 3.039ms 303.900us 10 aten::clamp_min 0.50% 107.000us 0.79% 168.000us 16.800us 3.039ms 15.52% 3.039ms 303.900us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 3.039ms 15.52% 3.039ms 303.900us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.251ms 11.50% 2.251ms 150.067us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.020ms 10.32% 2.020ms 673.333us 3 aten::cat 0.37% 78.000us 0.53% 112.000us 22.400us 485.000us 2.48% 485.000us 97.000us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 21.189ms Self CUDA time total: 19.580ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 6.45% 860.000us 73.98% 9.860ms 9.860ms 0.000us 0.00% 12.228ms 12.228ms 1 aten::linear 0.33% 44.000us 6.87% 916.000us 61.067us 0.000us 0.00% 9.650ms 643.333us 15 aten::addmm 4.12% 549.000us 5.75% 767.000us 51.133us 9.650ms 78.92% 9.650ms 643.333us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 6.712ms 54.89% 6.712ms 671.200us 10 aten::relu 0.56% 74.000us 1.91% 255.000us 25.500us 0.000us 0.00% 1.946ms 194.600us 10 aten::clamp_min 0.91% 121.000us 1.36% 181.000us 18.100us 1.946ms 15.91% 1.946ms 194.600us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.946ms 15.91% 1.946ms 194.600us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.402ms 11.47% 1.402ms 93.467us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.327ms 10.85% 1.327ms 331.750us 4 aten::cat 0.60% 80.000us 0.85% 113.000us 22.600us 281.000us 2.30% 281.000us 56.200us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 13.328ms Self CUDA time total: 12.228ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 5.69% 852.000us 74.29% 11.126ms 11.126ms 0.000us 0.00% 13.366ms 13.366ms 1 aten::linear 0.29% 44.000us 6.23% 933.000us 62.200us 0.000us 0.00% 10.543ms 702.867us 15 aten::addmm 3.79% 568.000us 5.27% 789.000us 52.600us 10.543ms 78.88% 10.543ms 702.867us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 7.250ms 54.24% 7.250ms 805.556us 9 aten::relu 0.47% 70.000us 1.57% 235.000us 23.500us 0.000us 0.00% 2.128ms 212.800us 10 aten::clamp_min 0.70% 105.000us 1.10% 165.000us 16.500us 2.128ms 15.92% 2.128ms 212.800us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.128ms 15.92% 2.128ms 212.800us 10 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.536ms 11.49% 1.536ms 307.200us 5 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.528ms 11.43% 1.528ms 101.867us 15 aten::cat 0.52% 78.000us 0.74% 111.000us 22.200us 314.000us 2.35% 314.000us 62.800us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 14.976ms Self CUDA time total: 13.366ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 4.06% 866.000us 72.01% 15.377ms 15.377ms 0.000us 0.00% 19.714ms 19.714ms 1 aten::linear 0.21% 44.000us 4.28% 914.000us 60.933us 0.000us 0.00% 15.492ms 1.033ms 15 aten::addmm 2.56% 546.000us 3.58% 765.000us 51.000us 15.492ms 78.58% 15.492ms 1.033ms 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 10.680ms 54.17% 10.680ms 1.187ms 9 aten::relu 0.32% 69.000us 1.12% 240.000us 24.000us 0.000us 0.00% 3.170ms 317.000us 10 aten::clamp_min 0.52% 111.000us 0.80% 171.000us 17.100us 3.170ms 16.08% 3.170ms 317.000us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 3.170ms 16.08% 3.170ms 317.000us 10 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.263ms 11.48% 2.263ms 452.600us 5 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.232ms 11.32% 2.232ms 148.800us 15 aten::cat 0.37% 78.000us 0.52% 111.000us 22.200us 499.000us 2.53% 499.000us 99.800us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 21.354ms Self CUDA time total: 19.714ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 6.61% 860.000us 74.05% 9.636ms 9.636ms 0.000us 0.00% 11.922ms 11.922ms 1 aten::linear 0.36% 47.000us 6.96% 906.000us 60.400us 0.000us 0.00% 9.414ms 627.600us 15 aten::addmm 4.19% 545.000us 5.85% 761.000us 50.733us 9.414ms 78.96% 9.414ms 627.600us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 6.477ms 54.33% 6.477ms 719.667us 9 aten::relu 0.55% 71.000us 1.81% 236.000us 23.600us 0.000us 0.00% 1.893ms 189.300us 10 aten::clamp_min 0.80% 104.000us 1.27% 165.000us 16.500us 1.893ms 15.88% 1.893ms 189.300us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.893ms 15.88% 1.893ms 189.300us 10 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.372ms 11.51% 1.372ms 274.400us 5 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.363ms 11.43% 1.363ms 90.867us 15 aten::cat 0.58% 76.000us 0.84% 109.000us 21.800us 268.000us 2.25% 268.000us 53.600us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 13.012ms Self CUDA time total: 11.922ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 4.43% 865.000us 71.68% 14.006ms 14.006ms 0.000us 0.00% 18.463ms 18.463ms 1 aten::linear 0.21% 41.000us 4.69% 916.000us 61.067us 0.000us 0.00% 14.510ms 967.333us 15 aten::addmm 2.84% 555.000us 3.95% 772.000us 51.467us 14.510ms 78.59% 14.510ms 967.333us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 9.990ms 54.11% 9.990ms 1.110ms 9 aten::relu 0.37% 73.000us 1.27% 248.000us 24.800us 0.000us 0.00% 2.968ms 296.800us 10 aten::clamp_min 0.56% 109.000us 0.90% 175.000us 17.500us 2.968ms 16.08% 2.968ms 296.800us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.968ms 16.08% 2.968ms 296.800us 10 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.119ms 11.48% 2.119ms 423.800us 5 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00%       0.000us         0.00%       0.000us       0.000us       2.095ms        11.35%       2.095ms     139.667us            15
                                              aten::cat         0.40%      79.000us         0.57%     112.000us      22.400us     470.000us         2.55%     470.000us      94.000us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 19.539ms
Self CUDA time total: 18.463ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         9.02%     843.000us        79.37%       7.420ms       7.420ms       0.000us         0.00%       7.692ms       7.692ms             1
                                           aten::linear         0.41%      38.000us         9.50%     888.000us      59.200us       0.000us         0.00%       6.035ms     402.333us            15
                                            aten::addmm         5.67%     530.000us         8.03%     751.000us      50.067us       6.035ms        78.46%       6.035ms     402.333us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       4.225ms        54.93%       4.225ms     384.091us            11
                                             aten::relu         0.73%      68.000us         2.58%     241.000us      24.100us       0.000us         0.00%       1.254ms     125.400us            10
                                        aten::clamp_min         1.21%     113.000us         1.85%     173.000us      17.300us       1.254ms        16.30%       1.254ms     125.400us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.254ms        16.30%       1.254ms     125.400us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us     879.000us        11.43%     879.000us      58.600us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us     795.000us        10.34%     795.000us     265.000us             3
                                              aten::cat         0.80%      75.000us         1.25%     117.000us      23.400us     158.000us         2.05%     158.000us      31.600us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 9.349ms
Self CUDA time total: 7.692ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         4.32%     852.000us        71.77%      14.166ms      14.166ms       0.000us         0.00%      18.672ms      18.672ms             1
                                           aten::linear         0.21%      42.000us         4.58%     904.000us      60.267us       0.000us         0.00%      14.560ms     970.667us            15
                                            aten::addmm         2.77%     547.000us         3.87%     763.000us      50.867us      14.560ms        77.98%      14.560ms     970.667us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us      10.007ms        53.59%      10.007ms       1.112ms             9
                                             aten::relu         0.34%      67.000us         1.20%     236.000us      23.600us       0.000us         0.00%       3.106ms     310.600us            10
                                        aten::clamp_min         0.54%     107.000us         0.86%     169.000us      16.900us       3.106ms        16.63%       3.106ms     310.600us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       3.106ms        16.63%       3.106ms     310.600us            10
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       2.159ms        11.56%       2.159ms     431.800us             5
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       2.084ms        11.16%       2.084ms     138.933us            15
                                              aten::cat         0.43%      84.000us         0.59%     117.000us      23.400us     482.000us         2.58%     482.000us      96.400us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 19.739ms
Self CUDA time total: 18.672ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         4.75%     854.000us        72.13%      12.960ms      12.960ms       0.000us         0.00%      16.878ms      16.878ms             1
                                           aten::linear         0.26%      46.000us         5.13%     921.000us      61.400us       0.000us         0.00%      13.164ms     877.600us            15
                                            aten::addmm         3.06%     550.000us         4.29%     770.000us      51.333us      13.164ms        78.00%      13.164ms     877.600us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       9.028ms        53.49%       9.028ms       1.003ms             9
                                             aten::relu         0.38%      69.000us         1.35%     242.000us      24.200us       0.000us         0.00%       2.808ms     280.800us            10
                                        aten::clamp_min         0.64%     115.000us         0.96%     173.000us      17.300us       2.808ms        16.64%       2.808ms     280.800us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.808ms        16.64%       2.808ms     280.800us            10
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.952ms        11.57%       1.952ms     390.400us             5
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.887ms        11.18%       1.887ms     125.800us            15
                                              aten::cat         0.44%      79.000us         0.62%     112.000us      22.400us     433.000us         2.57%     433.000us      86.600us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 17.968ms
Self CUDA time total: 16.878ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         6.17%     844.000us        73.78%      10.089ms      10.089ms       0.000us         0.00%      12.597ms      12.597ms             1
                                           aten::linear         0.33%      45.000us         6.68%     914.000us      60.933us       0.000us         0.00%       9.859ms     657.267us            15
                                            aten::addmm         4.01%     549.000us         5.60%     766.000us      51.067us       9.859ms        78.26%       9.859ms     657.267us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       6.832ms        54.24%       6.832ms     683.200us            10
                                             aten::relu         0.53%      73.000us         1.75%     239.000us      23.900us       0.000us         0.00%       2.078ms     207.800us            10
                                        aten::clamp_min         0.78%     106.000us         1.21%     166.000us      16.600us       2.078ms        16.50%       2.078ms     207.800us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.078ms        16.50%       2.078ms     207.800us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.424ms        11.30%       1.424ms      94.933us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.392ms        11.05%       1.392ms     348.000us             4
                                              aten::cat         0.57%      78.000us         0.81%     111.000us      22.200us     298.000us         2.37%     298.000us      59.600us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 13.675ms
Self CUDA time total: 12.597ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         7.10%     879.000us        76.08%       9.424ms       9.424ms       0.000us         0.00%      10.777ms      10.777ms             1
                                           aten::linear         0.33%      41.000us         7.88%     976.000us      65.067us       0.000us         0.00%       8.444ms     562.933us            15
                                            aten::addmm         4.59%     568.000us         6.37%     789.000us      52.600us       8.444ms        78.35%       8.444ms     562.933us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       5.777ms        53.60%       5.777ms     641.889us             9
                                             aten::relu         0.57%      71.000us         1.93%     239.000us      23.900us       0.000us         0.00%       1.777ms     177.700us            10
                                        aten::clamp_min         0.87%     108.000us         1.36%     168.000us      16.800us       1.777ms        16.49%       1.777ms     177.700us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.777ms        16.49%       1.777ms     177.700us            10
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.252ms        11.62%       1.252ms     250.400us             5
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.224ms        11.36%       1.224ms      81.600us            15
                                              aten::cat         0.62%      77.000us         0.89%     110.000us      22.000us     244.000us         2.26%     244.000us      48.800us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 12.387ms
Self CUDA time total: 10.777ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         9.29%     896.000us        99.72%       9.617ms       9.617ms       0.000us         0.00%       8.313ms       8.313ms             1
                                           aten::linear         0.43%      41.000us        32.52%       3.136ms     209.067us       0.000us         0.00%       6.738ms     449.200us            15
                                            aten::addmm         5.80%     559.000us        31.10%       2.999ms     199.933us       6.738ms        81.05%       6.738ms     449.200us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       4.782ms        57.52%       4.782ms     434.727us            11
                                             aten::relu         0.74%      71.000us         2.48%     239.000us      23.900us       0.000us         0.00%       1.145ms     114.500us            10
                                        aten::clamp_min         1.09%     105.000us         1.74%     168.000us      16.800us       1.145ms        13.77%       1.145ms     114.500us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.145ms        13.77%       1.145ms     114.500us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.008ms        12.13%       1.008ms      67.200us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us     819.000us         9.85%     819.000us     273.000us             3
                                              aten::cat         0.80%      77.000us         1.15%     111.000us      22.200us     161.000us         1.94%     161.000us      32.200us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 9.644ms
Self CUDA time total: 8.313ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         3.66%     873.000us        75.25%      17.956ms      17.956ms       0.000us         0.00%      19.521ms      19.521ms             1
                                           aten::linear         0.16%      38.000us         3.84%     917.000us      61.133us       0.000us         0.00%      15.763ms       1.051ms            15
                                            aten::addmm         2.32%     553.000us         3.22%     768.000us      51.200us      15.763ms        80.75%      15.763ms       1.051ms            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us      10.942ms        56.05%      10.942ms       1.216ms             9
                                             aten::relu         0.31%      73.000us         1.05%     251.000us      25.100us       0.000us         0.00%       2.748ms     274.800us            10
                                        aten::clamp_min         0.49%     116.000us         0.75%     178.000us      17.800us       2.748ms        14.08%       2.748ms     274.800us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.748ms        14.08%       2.748ms     274.800us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       2.323ms        11.90%       2.323ms     154.867us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       2.201ms        11.28%       2.201ms     440.200us             5
                                              aten::cat         0.34%      81.000us         0.48%     115.000us      23.000us     456.000us         2.34%     456.000us      91.200us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 23.863ms
Self CUDA time total: 19.521ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         7.91%     889.000us        81.71%       9.184ms       9.184ms       0.000us         0.00%      10.088ms      10.088ms             1
                                           aten::linear         0.39%      44.000us         8.01%     900.000us      60.000us       0.000us         0.00%       8.169ms     544.600us            15
                                            aten::addmm         4.78%     537.000us         6.70%     753.000us      50.200us       8.169ms        80.98%       8.169ms     544.600us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       5.722ms        56.72%       5.722ms     572.200us            10
                                             aten::relu         0.61%      69.000us         2.22%     249.000us      24.900us       0.000us         0.00%       1.411ms     141.100us            10
                                        aten::clamp_min         1.05%     118.000us         1.60%     180.000us      18.000us       1.411ms        13.99%       1.411ms     141.100us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.411ms        13.99%       1.411ms     141.100us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.221ms        12.10%       1.221ms      81.400us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.074ms        10.65%       1.074ms     268.500us             4
                                              aten::cat         0.77%      87.000us         1.08%     121.000us      24.200us     204.000us         2.02%     204.000us      40.800us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 11.240ms
Self CUDA time total: 10.088ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         8.73%     845.000us        97.72%       9.456ms       9.456ms       0.000us         0.00%       8.546ms       8.546ms             1
                                           aten::linear         0.49%      47.000us         9.25%     895.000us      59.667us       0.000us         0.00%       6.917ms     461.133us            15
                                            aten::addmm         5.52%     534.000us         7.73%     748.000us      49.867us       6.917ms        80.94%       6.917ms     461.133us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       4.768ms        55.79%       4.768ms     529.778us             9
                                             aten::relu         0.69%      67.000us         2.42%     234.000us      23.400us       0.000us         0.00%       1.182ms     118.200us            10
                                        aten::clamp_min         1.11%     107.000us         1.73%     167.000us      16.700us       1.182ms        13.83%       1.182ms     118.200us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.182ms        13.83%       1.182ms     118.200us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.042ms        12.19%       1.042ms      69.467us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us     972.000us        11.37%     972.000us     194.400us             5
                                              aten::cat         0.80%      77.000us         1.14%     110.000us      22.000us     171.000us         2.00%     171.000us      34.200us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 9.677ms
Self CUDA time total: 8.546ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         6.81%     845.000us        79.10%       9.812ms       9.812ms       0.000us         0.00%      11.325ms      11.325ms             1
                                           aten::linear         0.36%      45.000us         7.43%     921.000us      61.400us       0.000us         0.00%       9.167ms     611.133us            15
                                            aten::addmm         4.40%     546.000us         6.22%     771.000us      51.400us       9.167ms        80.94%       9.167ms     611.133us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       6.402ms        56.53%       6.402ms     640.200us            10
                                             aten::relu         0.57%      71.000us         1.93%     240.000us      24.000us       0.000us         0.00%       1.585ms     158.500us            10
                                        aten::clamp_min         0.86%     107.000us         1.36%     169.000us      16.900us       1.585ms        14.00%       1.585ms     158.500us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.585ms        14.00%       1.585ms     158.500us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.364ms        12.04%       1.364ms      90.933us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.215ms        10.73%       1.215ms     303.750us             4
                                              aten::cat         0.62%      77.000us         0.89%     111.000us      22.200us     230.000us         2.03%     230.000us      46.000us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 12.404ms
Self CUDA time total: 11.325ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         4.16%     858.000us        71.40%      14.730ms      14.730ms       0.000us         0.00%      19.546ms      19.546ms             1
                                           aten::linear         0.21%      44.000us         4.41%     909.000us      60.600us       0.000us         0.00%      15.785ms       1.052ms            15
                                            aten::addmm         2.63%     542.000us         3.68%     760.000us      50.667us      15.785ms        80.76%      15.785ms       1.052ms            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us      10.951ms        56.03%      10.951ms       1.217ms             9
                                             aten::relu         0.34%      70.000us         1.14%     236.000us      23.600us       0.000us         0.00%       2.757ms     275.700us            10
                                        aten::clamp_min         0.51%     105.000us         0.80%     166.000us      16.600us       2.757ms        14.11%       2.757ms     275.700us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.757ms        14.11%       2.757ms     275.700us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       2.330ms        11.92%       2.330ms     155.333us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       2.206ms        11.29%       2.206ms     441.200us             5
                                              aten::cat         0.37%      76.000us         0.53%     109.000us      21.800us     456.000us         2.33%     456.000us      91.200us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 20.629ms
Self CUDA time total: 19.546ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         5.88%     877.000us        73.35%      10.947ms      10.947ms       0.000us         0.00%      13.828ms      13.828ms             1
                                           aten::linear         0.30%      45.000us         6.12%     914.000us      60.933us       0.000us         0.00%      11.182ms     745.467us            15
                                            aten::addmm         3.62%     540.000us         5.09%     760.000us      50.667us      11.182ms        80.86%      11.182ms     745.467us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       7.736ms        55.94%       7.736ms     859.556us             9
                                             aten::relu         0.47%      70.000us         1.59%     237.000us      23.700us       0.000us         0.00%       1.945ms     194.500us            10
                                        aten::clamp_min         0.73%     109.000us         1.12%     167.000us      16.700us       1.945ms        14.07%       1.945ms     194.500us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.945ms        14.07%       1.945ms     194.500us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.658ms        11.99%       1.658ms     110.533us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.563ms        11.30%       1.563ms     312.600us             5
                                              aten::cat         0.52%      77.000us         0.74%     110.000us      22.000us     299.000us         2.16%     299.000us      59.800us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 14.924ms
Self CUDA time total: 13.828ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         4.04%     862.000us        71.45%      15.245ms      15.245ms       0.000us         0.00%      20.264ms      20.264ms             1
                                           aten::linear         0.22%      46.000us         4.33%     924.000us      61.600us       0.000us         0.00%      16.348ms       1.090ms            15
                                            aten::addmm         2.52%     537.000us         3.59%     766.000us      51.067us      16.348ms        80.68%      16.348ms       1.090ms            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us      11.349ms        56.01%      11.349ms       1.261ms             9
                                             aten::relu         0.34%      72.000us         1.12%     240.000us      24.000us       0.000us         0.00%       2.858ms     285.800us            10
                                        aten::clamp_min         0.50%     106.000us         0.79%     168.000us      16.800us       2.858ms        14.10%       2.858ms     285.800us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.858ms        14.10%       2.858ms     285.800us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       2.406ms        11.87%       2.406ms     160.400us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       2.285ms        11.28%       2.285ms     457.000us             5
                                              aten::cat         0.36%      77.000us         0.55%     117.000us      23.400us     477.000us         2.35%     477.000us      95.400us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 21.338ms
Self CUDA time total: 20.264ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         4.98%     865.000us        72.10%      12.521ms      12.521ms       0.000us         0.00%      16.283ms      16.283ms             1
                                           aten::linear         0.28%      49.000us         5.36%     931.000us      62.067us       0.000us         0.00%      13.173ms     878.200us            15
                                            aten::addmm         3.21%     557.000us         4.46%     775.000us      51.667us      13.173ms        80.90%      13.173ms     878.200us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       9.129ms        56.06%       9.129ms       1.014ms             9
                                             aten::relu         0.43%      75.000us         1.39%     241.000us      24.100us       0.000us         0.00%       2.284ms     228.400us            10
                                        aten::clamp_min         0.62%     107.000us         0.96%     166.000us      16.600us       2.284ms        14.03%       2.284ms     228.400us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.284ms        14.03%       2.284ms     228.400us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.946ms        11.95%       1.946ms     129.733us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.840ms        11.30%       1.840ms     368.000us             5
                                              aten::cat         0.43%      75.000us         0.63%     109.000us      21.800us     360.000us         2.21%     360.000us      72.000us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 17.366ms
Self CUDA time total: 16.283ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         4.53%     860.000us        73.43%      13.942ms      13.942ms       0.000us         0.00%      17.006ms      17.006ms             1
                                           aten::linear         0.23%      44.000us         4.84%     919.000us      61.267us       0.000us         0.00%      13.749ms     916.600us            15
                                            aten::addmm         2.90%     551.000us         4.06%     771.000us      51.400us      13.749ms        80.85%      13.749ms     916.600us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       9.533ms        56.06%       9.533ms       1.059ms             9
                                             aten::relu         0.37%      71.000us         1.27%     242.000us      24.200us       0.000us         0.00%       2.388ms     238.800us            10
                                        aten::clamp_min         0.57%     108.000us         0.90%     171.000us      17.100us       2.388ms        14.04%       2.388ms     238.800us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.388ms        14.04%       2.388ms     238.800us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       2.032ms        11.95%       2.032ms     135.467us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.922ms        11.30%       1.922ms     384.400us             5
                                              aten::cat         0.43%      82.000us         0.61%     115.000us      23.000us     382.000us         2.25%     382.000us      76.400us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 18.987ms
Self CUDA time total: 17.006ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         4.84%     855.000us        72.14%      12.751ms      12.751ms       0.000us         0.00%      16.581ms      16.581ms             1
                                           aten::linear         0.28%      49.000us         5.15%     911.000us      60.733us       0.000us         0.00%      13.418ms     894.533us            15
                                            aten::addmm         3.03%     536.000us         4.28%     757.000us      50.467us      13.418ms        80.92%      13.418ms     894.533us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       9.302ms        56.10%       9.302ms       1.034ms             9
                                             aten::relu         0.40%      70.000us         1.34%     237.000us      23.700us       0.000us         0.00%       2.322ms     232.200us            10
                                        aten::clamp_min         0.60%     106.000us         0.94%     167.000us      16.700us       2.322ms        14.00%       2.322ms     232.200us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.322ms        14.00%       2.322ms     232.200us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.977ms        11.92%       1.977ms     131.800us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.872ms        11.29%       1.872ms     374.400us             5
                                              aten::cat         0.45%      80.000us         0.64%     113.000us      22.600us     370.000us         2.23%     370.000us      74.000us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 17.675ms
Self CUDA time total: 16.581ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         4.68%     880.000us        74.27%      13.977ms      13.977ms       0.000us         0.00%      16.400ms      16.400ms             1
                                           aten::linear         0.23%      43.000us         4.94%     930.000us      62.000us       0.000us         0.00%      13.251ms     883.400us            15
                                            aten::addmm         2.95%     555.000us         4.15%     781.000us      52.067us      13.251ms        80.80%      13.251ms     883.400us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       9.263ms        56.48%       9.263ms     926.300us            10
                                             aten::relu         0.39%      74.000us         1.30%     245.000us      24.500us       0.000us         0.00%       2.297ms     229.700us            10
                                        aten::clamp_min         0.58%     110.000us         0.91%     171.000us      17.100us       2.297ms        14.01%       2.297ms     229.700us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.297ms        14.01%       2.297ms     229.700us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.959ms        11.95%       1.959ms     130.600us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.764ms        10.76%       1.764ms     441.000us             4
                                              aten::cat         0.46%      86.000us         0.63%     119.000us      23.800us     368.000us         2.24%     368.000us      73.600us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 18.819ms
Self CUDA time total: 16.400ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         3.29%     861.000us        70.64%      18.493ms      18.493ms       0.000us         0.00%      25.109ms      25.109ms             1
                                           aten::linear         0.17%      44.000us         3.49%     913.000us      60.867us       0.000us         0.00%      20.242ms       1.349ms            15
                                            aten::addmm         2.06%     538.000us         2.92%     764.000us      50.933us      20.242ms        80.62%      20.242ms       1.349ms            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us      14.057ms        55.98%      14.057ms       1.562ms             9
                                             aten::relu         0.27%      70.000us         0.90%     236.000us      23.600us       0.000us         0.00%       3.551ms     355.100us            10
                                        aten::clamp_min         0.39%     103.000us         0.63%     166.000us      16.600us       3.551ms        14.14%       3.551ms     355.100us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       3.551ms        14.14%       3.551ms     355.100us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       2.975ms        11.85%       2.975ms     198.333us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       2.826ms        11.25%       2.826ms     565.200us             5
                                              aten::cat         0.33%      86.000us         0.46%     120.000us      24.000us     616.000us         2.45%     616.000us     123.200us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 26.179ms
Self CUDA time total: 25.109ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         9.85%     860.000us        78.38%       6.841ms       6.841ms       0.000us         0.00%       7.576ms       7.576ms             1
                                           aten::linear         0.47%      41.000us        10.36%     904.000us      60.267us       0.000us         0.00%       6.093ms     406.200us            15
                                            aten::addmm         6.12%     534.000us         8.70%     759.000us      50.600us       6.093ms        80.43%       6.093ms     406.200us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       4.299ms        56.74%       4.299ms     390.818us            11
                                             aten::relu         0.81%      71.000us         2.77%     242.000us      24.200us       0.000us         0.00%       1.082ms     108.200us            10
                                        aten::clamp_min         1.26%     110.000us         1.96%     171.000us      17.100us       1.082ms        14.28%       1.082ms     108.200us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.082ms        14.28%       1.082ms     108.200us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us     912.000us        12.04%     912.000us      60.800us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us     760.000us        10.03%     760.000us     253.333us             3
                                              aten::cat         0.88%      77.000us         1.26%     110.000us      22.000us     149.000us         1.97%     149.000us      29.800us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 8.728ms
Self CUDA time total: 7.576ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         7.99%     836.000us        80.29%       8.396ms       8.396ms       0.000us         0.00%       9.355ms       9.355ms             1
                                           aten::linear         0.42%      44.000us        12.83%       1.342ms      89.467us       0.000us         0.00%       7.514ms     500.933us            15
                                            aten::addmm         5.26%     550.000us        11.41%       1.193ms      79.533us       7.514ms        80.32%       7.514ms     500.933us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       5.245ms        56.07%       5.245ms     524.500us            10
                                             aten::relu         0.69%      72.000us         2.39%     250.000us      25.000us       0.000us         0.00%       1.362ms     136.200us            10
                                        aten::clamp_min         1.13%     118.000us         1.70%     178.000us      17.800us       1.362ms        14.56%       1.362ms     136.200us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.362ms        14.56%       1.362ms     136.200us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.114ms        11.91%       1.114ms      74.267us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.003ms        10.72%       1.003ms     250.750us             4
                                              aten::cat         0.76%      79.000us         1.07%     112.000us      22.400us     190.000us         2.03%     190.000us      38.000us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 10.457ms
Self CUDA time total: 9.355ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         5.43%     878.000us        72.88%      11.782ms      11.782ms       0.000us         0.00%      15.070ms      15.070ms             1
                                           aten::linear         0.26%      42.000us         5.68%     919.000us      61.267us       0.000us         0.00%      12.067ms     804.467us            15
                                            aten::addmm         3.43%     555.000us         4.78%     772.000us      51.467us      12.067ms        80.07%      12.067ms     804.467us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       8.425ms        55.91%       8.425ms     842.500us            10
                                             aten::relu         0.45%      72.000us         1.48%     240.000us      24.000us       0.000us         0.00%       2.221ms     222.100us            10
                                        aten::clamp_min         0.66%     107.000us         1.04%     168.000us      16.800us       2.221ms        14.74%       2.221ms     222.100us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.221ms        14.74%       2.221ms     222.100us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.777ms        11.79%       1.777ms     118.467us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.632ms        10.83%       1.632ms     408.000us             4
                                              aten::cat         0.49%      79.000us         0.69%     111.000us      22.200us     347.000us         2.30%     347.000us      69.400us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 16.166ms
Self CUDA time total: 15.070ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         5.44%     868.000us        72.69%      11.589ms      11.589ms       0.000us         0.00%      14.854ms      14.854ms             1
                                           aten::linear         0.29%      46.000us         5.68%     906.000us      60.400us       0.000us         0.00%      11.911ms     794.067us            15
                                            aten::addmm         3.41%     543.000us         4.75%     758.000us      50.533us      11.911ms        80.19%      11.911ms     794.067us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       8.319ms        56.01%       8.319ms     831.900us            10
                                             aten::relu         0.46%      74.000us         1.51%     241.000us      24.100us       0.000us         0.00%       2.185ms     218.500us            10
                                        aten::clamp_min         0.66%     106.000us         1.05%     167.000us      16.700us       2.185ms        14.71%       2.185ms     218.500us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.185ms        14.71%       2.185ms     218.500us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.746ms        11.75%       1.746ms     116.400us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.599ms        10.76%       1.599ms     399.750us             4
                                              aten::cat         0.48%      77.000us         0.69%     110.000us      22.000us     331.000us         2.23%     331.000us      66.200us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 15.943ms
Self CUDA time total: 14.854ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         5.26%     860.000us        73.53%      12.031ms      12.031ms       0.000us         0.00%      14.725ms      14.725ms             1
                                           aten::linear         0.25%      41.000us         5.52%     904.000us      60.267us       0.000us         0.00%      11.808ms     787.200us            15
                                            aten::addmm         3.30%     540.000us         4.64%     759.000us      50.600us      11.808ms        80.19%      11.808ms     787.200us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       8.161ms        55.42%       8.161ms     906.778us             9
                                             aten::relu         0.42%      69.000us         1.45%     238.000us      23.800us       0.000us         0.00%       2.168ms     216.800us            10
                                        aten::clamp_min         0.66%     108.000us         1.03%     169.000us      16.900us       2.168ms        14.72%       2.168ms     216.800us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.168ms        14.72%       2.168ms     216.800us            10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00% 0.000us 0.00% 0.000us 0.000us 1.734ms 11.78% 1.734ms 115.600us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.673ms 11.36% 1.673ms 334.600us 5 aten::cat 0.48% 79.000us 0.72% 118.000us 23.600us 327.000us 2.22% 327.000us 65.400us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 16.362ms Self CUDA time total: 14.725ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 5.36% 863.000us 72.57% 11.678ms 11.678ms 0.000us 0.00% 15.006ms 15.006ms 1 aten::linear 0.28% 45.000us 5.72% 920.000us 61.333us 0.000us 0.00% 12.028ms 801.867us 15 aten::addmm 3.37% 543.000us 4.80% 772.000us 51.467us 12.028ms 80.15% 12.028ms 801.867us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 8.311ms 55.38% 8.311ms 923.444us 9 aten::relu 0.45% 73.000us 1.52% 244.000us 24.400us 0.000us 0.00% 2.210ms 221.000us 10 aten::clamp_min 0.68% 110.000us 1.06% 171.000us 17.100us 2.210ms 14.73% 2.210ms 221.000us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.210ms 14.73% 2.210ms 221.000us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.766ms 11.77% 1.766ms 117.733us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.706ms 11.37% 1.706ms 341.200us 5 aten::cat 0.48% 78.000us 0.69% 111.000us 22.200us 338.000us 2.25% 338.000us 67.600us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 16.091ms Self CUDA time total: 15.006ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 4.95% 863.000us 75.99% 13.243ms 13.243ms 0.000us 0.00% 14.336ms 14.336ms 1 aten::linear 0.28% 49.000us 5.26% 916.000us 61.067us 0.000us 0.00% 11.493ms 766.200us 15 aten::addmm 3.13% 546.000us 4.37% 761.000us 50.733us 11.493ms 80.17% 11.493ms 766.200us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 8.022ms 55.96% 8.022ms 802.200us 10 aten::relu 0.40% 70.000us 1.42% 248.000us 24.800us 0.000us 0.00% 2.109ms 210.900us 10 aten::clamp_min 0.62% 108.000us 1.02% 178.000us 17.800us 2.109ms 14.71% 2.109ms 210.900us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.109ms 14.71% 2.109ms 210.900us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.691ms 11.80% 1.691ms 112.733us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.545ms 10.78% 1.545ms 386.250us 4 aten::cat 0.44% 77.000us 0.63% 110.000us 22.000us 321.000us 2.24% 321.000us 64.200us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 17.428ms Self CUDA time total: 14.336ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 4.71% 883.000us 71.94% 13.498ms 13.498ms 0.000us 0.00% 17.663ms 17.663ms 1 aten::linear 0.25% 46.000us 4.96% 930.000us 62.000us 0.000us 0.00% 14.276ms 951.733us 15 aten::addmm 2.92% 548.000us 4.13% 775.000us 51.667us 14.276ms 80.82% 14.276ms 951.733us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 9.908ms 56.09% 9.908ms 1.101ms 9 aten::relu 0.39% 73.000us 1.32% 247.000us 24.700us 0.000us 0.00% 2.479ms 247.900us 10 aten::clamp_min 0.55% 104.000us 0.93% 174.000us 17.400us 2.479ms 14.03% 2.479ms 247.900us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.479ms 14.03% 2.479ms 247.900us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.106ms 11.92% 2.106ms 140.400us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.995ms 11.29% 1.995ms 399.000us 5 aten::cat 0.43% 80.000us 0.61% 114.000us 22.800us 403.000us 2.28% 403.000us 80.600us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 18.764ms Self CUDA time total: 17.663ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 6.23% 873.000us 73.63% 10.315ms 10.315ms 0.000us 0.00% 12.923ms 12.923ms 1 aten::linear 0.31% 43.000us 6.41% 898.000us 59.867us 0.000us 0.00% 10.446ms 696.400us 15 aten::addmm 3.83% 537.000us 5.37% 752.000us 50.133us 10.446ms 80.83% 10.446ms 696.400us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 7.379ms 57.10% 7.379ms 670.818us 11 aten::relu 0.49% 68.000us 1.70% 238.000us 23.800us 0.000us 0.00% 1.808ms 180.800us 10 aten::clamp_min 0.78% 109.000us 1.21% 170.000us 17.000us 1.808ms 13.99% 1.808ms 180.800us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.808ms 13.99% 1.808ms 180.800us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.552ms 12.01% 1.552ms 103.467us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.304ms 10.09% 1.304ms 434.667us 3 aten::cat 0.56% 78.000us 0.79% 111.000us 22.200us 280.000us 2.17% 280.000us 56.000us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 14.009ms Self CUDA time total: 12.923ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 6.07% 888.000us 73.34% 10.737ms 10.737ms 0.000us 0.00% 13.553ms 13.553ms 1 aten::linear 0.30% 44.000us 6.30% 922.000us 61.467us 0.000us 0.00% 10.961ms 730.733us 15 aten::addmm 3.72% 545.000us 5.26% 770.000us 51.333us 10.961ms 80.88% 10.961ms 730.733us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 7.675ms 56.63% 7.675ms 767.500us 10 aten::relu 0.50% 73.000us 1.64% 240.000us 24.000us 0.000us 0.00% 1.908ms 190.800us 10 aten::clamp_min 0.72% 106.000us 1.14% 167.000us 16.700us 1.908ms 14.08% 1.908ms 190.800us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.908ms 14.08% 1.908ms 190.800us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.627ms 12.00% 1.627ms 108.467us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.454ms 10.73% 1.454ms 363.500us 4 aten::cat 0.65% 95.000us 0.87% 128.000us 25.600us 287.000us 2.12% 287.000us 57.400us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 14.640ms Self CUDA time total: 13.553ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 6.86% 836.000us 90.11% 10.982ms 10.982ms 0.000us 0.00% 11.155ms 11.155ms 1 aten::linear 0.36% 44.000us 22.58% 2.752ms 183.467us 0.000us 0.00% 9.007ms 600.467us 15 aten::addmm 4.59% 559.000us 21.38% 2.606ms 173.733us 9.007ms 80.74% 9.007ms 600.467us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 6.216ms 55.72% 6.216ms 690.667us 9 aten::relu 0.65% 79.000us 2.03% 247.000us 24.700us 0.000us 0.00% 1.571ms 157.100us 10 aten::clamp_min 0.87% 106.000us 1.38% 168.000us 16.800us 1.571ms 14.08% 1.571ms 157.100us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.571ms 14.08% 1.571ms 157.100us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.346ms 12.07% 1.346ms 89.733us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.262ms 11.31% 1.262ms 252.400us 5 aten::cat 0.62% 75.000us 0.89% 108.000us 21.600us 234.000us 2.10% 234.000us 46.800us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 12.188ms Self CUDA time total: 11.155ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 7.50% 848.000us 87.50% 9.889ms 9.889ms 0.000us 0.00% 10.191ms 10.191ms 1 aten::linear 0.37% 42.000us 8.10% 915.000us 61.000us 0.000us 0.00% 8.255ms 550.333us 15 aten::addmm 4.80% 543.000us 6.79% 767.000us 51.133us 8.255ms 81.00% 8.255ms 550.333us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 5.859ms 57.49% 5.859ms 532.636us 11 aten::relu 0.64% 72.000us 2.14% 242.000us 24.200us 0.000us 0.00% 1.426ms 142.600us 10 aten::clamp_min 0.96% 109.000us 1.50% 170.000us 17.000us 1.426ms 13.99% 1.426ms 142.600us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.426ms 13.99% 1.426ms 142.600us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.232ms 12.09% 1.232ms 82.133us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 998.000us 9.79% 998.000us 332.667us 3 aten::cat 0.74% 84.000us 1.04% 117.000us 23.400us 202.000us 1.98% 202.000us 40.400us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 11.302ms Self CUDA time total: 10.191ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 3.30% 889.000us 70.32% 18.971ms 18.971ms 0.000us 0.00% 25.905ms 25.905ms 1 aten::linear 0.17% 45.000us 3.29% 888.000us 59.200us 0.000us 0.00% 20.881ms 1.392ms 15 aten::addmm 1.94% 523.000us 2.75% 741.000us 49.400us 20.881ms 80.61% 20.881ms 1.392ms 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 14.504ms 55.99% 14.504ms 1.612ms 9 aten::relu 0.25% 68.000us 0.87% 235.000us 23.500us 0.000us 0.00% 3.669ms 366.900us 10 aten::clamp_min 0.39% 106.000us 0.62% 167.000us 16.700us 3.669ms 14.16% 3.669ms 366.900us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 3.669ms 14.16% 3.669ms 366.900us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 3.070ms 11.85% 3.070ms 204.667us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.916ms 11.26% 2.916ms 583.200us 5 aten::cat 0.28% 76.000us 0.41% 110.000us 22.000us 644.000us 2.49% 644.000us 128.800us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 26.979ms Self CUDA time total: 25.905ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 5.41% 851.000us 72.68% 11.423ms 11.423ms 0.000us 0.00% 14.622ms 14.622ms 1 aten::linear 0.27% 43.000us 5.74% 902.000us 60.133us 0.000us 0.00% 11.834ms 788.933us 15 aten::addmm 3.44% 541.000us 4.85% 762.000us 50.800us 11.834ms 80.93% 11.834ms 788.933us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 8.296ms 56.74% 8.296ms 829.600us 10 aten::relu 0.46% 72.000us 1.53% 241.000us 24.100us 0.000us 0.00% 2.056ms 205.600us 10 aten::clamp_min 0.69% 109.000us 1.08% 169.000us 16.900us 2.056ms 14.06% 2.056ms 205.600us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.056ms 14.06% 2.056ms 205.600us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.753ms 11.99% 1.753ms 116.867us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.564ms 10.70% 1.564ms 391.000us 4 aten::cat 0.50% 79.000us 0.71% 112.000us 22.400us 311.000us 2.13% 311.000us 62.200us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 15.717ms Self CUDA time total: 14.622ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 5.25% 872.000us 73.53% 12.219ms 12.219ms 0.000us 0.00% 14.985ms 14.985ms 1 aten::linear 0.25% 41.000us 5.60% 931.000us 62.067us 0.000us 0.00% 12.057ms 803.800us 15 aten::addmm 3.35% 556.000us 4.74% 787.000us 52.467us 12.057ms 80.46% 12.057ms 803.800us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 8.350ms 55.72% 8.350ms 927.778us 9 aten::relu 0.44% 73.000us 1.43% 237.000us 23.700us 0.000us 0.00% 2.169ms 216.900us 10 aten::clamp_min 0.62% 103.000us 0.99% 164.000us 16.400us 2.169ms 14.47% 2.169ms 216.900us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.169ms 14.47% 2.169ms 216.900us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.778ms 11.87% 1.778ms 118.533us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.699ms 11.34% 1.699ms 339.800us 5 aten::cat 0.46% 77.000us 0.66% 110.000us 22.000us 328.000us 2.19% 328.000us 65.600us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 16.618ms Self CUDA time total: 14.985ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 2.70% 866.000us 69.82% 22.398ms 22.398ms 0.000us 0.00% 30.982ms 30.982ms 1 aten::linear 0.13% 41.000us 2.80% 898.000us 59.867us 0.000us 0.00% 24.804ms 1.654ms 15 aten::addmm 1.65% 528.000us 2.33% 749.000us 49.933us 24.804ms 80.06% 24.804ms 1.654ms 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 17.215ms 55.56% 17.215ms 1.913ms 9 aten::relu 0.22% 71.000us 0.75% 239.000us 23.900us 0.000us 0.00% 4.517ms 451.700us 10 aten::clamp_min 0.33% 105.000us 0.52% 168.000us 16.800us 4.517ms 14.58% 4.517ms 451.700us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 4.517ms 14.58% 4.517ms 451.700us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 3.622ms 11.69% 3.622ms 241.467us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 3.484ms 11.25% 3.484ms 696.800us 5 aten::cat 0.24% 77.000us 0.34% 110.000us 22.000us 818.000us 2.64% 818.000us 163.600us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 32.080ms Self CUDA time total: 30.982ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 7.34% 858.000us 78.73% 9.204ms 9.204ms 0.000us 0.00% 10.592ms 10.592ms 1 aten::linear 0.43% 50.000us 8.13% 951.000us 63.400us 0.000us 0.00% 8.533ms 568.867us 15 aten::addmm 4.89% 572.000us 6.81% 796.000us 53.067us 8.533ms 80.56% 8.533ms 568.867us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 5.885ms 55.56% 5.885ms 653.889us 9 aten::relu 0.62% 72.000us 2.03% 237.000us 23.700us 0.000us 0.00% 1.524ms 152.400us 10 aten::clamp_min 0.91% 106.000us 1.41% 165.000us 16.500us 1.524ms 14.39% 1.524ms 152.400us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.524ms 14.39% 1.524ms 152.400us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.269ms 11.98% 1.269ms 84.600us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.207ms 11.40% 1.207ms 241.400us 5 aten::cat 0.68% 79.000us 0.96% 112.000us 22.400us 220.000us 2.08% 220.000us 44.000us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 11.691ms Self CUDA time total: 10.592ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 4.24% 869.000us 72.34% 14.824ms 14.824ms 0.000us 0.00% 18.850ms 18.850ms 1 aten::linear 0.21% 43.000us 4.48% 918.000us 61.200us 0.000us 0.00% 15.141ms 1.009ms 15 aten::addmm 2.64% 542.000us 3.77% 773.000us 51.533us 15.141ms 80.32% 15.141ms 1.009ms 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 10.498ms 55.69% 10.498ms 1.166ms 9 aten::relu 0.33% 68.000us 1.17% 239.000us 23.900us 0.000us 0.00% 2.731ms 273.100us 10 aten::clamp_min 0.53% 109.000us 0.83% 171.000us 17.100us 2.731ms 14.49% 2.731ms 273.100us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.731ms 14.49% 2.731ms 273.100us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.223ms 11.79% 2.223ms 148.200us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.134ms 11.32% 2.134ms 426.800us 5 aten::cat 0.38% 78.000us 0.55% 112.000us 22.400us 444.000us 2.36% 444.000us 88.800us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 20.492ms Self CUDA time total: 18.850ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 4.52% 862.000us 71.79% 13.690ms 13.690ms 0.000us 0.00% 17.990ms 17.990ms 1 aten::linear 0.21% 40.000us 4.80% 916.000us 61.067us 0.000us 0.00% 14.378ms 958.533us 15 aten::addmm 2.85% 544.000us 4.02% 767.000us 51.133us 14.378ms 79.92% 14.378ms 958.533us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 9.947ms 55.29% 9.947ms 1.105ms 9 aten::relu 0.37% 70.000us 1.25% 238.000us 23.800us 0.000us 0.00% 2.677ms 267.700us 10 aten::clamp_min 0.56% 107.000us 0.88% 168.000us 16.800us 2.677ms 14.88% 2.677ms 267.700us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.677ms 14.88% 2.677ms 267.700us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.104ms 11.70% 2.104ms 140.267us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.045ms 11.37% 2.045ms 409.000us 5 aten::cat 0.42% 80.000us 0.60% 114.000us 22.800us 426.000us 2.37% 426.000us 85.200us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 19.069ms Self CUDA time total: 17.990ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 4.37% 874.000us 73.34% 14.685ms 14.685ms 0.000us 0.00% 17.826ms 17.826ms 1 aten::linear 0.28% 56.000us 4.57% 915.000us 61.000us 0.000us 0.00% 14.253ms 950.200us 15 aten::addmm 2.68% 537.000us 3.79% 759.000us 50.600us 14.253ms 79.96% 14.253ms 950.200us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 9.866ms 55.35% 9.866ms 1.096ms 9 aten::relu 0.33% 67.000us 1.17% 234.000us 23.400us 0.000us 0.00% 2.650ms 265.000us 10 aten::clamp_min 0.52% 105.000us 0.83% 167.000us 16.700us 2.650ms 14.87% 2.650ms 265.000us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.650ms 14.87% 2.650ms 265.000us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.084ms 11.69% 2.084ms 138.933us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.025ms 11.36% 2.025ms 405.000us 5 aten::cat 0.38% 76.000us 0.54% 109.000us 21.800us 418.000us 2.34% 418.000us 83.600us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 20.022ms Self CUDA time total: 17.826ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 4.81% 850.000us 72.02% 12.727ms 12.727ms 0.000us 0.00% 16.580ms 16.580ms 1 aten::linear 0.21% 37.000us 5.16% 911.000us 60.733us 0.000us 0.00% 13.261ms 884.067us 15 aten::addmm 3.04% 538.000us 4.36% 770.000us 51.333us 13.261ms 79.98% 13.261ms 884.067us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 9.171ms 55.31% 9.171ms 1.019ms 9 aten::relu 0.38% 67.000us 1.33% 235.000us 23.500us 0.000us 0.00% 2.463ms 246.300us 10 aten::clamp_min 0.61% 107.000us 0.95% 168.000us 16.800us 2.463ms 14.86% 2.463ms 246.300us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.463ms 14.86% 2.463ms 246.300us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.943ms 11.72% 1.943ms 129.533us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.883ms 11.36% 1.883ms 376.600us 5 aten::cat 0.42% 75.000us 0.62% 109.000us 21.800us 387.000us 2.33% 387.000us 77.400us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 17.671ms Self CUDA time total: 16.580ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 5.30% 857.000us 72.64% 11.738ms 11.738ms 0.000us 0.00% 15.088ms 15.088ms 1 aten::linear 0.24% 39.000us 5.66% 914.000us 60.933us 0.000us 0.00% 12.071ms 804.733us 15 aten::addmm 3.37% 545.000us 4.75% 768.000us 51.200us 12.071ms 80.00% 12.071ms 804.733us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 8.341ms 55.28% 8.341ms 926.778us 9 aten::relu 0.45% 73.000us 1.48% 239.000us 23.900us 0.000us 0.00% 2.236ms 223.600us 10 aten::clamp_min 0.66% 107.000us 1.03% 166.000us 16.600us 2.236ms 14.82% 2.236ms 223.600us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.236ms 14.82% 2.236ms 223.600us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00%  0.000us  0.00%  0.000us  0.000us  1.768ms  11.72%  1.768ms  117.867us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.712ms  11.35%  1.712ms  342.400us  5
aten::cat  0.47%  76.000us  0.72%  116.000us  23.200us  343.000us  2.27%  343.000us  68.600us  5
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 16.160ms
Self CUDA time total: 15.088ms

------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
model_inference  4.64%  869.000us  71.58%  13.396ms  13.396ms  0.000us  0.00%  17.636ms  17.636ms  1
aten::linear  0.22%  42.000us  4.78%  894.000us  59.600us  0.000us  0.00%  14.112ms  940.800us  15
aten::addmm  2.84%  532.000us  3.98%  745.000us  49.667us  14.112ms  80.02%  14.112ms  940.800us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  9.971ms  56.54%  9.971ms  906.455us  11
aten::relu  0.37%  70.000us  1.27%  238.000us  23.800us  0.000us  0.00%  2.618ms  261.800us  10
aten::clamp_min  0.57%  107.000us  0.90%  168.000us  16.800us  2.618ms  14.84%  2.618ms  261.800us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  2.618ms  14.84%  2.618ms  261.800us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  2.067ms  11.72%  2.067ms  137.800us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.786ms  10.13%  1.786ms  595.333us  3
aten::cat  0.42%  79.000us  0.60%  112.000us  22.400us  412.000us  2.34%  412.000us  82.400us  5
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 18.715ms
Self CUDA time total: 17.636ms

------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
model_inference  3.10%  879.000us  71.14%  20.147ms  20.147ms  0.000us  0.00%  26.692ms  26.692ms  1
aten::linear  0.14%  41.000us  3.27%  927.000us  61.800us  0.000us  0.00%  20.763ms  1.384ms  15
aten::addmm  1.96%  554.000us  2.74%  776.000us  51.733us  20.763ms  77.79%  20.763ms  1.384ms  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  14.286ms  53.52%  14.286ms  1.587ms  9
aten::relu  0.26%  73.000us  0.85%  241.000us  24.100us  0.000us  0.00%  4.436ms  443.600us  10
aten::clamp_min  0.37%  105.000us  0.59%  168.000us  16.800us  4.436ms  16.62%  4.436ms  443.600us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  4.436ms  16.62%  4.436ms  443.600us  10
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  3.069ms  11.50%  3.069ms  613.800us  5
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  2.963ms  11.10%  2.963ms  197.533us  15
aten::cat  0.28%  79.000us  0.40%  112.000us  22.400us  762.000us  2.85%  762.000us  152.400us  5
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 28.322ms
Self CUDA time total: 26.692ms

------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
model_inference  3.30%  946.000us  75.30%  21.605ms  21.605ms  0.000us  0.00%  27.616ms  27.616ms  1
aten::linear  0.16%  46.000us  3.41%  978.000us  65.200us  0.000us  0.00%  21.447ms  1.430ms  15
aten::addmm  2.02%  580.000us  2.82%  810.000us  54.000us  21.447ms  77.66%  21.447ms  1.430ms  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  14.747ms  53.40%  14.747ms  1.639ms  9
aten::relu  0.29%  83.000us  17.45%  5.005ms  500.500us  0.000us  0.00%  4.606ms  460.600us  10
aten::clamp_min  0.59%  168.000us  17.16%  4.922ms  492.200us  4.606ms  16.68%  4.606ms  460.600us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  4.606ms  16.68%  4.606ms  460.600us  10
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  3.177ms  11.50%  3.177ms  635.400us  5
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  3.060ms  11.08%  3.060ms  204.000us  15
aten::cat  0.28%  81.000us  0.40%  116.000us  23.200us  806.000us  2.92%  806.000us  161.200us  5
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 28.690ms
Self CUDA time total: 27.616ms

------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
model_inference  6.89%  876.000us  76.03%  9.666ms  9.666ms  0.000us  0.00%  11.062ms  11.062ms  1
aten::linear  0.35%  45.000us  7.40%  941.000us  62.733us  0.000us  0.00%  8.668ms  577.867us  15
aten::addmm  4.43%  563.000us  6.23%  792.000us  52.800us  8.668ms  78.36%  8.668ms  577.867us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  5.939ms  53.69%  5.939ms  659.889us  9
aten::relu  0.64%  81.000us  1.94%  247.000us  24.700us  0.000us  0.00%  1.819ms  181.900us  10
aten::clamp_min  0.84%  107.000us  1.31%  166.000us  16.600us  1.819ms  16.44%  1.819ms  181.900us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  1.819ms  16.44%  1.819ms  181.900us  10
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.287ms  11.63%  1.287ms  257.400us  5
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.259ms  11.38%  1.259ms  83.933us  15
aten::cat  0.62%  79.000us  0.88%  112.000us  22.400us  252.000us  2.28%  252.000us  50.400us  5
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 12.713ms
Self CUDA time total: 11.062ms

------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
model_inference  8.66%  864.000us  88.71%  8.851ms  8.851ms  0.000us  0.00%  8.882ms  8.882ms  1
aten::linear  0.42%  42.000us  9.09%  907.000us  60.467us  0.000us  0.00%  6.964ms  464.267us  15
aten::addmm  5.49%  548.000us  7.64%  762.000us  50.800us  6.964ms  78.41%  6.964ms  464.267us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  4.831ms  54.39%  4.831ms  483.100us  10
aten::relu  0.71%  71.000us  2.37%  236.000us  23.600us  0.000us  0.00%  1.454ms  145.400us  10
aten::clamp_min  1.05%  105.000us  1.65%  165.000us  16.500us  1.454ms  16.37%  1.454ms  145.400us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  1.454ms  16.37%  1.454ms  145.400us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.012ms  11.39%  1.012ms  67.467us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  972.000us  10.94%  972.000us  243.000us  4
aten::cat  0.78%  78.000us  1.11%  111.000us  22.200us  187.000us  2.11%  187.000us  37.400us  5
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 9.977ms
Self CUDA time total: 8.882ms

------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
model_inference  5.94%  896.000us  80.08%  12.081ms  12.081ms  0.000us  0.00%  10.855ms  10.855ms  1
aten::linear  0.31%  47.000us  6.19%  934.000us  62.267us  0.000us  0.00%  8.788ms  585.867us  15
aten::addmm  3.70%  558.000us  5.17%  780.000us  52.000us  8.788ms  80.96%  8.788ms  585.867us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  6.141ms  56.57%  6.141ms  614.100us  10
aten::relu  0.45%  68.000us  1.60%  241.000us  24.100us  0.000us  0.00%  1.515ms  151.500us  10
aten::clamp_min  0.74%  111.000us  1.15%  173.000us  17.300us  1.515ms  13.96%  1.515ms  151.500us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  1.515ms  13.96%  1.515ms  151.500us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.306ms  12.03%  1.306ms  87.067us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.161ms  10.70%  1.161ms  290.250us  4
aten::cat  0.54%  81.000us  0.77%  116.000us  23.200us  222.000us  2.05%  222.000us  44.400us  5
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 15.087ms
Self CUDA time total: 10.855ms

------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
model_inference  6.10%  875.000us  73.69%  10.579ms  10.579ms  0.000us  0.00%  13.251ms  13.251ms  1
aten::linear  0.29%  42.000us  6.58%  945.000us  63.000us  0.000us  0.00%  10.713ms  714.200us  15
aten::addmm  3.96%  568.000us  5.53%  794.000us  52.933us  10.713ms  80.85%  10.713ms  714.200us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  7.417ms  55.97%  7.417ms  824.111us  9
aten::relu  0.51%  73.000us  1.67%  240.000us  24.000us  0.000us  0.00%  1.866ms  186.600us  10
aten::clamp_min  0.72%  104.000us  1.16%  167.000us  16.700us  1.866ms  14.08%  1.866ms  186.600us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  1.866ms  14.08%  1.866ms  186.600us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.594ms  12.03%  1.594ms  106.267us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.501ms  11.33%  1.501ms  300.200us  5
aten::cat  0.54%  77.000us  0.77%  110.000us  22.000us  286.000us  2.16%  286.000us  57.200us  5
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 14.356ms
Self CUDA time total: 13.251ms

------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
model_inference  5.15%  872.000us  72.15%  12.224ms  12.224ms  0.000us  0.00%  15.848ms  15.848ms  1
aten::linear  0.26%  44.000us  5.44%  922.000us  61.467us  0.000us  0.00%  12.824ms  854.933us  15
aten::addmm  3.22%  546.000us  4.55%  771.000us  51.400us  12.824ms  80.92%  12.824ms  854.933us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  8.884ms  56.06%  8.884ms  987.111us  9
aten::relu  0.42%  72.000us  1.43%  243.000us  24.300us  0.000us  0.00%  2.227ms  222.700us  10
aten::clamp_min  0.66%  111.000us  1.01%  171.000us  17.100us  2.227ms  14.05%  2.227ms  222.700us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  2.227ms  14.05%  2.227ms  222.700us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.895ms  11.96%  1.895ms  126.333us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.792ms  11.31%  1.792ms  358.400us  5
aten::cat  0.45%  77.000us  0.66%  111.000us  22.200us  348.000us  2.20%  348.000us  69.600us  5
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 16.942ms
Self CUDA time total: 15.848ms

------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
model_inference  4.34%  888.000us  71.52%  14.651ms  14.651ms  0.000us  0.00%  19.398ms  19.398ms  1
aten::linear  0.21%  42.000us  4.49%  919.000us  61.267us  0.000us  0.00%  15.658ms  1.044ms  15
aten::addmm  2.68%  548.000us  3.76%  770.000us  51.333us  15.658ms  80.72%  15.658ms  1.044ms  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  10.853ms  55.95%  10.853ms  1.206ms  9
aten::relu  0.36%  74.000us  1.19%  244.000us  24.400us  0.000us  0.00%  2.737ms  273.700us  10
aten::clamp_min  0.53%  108.000us  0.83%  170.000us  17.000us  2.737ms  14.11%  2.737ms  273.700us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  2.737ms  14.11%  2.737ms  273.700us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  2.308ms  11.90%  2.308ms  153.867us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  2.187ms  11.27%  2.187ms  437.400us  5
aten::cat  0.39%  80.000us  0.55%  113.000us  22.600us  451.000us  2.32%  451.000us  90.200us  5
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 20.484ms
Self CUDA time total: 19.398ms

------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
model_inference  5.31%  876.000us  75.67%  12.477ms  12.477ms  0.000us  0.00%  13.898ms  13.898ms  1
aten::linear  0.29%  47.000us  5.75%  948.000us  63.200us  0.000us  0.00%  11.239ms  749.267us  15
aten::addmm  3.51%  578.000us  4.83%  797.000us  53.133us  11.239ms  80.87%  11.239ms  749.267us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  7.860ms  56.55%  7.860ms  786.000us  10
aten::relu  0.44%  72.000us  1.53%  252.000us  25.200us  0.000us  0.00%  1.953ms  195.300us  10
aten::clamp_min  0.72%  118.000us  1.09%  180.000us  18.000us  1.953ms  14.05%  1.953ms  195.300us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  1.953ms  14.05%  1.953ms  195.300us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.664ms  11.97%  1.664ms  110.933us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.491ms  10.73%  1.491ms  372.750us  4
aten::cat  0.48%  79.000us  0.69%  113.000us  22.600us  300.000us  2.16%  300.000us  60.000us  5
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 16.489ms
Self CUDA time total: 13.898ms

------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
model_inference  7.45%  871.000us  80.47%  9.407ms  9.407ms  0.000us  0.00%  10.605ms  10.605ms  1
aten::linear  0.38%  45.000us  7.86%  919.000us  61.267us  0.000us  0.00%  8.585ms  572.333us  15
aten::addmm  4.62%  540.000us  6.59%  770.000us  51.333us  8.585ms  80.95%  8.585ms  572.333us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  6.006ms  56.63%  6.006ms  600.600us  10
aten::relu  0.62%  73.000us  2.01%  235.000us  23.500us  0.000us  0.00%  1.484ms  148.400us  10
aten::clamp_min  0.87%  102.000us  1.39%  162.000us  16.200us  1.484ms  13.99%  1.484ms  148.400us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  1.484ms  13.99%  1.484ms  148.400us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.278ms  12.05%  1.278ms  85.200us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.133ms  10.68%  1.133ms  283.250us  4
aten::cat  0.68%  80.000us  0.98%  114.000us  22.800us  212.000us  2.00%  212.000us  42.400us  5
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 11.690ms
Self CUDA time total: 10.605ms

------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
model_inference  6.61%  859.000us  74.29%  9.654ms  9.654ms  0.000us  0.00%  11.912ms  11.912ms  1
aten::linear  0.32%  42.000us  7.03%  913.000us  60.867us  0.000us  0.00%  9.634ms  642.267us  15
aten::addmm  4.16%  541.000us  5.91%  768.000us  51.200us  9.634ms  80.88%  9.634ms  642.267us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  6.725ms  56.46%  6.725ms  672.500us  10
aten::relu  0.64%  83.000us  1.92%  250.000us  25.000us  0.000us  0.00%  1.667ms  166.700us  10
aten::clamp_min  0.82%  107.000us  1.29%  167.000us  16.700us  1.667ms  13.99%  1.667ms  166.700us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  1.667ms  13.99%  1.667ms  166.700us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.432ms  12.02%  1.432ms  95.467us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.282ms  10.76%  1.282ms  320.500us  4
aten::cat  0.59%  77.000us  0.85%  110.000us  22.000us  250.000us  2.10%  250.000us  50.000us  5
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 12.995ms
Self CUDA time total: 11.912ms

------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
model_inference  4.90%  851.000us  72.48%  12.597ms  12.597ms  0.000us  0.00%  16.292ms  16.292ms  1
aten::linear  0.25%  44.000us  5.22%  908.000us  60.533us  0.000us  0.00%  13.168ms  877.867us  15
aten::addmm  3.10%  538.000us  4.38%  761.000us  50.733us  13.168ms  80.82%  13.168ms  877.867us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  9.273ms  56.92%  9.273ms  843.000us  11
aten::relu  0.45%  78.000us  1.41%  245.000us  24.500us  0.000us  0.00%  2.283ms  228.300us  10
aten::clamp_min  0.62%  107.000us  0.96%  167.000us  16.700us  2.283ms  14.01%  2.283ms  228.300us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  2.283ms  14.01%  2.283ms  228.300us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.950ms  11.97%  1.950ms  130.000us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.679ms  10.31%  1.679ms  559.667us  3
aten::cat  0.45%  78.000us  0.64%  111.000us  22.200us  371.000us  2.28%  371.000us  74.200us  5
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 17.379ms
Self CUDA time total: 16.292ms

------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
model_inference  3.78%  867.000us  71.09%  16.308ms  16.308ms  0.000us  0.00%  21.869ms  21.869ms  1
aten::linear  0.16%  37.000us  3.99%  915.000us  61.000us  0.000us  0.00%  17.629ms  1.175ms  15
aten::addmm  2.41%  553.000us  3.37%  774.000us  51.600us  17.629ms  80.61%  17.629ms  1.175ms  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  12.237ms  55.96%  12.237ms  1.360ms  9
aten::relu  0.32%  74.000us  1.07%  245.000us  24.500us  0.000us  0.00%  3.094ms  309.400us  10
aten::clamp_min  0.48%  109.000us  0.75%  171.000us  17.100us  3.094ms  14.15%  3.094ms  309.400us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  3.094ms  14.15%  3.094ms  309.400us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  2.597ms  11.88%  2.597ms  173.133us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  2.462ms  11.26%  2.462ms  492.400us  5
aten::cat  0.38%  87.000us  0.52%  120.000us  24.000us  528.000us  2.41%  528.000us  105.600us  5
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 22.941ms
Self CUDA time total: 21.869ms

------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
model_inference  8.46%  845.000us  76.67%  7.662ms  7.662ms  0.000us  0.00%  8.855ms  8.855ms  1
aten::linear  0.48%  48.000us  9.11%  910.000us  60.667us  0.000us  0.00%  7.147ms  476.467us  15
aten::addmm  5.40%  540.000us  7.63%  762.000us  50.800us  7.147ms  80.71%  7.147ms  476.467us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  4.920ms  55.56%  4.920ms  546.667us  9
aten::relu  0.79%  79.000us  2.47%  247.000us  24.700us  0.000us  0.00%  1.250ms  125.000us  10
aten::clamp_min  1.07%  107.000us  1.68%  168.000us  16.800us  1.250ms  14.12%  1.250ms  125.000us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  1.250ms  14.12%  1.250ms  125.000us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.070ms  12.08%  1.070ms  71.333us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.007ms  11.37%  1.007ms  201.400us  5
aten::cat  0.76%  76.000us  1.10%  110.000us  22.000us  175.000us  1.98%  175.000us  35.000us  5
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 9.993ms
Self CUDA time total: 8.855ms

------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
model_inference  4.31%  882.000us  71.36%  14.590ms  14.590ms  0.000us  0.00%  19.364ms  19.364ms  1
aten::linear  0.21%  42.000us  4.44%  908.000us  60.533us  0.000us  0.00%  15.584ms  1.039ms  15
aten::addmm  2.63%  537.000us  3.72%  761.000us  50.733us  15.584ms  80.48%  15.584ms  1.039ms  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  10.790ms  55.72%  10.790ms  1.199ms  9
aten::relu  0.35%  72.000us  1.19%  243.000us  24.300us  0.000us  0.00%  2.781ms  278.100us  10
aten::clamp_min  0.54%  110.000us  0.84%  171.000us  17.100us  2.781ms  14.36%  2.781ms  278.100us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  2.781ms  14.36%  2.781ms  278.100us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  2.290ms  11.83%  2.290ms  152.667us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  2.188ms  11.30%  2.188ms  437.600us  5
aten::cat  0.39%  79.000us  0.55%  113.000us  22.600us  450.000us  2.32%  450.000us  90.000us  5
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 20.446ms
Self CUDA time total: 19.364ms

------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
model_inference  6.08%  876.000us  73.65%  10.605ms  10.605ms  0.000us  0.00%  13.308ms  13.308ms  1
aten::linear  0.35%  51.000us  6.39%  920.000us  61.333us  0.000us  0.00%  10.726ms  715.067us  15
aten::addmm  3.77%  543.000us  5.31%  764.000us  50.933us  10.726ms  80.60%  10.726ms  715.067us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  7.482ms  56.22%  7.482ms  748.200us  10
aten::relu  0.49%  70.000us  1.68%  242.000us  24.200us  0.000us  0.00%  1.902ms  190.200us  10
aten::clamp_min  0.76%  109.000us  1.19%  172.000us  17.200us  1.902ms  14.29%  1.902ms  190.200us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  1.902ms  14.29%  1.902ms  190.200us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.590ms  11.95%  1.590ms  106.000us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.438ms  10.81%  1.438ms  359.500us  4
aten::cat  0.54%  78.000us  0.77%  111.000us  22.200us  288.000us  2.16%  288.000us  57.600us  5
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 14.399ms
Self CUDA time total: 13.308ms

------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
model_inference  7.01%  860.000us  74.77%  9.167ms  9.167ms  0.000us  0.00%  11.171ms  11.171ms  1
aten::linear  0.34%  42.000us  7.46%  915.000us  61.000us  0.000us  0.00%  9.004ms  600.267us  15
aten::addmm  4.47%  548.000us  6.23%  764.000us  50.933us  9.004ms  80.60%  9.004ms  600.267us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  6.281ms  56.23%  6.281ms  628.100us  10
aten::relu  0.56%  69.000us  1.95%  239.000us  23.900us  0.000us  0.00%  1.598ms  159.800us  10
aten::clamp_min  0.91%  111.000us  1.39%  170.000us  17.000us  1.598ms  14.30%  1.598ms  159.800us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  1.598ms  14.30%  1.598ms  159.800us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.339ms  11.99%  1.339ms  89.267us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.200ms  10.74%  1.200ms  300.000us  4
aten::cat  0.62%  76.000us  0.89%  109.000us  21.800us  233.000us  2.09%  233.000us  46.600us  5
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 12.260ms
Self CUDA time total: 11.171ms

------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
model_inference  4.43%  861.000us  71.77%  13.943ms  13.943ms  0.000us  0.00%  18.339ms  18.339ms  1
aten::linear  0.23%  45.000us  4.65%  904.000us  60.267us  0.000us  0.00%  14.777ms  985.133us  15
aten::addmm  2.76%  536.000us  3.89%  756.000us  50.400us  14.777ms  80.58%  14.777ms  985.133us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  10.246ms  55.87%  10.246ms  1.138ms  9
aten::relu  0.36%  70.000us  1.23%  238.000us  23.800us  0.000us  0.00%  2.625ms  262.500us  10
aten::clamp_min  0.56%  108.000us  0.86%  168.000us  16.800us  2.625ms  14.31%  2.625ms  262.500us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  2.625ms  14.31%  2.625ms  262.500us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00% 0.000us 0.00% 0.000us 0.000us 2.174ms 11.85% 2.174ms 144.933us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.075ms 11.31% 2.075ms 415.000us 5 aten::cat 0.40% 78.000us 0.57% 111.000us 22.200us 421.000us 2.30% 421.000us 84.200us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 19.426ms Self CUDA time total: 18.339ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 4.56% 882.000us 72.66% 14.053ms 14.053ms 0.000us 0.00% 17.662ms 17.662ms 1 aten::linear 0.20% 39.000us 4.72% 913.000us 60.867us 0.000us 0.00% 14.221ms 948.067us 15 aten::addmm 2.84% 549.000us 3.97% 768.000us 51.200us 14.221ms 80.52% 14.221ms 948.067us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 9.859ms 55.82% 9.859ms 1.095ms 9 aten::relu 0.35% 68.000us 1.26% 243.000us 24.300us 0.000us 0.00% 2.531ms 253.100us 10 aten::clamp_min 0.59% 114.000us 0.90% 175.000us 17.500us 2.531ms 14.33% 2.531ms 253.100us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.531ms 14.33% 2.531ms 253.100us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.095ms 11.86% 2.095ms 139.667us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.996ms 11.30% 1.996ms 399.200us 5 aten::cat 0.40% 77.000us 0.57% 110.000us 22.000us 408.000us 2.31% 408.000us 81.600us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 19.342ms Self CUDA time total: 17.662ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 3.97% 870.000us 72.02% 15.775ms 15.775ms 0.000us 0.00% 20.246ms 20.246ms 1 aten::linear 0.20% 43.000us 4.15% 909.000us 60.600us 0.000us 0.00% 16.274ms 1.085ms 15 aten::addmm 2.48% 544.000us 3.49% 764.000us 50.933us 16.274ms 80.38% 16.274ms 1.085ms 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 11.283ms 55.73% 11.283ms 1.254ms 9 aten::relu 0.32% 70.000us 1.10% 241.000us 24.100us 0.000us 0.00% 2.915ms 291.500us 10 aten::clamp_min 0.51% 111.000us 0.78% 171.000us 17.100us 2.915ms 14.40% 2.915ms 291.500us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.915ms 14.40% 2.915ms 291.500us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.391ms 11.81% 2.391ms 159.400us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.290ms 11.31% 2.290ms 458.000us 5 aten::cat 0.35% 77.000us 0.50% 110.000us 22.000us 480.000us 2.37% 480.000us 96.000us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 21.905ms Self CUDA time total: 20.246ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 3.68% 870.000us 70.85% 16.768ms 16.768ms 0.000us 0.00% 22.593ms 22.593ms 1 aten::linear 0.18% 43.000us 3.79% 898.000us 59.867us 0.000us 0.00% 18.156ms 1.210ms 15 aten::addmm 2.27% 538.000us 3.19% 755.000us 50.333us 18.156ms 80.36% 18.156ms 1.210ms 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 12.595ms 55.75% 12.595ms 1.399ms 9 aten::relu 0.29% 68.000us 1.00% 237.000us 23.700us 0.000us 0.00% 3.262ms 326.200us 10 aten::clamp_min 0.46% 110.000us 0.71% 169.000us 16.900us 3.262ms 14.44% 3.262ms 326.200us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 3.262ms 14.44% 3.262ms 326.200us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.666ms 11.80% 2.666ms 177.733us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.551ms 11.29% 2.551ms 510.200us 5 aten::cat 0.33% 78.000us 0.48% 113.000us 22.600us 549.000us 2.43% 549.000us 109.800us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 23.668ms Self CUDA time total: 22.593ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 2.54% 929.000us 74.05% 27.045ms 27.045ms 0.000us 0.00% 35.149ms 35.149ms 1 aten::linear 0.12% 45.000us 5.14% 1.879ms 125.267us 0.000us 0.00% 28.142ms 1.876ms 15 aten::addmm 1.60% 585.000us 4.69% 1.712ms 114.133us 28.142ms 80.06% 28.142ms 1.876ms 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 19.540ms 55.59% 19.540ms 2.171ms 9 aten::relu 0.23% 85.000us 13.08% 4.777ms 477.700us 0.000us 0.00% 5.093ms 509.300us 10 aten::clamp_min 0.47% 172.000us 12.85% 4.692ms 469.200us 5.093ms 14.49% 5.093ms 509.300us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 5.093ms 14.49% 5.093ms 509.300us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 4.118ms 11.72% 4.118ms 274.533us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 3.949ms 11.24% 3.949ms 789.800us 5 aten::cat 0.22% 82.000us 0.32% 116.000us 23.200us 959.000us 2.73% 959.000us 191.800us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 36.522ms Self CUDA time total: 35.149ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 9.36% 861.000us 84.28% 7.753ms 7.753ms 0.000us 0.00% 8.080ms 8.080ms 1 aten::linear 0.47% 43.000us 16.40% 1.509ms 100.600us 0.000us 0.00% 6.348ms 423.200us 15 aten::addmm 5.91% 544.000us 14.84% 1.365ms 91.000us 6.348ms 78.56% 6.348ms 423.200us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 4.399ms 54.44% 4.399ms 439.900us 10 aten::relu 0.75% 69.000us 2.58% 237.000us 23.700us 0.000us 0.00% 1.318ms 131.800us 10 aten::clamp_min 1.18% 109.000us 1.83% 168.000us 16.800us 1.318ms 16.31% 1.318ms 131.800us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.318ms 16.31% 1.318ms 131.800us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 924.000us 11.44% 924.000us 61.600us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 884.000us 10.94% 884.000us 221.000us 4 aten::cat 0.87% 80.000us 1.29% 119.000us 23.800us 166.000us 2.05% 166.000us 33.200us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 9.199ms Self CUDA time total: 8.080ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 6.34% 860.000us 75.61% 10.254ms 10.254ms 0.000us 0.00% 11.853ms 11.853ms 1 aten::linear 0.32% 44.000us 6.82% 925.000us 61.667us 0.000us 0.00% 9.278ms 618.533us 15 aten::addmm 4.03% 547.000us 5.71% 775.000us 51.667us 9.278ms 78.28% 9.278ms 618.533us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 6.356ms 53.62% 6.356ms 706.222us 9 aten::relu 0.56% 76.000us 1.80% 244.000us 24.400us 0.000us 0.00% 1.952ms 195.200us 10 aten::clamp_min 0.81% 110.000us 1.24% 168.000us 16.800us 1.952ms 16.47% 1.952ms 195.200us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.952ms 16.47% 1.952ms 195.200us 10 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.376ms 11.61% 1.376ms 275.200us 5 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.339ms 11.30% 1.339ms 89.267us 15 aten::cat 0.81% 110.000us 1.05% 143.000us 28.600us 277.000us 2.34% 277.000us 55.400us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 13.562ms Self CUDA time total: 11.853ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 5.91% 871.000us 73.34% 10.818ms 10.818ms 0.000us 0.00% 13.637ms 13.637ms 1 aten::linear 0.31% 45.000us 6.16% 908.000us 60.533us 0.000us 0.00% 11.007ms 733.800us 15 aten::addmm 3.63% 536.000us 5.19% 766.000us 51.067us 11.007ms 80.71% 11.007ms 733.800us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 7.618ms 55.86% 7.618ms 846.444us 9 aten::relu 0.47% 69.000us 1.62% 239.000us 23.900us 0.000us 0.00% 1.920ms 192.000us 10 aten::clamp_min 0.73% 108.000us 1.15% 170.000us 17.000us 1.920ms 14.08% 1.920ms 192.000us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.920ms 14.08% 1.920ms 192.000us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.638ms 12.01% 1.638ms 109.200us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.542ms 11.31% 1.542ms 308.400us 5 aten::cat 0.54% 80.000us 0.77% 113.000us 22.600us 300.000us 2.20% 300.000us 60.000us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 14.750ms Self CUDA time total: 13.637ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 6.65% 869.000us 74.07% 9.684ms 9.684ms 0.000us 0.00% 11.973ms 11.973ms 1 aten::linear 0.29% 38.000us 7.03% 919.000us 61.267us 0.000us 0.00% 9.694ms 646.267us 15 aten::addmm 4.18% 546.000us 5.90% 771.000us 51.400us 9.694ms 80.97% 9.694ms 646.267us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 6.708ms 56.03% 6.708ms 745.333us 9 aten::relu 0.54% 71.000us 1.82% 238.000us 23.800us 0.000us 0.00% 1.681ms 168.100us 10 aten::clamp_min 0.82% 107.000us 1.28% 167.000us 16.700us 1.681ms 14.04% 1.681ms 168.100us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.681ms 14.04% 1.681ms 168.100us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.444ms 12.06% 1.444ms 96.267us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.362ms 11.38% 1.362ms 272.400us 5 aten::cat 0.59% 77.000us 0.85% 111.000us 22.200us 248.000us 2.07% 248.000us 49.600us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 13.074ms Self CUDA time total: 11.973ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 6.17% 857.000us 73.66% 10.231ms 10.231ms 0.000us 0.00% 12.801ms 12.801ms 1 aten::linear 0.31% 43.000us 6.54% 909.000us 60.600us 0.000us 0.00% 10.353ms 690.200us 15 aten::addmm 3.91% 543.000us 5.51% 765.000us 51.000us 10.353ms 80.88% 10.353ms 690.200us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 7.161ms 55.94% 7.161ms 795.667us 9 aten::relu 0.48% 67.000us 1.69% 235.000us 23.500us 0.000us 0.00% 1.797ms 179.700us 10 aten::clamp_min 0.78% 108.000us 1.21% 168.000us 16.800us 1.797ms 14.04% 1.797ms 179.700us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.797ms 14.04% 1.797ms 179.700us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.539ms 12.02% 1.539ms 102.600us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.448ms 11.31% 1.448ms 289.600us 5 aten::cat 0.60% 84.000us 0.84% 117.000us 23.400us 273.000us 2.13% 273.000us 54.600us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 13.890ms Self CUDA time total: 12.801ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 3.47% 858.000us 70.65% 17.484ms 17.484ms 0.000us 0.00% 23.661ms 23.661ms 1 aten::linear 0.16% 39.000us 3.66% 905.000us 60.333us 0.000us 0.00% 19.078ms 1.272ms 15 aten::addmm 2.17% 538.000us 3.10% 767.000us 51.133us 19.078ms 80.63% 19.078ms 1.272ms 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 13.249ms 56.00% 13.249ms 1.472ms 9 aten::relu 0.27% 68.000us 0.99% 244.000us 24.400us 0.000us 0.00% 3.354ms 335.400us 10 aten::clamp_min 0.43% 106.000us 0.71% 176.000us 17.600us 3.354ms 14.18% 3.354ms 335.400us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 3.354ms 14.18% 3.354ms 335.400us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.807ms 11.86% 2.807ms 187.133us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.667ms 11.27% 2.667ms 533.400us 5 aten::cat 0.31% 76.000us 0.44% 110.000us 22.000us 570.000us 2.41% 570.000us 114.000us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 24.746ms Self CUDA time total: 23.661ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 5.41% 881.000us 74.02% 12.051ms 12.051ms 0.000us 0.00% 14.574ms 14.574ms 1 aten::linear 0.29% 48.000us 5.71% 929.000us 61.933us 0.000us 0.00% 11.793ms 786.200us 15 aten::addmm 3.43% 559.000us 4.79% 780.000us 52.000us 11.793ms 80.92% 11.793ms 786.200us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 8.175ms 56.09% 8.175ms 908.333us 9 aten::relu 0.44% 71.000us 1.46% 238.000us 23.800us 0.000us 0.00% 2.042ms 204.200us 10 aten::clamp_min 0.66% 108.000us 1.03% 167.000us 16.700us 2.042ms 14.01% 2.042ms 204.200us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.042ms 14.01% 2.042ms 204.200us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.744ms 11.97% 1.744ms 116.267us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.650ms 11.32% 1.650ms 330.000us 5 aten::cat 0.50% 81.000us 0.69% 113.000us 22.600us 315.000us 2.16% 315.000us 63.000us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 16.281ms Self CUDA time total: 14.574ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 4.95% 870.000us 72.17% 12.680ms 12.680ms 0.000us 0.00% 16.477ms 16.477ms 1 aten::linear 0.27% 47.000us 5.22% 918.000us 61.200us 0.000us 0.00% 13.324ms 888.267us 15 aten::addmm 3.02% 530.000us 4.37% 767.000us 51.133us 13.324ms 80.86% 13.324ms 888.267us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 9.227ms 56.00% 9.227ms 1.025ms 9 aten::relu 0.39% 68.000us 1.35% 238.000us 23.800us 0.000us 0.00% 2.314ms 231.400us 10 aten::clamp_min 0.62% 109.000us 0.97% 170.000us 17.000us 2.314ms 14.04% 2.314ms 231.400us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.314ms 14.04% 2.314ms 231.400us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.966ms 11.93% 1.966ms 131.067us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.862ms 11.30% 1.862ms 372.400us 5 aten::cat 0.43% 76.000us 0.62% 109.000us 21.800us 367.000us 2.23% 367.000us 73.400us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 17.570ms Self CUDA time total: 16.477ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 9.06% 876.000us 83.89% 8.108ms 8.108ms 0.000us 0.00% 8.485ms 8.485ms 1 aten::linear 0.48% 46.000us 9.43% 911.000us 60.733us 0.000us 0.00% 6.875ms 458.333us 15 aten::addmm 5.56% 537.000us 7.84% 758.000us 50.533us 6.875ms 81.03% 6.875ms 458.333us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 4.856ms 57.23% 4.856ms 441.455us 11 aten::relu 0.78% 75.000us 2.56% 247.000us 24.700us 0.000us 0.00% 1.174ms 117.400us 10 aten::clamp_min 1.15% 111.000us 1.78% 172.000us 17.200us 1.174ms 13.84% 1.174ms 117.400us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.174ms 13.84% 1.174ms 117.400us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.034ms 12.19% 1.034ms 68.933us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 849.000us 10.01% 849.000us 283.000us 3 aten::cat 0.79% 76.000us 1.13% 109.000us 21.800us 165.000us 1.94% 165.000us 33.000us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 9.665ms Self CUDA time total: 8.485ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 7.21% 877.000us 77.22% 9.389ms 9.389ms 0.000us 0.00% 10.233ms 10.233ms 1 aten::linear 0.36% 44.000us 7.80% 948.000us 63.200us 0.000us 0.00% 8.281ms 552.067us 15 aten::addmm 4.66% 567.000us 6.53% 794.000us 52.933us 8.281ms 80.92% 8.281ms 552.067us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 5.845ms 57.12% 5.845ms 531.364us 11 aten::relu 0.62% 75.000us 2.01% 245.000us 24.500us 0.000us 0.00% 1.432ms 143.200us 10 aten::clamp_min 0.88% 107.000us 1.40% 170.000us 17.000us 1.432ms 13.99% 1.432ms 143.200us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.432ms 13.99% 1.432ms 143.200us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.236ms 12.08% 1.236ms 82.400us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.029ms 10.06% 1.029ms 343.000us 3 aten::cat 0.64% 78.000us 0.93% 113.000us 22.600us 209.000us 2.04% 209.000us 41.800us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 12.159ms Self CUDA time total: 10.233ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 4.68% 882.000us 72.86% 13.720ms 13.720ms 0.000us 0.00% 17.105ms 17.105ms 1 aten::linear 0.21% 40.000us 4.88% 919.000us 61.267us 0.000us 0.00% 13.838ms 922.533us 15 aten::addmm 2.94% 554.000us 4.09% 770.000us 51.333us 13.838ms 80.90% 13.838ms 922.533us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 9.587ms 56.05% 9.587ms 1.065ms 9 aten::relu 0.38% 71.000us 1.31% 247.000us 24.700us 0.000us 0.00% 2.393ms 239.300us 10 aten::clamp_min 0.62% 116.000us 0.93% 176.000us 17.600us 2.393ms 13.99% 2.393ms 239.300us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.393ms 13.99% 2.393ms 239.300us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.043ms 11.94% 2.043ms 136.200us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.934ms 11.31% 1.934ms 386.800us 5 aten::cat 0.42% 80.000us 0.60% 113.000us 22.600us 386.000us 2.26% 386.000us 77.200us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 18.830ms Self CUDA time total: 17.105ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 7.53% 872.000us 75.16% 8.706ms 8.706ms 0.000us 0.00% 10.493ms 10.493ms 1 aten::linear 0.36% 42.000us 7.88% 913.000us 60.867us 0.000us 0.00% 8.499ms 566.600us 15 aten::addmm 4.66% 540.000us 6.60% 765.000us 51.000us 8.499ms 81.00% 8.499ms 566.600us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 5.950ms 56.70% 5.950ms 595.000us 10 aten::relu 0.62% 72.000us 2.12% 246.000us 24.600us 0.000us 0.00% 1.466ms 146.600us 10 aten::clamp_min 0.98% 114.000us 1.50% 174.000us 17.400us 1.466ms 13.97% 1.466ms 146.600us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.466ms 13.97% 1.466ms 146.600us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.269ms 12.09% 1.269ms 84.600us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.122ms 10.69% 1.122ms 280.500us 4 aten::cat 0.67% 78.000us 0.96% 111.000us 22.200us 211.000us 2.01% 211.000us 42.200us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 11.584ms Self CUDA time total: 10.493ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 3.93% 867.000us 71.10% 15.679ms 15.679ms 0.000us 0.00% 20.983ms 20.983ms 1 aten::linear 0.19% 42.000us 4.11% 906.000us 60.400us 0.000us 0.00% 16.927ms 1.128ms 15 aten::addmm 2.45% 541.000us 3.46% 762.000us 50.800us 16.927ms 80.67% 16.927ms 1.128ms 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 11.747ms 55.98% 11.747ms 1.305ms 9 aten::relu 0.33% 72.000us 1.10% 242.000us 24.200us 0.000us 0.00% 2.963ms 296.300us 10 aten::clamp_min 0.49% 109.000us 0.77% 170.000us 17.000us 2.963ms 14.12% 2.963ms 296.300us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.963ms 14.12% 2.963ms 296.300us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.497ms 11.90% 2.497ms 166.467us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.371ms 11.30% 2.371ms 474.200us 5 aten::cat 0.35% 78.000us 0.51% 112.000us 22.400us 502.000us 2.39% 502.000us 100.400us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 22.051ms Self CUDA time total: 20.983ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 4.58% 867.000us 71.91% 13.613ms 13.613ms 0.000us 0.00% 17.829ms 17.829ms 1 aten::linear 0.25% 48.000us 4.87% 921.000us 61.400us 0.000us 0.00% 14.418ms 961.200us 15 aten::addmm 2.85% 539.000us 4.05% 766.000us 51.067us 14.418ms 80.87% 14.418ms 961.200us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 10.002ms 56.10% 10.002ms 1.111ms 9 aten::relu 0.39% 74.000us 1.29% 245.000us 24.500us 0.000us 0.00% 2.502ms 250.200us 10 aten::clamp_min 0.58% 109.000us 0.90% 171.000us 17.100us 2.502ms 14.03% 2.502ms 250.200us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.502ms 14.03% 2.502ms 250.200us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.129ms 11.94% 2.129ms 141.933us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.013ms 11.29% 2.013ms 402.600us 5 aten::cat 0.41% 77.000us 0.58% 110.000us 22.000us 402.000us 2.25% 402.000us 80.400us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 18.931ms Self CUDA time total: 17.829ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 7.15% 860.000us 78.71% 9.469ms 9.469ms 0.000us 0.00% 10.931ms 10.931ms 1 aten::linear 0.36% 43.000us 7.56% 910.000us 60.667us 0.000us 0.00% 8.857ms 590.467us 15 aten::addmm 4.51% 542.000us 6.32% 760.000us 50.667us 8.857ms 81.03% 8.857ms 590.467us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 6.280ms 57.45% 6.280ms 570.909us 11 aten::relu 0.62% 74.000us 1.98% 238.000us 23.800us 0.000us 0.00% 1.529ms 152.900us 10 aten::clamp_min 0.86% 103.000us 1.36% 164.000us 16.400us 1.529ms 13.99% 1.529ms 152.900us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.529ms 13.99% 1.529ms 152.900us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00%  0.000us  0.00%  0.000us  0.000us  1.320ms  12.08%  1.320ms  88.000us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.082ms  9.90%  1.082ms  360.667us  3
aten::cat  0.67%  81.000us  0.96%  116.000us  23.200us  221.000us  2.02%  221.000us  44.200us  5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 12.031ms
Self CUDA time total: 10.931ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference  3.02%  876.000us  70.10%  20.321ms  20.321ms  0.000us  0.00%  27.903ms  27.903ms  1
aten::linear  0.13%  38.000us  3.13%  906.000us  60.400us  0.000us  0.00%  22.483ms  1.499ms  15
aten::addmm  1.83%  530.000us  2.61%  756.000us  50.400us  22.483ms  80.58%  22.483ms  1.499ms  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  15.620ms  55.98%  15.620ms  1.736ms  9
aten::relu  0.25%  73.000us  0.83%  242.000us  24.200us  0.000us  0.00%  3.953ms  395.300us  10
aten::clamp_min  0.37%  107.000us  0.58%  169.000us  16.900us  3.953ms  14.17%  3.953ms  395.300us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  3.953ms  14.17%  3.953ms  395.300us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  3.302ms  11.83%  3.302ms  220.133us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  3.139ms  11.25%  3.139ms  627.800us  5
aten::cat  0.27%  79.000us  0.39%  112.000us  22.400us  702.000us  2.52%  702.000us  140.400us  5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 28.990ms
Self CUDA time total: 27.903ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference  4.60%  876.000us  73.03%  13.903ms  13.903ms  0.000us  0.00%  17.323ms  17.323ms  1
aten::linear  0.24%  46.000us  4.84%  921.000us  61.400us  0.000us  0.00%  13.996ms  933.067us  15
aten::addmm  2.90%  553.000us  4.04%  770.000us  51.333us  13.996ms  80.79%  13.996ms  933.067us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  9.709ms  56.05%  9.709ms  1.079ms  9
aten::relu  0.39%  75.000us  1.27%  241.000us  24.100us  0.000us  0.00%  2.429ms  242.900us  10
aten::clamp_min  0.55%  105.000us  0.87%  166.000us  16.600us  2.429ms  14.02%  2.429ms  242.900us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  2.429ms  14.02%  2.429ms  242.900us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  2.066ms  11.93%  2.066ms  137.733us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.956ms  11.29%  1.956ms  391.200us  5
aten::cat  0.49%  93.000us  0.66%  126.000us  25.200us  400.000us  2.31%  400.000us  80.000us  5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 19.037ms
Self CUDA time total: 17.323ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference  5.94%  861.000us  73.60%  10.675ms  10.675ms  0.000us  0.00%  13.403ms  13.403ms  1
aten::linear  0.28%  41.000us  6.32%  916.000us  61.067us  0.000us  0.00%  10.824ms  721.600us  15
aten::addmm  3.76%  546.000us  5.26%  763.000us  50.867us  10.824ms  80.76%  10.824ms  721.600us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  7.568ms  56.46%  7.568ms  756.800us  10
aten::relu  0.52%  76.000us  1.68%  244.000us  24.400us  0.000us  0.00%  1.891ms  189.100us  10
aten::clamp_min  0.74%  107.000us  1.16%  168.000us  16.800us  1.891ms  14.11%  1.891ms  189.100us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  1.891ms  14.11%  1.891ms  189.100us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.612ms  12.03%  1.612ms  107.467us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.443ms  10.77%  1.443ms  360.750us  4
aten::cat  0.53%  77.000us  0.76%  110.000us  22.000us  291.000us  2.17%  291.000us  58.200us  5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 14.505ms
Self CUDA time total: 13.403ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference  6.14%  852.000us  73.65%  10.215ms  10.215ms  0.000us  0.00%  12.779ms  12.779ms  1
aten::linear  0.32%  44.000us  6.76%  937.000us  62.467us  0.000us  0.00%  10.140ms  676.000us  15
aten::addmm  4.09%  567.000us  5.68%  788.000us  52.533us  10.140ms  79.35%  10.140ms  676.000us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  7.064ms  55.28%  7.064ms  706.400us  10
aten::relu  0.53%  74.000us  1.74%  242.000us  24.200us  0.000us  0.00%  1.986ms  198.600us  10
aten::clamp_min  0.76%  105.000us  1.21%  168.000us  16.800us  1.986ms  15.54%  1.986ms  198.600us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  1.986ms  15.54%  1.986ms  198.600us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.483ms  11.60%  1.483ms  98.867us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.387ms  10.85%  1.387ms  346.750us  4
aten::cat  0.56%  78.000us  0.80%  111.000us  22.200us  286.000us  2.24%  286.000us  57.200us  5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 13.870ms
Self CUDA time total: 12.779ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference  5.55%  906.000us  73.98%  12.088ms  12.088ms  0.000us  0.00%  14.635ms  14.635ms  1
aten::linear  0.26%  42.000us  5.62%  918.000us  61.200us  0.000us  0.00%  11.597ms  773.133us  15
aten::addmm  3.32%  542.000us  4.72%  771.000us  51.400us  11.597ms  79.24%  11.597ms  773.133us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  8.077ms  55.19%  8.077ms  807.700us  10
aten::relu  0.43%  71.000us  1.47%  241.000us  24.100us  0.000us  0.00%  2.277ms  227.700us  10
aten::clamp_min  0.67%  109.000us  1.04%  170.000us  17.000us  2.277ms  15.56%  2.277ms  227.700us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  2.277ms  15.56%  2.277ms  227.700us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.689ms  11.54%  1.689ms  112.600us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.596ms  10.91%  1.596ms  399.000us  4
aten::cat  0.48%  79.000us  0.69%  112.000us  22.400us  341.000us  2.33%  341.000us  68.200us  5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 16.339ms
Self CUDA time total: 14.635ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference  6.09%  878.000us  73.49%  10.592ms  10.592ms  0.000us  0.00%  13.316ms  13.316ms  1
aten::linear  0.30%  43.000us  6.38%  920.000us  61.333us  0.000us  0.00%  10.553ms  703.533us  15
aten::addmm  3.78%  545.000us  5.32%  766.000us  51.067us  10.553ms  79.25%  10.553ms  703.533us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  7.347ms  55.17%  7.347ms  734.700us  10
aten::relu  0.51%  73.000us  1.65%  238.000us  23.800us  0.000us  0.00%  2.071ms  207.100us  10
aten::clamp_min  0.72%  104.000us  1.14%  165.000us  16.500us  2.071ms  15.55%  2.071ms  207.100us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  2.071ms  15.55%  2.071ms  207.100us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.538ms  11.55%  1.538ms  102.533us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.443ms  10.84%  1.443ms  360.750us  4
aten::cat  0.56%  80.000us  0.79%  114.000us  22.800us  305.000us  2.29%  305.000us  61.000us  5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 14.412ms
Self CUDA time total: 13.316ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference  5.89%  874.000us  74.65%  11.078ms  11.078ms  0.000us  0.00%  13.107ms  13.107ms  1
aten::linear  0.29%  43.000us  6.19%  919.000us  61.267us  0.000us  0.00%  10.392ms  692.800us  15
aten::addmm  3.73%  554.000us  5.21%  773.000us  51.533us  10.392ms  79.29%  10.392ms  692.800us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  7.235ms  55.20%  7.235ms  723.500us  10
aten::relu  0.47%  70.000us  1.60%  238.000us  23.800us  0.000us  0.00%  2.042ms  204.200us  10
aten::clamp_min  0.71%  106.000us  1.13%  168.000us  16.800us  2.042ms  15.58%  2.042ms  204.200us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  2.042ms  15.58%  2.042ms  204.200us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.514ms  11.55%  1.514ms  100.933us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.421ms  10.84%  1.421ms  355.250us  4
aten::cat  0.54%  80.000us  0.76%  113.000us  22.600us  298.000us  2.27%  298.000us  59.600us  5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 14.839ms
Self CUDA time total: 13.107ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference  4.06%  812.000us  71.70%  14.346ms  14.346ms  0.000us  0.00%  18.985ms  18.985ms  1
aten::linear  0.20%  41.000us  4.61%  922.000us  61.467us  0.000us  0.00%  15.327ms  1.022ms  15
aten::addmm  2.73%  546.000us  3.86%  773.000us  51.533us  15.327ms  80.73%  15.327ms  1.022ms  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  10.633ms  56.01%  10.633ms  1.181ms  9
aten::relu  0.36%  73.000us  1.23%  246.000us  24.600us  0.000us  0.00%  2.678ms  267.800us  10
aten::clamp_min  0.55%  110.000us  0.86%  173.000us  17.300us  2.678ms  14.11%  2.678ms  267.800us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  2.678ms  14.11%  2.678ms  267.800us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  2.261ms  11.91%  2.261ms  150.733us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  2.148ms  11.31%  2.148ms  429.600us  5
aten::cat  0.38%  76.000us  0.60%  120.000us  24.000us  437.000us  2.30%  437.000us  87.400us  5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 20.009ms
Self CUDA time total: 18.985ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference  7.87%  866.000us  77.77%  8.556ms  8.556ms  0.000us  0.00%  9.250ms  9.250ms  1
aten::linear  0.39%  43.000us  8.32%  915.000us  61.000us  0.000us  0.00%  7.490ms  499.333us  15
aten::addmm  5.02%  552.000us  6.99%  769.000us  51.267us  7.490ms  80.97%  7.490ms  499.333us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  5.170ms  55.89%  5.170ms  574.444us  9
aten::relu  0.64%  70.000us  2.14%  235.000us  23.500us  0.000us  0.00%  1.288ms  128.800us  10
aten::clamp_min  0.94%  103.000us  1.50%  165.000us  16.500us  1.288ms  13.92%  1.288ms  128.800us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  1.288ms  13.92%  1.288ms  128.800us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.124ms  12.15%  1.124ms  74.933us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.049ms  11.34%  1.049ms  209.800us  5
aten::cat  0.71%  78.000us  1.01%  111.000us  22.200us  179.000us  1.94%  179.000us  35.800us  5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 11.001ms
Self CUDA time total: 9.250ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference  3.22%  875.000us  70.35%  19.141ms  19.141ms  0.000us  0.00%  26.096ms  26.096ms  1
aten::linear  0.17%  45.000us  3.36%  913.000us  60.867us  0.000us  0.00%  21.031ms  1.402ms  15
aten::addmm  1.97%  536.000us  2.80%  761.000us  50.733us  21.031ms  80.59%  21.031ms  1.402ms  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  14.610ms  55.99%  14.610ms  1.623ms  9
aten::relu  0.25%  68.000us  0.89%  241.000us  24.100us  0.000us  0.00%  3.695ms  369.500us  10
aten::clamp_min  0.42%  113.000us  0.64%  173.000us  17.300us  3.695ms  14.16%  3.695ms  369.500us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  3.695ms  14.16%  3.695ms  369.500us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  3.088ms  11.83%  3.088ms  205.867us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  2.936ms  11.25%  2.936ms  587.200us  5
aten::cat  0.30%  81.000us  0.42%  113.000us  22.600us  646.000us  2.48%  646.000us  129.200us  5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 27.209ms
Self CUDA time total: 26.096ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference  6.16%  866.000us  73.58%  10.343ms  10.343ms  0.000us  0.00%  12.969ms  12.969ms  1
aten::linear  0.33%  46.000us  6.45%  907.000us  60.467us  0.000us  0.00%  10.486ms  699.067us  15
aten::addmm  3.92%  551.000us  5.41%  760.000us  50.667us  10.486ms  80.85%  10.486ms  699.067us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  7.245ms  55.86%  7.245ms  805.000us  9
aten::relu  0.49%  69.000us  1.69%  237.000us  23.700us  0.000us  0.00%  1.823ms  182.300us  10
aten::clamp_min  0.77%  108.000us  1.20%  168.000us  16.800us  1.823ms  14.06%  1.823ms  182.300us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  1.823ms  14.06%  1.823ms  182.300us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.561ms  12.04%  1.561ms  104.067us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.470ms  11.33%  1.470ms  294.000us  5
aten::cat  0.55%  78.000us  0.79%  111.000us  22.200us  276.000us  2.13%  276.000us  55.200us  5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 14.057ms
Self CUDA time total: 12.969ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference  5.47%  860.000us  72.75%  11.442ms  11.442ms  0.000us  0.00%  14.639ms  14.639ms  1
aten::linear  0.25%  40.000us  5.84%  919.000us  61.267us  0.000us  0.00%  11.839ms  789.267us  15
aten::addmm  3.53%  555.000us  4.91%  773.000us  51.533us  11.839ms  80.87%  11.839ms  789.267us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  8.283ms  56.58%  8.283ms  828.300us  10
aten::relu  0.45%  70.000us  1.51%  237.000us  23.700us  0.000us  0.00%  2.058ms  205.800us  10
aten::clamp_min  0.67%  106.000us  1.06%  167.000us  16.700us  2.058ms  14.06%  2.058ms  205.800us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  2.058ms  14.06%  2.058ms  205.800us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.755ms  11.99%  1.755ms  117.000us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.566ms  10.70%  1.566ms  391.500us  4
aten::cat  0.48%  76.000us  0.69%  109.000us  21.800us  318.000us  2.17%  318.000us  63.600us  5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 15.728ms
Self CUDA time total: 14.639ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference  4.47%  862.000us  71.90%  13.862ms  13.862ms  0.000us  0.00%  18.188ms  18.188ms  1
aten::linear  0.21%  41.000us  4.76%  917.000us  61.133us  0.000us  0.00%  14.698ms  979.867us  15
aten::addmm  2.83%  545.000us  3.98%  768.000us  51.200us  14.698ms  80.81%  14.698ms  979.867us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  10.194ms  56.05%  10.194ms  1.133ms  9
aten::relu  0.38%  73.000us  1.30%  250.000us  25.000us  0.000us  0.00%  2.560ms  256.000us  10
aten::clamp_min  0.61%  118.000us  0.92%  177.000us  17.700us  2.560ms  14.08%  2.560ms  256.000us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  2.560ms  14.08%  2.560ms  256.000us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  2.168ms  11.92%  2.168ms  144.533us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  2.059ms  11.32%  2.059ms  411.800us  5
aten::cat  0.40%  78.000us  0.58%  112.000us  22.400us  415.000us  2.28%  415.000us  83.000us  5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 19.279ms
Self CUDA time total: 18.188ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference  4.07%  886.000us  72.32%  15.728ms  15.728ms  0.000us  0.00%  20.028ms  20.028ms  1
aten::linear  0.20%  44.000us  4.28%  930.000us  62.000us  0.000us  0.00%  16.160ms  1.077ms  15
aten::addmm  2.58%  562.000us  3.61%  784.000us  52.267us  16.160ms  80.69%  16.160ms  1.077ms  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  11.201ms  55.93%  11.201ms  1.245ms  9
aten::relu  0.33%  72.000us  1.11%  242.000us  24.200us  0.000us  0.00%  2.832ms  283.200us  10
aten::clamp_min  0.49%  107.000us  0.78%  170.000us  17.000us  2.832ms  14.14%  2.832ms  283.200us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  2.832ms  14.14%  2.832ms  283.200us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  2.386ms  11.91%  2.386ms  159.067us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  2.264ms  11.30%  2.264ms  452.800us  5
aten::cat  0.37%  80.000us  0.52%  114.000us  22.800us  469.000us  2.34%  469.000us  93.800us  5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 21.747ms
Self CUDA time total: 20.028ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference  5.46%  872.000us  72.66%  11.615ms  11.615ms  0.000us  0.00%  14.913ms  14.913ms  1
aten::linear  0.26%  41.000us  5.79%  925.000us  61.667us  0.000us  0.00%  11.784ms  785.600us  15
aten::addmm  3.46%  553.000us  4.85%  775.000us  51.667us  11.784ms  79.02%  11.784ms  785.600us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  8.107ms  54.36%  8.107ms  900.778us  9
aten::relu  0.48%  77.000us  1.56%  250.000us  25.000us  0.000us  0.00%  2.358ms  235.800us  10
aten::clamp_min  0.69%  111.000us  1.08%  173.000us  17.300us  2.358ms  15.81%  2.358ms  235.800us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  2.358ms  15.81%  2.358ms  235.800us  10
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.716ms  11.51%  1.716ms  343.200us  5
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.709ms  11.46%  1.709ms  113.933us  15
aten::cat  0.48%  77.000us  0.69%  110.000us  22.000us  347.000us  2.33%  347.000us  69.400us  5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 15.985ms
Self CUDA time total: 14.913ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference  5.39%  863.000us  74.04%  11.853ms  11.853ms  0.000us  0.00%  14.302ms  14.302ms  1
aten::linear  0.26%  42.000us  5.70%  912.000us  60.800us  0.000us  0.00%  11.294ms  752.933us  15
aten::addmm  3.44%  550.000us  4.82%  772.000us  51.467us  11.294ms  78.97%  11.294ms  752.933us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  7.864ms  54.99%  7.864ms  786.400us  10
aten::relu  0.42%  68.000us  1.46%  234.000us  23.400us  0.000us  0.00%  2.264ms  226.400us  10
aten::clamp_min  0.64%  103.000us  1.04%  166.000us  16.600us  2.264ms  15.83%  2.264ms  226.400us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  2.264ms  15.83%  2.264ms  226.400us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.638ms  11.45%  1.638ms  109.200us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.558ms  10.89%  1.558ms  389.500us  4
aten::cat  0.49%  79.000us  0.70%  112.000us  22.400us  336.000us  2.35%  336.000us  67.200us  5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 16.008ms
Self CUDA time total: 14.302ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference  6.43%  884.000us  73.74%  10.143ms  10.143ms  0.000us  0.00%  12.650ms  12.650ms  1
aten::linear  0.33%  46.000us  6.70%  921.000us  61.400us  0.000us  0.00%  10.002ms  666.800us  15
aten::addmm  3.96%  545.000us  5.58%  767.000us  51.133us  10.002ms  79.07%  10.002ms  666.800us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  6.960ms  55.02%  6.960ms  696.000us  10
aten::relu  0.55%  75.000us  1.83%  252.000us  25.200us  0.000us  0.00%  2.000ms  200.000us  10
aten::clamp_min  0.85%  117.000us  1.29%  177.000us  17.700us  2.000ms  15.81%  2.000ms  200.000us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  2.000ms  15.81%  2.000ms  200.000us  10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  1.453ms  11.49%  1.453ms  96.867us  15
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  1.375ms  10.87%  1.375ms  343.750us  4
aten::cat  0.57%  78.000us  0.81%  112.000us  22.400us  287.000us  2.27%  287.000us  57.400us  5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 13.756ms
Self CUDA time total: 12.650ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference  4.38%  884.000us  72.63%  14.646ms  14.646ms  0.000us  0.00%  18.466ms  18.466ms  1
aten::linear  0.24%  48.000us  4.55%  917.000us  61.133us  0.000us  0.00%  14.537ms  969.133us  15
aten::addmm  2.77%  559.000us  3.83%  773.000us  51.533us  14.537ms  78.72%  14.537ms  969.133us  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  10.018ms  54.25%  10.018ms  1.113ms  9
aten::relu  0.34%  68.000us  1.16%  233.000us  23.300us  0.000us  0.00%  2.944ms  294.400us  10
aten::clamp_min  0.53%  106.000us  0.82%  165.000us  16.500us  2.944ms  15.94%  2.944ms  294.400us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  2.944ms  15.94%  2.944ms  294.400us  10
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  2.114ms  11.45%  2.114ms  422.800us  5
void at::native::elementwise_kernel<128, 2, at::nati...
0.00%  0.000us  0.00%  0.000us  0.000us  2.099ms  11.37%  2.099ms  139.933us  15
aten::cat  0.39%  79.000us  0.56%  112.000us  22.400us  466.000us  2.52%  466.000us  93.200us  5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 20.164ms
Self CUDA time total: 18.466ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
model_inference  3.57%  868.000us  70.97%  17.258ms  17.258ms  0.000us  0.00%  23.250ms  23.250ms  1
aten::linear  0.19%  46.000us  3.82%  930.000us  62.000us  0.000us  0.00%  18.260ms  1.217ms  15
aten::addmm  2.26%  549.000us  3.17%  770.000us  51.333us  18.260ms  78.54%  18.260ms  1.217ms  15
ampere_sgemm_32x128_tn  0.00%  0.000us  0.00%  0.000us  0.000us  12.596ms  54.18%  12.596ms  1.400ms  9
aten::relu  0.28%  68.000us  0.97%  236.000us  23.600us  0.000us  0.00%  3.711ms  371.100us  10
aten::clamp_min  0.44%  108.000us  0.69%  168.000us  16.800us  3.711ms  15.96%  3.711ms  371.100us  10
void at::native::vectorized_elementwise_kernel<4, at...  0.00%  0.000us  0.00%  0.000us  0.000us  3.711ms  15.96%  3.711ms  371.100us  10
ampere_sgemm_128x64_tn  0.00%  0.000us  0.00%  0.000us  0.000us  2.654ms  11.42%  2.654ms  530.800us  5
void at::native::elementwise_kernel<128, 2, at::nati...
0.00% 0.000us 0.00% 0.000us 0.000us 2.632ms 11.32% 2.632ms 175.467us 15 aten::cat 0.32% 78.000us 0.46% 113.000us 22.600us 624.000us 2.68% 624.000us 124.800us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 24.319ms Self CUDA time total: 23.250ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 4.05% 869.000us 71.20% 15.280ms 15.280ms 0.000us 0.00% 20.375ms 20.375ms 1 aten::linear 0.21% 44.000us 4.19% 899.000us 59.933us 0.000us 0.00% 15.885ms 1.059ms 15 aten::addmm 2.47% 531.000us 3.49% 748.000us 49.867us 15.885ms 77.96% 15.885ms 1.059ms 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 10.920ms 53.60% 10.920ms 1.213ms 9 aten::relu 0.34% 74.000us 1.13% 242.000us 24.200us 0.000us 0.00% 3.390ms 339.000us 10 aten::clamp_min 0.50% 108.000us 0.78% 168.000us 16.800us 3.390ms 16.64% 3.390ms 339.000us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 3.390ms 16.64% 3.390ms 339.000us 10 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.354ms 11.55% 2.354ms 470.800us 5 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.274ms 11.16% 2.274ms 151.600us 15 aten::cat 0.35% 75.000us 0.51% 109.000us 21.800us 541.000us 2.66% 541.000us 108.200us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 21.462ms Self CUDA time total: 20.375ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 8.00% 837.000us 77.71% 8.131ms 8.131ms 0.000us 0.00% 8.741ms 8.741ms 1 aten::linear 0.40% 42.000us 8.61% 901.000us 60.067us 0.000us 0.00% 6.866ms 457.733us 15 aten::addmm 5.12% 536.000us 7.24% 758.000us 50.533us 6.866ms 78.55% 6.866ms 457.733us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 4.832ms 55.28% 4.832ms 439.273us 11 aten::relu 0.68% 71.000us 2.27% 237.000us 23.700us 0.000us 0.00% 1.432ms 143.200us 10 aten::clamp_min 1.02% 107.000us 1.59% 166.000us 16.600us 1.432ms 16.38% 1.432ms 143.200us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.432ms 16.38% 1.432ms 143.200us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 995.000us 11.38% 995.000us 66.333us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 883.000us 10.10% 883.000us 294.333us 3 aten::cat 0.75% 78.000us 1.05% 110.000us 22.000us 184.000us 2.11% 184.000us 36.800us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 10.463ms Self CUDA time total: 8.741ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 7.71% 904.000us 77.10% 9.040ms 9.040ms 0.000us 0.00% 9.969ms 9.969ms 1 aten::linear 0.38% 45.000us 7.85% 920.000us 61.333us 0.000us 0.00% 7.817ms 521.133us 15 aten::addmm 4.65% 545.000us 6.53% 766.000us 51.067us 7.817ms 78.41% 7.817ms 521.133us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 5.407ms 54.24% 5.407ms 540.700us 10 aten::relu 0.63% 74.000us 2.09% 245.000us 24.500us 0.000us 0.00% 1.636ms 163.600us 10 aten::clamp_min 0.93% 109.000us 1.46% 171.000us 17.100us 1.636ms 16.41% 1.636ms 163.600us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.636ms 16.41% 1.636ms 163.600us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.135ms 11.39% 1.135ms 75.667us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.097ms 11.00% 1.097ms 274.250us 4 aten::cat 0.70% 82.000us 0.99% 116.000us 23.200us 219.000us 2.20% 219.000us 43.800us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 11.725ms Self CUDA time total: 9.969ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 10.17% 849.000us 78.97% 6.592ms 6.592ms 0.000us 0.00% 7.201ms 7.201ms 1 aten::linear 0.49% 41.000us 11.01% 919.000us 61.267us 0.000us 0.00% 5.659ms 377.267us 15 aten::addmm 6.54% 546.000us 9.20% 768.000us 51.200us 5.659ms 78.59% 5.659ms 377.267us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 3.862ms 53.63% 3.862ms 429.111us 9 aten::relu 0.85% 71.000us 2.96% 247.000us 24.700us 0.000us 0.00% 1.169ms 116.900us 10 aten::clamp_min 1.32% 110.000us 2.11% 176.000us 17.600us 1.169ms 16.23% 1.169ms 116.900us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.169ms 16.23% 1.169ms 116.900us 10 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 841.000us 11.68% 841.000us 168.200us 5 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 825.000us 11.46% 825.000us 55.000us 15 aten::cat 0.95% 79.000us 1.34% 112.000us 22.400us 146.000us 2.03% 146.000us 29.200us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 8.347ms Self CUDA time total: 7.201ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 6.61% 884.000us 75.74% 10.126ms 10.126ms 0.000us 0.00% 11.654ms 11.654ms 1 aten::linear 0.33% 44.000us 6.88% 920.000us 61.333us 0.000us 0.00% 9.130ms 608.667us 15 aten::addmm 4.11% 549.000us 5.76% 770.000us 51.333us 9.130ms 78.34% 9.130ms 608.667us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 6.386ms 54.80% 6.386ms 580.545us 11 aten::relu 0.56% 75.000us 1.83% 245.000us 24.500us 0.000us 0.00% 1.915ms 191.500us 10 aten::clamp_min 0.80% 107.000us 1.27% 170.000us 17.000us 1.915ms 16.43% 1.915ms 191.500us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.915ms 16.43% 1.915ms 191.500us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.322ms 11.34% 1.322ms 88.133us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.215ms 10.43% 1.215ms 405.000us 3 aten::cat 0.66% 88.000us 0.91% 121.000us 24.200us 272.000us 2.33% 272.000us 54.400us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 13.370ms Self CUDA time total: 11.654ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 6.41% 871.000us 73.95% 10.052ms 10.052ms 0.000us 0.00% 12.492ms 12.492ms 1 aten::linear 0.29% 39.000us 6.85% 931.000us 62.067us 0.000us 0.00% 9.783ms 652.200us 15 aten::addmm 4.06% 552.000us 5.72% 777.000us 51.800us 9.783ms 78.31% 9.783ms 652.200us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 6.787ms 54.33% 6.787ms 678.700us 10 aten::relu 0.54% 74.000us 1.80% 245.000us 24.500us 0.000us 0.00% 2.057ms 205.700us 10 aten::clamp_min 0.80% 109.000us 1.26% 171.000us 17.100us 2.057ms 16.47% 2.057ms 205.700us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.057ms 16.47% 2.057ms 205.700us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.411ms 11.30% 1.411ms 94.067us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.376ms 11.02% 1.376ms 344.000us 4 aten::cat 0.58% 79.000us 0.83% 113.000us 22.600us 294.000us 2.35% 294.000us 58.800us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 13.593ms Self CUDA time total: 12.492ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 5.12% 861.000us 72.53% 12.192ms 12.192ms 0.000us 0.00% 15.718ms 15.718ms 1 aten::linear 0.26% 43.000us 5.48% 921.000us 61.400us 0.000us 0.00% 12.287ms 819.133us 15 aten::addmm 3.21% 540.000us 4.54% 763.000us 50.867us 12.287ms 78.17% 12.287ms 819.133us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 8.437ms 53.68% 8.437ms 937.444us 9 aten::relu 0.43% 73.000us 1.43% 241.000us 24.100us 0.000us 0.00% 2.598ms 259.800us 10 aten::clamp_min 0.65% 109.000us 1.00% 168.000us 16.800us 2.598ms 16.53% 2.598ms 259.800us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.598ms 16.53% 2.598ms 259.800us 10 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.822ms 11.59% 1.822ms 364.400us 5 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.762ms 11.21% 1.762ms 117.467us 15 aten::cat 0.48% 81.000us 0.68% 114.000us 22.800us 393.000us 2.50% 393.000us 78.600us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 16.810ms Self CUDA time total: 15.718ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 8.57% 852.000us 81.76% 8.128ms 8.128ms 0.000us 0.00% 8.803ms 8.803ms 1 aten::linear 0.45% 45.000us 9.25% 920.000us 61.333us 0.000us 0.00% 6.905ms 460.333us 15 aten::addmm 5.40% 537.000us 7.68% 763.000us 50.867us 6.905ms 78.44% 6.905ms 460.333us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 4.778ms 54.28% 4.778ms 477.800us 10 aten::relu 0.69% 69.000us 2.37% 236.000us 23.600us 0.000us 0.00% 1.438ms 143.800us 10 aten::clamp_min 1.08% 107.000us 1.68% 167.000us 16.700us 1.438ms 16.34% 1.438ms 143.800us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.438ms 16.34% 1.438ms 143.800us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.001ms 11.37% 1.001ms 66.733us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 965.000us 10.96% 965.000us 241.250us 4 aten::cat 0.74% 74.000us 1.10% 109.000us 21.800us 193.000us 2.19% 193.000us 38.600us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 9.941ms Self CUDA time total: 8.803ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 4.08% 902.000us 72.06% 15.938ms 15.938ms 0.000us 0.00% 20.388ms 20.388ms 1 aten::linear 0.19% 43.000us 4.25% 939.000us 62.600us 0.000us 0.00% 16.460ms 1.097ms 15 aten::addmm 2.54% 561.000us 3.55% 785.000us 52.333us 16.460ms 80.73% 16.460ms 1.097ms 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 11.639ms 57.09% 11.639ms 1.058ms 11 aten::relu 0.34% 75.000us 1.11% 246.000us 24.600us 0.000us 0.00% 2.877ms 287.700us 10 aten::clamp_min 0.49% 109.000us 0.77% 171.000us 17.100us 2.877ms 14.11% 2.877ms 287.700us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.877ms 14.11% 2.877ms 287.700us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.427ms 11.90% 2.427ms 161.800us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.072ms 10.16% 2.072ms 690.667us 3 aten::cat 0.34% 76.000us 0.50% 111.000us 22.200us 473.000us 2.32% 473.000us 94.600us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 22.118ms Self CUDA time total: 20.388ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 4.80% 873.000us 72.21% 13.129ms 13.129ms 0.000us 0.00% 17.077ms 17.077ms 1 aten::linear 0.23% 42.000us 5.13% 933.000us 62.200us 0.000us 0.00% 13.807ms 920.467us 15 aten::addmm 3.05% 555.000us 4.28% 779.000us 51.933us 13.807ms 80.85% 13.807ms 920.467us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 9.582ms 56.11% 9.582ms 1.065ms 9 aten::relu 0.40% 73.000us 1.34% 243.000us 24.300us 0.000us 0.00% 2.394ms 239.400us 10 aten::clamp_min 0.60% 109.000us 0.93% 170.000us 17.000us 2.394ms 14.02% 2.394ms 239.400us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.394ms 14.02% 2.394ms 239.400us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.040ms 11.95% 2.040ms 136.000us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.928ms 11.29% 1.928ms 385.600us 5 aten::cat 0.42% 77.000us 0.60% 109.000us 21.800us 386.000us 2.26% 386.000us 77.200us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 18.182ms Self CUDA time total: 17.077ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 7.26% 869.000us 80.17% 9.602ms 9.602ms 0.000us 0.00% 10.866ms 10.866ms 1 aten::linear 0.39% 47.000us 7.70% 922.000us 61.467us 0.000us 0.00% 8.794ms 586.267us 15 aten::addmm 4.58% 548.000us 6.42% 769.000us 51.267us 8.794ms 80.93% 8.794ms 586.267us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 6.144ms 56.54% 6.144ms 614.400us 10 aten::relu 0.60% 72.000us 2.08% 249.000us 24.900us 0.000us 0.00% 1.525ms 152.500us 10 aten::clamp_min 0.89% 106.000us 1.48% 177.000us 17.700us 1.525ms 14.03% 1.525ms 152.500us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.525ms 14.03% 1.525ms 152.500us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.310ms 12.06% 1.310ms 87.333us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.161ms 10.68% 1.161ms 290.250us 4 aten::cat 0.65% 78.000us 0.94% 112.000us 22.400us 221.000us 2.03% 221.000us 44.200us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 11.977ms Self CUDA time total: 10.866ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 3.25% 874.000us 71.33% 19.205ms 19.205ms 0.000us 0.00% 25.214ms 25.214ms 1 aten::linear 0.18% 48.000us 3.45% 929.000us 61.933us 0.000us 0.00% 20.318ms 1.355ms 15 aten::addmm 2.03% 546.000us 2.87% 773.000us 51.533us 20.318ms 80.58% 20.318ms 1.355ms 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 14.105ms 55.94% 14.105ms 1.567ms 9 aten::relu 0.26% 71.000us 0.90% 242.000us 24.200us 0.000us 0.00% 3.567ms 356.700us 10 aten::clamp_min 0.40% 109.000us 0.64% 171.000us 17.100us 3.567ms 14.15% 3.567ms 356.700us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 3.567ms 14.15% 3.567ms 356.700us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.988ms 11.85% 2.988ms 199.200us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.839ms 11.26% 2.839ms 567.800us 5 aten::cat 0.30% 80.000us 0.42% 113.000us 22.600us 622.000us 2.47% 622.000us 124.400us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 26.925ms Self CUDA time total: 25.214ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 9.12% 856.000us 77.49% 7.274ms 7.274ms 0.000us 0.00% 8.225ms 8.225ms 1 aten::linear 0.43% 40.000us 9.75% 915.000us 61.000us 0.000us 0.00% 6.668ms 444.533us 15 aten::addmm 5.85% 549.000us 8.20% 770.000us 51.333us 6.668ms 81.07% 6.668ms 444.533us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 4.600ms 55.93% 4.600ms 511.111us 9 aten::relu 0.78% 73.000us 2.55% 239.000us 23.900us 0.000us 0.00% 1.130ms 113.000us 10 aten::clamp_min 1.13% 106.000us 1.77% 166.000us 16.600us 1.130ms 13.74% 1.130ms 113.000us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.130ms 13.74% 1.130ms 113.000us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.003ms 12.19% 1.003ms 66.867us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 936.000us 11.38% 936.000us 187.200us 5 aten::cat 0.84% 79.000us 1.19% 112.000us 22.400us 160.000us 1.95% 160.000us 32.000us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 9.387ms Self CUDA time total: 8.225ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 3.44% 888.000us 70.58% 18.200ms 18.200ms 0.000us 0.00% 24.705ms 24.705ms 1 aten::linear 0.16% 40.000us 3.49% 901.000us 60.067us 0.000us 0.00% 19.910ms 1.327ms 15 aten::addmm 2.05% 529.000us 2.91% 751.000us 50.067us 19.910ms 80.59% 19.910ms 1.327ms 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 13.823ms 55.95% 13.823ms 1.536ms 9 aten::relu 0.28% 72.000us 0.93% 241.000us 24.100us 0.000us 0.00% 3.495ms 349.500us 10 aten::clamp_min 0.42% 108.000us 0.66% 169.000us 16.900us 3.495ms 14.15% 3.495ms 349.500us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 3.495ms 14.15% 3.495ms 349.500us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.928ms 11.85% 2.928ms 195.200us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.783ms 11.26% 2.783ms 556.600us 5 aten::cat 0.29% 76.000us 0.43% 110.000us 22.000us 608.000us 2.46% 608.000us 121.600us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 25.785ms Self CUDA time total: 24.705ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 8.69% 857.000us 83.67% 8.249ms 8.249ms 0.000us 0.00% 8.733ms 8.733ms 1 aten::linear 0.45% 44.000us 9.21% 908.000us 60.533us 0.000us 0.00% 7.076ms 471.733us 15 aten::addmm 5.47% 539.000us 7.68% 757.000us 50.467us 7.076ms 81.03% 7.076ms 471.733us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 4.887ms 55.96% 4.887ms 543.000us 9 aten::relu 0.69% 68.000us 2.41% 238.000us 23.800us 0.000us 0.00% 1.213ms 121.300us 10 aten::clamp_min 1.13% 111.000us 1.72% 170.000us 17.000us 1.213ms 13.89% 1.213ms 121.300us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 1.213ms 13.89% 1.213ms 121.300us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.063ms 12.17% 1.063ms 70.867us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 992.000us 11.36% 992.000us 198.400us 5 aten::cat 0.79% 78.000us 1.13% 111.000us 22.200us 172.000us 1.97% 172.000us 34.400us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 9.859ms Self CUDA time total: 8.733ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 5.26% 872.000us 72.61% 12.031ms 12.031ms 0.000us 0.00% 15.474ms 15.474ms 1 aten::linear 0.25% 42.000us 5.44% 902.000us 60.133us 0.000us 0.00% 12.514ms 834.267us 15 aten::addmm 3.23% 536.000us 4.57% 757.000us 50.467us 12.514ms 80.87% 12.514ms 834.267us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 8.762ms 56.62% 8.762ms 876.200us 10 aten::relu 0.42% 70.000us 1.44% 239.000us 23.900us 0.000us 0.00% 2.170ms 217.000us 10 aten::clamp_min 0.66% 109.000us 1.02% 169.000us 16.900us 2.170ms 14.02% 2.170ms 217.000us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.170ms 14.02% 2.170ms 217.000us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.851ms 11.96% 1.851ms 123.400us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.664ms 10.75% 1.664ms 416.000us 4 aten::cat 0.47% 78.000us 0.68% 112.000us 22.400us 343.000us 2.22% 343.000us 68.600us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 16.569ms Self CUDA time total: 15.474ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 3.78% 894.000us 71.01% 16.807ms 16.807ms 0.000us 0.00% 22.580ms 22.580ms 1 aten::linear 0.19% 45.000us 3.92% 927.000us 61.800us 0.000us 0.00% 18.213ms 1.214ms 15 aten::addmm 2.29% 541.000us 3.29% 778.000us 51.867us 18.213ms 80.66% 18.213ms 1.214ms 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 12.641ms 55.98% 12.641ms 1.405ms 9 aten::relu 0.31% 73.000us 1.02% 242.000us 24.200us 0.000us 0.00% 3.194ms 319.400us 10 aten::clamp_min 0.46% 108.000us 0.71% 169.000us 16.900us 3.194ms 14.15% 3.194ms 319.400us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 3.194ms 14.15% 3.194ms 319.400us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
                                                                0.00%       0.000us         0.00%       0.000us       0.000us       2.683ms        11.88%       2.683ms     178.867us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       2.546ms        11.28%       2.546ms     509.200us             5
                                              aten::cat         0.32%      76.000us         0.46%     110.000us      22.000us     542.000us         2.40%     542.000us     108.400us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 23.667ms
Self CUDA time total: 22.580ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference        11.93%     846.000us        90.79%       6.438ms       6.438ms       0.000us         0.00%       5.931ms       5.931ms             1
                                           aten::linear         0.62%      44.000us        21.73%       1.541ms     102.733us       0.000us         0.00%       4.756ms     317.067us            15
                                            aten::addmm         7.78%     552.000us        19.66%       1.394ms      92.933us       4.756ms        80.19%       4.756ms     317.067us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       3.344ms        56.38%       3.344ms     304.000us            11
                                             aten::relu         0.99%      70.000us         3.34%     237.000us      23.700us       0.000us         0.00%     779.000us      77.900us            10
                                        aten::clamp_min         1.52%     108.000us         2.36%     167.000us      16.700us     779.000us        13.13%     779.000us      77.900us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     779.000us        13.13%     779.000us      77.900us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us     725.000us        12.22%     725.000us      48.333us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us     591.000us         9.96%     591.000us     197.000us             3
                                              aten::cat         1.06%      75.000us         1.51%     107.000us      21.400us     116.000us         1.96%     116.000us      23.200us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 7.091ms
Self CUDA time total: 5.931ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         5.52%     873.000us        74.00%      11.711ms      11.711ms       0.000us         0.00%      14.095ms      14.095ms             1
                                           aten::linear         0.31%      49.000us         5.94%     940.000us      62.667us       0.000us         0.00%      11.415ms     761.000us            15
                                            aten::addmm         3.52%     557.000us         4.95%     783.000us      52.200us      11.415ms        80.99%      11.415ms     761.000us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       7.909ms        56.11%       7.909ms     878.778us             9
                                             aten::relu         0.45%      71.000us         1.54%     244.000us      24.400us       0.000us         0.00%       1.975ms     197.500us            10
                                        aten::clamp_min         0.71%     112.000us         1.09%     173.000us      17.300us       1.975ms        14.01%       1.975ms     197.500us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.975ms        14.01%       1.975ms     197.500us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.689ms        11.98%       1.689ms     112.600us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.597ms        11.33%       1.597ms     319.400us             5
                                              aten::cat         0.50%      79.000us         0.71%     112.000us      22.400us     302.000us         2.14%     302.000us      60.400us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 15.825ms
Self CUDA time total: 14.095ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         6.26%     871.000us        73.59%      10.244ms      10.244ms       0.000us         0.00%      12.820ms      12.820ms             1
                                           aten::linear         0.32%      44.000us         6.53%     909.000us      60.600us       0.000us         0.00%      10.381ms     692.067us            15
                                            aten::addmm         3.93%     547.000us         5.48%     763.000us      50.867us      10.381ms        80.98%      10.381ms     692.067us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       7.271ms        56.72%       7.271ms     727.100us            10
                                             aten::relu         0.50%      70.000us         1.72%     239.000us      23.900us       0.000us         0.00%       1.799ms     179.900us            10
                                        aten::clamp_min         0.78%     108.000us         1.21%     169.000us      16.900us       1.799ms        14.03%       1.799ms     179.900us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.799ms        14.03%       1.799ms     179.900us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.539ms        12.00%       1.539ms     102.600us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.366ms        10.66%       1.366ms     341.500us             4
                                              aten::cat         0.57%      80.000us         0.82%     114.000us      22.800us     270.000us         2.11%     270.000us      54.000us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 13.920ms
Self CUDA time total: 12.820ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         4.75%     876.000us        71.90%      13.253ms      13.253ms       0.000us         0.00%      17.337ms      17.337ms             1
                                           aten::linear         0.24%      44.000us         4.96%     915.000us      61.000us       0.000us         0.00%      14.015ms     934.333us            15
                                            aten::addmm         2.96%     545.000us         4.15%     765.000us      51.000us      14.015ms        80.84%      14.015ms     934.333us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       9.719ms        56.06%       9.719ms       1.080ms             9
                                             aten::relu         0.41%      76.000us         1.33%     246.000us      24.600us       0.000us         0.00%       2.437ms     243.700us            10
                                        aten::clamp_min         0.59%     109.000us         0.92%     170.000us      17.000us       2.437ms        14.06%       2.437ms     243.700us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.437ms        14.06%       2.437ms     243.700us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       2.075ms        11.97%       2.075ms     138.333us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.963ms        11.32%       1.963ms     392.600us             5
                                              aten::cat         0.43%      80.000us         0.61%     113.000us      22.600us     391.000us         2.26%     391.000us      78.200us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 18.433ms
Self CUDA time total: 17.337ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         5.67%     868.000us        73.06%      11.176ms      11.176ms       0.000us         0.00%      14.193ms      14.193ms             1
                                           aten::linear         0.29%      45.000us         5.98%     915.000us      61.000us       0.000us         0.00%      11.482ms     765.467us            15
                                            aten::addmm         3.58%     548.000us         5.00%     765.000us      51.000us      11.482ms        80.90%      11.482ms     765.467us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       8.045ms        56.68%       8.045ms     804.500us            10
                                             aten::relu         0.46%      70.000us         1.56%     239.000us      23.900us       0.000us         0.00%       1.992ms     199.200us            10
                                        aten::clamp_min         0.70%     107.000us         1.10%     169.000us      16.900us       1.992ms        14.04%       1.992ms     199.200us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.992ms        14.04%       1.992ms     199.200us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.700ms        11.98%       1.700ms     113.333us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.524ms        10.74%       1.524ms     381.000us             4
                                              aten::cat         0.54%      82.000us         0.75%     115.000us      23.000us     308.000us         2.17%     308.000us      61.600us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 15.298ms
Self CUDA time total: 14.193ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         5.12%     863.000us        73.67%      12.421ms      12.421ms       0.000us         0.00%      15.151ms      15.151ms             1
                                           aten::linear         0.30%      50.000us         5.41%     913.000us      60.867us       0.000us         0.00%      12.185ms     812.333us            15
                                            aten::addmm         3.19%     538.000us         4.50%     758.000us      50.533us      12.185ms        80.42%      12.185ms     812.333us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       8.511ms        56.17%       8.511ms     851.100us            10
                                             aten::relu         0.43%      72.000us         1.43%     241.000us      24.100us       0.000us         0.00%       2.191ms     219.100us            10
                                        aten::clamp_min         0.64%     108.000us         1.00%     169.000us      16.900us       2.191ms        14.46%       2.191ms     219.100us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.191ms        14.46%       2.191ms     219.100us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.794ms        11.84%       1.794ms     119.600us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.631ms        10.76%       1.631ms     407.750us             4
                                              aten::cat         0.46%      78.000us         0.66%     111.000us      22.200us     338.000us         2.23%     338.000us      67.600us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 16.861ms
Self CUDA time total: 15.151ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         6.15%     849.000us        73.77%      10.179ms      10.179ms       0.000us         0.00%      12.711ms      12.711ms             1
                                           aten::linear         0.30%      42.000us         6.70%     924.000us      61.600us       0.000us         0.00%      10.227ms     681.800us            15
                                            aten::addmm         4.00%     552.000us         5.63%     777.000us      51.800us      10.227ms        80.46%      10.227ms     681.800us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       7.154ms        56.28%       7.154ms     715.400us            10
                                             aten::relu         0.51%      71.000us         1.71%     236.000us      23.600us       0.000us         0.00%       1.837ms     183.700us            10
                                        aten::clamp_min         0.77%     106.000us         1.20%     165.000us      16.500us       1.837ms        14.45%       1.837ms     183.700us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.837ms        14.45%       1.837ms     183.700us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.514ms        11.91%       1.514ms     100.933us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.364ms        10.73%       1.364ms     341.000us             4
                                              aten::cat         0.57%      78.000us         0.81%     112.000us      22.400us     275.000us         2.16%     275.000us      55.000us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 13.798ms
Self CUDA time total: 12.711ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         6.08%     864.000us        73.54%      10.454ms      10.454ms       0.000us         0.00%      13.111ms      13.111ms             1
                                           aten::linear         0.30%      43.000us         6.37%     906.000us      60.400us       0.000us         0.00%      10.546ms     703.067us            15
                                            aten::addmm         3.85%     548.000us         5.35%     761.000us      50.733us      10.546ms        80.44%      10.546ms     703.067us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       7.280ms        55.53%       7.280ms     808.889us             9
                                             aten::relu         0.50%      71.000us         1.69%     240.000us      24.000us       0.000us         0.00%       1.901ms     190.100us            10
                                        aten::clamp_min         0.77%     110.000us         1.19%     169.000us      16.900us       1.901ms        14.50%       1.901ms     190.100us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.901ms        14.50%       1.901ms     190.100us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.562ms        11.91%       1.562ms     104.133us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.490ms        11.36%       1.490ms     298.000us             5
                                              aten::cat         0.54%      77.000us         0.77%     110.000us      22.000us     284.000us         2.17%     284.000us      56.800us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 14.216ms
Self CUDA time total: 13.111ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         4.05%     869.000us        74.11%      15.913ms      15.913ms       0.000us         0.00%      18.695ms      18.695ms             1
                                           aten::linear         0.21%      46.000us         4.35%     933.000us      62.200us       0.000us         0.00%      15.013ms       1.001ms            15
                                            aten::addmm         2.58%     553.000us         3.63%     779.000us      51.933us      15.013ms        80.30%      15.013ms       1.001ms            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us      10.491ms        56.12%      10.491ms       1.049ms            10
                                             aten::relu         0.40%      85.000us         1.19%     255.000us      25.500us       0.000us         0.00%       2.700ms     270.000us            10
                                        aten::clamp_min         0.50%     107.000us         0.79%     170.000us      17.000us       2.700ms        14.44%       2.700ms     270.000us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.700ms        14.44%       2.700ms     270.000us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       2.205ms        11.79%       2.205ms     147.000us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       2.029ms        10.85%       2.029ms     507.250us             4
                                              aten::cat         0.39%      83.000us         0.54%     117.000us      23.400us     451.000us         2.41%     451.000us      90.200us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 21.471ms
Self CUDA time total: 18.695ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         3.39%     885.000us        70.79%      18.464ms      18.464ms       0.000us         0.00%      24.999ms      24.999ms             1
                                           aten::linear         0.16%      43.000us         3.44%     898.000us      59.867us       0.000us         0.00%      20.018ms       1.335ms            15
                                            aten::addmm         2.01%     524.000us         2.87%     748.000us      49.867us      20.018ms        80.08%      20.018ms       1.335ms            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us      13.881ms        55.53%      13.881ms       1.542ms             9
                                             aten::relu         0.26%      69.000us         1.08%     282.000us      28.200us       0.000us         0.00%       3.664ms     366.400us            10
                                        aten::clamp_min         0.58%     151.000us         0.82%     213.000us      21.300us       3.664ms        14.66%       3.664ms     366.400us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       3.664ms        14.66%       3.664ms     366.400us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       2.932ms        11.73%       2.932ms     195.467us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       2.825ms        11.30%       2.825ms     565.000us             5
                                              aten::cat         0.29%      76.000us         0.42%     110.000us      22.000us     625.000us         2.50%     625.000us     125.000us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 26.083ms
Self CUDA time total: 24.999ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         5.70%     849.000us        73.27%      10.921ms      10.921ms       0.000us         0.00%      13.816ms      13.816ms             1
                                           aten::linear         0.34%      50.000us         6.25%     932.000us      62.133us       0.000us         0.00%      11.097ms     739.800us            15
                                            aten::addmm         3.71%     553.000us         5.21%     776.000us      51.733us      11.097ms        80.32%      11.097ms     739.800us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       7.661ms        55.45%       7.661ms     851.222us             9
                                             aten::relu         0.47%      70.000us         1.58%     236.000us      23.600us       0.000us         0.00%       2.015ms     201.500us            10
                                        aten::clamp_min         0.71%     106.000us         1.11%     166.000us      16.600us       2.015ms        14.58%       2.015ms     201.500us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.015ms        14.58%       2.015ms     201.500us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.636ms        11.84%       1.636ms     109.067us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.571ms        11.37%       1.571ms     314.200us             5
                                              aten::cat         0.48%      72.000us         0.71%     106.000us      21.200us     303.000us         2.19%     303.000us      60.600us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 14.906ms
Self CUDA time total: 13.816ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         5.03%     893.000us        74.92%      13.290ms      13.290ms       0.000us         0.00%      15.161ms      15.161ms             1
                                           aten::linear         0.25%      44.000us         5.17%     917.000us      61.133us       0.000us         0.00%      12.263ms     817.533us            15
                                            aten::addmm         3.09%     548.000us         4.35%     771.000us      51.400us      12.263ms        80.89%      12.263ms     817.533us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       8.578ms        56.58%       8.578ms     857.800us            10
                                             aten::relu         0.39%      69.000us         1.35%     239.000us      23.900us       0.000us         0.00%       2.124ms     212.400us            10
                                        aten::clamp_min         0.60%     107.000us         0.96%     170.000us      17.000us       2.124ms        14.01%       2.124ms     212.400us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.124ms        14.01%       2.124ms     212.400us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.812ms        11.95%       1.812ms     120.800us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.627ms        10.73%       1.627ms     406.750us             4
                                              aten::cat         0.44%      78.000us         0.63%     111.000us      22.200us     334.000us         2.20%     334.000us      66.800us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 17.740ms
Self CUDA time total: 15.161ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         3.87%     859.000us        71.15%      15.802ms      15.802ms       0.000us         0.00%      21.120ms      21.120ms             1
                                           aten::linear         0.20%      45.000us         4.10%     910.000us      60.667us       0.000us         0.00%      17.034ms       1.136ms            15
                                            aten::addmm         2.44%     543.000us         3.43%     761.000us      50.733us      17.034ms        80.65%      17.034ms       1.136ms            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us      11.818ms        55.96%      11.818ms       1.313ms             9
                                             aten::relu         0.32%      72.000us         1.09%     242.000us      24.200us       0.000us         0.00%       2.991ms     299.100us            10
                                        aten::clamp_min         0.49%     108.000us         0.77%     170.000us      17.000us       2.991ms        14.16%       2.991ms     299.100us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.991ms        14.16%       2.991ms     299.100us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       2.512ms        11.89%       2.512ms     167.467us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       2.381ms        11.27%       2.381ms     476.200us             5
                                              aten::cat         0.35%      78.000us         0.51%     113.000us      22.600us     498.000us         2.36%     498.000us      99.600us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 22.209ms
Self CUDA time total: 21.120ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         4.73%     868.000us        73.17%      13.438ms      13.438ms       0.000us         0.00%      16.671ms      16.671ms             1
                                           aten::linear         0.23%      43.000us         4.96%     911.000us      60.733us       0.000us         0.00%      13.477ms     898.467us            15
                                            aten::addmm         2.94%     540.000us         4.14%     761.000us      50.733us      13.477ms        80.84%      13.477ms     898.467us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       9.435ms        56.60%       9.435ms     943.500us            10
                                             aten::relu         0.40%      74.000us         1.34%     246.000us      24.600us       0.000us         0.00%       2.339ms     233.900us            10
                                        aten::clamp_min         0.60%     111.000us         0.94%     172.000us      17.200us       2.339ms        14.03%       2.339ms     233.900us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.339ms        14.03%       2.339ms     233.900us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.990ms        11.94%       1.990ms     132.667us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.796ms        10.77%       1.796ms     449.000us             4
                                              aten::cat         0.43%      79.000us         0.61%     112.000us      22.400us     375.000us         2.25%     375.000us      75.000us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 18.365ms
Self CUDA time total: 16.671ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         4.80%     914.000us        71.84%      13.689ms      13.689ms       0.000us         0.00%      17.962ms      17.962ms             1
                                           aten::linear         0.23%      44.000us         4.83%     921.000us      61.400us       0.000us         0.00%      14.511ms     967.400us            15
                                            aten::addmm         2.84%     542.000us         3.99%     761.000us      50.733us      14.511ms        80.79%      14.511ms     967.400us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us      10.067ms        56.05%      10.067ms       1.119ms             9
                                             aten::relu         0.38%      72.000us         1.25%     239.000us      23.900us       0.000us         0.00%       2.523ms     252.300us            10
                                        aten::clamp_min         0.56%     107.000us         0.88%     167.000us      16.700us       2.523ms        14.05%       2.523ms     252.300us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.523ms        14.05%       2.523ms     252.300us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       2.143ms        11.93%       2.143ms     142.867us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       2.032ms        11.31%       2.032ms     406.400us             5
                                              aten::cat         0.40%      77.000us         0.58%     110.000us      22.000us     412.000us         2.29%     412.000us      82.400us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 19.055ms
Self CUDA time total: 17.962ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         5.00%     899.000us        78.58%      14.142ms      14.142ms       0.000us         0.00%      13.453ms      13.453ms             1
                                           aten::linear         0.26%      46.000us         5.25%     945.000us      63.000us       0.000us         0.00%      10.885ms     725.667us            15
                                            aten::addmm         3.12%     562.000us         4.39%     790.000us      52.667us      10.885ms        80.91%      10.885ms     725.667us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       7.614ms        56.60%       7.614ms     761.400us            10
                                             aten::relu         0.41%      74.000us         1.34%     242.000us      24.200us       0.000us         0.00%       1.889ms     188.900us            10
                                        aten::clamp_min         0.59%     107.000us         0.93%     168.000us      16.800us       1.889ms        14.04%       1.889ms     188.900us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.889ms        14.04%       1.889ms     188.900us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.614ms        12.00%       1.614ms     107.600us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.440ms        10.70%       1.440ms     360.000us             4
                                              aten::cat         0.43%      78.000us         0.62%     111.000us      22.200us     287.000us         2.13%     287.000us      57.400us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 17.996ms
Self CUDA time total: 13.453ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         7.89%     869.000us        99.80%      10.997ms      10.997ms       0.000us         0.00%       9.488ms       9.488ms             1
                                           aten::linear         0.37%      41.000us         8.28%     912.000us      60.800us       0.000us         0.00%       7.683ms     512.200us            15
                                            aten::addmm         4.98%     549.000us         6.92%     763.000us      50.867us       7.683ms        80.98%       7.683ms     512.200us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       5.436ms        57.29%       5.436ms     494.182us            11
                                             aten::relu         0.66%      73.000us         2.21%     243.000us      24.300us       0.000us         0.00%       1.317ms     131.700us            10
                                        aten::clamp_min         1.01%     111.000us         1.54%     170.000us      17.000us       1.317ms        13.88%       1.317ms     131.700us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.317ms        13.88%       1.317ms     131.700us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.146ms        12.08%       1.146ms      76.400us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us     947.000us         9.98%     947.000us     315.667us             3
                                              aten::cat         0.73%      80.000us         1.03%     113.000us      22.600us     188.000us         1.98%     188.000us      37.600us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 11.019ms
Self CUDA time total: 9.488ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         4.36%     882.000us        71.51%      14.459ms      14.459ms       0.000us         0.00%      19.138ms      19.138ms             1
                                           aten::linear         0.23%      47.000us         4.54%     918.000us      61.200us       0.000us         0.00%      15.462ms       1.031ms            15
                                            aten::addmm         2.68%     541.000us         3.77%     763.000us      50.867us      15.462ms        80.79%      15.462ms       1.031ms            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us      10.713ms        55.98%      10.713ms       1.190ms             9
                                             aten::relu         0.35%      71.000us         1.19%     240.000us      24.000us       0.000us         0.00%       2.690ms     269.000us            10
                                        aten::clamp_min         0.54%     109.000us         0.84%     169.000us      16.900us       2.690ms        14.06%       2.690ms     269.000us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.690ms        14.06%       2.690ms     269.000us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       2.279ms        11.91%       2.279ms     151.933us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       2.162ms        11.30%       2.162ms     432.400us             5
                                              aten::cat         0.40%      81.000us         0.56%     114.000us      22.800us     442.000us         2.31%     442.000us      88.400us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 20.220ms
Self CUDA time total: 19.138ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         6.55%     888.000us        74.14%      10.053ms      10.053ms       0.000us         0.00%      12.455ms      12.455ms             1
                                           aten::linear         0.32%      43.000us         6.86%     930.000us      62.000us       0.000us         0.00%      10.071ms     671.400us            15
                                            aten::addmm         4.09%     554.000us         5.70%     773.000us      51.533us      10.071ms        80.86%      10.071ms     671.400us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       6.972ms        55.98%       6.972ms     774.667us             9
                                             aten::relu         0.55%      74.000us         1.78%     242.000us      24.200us       0.000us         0.00%       1.751ms     175.100us            10
                                        aten::clamp_min         0.77%     105.000us         1.24%     168.000us      16.800us       1.751ms        14.06%       1.751ms     175.100us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.751ms        14.06%       1.751ms     175.100us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.498ms        12.03%       1.498ms      99.867us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.409ms        11.31%       1.409ms     281.800us             5
                                              aten::cat         0.59%      80.000us         0.83%     113.000us      22.600us     264.000us         2.12%     264.000us      52.800us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 13.559ms
Self CUDA time total: 12.455ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         5.78%     889.000us        76.10%      11.705ms      11.705ms       0.000us         0.00%      13.002ms      13.002ms             1
                                           aten::linear         0.30%      46.000us         6.09%     937.000us      62.467us       0.000us         0.00%      10.504ms     700.267us            15
                                            aten::addmm         3.58%     550.000us         5.10%     784.000us      52.267us      10.504ms        80.79%      10.504ms     700.267us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       7.434ms        57.18%       7.434ms     675.818us            11
                                             aten::relu         0.46%      71.000us         1.59%     244.000us      24.400us       0.000us         0.00%       1.839ms     183.900us            10
                                        aten::clamp_min         0.73%     112.000us         1.12%     173.000us      17.300us       1.839ms        14.14%       1.839ms     183.900us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.839ms        14.14%       1.839ms     183.900us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.564ms        12.03%       1.564ms     104.267us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.315ms        10.11%       1.315ms     438.333us             3
                                              aten::cat         0.52%      80.000us         0.73%     113.000us      22.600us     276.000us         2.12%     276.000us      55.200us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 15.381ms
Self CUDA time total: 13.002ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         5.21%     872.000us        72.76%      12.187ms      12.187ms       0.000us         0.00%      15.660ms      15.660ms             1
                                           aten::linear         0.26%      43.000us         5.47%     916.000us      61.067us       0.000us         0.00%      12.659ms     843.933us            15
                                            aten::addmm         3.27%     547.000us         4.56%     763.000us      50.867us      12.659ms        80.84%      12.659ms     843.933us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       8.855ms        56.55%       8.855ms     885.500us            10
                                             aten::relu         0.46%      77.000us         1.48%     248.000us      24.800us       0.000us         0.00%       2.197ms     219.700us            10
                                        aten::clamp_min         0.67%     113.000us         1.02%     171.000us      17.100us       2.197ms        14.03%       2.197ms     219.700us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.197ms        14.03%       2.197ms     219.700us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       1.873ms        11.96%       1.873ms     124.867us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.690ms        10.79%       1.690ms     422.500us             4
                                              aten::cat         0.46%      77.000us         0.69%     115.000us      23.000us     349.000us         2.23%     349.000us      69.800us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 16.749ms
Self CUDA time total: 15.660ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         4.25%     893.000us        72.42%      15.219ms      15.219ms       0.000us         0.00%      19.318ms      19.318ms             1
                                           aten::linear         0.20%      42.000us         4.46%     938.000us      62.533us       0.000us         0.00%      15.600ms       1.040ms            15
                                            aten::addmm         2.68%     563.000us         3.74%     786.000us      52.400us      15.600ms        80.75%      15.600ms       1.040ms            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us      10.824ms        56.03%      10.824ms       1.203ms             9
                                             aten::relu         0.36%      75.000us         1.17%     246.000us      24.600us       0.000us         0.00%       2.721ms     272.100us            10
                                        aten::clamp_min         0.52%     109.000us         0.81%     171.000us      17.100us       2.721ms        14.09%       2.721ms     272.100us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.721ms        14.09%       2.721ms     272.100us            10
void at::native::elementwise_kernel<128, 2, at::nati...         0.00%       0.000us         0.00%       0.000us       0.000us       2.301ms        11.91%       2.301ms     153.400us            15
                                 ampere_sgemm_128x64_tn         0.00%       0.000us         0.00%       0.000us       0.000us       2.185ms        11.31%       2.185ms     437.000us             5
                                              aten::cat         0.38%      79.000us         0.54%     113.000us      22.600us     448.000us         2.32%     448.000us      89.600us             5
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 21.016ms
Self CUDA time total: 19.318ms

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                        model_inference         7.99%     869.000us        81.89%       8.907ms       8.907ms       0.000us         0.00%       9.767ms       9.767ms             1
                                           aten::linear         0.43%      47.000us        14.45%       1.572ms     104.800us       0.000us         0.00%       7.815ms     521.000us            15
                                            aten::addmm         5.33%     580.000us        13.04%       1.418ms      94.533us       7.815ms        80.01%       7.815ms     521.000us            15
                                 ampere_sgemm_32x128_tn         0.00%       0.000us         0.00%       0.000us       0.000us       5.448ms        55.78%       5.448ms     544.800us            10
                                             aten::relu         0.66%      72.000us         2.19%     238.000us      23.800us       0.000us         0.00%       1.454ms     145.400us            10
                                        aten::clamp_min         0.97%     105.000us         1.53%     166.000us      16.600us       1.454ms        14.89%       1.454ms     145.400us            10
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.454ms        14.89%       1.454ms     145.400us            10
void at::native::elementwise_kernel<128, 2, at::nati...
0.00% 0.000us 0.00% 0.000us 0.000us 1.153ms 11.81% 1.153ms 76.867us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.051ms 10.76% 1.051ms 262.750us 4 aten::cat 0.75% 82.000us 1.07% 116.000us 23.200us 201.000us 2.06% 201.000us 40.200us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 10.877ms Self CUDA time total: 9.767ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 3.65% 879.000us 70.72% 17.052ms 17.052ms 0.000us 0.00% 23.029ms 23.029ms 1 aten::linear 0.18% 43.000us 3.75% 905.000us 60.333us 0.000us 0.00% 18.344ms 1.223ms 15 aten::addmm 2.20% 530.000us 3.13% 755.000us 50.333us 18.344ms 79.66% 18.344ms 1.223ms 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 12.718ms 55.23% 12.718ms 1.413ms 9 aten::relu 0.30% 73.000us 1.06% 255.000us 25.500us 0.000us 0.00% 3.472ms 347.200us 10 aten::clamp_min 0.50% 121.000us 0.75% 182.000us 18.200us 3.472ms 15.08% 3.472ms 347.200us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 3.472ms 15.08% 3.472ms 347.200us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 2.659ms 11.55% 2.659ms 177.267us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 2.604ms 11.31% 2.604ms 520.800us 5 aten::cat 0.32% 76.000us 0.46% 111.000us 22.200us 579.000us 2.51% 579.000us 115.800us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 24.112ms Self CUDA time total: 23.029ms ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ model_inference 4.34% 892.000us 75.18% 15.445ms 15.445ms 0.000us 0.00% 17.181ms 17.181ms 1 aten::linear 0.20% 41.000us 4.54% 932.000us 62.133us 0.000us 0.00% 13.734ms 915.600us 15 aten::addmm 2.67% 548.000us 3.76% 773.000us 51.533us 13.734ms 79.94% 13.734ms 915.600us 15 ampere_sgemm_32x128_tn 0.00% 0.000us 0.00% 0.000us 0.000us 9.500ms 55.29% 9.500ms 1.056ms 9 aten::relu 0.36% 73.000us 1.20% 246.000us 24.600us 0.000us 0.00% 2.559ms 255.900us 10 aten::clamp_min 0.55% 112.000us 0.84% 173.000us 17.300us 2.559ms 14.89% 2.559ms 255.900us 10 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.559ms 14.89% 2.559ms 255.900us 10 void at::native::elementwise_kernel<128, 2, at::nati... 
0.00% 0.000us 0.00% 0.000us 0.000us 1.997ms 11.62% 1.997ms 133.133us 15 ampere_sgemm_128x64_tn 0.00% 0.000us 0.00% 0.000us 0.000us 1.951ms 11.36% 1.951ms 390.200us 5 aten::cat 0.38% 78.000us 0.54% 111.000us 22.200us 402.000us 2.34% 402.000us 80.400us 5 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Self CPU time total: 20.543ms Self CUDA time total: 17.181ms