
How to fix ‘CUDA error: device-side assert triggered’?


As easy as modern-day programming has become, developers still have to jump through several hoops to figure out what's causing bugs in their code, especially when they're working with bleeding-edge technologies like AI, ML or computer vision.

In this article, we're looking at the “CUDA error: device-side assert triggered” error that can appear when running Python and PyTorch code on a GPU.

What causes this error?

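The error means that a check (an assert) inside a CUDA kernel running on the GPU failed, usually because the kernel received an invalid value, such as a class index outside the valid range. Because CUDA runs operations asynchronously, the error is often reported at a later line than the one that actually caused it. In practice, it mainly comes down to one of two reasons: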

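  • The number of labels/classes doesn't match the number of output units in your model's final layer.
  • The input to the loss function is outside the range it expects, for example raw scores passed to a loss that expects probabilities.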


How to fix this?

You can try out the following fixes. 

Match output units with the number of classes

First, check that the number of classes in your dataset matches the number of output units in your model's final layer. For example, if the final layer has 100 output units, the valid class labels are 0 to 99, so any label of 100 or higher will trigger this error. To fix it, set the classifier's output size (for example, out_features of the final nn.Linear layer) to the number of classes in your dataset.
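Here is a minimal sketch of this check, assuming a classification model trained with nn.CrossEntropyLoss. NUM_CLASSES, IN_FEATURES and check_labels are hypothetical names for illustration, not part of PyTorch. The sketch sizes the final layer from the number of classes and validates each batch of labels before computing the loss, so a bad label raises a readable Python error instead of a device-side assert.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 10  # hypothetical: number of distinct classes in your dataset
IN_FEATURES = 32  # hypothetical: size of the features feeding the classifier

# The final layer must produce exactly one output unit per class.
classifier = nn.Linear(IN_FEATURES, NUM_CLASSES)
criterion = nn.CrossEntropyLoss()


def check_labels(labels: torch.Tensor, num_classes: int) -> None:
    """Hypothetical helper: raise a readable error if any label is outside [0, num_classes)."""
    lo, hi = int(labels.min()), int(labels.max())
    if lo < 0 or hi >= num_classes:
        raise ValueError(
            f"labels span [{lo}, {hi}], but the model has {num_classes} "
            f"output units, so valid labels are 0 to {num_classes - 1}"
        )


features = torch.randn(4, IN_FEATURES)
labels = torch.tensor([0, 3, 9, 1])

check_labels(labels, NUM_CLASSES)  # passes: every label is in 0..9
loss = criterion(classifier(features), labels)
print(loss.item())
```

Calling check_labels on a batch that contains the label 10 raises a ValueError before the loss is ever computed, which is much easier to read than the GPU assert.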

Fix the loss function input

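Make sure that your output layer returns values in the range your chosen loss function (also known as a criterion) expects. That usually means using the appropriate activation function (Sigmoid, Softmax or LogSoftmax) in your final output layer, or none at all. For example, nn.BCELoss expects probabilities between 0 and 1 (Sigmoid output), nn.NLLLoss expects log-probabilities (LogSoftmax output), while nn.CrossEntropyLoss and nn.BCEWithLogitsLoss apply the activation themselves and take raw outputs.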
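The sketch below pairs each activation with the PyTorch loss that expects its output, using random scores on the CPU. The tensor shapes and values are arbitrary and only there for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Multi-class case: 4 samples, 3 classes, integer labels starting at 0.
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 2])

# CrossEntropyLoss applies LogSoftmax internally, so it takes raw logits...
ce = nn.CrossEntropyLoss()(logits, targets)
# ...while NLLLoss expects log-probabilities, i.e. LogSoftmax output.
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

# Binary case: float targets in [0, 1].
bin_logits = torch.randn(4, 1)
bin_targets = torch.tensor([[1.0], [0.0], [1.0], [0.0]])

# BCELoss expects probabilities in [0, 1], so apply Sigmoid first...
bce = nn.BCELoss()(torch.sigmoid(bin_logits), bin_targets)
# ...or skip the Sigmoid and let BCEWithLogitsLoss apply it internally.
bce_logits = nn.BCEWithLogitsLoss()(bin_logits, bin_targets)

print(ce.item(), nll.item())          # the two values match
print(bce.item(), bce_logits.item())  # these match up to float rounding
```

Passing bin_logits straight to nn.BCELoss is exactly the kind of mismatch described above: on the CPU, values outside [0, 1] raise a RuntimeError, and on the GPU the same mistake can surface as the device-side assert.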

You can experiment with all three functions, but it's quicker to check which input your loss function expects and pick the matching activation. Behaviour can also differ between devices: a bug that shows up as a generic device-side assert on the GPU often produces a much more descriptive error on the CPU, so running the failing computation on the CPU is a quick way to see what's actually wrong.
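A minimal sketch of that CPU check, assuming a loss function that takes (outputs, targets). debug_loss_on_cpu is a hypothetical helper, not part of PyTorch. Run it on a suspect batch before the GPU loss is computed: once a device-side assert has fired, the CUDA context is unusable and even copying tensors back to the CPU will fail.

```python
import torch
import torch.nn.functional as F


def debug_loss_on_cpu(loss_fn, outputs: torch.Tensor, targets: torch.Tensor):
    """Hypothetical helper: recompute loss_fn on CPU copies to surface a readable error."""
    try:
        return loss_fn(outputs.detach().cpu(), targets.detach().cpu())
    except (IndexError, RuntimeError, ValueError) as err:
        print(f"CPU check failed: {err}")
        print(
            f"outputs shape={tuple(outputs.shape)}, "
            f"targets min={targets.min().item()}, max={targets.max().item()}"
        )
        raise


# A 3-class model output paired with an invalid label (5) in the targets.
outputs = torch.randn(2, 3)
targets = torch.tensor([0, 5])
try:
    debug_loss_on_cpu(F.cross_entropy, outputs, targets)
except (IndexError, RuntimeError):
    pass  # on the CPU this is a readable "target out of bounds" style error
```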

Check the ground-truth label index

Make sure your ground-truth labels are indexed correctly. If your labels start at 1, subtract 1 from every label so they start at 0. This should fix the problem for you.

Keep this in mind as a general rule: just as array indexes start from zero, your class indices should also start from zero, because losses like nn.CrossEntropyLoss use each label as an index into the model's output.
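A minimal sketch of the shift, assuming a hypothetical dataset whose five classes are numbered 1 to 5 (NUM_CLASSES and the label values are illustrative). Apply the shift once to the whole label set rather than per batch. The second part goes slightly further: if your class IDs aren't consecutive at all, torch.unique with return_inverse=True maps them onto 0 to K-1.

```python
import torch

NUM_CLASSES = 5  # hypothetical: classes numbered 1..5 in the raw dataset

# Labels as they come from a dataset that numbers its classes from 1.
raw_labels = torch.tensor([1, 5, 3, 2, 4, 1])

# Subtract 1 from every label so the classes become 0..4.
labels = raw_labels - 1

# Sanity check: every label must now be a valid index into the model's outputs.
assert int(labels.min()) >= 0 and int(labels.max()) < NUM_CLASSES
print(labels.tolist())  # [0, 4, 2, 1, 3, 0]

# Non-consecutive class IDs (hypothetical example): remap to 0..K-1.
ids = torch.tensor([3, 9, 7, 3])
classes, remapped = torch.unique(ids, sorted=True, return_inverse=True)
print(classes.tolist(), remapped.tolist())  # [3, 7, 9] [0, 2, 1, 0]
```

Keep the classes tensor around if you need to translate model predictions back to the original IDs.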

Further troubleshooting

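If the fixes mentioned above didn't solve the problem for you, run your script again with the CUDA_LAUNCH_BLOCKING=1 environment variable set. CUDA normally runs operations asynchronously, so the error can be reported far from the line that caused it; forcing synchronous execution gives you an accurate stack trace that points at the failing operation. From there, the error message should tell you what to investigate next.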
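One way to set the variable from inside the script is sketched below. The key assumption is that it runs before PyTorch initializes CUDA, so the safest place is the very top of the script, before importing torch.

```python
import os

# Make every CUDA kernel launch synchronous so the traceback points at the
# operation that actually failed. This must happen before PyTorch initializes
# CUDA, so set it before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402  (imported after the variable is set, on purpose)
```

From a shell, CUDA_LAUNCH_BLOCKING=1 python train.py does the same without editing the code (train.py stands in for your own script). Synchronous launches slow training down, so only use this while debugging.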


Yadullah Abidi


Yadullah is a Computer Science graduate who writes/edits/shoots/codes all things cybersecurity, gaming, and tech hardware. When he's not, he streams himself racing virtual cars. He's been writing and reporting on tech and cybersecurity with websites like Candid.Technology and MakeUseOf since 2018.