Techniques are described herein for a method of training a neural network toward a target activation sparsity. The method includes obtaining the nonlinear portion of a plurality of neurons in a neural network that has been trained to perform a target task. The method further includes replacing that nonlinear portion of the plurality of neurons with a dynamic nonlinear portion, which is trained to activate or deactivate one or more of the plurality of neurons. The method further includes retraining the neural network using a first loss function that minimizes the loss on the target task and a second loss function that minimizes the number of active neurons.
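The following is a minimal, non-authoritative sketch of the idea, written in plain Python so it has no dependencies. It rests on several assumptions that the abstract does not specify. The "dynamic nonlinear portion" is modeled as a ReLU whose output is scaled by a learnable sigmoid gate. The second loss is approximated by `lam * sum(gates)`, a differentiable soft count of active neurons. For simplicity, only the gate parameters are retrained while the pretrained weights stay frozen. A neuron counts as active when its gate exceeds 0.5. All names (`W`, `B`, `V`, `forward`, `retrain_gates`, `lam`) are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

# Hypothetical "pretrained" hidden layer: neuron i contributes V[i] * relu(W[i] * x + B[i]).
W = [1.0, 1.0, -1.0]
B = [0.0, 0.0, 0.0]
V = [1.0, 0.01, 0.01]

def forward(x, gate_logits):
    """Assumed dynamic nonlinearity: each ReLU output is scaled by a gate in (0, 1)."""
    gates = [sigmoid(a) for a in gate_logits]
    pre = [relu(w * x + b) for w, b in zip(W, B)]
    y_hat = sum(g * v * r for g, v, r in zip(gates, V, pre))
    return y_hat, gates, pre

def retrain_gates(xs, ys, lam=0.1, lr=0.5, steps=2000):
    """Gradient descent on gate logits only (simplifying assumption: weights frozen)."""
    gate_logits = [2.0] * len(W)  # start with every neuron mostly on
    n = len(xs)
    for _ in range(steps):
        grad_g = [0.0] * len(W)  # d(total loss)/d(gate_i)
        for x, y in zip(xs, ys):
            y_hat, _, pre = forward(x, gate_logits)
            err = y_hat - y
            for i in range(len(W)):
                # First loss: mean squared error on the target task.
                grad_g[i] += 2.0 * err * V[i] * pre[i] / n
        gates = [sigmoid(a) for a in gate_logits]
        for i in range(len(W)):
            # Second loss: lam * sum(gates), whose derivative w.r.t. gate_i is lam.
            total = grad_g[i] + lam
            # Chain rule through the sigmoid: d(gate)/d(logit) = g * (1 - g).
            gate_logits[i] -= lr * total * gates[i] * (1.0 - gates[i])
    return [sigmoid(a) for a in gate_logits]

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [relu(x) for x in xs]  # toy target task: reproduce relu(x)
gates = retrain_gates(xs, ys)
active = [i for i, g in enumerate(gates) if g > 0.5]
print(active)  # [0]
```

In this toy setup, neuron 0 alone solves the task, so the task loss keeps its gate open. The sparsity term drives the gates of the two near-redundant neurons toward zero. The weight `lam` trades task accuracy against the number of active neurons: here the surviving gate settles near `1 - lam / 2` rather than 1.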