News
Pruning algorithms and sparse matrix-matrix multiplication (SpMM) kernels have been widely studied to accelerate DNN inference on GPUs. However, unstructured pruning spoils the regularity of data ...
To tackle these challenges, we propose a classification method based on candidate pseudo labels pruning and dual mixture of experts (CPLP-DMoEs). First, we employ the multihead mixture of bands (MMoBs ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results