Analyzing ReLUfication Limitations: Enhancing LLM Sparsity via Up Projection
Table of Links

- Abstract and 1. Introduction
- Related Work and Background
- Analysis
  - 3.1 Limitations of Existing ReLUfication
  - 3.2 dReLU
- Are Neurons in Experts Still Sparsely Activated?
- dReLU Sparsification Experiments
- Results
  - 6.1 Downstream Tasks Performance
  - 6.2 Sparsity of Sparsified Models
- Practical Inference Speedup Evaluation
  - 7.1 Experiments Setting
  - 7.2 Pure CPU Inference and 7.3 Hybrid GPU-CPU Inference
  - 7.4 Deploying LLMs on Mobile Phones
- Conclusion and References
- A. Appendix / Supplemental Material
- B. Limitation
- C. Broader Impact

3 Analysis

3.1 Limitations […]