Term Quantization: Furthering Quantization at Run Time
Event Type
Paper
Tags
Data Analytics, Compression, and Management
Linear Algebra
Machine Learning, Deep Learning and Artificial Intelligence
Registration Categories
TP
Time
Thursday, 19 November 2020, 2pm - 2:30pm EDT
Location
Track 4
Description
We present a novel technique, called Term Quantization (TQ), that furthers quantization at run time to improve the computational efficiency of deep neural networks (DNNs) already quantized with conventional quantization methods. TQ operates on the power-of-two terms in the binary expansions of values. When computing a dot product, TQ dynamically selects a fixed number of the largest terms to use from the values of the two vectors. By exploiting the weight and data distributions typically present in DNNs, TQ has minimal impact on DNN model performance (e.g., accuracy or perplexity). We use TQ to facilitate tightly synchronized processor arrays, such as systolic arrays, for efficient parallel processing. We evaluate TQ on an MLP for MNIST, multiple CNNs for ImageNet, and an LSTM for Wikitext-2. We demonstrate significant reductions in inference computation costs (between 3x and 10x) compared to conventional uniform quantization at the same level of model performance.
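The description above says TQ truncates dot-product operands to a fixed budget of power-of-two terms at run time. The sketch below is only an illustrative reading of that idea under simplifying assumptions, not the paper's implementation: it applies a per-value term budget k (the paper's selection may instead budget terms across a group of values), and the helper names keep_top_terms and tq_dot are hypothetical.

```python
import numpy as np

def keep_top_terms(x: int, k: int) -> int:
    """Keep only the k largest power-of-two terms (set bits) of a
    non-negative integer; all smaller terms are dropped."""
    kept = 0
    for _ in range(k):
        if x == 0:
            break
        msb = 1 << (x.bit_length() - 1)  # largest remaining power-of-two term
        kept += msb
        x -= msb
    return kept

def tq_dot(a, b, k: int) -> int:
    """Dot product in which each (already uniformly quantized) operand is
    truncated at run time to its k largest power-of-two terms."""
    return sum(keep_top_terms(int(ai), k) * keep_top_terms(int(bi), k)
               for ai, bi in zip(a, b))

# 8-bit uniformly quantized values, with a budget of 2 terms per value.
a = np.array([200, 3, 77, 18], dtype=np.uint8)
b = np.array([129, 64, 5, 250], dtype=np.uint8)
print("TQ dot:", tq_dot(a, b, k=2),
      "| exact dot:", int(np.dot(a.astype(int), b.astype(int))))
```

In this simplified view, the truncated operands carry at most k nonzero terms each, so a dot product costs at most k*k single-term (shift-and-add) multiplications per element pair rather than a full multi-bit multiply.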