Latest the news

the

The Cascade Transformer: an Application for Efficient Answer Sentence Selection. (arXiv:2005.02534v2 [cs.CL] UPDATED)

By arxiv.org
Published On ::

Large transformer-based language models have been shown to be very effective in many classification tasks. However, their computational complexity prevents their use in applications requiring the classification of a large set of candidates. While previous works have investigated approaches to reduce model size, relatively little attention has been paid to techniques to improve batch throughput during inference. In this paper, we introduce the Cascade Transformer, a simple yet effective technique to adapt transformer-based models into a cascade of rankers. Each ranker is used to prune a subset of candidates in a batch, thus dramatically increasing throughput at inference time. Partial encodings from the transformer model are shared among rerankers, providing further speed-up. When compared to a state-of-the-art transformer model, our approach reduces computation by 37% with almost no impact on accuracy, as measured on two English Question Answering datasets.

The Cascade Transformer: an Application for Efficient Answer Sentence Selection. (arXiv:2005.02534v2 [cs.CL] UPDATED)

On the list recoverability of randomly punctured codes. (arXiv:2005.02478v2 [math.CO] UPDATED)

The Sensitivity of Language Models and Humans to Winograd Schema Perturbations. (arXiv:2005.01348v2 [cs.CL] UPDATED)

Jealousy-freeness and other common properties in Fair Division of Mixed Manna. (arXiv:2004.11469v2 [cs.GT] UPDATED)

On the regularity of De Bruijn multigrids. (arXiv:2004.10128v2 [cs.DM] UPDATED)

The growth rate over trees of any family of set defined by a monadic second order formula is semi-computable. (arXiv:2004.06508v3 [cs.DM] UPDATED)

Mathematical Formulae in Wikimedia Projects 2020. (arXiv:2003.09417v2 [cs.DL] UPDATED)

Watching the World Go By: Representation Learning from Unlabeled Videos. (arXiv:2003.07990v2 [cs.CV] UPDATED)

Toward Improving the Evaluation of Visual Attention Models: a Crowdsourcing Approach. (arXiv:2002.04407v2 [cs.CV] UPDATED)

Provenance for the Description Logic ELHr. (arXiv:2001.07541v2 [cs.LO] UPDATED)

Towards a Proof of the Fourier--Entropy Conjecture?. (arXiv:1911.10579v2 [cs.DM] UPDATED)

The Mapillary Traffic Sign Dataset for Detection and Classification on a Global Scale. (arXiv:1909.04422v2 [cs.CV] UPDATED)

Over-the-Air Computation Systems: Optimization, Analysis and Scaling Laws. (arXiv:1909.00329v2 [cs.IT] UPDATED)

Numerical study on the effect of geometric approximation error in the numerical solution of PDEs using a high-order curvilinear mesh. (arXiv:1908.09917v2 [math.NA] UPDATED)

Establishing the Quantum Supremacy Frontier with a 281 Pflop/s Simulation. (arXiv:1905.00444v2 [quant-ph] UPDATED)

On analog quantum algorithms for the mixing of Markov chains. (arXiv:1904.11895v2 [quant-ph] UPDATED)

A Fast and Accurate Algorithm for Spherical Harmonic Analysis on HEALPix Grids with Applications to the Cosmic Microwave Background Radiation. (arXiv:1904.10514v4 [math.NA] UPDATED)

Keeping out the Masses: Understanding the Popularity and Implications of Internet Paywalls. (arXiv:1903.01406v4 [cs.CY] UPDATED)

Asymptotic expansions of eigenvalues by both the Crouzeix-Raviart and enriched Crouzeix-Raviart elements. (arXiv:1902.09524v2 [math.NA] UPDATED)

Performance of the smallest-variance-first rule in appointment sequencing. (arXiv:1812.01467v4 [math.PR] UPDATED)

Using hierarchical matrices in the solution of the time-fractional heat equation by multigrid waveform relaxation. (arXiv:1706.07632v3 [math.NA] UPDATED)

The Zhou Ordinal of Labelled Markov Processes over Separable Spaces. (arXiv:2005.03630v1 [cs.LO])

Seismic Shot Gather Noise Localization Using a Multi-Scale Feature-Fusion-Based Neural Network. (arXiv:2005.03626v1 [cs.CV])

COVID-19 Contact-tracing Apps: A Survey on the Global Deployment and Challenges. (arXiv:2005.03599v1 [cs.CR])

A Local Spectral Exterior Calculus for the Sphere and Application to the Shallow Water Equations. (arXiv:2005.03598v1 [math.NA])

A Tale of Two Perplexities: Sensitivity of Neural Language Models to Lexical Retrieval Deficits in Dementia of the Alzheimer's Type. (arXiv:2005.03593v1 [cs.CL])

GeoLogic -- Graphical interactive theorem prover for Euclidean geometry. (arXiv:2005.03586v1 [cs.LO])

Credulous Users and Fake News: a Real Case Study on the Propagation in Twitter. (arXiv:2005.03550v1 [cs.SI])

The Danish Gigaword Project. (arXiv:2005.03521v1 [cs.CL])

Subtle Sensing: Detecting Differences in the Flexibility of Virtually Simulated Molecular Objects. (arXiv:2005.03503v1 [cs.HC])

Heidelberg Colorectal Data Set for Surgical Data Science in the Sensor Operating Room. (arXiv:2005.03501v1 [cs.CV])

Text Recognition in the Wild: A Survey. (arXiv:2005.03492v1 [cs.CV])

Successfully Applying the Stabilized Lottery Ticket Hypothesis to the Transformer Architecture. (arXiv:2005.03454v1 [cs.LG])

A combination of 'pooling' with a prediction model can reduce by 73% the number of COVID-19 (Corona-virus) tests. (arXiv:2005.03453v1 [cs.LG])

Dirichlet spectral-Galerkin approximation method for the simply supported vibrating plate eigenvalues. (arXiv:2005.03433v1 [math.NA])

The Perceptimatic English Benchmark for Speech Perception Models. (arXiv:2005.03418v1 [cs.CL])

Detection and Feeder Identification of the High Impedance Fault at Distribution Networks Based on Synchronous Waveform Distortions. (arXiv:2005.03411v1 [eess.SY])

Energy-efficient topology to enhance the wireless sensor network lifetime using connectivity control. (arXiv:2005.03370v1 [cs.NI])

Error estimates for the Cahn--Hilliard equation with dynamic boundary conditions. (arXiv:2005.03349v1 [math.NA])

Scene Text Image Super-Resolution in the Wild. (arXiv:2005.03341v1 [cs.CV])

Encoding in the Dark Grand Challenge: An Overview. (arXiv:2005.03315v1 [eess.IV])

On the unique solution of the generalized absolute value equation. (arXiv:2005.03287v1 [math.NA])

Structured inversion of the Bernstein-Vandermonde Matrix. (arXiv:2005.03251v1 [math.NA])

Conley's fundamental theorem for a class of hybrid systems. (arXiv:2005.03217v1 [math.DS])

An Optimal Control Theory for the Traveling Salesman Problem and Its Variants. (arXiv:2005.03186v1 [math.OC])

Avoiding 5/4-powers on the alphabet of nonnegative integers. (arXiv:2005.03158v1 [math.CO])

On the Learnability of Possibilistic Theories. (arXiv:2005.03157v1 [cs.LO])

Optimally Convergent Mixed Finite Element Methods for the Stochastic Stokes Equations. (arXiv:2005.03148v1 [math.NA])

A Separation Theorem for Joint Sensor and Actuator Scheduling with Guaranteed Performance Bounds. (arXiv:2005.03143v1 [eess.SY])

Diagnosing the Environment Bias in Vision-and-Language Navigation. (arXiv:2005.03086v1 [cs.CL])

The finish line: Attachment of Signs

The Finish Line: Katrina One Year After

The Finish Line: Cast Stone and EIFS

The Finish Line: Changing Stucco to EIFS

The Finish Line: A Case Study: What is Causing This?

The Finish Line: All About Rust

The Finish Line: Backwrapping vs. Edgewrapping

The Finish Line: Cleaning EIFS

The Finish Line: Floor Line Joints

The Finish Line: FAQ's About EIFS Part 1

The Finish Line: Drainage Efficiency

The Finish Line: Earthquakes and EIFS

The Finish Line: Types of EIFS

The Finish Line: Eco-Friendliness of EIFS

The Finish Line: Foam Shapes Revisited

Subscribe To Our Newsletter