Latest si news

The Sensitivity of Language Models and Humans to Winograd Schema Perturbations. (arXiv:2005.01348v2 [cs.CL] UPDATED)

By arxiv.org
Published On ::

Large-scale pretrained language models are the major driving force behind recent improvements in performance on the Winograd Schema Challenge, a widely employed test of common sense reasoning ability. We show, however, with a new diagnostic dataset, that these models are sensitive to linguistic perturbations of the Winograd examples that minimally affect human understanding. Our results highlight interesting differences between humans and language models: language models are more sensitive to number or gender alternations and synonym replacements than humans, and humans are more stable and consistent in their predictions, maintain a much higher absolute performance, and perform better on non-associative instances than associative ones. Overall, humans are correct more often than out-of-the-box models, and the models are sometimes right for the wrong reasons. Finally, we show that fine-tuning on a large, task-specific dataset can offer a solution to these issues.

The Sensitivity of Language Models and Humans to Winograd Schema Perturbations. (arXiv:2005.01348v2 [cs.CL] UPDATED)

Prediction of Event Related Potential Speller Performance Using Resting-State EEG. (arXiv:2005.01325v3 [cs.HC] UPDATED)

Quantum arithmetic operations based on quantum Fourier transform on signed integers. (arXiv:2005.00443v2 [cs.IT] UPDATED)

Teaching Cameras to Feel: Estimating Tactile Physical Properties of Surfaces From Images. (arXiv:2004.14487v2 [cs.CV] UPDATED)

Self-Attention with Cross-Lingual Position Representation. (arXiv:2004.13310v2 [cs.CL] UPDATED)

Jealousy-freeness and other common properties in Fair Division of Mixed Manna. (arXiv:2004.11469v2 [cs.GT] UPDATED)

Warwick Image Forensics Dataset for Device Fingerprinting In Multimedia Forensics. (arXiv:2004.10469v2 [cs.CV] UPDATED)

SPECTER: Document-level Representation Learning using Citation-informed Transformers. (arXiv:2004.07180v3 [cs.CL] UPDATED)

Transfer Learning for EEG-Based Brain-Computer Interfaces: A Review of Progress Made Since 2016. (arXiv:2004.06286v3 [cs.HC] UPDATED)

Decoding EEG Rhythms During Action Observation, Motor Imagery, and Execution for Standing and Sitting. (arXiv:2004.04107v2 [cs.HC] UPDATED)

PACT: Privacy Sensitive Protocols and Mechanisms for Mobile Contact Tracing. (arXiv:2004.03544v4 [cs.CR] UPDATED)

Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms. (arXiv:2004.00526v2 [eess.AS] UPDATED)

Subgraph densities in a surface. (arXiv:2003.13777v2 [math.CO] UPDATED)

Hierarchical Neural Architecture Search for Single Image Super-Resolution. (arXiv:2003.04619v2 [cs.CV] UPDATED)

Trees and Forests in Nuclear Physics. (arXiv:2002.10290v2 [nucl-th] UPDATED)

Recursed is not Recursive: A Jarring Result. (arXiv:2002.05131v2 [cs.AI] UPDATED)

Continuous speech separation: dataset and analysis. (arXiv:2001.11482v3 [cs.SD] UPDATED)

Evolutionary Dynamics of Higher-Order Interactions. (arXiv:2001.10313v2 [physics.soc-ph] UPDATED)

Intra-Variable Handwriting Inspection Reinforced with Idiosyncrasy Analysis. (arXiv:1912.12168v2 [cs.CV] UPDATED)

SCAttNet: Semantic Segmentation Network with Spatial and Channel Attention Mechanism for High-Resolution Remote Sensing Images. (arXiv:1912.09121v2 [cs.CV] UPDATED)

Robustly Clustering a Mixture of Gaussians. (arXiv:1911.11838v5 [cs.DS] UPDATED)

t-SS3: a text classifier with dynamic n-grams for early risk detection over text streams. (arXiv:1911.06147v2 [cs.CL] UPDATED)

Unsupervised Domain Adaptation on Reading Comprehension. (arXiv:1911.06137v4 [cs.CL] UPDATED)

Revisiting Semantics of Interactions for Trace Validity Analysis. (arXiv:1911.03094v2 [cs.SE] UPDATED)

Biologic and Prognostic Feature Scores from Whole-Slide Histology Images Using Deep Learning. (arXiv:1910.09100v4 [q-bio.QM] UPDATED)

Imitation Learning for Human-robot Cooperation Using Bilateral Control. (arXiv:1909.13018v2 [cs.RO] UPDATED)

Box Covers and Domain Orderings for Beyond Worst-Case Join Processing. (arXiv:1909.12102v2 [cs.DB] UPDATED)

The Mapillary Traffic Sign Dataset for Detection and Classification on a Global Scale. (arXiv:1909.04422v2 [cs.CV] UPDATED)

Over-the-Air Computation Systems: Optimization, Analysis and Scaling Laws. (arXiv:1909.00329v2 [cs.IT] UPDATED)

Numerical study on the effect of geometric approximation error in the numerical solution of PDEs using a high-order curvilinear mesh. (arXiv:1908.09917v2 [math.NA] UPDATED)

A Shift Selection Strategy for Parallel Shift-Invert Spectrum Slicing in Symmetric Self-Consistent Eigenvalue Computation. (arXiv:1908.06043v2 [math.NA] UPDATED)

Single use register automata for data words. (arXiv:1907.10504v2 [cs.FL] UPDATED)

Establishing the Quantum Supremacy Frontier with a 281 Pflop/s Simulation. (arXiv:1905.00444v2 [quant-ph] UPDATED)

A Fast and Accurate Algorithm for Spherical Harmonic Analysis on HEALPix Grids with Applications to the Cosmic Microwave Background Radiation. (arXiv:1904.10514v4 [math.NA] UPDATED)

Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems. (arXiv:1904.08962v3 [cs.SY] UPDATED)

Asymptotic expansions of eigenvalues by both the Crouzeix-Raviart and enriched Crouzeix-Raviart elements. (arXiv:1902.09524v2 [math.NA] UPDATED)

SilhoNet: An RGB Method for 6D Object Pose Estimation. (arXiv:1809.06893v4 [cs.CV] UPDATED)

Identifying Compromised Accounts on Social Media Using Statistical Text Analysis. (arXiv:1804.07247v3 [cs.SI] UPDATED)

Using hierarchical matrices in the solution of the time-fractional heat equation by multigrid waveform relaxation. (arXiv:1706.07632v3 [math.NA] UPDATED)

Compression, inversion, and approximate PCA of dense kernel matrices at near-linear computational complexity. (arXiv:1706.02205v4 [math.NA] UPDATED)

Seismic Shot Gather Noise Localization Using a Multi-Scale Feature-Fusion-Based Neural Network. (arXiv:2005.03626v1 [cs.CV])

Real-Time Context-aware Detection of Unsafe Events in Robot-Assisted Surgery. (arXiv:2005.03611v1 [cs.RO])

Delayed approximate matrix assembly in multigrid with dynamic precisions. (arXiv:2005.03606v1 [cs.MS])

A Tale of Two Perplexities: Sensitivity of Neural Language Models to Lexical Retrieval Deficits in Dementia of the Alzheimer's Type. (arXiv:2005.03593v1 [cs.CL])

Simulating Population Protocols in Sub-Constant Time per Interaction. (arXiv:2005.03584v1 [cs.DS])

A Reduced Basis Method For Fractional Diffusion Operators II. (arXiv:2005.03574v1 [math.NA])

Credulous Users and Fake News: a Real Case Study on the Propagation in Twitter. (arXiv:2005.03550v1 [cs.SI])

MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis. (arXiv:2005.03545v1 [cs.CL])

Sunny Pointer: Designing a mouse pointer for people with peripheral vision loss. (arXiv:2005.03504v1 [cs.HC])

Subtle Sensing: Detecting Differences in the Flexibility of Virtually Simulated Molecular Objects. (arXiv:2005.03503v1 [cs.HC])

The finish line: Attachment of Signs

The Finish Line: A Case Study: What is Causing This?

The Finish Line: Foam Shapes Revisited

The Finish Line: Adhesives vs. Mechanical Fasteners

The Finish Line: Design Features

ANSI Green Globes 2015

Passive Houses Gain Momentum

Exoskeleton in the Job Site Closet

The Wait is Over: MarinoWARE’s Brand-New Website is Here

American Industrial Partners to Acquire PPG’s Architectural Coatings Business

Fundraising Regulator appoints four new committee members

Only 12 per cent of leading charities publicly recognise a trade union, analysis suggests

SIA Releases New Version of OSDP Standard

Graybar Targets Ways to Make Business Strong and Sustainable

Ultrabond ECO 885 Premium Grade Polyolefin Backed Carpet Adhesive

Subscribe To Our Newsletter