SPECTER: Document-level Representation Learning using Citation-informed Transformers. (arXiv:2004.07180v3 [cs.CL] UPDATED)
Representation learning is a critical ingredient for natural language processing systems. Recent Transformer language models like BERT learn powerful textual representations, but these models are targeted towards token- and sentence-level training objectives and do not leverage information on inter-document relatedness, which limits their document-level representation power. For applications on scientific documents, such as classification and recommendation, such embeddings power strong performance on end tasks. We propose SPECTER, a new method to generate document-level embeddings of scientific documents based on pretraining a Transformer language model on a powerful signal of document-level relatedness: the citation graph. Unlike existing pretrained language models, SPECTER can be easily applied to downstream applications without task-specific fine-tuning. Additionally, to encourage further research on document-level models, we introduce SciDocs, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction to document classification and recommendation. We show that SPECTER outperforms a variety of competitive baselines on the benchmark.
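To make the training signal concrete, here is a minimal sketch of a citation-informed triplet objective of the kind SPECTER describes: a query paper should embed closer to a paper it cites than to one it does not cite. The encoder `enc` and all names are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def specter_triplet_loss(enc, query, positive, negative, margin=1.0):
    """Citation-informed triplet objective (sketch).

    `enc` maps a batch of tokenized title+abstract inputs to embeddings;
    `positive` is a paper cited by `query`, `negative` is not.
    """
    q, p, n = enc(query), enc(positive), enc(negative)
    d_pos = torch.norm(q - p, dim=-1)   # distance to a cited paper
    d_neg = torch.norm(q - n, dim=-1)   # distance to an un-cited paper
    return F.relu(d_pos - d_neg + margin).mean()
```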
The growth rate over trees of any family of sets defined by a monadic second order formula is semi-computable. (arXiv:2004.06508v3 [cs.DM] UPDATED)
Monadic second order logic can be used to express many classical notions of sets of vertices of a graph, such as dominating sets, induced matchings, perfect codes, independent sets, and irredundant sets. Bounds on the number of sets in any such family are interesting from a combinatorial point of view and have algorithmic applications. Many such bounds on different families of sets over different classes of graphs are already provided in the literature. In particular, Rote recently showed that the number of minimal dominating sets in trees of order $n$ is at most $95^{\frac{n}{13}}$ and that this bound is asymptotically sharp up to a multiplicative constant. We build on his work to show that what he did for minimal dominating sets can be done for any family of sets definable by a monadic second order formula. We first show that, for any monadic second order formula over graphs that characterizes a given kind of subset of its vertices, the maximal number of such sets in a tree can be expressed as the growth rate of a bilinear system. This mostly relies on well-known links between monadic second order logic over trees and tree automata, and on basic tree automata manipulations. Then we show that this growth rate of a bilinear system can be approximated from above. We then use our implementation of this result to provide bounds on the number of independent dominating sets, total perfect dominating sets, induced matchings, maximal induced matchings, minimal perfect dominating sets, perfect codes and maximal irredundant sets on trees. We also solve a question from D. Y. Kang et al. regarding $r$-matchings and improve a bound from Górska and Skupień on the number of maximal matchings on trees. Note that this approach generalizes easily to graphs of bounded tree width or clique width (or any similar class of graphs where tree automata are meaningful).
Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus. (arXiv:2004.06295v2 [cs.CL] UPDATED)
Much research effort has been devoted to semantic role labeling (SRL), which is crucial for natural language understanding. Supervised approaches have achieved impressive performance when large-scale corpora are available for resource-rich languages such as English. For low-resource languages with no annotated SRL dataset, however, it remains challenging to obtain competitive performance. Cross-lingual SRL is one promising way to address the problem, and it has achieved great advances with the help of model transfer and annotation projection. In this paper, we propose a novel alternative based on corpus translation, constructing high-quality training datasets for the target languages from the source gold-standard SRL annotations. Experimental results on the Universal Proposition Bank show that the translation-based method is highly effective, and that the automatic pseudo datasets can significantly improve target-language SRL performance.
Transfer Learning for EEG-Based Brain-Computer Interfaces: A Review of Progress Made Since 2016. (arXiv:2004.06286v3 [cs.HC] UPDATED)
A brain-computer interface (BCI) enables a user to communicate with a computer directly using brain signals. Electroencephalograms (EEGs) used in BCIs are weak, easily contaminated by interference and noise, non-stationary for the same subject, and varying across different subjects and sessions. Therefore, it is difficult to build a generic pattern recognition model in an EEG-based BCI system that is optimal for different subjects, during different sessions, for different devices and tasks. Usually, a calibration session is needed to collect some training data for a new subject, which is time-consuming and user-unfriendly. Transfer learning (TL), which utilizes data or knowledge from similar or relevant subjects/sessions/devices/tasks to facilitate learning for a new subject/session/device/task, is frequently used to reduce the amount of calibration effort. This paper reviews journal publications on TL approaches in EEG-based BCIs in the last few years, i.e., since 2016. Six paradigms and applications -- motor imagery, event-related potentials, steady-state visual evoked potentials, affective BCIs, regression problems, and adversarial attacks -- are considered. For each paradigm/application, we group the TL approaches into cross-subject/session, cross-device, and cross-task settings and review them separately. Observations and conclusions are made at the end of the paper, which may point to future research directions.
Decoding EEG Rhythms During Action Observation, Motor Imagery, and Execution for Standing and Sitting. (arXiv:2004.04107v2 [cs.HC] UPDATED)
Event-related desynchronization and synchronization (ERD/S) and movement-related cortical potentials (MRCP) play an important role in brain-computer interfaces (BCI) for lower limb rehabilitation, particularly in standing and sitting. However, little is known about the differences in cortical activation between standing and sitting, especially how the brain's intention modulates the pre-movement sensorimotor rhythm as it switches between the two movements. In this study, we aim to investigate the decoding of continuous EEG rhythms during action observation (AO), motor imagery (MI), and motor execution (ME) for standing and sitting. We developed a behavioral task in which participants were instructed to perform both AO and MI/ME for the actions of sit-to-stand and stand-to-sit. Our results demonstrate that ERD was prominent during AO, whereas ERS was typical during MI, in the alpha band across the sensorimotor area. A combination of the filter bank common spatial pattern (FBCSP) and a support vector machine (SVM) was used for classification in both offline and pseudo-online analyses. The offline analysis indicated that classification of AO versus MI achieved the highest mean accuracy, $82.73\pm2.38\%$, in the stand-to-sit transition. The pseudo-online analysis demonstrated higher performance in decoding neural intentions from the MI paradigm than from the ME paradigm. These observations point to the promise of using our developed tasks, based on the integration of both AO and MI, to build future exoskeleton-based rehabilitation systems.
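The classification pipeline named in the abstract (a filter bank of band-pass filters, CSP features per band, and an SVM) can be sketched as follows, assuming MNE and scikit-learn; the band edges, component counts, and kernel choice are illustrative, not the study's exact settings.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from mne.decoding import CSP
from sklearn.svm import SVC

def bandpass(X, lo, hi, fs, order=4):
    """Zero-phase band-pass filter along the time axis."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, X, axis=-1)

def fbcsp_svm(X_train, y_train, X_test, fs=250,
              bands=((4, 8), (8, 12), (12, 16), (16, 20), (20, 24),
                     (24, 28), (28, 32), (32, 36), (36, 40))):
    """Filter-bank CSP + SVM (sketch). X has shape (n_epochs, n_channels, n_times)."""
    feats_tr, feats_te = [], []
    for lo, hi in bands:
        csp = CSP(n_components=4, log=True)   # log-variance CSP features
        feats_tr.append(csp.fit_transform(bandpass(X_train, lo, hi, fs), y_train))
        feats_te.append(csp.transform(bandpass(X_test, lo, hi, fs)))
    clf = SVC(kernel="rbf").fit(np.hstack(feats_tr), y_train)
    return clf.predict(np.hstack(feats_te))
```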
PACT: Privacy Sensitive Protocols and Mechanisms for Mobile Contact Tracing. (arXiv:2004.03544v4 [cs.CR] UPDATED)
The global health threat from COVID-19 has been controlled in a number of instances by large-scale testing and contact tracing efforts. We created this document to suggest three functionalities showing how we might best harness computing technologies to support the goals of public health organizations in minimizing morbidity and mortality associated with the spread of COVID-19, while protecting the civil liberties of individuals. In particular, this work advocates for a third-party-free approach to assisted mobile contact tracing, because such an approach mitigates the security and privacy risks of requiring a trusted third party. We also explicitly consider the inferential risks involved in any contact tracing system, where any alert to a user could itself give rise to de-anonymizing information. More generally, we hope to participate in bringing together colleagues in industry, academia, and civil society to discuss and converge on ideas around a critical issue arising with attempts to mitigate the COVID-19 pandemic.
Deblurring by Realistic Blurring. (arXiv:2004.01860v2 [cs.CV] UPDATED)
Existing deep learning methods for image deblurring typically train models using pairs of sharp images and their blurred counterparts. However, synthetically blurred images do not necessarily model the genuine blurring process in real-world scenarios with sufficient accuracy. To address this problem, we propose a new method which combines two GAN models, a learning-to-Blur GAN (BGAN) and a learning-to-DeBlur GAN (DBGAN), in order to learn a better model for image deblurring by first learning how to blur images. The first model, BGAN, learns how to blur sharp images using unpaired sharp and blurry image sets, and then guides the second model, DBGAN, to learn how to correctly deblur such images. In order to reduce the discrepancy between real blur and synthesized blur, a relativistic blur loss is leveraged. As an additional contribution, this paper also introduces a Real-World Blurred Image (RWBI) dataset containing diverse blurry images. Our experiments show that the proposed method achieves consistently superior quantitative performance as well as higher perceptual quality on both the newly proposed dataset and the public GOPRO dataset.
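The abstract does not spell out the relativistic blur loss; a plausible sketch, assuming the relativistic average GAN (RaGAN) formulation that such losses are commonly built on, looks like this. The function names and training targets are our assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def relativistic_d_loss(d_real, d_fake):
    """Relativistic average discriminator loss (sketch, RaGAN-style).

    d_real / d_fake are raw discriminator logits on real blurry images and on
    synthesized blurry images. The discriminator is trained to judge real
    images as more realistic than the fakes *on average*, and vice versa.
    """
    loss_real = F.binary_cross_entropy_with_logits(
        d_real - d_fake.mean(), torch.ones_like(d_real))
    loss_fake = F.binary_cross_entropy_with_logits(
        d_fake - d_real.mean(), torch.zeros_like(d_fake))
    return loss_real + loss_fake
```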
Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms. (arXiv:2004.00526v2 [eess.AS] UPDATED)
Recent advances in deep learning have facilitated the design of speaker verification systems that directly input raw waveforms. For example, RawNet extracts speaker embeddings from raw waveforms, which simplifies the processing pipeline and demonstrates competitive performance. In this study, we improve RawNet by scaling its feature maps using several methods. The proposed mechanism utilizes a scale vector, with dimensionality equal to the number of filters in a given feature map, obtained through a sigmoid non-linearity. Using this scale vector, we propose to scale the feature map multiplicatively, additively, or both. In addition, we investigate replacing the first convolution layer with the sinc-convolution layer of SincNet. Experiments performed on the VoxCeleb1 evaluation dataset demonstrate the effectiveness of the proposed methods; the best performing system reduces the equal error rate by half compared to the original RawNet. Expanded evaluation results obtained using the VoxCeleb1-E and VoxCeleb-H protocols marginally outperform existing state-of-the-art systems.
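A minimal sketch of the feature map scaling mechanism in a PyTorch setting. Global average pooling to form the per-filter scale vector is our assumption (the paper may derive it differently); the multiplicative, additive, and combined variants follow the abstract.

```python
import torch
import torch.nn as nn

class FeatureMapScaling(nn.Module):
    """Scale a (batch, filters, time) feature map with a sigmoid gate (sketch)."""

    def __init__(self, n_filters, mode="mul"):
        super().__init__()
        self.fc = nn.Linear(n_filters, n_filters)
        self.mode = mode

    def forward(self, x):                           # x: (B, C, T)
        s = torch.sigmoid(self.fc(x.mean(dim=-1)))  # (B, C): one scale per filter
        s = s.unsqueeze(-1)                          # broadcast over time
        if self.mode == "mul":
            return x * s
        if self.mode == "add":
            return x + s
        return x * s + s                             # combined variant
```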
Personal Health Knowledge Graphs for Patients. (arXiv:2004.00071v2 [cs.AI] UPDATED)
Existing patient data analytics platforms fail to incorporate information that has context, is personal, and is topical to patients. For a recommendation system to give a suitable response to a query or to derive meaningful insights from patient data, it should consider personal information about the patient's health history, including but not limited to their preferences, locations, and life choices that are currently applicable to them. In this review paper, we critique existing literature in this space and discuss the various research challenges that come with designing, building, and operationalizing a personal health knowledge graph (PHKG) for patients.
Mathematical Formulae in Wikimedia Projects 2020. (arXiv:2003.09417v2 [cs.DL] UPDATED)
This poster summarizes our contributions to Wikimedia's processing pipeline for mathematical formulae. We describe how we have supported the transition from rendering formulae as coarse-grained PNG images in 2001 to providing modern semantically enriched language-independent MathML formulae in 2020. Additionally, we describe our plans to further improve the accessibility and discoverability of mathematical knowledge in Wikimedia projects.
Hierarchical Neural Architecture Search for Single Image Super-Resolution. (arXiv:2003.04619v2 [cs.CV] UPDATED)
Deep neural networks have exhibited promising performance in image super-resolution (SR). Most SR models follow a hierarchical architecture that contains both the cell-level design of computational blocks and the network-level design of the positions of upsampling blocks. However, designing SR models heavily relies on human expertise and is very labor-intensive. More critically, these SR models often contain a huge number of parameters and may not meet the computation-resource requirements of real-world applications. To address these issues, we propose a Hierarchical Neural Architecture Search (HNAS) method to automatically design promising architectures under different computation-cost requirements. To this end, we design a hierarchical SR search space and propose a hierarchical controller for architecture search. Such a hierarchical controller is able to simultaneously find promising cell-level blocks and network-level positions of upsampling layers. Moreover, to design compact architectures with promising performance, we build a joint reward that considers both performance and computation cost to guide the search process. Extensive experiments on five benchmark datasets demonstrate the superiority of our method over existing methods.
Eccentricity terrain of $\delta$-hyperbolic graphs. (arXiv:2002.08495v2 [cs.DM] UPDATED)
A graph $G=(V,E)$ is $\delta$-hyperbolic if for any four vertices $u,v,w,x$, the two larger of the three distance sums $d(u,v)+d(w,x)$, $d(u,w)+d(v,x)$, and $d(u,x)+d(v,w)$ differ by at most $2\delta \geq 0$. Recent work shows that many real-world graphs have small hyperbolicity $\delta$. This paper describes the eccentricity terrain of a $\delta$-hyperbolic graph. The eccentricity function $e_G(v)=\max\{d(v,u) : u \in V\}$ partitions the vertex set of $G$ into eccentricity layers $C_{k}(G) = \{v \in V : e_G(v)=rad(G)+k\}$, $k \in \mathbb{N}$, where $rad(G)=\min\{e_G(v) : v \in V\}$ is the radius of $G$. The paper studies the eccentricity layers of vertices along shortest paths, identifying such terrain features as hills, plains, valleys, terraces, and plateaus. It introduces the notion of $\eta$-pseudoconvexity, which implies Gromov's $\epsilon$-quasiconvexity, and illustrates the abundance of pseudoconvex sets in $\delta$-hyperbolic graphs. In particular, it shows that all sets $C_{\leq k}(G)=\{v \in V : e_G(v) \leq rad(G) + k\}$, $k \in \mathbb{N}$, are $(2\delta-1)$-pseudoconvex. Additionally, several bounds on the eccentricity of a vertex are obtained which yield a few approaches to efficiently approximating all eccentricities. An $O(\delta |E|)$ time eccentricity approximation $\hat{e}(v)$, for all $v \in V$, is presented that uses distances to two mutually distant vertices and satisfies $e_G(v)-2\delta \leq \hat{e}(v) \leq e_G(v)$. It also shows existence of two eccentricity approximating spanning trees $T$, one constructible in $O(\delta |E|)$ time and the other in $O(|E|)$ time, which satisfy $e_G(v) \leq e_T(v) \leq e_G(v)+4\delta+1$ and $e_G(v) \leq e_T(v) \leq e_G(v)+6\delta$, respectively. Thus, the eccentricity terrain of a tree gives a good approximation (up to an additive error $O(\delta)$) of the eccentricity terrain of a $\delta$-hyperbolic graph.
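The eccentricity approximation from two mutually distant vertices can be sketched with plain breadth-first searches. The double-sweep selection of $u$ and $w$ below is a standard heuristic consistent with the abstract; the paper's exact construction may differ, and a connected graph is assumed.

```python
from collections import deque

def bfs_dist(adj, src):
    """Single-source distances in an unweighted graph given as adjacency lists."""
    dist = {src: 0}
    q = deque([src])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                q.append(w)
    return dist

def approx_eccentricities(adj):
    """Eccentricity estimates from two mutually distant vertices (sketch).

    u and w are found by repeated farthest-point BFS; e_hat(v) =
    max(d(u,v), d(w,v)) satisfies e(v) - 2*delta <= e_hat(v) <= e(v)
    in a delta-hyperbolic graph, per the paper's bound.
    """
    start = next(iter(adj))
    u = max(bfs_dist(adj, start).items(), key=lambda kv: kv[1])[0]
    du = bfs_dist(adj, u)
    w = max(du.items(), key=lambda kv: kv[1])[0]
    dw = bfs_dist(adj, w)
    return {v: max(du[v], dw[v]) for v in adj}
```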
Lake Ice Detection from Sentinel-1 SAR with Deep Learning. (arXiv:2002.07040v2 [eess.IV] UPDATED)
Lake ice, as part of the Essential Climate Variable (ECV) lakes, is an important indicator for monitoring climate change and global warming. The spatio-temporal extent of lake ice cover, along with the timings of key phenological events such as freeze-up and break-up, provides important cues about the local and global climate. We present a lake ice monitoring system based on the automatic analysis of Sentinel-1 Synthetic Aperture Radar (SAR) data with a deep neural network. In previous studies that used optical satellite imagery for lake ice monitoring, frequent cloud cover was a main limiting factor, which we overcome thanks to the ability of microwave sensors to penetrate clouds and observe the lakes regardless of the weather and illumination conditions. We cast ice detection as a two-class (frozen, non-frozen) semantic segmentation problem and solve it using a state-of-the-art deep convolutional neural network (CNN). We report results on two winters (2016-17 and 2017-18) and three alpine lakes in Switzerland. The proposed model reaches mean Intersection-over-Union (mIoU) scores above 90% on average, and above 84% even for the most difficult lake. Additionally, we perform cross-validation tests and show that our algorithm generalises well across unseen lakes and winters.
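For reference, the mIoU metric the paper reports can be computed as below; a minimal two-class sketch, with the class indices (0 = non-frozen, 1 = frozen) as our labeling assumption.

```python
import numpy as np

def mean_iou(pred, target, n_classes=2):
    """Mean Intersection-over-Union for semantic segmentation (sketch).

    pred and target are integer label maps of identical shape.
    """
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union:                       # ignore classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```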
On Rearrangement of Items Stored in Stacks. (arXiv:2002.04979v2 [cs.RO] UPDATED)
There are $n \ge 2$ stacks, each filled with $d$ items, and one empty stack. Every stack has capacity $d > 0$. A robot arm, in one stack operation (step), may pop one item from the top of a non-empty stack and subsequently push it onto a stack not at capacity. In a labeled problem, all $nd$ items are distinguishable and are initially randomly scattered in the $n$ stacks. The items must be rearranged using pop-and-pushes so that in the end, the $k^{\mathrm{th}}$ stack holds items $(k-1)d+1, \ldots, kd$, in that order, from top to bottom, for all $1 \le k \le n$. In an unlabeled problem, the $nd$ items are of $n$ types, $d$ of each. The goal is to rearrange items so that items of type $k$ are located in the $k^{\mathrm{th}}$ stack for all $1 \le k \le n$. In carrying out the rearrangement, a natural question is to find the least number of required pop-and-pushes. Our main contributions are: (1) an algorithm for restoring the order of $n^2$ items stored in an $n \times n$ table using only $2n$ column and row permutations, and its generalization, and (2) an algorithm with a guaranteed upper bound of $O(nd)$ steps for solving both versions of the stack rearrangement problem when $d \le \lceil cn \rceil$ for an arbitrary fixed positive number $c$. In terms of the required number of steps, the labeled and unlabeled versions have lower bounds $\Omega(nd + nd\frac{\log d}{\log n})$ and $\Omega(nd)$, respectively.
Toward Improving the Evaluation of Visual Attention Models: a Crowdsourcing Approach. (arXiv:2002.04407v2 [cs.CV] UPDATED)
Human visual attention is a complex phenomenon. A computational model of this phenomenon must take into account where people look in order to evaluate which are the salient locations (spatial distribution of the fixations), when they look at those locations to understand the temporal development of the exploration (temporal order of the fixations), and how they move from one location to another with respect to the dynamics of the scene and the mechanics of the eyes (dynamics). State-of-the-art models focus on learning saliency maps from human data, a process that only takes into account the spatial component of the phenomenon and ignores its temporal and dynamical counterparts. In this work we focus on the evaluation methodology of models of human visual attention. We underline the limits of the current metrics for saliency prediction and scanpath similarity, and we introduce a statistical measure for the evaluation of the dynamics of the simulated eye movements. While deep learning models achieve astonishing performance in saliency prediction, our analysis shows their limitations in capturing the dynamics of the process. We find that unsupervised gravitational models, despite their simplicity, outperform all competitors. Finally, exploiting a crowdsourcing platform, we present a study aimed at evaluating how strongly the scanpaths generated with the unsupervised gravitational models appear plausible to naive and expert human observers.
Continuous speech separation: dataset and analysis. (arXiv:2001.11482v3 [cs.SD] UPDATED)
This paper describes a dataset and protocols for evaluating continuous speech separation algorithms. Most prior studies on speech separation use pre-segmented signals of artificially mixed speech utterances which are mostly fully overlapped, and the algorithms are evaluated based on signal-to-distortion ratio or similar performance metrics. However, in natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components. In addition, the signal-based metrics have very weak correlations with automatic speech recognition (ASR) accuracy. We think that not only does this make it hard to assess the practical relevance of the tested algorithms, it also hinders researchers from developing systems that can be readily applied to real scenarios. In this paper, we define continuous speech separation (CSS) as a task of generating a set of non-overlapped speech signals from a continuous audio stream that contains multiple utterances that are partially overlapped by a varying degree. A new real recorded dataset, called LibriCSS, is derived from LibriSpeech by concatenating the corpus utterances to simulate a conversation and capturing the audio replays with far-field microphones. A Kaldi-based ASR evaluation protocol is also established by using a well-trained multi-conditional acoustic model. By using this dataset, several aspects of a recently proposed speaker-independent CSS algorithm are investigated. The dataset and evaluation scripts are available to facilitate research in this direction.
Provenance for the Description Logic ELHr. (arXiv:2001.07541v2 [cs.LO] UPDATED)
We address the problem of handling provenance information in ELHr ontologies. We consider a setting recently introduced for ontology-based data access, based on semirings and extending classical data provenance, in which ontology axioms are annotated with provenance tokens. A consequence inherits the provenance of the axioms involved in deriving it, yielding a provenance polynomial as an annotation. We analyse the semantics for the ELHr case and show that the presence of conjunctions poses various difficulties for handling provenance, some of which are mitigated by assuming multiplicative idempotency of the semiring. Under this assumption, we study three problems: ontology completion with provenance, computing the set of relevant axioms for a consequence, and query answering.
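A small sketch of the general semiring-provenance idea the paper builds on (not its ELHr-specific algorithms): alternative derivations of a consequence are summed, and joint use of axioms within one derivation is multiplied, yielding a provenance polynomial. SymPy symbols stand in for provenance tokens.

```python
import sympy

def provenance_polynomial(derivations, axiom_tokens):
    """Provenance annotation of a derived consequence (sketch).

    `derivations` lists alternative derivations, each a list of axiom ids;
    `axiom_tokens` maps axiom ids to provenance tokens.
    """
    poly = sympy.Integer(0)
    for derivation in derivations:
        term = sympy.Integer(1)
        for ax in derivation:
            term *= axiom_tokens[ax]    # joint use of axioms: product
        poly += term                    # alternative derivations: sum
    return sympy.expand(poly)

# Example: a consequence derivable either from axioms {A1, A2} or from {A3}.
a1, a2, a3 = sympy.symbols("a1 a2 a3")
print(provenance_polynomial([["A1", "A2"], ["A3"]],
                            {"A1": a1, "A2": a2, "A3": a3}))  # -> a1*a2 + a3
```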
Hardware Implementation of Neural Self-Interference Cancellation. (arXiv:2001.04543v2 [eess.SP] UPDATED)
In-band full-duplex systems can transmit and receive information simultaneously on the same frequency band. However, due to the strong self-interference caused by the transmitter to its own receiver, the use of non-linear digital self-interference cancellation is essential. In this work, we describe a hardware architecture for a neural network-based non-linear self-interference (SI) canceller and we compare it with our own hardware implementation of a conventional polynomial-based SI canceller. In particular, we present implementation results for a shallow and a deep neural network SI canceller as well as for a polynomial SI canceller. Our results show that the deep neural network canceller achieves a hardware efficiency of up to $312.8$ Msamples/s/mm$^2$ and an energy efficiency of up to $0.9$ nJ/sample, which is $2.1\times$ and $2\times$ better than the polynomial SI canceller, respectively. These results show that NN-based methods applied to communications are not only useful from a performance perspective, but can also be a very effective means to reduce implementation complexity.
Maximal Closed Set and Half-Space Separations in Finite Closure Systems. (arXiv:2001.04417v2 [cs.AI] UPDATED)
Several problems of artificial intelligence, such as predictive learning, formal concept analysis, and inductive logic programming, can be viewed as special cases of half-space separation in abstract closure systems over finite ground sets. For the typical scenario that the closure system is given via a closure operator, we show that the half-space separation problem is NP-complete. As a first approach to overcome this negative result, we relax the problem to maximal closed set separation, give a greedy algorithm solving this problem with a linear number of closure operator calls, and show that this bound is sharp. For a second direction, we consider Kakutani closure systems and prove that they are algorithmically characterized by the greedy algorithm. As a first special case of the general problem setting, we consider Kakutani closure systems over graphs, generalize a fundamental characterization result based on the Pasch axiom to graph-structured partitioning of finite sets, and give a sufficient condition for this kind of closure system in terms of graph minors. For a second case, we then focus on closure systems over finite lattices, give an improved adaptation of the greedy algorithm for this special case, and present two applications concerning formal concept and subsumption lattices. We also report some experimental results to demonstrate the practical usefulness of our algorithm.
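A sketch of the greedy maximal closed set separation algorithm as described: starting from the closures of the two input sets, each remaining element is greedily added to one side whenever the two closed sets stay disjoint, using a linear number of closure operator calls. The element order and tie-breaking are our assumptions.

```python
def greedy_separation(ground_set, closure, A, B):
    """Greedy maximal closed set separation (sketch).

    `closure` is a closure operator mapping a frozenset to its closure.
    Returns two maximal disjoint closed sets containing A and B respectively,
    or None if the initial closures already intersect.
    """
    H1, H2 = closure(frozenset(A)), closure(frozenset(B))
    if H1 & H2:
        return None                      # A and B are not separable
    for e in ground_set:
        if e in H1 or e in H2:
            continue
        C1 = closure(H1 | {e})
        if not (C1 & H2):                # try growing the first side
            H1 = C1
            continue
        C2 = closure(H2 | {e})
        if not (H1 & C2):                # otherwise try the second side
            H2 = C2
    return H1, H2
```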
Games Where You Can Play Optimally with Arena-Independent Finite Memory. (arXiv:2001.03894v2 [cs.GT] UPDATED)
For decades, two-player (antagonistic) games on graphs have been a framework of choice for many important problems in theoretical computer science. A notorious one is controller synthesis, which can be rephrased through the game-theoretic metaphor as the quest for a winning strategy of the system in a game against its antagonistic environment. Depending on the specification, optimal strategies might be simple or quite complex, for example having to use (possibly infinite) memory. Hence, research strives to understand which settings allow for simple strategies. In 2005, Gimbert and Zielonka provided a complete characterization of preference relations (a formal framework to model specifications and game objectives) that admit memoryless optimal strategies for both players. In the last fifteen years however, practical applications have driven the community toward games with complex or multiple objectives, where memory -- finite or infinite -- is almost always required. Despite much effort, the exact frontiers of the class of preference relations that admit finite-memory optimal strategies still elude us. In this work, we establish a complete characterization of preference relations that admit optimal strategies using arena-independent finite memory, generalizing the work of Gimbert and Zielonka to the finite-memory case. We also prove an equivalent to their celebrated corollary of great practical interest: if both players have optimal (arena-independent-)finite-memory strategies in all one-player games, then it is also the case in all two-player games. Finally, we pinpoint the boundaries of our results with regard to the literature: our work completely covers the case of arena-independent memory (e.g., multiple parity objectives, lower- and upper-bounded energy objectives), and paves the way to the arena-dependent case (e.g., multiple lower-bounded energy objectives).
Safe non-smooth black-box optimization with application to policy search. (arXiv:1912.09466v3 [math.OC] UPDATED)
For safety-critical black-box optimization tasks, observations of the constraints and the objective are often noisy and available only for feasible points. We propose an approach based on log barriers to find a local solution of a non-convex non-smooth black-box optimization problem $\min f^0(x)$ subject to $f^i(x) \leq 0$, $i = 1, \ldots, m$, while guaranteeing constraint satisfaction and learning an optimal solution with high probability. Our proposed algorithm exploits noisy observations to iteratively improve on an initial safe point until convergence. We derive the convergence rate and prove the safety of our algorithm. We demonstrate its performance in an application to an iterative control design problem.
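The log-barrier construction at the heart of the approach can be illustrated as follows; in the paper the objective and constraints are observed only noisily at feasible points, whereas this sketch uses plain callables and an illustrative barrier weight `eta`.

```python
import numpy as np

def log_barrier_objective(f0, fs, x, eta=0.1):
    """Log-barrier surrogate for constrained black-box optimization (sketch).

    Minimizing B(x) = f0(x) - eta * sum_i log(-f_i(x)) keeps iterates strictly
    feasible: B blows up as any constraint f_i(x) <= 0 approaches its boundary.
    """
    cons = np.array([fi(x) for fi in fs])
    if np.any(cons >= 0):
        return np.inf                    # infeasible: barrier undefined
    return f0(x) - eta * np.sum(np.log(-cons))

# Toy use: minimize x^2 subject to x - 1 <= 0 (i.e., stay in x < 1).
print(log_barrier_objective(lambda x: x**2, [lambda x: x - 1.0], 0.5))
```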
t-SS3: a text classifier with dynamic n-grams for early risk detection over text streams. (arXiv:1911.06147v2 [cs.CL] UPDATED)
A recently introduced classifier, called SS3, has been shown to be well suited to deal with early risk detection (ERD) problems on text streams. It obtained state-of-the-art performance on early depression and anorexia detection on Reddit in CLEF's eRisk open tasks. SS3 was created to deal with ERD problems naturally: it supports incremental training and classification over text streams, and it can visually explain its rationale. However, SS3 processes the input using a bag-of-words model and therefore lacks the ability to recognize important word sequences. This aspect can negatively affect classification performance and also reduces the descriptiveness of visual explanations. In the standard document classification field, it is very common to use word n-grams to try to overcome some of these limitations. Unfortunately, when working with text streams, using n-grams is not trivial since the system must learn and recognize which n-grams are important "on the fly". This paper introduces t-SS3, an extension of SS3 that allows it to recognize useful patterns over text streams dynamically. We evaluated our model on the eRisk 2017 and 2018 tasks for early depression and anorexia detection. Experimental results suggest that t-SS3 is able to improve both current results and the richness of visual explanations.
Unsupervised Domain Adaptation on Reading Comprehension. (arXiv:1911.06137v4 [cs.CL] UPDATED)
Reading comprehension (RC) has been studied on a variety of datasets, with performance boosted by deep neural networks. However, the generalization capability of these models across different domains remains unclear. To alleviate this issue, we investigate unsupervised domain adaptation for RC, wherein a model is trained on a labeled source domain and applied to a target domain with only unlabeled samples. We first show that, even with the powerful BERT contextual representation, performance is still unsatisfactory when a model trained on one dataset is directly applied to another target dataset. To solve this, we provide a novel conditional adversarial self-training method (CASe). Specifically, our approach leverages a BERT model fine-tuned on the source dataset, along with confidence filtering, to generate reliable pseudo-labeled samples in the target domain for self-training. In addition, it further reduces domain distribution discrepancy through conditional adversarial learning across domains. Extensive experiments show our approach achieves accuracy comparable to supervised models on multiple large-scale benchmark datasets.
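The confidence-filtering step of CASe-style self-training can be sketched as below; the threshold value and tensor shapes are illustrative assumptions, not the paper's settings.

```python
import torch

def confident_pseudo_labels(logits, threshold=0.9):
    """Confidence filtering for self-training (sketch).

    `logits` are source-trained model outputs on unlabeled target samples,
    shape (n_samples, n_classes). Keep only predictions whose softmax
    confidence exceeds the threshold and use them as pseudo-labels.
    """
    probs = torch.softmax(logits, dim=-1)
    conf, labels = probs.max(dim=-1)
    keep = conf >= threshold
    return keep.nonzero(as_tuple=True)[0], labels[keep]

logits = torch.randn(1000, 2) * 3
idx, y_pseudo = confident_pseudo_labels(logits)
print(f"kept {len(idx)} of 1000 target samples for self-training")
```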
Revisiting Semantics of Interactions for Trace Validity Analysis. (arXiv:1911.03094v2 [cs.SE] UPDATED)
Interaction languages such as MSC are often given formal semantics by means of translations into distinct behavioral formalisms such as automata or Petri nets. In contrast to such translational approaches, we propose an operational approach. Its principle is to identify which elementary communication actions can be immediately executed, and then to compute, for every such action, a new interaction representing the possible continuations of its execution. We also define an algorithm for checking the validity of execution traces (i.e., whether or not they belong to an interaction's semantics). The algorithms for semantic computation and trace validity are analyzed by means of experiments.
Biologic and Prognostic Feature Scores from Whole-Slide Histology Images Using Deep Learning. (arXiv:1910.09100v4 [q-bio.QM] UPDATED)
Histopathology reflects molecular changes and provides prognostic phenotypes representing disease progression. In this study, we introduce feature scores generated from hematoxylin and eosin histology images based on deep learning (DL) models developed for prostate pathology. We demonstrate that these feature scores are significantly prognostic for time-to-event endpoints (biochemical recurrence and cancer-specific survival) and simultaneously have molecular biologic associations with relevant genomic alterations and molecular subtypes, using already trained DL models that were not previously exposed to the datasets of the current study. Further, we discuss the potential of such feature scores to improve the current tumor grading system, and the challenges associated with tumor heterogeneity and the development of prognostic models from histology images. Our findings uncover the potential of feature scores from histology images as digital biomarkers in precision medicine and as an expanding utility for digital pathology.
Global Locality in Biomedical Relation and Event Extraction. (arXiv:1909.04822v2 [cs.CL] UPDATED)
Due to the exponential growth of biomedical literature, event and relation extraction are important tasks in biomedical text mining. Most work focuses only on relation extraction, detecting a single entity-pair mention in a short span of text, which is not ideal given the long sentences that appear in biomedical contexts. We propose an approach to both relation and event extraction that simultaneously predicts relationships between all mention pairs in a text. We also perform an empirical study of different network setups for this purpose. The best performing model includes a set of multi-head attentions and convolutions, an adaptation of the Transformer architecture, which offers self-attention the ability to strengthen dependencies among related elements and models the interaction between features extracted by multiple attention heads. Experimental results demonstrate that our approach outperforms the state of the art on a set of benchmark biomedical corpora, including the BioNLP 2009, 2011, and 2013 and BioCreative 2017 shared tasks.
The Mapillary Traffic Sign Dataset for Detection and Classification on a Global Scale. (arXiv:1909.04422v2 [cs.CV] UPDATED)
Traffic signs are essential map features globally in the era of autonomous driving and smart cities. To develop accurate and robust algorithms for traffic sign detection and classification, a large-scale and diverse benchmark dataset is required. In this paper, we introduce a traffic sign benchmark dataset of 100K street-level images from around the world that encapsulates diverse scenes, wide coverage of geographical locations, and varying weather and lighting conditions, and covers more than 300 manually annotated traffic sign classes. The dataset includes 52K fully annotated images and 48K partially annotated images. This is the largest and most diverse traffic sign dataset to date, consisting of images from all over the world with fine-grained annotations of traffic sign classes. We have run extensive experiments to establish strong baselines for both the detection and the classification tasks. In addition, we have verified that the diversity of this dataset enables effective transfer learning for existing large-scale benchmark datasets on traffic sign detection and classification. The dataset is freely available for academic research: https://www.mapillary.com/dataset/trafficsign.
A Shift Selection Strategy for Parallel Shift-Invert Spectrum Slicing in Symmetric Self-Consistent Eigenvalue Computation. (arXiv:1908.06043v2 [math.NA] UPDATED)
The central importance of large-scale eigenvalue problems in scientific computation necessitates the development of massively parallel algorithms for their solution. Recent advances in dense numerical linear algebra have enabled the routine treatment of eigenvalue problems with dimensions on the order of hundreds of thousands on the world's largest supercomputers. In cases where dense treatments are not feasible, Krylov subspace methods offer an attractive alternative because they do not require storage of the problem matrices. However, demonstrating the scalability of either of these classes of eigenvalue algorithms on computing architectures capable of expressing massive parallelism is non-trivial, due to communication requirements and serial bottlenecks, respectively. In this work, we introduce the SISLICE method: a parallel shift-invert algorithm for the solution of the symmetric self-consistent field (SCF) eigenvalue problem. The SISLICE method drastically reduces the communication requirement of current parallel shift-invert eigenvalue algorithms through various shift selection and migration techniques based on density of states estimation and k-means clustering, respectively. This work demonstrates the robustness and parallel performance of the SISLICE method on a representative set of SCF eigenvalue problems and outlines research directions that will be explored in future work.
Single use register automata for data words. (arXiv:1907.10504v2 [cs.FL] UPDATED)
Our starting point is register automata for data words, in the style of Kaminski and Francez. We study the effects of the single-use restriction, which says that a register is emptied immediately after being used. We show that under the single-use restriction, the theory of automata for data words becomes much more robust. The main results are: (a) five different machine models are equivalent as language acceptors, including one-way and two-way single-use register automata; (b) one can recover some of the algebraic theory of languages over finite alphabets, including a version of the Krohn-Rhodes Theorem; (c) there is also a robust theory of transducers, with four equivalent models, including two-way single-use transducers and a variant of streaming string transducers for data words. These results are in contrast with automata for data words without the single-use restriction, where essentially all models are pairwise non-equivalent.
Establishing the Quantum Supremacy Frontier with a 281 Pflop/s Simulation. (arXiv:1905.00444v2 [quant-ph] UPDATED)
Noisy Intermediate-Scale Quantum (NISQ) computers are entering an era in which they can perform computational tasks beyond the capabilities of the most powerful classical computers, thereby achieving "Quantum Supremacy", a major milestone in quantum computing. NISQ Supremacy requires comparison with a state-of-the-art classical simulator. We report HPC simulations of hard random quantum circuits (RQC), which have recently been used as a benchmark for the first experimental demonstration of Quantum Supremacy, sustaining an average performance of 281 Pflop/s (true single precision) on Summit, currently the fastest supercomputer in the world. These simulations were carried out using qFlex, a tensor-network-based classical high-performance simulator of RQCs. Our results show an advantage of many orders of magnitude in energy consumption of NISQ devices over classical supercomputers. In addition, we propose a standard benchmark for NISQ computers based on qFlex.
Parameterised Counting in Logspace. (arXiv:1904.12156v3 [cs.LO] UPDATED)
Stockhusen and Tantau (IPEC 2013) defined the operators paraW and paraBeta for parameterised space complexity classes by allowing bounded nondeterminism with multiple read and read-once access, respectively. Using these operators, they obtained characterisations for the complexity of many parameterisations of natural problems on graphs. In this article, we study the counting versions of such operators and introduce variants based on tail-nondeterminism, paraW[1] and paraBetaTail, in the setting of parameterised logarithmic space. We examine closure properties of the new classes under the central reductions and arithmetic operations. We also identify a wide range of natural complete problems for our classes in the areas of walk counting in digraphs, first-order model-checking and graph homomorphisms. In doing so, we also see that the closure of #paraBetaTail-L under parameterised logspace parsimonious reductions coincides with #paraBeta-L. We show that the complexity of a parameterised variant of the determinant function is #paraBetaTail-L-hard and can be written as the difference of two functions in #paraBetaTail-L for (0,1)-matrices. Finally, we characterise the new complexity classes in terms of branching programs.
On analog quantum algorithms for the mixing of Markov chains. (arXiv:1904.11895v2 [quant-ph] UPDATED)
The problem of sampling from the stationary distribution of a Markov chain finds widespread applications in a variety of fields. The time required for a Markov chain to converge to its stationary distribution is known as the classical mixing time. In this article, we deal with analog quantum algorithms for mixing. First, we provide an analog quantum algorithm that, given a Markov chain, allows us to sample from its stationary distribution in a time that scales as the sum of the square root of the classical mixing time and the square root of the classical hitting time. Our algorithm makes use of the framework of interpolated quantum walks and relies on Hamiltonian evolution in conjunction with von Neumann measurements. There also exists a different notion of quantum mixing: the problem of sampling from the limiting distribution of quantum walks, defined in a time-averaged sense. In this scenario, the quantum mixing time is defined as the time required to sample from a distribution that is close to this limiting distribution. Recently we provided an upper bound on the quantum mixing time for Erdős-Rényi random graphs [Phys. Rev. Lett. 124, 050501 (2020)]. Here, we also extend and expand upon our findings therein. Namely, we provide an intuitive understanding of the state-of-the-art random matrix theory tools used to derive our results. In particular, for our analysis we require information about the macroscopic, mesoscopic and microscopic statistics of eigenvalues of random matrices, which we highlight here. Furthermore, we provide numerical simulations that corroborate our analytical findings and extend this notion of mixing from simple graphs to any ergodic, reversible Markov chain.
A Fast and Accurate Algorithm for Spherical Harmonic Analysis on HEALPix Grids with Applications to the Cosmic Microwave Background Radiation. (arXiv:1904.10514v4 [math.NA] UPDATED)
The Hierarchical Equal Area isoLatitude Pixelation (HEALPix) scheme is used extensively in astrophysics for data collection and analysis on the sphere. The scheme was originally designed for studying the Cosmic Microwave Background (CMB) radiation, which represents the first light to travel during the early stages of the universe's development and gives the strongest evidence for the Big Bang theory to date. Refined analysis of the CMB angular power spectrum can lead to revolutionary developments in understanding the nature of dark matter and dark energy. In this paper, we present a new method for performing spherical harmonic analysis for HEALPix data, which is a central component of computing and analyzing the angular power spectrum of the massive CMB data sets. The method uses a novel combination of a non-uniform fast Fourier transform, the double Fourier sphere method, and Slevinsky's fast spherical harmonic transform (Slevinsky, 2019). For a HEALPix grid with $N$ pixels (points), the computational complexity of the method is $\mathcal{O}(N \log^2 N)$, with an initial set-up cost of $\mathcal{O}(N^{3/2} \log N)$. This compares favorably with the $\mathcal{O}(N^{3/2})$ runtime complexity of the current methods available in the HEALPix software when multiple maps need to be analyzed at the same time. Using numerical experiments, we demonstrate that the new method also appears to provide better accuracy over the entire angular power spectrum of synthetic data when compared to the current methods, with a convergence rate at least two times higher.
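For orientation, the standard HEALPix analysis that the paper accelerates can be run with healpy (the existing HEALPix toolchain, not the new method): synthesize a map from a toy spectrum and recover its angular power spectrum. The resolution and input spectrum are illustrative.

```python
import numpy as np
import healpy as hp

nside = 64                       # HEALPix resolution; N = 12 * nside^2 pixels
lmax = 3 * nside - 1
cl_in = 1.0 / (np.arange(lmax + 1) + 1.0) ** 2   # toy input power spectrum
m = hp.synfast(cl_in, nside, lmax=lmax)          # random map realization
cl_out = hp.anafast(m, lmax=lmax)                # recovered angular power spectrum
print(hp.nside2npix(nside), cl_out[:5])
```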
Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems. (arXiv:1904.08962v3 [cs.SY] UPDATED)
Restless multi-armed bandits are a class of discrete-time stochastic control problems which involve sequential decision making with a finite set of actions (set of arms). This paper studies a class of constrained restless multi-armed bandits (CRMAB). The constraints are in the form of a time-varying set of actions (set of available arms). This variation can be either stochastic or semi-deterministic. Given a set of arms, a fixed number of them can be chosen to be played in each decision interval. The play of each arm yields a state-dependent reward. The current states of arms are partially observable through binary feedback signals from arms that are played. The current availability of arms is fully observable. The objective is to maximize long-term cumulative reward. The uncertainty about future availability of arms along with partial state information makes this objective challenging. Applications for CRMAB abound in the domain of cyber-physical systems. This optimization problem is analyzed using Whittle's index policy. To this end, a constrained restless single-armed bandit is studied. It is shown to admit a threshold-type optimal policy, and is also indexable. An algorithm to compute Whittle's index is presented. Further, upper bounds on the value function are derived in order to estimate the degree of sub-optimality of various solutions. The simulation study compares the performance of Whittle's index, modified Whittle's index and myopic policies.
Keeping out the Masses: Understanding the Popularity and Implications of Internet Paywalls. (arXiv:1903.01406v4 [cs.CY] UPDATED)
Funding the production of quality online content is a pressing problem for content producers. The most common funding method, online advertising, is rife with well-known performance and privacy harms, and an intractable subject-agent conflict: many users do not want to see advertisements, depriving the site of needed funding. Because of these negative aspects of advertisement-based funding, paywalls are an increasingly popular alternative for websites. This shift to a "pay-for-access" web has potentially huge implications for the web and society. Instead of a system where information (nominally) flows freely, paywalls create a web where high-quality information is available to fewer and fewer people, leaving the rest of the web's users with less information, which might also be less accurate and of lower quality. Despite the potential significance of a move from an "advertising-but-open" web to a "paywalled" web, we find this issue understudied. This work addresses this gap in our understanding by measuring how widely paywalls have been adopted, what kinds of sites use paywalls, and the distribution of policies enforced by paywalls. A partial list of our findings: (i) paywall use is accelerating (2x more paywalls every 6 months), (ii) paywall adoption differs by country (e.g., 18.75% in the US, 12.69% in Australia), (iii) paywalls change how users interact with sites (e.g., higher bounce rates, fewer incoming links), (iv) the median cost of annual paywall access is $108 per site, and (v) paywalls are in general trivial to circumvent. Finally, we present the design of a novel, automated system for detecting whether a site uses a paywall, through the combination of runtime browser instrumentation and repeated programmatic interactions with the site. We intend this classifier to augment future, longitudinal measurements of paywall use and behavior.
Asymptotic expansions of eigenvalues by both the Crouzeix-Raviart and enriched Crouzeix-Raviart elements. (arXiv:1902.09524v2 [math.NA] UPDATED)
Asymptotic expansions are derived for eigenvalues produced by both the Crouzeix-Raviart element and the enriched Crouzeix-Raviart element. The expansions are optimal in the sense that extrapolated eigenvalues based on them admit fourth-order convergence provided that the exact eigenfunctions are smooth enough. The major challenge in establishing the expansions comes from the fact that the canonical interpolations of both nonconforming elements lack a crucial superclose property, and from the nonconformity of both elements. The main idea is to employ the relation between the lowest-order mixed Raviart-Thomas element and the two nonconforming elements, and consequently to make use of the superclose property of the canonical interpolation of the lowest-order mixed Raviart-Thomas element. To overcome the difficulty caused by the nonconformity, the commuting property of the canonical interpolation operators of both nonconforming elements is further used, which turns the consistency error problem into an interpolation error problem. Then, a series of new results are obtained to establish the final expansions.
Machine learning topological phases in real space. (arXiv:1901.01963v4 [cond-mat.mes-hall] UPDATED)
We develop a supervised machine learning algorithm that is able to learn topological phases for finite condensed matter systems from bulk data in real lattice space. The algorithm employs diagonalization in real space together with any supervised learning algorithm to learn topological phases through an eigenvector ensembling procedure. We combine our algorithm with decision trees and random forests to successfully recover topological phase diagrams of Su-Schrieffer-Heeger (SSH) models from bulk lattice data in real space and show how the Shannon information entropy of ensembles of lattice eigenvectors can be used to retrieve a signal detailing how topological information is distributed in the bulk. The discovery of Shannon information entropy signals associated with topological phase transitions from the analysis of data from several thousand SSH systems illustrates how model explainability in machine learning can advance the research of exotic quantum materials with properties that may power future technological applications such as qubit engineering for quantum computing.
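The kind of analysis described can be illustrated directly on the SSH model: build the single-particle Hamiltonian with alternating hoppings, diagonalize in real space, and compute a Shannon entropy over eigenvector weight on lattice sites. The entropy summary below is a simplified stand-in for the paper's eigenvector ensembling procedure, with illustrative parameter values.

```python
import numpy as np

def ssh_hamiltonian(n_cells, t1, t2):
    """Single-particle SSH chain Hamiltonian with alternating hoppings (sketch)."""
    n = 2 * n_cells
    H = np.zeros((n, n))
    for i in range(n - 1):
        t = t1 if i % 2 == 0 else t2     # intra-cell vs inter-cell hopping
        H[i, i + 1] = H[i + 1, i] = t
    return H

def eigenvector_entropy(H):
    """Mean Shannon entropy of site distributions over all eigenvectors.

    Each eigenvector gives probabilities |psi_i|^2 over lattice sites; the
    average entropy summarizes how state weight spreads through the bulk.
    """
    _, vecs = np.linalg.eigh(H)
    p = np.abs(vecs) ** 2 + 1e-12        # columns are normalized eigenvectors
    return float(np.mean(-np.sum(p * np.log(p), axis=0)))

print(eigenvector_entropy(ssh_hamiltonian(50, 1.0, 0.5)))   # trivial phase
print(eigenvector_entropy(ssh_hamiltonian(50, 0.5, 1.0)))   # topological phase
```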
Learning Direct Optimization for Scene Understanding. (arXiv:1812.07524v2 [cs.CV] UPDATED)
We develop a Learning Direct Optimization (LiDO) method for the refinement of a latent variable model that describes an input image x. Our goal is to explain a single image x with an interpretable 3D computer graphics model having scene graph latent variables z (such as object appearance and camera position). Given a current estimate of z we can render a prediction of the image g(z), which can be compared to the image x. The standard way to proceed is to measure the error E(x, g(z)) between the two, and use an optimizer to minimize the error. However, it is unknown which error measure E would be most effective for simultaneously addressing issues such as misaligned objects, occlusions, and textures. In contrast, the LiDO approach trains a Prediction Network to predict an update that directly corrects z, rather than minimizing the error with respect to z. Experiments show that our LiDO method converges rapidly, as it does not need to perform a search on the error landscape, produces better solutions than error-based competitors, and is able to handle the mismatch between the data and the fitted scene model. We apply LiDO to a realistic synthetic dataset, and show that the method also transfers to work well with real images.
Performance of the smallest-variance-first rule in appointment sequencing. (arXiv:1812.01467v4 [math.PR] UPDATED)
A classical problem in appointment scheduling, with applications in health care, concerns the determination of the patients' arrival times that minimize a cost function that is a weighted sum of mean waiting times and mean idle times. One aspect of this problem is the sequencing problem, which focuses on ordering the patients. We assess the performance of the smallest-variance-first (SVF) rule, which sequences patients in order of increasing variance of their service durations. While it was known that SVF is not always optimal, it has been widely observed that it performs well in practice and simulation. We provide a theoretical justification for this observation by proving, in various settings, quantitative worst-case bounds on the ratio between the cost incurred by the SVF rule and the minimum attainable cost. We also show that, in great generality, SVF is asymptotically optimal, i.e., the ratio approaches 1 as the number of patients grows large. While evaluating policies by considering an approximation ratio is a standard approach in many algorithmic settings, our results appear to be the first of this type in the appointment scheduling literature.
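The SVF rule itself is a one-line sort; its effect can be checked with a small simulation of the waiting/idle cost via the Lindley recursion. The slot length, number of runs, and service-time distributions below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def schedule_cost(samplers_in_order, slot=1.0, n_runs=2000):
    """Mean waiting-plus-idle cost for a fixed patient order (sketch).

    Patients get equal appointment slots of length `slot`; waiting and idle
    times follow the Lindley recursion. `samplers_in_order` is a list of
    service-duration samplers, one per patient, in service order.
    """
    total = 0.0
    for _ in range(n_runs):
        wait = idle = w = 0.0
        for sample in samplers_in_order:
            s = sample()
            wait += w
            idle += max(slot - (w + s), 0.0)   # server idle within this slot
            w = max(w + s - slot, 0.0)         # next patient's waiting time
        total += wait + idle
    return total / n_runs

# Smallest-variance-first: serve low-variance patients first.
patients = [  # (variance, unit-mean sampler): Gamma(k, 1/k) has variance 1/k
    (1 / 25, lambda: rng.gamma(25, 1 / 25)),
    (1 / 2, lambda: rng.gamma(2, 1 / 2)),
]
svf_order = [s for _, s in sorted(patients, key=lambda p: p[0])]
print(schedule_cost(svf_order))
```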
An improved exact algorithm and an NP-completeness proof for sparse matrix bipartitioning. (arXiv:1811.02043v2 [cs.DS] UPDATED)
We investigate sparse matrix bipartitioning -- a problem where we minimize the communication volume in parallel sparse matrix-vector multiplication. We prove, by reduction from graph bisection, that this problem is $\mathcal{NP}$-complete in the case where each side of the bipartitioning must contain a linear fraction of the nonzeros. We present an improved exact branch-and-bound algorithm which finds the minimum communication volume for a given matrix and maximum allowed imbalance. The algorithm is based on a maximum-flow bound and a packing bound, which extend previous matching and packing bounds. We implemented the algorithm in a new program called MP (Matrix Partitioner), which solved 839 matrices from the SuiteSparse collection to optimality, each within 24 hours of CPU-time. Furthermore, MP solved the difficult problem of the matrix cage6 in about 3 days. The new program is on average more than ten times faster than the previous program MondriaanOpt. Benchmark results using the set of 839 optimally solved matrices show that combining the medium-grain/iterative refinement methods of the Mondriaan package with the hypergraph bipartitioner of the PaToH package produces sparse matrix bipartitionings on average within 10% of the optimal solution.
SilhoNet: An RGB Method for 6D Object Pose Estimation. (arXiv:1809.06893v4 [cs.CV] UPDATED)
Autonomous robot manipulation involves estimating the translation and orientation of the object to be manipulated as a 6-degree-of-freedom (6D) pose. Methods using RGB-D data have shown great success in solving this problem. However, there are situations where cost constraints or the working environment may limit the use of RGB-D sensors. When limited to monocular camera data only, the problem of object pose estimation is very challenging. In this work, we introduce a novel method called SilhoNet that predicts 6D object pose from monocular images. We use a Convolutional Neural Network (CNN) pipeline that takes in Region of Interest (ROI) proposals to simultaneously predict an intermediate silhouette representation for objects with an associated occlusion mask and a 3D translation vector. The 3D orientation is then regressed from the predicted silhouettes. We show that our method achieves better overall performance on the YCB-Video dataset than two state-of-the-art networks for 6D pose estimation from monocular image input.
4 Identifying Compromised Accounts on Social Media Using Statistical Text Analysis. (arXiv:1804.07247v3 [cs.SI] UPDATED) By arxiv.org Published On :: Compromised accounts on social networks are regular user accounts that have been taken over by an entity with malicious intent. Since the adversary exploits the already established trust of a compromised account, it is crucial to detect these accounts to limit the damage they can cause. We propose a novel general framework for discovering compromised accounts by semantic analysis of the text messages originating from an account. Our framework is built on the observation that normal users will use language that is measurably different from the language that an adversary would use when the account is compromised. We use our framework to develop specific algorithms that use the difference between the language models of users and adversaries as features in a supervised learning setup. Evaluation results show that the proposed framework is effective for discovering compromised accounts on social networks, and that a KL-divergence-based language model feature works best. Full Article
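One plausible instantiation of the KL-divergence feature (hypothetical tokenizer, smoothing scheme, and toy messages; the paper's exact language models may differ): build smoothed unigram models from an account's historical and recent messages and measure how far the recent language drifts from the historical one.

    from collections import Counter
    from math import log

    def unigram_lm(texts, vocab, alpha=1.0):
        """Add-alpha-smoothed unigram language model over a fixed vocabulary."""
        counts = Counter(w for t in texts for w in t.split())
        total = sum(counts[w] for w in vocab) + alpha * len(vocab)
        return {w: (counts[w] + alpha) / total for w in vocab}

    def kl_divergence(p, q):
        """KL(p || q); larger values mean the recent messages diverge more
        from the account's historical language, a possible compromise signal."""
        return sum(p[w] * log(p[w] / q[w]) for w in p)

    history = ["meeting notes for tomorrow", "lunch with the team"]
    recent = ["click this link to win a free prize", "win free money now"]
    vocab = {w for t in history + recent for w in t.split()}
    p_hist = unigram_lm(history, vocab)
    p_recent = unigram_lm(recent, vocab)
    print(kl_divergence(p_recent, p_hist))  # feed as a supervised feature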
4 Compression, inversion, and approximate PCA of dense kernel matrices at near-linear computational complexity. (arXiv:1706.02205v4 [math.NA] UPDATED) By arxiv.org Published On :: Dense kernel matrices $\Theta \in \mathbb{R}^{N \times N}$ obtained from point evaluations of a covariance function $G$ at locations $\{ x_i \}_{1 \leq i \leq N} \subset \mathbb{R}^d$ arise in statistics, machine learning, and numerical analysis. For covariance functions that are Green's functions of elliptic boundary value problems and homogeneously-distributed sampling points, we show how to identify a subset $S \subset \{ 1, \dots, N \}^2$, with $\# S = O(N \log(N) \log^{d}(N/\epsilon))$, such that the zero fill-in incomplete Cholesky factorisation of the sparse matrix $\Theta_{ij} 1_{(i,j) \in S}$ is an $\epsilon$-approximation of $\Theta$. This factorisation can provably be obtained in complexity $O(N \log(N) \log^{d}(N/\epsilon))$ in space and $O(N \log^{2}(N) \log^{2d}(N/\epsilon))$ in time, improving upon the state of the art for general elliptic operators; we further present numerical evidence that $d$ can be taken to be the intrinsic dimension of the data set rather than that of the ambient space. The algorithm only needs to know the spatial configuration of the $x_i$ and does not require an analytic representation of $G$. Furthermore, this factorisation straightforwardly provides an approximate sparse PCA with optimal rate of convergence in the operator norm. Hence, by using only subsampling and the incomplete Cholesky factorisation, we obtain, at nearly linear complexity, the compression, inversion and approximate PCA of a large class of covariance matrices. By inverting the order of the Cholesky factorisation we also obtain a solver for elliptic PDEs with complexity $O(N \log^{d}(N/\epsilon))$ in space and $O(N \log^{2d}(N/\epsilon))$ in time, improving upon the state of the art for general elliptic operators. Full Article
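For intuition, a small NumPy sketch of a zero fill-in incomplete Cholesky factorisation restricted to a prescribed pattern $S$. The kernel, points, and pattern below are toy placeholders; the paper's contribution is the analytic choice of $S$ from the geometry of the points, which is not reproduced here.

    import numpy as np

    def ichol_zero_fill(theta, pattern):
        """Zero fill-in incomplete Cholesky: run the standard Cholesky
        recursion but only compute entries (i, j) in the prescribed lower-
        triangular sparsity pattern (diagonal assumed included)."""
        n = theta.shape[0]
        L = np.zeros_like(theta)
        for j in range(n):
            L[j, j] = np.sqrt(theta[j, j] - np.dot(L[j, :j], L[j, :j]))
            for i in range(j + 1, n):
                if (i, j) in pattern:
                    L[i, j] = (theta[i, j] - np.dot(L[i, :j], L[j, :j])) / L[j, j]
        return L

    # Toy example: exponential kernel at 5 hypothetical 1D points, full pattern.
    x = np.linspace(0, 1, 5)
    theta = np.exp(-np.abs(x[:, None] - x[None, :]))
    S = {(i, j) for i in range(5) for j in range(i + 1)}
    L = ichol_zero_fill(theta, S)
    print(np.max(np.abs(L @ L.T - theta)))  # ~0 for the full pattern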
4 Defending Hardware-based Malware Detectors against Adversarial Attacks. (arXiv:2005.03644v1 [cs.CR]) By arxiv.org Published On :: In the era of the Internet of Things (IoT), malware has proliferated exponentially over the past decade. Traditional anti-virus software is ineffective against modern, complex malware. In order to address this challenge, researchers have proposed Hardware-assisted Malware Detection (HMD) using Hardware Performance Counters (HPCs). The HPCs are used to train a set of machine learning (ML) classifiers, which, in turn, are used to distinguish benign programs from malware. Recently, adversarial attacks have been designed that introduce perturbations in the HPC traces using an adversarial sample predictor to misclassify a program for specific HPCs. These attacks are designed with the basic assumption that the attacker is aware of the HPCs being used to detect malware. Since modern processors provide hundreds of HPCs, restricting detection to only a few of them aids the attacker. In this paper, we propose a moving-target defense (MTD) against this adversarial attack by designing multiple ML classifiers trained on different sets of HPCs. The MTD randomly selects a classifier, thus confusing the attacker about which HPCs and how many classifiers are in use. We have developed an analytical model which proves that the probability of an attacker guessing the exact HPC-classifier combination under MTD is extremely low (on the order of $10^{-1864}$ for a system with 20 HPCs). Our experimental results show that the proposed defense improves the classification accuracy of HPC traces modified by an adversarial sample generator by up to 31.5%, a near-perfect (99.4%) restoration of the original accuracy. Full Article
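A toy sketch of the MTD mechanism (all sizes hypothetical, with a stand-in for real trained classifiers): train one classifier per randomly chosen HPC subset and pick one uniformly at random per detection, so an attacker cannot tell which HPCs are active.

    import random
    from math import comb

    NUM_HPCS = 20        # HPCs available on the processor (hypothetical)
    HPCS_PER_MODEL = 4   # HPCs fed to each classifier (hypothetical)

    def train(subset):
        # Stand-in for fitting a real ML classifier on labeled HPC traces.
        return f"model-on-{subset}"

    all_hpcs = list(range(NUM_HPCS))
    classifiers = {}     # HPC subset -> trained model
    for _ in range(8):   # hypothetical pool of eight classifiers
        subset = tuple(sorted(random.sample(all_hpcs, HPCS_PER_MODEL)))
        classifiers[subset] = train(subset)

    def detect(hpc_trace):
        # Randomly pick a classifier per detection, hiding the active HPCs.
        subset, model = random.choice(list(classifiers.items()))
        return model, [hpc_trace[h] for h in subset]

    print(detect({h: random.random() for h in all_hpcs}))
    print(comb(NUM_HPCS, HPCS_PER_MODEL))  # 4845 possible subsets per model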
4 On Exposure Bias, Hallucination and Domain Shift in Neural Machine Translation. (arXiv:2005.03642v1 [cs.CL]) By arxiv.org Published On :: The standard training algorithm in neural machine translation (NMT) suffers from exposure bias, and alternative algorithms have been proposed to mitigate this. However, the practical impact of exposure bias is under debate. In this paper, we link exposure bias to another well-known problem in NMT, namely the tendency to generate hallucinations under domain shift. In experiments on three datasets with multiple test domains, we show that exposure bias is partially to blame for hallucinations, and that training with Minimum Risk Training, which avoids exposure bias, can mitigate this. Our analysis explains why exposure bias is more problematic under domain shift, and also links exposure bias to the beam search problem, i.e. performance deterioration with increasing beam size. Our results provide a new justification for methods that reduce exposure bias: even if they do not increase performance on in-domain test sets, they can increase model robustness to domain shift. Full Article
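For reference, a minimal sketch of the Minimum Risk Training objective (following the standard formulation with a sharpness hyperparameter; the log-probabilities and costs below are hypothetical): the model's probabilities are renormalized over a sample of its own translations and the expected cost is minimized, so the model is scored on its own outputs rather than teacher-forced prefixes.

    from math import exp

    def mrt_loss(sample_logprobs, sample_costs, alpha=0.005):
        """Minimum Risk Training on a sample of candidate translations:
        renormalize the temperature-scaled model probabilities over the
        sample and return the expected cost (e.g. 1 - sentence-BLEU)."""
        weights = [exp(alpha * lp) for lp in sample_logprobs]
        z = sum(weights)
        return sum(w / z * c for w, c in zip(weights, sample_costs))

    # Three hypothetical sampled translations with model log-probs and costs:
    print(mrt_loss([-12.3, -14.1, -15.8], [0.2, 0.5, 0.9]))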
4 Where is Linked Data in Question Answering over Linked Data? (arXiv:2005.03640v1 [cs.CL]) By arxiv.org Published On :: We argue that "Question Answering with Knowledge Base" and "Question Answering over Linked Data" are currently two instances of the same problem, even though the latter explicitly declares that it deals with Linked Data. We point out the lack of existing methods for evaluating question answering on datasets that exploit external links to the rest of the cloud or share a common schema. To this end, we propose the creation of new evaluation settings that leverage the advantages of the Semantic Web to achieve AI-complete question answering. Full Article
4 Learning Robust Models for e-Commerce Product Search. (arXiv:2005.03624v1 [cs.CL]) By arxiv.org Published On :: Showing items that do not match the search query intent degrades the customer experience in e-commerce. These mismatches result from counterfactual biases of the ranking algorithms toward noisy behavioral signals such as clicks and purchases in the search logs. Mitigating the problem requires a large labeled dataset, which is expensive and time-consuming to obtain. In this paper, we develop a deep, end-to-end model that learns to effectively classify mismatches and to generate hard mismatched examples to improve the classifier. We train the model end-to-end by introducing a latent variable into the cross-entropy loss that alternates between using the real and generated samples. This not only makes the classifier more robust but also boosts the overall ranking performance. Our model achieves a relative gain over baselines of more than 26% in F-score and more than 17% in area under the PR curve. On live search traffic, our model shows significant improvements in multiple countries. Full Article
4 Simulating Population Protocols in Sub-Constant Time per Interaction. (arXiv:2005.03584v1 [cs.DS]) By arxiv.org Published On :: We consider the problem of efficiently simulating population protocols. In the population model, we are given a distributed system of $n$ agents modeled as identical finite-state machines. In each time step, a pair of agents is selected uniformly at random to interact. In an interaction, agents update their states according to a common transition function. We analyze two classes of simulators for this model, both empirically and analytically. First, we consider sequential simulators that execute one interaction after the other. Key to the performance of these simulators is the data structure storing the agents' states; for our analysis, we consider plain arrays, binary search trees, and a novel Dynamic Alias Table data structure. Second, we consider batch processing to efficiently update the states of multiple independent agents in one step. For many protocols considered in the literature, our simulator requires amortized sub-constant time per interaction and is fast in practice: given a fixed time budget, the implementation of our batched simulator is able to simulate population protocols several orders of magnitude larger than the sequential competitors, and can carry out $2^{50}$ interactions among the same number of agents in less than 400s. Full Article
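For contrast with the batched approach, a plain sequential simulator fits in a few lines of Python (the transition function below is a classic one-way epidemic, used here only as a toy example): each step draws a random pair of distinct agents and applies the common transition function.

    import random

    def simulate(states, transition, num_interactions):
        """Plain sequential population-protocol simulator: each step draws an
        ordered pair of distinct agents uniformly at random and updates them
        via the common transition function. The batched simulator in the paper
        replaces this loop with bulk updates over compact state counts."""
        n = len(states)
        for _ in range(num_interactions):
            i, j = random.sample(range(n), 2)
            states[i], states[j] = transition(states[i], states[j])
        return states

    def transition(u, v):
        # One-way epidemic: the responder adopts the initiator's infection bit.
        return u, max(u, v)

    print(simulate([1] + [0] * 9, transition, 500))  # all 1s w.h.p.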
4 A Reduced Basis Method For Fractional Diffusion Operators II. (arXiv:2005.03574v1 [math.NA]) By arxiv.org Published On :: We present a novel numerical scheme to approximate the solution map $s \mapsto u(s) := \mathcal{L}^{-s} f$ of partial differential equations involving fractional elliptic operators. Reinterpreting $\mathcal{L}^{-s}$ as an interpolation operator allows us to derive an integral representation of $u(s)$ that includes solutions to parametrized reaction-diffusion problems. We propose a reduced basis strategy on top of a finite element method to approximate its integrand. Unlike prior works, we deduce the choice of snapshots for the reduced basis procedure analytically. Avoiding further discretization, the integral is interpreted in a spectral setting to evaluate the surrogate directly. Its computation boils down to a matrix approximation $L$ of the operator whose inverse is projected to a low-dimensional space, where explicit diagonalization is feasible. The universal character of the underlying $s$-independent reduced space allows the approximation of $(u(s))_{s \in (0,1)}$ in its entirety. We prove exponential convergence rates and confirm the analysis with a variety of numerical examples. Further improvements are proposed in the second part of this investigation to avoid the inversion of $L$. Instead, we directly project the matrix onto the reduced space, where its negative fractional power is evaluated. A numerical comparison with the predecessor highlights its competitive performance. Full Article
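A schematic NumPy version of the final evaluation step (the reduced basis $V$ below is a random placeholder, whereas the paper derives the snapshots analytically): project $L$ onto the reduced space, diagonalize once, and read off $u(s)$ for every $s$ from the same small eigendecomposition.

    import numpy as np

    def fractional_solve(L, f, V, s_values):
        """Sketch of the reduced-basis evaluation: project the SPD matrix L
        onto the space spanned by the columns of V, diagonalize there, and
        evaluate u(s) ~= V (V^T L V)^{-s} V^T f for all requested s."""
        Q, _ = np.linalg.qr(V)                 # orthonormal reduced basis
        Lr = Q.T @ L @ Q                       # projected operator
        w, U = np.linalg.eigh(Lr)              # one explicit diagonalization
        fr = U.T @ (Q.T @ f)
        return [Q @ (U @ (w ** (-s) * fr)) for s in s_values]

    # Toy example: 1D Laplacian-like stiffness matrix and a random reduced
    # basis standing in for the analytically derived snapshots.
    n = 100
    L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    f = np.ones(n)
    V = np.random.rand(n, 8)
    sols = fractional_solve(L, f, V, s_values=[0.25, 0.5, 0.75])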
4 QuickSync: A Quickly Synchronizing PoS-Based Blockchain Protocol. (arXiv:2005.03564v1 [cs.CR]) By arxiv.org Published On :: To implement a blockchain, we need a blockchain protocol for all the nodes to follow. To design a blockchain protocol, we need a block publisher selection mechanism and a chain selection rule. In Proof-of-Stake (PoS) based blockchain protocols, the block publisher selection mechanism selects the node to publish the next block based on the relative stake held by the node. However, PoS protocols may be vulnerable to fully adaptive corruptions. In the literature, researchers address this issue at the cost of performance. In this paper, we propose a novel PoS-based blockchain protocol, QuickSync, that achieves security against fully adaptive corruptions without compromising performance. We propose a metric called block power, a value defined for each block and derived from the output of a verifiable random function based on the digital signature of the block publisher. With this metric, we compute the chain power, the sum of the block powers of all blocks comprising the chain, for every valid chain. These metrics are functions of the block publisher's stake, enabling the PoS aspect of the protocol. The chain selection rule selects the chain with the highest chain power as the one to extend, and hence determines the selected block publisher of the previous block. Defining the chain selection rule through such metrics, however, may open the door to Sybil attacks; QuickSync therefore uses a Sybil-attack-resistant function implemented via histogram matching. We prove that QuickSync satisfies the common prefix, chain growth, and chain quality properties, and hence that it is secure. We also show that it is resilient to different types of adversarial attack strategies. Our analysis demonstrates that QuickSync outperforms Bitcoin by an order of magnitude on both transactions per second and time to finality, and Ouroboros v1 by a factor of three on time to finality. Full Article
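A toy illustration of the chain selection rule (the block-power function below is a simple stand-in; QuickSync's actual metric, including its histogram-matching Sybil resistance, differs): derive a per-block score from a VRF output and the publisher's stake, sum along each chain, and extend the chain with the highest total.

    import hashlib

    def block_power(vrf_output: bytes, stake: float) -> float:
        """Toy block-power metric: map the VRF output to (0, 1) and weight it
        by the publisher's relative stake. Illustration only; not QuickSync's
        actual function."""
        u = int.from_bytes(vrf_output[:8], "big") / 2 ** 64
        return u * stake

    def chain_power(chain):
        """Chain power = sum of block powers; the fork-choice rule extends
        the valid chain with the highest chain power."""
        return sum(block_power(b["vrf"], b["stake"]) for b in chain)

    def vrf_stub(secret: bytes, slot: int) -> bytes:
        # Stand-in for a real verifiable random function over the signature.
        return hashlib.sha256(secret + slot.to_bytes(8, "big")).digest()

    chain_a = [{"vrf": vrf_stub(b"alice", t), "stake": 0.3} for t in range(5)]
    chain_b = [{"vrf": vrf_stub(b"bob", t), "stake": 0.2} for t in range(5)]
    best = max([chain_a, chain_b], key=chain_power)
    print(chain_power(best))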