3

SPECTER: Document-level Representation Learning using Citation-informed Transformers. (arXiv:2004.07180v3 [cs.CL] UPDATED)

Representation learning is a critical ingredient for natural language processing systems. Recent Transformer language models like BERT learn powerful textual representations, but these models are targeted towards token- and sentence-level training objectives and do not leverage information on inter-document relatedness, which limits their document-level representation power. For applications on scientific documents, such as classification and recommendation, the embeddings power strong performance on end tasks. We propose SPECTER, a new method to generate document-level embedding of scientific documents based on pretraining a Transformer language model on a powerful signal of document-level relatedness: the citation graph. Unlike existing pretrained language models, SPECTER can be easily applied to downstream applications without task-specific fine-tuning. Additionally, to encourage further research on document-level models, we introduce SciDocs, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction, to document classification and recommendation. We show that SPECTER outperforms a variety of competitive baselines on the benchmark.




3

The growth rate over trees of any family of set defined by a monadic second order formula is semi-computable. (arXiv:2004.06508v3 [cs.DM] UPDATED)

Monadic second order logic can be used to express many classical notions of sets of vertices of a graph as for instance: dominating sets, induced matchings, perfect codes, independent sets or irredundant sets. Bounds on the number of sets of any such family of sets are interesting from a combinatorial point of view and have algorithmic applications. Many such bounds on different families of sets over different classes of graphs are already provided in the literature. In particular, Rote recently showed that the number of minimal dominating sets in trees of order $n$ is at most $95^{frac{n}{13}}$ and that this bound is asymptotically sharp up to a multiplicative constant. We build on his work to show that what he did for minimal dominating sets can be done for any family of sets definable by a monadic second order formula.

We first show that, for any monadic second order formula over graphs that characterizes a given kind of subset of its vertices, the maximal number of such sets in a tree can be expressed as the extit{growth rate of a bilinear system}. This mostly relies on well known links between monadic second order logic over trees and tree automata and basic tree automata manipulations. Then we show that this "growth rate" of a bilinear system can be approximated from above.We then use our implementation of this result to provide bounds on the number of independent dominating sets, total perfect dominating sets, induced matchings, maximal induced matchings, minimal perfect dominating sets, perfect codes and maximal irredundant sets on trees. We also solve a question from D. Y. Kang et al. regarding $r$-matchings and improve a bound from G'orska and Skupie'n on the number of maximal matchings on trees. Remark that this approach is easily generalizable to graphs of bounded tree width or clique width (or any similar class of graphs where tree automata are meaningful).




3

Transfer Learning for EEG-Based Brain-Computer Interfaces: A Review of Progress Made Since 2016. (arXiv:2004.06286v3 [cs.HC] UPDATED)

A brain-computer interface (BCI) enables a user to communicate with a computer directly using brain signals. Electroencephalograms (EEGs) used in BCIs are weak, easily contaminated by interference and noise, non-stationary for the same subject, and varying across different subjects and sessions. Therefore, it is difficult to build a generic pattern recognition model in an EEG-based BCI system that is optimal for different subjects, during different sessions, for different devices and tasks. Usually, a calibration session is needed to collect some training data for a new subject, which is time consuming and user unfriendly. Transfer learning (TL), which utilizes data or knowledge from similar or relevant subjects/sessions/devices/tasks to facilitate learning for a new subject/session/device/task, is frequently used to reduce the amount of calibration effort. This paper reviews journal publications on TL approaches in EEG-based BCIs in the last few years, i.e., since 2016. Six paradigms and applications -- motor imagery, event-related potentials, steady-state visual evoked potentials, affective BCIs, regression problems, and adversarial attacks -- are considered. For each paradigm/application, we group the TL approaches into cross-subject/session, cross-device, and cross-task settings and review them separately. Observations and conclusions are made at the end of the paper, which may point to future research directions.




3

PACT: Privacy Sensitive Protocols and Mechanisms for Mobile Contact Tracing. (arXiv:2004.03544v4 [cs.CR] UPDATED)

The global health threat from COVID-19 has been controlled in a number of instances by large-scale testing and contact tracing efforts. We created this document to suggest three functionalities on how we might best harness computing technologies to supporting the goals of public health organizations in minimizing morbidity and mortality associated with the spread of COVID-19, while protecting the civil liberties of individuals. In particular, this work advocates for a third-party free approach to assisted mobile contact tracing, because such an approach mitigates the security and privacy risks of requiring a trusted third party. We also explicitly consider the inferential risks involved in any contract tracing system, where any alert to a user could itself give rise to de-anonymizing information.

More generally, we hope to participate in bringing together colleagues in industry, academia, and civil society to discuss and converge on ideas around a critical issue rising with attempts to mitigate the COVID-19 pandemic.




3

Subgraph densities in a surface. (arXiv:2003.13777v2 [math.CO] UPDATED)

Given a fixed graph $H$ that embeds in a surface $Sigma$, what is the maximum number of copies of $H$ in an $n$-vertex graph $G$ that embeds in $Sigma$? We show that the answer is $Theta(n^{f(H)})$, where $f(H)$ is a graph invariant called the `flap-number' of $H$, which is independent of $Sigma$. This simultaneously answers two open problems posed by Eppstein (1993). When $H$ is a complete graph we give more precise answers.




3

Human Motion Transfer with 3D Constraints and Detail Enhancement. (arXiv:2003.13510v2 [cs.GR] UPDATED)

We propose a new method for realistic human motion transfer using a generative adversarial network (GAN), which generates a motion video of a target character imitating actions of a source character, while maintaining high authenticity of the generated results. We tackle the problem by decoupling and recombining the posture information and appearance information of both the source and target characters. The innovation of our approach lies in the use of the projection of a reconstructed 3D human model as the condition of GAN to better maintain the structural integrity of transfer results in different poses. We further introduce a detail enhancement net to enhance the details of transfer results by exploiting the details in real source frames. Extensive experiments show that our approach yields better results both qualitatively and quantitatively than the state-of-the-art methods.




3

Mathematical Formulae in Wikimedia Projects 2020. (arXiv:2003.09417v2 [cs.DL] UPDATED)

This poster summarizes our contributions to Wikimedia's processing pipeline for mathematical formulae. We describe how we have supported the transition from rendering formulae as course-grained PNG images in 2001 to providing modern semantically enriched language-independent MathML formulae in 2020. Additionally, we describe our plans to improve the accessibility and discoverability of mathematical knowledge in Wikimedia projects further.




3

Watching the World Go By: Representation Learning from Unlabeled Videos. (arXiv:2003.07990v2 [cs.CV] UPDATED)

Recent single image unsupervised representation learning techniques show remarkable success on a variety of tasks. The basic principle in these works is instance discrimination: learning to differentiate between two augmented versions of the same image and a large batch of unrelated images. Networks learn to ignore the augmentation noise and extract semantically meaningful representations. Prior work uses artificial data augmentation techniques such as cropping, and color jitter which can only affect the image in superficial ways and are not aligned with how objects actually change e.g. occlusion, deformation, viewpoint change. In this paper, we argue that videos offer this natural augmentation for free. Videos can provide entirely new views of objects, show deformation, and even connect semantically similar but visually distinct concepts. We propose Video Noise Contrastive Estimation, a method for using unlabeled video to learn strong, transferable single image representations. We demonstrate improvements over recent unsupervised single image techniques, as well as over fully supervised ImageNet pretraining, across a variety of temporal and non-temporal tasks. Code and the Random Related Video Views dataset are available at https://www.github.com/danielgordon10/vince




3

Hierarchical Neural Architecture Search for Single Image Super-Resolution. (arXiv:2003.04619v2 [cs.CV] UPDATED)

Deep neural networks have exhibited promising performance in image super-resolution (SR). Most SR models follow a hierarchical architecture that contains both the cell-level design of computational blocks and the network-level design of the positions of upsampling blocks. However, designing SR models heavily relies on human expertise and is very labor-intensive. More critically, these SR models often contain a huge number of parameters and may not meet the requirements of computation resources in real-world applications. To address the above issues, we propose a Hierarchical Neural Architecture Search (HNAS) method to automatically design promising architectures with different requirements of computation cost. To this end, we design a hierarchical SR search space and propose a hierarchical controller for architecture search. Such a hierarchical controller is able to simultaneously find promising cell-level blocks and network-level positions of upsampling layers. Moreover, to design compact architectures with promising performance, we build a joint reward by considering both the performance and computation cost to guide the search process. Extensive experiments on five benchmark datasets demonstrate the superiority of our method over existing methods.




3

Testing Scenario Library Generation for Connected and Automated Vehicles: An Adaptive Framework. (arXiv:2003.03712v2 [eess.SY] UPDATED)

How to generate testing scenario libraries for connected and automated vehicles (CAVs) is a major challenge faced by the industry. In previous studies, to evaluate maneuver challenge of a scenario, surrogate models (SMs) are often used without explicit knowledge of the CAV under test. However, performance dissimilarities between the SM and the CAV under test usually exist, and it can lead to the generation of suboptimal scenario libraries. In this paper, an adaptive testing scenario library generation (ATSLG) method is proposed to solve this problem. A customized testing scenario library for a specific CAV model is generated through an adaptive process. To compensate the performance dissimilarities and leverage each test of the CAV, Bayesian optimization techniques are applied with classification-based Gaussian Process Regression and a new-designed acquisition function. Comparing with a pre-determined library, a CAV can be tested and evaluated in a more efficient manner with the customized library. To validate the proposed method, a cut-in case study was performed and the results demonstrate that the proposed method can further accelerate the evaluation process by a few orders of magnitude.




3

Recursed is not Recursive: A Jarring Result. (arXiv:2002.05131v2 [cs.AI] UPDATED)

Recursed is a 2D puzzle platform video game featuring treasure chests that, when jumped into, instantiate a room that can later be exited (similar to function calls), optionally generating a jar that returns back to that room (similar to continuations). We prove that Recursed is RE-complete and thus undecidable (not recursive) by a reduction from the Post Correspondence Problem. Our reduction is "practical": the reduction from PCP results in fully playable levels that abide by all constraints governing levels (including the 15x20 room size) designed for the main game. Our reduction is also "efficient": a Turing machine can be simulated by a Recursed level whose size is linear in the encoding size of the Turing machine and whose solution length is polynomial in the running time of the Turing machine.




3

A memory of motion for visual predictive control tasks. (arXiv:2001.11759v3 [cs.RO] UPDATED)

This paper addresses the problem of efficiently achieving visual predictive control tasks. To this end, a memory of motion, containing a set of trajectories built off-line, is used for leveraging precomputation and dealing with difficult visual tasks. Standard regression techniques, such as k-nearest neighbors and Gaussian process regression, are used to query the memory and provide on-line a warm-start and a way point to the control optimization process. The proposed technique allows the control scheme to achieve high performance and, at the same time, keep the computational time limited. Simulation and experimental results, carried out with a 7-axis manipulator, show the effectiveness of the approach.




3

Continuous speech separation: dataset and analysis. (arXiv:2001.11482v3 [cs.SD] UPDATED)

This paper describes a dataset and protocols for evaluating continuous speech separation algorithms. Most prior studies on speech separation use pre-segmented signals of artificially mixed speech utterances which are mostly emph{fully} overlapped, and the algorithms are evaluated based on signal-to-distortion ratio or similar performance metrics. However, in natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components. In addition, the signal-based metrics have very weak correlations with automatic speech recognition (ASR) accuracy. We think that not only does this make it hard to assess the practical relevance of the tested algorithms, it also hinders researchers from developing systems that can be readily applied to real scenarios. In this paper, we define continuous speech separation (CSS) as a task of generating a set of non-overlapped speech signals from a extit{continuous} audio stream that contains multiple utterances that are emph{partially} overlapped by a varying degree. A new real recorded dataset, called LibriCSS, is derived from LibriSpeech by concatenating the corpus utterances to simulate a conversation and capturing the audio replays with far-field microphones. A Kaldi-based ASR evaluation protocol is also established by using a well-trained multi-conditional acoustic model. By using this dataset, several aspects of a recently proposed speaker-independent CSS algorithm are investigated. The dataset and evaluation scripts are available to facilitate the research in this direction.




3

Evolutionary Dynamics of Higher-Order Interactions. (arXiv:2001.10313v2 [physics.soc-ph] UPDATED)

We live and cooperate in networks. However, links in networks only allow for pairwise interactions, thus making the framework suitable for dyadic games, but not for games that are played in groups of more than two players. To remedy this, we introduce higher-order interactions, where a link can connect more than two individuals, and study their evolutionary dynamics. We first consider a public goods game on a uniform hypergraph, showing that it corresponds to the replicator dynamics in the well-mixed limit, and providing an exact theoretical foundation to study cooperation in networked groups. We also extend the analysis to heterogeneous hypergraphs that describe interactions of groups of different sizes and characterize the evolution of cooperation in such cases. Finally, we apply our new formulation to study the nature of group dynamics in real systems, showing how to extract the actual dependence of the synergy factor on the size of a group from real-world collaboration data in science and technology. Our work is a first step towards the implementation of new actions to boost cooperation in social groups.




3

Hardware Implementation of Neural Self-Interference Cancellation. (arXiv:2001.04543v2 [eess.SP] UPDATED)

In-band full-duplex systems can transmit and receive information simultaneously on the same frequency band. However, due to the strong self-interference caused by the transmitter to its own receiver, the use of non-linear digital self-interference cancellation is essential. In this work, we describe a hardware architecture for a neural network-based non-linear self-interference (SI) canceller and we compare it with our own hardware implementation of a conventional polynomial based SI canceller. In particular, we present implementation results for a shallow and a deep neural network SI canceller as well as for a polynomial SI canceller. Our results show that the deep neural network canceller achieves a hardware efficiency of up to $312.8$ Msamples/s/mm$^2$ and an energy efficiency of up to $0.9$ nJ/sample, which is $2.1 imes$ and $2 imes$ better than the polynomial SI canceller, respectively. These results show that NN-based methods applied to communications are not only useful from a performance perspective, but can also be a very effective means to reduce the implementation complexity.




3

Games Where You Can Play Optimally with Arena-Independent Finite Memory. (arXiv:2001.03894v2 [cs.GT] UPDATED)

For decades, two-player (antagonistic) games on graphs have been a framework of choice for many important problems in theoretical computer science. A notorious one is controller synthesis, which can be rephrased through the game-theoretic metaphor as the quest for a winning strategy of the system in a game against its antagonistic environment. Depending on the specification, optimal strategies might be simple or quite complex, for example having to use (possibly infinite) memory. Hence, research strives to understand which settings allow for simple strategies.

In 2005, Gimbert and Zielonka provided a complete characterization of preference relations (a formal framework to model specifications and game objectives) that admit memoryless optimal strategies for both players. In the last fifteen years however, practical applications have driven the community toward games with complex or multiple objectives, where memory -- finite or infinite -- is almost always required. Despite much effort, the exact frontiers of the class of preference relations that admit finite-memory optimal strategies still elude us.

In this work, we establish a complete characterization of preference relations that admit optimal strategies using arena-independent finite memory, generalizing the work of Gimbert and Zielonka to the finite-memory case. We also prove an equivalent to their celebrated corollary of great practical interest: if both players have optimal (arena-independent-)finite-memory strategies in all one-player games, then it is also the case in all two-player games. Finally, we pinpoint the boundaries of our results with regard to the literature: our work completely covers the case of arena-independent memory (e.g., multiple parity objectives, lower- and upper-bounded energy objectives), and paves the way to the arena-dependent case (e.g., multiple lower-bounded energy objectives).




3

Safe non-smooth black-box optimization with application to policy search. (arXiv:1912.09466v3 [math.OC] UPDATED)

For safety-critical black-box optimization tasks, observations of the constraints and the objective are often noisy and available only for the feasible points. We propose an approach based on log barriers to find a local solution of a non-convex non-smooth black-box optimization problem $min f^0(x)$ subject to $f^i(x)leq 0,~ i = 1,ldots, m$, at the same time, guaranteeing constraint satisfaction while learning an optimal solution with high probability. Our proposed algorithm exploits noisy observations to iteratively improve on an initial safe point until convergence. We derive the convergence rate and prove safety of our algorithm. We demonstrate its performance in an application to an iterative control design problem.




3

IPG-Net: Image Pyramid Guidance Network for Small Object Detection. (arXiv:1912.00632v3 [cs.CV] UPDATED)

For Convolutional Neural Network-based object detection, there is a typical dilemma: the spatial information is well kept in the shallow layers which unfortunately do not have enough semantic information, while the deep layers have a high semantic concept but lost a lot of spatial information, resulting in serious information imbalance. To acquire enough semantic information for shallow layers, Feature Pyramid Networks (FPN) is used to build a top-down propagated path. In this paper, except for top-down combining of information for shallow layers, we propose a novel network called Image Pyramid Guidance Network (IPG-Net) to make sure both the spatial information and semantic information are abundant for each layer. Our IPG-Net has two main parts: the image pyramid guidance transformation module and the image pyramid guidance fusion module. Our main idea is to introduce the image pyramid guidance into the backbone stream to solve the information imbalance problem, which alleviates the vanishment of the small object features. This IPG transformation module promises even in the deepest stage of the backbone, there is enough spatial information for bounding box regression and classification. Furthermore, we designed an effective fusion module to fuse the features from the image pyramid and features from the backbone stream. We have tried to apply this novel network to both one-stage and two-stage detection models, state of the art results are obtained on the most popular benchmark data sets, i.e. MS COCO and Pascal VOC.




3

Robustly Clustering a Mixture of Gaussians. (arXiv:1911.11838v5 [cs.DS] UPDATED)

We give an efficient algorithm for robustly clustering of a mixture of two arbitrary Gaussians, a central open problem in the theory of computationally efficient robust estimation, assuming only that the the means of the component Gaussians are well-separated or their covariances are well-separated. Our algorithm and analysis extend naturally to robustly clustering mixtures of well-separated strongly logconcave distributions. The mean separation required is close to the smallest possible to guarantee that most of the measure of each component can be separated by some hyperplane (for covariances, it is the same condition in the second degree polynomial kernel). We also show that for Gaussian mixtures, separation in total variation distance suffices to achieve robust clustering. Our main tools are a new identifiability criterion based on isotropic position and the Fisher discriminant, and a corresponding Sum-of-Squares convex programming relaxation, of fixed degree.




3

t-SS3: a text classifier with dynamic n-grams for early risk detection over text streams. (arXiv:1911.06147v2 [cs.CL] UPDATED)

A recently introduced classifier, called SS3, has shown to be well suited to deal with early risk detection (ERD) problems on text streams. It obtained state-of-the-art performance on early depression and anorexia detection on Reddit in the CLEF's eRisk open tasks. SS3 was created to deal with ERD problems naturally since: it supports incremental training and classification over text streams, and it can visually explain its rationale. However, SS3 processes the input using a bag-of-word model lacking the ability to recognize important word sequences. This aspect could negatively affect the classification performance and also reduces the descriptiveness of visual explanations. In the standard document classification field, it is very common to use word n-grams to try to overcome some of these limitations. Unfortunately, when working with text streams, using n-grams is not trivial since the system must learn and recognize which n-grams are important "on the fly". This paper introduces t-SS3, an extension of SS3 that allows it to recognize useful patterns over text streams dynamically. We evaluated our model in the eRisk 2017 and 2018 tasks on early depression and anorexia detection. Experimental results suggest that t-SS3 is able to improve both current results and the richness of visual explanations.




3

Unsupervised Domain Adaptation on Reading Comprehension. (arXiv:1911.06137v4 [cs.CL] UPDATED)

Reading comprehension (RC) has been studied in a variety of datasets with the boosted performance brought by deep neural networks. However, the generalization capability of these models across different domains remains unclear. To alleviate this issue, we are going to investigate unsupervised domain adaptation on RC, wherein a model is trained on labeled source domain and to be applied to the target domain with only unlabeled samples. We first show that even with the powerful BERT contextual representation, the performance is still unsatisfactory when the model trained on one dataset is directly applied to another target dataset. To solve this, we provide a novel conditional adversarial self-training method (CASe). Specifically, our approach leverages a BERT model fine-tuned on the source dataset along with the confidence filtering to generate reliable pseudo-labeled samples in the target domain for self-training. On the other hand, it further reduces domain distribution discrepancy through conditional adversarial learning across domains. Extensive experiments show our approach achieves comparable accuracy to supervised models on multiple large-scale benchmark datasets.




3

Revisiting Semantics of Interactions for Trace Validity Analysis. (arXiv:1911.03094v2 [cs.SE] UPDATED)

Interaction languages such as MSC are often associated with formal semantics by means of translations into distinct behavioral formalisms such as automatas or Petri nets. In contrast to translational approaches we propose an operational approach. Its principle is to identify which elementary communication actions can be immediately executed, and then to compute, for every such action, a new interaction representing the possible continuations to its execution. We also define an algorithm for checking the validity of execution traces (i.e. whether or not they belong to an interaction's semantics). Algorithms for semantic computation and trace validity are analyzed by means of experiments.




3

Digital Twin: Enabling Technologies, Challenges and Open Research. (arXiv:1911.01276v3 [cs.CY] UPDATED)

Digital Twin technology is an emerging concept that has become the centre of attention for industry and, in more recent years, academia. The advancements in industry 4.0 concepts have facilitated its growth, particularly in the manufacturing industry. The Digital Twin is defined extensively but is best described as the effortless integration of data between a physical and virtual machine in either direction. The challenges, applications, and enabling technologies for Artificial Intelligence, Internet of Things (IoT) and Digital Twins are presented. A review of publications relating to Digital Twins is performed, producing a categorical review of recent papers. The review has categorised them by research areas: manufacturing, healthcare and smart cities, discussing a range of papers that reflect these areas and the current state of research. The paper provides an assessment of the enabling technologies, challenges and open research for Digital Twins.




3

Imitation Learning for Human-robot Cooperation Using Bilateral Control. (arXiv:1909.13018v2 [cs.RO] UPDATED)

Robots are required to operate autonomously in response to changing situations. Previously, imitation learning using 4ch-bilateral control was demonstrated to be suitable for imitation of object manipulation. However, cooperative work between humans and robots has not yet been verified in these studies. In this study, the task was expanded by cooperative work between a human and a robot. 4ch-bilateral control was used to collect training data for training robot motion. We focused on serving salad as a task in the home. The task was executed with a spoon and a fork fixed to robots. Adjustment of force was indispensable in manipulating indefinitely shaped objects such as salad. Results confirmed the effectiveness of the proposed method as demonstrated by the success of the task.




3

Over-the-Air Computation Systems: Optimization, Analysis and Scaling Laws. (arXiv:1909.00329v2 [cs.IT] UPDATED)

For future Internet of Things (IoT)-based Big Data applications (e.g., smart cities/transportation), wireless data collection from ubiquitous massive smart sensors with limited spectrum bandwidth is very challenging. On the other hand, to interpret the meaning behind the collected data, it is also challenging for edge fusion centers running computing tasks over large data sets with limited computation capacity. To tackle these challenges, by exploiting the superposition property of a multiple-access channel and the functional decomposition properties, the recently proposed technique, over-the-air computation (AirComp), enables an effective joint data collection and computation from concurrent sensor transmissions. In this paper, we focus on a single-antenna AirComp system consisting of $K$ sensors and one receiver (i.e., the fusion center). We consider an optimization problem to minimize the computation mean-squared error (MSE) of the $K$ sensors' signals at the receiver by optimizing the transmitting-receiving (Tx-Rx) policy, under the peak power constraint of each sensor. Although the problem is not convex, we derive the computation-optimal policy in closed form. Also, we comprehensively investigate the ergodic performance of AirComp systems in terms of the average computation MSE and the average power consumption under Rayleigh fading channels with different Tx-Rx policies. For the computation-optimal policy, we prove that its average computation MSE has a decay rate of $O(1/sqrt{K})$, and our numerical results illustrate that the policy also has a vanishing average power consumption with the increasing $K$, which jointly show the computation effectiveness and the energy efficiency of the policy with a large number of sensors.




3

A Shift Selection Strategy for Parallel Shift-Invert Spectrum Slicing in Symmetric Self-Consistent Eigenvalue Computation. (arXiv:1908.06043v2 [math.NA] UPDATED)

The central importance of large scale eigenvalue problems in scientific computation necessitates the development of massively parallel algorithms for their solution. Recent advances in dense numerical linear algebra have enabled the routine treatment of eigenvalue problems with dimensions on the order of hundreds of thousands on the world's largest supercomputers. In cases where dense treatments are not feasible, Krylov subspace methods offer an attractive alternative due to the fact that they do not require storage of the problem matrices. However, demonstration of scalability of either of these classes of eigenvalue algorithms on computing architectures capable of expressing massive parallelism is non-trivial due to communication requirements and serial bottlenecks, respectively. In this work, we introduce the SISLICE method: a parallel shift-invert algorithm for the solution of the symmetric self-consistent field (SCF) eigenvalue problem. The SISLICE method drastically reduces the communication requirement of current parallel shift-invert eigenvalue algorithms through various shift selection and migration techniques based on density of states estimation and k-means clustering, respectively. This work demonstrates the robustness and parallel performance of the SISLICE method on a representative set of SCF eigenvalue problems and outlines research directions which will be explored in future work.




3

Dynamic Face Video Segmentation via Reinforcement Learning. (arXiv:1907.01296v3 [cs.CV] UPDATED)

For real-time semantic video segmentation, most recent works utilised a dynamic framework with a key scheduler to make online key/non-key decisions. Some works used a fixed key scheduling policy, while others proposed adaptive key scheduling methods based on heuristic strategies, both of which may lead to suboptimal global performance. To overcome this limitation, we model the online key decision process in dynamic video segmentation as a deep reinforcement learning problem and learn an efficient and effective scheduling policy from expert information about decision history and from the process of maximising global return. Moreover, we study the application of dynamic video segmentation on face videos, a field that has not been investigated before. By evaluating on the 300VW dataset, we show that the performance of our reinforcement key scheduler outperforms that of various baselines in terms of both effective key selections and running speed. Further results on the Cityscapes dataset demonstrate that our proposed method can also generalise to other scenarios. To the best of our knowledge, this is the first work to use reinforcement learning for online key-frame decision in dynamic video segmentation, and also the first work on its application on face videos.




3

Space-Efficient Vertex Separators for Treewidth. (arXiv:1907.00676v3 [cs.DS] UPDATED)

For $n$-vertex graphs with treewidth $k = O(n^{1/2-epsilon})$ and an arbitrary $epsilon>0$, we present a word-RAM algorithm to compute vertex separators using only $O(n)$ bits of working memory. As an application of our algorithm, we give an $O(1)$-approximation algorithm for tree decomposition. Our algorithm computes a tree decomposition in $c^k n (log log n) log^* n$ time using $O(n)$ bits for some constant $c > 0$.

We finally use the tree decomposition obtained by our algorithm to solve Vertex Cover, Independent Set, Dominating Set, MaxCut and $3$-Coloring by using $O(n)$ bits as long as the treewidth of the graph is smaller than $c' log n$ for some problem dependent constant $0 < c' < 1$.




3

Parameterised Counting in Logspace. (arXiv:1904.12156v3 [cs.LO] UPDATED)

Stockhusen and Tantau (IPEC 2013) defined the operators paraW and paraBeta for parameterised space complexity classes by allowing bounded nondeterminism with multiple read and read-once access, respectively. Using these operators, they obtained characterisations for the complexity of many parameterisations of natural problems on graphs.

In this article, we study the counting versions of such operators and introduce variants based on tail-nondeterminism, paraW[1] and paraBetaTail, in the setting of parameterised logarithmic space. We examine closure properties of the new classes under the central reductions and arithmetic operations. We also identify a wide range of natural complete problems for our classes in the areas of walk counting in digraphs, first-order model-checking and graph-homomorphisms. In doing so, we also see that the closure of #paraBetaTail-L under parameterised logspace parsimonious reductions coincides with #paraBeta-L. We show that the complexity of a parameterised variant of the determinant function is #paraBetaTail-L-hard and can be written as the difference of two functions in #paraBetaTail-L for (0,1)-matrices. Finally, we characterise the new complexity classes in terms of branching programs.




3

Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems. (arXiv:1904.08962v3 [cs.SY] UPDATED)

Restless multi-armed bandits are a class of discrete-time stochastic control problems which involve sequential decision making with a finite set of actions (set of arms). This paper studies a class of constrained restless multi-armed bandits (CRMAB). The constraints are in the form of time varying set of actions (set of available arms). This variation can be either stochastic or semi-deterministic. Given a set of arms, a fixed number of them can be chosen to be played in each decision interval. The play of each arm yields a state dependent reward. The current states of arms are partially observable through binary feedback signals from arms that are played. The current availability of arms is fully observable. The objective is to maximize long term cumulative reward. The uncertainty about future availability of arms along with partial state information makes this objective challenging. Applications for CRMAB abound in the domain of cyber-physical systems. This optimization problem is analyzed using Whittle's index policy. To this end, a constrained restless single-armed bandit is studied. It is shown to admit a threshold-type optimal policy, and is also indexable. An algorithm to compute Whittle's index is presented. Further, upper bounds on the value function are derived in order to estimate the degree of sub-optimality of various solutions. The simulation study compares the performance of Whittle's index, modified Whittle's index and myopic policies.




3

Fast Cross-validation in Harmonic Approximation. (arXiv:1903.10206v3 [math.NA] UPDATED)

Finding a good regularization parameter for Tikhonov regularization problems is a though yet often asked question. One approach is to use leave-one-out cross-validation scores to indicate the goodness of fit. This utilizes only the noisy function values but, on the downside, comes with a high computational cost. In this paper we present a general approach to shift the main computations from the function in question to the node distribution and, making use of FFT and FFT-like algorithms, even reduce this cost tremendously to the cost of the Tikhonov regularization problem itself. We apply this technique in different settings on the torus, the unit interval, and the two-dimensional sphere. Given that the sampling points satisfy a quadrature rule our algorithm computes the cross-validations scores in floating-point precision. In the cases of arbitrarily scattered nodes we propose an approximating algorithm with the same complexity. Numerical experiments indicate the applicability of our algorithms.




3

Ranked List Loss for Deep Metric Learning. (arXiv:1903.03238v6 [cs.CV] UPDATED)

The objective of deep metric learning (DML) is to learn embeddings that can capture semantic similarity and dissimilarity information among data points. Existing pairwise or tripletwise loss functions used in DML are known to suffer from slow convergence due to a large proportion of trivial pairs or triplets as the model improves. To improve this, ranking-motivated structured losses are proposed recently to incorporate multiple examples and exploit the structured information among them. They converge faster and achieve state-of-the-art performance. In this work, we unveil two limitations of existing ranking-motivated structured losses and propose a novel ranked list loss to solve both of them. First, given a query, only a fraction of data points is incorporated to build the similarity structure. To address this, we propose to build a set-based similarity structure by exploiting all instances in the gallery. The learning setting can be interpreted as few-shot retrieval: given a mini-batch, every example is iteratively used as a query, and the rest ones compose the galley to search, i.e., the support set in few-shot setting. The rest examples are split into a positive set and a negative set. For every mini-batch, the learning objective of ranked list loss is to make the query closer to the positive set than to the negative set by a margin. Second, previous methods aim to pull positive pairs as close as possible in the embedding space. As a result, the intraclass data distribution tends to be extremely compressed. In contrast, we propose to learn a hypersphere for each class in order to preserve useful similarity structure inside it, which functions as regularisation. Extensive experiments demonstrate the superiority of our proposal by comparing with the state-of-the-art methods on the fine-grained image retrieval task.




3

Keeping out the Masses: Understanding the Popularity and Implications of Internet Paywalls. (arXiv:1903.01406v4 [cs.CY] UPDATED)

Funding the production of quality online content is a pressing problem for content producers. The most common funding method, online advertising, is rife with well-known performance and privacy harms, and an intractable subject-agent conflict: many users do not want to see advertisements, depriving the site of needed funding.

Because of these negative aspects of advertisement-based funding, paywalls are an increasingly popular alternative for websites. This shift to a "pay-for-access" web is one that has potentially huge implications for the web and society. Instead of a system where information (nominally) flows freely, paywalls create a web where high quality information is available to fewer and fewer people, leaving the rest of the web users with less information, that might be also less accurate and of lower quality. Despite the potential significance of a move from an "advertising-but-open" web to a "paywalled" web, we find this issue understudied.

This work addresses this gap in our understanding by measuring how widely paywalls have been adopted, what kinds of sites use paywalls, and the distribution of policies enforced by paywalls. A partial list of our findings include that (i) paywall use is accelerating (2x more paywalls every 6 months), (ii) paywall adoption differs by country (e.g. 18.75% in US, 12.69% in Australia), (iii) paywalls change how users interact with sites (e.g. higher bounce rates, less incoming links), (iv) the median cost of an annual paywall access is $108 per site, and (v) paywalls are in general trivial to circumvent.

Finally, we present the design of a novel, automated system for detecting whether a site uses a paywall, through the combination of runtime browser instrumentation and repeated programmatic interactions with the site. We intend this classifier to augment future, longitudinal measurements of paywall use and behavior.




3

Deterministic Sparse Fourier Transform with an ell_infty Guarantee. (arXiv:1903.00995v3 [cs.DS] UPDATED)

In this paper we revisit the deterministic version of the Sparse Fourier Transform problem, which asks to read only a few entries of $x in mathbb{C}^n$ and design a recovery algorithm such that the output of the algorithm approximates $hat x$, the Discrete Fourier Transform (DFT) of $x$. The randomized case has been well-understood, while the main work in the deterministic case is that of Merhi et al.@ (J Fourier Anal Appl 2018), which obtains $O(k^2 log^{-1}k cdot log^{5.5}n)$ samples and a similar runtime with the $ell_2/ell_1$ guarantee. We focus on the stronger $ell_{infty}/ell_1$ guarantee and the closely related problem of incoherent matrices. We list our contributions as follows.

1. We find a deterministic collection of $O(k^2 log n)$ samples for the $ell_infty/ell_1$ recovery in time $O(nk log^2 n)$, and a deterministic collection of $O(k^2 log^2 n)$ samples for the $ell_infty/ell_1$ sparse recovery in time $O(k^2 log^3n)$.

2. We give new deterministic constructions of incoherent matrices that are row-sampled submatrices of the DFT matrix, via a derandomization of Bernstein's inequality and bounds on exponential sums considered in analytic number theory. Our first construction matches a previous randomized construction of Nelson, Nguyen and Woodruff (RANDOM'12), where there was no constraint on the form of the incoherent matrix.

Our algorithms are nearly sample-optimal, since a lower bound of $Omega(k^2 + k log n)$ is known, even for the case where the sensing matrix can be arbitrarily designed. A similar lower bound of $Omega(k^2 log n/ log k)$ is known for incoherent matrices.




3

Machine learning topological phases in real space. (arXiv:1901.01963v4 [cond-mat.mes-hall] UPDATED)

We develop a supervised machine learning algorithm that is able to learn topological phases for finite condensed matter systems from bulk data in real lattice space. The algorithm employs diagonalization in real space together with any supervised learning algorithm to learn topological phases through an eigenvector ensembling procedure. We combine our algorithm with decision trees and random forests to successfully recover topological phase diagrams of Su-Schrieffer-Heeger (SSH) models from bulk lattice data in real space and show how the Shannon information entropy of ensembles of lattice eigenvectors can be used to retrieve a signal detailing how topological information is distributed in the bulk. The discovery of Shannon information entropy signals associated with topological phase transitions from the analysis of data from several thousand SSH systems illustrates how model explainability in machine learning can advance the research of exotic quantum materials with properties that may power future technological applications such as qubit engineering for quantum computing.




3

Weighted Moore-Penrose inverses of arbitrary-order tensors. (arXiv:1812.03052v3 [math.NA] UPDATED)

Within the field of multilinear algebra, inverses and generalized inverses of tensors based on the Einstein product have been investigated over the past few years. In this paper, we explore the singular value decomposition and full-rank decomposition of arbitrary-order tensors using {it reshape} operation. Applying range and null space of tensors along with the reshape operation; we further study the Moore-Penrose inverse of tensors and their cancellation properties via the Einstein product. Then we discuss weighted Moore-Penrose inverses of arbitrary-order tensors using such product. Following a specific algebraic approach, a few characterizations and representations of these inverses are explored. In addition to this, we obtain a few necessary and sufficient conditions for the reverse-order law to hold for weighted Moore-Penrose inverses of arbitrary-order tensors.




3

An improved exact algorithm and an NP-completeness proof for sparse matrix bipartitioning. (arXiv:1811.02043v2 [cs.DS] UPDATED)

We investigate sparse matrix bipartitioning -- a problem where we minimize the communication volume in parallel sparse matrix-vector multiplication. We prove, by reduction from graph bisection, that this problem is $mathcal{NP}$-complete in the case where each side of the bipartitioning must contain a linear fraction of the nonzeros.

We present an improved exact branch-and-bound algorithm which finds the minimum communication volume for a given matrix and maximum allowed imbalance. The algorithm is based on a maximum-flow bound and a packing bound, which extend previous matching and packing bounds.

We implemented the algorithm in a new program called MP (Matrix Partitioner), which solved 839 matrices from the SuiteSparse collection to optimality, each within 24 hours of CPU-time. Furthermore, MP solved the difficult problem of the matrix cage6 in about 3 days. The new program is on average more than ten times faster than the previous program MondriaanOpt.

Benchmark results using the set of 839 optimally solved matrices show that combining the medium-grain/iterative refinement methods of the Mondriaan package with the hypergraph bipartitioner of the PaToH package produces sparse matrix bipartitionings on average within 10% of the optimal solution.




3

SilhoNet: An RGB Method for 6D Object Pose Estimation. (arXiv:1809.06893v4 [cs.CV] UPDATED)

Autonomous robot manipulation involves estimating the translation and orientation of the object to be manipulated as a 6-degree-of-freedom (6D) pose. Methods using RGB-D data have shown great success in solving this problem. However, there are situations where cost constraints or the working environment may limit the use of RGB-D sensors. When limited to monocular camera data only, the problem of object pose estimation is very challenging. In this work, we introduce a novel method called SilhoNet that predicts 6D object pose from monocular images. We use a Convolutional Neural Network (CNN) pipeline that takes in Region of Interest (ROI) proposals to simultaneously predict an intermediate silhouette representation for objects with an associated occlusion mask and a 3D translation vector. The 3D orientation is then regressed from the predicted silhouettes. We show that our method achieves better overall performance on the YCB-Video dataset than two state-of-the art networks for 6D pose estimation from monocular image input.




3

Identifying Compromised Accounts on Social Media Using Statistical Text Analysis. (arXiv:1804.07247v3 [cs.SI] UPDATED)

Compromised accounts on social networks are regular user accounts that have been taken over by an entity with malicious intent. Since the adversary exploits the already established trust of a compromised account, it is crucial to detect these accounts to limit the damage they can cause. We propose a novel general framework for discovering compromised accounts by semantic analysis of text messages coming out from an account. Our framework is built on the observation that normal users will use language that is measurably different from the language that an adversary would use when the account is compromised. We use our framework to develop specific algorithms that use the difference of language models of users and adversaries as features in a supervised learning setup. Evaluation results show that the proposed framework is effective for discovering compromised accounts on social networks and a KL-divergence-based language model feature works best.




3

ZebraLancer: Decentralized Crowdsourcing of Human Knowledge atop Open Blockchain. (arXiv:1803.01256v5 [cs.HC] UPDATED)

We design and implement the first private and anonymous decentralized crowdsourcing system ZebraLancer, and overcome two fundamental challenges of decentralizing crowdsourcing, i.e., data leakage and identity breach.

First, our outsource-then-prove methodology resolves the tension between the blockchain transparency and the data confidentiality to guarantee the basic utilities/fairness requirements of data crowdsourcing, thus ensuring: (i) a requester will not pay more than what data deserve, according to a policy announced when her task is published via the blockchain; (ii) each worker indeed gets a payment based on the policy, if he submits data to the blockchain; (iii) the above properties are realized not only without a central arbiter, but also without leaking the data to the open blockchain. Second, the transparency of blockchain allows one to infer private information about workers and requesters through their participation history. Simply enabling anonymity is seemingly attempting but will allow malicious workers to submit multiple times to reap rewards. ZebraLancer also overcomes this problem by allowing anonymous requests/submissions without sacrificing accountability. The idea behind is a subtle linkability: if a worker submits twice to a task, anyone can link the submissions, or else he stays anonymous and unlinkable across tasks. To realize this delicate linkability, we put forward a novel cryptographic concept, i.e., the common-prefix-linkable anonymous authentication. We remark the new anonymous authentication scheme might be of independent interest. Finally, we implement our protocol for a common image annotation task and deploy it in a test net of Ethereum. The experiment results show the applicability of our protocol atop the existing real-world blockchain.




3

ErdH{o}s-P'osa property of chordless cycles and its applications. (arXiv:1711.00667v3 [math.CO] UPDATED)

A chordless cycle, or equivalently a hole, in a graph $G$ is an induced subgraph of $G$ which is a cycle of length at least $4$. We prove that the ErdH{o}s-P'osa property holds for chordless cycles, which resolves the major open question concerning the ErdH{o}s-P'osa property. Our proof for chordless cycles is constructive: in polynomial time, one can find either $k+1$ vertex-disjoint chordless cycles, or $c_1k^2 log k+c_2$ vertices hitting every chordless cycle for some constants $c_1$ and $c_2$. It immediately implies an approximation algorithm of factor $mathcal{O}(sf{opt}log {sf opt})$ for Chordal Vertex Deletion. We complement our main result by showing that chordless cycles of length at least $ell$ for any fixed $ellge 5$ do not have the ErdH{o}s-P'osa property.




3

Using hierarchical matrices in the solution of the time-fractional heat equation by multigrid waveform relaxation. (arXiv:1706.07632v3 [math.NA] UPDATED)

This work deals with the efficient numerical solution of the time-fractional heat equation discretized on non-uniform temporal meshes. Non-uniform grids are essential to capture the singularities of "typical" solutions of time-fractional problems. We propose an efficient space-time multigrid method based on the waveform relaxation technique, which accounts for the nonlocal character of the fractional differential operator. To maintain an optimal complexity, which can be obtained for the case of uniform grids, we approximate the coefficient matrix corresponding to the temporal discretization by its hierarchical matrix (${cal H}$-matrix) representation. In particular, the proposed method has a computational cost of ${cal O}(k N M log(M))$, where $M$ is the number of time steps, $N$ is the number of spatial grid points, and $k$ is a parameter which controls the accuracy of the ${cal H}$-matrix approximation. The efficiency and the good convergence of the algorithm, which can be theoretically justified by a semi-algebraic mode analysis, are demonstrated through numerical experiments in both one- and two-dimensional spaces.




3

Active Intent Disambiguation for Shared Control Robots. (arXiv:2005.03652v1 [cs.RO])

Assistive shared-control robots have the potential to transform the lives of millions of people afflicted with severe motor impairments. The usefulness of shared-control robots typically relies on the underlying autonomy's ability to infer the user's needs and intentions, and the ability to do so unambiguously is often a limiting factor for providing appropriate assistance confidently and accurately. The contributions of this paper are four-fold. First, we introduce the idea of intent disambiguation via control mode selection, and present a mathematical formalism for the same. Second, we develop a control mode selection algorithm which selects the control mode in which the user-initiated motion helps the autonomy to maximally disambiguate user intent. Third, we present a pilot study with eight subjects to evaluate the efficacy of the disambiguation algorithm. Our results suggest that the disambiguation system (a) helps to significantly reduce task effort, as measured by number of button presses, and (b) is of greater utility for more limited control interfaces and more complex tasks. We also observe that (c) subjects demonstrated a wide range of disambiguation request behaviors, with the common thread of concentrating requests early in the execution. As our last contribution, we introduce a novel field-theoretic approach to intent inference inspired by dynamic field theory that works in tandem with the disambiguation scheme.




3

Defending Hardware-based Malware Detectors against Adversarial Attacks. (arXiv:2005.03644v1 [cs.CR])

In the era of Internet of Things (IoT), Malware has been proliferating exponentially over the past decade. Traditional anti-virus software are ineffective against modern complex Malware. In order to address this challenge, researchers have proposed Hardware-assisted Malware Detection (HMD) using Hardware Performance Counters (HPCs). The HPCs are used to train a set of Machine learning (ML) classifiers, which in turn, are used to distinguish benign programs from Malware. Recently, adversarial attacks have been designed by introducing perturbations in the HPC traces using an adversarial sample predictor to misclassify a program for specific HPCs. These attacks are designed with the basic assumption that the attacker is aware of the HPCs being used to detect Malware. Since modern processors consist of hundreds of HPCs, restricting to only a few of them for Malware detection aids the attacker. In this paper, we propose a Moving target defense (MTD) for this adversarial attack by designing multiple ML classifiers trained on different sets of HPCs. The MTD randomly selects a classifier; thus, confusing the attacker about the HPCs or the number of classifiers applied. We have developed an analytical model which proves that the probability of an attacker to guess the perfect HPC-classifier combination for MTD is extremely low (in the range of $10^{-1864}$ for a system with 20 HPCs). Our experimental results prove that the proposed defense is able to improve the classification accuracy of HPC traces that have been modified through an adversarial sample generator by up to 31.5%, for a near perfect (99.4%) restoration of the original accuracy.




3

On Exposure Bias, Hallucination and Domain Shift in Neural Machine Translation. (arXiv:2005.03642v1 [cs.CL])

The standard training algorithm in neural machine translation (NMT) suffers from exposure bias, and alternative algorithms have been proposed to mitigate this. However, the practical impact of exposure bias is under debate. In this paper, we link exposure bias to another well-known problem in NMT, namely the tendency to generate hallucinations under domain shift. In experiments on three datasets with multiple test domains, we show that exposure bias is partially to blame for hallucinations, and that training with Minimum Risk Training, which avoids exposure bias, can mitigate this. Our analysis explains why exposure bias is more problematic under domain shift, and also links exposure bias to the beam search problem, i.e. performance deterioration with increasing beam size. Our results provide a new justification for methods that reduce exposure bias: even if they do not increase performance on in-domain test sets, they can increase model robustness to domain shift.




3

Where is Linked Data in Question Answering over Linked Data?. (arXiv:2005.03640v1 [cs.CL])

We argue that "Question Answering with Knowledge Base" and "Question Answering over Linked Data" are currently two instances of the same problem, despite one explicitly declares to deal with Linked Data. We point out the lack of existing methods to evaluate question answering on datasets which exploit external links to the rest of the cloud or share common schema. To this end, we propose the creation of new evaluation settings to leverage the advantages of the Semantic Web to achieve AI-complete question answering.




3

Mutli-task Learning with Alignment Loss for Far-field Small-Footprint Keyword Spotting. (arXiv:2005.03633v1 [eess.AS])

In this paper, we focus on the task of small-footprint keyword spotting under the far-field scenario. Far-field environments are commonly encountered in real-life speech applications, and it causes serve degradation of performance due to room reverberation and various kinds of noises. Our baseline system is built on the convolutional neural network trained with pooled data of both far-field and close-talking speech. To cope with the distortions, we adopt the multi-task learning scheme with alignment loss to reduce the mismatch between the embedding features learned from different domains of data. Experimental results show that our proposed method maintains the performance on close-talking speech and achieves significant improvement on the far-field test set.




3

The Zhou Ordinal of Labelled Markov Processes over Separable Spaces. (arXiv:2005.03630v1 [cs.LO])

There exist two notions of equivalence of behavior between states of a Labelled Markov Process (LMP): state bisimilarity and event bisimilarity. The first one can be considered as an appropriate generalization to continuous spaces of Larsen and Skou's probabilistic bisimilarity, while the second one is characterized by a natural logic. C. Zhou expressed state bisimilarity as the greatest fixed point of an operator $mathcal{O}$, and thus introduced an ordinal measure of the discrepancy between it and event bisimilarity. We call this ordinal the "Zhou ordinal" of $mathbb{S}$, $mathfrak{Z}(mathbb{S})$. When $mathfrak{Z}(mathbb{S})=0$, $mathbb{S}$ satisfies the Hennessy-Milner property. The second author proved the existence of an LMP $mathbb{S}$ with $mathfrak{Z}(mathbb{S}) geq 1$ and Zhou showed that there are LMPs having an infinite Zhou ordinal. In this paper we show that there are LMPs $mathbb{S}$ over separable metrizable spaces having arbitrary large countable $mathfrak{Z}(mathbb{S})$ and that it is consistent with the axioms of $mathit{ZFC}$ that there is such a process with an uncountable Zhou ordinal.




3

Universal Coding and Prediction on Martin-L"of Random Points. (arXiv:2005.03627v1 [math.PR])

We perform an effectivization of classical results concerning universal coding and prediction for stationary ergodic processes over an arbitrary finite alphabet. That is, we lift the well-known almost sure statements to statements about Martin-L"of random sequences. Most of this work is quite mechanical but, by the way, we complete a result of Ryabko from 2008 by showing that each universal probability measure in the sense of universal coding induces a universal predictor in the prequential sense. Surprisingly, the effectivization of this implication holds true provided the universal measure does not ascribe too low conditional probabilities to individual symbols. As an example, we show that the Prediction by Partial Matching (PPM) measure satisfies this requirement. In the almost sure setting, the requirement is superfluous.




3

Seismic Shot Gather Noise Localization Using a Multi-Scale Feature-Fusion-Based Neural Network. (arXiv:2005.03626v1 [cs.CV])

Deep learning-based models, such as convolutional neural networks, have advanced various segments of computer vision. However, this technology is rarely applied to seismic shot gather noise localization problem. This letter presents an investigation on the effectiveness of a multi-scale feature-fusion-based network for seismic shot-gather noise localization. Herein, we describe the following: (1) the construction of a real-world dataset of seismic noise localization based on 6,500 seismograms; (2) a multi-scale feature-fusion-based detector that uses the MobileNet combined with the Feature Pyramid Net as the backbone; and (3) the Single Shot multi-box detector for box classification/regression. Additionally, we propose the use of the Focal Loss function that improves the detector's prediction accuracy. The proposed detector achieves an AP@0.5 of 78.67\% in our empirical evaluation.