Subtle Sensing: Detecting Differences in the Flexibility of Virtually Simulated Molecular Objects. (arXiv:2005.03503v1 [cs.HC]) During VR demos we have performed over the last few years, many participants (in the absence of any haptic feedback) have commented on their perceived ability to 'feel' differences between simulated molecular objects. The mechanisms for such 'feeling' are not entirely clear: observing from outside VR, one can see that there is nothing physical for participants to 'feel'. Here we outline exploratory user studies designed to evaluate the extent to which participants can distinguish quantitative differences in the flexibility of VR-simulated molecular objects. The results suggest that an individual's capacity to detect differences in molecular flexibility is enhanced when they can interact with and manipulate the molecules, as opposed to merely observing the same interaction. Building on these results, we intend to carry out further studies investigating humans' ability to sense quantitative properties of VR simulations without haptic technology.
Heidelberg Colorectal Data Set for Surgical Data Science in the Sensor Operating Room. (arXiv:2005.03501v1 [cs.CV]) Image-based tracking of medical instruments is an integral part of many surgical data science applications. Previous research has addressed the tasks of detecting, segmenting and tracking medical instruments based on laparoscopic video data. However, the methods proposed still tend to fail when applied to challenging images and do not generalize well to data they have not been trained on. This paper introduces the Heidelberg Colorectal (HeiCo) data set - the first publicly available data set enabling comprehensive benchmarking of medical instrument detection and segmentation algorithms with a specific emphasis on robustness and generalization capabilities of the methods. Our data set comprises 30 laparoscopic videos and corresponding sensor data from medical devices in the operating room for three different types of laparoscopic surgery. Annotations include surgical phase labels for all frames in the videos as well as instance-wise segmentation masks for surgical instruments in more than 10,000 individual frames. The data has successfully been used to organize international competitions in the scope of the Endoscopic Vision Challenges (EndoVis) 2017 and 2019.
Computing with bricks and mortar: Classification of waveforms with doped concrete blocks. (arXiv:2005.03498v1 [cs.ET]) We present results showing the capability of a concrete-based information processing substrate in the signal classification task, in accordance with the in materio computing paradigm. As Reservoir Computing is a suitable model for describing computation embedded in materio, we propose that this type of basic construction unit can be used as a source for the "reservoir of states" necessary for simple tuning of the readout layer. In that perspective, buildings constructed from computing concrete could function as highly parallel information processors for smart architecture. We present an electrical characterization of a set of samples with different additive concentrations, followed by a dynamical analysis of selected specimens showing fingerprints of memfractive properties. Moreover, on the basis of the obtained parameters, classification of signal waveform shapes can be performed in scenarios explicitly tuned for a given device terminal.
Subquadratic-Time Algorithms for Normal Bases. (arXiv:2005.03497v1 [cs.SC]) For any finite Galois field extension $\mathsf{K}/\mathsf{F}$, with Galois group $G = \mathrm{Gal}(\mathsf{K}/\mathsf{F})$, there exists an element $\alpha \in \mathsf{K}$ whose orbit $G \cdot \alpha$ forms an $\mathsf{F}$-basis of $\mathsf{K}$. Such an $\alpha$ is called a normal element and $G \cdot \alpha$ is a normal basis. We introduce a probabilistic algorithm for testing whether a given $\alpha \in \mathsf{K}$ is normal, when $G$ is either a finite abelian or a metacyclic group. The algorithm is based on the fact that deciding whether $\alpha$ is normal can be reduced to deciding whether $\sum_{g \in G} g(\alpha)\,g \in \mathsf{K}[G]$ is invertible; it requires a slightly subquadratic number of operations. Once we know that $\alpha$ is normal, we show how to perform conversions between the working basis of $\mathsf{K}/\mathsf{F}$ and the normal basis with the same asymptotic cost.
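As a concrete special case of the invertibility criterion above: for the extension GF(2^n)/GF(2), the Galois group is generated by the Frobenius map, so $\alpha$ is normal exactly when the conjugates $\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{n-1}}$ are linearly independent over GF(2). The Python sketch below checks this naively in roughly O(n^3) bit operations; it only illustrates the definition and is not the paper's subquadratic algorithm. Field elements are bitmasks, and the irreducible polynomial f is an assumption chosen for the example.

def gf_mul(a, b, f, n):
    """Multiply in GF(2^n) = GF(2)[x]/(f); elements are bitmasks, f includes the x^n term."""
    r = 0
    for i in range(n):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(2 * n - 2, n - 1, -1):   # reduce modulo f
        if (r >> i) & 1:
            r ^= f << (i - n)
    return r

def is_normal(alpha, f, n):
    """True iff the Frobenius conjugates of alpha span GF(2^n) over GF(2)."""
    rows, c = [], alpha
    for _ in range(n):
        rows.append(c)
        c = gf_mul(c, c, f, n)              # Frobenius: squaring
    rank = 0                                # Gaussian elimination over GF(2)
    for bit in range(n - 1, -1, -1):
        pivot = next((r for r in rows if (r >> bit) & 1), None)
        if pivot is None:
            continue
        rows.remove(pivot)
        rows = [r ^ pivot if (r >> bit) & 1 else r for r in rows]
        rank += 1
    return rank == n

# GF(2^4) with f = x^4 + x + 1 (0b10011): x^3 is normal, 1 is not
print(is_normal(0b1000, 0b10011, 4), is_normal(0b0001, 0b10011, 4))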
Text Recognition in the Wild: A Survey. (arXiv:2005.03492v1 [cs.CV]) The history of text can be traced back over thousands of years. The rich and precise semantic information carried by text is important in a wide range of vision-based application scenarios. Therefore, text recognition in natural scenes has been an active research field in computer vision and pattern recognition. In recent years, with the rise and development of deep learning, numerous methods have shown promise in terms of innovation, practicality, and efficiency. This paper aims to (1) summarize the fundamental problems and the state-of-the-art associated with scene text recognition; (2) introduce new insights and ideas; (3) provide a comprehensive review of publicly available resources; (4) point out directions for future work. In summary, this literature review attempts to present the entire picture of the field of scene text recognition. It provides a comprehensive reference for people entering this field, and could help inspire future research. Related resources are available at our Github repository: https://github.com/HCIILAB/Scene-Text-Recognition.
Algorithmic Averaging for Studying Periodic Orbits of Planar Differential Systems. (arXiv:2005.03487v1 [cs.SC]) One of the main open problems in the qualitative theory of real planar differential systems is the study of limit cycles. In this article, we present an algorithmic approach for detecting how many limit cycles can bifurcate from the periodic orbits of a given polynomial differential center when it is perturbed inside a class of polynomial differential systems via the averaging method. We propose four symbolic algorithms to implement the averaging method. The first algorithm is based on the change to polar coordinates that allows one to transform a considered differential system to the normal form of averaging. The second algorithm is used to derive the solutions of certain differential systems associated with the unperturbed term of the normal form of averaging. The third algorithm exploits the partial Bell polynomials and allows one to compute the integral formula of the averaged functions at any order. The last algorithm is based on the aforementioned algorithms and determines the exact expressions of the averaged functions for the considered differential systems. The implementation of our algorithms is discussed and evaluated using several examples. The experimental results have extended the existing relevant results for certain classes of differential systems.
Anonymized GCN: A Novel Robust Graph Embedding Method via Hiding Node Position in Noise. (arXiv:2005.03482v1 [cs.LG]) Graph convolutional networks (GCNs) have achieved state-of-the-art performance in node prediction tasks on graph-structured data. However, as graph attack methods have diversified, research on the robustness of GCNs has lagged behind. In this paper, we design a robust GCN method for node prediction tasks. A graph structure contains two types of information, node information and connection information, and attackers usually modify the connection information to interfere with node prediction results. We therefore propose a method that hides the connection information in a generator, named Anonymized GCN (AN-GCN). By hiding the connection information of the graph structure in the generator through adversarial training, accurate node prediction can be completed using only the node number rather than its specific position in the graph. Specifically, we first identify the key to determining the embedding of a specific node: the row of the eigenvector matrix of the graph Laplacian corresponding to that node. By targeting it as the output of the generator, we design a method to hide the node number in noise. Taking the corresponding noise as input, we obtain the connection structure of the node instead of observing it directly. The encoder and decoder are then combined in the discriminator, so that after adversarial training the generator and discriminator cooperate to encode and decode the graph and complete node prediction. Finally, all node positions can be generated from noise at the same time; that is, the generator hides all the connection information of the graph structure. The evaluation shows that we only need the initial features and node numbers of the nodes to complete node prediction, and accuracy did not decrease but increased by 0.0293.
Brain-like approaches to unsupervised learning of hidden representations -- a comparative study. (arXiv:2005.03476v1 [cs.NE]) Unsupervised learning of hidden representations has been one of the most vibrant research directions in machine learning in recent years. In this work we study the brain-like Bayesian Confidence Propagating Neural Network (BCPNN) model, recently extended to extract sparse distributed high-dimensional representations. The saliency and separability of the hidden representations when trained on the MNIST dataset are studied using an external classifier, and compared with those of other unsupervised learning methods, including restricted Boltzmann machines and autoencoders.
Bundle Recommendation with Graph Convolutional Networks. (arXiv:2005.03475v1 [cs.IR]) Bundle recommendation aims to recommend a bundle of items for a user to consume as a whole. Existing solutions integrate user-item interaction modeling into bundle recommendation by sharing model parameters or learning in a multi-task manner, which cannot explicitly model the affiliation between items and bundles, and fail to explore the decision-making involved when a user chooses bundles. In this work, we propose a graph neural network model named BGCN (short for Bundle Graph Convolutional Network) for bundle recommendation. BGCN unifies user-item interaction, user-bundle interaction and bundle-item affiliation into a heterogeneous graph. With item nodes as the bridge, graph convolutional propagation between user and bundle nodes makes the learned representations capture item-level semantics. Through training with a hard-negative sampler, the user's fine-grained preferences for similar bundles are further distinguished. Empirical results on two real-world datasets demonstrate the strong performance gains of BGCN, which outperforms the state-of-the-art baselines by 10.77% to 23.18%.
Ensuring Fairness under Prior Probability Shifts. (arXiv:2005.03474v1 [cs.LG]) In this paper, we study the problem of fair classification in the presence of prior probability shifts, where the training set distribution differs from the test set. This phenomenon can be observed in the yearly records of several real-world datasets, such as recidivism records and medical expenditure surveys. If unaccounted for, such shifts can cause the predictions of a classifier to become unfair towards specific population subgroups. While the fairness notion called Proportional Equality (PE) accounts for such shifts, a procedure to ensure PE-fairness was unknown. In this work, we propose a method, called CAPE, which provides a comprehensive solution to the aforementioned problem. CAPE makes novel use of prevalence estimation techniques, sampling and an ensemble of classifiers to ensure fair predictions under prior probability shifts. We introduce a metric, called prevalence difference (PD), which CAPE attempts to minimize in order to ensure PE-fairness. We theoretically establish that this metric exhibits several desirable properties. We evaluate the efficacy of CAPE via a thorough empirical evaluation on synthetic datasets. We also compare the performance of CAPE with several popular fair classifiers on real-world datasets like COMPAS (criminal risk assessment) and MEPS (medical expenditure panel survey). The results indicate that CAPE ensures PE-fair predictions, while performing well on other performance metrics.
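The abstract does not spell out how the prevalence difference is computed; under one plausible reading it is the per-group gap between the prevalence a classifier predicts and the true prevalence on the test data. A hedged Python sketch of that reading follows; the function name and the exact definition are illustrative, not taken from the paper.

def prevalence_difference(y_true, y_pred, groups):
    """Hypothetical PD metric: per-group |predicted prevalence - true prevalence|."""
    out = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        true_prev = sum(y_true[i] for i in idx) / len(idx)
        pred_prev = sum(y_pred[i] for i in idx) / len(idx)
        out[g] = abs(pred_prev - true_prev)
    return out

print(prevalence_difference([1, 0, 1, 0], [1, 1, 0, 0], ["a", "a", "b", "b"]))  # {'a': 0.5, 'b': 0.5}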
Indexing Metric Spaces for Exact Similarity Search. (arXiv:2005.03468v1 [cs.DB]) With the continued digitalization of societal processes, we are seeing an explosion in available data. This is referred to as big data. In a research setting, three aspects of the data are often viewed as the main sources of challenges when attempting to enable value creation from big data: volume, velocity and variety. Many studies address volume or velocity, while many fewer address variety. Metric space is ideal for addressing variety because it can accommodate any type of data as long as its associated distance notion satisfies the triangle inequality. To accelerate search in metric space, a collection of indexing techniques for metric data has been proposed. However, existing surveys each offer only narrow coverage, and no comprehensive empirical study of those techniques exists. We offer a survey of all the existing metric indexes that can support exact similarity search, by i) summarizing all the existing partitioning, pruning and validation techniques used for metric indexes, ii) providing time and storage complexity analyses of index construction, and iii) reporting on a comprehensive empirical comparison of their similarity query processing performance. We rely on empirical comparison to evaluate index performance during search because differences are hard to discern from complexity analysis alone, and because query performance depends on pruning and validation abilities that are related to the data distribution. This article aims at revealing the strengths and weaknesses of different indexing techniques in order to offer guidance on selecting an appropriate indexing technique for a given setting, and to direct future research on metric indexes.
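All of the pruning techniques the survey covers ultimately rest on the triangle inequality: for any pivot p, |d(q,p) - d(p,o)| <= d(q,o), so distances to pivots precomputed at indexing time can discard objects without evaluating d(q,o). A minimal Python sketch of pivot-based filtering and validation for a range query, assuming only that dist is a metric:

import math

def range_search(query, radius, table, pivots, dist):
    """table: list of (object, {pivot: d(pivot, object)}) built at indexing time."""
    dq = {p: dist(query, p) for p in pivots}
    results = []
    for o, dpo in table:
        if any(abs(dq[p] - dpo[p]) > radius for p in pivots):
            continue                       # pruned: d(query, o) must exceed radius
        if dist(query, o) <= radius:       # validation with one exact distance
            results.append(o)
    return results

pts = [(0, 0), (3, 4), (6, 8), (1, 1)]
piv = [(0, 0), (5, 5)]
d = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
table = [(o, {p: d(p, o) for p in piv}) for o in pts]
print(range_search((1, 0), 2.0, table, piv, d))    # [(0, 0), (1, 1)]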
Predictions and algorithmic statistics for infinite sequence. (arXiv:2005.03467v1 [cs.IT]) Consider the following prediction problem. Assume that there is a black box that produces bits according to some unknown computable distribution on the binary tree. We know the first $n$ bits $x_1 x_2 \ldots x_n$. We want to know the probability that the next bit is equal to $1$. Solomonoff suggested using the universal semimeasure $m$ for solving this task. He proved that for every computable distribution $P$ and for every $b \in \{0,1\}$ the following holds: $$\sum_{n=1}^{\infty}\sum_{x:\, l(x)=n} P(x)\,(P(b \mid x) - m(b \mid x))^2 < \infty.$$ However, Solomonoff's method has a negative aspect: Hutter and Muchnik proved that there exist a universal semimeasure $m$, a computable distribution $P$ and a sequence $x_1 x_2 \ldots$ that is random in the sense of Martin-Löf such that $P(x_{n+1} \mid x_1 \ldots x_n) - m(x_{n+1} \mid x_1 \ldots x_n) \nrightarrow 0$ as $n \to \infty$. We suggest a new way of prediction: for every finite string $x$ we predict the new bit according to the best (in some sense) distribution for $x$. We prove a result similar to Solomonoff's theorem for our way of prediction, and we show that our method of prediction does not have the negative aspect of Solomonoff's method.
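One step in this comparison is worth making explicit: Solomonoff's bound is a statement about convergence on average. The inner sum is the $P$-expectation of the squared prediction error at length $n$, and a convergent series has terms tending to zero, so $$\mathbf{E}_{x \sim P,\, l(x)=n}\left[(P(b \mid x) - m(b \mid x))^2\right] \to 0 \quad \text{as } n \to \infty,$$ i.e., the error vanishes in probability over $P$. The Hutter-Muchnik result shows that this does not force convergence along an individual Martin-Löf random sequence, which is exactly the gap the new prediction method is designed to close.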
High Performance Interference Suppression in Multi-User Massive MIMO Detector. (arXiv:2005.03466v1 [cs.OH]) In this paper, we propose a new nonlinear detector with improved interference suppression in Multi-User Multiple-Input, Multiple-Output (MU-MIMO) systems. The proposed detector combines the following parts: QR decomposition (QRD), low-complexity user sorting before QRD, a sorting-reduced (SR) K-best method, and minimum mean square error (MMSE) pre-processing. Our method significantly outperforms a linear interference rejection combining (IRC, i.e. MMSE naturally) method in both strong-interference and additive-white-noise scenarios, with both ideal and real channel estimation. This result has wide application importance for scenarios with strong interference, i.e. when co-located users utilize the internet in a stadium, on a highway, in a shopping center, etc. Simulation results are presented for the non-line-of-sight 3D-UMa model of the 5G QuaDRiGa 2.0 channel, for 16 highly correlated single-antenna users with QAM16 modulation in a 64-antenna Massive MIMO system. The performance is compared with MMSE and other detection approaches.
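For reference, the linear MMSE (IRC) baseline that the nonlinear detector is compared against has a simple closed form, $\hat{x} = (H^H H + \sigma^2 I)^{-1} H^H y$. A minimal numpy sketch of that baseline alone (the QRD, sorting and SR K-best stages of the paper are not reproduced here; dimensions mirror the 64-antenna, 16-user setup, and QPSK stands in for QAM16):

import numpy as np

def mmse_detect(H, y, noise_var):
    """Linear MMSE detection for y = H x + n."""
    G = H.conj().T @ H + noise_var * np.eye(H.shape[1])
    return np.linalg.solve(G, H.conj().T @ y)

rng = np.random.default_rng(0)
H = (rng.standard_normal((64, 16)) + 1j * rng.standard_normal((64, 16))) / np.sqrt(2)
x = rng.choice([-1, 1], 16) + 1j * rng.choice([-1, 1], 16)
y = H @ x + 0.1 * (rng.standard_normal(64) + 1j * rng.standard_normal(64))
print(np.round(mmse_detect(H, y, 0.02).real[:4]))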
How Can CNNs Use Image Position for Segmentation?. (arXiv:2005.03463v1 [eess.IV]) Convolution is an equivariant operation, and image position does not affect its result. A recent study shows that the zero-padding employed in convolutional layers of CNNs provides position information to the CNNs. The study further claims that the position information enables accurate inference for several tasks, such as object recognition, segmentation, etc. However, there is a technical issue with the design of the experiments of the study, and thus the correctness of the claim is yet to be verified. Moreover, the absolute image position may not be essential for the segmentation of natural images, in which target objects will appear at any image position. In this study, we investigate how positional information is and can be utilized for segmentation tasks. Toward this end, we consider positional encoding (PE) that adds channels embedding image position to the input images and compare PE with several padding methods. Considering the above nature of natural images, we choose medical image segmentation tasks, in which the absolute position appears to be relatively important, as the same organs (of different patients) are captured in similar sizes and positions. We draw a mixed conclusion from the experimental results; the positional encoding certainly works in some cases, but the absolute image position may not be so important for segmentation tasks as we think.
ExpDNN: Explainable Deep Neural Network. (arXiv:2005.03461v1 [cs.LG]) In recent years, deep neural networks have been applied to obtain high performance in prediction, classification, and pattern recognition. However, the weights in these deep neural networks are difficult to explain. Although a linear regression method can provide explainable results, it is not suitable in the case of input interaction. Therefore, an explainable deep neural network (ExpDNN) with explainable layers is proposed to obtain explainable results in the case of input interaction. Three cases were used to evaluate the proposed ExpDNN, and the results showed that the absolute value of a weight in an explainable layer can be used to explain the weight of the corresponding input for feature extraction.
AIBench: Scenario-distilling AI Benchmarking. (arXiv:2005.03459v1 [cs.PF]) Real-world application scenarios like modern Internet services consist of a diversity of AI and non-AI modules with very long and complex execution paths. Using component or micro AI benchmarks alone can lead to error-prone conclusions. This paper proposes a scenario-distilling AI benchmarking methodology. Instead of using real-world applications, we propose permutations of essential AI and non-AI tasks as a scenario-distilling benchmark. We consider scenario-distilling benchmarks, component benchmarks, and micro benchmarks as three indispensable parts of a benchmark suite. Together with seventeen industry partners, we identify nine important real-world application scenarios. We design and implement a highly extensible, configurable, and flexible benchmark framework. On the basis of the framework, we propose a guideline for building scenario-distilling benchmarks, and present two Internet service AI benchmarks. The preliminary evaluation shows the advantage of scenario-distilling AI benchmarking over using component or micro AI benchmarks alone. The specifications, source code, testbed, and results are publicly available from the web site: this http URL.
NTIRE 2020 Challenge on NonHomogeneous Dehazing. (arXiv:2005.03457v1 [cs.CV]) This paper reviews the NTIRE 2020 Challenge on NonHomogeneous Dehazing of images (restoration of rich details in hazy images). We focus on the proposed solutions and their results evaluated on NH-Haze, a novel dataset consisting of 55 pairs of real haze-free and nonhomogeneous hazy images recorded outdoors. NH-Haze is the first realistic nonhomogeneous haze dataset that provides ground-truth images. The nonhomogeneous haze has been produced using a professional haze generator that imitates the real conditions of haze scenes. 168 participants registered in the challenge and 27 teams competed in the final testing phase. The proposed solutions gauge the state-of-the-art in image dehazing.
Successfully Applying the Stabilized Lottery Ticket Hypothesis to the Transformer Architecture. (arXiv:2005.03454v1 [cs.LG]) Sparse models require less memory for storage and enable faster inference by reducing the necessary number of FLOPs. This is relevant both for time-critical and on-device computations using neural networks. The stabilized lottery ticket hypothesis states that networks can be pruned after none or few training iterations, using a mask computed based on the unpruned converged model. On the Transformer architecture and the WMT 2014 English-to-German and English-to-French tasks, we show that stabilized lottery ticket pruning performs similarly to magnitude pruning for sparsity levels of up to 85%, and propose a new combination of pruning techniques that outperforms all other techniques for even higher levels of sparsity. Furthermore, we confirm that a parameter's initial sign, and not its specific value, is the primary factor for successful training, and show that magnitude pruning cannot be used to find winning lottery tickets.
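For readers unfamiliar with the baseline: magnitude pruning keeps the largest-magnitude parameters and zeroes the rest. A minimal numpy sketch of computing a global 85% sparsity mask (illustrative only; the stabilized lottery ticket instead derives the mask from the converged model and applies it to the initial parameters):

import numpy as np

def magnitude_mask(weights, sparsity):
    """Global magnitude pruning: keep the (1 - sparsity) largest-|w| entries."""
    flat = np.abs(np.concatenate([w.ravel() for w in weights]))
    k = int(sparsity * flat.size)
    threshold = np.partition(flat, k)[k]        # k-th smallest magnitude
    return [(np.abs(w) >= threshold).astype(w.dtype) for w in weights]

layers = [np.random.randn(512, 512), np.random.randn(512, 2048)]
masks = magnitude_mask(layers, sparsity=0.85)
print(sum(m.sum() for m in masks) / sum(m.size for m in masks))  # ~0.15 kept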
A combination of 'pooling' with a prediction model can reduce by 73% the number of COVID-19 (Corona-virus) tests. (arXiv:2005.03453v1 [cs.LG]) We show that by combining a prediction model (based on neural networks) with a new method of test pooling (better than the original Dorfman method, and better than double-pooling), called 'Grid', we can reduce the number of COVID-19 tests by 73%.
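For context on the baseline being improved: with prevalence p and pool size k, classical Dorfman pooling spends one test per pool plus k individual retests whenever the pool is positive, for an expected 1/k + 1 - (1-p)^k tests per person. A short Python sketch that evaluates this and picks the best k; the 1% prevalence is an assumed value for illustration, and the paper's 'Grid' pooling plus the neural-network predictor improves further on this baseline.

def dorfman_tests_per_person(p, k):
    """Expected tests per person: one pooled test per k people, plus k retests if positive."""
    return 1.0 / k + 1.0 - (1.0 - p) ** k

p = 0.01
best_k = min(range(2, 101), key=lambda k: dorfman_tests_per_person(p, k))
print(best_k, round(dorfman_tests_per_person(p, best_k), 3))   # 11, ~0.196 tests/person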
Lifted Regression/Reconstruction Networks. (arXiv:2005.03452v1 [cs.LG]) In this work we propose lifted regression/reconstruction networks (LRRNs), which combine lifted neural networks with a guaranteed Lipschitz continuity property for the output layer. Lifted neural networks explicitly optimize an energy model to infer the unit activations and therefore, in contrast to standard feed-forward neural networks, allow bidirectional feedback between layers. So far lifted neural networks have been modelled around standard feed-forward architectures. We propose to take further advantage of the feedback property by letting the layers simultaneously perform regression and reconstruction. The resulting lifted network architecture allows one to control the desired amount of Lipschitz continuity, which is an important feature for obtaining adversarially robust regression and classification methods. We analyse and numerically demonstrate applications for unsupervised and supervised learning.
An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration. (arXiv:2005.03451v1 [cs.LG]) We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect of environmental temperature on the reliability-power trade-off of such accelerators. We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This approach allows us to study the effects of our undervolting technique for both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by the FPGA vendor to ensure correct functionality in worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%.
Fine-Grained Analysis of Cross-Linguistic Syntactic Divergences. (arXiv:2005.03436v1 [cs.CL]) The patterns in which the syntax of different languages converges and diverges are often used to inform work on cross-lingual transfer. Nevertheless, little empirical work has been done on quantifying the prevalence of different syntactic divergences across language pairs. We propose a framework for extracting divergence patterns for any language pair from a parallel corpus, building on Universal Dependencies. We show that our framework provides a detailed picture of cross-language divergences, generalizes previous approaches, and lends itself to full automation. We further present a novel dataset, a manually word-aligned subset of the Parallel UD corpus in five languages, and use it to perform a detailed corpus study. We demonstrate the usefulness of the resulting analysis by showing that it can help account for performance patterns of a cross-lingual parser.
Parametrized Universality Problems for One-Counter Nets. (arXiv:2005.03435v1 [cs.FL]) We study the language universality problem for One-Counter Nets, also known as 1-dimensional Vector Addition Systems with States (1-VASS), parameterized either with an initial counter value, or with an upper bound on the allowed counter value during runs. The language accepted by an OCN (defined by reaching a final control state) is monotone in both parameters. This yields two natural questions: 1) Does there exist an initial counter value that makes the language universal? 2) Does there exist a sufficiently high ceiling so that the bounded language is universal? Despite the fact that unparameterized universality is Ackermann-complete and that these problems seem to reduce to checking basic structural properties of the underlying automaton, we show that in fact both problems are undecidable. We also look into the complexities of the problems for several decidable subclasses, namely for unambiguous and deterministic systems, and for those over a single-letter alphabet.
Dirichlet spectral-Galerkin approximation method for the simply supported vibrating plate eigenvalues. (arXiv:2005.03433v1 [math.NA]) In this paper, we analyze and implement the Dirichlet spectral-Galerkin method for approximating simply supported vibrating plate eigenvalues with variable coefficients. This is a Galerkin approximation that uses an approximation space spanned by finitely many Dirichlet eigenfunctions for the Laplacian. Convergence and error analyses for this method are presented for two and three dimensions. Here we assume that the domain has either a smooth or Lipschitz boundary with no reentrant corners. An important component of the error analysis is Weyl's law for the Dirichlet eigenvalues. Numerical examples for computing the simply supported vibrating plate eigenvalues for the unit disk and square are presented. In order to test the accuracy of the approximation, we compare the spectral-Galerkin method to separation of variables for the unit disk, while for the unit square we numerically test the convergence rate for a variable-coefficient problem.
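A hedged sketch of the discretization being analyzed, written for the constant-coefficient model problem in notation one would expect rather than copied from the paper: the simply supported (Navier) plate eigenproblem and its Galerkin formulation over Dirichlet Laplacian eigenfunctions read $$\Delta^2 u = \lambda u \ \text{in } \Omega, \quad u = \Delta u = 0 \ \text{on } \partial\Omega; \qquad u_N = \sum_{j=1}^{N} c_j \varphi_j, \quad \sum_{j=1}^{N} c_j \int_\Omega \Delta\varphi_j \, \Delta\varphi_i \, dx = \lambda_N \sum_{j=1}^{N} c_j \int_\Omega \varphi_j \varphi_i \, dx,$$ where $-\Delta \varphi_j = \mu_j \varphi_j$ and $\varphi_j = 0$ on $\partial\Omega$. In that model case the stiffness matrix is diagonal in this basis and $\lambda_N = \mu_j^2$, which is why Dirichlet eigenfunctions are a natural approximation space; with variable coefficients the bilinear form changes, the Galerkin matrices become dense, and Weyl's law controls the growth of the $\mu_j$ entering the error analysis.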
The Perceptimatic English Benchmark for Speech Perception Models. (arXiv:2005.03418v1 [cs.CL]) We present the Perceptimatic English Benchmark, an open experimental benchmark for evaluating quantitative models of speech perception in English. The benchmark consists of ABX stimuli along with the responses of 91 American English-speaking listeners. The stimuli test discrimination of a large number of English and French phonemic contrasts. They are extracted directly from corpora of read speech, making them appropriate for evaluating statistical acoustic models (such as those used in automatic speech recognition) trained on typical speech data sets. We show that phone discrimination is correlated with several types of models, and give recommendations for researchers seeking easily calculated norms of acoustic distance on experimental stimuli. We show that DeepSpeech, a standard English speech recognizer, is more specialized on English phoneme discrimination than English listeners, and is poorly correlated with their behaviour, even though it yields a low error on the decision task given to humans.
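The ABX measure underlying the benchmark is easy to state: given a model's distance d over stimulus representations, a triple (A, B, X), where X belongs to the same phonemic category as A, is scored correct when d(A, X) < d(B, X), with 0.5 as chance level. A minimal Python sketch; the representations and distance are placeholders for whatever acoustic model is being evaluated.

def abx_score(triples, dist):
    """Fraction of (A, B, X) triples with dist(A, X) < dist(B, X)."""
    correct = sum(1 for a, b, x in triples if dist(a, x) < dist(b, x))
    return correct / len(triples)

triples = [(0.1, 0.9, 0.2), (0.0, 1.0, 0.4), (0.5, 0.6, 0.58)]
print(abx_score(triples, lambda u, v: abs(u - v)))   # 2/3 on this toy data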
Kunster -- AR Art Video Maker -- Real time video neural style transfer on mobile devices. (arXiv:2005.03415v1 [cs.CV]) Neural style transfer is a well-known branch of deep learning research, with many interesting works and two major drawbacks. Most of the works in the field are hard to use by non-expert users, and substantial hardware resources are required. In this work, we present a solution to both of these problems. We have applied neural style transfer to real-time video (over 25 frames per second), which is capable of running on mobile devices. We also investigate the works on achieving temporal coherence and present the idea of fine-tuning already trained models to achieve stable video. What is more, we analyze the impact of the common deep neural network architecture on the performance of mobile devices with regard to the number of layers and filters present. In the experiment section we present the results of our work with respect to iOS devices and discuss the problems present in current Android devices as well as future possibilities. At the end we present the qualitative results of stylization and quantitative results of performance tested on the iPhone 11 Pro and iPhone 6s. The presented work is incorporated in Kunster - AR Art Video Maker, an application available in Apple's App Store.
NTIRE 2020 Challenge on Spectral Reconstruction from an RGB Image. (arXiv:2005.03412v1 [eess.IV]) This paper reviews the second challenge on spectral reconstruction from RGB images, i.e., the recovery of whole-scene hyperspectral (HS) information from a 3-channel RGB image. As in the previous challenge, two tracks were provided: (i) a "Clean" track where HS images are estimated from noise-free RGBs; the RGB images are themselves calculated numerically using the ground-truth HS images and supplied spectral sensitivity functions; and (ii) a "Real World" track, simulating capture by an uncalibrated and unknown camera, where the HS images are recovered from noisy JPEG-compressed RGB images. A new, larger-than-ever natural hyperspectral image data set is presented, containing a total of 510 HS images. The Clean and Real World tracks had 103 and 78 registered participants respectively, with 14 teams competing in the final testing phase. A description of the proposed methods, alongside their challenge scores and an extensive evaluation of top performing methods, is also provided. They gauge the state-of-the-art in spectral reconstruction from an RGB image.
Detection and Feeder Identification of the High Impedance Fault at Distribution Networks Based on Synchronous Waveform Distortions. (arXiv:2005.03411v1 [eess.SY]) Diagnosis of high impedance faults (HIFs) is a challenge for today's distribution network protections. The fault current of a HIF is much lower than that of a normal load, and the fault features are significantly affected by fault scenarios. A detection and feeder identification algorithm for HIFs is proposed in this paper, based on high-resolution and synchronous waveform data. In the algorithm, an interval slope is defined to describe the waveform distortions, which guarantees a uniform feature description under various HIF nonlinearities and noise interferences. For three typical types of network neutrals, i.e., isolated neutral, resonant neutral, and low-resistor-earthed neutral, the differences of the distorted components between the zero-sequence currents of healthy and faulty feeders are mathematically deduced, respectively. As a result, the proposed criterion, which is based on the distortion relationships between the zero-sequence currents of feeders and the zero-sequence voltage at the substation, is theoretically supported. 28 HIFs grounded to various materials are tested in a 10 kV distribution network with three neutral types, and are utilized to verify the effectiveness of the proposed algorithm.
AutoSOS: Towards Multi-UAV Systems Supporting Maritime Search and Rescue with Lightweight AI and Edge Computing. (arXiv:2005.03409v1 [cs.RO]) Rescue vessels are the main actors in maritime safety and rescue operations. At the same time, aerial drones bring a significant advantage into this scenario. This paper presents the research directions of the AutoSOS project, in which we are developing an autonomous multi-robot search and rescue assistance platform capable of sensor fusion and object detection in embedded devices using novel lightweight AI models. The platform is meant to perform reconnaissance missions for initial assessment of the environment using novel adaptive deep learning algorithms that efficiently use the available sensors and computational resources on drones and the rescue vessel. When drones find potential objects, they will send their sensor data to the vessel to verify the findings with increased accuracy. The actual rescue and treatment operations are left as the responsibility of the rescue personnel. The drones will autonomously reconfigure their spatial distribution to enable multi-hop communication when a direct connection between a drone transmitting information and the vessel is unavailable.
Joint Prediction and Time Estimation of COVID-19 Developing Severe Symptoms using Chest CT Scan. (arXiv:2005.03405v1 [eess.IV]) With the rapid worldwide spread of Coronavirus disease (COVID-19), it is of great importance to conduct early diagnosis of COVID-19 and predict the time at which patients might convert to the severe stage, in order to design effective treatment plans and reduce clinicians' workloads. In this study, we propose a joint classification and regression method to determine whether a patient will develop severe symptoms at a later time, and if so, predict the time the patient will take to convert to the severe stage. To do this, the proposed method takes into account 1) a weight for each sample, to reduce the influence of outliers and address the problem of imbalanced classification, and 2) a weight for each feature, via a sparsity regularization term, to remove redundant features of the high-dimensional data and learn information shared across the classification and regression tasks. To our knowledge, this study is the first work to predict both the disease progression and the conversion time, which could help clinicians deal with potential severe cases in time and even save patients' lives. Experimental analysis was conducted on a real data set from two hospitals with 422 chest computed tomography (CT) scans, in which 52 cases converted to severe after 5.64 days on average and 34 cases were severe at admission. Results show that our method achieves the best classification (e.g., 85.91% accuracy) and regression (e.g., a correlation coefficient of 0.462) performance, compared to all comparison methods. Moreover, our method yields 76.97% accuracy for predicting the severe cases, a correlation coefficient of 0.524, and a difference of 0.55 days for the conversion time.
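The two modelling ingredients named above, per-sample weights against outliers and a sparsity regularizer shared by the two tasks, fit a standard joint objective. A hedged sketch of its plausible shape, with illustrative symbols rather than the paper's exact formulation: $$\min_{W} \sum_{i=1}^{n} w_i \left[ \ell_{\mathrm{cls}}(y_i, f(x_i; W)) + \ell_{\mathrm{reg}}(t_i, g(x_i; W)) \right] + \lambda \|W\|_{2,1}, \qquad \|W\|_{2,1} = \sum_{j} \Big(\sum_{k} W_{jk}^2\Big)^{1/2},$$ where the sample weight $w_i$ down-weights outliers and helps with class imbalance, and the row sparsity induced by the $\ell_{2,1}$ norm discards redundant features jointly for classification and regression.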
A LiDAR-based real-time capable 3D Perception System for Automated Driving in Urban Domains. (arXiv:2005.03404v1 [cs.RO]) We present a LiDAR-based and real-time capable 3D perception system for automated driving in urban domains. The hierarchical system design is able to model stationary and movable parts of the environment simultaneously and under real-time conditions. Our approach extends the state of the art by innovative in-detail enhancements for perceiving road users and drivable corridors even in case of non-flat ground surfaces and overhanging or protruding elements. We describe a runtime-efficient pointcloud processing pipeline, consisting of adaptive ground surface estimation, 3D clustering and motion classification stages. Based on the pipeline's output, the stationary environment is represented in a multi-feature mapping and fusion approach. Movable elements are represented in an object tracking system capable of using multiple reference points to account for viewpoint changes. We further enhance the tracking system by explicit consideration of occlusion and ambiguity cases. Our system is evaluated using a subset of the TUBS Road User Dataset. We enhance common performance metrics by considering application-driven aspects of real-world traffic scenarios. The perception system shows impressive results and is able to cope with the addressed scenarios while still preserving real-time capability.
Datom: A Deformable modular robot for building self-reconfigurable programmable matter. (arXiv:2005.03402v1 [cs.RO]) Moving a module in a modular robot is a very complex and error-prone process. Unlike in a swarm, in the modular robots we are targeting, the moving module must maintain a connection to at least one other module. In order to miniaturize each module to a few millimeters, we have proposed a design using electrostatic actuators. However, this movement is composed of several attachment and detachment steps, and each small step can fail, causing a module to break the connection. The idea developed in this paper consists of creating a new kind of deformable module allowing a movement which keeps the connection between the moving and the fixed modules. We detail the geometry and the practical constraints considered during the conception of this new module. We then validate the possibility of movement for a module in an existing configuration. This implies the cooperation of some of the modules placed along the path, and we show in simulation that there exists a motion process to reach every free position on the surface for a given configuration.
Simultaneous topology and fastener layout optimization of assemblies considering joint failure. (arXiv:2005.03398v1 [cs.CE]) This paper provides a method for the simultaneous topology optimization of parts and their corresponding joint locations in an assembly. Therein, the joint locations are not discrete and predefined, but continuously movable. The underlying coupling equations allow for connecting dissimilar meshes and avoid the need for remeshing when joint locations change. The presented method models the force transfer at a joint location not only by using single spring elements but accounts for the size and type of the joints. When considering riveted or bolted joints, the local part geometry at the joint location consists of holes that are surrounded by material. For spot welds, the joint locations are filled with material and may be smaller than for bolts. The presented method incorporates these material and clearance zones into the simultaneously running topology optimization of the parts. Furthermore, failure of joints may be taken into account at the optimization stage, yielding assemblies connected in a fail-safe manner.
Scheduling with a processing time oracle. (arXiv:2005.03394v1 [cs.DS]) In this paper we study a single-machine scheduling problem on a set of independent jobs whose execution times are not known, but guaranteed to be either short or long, for two given processing times. At every time step, the scheduler can either test a job by querying a processing time oracle, which reveals its processing time and occupies one time unit on the schedule, or execute a job, whether previously tested or not. The objective value is the total completion time over all jobs, and is compared with the objective value of an optimal schedule, which does not need to test. The resulting competitive ratio measures the price of hidden processing times. Two models are studied in this paper. In the non-adaptive model, the algorithm needs to decide beforehand which jobs to test and which jobs to execute untested. In the adaptive model, the algorithm can make these decisions adaptively, depending on the outcomes of the job tests. In both models we provide optimal polynomial-time two-phase algorithms, which consist of a first phase where jobs are tested and a second phase where jobs are executed untested. Experiments give strong evidence that optimal algorithms have this structure. Proving this property is left as an open problem.
Does Multi-Encoder Help? A Case Study on Context-Aware Neural Machine Translation. (arXiv:2005.03393v1 [cs.CL]) In encoder-decoder neural models, multiple encoders are in general used to represent contextual information in addition to the individual sentence. In this paper, we investigate multi-encoder approaches in document-level neural machine translation (NMT). Surprisingly, we find that the context encoder does not only encode the surrounding sentences but also behaves as a noise generator. This makes us rethink the real benefits of the multi-encoder in context-aware translation - some of the improvements come from robust training. We compare several methods that introduce noise and/or well-tuned dropout setups into the training of these encoders. Experimental results show that noisy training plays an important role in multi-encoder-based NMT, especially when the training data is small. Also, we establish a new state-of-the-art on the IWSLT Fr-En task by careful use of noise generation and dropout methods.
Semantic Signatures for Large-scale Visual Localization. (arXiv:2005.03388v1 [cs.CV]) Visual localization is a useful alternative to standard localization techniques. It works by utilizing cameras. In a typical scenario, features are extracted from captured images and compared with geo-referenced databases. Location information is then inferred from the matching results. Conventional schemes mainly use low-level visual features. These approaches offer good accuracy but suffer from scalability issues. In order to assist localization in large urban areas, this work explores a different path by utilizing high-level semantic information. It is found that object information in a street view can facilitate localization. A novel descriptor scheme called "semantic signature" is proposed to summarize this information. A semantic signature consists of type and angle information of visible objects at a spatial location. Several metrics and protocols are proposed for signature comparison and retrieval. They illustrate different trade-offs between accuracy and complexity. Extensive simulation results confirm the potential of the proposed scheme in large-scale applications. This paper is an extended version of a conference paper in CBMI'18. A more efficient retrieval protocol is presented with additional experiment results.
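A hedged sketch of the descriptor idea: at each location, record the type and bearing of every visible object, and compare signatures by counting same-type entries whose bearings agree within a tolerance. The data layout and the matching metric below are illustrative only; the paper proposes several metrics with different accuracy/complexity trade-offs.

def signature(objects):
    """A semantic signature: sorted (object type, bearing in degrees) pairs."""
    return sorted(objects)

def match_score(sig_a, sig_b, angle_tol=15.0):
    """Count sig_a entries with an unused same-type entry in sig_b within angle_tol."""
    used, score = set(), 0
    for t, ang in sig_a:
        for j, (t2, ang2) in enumerate(sig_b):
            if j in used or t2 != t:
                continue
            if min(abs(ang - ang2), 360 - abs(ang - ang2)) <= angle_tol:
                used.add(j)
                score += 1
                break
    return score

q = signature([("traffic_light", 40.0), ("mailbox", 200.0)])
db = signature([("traffic_light", 48.0), ("mailbox", 190.0), ("tree", 90.0)])
print(match_score(q, db))   # 2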
WSMN: An optimized multipurpose blind watermarking in Shearlet domain using MLP and NSGA-II. (arXiv:2005.03382v1 [cs.CR]) Digital watermarking is a remarkable issue in the field of information security for avoiding the misuse of images in multimedia networks. Although access by unauthorized persons can be prevented through cryptography, cryptography cannot simultaneously provide copyright protection or content authentication with preservation of image integrity. Hence, this paper presents an optimized multipurpose blind watermarking in the Shearlet domain with the help of smart algorithms, namely MLP and NSGA-II. In this method, four copies of the robust copyright logo are embedded in the approximate coefficients of the Shearlet transform by using an effective quantization technique. Furthermore, an embedded random sequence serving as a semi-fragile authentication mark is effectively extracted from the details by the neural network. By performing an effective optimization algorithm to select optimal embedding thresholds, and by distinguishing the texture of blocks, imperceptibility and robustness are preserved. The experimental results reveal the superiority of the scheme with regard to the quality of watermarked images and robustness against hybrid attacks over other state-of-the-art schemes. The average PSNR and SSIM of the dual watermarked images are 38 dB and 0.95, respectively; besides, the scheme can effectively extract the copyright logo and locate forgery regions under severe attacks with satisfactory accuracy.
2kenize: Tying Subword Sequences for Chinese Script Conversion. (arXiv:2005.03375v1 [cs.CL]) Simplified Chinese to Traditional Chinese character conversion is a common preprocessing step in Chinese NLP. Despite this, current approaches have poor performance because they do not take into account that a simplified Chinese character can correspond to multiple traditional characters. Here, we propose a model that can disambiguate between mappings and convert between the two scripts. The model is based on subword segmentation, two language models, as well as a method for mapping between subword sequences. We further construct benchmark datasets for topic classification and script conversion. Our proposed method outperforms previous Chinese character conversion approaches by 6 points in accuracy. These results are further confirmed in a downstream application, where 2kenize is used to convert the pretraining dataset for topic classification. An error analysis reveals that our method's particular strengths are in dealing with code-mixing and named entities.
Playing Minecraft with Behavioural Cloning. (arXiv:2005.03374v1 [cs.AI]) The MineRL 2019 competition challenged participants to train sample-efficient agents to play Minecraft, using a dataset of human gameplay and a limited number of environment steps. We approached this task with behavioural cloning by predicting what actions human players would take, and reached fifth place in the final ranking. Despite being a simple algorithm, we observed that the performance of such an approach can vary significantly, based on when the training is stopped. In this paper, we detail our submission to the competition, run further experiments to study how performance varied over training, and study how different engineering decisions affected these results.
Accessibility in 360-degree video players. (arXiv:2005.03373v1 [cs.MM]) Any media experience must be fully inclusive and accessible to all users regardless of their ability. With the current trend towards immersive experiences, such as Virtual Reality (VR) and 360-degree video, it becomes key that these environments are adapted to be fully accessible. However, until recently the focus has been mostly on adapting the existing techniques to fit immersive displays, rather than considering new approaches for accessibility designed specifically for these increasingly relevant media experiences. This paper surveys a wide range of 360-degree video players and examines the features they include for dealing with accessibility, such as Subtitles, Audio Description, Sign Language, User Interfaces, and other interaction features, like voice control and support for multi-screen scenarios. These features have been chosen based on guidelines from standardization contributions, like in the World Wide Web Consortium (W3C) and the International Communication Union (ITU), and from research contributions for making 360-degree video consumption experiences accessible. The in-depth analysis has been part of a research effort towards the development of a fully inclusive and accessible 360-degree video player. The paper concludes by discussing how the newly developed player has gone above and beyond the existing solutions and guidelines, by providing accessibility features that meet the expectations for a widely used immersive medium, like 360-degree video.
Vid2Curve: Simultaneously Camera Motion Estimation and Thin Structure Reconstruction from an RGB Video. (arXiv:2005.03372v1 [cs.GR]) Thin structures, such as wire-frame sculptures, fences, cables, power lines, and tree branches, are common in the real world. It is extremely challenging to acquire their 3D digital models using traditional image-based or depth-based reconstruction methods because thin structures often lack distinct point features and have severe self-occlusion. We propose the first approach that simultaneously estimates camera motion and reconstructs the geometry of complex 3D thin structures in high quality from a color video captured by a handheld camera. Specifically, we present a new curve-based approach to estimate accurate camera poses by establishing correspondences between featureless thin objects in the foreground in consecutive video frames, without requiring visual texture in the background scene to lock on. Enabled by this effective curve-based camera pose estimation strategy, we develop an iterative optimization method with tailored measures on geometry, topology as well as self-occlusion handling for reconstructing 3D thin structures. Extensive validations on a variety of thin structures show that our method achieves accurate camera pose estimation and faithful reconstruction of 3D thin structures with complex shape and topology at a level that has not been attained by other existing reconstruction methods.
Energy-efficient topology to enhance the wireless sensor network lifetime using connectivity control. (arXiv:2005.03370v1 [cs.NI]) Wireless sensor networks have attracted much attention because of their many applications in industry, the military, medicine, agriculture, and education. In addition, a vast body of research has been devoted to expanding their applications and improving their efficiency. However, there are still many challenges to increasing efficiency in different parts of such networks, one of the most important being to improve the network lifetime. Since sensor nodes are generally powered by batteries, the most important issue in these networks is to reduce the power consumption of nodes so as to extend the network lifetime to an acceptable level. The contribution of this paper is to use topology control, a threshold on the remaining energy of nodes, and two meta-heuristic algorithms, SA (simulated annealing) and VNS (variable neighbourhood search), to increase the energy remaining in the sensors. Moreover, using a low-cost spanning tree, appropriate connectivity control among nodes is created in the network in order to increase the network lifetime. Simulation results show that the proposed method improves sensor lifetime and reduces energy consumption.
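The low-cost spanning tree used for connectivity control can be computed with a standard algorithm such as Kruskal's, with links weighted by transmission energy. A minimal Python sketch; the cost model (squared distance as a proxy for radio energy) is an assumption for illustration.

def kruskal_mst(n, edges):
    """edges: (cost, u, v) triples; returns a minimum-cost spanning tree."""
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a
    tree = []
    for cost, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                        # joining two components keeps it a tree
            parent[ru] = rv
            tree.append((u, v, cost))
    return tree

# 4 sensor nodes; cost = squared distance as a stand-in for radio energy
links = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (9, 1, 3), (3, 2, 3)]
print(kruskal_mst(4, links))   # [(0, 1, 1), (1, 2, 2), (2, 3, 3)]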
Scoring Root Necrosis in Cassava Using Semantic Segmentation. (arXiv:2005.03367v1 [eess.IV]) Cassava, a major food crop in many parts of Africa, has been severely affected by Cassava Brown Streak Disease (CBSD). The disease affects tuberous roots and presents symptoms that include a yellow/brown, dry, corky necrosis within the starch-bearing tissues. Cassava breeders currently depend on visual inspection to score necrosis in roots, based on a qualitative score which is quite subjective. In this paper we present an approach to automating root necrosis scoring using deep convolutional neural networks with semantic segmentation. Our experiments show that the UNet model performs this task with high accuracy, achieving a mean Intersection over Union (IoU) of 0.90 on the test set. This method provides a means to use a quantitative measure for necrosis scoring on root cross-sections, by segmenting and classifying the necrotized and non-necrotized pixels of cassava root cross-sections without any additional feature engineering.
Soft Interference Cancellation for Random Coding in Massive Gaussian Multiple-Access. (arXiv:2005.03364v1 [cs.IT]) We utilize recent results on the exact block error probability of Gaussian random codes in additive white Gaussian noise to analyze Gaussian random coding for massive multiple-access at finite message length. Soft iterative interference cancellation is found to closely approach the performance bounds recently found in [1]. The existence of two fundamentally different regimes in the trade-off between power and bandwidth efficiency reported in [2] is related to much older results in [3] on power optimization by linear programming. Furthermore, we tighten the achievability bounds of [1] in the low power regime and show that orthogonal constellations are very close to the theoretical limits for message lengths around 100 and above.
Probabilistic Hyperproperties of Markov Decision Processes. (arXiv:2005.03362v1 [cs.LO]) We study the specification and verification of hyperproperties for probabilistic systems represented as Markov decision processes (MDPs). Hyperproperties are system properties that describe the correctness of a system as a relation between multiple executions. Hyperproperties generalize trace properties and include information-flow security requirements, like noninterference, as well as requirements like symmetry, partial observation, robustness, and fault tolerance. We introduce the temporal logic PHL, which extends classic probabilistic logics with quantification over schedulers and traces. PHL can express a wide range of hyperproperties for probabilistic systems, including both classical applications, such as differential privacy, and novel applications in areas such as robotics and planning. While the model checking problem for PHL is in general undecidable, we provide methods both for proving and for refuting a class of probabilistic hyperproperties for MDPs.
JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation. (arXiv:2005.03361v1 [cs.CL]) Neural machine translation (NMT) needs large parallel corpora for state-of-the-art translation quality. Low-resource NMT is typically addressed by transfer learning which leverages large monolingual or parallel corpora for pre-training. Monolingual pre-training approaches such as MASS (MAsked Sequence to Sequence) are extremely effective in boosting NMT quality for languages with small parallel corpora. However, they do not account for linguistic information obtained using syntactic analyzers which is known to be invaluable for several Natural Language Processing (NLP) tasks. To this end, we propose JASS, Japanese-specific Sequence to Sequence, as a novel pre-training alternative to MASS for NMT involving Japanese as the source or target language. JASS is joint BMASS (Bunsetsu MASS) and BRSS (Bunsetsu Reordering Sequence to Sequence) pre-training which focuses on Japanese linguistic units called bunsetsus. In our experiments on ASPEC Japanese--English and News Commentary Japanese--Russian translation we show that JASS can give results that are competitive with if not better than those given by MASS. Furthermore, we show for the first time that joint MASS and JASS pre-training gives results that significantly surpass the individual methods indicating their complementary nature. We will release our code, pre-trained models and bunsetsu annotated data as resources for researchers to use in their own NLP tasks.
Self-Supervised Human Depth Estimation from Monocular Videos. (arXiv:2005.03358v1 [cs.CV]) Previous methods on estimating detailed human depth often require supervised training with 'ground truth' depth data. This paper presents a self-supervised method that can be trained on YouTube videos without known depth, which makes training data collection simple and improves the generalization of the learned network. The self-supervised learning is achieved by minimizing a photo-consistency loss, which is evaluated between a video frame and its neighboring frames warped according to the estimated depth and the 3D non-rigid motion of the human body. To solve this non-rigid motion, we first estimate a rough SMPL model at each video frame and compute the non-rigid body motion accordingly, which enables self-supervised learning on estimating the shape details. Experiments demonstrate that our method enjoys better generalization and performs much better on data in the wild.
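The photo-consistency supervision described above can be written compactly; the notation here is a hedged sketch, not the paper's exact loss. With predicted depth $D_t$, the non-rigid body motion $T_{t \to s}$ derived from the per-frame SMPL fits, and camera projection $\pi$, $$\mathcal{L}_{\mathrm{photo}} = \sum_{s \in \mathcal{N}(t)} \sum_{p} \left| I_t(p) - I_s\big(\pi\big(T_{t \to s}\, \pi^{-1}(p, D_t(p))\big)\big) \right|,$$ where $\pi^{-1}(p, d)$ back-projects pixel $p$ at depth $d$ and $\mathcal{N}(t)$ are the neighboring frames; minimizing this over unlabeled video trains the depth network without ground-truth depth.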
Estimating Blood Pressure from Photoplethysmogram Signal and Demographic Features using Machine Learning Techniques. (arXiv:2005.03357v1 [eess.SP]) Hypertension is a potentially dangerous health condition, which is indicated directly by blood pressure (BP) and often leads to other health complications. Continuous monitoring of BP is very important; however, cuff-based BP measurements are discrete and uncomfortable for the user. To address this need, a cuff-less, continuous and non-invasive BP measurement system is proposed using the Photoplethysmogram (PPG) signal and demographic features with machine learning (ML) algorithms. PPG signals were acquired from 219 subjects and underwent pre-processing and feature extraction steps. Time, frequency and time-frequency domain features were extracted from the PPG and its derivative signals. Feature selection techniques were used to reduce computational complexity and decrease the chance of over-fitting the ML algorithms. The features were then used to train and evaluate ML algorithms. The best regression models were selected for systolic BP (SBP) and diastolic BP (DBP) estimation individually. Gaussian process regression (GPR) along with the ReliefF feature selection algorithm outperformed other algorithms, estimating SBP and DBP with root-mean-square errors (RMSE) of 6.74 and 3.59, respectively. This ML model can be implemented in hardware systems to continuously monitor BP and avoid critical health conditions due to sudden changes.
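A minimal sketch of the winning combination as described, feature selection followed by Gaussian process regression, using scikit-learn for the GPR part. ReliefF is not available in scikit-learn, so a univariate selector stands in; the data here are random placeholders, making this an approximation of the pipeline, not the paper's code.

import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X, y = rng.standard_normal((200, 40)), rng.normal(120, 10, 200)  # stand-ins for PPG features, SBP

model = make_pipeline(
    SelectKBest(f_regression, k=15),                              # stand-in for ReliefF
    GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True),
)
model.fit(X[:150], y[:150])
rmse = np.sqrt(np.mean((model.predict(X[150:]) - y[150:]) ** 2))
print(round(rmse, 2))   # the paper reports RMSE 6.74 for SBP on real data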
DramaQA: Character-Centered Video Story Understanding with Hierarchical QA. (arXiv:2005.03356v1 [cs.CL]) Despite recent progress in computer vision and natural language processing, developing video understanding intelligence is still hard to achieve due to the intrinsic difficulty of story in video. Moreover, there is no theoretical metric for evaluating the degree of video understanding. In this paper, we propose a novel video question answering (Video QA) task, DramaQA, for a comprehensive understanding of the video story. DramaQA focuses on two perspectives: 1) hierarchical QAs as an evaluation metric based on the cognitive developmental stages of human intelligence, and 2) character-centered video annotations to model the local coherence of the story. Our dataset is built upon the TV drama "Another Miss Oh" and contains 16,191 QA pairs from 23,928 video clips of various lengths, with each QA pair belonging to one of four difficulty levels. We provide 217,308 annotated images with rich character-centered annotations, including visual bounding boxes, behaviors, and emotions of main characters, and coreference-resolved scripts. Additionally, we provide analyses of the dataset as well as a Dual Matching Multistream model, which effectively learns character-centered representations of video to answer questions about the video. We plan to release our dataset and model publicly for research purposes and expect that our work will provide a new perspective on video story understanding research.
Quantum correlation alignment for unsupervised domain adaptation. (arXiv:2005.03355v1 [quant-ph]) Correlation alignment (CORAL), a representative domain adaptation (DA) algorithm, decorrelates and aligns a labelled source domain dataset to an unlabelled target domain dataset to minimize the domain shift, such that a classifier can be applied to predict the target domain labels. In this paper, we implement CORAL on quantum devices by two different methods. One method utilizes quantum basic linear algebra subroutines (QBLAS) to implement CORAL with exponential speedup in the number and dimension of the given data samples. The other method is achieved through a variational hybrid quantum-classical procedure. In addition, numerical experiments on CORAL with three different types of data sets, namely synthetic data, synthetic-Iris data, and handwritten digit data, are presented to evaluate the performance of our work. The simulation results show that the variational quantum correlation alignment algorithm (VQCORAL) can achieve competitive performance compared with the classical CORAL.