io

Sunny Pointer: Designing a mouse pointer for people with peripheral vision loss. (arXiv:2005.03504v1 [cs.HC])

We present a new mouse cursor designed to facilitate the use of the mouse by people with peripheral vision loss. The pointer consists of a collection of converging straight lines covering the whole screen and following the position of the mouse cursor. We measured its positive effects with a group of participants with peripheral vision loss of different kinds and we found that it can reduce by a factor of 7 the time required to complete a targeting task using the mouse. Using eye tracking, we show that this system makes it possible to initiate the movement towards the target without having to precisely locate the mouse pointer. Using Fitts' Law, we compare these performances with those of full visual field users in order to understand the relation between the accuracy of the estimated mouse cursor position and the index of performance obtained with our tool.




io

Computing with bricks and mortar: Classification of waveforms with a doped concrete blocks. (arXiv:2005.03498v1 [cs.ET])

We present results showing the capability of concrete-based information processing substrate in the signal classification task in accordance with in materio computing paradigm. As the Reservoir Computing is a suitable model for describing embedded in materio computation, we propose that this type of presented basic construction unit can be used as a source for "reservoir of states" necessary for simple tuning of the readout layer. In that perspective, buildings constructed from computing concrete could function as a highly parallel information processor for smart architecture. We present an electrical characterization of the set of samples with different additive concentrations followed by a dynamical analysis of selected specimens showing fingerprints of memfractive properties. Moreover, on the basis of obtained parameters, classification of the signal waveform shapes can be performed in scenarios explicitly tuned for a given device terminal.




io

Text Recognition in the Wild: A Survey. (arXiv:2005.03492v1 [cs.CV])

The history of text can be traced back over thousands of years. Rich and precise semantic information carried by text is important in a wide range of vision-based application scenarios. Therefore, text recognition in natural scenes has been an active research field in computer vision and pattern recognition. In recent years, with the rise and development of deep learning, numerous methods have shown promising in terms of innovation, practicality, and efficiency. This paper aims to (1) summarize the fundamental problems and the state-of-the-art associated with scene text recognition; (2) introduce new insights and ideas; (3) provide a comprehensive review of publicly available resources; (4) point out directions for future work. In summary, this literature review attempts to present the entire picture of the field of scene text recognition. It provides a comprehensive reference for people entering this field, and could be helpful to inspire future research. Related resources are available at our Github repository: https://github.com/HCIILAB/Scene-Text-Recognition.




io

Algorithmic Averaging for Studying Periodic Orbits of Planar Differential Systems. (arXiv:2005.03487v1 [cs.SC])

One of the main open problems in the qualitative theory of real planar differential systems is the study of limit cycles. In this article, we present an algorithmic approach for detecting how many limit cycles can bifurcate from the periodic orbits of a given polynomial differential center when it is perturbed inside a class of polynomial differential systems via the averaging method. We propose four symbolic algorithms to implement the averaging method. The first algorithm is based on the change of polar coordinates that allows one to transform a considered differential system to the normal form of averaging. The second algorithm is used to derive the solutions of certain differential systems associated to the unperturbed term of the normal of averaging. The third algorithm exploits the partial Bell polynomials and allows one to compute the integral formula of the averaged functions at any order. The last algorithm is based on the aforementioned algorithms and determines the exact expressions of the averaged functions for the considered differential systems. The implementation of our algorithms is discussed and evaluated using several examples. The experimental results have extended the existing relevant results for certain classes of differential systems.




io

Anonymized GCN: A Novel Robust Graph Embedding Method via Hiding Node Position in Noise. (arXiv:2005.03482v1 [cs.LG])

Graph convolution network (GCN) have achieved state-of-the-art performance in the task of node prediction in the graph structure. However, with the gradual various of graph attack methods, there are lack of research on the robustness of GCN. At this paper, we will design a robust GCN method for node prediction tasks. Considering the graph structure contains two types of information: node information and connection information, and attackers usually modify the connection information to complete the interference with the prediction results of the node, we first proposed a method to hide the connection information in the generator, named Anonymized GCN (AN-GCN). By hiding the connection information in the graph structure in the generator through adversarial training, the accurate node prediction can be completed only by the node number rather than its specific position in the graph. Specifically, we first demonstrated the key to determine the embedding of a specific node: the row corresponding to the node of the eigenmatrix of the Laplace matrix, by target it as the output of the generator, we designed a method to hide the node number in the noise. Take the corresponding noise as input, we will obtain the connection structure of the node instead of directly obtaining. Then the encoder and decoder are spliced both in discriminator, so that after adversarial training, the generator and discriminator can cooperate to complete the encoding and decoding of the graph, then complete the node prediction. Finally, All node positions can generated by noise at the same time, that is to say, the generator will hides all the connection information of the graph structure. The evaluation shows that we only need to obtain the initial features and node numbers of the nodes to complete the node prediction, and the accuracy did not decrease, but increased by 0.0293.




io

Brain-like approaches to unsupervised learning of hidden representations -- a comparative study. (arXiv:2005.03476v1 [cs.NE])

Unsupervised learning of hidden representations has been one of the most vibrant research directions in machine learning in recent years. In this work we study the brain-like Bayesian Confidence Propagating Neural Network (BCPNN) model, recently extended to extract sparse distributed high-dimensional representations. The saliency and separability of the hidden representations when trained on MNIST dataset is studied using an external classifier, and compared with other unsupervised learning methods that include restricted Boltzmann machines and autoencoders.




io

Bundle Recommendation with Graph Convolutional Networks. (arXiv:2005.03475v1 [cs.IR])

Bundle recommendation aims to recommend a bundle of items for a user to consume as a whole. Existing solutions integrate user-item interaction modeling into bundle recommendation by sharing model parameters or learning in a multi-task manner, which cannot explicitly model the affiliation between items and bundles, and fail to explore the decision-making when a user chooses bundles. In this work, we propose a graph neural network model named BGCN (short for extit{ extBF{B}undle extBF{G}raph extBF{C}onvolutional extBF{N}etwork}) for bundle recommendation. BGCN unifies user-item interaction, user-bundle interaction and bundle-item affiliation into a heterogeneous graph. With item nodes as the bridge, graph convolutional propagation between user and bundle nodes makes the learned representations capture the item level semantics. Through training based on hard-negative sampler, the user's fine-grained preferences for similar bundles are further distinguished. Empirical results on two real-world datasets demonstrate the strong performance gains of BGCN, which outperforms the state-of-the-art baselines by 10.77\% to 23.18\%.




io

Ensuring Fairness under Prior Probability Shifts. (arXiv:2005.03474v1 [cs.LG])

In this paper, we study the problem of fair classification in the presence of prior probability shifts, where the training set distribution differs from the test set. This phenomenon can be observed in the yearly records of several real-world datasets, such as recidivism records and medical expenditure surveys. If unaccounted for, such shifts can cause the predictions of a classifier to become unfair towards specific population subgroups. While the fairness notion called Proportional Equality (PE) accounts for such shifts, a procedure to ensure PE-fairness was unknown.

In this work, we propose a method, called CAPE, which provides a comprehensive solution to the aforementioned problem. CAPE makes novel use of prevalence estimation techniques, sampling and an ensemble of classifiers to ensure fair predictions under prior probability shifts. We introduce a metric, called prevalence difference (PD), which CAPE attempts to minimize in order to ensure PE-fairness. We theoretically establish that this metric exhibits several desirable properties.

We evaluate the efficacy of CAPE via a thorough empirical evaluation on synthetic datasets. We also compare the performance of CAPE with several popular fair classifiers on real-world datasets like COMPAS (criminal risk assessment) and MEPS (medical expenditure panel survey). The results indicate that CAPE ensures PE-fair predictions, while performing well on other performance metrics.




io

Predictions and algorithmic statistics for infinite sequence. (arXiv:2005.03467v1 [cs.IT])

Consider the following prediction problem. Assume that there is a block box that produces bits according to some unknown computable distribution on the binary tree. We know first $n$ bits $x_1 x_2 ldots x_n$. We want to know the probability of the event that that the next bit is equal to $1$. Solomonoff suggested to use universal semimeasure $m$ for solving this task. He proved that for every computable distribution $P$ and for every $b in {0,1}$ the following holds: $$sum_{n=1}^{infty}sum_{x: l(x)=n} P(x) (P(b | x) - m(b | x))^2 < infty .$$ However, Solomonoff's method has a negative aspect: Hutter and Muchnik proved that there are an universal semimeasure $m$, computable distribution $P$ and a random (in Martin-L{"o}f sense) sequence $x_1 x_2ldots$ such that $lim_{n o infty} P(x_{n+1} | x_1ldots x_n) - m(x_{n+1} | x_1ldots x_n) rightarrow 0$. We suggest a new way for prediction. For every finite string $x$ we predict the new bit according to the best (in some sence) distribution for $x$. We prove the similar result as Solomonoff theorem for our way of prediction. Also we show that our method of prediction has no that negative aspect as Solomonoff's method.




io

High Performance Interference Suppression in Multi-User Massive MIMO Detector. (arXiv:2005.03466v1 [cs.OH])

In this paper, we propose a new nonlinear detector with improved interference suppression in Multi-User Multiple Input, Multiple Output (MU-MIMO) system. The proposed detector is a combination of the following parts: QR decomposition (QRD), low complexity users sorting before QRD, sorting-reduced (SR) K-best method and minimum mean square error (MMSE) pre-processing. Our method outperforms a linear interference rejection combining (IRC, i.e. MMSE naturally) method significantly in both strong interference and additive white noise scenarios with both ideal and real channel estimations. This result has wide application importance for scenarios with strong interference, i.e. when co-located users utilize the internet in stadium, highway, shopping center, etc. Simulation results are presented for the non-line of sight 3D-UMa model of 5G QuaDRiGa 2.0 channel for 16 highly correlated single-antenna users with QAM16 modulation in 64 antennas of Massive MIMO system. The performance was compared with MMSE and other detection approaches.




io

How Can CNNs Use Image Position for Segmentation?. (arXiv:2005.03463v1 [eess.IV])

Convolution is an equivariant operation, and image position does not affect its result. A recent study shows that the zero-padding employed in convolutional layers of CNNs provides position information to the CNNs. The study further claims that the position information enables accurate inference for several tasks, such as object recognition, segmentation, etc. However, there is a technical issue with the design of the experiments of the study, and thus the correctness of the claim is yet to be verified. Moreover, the absolute image position may not be essential for the segmentation of natural images, in which target objects will appear at any image position. In this study, we investigate how positional information is and can be utilized for segmentation tasks. Toward this end, we consider {em positional encoding} (PE) that adds channels embedding image position to the input images and compare PE with several padding methods. Considering the above nature of natural images, we choose medical image segmentation tasks, in which the absolute position appears to be relatively important, as the same organs (of different patients) are captured in similar sizes and positions. We draw a mixed conclusion from the experimental results; the positional encoding certainly works in some cases, but the absolute image position may not be so important for segmentation tasks as we think.




io

AIBench: Scenario-distilling AI Benchmarking. (arXiv:2005.03459v1 [cs.PF])

Real-world application scenarios like modern Internet services consist of diversity of AI and non-AI modules with very long and complex execution paths. Using component or micro AI benchmarks alone can lead to error-prone conclusions. This paper proposes a scenario-distilling AI benchmarking methodology. Instead of using real-world applications, we propose the permutations of essential AI and non-AI tasks as a scenario-distilling benchmark. We consider scenario-distilling benchmarks, component and micro benchmarks as three indispensable parts of a benchmark suite. Together with seventeen industry partners, we identify nine important real-world application scenarios. We design and implement a highly extensible, configurable, and flexible benchmark framework. On the basis of the framework, we propose the guideline for building scenario-distilling benchmarks, and present two Internet service AI ones. The preliminary evaluation shows the advantage of scenario-distilling AI benchmarking against using component or micro AI benchmarks alone. The specifications, source code, testbed, and results are publicly available from the web site url{this http URL}.




io

A combination of 'pooling' with a prediction model can reduce by 73% the number of COVID-19 (Corona-virus) tests. (arXiv:2005.03453v1 [cs.LG])

We show that combining a prediction model (based on neural networks), with a new method of test pooling (better than the original Dorfman method, and better than double-pooling) called 'Grid', we can reduce the number of Covid-19 tests by 73%.




io

Lifted Regression/Reconstruction Networks. (arXiv:2005.03452v1 [cs.LG])

In this work we propose lifted regression/reconstruction networks (LRRNs), which combine lifted neural networks with a guaranteed Lipschitz continuity property for the output layer. Lifted neural networks explicitly optimize an energy model to infer the unit activations and therefore---in contrast to standard feed-forward neural networks---allow bidirectional feedback between layers. So far lifted neural networks have been modelled around standard feed-forward architectures. We propose to take further advantage of the feedback property by letting the layers simultaneously perform regression and reconstruction. The resulting lifted network architecture allows to control the desired amount of Lipschitz continuity, which is an important feature to obtain adversarially robust regression and classification methods. We analyse and numerically demonstrate applications for unsupervised and supervised learning.




io

An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration. (arXiv:2005.03451v1 [cs.LG])

We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect of environmental temperature on the reliability-power trade-off of such accelerators. We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This approach allows us to study the effects of our undervolting technique for both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by FPGA vendor to ensure correct functionality in worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%.




io

Dirichlet spectral-Galerkin approximation method for the simply supported vibrating plate eigenvalues. (arXiv:2005.03433v1 [math.NA])

In this paper, we analyze and implement the Dirichlet spectral-Galerkin method for approximating simply supported vibrating plate eigenvalues with variable coefficients. This is a Galerkin approximation that uses the approximation space that is the span of finitely many Dirichlet eigenfunctions for the Laplacian. Convergence and error analysis for this method is presented for two and three dimensions. Here we will assume that the domain has either a smooth or Lipschitz boundary with no reentrant corners. An important component of the error analysis is Weyl's law for the Dirichlet eigenvalues. Numerical examples for computing the simply supported vibrating plate eigenvalues for the unit disk and square are presented. In order to test the accuracy of the approximation, we compare the spectral-Galerkin method to the separation of variables for the unit disk. Whereas for the unit square we will numerically test the convergence rate for a variable coefficient problem.




io

The Perceptimatic English Benchmark for Speech Perception Models. (arXiv:2005.03418v1 [cs.CL])

We present the Perceptimatic English Benchmark, an open experimental benchmark for evaluating quantitative models of speech perception in English. The benchmark consists of ABX stimuli along with the responses of 91 American English-speaking listeners. The stimuli test discrimination of a large number of English and French phonemic contrasts. They are extracted directly from corpora of read speech, making them appropriate for evaluating statistical acoustic models (such as those used in automatic speech recognition) trained on typical speech data sets. We show that phone discrimination is correlated with several types of models, and give recommendations for researchers seeking easily calculated norms of acoustic distance on experimental stimuli. We show that DeepSpeech, a standard English speech recognizer, is more specialized on English phoneme discrimination than English listeners, and is poorly correlated with their behaviour, even though it yields a low error on the decision task given to humans.




io

NTIRE 2020 Challenge on Spectral Reconstruction from an RGB Image. (arXiv:2005.03412v1 [eess.IV])

This paper reviews the second challenge on spectral reconstruction from RGB images, i.e., the recovery of whole-scene hyperspectral (HS) information from a 3-channel RGB image. As in the previous challenge, two tracks were provided: (i) a "Clean" track where HS images are estimated from noise-free RGBs, the RGB images are themselves calculated numerically using the ground-truth HS images and supplied spectral sensitivity functions (ii) a "Real World" track, simulating capture by an uncalibrated and unknown camera, where the HS images are recovered from noisy JPEG-compressed RGB images. A new, larger-than-ever, natural hyperspectral image data set is presented, containing a total of 510 HS images. The Clean and Real World tracks had 103 and 78 registered participants respectively, with 14 teams competing in the final testing phase. A description of the proposed methods, alongside their challenge scores and an extensive evaluation of top performing methods is also provided. They gauge the state-of-the-art in spectral reconstruction from an RGB image.




io

Detection and Feeder Identification of the High Impedance Fault at Distribution Networks Based on Synchronous Waveform Distortions. (arXiv:2005.03411v1 [eess.SY])

Diagnosis of high impedance fault (HIF) is a challenge for nowadays distribution network protections. The fault current of a HIF is much lower than that of a normal load, and fault feature is significantly affected by fault scenarios. A detection and feeder identification algorithm for HIFs is proposed in this paper, based on the high-resolution and synchronous waveform data. In the algorithm, an interval slope is defined to describe the waveform distortions, which guarantees a uniform feature description under various HIF nonlinearities and noise interferences. For three typical types of network neutrals, i.e.,isolated neutral, resonant neutral, and low-resistor-earthed neutral, differences of the distorted components between the zero-sequence currents of healthy and faulty feeders are mathematically deduced, respectively. As a result, the proposed criterion, which is based on the distortion relationships between zero-sequence currents of feeders and the zero-sequence voltage at the substation, is theoretically supported. 28 HIFs grounded to various materials are tested in a 10kV distribution networkwith three neutral types, and are utilized to verify the effectiveness of the proposed algorithm.




io

Joint Prediction and Time Estimation of COVID-19 Developing Severe Symptoms using Chest CT Scan. (arXiv:2005.03405v1 [eess.IV])

With the rapidly worldwide spread of Coronavirus disease (COVID-19), it is of great importance to conduct early diagnosis of COVID-19 and predict the time that patients might convert to the severe stage, for designing effective treatment plan and reducing the clinicians' workloads. In this study, we propose a joint classification and regression method to determine whether the patient would develop severe symptoms in the later time, and if yes, predict the possible conversion time that the patient would spend to convert to the severe stage. To do this, the proposed method takes into account 1) the weight for each sample to reduce the outliers' influence and explore the problem of imbalance classification, and 2) the weight for each feature via a sparsity regularization term to remove the redundant features of high-dimensional data and learn the shared information across the classification task and the regression task. To our knowledge, this study is the first work to predict the disease progression and the conversion time, which could help clinicians to deal with the potential severe cases in time or even save the patients' lives. Experimental analysis was conducted on a real data set from two hospitals with 422 chest computed tomography (CT) scans, where 52 cases were converted to severe on average 5.64 days and 34 cases were severe at admission. Results show that our method achieves the best classification (e.g., 85.91% of accuracy) and regression (e.g., 0.462 of the correlation coefficient) performance, compared to all comparison methods. Moreover, our proposed method yields 76.97% of accuracy for predicting the severe cases, 0.524 of the correlation coefficient, and 0.55 days difference for the converted time.




io

A LiDAR-based real-time capable 3D Perception System for Automated Driving in Urban Domains. (arXiv:2005.03404v1 [cs.RO])

We present a LiDAR-based and real-time capable 3D perception system for automated driving in urban domains. The hierarchical system design is able to model stationary and movable parts of the environment simultaneously and under real-time conditions. Our approach extends the state of the art by innovative in-detail enhancements for perceiving road users and drivable corridors even in case of non-flat ground surfaces and overhanging or protruding elements. We describe a runtime-efficient pointcloud processing pipeline, consisting of adaptive ground surface estimation, 3D clustering and motion classification stages. Based on the pipeline's output, the stationary environment is represented in a multi-feature mapping and fusion approach. Movable elements are represented in an object tracking system capable of using multiple reference points to account for viewpoint changes. We further enhance the tracking system by explicit consideration of occlusion and ambiguity cases. Our system is evaluated using a subset of the TUBS Road User Dataset. We enhance common performance metrics by considering application-driven aspects of real-world traffic scenarios. The perception system shows impressive results and is able to cope with the addressed scenarios while still preserving real-time capability.




io

Simultaneous topology and fastener layout optimization of assemblies considering joint failure. (arXiv:2005.03398v1 [cs.CE])

This paper provides a method for the simultaneous topology optimization of parts and their corresponding joint locations in an assembly. Therein, the joint locations are not discrete and predefined, but continuously movable. The underlying coupling equations allow for connecting dissimilar meshes and avoid the need for remeshing when joint locations change. The presented method models the force transfer at a joint location not only by using single spring elements but accounts for the size and type of the joints. When considering riveted or bolted joints, the local part geometry at the joint location consists of holes that are surrounded by material. For spot welds, the joint locations are filled with material and may be smaller than for bolts. The presented method incorporates these material and clearance zones into the simultaneously running topology optimization of the parts. Furthermore, failure of joints may be taken into account at the optimization stage, yielding assemblies connected in a fail-safe manner.




io

Does Multi-Encoder Help? A Case Study on Context-Aware Neural Machine Translation. (arXiv:2005.03393v1 [cs.CL])

In encoder-decoder neural models, multiple encoders are in general used to represent the contextual information in addition to the individual sentence. In this paper, we investigate multi-encoder approaches in documentlevel neural machine translation (NMT). Surprisingly, we find that the context encoder does not only encode the surrounding sentences but also behaves as a noise generator. This makes us rethink the real benefits of multi-encoder in context-aware translation - some of the improvements come from robust training. We compare several methods that introduce noise and/or well-tuned dropout setup into the training of these encoders. Experimental results show that noisy training plays an important role in multi-encoder-based NMT, especially when the training data is small. Also, we establish a new state-of-the-art on IWSLT Fr-En task by careful use of noise generation and dropout methods.




io

Semantic Signatures for Large-scale Visual Localization. (arXiv:2005.03388v1 [cs.CV])

Visual localization is a useful alternative to standard localization techniques. It works by utilizing cameras. In a typical scenario, features are extracted from captured images and compared with geo-referenced databases. Location information is then inferred from the matching results. Conventional schemes mainly use low-level visual features. These approaches offer good accuracy but suffer from scalability issues. In order to assist localization in large urban areas, this work explores a different path by utilizing high-level semantic information. It is found that object information in a street view can facilitate localization. A novel descriptor scheme called "semantic signature" is proposed to summarize this information. A semantic signature consists of type and angle information of visible objects at a spatial location. Several metrics and protocols are proposed for signature comparison and retrieval. They illustrate different trade-offs between accuracy and complexity. Extensive simulation results confirm the potential of the proposed scheme in large-scale applications. This paper is an extended version of a conference paper in CBMI'18. A more efficient retrieval protocol is presented with additional experiment results.




io

2kenize: Tying Subword Sequences for Chinese Script Conversion. (arXiv:2005.03375v1 [cs.CL])

Simplified Chinese to Traditional Chinese character conversion is a common preprocessing step in Chinese NLP. Despite this, current approaches have poor performance because they do not take into account that a simplified Chinese character can correspond to multiple traditional characters. Here, we propose a model that can disambiguate between mappings and convert between the two scripts. The model is based on subword segmentation, two language models, as well as a method for mapping between subword sequences. We further construct benchmark datasets for topic classification and script conversion. Our proposed method outperforms previous Chinese Character conversion approaches by 6 points in accuracy. These results are further confirmed in a downstream application, where 2kenize is used to convert pretraining dataset for topic classification. An error analysis reveals that our method's particular strengths are in dealing with code-mixing and named entities.




io

Playing Minecraft with Behavioural Cloning. (arXiv:2005.03374v1 [cs.AI])

MineRL 2019 competition challenged participants to train sample-efficient agents to play Minecraft, by using a dataset of human gameplay and a limit number of steps the environment. We approached this task with behavioural cloning by predicting what actions human players would take, and reached fifth place in the final ranking. Despite being a simple algorithm, we observed the performance of such an approach can vary significantly, based on when the training is stopped. In this paper, we detail our submission to the competition, run further experiments to study how performance varied over training and study how different engineering decisions affected these results.




io

Vid2Curve: Simultaneously Camera Motion Estimation and Thin Structure Reconstruction from an RGB Video. (arXiv:2005.03372v1 [cs.GR])

Thin structures, such as wire-frame sculptures, fences, cables, power lines, and tree branches, are common in the real world.

It is extremely challenging to acquire their 3D digital models using traditional image-based or depth-based reconstruction methods because thin structures often lack distinct point features and have severe self-occlusion.

We propose the first approach that simultaneously estimates camera motion and reconstructs the geometry of complex 3D thin structures in high quality from a color video captured by a handheld camera.

Specifically, we present a new curve-based approach to estimate accurate camera poses by establishing correspondences between featureless thin objects in the foreground in consecutive video frames, without requiring visual texture in the background scene to lock on.

Enabled by this effective curve-based camera pose estimation strategy, we develop an iterative optimization method with tailored measures on geometry, topology as well as self-occlusion handling for reconstructing 3D thin structures.

Extensive validations on a variety of thin structures show that our method achieves accurate camera pose estimation and faithful reconstruction of 3D thin structures with complex shape and topology at a level that has not been attained by other existing reconstruction methods.




io

Scoring Root Necrosis in Cassava Using Semantic Segmentation. (arXiv:2005.03367v1 [eess.IV])

Cassava a major food crop in many parts of Africa, has majorly been affected by Cassava Brown Streak Disease (CBSD). The disease affects tuberous roots and presents symptoms that include a yellow/brown, dry, corky necrosis within the starch-bearing tissues. Cassava breeders currently depend on visual inspection to score necrosis in roots based on a qualitative score which is quite subjective. In this paper we present an approach to automate root necrosis scoring using deep convolutional neural networks with semantic segmentation. Our experiments show that the UNet model performs this task with high accuracy achieving a mean Intersection over Union (IoU) of 0.90 on the test set. This method provides a means to use a quantitative measure for necrosis scoring on root cross-sections. This is done by segmentation and classifying the necrotized and non-necrotized pixels of cassava root cross-sections without any additional feature engineering.




io

Soft Interference Cancellation for Random Coding in Massive Gaussian Multiple-Access. (arXiv:2005.03364v1 [cs.IT])

We utilize recent results on the exact block error probability of Gaussian random codes in additive white Gaussian noise to analyze Gaussian random coding for massive multiple-access at finite message length. Soft iterative interference cancellation is found to closely approach the performance bounds recently found in [1]. The existence of two fundamentally different regimes in the trade-off between power and bandwidth efficiency reported in [2] is related to much older results in [3] on power optimization by linear programming. Furthermore, we tighten the achievability bounds of [1] in the low power regime and show that orthogonal constellations are very close to the theoretical limits for message lengths around 100 and above.




io

Probabilistic Hyperproperties of Markov Decision Processes. (arXiv:2005.03362v1 [cs.LO])

We study the specification and verification of hyperproperties for probabilistic systems represented as Markov decision processes (MDPs). Hyperproperties are system properties that describe the correctness of a system as a relation between multiple executions. Hyperproperties generalize trace properties and include information-flow security requirements, like noninterference, as well as requirements like symmetry, partial observation, robustness, and fault tolerance. We introduce the temporal logic PHL, which extends classic probabilistic logics with quantification over schedulers and traces. PHL can express a wide range of hyperproperties for probabilistic systems, including both classical applications, such as differential privacy, and novel applications in areas such as robotics and planning. While the model checking problem for PHL is in general undecidable, we provide methods both for proving and for refuting a class of probabilistic hyperproperties for MDPs.




io

JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation. (arXiv:2005.03361v1 [cs.CL])

Neural machine translation (NMT) needs large parallel corpora for state-of-the-art translation quality. Low-resource NMT is typically addressed by transfer learning which leverages large monolingual or parallel corpora for pre-training. Monolingual pre-training approaches such as MASS (MAsked Sequence to Sequence) are extremely effective in boosting NMT quality for languages with small parallel corpora. However, they do not account for linguistic information obtained using syntactic analyzers which is known to be invaluable for several Natural Language Processing (NLP) tasks. To this end, we propose JASS, Japanese-specific Sequence to Sequence, as a novel pre-training alternative to MASS for NMT involving Japanese as the source or target language. JASS is joint BMASS (Bunsetsu MASS) and BRSS (Bunsetsu Reordering Sequence to Sequence) pre-training which focuses on Japanese linguistic units called bunsetsus. In our experiments on ASPEC Japanese--English and News Commentary Japanese--Russian translation we show that JASS can give results that are competitive with if not better than those given by MASS. Furthermore, we show for the first time that joint MASS and JASS pre-training gives results that significantly surpass the individual methods indicating their complementary nature. We will release our code, pre-trained models and bunsetsu annotated data as resources for researchers to use in their own NLP tasks.




io

Self-Supervised Human Depth Estimation from Monocular Videos. (arXiv:2005.03358v1 [cs.CV])

Previous methods on estimating detailed human depth often require supervised training with `ground truth' depth data. This paper presents a self-supervised method that can be trained on YouTube videos without known depth, which makes training data collection simple and improves the generalization of the learned network. The self-supervised learning is achieved by minimizing a photo-consistency loss, which is evaluated between a video frame and its neighboring frames warped according to the estimated depth and the 3D non-rigid motion of the human body. To solve this non-rigid motion, we first estimate a rough SMPL model at each video frame and compute the non-rigid body motion accordingly, which enables self-supervised learning on estimating the shape details. Experiments demonstrate that our method enjoys better generalization and performs much better on data in the wild.




io

Quantum correlation alignment for unsupervised domain adaptation. (arXiv:2005.03355v1 [quant-ph])

Correlation alignment (CORAL), a representative domain adaptation (DA) algorithm, decorrelates and aligns a labelled source domain dataset to an unlabelled target domain dataset to minimize the domain shift such that a classifier can be applied to predict the target domain labels. In this paper, we implement the CORAL on quantum devices by two different methods. One method utilizes quantum basic linear algebra subroutines (QBLAS) to implement the CORAL with exponential speedup in the number and dimension of the given data samples. The other method is achieved through a variational hybrid quantum-classical procedure. In addition, the numerical experiments of the CORAL with three different types of data sets, namely the synthetic data, the synthetic-Iris data, the handwritten digit data, are presented to evaluate the performance of our work. The simulation results prove that the variational quantum correlation alignment algorithm (VQCORAL) can achieve competitive performance compared with the classical CORAL.




io

Error estimates for the Cahn--Hilliard equation with dynamic boundary conditions. (arXiv:2005.03349v1 [math.NA])

A proof of convergence is given for bulk--surface finite element semi-discretisation of the Cahn--Hilliard equation with Cahn--Hilliard-type dynamic boundary conditions in a smooth domain. The semi-discretisation is studied in the weak formulation as a second order system. Optimal-order uniform-in-time error estimates are shown in the $L^2$ and $H^1$ norms. The error estimates are based on a consistency and stability analysis. The proof of stability is performed in an abstract framework, based on energy estimates exploiting the anti-symmetric structure of the second order system. Numerical experiments illustrate the theoretical results.




io

Regression Forest-Based Atlas Localization and Direction Specific Atlas Generation for Pancreas Segmentation. (arXiv:2005.03345v1 [cs.CV])

This paper proposes a fully automated atlas-based pancreas segmentation method from CT volumes utilizing atlas localization by regression forest and atlas generation using blood vessel information. Previous probabilistic atlas-based pancreas segmentation methods cannot deal with spatial variations that are commonly found in the pancreas well. Also, shape variations are not represented by an averaged atlas. We propose a fully automated pancreas segmentation method that deals with two types of variations mentioned above. The position and size of the pancreas is estimated using a regression forest technique. After localization, a patient-specific probabilistic atlas is generated based on a new image similarity that reflects the blood vessel position and direction information around the pancreas. We segment it using the EM algorithm with the atlas as prior followed by the graph-cut. In evaluation results using 147 CT volumes, the Jaccard index and the Dice overlap of the proposed method were 62.1% and 75.1%, respectively. Although we automated all of the segmentation processes, segmentation results were superior to the other state-of-the-art methods in the Dice overlap.




io

Arranging Test Tubes in Racks Using Combined Task and Motion Planning. (arXiv:2005.03342v1 [cs.RO])

The paper develops a robotic manipulation system to treat the pressing needs for handling a large number of test tubes in clinical examination and replace or reduce human labor. It presents the technical details of the system, which separates and arranges test tubes in racks with the help of 3D vision and artificial intelligence (AI) reasoning/planning. The developed system only requires a person to put a rack with mixed and non-arranged tubes in front of a robot. The robot autonomously performs recognition, reasoning, planning, manipulation, etc., and returns a rack with separated and arranged tubes. The system is simple-to-use, and there are no requests for expert knowledge in robotics. We expect such a system to play an important role in helping managing public health and hope similar systems could be extended to other clinical manipulation like handling mixers and pipettes in the future.




io

Scene Text Image Super-Resolution in the Wild. (arXiv:2005.03341v1 [cs.CV])

Low-resolution text images are often seen in natural scenes such as documents captured by mobile phones. Recognizing low-resolution text images is challenging because they lose detailed content information, leading to poor recognition accuracy. An intuitive solution is to introduce super-resolution (SR) techniques as pre-processing. However, previous single image super-resolution (SISR) methods are trained on synthetic low-resolution images (e.g.Bicubic down-sampling), which is simple and not suitable for real low-resolution text recognition. To this end, we pro-pose a real scene text SR dataset, termed TextZoom. It contains paired real low-resolution and high-resolution images which are captured by cameras with different focal length in the wild. It is more authentic and challenging than synthetic data, as shown in Fig. 1. We argue improv-ing the recognition accuracy is the ultimate goal for Scene Text SR. In this purpose, a new Text Super-Resolution Network termed TSRN, with three novel modules is developed. (1) A sequential residual block is proposed to extract the sequential information of the text images. (2) A boundary-aware loss is designed to sharpen the character boundaries. (3) A central alignment module is proposed to relieve the misalignment problem in TextZoom. Extensive experiments on TextZoom demonstrate that our TSRN largely improves the recognition accuracy by over 13%of CRNN, and by nearly 9.0% of ASTER and MORAN compared to synthetic SR data. Furthermore, our TSRN clearly outperforms 7 state-of-the-art SR methods in boosting the recognition accuracy of LR images in TextZoom. For example, it outperforms LapSRN by over 5% and 8%on the recognition accuracy of ASTER and CRNN. Our results suggest that low-resolution text recognition in the wild is far from being solved, thus more research effort is needed.




io

Wavelet Integrated CNNs for Noise-Robust Image Classification. (arXiv:2005.03337v1 [cs.CV])

Convolutional Neural Networks (CNNs) are generally prone to noise interruptions, i.e., small image noise can cause drastic changes in the output. To suppress the noise effect to the final predication, we enhance CNNs by replacing max-pooling, strided-convolution, and average-pooling with Discrete Wavelet Transform (DWT). We present general DWT and Inverse DWT (IDWT) layers applicable to various wavelets like Haar, Daubechies, and Cohen, etc., and design wavelet integrated CNNs (WaveCNets) using these layers for image classification. In WaveCNets, feature maps are decomposed into the low-frequency and high-frequency components during the down-sampling. The low-frequency component stores main information including the basic object structures, which is transmitted into the subsequent layers to extract robust high-level features. The high-frequency components, containing most of the data noise, are dropped during inference to improve the noise-robustness of the WaveCNets. Our experimental results on ImageNet and ImageNet-C (the noisy version of ImageNet) show that WaveCNets, the wavelet integrated versions of VGG, ResNets, and DenseNet, achieve higher accuracy and better noise-robustness than their vanilla versions.




io

Causal Paths in Temporal Networks of Face-to-Face Human Interactions. (arXiv:2005.03333v1 [cs.SI])

In a temporal network causal paths are characterized by the fact that links from a source to a target must respect the chronological order. In this article we study the causal paths structure in temporal networks of human face to face interactions in different social contexts. In a static network paths are transitive i.e. the existence of a link from $a$ to $b$ and from $b$ to $c$ implies the existence of a path from $a$ to $c$ via $b$. In a temporal network the chronological constraint introduces time correlations that affects transitivity. A probabilistic model based on higher order Markov chains shows that correlations that can invalidate transitivity are present only when the time gap between consecutive events is larger than the average value and are negligible below such a value. The comparison between the densities of the temporal and static accessibility matrices shows that the static representation can be used with good approximation. Moreover, we quantify the extent of the causally connected region of the networks over time.




io

Crop Aggregating for short utterances speaker verification using raw waveforms. (arXiv:2005.03329v1 [eess.AS])

Most studies on speaker verification systems focus on long-duration utterances, which are composed of sufficient phonetic information. However, the performances of these systems are known to degrade when short-duration utterances are inputted due to the lack of phonetic information as compared to the long utterances. In this paper, we propose a method that compensates for the performance degradation of speaker verification for short utterances, referred to as "crop aggregating". The proposed method adopts an ensemble-based design to improve the stability and accuracy of speaker verification systems. The proposed method segments an input utterance into several short utterances and then aggregates the segment embeddings extracted from the segmented inputs to compose a speaker embedding. Then, this method simultaneously trains the segment embeddings and the aggregated speaker embedding. In addition, we also modified the teacher-student learning method for the proposed method. Experimental results on different input duration using the VoxCeleb1 test set demonstrate that the proposed technique improves speaker verification performance by about 45.37% relatively compared to the baseline system with 1-second test utterance condition.




io

Bitvector-aware Query Optimization for Decision Support Queries (extended version). (arXiv:2005.03328v1 [cs.DB])

Bitvector filtering is an important query processing technique that can significantly reduce the cost of execution, especially for complex decision support queries with multiple joins. Despite its wide application, however, its implication to query optimization is not well understood.

In this work, we study how bitvector filters impact query optimization. We show that incorporating bitvector filters into query optimization straightforwardly can increase the plan space complexity by an exponential factor in the number of relations in the query. We analyze the plans with bitvector filters for star and snowflake queries in the plan space of right deep trees without cross products. Surprisingly, with some simplifying assumptions, we prove that, the plan of the minimal cost with bitvector filters can be found from a linear number of plans in the number of relations in the query. This greatly reduces the plan space complexity for such queries from exponential to linear.

Motivated by our analysis, we propose an algorithm that accounts for the impact of bitvector filters in query optimization. Our algorithm optimizes the join order for an arbitrary decision support query by choosing from a linear number of candidate plans in the number of relations in the query. We implement our algorithm in Microsoft SQL Server as a transformation rule. Our evaluation on both industry standard benchmarks and customer workload shows that, compared with the original Microsoft SQL Server, our technique reduces the total CPU execution time by 22%-64% for the workloads, with up to two orders of magnitude reduction in CPU execution time for individual queries.




io

Global Distribution of Google Scholar Citations: A Size-independent Institution-based Analysis. (arXiv:2005.03324v1 [cs.DL])

Most currently available schemes for performance based ranking of Universities or Research organizations, such as, Quacarelli Symonds (QS), Times Higher Education (THE), Shanghai University based All Research of World Universities (ARWU) use a variety of criteria that include productivity, citations, awards, reputation, etc., while Leiden and Scimago use only bibliometric indicators. The research performance evaluation in the aforesaid cases is based on bibliometric data from Web of Science or Scopus, which are commercially available priced databases. The coverage includes peer reviewed journals and conference proceedings. Google Scholar (GS) on the other hand, provides a free and open alternative to obtaining citations of papers available on the net, (though it is not clear exactly which journals are covered.) Citations are collected automatically from the net and also added to self created individual author profiles under Google Scholar Citations (GSC). This data was used by Webometrics Lab, Spain to create a ranked list of 4000+ institutions in 2016, based on citations from only the top 10 individual GSC profiles in each organization. (GSC excludes the top paper for reasons explained in the text; the simple selection procedure makes the ranked list size-independent as claimed by the Cybermetrics Lab). Using this data (Transparent Ranking TR, 2016), we find the regional and country wise distribution of GS-TR Citations. The size independent ranked list is subdivided into deciles of 400 institutions each and the number of institutions and citations of each country obtained for each decile. We test for correlation between institutional ranks between GS TR and the other ranking schemes for the top 20 institutions.




io

Database Traffic Interception for Graybox Detection of Stored and Context-Sensitive XSS. (arXiv:2005.03322v1 [cs.CR])

XSS is a security vulnerability that permits injecting malicious code into the client side of a web application. In the simplest situations, XSS vulnerabilities arise when a web application includes the user input in the web output without due sanitization. Such simple XSS vulnerabilities can be detected fairly reliably with blackbox scanners, which inject malicious payload into sensitive parts of HTTP requests and look for the reflected values in the web output.

Contemporary blackbox scanners are not effective against stored XSS vulnerabilities, where the malicious payload in an HTTP response originates from the database storage of the web application, rather than from the associated HTTP request. Similarly, many blackbox scanners do not systematically handle context-sensitive XSS vulnerabilities, where the user input is included in the web output after a transformation that prevents the scanner from recognizing the original value, but does not sanitize the value sufficiently. Among the combination of two basic data sources (stored vs reflected) and two basic vulnerability patterns (context sensitive vs not so), only one is therefore tested systematically by state-of-the-art blackbox scanners.

Our work focuses on systematic coverage of the three remaining combinations. We present a graybox mechanism that extends a general purpose database to cooperate with our XSS scanner, reporting and injecting the test inputs at the boundary between the database and the web application. Furthermore, we design a mechanism for identifying the injected inputs in the web output even after encoding by the web application, and check whether the encoding sanitizes the injected inputs correctly in the respective browser context. We evaluate our approach on eight mature and technologically diverse web applications, discovering previously unknown and exploitable XSS flaws in each of those applications.




io

Specification and Automated Analysis of Inter-Parameter Dependencies in Web APIs. (arXiv:2005.03320v1 [cs.SE])

Web services often impose inter-parameter dependencies that restrict the way in which two or more input parameters can be combined to form valid calls to the service. Unfortunately, current specification languages for web services like the OpenAPI Specification (OAS) provide no support for the formal description of such dependencies, which makes it hardly possible to automatically discover and interact with services without human intervention. In this article, we present an approach for the specification and automated analysis of inter-parameter dependencies in web APIs. We first present a domain-specific language, called Inter-parameter Dependency Language (IDL), for the specification of dependencies among input parameters in web services. Then, we propose a mapping to translate an IDL document into a constraint satisfaction problem (CSP), enabling the automated analysis of IDL specifications using standard CSP-based reasoning operations. Specifically, we present a catalogue of nine analysis operations on IDL documents allowing to compute, for example, whether a given request satisfies all the dependencies of the service. Finally, we present a tool suite including an editor, a parser, an OAS extension, a constraint programming-aided library, and a test suite supporting IDL specifications and their analyses. Together, these contributions pave the way for a new range of specification-driven applications in areas such as code generation and testing.




io

A Review of Computer Vision Methods in Network Security. (arXiv:2005.03318v1 [cs.NI])

Network security has become an area of significant importance more than ever as highlighted by the eye-opening numbers of data breaches, attacks on critical infrastructure, and malware/ransomware/cryptojacker attacks that are reported almost every day. Increasingly, we are relying on networked infrastructure and with the advent of IoT, billions of devices will be connected to the internet, providing attackers with more opportunities to exploit. Traditional machine learning methods have been frequently used in the context of network security. However, such methods are more based on statistical features extracted from sources such as binaries, emails, and packet flows.

On the other hand, recent years witnessed a phenomenal growth in computer vision mainly driven by the advances in the area of convolutional neural networks. At a glance, it is not trivial to see how computer vision methods are related to network security. Nonetheless, there is a significant amount of work that highlighted how methods from computer vision can be applied in network security for detecting attacks or building security solutions. In this paper, we provide a comprehensive survey of such work under three topics; i) phishing attempt detection, ii) malware detection, and iii) traffic anomaly detection. Next, we review a set of such commercial products for which public information is available and explore how computer vision methods are effectively used in those products. Finally, we discuss existing research gaps and future research directions, especially focusing on how network security research community and the industry can leverage the exponential growth of computer vision methods to build much secure networked systems.




io

Boosting Cloud Data Analytics using Multi-Objective Optimization. (arXiv:2005.03314v1 [cs.DB])

Data analytics in the cloud has become an integral part of enterprise businesses. Big data analytics systems, however, still lack the ability to take user performance goals and budgetary constraints for a task, collectively referred to as task objectives, and automatically configure an analytic job to achieve these objectives. This paper presents a data analytics optimizer that can automatically determine a cluster configuration with a suitable number of cores as well as other system parameters that best meet the task objectives. At a core of our work is a principled multi-objective optimization (MOO) approach that computes a Pareto optimal set of job configurations to reveal tradeoffs between different user objectives, recommends a new job configuration that best explores such tradeoffs, and employs novel optimizations to enable such recommendations within a few seconds. We present efficient incremental algorithms based on the notion of a Progressive Frontier for realizing our MOO approach and implement them into a Spark-based prototype. Detailed experiments using benchmark workloads show that our MOO techniques provide a 2-50x speedup over existing MOO methods, while offering good coverage of the Pareto frontier. When compared to Ottertune, a state-of-the-art performance tuning system, our approach recommends configurations that yield 26\%-49\% reduction of running time of the TPCx-BB benchmark while adapting to different application preferences on multiple objectives.




io

Nakdan: Professional Hebrew Diacritizer. (arXiv:2005.03312v1 [cs.CL])

We present a system for automatic diacritization of Hebrew text. The system combines modern neural models with carefully curated declarative linguistic knowledge and comprehensive manually constructed tables and dictionaries. Besides providing state of the art diacritization accuracy, the system also supports an interface for manual editing and correction of the automatic output, and has several features which make it particularly useful for preparation of scientific editions of Hebrew texts. The system supports Modern Hebrew, Rabbinic Hebrew and Poetic Hebrew. The system is freely accessible for all use at this http URL




io

Interval type-2 fuzzy logic system based similarity evaluation for image steganography. (arXiv:2005.03310v1 [cs.MM])

Similarity measure, also called information measure, is a concept used to distinguish different objects. It has been studied from different contexts by employing mathematical, psychological, and fuzzy approaches. Image steganography is the art of hiding secret data into an image in such a way that it cannot be detected by an intruder. In image steganography, hiding secret data in the plain or non-edge regions of the image is significant due to the high similarity and redundancy of the pixels in their neighborhood. However, the similarity measure of the neighboring pixels, i.e., their proximity in color space, is perceptual rather than mathematical. This paper proposes an interval type 2 fuzzy logic system (IT2 FLS) to determine the similarity between the neighboring pixels by involving an instinctive human perception through a rule-based approach. The pixels of the image having high similarity values, calculated using the proposed IT2 FLS similarity measure, are selected for embedding via the least significant bit (LSB) method. We term the proposed procedure of steganography as IT2 FLS LSB method. Moreover, we have developed two more methods, namely, type 1 fuzzy logic system based least significant bits (T1FLS LSB) and Euclidean distance based similarity measures for least significant bit (SM LSB) steganographic methods. Experimental simulations were conducted for a collection of images and quality index metrics, such as PSNR, UQI, and SSIM are used. All the three steganographic methods are applied on datasets and the quality metrics are calculated. The obtained stego images and results are shown and thoroughly compared to determine the efficacy of the IT2 FLS LSB method. Finally, we have done a comparative analysis of the proposed approach with the existing well-known steganographic methods to show the effectiveness of our proposed steganographic method.




io

Safe Data-Driven Distributed Coordination of Intersection Traffic. (arXiv:2005.03304v1 [math.OC])

This work addresses the problem of traffic management at and near an isolated un-signalized intersection for autonomous and networked vehicles through coordinated optimization of their trajectories. We decompose the trajectory of each vehicle into two phases: the provisional phase and the coordinated phase. A vehicle, upon entering the region of interest, initially operates in the provisional phase, in which the vehicle is allowed to optimize its trajectory but is constrained to guarantee in-lane safety and to not enter the intersection. Periodically, all the vehicles in their provisional phase switch to their coordinated phase, which is obtained by coordinated optimization of the schedule of the vehicles' intersection usage as well as their trajectories. For the coordinated phase, we propose a data-driven solution, in which the intersection usage order is obtained through a data-driven online "classification" and the trajectories are computed sequentially. This approach is computationally very efficient and does not compromise much on optimality. Moreover, it also allows for incorporation of "macro" information such as traffic arrival rates into the solution. We also discuss a distributed implementation of this proposed data-driven sequential algorithm. Finally, we compare the proposed algorithm and its two variants against traditional methods of intersection management and against some existing results in the literature by micro-simulations.




io

Knowledge Enhanced Neural Fashion Trend Forecasting. (arXiv:2005.03297v1 [cs.IR])

Fashion trend forecasting is a crucial task for both academia and industry. Although some efforts have been devoted to tackling this challenging task, they only studied limited fashion elements with highly seasonal or simple patterns, which could hardly reveal the real fashion trends. Towards insightful fashion trend forecasting, this work focuses on investigating fine-grained fashion element trends for specific user groups. We first contribute a large-scale fashion trend dataset (FIT) collected from Instagram with extracted time series fashion element records and user information. Further-more, to effectively model the time series data of fashion elements with rather complex patterns, we propose a Knowledge EnhancedRecurrent Network model (KERN) which takes advantage of the capability of deep recurrent neural networks in modeling time-series data. Moreover, it leverages internal and external knowledge in fashion domain that affects the time-series patterns of fashion element trends. Such incorporation of domain knowledge further enhances the deep learning model in capturing the patterns of specific fashion elements and predicting the future trends. Extensive experiments demonstrate that the proposed KERN model can effectively capture the complicated patterns of objective fashion elements, therefore making preferable fashion trend forecast.