Latest io news

Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data. (arXiv:2005.03295v1 [eess.AS])

By arxiv.org

We propose Cotatron, a transcription-guided speech encoder for speaker-independent linguistic representation. Cotatron is based on the multispeaker TTS architecture and can be trained with conventional TTS datasets. We train a voice conversion system to reconstruct speech with Cotatron features, which is similar to the previous methods based on Phonetic Posteriorgram (PPG). By training and evaluating our system with 108 speakers from the VCTK dataset, we outperform the previous method in terms of both naturalness and speaker similarity. Our system can also convert speech from speakers that are unseen during training, and utilize ASR to automate the transcription with minimal reduction of the performance. Audio samples are available at https://mindslab-ai.github.io/cotatron, and the code with a pre-trained model will be made available soon.

Deep Learning based Person Re-identification. (arXiv:2005.03293v1 [cs.CV])

YANG2UML: Bijective Transformation and Simplification of YANG to UML. (arXiv:2005.03292v1 [cs.SE])

On the unique solution of the generalized absolute value equation. (arXiv:2005.03287v1 [math.NA])

Continuous maximal covering location problems with interconnected facilities. (arXiv:2005.03274v1 [math.OC])

RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions. (arXiv:2005.03271v1 [eess.AS])

Data selection for multi-task learning under dynamic constraints. (arXiv:2005.03270v1 [eess.SY])

Online Proximal-ADMM For Time-varying Constrained Convex Optimization. (arXiv:2005.03267v1 [eess.SY])

Adaptive Feature Selection Guided Deep Forest for COVID-19 Classification with Chest CT. (arXiv:2005.03264v1 [eess.IV])

Structured inversion of the Bernstein-Vandermonde Matrix. (arXiv:2005.03251v1 [math.NA])

DFSeer: A Visual Analytics Approach to Facilitate Model Selection for Demand Forecasting. (arXiv:2005.03244v1 [cs.HC])

Enhancing Software Development Process Using Automated Adaptation of Object Ensembles. (arXiv:2005.03241v1 [cs.SE])

Multi-Target Deep Learning for Algal Detection and Classification. (arXiv:2005.03232v1 [cs.CV])

Diagnosis of Coronavirus Disease 2019 (COVID-19) with Structured Latent Multi-View Representation Learning. (arXiv:2005.03227v1 [eess.IV])

Deeply Supervised Active Learning for Finger Bones Segmentation. (arXiv:2005.03225v1 [cs.CV])

End-to-End Domain Adaptive Attention Network for Cross-Domain Person Re-Identification. (arXiv:2005.03222v1 [cs.CV])

Multi-dimensional Avikainen's estimates. (arXiv:2005.03219v1 [math.PR])

Shared Autonomy with Learned Latent Actions. (arXiv:2005.03210v1 [cs.RO])

Hierarchical Attention Network for Action Segmentation. (arXiv:2005.03209v1 [cs.CV])

A Stochastic Geometry Approach to Doppler Characterization in a LEO Satellite Network. (arXiv:2005.03205v1 [cs.IT])

What comprises a good talking-head video generation?: A Survey and Benchmark. (arXiv:2005.03201v1 [cs.CV])

Enabling Cross-chain Transactions: A Decentralized Cryptocurrency Exchange Protocol. (arXiv:2005.03199v1 [cs.CR])

Recognizing Exercises and Counting Repetitions in Real Time. (arXiv:2005.03194v1 [cs.CV])

Distributed Stabilization by Probability Control for Deterministic-Stochastic Large Scale Systems : Dissipativity Approach. (arXiv:2005.03193v1 [eess.SY])

Trains, Games, and Complexity: 0/1/2-Player Motion Planning through Input/Output Gadgets. (arXiv:2005.03192v1 [cs.CC])

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context. (arXiv:2005.03191v1 [eess.AS])

A Dynamical Perspective on Point Cloud Registration. (arXiv:2005.03190v1 [cs.CV])

Evolutionary Multi Objective Optimization Algorithm for Community Detection in Complex Social Networks. (arXiv:2005.03181v1 [cs.NE])

Lattice-based public key encryption with equality test in standard model, revisited. (arXiv:2005.03178v1 [cs.CR])

A Parameterized Perspective on Attacking and Defending Elections. (arXiv:2005.03176v1 [cs.GT])

Fact-based Dialogue Generation with Convergent and Divergent Decoding. (arXiv:2005.03174v1 [cs.CL])

Nonlinear model reduction: a comparison between POD-Galerkin and POD-DEIM methods. (arXiv:2005.03173v1 [physics.comp-ph])

On Optimal Control of Discounted Cost Infinite-Horizon Markov Decision Processes Under Local State Information Structures. (arXiv:2005.03169v1 [eess.SY])

Decentralized Adaptive Control for Collaborative Manipulation of Rigid Bodies. (arXiv:2005.03153v1 [cs.RO])

An augmented Lagrangian preconditioner for implicitly-constituted non-Newtonian incompressible flow. (arXiv:2005.03150v1 [math.NA])

Optimally Convergent Mixed Finite Element Methods for the Stochastic Stokes Equations. (arXiv:2005.03148v1 [math.NA])

A Separation Theorem for Joint Sensor and Actuator Scheduling with Guaranteed Performance Bounds. (arXiv:2005.03143v1 [eess.SY])

A Gentle Introduction to Quantum Computing Algorithms with Applications to Universal Prediction. (arXiv:2005.03137v1 [quant-ph])

Evaluation, Tuning and Interpretation of Neural Networks for Meteorological Applications. (arXiv:2005.03126v1 [physics.ao-ph])

Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting. (arXiv:2005.03119v1 [cs.CL])

Strong replica symmetry in high-dimensional optimal Bayesian inference. (arXiv:2005.03115v1 [math.PR])

Constrained de Bruijn Codes: Properties, Enumeration, Constructions, and Applications. (arXiv:2005.03102v1 [cs.IT])

Scale-Equalizing Pyramid Convolution for Object Detection. (arXiv:2005.03101v1 [cs.CV])

Optimal Location of Cellular Base Station via Convex Optimization. (arXiv:2005.03099v1 [cs.IT])

Inference with Choice Functions Made Practical. (arXiv:2005.03098v1 [cs.AI])

Heterogeneous Facility Location Games. (arXiv:2005.03095v1 [cs.GT])

AIOps for a Cloud Object Storage Service. (arXiv:2005.03094v1 [cs.DC])

Eliminating NB-IoT Interference to LTE System: a Sparse Machine Learning Based Approach. (arXiv:2005.03092v1 [cs.IT])

Robust Trajectory and Transmit Power Optimization for Secure UAV-Enabled Cognitive Radio Networks. (arXiv:2005.03091v1 [cs.IT])

A Multifactorial Optimization Paradigm for Linkage Tree Genetic Algorithm. (arXiv:2005.03090v1 [cs.NE])

The Finish Line: EPS Vs. Polyisocyanurate Insulation

The Finish Line: EIFS Inspection

The Finish Line: Right Solutions for the Right Problems

Will Synthetic Biology Save the World?

Anti-LEED Legislation

The Greenest Low Slope Roofing Solution

Only 12 per cent of leading charities publicly recognise a trade union, analysis suggests

Next chair of the National Lottery Community Fund revealed

SIA Releases New Version of OSDP Standard

Panasonic's Security Solutions Start With Energy-Efficient Products

Incomplete information can fuel misjudgment: study

FHWA rule updates protections for workers and drivers in work zones

The First Sealer to Give a Beautiful, Luxurious Appearance

Bellingham: A New Wool Collection from Karastan

Metallika blends beauty and function
