
Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and reward feedback. Our main theoretical result provides an explicit bound on the sample or evaluation complexity: we show that these methods are guaranteed to converge to within any pre-specified tolerance of the optimal policy with a number of zero-order evaluations that is an explicit polynomial of the error tolerance, dimension, and curvature properties of the problem. Our analysis reveals some interesting differences between the settings of additive driving noise and random initialization, as well as the settings of one-point and two-point reward feedback. Our theory is corroborated by simulations of derivative-free methods in application to these systems. Along the way, we derive convergence rates for stochastic zero-order optimization algorithms when applied to a certain class of non-convex problems.
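As a concrete illustration of the zero-order evaluations discussed above, the standard two-point gradient estimator perturbs the policy in a random direction and differences the two observed costs. The sketch below is generic background under assumed names (J for the cost oracle, K for the policy matrix, r for the smoothing radius), not the paper's exact estimator.

    import numpy as np

    def two_point_gradient(J, K, r=0.05, rng=None):
        """Estimate the gradient of a policy cost J at policy matrix K
        from one pair of zero-order (function-value only) evaluations."""
        rng = np.random.default_rng() if rng is None else rng
        d = K.size
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)            # random direction on the unit sphere
        # Two-point reward feedback: paired evaluations at K + rU and K - rU.
        return (d / (2.0 * r)) * (J(K + r * U) - J(K - r * U)) * U

A gradient step K -= eta * two_point_gradient(J, K) then uses only cost evaluations; the one-point variant replaces the paired difference with a single evaluation and has higher variance, which is the distinction the abstract draws between the two feedback settings.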





Generalized Optimal Matching Methods for Causal Inference

We develop an encompassing framework for matching, covariate balancing, and doubly-robust methods for causal inference from observational data called generalized optimal matching (GOM). The framework is given by generalizing a new functional-analytical formulation of optimal matching, giving rise to the class of GOM methods, for which we provide a single unified theory to analyze tractability and consistency. Many commonly used existing methods are included in GOM and, using their GOM interpretation, can be extended to optimally and automatically trade off balance for variance and outperform their standard counterparts. As a subclass, GOM gives rise to kernel optimal matching (KOM), which, as supported by new theoretical and empirical results, is notable for combining many of the positive properties of other methods in one. KOM, which is solved as a linearly-constrained convex-quadratic optimization problem, inherits both the interpretability and model-free consistency of matching but can also achieve the $\sqrt{n}$-consistency of well-specified regression and the bias reduction and robustness of doubly robust methods. In settings of limited overlap, KOM enables a very transparent method for interval estimation for partial identification and robust coverage. We demonstrate this in examples with both synthetic and real data.
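To make the optimization concrete, a generic kernel-balancing problem of the linearly-constrained convex-quadratic form mentioned above can be sketched as follows. This is illustrative background, not necessarily the authors' exact KOM objective; K_cc (control-control Gram matrix) and K_ct (control-treated Gram matrix) are assumed inputs.

    import numpy as np
    from scipy.optimize import minimize

    def kernel_balancing_weights(K_cc, K_ct):
        """Weights on control units minimizing RKHS imbalance to the treated group:
        minimize w' K_cc w - 2 w' (K_ct 1/n_t)  subject to  w >= 0, sum(w) = 1."""
        n_c, n_t = K_ct.shape
        b = K_ct.mean(axis=1)                 # average kernel similarity to treated units
        obj = lambda w: w @ K_cc @ w - 2.0 * w @ b
        grad = lambda w: 2.0 * (K_cc @ w - b)
        w0 = np.full(n_c, 1.0 / n_c)
        res = minimize(obj, w0, jac=grad, method="SLSQP",
                       bounds=[(0.0, None)] * n_c,
                       constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
        return res.x

The resulting weights play the role of a generalized matching, and the quadratic objective is what makes it possible to trade balance off against variance, for example by adding a penalty on the size of the weights.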





(1 + epsilon)-class Classification: an Anomaly Detection Method for Highly Imbalanced or Incomplete Data Sets

Anomaly detection is a difficult problem because the distribution of anomalous samples is unknown a priori. We explore a novel method that offers a trade-off between one-class and two-class approaches and leads to better performance on anomaly detection problems with small or non-representative anomalous samples. The method is evaluated on several data sets and compared to a set of conventional one-class and two-class approaches.





Multivariate normal approximation of the maximum likelihood estimator via the delta method

Andreas Anastasiou, Robert E. Gaunt.

Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 1, 136--149.

Abstract:
We use the delta method and Stein’s method to derive, under regularity conditions, explicit upper bounds for the distributional distance between the distribution of the maximum likelihood estimator (MLE) of a $d$-dimensional parameter and its asymptotic multivariate normal distribution. Our bounds apply in situations in which the MLE can be written as a function of a sum of i.i.d. $t$-dimensional random vectors. We apply our general bound to establish a bound for the multivariate normal approximation of the MLE of the normal distribution with unknown mean and variance.
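For context, the qualitative limit behind these quantitative bounds is the classical multivariate delta method; in standard notation (background only, not a result of the paper),

    $\sqrt{n}\,(\hat{\theta}_n - \theta) \xrightarrow{d} N_d(0,\Sigma) \quad\Longrightarrow\quad \sqrt{n}\,\bigl(g(\hat{\theta}_n) - g(\theta)\bigr) \xrightarrow{d} N_m\bigl(0,\,\nabla g(\theta)\,\Sigma\,\nabla g(\theta)^{\top}\bigr)$

for a differentiable map $g$. The paper's contribution is to make the distance to such normal limits explicit for the MLE, with constants obtained via Stein's method.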





An estimation method for latent traits and population parameters in Nominal Response Model

Caio L. N. Azevedo, Dalton F. Andrade

Source: Braz. J. Probab. Stat., Volume 24, Number 3, 415--433.

Abstract:
The nominal response model (NRM) was proposed by Bock [Psychometrika 37 (1972) 29–51] to improve latent trait (ability) estimation in multiple choice tests with nominal items. When the item parameters are known, expectation a posteriori or maximum a posteriori methods are commonly employed to estimate the latent traits, using a standard symmetric normal distribution as the latent trait prior density. However, when this item set is presented to a new group of examinees, it is necessary to estimate not only their latent traits but also the population parameters of this group. This article has two main purposes: first, to develop a Markov chain Monte Carlo algorithm that estimates latent traits and population parameters concurrently; the algorithm builds on the Metropolis–Hastings within Gibbs sampling algorithm (MHWGS) proposed by Patz and Junker [Journal of Educational and Behavioral Statistics 24 (1999b) 346–366]. Second, to compare the performance of this method in recovering latent traits with three other methods: maximum likelihood, expectation a posteriori and maximum a posteriori. The comparisons were performed by varying the total number of items (NI), the number of categories and the values of the mean and the variance of the latent trait distribution. The results show that MHWGS outperforms the other methods in latent trait estimation and also recovers the population parameters properly. Furthermore, we found that NI accounts for the highest percentage of the variability in the accuracy of latent trait estimation.





Flexible, boundary adapted, nonparametric methods for the estimation of univariate piecewise-smooth functions

Umberto Amato, Anestis Antoniadis, Italia De Feis.

Source: Statistics Surveys, Volume 14, 32--70.

Abstract:
We present and compare some nonparametric estimation methods (wavelet- and/or spline-based) designed to recover a one-dimensional piecewise-smooth regression function in both fixed (equidistant or non-equidistant) design and random design regression models. Wavelet methods are known to be very competitive in terms of denoising and compression, owing to the simultaneous localization of a function in time and frequency. However, boundary assumptions, such as periodicity or symmetry, generate bias and artificial wiggles that degrade overall accuracy. Simple methods have been proposed in the literature for reducing the bias at the boundaries. We introduce new ones based on adaptive combinations of two estimators. The underlying idea is to combine a method that is highly accurate for non-regular functions, e.g., wavelets, with one that is well behaved at the boundaries, e.g., splines or local polynomials. We provide some asymptotically optimal results supporting our approach. All the methods can handle data with a random design. We also sketch a generalization to the multidimensional setting. To study the performance of the proposed approaches we have conducted an extensive set of simulations on synthetic data. A regression analysis of two real data applications using these procedures demonstrates their effectiveness.





An approximate likelihood perspective on ABC methods

George Karabatsos, Fabrizio Leisen.

Source: Statistics Surveys, Volume 12, 66--104.

Abstract:
We are living in the big data era, as current technologies and networks allow for the easy and routine collection of data sets in different disciplines. Bayesian statistics offers a flexible modeling approach that is attractive for describing the complexity of these datasets. These models often exhibit a likelihood function that is intractable due to the large sample size, high number of parameters, or functional complexity. Approximate Bayesian Computation (ABC) methods provide likelihood-free approaches for performing statistical inference with Bayesian models defined by intractable likelihood functions. The vastness of the literature on ABC methods has created a need to review and relate all ABC approaches so that scientists can more readily understand and apply them in their own work. This article provides a unifying review, general representation, and classification of all ABC methods from the viewpoint of approximate likelihood theory. This clarifies how ABC methods can be characterized, related, combined, improved, and applied for future research. Possible future research directions in ABC are then outlined.





Variable selection methods for model-based clustering

Michael Fop, Thomas Brendan Murphy.

Source: Statistics Surveys, Volume 12, 18--65.

Abstract:
Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to deal with the increasing dimensionality. In particular, the development of variable selection techniques has received a lot of attention and research effort in recent years. Even for small size problems, variable selection has been advocated to facilitate the interpretation of the clustering results. This review provides a summary of the methods developed for variable selection in model-based clustering. Existing R packages implementing the different methods are indicated and illustrated in application to two data analysis examples.





A survey of bootstrap methods in finite population sampling

Zeinab Mashreghi, David Haziza, Christian Léger.

Source: Statistics Surveys, Volume 10, 1--52.

Abstract:
We review bootstrap methods in the context of survey data where the effect of the sampling design on the variability of estimators has to be taken into account. We present the methods in a unified way by classifying them into three classes: pseudo-population, direct, and survey weights methods. We cover variance estimation and the construction of confidence intervals for stratified simple random sampling as well as some unequal probability sampling designs. We also address the problem of variance estimation in the presence of imputation to compensate for item non-response.





Some models and methods for the analysis of observational data

José A. Ferreira.

Source: Statistics Surveys, Volume 9, 106--208.

Abstract:
This article provides a concise and essentially self-contained exposition of some of the most important models and non-parametric methods for the analysis of observational data, and a substantial number of illustrations of their application. Although for the most part our presentation follows P. Rosenbaum’s book, “Observational Studies”, and naturally draws on related literature, it contains original elements and simplifies and generalizes some basic results. The illustrations, based on simulated data, show the methods at work in some detail, highlighting pitfalls and emphasizing certain subjective aspects of the statistical analyses.





Errata: A survey of Bayesian predictive methods for model assessment, selection and comparison

Aki Vehtari, Janne Ojanen.

Source: Statistics Surveys, Volume 8, 1--1.

Abstract:
Errata for “A survey of Bayesian predictive methods for model assessment, selection and comparison” by A. Vehtari and J. Ojanen, Statistics Surveys 6 (2012), 142–228. doi:10.1214/12-SS102.





A survey of Bayesian predictive methods for model assessment, selection and comparison

Aki Vehtari, Janne Ojanen

Source: Statist. Surv., Volume 6, 142--228.

Abstract:
To date, several methods exist in the statistical literature for model assessment, which purport themselves specifically as Bayesian predictive methods. The decision theoretic assumptions on which these methods are based are not always clearly stated in the original articles, however. The aim of this survey is to provide a unified review of Bayesian predictive model assessment and selection methods, and of methods closely related to them. We review the various assumptions that are made in this context and discuss the connections between different approaches, with an emphasis on how each method approximates the expected utility of using a Bayesian model for the purpose of predicting future data.





The theory and application of penalized methods or Reproducing Kernel Hilbert Spaces made easy

Nancy Heckman

Source: Statist. Surv., Volume 6, 113--141.

Abstract:
The popular cubic smoothing spline estimate of a regression function arises as the minimizer of the penalized sum of squares $\sum_{j}(Y_{j}-\mu(t_{j}))^{2}+\lambda\int_{a}^{b}[\mu''(t)]^{2}\,dt$, where the data are $(t_{j},Y_{j})$, $j=1,\ldots,n$. The minimization is taken over an infinite-dimensional function space, the space of all functions with square integrable second derivatives. But the calculations can be carried out in a finite-dimensional space. The reduction from minimizing over an infinite dimensional space to minimizing over a finite dimensional space occurs for more general objective functions: the data may be related to the function $\mu$ in another way, the sum of squares may be replaced by a more suitable expression, or the penalty, $\int_{a}^{b}[\mu''(t)]^{2}\,dt$, might take a different form. This paper reviews the Reproducing Kernel Hilbert Space structure that provides a finite-dimensional solution for a general minimization problem. Particular attention is paid to the construction and study of the Reproducing Kernel Hilbert Space corresponding to a penalty based on a linear differential operator. In this case, one can often calculate the minimizer explicitly, using Green’s functions.
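As standard background on the finite-dimensional reduction described above, the representer-theorem form of the minimizer can be written (in generic notation, not the paper's) as

    $\hat{\mu}(t) = \sum_{k=1}^{m} d_k\,\phi_k(t) + \sum_{j=1}^{n} c_j\,K(t,t_j),$

where $K$ is the reproducing kernel determined by the penalty, the $\phi_k$ span the penalty's null space (constants and linear functions for the cubic-spline penalty), and the $n+m$ coefficients are obtained by solving a finite linear system. This is the reduction to a finite-dimensional computation that the paper reviews.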





Data confidentiality: A review of methods for statistical disclosure limitation and methods for assessing privacy

Gregory J. Matthews, Ofer Harel

Source: Statist. Surv., Volume 5, 1--29.

Abstract:
There is an ever increasing demand from researchers for access to useful microdata files. However, there are also growing concerns regarding the privacy of the individuals contained in the microdata. Ideally, microdata could be released in such a way that a balance between usefulness of the data and privacy is struck. This paper presents a review of proposed methods of statistical disclosure control and techniques for assessing the privacy of such methods under different definitions of disclosure.






Generating Thermal Image Data Samples using 3D Facial Modelling Techniques and Deep Learning Methodologies. (arXiv:2005.01923v2 [cs.CV] UPDATED)

Methods for generating synthetic data have become increasingly important for building the large datasets required by Convolutional Neural Network (CNN) based deep learning techniques across a wide range of computer vision applications. In this work, we extend existing methodologies to show how 2D thermal facial data can be mapped to 3D facial models. For the proposed research we use the Tufts dataset to generate varying 3D face poses from a single frontal face pose. The system first refines image quality by performing fusion-based image preprocessing operations. The refined outputs have better contrast, lower noise levels and better exposure of dark regions, which makes the facial landmarks and temperature patterns on the human face more discernible and visible than in the original raw data. Several image quality metrics are used to compare the refined images with the originals. In the next phase of the study, the refined images are used to create 3D facial geometry structures with a CNN. The generated outputs are then imported into the Blender software to extract the 3D thermal facial models of both males and females. The same technique is also applied to our thermal face data, acquired with a prototype thermal camera (developed under the Heliaus EU project) in an indoor lab environment, to generate synthetic 3D face data with varying yaw angles, and finally a facial depth map is produced.





Reference and Document Aware Semantic Evaluation Methods for Korean Language Summarization. (arXiv:2005.03510v1 [cs.CL])

Text summarization refers to the process that generates a shorter form of text from the source document while preserving salient information. Recently, many models for text summarization have been proposed. Most of those models were evaluated using Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores. However, as ROUGE scores are computed based on n-gram overlap, they do not reflect semantic meaning correspondences between generated and reference summaries. Because Korean is an agglutinative language that combines various morphemes into a word that expresses several meanings, ROUGE is not suitable for Korean summarization. In this paper, we propose evaluation metrics that reflect the semantic meanings of a reference summary and the original document, the Reference and Document Aware Semantic Score (RDASS). We then propose a method for improving the correlation of the metrics with human judgment. Evaluation results show that the correlation with human judgment is significantly higher for our evaluation metrics than for ROUGE scores.





Feature Selection Methods for Uplift Modeling. (arXiv:2005.03447v1 [cs.LG])

Uplift modeling is a predictive modeling technique that estimates the user-level incremental effect of a treatment using machine learning models. It is often used for targeting promotions and advertisements, as well as for the personalization of product offerings. In these applications, there are often hundreds of features available to build such models. Keeping all the features in a model can be costly and inefficient. Feature selection is an essential step in the modeling process for multiple reasons: improving the estimation accuracy by eliminating irrelevant features, accelerating model training and prediction speed, reducing the monitoring and maintenance workload for the feature data pipeline, and providing better model interpretation and diagnostics capability. However, feature selection methods for uplift modeling have rarely been discussed in the literature. Although there are various feature selection methods for standard machine learning models, we demonstrate that those methods are sub-optimal for solving the feature selection problem for uplift modeling. To address this problem, we introduce a set of feature selection methods designed specifically for uplift modeling, including both filter methods and embedded methods. To evaluate the effectiveness of the proposed feature selection methods, we use different uplift models and measure the accuracy of each model with a different number of selected features. We use both synthetic and real data to conduct these experiments. We have also implemented the proposed filter methods in an open-source Python package (CausalML).
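As a toy illustration of a filter-style criterion (an assumed, generic sketch, not the scoring implemented in CausalML), one can bin a feature and score it by how strongly the treatment-control outcome gap varies across bins:

    import numpy as np

    def uplift_filter_score(x, y, treated, n_bins=10):
        """Score one feature x by the weighted spread of bin-level uplift.
        y is the outcome, treated a boolean indicator of treatment."""
        edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
        idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
        overall = y[treated].mean() - y[~treated].mean()
        score = 0.0
        for b in range(n_bins):
            m = idx == b
            if m.sum() == 0 or treated[m].all() or (~treated[m]).all():
                continue                      # need both groups in the bin
            uplift_b = y[m & treated].mean() - y[m & ~treated].mean()
            score += m.mean() * (uplift_b - overall) ** 2
        return score

Features whose bin-level uplift departs most from the overall uplift receive high scores; embedded methods would instead rank features inside an uplift model itself.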





The neuroethology of birdsong

9783030346836 (electronic bk.)





Structured object-oriented formal language and method : 9th International Workshop, SOFL+MSVL 2019, Shenzhen, China, November 5, 2019, Revised selected papers

SOFL+MSVL (Workshop) (9th : 2019 : Shenzhen, China)
9783030414184 (electronic bk.)





Spectral and matrix factorization methods for consistent community detection in multi-layer networks

Subhadeep Paul, Yuguo Chen.

Source: The Annals of Statistics, Volume 48, Number 1, 230--250.

Abstract:
We consider the problem of estimating a consensus community structure by combining information from multiple layers of a multi-layer network using methods based on spectral clustering or low-rank matrix factorization. As a general theme, these “intermediate fusion” methods involve obtaining a low column rank matrix by optimizing an objective function and then using the columns of the matrix for clustering. However, the theoretical properties of these methods remain largely unexplored. In the absence of statistical guarantees on the objective functions, it is difficult to determine if the algorithms optimizing the objectives will return good community structures. We investigate the consistency properties of the global optimizer of some of these objective functions under the multi-layer stochastic blockmodel. For this purpose, we derive several new asymptotic results showing consistency of the intermediate fusion techniques, along with spectral clustering of the mean adjacency matrix, in a high-dimensional setup where the number of nodes, the number of layers and the number of communities of the multi-layer graph grow. Our numerical study shows that the intermediate fusion techniques outperform late fusion methods, namely spectral clustering of the aggregate spectral kernel and of the module allegiance matrix, in sparse networks, while they outperform spectral clustering of the mean adjacency matrix in multi-layer networks that contain layers with both homophilic and heterophilic communities.
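For reference, spectral clustering of the mean adjacency matrix mentioned above can be sketched in a few lines; this is generic illustrative code with assumed inputs (adjacency_list, a list of per-layer adjacency matrices, and k, the number of communities), not the authors' implementation.

    import numpy as np
    from scipy.sparse.linalg import eigsh
    from sklearn.cluster import KMeans

    def mean_adjacency_spectral(adjacency_list, k, seed=0):
        """Cluster nodes by spectral clustering of the layer-averaged adjacency."""
        A_bar = sum(adjacency_list) / len(adjacency_list)   # mean over layers
        vals, vecs = eigsh(A_bar, k=k, which="LA")          # leading k eigenvectors
        return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(vecs)

The intermediate fusion methods differ in that the low column rank matrix fed to the clustering step is obtained by optimizing a joint objective over the layers rather than by simple averaging.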





Negative association, ordering and convergence of resampling methods

Mathieu Gerber, Nicolas Chopin, Nick Whiteley.

Source: The Annals of Statistics, Volume 47, Number 4, 2236--2260.

Abstract:
We study convergence and convergence rates for resampling schemes. Our first main result is a general consistency theorem based on the notion of negative association, which is applied to establish the almost sure weak convergence of measures output from Kitagawa’s [J. Comput. Graph. Statist. 5 (1996) 1–25] stratified resampling method. Carpenter, Clifford and Fearnhead’s [IEE Proc. Radar Sonar Navig. 146 (1999) 2–7] systematic resampling method is similar in structure but can fail to converge depending on the order of the input samples. We introduce a new resampling algorithm based on a stochastic rounding technique of [In 42nd IEEE Symposium on Foundations of Computer Science (Las Vegas, NV, 2001) (2001) 588–597, IEEE Computer Soc.], which shares some attractive properties of systematic resampling, but which exhibits negative association and, therefore, converges irrespective of the order of the input samples. We confirm a conjecture made by [J. Comput. Graph. Statist. 5 (1996) 1–25] that ordering input samples by their states in $\mathbb{R}$ yields a faster rate of convergence; we establish that when particles are ordered using the Hilbert curve in $\mathbb{R}^{d}$, the variance of the resampling error is $\mathcal{O}(N^{-(1+1/d)})$ under mild conditions, where $N$ is the number of particles. We use these results to establish asymptotic properties of particle algorithms based on resampling schemes that differ from multinomial resampling.
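For concreteness, the stratified and systematic schemes discussed above can be written as follows (a standard textbook sketch, assuming normalized weights, not the authors' code):

    import numpy as np

    def stratified_resample(weights, rng=None):
        """One uniform per stratum: positions (i + U_i) / N."""
        rng = np.random.default_rng() if rng is None else rng
        N = len(weights)
        positions = (np.arange(N) + rng.uniform(size=N)) / N
        return np.searchsorted(np.cumsum(weights), positions)

    def systematic_resample(weights, rng=None):
        """A single shared uniform offset: positions (i + U) / N."""
        rng = np.random.default_rng() if rng is None else rng
        N = len(weights)
        positions = (np.arange(N) + rng.uniform()) / N
        return np.searchsorted(np.cumsum(weights), positions)

Both return ancestor indices; the only structural difference is whether each stratum gets its own uniform. As the abstract notes, the stratified scheme can be analyzed via negative association, while the single shared uniform in systematic resampling makes convergence depend on the order of the input samples.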





Spectral method and regularized MLE are both optimal for top-$K$ ranking

Yuxin Chen, Jianqing Fan, Cong Ma, Kaizheng Wang.

Source: The Annals of Statistics, Volume 47, Number 4, 2204--2235.

Abstract:
This paper is concerned with the problem of top-$K$ ranking from pairwise comparisons. Given a collection of $n$ items and a few pairwise comparisons across them, one wishes to identify the set of $K$ items that receive the highest ranks. To tackle this problem, we adopt the logistic parametric model—the Bradley–Terry–Luce model, where each item is assigned a latent preference score, and where the outcome of each pairwise comparison depends solely on the relative scores of the two items involved. Recent works have made significant progress toward characterizing the performance (e.g., the mean square error for estimating the scores) of several classical methods, including the spectral method and the maximum likelihood estimator (MLE). However, where they stand regarding top-$K$ ranking remains unsettled. We demonstrate that under a natural random sampling model, the spectral method alone, or the regularized MLE alone, is minimax optimal in terms of the sample complexity—the number of paired comparisons needed to ensure exact top-$K$ identification, for the fixed dynamic range regime. This is accomplished via optimal control of the entrywise error of the score estimates. We complement our theoretical studies by numerical experiments, confirming that both methods yield low entrywise errors for estimating the underlying scores. Our theory is established via a novel leave-one-out trick, which proves effective for analyzing both iterative and noniterative procedures. Along the way, we derive an elementary eigenvector perturbation bound for probability transition matrices, which parallels the Davis–Kahan $\sin\Theta$ theorem for symmetric matrices. This also allows us to close the gap between the $\ell_{2}$ error upper bound for the spectral method and the minimax lower limit.





Estimating causal effects in studies of human brain function: New models, methods and estimands

Michael E. Sobel, Martin A. Lindquist.

Source: The Annals of Applied Statistics, Volume 14, Number 1, 452--472.

Abstract:
Neuroscientists often use functional magnetic resonance imaging (fMRI) to infer effects of treatments on neural activity in brain regions. In a typical fMRI experiment, each subject is observed at several hundred time points. At each point, the blood oxygenation level dependent (BOLD) response is measured at 100,000 or more locations (voxels). Typically, these responses are modeled treating each voxel separately, and no rationale for interpreting associations as effects is given. Building on Sobel and Lindquist ( J. Amer. Statist. Assoc. 109 (2014) 967–976), who used potential outcomes to define unit and average effects at each voxel and time point, we define and estimate both “point” and “cumulated” effects for brain regions. Second, we construct a multisubject, multivoxel, multirun whole brain causal model with explicit parameters for regions. We justify estimation using BOLD responses averaged over voxels within regions, making feasible estimation for all regions simultaneously, thereby also facilitating inferences about association between effects in different regions. We apply the model to a study of pain, finding effects in standard pain regions. We also observe more cerebellar activity than observed in previous studies using prevailing methods.





A comparison of principal component methods between multiple phenotype regression and multiple SNP regression in genetic association studies

Zhonghua Liu, Ian Barnett, Xihong Lin.

Source: The Annals of Applied Statistics, Volume 14, Number 1, 433--451.

Abstract:
Principal component analysis (PCA) is a popular method for dimension reduction in unsupervised multivariate analysis. However, existing ad hoc uses of PCA in both multivariate regression (multiple outcomes) and multiple regression (multiple predictors) lack theoretical justification. The differences in the statistical properties of PCAs in these two regression settings are not well understood. In this paper we provide theoretical results on the power of PCA in genetic association testing in both multiple phenotype and SNP-set settings. The multiple phenotype setting refers to the case when one is interested in studying the association between a single SNP and multiple phenotypes as outcomes. The SNP-set setting refers to the case when one is interested in studying the association between multiple SNPs in a SNP set and a single phenotype as the outcome. We demonstrate analytically that the properties of the PC-based analysis in these two regression settings are substantially different. We show that the lower-order PCs, that is, PCs with large eigenvalues, are generally preferred and lead to a higher power in the SNP-set setting, while the higher-order PCs, that is, PCs with small eigenvalues, are generally preferred in the multiple phenotype setting. We also investigate the power of three other popular statistical methods, the Wald test, the variance component test and the minimum $p$-value test, in both multiple phenotype and SNP-set settings. We use theoretical power, simulation studies, and two real data analyses to validate our findings.





Scalable high-resolution forecasting of sparse spatiotemporal events with kernel methods: A winning solution to the NIJ “Real-Time Crime Forecasting Challenge”

Seth Flaxman, Michael Chirico, Pau Pereira, Charles Loeffler.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2564--2585.

Abstract:
We propose a generic spatiotemporal event forecasting method which we developed for the National Institute of Justice’s (NIJ) Real-Time Crime Forecasting Challenge (National Institute of Justice (2017)). Our method is a spatiotemporal forecasting model combining scalable randomized Reproducing Kernel Hilbert Space (RKHS) methods for approximating Gaussian processes with autoregressive smoothing kernels in a regularized supervised learning framework. While the smoothing kernels capture the two main approaches in current use in the field of crime forecasting, kernel density estimation (KDE) and self-exciting point process (SEPP) models, the RKHS component of the model can be understood as an approximation to the popular log-Gaussian Cox Process model. For inference, we discretize the spatiotemporal point pattern and learn a log-intensity function using the Poisson likelihood and highly efficient gradient-based optimization methods. Model hyperparameters including quality of RKHS approximation, spatial and temporal kernel lengthscales, number of autoregressive lags and bandwidths for smoothing kernels as well as cell shape, size and rotation, were learned using cross validation. Resulting predictions significantly exceeded baseline KDE estimates and SEPP models for sparse events.
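A hedged miniature of the modeling idea (a random-feature approximation to the RKHS component combined with a Poisson likelihood) is sketched below; it is not the authors' full pipeline, and the design matrix X (cell-level covariates such as coordinates and lagged smoothed counts), the lengthscale, and the regularization strength are assumptions.

    import numpy as np
    from sklearn.linear_model import PoissonRegressor

    def random_fourier_features(X, n_features=200, lengthscale=1.0, seed=0):
        """Approximate an RBF-kernel feature map with random Fourier features."""
        rng = np.random.default_rng(seed)
        W = rng.standard_normal((X.shape[1], n_features)) / lengthscale
        b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    # Z = random_fourier_features(X)
    # model = PoissonRegressor(alpha=1.0).fit(Z, y)   # regularized log-intensity fit

Hyperparameters such as the number of random features, the lengthscales and the penalty would then be chosen by cross validation, mirroring the tuning strategy described in the abstract.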





Bayesian methods for multiple mediators: Relating principal stratification and causal mediation in the analysis of power plant emission controls

Chanmin Kim, Michael J. Daniels, Joseph W. Hogan, Christine Choirat, Corwin M. Zigler.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1927--1956.

Abstract:
Emission control technologies installed on power plants are a key feature of many air pollution regulations in the US. While such regulations are predicated on the presumed relationships between emissions, ambient air pollution and human health, many of these relationships have never been empirically verified. The goal of this paper is to develop new statistical methods to quantify these relationships. We frame this problem as one of mediation analysis to evaluate the extent to which the effect of a particular control technology on ambient pollution is mediated through causal effects on power plant emissions. Since power plants emit various compounds that contribute to ambient pollution, we develop new methods for multiple intermediate variables that are measured contemporaneously, may interact with one another, and may exhibit joint mediating effects. Specifically, we propose new methods leveraging two related frameworks for causal inference in the presence of mediating variables: principal stratification and causal mediation analysis. We define principal effects based on multiple mediators, and also introduce a new decomposition of the total effect of an intervention on ambient pollution into the natural direct effect and natural indirect effects for all combinations of mediators. Both approaches are anchored to the same observed-data models, which we specify with Bayesian nonparametric techniques. We provide assumptions for estimating principal causal effects, then augment these with an additional assumption required for causal mediation analysis. The two analyses, interpreted in tandem, provide the first empirical investigation of the presumed causal pathways that motivate important air quality regulatory policies.





Spatio-temporal short-term wind forecast: A calibrated regime-switching method

Ahmed Aziz Ezzat, Mikyoung Jun, Yu Ding.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1484--1510.

Abstract:
Accurate short-term forecasts are indispensable for the integration of wind energy in power grids. On a wind farm, local wind conditions exhibit sizeable variations at a fine temporal resolution. Existing statistical models may capture the in-sample variations in wind behavior, but often fail to anticipate those occurring in the near future, that is, over the forecast horizon. The calibrated regime-switching method proposed in this paper introduces a regime-dependent calibration of the predictand (here the wind speed variable), which helps correct the bias resulting from out-of-sample variations in wind behavior. This is achieved by modeling the calibration as a function of two elements: the wind regime at the time of the forecast (so the calibration is regime dependent), and the runlength, which is the time elapsed since the last observed regime change. In addition to regime-switching dynamics, the proposed model also accounts for other features of wind fields: spatio-temporal dependencies, the transport effect of wind and nonstationarity. Using one year of turbine-specific wind data, we show that the calibrated regime-switching method can offer a wide margin of improvement over existing forecasting methods in terms of both wind speed and power.





First-order covariance inequalities via Stein’s method

Marie Ernst, Gesine Reinert, Yvik Swan.

Source: Bernoulli, Volume 26, Number 3, 2051--2081.

Abstract:
We propose probabilistic representations for inverse Stein operators (i.e., solutions to Stein equations) under general conditions; in particular, we deduce new simple expressions for the Stein kernel. These representations allow us to deduce uniform and nonuniform Stein factors (i.e., bounds on solutions to Stein equations) and lead to new covariance identities expressing the covariance between arbitrary functionals of an arbitrary univariate target in terms of a weighted covariance of the derivatives of the functionals. Our weights are explicit, easily computable in most cases and expressed in terms of objects familiar within the context of Stein’s method. Applications of the Cauchy–Schwarz inequality to these weighted covariance identities lead to sharp upper and lower covariance bounds and, in particular, weighted Poincaré inequalities. Many examples are given and, in particular, classical variance bounds due to Klaassen, Brascamp and Lieb or Otto and Menz are corollaries. Connections with more recent literature are also detailed.
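For background, the classical univariate Stein kernel of a density $p$ with mean $\mu$, and the covariance identity it satisfies for sufficiently smooth $g$, are (standard facts that the weighted identities above generalize)

    $\tau_p(x) = \frac{1}{p(x)} \int_x^{\infty} (y-\mu)\,p(y)\,dy, \qquad \mathrm{Cov}\bigl(X, g(X)\bigr) = \mathbb{E}\bigl[\tau_p(X)\,g'(X)\bigr].$

Taking $g(x)=x$ recovers $\mathbb{E}[\tau_p(X)] = \mathrm{Var}(X)$, and applying the Cauchy–Schwarz inequality to such identities is what produces the covariance bounds and weighted Poincaré inequalities mentioned above.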





A new method for obtaining sharp compound Poisson approximation error estimates for sums of locally dependent random variables

Michael V. Boutsikas, Eutichia Vaggelatou

Source: Bernoulli, Volume 16, Number 2, 301--330.

Abstract:
Let $X_1, X_2, \ldots, X_n$ be a sequence of independent or locally dependent random variables taking values in $\mathbb{Z}_+$. In this paper, we derive sharp bounds, via a new probabilistic method, for the total variation distance between the distribution of the sum $\sum_{i=1}^{n} X_i$ and an appropriate Poisson or compound Poisson distribution. These bounds include a factor which depends on the smoothness of the approximating Poisson or compound Poisson distribution. This “smoothness factor” is of order $O(\sigma^{-2})$, according to a heuristic argument, where $\sigma^{2}$ denotes the variance of the approximating distribution. In this way, we offer sharp error estimates for a large range of values of the parameters. Finally, specific examples concerning appearances of rare runs in sequences of Bernoulli trials are presented by way of illustration.





A Loss-Based Prior for Variable Selection in Linear Regression Methods

Cristiano Villa, Jeong Eun Lee.

Source: Bayesian Analysis, Volume 15, Number 2, 533--558.

Abstract:
In this work we propose a novel model prior for variable selection in linear regression. The idea is to determine the prior mass by considering the worth of each of the regression models, given the number of possible covariates under consideration. The worth of a model consists of the information loss and the loss due to model complexity. While the information loss is determined objectively, the loss expression due to model complexity is flexible, and the penalty on model size can even be customized to include prior knowledge. Some versions of the loss-based prior are proposed and compared empirically. Through simulation studies and real data analyses, we compare the proposed prior with the Scott and Berger prior, for noninformative scenarios, and with the Beta-Binomial prior, for informative scenarios.





A Bayesian Conjugate Gradient Method (with Discussion)

Jon Cockayne, Chris J. Oates, Ilse C.F. Ipsen, Mark Girolami.

Source: Bayesian Analysis, Volume 14, Number 3, 937--1012.

Abstract:
A fundamental task in numerical computation is the solution of large linear systems. The conjugate gradient method is an iterative method which offers rapid convergence to the solution, particularly when an effective preconditioner is employed. However, for more challenging systems a substantial error can be present even after many iterations have been performed. The estimates obtained in this case are of little value unless further information can be provided about, for example, the magnitude of the error. In this paper we propose a novel statistical model for this error, set in a Bayesian framework. Our approach is a strict generalisation of the conjugate gradient method, which is recovered as the posterior mean for a particular choice of prior. The estimates obtained are analysed with Krylov subspace methods and a contraction result for the posterior is presented. The method is then analysed in a simulation study as well as being applied to a challenging problem in medical imaging.
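For reference, the classical conjugate gradient iteration that the Bayesian method strictly generalizes is sketched below (a textbook implementation for symmetric positive definite systems, not the paper's probabilistic variant).

    import numpy as np

    def conjugate_gradient(A, b, x0=None, tol=1e-8, max_iter=None):
        """Solve A x = b for symmetric positive definite A."""
        n = len(b)
        x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
        r = b - A @ x                  # residual
        p = r.copy()                   # search direction
        rs = r @ r
        for _ in range(max_iter or n):
            Ap = A @ p
            alpha = rs / (p @ Ap)      # step length along p
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs) * p  # A-conjugate update of the search direction
            rs = rs_new
        return x

In the Bayesian reading described above, this sequence of iterates corresponds to the posterior mean under one particular choice of prior, and the posterior spread supplies the additional information about the magnitude of the remaining error.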





Statistical Methodology in Single-Molecule Experiments

Chao Du, S. C. Kou.

Source: Statistical Science, Volume 35, Number 1, 75--91.

Abstract:
Toward the last quarter of the 20th century, the emergence of single-molecule experiments enabled scientists to track and study individual molecules’ dynamic properties in real time. Unlike macroscopic systems’ dynamics, those of single molecules can only be properly described by stochastic models even in the absence of external noise. Consequently, statistical methods have played a key role in extracting hidden information about molecular dynamics from data obtained through single-molecule experiments. In this article, we survey the major statistical methodologies used to analyze single-molecule experimental data. Our discussion is organized according to the types of stochastic models used to describe single-molecule systems as well as major experimental data collection techniques. We also highlight challenges and future directions in the application of statistical methodologies to single-molecule experiments.





Comment on “Automated Versus Do-It-Yourself Methods for Causal Inference: Lessons Learned from a Data Analysis Competition”

Susan Gruber, Mark J. van der Laan.

Source: Statistical Science, Volume 34, Number 1, 82--85.

Abstract:
Dorie and co-authors (DHSSC) are to be congratulated for initiating the ACIC Data Challenge. Their project engaged the community and accelerated research by providing a level playing field for comparing the performance of a priori specified algorithms. DHSSC identified themes concerning characteristics of the DGP, properties of the estimators, and inference. We discuss these themes in the context of targeted learning.





Matching Methods for Causal Inference: A Review and a Look Forward

Elizabeth A. Stuart

Source: Statist. Sci., Volume 25, Number 1, 1--21.

Abstract:
When estimating causal effects using observational data, it is desirable to replicate a randomized experiment as closely as possible by obtaining treated and control groups with similar covariate distributions. This goal can often be achieved by choosing well-matched samples of the original treated and control groups, thereby reducing bias due to the covariates. Since the 1970s, work on matching methods has examined how to best choose treated and control subjects for comparison. Matching methods are gaining popularity in fields such as economics, epidemiology, medicine and political science. However, until now the literature and related advice has been scattered across disciplines. Researchers who are interested in using matching methods—or developing methods related to matching—do not have a single place to turn to learn about past and current research. This paper provides a structure for thinking about matching methods and guidance on their use, coalescing the existing research (both old and new) and providing a summary of where the literature on matching methods is now and where it should be headed.





NFL to match at least $5M US raised by fans through draft telethon

Clunky at times, poignant at others, and exceptionally entertaining in spots, the NFL draft entered its third and final day with Cincinnati selecting an Appalachian State linebacker on Saturday.





Transcriptomics and Proteomics Methods for Xenopus Embryos and Tissues

The general field of quantitative biology has advanced significantly on the back of recent improvements in both sequencing technology and proteomics methods. The development of high-throughput, short-read sequencing has revolutionized RNA-based expression studies, while improvements in proteomics methods have enabled quantitative studies to attain better resolution. Here we introduce methods to undertake global analyses of gene expression through RNA and protein quantification in Xenopus embryos and tissues.




etho

Methods for Measuring the Concentrations of Proteins

Determining the concentration of protein samples generally is accomplished either by measuring the UV absorbance at 280 nm or by reacting the protein quantitatively with dyes and/or metal ions (Bradford, Lowry, or BCA assays). For purified proteins, UV absorbance remains the most popular method because it is fast, convenient, and reproducible; it does not consume the protein; and it requires no additional reagents, standards, or incubations. No method of protein concentration determination is perfect because each is subject to a different set of constraints such as interference of buffer components and contaminating proteins in direct UV determination (A280) or reactivity of individual proteins and buffer components with the detecting reagents in colorimetric assays. In cases in which protein concentration is critical (e.g., determination of catalytic rate constants for an enzyme), it may be advisable to compare the results of several assays.
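
As a worked illustration of the direct UV (A280) approach, concentration follows from the Beer–Lambert law, A = ε·c·l, so c = A / (ε·l); the extinction coefficient, molecular weight, and absorbance below are assumed values for a hypothetical protein (roughly BSA-like), not measurements from this article.

    def protein_concentration_mg_per_ml(a280, ext_coeff_M, mw_da, path_cm=1.0):
        """Estimate protein concentration from UV absorbance at 280 nm.

        Beer-Lambert law: A = epsilon * c * l, so c (molar) = A / (epsilon * l).
        ext_coeff_M : molar extinction coefficient at 280 nm (M^-1 cm^-1)
        mw_da       : molecular weight in daltons (g/mol)
        path_cm     : cuvette path length in cm
        """
        molar = a280 / (ext_coeff_M * path_cm)   # concentration in mol/L
        return molar * mw_da                     # g/L, i.e. mg/mL

    # Hypothetical protein with assumed epsilon = 43,824 M^-1 cm^-1 and MW = 66,400 Da.
    print(protein_concentration_mg_per_ml(a280=0.66, ext_coeff_M=43824, mw_da=66400))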




etho

A Simple Method To Detect Point Mutations in Aspergillus fumigatus cyp51A Gene Using a Surveyor Nuclease Assay [Analytical Procedures]

One of the main mechanisms of azole resistance in Aspergillus fumigatus is thought to be a reduction in the drug’s affinity for the target molecule, Cyp51A, due to amino acid mutation(s), and the azole resistance pattern is known to be closely related to the mutation site(s) of the molecule. In this study, we sought to develop a simple and rapid method for detecting cyp51A mutations using Surveyor nuclease, a mismatch-specific endonuclease. The assay was first verified using several azole-resistant strains of A. fumigatus that possess point mutations in Cyp51A. For validation, blind tests were conducted using 48 strains of A. fumigatus (17 azole-resistant and 31 azole-susceptible strains). The Surveyor nuclease assay rapidly detected cyp51A mutations with a single primer set, and all tested strains harboring different cyp51A single point mutations could be clearly distinguished from the wild type. In conclusion, the Surveyor nuclease assay is a simple method that can detect cyp51A mutations rapidly.




etho

Wis. Class-Size Study Yields Advice On Teachers' Methods

New findings on a state initiative in Wisconsin suggest that to make the most out of smaller class sizes in the early grades, teachers should focus on basic skills when they have one-on-one contact with students, ask children to discuss and demonstrate what they know, and have a firm, but nurturing,




etho

Clustering of Risk Factors: A Simple Method of Detecting Cardiovascular Disease in Youth

Cardiovascular risk factors predict the development of premature atherosclerosis. As the number of risk factors increases, so does the extent of these lesions. Assessment of cardiovascular risk factors is an accepted practice in adults but is not used in pediatrics.

In this study, the authors discuss how the presence of ≥2 cardiovascular risk factors is associated with vascular changes in adolescents. The findings were compared with the Pathobiological Determinants of Atherosclerosis in Youth risk score to demonstrate that a simple method of clustering is a reliable tool for use in clinical practice.
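
A minimal sketch of the clustering idea described above: count how many conventional risk factors a patient has and flag clustering when two or more are present. The specific factors, cutoffs, and field names below are illustrative assumptions, not the definitions used in the study.

    # Illustrative risk-factor flags for one adolescent; thresholds are assumptions,
    # not the study's definitions.
    patient = {"bmi_percentile": 96, "systolic_bp_percentile": 93,
               "ldl_mg_dl": 135, "hdl_mg_dl": 42, "smoker": False}

    risk_factors = {
        "obesity": patient["bmi_percentile"] >= 95,
        "elevated_bp": patient["systolic_bp_percentile"] >= 90,
        "high_ldl": patient["ldl_mg_dl"] >= 130,
        "low_hdl": patient["hdl_mg_dl"] < 40,
        "smoking": patient["smoker"],
    }

    n_present = sum(risk_factors.values())
    print(f"{n_present} risk factors present; clustering = {n_present >= 2}")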




etho

Rates of Nonsuicidal Self-Injury in Youth: Age, Sex, and Behavioral Methods in a Community Sample

Known rates of nonsuicidal self-injury (hurting oneself without the intent to die) range from roughly 7% to 24% in samples of early and older adolescents, yet research has not reported rates for youth younger than 11 years old.

Children as young as 7 years old report engaging in nonsuicidal self-injury. There is a grade-by-gender interaction for nonsuicidal self-injury, such that ninth-grade girls report the highest rates of engagement and do so by cutting themselves.




etho

Risk Adjustment for Neonatal Surgery: A Method for Comparison of In-Hospital Mortality

Evaluation of neonatal surgical outcomes is necessary to guide improvements in the quality of care. Meaningful comparisons must adjust for factors that alter outcomes independent of the surgical procedures.

Herein is described a method that permits risk adjustment across the broad range of noncardiac neonatal surgery, regardless of gestational age, enabling useful comparisons for quality improvement.
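
One generic way to implement this kind of risk adjustment (not necessarily the authors' model) is to fit a logistic regression for in-hospital mortality on preoperative risk factors and then compare each hospital's observed deaths with the number expected from the model, e.g., as an observed-to-expected (O/E) ratio. All variables, coefficients, and data below are invented for illustration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Invented data: rows are neonatal surgical cases, columns are assumed risk factors
    # (gestational age in weeks, birth weight in kg, emergency surgery flag).
    rng = np.random.default_rng(2)
    n = 1000
    X = np.column_stack([rng.normal(36, 3, n), rng.normal(2.5, 0.7, n), rng.binomial(1, 0.3, n)])
    logit = -3.0 - 0.15 * (X[:, 0] - 36) - 0.8 * (X[:, 1] - 2.5) + 1.0 * X[:, 2]
    died = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

    # Fit the risk model and compute each hospital's observed-to-expected mortality ratio.
    model = LogisticRegression().fit(X, died)
    expected = model.predict_proba(X)[:, 1]

    hospital = rng.integers(0, 4, n)           # assumed hospital labels
    for h in range(4):
        mask = hospital == h
        oe = died[mask].sum() / expected[mask].sum()
        print(f"hospital {h}: O/E mortality ratio = {oe:.2f}")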




etho

Expected Body Weight in Adolescents: Comparison Between Weight-for-Stature and BMI Methods

In adolescents with eating disorders, percent expected body weight (EBW) is used for diagnosis and to make clinical decisions. The assumption is that the weight-for-stature (WFS) and BMI methods of determining EBW are equivalent, but that may not be true.

This study demonstrates that EBW computed by the WFS method (EBW-WFS) is ~3.5% higher than EBW computed by the BMI method (EBW-BMI). Differences are most pronounced at the extremes of height. Compared with the WFS method, the sensitivity of the BMI method to detect those <75% EBW is low.
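
A sketch of the two calculations being compared, under the assumption that reference medians (median weight for the patient's height, and median BMI for age and sex) are available from growth charts; the patient values and reference numbers below are invented, and the functions are illustrative rather than the study's exact procedures.

    def percent_ebw_wfs(weight_kg, median_weight_for_stature_kg):
        """Percent expected body weight by the weight-for-stature (WFS) method."""
        return 100 * weight_kg / median_weight_for_stature_kg

    def percent_ebw_bmi(weight_kg, height_m, median_bmi_for_age_sex):
        """Percent expected body weight by the BMI method:
        expected weight = median BMI for age and sex * height squared."""
        expected_weight = median_bmi_for_age_sex * height_m ** 2
        return 100 * weight_kg / expected_weight

    # Invented example: an adolescent 1.70 m tall weighing 45 kg.
    print(percent_ebw_wfs(45, median_weight_for_stature_kg=61.0))
    print(percent_ebw_bmi(45, height_m=1.70, median_bmi_for_age_sex=20.5))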




etho

Trends in Adverse Reactions to Trimethoprim-Sulfamethoxazole

Antimicrobials are a medication class frequently implicated in pediatric adverse drug reactions (ADRs). Trimethoprim-sulfamethoxazole (TMP-SMX) has long been recognized as a contributor to the burden of these undesired and unpredictable events.

TMP-SMX ADRs increased from 2000 to 2009, with the majority of children taking the antibiotic for skin and soft tissue infections. The significant increase in TMP-SMX prescribing for these infections may result in a continued increase of associated ADRs.




etho

Pediatric Medical Complexity Algorithm: A New Method to Stratify Children by Medical Complexity

Quality measures developed by the Pediatric Quality Measures Program are required to assess disparities in performance according to special health care need status. Methods are needed to identify children according to level of medical complexity in administrative data.

The Pediatric Medical Complexity Algorithm is a new, publicly available algorithm that identifies the small proportion of children with complex chronic disease in Medicaid claims and hospital discharge data with good sensitivity and good to excellent specificity.
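
As a rough illustration of the general idea of tiered complexity classification (not the actual PMCA logic or its published diagnosis-code lists), one might map chronic conditions found in claims to body systems and label a child complex chronic when conditions span multiple systems or a progressive condition or malignancy is present. The mapping, flags, and thresholds below are assumptions.

    # Rough illustration only; the condition-to-body-system mapping, the "progressive"
    # flag, and the thresholds are assumptions, not the PMCA's published definitions.
    def classify(chronic_body_systems, has_progressive_condition, has_malignancy=False):
        """Return 'complex chronic', 'non-complex chronic', or 'non-chronic'."""
        systems = set(chronic_body_systems)
        if has_progressive_condition or has_malignancy or len(systems) >= 2:
            return "complex chronic"
        if len(systems) == 1:
            return "non-complex chronic"
        return "non-chronic"

    print(classify({"cardiac", "neurologic"}, has_progressive_condition=False))  # complex chronic
    print(classify({"respiratory"}, has_progressive_condition=False))            # non-complex chronic
    print(classify(set(), has_progressive_condition=False))                      # non-chronic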




etho

Implementation Methods for Delivery Room Management: A Quality Improvement Comparison Study

Quality improvement (QI) studies generally do not account for concurrent trends of improvement, and it is difficult to distinguish the impact of a multihospital collaborative QI project without a contemporary control group.

A multihospital collaborative QI model led to greater declines in hypothermia and invasive ventilation rates in the delivery room compared with an individual NICU QI model and NICUs that did not participate in formal QI activities.




etho

Improvement Methodology Increases Guideline Recommended Blood Cultures in Children With Pneumonia

Blood cultures are the most widely available diagnostic tool to identify bacterial pathogens in community-acquired pneumonia (CAP). Despite a recent national guideline recommendation for blood culture performance in children with moderate/severe CAP, there is still wide variation across institutions.

Using improvement methodology, we demonstrated that blood cultures can be routinely performed in children admitted for CAP, in accordance with a recent national guideline, without increasing length of stay in a setting with a low false-positive blood culture rate.




etho

Species Distribution and Comparison between EUCAST and Gradient Concentration Strips Methods for Antifungal Susceptibility Testing of 112 Aspergillus Section Nigri Isolates [Susceptibility]

Aspergillus niger, the third most common species responsible for invasive aspergillosis, was considered a homogeneous species until DNA-based identification uncovered many cryptic species, which have recently been reclassified into the Aspergillus section Nigri. However, little is yet known about the species distribution within the section Nigri or about the antifungal susceptibility pattern of each cryptic species. A total of 112 clinical isolates collected from 5 teaching hospitals in France and phenotypically identified as A. niger were analyzed. Identification to the species level was carried out by nucleotide sequence analysis. The minimum inhibitory concentrations (MICs) of itraconazole, voriconazole, posaconazole, isavuconazole and amphotericin B were determined by both the EUCAST and gradient concentration strip methods. Aspergillus tubingensis (n=51, 45.5%) and A. welwitschiae (n=50, 44.6%) were the most common species, while A. niger accounted for only 6.3% (n=7). The MICs of azole drugs were higher for A. tubingensis than for A. welwitschiae. The MIC of amphotericin B was 2 mg/L or less for all isolates. Importantly, MICs determined by EUCAST showed no correlation with those determined by the gradient concentration strip method, the latter being lower than the former (Spearman's rank correlation coefficients ranging from 0.01 to 0.25 depending on the antifungal agent; p>0.4). In conclusion, A. niger should be considered a minority species within the section Nigri. The differences in azole MICs between species underline the importance of accurate identification, and the significant divergences in MIC determination between the EUCAST and gradient concentration strip methods require further investigation.
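
To make the method-comparison statistic concrete, here is a small sketch of how one might compare MICs from two susceptibility testing methods with Spearman's rank correlation. The MIC values below are invented for illustration and do not reproduce the study's data or its correlation estimates.

    import numpy as np
    from scipy.stats import spearmanr

    # Invented MIC values (mg/L) for the same isolates measured by two methods.
    mic_eucast = np.array([0.5, 1.0, 2.0, 1.0, 4.0, 0.25, 2.0, 1.0])
    mic_strip  = np.array([0.38, 0.25, 0.5, 0.75, 0.25, 0.5, 0.19, 1.0])

    rho, p_value = spearmanr(mic_eucast, mic_strip)
    print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")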




etho

Advanced quantification methods to improve the 18b dormancy model for assessing the activity of tuberculosis drugs in vitro. [Clinical Therapeutics]

One of the reasons for the lengthy tuberculosis (TB) treatment is the difficult-to-treat, non-multiplying mycobacterial subpopulation. To assess the ability of (new) TB drugs to target this subpopulation, we need to incorporate dormancy models into our preclinical drug development pipeline. In most available dormancy models it takes a long time to create a dormant state, and it is difficult to identify and quantify this non-multiplying condition.

The Mycobacterium tuberculosis 18b strain might overcome some of these problems because it is dependent on streptomycin for growth, becomes non-multiplying after 10 days of streptomycin starvation, yet can still be cultured on streptomycin-supplemented plates. We developed an 18b dormancy time-kill kinetic model to compare the activity of isoniazid, rifampicin, moxifloxacin and bedaquiline against log-phase growth versus the non-multiplying M. tuberculosis subpopulation, using CFU counting (including a novel AUC-based approach) as well as time-to-positivity (TTP) measurements.

We observed that isoniazid and moxifloxacin were relatively more potent against replicating bacteria, while rifampicin and high dose bedaquiline were equally effective against both subpopulations. Moreover, the TTP data suggest that including a liquid culture-based method could be of additional value as it identifies a specific mycobacterial subpopulation that is non-culturable on solid media.

In conclusion, the results of our study underline that the time-kill kinetics 18b dormancy model in its current form is a useful tool to assess TB drug potency and thus has its place in the TB drug development pipeline.
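
A minimal sketch of an AUC-style summary of a time-kill curve: integrate log10 CFU/mL over time with the trapezoidal rule, so that drugs or subpopulations can be compared by a single exposure-response number. The sampling days and viable counts below are invented and do not reproduce the study's data or its specific AUC definition.

    import numpy as np

    # Invented time-kill data (not from the study): sampling days and viable counts
    # (CFU/mL) for one drug exposure.
    days = np.array([0.0, 1.0, 3.0, 7.0, 14.0, 21.0])
    cfu = np.array([1e7, 5e6, 8e5, 1e5, 2e4, 5e3])

    # Area under the log10 CFU/mL versus time curve, by the trapezoidal rule.
    log_cfu = np.log10(cfu)
    auc = np.sum((log_cfu[:-1] + log_cfu[1:]) / 2.0 * np.diff(days))
    print(f"AUC over {days[-1]:.0f} days: {auc:.1f} log10(CFU/mL) x days")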




etho

A SIMPLE PHENYLALANINE METHOD FOR DETECTING PHENYLKETONURIA IN LARGE POPULATIONS OF NEWBORN INFANTS

Robert Guthrie
Sep 1, 1963; 32:338-343