data

Using “Dumb Data” To Make Smart Design Decisions

September 23, 2015

As an industry, we’ve worked to establish many new practices and tools for nimble design teams, from A/B testing to measuring bounce rates and click-through rate (CTR). But many of these methods require engineers or some technical know-how to execute, and they take place only after something has launched.

What many people don’t know is that there are some unexpected applications of data to consider earlier in the design process, which you, the designer, can do yourself. They’re not fancy, and you don’t need to know how to write SQL queries. The judicious application of just-enough “dumb data” can streamline your workflow and improve your designs in surprisingly useful ways.

Here are...
By Jocelyn Lin






SAD's criticism of employment generation programme based on flawed data: Capt Amarinder




Technology to help Delhi Police breach iPhones, get back deleted data

Delhi Police is going to procure a technology which will help the cops extract data from iPhones seized by them from suspects. The software can even extract deleted data and data from installed applications on the phones.




ProQEXAFS: a highly optimized parallelized rapid processing software for QEXAFS data

The high temporal resolution of data acquisition that is possible in the quick-scanning EXAFS (QEXAFS) mode of operation poses new challenges for efficient data processing. Here a new approach is developed that combines an easy-to-use interactive graphical interface with highly optimized and fully parallelized Python-based routines for extracting, normalizing and interpolating oversampled time-resolved XAS spectra from a raw binary stream of data acquired during operando QEXAFS studies. The programs developed are freely available via a GitHub repository.
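Below is a minimal, dependency-free sketch of the normalize-then-interpolate step for a batch of spectra. It assumes each spectrum is a pair of ascending energy and absorbance lists. For normalization it uses a simplified edge-step scheme: subtract the mean pre-edge level, then divide by the jump to the mean post-edge level. This stands in for the pre-/post-edge line fits usually applied to XAS data. The function names, window widths and edge energy `e0` are illustrative assumptions, not ProQEXAFS's routines.

```python
from bisect import bisect_left
from statistics import mean


def interpolate(xs, ys, grid):
    """Linearly interpolate (xs, ys) onto grid; xs must be sorted ascending.
    Grid points outside [xs[0], xs[-1]] are clamped to the end values."""
    out = []
    for g in grid:
        if g <= xs[0]:
            out.append(ys[0])
        elif g >= xs[-1]:
            out.append(ys[-1])
        else:
            i = bisect_left(xs, g)  # guarantees xs[i-1] < g <= xs[i]
            x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
            out.append(y0 + (y1 - y0) * (g - x0) / (x1 - x0))
    return out


def normalize(energies, mu, e0, pre_width, post_width):
    """Simplified edge-step normalization (assumption, not a full XAS fit):
    pre-edge mean maps to 0, post-edge mean maps to 1."""
    pre = [m for e, m in zip(energies, mu) if e0 - pre_width <= e < e0]
    post = [m for e, m in zip(energies, mu) if e0 < e <= e0 + post_width]
    if not pre or not post:
        raise ValueError("pre- or post-edge window contains no points")
    level = mean(pre)
    step = mean(post) - level
    if step == 0:
        raise ValueError("edge step is zero")
    return [(m - level) / step for m in mu]


def process_spectrum(spectrum, grid, e0, pre_width=30.0, post_width=30.0):
    energies, mu = spectrum
    return interpolate(energies, normalize(energies, mu, e0, pre_width, post_width), grid)


def process_all(spectra, grid, e0):
    # Spectra are independent, so this map is the natural place to parallelize
    return [process_spectrum(s, grid, e0) for s in spectra]
```

Each spectrum is processed independently, so a parallel version would distribute the final map. One option is a `concurrent.futures.ProcessPoolExecutor`. Array libraries such as NumPy would vectorize the per-point loops.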





Scaling diffraction data in the DIALS software package: algorithms and new approaches for multi-crystal scaling

A new scaling program is presented with new features to support multi-sweep workflows and analysis within the DIALS software package.




Rubidium tetrafluoridobromate(III): redetermination of the crystal structure from single-crystal X-ray diffraction data

Single crystals of rubidium tetrafluoridobromate(III), RbBrF4, were grown by recrystallization from the melt. This is the first determination of the crystal structure of RbBrF4 from single-crystal X-ray diffraction data. We confirmed that the structure contains square-planar [BrF4]− anions and rubidium cations that are coordinated by F atoms in a square-antiprismatic manner. The compound crystallizes in the KBrF4 structure type. Atomic coordinates, bond lengths and angles were determined with higher precision than in a previous report based on powder X-ray diffraction data [Ivlev et al. (2015). Z. Anorg. Allg. Chem. 641, 2593–2598].




Redetermination of the crystal structure of caesium tetrafluoridobromate(III) from single-crystal X-ray diffraction data

Caesium tetrafluoridobromate(III), CsBrF4, was crystallized in the form of small blocks by melting and recrystallization. The crystal structure of CsBrF4 was redetermined from single-crystal X-ray diffraction data. In comparison with a previous study based on powder X-ray diffraction data [Ivlev et al. (2013). Z. Anorg. Allg. Chem. 639, 2846–2850], bond lengths and angles were determined with higher precision, and all atoms were refined with anisotropic displacement parameters. It was confirmed that the structure of CsBrF4 contains two square-planar [BrF4]− anions, each with point-group symmetry mmm, and a caesium cation (site symmetry mm2) that is coordinated by twelve fluorine atoms forming an anticuboctahedron. CsBrF4 is isotypic with CsAuF4.





Crystal structure of 3,14-diethyl-2,13-diaza-6,17-diazoniatricyclo[16.4.0.0⁷,¹²]docosane dinitrate dihydrate from synchrotron X-ray data

The crystal structure of the title salt, C22H46N4²⁺·2NO3−·2H2O, has been determined using synchrotron radiation at 220 K. The structure determination reveals that protonation has occurred at diagonally opposite amine N atoms. The asymmetric unit contains half a centrosymmetric dication, one nitrate anion and one water molecule. The dication, the nitrate anion and the water molecule are involved in an extensive range of hydrogen bonds. The molecule, as well as the conformation of the dication, is stabilized by intermolecular N—H⋯O and O—H⋯O hydrogen bonds, together with intramolecular N—H⋯N hydrogen bonds.




Crystal structure of 1,4,8,11-tetramethyl-1,4,8,11-tetraazoniacyclotetradecane bis(perchlorate) dichloride from synchrotron X-ray data

The crystal structure of the title salt, C14H36N4⁴⁺·2ClO4−·2Cl−, has been determined using synchrotron radiation at 220 K. The structure determination reveals that protonation has occurred at all four amine N atoms. The asymmetric unit contains one half-cation (completed by crystallographic inversion symmetry), one perchlorate anion and one chloride anion. The perchlorate anion is distorted as a result of its involvement in hydrogen-bonding interactions with the cations. The crystal structure is consolidated by intermolecular hydrogen bonds. The N—H and C—H groups of the 1,4,8,11-tetramethyl-1,4,8,11-tetraazoniacyclotetradecane cation act as donors, and the O atoms of the perchlorate anion and the chloride anion act as acceptors, giving rise to a three-dimensional network.




Redetermination of the crystal structure of R5Si4 (R = Pr, Nd) from single-crystal X-ray diffraction data

The crystal structures of praseodymium silicide (5/4), Pr5Si4, and neodymium silicide (5/4), Nd5Si4, were redetermined using high-quality single-crystal X-ray diffraction data. Previous structure reports of Pr5Si4 were based only on powder X-ray diffraction data [Smith et al. (1967). Acta Cryst. 22, 940–943; Yang et al. (2002b). J. Alloys Compd. 339, 189–194; Yang et al. (2003). J. Alloys Compd. 263, 146–153]. The structure of Nd5Si4, on the other hand, has been determined from three kinds of data:

- powder neutron data [Cadogan et al. (2002). J. Phys. Condens. Matter 14, 7191–7200];
- powder X-ray data [Smith et al. (1967). Acta Cryst. 22, 940–943; Yang et al. (2002b). J. Alloys Compd. 339, 189–194; Yang et al. (2003). J. Alloys Compd. 263, 146–153];
- single-crystal data with isotropic atomic displacement parameters [Roger et al. (2006). J. Alloys Compd. 415, 73–84].

In addition, anisotropic atomic displacement parameters for all atomic sites have been determined for the first time. These compounds are confirmed to have the tetragonal Zr5Si4-type structure (space group P4₁2₁2), as reported previously (Smith et al., 1967). The structure is built up from distorted body-centered cubes of Pr (or Nd) atoms, which share edges to form a three-dimensional framework. This framework delimits zigzag channels in which the silicon dimers are situated.




Crystal structure of 1,4,8,11-tetramethyl-1,4,8,11-tetraazoniacyclotetradecane bis[chloridochromate(VI)] dichloride from synchrotron X-ray data

The crystal structure of the title compound, (C14H36N4)[CrO3Cl]2Cl2, has been determined by synchrotron radiation X-ray crystallography at 220 K. The macrocyclic cation lies across a crystallographic inversion center, and hence the asymmetric unit contains one half of the organic cation, one chlorochromate anion and one chloride anion. Both the Cl− anion and the chlorochromate Cl atom are involved in hydrogen bonding. In the crystal, hydrogen bonds link the components into a three-dimensional network. The N—H and C—H groups of the 1,4,8,11-tetramethyl-1,4,8,11-tetraazoniacyclotetradecane (TMC) cation act as donors. Three O atoms of the chlorochromate anion and the chloride anion act as acceptors.




Equatorial aberration of powder diffraction data collected with an Si strip X-ray detector by a continuous-scan integration method

Exact and approximate mathematical formulas of equatorial aberration for powder diffraction data collected with an Si strip X-ray detector in continuous-scan integration mode are presented. An approximate formula is applied to treat the experimental data measured with a commercial powder diffractometer.




Accurate high-resolution single-crystal diffraction data from a Pilatus3 X CdTe detector

Hybrid photon-counting detectors are widely established at third-generation synchrotron facilities. The specifications of the Pilatus3 X CdTe were quickly recognized as highly promising for charge-density investigations. This is mainly attributable to the detection efficiency in the high-energy X-ray regime, in combination with a dynamic range and noise level that should overcome the perpetual problem of detecting strong and weak data simultaneously. These benefits, however, come with a persistent problem at high diffracted-beam flux, which is particularly severe in single-crystal diffraction of materials with strong scattering power and sharp diffraction peaks. Here, an in-depth examination of data collected on an inorganic material, FeSb2, and an organic semiconductor, rubrene, revealed systematic differences in strong intensities for different incoming beam fluxes, and the implemented detector intensity corrections were found to be inadequate. Only substantial beam attenuation during collection of the strong reflections circumvented this systematic error. All data were collected on a bending-magnet beamline at a third-generation synchrotron radiation facility, so undulator and wiggler beamlines and fourth-generation synchrotrons will be even more prone to this error. On the other hand, the low background now allows accurate measurement of very weak intensities. It is shown that structure factors of exceptional quality can be extracted using standard crystallographic software for data processing (SAINT-Plus, SADABS and SORTAV), although special attention has to be paid to the estimation of the background. This study resulted in electron-density models of substantially higher accuracy and precision compared with a previous investigation. It thus fulfills, for the first time, the promise of photon-counting detectors for very accurate structure-factor measurements.




Screening topological materials with a CsCl-type structure in crystallographic databases

CsCl-type materials have many attractive characteristics, e.g. simple structures, ease of synthesis and good stability at room temperature, which make them an excellent choice for designing functional materials. Using high-throughput first-principles calculations, a large number of topological semimetals/metals (TMs) were designed from CsCl-type materials found in crystallographic databases, and their crystal and electronic structures were studied. The CsCl-type TMs in this work show rich topological character, ranging from triple nodal points, type-I nodal lines and critical-type nodal lines to hybrid nodal lines. The TMs identified show clean topological band structures near the Fermi level, which are suitable for experimental investigations and future applications. This work provides a rich data set of TMs with a CsCl-type structure.




Charge densities in actinide compounds: strategies for data reduction and model building

The data-quality requirements for charge-density studies on actinide compounds are extreme. Important steps in data collection and reduction required to obtain such data are summarized and evaluated. The steps involved in building an augmented Hansen–Coppens multipole model for an actinide pseudo-atom are provided. The number and choice of radial functions, in particular the definition of the core, valence and pseudo-valence terms, are discussed. The conclusions in this paper are based on a re-examination and improvement of a previously reported study on [PPh4][UF6]. Topological analysis of the total electron density shows remarkable agreement between experiment and theory. However, there are significant differences in the Laplacian distribution close to the uranium atoms, which may be due to the effective core potential employed for the theoretical calculations.




Refinement for single-nanoparticle structure determination from low-quality single-shot coherent diffraction data

With the emergence of X-ray free-electron lasers, it is possible to investigate the structure of nanoscale samples by employing coherent diffractive imaging in the X-ray spectral regime. In this work, we developed a refinement method for structure reconstruction applicable to low-quality coherent diffraction data. The method is based on the gradient search method and considers the missing region of a diffraction pattern and the small number of detected photons. We introduced an initial estimate of the structure in the method to improve the convergence. The present method is applied to an experimental diffraction pattern of an Xe cluster obtained in an X-ray scattering experiment at the SPring-8 Angstrom Compact free-electron LAser (SACLA) facility. It is found that the electron density is successfully reconstructed from the diffraction pattern with a large missing region, with a good initial estimate of the structure. The diffraction pattern calculated from the reconstructed electron density reproduced the observed diffraction pattern well, including the characteristic intensity modulation in each ring. Our refinement method enables structure reconstruction from diffraction patterns under difficulties such as missing areas and low diffraction intensity, and it is potentially applicable to the structure determination of samples that have low scattering power.




3D-MiXD: 3D-printed X-ray-compatible microfluidic devices for rapid, low-consumption serial synchrotron crystallography data collection in flow

Serial crystallography has enabled the study of complex biological questions through the determination of biomolecular structures at room temperature using low X-ray doses. Furthermore, it has enabled the study of protein dynamics by the capture of atomically resolved and time-resolved molecular movies. However, the study of many biologically relevant targets is still severely hindered by high sample consumption and lengthy data-collection times. By combining serial synchrotron crystallography (SSX) with 3D printing, a new experimental platform has been created that tackles these challenges. An affordable 3D-printed, X-ray-compatible microfluidic device (3D-MiXD) is reported that allows data to be collected from protein microcrystals in a 3D flow with very high hit and indexing rates, while keeping the sample consumption low. The miniaturized 3D-MiXD can be rapidly installed into virtually any synchrotron beamline with only minimal adjustments. This efficient collection scheme in combination with its mixing geometry paves the way for recording molecular movies at synchrotrons by mixing-triggered millisecond time-resolved SSX.




Estimation of high-order aberrations and anisotropic magnification from cryo-EM data sets in RELION-3.1

Methods are presented that detect three types of aberrations in single-particle cryo-EM data sets: symmetrical and antisymmetrical optical aberrations and magnification anisotropy. Because these methods only depend on the availability of a preliminary 3D reconstruction from the data, they can be used to correct for these aberrations for any given cryo-EM data set, a posteriori. Using five publicly available data sets, it is shown that considering these aberrations improves the resolution of the 3D reconstruction when these effects are present. The methods are implemented in version 3.1 of the open-source software package RELION.




The predictive power of data-processing statistics

This study describes a method to estimate the likelihood of success in determining a macromolecular structure by X-ray crystallography and experimental single-wavelength anomalous dispersion (SAD) or multiple-wavelength anomalous dispersion (MAD) phasing based on initial data-processing statistics and sample crystal properties. Such a predictive tool can rapidly assess the usefulness of data and guide the collection of an optimal data set. The increase in data rates from modern macromolecular crystallography beamlines, together with a demand from users for real-time feedback, has led to pressure on computational resources and a need for smarter data handling. Statistical and machine-learning methods have been applied to construct a classifier that displays 95% accuracy for training and testing data sets compiled from 440 solved structures. Applying this classifier to new data achieved 79% accuracy. These scores already provide clear guidance as to the effective use of computing resources and offer a starting point for a personalized data-collection assistant.




Prediction of models for ordered solvent in macromolecular structures by a classifier based upon resolution-independent projections of local feature data

Current software tools for the automated building of models for macromolecular X-ray crystal structures can assemble high-quality models for ordered macromolecule and small-molecule scattering components with minimal or no user supervision. Many of these tools also incorporate robust functionality for modelling the ordered water molecules that are found in nearly all macromolecular crystal structures. However, no current tool focuses on distinguishing these ubiquitous water molecules from other frequently occurring multi-atom solvent species, such as sulfate, or on automatically building models for such species. PeakProbe has been developed specifically to meet this need. PeakProbe predicts likely solvent models for a given point in a structure (termed a ‘peak’) by analysing (‘probing’) its local electron density and chemical environment. PeakProbe maps a total of 19 resolution-dependent features associated with electron density and two associated with the local chemical environment to a two-dimensional score space that is independent of resolution. Peaks are classified by the relative frequencies with which four different classes of solvent (including water) are observed within a given region of this score space. These frequencies were determined by large-scale sampling of solvent models in the Protein Data Bank. PeakProbe is designed to classify peaks generated from difference density maxima. It also incorporates functionality for identifying peaks associated with model errors or clusters of peaks likely to correspond to multi-atom solvent, and for validating existing solvent models using solvent-omit electron-density maps. When tasked with classifying peaks into one of four distinct solvent classes, PeakProbe achieves greater than 99% accuracy both for peaks derived directly from the atomic coordinates of existing solvent models and for those based on difference density maxima.
While the program is still under development, a fully functional version is publicly available. PeakProbe makes extensive use of cctbx libraries, and requires a PHENIX licence and an up-to-date phenix.python environment for execution.
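The classification rule described above assigns a peak to whichever solvent class is observed most often in its region of a two-dimensional score space. It can be sketched as a binned frequency lookup. This is an illustration only: the square bins, bin size, class labels and reference points are hypothetical. PeakProbe's actual feature-to-score mapping and region definitions are not reproduced.

```python
from collections import Counter, defaultdict
from math import floor


def bin_key(score, bin_size):
    """Map a 2D score (s1, s2) to the integer index of its square cell
    (square cells are an illustrative assumption)."""
    s1, s2 = score
    return (floor(s1 / bin_size), floor(s2 / bin_size))


def build_frequency_table(reference, bin_size):
    """reference: iterable of ((s1, s2), label) pairs from known solvent models."""
    table = defaultdict(Counter)
    for score, label in reference:
        table[bin_key(score, bin_size)][label] += 1
    return table


def classify(score, table, bin_size):
    """Return (most frequent label, its relative frequency) in the score's cell,
    or (None, 0.0) if no reference points fall in that cell."""
    counts = table.get(bin_key(score, bin_size))  # .get does not create entries
    if not counts:
        return None, 0.0
    label, n = counts.most_common(1)[0]
    return label, n / sum(counts.values())
```

The returned relative frequency doubles as a crude confidence. A cell with few or mixed reference points gives a low value, which could flag a peak for manual inspection.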




Methods for merging data sets in electron cryo-microscopy

Recent developments have resulted in electron cryo-microscopy (cryo-EM) becoming a useful tool for the structure determination of biological macromolecules. For samples containing inherent flexibility, heterogeneity or preferred orientation, the collection of extensive cryo-EM data using several conditions and microscopes is often required. In such a scenario, merging cryo-EM data sets is advantageous because it allows improved three-dimensional reconstructions to be obtained. Since data sets are not always collected with the same pixel size, merging data can be challenging. Here, two methods to combine cryo-EM data are described. Both involve the calculation of a rescaling factor from independent data sets. The effects of errors in the scaling factor on the results of data merging are also estimated. The methods described here provide a guideline for cryo-EM users who wish to combine data sets from the same type of microscope and detector.




SAD phasing of XFEL data depends critically on the error model

A nonlinear least-squares method for refining a parametric expression describing the estimated errors of reflection intensities in serial crystallographic (SX) data is presented. This approach, which is similar to that used in the rotation method of crystallographic data collection at synchrotrons, propagates error estimates from photon-counting statistics to the merged data. Here, it is demonstrated that the application of this approach to SX data provides better SAD phasing ability, enabling the autobuilding of a protein structure that had previously failed to be built. Estimating the error in the merged reflection intensities requires the understanding and propagation of all of the sources of error arising from the measurements. One type of error, which is well understood, is the counting error introduced when the detector counts X-ray photons. Thus, if other types of random errors (such as readout noise) as well as uncertainties in systematic corrections (such as from X-ray attenuation) are completely understood, they can be propagated along with the counting error, as appropriate. In practice, most software packages propagate as much error as they know how to model and then include error-adjustment terms that scale the error estimates until they explain the variance among the measurements. If this is performed carefully, then during SAD phasing likelihood-based approaches can make optimal use of these error estimates, increasing the chance of a successful structure solution. In serial crystallography, SAD phasing has remained challenging, with the few examples of de novo protein structure solution each requiring many thousands of diffraction patterns. 
Here, the effects of different methods of treating the error estimates are evaluated. It is shown that a parametric approach can allow SAD phasing even from a weak zinc anomalous signal. The parametric approach improves the error estimates with terms proportional to the known experimental uncertainty, the reflection intensity and the squared reflection intensity.
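A parametric error model with terms in the known uncertainty, the intensity and the squared intensity can be sketched in the form used by Aimless-style scaling: σ′ = SdFac·√(σ² + SdB·⟨I⟩ + (SdAdd·⟨I⟩)²). The paper's exact parameterization and its nonlinear least-squares refinement are not reproduced here. The sketch only applies given parameter values and computes normalized deviations of symmetry-equivalent measurements. Their spread should be close to 1 when the adjusted errors explain the observed variance.

```python
from math import sqrt
from statistics import mean


def adjusted_sigma(sigma, intensity, sdfac, sdb, sdadd):
    """Inflate a counting-statistics sigma with terms in I and I**2
    (Aimless-style form; parameter values are supplied by the caller)."""
    var = sigma ** 2 + sdb * intensity + (sdadd * intensity) ** 2
    return sdfac * sqrt(max(var, 0.0))


def normalized_deviations(observations, sdfac, sdb, sdadd):
    """observations: list of (I, sigma) for one set of symmetry equivalents.
    Returns (I - <I>) / sigma' for each; ideally these have unit spread."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two equivalent observations")
    i_mean = mean(i for i, _ in observations)
    out = []
    for i, s in observations:
        s_adj = adjusted_sigma(s, i_mean, sdfac, sdb, sdadd)
        # sqrt((n-1)/n) accounts for <I> being estimated from the same data
        out.append((i - i_mean) / (s_adj * sqrt((n - 1) / n)))
    return out
```

In practice, SdFac, SdB and SdAdd are chosen so that the normalized deviations, binned by intensity, have unit variance across the whole data set. Likelihood-based SAD phasing then consumes the adjusted σ′ values.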




Molecular replacement using structure predictions from databases

Molecular replacement (MR) is the predominant route to solution of the phase problem in macromolecular crystallography. Where the lack of a suitable homologue precludes conventional MR, one option is to predict the target structure using bioinformatics. Such modelling, in the absence of homologous templates, is called ab initio or de novo modelling. Recently, the accuracy of such models has improved significantly as a result of the availability, in many cases, of residue-contact predictions derived from evolutionary covariance analysis. Covariance-assisted ab initio models representing structurally uncharacterized Pfam families are now available on a large scale in databases, potentially representing a valuable and easily accessible supplement to the PDB as a source of search models. Here, the unconventional MR pipeline AMPLE is employed to explore the value of structure predictions in the GREMLIN and PconsFam databases. It was tested whether these deposited predictions, processed in various ways, could solve the structures of PDB entries that were subsequently deposited. The results were encouraging: nine of 27 GREMLIN cases were solved, covering target lengths of 109–355 residues and a resolution range of 1.4–2.9 Å, and with target–model shared sequence identity as low as 20%. The cluster-and-truncate approach in AMPLE proved to be essential for most successes. For the overall lower quality structure predictions in the PconsFam database, remodelling with Rosetta within the AMPLE pipeline proved to be the best approach, generating ensemble search models from single-structure deposits. Finally, it is shown that the AMPLE-obtained search models deriving from GREMLIN deposits are of sufficiently high quality to be selected by the sequence-independent MR pipeline SIMBAD. Overall, the results help to point the way towards the optimal use of the expanding databases of ab initio structure predictions.




Refinement of protein structures using a combination of quantum-mechanical calculations with neutron and X-ray crystallographic data. Corrigendum

Corrections are published for the article by Caldararu et al. [(2019), Acta Cryst. D75, 368–380].




Measuring and using information gained by observing diffraction data

The information gained by making a measurement, termed the Kullback–Leibler divergence, assesses how much more precisely the true quantity is known after the measurement was made (the posterior probability distribution) than before (the prior probability distribution). It provides an upper bound for the contribution that an observation can make to the total likelihood score in likelihood-based crystallographic algorithms. This makes information gain a natural criterion for deciding which data can legitimately be omitted from likelihood calculations. Many existing methods use an approximation for the effects of measurement error that breaks down for very weak and poorly measured data. For such methods a different (higher) information threshold is appropriate compared with methods that account well for even large measurement errors. Concerns are raised about a current trend to deposit data that have been corrected for anisotropy, sharpened and pruned without including the original unaltered measurements. If not checked, this trend will have serious consequences for the reuse of deposited data by those who hope to repeat calculations using improved new methods.
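The simplest case takes a one-dimensional Gaussian prior N(μ₀, σ₀²) and a Gaussian posterior N(μ₁, σ₁²). There, the information gain KL(posterior ‖ prior) has the closed form ln(σ₀/σ₁) + (σ₁² + (μ₁ − μ₀)²)/(2σ₀²) − 1/2 nats; dividing by ln 2 converts it to bits. The sketch below evaluates this expression. It is a toy illustration of the definition only. Crystallographic intensity priors (e.g. Wilson distributions) are not Gaussian, and this is not the paper's calculation.

```python
from math import log


def gaussian_information_gain(mu0, sigma0, mu1, sigma1, bits=True):
    """KL divergence of a 1D Gaussian posterior N(mu1, sigma1^2) from a
    1D Gaussian prior N(mu0, sigma0^2); in bits by default, else nats."""
    if sigma0 <= 0 or sigma1 <= 0:
        raise ValueError("standard deviations must be positive")
    kl = log(sigma0 / sigma1) + (sigma1 ** 2 + (mu1 - mu0) ** 2) / (2 * sigma0 ** 2) - 0.5
    return kl / log(2) if bits else kl
```

A well-measured observation (σ₁ much smaller than σ₀) yields a large gain. A very poorly measured one (σ₁ close to σ₀, mean barely shifted) yields almost none, which is the kind of observation an information threshold would omit.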




Scaling diffraction data in the DIALS software package: algorithms and new approaches for multi-crystal scaling

In processing X-ray diffraction data, the intensities obtained from integration of the diffraction images must be corrected for experimental effects in order to place all intensities on a common scale both within and between data collections. Scaling corrects for effects such as changes in sample illumination, absorption and, to some extent, global radiation damage that cause the measured intensities of symmetry-equivalent observations to differ throughout a data set. This necessarily requires a prior evaluation of the point-group symmetry of the crystal. This paper describes and evaluates the scaling algorithms implemented within the DIALS data-processing package and demonstrates the effectiveness and key features of the implementation on example macromolecular crystallographic rotation data. In particular, the scaling algorithms enable new workflows for the scaling of multi-crystal or multi-sweep data sets, providing the analysis required to support current trends towards collecting data from ever-smaller samples. In addition, the implementation of a free-set validation method is discussed, which allows the quantification of the suitability of scaling-model and algorithm choices.
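The basic idea of placing symmetry-equivalent observations on a common scale can be illustrated with a toy model. The model has one multiplicative scale factor per image, refined by alternating least-squares updates of the merged intensities and the scale factors. This is a deliberately minimal sketch, not the DIALS scaling model, which includes smoothly varying scale, decay and absorption terms. The data layout and the unit weights are assumptions.

```python
from collections import defaultdict


def merge(observations, scales):
    """Least-squares merged intensity per hkl, given per-image scales."""
    num, den = defaultdict(float), defaultdict(float)
    for hkl, image, i_obs in observations:
        num[hkl] += scales[image] * i_obs
        den[hkl] += scales[image] ** 2
    return {hkl: num[hkl] / den[hkl] for hkl in num}


def scale_images(observations, n_images, n_cycles=20):
    """Alternate merging and per-image scale refinement under the toy model
    I_obs = g[image] * I[hkl] (unit weights, no decay or absorption terms).
    observations: (hkl, image_index, intensity) tuples, with hkl already mapped
    to the asymmetric unit so that symmetry equivalents share the same key."""
    scales = [1.0] * n_images
    for _ in range(n_cycles):
        merged = merge(observations, scales)
        num, den = [0.0] * n_images, [0.0] * n_images
        for hkl, image, i_obs in observations:
            num[image] += merged[hkl] * i_obs
            den[image] += merged[hkl] ** 2
        scales = [n / d if d > 0 else 1.0 for n, d in zip(num, den)]
        # The model is invariant under g -> c*g, I -> I/c; fix image 0 at 1
        scales = [g / scales[0] for g in scales]
    return scales, merge(observations, scales)
```

Fixing the first scale factor removes the one degree of freedom the model cannot determine. Without it, the scales and merged intensities could drift together by a common factor.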




Estimating signal and noise of time-resolved X-ray solution scattering data at synchrotrons and XFELs

Elucidating the structural dynamics of small molecules and proteins in the liquid solution phase is essential to ensure a fundamental understanding of their reaction mechanisms. In this regard, time-resolved X-ray solution scattering (TRXSS), also known as time-resolved X-ray liquidography (TRXL), has been established as a powerful technique for obtaining the structural information of reaction intermediates and products in the liquid solution phase and is expected to be applied to a wider range of molecules in the future. A TRXL experiment is generally performed at the beamline of a synchrotron or an X-ray free-electron laser (XFEL) to provide intense and short X-ray pulses. Considering the limited opportunities to use these facilities, it is necessary to verify the plausibility of a target experiment prior to the actual experiment. For this purpose, a program has been developed, referred to as S-cube, which is short for a Solution Scattering Simulator. This code allows the routine estimation of the shape and signal-to-noise ratio (SNR) of TRXL data from known experimental parameters. Specifically, S-cube calculates the difference scattering curve and the associated quantum noise on the basis of the molecular structure of the target reactant and product, the target solvent, the energy of the pump laser pulse and the specifications of the beamline to be used. Employing a simplified form for the pair-distribution function required to calculate the solute–solvent cross term greatly increases the calculation speed as compared with a typical TRXL data analysis. Demonstrative applications of S-cube are presented, including the estimation of the expected TRXL data and SNR level for the future LCLS-II HE beamlines.
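The quantum-noise part of such an estimate follows from Poisson statistics. Suppose N(q) photons are expected in a q bin for both the laser-on and laser-off measurements. The difference signal is then about N·ΔS/S and its standard deviation about √(2N), for small ΔS/S. The sketch below computes per-bin and combined SNR under those assumptions. It does not model S-cube's calculation of the difference scattering curve from molecular structures.

```python
from math import sqrt


def difference_snr(relative_signal, counts):
    """Per-bin SNR of a laser-on minus laser-off difference, assuming Poisson
    noise and equal expected counts N in the on and off measurements.
    relative_signal: dS(q)/S(q) per bin; counts: expected photons N(q) per bin."""
    snr = []
    for r, n in zip(relative_signal, counts):
        signal = r * n          # expected difference in photon counts
        noise = sqrt(2.0 * n)   # approx. std. dev. of N_on - N_off for small r
        snr.append(abs(signal) / noise if n > 0 else 0.0)
    return snr


def total_snr(per_bin):
    """Combine statistically independent bins in quadrature."""
    return sqrt(sum(s * s for s in per_bin))
```

Because N grows linearly with the number of accumulated X-ray pulses, the SNR grows with its square root. Quadrupling the exposure doubles the SNR.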




GIDVis: a comprehensive software tool for geometry-independent grazing-incidence X-ray diffraction data analysis and pole-figure calculations

GIDVis is a software package based on MATLAB specialized for, but not limited to, the visualization and analysis of grazing-incidence thin-film X-ray diffraction data obtained during sample rotation around the surface normal. GIDVis allows the user to perform detector calibration, data stitching, intensity corrections, standard data evaluation (e.g. cuts and integrations along specific reciprocal-space directions), crystal phase analysis etc. To take full advantage of the measured data in the case of sample rotation, pole figures can easily be calculated from the experimental data for any value of the scattering angle covered. As an example, GIDVis is applied to phase analysis and the evaluation of the epitaxial alignment of pentacene­quinone crystallites on a single-crystalline Au(111) surface.




ClickX: a visualization-based program for preprocessing of serial crystallography data

Serial crystallography is a powerful technique for structure determination using many small crystals at X-ray free-electron laser or synchrotron radiation facilities. The large volumes of diffraction data require high-throughput software to preprocess the raw images for subsequent analysis. ClickX is a program designed for serial crystallography data preprocessing. It is capable of rapid data sorting for online feedback and of peak-finding refinement by parameter optimization. The graphical user interface (GUI) provides convenient access to various operations such as pattern visualization, statistics plotting and parameter tuning. A batch job module is implemented to facilitate large-data-volume processing. A two-step geometry calibration for single-panel detectors is also integrated into the GUI. First, the beam center and detector tilt angles are optimized using an ellipse center shifting method. Then all six parameters, including the photon energy and detector distance, are refined together using a residual minimization method. Implemented in Python, ClickX has good portability and extensibility, so it can be installed, configured and used on any computing platform that provides a Python interface or common data file format. ClickX has been tested in online analysis at the Pohang Accelerator Laboratory X-ray Free-Electron Laser, Korea, and the Linac Coherent Light Source, USA. It has also been applied in post-experimental data analysis. The source code is available via https://github.com/LiuLab-CSRC/ClickX under a GNU General Public License.
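ClickX's first calibration step uses an ellipse center shifting method to locate the beam center. The sketch below pursues the same goal, recovering the beam center from points picked on a powder ring, with a simpler swapped-in technique: an algebraic least-squares circle fit (the Kåsa fit). It assumes an untilted detector, so that rings are circles rather than ellipses, and it is not ClickX's algorithm.

```python
from math import sqrt


def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if a[col][col] == 0:
            raise ValueError("singular system: points may be collinear")
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_ring_center(points):
    """Kasa circle fit: least-squares solution of x^2 + y^2 + D*x + E*y + F = 0.
    Returns (cx, cy, radius) in the same pixel units as the input points."""
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for x, y in points:
        row = (x, y, 1.0)
        rhs = -(x * x + y * y)
        for i in range(3):
            atb[i] += row[i] * rhs
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    d, e, f = solve3(ata, atb)  # normal equations (A^T A) p = A^T b
    cx, cy = -d / 2, -e / 2
    return cx, cy, sqrt(cx * cx + cy * cy - f)
```

A fit like this gives a reasonable starting point. Detector tilt turns rings into ellipses, which is why a full calibration goes on to refine tilt, distance and photon energy together.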




data

Recent developments in the Inorganic Crystal Structure Database: theoretical crystal structure data and related features

The Inorganic Crystal Structure Database (ICSD) is the world's largest database of fully evaluated and published crystal structure data, mostly obtained from experimental results. However, the purely experimental approach is no longer the only route to discover new compounds and structures. In the past few decades, numerous computational methods for simulating and predicting structures of inorganic solids have emerged, producing large amounts of theoretical crystal structure data. To take account of these new developments, the scope of the ICSD was extended in 2017 to include theoretical structures published in peer-reviewed journals. Each theoretical structure has been carefully evaluated, and the resulting CIF has been extended and standardized. Furthermore, a first classification of theoretical data in the ICSD is presented, including additional categories used for comparing experimental and theoretical information.




data

Efficient data extraction from neutron time-of-flight spin-echo raw data

Neutron spin-echo spectrometers with a position-sensitive detector and operating with extended time-of-flight-tagged wavelength frames are able to collect a comprehensive set of data covering a large range of wavevector and Fourier time space with only a few instrumental settings in a quasi-continuous way. Extracting all the information contained in the raw data and mapping them to a suitable physical space in the most efficient way is a challenge. This article reports algorithms employed in dedicated software, DrSpine (data reduction for spin echo), that achieves this goal and yields reliable representations of the intermediate scattering function S(Q, t) independent of the selected 'binning'.
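
Mapping raw data to physical space starts from the elastic momentum transfer Q = 4π sin θ / λ for each detector pixel (scattering angle 2θ) and time-of-flight wavelength λ. The sketch below assumes a hypothetical record layout in which every event already carries its Fourier time and a normalized echo amplitude, and simply averages amplitudes on a fixed (Q, t) grid. This naive gridding is exactly the kind of binning whose choice DrSpine's algorithms are designed to make irrelevant; it illustrates the mapping, not DrSpine's method.

```python
import bisect
import math
from collections import defaultdict


def momentum_transfer(two_theta_deg, wavelength_angstrom):
    """Elastic momentum transfer Q = 4*pi*sin(theta)/lambda, in inverse angstrom."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength_angstrom


def _bin_index(edges, value):
    """Index i with edges[i] <= value < edges[i + 1], or None if out of range."""
    i = bisect.bisect_right(edges, value) - 1
    return i if 0 <= i < len(edges) - 1 else None


def bin_echo_amplitudes(records, q_edges, t_edges):
    """Average normalised echo amplitudes on a (Q, Fourier time) grid.

    records: iterable of (two_theta_deg, wavelength_angstrom, fourier_time, amplitude)
    tuples. This layout is hypothetical; amplitude is assumed to be already
    normalised so that it estimates S(Q, t)/S(Q, 0).
    Returns {(iq, it): (mean_amplitude, n_records)}.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for two_theta, lam, t, amp in records:
        iq = _bin_index(q_edges, momentum_transfer(two_theta, lam))
        it = _bin_index(t_edges, t)
        if iq is None or it is None:
            continue  # event falls outside the chosen grid
        sums[(iq, it)] += amp
        counts[(iq, it)] += 1
    return {key: (sums[key] / counts[key], counts[key]) for key in counts}
```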




data

DatView: a graphical user interface for visualizing and querying large data sets in serial femtosecond crystallography

DatView is a new graphical user interface (GUI) for plotting parameters to explore correlations, identify outliers and export subsets of data. It was designed to simplify and expedite analysis of very large unmerged serial femtosecond crystallography (SFX) data sets composed of indexing results from hundreds of thousands of microcrystal diffraction patterns. However, DatView works with any tabulated data, offering its functionality to many applications outside serial crystallography. In DatView's user-friendly GUI, selections are drawn onto plots and synchronized across all other plots, so correlations between multiple parameters in large multi-parameter data sets can be rapidly identified. It also includes an item viewer for displaying images in the current selection alongside the associated metadata. For serial crystallography data processed by indexamajig from CrystFEL [White, Kirian, Martin, Aquila, Nass, Barty & Chapman (2012). J. Appl. Cryst. 45, 335–341], DatView generates a table of parameters and metadata from stream files and, optionally, the associated HDF5 files. By combining the functionality of several commonly needed tools for SFX in a single GUI that operates on tabulated data, the time needed to load and calculate statistics from large data sets is reduced. This paper describes how DatView facilitates (i) efficient feedback during data collection by examining trends in time, sample position or any parameter, (ii) determination of optimal indexing and integration parameters via the comparison mode, (iii) identification of systematic errors in unmerged SFX data sets, and (iv) sorting and highly flexible data filtering (plot selections, Boolean filters and more), including direct export of subset CrystFEL stream files for further processing.
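
DatView's core interaction (a selection drawn on one plot and shared with all other plots, combined with Boolean filters and followed by export of the subset) can be sketched on a plain table of rows. The column names and values below are hypothetical placeholders, not CrystFEL stream fields, and the functions are illustrations rather than DatView's API.

```python
import csv
import io


def rect_selection(rows, xkey, ykey, x_range, y_range):
    """Indices of rows falling inside a rectangle drawn on an x-y scatter plot."""
    (x0, x1), (y0, y1) = x_range, y_range
    return {i for i, row in enumerate(rows)
            if x0 <= row[xkey] <= x1 and y0 <= row[ykey] <= y1}


def apply_filter(rows, selection, predicate):
    """Keep rows that are in the shared selection AND pass a Boolean filter."""
    return [row for i, row in enumerate(rows) if i in selection and predicate(row)]


def export_csv(rows, fieldnames):
    """Serialise a subset of rows as CSV text (columns not listed are ignored)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical per-pattern table: event id, unit-cell a axis (angstrom), peak count, indexing flag.
table = [
    {"event": 1, "a": 79.1, "peaks": 40, "indexed": True},
    {"event": 2, "a": 85.0, "peaks": 55, "indexed": True},
    {"event": 3, "a": 79.3, "peaks": 12, "indexed": False},
    {"event": 4, "a": 79.0, "peaks": 60, "indexed": True},
]
selected = rect_selection(table, "a", "peaks", (78.5, 80.0), (30, 100))
subset = apply_filter(table, selected, lambda r: r["indexed"] and r["peaks"] > 50)
print(export_csv(subset, ["event", "peaks"]))
```

Keeping the selection as a set of row indices is what allows the same selection to be highlighted on every other plot of the table.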




data

Fast fitting of reflectivity data of growing thin films using neural networks

X-ray reflectivity (XRR) is a powerful and popular scattering technique that can give valuable insight into the growth behavior of thin films. This study shows how a simple artificial neural network model can be used to determine the thickness, roughness and density of thin films of different organic semiconductors [diindenoperylene, copper(II) phthalocyanine and α-sexithiophene] on silica from their XRR data with millisecond computation time and with minimal user input or a priori knowledge. For a large experimental data set of 372 XRR curves, it is shown that a simple fully connected model can provide good results with a mean absolute percentage error of 8–18% when compared with the results obtained by a genetic least mean squares fit using the classical Parratt formalism. Furthermore, current drawbacks and prospects for improvement are discussed.
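
The reported accuracy is a mean absolute percentage error (MAPE) relative to the genetic-algorithm Parratt fits. A minimal sketch of that metric, together with a single fully connected layer of the kind such a model stacks, is shown below. The log scaling of the input curve is a common preprocessing choice assumed here, not taken from the study, and no trained weights are implied.

```python
import math


def mape(predicted, reference):
    """Mean absolute percentage error, in percent."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs((p - r) / r) for p, r in zip(predicted, reference)) / len(reference)


def dense(x, weights, biases, activation=None):
    """One fully connected layer: y_j = act(sum_i W[j][i] * x[i] + b[j])."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def relu(v):
    return max(0.0, v)


def log_scale(reflectivity, floor=1e-12):
    """Log10 of the reflectivity curve (assumed preprocessing), clipped at `floor`."""
    return [math.log10(max(r, floor)) for r in reflectivity]
```

A full model would chain several `dense` layers, ending in a linear layer with three outputs for thickness, roughness and density.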




data

PyMDA: microcrystal data assembly using Python

Recent developments at microdiffraction X-ray beamlines are making microcrystals of macromolecules appealing subjects for routine structural analysis. Microcrystal diffraction data collected at synchrotron microdiffraction beamlines may suffer from radiation damage, may be incomplete for each individual microcrystal, and may show unit-cell variations between crystals. A multi-stage data assembly method has previously been designed for microcrystal synchrotron crystallography. Here the strategy has been implemented as a Python program for microcrystal data assembly (PyMDA). PyMDA optimizes microcrystal data quality, including weak anomalous signals, through iterative crystal and frame rejections. Beyond microcrystals, PyMDA may be applicable for assembling data sets from larger crystals for improved data quality.
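
Iterative crystal rejection can be sketched as follows, assuming each crystal is represented as a mapping from Miller index to intensity: score every crystal by the Pearson correlation between its intensities and the merge of all other crystals, drop the worst one if its score falls below a threshold, and repeat. The threshold, the minimum number of common reflections and the scoring rule are assumptions; PyMDA's actual rejection criteria (including frame-level rejection) are not reproduced.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def merge(crystals, exclude=None):
    """Mean intensity per reflection over all crystals except `exclude`."""
    sums, counts = {}, {}
    for name, data in crystals.items():
        if name == exclude:
            continue
        for hkl, inten in data.items():
            sums[hkl] = sums.get(hkl, 0.0) + inten
            counts[hkl] = counts.get(hkl, 0) + 1
    return {hkl: sums[hkl] / counts[hkl] for hkl in sums}


def reject_crystals(crystals, cc_min=0.5, min_common=3):
    """Drop the worst-correlated crystal until all remaining crystals score >= cc_min."""
    kept = dict(crystals)
    while len(kept) > 2:
        scores = {}
        for name, data in kept.items():
            ref = merge(kept, exclude=name)
            common = [h for h in data if h in ref]
            if len(common) < min_common:
                scores[name] = -1.0  # too little overlap to judge: treat as worst
                continue
            scores[name] = pearson([data[h] for h in common], [ref[h] for h in common])
        worst = min(scores, key=scores.get)
        if scores[worst] >= cc_min:
            break
        del kept[worst]
    return kept
```

Scoring against the merge of the *other* crystals, rather than of all crystals, keeps an outlier from pulling the reference toward itself.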




data

Simulation of small-angle X-ray scattering data of biological macromolecules in solution

This article presents IMSIM, an application to simulate two-dimensional small-angle X-ray scattering patterns and, from these, one-dimensional profiles of biological macromolecules in solution. IMSIM implements a statistical approach yielding two-dimensional images in TIFF, CBF or EDF format, which may be readily processed by existing data-analysis pipelines. Intensities and error estimates of one-dimensional patterns obtained from the radial average of the two-dimensional images exhibit the same statistical properties as observed with actual experimental data. If the initial input is on an absolute scale (cm−1 per mg ml−1 of solute concentration c), the simulated data frames may also be scaled to absolute scale such that the forward scattering after subtraction of the background is proportional to the molecular weight of the solute. The effects of changes in concentration, exposure time, flux, wavelength, sample–detector distance, detector dimensions, pixel size and mask, as well as the incident beam position, can be considered in the simulation. The simulated data may be used in method development, for educational purposes, and also to determine the most suitable beamline setup for a project prior to the application and use of the actual beamtime. IMSIM is available as part of the ATSAS software package (3.0.0) and is freely available for academic use (http://www.embl-hamburg.de/biosaxs/download.html).
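
The statistical approach can be illustrated with a minimal sketch: draw Poisson-distributed counts per pixel whose expectation follows an ideal radial profile, then radially average the image and attach a Poisson error estimate (square root of the summed counts divided by the number of pixels). The function names, the flat test profile and the absence of detector effects, masks and background are simplifications assumed here; this is not IMSIM's implementation.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for the modest per-pixel means used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_image(profile, shape, center, rng):
    """Poisson counts whose expectation at radius r (pixels) is profile(r)."""
    ny, nx = shape
    cy, cx = center
    return [[poisson(profile(math.hypot(x - cx, y - cy)), rng) for x in range(nx)]
            for y in range(ny)]


def radial_average(image, center, bin_width=1.0):
    """Return [(r_mid, mean_counts, poisson_sem)] for each occupied annulus."""
    cy, cx = center
    totals, npix = {}, {}
    for y, row in enumerate(image):
        for x, counts in enumerate(row):
            b = int(math.hypot(x - cx, y - cy) // bin_width)
            totals[b] = totals.get(b, 0) + counts
            npix[b] = npix.get(b, 0) + 1
    out = []
    for b in sorted(totals):
        n = npix[b]
        # Var(sum of Poisson counts) = sum of means, estimated by the observed total.
        out.append(((b + 0.5) * bin_width, totals[b] / n, math.sqrt(totals[b]) / n))
    return out
```

For example, `radial_average(simulate_image(lambda r: 50.0, (41, 41), (20, 20), random.Random(0)), (20, 20))` yields a flat profile near 50 counts with error bars shrinking as annuli contain more pixels.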




data

Reconstructing intragranular strain fields in polycrystalline materials from scanning 3DXRD data

Two methods for reconstructing intragranular strain fields are developed for scanning three-dimensional X-ray diffraction (3DXRD). The methods are compared with a third approach where voxels are reconstructed independently of their neighbours [Hayashi, Setoyama & Seno (2017). Mater. Sci. Forum, 905, 157–164]. The 3D strain field of a tin grain, located within a sample of approximately 70 grains, is analysed and compared across reconstruction methods. Implicit assumptions of sub-problem independence, made in the independent voxel reconstruction method, are demonstrated to introduce bias and reduce reconstruction accuracy. It is verified that the two proposed methods remedy these problems by taking the spatial properties of the inverse problem into account. Improvements in reconstruction quality achieved by the two proposed methods are further supported by reconstructions using synthetic diffraction data.
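
The difference between independent-voxel reconstruction and methods that account for the spatial structure of the problem can be illustrated with a deliberately simplified one-dimensional toy, which is not either of the paper's methods. Given noisy per-voxel strain estimates y, the independent estimate is y itself; a spatially coupled estimate minimises the squared misfit to y plus λ times the squared differences between neighbouring voxels, which amounts to solving the tridiagonal system (I + λL)x = y. The regularization weight λ is a hypothetical tuning parameter.

```python
def smooth_field(y, lam):
    """Solve (I + lam*L) x = y, where L is the 1D Laplacian with free ends.

    This minimises sum_i (x_i - y_i)^2 + lam * sum_i (x_{i+1} - x_i)^2;
    lam = 0 reproduces the independent (per-voxel) estimate.
    Uses the Thomas algorithm for the tridiagonal system.
    """
    n = len(y)
    if n < 2:
        return list(y)
    diag = [1.0 + lam * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = -lam  # identical sub- and super-diagonal entries
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = y[0] / diag[0]
    for i in range(1, n):  # forward sweep
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom if i < n - 1 else 0.0
        d[i] = (y[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

Because every column of L sums to zero, the coupled solution preserves the sum of y while spreading an isolated spike over its neighbours, which is the basic mechanism by which coupling suppresses voxel-wise noise.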




data

PtychoShelves, a versatile high-level framework for high-performance analysis of ptychographic data

Over the past decade, ptychography has proven to be a robust tool for non-destructive, high-resolution, quantitative electron, X-ray and optical microscopy. It allows for quantitative reconstruction of the specimen's transmissivity, as well as recovery of the illuminating wavefront. Additionally, various algorithms have been developed to account for systematic errors and to improve convergence. With fast ptychographic microscopes and more advanced algorithms, both the complexity of the reconstruction task and the data volume increase significantly. PtychoShelves is a software package that combines high-level modularity, for easy and fast changes to the data-processing pipeline, with high-performance computing on CPUs and GPUs.
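
PtychoShelves bundles several reconstruction engines; the sketch below shows only the two elementary steps common to ePIE-style iterative ptychography for one scan position, in one dimension and with a naive DFT so that it runs without NumPy. The steps are the Fourier modulus projection (keep the phase of the propagated exit wave, impose the measured amplitude) and the object update O ← O + α P*(ψ' − ψ)/max|P|². It is an illustration of the technique, not PtychoShelves code, and the step size α is an assumed parameter.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
            for m in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[m] * cmath.exp(2j * math.pi * k * m / n) for m in range(n)) / n
            for k in range(n)]


def modulus_projection(exit_wave, measured_amplitude):
    """Replace Fourier magnitudes by the measured ones while keeping the phases."""
    spectrum = dft(exit_wave)
    projected = [a * f / abs(f) if abs(f) > 0 else complex(a)
                 for f, a in zip(spectrum, measured_amplitude)]
    return idft(projected)


def epie_object_update(obj, probe, exit_wave, corrected_wave, alpha=1.0):
    """ePIE-style object update: O += alpha * conj(P) * (psi' - psi) / max|P|^2."""
    max_p2 = max(abs(p) ** 2 for p in probe)
    return [o + alpha * p.conjugate() * (new - old) / max_p2
            for o, p, old, new in zip(obj, probe, exit_wave, corrected_wave)]
```

In a full reconstruction these two steps are repeated over all scan positions, with an analogous update for the probe.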




data

ACMS: a database of alternate conformations found in the atoms of main and side chains of protein structures

An online knowledge base on the alternate conformations adopted by main-chain and side-chain atoms in protein structures solved by X-ray crystallography is described.
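
Alternate conformations in PDB-format coordinate files are flagged by the alternate-location indicator in column 17 of ATOM and HETATM records. A minimal sketch of how residues could be classified by whether the main chain or the side chain carries alternates is shown below; taking N, CA, C and O as main-chain atoms is a common convention assumed here, the function is hypothetical, and mmCIF input is not handled.

```python
MAIN_CHAIN = {"N", "CA", "C", "O"}  # assumed main-chain atom names


def classify_alt_confs(pdb_lines):
    """Return {(chain, resSeq, resName): {"main": bool, "side": bool}} for residues with altLocs.

    Uses the fixed PDB columns: atom name 13-16, altLoc 17, resName 18-20,
    chainID 22, resSeq 23-26 (1-based, so slices below are 0-based).
    """
    result = {}
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        if line[16] == " ":
            continue  # no alternate location for this atom
        atom = line[12:16].strip()
        key = (line[21], int(line[22:26]), line[17:20])
        entry = result.setdefault(key, {"main": False, "side": False})
        entry["main" if atom in MAIN_CHAIN else "side"] = True
    return result
```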




data

Accurate high-resolution single-crystal diffraction data from a Pilatus3 X CdTe detector

Detailed analysis of the high-flux deficiencies of pixel-array detectors leads to a protocol for the measurement of structure factors of unprecedented accuracy even for inorganic materials, and this significantly advances the prospects for experimental electron-density investigations.
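
One well-known high-flux deficiency of photon-counting pixel detectors is count-rate loss caused by dead time. The paper's correction protocol is not reproduced here; the sketch below only inverts the two textbook dead-time models, the non-paralyzable m = n/(1 + nτ) and the paralyzable m = n exp(−nτ), where m is the measured rate, n the true rate and τ the dead time. The τ value used in any application must come from detector calibration.

```python
import math


def true_rate_nonparalyzable(measured, tau):
    """Invert m = n / (1 + n*tau)."""
    if measured * tau >= 1.0:
        raise ValueError("measured rate exceeds the model's saturation limit 1/tau")
    return measured / (1.0 - measured * tau)


def true_rate_paralyzable(measured, tau, tol=1e-12, max_iter=100):
    """Invert m = n * exp(-n*tau) on the low-rate branch (n*tau < 1) by Newton's method."""
    if measured * tau >= 1.0 / math.e:
        raise ValueError("measured rate exceeds the paralyzable maximum 1/(e*tau)")
    n = measured  # start below the root; f is concave there, so Newton increases monotonically
    for _ in range(max_iter):
        f = n * math.exp(-n * tau) - measured
        fprime = math.exp(-n * tau) * (1.0 - n * tau)
        step = f / fprime
        n -= step
        if abs(step) <= tol * n:
            return n
    raise RuntimeError("Newton iteration did not converge")
```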




data

sasPDF: pair distribution function analysis of nanoparticle assemblies from small-angle scattering data

The sasPDF method, an extension of the atomic pair distribution function (PDF) analysis to the small-angle scattering (SAS) regime, is presented. The method is applied to characterize the structure of nanoparticle assemblies with different levels of structural order.
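
In the standard formulation the PDF is the sine Fourier transform G(r) = (2/π) ∫ Q[S(Q) − 1] sin(Qr) dQ. A minimal sketch using the trapezoidal rule over a finite Q range follows; the SAS-specific data reduction and corrections that sasPDF applies before the transform are not included, and truncating Q introduces termination ripples.

```python
import math


def pdf_transform(q, s, r_values):
    """G(r) = (2/pi) * integral of Q[S(Q) - 1] sin(Qr) dQ via the trapezoidal rule.

    q: increasing momentum-transfer grid; s: S(Q) sampled on that grid.
    """
    f = [qi * (si - 1.0) for qi, si in zip(q, s)]  # reduced structure function F(Q)
    out = []
    for r in r_values:
        g = [fi * math.sin(qi * r) for fi, qi in zip(f, q)]
        integral = sum(0.5 * (g[i] + g[i + 1]) * (q[i + 1] - q[i]) for i in range(len(q) - 1))
        out.append(2.0 / math.pi * integral)
    return out
```

For assemblies of nanoparticles the relevant distances are interparticle separations, so the Q range of interest shifts to the small-angle regime, but the transform itself has the same form.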




data

Equatorial aberration of powder diffraction data collected with an Si strip X-ray detector by a continuous-scan integration method

Exact and approximate formulas for equatorial aberration of a continuous-scan Si strip detector are compared.




data

Pattern matching indexing of Laue and monochromatic serial crystallography data for applications in Materials Science

An algorithm based on the matching of q-vector pairs is combined with three-dimensional pattern matching using a nearest-neighbor approach to index Laue and monochromatic serial crystallography data recorded on small-unit-cell samples.
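
For polychromatic Laue data only the directions of the scattering vectors are known, so the angles between pairs of q-vectors are useful orientation-independent descriptors. The sketch below computes all pairwise angles for the observed vectors and counts how many have a nearest neighbour in a sorted reference table (built from a candidate lattice) within a tolerance, using a binary search as a one-dimensional nearest-neighbour lookup. It illustrates the idea of pair matching only; the paper's algorithm and its three-dimensional pattern matching are not reproduced.

```python
import bisect
import itertools
import math


def pair_angles(vectors):
    """Angles (degrees) between every pair of q-vectors; lengths are irrelevant."""
    angles = []
    for a, b in itertools.combinations(vectors, 2):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        cos = max(-1.0, min(1.0, dot / (na * nb)))  # guard against rounding outside [-1, 1]
        angles.append(math.degrees(math.acos(cos)))
    return angles


def count_matches(observed, reference_sorted, tol_deg):
    """Count observed angles whose nearest reference angle lies within tol_deg."""
    hits = 0
    for ang in observed:
        i = bisect.bisect_left(reference_sorted, ang)
        neighbours = reference_sorted[max(i - 1, 0):i + 1]
        if neighbours and min(abs(ang - r) for r in neighbours) <= tol_deg:
            hits += 1
    return hits
```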




data

EDDIDAT: a graphical user interface for the analysis of energy-dispersive diffraction data

EDDIDAT is a program that provides a graphical user interface (GUI) for the evaluation of energy-dispersive X-ray diffraction data, with a focus on depth-resolved residual stress analysis.
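
In energy-dispersive diffraction the detector sits at a fixed angle 2θ, and lattice spacings follow from photon energy via d = hc/(2E sin θ). A common building block of residual stress evaluation is the sin²ψ method: for a biaxial stress state the strain measured at tilt ψ varies linearly with sin²ψ with slope ½S₂σ, where ½S₂ is a diffraction elastic constant. The sketch below assumes this model and a known strain-free spacing d0; whether EDDIDAT evaluates stresses this way, and its depth-resolved extensions, are not covered.

```python
import math

HC_KEV_ANGSTROM = 12.398419843320026  # h*c in keV*angstrom


def d_spacing(energy_kev, two_theta_deg):
    """Bragg's law for energy-dispersive diffraction: d = hc / (2 E sin(theta))."""
    return HC_KEV_ANGSTROM / (2.0 * energy_kev * math.sin(math.radians(two_theta_deg) / 2.0))


def sin2psi_stress(psi_deg, energies_kev, two_theta_deg, d0, half_s2):
    """Fit strain vs sin^2(psi) by least squares; return sigma = slope / (1/2 S2).

    half_s2 in 1/MPa gives sigma in MPa (units are the caller's responsibility).
    """
    xs = [math.sin(math.radians(p)) ** 2 for p in psi_deg]
    ys = [d_spacing(e, two_theta_deg) / d0 - 1.0 for e in energies_kev]  # lattice strain
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope / half_s2
```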




data

Crystallization of chiral molecular compounds: what can be learned from the Cambridge Structural Database?

A detailed study on chiral compound structures found in the Cambridge Structural Database (CSD) is presented. Solvates, salts and co-crystals have intentionally been excluded, in order to focus on the most basic structures of single enantiomers, scalemates and racemates. Similarity between the latter and structures of achiral monomolecular compounds has been established and utilized to arrive at important conclusions about crystallization of chiral compounds. For example, the fundamental phenomenon of conglomerate formation and, in particular, the frequency with which conglomerates occur are addressed. In addition, rarely occurring kryptoracemates and scalemic compounds (anomalous racemates) are discussed. Finally, an extended search for enantiomer solid solutions in the CSD is performed, showing that up to 1800 instances are most probably hiding among the deposited crystal structures, while only a couple of dozen have previously been known and studied.




data

TAAM: a reliable and user friendly tool for hydrogen-atom location using routine X-ray diffraction data

Hydrogen is present in almost all of the molecules in living things. It is very reactive and forms bonds with most of the elements, terminating their valences and enhancing their chemistry. X-ray diffraction is the most common method for structure determination. It relies on the scattering of X-rays by electron density, so the single electron of hydrogen is difficult to detect. Generally, neutron diffraction data are used to determine the accurate positions of hydrogen atoms. However, the requirement for good-quality single crystals, costly maintenance and the limited number of neutron diffraction facilities mean that this kind of result is rarely available. Here it is shown that using the Transferable Aspherical Atom Model (TAAM) instead of the Independent Atom Model (IAM) in routine structure refinement with X-ray data is another possible solution, which largely improves the precision and accuracy of X—H bond lengths and makes them comparable to averaged neutron bond lengths. TAAM, built from a pseudoatom databank, was used to determine the X—H bond lengths in 75 data sets for organic molecular crystals. TAAM parametrizations available in the modified University of Buffalo Databank (UBDB) of pseudoatoms, applied through the DiSCaMB software library, were used. The averaged bond lengths determined by TAAM refinements with X-ray diffraction data of atomic resolution (dmin ≤ 0.83 Å) showed very good agreement with neutron data, mostly within one sample standard deviation, much like Hirshfeld atom refinement (HAR). Atomic displacements for both hydrogen and non-hydrogen atoms obtained from the refinements systematically differed from IAM results. Overall, TAAM gave better fits to experimental data of standard resolution than IAM.
The research was accompanied by the development of software aimed at providing user-friendly tools for using aspherical atom models in the refinement of organic molecules at speeds comparable to routine refinements based on the spherical atom model.
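
The comparison with neutron data rests on two simple computations: X-H bond lengths from refined coordinates, and the fraction of those lengths lying within k sample standard deviations of the averaged neutron value. A minimal sketch follows, assuming fractional coordinates and the common Cartesian convention with a along x and b in the x-y plane. It is not part of DiSCaMB, and the reference values passed to it must come from actual neutron statistics.

```python
import math


def frac_to_cart(frac, a, b, c, alpha, beta, gamma):
    """Convert fractional to Cartesian coordinates (angstrom).

    Convention (an assumption; programs differ): a along x, b in the x-y plane,
    cell angles in degrees.
    """
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    volume = a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)
    u, v, w = frac
    x = a * u + b * cg * v + c * cb * w
    y = b * sg * v + c * (ca - cb * cg) / sg * w
    z = volume / (a * b * sg) * w
    return (x, y, z)


def bond_length(frac1, frac2, cell):
    """Distance between two atoms; cell = (a, b, c, alpha, beta, gamma)."""
    return math.dist(frac_to_cart(frac1, *cell), frac_to_cart(frac2, *cell))


def fraction_within_sd(lengths, reference_mean, reference_sd, k=1.0):
    """Fraction of refined lengths within k standard deviations of the reference mean."""
    hits = sum(1 for d in lengths if abs(d - reference_mean) <= k * reference_sd)
    return hits / len(lengths)
```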




data

TAAM: a reliable and user friendly tool for hydrogen-atom location using routine X-ray diffraction data

Using the Transferable Aspherical Atom Model (TAAM) instead of the Independent Atom Model (IAM), applied through the DiSCaMB software library, in structure refinement against X-ray diffraction data largely improves the X—H bond lengths and makes them comparable to the averaged neutron bond lengths.




data

Crystallization of chiral molecular compounds: what can be learned from the Cambridge Structural Database?

A study on chiral monomolecular compound structures found in the Cambridge Structural Database is presented.