Structure Similarity

Revision as of 17:42, 4 April 2018


The similarity between two structures i and j is assessed on the basis of local coordination information from all sites in the two structures. [1] [2]

Site Fingerprints

The similarity calculation begins by computing a crystal site fingerprint, v^site, for each site in the two structures. The fingerprint is a 48-dimensional vector. Each element carries information about the local coordination environment and is computed with the site module of the Python package matminer.

For example, the first element, "wt CN1," gives the weight with which the site should be considered 1-fold coordinated (w|CN=1). The second element, "wt CN2," gives the corresponding weight for 2-fold coordination. The third element, "L-shaped CN2," measures how closely the site resembles an L-shaped coordination geometry, given a coordination environment with 2 neighbors (qL|CN=2); such a measure is also called a local structure order parameter.

The remaining elements are: "water-like CN2," "bent 120 degrees CN2," "bent 150 degrees CN2," "linear CN2," "wt CN3," "trigonal planar CN3," "trigonal non-coplanar CN3," "T-shaped CN3," "wt CN4," "square co-planar CN4," "tetrahedral CN4," "rectangular see-saw-like CN4," "see-saw-like CN4," "trigonal pyramidal CN4," "wt CN5," "pentagonal planar CN5," "square pyramidal CN5," "trigonal bipyramidal CN5," "wt CN6," "hexagonal planar CN6," "octahedral CN6," "pentagonal pyramidal CN6," "wt CN7," "hexagonal pyramidal CN7," "pentagonal bipyramidal CN7," "wt CN8," "body-centered cubic CN8," "hexagonal bipyramidal CN8," "wt CN9," "q2 CN9," "q4 CN9," "q6 CN9," "wt CN10," "q2 CN10," "q4 CN10," "q6 CN10," "wt CN11," "q2 CN11," "q4 CN11," "q6 CN11," "wt CN12," "cuboctahedral CN12," "q2 CN12," "q4 CN12," "q6 CN12." Here, qn denotes the Steinhardt bond-orientational order parameter of order n. The resulting site fingerprint is thus defined as:

\mathbf{v}^\mathrm{site} = [w|_{\mathrm{CN}=1}, \quad w|_{\mathrm{CN}=2}, \quad q_\mathrm{L}|_{\mathrm{CN}=2}, \quad q_\mathrm{water}|_{\mathrm{CN}=2}, \quad \dots, \quad q_{6}|_{\mathrm{CN}=12}]^\mathrm{T}
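The element ordering above can be captured in a small lookup table. The Python sketch below rebuilds the 48 labels from the list in the text and checks the dimension.

MOTIFS_BY_CN and site_fingerprint_labels are hypothetical helpers for illustration, not part of matminer. The label strings are copied from the text, not read from matminer's own feature labels.

```python
# Hypothetical lookup table (not a matminer API): motif labels per
# coordination number (CN), in the order listed in the text. Each CN
# group is preceded by its "wt" (coordination-number weight) element.
MOTIFS_BY_CN = {
    1: [],
    2: ["L-shaped", "water-like", "bent 120 degrees", "bent 150 degrees", "linear"],
    3: ["trigonal planar", "trigonal non-coplanar", "T-shaped"],
    4: ["square co-planar", "tetrahedral", "rectangular see-saw-like",
        "see-saw-like", "trigonal pyramidal"],
    5: ["pentagonal planar", "square pyramidal", "trigonal bipyramidal"],
    6: ["hexagonal planar", "octahedral", "pentagonal pyramidal"],
    7: ["hexagonal pyramidal", "pentagonal bipyramidal"],
    8: ["body-centered cubic", "hexagonal bipyramidal"],
    9: ["q2", "q4", "q6"],
    10: ["q2", "q4", "q6"],
    11: ["q2", "q4", "q6"],
    12: ["cuboctahedral", "q2", "q4", "q6"],
}


def site_fingerprint_labels():
    """Return the element labels of v^site in the order given in the text."""
    labels = []
    for cn, motifs in MOTIFS_BY_CN.items():  # dicts keep insertion order
        labels.append("wt CN{}".format(cn))
        labels.extend("{} CN{}".format(m, cn) for m in motifs)
    return labels


LABELS = site_fingerprint_labels()
print(len(LABELS))  # 48
print(LABELS[:3])   # ['wt CN1', 'wt CN2', 'L-shaped CN2']
```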

Structure Fingerprints

The site fingerprints of a given structure are then aggregated statistically. For each coordination-information element, the minimum, maximum, mean, and standard deviation over all sites are computed. The resulting ordered vector defines the structure fingerprint, v^struct:

\mathbf{v}^\mathrm{struct} = [\min(w|_{\mathrm{CN}=1}), \quad \max(w|_{\mathrm{CN}=1}), \quad \mathrm{mean}(w|_{\mathrm{CN}=1}), \quad \mathrm{std}(w|_{\mathrm{CN}=1}), \quad \dots, \quad \min(q_{6}|_{\mathrm{CN}=12}), \quad \max(q_{6}|_{\mathrm{CN}=12}), \quad \mathrm{mean}(q_{6}|_{\mathrm{CN}=12}), \quad \mathrm{std}(q_{6}|_{\mathrm{CN}=12})]^\mathrm{T}
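The aggregation step can be sketched with NumPy. The function takes an (n_sites × 48) array of site fingerprints and returns the minimum, maximum, mean, and standard deviation of each element, interleaved element by element as in the equation above.

This is an illustrative sketch, not matminer's SiteStatsFingerprint. structure_fingerprint is a hypothetical name. Using the population standard deviation (ddof=0) is an assumption.

```python
import numpy as np


def structure_fingerprint(site_fps):
    """Aggregate per-site fingerprints (n_sites x n_features) into
    [min, max, mean, std] for each feature, interleaved feature by feature.
    Hypothetical helper, not the matminer implementation."""
    site_fps = np.asarray(site_fps, dtype=float)
    stats = np.stack([
        site_fps.min(axis=0),
        site_fps.max(axis=0),
        site_fps.mean(axis=0),
        site_fps.std(axis=0),  # population std (ddof=0): an assumption
    ], axis=1)                 # shape (n_features, 4)
    return stats.ravel()       # [min_1, max_1, mean_1, std_1, min_2, ...]


# Toy input: two sites with two features each (not real 48-element fingerprints).
v = structure_fingerprint([[1.0, 0.0],
                           [0.0, 0.5]])
# Feature 1 gives 0, 1, 0.5, 0.5; feature 2 gives 0, 0.5, 0.25, 0.25.
print(v)
```

For real 48-element site fingerprints, this layout yields a vector of 48 × 4 = 192 entries.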


Structure Distance/Dissimilarity

Finally, structure similarity is determined by the Euclidean distance, d, between two structure fingerprints v_i^struct and v_j^struct:

d = || \mathbf{v}_{i}^\mathrm{struct} - \mathbf{v}_{j}^\mathrm{struct} ||
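The distance itself is a one-liner in NumPy. The Python snippet on this page divides each fingerprint by its Euclidean norm before taking the difference, and this sketch follows that convention.

structure_distance is a hypothetical helper name. The toy vectors stand in for real fingerprints.

```python
import numpy as np


def structure_distance(v_i, v_j):
    """Euclidean distance between two structure fingerprints after scaling
    each to unit length (the convention used in this page's snippet)."""
    v_i = np.asarray(v_i, dtype=float)
    v_j = np.asarray(v_j, dtype=float)
    v_i = v_i / np.linalg.norm(v_i)
    v_j = v_j / np.linalg.norm(v_j)
    return float(np.linalg.norm(v_i - v_j))


# Parallel vectors give zero distance; orthogonal unit vectors give sqrt(2).
print(round(structure_distance([1, 2, 0], [2, 4, 0]), 4))  # 0.0
print(round(structure_distance([1, 0, 0], [0, 3, 0]), 4))  # 1.4142
```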

A small distance indicates high similarity between two structures, whereas a large distance (1 to at most 4.8) suggests that the structures are very dissimilar. The spinel example gives an approximate threshold (0.73 to 0.75) up to which two structures can still be considered similar. Beyond 0.75, the two structures almost certainly do not share the same structure prototype. However, we have observed that porous materials such as zeolites are an exception to this rough rule, and we are currently working on addressing this open issue.
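The rule of thumb above can be written as a small classifier. It is one possible reading of the stated 0.73 to 0.75 range, treating distances inside that range as borderline.

classify_distance and its category names are hypothetical. As noted above, the rule does not hold for porous materials such as zeolites.

```python
def classify_distance(d, lower=0.73, upper=0.75):
    """Rough interpretation of a structure distance d (hypothetical helper).

    Distances below `lower` are read as similar, distances in
    [lower, upper] as borderline, and distances above `upper` as
    different structure prototypes. Not reliable for zeolites and
    other porous materials."""
    if d < lower:
        return "similar"
    if d <= upper:
        return "borderline"
    return "dissimilar"


for d in (0.0, 0.74, 1.4142):
    print(d, classify_distance(d))
```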


  • Diamond (mp-66) vs. perfect CaTiO3 perovskite (mp-5827) → d = 1.4142
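The value d = 1.4142 can be interpreted with a short derivation. It assumes, as in the Python snippet on this page, that both fingerprints are scaled to unit length, written here as \hat{\mathbf{v}}. For unit vectors, the squared distance depends only on the angle θ_ij between them:

```latex
d^2 = \|\hat{\mathbf{v}}_{i}\|^2 + \|\hat{\mathbf{v}}_{j}\|^2 - 2\,\hat{\mathbf{v}}_{i}\cdot\hat{\mathbf{v}}_{j} = 2 - 2\cos\theta_{ij}
d = 1.4142 \approx \sqrt{2} \quad\Rightarrow\quad \cos\theta_{ij} \approx 0
```

Under that assumption, the normalized diamond and perovskite fingerprints are essentially orthogonal.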

The Python snippet below reproduces the results above. It requires pymatgen and matminer, both available from the Python Package Index, and a Materials Project API key for MPRester.

import numpy as np
from pymatgen.ext.matproj import MPRester  # older pymatgen releases: from pymatgen import MPRester
from matminer.featurizers.site import CrystalSiteFingerprint
from matminer.featurizers.structure import SiteStatsFingerprint

# Materials Project IDs of the example structures.
material_ids = {
    "diamond": "mp-66",
    "GaAs": "mp-2534",
    "rocksalt": "mp-22862",
    "perovskite": "mp-5827",
}

# Structure featurizer: statistics over the per-site crystal site fingerprints.
ssf = SiteStatsFingerprint(CrystalSiteFingerprint.from_preset('cn'))

fingerprints = {}
with MPRester() as mpr:  # uses the configured Materials Project API key
    for name, mp_id in material_ids.items():
        structure = mpr.get_structure_by_material_id(mp_id)
        v = np.array(ssf.featurize(structure))
        # Normalize each structure fingerprint to unit Euclidean length.
        fingerprints[name] = v / np.linalg.norm(v)

# Print out distances between structures.
for a, b in [("diamond", "GaAs"), ("diamond", "rocksalt"),
             ("diamond", "perovskite"), ("rocksalt", "perovskite")]:
    d = np.linalg.norm(fingerprints[a] - fingerprints[b])
    print('Distance between {} and {}: {:.4f}'.format(a, b, d))
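To compare more than a handful of structures, computing each pair separately as in the snippet above becomes repetitive. A NumPy broadcasting sketch can compute all pairwise distances at once.

The sketch assumes the fingerprints have already been featurized (for example, with ssf.featurize) and stacked into a 2-D array. distance_matrix is a hypothetical helper, not a matminer function. The toy rows below are not real fingerprints.

```python
import numpy as np


def distance_matrix(fingerprints):
    """Pairwise Euclidean distances between unit-normalized fingerprints.

    `fingerprints` is an (n_structures x n_features) array; the result is a
    symmetric (n_structures x n_structures) matrix with zeros on the diagonal."""
    X = np.asarray(fingerprints, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # normalize each row
    diff = X[:, None, :] - X[None, :, :]              # all pairwise differences
    return np.linalg.norm(diff, axis=-1)


# Toy fingerprints: rows 0 and 1 point the same way, row 2 is orthogonal.
D = distance_matrix([[1.0, 0.0, 0.0],
                     [2.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
print(np.round(D, 4))
```

The broadcasting approach builds an n × n × n_features intermediate array. That is fine for tens or hundreds of structures but memory-hungry for very large sets.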


  1. N. E. R. Zimmermann and A. Jain, in preparation (2018).
  2. N. E. R. Zimmermann, M. K. Horton, A. Jain, and M. Haranczyk, Front. Mater. 4, 34 (2017).


Nils Zimmermann, Donny Winston