Half-Normal Probability Plots. Half-Normal Probability Plots - Categorized. This type of graph is used to evaluate the normality of the distribution of a variable, that is, whether and to what extent the distribution of the variable follows the normal distribution. The selected variable will be plotted in a scatterplot against the values "expected from the normal distribution." The half-normal probability plot is constructed in the same way as the standard normal probability plot, except that only the positive half of the normal curve is considered. Consequently, only positive normal values will be plotted on the Y-axis. This plot is used when one wants to ignore the sign of the residual, that is, when one is mostly interested in the distribution of absolute residuals, regardless of the sign.
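
The construction described above can be sketched in a few lines of Python. This is only an illustration: the function name `half_normal_plot_points` and the plotting position (i - 0.5)/n are assumptions, not taken from the text, and other plotting-position conventions are in common use.

```python
from statistics import NormalDist

def half_normal_plot_points(values):
    """Return (observed |value|, expected half-normal quantile) pairs."""
    abs_sorted = sorted(abs(v) for v in values)
    n = len(abs_sorted)
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    points = []
    for i, observed in enumerate(abs_sorted, start=1):
        # Assumed plotting position; other conventions exist.
        prob = (i - 0.5) / n
        # Map the probability onto the positive half of the normal curve.
        expected = std_normal.inv_cdf(0.5 + 0.5 * prob)
        points.append((observed, expected))
    return points

residuals = [-1.2, 0.4, 0.9, -0.3, 2.1, -0.7]
points = half_normal_plot_points(residuals)
```

Plotting the first element of each pair on the X-axis and the second on the Y-axis gives the half-normal plot; points that fall close to a straight line suggest that the absolute values follow a folded normal shape.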

Hamming Window. In Time Series, the Hamming window is a weighted moving average transformation used to smooth the periodogram values. In the Hamming (named after R. W. Hamming) window or Tukey-Hamming window (Blackman and Tukey, 1958), for each frequency, the weights for the weighted moving average of the periodogram values are computed as:

$$w_j = 0.54 + 0.46\cos(\pi j / p) \quad (j = 0, \dots, p)$$
$$w_{-j} = w_j \quad (j \neq 0)$$

where p = (m-1)/2

This weight function will assign the greatest weight to the observation being smoothed in the center of the window, and increasingly smaller weights to values that are further away from the center.
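
As a rough sketch, the Hamming weights can be computed as follows. The function name `hamming_weights` is hypothetical, and the code assumes that m denotes an odd window width of at least 3, so that p = (m-1)/2 is a positive integer.

```python
import math

def hamming_weights(m):
    """Return the weights w_{-p}, ..., w_0, ..., w_p for window width m."""
    if m < 3 or m % 2 == 0:
        raise ValueError("m is assumed to be an odd integer >= 3")
    p = (m - 1) // 2
    # Weights for j = 0..p: 0.54 + 0.46 * cos(pi * j / p)
    half = [0.54 + 0.46 * math.cos(math.pi * j / p) for j in range(p + 1)]
    # Mirror the j > 0 weights to the left, since w_{-j} = w_j.
    return half[:0:-1] + half

weights = hamming_weights(5)
```

For m = 5 the weights are approximately 0.08, 0.54, 1.0, 0.54, 0.08: the largest weight sits at the center of the window and the weights shrink toward the edges. In practice the weights would typically be divided by their sum before smoothing, which is a further assumption not stated in the entry.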

Hanging Bars Histogram. The hanging bars histogram offers a "visual test of normality" of the distribution that helps identify the areas of the distribution where the discrepancies (between the observed and expected normal frequencies) occur. While the standard way of presenting the normal distribution fitted to an observed distribution is to overlay the best-fitting normal curve over a histogram, the hanging bars histogram does just the opposite: it suspends the bars representing the observed frequencies for consecutive ranges of values from the best-fitting normal curve.

If the investigated distribution can be well approximated by the normal curve, then the bottoms of all bars should form a straight, horizontal line.
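
One way to compute the bar positions is sketched below. The function name `hanging_bars`, the bin edges, and the sample data are made up for illustration, and the sketch assumes that the curve height for each bin is the expected normal frequency of that bin.

```python
from statistics import NormalDist, mean, stdev

def hanging_bars(data, edges):
    """Return (top, bottom) of each bar hung from the fitted normal curve."""
    fitted = NormalDist(mean(data), stdev(data))  # best-fitting normal (by moments)
    n = len(data)
    bars = []
    for lo, hi in zip(edges, edges[1:]):
        observed = sum(1 for x in data if lo <= x < hi)
        expected = n * (fitted.cdf(hi) - fitted.cdf(lo))
        # The bar hangs from the curve (expected) and extends down by the
        # observed frequency, so its bottom sits at expected - observed.
        bars.append((expected, expected - observed))
    return bars

data = [4.1, 4.8, 5.0, 5.2, 5.3, 5.9, 6.4, 4.6, 5.5, 5.1]
edges = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5]
bars = hanging_bars(data, edges)
```

Bottoms close to zero in every bin correspond to the straight horizontal line described above; a bottom well above or below zero marks a range where observed and expected frequencies disagree.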

Harmonic Mean. The Harmonic Mean is a "summary" statistic used in analyses of frequency data; it is computed as:

$$H = \frac{n}{\sum_{i=1}^{n} 1/x_i}$$

where n is the sample size.
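
A minimal sketch of the computation, the sample size divided by the sum of the reciprocals of the values, is shown below. The helper name `harmonic_mean_manual` and the sample data are illustrative; Python's standard library also provides `statistics.harmonic_mean`, which is used here only as a cross-check.

```python
from statistics import harmonic_mean

def harmonic_mean_manual(xs):
    """Harmonic mean: n divided by the sum of reciprocals (all x > 0)."""
    n = len(xs)
    return n / sum(1.0 / x for x in xs)

data = [1.0, 2.0, 4.0]
h = harmonic_mean_manual(data)      # 3 / (1 + 0.5 + 0.25) = 12/7
h_library = harmonic_mean(data)     # same value from the standard library
```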

Hazard. It is often meaningful to consider the function that describes the probability of failure during a very small time increment (assuming that no failures have occurred prior to that time). This function is called the hazard function (or, sometimes, also conditional failure, intensity, or force of mortality function), and is generally defined as:

h(t) = f(t)/(1-F(t))

where h(t) stands for the hazard function (of time t), and f(t) and F(t) are the probability density and cumulative distribution functions, respectively.
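
The definition can be illustrated with an exponential failure-time distribution, for which the hazard is constant and equal to the rate parameter. The function names and the rate of 0.5 below are chosen only for the example.

```python
import math

def hazard(pdf, cdf, t):
    """Hazard h(t) = f(t) / (1 - F(t))."""
    return pdf(t) / (1.0 - cdf(t))

RATE = 0.5  # illustrative rate parameter of an exponential distribution

def exp_pdf(t):
    return RATE * math.exp(-RATE * t)

def exp_cdf(t):
    return 1.0 - math.exp(-RATE * t)

# For the exponential distribution the hazard equals RATE at every t.
hazards = [hazard(exp_pdf, exp_cdf, t) for t in (0.1, 1.0, 5.0)]
```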

For additional information see the Survival Analysis chapter, or the Weibull and Reliability/Failure Time Analysis section in the Process Analysis chapter.

Hazard Rate. In Survival Analysis the hazard rate is defined as the probability per time unit that a case that has survived to the beginning of the respective interval will fail in that interval. Specifically, it is computed as the number of failures per time unit in the respective interval, divided by the average number of surviving cases at the mid-point of the interval.
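
The life-table computation can be sketched as follows. The function name and the numbers are hypothetical, and the sketch makes two simplifying assumptions: there are no censored cases, and failures are spread evenly over the interval, so the number surviving at the mid-point is the number at risk at the start minus half the failures.

```python
def interval_hazard_rate(at_risk_start, failures, width):
    """Failures per time unit divided by survivors at the interval mid-point."""
    # Assumption: no censoring, failures evenly spread over the interval.
    midpoint_survivors = at_risk_start - failures / 2.0
    return (failures / width) / midpoint_survivors

# 100 cases enter an interval of width 2 and 10 of them fail in it:
# (10 / 2) / (100 - 5) = 5 / 95
rate = interval_hazard_rate(at_risk_start=100, failures=10, width=2.0)
```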

Heuristic. As opposed to an algorithm (which contains a fully defined set of steps that will produce a specific outcome), heuristics are general recommendations or guides based on statistical evidence (e.g., "quit smoking to prolong your life," "males with college education are more likely to respond positively to this advertisement than…") or theoretical reasoning (e.g., "the mechanism of the vitamin X synthesis as we understand it, implies that eating Y will reduce the deficit of X"). For more information about the concept of heuristic, see Kahneman, Slovic, & Tversky, 1982.

Heywood Case. A Heywood case in common factor analysis occurs when the minimum of the discrepancy function is obtained with one or more negative values as estimates for the variance of the unique variables. Such values are of course impossible. Heywood cases occur frequently when too many factors are extracted, or the sample size is too small.

Hidden Layers. All layers of a neural network except the input and output layers. Hidden layers provide the network's non-linear modeling capabilities.

Histograms, 2D. 2D histograms (the term was first used by Pearson, 1895) present a graphical representation of the frequency distribution of the selected variable(s) in which the columns are drawn over the class intervals and the heights of the columns are proportional to the class frequencies.

Histograms, 3D Bivariate. Three-dimensional histograms are used to visualize crosstabulations of values in two variables. They can be considered to be a conjunction of two simple (i.e., univariate) histograms, combined such that the frequencies of co-occurrences of values on the two analyzed variables can be examined. In the most common format of this graph, a 3D bar is drawn for each "cell" of the crosstabulation table and the height of the bar represents the frequency of values for the respective cell of the table. Different methods of categorization can be used for each of the two variables for which the bivariate distribution is visualized.

For information on smoothing 3D Bivariate Histograms, see Smoothing Bivariate Distributions.
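
The bar heights of a bivariate histogram are simply the cell counts of the crosstabulation. A small sketch, with made-up category labels and a hypothetical function name, assuming both variables have already been categorized:

```python
from collections import Counter

def crosstab_counts(xs, ys):
    """Count co-occurrences of (x, y) categories; each count is a bar height."""
    return Counter(zip(xs, ys))

# Illustrative, already-categorized values of two variables.
age_group = ["young", "young", "old", "old", "old", "young"]
response = ["yes", "no", "yes", "yes", "no", "yes"]
counts = crosstab_counts(age_group, response)
```

Here `counts[("old", "yes")]` is 2, so the bar for that cell would be two units high; a combination that never occurs has a count of 0.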

Histograms, 3D - Box Plots. This type of bivariate histogram represents the frequencies as a series of 3D bars ("rectangular boxes"). This is the default representation of 3D histograms. The "height" of each bar on the Z-axis corresponds to the frequency of the respective combination of levels for the two variables.

Histograms, 3D - Contour/Discrete. This contour plot represents a discrete projection of the 3D (smoothed) histogram.

Histograms, 3D - Contour Plot. This contour plot presents a projection of the spline-smoothed surface fit to the frequency data (see 3D Sequential Surface Plot). Successive values of each series are plotted along the X-axis, with each successive series represented along the Y-axis.

Histograms, 3D - Spikes. In this type of bivariate histogram, the frequencies are represented as a series of "spikes" (point symbols with lines descending to the base plane). The "height" of each spike is determined by the frequency for the respective combination of levels of the two variables.

Histograms, 3D - Surface Plot. In this representation of the 3D bivariate histogram, a spline-smoothed surface is fit to the frequency data.

Hollander-Proschan Test. This test compares the theoretical reliability function to the Kaplan-Meier estimate. The actual computations for this test are somewhat complex, and you may refer to Dodson (1994, Chapter 4) for a detailed description of the computational formulas. The Hollander-Proschan test is applicable to complete, single-censored, and multiple-censored data sets; however, Dodson (1994) cautions that the test may sometimes indicate a poor fit when the data are heavily single-censored. The Hollander-Proschan C statistic can be tested against the normal distribution (z).

The Hollander-Proschan test is used in Weibull and Reliability/Failure Time Analysis; see also, Mann-Scheuer-Fertig Test and Anderson-Darling Test.

Hyperbolic tangent (tanh). A symmetric S-shaped (sigmoid) function, sometimes used as an alternative to logistic functions.
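
The relationship to the logistic function can be checked numerically: tanh is a rescaled and shifted logistic curve, tanh(x) = 2*logistic(2x) - 1, with outputs in (-1, 1) rather than (0, 1). The helper name `logistic` and the sample points are chosen for illustration.

```python
import math

def logistic(x):
    """Standard logistic function, with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

xs = [-2.0, -0.5, 0.0, 0.5, 2.0]
tanh_values = [math.tanh(x) for x in xs]
# The same values obtained from the logistic function.
rescaled = [2.0 * logistic(2.0 * x) - 1.0 for x in xs]
```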

Hyperplane. An N-dimensional analogue of a line or plane, which divides an (N+1)-dimensional space into two.
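
To illustrate the "divides the space into two" idea, the sketch below assumes the hyperplane is written as w . x + b = 0 (a common representation, not stated in the entry) and reports which side a point falls on. The function name and the example plane are hypothetical.

```python
def side_of_hyperplane(weights, bias, point):
    """Return +1 or -1 for the two sides of w . x + b = 0, or 0 if on it."""
    value = sum(w * x for w, x in zip(weights, point)) + bias
    return (value > 0) - (value < 0)

# The plane x + y + z - 1 = 0: a 2-dimensional hyperplane in 3-dimensional space.
plane_w, plane_b = [1.0, 1.0, 1.0], -1.0
above = side_of_hyperplane(plane_w, plane_b, [1.0, 1.0, 1.0])   # +1
below = side_of_hyperplane(plane_w, plane_b, [0.0, 0.0, 0.0])   # -1
on_it = side_of_hyperplane(plane_w, plane_b, [1.0, 0.0, 0.0])   # 0
```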

Hypersphere. An N-dimensional analogue of a circle or sphere.