This document contains a description of the change-point algorithm, the custom Matlab functions by which versions of it are implemented, and instructions for applying to testdata cp_wrapper, the function that integrates the other functions into a general-purpose change-point finding and plotting function. By applying cp_wrapper to the test data, the user can confirm the proper operation of the functions and understand their use. The custom Matlab functions should run on Release 13 or later of Matlab.
For feedback, suggestions, additions, or other comments, contact C.R.G. E-mail: galliste@ruccs.rutgers.edu. For current version information and updates, go to http://ruccs.rutgers.edu/distribution/change_point. Description of the Algorithm
When one makes successive measurements and one is interested in when the distribution from which those measurments are drawn changes, i.e., when there has been a change in the process generating what is being measured, the best way to visualize the measurements is by means of a cumulative record (the running sum of the measurements). Changes in the underlying process then appear as changes in the slope of the record. The points where such changes occur are change points. We have generalized a recursive algorithm developed by Gallistel et al. (1) so that it may be applied to the finding of change points in almost any kind of cumulative record. It can also indicate the extent to which the stretches between change points are or are not truly linear. Thus, it is a general tool for analyzing cumulative records of all kinds and for finding change points in a wide range of data (for example, firing rates in electrophysiology).
The algorithm has four stages: In the first stage, it identifies for each point in the cumulative record a putative change point. In the second stage, it computes for each putative change point, the strength of the evidence that it is a true change point. In the third stage, it finds the first change point for which the evidence exceeds a user-specified decision criterion, and it truncates the data at that point. In the fourth stage, it begins over again, taking the change point as the origin and the first datum after the change point as the first observation.
The reasoning behind the first stage is geometric. It builds on Skinner’s (2) insight regarding cumulative records, which is that they are the best way of visualizing changes in behavior, because changes in behavior appear as readily apparent changes in the slope of the cumulative record. When behavior does not change, the cumulative record (the running sum of the behavioral measurements) approximates a straight line. Thus, the null hypothesis for each successive point in a cumulative record is that there has been no change up to that point in the record (hereafter called the latest point). If there has been no change, a straight line drawn from the origin to the latest point captures all of the systematic information in the record up to that point. Deviations from that straight line at
earlier points are noise (random) and are not statistically significant. If, on the other hand, there has been a change, earlier points will deviate systematically from that straight line.
The point at which the change occurred will be close to the point at which the record deviates maximally from the straight line. Thus, the point of maximum deviation from the straight line is the point at which a change, if there has been change, is most likely to have occurred.
In its first stage, the algorithm associates with each point in the cumulative record (except the first two) an earlier point called the putative change point. To find this earlier point, it in effect draws a straight line between the start of the record and the latest point and finds the earlier point that deviates maximally from this straight line (Fig. 12). For each point in the record, the associated putative change point divides the record up to that point