
Drawing Archimedes spirals is a popular and valid method of assessing action tremor in the upper limbs. We performed the first blinded comparison of Fahn–Tolosa–Marín (FTM) ratings and tablet measures of essential tremor to determine whether a digitizing tablet is better than FTM 0–4 ratings at detecting changes in essential tremor that exceed random variability in tremor amplitude.

The large and small spirals of FTM were drawn with each hand on two consecutive days by 14 men and four women (age 60±8.7 years [mean±SD]) with mild to severe essential tremor. The drawings were simultaneously digitized with a digitizing tablet. Tremor in each digitized drawing was computed with spectral analysis in an independent laboratory, blinded to the clinical ratings. The mean peak-to-peak tremor displacement (cm) in the four spirals and mean FTM ratings were compared statistically.

Test–retest intraclass correlations (ICCs) (two-way random single measures, absolute agreement) were excellent for the FTM ratings (ICC 0.90, 95% CI 0.76–0.96) and tablet (ICC 0.97, 95% CI 0.91–0.99). Log₁₀ tremor amplitude (

Digitizing tablets are much more precise than clinical ratings, but this advantage is mitigated by the natural variability in tremor. Nevertheless, the digitizing tablet is a robust method of quantifying tremor that can be used in lieu of or in combination with clinical ratings.

Tremor rating scales provide crude, nonlinear, subjective assessments of tremor severity.

Digitizing tablets are capable of providing linear objective measures of tremor in writing and drawings.

The greater precision of tablets, relative to rating scales, enables one to detect much smaller changes in tremor amplitude. However, this advantage of tablets is diminished when random variability in tremor is large. Tablets measure random variability precisely, but a change in tremor must exceed random variability to be recognizable as a statistically significant change resulting from treatment or disease progression (the minimum detectable change, MDC).

Twenty patients were enrolled in an unpublished open-label pharmacokinetic–pharmacodynamic study of sodium oxybate for the treatment of essential tremor, conducted by Jazz Pharmaceuticals. Details of the study design can be found on ClinicalTrials.gov (

A paired t test of the baseline FTM spiral ratings and tablet measures on days 1 and 2 revealed a statistically significant practice or carryover effect from day 1 to day 2. The mean FTM spiral rating decreased slightly (1.21 to 0.88, t=–3.011, p=0.008), as did the log-transformed tablet measure (geometric mean 0.28 to 0.20 cm, t=–2.431, p=0.026). By contrast, the baseline FTM and tablet means did not differ significantly between days 2 and 3 (mean FTM spiral ratings, 0.88 and 0.94, t=0.719, p=0.48; geometric mean tablet measures, 0.20 and 0.19 cm, t=–0.457, p=0.65). We therefore used the baseline data from days 2 and 3 to estimate test–retest reliability and minimum detectable change (MDC). In this study, baseline 1 refers to the baseline data from study day 2, and baseline 2 refers to the baseline data from study day 3. These two baseline assessments were used to compute test–retest reliability (two-way random, single-measures intraclass correlation coefficients [ICCs], absolute agreement) and MDC for the FTM spiral ratings and digitizing tablet measurements.
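The two statistics named above can be sketched in pure Python. This is a minimal illustration, not the authors' code: `icc_a1` computes the two-way random, single-measures, absolute-agreement ICC (often written ICC(A,1) or ICC(2,1)) from two-way ANOVA mean squares, and `paired_t` returns the paired t statistic and its degrees of freedom. The function names and example values are hypothetical, and the p-value (from Student's t distribution with n − 1 degrees of freedom) is not computed here.

```python
import math
from statistics import mean, stdev


def icc_a1(x, y):
    """Two-way random, single-measures, absolute-agreement ICC for two sessions.

    x[i] and y[i] are the same patient's scores at baseline 1 and baseline 2.
    """
    n, k = len(x), 2  # n patients, k = 2 sessions
    rows = list(zip(x, y))
    grand = mean(list(x) + list(y))
    row_means = [mean(r) for r in rows]
    col_means = [mean(x), mean(y)]
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between sessions
    ss_err = ss_total - ss_rows - ss_cols                   # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    # Session (column) variance enters the denominator: absolute agreement.
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def paired_t(x, y):
    """Paired t statistic for the change y - x, with n - 1 degrees of freedom."""
    d = [b - a for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d))), len(d) - 1


# Hypothetical scores: a constant shift between sessions lowers absolute agreement.
print(icc_a1([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))  # 10/13, about 0.769
```

Because the between-session mean square appears in the denominator, a systematic practice effect like the one seen between days 1 and 2 lowers this ICC even when patients keep their rank order.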

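MDC was computed using the formula $\mathrm{MDC} = \mathrm{SDd} \cdot 1.96$, where SDd is the standard deviation of the differences between the two baseline measurements. A log₁₀ transformation was performed to normalize the positively skewed tablet data. Note that the SDd of log-transformed data represents a ratio, and the MDC is therefore also a ratio; they are not the logarithms of the SDd and MDC of the untransformed data. For the log-transformed tablet data, MDC% was computed as $\mathrm{MDC\%} = (1 - 10^{-\mathrm{MDC}}) \cdot 100$. …® statistical software (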
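The MDC arithmetic can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes that for log₁₀ data the percentage is expressed as a decrease from baseline, (1 − 10^(−MDC))·100, which with the tablet SDd of 0.16 reported in the table reproduces the tablet MDC% of about 51%.

```python
import math
from statistics import stdev

Z_95 = 1.96  # two-sided 95% standard-normal quantile used in MDC = SDd * 1.96


def mdc(baseline1, baseline2, z=Z_95):
    """Minimum detectable change: SD of the paired differences times z."""
    diffs = [b - a for a, b in zip(baseline1, baseline2)]
    return stdev(diffs) * z  # sample SD (n - 1 denominator)


def mdc_percent_linear(mdc_value, baseline_mean):
    """MDC as a percentage of a baseline mean, for untransformed data."""
    return 100.0 * mdc_value / baseline_mean


def mdc_percent_log10(mdc_log10):
    """MDC% for log10-transformed data, expressed as a decrease from baseline.

    Assumes MDC% = (1 - 10**(-MDC)) * 100, since a log-scale MDC is a ratio.
    """
    return 100.0 * (1.0 - 10.0 ** (-mdc_log10))


def to_log10(values_cm):
    """Log10-transform positive tremor amplitudes (cm) before computing MDC."""
    return [math.log10(v) for v in values_cm]


# Table value: SDd of log10 tablet data = 0.16, giving an MDC% of about 51%.
print(round(mdc_percent_log10(0.16 * Z_95)))  # 51
```

With paired raw amplitudes `b1` and `b2` (hypothetical names), the log-scale MDC would be `mdc(to_log10(b1), to_log10(b2))`.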

The mean spiral ratings did not deviate significantly from a normal distribution (D’Agostino–Pearson test: p=0.16 for baseline 1 data and p=0.13 for baseline 2 data). The tablet data were positively skewed and deviated significantly from a normal distribution (D’Agostino–Pearson test: p<0.0001 for baseline 1 and 2 data), so a log₁₀ transformation was performed; the transformed data did not deviate significantly from a normal distribution (D’Agostino–Pearson test: p=0.25 for baseline 1 data and p=0.18 for baseline 2 data). The FTM ratings exhibited a floor effect in this patient population (
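The D’Agostino–Pearson omnibus test combines a normalized skewness statistic and a normalized kurtosis statistic into K², which is referred to a chi-square distribution with 2 degrees of freedom. The pure-Python sketch below follows the standard D’Agostino (skewness) and Anscombe–Glynn (kurtosis) transformations; it is an illustration, not the software used in the study, and the function names are hypothetical. The kurtosis approximation is generally considered less reliable below about 20 observations (SciPy's implementation, for example, warns when n < 20), so p-values for n = 18 are approximate.

```python
import math


def _central_moments(a):
    """Return n and the 2nd, 3rd, 4th central moments (population form)."""
    n = len(a)
    m = sum(a) / n
    m2 = sum((v - m) ** 2 for v in a) / n
    m3 = sum((v - m) ** 3 for v in a) / n
    m4 = sum((v - m) ** 4 for v in a) / n
    return n, m2, m3, m4


def dagostino_pearson(a):
    """D'Agostino-Pearson omnibus K^2 normality test; returns (K2, p). n >= 8."""
    n, m2, m3, m4 = _central_moments(a)
    if n < 8:
        raise ValueError("need at least 8 observations")
    # Skewness component: D'Agostino transformation of sample skewness.
    g1 = m3 / m2 ** 1.5
    y = g1 * math.sqrt((n + 1) * (n + 3) / (6.0 * (n - 2)))
    beta2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
             / ((n - 2.0) * (n + 5) * (n + 7) * (n + 9)))
    w2 = -1.0 + math.sqrt(2.0 * (beta2 - 1.0))
    delta = 1.0 / math.sqrt(0.5 * math.log(w2))
    alpha = math.sqrt(2.0 / (w2 - 1.0))
    z_skew = delta * math.asinh(y / alpha)
    # Kurtosis component: Anscombe-Glynn transformation of sample kurtosis b2.
    b2 = m4 / m2 ** 2
    expected = 3.0 * (n - 1) / (n + 1)
    var_b2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    x = (b2 - expected) / math.sqrt(var_b2)
    sqrt_beta1 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
                  * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    big_a = 6.0 + 8.0 / sqrt_beta1 * (
        2.0 / sqrt_beta1 + math.sqrt(1.0 + 4.0 / sqrt_beta1 ** 2))
    denom = 1.0 + x * math.sqrt(2.0 / (big_a - 4.0))
    term2 = math.copysign(((1.0 - 2.0 / big_a) / abs(denom)) ** (1.0 / 3.0), denom)
    z_kurt = (1.0 - 2.0 / (9.0 * big_a) - term2) / math.sqrt(2.0 / (9.0 * big_a))
    k2 = z_skew ** 2 + z_kurt ** 2
    return k2, math.exp(-k2 / 2.0)  # chi-square survival function with 2 df
```

For positively skewed amplitudes, one would test both the raw values and `[math.log10(v) for v in amplitudes]` and compare the p-values, mirroring the check reported above.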

Regression analysis revealed a very strong linear Weber–Fechner relationship (log

The MDC for the digitizing tablet was 51% of the baseline geometric mean tremor amplitude (

| | FTM mean | SDd FTM, baselines 1–2 | Tablet geometric mean (cm) | SDd tablet, baselines 1–2 | MDC% FTM | MDC% tablet |
|---|---|---|---|---|---|---|
| Baseline 1 | 0.88 | 0.41 | 0.20 | 0.16⁴ | 90%¹; 67%² | 51%³ |
| Baseline 2 | 0.94 | | 0.19 | | | |

Abbreviations: FTM, Fahn–Tolosa–Marín; MDC, minimum detectable change (SDd·1.96); SDd, standard deviation of the differences.

¹ MDC%: percentage of baseline 1 mean FTM.

² MDC%: percentage of baseline 1 mean FTM, computed with the Weber–Fechner equations in

³ MDC%: percentage of baseline 1 geometric mean.

⁴ SDd of log-transformed data.

In the above calculations, SDd (–0.41) is the standard deviation of the differences between the baseline 1 and baseline 2 FTM scores. This estimate of MDC% (67%) is similar to that found for the tablet.
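One way to express an FTM change on the tablet amplitude scale can be sketched as follows, assuming the Weber–Fechner relation has the form FTM = a + b·log₁₀(A), where A is tablet amplitude in cm. Under that assumption, an FTM decrease ΔF corresponds to an amplitude ratio of 10^(−ΔF/b), independent of baseline. This is a hypothetical illustration; the function names and example data are not from the study, and it is not necessarily how the 67% figure was derived.

```python
import math


def fit_weber_fechner(amplitudes_cm, ftm_ratings):
    """Ordinary least-squares fit of FTM = a + b * log10(A); returns (a, b)."""
    xs = [math.log10(v) for v in amplitudes_cm]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ftm_ratings) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ftm_ratings))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def ftm_change_to_amplitude_percent(delta_ftm, b):
    """Percent decrease in amplitude implied by an FTM decrease of delta_ftm."""
    return 100.0 * (1.0 - 10.0 ** (-delta_ftm / b))


# Hypothetical data lying exactly on FTM = 2 + 1 * log10(A).
a, b = fit_weber_fechner([0.1, 1.0, 10.0], [1.0, 2.0, 3.0])
print(round(a, 6), round(b, 6))                        # 2.0 1.0
print(round(ftm_change_to_amplitude_percent(1.0, b)))  # 90
```

A steeper slope b means that each FTM point spans a smaller amplitude ratio, so the same FTM MDC maps to a smaller percentage change in amplitude.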

Our estimates of MDC% appear to be robust and not dependent on normalization of the data. We also computed the MDC% of the tablet data without log transformation, using the baseline 1 mean (0.62 cm) and the SDd of the two baselines (0.20 cm). Using these values, the MDC% is as follows:
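Substituting these untransformed values into MDC = SDd·1.96 and dividing by the baseline 1 mean gives the following arithmetic (the inputs are rounded to two decimals, so the result is approximate):

$$\mathrm{MDC\%} = \frac{1.96 \times 0.20\ \mathrm{cm}}{0.62\ \mathrm{cm}} \times 100 = \frac{0.392}{0.62} \times 100 \approx 63\%$$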

This is the first blinded study to demonstrate a strong correlation between tablet measures and FTM spiral ratings, and it provides much-needed estimates of test–retest reliability and MDC% for both methods. The test–retest ICC for the tablet (0.97) was only marginally better than the FTM ICC (0.90). However, the FTM ICC probably would have been lower if different raters had assessed the two baselines, because intra-rater reliability is much better than inter-rater reliability for tremor rating scales.

Haubenberger and colleagues

There is no published evidence that the Bain and Findley 0–10 ratings are more sensitive to change than FTM 0–4 ratings.

Detectable change in essential tremor is limited by the considerable natural variability of tremor amplitude over time. The variability in tremor amplitude is so great that the MDC (the smallest detectable change exceeding random variability) of the digitizing tablet is similar to the MDC of the FTM 0–4 ratings and the Bain and Findley 0–10 ratings. Digitizing tablets are much more precise than clinical ratings, but this advantage is mitigated by the natural variability in tremor.

Digitizing tablets have potential floor and ceiling effects. They cannot measure tremor that is not visible, because their accuracy is roughly ±0.25 mm. They also cannot record tremor so severe that the pen tip does not remain within 1 cm of the tablet surface. In our patient population, however, the FTM ratings had an obvious floor effect, whereas the tablet exhibited none. Tremor severity in our cohort was not great enough to compare ceiling effects for the tablet and FTM ratings.

Nevertheless, the digitizing tablet is clearly a valid and robust method of quantifying tremor. It can be used in lieu of, or in combination with, clinical ratings of tremor in Archimedes spirals. The tablet provides an accurate, clinically meaningful assessment of tremor amplitude. These devices cost a few hundred dollars, and free software for tremor analysis is available on the internet.

Our study has limitations. Our estimates of test–retest reliability and MDC% were computed from two baseline assessments performed at the same time of day on two consecutive days, with tremor medications, caffeine, and alcohol controlled. Random test–retest variability might be greater if the interval between assessments were longer and these controls less stringent. Our results need to be confirmed using baseline assessments separated by 1 week and 1 month, which are common assessment intervals in clinical trials.