
Volume 08 No. 01

Scientific Investigations

Sources of Variability in Epidemiological Studies of Sleep Using Repeated Nights of In-Home Polysomnography: SWAN Sleep Study

Huiyong Zheng, Ph.D.1; MaryFran Sowers, Ph.D.1; Daniel J. Buysse, M.D.2; Flavia Consens, M.D.3; Howard M. Kravitz, D.O., M.P.H.4; Karen A. Matthews, Ph.D.5; Jane F. Owens, Dr.P.H.2; Ellen B. Gold, Ph.D.6; Martica Hall, Ph.D.2
1Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI; 2Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA; 3Department of Neurology and Anesthesiology and Pain Medicine, University of Washington, Seattle, WA; 4Departments of Psychiatry and Preventive Medicine, Rush University Medical Center, Chicago, IL; 5Departments of Epidemiology and Psychiatry, University of Pittsburgh, Pittsburgh, PA; 6Division of Epidemiology, Department of Public Health Sciences, University of California, Davis, CA


Study Objective:

To quantify sources of night-to-night variability.


This project was conducted in 285 middle-aged African American, Caucasian, and Chinese women from the Study of Women's Health Across the Nation (SWAN) Sleep Study living in Chicago, the Detroit area, Oakland, and Pittsburgh. The study used 3 repeated nights of in-home polysomnography (PSG) measures. Night 1 data included assessment of sleep staging, sleep apnea, and periodic limb movements, while Nights 2 and 3 focused on sleep staging.


Mean total sleep time (TST) increased from 365 minutes on Night 1 to 391 minutes and 380 minutes on Nights 2 and 3, respectively. Mean percent sleep efficiency (SE%) for the 3 nights was 83%, 85%, and 85%, respectively. Night 1 sleep values differed significantly from Night 2 and Night 3 values except for S2 (%), S1 (min), and Delta (S3+4) (%). Differences between Nights 2 and 3 were negligible. Obesity, past smoking, and financial strain were associated with greater Night 1 vs. Night 2 or Night 3 differences. We conclude that there was significant Night 1 vs. Nights 2 and 3 variability and that, though relatively modest, it was sufficient to bias estimates of association. Additionally, personal characteristics including smoking, obesity, and financial strain increased night-to-night variability.


This report adds new information about between- and within-person sources of variation in in-home PSG and identifies elements that are essential to the design and planning of future sleep studies of multi-ethnic groups in social and physiological transition states such as the menopause.


Zheng H; Sowers MF; Buysse DJ; Consens F; Kravitz HM; Matthews KA; Owens JF; Gold EB; Hall M. Sources of variability in epidemiological studies of sleep using repeated nights of in-home polysomnography: SWAN Sleep Study. J Clin Sleep Med 2012;8(1):87-96.

Polysomnography (PSG) is widely used in clinical and epidemiological research settings to provide objective sleep measures.1-6 PSG can be conducted in the home or in a sleep laboratory, and the choice involves trade-offs in the closeness of monitoring, the ability to correct technical problems promptly, and the ability to control temperature, noise, and other environmental factors to minimize systematic bias. Use of PSG in either setting entails substantial costs for data acquisition and substantial time for data processing, while potentially imposing physical and psychological burdens on the participant.7 Given these considerations, it is important to ascertain how much and what kind of data must be collected, to determine whether more than a single night of data collection is required to describe sleep behaviors, and to identify personal characteristics associated with substantially increased within-person variation in sleep behaviors.

Determining the number of data collection nights needed to provide unbiased estimates of sleep characteristics is made more complex by the “first night” effect (FNE), which arises from changes in the sleep environment, the presence of sleep monitoring instrumentation, and any psychological unease at being observed.8-12 The FNE has been associated with less total sleep time (TST), lower sleep efficiency (SE), more intermittent waking time, and longer REM latency (RL)9 in clinical or in-home studies.13,14


Current Knowledge/Study Rationale: Use of repeated polysomnography (PSG) in the home or laboratory to describe sleep behavior has substantial costs in money and time for data acquisition and data processing while potentially imposing physical and psychological burdens on the participant. To help reduce these costs and burdens in the design and planning of future sleep studies, this report uses 3 nights of in-home PSG data from the SWAN Sleep Study to evaluate night-to-night variation and information redundancy and to identify personal characteristics associated with substantially increased within-person variation in sleep behaviors.

Study Impact: Through the evaluation of sources of night-to-night variation with in-home PSG, this report identifies elements that are essential to the design and planning of future sleep studies of multi-ethnic groups in social and physiological transition states such as the menopause. Two nights of in-home PSG assessment with an appropriate sample size can provide robust parameter estimates of sleep duration, continuity, and architecture in community samples; the personal characteristics identified as associated with greater variability between first- and second-night measures include smoking, obesity, and financial strain.

The SWAN Sleep Study evaluated sleep characteristics in 368 African American, Caucasian, and Chinese women across the menopause transition using 3 nights of in-home PSG. Sleep staging and electrocardiogram recordings were performed on all study nights, whereas sensors for sleep disordered breathing and leg movements (Night 1 only) and for skin temperature and snoring (Night 2 only) were used on selected nights. We evaluated: (1) the magnitude of night-to-night variability in PSG-processed sleep measures; (2) the loss of information if PSG studies of women were restricted to 1, 2, or 3 nights; and (3) sources of within-person variation over the 3 nights of study.


The SWAN Sleep Study is a comprehensive study of sleep nested within the larger, ongoing longitudinal SWAN cohort study and was conducted at 4 of the 7 SWAN clinical sites. Its 2003 to 2005 time frame overlapped the 5th to 7th annual core SWAN protocol examinations.

SWAN Study Design and Participants

SWAN, a community-based, multisite cohort study of the menopausal transition, enrolled 3,302 women, aged 42-52 years, at its 1996 baseline.15 Each clinical site recruited Caucasian women. Also recruited were African American women in Boston, Chicago, the Detroit area, and Pittsburgh; Chinese women in Oakland; Japanese women in Los Angeles; and Hispanic women in Newark. Women were excluded from cohort enrollment if they were pregnant, had used exogenous hormones in the 3 months before the baseline interview, had had no menstrual bleeding in the 3 months before the baseline interview, or had had a hysterectomy. Institutional review boards approved the study, and women gave signed, written informed consent to participate.

SWAN Sleep Study Design and Participants

The SWAN Sleep Study was a nested cross-sectional study of sleep patterns at mid-life.16,17 A cohort of 370 women was enrolled, including 328 pre- and perimenopausal and 42 postmenopausal African American, Caucasian, and Chinese women, aged 48 to 59 years, from the Chicago, Detroit area, Oakland (CA), and Pittsburgh SWAN sites. Women with surgical menopause (< 1%) or using hormone therapy (approximately 23% of the cohort by SWAN follow-up visit 5) were excluded. Exclusion criteria also included factors that could affect sleep, such as ongoing treatment for cancer or rotating or night shift work (exclusion rates for these factors were between 1% and 3%). Two of the 370 women had no PSG study and were excluded from this analysis.

Sleep Study Protocol

The sleep protocol was initiated within 7 days of the beginning of the follicular phase of the menstrual cycle in women who were still menstruating. Three consecutive nights of in-home PSG were recorded with the Vitaport 3 (VP3) PSG monitor (Temec, Netherlands). Night 1 used a sleep disorders screening PSG montage with 2 channels of electroencephalography (EEG) (C4/A1, C3/A2), bilateral electro-oculograms (EOG), bipolar submental electromyograms (EMG), and one channel of electrocardiogram (EKG). Sleep disordered breathing was assessed using nasal pressure and oral-nasal thermistors to measure airflow, impedance plethysmography to characterize chest and abdominal wall movements, and fingertip oximetry (Nonin X-pod model 3012) to measure oxyhemoglobin saturation. Bilateral anterior tibialis EMG was used to assess periodic leg movements (PLM), and characteristics of restless legs were quantified by a self-report questionnaire.18 On Nights 2 and 3, a sleep staging montage was used, which included the EEG, EOG, submental EMG, and EKG channels but not nasal pressure, airflow, oximetry, respiratory effort, or anterior tibialis measurements. Because studies were conducted overnight in participants' homes, technicians were not present to replace sensors and electrodes during the studies. A PSG study was counted as successful (rather than failed) only if it met the following criteria: for the sleep screening night, the PSG had to include ≥ 4 h of scorable data for sleep staging and oximetry, concurrent with scorable data from ≥ 1 of the following: nasal pressure cannula, thermistor, or inductance plethysmography belt. For sleep staging PSGs, scorable data were required for 100% of the recording time for at least one EEG channel, one EOG channel, and the EMG channel.
In the SWAN Sleep Study, the overall PSG failure rate was 6.25%, i.e., 1 − 1035/(368 × 3) = 1 − 1035/1104 = 69/1104, where the denominator, 368 × 3 = 1104, is the total number of expected PSG studies for the 368 women who participated in PSG studies and the numerator, 1035, is the total number of scorable nights, including repeat studies conducted when initial studies were inadequate. This compares favorably with other in-home PSG studies, such as the 5% to 9% failure rate reported in the Sleep Heart Health Study.19
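The failure rate above and the 93.8% success rate reported under Data Analysis are complementary proportions of the same 1104 expected studies. The short Python check below reproduces that arithmetic from the counts given in the text; the variable names are illustrative only.

```python
# Counts reported for the SWAN Sleep Study (3 planned PSG nights per woman).
WOMEN_WITH_PSG = 368
NIGHTS_PER_WOMAN = 3
# Scorable nights, including repeats of initially inadequate studies:
# 364 (Night 1) + 342 (Night 2) + 329 (Night 3).
SCORABLE_BY_NIGHT = (364, 342, 329)

expected = WOMEN_WITH_PSG * NIGHTS_PER_WOMAN  # expected PSG studies
scorable = sum(SCORABLE_BY_NIGHT)             # scorable PSG studies
success_rate = scorable / expected            # share of expected studies that were scorable
failure_rate = 1 - success_rate               # complementary failure rate

print(f"expected={expected} scorable={scorable}")
print(f"success={success_rate:.2%} failure={failure_rate:.2%}")
```

The success rate is exactly 93.75%, which the text rounds to 93.8%; the failure rate is 69/1104 = 6.25%.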

Sleep was visually scored in 20-sec epochs on each night using standardized scoring criteria.16,20 This study was initiated before the recent publication of the American Academy of Sleep Medicine's scoring criteria. The Rechtschaffen and Kales criteria recommend either 20- or 30-sec scoring epochs, and the University of Pittsburgh laboratory used 20-sec epochs for 2 reasons. First, 20-sec epochs provide slightly finer-grained measures of sleep and wakefulness with less potential misclassification (because each epoch can receive only one stage score, up to 50% of an epoch may actually be another stage). Second, the algorithms for quantitative EEG measurement with power spectral analysis used 4-sec epochs, and alignment with visually scored sleep data was more precise when scoring epochs were multiples of 4 seconds (20 is a multiple of 4; 30 is not).

Measures of sleep duration included time in bed and time spent asleep (TST). Time in bed was calculated as the time from reported lights out (confirmed by PSG signals consistent with reduced activity) to the time of reported awakening from sleep (confirmed by PSG signals consistent with increased activity). TST was calculated as the total minutes scored as stages 1 to 4 of NREM sleep and REM sleep. Sleep continuity was quantified by sleep latency (SL [time in minutes from the beginning of the recording period to the first consecutive 10 min of stage 2 or stage 3-4 sleep interrupted by ≤ 2 min of stage 1 or wakefulness]), wakefulness after sleep onset (WASO [total minutes of wakefulness between sleep onset and verified awakening in the morning]), and sleep efficiency (SE [time spent asleep/time in bed × 100]). Measures of sleep architecture included the minutes and the percentage of TST spent in NREM stages 1, 2, and 3 + 4 and in REM sleep.
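The duration and continuity definitions above can be made concrete with a small Python sketch that derives TIB, TST, SL, WASO, and SE from a hypnogram of 20-sec epochs. It is not the study's scoring software. It assumes stage labels "W", "1", "2", "3", "4", and "R", a hypnogram spanning exactly lights out to final awakening, and one reading of the sleep latency rule: the first 10-min window (30 epochs) that starts in stage 2 or 3-4 and contains no more than 2 min (6 epochs) of stage 1 or wake. All function and constant names are hypothetical.

```python
from typing import Optional, Sequence

EPOCH_SEC = 20                     # SWAN scored 20-second epochs
SLEEP = {"1", "2", "3", "4", "R"}  # NREM stages 1-4 plus REM count toward TST
ONSET_STAGES = {"2", "3", "4"}     # assumption: an onset window must start in stage 2 or 3-4
INTERRUPTIONS = {"W", "1"}         # wake or stage 1 epochs interrupt the onset window


def to_minutes(n_epochs: int) -> float:
    """Convert an epoch count to minutes."""
    return n_epochs * EPOCH_SEC / 60


def sleep_onset_epoch(hyp: Sequence[str], run_min: int = 10,
                      max_gap_min: int = 2) -> Optional[int]:
    """Index of the first epoch that starts a run_min window beginning in
    stage 2/3-4 and containing at most max_gap_min of stage 1 or wake
    (an assumed reading of the SL definition)."""
    run = run_min * 60 // EPOCH_SEC          # 30 epochs for 10 min
    max_gap = max_gap_min * 60 // EPOCH_SEC  # 6 epochs for 2 min
    for start in range(len(hyp) - run + 1):
        if hyp[start] not in ONSET_STAGES:
            continue
        window = hyp[start:start + run]
        if sum(stage in INTERRUPTIONS for stage in window) <= max_gap:
            return start
    return None


def summarize_night(hyp: Sequence[str]) -> dict:
    """TIB, TST, SL, WASO (minutes) and SE (%) for one scored night,
    assuming the hypnogram runs from lights out to final awakening."""
    tib = to_minutes(len(hyp))
    tst = to_minutes(sum(stage in SLEEP for stage in hyp))
    se = 100 * tst / tib if tib else None
    onset = sleep_onset_epoch(hyp)
    if onset is None:
        return {"TIB": tib, "TST": tst, "SL": None, "WASO": None, "SE": se}
    # WASO: wake epochs from sleep onset through the last epoch scored as sleep.
    last_sleep = max(i for i, stage in enumerate(hyp) if stage in SLEEP)
    waso = to_minutes(sum(stage == "W" for stage in hyp[onset:last_sleep + 1]))
    return {"TIB": tib, "TST": tst, "SL": to_minutes(onset), "WASO": waso, "SE": se}


# Hypothetical night: 10 min awake, then sleep with one 2-min awakening.
night = ["W"] * 30 + ["2"] * 60 + ["W"] * 6 + ["3"] * 30 + ["R"] * 24 + ["W"] * 3
print(summarize_night(night))
```

With this hypothetical night, sleep onset falls at epoch 30, so SL is 10 min and WASO counts only the 2-min awakening; the final minute of wake after the last sleep epoch is excluded from WASO but still lowers SE.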

Sociodemographic Information

Race/ethnicity was determined by self-designation as African American, Caucasian, or Chinese. Other sociodemographic variables included age (continuous variable), marital status (single/never married, married or living as married, separated/widowed/divorced), and educational attainment (high school graduate or less, some college, college graduate, graduate studies). A 3-level response to a question about difficulty in paying for basics (very, somewhat, or not very difficult) including food, shelter, and health care was used as an indicator of financial strain. Study site designation was included in statistical models.

Physical and Mental Health Variables

Self-perceived overall health was coded as excellent, very good/good, or fair/poor. Body mass index (BMI) was computed as measured weight in kilograms divided by height in meters squared. Depressive symptoms were assessed with the Center for Epidemiologic Studies Depression Scale (CES-D; categorized as depressed vs. not depressed), administered at the closest annual core SWAN visit preceding the Sleep Study.

Using annual core SWAN data, menopause transition status was classified into one of 4 categories: premenopausal (no change in menstrual bleeding regularity); early perimenopausal (menses in the preceding 3 months with an increase in bleeding irregularity); late perimenopausal (menses in the previous 12 months, but not in the previous 3 months); and postmenopausal (≥ 12 months of amenorrhea).21

Daily medication use (prescription and over-the-counter), recorded at Sleep Study protocol inception and in daily diaries, was coded according to the World Health Organization Anatomical Therapeutic Chemical (ATC) classification.22 Physical activity was measured at the annual core SWAN visits across 3 domains (sports, leisure, and household activities) and was treated as a continuous variable. Smoking frequency, alcohol consumption, and caffeine consumption were determined from the daily sleep diaries. Smoking behavior was classified as current, past, or never. Current smokers were those who reported smoking ≥ 7 cigarettes in the 2-week period beginning with the sleep protocol.

Data Analysis

Of the 370 participants enrolled in the SWAN Sleep Study, 368 had PSG data: 364 completed Night 1 (sleep screening PSG), 342 completed Night 2 (sleep staging PSG), and 329 completed Night 3 (sleep staging PSG). These numbers include individuals who repeated PSGs when the initial study failed, yielding an overall PSG success rate of 93.8% (364 + 342 + 329 = 1035 successful studies out of 368 × 3 = 1104 expected PSG studies; 1035/1104 = 0.938). Aggregating these data, 365 women had at least one sleep staging PSG (i.e., Night 2 or Night 3); 361 had the sleep screening PSG (Night 1) plus at least one sleep staging PSG; 306 had both sleep staging PSGs; and 303 had all 3 nights (including repeat studies for initial study failures). All 3 nights were completed in the protocol-planned order (sleep screening PSG, sleep staging PSG 1, sleep staging PSG 2) by 285 women, who formed the dataset for these analyses. Data from 83 women who had at least one PSG were excluded because of non-scorable night(s) (n = 65) or failure to follow the temporal sequence (n = 18); the latter applied, for instance, to women whose initial screening study failed and was repeated on another night.

Univariate statistics were computed for continuous variables, and frequencies were determined for categorical variables. Variables with highly skewed distributions were transformed or categorized. Statistical significance was defined as p < 0.05 on 2-sided tests.

A one-way repeated measures analysis of variance was used to evaluate the effect of study night on PSG measures. Orthogonal contrasts were used to compare measures across the 3 nights, and the average of Nights 2 and 3 was also compared with Night 1. To evaluate whether 2 subsequent nights added information beyond a single night, a multivariate regression model with a random design matrix was used.23
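The contrasts described above can be illustrated with per-woman contrast scores. The Python sketch below is a stand-in for the SAS repeated measures ANOVA, not the study's code: it applies each weight vector to a woman's three nights and computes a one-sample t statistic on those scores, a common way to test a single within-subject contrast. The second-order weights match the Table 2 footnote; p-values, which would come from a t distribution with n − 1 degrees of freedom, are not computed, and the contrast labels and example data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Contrast weights over (Night 1, Night 2, Night 3); labels are illustrative.
CONTRASTS = {
    "N2-N1": (-1, 1, 0),
    "N3-N1": (-1, 0, 1),
    "N3-N2": (0, -1, 1),
    "mean(N2,N3)-N1": (-1, 0.5, 0.5),  # orthogonal to N3-N2
    "2nd order": (1, -2, 1),           # N1 - 2*N2 + N3; orthogonal to N3-N1
}


def contrast_tests(nights):
    """nights: one (night1, night2, night3) tuple per woman with all 3 nights.
    Returns {contrast: (mean contrast score, t statistic with n - 1 df)}."""
    n = len(nights)
    results = {}
    for name, weights in CONTRASTS.items():
        # Each woman's contrast score is the weighted sum of her three nights.
        scores = [sum(w * x for w, x in zip(weights, row)) for row in nights]
        m = mean(scores)
        se = stdev(scores) / sqrt(n)
        results[name] = (m, m / se if se > 0 else float("nan"))
    return results


# Hypothetical TST values (minutes) for three women.
example = [(360, 390, 380), (370, 395, 385), (365, 388, 379)]
for name, (m, t) in contrast_tests(example).items():
    print(f"{name:>15}: mean = {m:7.2f}, t = {t:6.2f}")
```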

In this model, for each subject i = 1, …, N (N being the total number of subjects), Yᵢ was the collection of measurements to be removed (e.g., Night 3) and Xᵢ was the collection of measurements to be retained (e.g., Night 1 and Night 2), and Yᵢ was regressed on Xᵢ. The loss of information, defined as the normalized mean squared error (MSE) of the residuals, was used to quantify the effect of removing some nights of PSG measurements.
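Because the model is described only in words here, the following Python/NumPy sketch shows one way to compute a loss-of-information measure under stated assumptions: the removed night(s) are regressed on the retained night(s) with an intercept by ordinary least squares, and the residual sum of squares is normalized by the total (mean-centered) sum of squares of the removed night(s). That normalization is an assumption, since the exact form used in the study is not given in this section; the function name and the simulated data are hypothetical.

```python
import numpy as np


def loss_of_information(retained, removed):
    """Share of variation in the removed night(s) that the retained night(s)
    do not explain: residual SS / total centered SS (assumed normalization).
    retained: (N,) or (N, p) array; removed: (N,) or (N, q) array."""
    X = np.asarray(retained, dtype=float)
    Y = np.asarray(removed, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    if Y.ndim == 1:
        Y = Y[:, None]
    design = np.column_stack([np.ones(len(X)), X])     # intercept + retained nights
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)  # least squares, one column per removed night
    residuals = Y - design @ coef
    centered = Y - Y.mean(axis=0)
    return float((residuals ** 2).sum() / (centered ** 2).sum())


# Hypothetical data: a stable per-woman level plus night-specific noise.
rng = np.random.default_rng(1)
trait = rng.normal(380, 30, size=285)
nights = trait[:, None] + rng.normal(0, 35, size=(285, 3))
print(loss_of_information(nights[:, :2], nights[:, 2]))   # keep Nights 1+2, drop Night 3
print(loss_of_information(nights[:, :1], nights[:, 1:]))  # keep Night 1 only
```

A value near 0 means the retained nights predict the removed night(s) almost perfectly (little information lost); a value near 1 means almost nothing is recovered.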

Within-person variation for individual sleep measures was assessed using intraclass correlation coefficients (ICCs) with 95% confidence intervals. The Bland and Altman approach was used to examine the relationship between the mean of 2 nights of sleep measurements and the difference between them.24,25 A signed rank test was used to test whether the difference between nights differed from zero. Intra-individual (within-person) and inter-individual (between-person) variation were estimated, and their ratio was used to describe the relative magnitude of each source of variation.
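The ICC reported in Table 4 is defined from the same two variance components as the ratio, ICC = σ²inter/(σ²inter + σ²intra). This section does not state which estimator produced them, so the Python sketch below assumes the classic one-way random-effects ANOVA (method-of-moments) estimates for k repeated nights per woman; it omits confidence intervals, and the function name and example values are hypothetical.

```python
from statistics import mean


def variance_components(nights):
    """nights: one tuple of k repeated values per woman (k = 2 for a pair of nights).
    Returns (sigma2_intra, sigma2_inter, ICC, inter/intra ratio) using
    one-way random-effects ANOVA moment estimates (an assumed estimator)."""
    n, k = len(nights), len(nights[0])
    grand = mean([x for row in nights for x in row])
    person_means = [mean(row) for row in nights]
    ss_between = k * sum((m - grand) ** 2 for m in person_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(nights, person_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    sigma2_intra = ms_within
    sigma2_inter = max((ms_between - ms_within) / k, 0.0)  # negative estimates truncated at 0
    icc = sigma2_inter / (sigma2_inter + sigma2_intra)
    return sigma2_intra, sigma2_inter, icc, sigma2_inter / sigma2_intra


# Hypothetical pairs of nights for three women.
print(variance_components([(1, 3), (5, 7), (9, 11)]))
```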

Stepwise regression analyses were used to relate personal characteristics of study participants to sleep characteristics, with p < 0.05 as the inclusion criterion. Goodness of fit was assessed graphically and with the Akaike Information Criterion (AIC).
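The stepwise procedure can be sketched as greedy forward selection. Note the substitution: the study used a p < 0.05 entry criterion (and AIC only to assess fit), whereas this Python/NumPy sketch uses AIC as the entry criterion so that it needs no t distribution. It also shows the usual definition of a standardized coefficient, Beta × SD(x)/SD(y), which we assume corresponds to the Beta(s) column of Table 5. Function names, predictor names, and data are hypothetical.

```python
import numpy as np


def ols_fit(y, X):
    """OLS with an intercept; returns (coefficients, Gaussian AIC up to a constant)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(((y - design @ coef) ** 2).sum())
    aic = n * np.log(rss / n) + 2 * design.shape[1]
    return coef, aic


def forward_select(y, X, names):
    """Add the candidate that lowers AIC most; stop when none lowers it.
    (AIC stands in for the study's p < 0.05 entry rule.)"""
    selected, remaining = [], list(range(X.shape[1]))
    _, best_aic = ols_fit(y, X[:, np.array(selected, dtype=int)])  # intercept-only model
    while remaining:
        aic, j = min((ols_fit(y, X[:, np.array(selected + [c], dtype=int)])[1], c)
                     for c in remaining)
        if aic >= best_aic:
            break
        selected.append(j)
        remaining.remove(j)
        best_aic = aic
    return [names[j] for j in selected]


def standardized_betas(y, X, coef):
    """Beta(s) = Beta * SD(x) / SD(y), intercept dropped."""
    return coef[1:] * X.std(axis=0, ddof=1) / np.std(y, ddof=1)


# Hypothetical data: the outcome depends on the first predictor only.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = 3 * X[:, 0] + 0.1 * rng.normal(size=60)
chosen = forward_select(y, X, ["x_signal", "x_noise"])
coef, _ = ols_fit(y, X)
print(chosen, standardized_betas(y, X, coef))
```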

SAS 9.1 (SAS Institute, Cary, NC) and the SAS macro facility were used to perform the statistical analyses and plot the findings.


Characteristics of Study Participants

Characteristics of the total sample with PSG were similar to those of the analytical sample of 285 women (Table 1). Women in the analytical sample had a median age of 52 years (IQR = 3) and a median BMI of 27.5 kg/m², similar to the overall Sleep Study sample.

Table 1. Comparing characteristics of women having the Night 1 visit and at least one additional sleep staging night (n = 361) in relation to having 3 consecutive nights in the temporal order specified by the protocol, SWAN Sleep Study, 2003 to 2005

Variable | With Night 1 and at least one additional sleep staging night (N = 361) | With 3 consecutive nights in temporal order (N = 285)
Continuous variables, median (IQR*) | |
  Age, years | 52.0 (3.0) | 52.0 (3.0)
  Body mass index, kg/m² | 28.1 (10.8) | 27.4 (9.3)
  Physical activity, continuous score | 7.8 (2.4) | 7.8 (2.4)
  Apnea-hypopnea index, events/h | 5.0 (10.4) | 4.9 (10.1)
  Periodic leg movement index, events/h | 2.3 (4.4) | 2.4 (4.4)
Categorical variables, N (%) | |
  Obesity status | |
    BMI < 30 | 212 (58%) | 175 (61%)
    BMI ≥ 30 | 153 (42%) | 110 (39%)
  Financial strain (how hard to pay for basics) | |
    Very hard | 15 (4%) | 9 (3%)
    Somewhat hard | 83 (23%) | 57 (20%)
    Not hard | 266 (73%) | 218 (77%)
  Race/ethnicity | |
    African American | 136 (37%) | 94 (33%)
    Chinese | 59 (16%) | 48 (17%)
    Caucasian | 170 (47%) | 143 (50%)
  Education | |
    ≤ High school | 61 (17%) | 48 (17%)
    Some college | 115 (32%) | 85 (30%)
    ≥ BS degree | 184 (51%) | 148 (53%)
  Health status | |
    Worse | 46 (13%) | 30 (11%)
    Same | 106 (30%) | 83 (30%)
    Better | 206 (58%) | 168 (60%)
  Smoking status | |
    Never | 238 (65%) | 189 (66%)
    Past | 87 (24%) | 66 (23%)
    Current | 40 (11%) | 30 (11%)
  Marital status | |
    Single | 57 (16%) | 44 (16%)
    Married | 225 (63%) | 189 (67%)
    Not married | 76 (21%) | 48 (17%)
  Restless legs syndrome (RLS) | |
    Any RLS | 80 (22%) | 66 (23%)
    No RLS | 285 (78%) | 219 (77%)
  CES-D score | |
    Not depressed | 305 (86%) | 240 (87%)
    Depressed | 48 (14%) | 36 (13%)
  Taking sleep medications | |
    No | 260 (72%) | 206 (73%)
    Yes | 99 (28%) | 75 (27%)
  Menopausal status | |
    Pre- or early perimenopause | 240 (66%) | 190 (68%)
    Late perimenopause | 77 (21%) | 58 (20%)
    Surgical or postmenopause | 48 (13%) | 37 (13%)

* IQR, interquartile range.

Information obtained only during Night 1 included the apnea-hypopnea index (AHI; median 4.9 [IQR = 10.1] events/h of sleep) and the index of periodic leg movements with arousal (PLMAI; median 2.4 [IQR = 4.4] events/h of sleep).26,27 Sixty-six (23.2%) women self-identified as having restless legs syndrome (RLS).18

Comparisons of PSG Measurements during 3 Consecutive PSG Nights

Mean TST increased from Night 1 (365 min) to Night 2 (391 min) and Night 3 (380 min) (Table 2). Mean SE% for the 3 nights was 83%, 85%, and 85%, respectively (Table 2). When Night 1 was compared with Nights 2 and 3, all but 4 measures (S2 [%], S1 [min], S2 [min], and Delta [%]) differed significantly. Three measures (Delta [min], NUMA, and NREM [min]) differed significantly between Nights 1 and 2 but not between Nights 1 and 3. No statistically significant differences in sleep measures were observed between Nights 2 and 3. Delta (%) was the only variable without significant mean differences across the 3 nights.

Table 2. Comparisons of selected sleep measurements during 3 consecutive nights with PSG measures, SWAN Sleep Study, 2003 to 2005

Measure | Night 1 mean (SE) | Δ2−1* | p | Δ3−1* | p | Δ3−2* | p | 2nd order difference**, mean | p
TST (minutes) | 364.52 (4.01) | 26.16 | < 0.0001 | 15.2 | 0.005 | −11 | 0.06 | −37.15 | < 0.0001
logSL (minutes) | 2.71 (0.05) | −0.16 | 0.01 | −0.22 | 0.0003 | −0.05 | 0.72 | 0.11 | 0.26
logWASO (minutes) | 3.82 (0.04) | −0.14 | 0.003 | −0.15 | 0.003 | −0.01 | 0.99 | 0.12 | 0.08
SM (percent) | 86.93 (0.43) | 1.97 | 0.0001 | 1.89 | 0.0007 | −0.08 | 0.99 | −2.05 | 0.01
SE (percent) | 82.79 (0.51) | 2.47 | < 0.0001 | 2.12 | 0.002 | −0.35 | 0.90 | −2.82 | 0.002
DELTA (percent) | 3.32 (0.28) | 0.23 | 0.49 | 0.14 | 0.90 | −0.09 | 0.94 | −0.32 | 0.26
DELTA (minutes) | 11.76 (1.01) | 1.95 | 0.001 | 1.35 | 0.28 | −0.6 | 0.74 | −2.55 | 0.02
REM (percent) | 23.18 (0.37) | 1.87 | < 0.0001 | 1.63 | 0.0005 | −0.24 | 0.91 | −2.11 | 0.002
REM (minutes) | 85.76 (1.81) | 12.83 | < 0.0001 | 9.13 | 0.0002 | −3.7 | 0.20 | −16.54 | < 0.0001
NUMA (counts) | 19.74 (0.45) | 1.42 | 0.003 | 0.81 | 0.21 | −0.61 | 0.40 | −2.03 | 0.006
NREM (minutes) | 278.76 (3.07) | 13.33 | 0.0003 | 6.03 | 0.23 | −7.29 | 0.09 | −20.62 | 0.0004

* 1st order difference: Δ2−1 = Night 2 − Night 1; Δ3−1 = Night 3 − Night 1; Δ3−2 = Night 3 − Night 2.

** 2nd order difference = (Night 3 − Night 2) − (Night 2 − Night 1) = Night 1 − 2 × Night 2 + Night 3.

Agreement and Variation in Data According to PSG Night

Night-to-night differences in values, and the variation in the data by PSG night, were evaluated using the Bland and Altman approach to estimate the bias that can be discerned with repeated assessments (Table 3 and Figure 1). For the best agreement between 2 nights, the mean percent difference (or mean difference, night 2 – night 1) between the 2 measures should be close to zero, there should be no significant correlation between the mean values and the differences, and the dispersion of the difference scores should be limited. While the signed rank test indicated that most differences between nights differed significantly from zero (Table 3), significant Bland-Altman (BA) correlations were observed only for S1 (%), Delta (min), REM (%), RL (min), and RLMA (min) when comparing Night 2 with Night 1, indicating systematic differences between those 2 nights. No significant BA correlations were observed when comparing Nights 2 and 3, indicating no systematic differences between these nights.
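The Bland-Altman quantities discussed above can be sketched in a few lines of Python: the per-woman mean and difference of two nights, the bias (mean difference) with the ±2 SD limits drawn in Figure 1, the correlation between means and differences that the BA p-value is described as testing, and the Table 3 percent change. This is an illustrative sketch with hypothetical data and function names; it does not compute p-values (a signed rank test on the differences could be run with, for example, scipy.stats.wilcoxon).

```python
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def bland_altman(night_a, night_b):
    """Per-woman values on an earlier night a and a later night b.
    Returns the bias (mean of b - a), the bias +/- 2 SD limits, and the
    correlation between per-woman means and differences."""
    means = [(a + b) / 2 for a, b in zip(night_a, night_b)]
    diffs = [b - a for a, b in zip(night_a, night_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, (bias - 2 * sd, bias + 2 * sd), pearson(means, diffs)


def percent_change(a, b):
    """Table 3 definition of the percent change: 100% x (b - a) / [(a + b) / 2]."""
    return 100 * (b - a) / ((a + b) / 2)


# Hypothetical TST values (minutes) for four women on Nights 1 and 2.
night1 = [360, 380, 400, 420]
night2 = [390, 400, 410, 440]
bias, limits, r = bland_altman(night1, night2)
print(f"bias = {bias}, limits = {limits}, r(mean, diff) = {r:.2f}")
print([round(percent_change(a, b), 1) for a, b in zip(night1, night2)])
```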

Table 3. Percent differences (Δ%) in selected sleep measures between nights with p-value of tests showing the difference is not equal to zero, and the Bland-Altman p-value for detecting possible bias, SWAN Sleep Study, 2003 to 2005

Sleep variable | Night 2 vs. Night 1: Δ%* | Rank test p (Δ% = 0) | Bland-Altman p | Night 3 vs. Night 1: Δ%* | Rank test p (Δ% = 0) | Bland-Altman p
TST (minutes) | 7.4 | < 0.0001 | 0.44 | 4.0 | 0.0008 | 0.99
SL (minutes) | −13.0 | 0.006 | 0.48 | −18.9 | < 0.0001 | 0.06
WASO (minutes) | −12.7 | 0.0003 | 0.72 | −12.8 | 0.0008 | 0.64
SM (percent) | 2.3 | < 0.0001 | 0.09 | 2.3 | 0.0004 | 0.15
SE (percent) | 3.1 | < 0.0001 | 0.12 | 2.6 | 0.0004 | 0.23
DELTA (percent) | | | | | |
DELTA (minutes) | 18.2 | 0.0025 | 0.004 | 17.9 | 0.003 | 0.005
REM (percent) | 9.7 | < 0.0001 | 0.04 | 7.1 | 0.0002 | 0.13
REM (minutes) | 16.3 | < 0.0001 | 0.40 | 10.5 | 0.0001 | 0.42
NUMA (counts) | 7.4 | 0.0008 | 0.94 | 3.2 | 0.13 | 0.14
NREM (minutes) | 5.0 | 0.0001 | 0.33 | 2.0 | 0.09 | 0.84

* Δ%, percent change of night b vs. night a: 100% × (b − a) / [½(a + b)].

Figure 1. Bland-Altman plots* for total sleep time (TST, top panel) and sleep efficiency (SE, bottom panel), identifying the mean differences and bias between Night 1, with instrumentation for assessing respiration and restless legs, and Night 2, without that instrumentation, SWAN Sleep Study, 2003 to 2005

*Mean: (night 1 + night 2)/2; Difference: night 2 – night 1; Bias: mean of the differences. The short dashed lines mark +2 SD, the bias, and −2 SD, respectively; the long dashed lines are slopes.

Though the BA correlations for some sleep measures (e.g., TST and SE) were not statistically significant, the mean differences in Table 2, the percent changes in Table 3, and the BA plots (Figure 1) indicated potentially systematic differences between Night 1 and Night 2, especially for SE. The TST and SE mean differences between Night 1 and Night 2 were non-zero, indicating a systematic difference between Nights 1 and 2. In contrast, individual PSG measures showed a high degree of agreement between Nights 2 and 3.

Within- and Between-Person Variation

To describe variation between nights, Table 4 shows the ICCs and their 95% confidence intervals. The highest ICCs were for Delta (%) and Delta (min): 0.68 and 0.66, respectively, for Night 1 vs. Night 3, and 0.78 and 0.77, respectively, for Night 2 vs. Night 3. The corresponding ICCs for Night 1 vs. Night 2 were 0.80 and 0.80 (data not shown). As seen in Table 4, measures of sleep continuity and duration tended to have lower ICCs than selected measures of sleep architecture.

Table 4. Intraclass correlation coefficients (with upper and lower 95% confidence intervals), intra- and inter-individual variability and the ratio of inter- to intra-individual variability for selected sleep measures, SWAN Sleep Study, 2003 to 2005

Sleep variable | Nights 1 and 3: ICC (95% CI)* | Within, σ²intra** | Between, σ²inter** | Ratio σ²inter/σ²intra | Nights 2 and 3: ICC (95% CI)* | Within, σ²intra** | Between, σ²inter** | Ratio σ²inter/σ²intra
TST (minutes) | 0.28 (0.17, 0.38) | 3493 | 1330 | 0.38 | 0.29 (0.18, 0.39) | 3179 | 1328 | 0.42
logSL (minutes) | 0.41 (0.31, 0.50) | 0.45 | 0.31 | 0.68 | 0.39 (0.29, 0.48) | 0.47 | 0.31 | 0.65
logWASO (minutes) | 0.35 (0.24, 0.45) | | | | (0.42, 0.59) | | |
SM (percent) | 0.28 (0.17, 0.38) | 39 | 15 | 0.40 | 0.43 (0.33, 0.52) | 28 | 21 | 0.76
SE (percent) | 0.25 (0.14, 0.36) | 58 | 19 | 0.33 | 0.37 (0.27, 0.47) | 44 | 25 | 0.58
S1 (percent) | 0.44 (0.34, 0.53) | 17 | 13 | 0.77 | 0.62 (0.54, 0.69) | 9 | 15 | 1.62
S2 (percent) | 0.49 (0.40, 0.57) | 33 | 31 | 0.95 | 0.56 (0.48, 0.63) | 25 | 32 | 1.28
S1 (minutes) | 0.47 (0.37, 0.56) | 197 | 178 | 0.90 | 0.63 (0.55, 0.70) | 140 | 237 | 1.70
S2 (minutes) | 0.46 (0.36, 0.55) | 1482 | 1271 | 0.86 | 0.45 (0.35, 0.54) | 1400 | 1126 | 0.80
DELTA (percent) | 0.68 (0.61, 0.74) | 7 | 15 | 2.11 | 0.78 (0.73, 0.82) | 4.5 | 16 | 3.57
DELTA (minutes) | 0.66 (0.59, 0.72) | 95 | 187 | 1.98 | 0.77 (0.72, 0.81) | 67 | 229 | 3.41
REM (percent) | 0.31 (0.20, 0.41) | 29 | 13 | 0.45 | 0.39 (0.29, 0.48) | 22 | 14 | 0.64
REM (minutes) | 0.20 (0.09, 0.31) | 754 | 193 | 0.26 | 0.31 (0.20, 0.41) | 626 | 284 | 0.45
NUMA (counts) | 0.49 (0.40, 0.57) | 30 | 28 | 0.95 | 0.56 (0.48, 0.63) | 26 | 33 | 1.26
NREM (minutes) | 0.36 (0.25, 0.46) | 1792 | 1008 | 0.56 | 0.36 (0.25, 0.46) | 1695 | 956 | 0.56
RL (minutes) | 0.37 (0.27, 0.47) | 1459 | 873 | 0.60 | 0.43 (0.33, 0.52) | 1119 | 855 | 0.76
RLMA (minutes) | 0.49 (0.40, 0.57) | 791 | 752 | 0.95 | 0.51 (0.42, 0.59) | 694 | 709 | 1.02

* ICC (95% CI), intraclass correlation coefficient and 95% confidence interval; ICC = σ²inter/(σ²inter + σ²intra) = σ²between/(σ²between + σ²within).

** σ²intra, intra-individual (within-subject) variability; σ²inter, inter-individual (between-subject) variability.

We disaggregated the within- and between-woman variation when comparing data from 2 different nights (Table 4). Low within-person variation relative to between-person variation is generally considered optimal for characterizing group differences. The Delta % and Delta minutes measures had greater between-person variation than within-person variation, and therefore markedly greater ratios of inter-individual to intra-individual variation (σ²inter/σ²intra); the ratios comparing Night 1 with Night 3 were 2.11 and 1.98, respectively (Table 4). Other sleep measures had more within-person than between-person variation and lower ratios (e.g., 0.38 for TST and 0.33 for SE).
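Because the ICC in Table 4 is built from the same two variance components, the ratio and the ICC carry the same information. Dividing the numerator and denominator of the ICC by σ²intra makes the mapping explicit and approximately reproduces the tabulated ICCs from the tabulated ratios:

```latex
\[
\mathrm{ICC} = \frac{\sigma^2_{\text{inter}}}{\sigma^2_{\text{inter}} + \sigma^2_{\text{intra}}}
             = \frac{R}{1 + R},
\qquad
R = \frac{\sigma^2_{\text{inter}}}{\sigma^2_{\text{intra}}}
\]
\[
R_{\text{Delta \%}} = 2.11 \;\Rightarrow\; \mathrm{ICC} = \tfrac{2.11}{3.11} \approx 0.68,
\qquad
R_{\text{TST}} = 0.38 \;\Rightarrow\; \mathrm{ICC} = \tfrac{0.38}{1.38} \approx 0.28
\]
```

A ratio above 1 therefore corresponds to an ICC above 0.5, which is why only the Delta measures exceed 0.5 in the Night 1 vs. Night 3 comparison.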

Loss-of-Information: Two Sleep-Staging PSG Nights or One Sleep-Staging PSG Night

Given the correlation of these measures across nights, we next considered how much less variation would be explained if the number of study nights were reduced. The amount of information lost (variation not explained) was about 23.6% or 23.5% of the total variation if only Night 1 + Night 2 or Night 1 + Night 3, respectively, were used. In contrast, removing any 2 of the 3 nights could result in the loss of more than half of the information.

Participant Characteristics and Night-to-Night Sleep Measures Variability

Characteristics associated with greater differences between Night 1 and Night 2 measures (i.e., Night 2 – Night 1) included obesity, financial strain, race/ethnicity, marital status, smoking, and PLMAI (Table 5). Education, menopause status, and physical activity were also evaluated but were not significantly related to differences between Nights 1 and 2 (data not shown). Participant characteristics were not associated with night-to-night differences in Delta (min), NREM (min), NUMA, REM (min), S2 (min), or TST (min).

Table 5. Beta estimates (with standard errors [SE] and p-values for the beta estimate) and standardized beta coefficients for participant characteristics in relation to differences in selected sleep measures between Night 2 and Night 1, SWAN Sleep Study, 2003 to 2005

| Sleep measure | Participant characteristic | Variable level | Beta | SE | p value | Beta(s) |
|---|---|---|---|---|---|---|
| logSL (minutes) | Financial strain (difficulty paying for basics) | Very hard | 0.31 (ns) | 7.96 | 0.97 | 0.002 |
| | | Somewhat hard | 10.05* | 3.63 | 0.006 | 0.17 |
| S1 (minutes) | BMI | BMI ≥ 30 | 5.03* | 1.91 | 0.009 | 0.16 |
| S1 (percent) | BMI | BMI ≥ 30 | 1.36* | 0.55 | 0.01 | 0.15 |
| DELTA (percent) | PLMAI | | 0.09* | 0.03 | 0.009 | 0.16 |
| | Marital status | Single | −1.15* | 0.51 | 0.03 | −0.14 |
| REM (percent) | Race | African American | −1.93* | 0.91 | 0.03 | −0.14 |

[i] ns, p > 0.05; *p < 0.05; **p < 0.001; ***p < 0.0001. Intercepts are not reported in the table. Beta(s), standardized beta.

[ii] Only significant factors (p < 0.05) were retained in parsimonious models using stepwise selection.
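Table 5 reports both raw and standardized betas. The sketch below shows how the two relate, under stated assumptions. It uses a single covariate and ordinary least squares, whereas the SWAN models were multivariable and used stepwise selection, which this does not reproduce. The outcome is the Night 2 − Night 1 difference. The standardized beta rescales the slope by SD(x)/SD(difference). The helper `night_difference_regression` and the toy data are hypothetical.

```python
import numpy as np


def night_difference_regression(x, night1, night2):
    """Simple OLS of (Night 2 - Night 1) on one covariate x.

    Returns (beta, se, standardized_beta). Hypothetical helper; the SWAN
    models were multivariable with stepwise selection."""
    x = np.asarray(x, dtype=float)
    diff = np.asarray(night2, dtype=float) - np.asarray(night1, dtype=float)
    design = np.column_stack([np.ones_like(x), x])  # intercept + covariate
    coef, *_ = np.linalg.lstsq(design, diff, rcond=None)
    beta = coef[1]
    # Standard error of the slope from the residual variance
    resid = diff - design @ coef
    n, p = design.shape
    sigma2 = resid @ resid / (n - p)
    se = np.sqrt(sigma2 * np.linalg.inv(design.T @ design)[1, 1])
    # Standardized beta: slope expressed in SD units of x and of the difference
    beta_std = beta * x.std(ddof=1) / diff.std(ddof=1)
    return float(beta), float(se), float(beta_std)


# Toy data: obesity indicator (BMI >= 30) and S1 minutes on Nights 1 and 2
obese = [0, 0, 1, 1]
s1_night1 = [20.0, 22.0, 18.0, 25.0]
s1_night2 = [21.0, 21.0, 24.0, 29.0]
print(night_difference_regression(obese, s1_night1, s1_night2))
```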


In evaluating the temporal night effect and patterns of variation in in-home PSG-derived measures across repeated nights of assessment, we observed small but statistically significant differences in sleep measures when comparing Night 1 of data collection with Nights 2 and 3. Little or no difference was observed between Nights 2 and 3. Mean total sleep time (TST) increased by 4% to 7%, from 365 minutes on Night 1 to 391 and 380 minutes on Nights 2 and 3, respectively. All Night 1 sleep measures differed significantly from those of Nights 2 and 3, except for S2 (%), S1 (minutes), and Delta (%).
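The 4% to 7% range follows directly from the mean TST values. A quick check, using a small helper (`pct_increase` is a name introduced here for illustration):

```python
def pct_increase(baseline, value):
    """Percent change from a baseline value."""
    return 100.0 * (value - baseline) / baseline


night1_tst = 365  # mean TST on Night 1, minutes
for night, tst in (("Night 2", 391), ("Night 3", 380)):
    print(f"{night}: +{pct_increase(night1_tst, tst):.1f}%")
# Night 2: +7.1%
# Night 3: +4.1%
```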

We used multiple nights of evaluation to address whether within-person (intra-individual) variation exceeded between-person variation. To that end, we reported the between- and within-person variation and the ratio of these two measures. For most sleep measures, within-person variation was greater than between-person variation when comparing nights with different amounts of monitoring instrumentation. The exceptions were Delta % and Delta minutes, indices of sleep architecture. For these measures, substantial within- and between-woman variation was observed when comparing nights with differing levels of instrumentation. This may help explain why less night-to-night variation was detected in these measures when nights had different levels of monitoring instrumentation.
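Between- and within-person variance can be estimated from a persons × nights matrix with a one-way random-effects ANOVA. This is the model underlying the intraclass correlation coefficient (see the Donner reference in the list below). The sketch assumes complete, balanced data, meaning every woman is scored on every night. It is an illustration only: `variance_components` is a hypothetical helper, and the authors' exact estimator may differ.

```python
import numpy as np


def variance_components(data):
    """One-way random-effects ANOVA on a persons x nights array.

    Returns (between-person variance, within-person variance, ICC).
    Assumes every person has a value for every night."""
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    person_means = x.mean(axis=1)
    grand_mean = x.mean()
    msb = k * np.sum((person_means - grand_mean) ** 2) / (n - 1)    # between-person mean square
    msw = np.sum((x - person_means[:, None]) ** 2) / (n * (k - 1))  # within-person mean square
    var_within = float(msw)
    var_between = max(float(msb - msw) / k, 0.0)  # truncate negative estimates at zero
    total = var_between + var_within
    icc = var_between / total if total > 0 else float("nan")
    return var_between, var_within, icc


# Toy example: 3 women x 2 nights
vb, vw, icc = variance_components([[1, 3], [5, 7], [9, 11]])
print(vb, vw, vw / vb, icc)  # within/between ratio, as discussed in the text
```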

These data do not suggest a major advantage of more than two repeated nights for characterizing habitual sleep in a sample of approximately 280 women, and sleep architecture may require even fewer. Clinical studies focused on sleep architecture, and specifically Delta measures, might need only a single night to describe specific patient groups. This reflects the greater between-person variability (relative to within-person variability) of these measures compared with measures of sleep duration and continuity.
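The Spearman–Brown prophecy formula illustrates why a larger between-person share of variance means fewer nights are needed. This is a standard psychometric result and is not part of the authors' analysis. The reliability of a k-night mean is kρ / (1 + (k − 1)ρ), where ρ is the single-night ICC. Solving for k gives the number of nights needed to reach a target reliability. The helper names, the 0.8 target, and the example ICCs are assumptions for illustration, not SWAN estimates.

```python
import math


def reliability_of_mean(icc, k):
    """Spearman-Brown: reliability of the mean of k nights given single-night ICC."""
    return k * icc / (1 + (k - 1) * icc)


def nights_needed(icc, target=0.8):
    """Smallest whole number of nights whose mean reaches the target reliability."""
    if not (0 < icc < 1 and 0 < target < 1):
        raise ValueError("icc and target must lie strictly between 0 and 1")
    k = target * (1 - icc) / (icc * (1 - target))
    return max(1, math.ceil(k - 1e-9))  # tolerance guards against float round-off


for icc in (0.3, 0.5, 0.8):  # illustrative ICCs only
    print(icc, nights_needed(icc), round(reliability_of_mean(icc, 2), 3))
```

The higher the single-night ICC (more between-person relative to within-person variance), the fewer nights are needed. This matches the direction of the argument for Delta measures above.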

Although PSG is widely regarded as the gold standard for assessing sleep, its administration, even in the home, creates an environment that may disrupt “usual” sleep. The SWAN Sleep Study implemented an in-home protocol to characterize participants' usual sleep and to mitigate “first night” effects (FNE), which arise from sleeping in an unfamiliar environment and from anxiety about being observed.28–32 FNEs have been reported to result in less total sleep time (TST), greater rapid eye movement (REM), lower sleep efficiency (SE), more intermittent wake time, and longer REM latency.9 Le Bon et al.9 studied two consecutive PSG nights in 83 patients with chronic fatigue syndrome. They observed significant differences in SPT, TST, SE, SE minus sleep onset latency, REM sleep, sleep onset latency, and RL, and concluded that no single sleep variable could summarize the FNE. In a study of 36 healthy adults, Sforza et al.30 reported that the FNE on the arousal response was affected by individual susceptibility and by circadian and homeostatic influences.

Although participants were evaluated in-home, instrumentation to monitor respiration, sleep disordered breathing, and leg movements was included in the Night 1 protocol but not in the protocols for Nights 2 and 3. We identified statistically significant night-to-night variation, particularly for Night 1 vs. Nights 2 and 3, but the differences were smaller than expected. Depending on the sleep measure, they ranged from as little as 2% to no more than 20%. Study night, and therefore the presence of an FNE, was confounded with the PSG assessment protocol. The Night 1 assessment included additional electrodes and monitors to assess sleep disordered breathing and limb movements, whereas fewer signals were collected during Nights 2 and 3. Because different PSG montages were used on the screening night and on subsequent study nights, the montage itself may have contributed to the difference in sleep between the first night and the second and third nights. This would confound effects due to montage with those due to night order. For instance, the greater amount of instrumentation on the first night may have led to greater sleep disruption, above and beyond any night order effect. However, many other studies that used identical montages across multiple nights have reported findings similar to our own, which makes this explanation somewhat less compelling. A different study design would be required to separate the effect of wearing additional monitoring devices from the night effect.

It should be noted that sleep stage scoring in our study differed from current AASM scoring recommendations. In particular, we used 20-second scoring epochs, and all stages were scored using a central EEG derivation. These methods may limit generalizability to current research and clinical practice. On the one hand, shorter scoring epochs may lead to less misclassification of sleep and wakefulness. Because an individual epoch may contain up to 50% of another sleep-wake stage, a 20-second epoch contains at most 10 seconds of misclassified data, whereas a 30-second epoch contains at most 15 seconds. On the other hand, use of a single EEG derivation may provide less precise staging, since some EEG characteristics are better defined in occipital (alpha rhythm) or frontal (delta activity) derivations.
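The worst-case misclassification argument assumes each epoch takes the label of the stage occupying most of it. The sketch below illustrates that assumption in simplified form. Real R&K and AASM scoring follow rules, not a per-second majority vote, and the per-second stage sequence and `score_epochs` are hypothetical. In this one example, the same underlying sleep yields more misclassified seconds with 30-second epochs than with 20-second epochs.

```python
from collections import Counter


def score_epochs(per_second_stages, epoch_len):
    """Assign each full epoch the stage occupying most of its seconds.

    Returns (epoch labels, total seconds whose true stage differs from the
    epoch label). Ties go to the stage seen first in the epoch. A trailing
    partial epoch is ignored."""
    labels, misclassified = [], 0
    for start in range(0, len(per_second_stages) - epoch_len + 1, epoch_len):
        chunk = per_second_stages[start:start + epoch_len]
        stage, count = Counter(chunk).most_common(1)[0]
        labels.append(stage)
        misclassified += epoch_len - count
    return labels, misclassified


# One minute of hypothetical per-second stages: 25 s S2, 15 s wake, 20 s S2
seconds = ["S2"] * 25 + ["W"] * 15 + ["S2"] * 20
print(score_epochs(seconds, 20))  # (['S2', 'W', 'S2'], 5)
print(score_epochs(seconds, 30))  # (['S2', 'S2'], 15)
```

Under a majority rule, each epoch can misclassify at most half its length. This is the 10-second vs. 15-second bound stated above.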

Characteristics associated with greater differences between Night 1 and Nights 2 or 3 measures included financial strain, obesity, and past smoking. These findings are consistent with a recent cross-sectional study of a sample representative of the US population, which was based on interviews rather than instrumented monitoring.33

As noted earlier, 83 women who had at least one polysomnogram recording were excluded from this analysis. These women made up 22.4% of the cohort and 22.6% of those with at least one night of PSG data. To evaluate possible bias from excluding them, we conducted a simulation study using TST as the outcome, night as the independent variable, and BMI, difficulty paying for basics, and smoking as covariates. The simulation was performed across combinations of sample size and three strategies for using the PSG data in temporal sequence: (1) Night 1 and Night 2; (2) Night 1, Night 2, and Night 3; and (3) Night 1 and the mean of Nights 2 and 3. The simulation showed that approximately 125 women yielded 95% coverage probability for the “night” effect under all three strategies. For each strategy, coverage probability increased monotonically with sample size, approaching 100% beyond approximately n = 200. This suggests that bias was minimized with 285 women, even though 83 women were lost because of non-scorable night(s) or an improper temporal sequence.
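The simulation code is not given in the paper. The sketch below is one hypothetical scheme, not the authors': it omits the covariates and uses only Night 2 − Night 1 differences. It draws subsamples of size n without replacement, builds a normal-approximation 95% CI for the mean difference, and counts how often the CI covers the full-sample mean. Because the CI ignores the finite-population correction, coverage drifts toward 100% as n approaches the full sample, which mirrors the qualitative pattern described above. The function name, the 1.96 multiplier, and the simulated input are assumptions.

```python
import numpy as np


def subsample_coverage(diffs, n, n_sims=1000, seed=0):
    """Hypothetical scheme: share of size-n subsamples (drawn without
    replacement) whose normal-approximation 95% CI for the mean
    Night 2 - Night 1 difference covers the full-sample mean."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(diffs, dtype=float)
    target = diffs.mean()
    hits = 0
    for _ in range(n_sims):
        sub = rng.choice(diffs, size=n, replace=False)
        half_width = 1.96 * sub.std(ddof=1) / np.sqrt(n)
        # No finite-population correction: the naive SE overstates the sampling
        # error of a subsample mean as n approaches len(diffs)
        hits += int(abs(sub.mean() - target) <= half_width)
    return hits / n_sims


# Simulated TST differences (minutes) for 285 women; not SWAN data
diffs = np.random.default_rng(1).normal(loc=20, scale=60, size=285)
for n in (50, 125, 200, 285):
    print(n, subsample_coverage(diffs, n, n_sims=500))
```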

This study has strengths and limitations. It was an in-home PSG study with a substantial sample size and multiple nights of assessment. The ability to disaggregate the sources and magnitude of night-to-night variability allowed us to reduce the number of study nights in a follow-up study implemented three years after baseline. This study was conducted in a sample of healthy middle-aged women to characterize the normal physiological and psychological events of the menopause in relation to sleep. Care must therefore be exercised in extrapolating these findings to samples or study designs that are highly enriched for sleep pathology (e.g., a case-control study of insomnia). Study night was confounded with the PSG assessment protocol, so the effect of the additional monitoring could not be quantitatively distinguished from the FNE arising from the anxiety of participating in a sleep study.

As noted above, sleep disordered breathing and leg movements were not monitored on Nights 2 and 3. Intra-individual night-to-night variability in sleep disordered breathing and in periodic leg movements in sleep is an unresolved concern in the clinical assessment of these disorders, even when they are accurately measured, recorded, and analyzed.6 Because both are associated with sleep fragmentation and sleep loss, presuming night-to-night consistency without monitoring for respiratory events and leg movements can result in residual confounding of measures of PSG sleep macrostructure.34 Thus, individual night-to-night differences can be a substantial source of confounding and a limitation of our analysis of variability. They should be considered in determining how many nights of PSG are needed to assess sleep and sleep disorders.34 Other sources of variation also could not be disaggregated. African American women were recruited at three sites participating in the Sleep Study, but only one site recruited Chinese women, a condition imposed by the design of the parent study. This precluded us from disaggregating sources of variation associated specifically with site and race/ethnicity. Finally, the study did not include men. Even so, information about sources of variation from this large community-based sample of healthy middle-aged women should be helpful in anticipating the sources of variation to consider when designing studies of men or older adults.

In summary, we evaluated night-to-night variation, the temporal night effect, and patterns of variation in in-home PSG data collected over three nights. Differences across the three nights were modest, even though Night 1 used a sleep staging and sleep disorder montage with additional instrumentation, whereas Nights 2 and 3 used a sleep staging montage with less instrumentation. We therefore conclude that two nights of in-home sleep assessment, with an appropriate sample size, can provide robust parameter estimates of sleep duration, continuity, and architecture in community samples. A study focused on measures of sleep architecture, for which between-woman variability was high relative to within-woman variability, may require a smaller sample size. Personal characteristics associated with greater variability between first- and second-night measures included smoking, obesity, and financial strain. Understanding the sources of variation can help in planning and developing laboratory and community-based studies of sleep. It can also help investigators select appropriate statistical analyses that optimize the identification of important sleep relationships while minimizing bias.


This was not an industry supported study. Dr. Buysse has served as a paid consultant for the following companies: Actelion, Cephalon, Eisai, Eli Lilly, GlaxoSmithKline, Merck, Neurocrine, Neurogen, Pfizer, Philips, Sanofi-Aventis, Sepracor, Servier, Somnus Therapeutics, Takeda, and Transcept Pharmaceuticals, Inc. He has helped produce CME materials and has given paid CME lectures indirectly supported by industry sponsors. The other authors have indicated no financial conflicts of interest.



Abbreviations

AHI: apnea/hypopnea events per hour of sleep
ATC: anatomical therapeutic chemical
BMI: body mass index
DELTA (%): [(S3 + S4)/TST]*100%, or equivalently SWS% = slow wave sleep%: S3% + S4%
DELTA (min): # minutes in stages 3 and 4 sleep from sleep onset to GMT, or equivalently SWS (min) = slow wave sleep: S3 + S4
CES-D: Center for Epidemiologic Studies Depression Scale
FNE: first night effect
GMT: good morning time, wake up
GNT: good night time, turn light off and ready to sleep
ICC: intraclass correlation coefficients
IQR: interquartile range
MSE: mean squared error
NREM (min): # minutes of NREM (non-REM): S1 + S2 + S3 + S4
NUMA: # awakenings after sleep onset lasting ≥ 11 seconds (> 50% of a 20-second epoch)
PLM: periodic leg movement
PLMAI: periodic leg movement index per hour of sleep with arousals
REM: rapid eye movement
REM (min): # minutes in REM sleep
RL (min): REM latency in minutes
RLmA (min): REM latency minus Awake in minutes
RLS: restless legs syndrome
Si (min): # minutes in stage i sleep, i = 1, 2, 3, 4
Si (%): percent in stage i sleep, (Si/TST)*100%, i = 1, 2, 3, 4
SE (%): sleep efficiency, (TST/TRP)*100%
SL (min): sleep latency, time in minutes elapsed from GNT to sleep onset
SM (%): sleep maintenance, (TST/SPT)*100%
SPT (min): sleep period time; the total time in minutes from sleep onset to GMT
SWAN: Study of Women's Health Across the Nation
TRP (min): total recording period; total amount of time in minutes from GNT to GMT
TST (min): total sleep time; the total time asleep in minutes
WASO (min): total time awake in minutes after sleep onset
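Several of the derived measures above follow directly from a scored hypnogram. The sketch below computes them under two stated assumptions: there is one stage label per 20-second epoch from GNT to GMT, and sleep onset is the first non-wake epoch. The second is a simplification, because studies often require sustained sleep to define onset. `sleep_summary` and the toy hypnogram are hypothetical.

```python
EPOCH_SEC = 20  # SWAN Sleep Study scoring epoch length, seconds
SLEEP_STAGES = ("S1", "S2", "S3", "S4", "REM")


def sleep_summary(hypnogram, epoch_sec=EPOCH_SEC):
    """Compute summary measures from one stage label per epoch, GNT to GMT.

    Labels: "W", "S1", "S2", "S3", "S4", "REM". Sleep onset is taken as the
    first non-wake epoch, a simplifying assumption."""
    sleep_epochs = [i for i, s in enumerate(hypnogram) if s in SLEEP_STAGES]
    if not sleep_epochs:
        raise ValueError("no sleep epochs scored")

    def minutes(n_epochs):
        return n_epochs * epoch_sec / 60.0

    stage_min = {s: minutes(sum(1 for x in hypnogram if x == s)) for s in SLEEP_STAGES}
    trp = minutes(len(hypnogram))   # total recording period: GNT to GMT
    sl = minutes(sleep_epochs[0])   # sleep latency: GNT to sleep onset
    spt = trp - sl                  # sleep period time: sleep onset to GMT
    tst = sum(stage_min.values())   # total sleep time
    delta_min = stage_min["S3"] + stage_min["S4"]
    return {
        "TRP": trp, "SL": sl, "SPT": spt, "TST": tst,
        "WASO": spt - tst,          # all wake between sleep onset and GMT
        "SE%": 100.0 * tst / trp,   # sleep efficiency
        "SM%": 100.0 * tst / spt,   # sleep maintenance
        "DELTA_min": delta_min,
        "DELTA%": 100.0 * delta_min / tst,
    }


# Toy one-hour recording: 10 min wake, then S1, S2, S3, 2 min wake, REM
hypnogram = ["W"] * 30 + ["S1"] * 6 + ["S2"] * 60 + ["S3"] * 30 + ["W"] * 6 + ["REM"] * 48
summary = sleep_summary(hypnogram)
print(summary["TST"], summary["SE%"], summary["SM%"])  # 48.0 80.0 96.0
```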


This work was supported by the SWAN Sleep Study from the National Institute on Aging (AG019360, AG019361, AG019362, AG019363). The Study of Women's Health Across the Nation (SWAN) has grant support from the National Institutes of Health (NIH); DHHS, through the National Institute on Aging (NIA); the National Institute of Nursing Research (NINR); and the NIH Office of Research on Women's Health (ORWH) (NR004061, AG17104, AG017719, AG012505, AG012535, AG012531, AG012539, AG012546, AG012553, AG012554, AG012495). Sleep data were processed with the support of RR024153. The content of this article is solely the responsibility of the authors and does not necessarily represent the official views of the NIA, NINR, ORWH or the NIH. Institutions where work was performed: Rush University Medical Center, Chicago, IL; University of Michigan, Ann Arbor, MI; University of California-Davis, Davis, CA; University of Pittsburgh, Pittsburgh, PA.

In Memoriam: The authors thank Dr. MaryFran Sowers for her significant contributions to the NIH-funded Study of Women's Health Across the Nation (SWAN) and the SWAN Sleep Study, and for leading the work as the principal investigator at the University of Michigan. As principal investigator for the NIH-funded SWAN Study and the Tecumseh-based Michigan Bone Health and Metabolism Study (MBHMS), she significantly expanded the scientific community's understanding of menopause, aging, osteoporosis, and bone health in women. Dr. Sowers was the John G. Searle Professor of Public Health and the founder of the Center for Integrated Approaches to Complex Diseases (CIACD) at the University of Michigan at Ann Arbor. She was a pioneer of women's health issues in epidemiology and public health. Her groundbreaking work characterized reproductive aging and the multifaceted process of diminishing ovarian reserve, and related this characterization to changes in the musculoskeletal and cardiovascular systems and to the constellation of diseases surrounding obesity. Dr. Sowers passed away July 17, 2011 in Ann Arbor, MI.



1. AARC-APT (American Association of Respiratory Care-Association of Polysomnography Technologists) clinical practice guideline. Respir Care. 1995;40:1336–43.
2. Bloch KE. Polysomnography: a systematic review. Technol Health Care. 1997;5:285–305.
3. Madani M, Frank M, Lloyd R, Dimitrova DI, Madani F. Polysomnography versus home sleep study: overview and clinical application. Atlas Oral Maxillofac Surg Clin North Am. 2007;15:101–9.
4. Normal SY. Polysomnography in children and adolescents. Chest. 2005;127:1080.
5. Chesson AL Jr, Ferber RA, Fry JM, et al. The indications for polysomnography and related procedures. Sleep. 1997;20:423–87.
6. Kushida CA, Littner MR, Morgenthaler T, et al. Practice parameters for the indications for polysomnography and related procedures: an update for 2005. Sleep. 2005;28:499–521.
7. Su S, Baroody FM, Kohrman M, Suskind D. A comparison of polysomnography and a portable home sleep study in the diagnosis of obstructive sleep apnea syndrome. Otolaryngol Head Neck Surg. 2004;131:844–50.
8. Tamaki M, Nittono H, Hayashi M, Hori T. Spectral analysis of the first-night effect on the sleep-onset period. Sleep Biol Rhythms. 2005;3:122–9.
9. Le Bon O, Minner P, Van Moorsel C, et al. First-night effect in the chronic fatigue syndrome. Psychiatry Res. 2003;120:191–9.
10. Le Bon O, Staner L, Hoffmann G, et al. The first-night effect may last more than one night. J Psychiatr Res. 2001;35:165–72.
11. Marzec ML, Selwa LM, Malow BA. Analysis of the first night effect and sleep parameters in medically refractory epilepsy patients. Sleep Med. 2005;6:277–80.
12. Sforza E, Haba-Rubio J. Night-to-night variability in periodic leg movements in patients with restless legs syndrome. Sleep Med. 2005;6:259–67.
13. Edinger JD, Fins AI, Sullivan RJ, et al. Sleep in the laboratory and sleep at home: comparisons of older insomniacs and normal sleepers. Sleep. 1997;20:1119–26.
14. Edinger JD, Glenn DM, Bastian LA, et al. Sleep in the laboratory and sleep at home II: comparisons of middle-aged insomnia sufferers and normal sleepers. Sleep. 2001;24:761–70.
15. Sowers MF, Crawford S, Sternfeld B, et al. Design, survey sampling and recruitment methods of SWAN: a multi-center, multi-ethnic, community-based cohort study of women and the menopausal transition. In: Wren J, Lobo RA, Kelsey J, Marcus R, editors. Menopause: biology and pathobiology. Academic Press; 2000. p. 175–88.
16. Hall MH, Matthews KA, Kravitz HM, et al. Race and financial strain are independent correlates of sleep in midlife women: the SWAN sleep study. Sleep. 2009;32:73–82.
17. Sowers MF, Zheng H, Kravitz HM, et al. Sex steroid hormone profiles are related to sleep measures from polysomnography and the Pittsburgh Sleep Quality Index. Sleep. 2008;31:1339–49.
18. Allen RP. The resurrections of periodic limb movements (PLM): leg activity monitoring and the restless legs syndrome (RLS). Sleep Med. 2005;6:385–7.
19. Quan SF, Griswold ME, Iber C, et al. Short-term variability of respiration and sleep during unattended non-laboratory polysomnography: the Sleep Heart Health Study. Sleep. 2002;25:843–9.
20. Rechtschaffen A, Kales A. A manual of standardized terminology, techniques and scoring system for sleep stages of human subjects. Washington, DC: U.S. Government Printing Office, Department of Health Education and Welfare; 1968. NIH Publication 204.
21. World Health Organization Scientific Group. Research on the menopause in the 1990s. Report of a WHO Scientific Group. World Health Organ Tech Rep Ser. 1996;866:1–107.
22. World Health Organization. Guidelines for ATC Classification. Accessed December 10, 2007. Available at:
23. Srivastava MS. Methods of multivariate statistics. New York: John Wiley & Sons, Inc.; 2002.
24. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307–10.
25. Donner A. A review of inference procedures for the intraclass correlation coefficient in the one-way random effects model. Int Stat Rev. 1986;54:67–82.
26. American Academy of Sleep Medicine Task Force. Sleep-related breathing disorders in adults: recommendations for syndrome definition and measurement techniques in clinical research. Sleep. 1999;22:667–89.
27. The Atlas Task Force. Recording and scoring leg movements. Sleep. 1993;16:749–59.
28. Rechtschaffen A, Verdone P. Amount of dreaming: effect of incentive, adaption to laboratory, and individual differences. Percept Mot Skills. 1964;19:947–58.
29. Agnew HW Jr, Webb WB, Williams RL. The first night effect: an EEG study of sleep. Psychophysiology. 1966;2:263–6.
30. Sforza E, Chapotot F, Pigeau R, Buguet A. Time of night and first night effects on arousal response in healthy adults. Clin Neurophysiol. 2008;119:1590–9.
31. Suetsugi M, Mizuki Y, Yamamoto K, Uchida S, Watanabe Y. The effect of placebo administration on the first-night effect in healthy young volunteers. Prog Neuropsychopharmacol Biol Psychiatry. 2007;31:839–47.
32. Scholle S, Scholle HC, Kemper A, et al. First night effect in children and adolescents undergoing polysomnography for sleep-disordered breathing. Clin Neurophysiol. 2003;114:2138–45.
33. Krueger PM, Friedman EM. Sleep duration in the United States: a cross-sectional population-based study. Am J Epidemiol. 2009;169:1052–63.
34. Van Dongen HPA, Vitellaro KM, Dinges DF. Individual differences in adult human sleep and wakefulness: leitmotif for a research agenda. Sleep. 2005;28:479–96.