What Patients Observe about Teamwork in the ED: Development of the PIVOT Questionnaire
For this study, we applied a many-facet Rasch model to obtain three indices used to evaluate content validity: observed averages, point-measure correlations, and item outfit statistics. These indices, described in greater detail by Wolf and Smith,[41] are summarized here:
- Observed averages. The observed average for each of the 21 items is the mean of participants’ ratings for that item. Higher observed averages suggest higher perceived feasibility and utility for that particular item of the PIVOT survey, while lower observed averages suggest lower perceived feasibility and utility.
- Point-measure correlations. The point-measure correlation is a Pearson correlation between the vector of scores on an item and the perceived feasibility/utility of the item. The index identifies the degree to which the scores on an item are consistent with the averaged scores of the remaining items. A positive point-measure correlation is ideal and indicates that the item contributed useful information to the construct measured by the test as a whole. A negative value for a particular item suggests that the item may be measuring a different construct than the other items and fails to offer evidence of content validity.
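As an illustration of the point-measure logic, the sketch below correlates each item's score vector with the sum of scores on the remaining items. This is a simplified stand-in for the many-facet Rasch computation, and the `ratings` matrix is entirely hypothetical (rows are participants, columns are items on the same 1–5 agreement scale used by the survey):

```python
import numpy as np

# Hypothetical ratings: rows = participants, columns = items.
# Items 1-4 track each participant's overall tendency; item 5 is
# deliberately reversed to show what a negative correlation looks like.
ratings = np.array([
    [2, 2, 3, 2, 5],
    [3, 2, 3, 3, 4],
    [3, 3, 4, 3, 4],
    [4, 4, 4, 4, 2],
    [4, 5, 5, 4, 2],
    [5, 5, 5, 5, 1],
])

# Simplified point-measure index: Pearson correlation between an item's
# score vector and the total score on the remaining items.
pm = []
for j in range(ratings.shape[1]):
    rest = ratings.sum(axis=1) - ratings[:, j]
    r = np.corrcoef(ratings[:, j], rest)[0, 1]
    pm.append(r)
    print(f"item {j + 1}: point-measure correlation = {r:.2f}")
```

Here item 5 produces a negative correlation, flagging it as possibly measuring a different construct than the other items.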
- Item outfit statistics. To evaluate evidence of content validity, we reviewed item outfit standardized and unweighted mean-square statistics. Doing so allowed us to examine the level of agreement in participants’ feasibility and utility ratings for each of the 21 items considered for inclusion in the final PIVOT survey. As described by Linacre, standardized outfit statistics (Z-Standardized) are t-tests of the hypothesis “do the data fit the model perfectly?”, with an expected value of 0.0 and acceptable values typically ranging from -2.0 to 2.0.[42] Negative values indicate a high level of predictability in responses, while positive values suggest less predictability. Values over 2.0 indicate that particular items may not contribute to the intended construct and should be considered for removal. The unweighted mean-square item outfit statistic is sensitive to outlying (extreme) ratings, so it can help identify idiosyncratic rating patterns as well as problematic measurement conditions, such as multidimensionality and poorly written items. Outfit mean-square statistics range from 0 to infinity, with an expected value around 1.0; values greater than 1.3 suggest that responses are so variable they may lack sufficient agreement to be meaningful. These conditions can distort or degrade the measurement system and can threaten evidence relevant to content validity.
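The outfit mean-square statistic above can be sketched as the unweighted mean of squared standardized residuals. In practice the expected scores and variances come from the fitted many-facet Rasch model; the numbers below are hypothetical placeholders chosen to show how one extreme rating inflates the statistic:

```python
# Hypothetical observations for one item across five participants, with
# model-expected scores E and response variances W for each observation
# (in a real analysis these come from the fitted Rasch model).
observed = [4, 5, 3, 4, 1]
expected = [4.1, 4.6, 3.4, 3.9, 4.2]   # hypothetical model expectations
variance = [0.8, 0.5, 0.9, 0.8, 0.6]   # hypothetical model variances

# Outfit mean-square: mean of squared standardized residuals z = (x - E)/sqrt(W).
# Unweighted, so it is highly sensitive to the outlying rating of 1.
z_sq = [(x - e) ** 2 / w for x, e, w in zip(observed, expected, variance)]
outfit_ms = sum(z_sq) / len(z_sq)
print(f"outfit MS = {outfit_ms:.2f}")  # values > 1.3 flag inconsistent ratings
```

A single unexpected rating is enough to push the statistic well past the 1.3 threshold, which is exactly the sensitivity to outliers the text describes.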
We used these item outfit statistics to characterize the level of participants’ agreement in ratings, then categorized survey items into three action groups: (a) Keep, (b) Evaluate, and (c) Remove. The criteria for these actions are described below:
- Keep: These items have high [>3.5 (3 = Don’t Know / 4 = Agree)] observed averages, indicating that participants’ feasibility and utility ratings are high, on average. Additionally, outfit Z. Stnd values below 2.0 and outfit MS values of 1.3 or less indicate a high level of rater consistency.
- Evaluate: Regardless of observed averages, items with a low level of agreement (indicated by outfit Z. Stnd values above 2.0 and outfit MS values above 1.3) fall into this group. Given the participants’ inability to agree on these items, evaluating each item for ambiguous language and reviewing the response-option frequencies may help guide the decision about these particular items.
- Remove: These items have observed averages below 3.4 (raters may not agree, on average, that these items have adequate feasibility and/or utility) combined with a high level of agreement (indicated by outfit Z. Stnd values below 2.0 and outfit MS values of 1.3 or less).
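The decision rules above can be expressed as a small function. This is a sketch of the stated criteria, not the authors' actual procedure; the item statistics fed to it are hypothetical, and the narrow 3.4–3.5 band (which the rules do not explicitly cover) is conservatively routed to Evaluate:

```python
def classify(obs_avg, outfit_zstd, outfit_ms):
    """Apply the Keep / Evaluate / Remove criteria described in the text."""
    if outfit_zstd > 2.0 and outfit_ms > 1.3:
        return "Evaluate"  # low rater agreement: review wording and frequencies
    if obs_avg > 3.5:
        return "Keep"      # high ratings with high rater consistency
    if obs_avg < 3.4:
        return "Remove"    # low ratings with high rater consistency
    return "Evaluate"      # 3.4-3.5 band unspecified in the text; flag for review

# Hypothetical item statistics: (observed average, outfit Z. Stnd, outfit MS)
items = {
    "Item 1": (4.2, -0.5, 0.9),
    "Item 2": (4.0, 2.6, 1.8),
    "Item 3": (3.1, -1.1, 0.8),
}
for name, stats in items.items():
    print(name, "->", classify(*stats))
```

Note that an item with a high observed average can still land in Evaluate (Item 2 here), because inconsistent ratings override the average under these rules.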