
Measuring Team Development in Clinical Care Settings

Ronald Stock, MD, MA; Eldon Mahoney, PhD; Patricia A. Carney, PhD

Background and Objectives: Our objective was to describe the psychometric properties of a measure of team development that can be used to assess and guide team functioning in health care settings.

Methods: The Team Development Measure (TDM) is a 31-item questionnaire constructed using the Rasch rating scale measurement model. We conducted an Mplus exploratory factor analysis using data collected from 1,194 individuals representing 120 teams from rural and urban inpatient and ambulatory health care settings; team size ranged from three to 39 members. Here we characterize the domains of teamness while taking into account the development of teams over time.

Results: The TDM was found to have good psychometric properties, with little measurement error and a Rasch person reliability of 0.95. Overall Cronbach’s alpha was 0.97. An Mplus exploratory factor analysis, combined with the stochastic nature of the Rasch model, suggests a developmental sequence in building teams consisting of four sub-domains with the following mean item difficulty scores: cohesion=40.5 (standard deviation [SD]=2.68), communication=49.3 (SD=2.78), roles and goals=52.7 (SD=2.74), and team primacy=53.3 (SD=1.06). This pattern suggests that cohesiveness is an initial element of team development, followed by communication, roles and goals clarity, and team primacy.

Conclusions: We developed and tested a measure of team development that has strong psychometric properties. This tool could be used to study how team functioning affects clinical outcomes and as a quality improvement tool to improve team function.

(Fam Med 2013;45(10):691-700.)

Although working in teams is a central component of the patient-centered medical home,1 emphasizing teamwork in health care is not new.2 Most, if not all, health care systems and practices strive for highly developed teamwork. However, the practical application of team-based care in primary care settings has achieved only modest penetration into mainstream health care. As the American health care system continues to transform, there is consensus that team-based care is foundational to improving access, quality, and cost of care.3 Team-based care is critical to the creation of a care delivery system in which health professionals are intentionally configured to take collective responsibility for patient and population health outcomes.1,4 Team-based care holds promise as an important strategy to improve patient outcomes and to increase the capacity and sustainability to provide primary care. However, despite this potential, our understanding of the impact of teams and the ways they contribute to key health care outcomes is limited by the lack of robust, validated measures of the effectiveness or quality of team functioning, particularly in primary care.5

Decades of research, both in health care5,9-13 and outside of health care,6-8 have shown the importance of groups of people working toward a common goal. While the term “team” is part of the organizational language in health care, work groups are often regarded as teams simply because they are labeled as such. Placing a group of individuals from different disciplines in the same room does not mean they will function as a team. There are numerous definitions of highly effective health care teams, but at the most basic level a team requires two or more individuals with specific roles who perform interdependent tasks and share a common goal. Team members must have the skills to share information (communicate) so that the team can coordinate the care necessary for effective and efficient patient care.10,11 Because most medical professionals are not trained in team skills and behaviors, team development and training must be deliberately facilitated. A conceptual model of teamwork that includes individual team member competencies, attributes specific to team behaviors, and organizational support for teamwork has been proposed.10 Each team member brings individual skills, knowledge, attitudes, and behaviors to the practice team, but these are insufficient to fully define a team and generate the outcomes expected of a team-based care approach. There is widespread agreement that two types of variables influence health care team performance: (1) attributes of the team itself and (2) attributes of the larger organizational context within which the team functions.11,12 While tools have been developed to assess individual team member behaviors and skills using standardized observational and self-assessment instruments,10 few instruments use team members’ self-reported perceptions to evaluate the extent to which known team attributes are present in a team.

A recent review of survey instruments designed to measure teamwork in health care settings14 identified 35 surveys, nine of which met criteria for psychometric validity. Most of these surveys measured domains related to communication, coordination, and respect, though the dimensions of teamwork assessed varied. The psychometrically tested surveys ranged from five to 54 items in length. Each existing measure has strengths and weaknesses that depend on the specific behavioral processes it assesses. An important challenge in studying teamwork is that items can refer to many different behavioral processes that occur over the course of team development, which makes it important to choose an instrument that fits the context of the study. The Agency for Healthcare Research and Quality recently developed a measure, grounded in the hospital patient safety culture literature, that assesses individual perceptions of group-level team skills and behaviors.15 It has shown utility in evaluating the TeamSTEPPS team development process and appears to have potential for use in other settings. Further, at the clinic practice level, there is still much to be learned about which measurement tools are best suited to evaluate team care and its relationship to health outcomes.

We used existing literature10-13 and extensive experience working with teams to develop an instrument designed to measure team functioning from the perspective of individual members within the team. Here our intent is to describe the psychometric testing we conducted to evaluate the properties of a 31-item questionnaire used to measure team attributes and, more specifically, team development. The purpose of the Team Development Measure (TDM) is to measure the degree to which a team has components needed for highly effective teamwork in place.

Methods

Survey Instrument Design and Development

The initial questionnaire contained 39 items derived from the literature on health care teams, in which attributes important to building and maintaining teams have been identified.11-13 These items were supplemented by the authors’ experience conducting field work with teams providing clinical care. Each item used a 4-point Likert-type response format (strongly disagree=1, disagree=2, agree=3, strongly agree=4), and the items were assembled into a questionnaire that was tested for the degree to which it measured team development. Four of the items were reverse scored. The 39 items were initially tested with five teams that included a total of 73 members. Based on the initial analysis, two items were dropped because of poor item fit (fit values above 1.50, the upper control limit of the acceptable fit statistic range of 0.5–1.50).16 Both dropped items were designed to assess leadership. A qualitative investigation at the time suggested that many respondents interpreted the term “leader” to mean formal organizational leaders rather than rotating, task-specific leaders within the team. A replacement item was therefore written using the word “lead” (“On this team, the person who takes the lead differs depending on who is best suited for the task”). To reduce response burden, eight items that calibrated similarly to other items were dropped, which did not significantly decrease the precision of the measure. The final instrument includes 30 items from the original set plus the new leadership item, resulting in a 31-item questionnaire (Table 1).
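
On a 1–4 response format, reverse scoring maps 1 to 4, 2 to 3, 3 to 2, and 4 to 1, that is, reversed = 5 − response. The minimal Python sketch below applies that rule before analysis. Which four TDM items were reverse scored is not listed here, so the item identifiers are hypothetical placeholders, and a missing answer is represented as None and left missing.

```python
# Reverse-score selected items on the TDM's 4-point format
# (strongly disagree=1 ... strongly agree=4).
# The IDs in REVERSED_ITEMS are hypothetical placeholders, not the published TDM items.
from typing import Dict, Optional

SCALE_MIN, SCALE_MAX = 1, 4
REVERSED_ITEMS = {"item_a", "item_b"}  # hypothetical IDs


def reverse_score(response: Optional[int]) -> Optional[int]:
    """Map 1->4, 2->3, 3->2, 4->1; a missing response (None) stays missing."""
    if response is None:
        return None
    if not SCALE_MIN <= response <= SCALE_MAX:
        raise ValueError(f"response {response} outside {SCALE_MIN}-{SCALE_MAX}")
    return SCALE_MIN + SCALE_MAX - response


def score_respondent(answers: Dict[str, Optional[int]]) -> Dict[str, Optional[int]]:
    """Return a copy of one respondent's answers with reversed items recoded."""
    return {
        item: reverse_score(value) if item in REVERSED_ITEMS else value
        for item, value in answers.items()
    }


if __name__ == "__main__":
    print(score_respondent({"item_a": 1, "item_b": None, "item_c": 3}))
    # {'item_a': 4, 'item_b': None, 'item_c': 3}
```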


Setting for Psychometric Testing

Participants in the full psychometric analysis included 1,194 team members representing 120 working health care teams. These teams practiced within a nonprofit health care system that includes urban and rural critical access hospitals and outpatient clinics across Oregon, Washington, and Alaska. Teams ranged in size from three to 39 members (mean 9.95, median 8.0). Team composition included physicians, nurses, receptionists, administrators, and ancillary therapists. Item completion was high: 85% of participants answered all 31 questions, and 95% answered 29 or more. Team size was not correlated with the number of questions answered (r=−0.046, P>.05). All TDM data were collected anonymously. Oregon Health & Science University’s Institutional Review Board reviewed this study and granted an exemption under 45 CFR 46.102(f).

Psychometric Analysis

Psychometric analyses involved a two-step process. In the first step, an Mplus promax exploratory factor analysis,19 with items treated as ordinal variables, was conducted to determine the number of underlying dimensions in the item set and to identify the subset of items corresponding to each dimension. In the second step, a Rasch rating scale measurement model16-18 was applied with Winsteps16 software to transform the ordinal responses into an interval score ranging from 0 to 100. To ensure that the resulting scale is in fact equal interval (like a “yardstick”), all Rasch analyses use logits (log-odds units) as the metric. To convert logit values from the 1–4 response category data to a user-friendly 0–100 scale, Winsteps supplies two conversion values: (1) the center of the scale in 0–100 units, which in this study was 49.02, and (2) the size of one logit in 0–100 units, which in this study was 6.02. By definition, the center of a logit scale is zero and its unit is one logit; this conversion maps logits onto a 0–100 range in which zero is the lowest possible score and 100 the highest, and because the conversion is linear, the 0–100 score remains an equal interval score.
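
The two reported values imply a simple linear conversion, score = 49.02 + 6.02 × logit. The sketch below assumes exactly that form and nothing more about how Winsteps applies it; the function names are illustrative.

```python
# Linear rescaling of Rasch logits to the 0-100 reporting scale.
# CENTER and UNITS_PER_LOGIT are the two values reported for this study;
# the linear form score = CENTER + UNITS_PER_LOGIT * logit is what they imply.

CENTER = 49.02          # 0-100 value assigned to the scale center (0 logits)
UNITS_PER_LOGIT = 6.02  # 0-100 units spanned by one logit


def logit_to_scale(logit: float) -> float:
    """Convert a logit measure to the 0-100 scale."""
    return CENTER + UNITS_PER_LOGIT * logit


def scale_to_logit(score: float) -> float:
    """Invert the conversion: 0-100 score back to logits."""
    return (score - CENTER) / UNITS_PER_LOGIT


if __name__ == "__main__":
    # One logit is 6.02 points anywhere on the scale, which is why a
    # linear transform preserves equal intervals.
    for logit in (-1.0, 0.0, 1.0, 2.0):
        print(f"{logit:+.1f} logits -> {logit_to_scale(logit):.2f}")
```

Under this conversion, 0 and 100 correspond to about −8.14 and +8.47 logits, and, for example, the 12.8-point gap between the cohesion (40.5) and team primacy (53.3) domain means corresponds to 12.8 / 6.02, or about 2.1 logits.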

For a comprehensive discussion of Rasch measurement models compared with other approaches, see Massof.17 Briefly, the Rasch model is a stochastic (probabilistic) mathematical statement of measurement: it specifies that the probability of a respondent giving a particular response to an item is a function of the respondent’s ability minus the difficulty of the item minus the difficulty of the response category.18 When a set of observations fits the model, the result is a true linear measure constituting a real number line that represents the quantity of the construct present (eg, the amount of “teamness”). The location of an item on the scale represents its difficulty or, in this case, how much team development a team member must perceive as present to endorse the item. The location of an individual team member on the scale is referred to as that person’s “ability,” or how much team development he or she perceives to be present.
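
In the Andrich rating scale model, the usual form of the Rasch rating scale model, the statement above becomes: the log-odds of responding in category k rather than k − 1 equals B − D − F_k, where B is the person’s ability, D the item’s difficulty, and F_k the k-th category threshold shared by all items (all in logits). The sketch below computes category probabilities from that statement. The threshold values are illustrative assumptions, not TDM estimates, and categories 0–3 stand for responses 1–4.

```python
import math
from typing import List, Sequence


def category_probabilities(ability: float, difficulty: float,
                           thresholds: Sequence[float]) -> List[float]:
    """Rating scale model: P(response in category k), k = 0..len(thresholds).

    log(P_k / P_{k-1}) = ability - difficulty - thresholds[k-1], in logits.
    """
    # Category 0 has an empty sum; category k accumulates k steps.
    log_numerators = [0.0]
    for tau in thresholds:
        log_numerators.append(log_numerators[-1] + ability - difficulty - tau)
    # Normalize; subtracting the max first avoids overflow in exp.
    peak = max(log_numerators)
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_response(ability: float, difficulty: float,
                      thresholds: Sequence[float]) -> float:
    """Expected answer on the 1-4 scale (category k is response k + 1)."""
    probs = category_probabilities(ability, difficulty, thresholds)
    return sum((k + 1) * p for k, p in enumerate(probs))


if __name__ == "__main__":
    thresholds = [-1.5, 0.0, 1.5]  # illustrative values only
    for ability in (-2.0, 0.0, 2.0):
        probs = category_probabilities(ability, 0.0, thresholds)
        print(ability, [round(p, 3) for p in probs],
              round(expected_response(ability, 0.0, thresholds), 2))
```

With symmetric thresholds, a respondent whose ability equals the item’s difficulty has probabilities symmetric around the middle of the scale (expected answer 2.5); raising ability shifts probability toward “strongly agree,” which is the sense in which harder items require more perceived team development to endorse.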

For the measure to be useful, all items must fit a single real number line (be unidimensional), as assessed by mean square fit statistics.17 Item fit is the degree to which responses to an item match expectations, given the ability of each respondent and the difficulty of the item; an item whose responses do not meet this expectation contributes little useful information to measuring the construct. A fit coefficient of 1.0 indicates perfect fit to Rasch model expectations. Fit values greater than 1.50 suggest that the item is measuring something other than the intended construct. Fit values less than 0.5 indicate less variability in responses than expected (eg, persons of very different ability giving the same answer).16 Few items are typically expected to fit perfectly. The overall precision of the measure is assessed by Rasch person reliability; one minus person reliability is the proportion of observed variance in person measures attributable to measurement error.17
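
The mean square fit statistics can be written down compactly. The sketch below uses the conventional Rasch definitions: outfit is the unweighted mean of squared standardized residuals, and infit weights squared residuals by each response’s model variance. It assumes the expected responses and variances come from an already fitted model, and it is not a reproduction of Winsteps’ internal computations. The 0.5–1.5 bounds are the acceptable range used in this study.

```python
from typing import Sequence, Tuple


def item_fit(observed: Sequence[float], expected: Sequence[float],
             variance: Sequence[float]) -> Tuple[float, float]:
    """Infit and outfit mean squares for one item.

    observed: each respondent's response to the item
    expected: model-expected response given that respondent's ability
    variance: model variance of that response (its statistical information)
    """
    if not observed or not (len(observed) == len(expected) == len(variance)):
        raise ValueError("inputs must be non-empty and of equal length")
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals, so a surprising
    # answer from an off-target respondent (small variance) weighs heavily.
    outfit = sum(r / w for r, w in zip(sq_resid, variance)) / len(sq_resid)
    # Infit: information-weighted, so respondents near the item's location dominate.
    infit = sum(sq_resid) / sum(variance)
    return infit, outfit


def fit_flag(mean_square: float, low: float = 0.5, high: float = 1.5) -> str:
    """Classify a mean square value against the acceptable range."""
    if mean_square > high:
        return "underfit (noisy)"
    if mean_square < low:
        return "overfit (too predictable)"
    return "acceptable"
```

For example, with observed responses [1, 3], expected values [2, 2], and variances [0.5, 2.0], infit is 0.80 and outfit is 1.25: the unexpected answer from the low-information respondent inflates outfit more than infit, which is why the two statistics are read together.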

Winsteps software provides two estimates of person reliability.16,17 Real person reliability assumes that all misfit of the data to the model reflects measurement error (worst-case reliability). Model person reliability assumes that all measurement error is due to chance (best-case reliability). The “true” reliability of a measure lies somewhere between the real and model values. Because both persons and items have a location on the measurement scale, two item fit values were calculated. Infit is most sensitive to unexpected responses when the item’s scale location is close to the person’s scale location.17 Outfit is most sensitive to unexpected responses when the item’s scale location is far from the person’s.17 In addition, items should ideally be spread across the scale with no large gaps. Very few items were left incomplete by respondents, and because Rasch estimation uses only the responses actually observed for each person and item, missing responses did not need to be imputed. For the psychometric analysis, missing raw data were therefore coded as “9” in the dataset, following Winsteps protocol.
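
Both reliability estimates can be expressed with the standard Rasch separation formula: reliability = (observed variance of person measures − mean squared standard error) / observed variance, so one minus reliability is the error share of the observed variance. The sketch below is illustrative only. Its rule for the worst-case (“real”) estimate, inflating each model standard error by the square root of that person’s infit mean square when it exceeds 1, is our assumption about how such an estimate is formed and may not match Winsteps exactly.

```python
import math
from statistics import pvariance
from typing import List, Sequence


def separation_reliability(measures: Sequence[float],
                           std_errors: Sequence[float]) -> float:
    """Share of observed person-measure variance that is not measurement error."""
    observed_var = pvariance(measures)
    error_var = sum(se ** 2 for se in std_errors) / len(std_errors)
    return (observed_var - error_var) / observed_var


def real_std_errors(model_se: Sequence[float],
                    infit: Sequence[float]) -> List[float]:
    """Assumed worst-case SEs: inflate a model SE when the person misfits (infit > 1)."""
    return [se * math.sqrt(max(1.0, f)) for se, f in zip(model_se, infit)]


if __name__ == "__main__":
    measures = [-2.0, 0.0, 2.0]  # hypothetical person measures (logits)
    se = [0.5, 0.5, 0.5]         # hypothetical model standard errors
    print(separation_reliability(measures, se))                                    # model
    print(separation_reliability(measures, real_std_errors(se, [1.5, 0.8, 1.0])))  # real
```

Because real standard errors can only be larger than or equal to model standard errors, the real estimate is never above the model estimate, consistent with the reported 0.95 (real) and 0.96 (model).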

Results

Factor Structure

The best solution from the Mplus19 promax exploratory factor analysis had four factors (standardized root mean square residual=0.045) (Table 2). Based on item content, factor one represents communication (n=14 items), factor two roles and goals clarity (n=4 items), factor three cohesion (n=4 items), and factor four primacy of the team (n=2 items). Twenty-four of the 31 items loaded on a factor; the remaining seven items did not load on any factor but were retained in the Rasch model. Alpha coefficients for the individual domains ranged from 0.76 to 0.94.
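
The standardized root mean square residual summarizes how closely the correlations implied by a factor solution reproduce the observed item correlations; values near zero indicate close reproduction. The minimal sketch below uses the common textbook definition over the lower triangle of two correlation matrices, including the diagonal. Mplus’s computation for ordinal items may differ in detail, and the matrices in the example are made up.

```python
import math
from typing import Sequence


def srmr(observed: Sequence[Sequence[float]],
         implied: Sequence[Sequence[float]]) -> float:
    """SRMR between observed and model-implied correlation matrices."""
    p = len(observed)
    squares = []
    for i in range(p):
        for j in range(i + 1):  # lower triangle, including the diagonal
            # Standardize each residual by the observed variances
            # (both equal 1 for correlation matrices).
            scale = math.sqrt(observed[i][i] * observed[j][j])
            squares.append(((observed[i][j] - implied[i][j]) / scale) ** 2)
    # Average over the p(p + 1) / 2 unique elements, then take the root.
    return math.sqrt(sum(squares) / len(squares))


if __name__ == "__main__":
    obs = [[1.0, 0.5], [0.5, 1.0]]
    fit = [[1.0, 0.4], [0.4, 1.0]]
    print(round(srmr(obs, fit), 4))  # 0.0577
```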


Person Reliability

Table 3 shows the number of teams and the within-team person reliability coefficients. For the 120 teams, neither real (r=0.136, P>.05) nor model (r=0.171, P>.05) person reliability was related to the number of persons within a given team. Real person reliability of the TDM in this psychometric analysis was 0.95, and model reliability was 0.96. Cronbach’s alpha for the scale overall was 0.97.
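
Cronbach’s alpha, reported above for the overall scale and for each domain, compares the sum of the individual item variances with the variance of respondents’ total scores: alpha = k / (k − 1) × (1 − Σ item variances / total-score variance) for k items. A minimal sketch follows; it handles complete responses only, and how incomplete questionnaires were treated when computing the reported alphas is not stated here.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha.

    items: one inner sequence per item, each holding every respondent's
    score on that item (complete cases only, same respondent order).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    totals = [sum(item[r] for item in items) for r in range(n)]
    item_var_sum = sum(pvariance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))
```

Two identical items give an alpha of 1.0, and two items that vary independently across respondents give an alpha near 0; values of 0.97 overall and 0.76–0.94 by domain therefore indicate highly consistent responding across items.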


Item Fit and Difficulty

Table 4 shows item difficulties (measures) and fit statistics. The TDM is scaled from 0 to 100, where zero is the lowest possible score and 100 the highest. As shown in Table 4, both infit and outfit values for all items were acceptable based on a fit statistic range of 0.5–1.50.16 Most items had fit values between 0.8 and 1.2, representing very good fit and contributing to the high person reliability. Items 3 and 27 have marginal fit, but deleting either item decreases person reliability.


Because the Rasch model is stochastic, a respondent is more likely to endorse easier items than harder ones, so the difficulty order of the items suggests a developmental sequence in team development. The easiest item, with a calibrated difficulty of 36.6, is item 22 (“I enjoy being in the company of the other members of the team”); agreeing with this item requires the least team development. The most difficult item is item 27 (“Some members of this team resist being led”); agreeing with this item requires the most team development.

Table 5 outlines the distribution of transformed scores using the Rasch approach. The mean 0–100 difficulty of the items loading on each of the four factors is: cohesion=40.5 (standard deviation [SD]=2.68), communication=49.3 (SD=2.78), roles and goals=52.7 (SD=2.74), and team primacy=53.3 (SD=1.06). This pattern suggests that cohesiveness is an initial element of team development, followed by communication, then roles and goals clarity, and finally team primacy. The graphical depiction in Table 5 shows that some item-response combinations are more difficult than others even though the item’s domain may, in general, be easier. The eight stages shown in Table 5 were identified from naturally occurring break points in the item-response category combinations on the 0–100 scale. Because the construct is developmental, it is reasonable to interpret a component that team members endorse with “agree” as less firmly in place than one they endorse with “strongly agree.”
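
The domain ordering reported above comes from averaging the calibrated difficulties of the items that load on each factor and sorting those averages. The sketch below assumes an item-to-factor mapping like the one in Table 2; the item IDs and difficulty values are invented for illustration and are not the TDM calibrations. It uses the sample SD, and whether the reported SDs are sample or population SDs is not stated.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def domain_summary(difficulties: Dict[str, float],
                   domains: Dict[str, str]) -> List[Tuple[str, float, float]]:
    """Mean and sample SD of item difficulty per domain, easiest domain first.

    difficulties: item ID -> calibrated difficulty on the 0-100 scale
    domains: item ID -> domain name (items without a domain are simply omitted)
    """
    grouped: Dict[str, List[float]] = {}
    for item, domain in domains.items():
        grouped.setdefault(domain, []).append(difficulties[item])
    summary = [(d, mean(v), stdev(v) if len(v) > 1 else 0.0)
               for d, v in grouped.items()]
    return sorted(summary, key=lambda row: row[1])


if __name__ == "__main__":
    # Hypothetical items and difficulties, for illustration only.
    difficulties = {"q1": 38.0, "q2": 42.0, "q3": 48.0,
                    "q4": 51.0, "q5": 52.0, "q6": 55.0}
    domains = {"q1": "cohesion", "q2": "cohesion",
               "q3": "communication", "q4": "communication",
               "q5": "roles and goals", "q6": "roles and goals"}
    for name, m, sd in domain_summary(difficulties, domains):
        print(f"{name}: mean={m:.1f} (SD={sd:.2f})")
```

Items that did not load on any factor are left out of the mapping, so they do not affect the domain means even though they remain part of the Rasch scale.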

Discussion

This study found that the TDM has strong psychometric properties: it is valid, based on its development from existing literature and extensive field experience, and it is a reliable instrument for measuring perceptions of team development. It is potentially a tool for helping health care team members understand the extent to which attributes of teamwork are present within their group. We used two rounds of testing with two independent groups of teams to assess its psychometric parameters: the initial test was used for data reduction, and a second test was used for a full psychometric analysis of the tool. Importantly, the TDM holds its psychometric strength in groups as small as three members and as large as 30. It was developed and tested in a variety of health care settings, including ambulatory, hospital-based, and administrative health care teams in both rural and urban environments. The TDM measures four domains, two of which, communication and roles/goals, are similar to domains identified in the recent systematic review.14 However, because the TDM was designed to measure team development, it also assesses cohesion and team primacy, both of which are crucial to the development of teams. With the increased focus on team-based care as part of the patient-centered medical home22 and health care reform initiatives in general, the ability to accurately measure team development is vitally important.

This paper presents detailed methods for factor analysis and for use of the Rasch approach for score conversion. While measuring team development is clearly important for family medicine research and practice, the methodological detail provided here may be difficult to understand and interpret without training in psychometrics. We have provided this detail because instrument design, development, and testing contribute to the reliability and validity of measures, which are important for accuracy. Family physicians often work with teams of investigators, including biostatisticians, as part of their scholarly work, and we hope this paper will foster a common language that makes it easier to discuss instrument testing. Space constraints prevented us from providing a tutorial on factor analysis and Rasch modeling, but we provide citations that can be used for further study.20,21

The development of accurate tools to measure team-based care, as well as other aspects of health care reform, is important to academic family medicine for several reasons. First, family medicine educators lead the development of training programs that promote team-based care, which includes teamwork as a critical component, especially in the development of primary care medical homes. Helping our student and resident learners understand and improve the domains of effective practice teamwork is critical to our educational mission of preparing them to be leaders in their health care practices and communities. Second, research in team-based care, emphasizing accurate measurement of team development and the creation of best methods for developing teams, needs to be core to the mission of family medicine’s academic institutions. Finally, we hope that the use of factor analysis and Rasch measurement methods in this study will encourage others in academic family medicine research to consider these survey development and scoring methods.

The evaluation of the TDM has a number of strengths. The instrument was developed with the intention of improving patient care through teamwork across a variety of health care settings, so its relevance to the teams and the care they provide was high. The data were collected anonymously; although this prevents analysis by team member profession or role, it helped minimize response and group bias. We collected data from health care professionals performing a spectrum of roles, all of whom identified themselves as part of a team. Because the tool was used in inpatient and outpatient settings in moderate-sized communities in three states, it is likely generalizable to similar communities and health care settings in the United States. Methodologically, the tool’s items were constructed using a two-phase approach, and large numbers of individuals and teams were surveyed. The Mplus factor analysis identified initial domains, and the Rasch rating scale measurement model allowed us to develop a concise, accurate linear measure by converting individual ordinal response data (Likert scale responses) into true interval data (like a “yardstick”), allowing statistical analysis across time and populations.

Weaknesses of the current analysis include the lack of data to explore the relationship between perceived team attributes and the professional role of the team member, and the lack of information about the influence of the organizational context in which the teams performed. Future research is needed to address these gaps. What is known, however, is that all the teams were employed by the same health system, albeit in different geographic regions and institutional settings. Further examination of data collected by professional role and clinical care setting, and of the association between observed team behaviors and the measured attributes, would augment the evaluation of this instrument. A typical limitation of the Rasch model is the need for a large number of observations or replications to estimate the model parameters, which we were able to obtain in this study.23 In addition, the Rasch model makes strong assumptions that collected data may not fully meet. In practice, Rasch specifications are often not perfectly met but can be usefully met through careful data collection.21

In our experience in the clinical setting, information from the TDM is shared with team members so they can collectively discuss how to improve their teamwork. Survey data are presented to teams in three ways. First, an overall score of team development is calculated, and the team is shown where that score falls on the developmental scale in Table 5. This gives the team a “yardstick” measure of its current stage of development and its potential for growth. It also shows whether a team attribute, or domain, is “in place” (ie, the attribute is present, but the team is still working on it) or “firmly in place” (ie, external or internal stresses on the team are unlikely to affect that attribute). Second, the team is shown a graph that plots the frequency of individual team member scores on the 0–100 developmental scale. This helps team members begin to understand that a range of perceptions of teamness exists within the clinic. In the early development of teams, the variation in team member perceptions can be quite broad. Often, the work of the team is to reduce this variability (getting on the “same page”), allowing the group to develop together over time. Using the data in this way can help teams determine which strategies can improve their team functioning. Finally, individual TDM items, usually three or four, are presented to the team as potential items to discuss and to create strategies for improvement. These items are chosen because some members agreed with the item and others disagreed, indicating misaligned perceptions of a team attribute. Importantly, serial measurement of the same team over time allows the team to observe changes in team functioning that demonstrate improvement in the four domains being measured (cohesiveness, communication, roles and goals, and team primacy).
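
The item-selection step can be made explicit with a simple rule: for each item, compute the share of responding team members who agree (response 3 or 4) and pick the items whose share is closest to one half, where perceptions are most split. This is a hypothetical operationalization of “some members agreed and others disagreed”; the exact criterion used with teams is not specified, and the item IDs in the usage example are made up.

```python
from typing import Dict, List, Optional


def split_items(responses: Dict[str, List[Optional[int]]],
                top_n: int = 4) -> List[str]:
    """Return up to top_n item IDs whose agree/disagree split is most even.

    responses: item ID -> team members' 1-4 answers (None = missing)
    """
    scored = []
    for item, answers in responses.items():
        valid = [a for a in answers if a is not None]
        if not valid:
            continue
        agree_share = sum(1 for a in valid if a >= 3) / len(valid)
        # 0.0 means an even split; 0.5 means unanimous agreement or disagreement.
        scored.append((abs(agree_share - 0.5), item))
    scored.sort()  # ties are broken alphabetically by item ID
    return [item for _, item in scored[:top_n]]


if __name__ == "__main__":
    team = {"a": [4, 4, 4, 3], "b": [1, 4, 2, 3],
            "c": [2, 3, 3, 3], "d": [None, 1, 4]}
    print(split_items(team, top_n=2))  # ['b', 'd']
```

A rule based on the agree/disagree split, rather than on average scores, targets exactly the misalignment the team is asked to discuss: an item everyone rates “disagree” signals a shared gap, whereas an evenly split item signals that members perceive the team differently.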

Areas of future research include assessing the impact of highly developed teams on patient outcomes, an area in which surprisingly little research has been performed. Such studies could provide a better understanding of the relationship between the Team Development Measure and clinical outcome measures, team productivity, and effectiveness, and of the impact of interventions that promote the development of team-based care. Further validation of the tool could come from correlating TDM results with direct observation, to identify observable behaviors that are closely associated with the development of team attributes, and vice versa. More inquiry would answer questions about potential differences in perceptions of “teamness” based on professional role, age, gender, race, urban versus rural setting, and hospital versus clinic teams. Some team attribute sub-domains noted in this analysis, such as team primacy and leadership, would benefit from testing of additional items, and development of a shorter instrument would mitigate any item response burden. In conclusion, we developed and tested a measure of team development that has very strong psychometric properties. This tool could be used as an evaluative tool that can be linked to clinical outcomes and as a quality improvement tool to improve team function.

Acknowledgments: PeaceHealth Oregon Region, Eugene, OR, and the Family Medicine Research Program at Oregon Health & Science University supported this research. The John A. Hartford Foundation Geriatrics Interdisciplinary Teams in Practice Initiative provided support for one of the authors (RS).

The authors would like to acknowledge the major contributions made to the development of the instrument by Bill Mahoney, PhD, and Carolyn Turkovich, as well as the clinicians and staff who contributed their time to complete the instrument. We also wish to thank LeNeva Spires, Publications Manager, OHSU Department of Family Medicine, for her editorial assistance.

Corresponding Author: Address correspondence to Dr Stock, Oregon Health & Science University, Department of Family Medicine, Mail Code: FM, 3181 SW Sam Jackson Park Road, Portland, OR 97239-3098. 503-494-6810.

References

  1. Future of Family Medicine Leadership Committee. The Future of Family Medicine: a collaborative project of the family medicine community. Ann Fam Med 2004;2:S3-S32.
  2. Carney PA, Dietrich AJ, Keller A, Landgraf JM, O’Connor GT. Tools, teamwork, and tenacity: an office system for cancer prevention. J Fam Pract 1992;35(4):388-94.
  3. Institute of Medicine. Crossing the quality chasm: a new health system for the 21st century. Washington, DC: National Academies Press, 2001.
  4. David AK. Preparing the Personal Physician for Practice (P4): residency training in family medicine for the future. J Am Board Fam Med 2007;20:332-41.
  5. Naylor M, Coburn K, Kurtzman E, et al. Inter-professional team-based primary care for chronically ill adults: state of the science 2010. Manuscript commissioned for ABIM Foundation and American Academy of Nursing March 24-25, 2010 meeting on advancing team-based care for the chronically ill.
  6. Salas E, Bowers CA, Cannon-Bowers JA. Military team research: 10 years of progress. Military Psychology 1995;7(2):55-75.
  7. Helmreich RL, Merritt AC, Wilhelm JA. The evolution of Crew Resource Management training in commercial aviation. Int J Aviat Psychol 1999;9(1):19-32.
  8. McCarthy A, Garavan TN. Team learning and metacognition: a neglected area of HRD research and practice. Advances in Developing Human Resources 2008;10(4):509-24.
  9. Boult C, Green A, Boult L, et al. Successful models of comprehensive health care for multi-morbid older persons: a review of effects on health and health care. Washington, DC: Institute of Medicine, National Academy of Sciences, 2008. Report commissioned for committee on Retooling for an Aging America: Building the Health Care Workforce.
  10. Baker DP, Salas E, King H, Battles J, Barach P. The role of teamwork in the professional education of physicians. J Qual Pat Safety 2005;31(4):185-202.
  11. Drinka TJK, Clark PG. Health care teamwork: interdisciplinary practice and teaching. Westport, CT: Auburn Press, 2000.
  12. Fried BJ, Topping S, Rundall TG. Groups and teams in health service organizations. In: Shortell S, Kaluzny AD. Health care management: organization and design. Albany, NY: Delmar Thomson Learning Publishing, 2000; Chapter 6:154-90.
  13. Rush Geriatric Interdisciplinary Team Training Project. Principles of successful team work and team competencies, Version 2.0. Rush University, June 1999.
  14. Valentine MA, Nembhard IM, Edmondson AC. Measuring teamwork in health care settings, April 12, 2012. Accessed May 14, 2012.
  15. Battles J, King HB. TeamSTEPPS Teamwork Perceptions Questionnaire (T-TPC) Manual. Washington, DC: American Institute for Research, 2010.
  16. Linacre JM. WINSTEPS® Rasch measurement computer program. 2009. Beaverton, OR.
  17. Massof RW. The measurement of vision disability. Optom Vis Sci 2002;79(8):516-52.
  18. Bond T, Fox C. Applying the Rasch model: fundamental measurement in the human sciences. Mahwah, NJ: Lawrence Erlbaum Associates, 2001.
  19. Muthen LK, Muthen BO. Mplus: the comprehensive modeling program for applied researchers guide. Los Angeles: Muthen & Muthen, 2001.
  20. Furr RM, Bacharach VR. Psychometrics: an introduction. Thousand Oaks, CA: Sage Publishers, 2008.
  21. Smith RM. Introduction to Rasch Measurement: theory, models and applications. Maple Grove, MN: JAM Press, 2004.
  22. Robert Graham Center for Policy Studies in Family Medicine and Primary Care. The patient-centered medical home: history, seven core features, evidence, and transformational change. November 2007.
  23. Linacre JM. Likert or Rasch? Rasch Measurement Transactions 1994;8(2):35.

From the Department of Family Medicine (Drs Stock and Carney) and the Department of Public Health and Preventive Medicine (Dr Carney), Oregon Health & Science University; and Bellingham, WA (Dr Mahoney).

Copyright 2017 by Society of Teachers of Family Medicine