Ⅰ. Introduction
1. Background
The term "health literacy" was introduced in the 1970s[1], and its importance in public health and healthcare has been increasing. It relates to an individual's ability to navigate the complexities of health in modern societies[2]. Health literacy involves understanding factors that affect individual, family, and community health problems and finding solutions. An individual with appropriate health literacy can take responsibility not only for their own health but also for their family's and community's[3]. Thus, evaluating health literacy levels is crucial before implementing health programs at individual and community levels. In the United States and other developed countries, health literacy measurements inform health policy[4]. The Korean government has recognized health literacy as a strategic component in the 5th National Health Plan 2030[5], prompting the development of measurement tools.
A 15-question Korean Functional Health Literacy Test (KFHLT) was developed, drawing from the Test of Functional Health Literacy in Adults (TOFHLA) and the U.S. Department of Education's study on Health Literacy of America's Adults. Its reliability was assessed through interviews with 103 senior citizens in Daegu, Kyungpook, and Busan Province[6]. Earlier, the Korean Health Literacy Scale (KHLS) was developed, and its reliability and validity were evaluated among the elderly[7]. Kang and Lee created the Korean Health Literacy Instrument (KHLI) with 18 items across three categories of health literacy. Its reliability and validity were tested on 315 adults aged 40-64 years[8]. Kim et al. developed the Korean Health Literacy Assessment Tool (KHLAT), a translation of the REALM that assesses the comprehension of 66 words, and evaluated its reliability[9]. Chun et al.[10] interviewed 725 individuals aged 60-79 years in Seoul, assessing the reliability and validity of the Elderly Health and Functional Assessment, a tool based on a clinical screening tool for American White male outpatients[11].
Recent multidimensional studies on health literacy have utilized the HLS-EU-Q. Kim et al.[12] translated and employed 47 items from HLS-EU-Q47. Another study by Kim et al.[13] translated HLS-EU-Q47, conducted confirmatory factor analysis, and included 39 items with adequate model fit in their measurement tool. Several short versions of HLS-EU-Q have been developed, including the HLS-EU-Q16, a 16-item version used in a recent study on the elderly[14]. HLS-EU-Q16 covers all four domains of accessing, understanding, appraising, and applying health information and includes items on healthcare, disease prevention, and health promotion[14]. However, because existing tools assess a limited number of health literacy domains and have undergone only limited reliability and validity evaluation, there is a growing need for reliable and valid tools that are universally applicable to the Korean population. Such tools are essential for the proper application of health literacy measurement in Korea. Therefore, this study aimed to review health literacy measurement tools and assess their reliability and validity through a systematic literature review.
2. Objectives
The purpose of this study was to conduct a systematic review of health literacy measurement tools, aiming to provide evidence-based data for the effective development of tools that assess the health literacy levels of the general population.
Ⅱ. Methods
1. Study design
This study is a systematic review that applied the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) checklist, following previous studies[15][16][17]. The COSMIN checklist, a consensus-driven tool for systematic reviews, is specifically designed to appraise the methodological quality of studies on measurement properties[18]. We adhered to the PRISMA guidelines and report our review results according to their recommended protocol[19].
2. Information sources
All literature was searched through the “Health Literacy Tool Shed” (https://healthliteracy.bu.edu), where measurement tools are registered.
3. Search strategy
This study was conducted using measurement tool-related literature registered in the “Health Literacy Tool Shed” (https://healthliteracy.bu.edu). The Health Literacy Tool Shed is a comprehensive, searchable database of validated health literacy tools. The tools it lists are described in peer-reviewed articles that report validation procedures including at least 100 participants. Over the past decade, the collection of health literacy evaluation tools has expanded, with over 100 tools currently accessible on this site. The Tool Shed is updated quarterly with the validation data available from existing research. The database was searched from September 1 to October 10, 2021, and a total of 216 measurement tools published from 1961 to 2021 were identified. As a search filter, “general,” “general health promotion,” or “health promotion” was used in the tool range to select the measurement tools relevant to this study. Considering representativeness, the validation study samples were restricted to “over 300 people” and “over 18 years of age.” The languages of the measurement tools were limited to English and Korean, and 38 tools were finally selected.
4. Selection criteria
We included studies that used a study design which described and evaluated all measurement properties of health measurement tools, including reliability, validity, responsiveness, and interpretability. We focused on health measurement tools for the general population, and the 38 selected studies were independently reviewed by 2 researchers. Finally, a total of 10 studies were included in the analysis, excluding 1 study without an original article, 16 studies in which tools were applied to patients, 6 studies using existing tools, and 5 studies not related to health literacy. All 10 studies were published in English <Figure 1>.
5. Data extraction
Basic information about the characteristics of the included studies (population, study type, location, year of publication, and content of each tool) and documented information about 10 measurement properties were extracted based on the COSMIN checklist manual[20]. The 10 measurement properties comprised 3 properties of reliability (internal consistency, test-retest reliability, and measurement error), 5 properties of validity (content validity, structural validity, hypothesis testing, cross-cultural validity, and criterion validity), responsiveness, and interpretability.
6. Measurement properties
Three properties of reliability (internal consistency, test-retest reliability, and measurement error) were included in the measurement properties. Reliability is the degree to which a measurement is free of measurement error. Internal consistency is the degree of interrelatedness among the items. Test-retest reliability is the degree to which an assessment is consistent between a test and a retest performed on the same subject. Measurement error is the systematic and random error in a respondent's score that is not attributed to true changes in the construct to be measured.
Five properties of validity (content validity, structural validity, hypothesis testing, cross-cultural validity, and criterion validity) were included in the measurement properties. Validity is the extent to which a tool measures the construct it is intended to measure. Content validity is the extent to which the content of a tool adequately reflects the construct being measured. Structural validity is the degree to which the scores of an instrument are an adequate reflection of the dimensionality of the construct to be measured. Hypothesis testing examines whether the scores of an instrument are consistent with predefined hypotheses, such as expected correlations with other instruments or expected differences between groups. Cross-cultural validity refers to the degree to which the performance of items on a translated or culturally adapted instrument is an adequate reflection of the performance of items of the original version of the instrument. Criterion validity refers to the degree to which the scores of an instrument are an adequate reflection of a 'gold standard'.
Responsiveness and interpretability were also included in the measurement properties. Responsiveness refers to the ability of an instrument to detect change over time in the construct to be measured. Interpretability is the degree to which one can assign qualitative meaning (that is, clinical or commonly understood connotations) to an instrument's quantitative scores or change in scores.
The author, publication year, participant group and sample size, study design, evaluation purpose of the measurement tools, tool measurement method, subdomain, number of items, and scoring method were checked in the selected studies. Subsequently, the measurement properties of the tools used in each study were evaluated. Each measurement property was rated as sufficient (+), insufficient (-), or indeterminate (?) according to the “2018 COSMIN risk of bias checklist”[16][17]. Finally, the methodological quality of the reliability and validity evaluations in each study was rated on a 4-point scale[20]. For each property, 7 to 18 standards were assessed, and each item was rated as “excellent,” “good,” “fair,” or “poor.” The design requirements of all measurement properties included, as common evaluation items, whether the percentage of missing items was reported for each item and whether the handling of missing items was described.
All design requirements for the measurement properties, except for content validity and interpretability, assessed whether a sample size suitable for analysis was included. After rating the methodological quality for all measurement properties, the overall quality of each study was rated as “excellent,” “good,” “fair,” or “poor” in accordance with the “worst score counts” principle. Two researchers independently evaluated the measurement properties to ensure consistency. The final evaluation results were determined by comparing their findings across 3 research meetings.
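The “worst score counts” principle described above can be sketched in a few lines of Python. This is a minimal illustration, not the COSMIN implementation: the function name `worst_score_counts` and the example ratings are hypothetical, and the only rule taken from the text is that the lowest rating among the assessed standards determines the overall rating.

```python
# Ratings ordered from best to worst, matching the 4-point scale used in this review.
RATING_ORDER = ["excellent", "good", "fair", "poor"]


def worst_score_counts(item_ratings):
    """Return the lowest (worst) rating among the standards of one measurement property."""
    if not item_ratings:
        raise ValueError("at least one rated standard is required")
    # A higher index in RATING_ORDER means a worse rating, so max() picks the worst one.
    return max(item_ratings, key=RATING_ORDER.index)


# Hypothetical example: sample size and statistics are "excellent",
# but the handling of missing items is not described ("good").
example_ratings = ["excellent", "excellent", "good"]
print(worst_score_counts(example_ratings))  # good
```

Under this rule a single weak standard, such as an unreported percentage of missing items, caps the overall rating of an otherwise well-designed evaluation.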
Ⅲ. Results
1. Study characteristics
Two of the researchers screened 216 papers and found that 10 studies met the inclusion criteria <Figure 1>. The sample size ranged from 315 to 10,024 people, and the number of items per measurement tool ranged from 5 to 44. A detailed description of the measurement tools is provided in <Table 1>. Of the 10 studies, 2 were measurement tools for Koreans. The KHLI comprised 18 items in 3 domains, and reliability and validity were assessed in 315 people aged 40–64 years. The KHLS comprised 24 items in 3 domains, and reliability and validity were tested on 411 people aged 65–74 years. Among the 12 assessment domains of health literacy covered by the measurement tools (application/function, appraisal, communication (listener, speaker), comprehension, conceptual knowledge, information seeking (document, interactive media navigation), media literacy, numeracy, and prose (comprehension, pronunciation)), there was 1 study with 10 domains, 1 with 9 domains, 1 with 8 domains, 1 with 6 domains, 2 with 4 domains, 3 with 3 domains, and 1 with 1 domain. In terms of domains, “application/function” and “information seeking (document)” were included in 8 studies, “media literacy” was included in 1 study, and no measurement tool included “prose (pronunciation).” Reliability and validity were evaluated for all 10 studies <Table 2> and <Table 3>; 6 studies applied Classical Test Theory (CTT) and 4 studies evaluated measurement tools by applying Item Response Theory (IRT)[7][8].
<Table 1>
Author (year) | Instrument | Setting | Domains assessed | Validation sample population age | Modes of administration in validation study | Number of items | Sample in validation study | Language of validated version | Assessment
---|---|---|---|---|---|---|---|---|---
Pleasant A. et al. (2018) | Calgary Charter on Health Literacy Scale | United States of America | Prose: Comprehension, Information seeking: Interactive media navigation, Information seeking: Document, Conceptual Knowledge, Comprehension, Communication: Speaker, Communication: Listener, Application/function | Older Adults: 65+ years, Adults: 18 to 64 years | Paper and pencil, Face-to-face | 5 | 633 | English | Self-reported
Kayser L. et al. (2018) | eHealth Literacy Questionnaire (eHLQ) | Denmark, Taiwan, Germany, France, Norway, Australia | Information seeking: Interactive media navigation, Information seeking: Document, Conceptual Knowledge, Comprehension, Appraisal, Application/function | 16-74 years | Phone-based, Paper and pencil, Mailed survey, Face-to-face, Computer-based | 35 | 475 | English | Self-reported
Chung, S.Y. et al. (2015) | eHealth Literacy Scale - Older Adults - eHEALS | United States of America | Information seeking: Interactive media navigation, Application/function | Older adults: 62.8 ± 8.5 years | Computer-based | 8 | 866 | English | Self-reported
Rouquette A. et al. (2018) | European Health Literacy Survey - HLS-EU-Q6 | France | Comprehension, Communication: Speaker, Communication: Listener | Adults: 18 to 64 years (53 ± 18 years) | Paper and pencil, Face-to-face | 6 | 317 | English | Self-reported
Abel, T. et al. (2014) | Health Literacy Assessment Tool - HLAT-8 | Switzerland, China | Information seeking: Interactive media navigation, Information seeking: Document, Application/function | 18-25 years | Paper and pencil, Mailed survey | 8 | 8,349 | English | Self-reported
Tavousi, M. et al. (2020) | Health Literacy Instrument for Adults - HELIA | Iran | Information seeking: Document, Comprehension, Appraisal, Application/function | 18-65 years | Paper and pencil | 33 | 323 | English | Self-reported
Osborne, R.H. et al. (2013) | Health Literacy Questionnaire (HLQ) | Mali, Slovakia, South Korea, Egypt, Ghana, Reunion, Portugal, Denmark, The Netherlands, Norway, Germany, Nepal, Brunei Darussalam | Prose: Comprehension, Numeracy, Information seeking: Interactive media navigation, Information seeking: Document, Conceptual Knowledge, Comprehension, Communication: Speaker, Appraisal, Application/function | Older Adults: 65+ years, Adults: 18 to 64 years | Phone-based, Paper and pencil, Mailed survey, Face-to-face, Computer-based | 44 | 1,039 | English | Self-reported
Kang, S.J. et al. (2014) | Korean Health Literacy Instrument - KHLI | South Korea | Prose: Comprehension, Numeracy, Information seeking: Document | 40-64 years old | Face-to-face | 18 | 315 | Korean |
Lee, T.W. et al. (2009) | Korean Health Literacy Scale - KHLS | South Korea | Prose: Comprehension, Numeracy, Information seeking: Document, Application/function | Older Adults: 65+ years | Paper and pencil | 24 | 411 | Korean |
Duong, T.V. et al. (2019) | Short Form Health Literacy Questionnaire for Asian populations - HLS-SF12 | Malaysia, Kazakhstan, Taiwan, Myanmar, Vietnam, Indonesia | Prose: Comprehension, Media Literacy, Information seeking: Interactive media navigation, Information seeking: Document, Conceptual Knowledge, Comprehension, Communication: Speaker, Communication: Listener, Appraisal, Application/function | 15 years and older | Phone-based, Paper and pencil, Mailed survey, Face-to-face, Computer-based | 12 | 10,024 | English |
<Table 2>
Author (year) | Instrument | Application/function | Appraisal | Communication: Listener | Communication: Speaker | Comprehension | Conceptual Knowledge | Information seeking: Document | Information seeking: Interactive media navigation | Media Literacy | Numeracy | Prose: Comprehension | Prose: Pronunciation
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Pleasant A. et al. (2018) | Calgary Charter on Health Literacy Scale | Y | Y | Y | Y | Y | Y | Y | Y | ||||
Kayser L. et al. (2018) | eHealth Literacy Questionnaire (eHLQ) | Y | Y | Y | Y | Y | Y | ||||||
Chung, S.Y. et al. (2015) | eHealth Literacy Scale - Older Adults - eHEALS | Y | |||||||||||
Rouquette A. et al. (2018) | European Health Literacy Survey - HLS-EU-Q6 | Y | Y | Y | |||||||||
Abel, T. et al. (2014) | Health Literacy Assessment Tool - HLAT-8 | Y | Y | Y | |||||||||
Tavousi, M. et al. (2020) | Health Literacy Instrument for Adults - HELIA | Y | Y | Y | |||||||||
Osborne, R.H. et al. (2013) | Health Literacy Questionnaire (HLQ) | Y | Y | Y | Y | ||||||||
Kang, S.J. et al. (2014) | Korean Health Literacy Instrument - KHLI | Y | Y | Y | Y | ||||||||
Lee, T.W. et al. (2009) | Korean Health Literacy Scale - KHLS | Y | |||||||||||
Duong, T.V. et al. (2019) | Short Form Health Literacy Questionnaire for Asian populations - HLS-SF12 | Y | Y | Y | Y | Y | Y | Y | Y |
* Y: Assessed
<Table 3>
Author (year) | Instrument | Reliability: Internal consistency | Reliability: Reliability (test-retest) | Reliability: Measurement error | Validity: Content validity | Validity: Structural validity | Validity: Hypothesis testing | Validity: Cross-cultural validity | Validity: Criterion validity | Responsiveness
---|---|---|---|---|---|---|---|---|---|---
Pleasant A. et al. (2018) | Calgary Charter on Health Literacy Scale | + | - | ? | ? | + | ? | ? | ? | ? | |
Kayser L. et al. (2018) | eHealth Literacy Questionnaire (eHLQ) | + | ? | ? | + | + | ? | + | ? | ? | |
Chung, S.Y. et al. (2015) | eHealth Literacy Scale - Older Adults - eHEALS | + | ? | ? | + | + | - | ? | ? | ? | |
Rouquette A. et al. (2018) | European Health Literacy Survey - HLS-EU-Q6 | + | ? | ? | + | + | + | ? | + | ? | |
Abel, T. et al. (2014) | Health Literacy Assessment Tool - HLAT-8 | - | ? | ? | ? | + | + | ? | ? | ? | |
Tavousi, M. et al. (2020) | Health Literacy Instrument for Adults - HELIA | + | + | ? | + | + | + | ? | ? | ? | |
Osborne, R.H. et al. (2013) | Health Literacy Questionnaire (HLQ) | + | ? | ? | + | - | + | ? | ? | ? | |
Kang, S.J. et al. (2014) | Korean Health Literacy Instrument - KHLI | + | + | ? | + | + | + | ? | ? | ? | |
Lee, T.W. et al. (2009) | Korean Health Literacy Scale - KHLS | + | ? | ? | + | + | + | ? | ? | ? | |
Duong, T.V. et al. (2019) | Short Form Health Literacy Questionnaire for Asian populations - HLS-SF12 | + | ? | ? | ? | + | + | + | + | ? |
* +: sufficient, -: insufficient, ?: indeterminate
2. Internal consistency
The overall quality of the internal consistency evaluation was good. All 10 studies (100.0%) reported internal consistency. In 1 of them (10.0%), however, Cronbach's alpha was 0.65, which fell short of the standard of 0.7 <Table 3>. Sample size and statistics for the evaluation of internal consistency were rated as “Excellent.” Regarding the overall quality, 1 study (10.0%) was rated as “Excellent,” and 9 studies (90.0%) were rated as “Good,” because the latter either did not report the percentage of missing items or lacked a description of how missing items were handled <Table 4>.
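For readers unfamiliar with the statistic, the following is a minimal sketch of how Cronbach's alpha can be computed from item-level responses and compared with the 0.7 standard mentioned above. The function names and the response data are hypothetical and are not taken from any of the reviewed studies; treating a value exactly equal to the cutoff as sufficient is also an assumption of this sketch.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    if any(len(row) != k for row in scores):
        raise ValueError("every respondent must answer the same number of items")
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(column) for column in zip(*scores))
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def rate_internal_consistency(alpha, threshold=0.70):
    """Return "+" when alpha meets the threshold, otherwise "-" (boundary handling assumed)."""
    return "+" if alpha >= threshold else "-"


# Hypothetical responses: 4 respondents x 3 items.
hypothetical_scores = [[3, 4, 3], [2, 2, 3], [4, 4, 4], [1, 2, 1]]
print(round(cronbach_alpha(hypothetical_scores), 2))

# The alpha of 0.65 reported by one study falls below the 0.7 standard.
print(rate_internal_consistency(0.65))  # -
```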
<Table 4>
Measurement property | Excellent, n (%) | Good, n (%) | Fair, n (%) | Poor, n (%)
---|---|---|---|---
Internal consistency | 1 (10.0) | 9 (90.0) | 0 (0.0) | 0 (0.0)
Reliability | 0 (0.0) | 3 (30.0) | 0 (0.0) | 7 (70.0)
Measurement error | 0 (0.0) | 0 (0.0) | 0 (0.0) | 10 (100.0)
Content validity | 7 (70.0) | 0 (0.0) | 2 (20.0) | 1 (10.0)
Structural validity | 0 (0.0) | 10 (100.0) | 0 (0.0) | 0 (0.0)
Hypothesis testing | 0 (0.0) | 2 (20.0) | 0 (0.0) | 8 (80.0)
Cross-cultural validity | 0 (0.0) | 1 (10.0) | 0 (0.0) | 9 (90.0)
Criterion validity | 1 (10.0) | 1 (10.0) | 0 (0.0) | 8 (80.0)
Responsiveness | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0)
3. Reliability
Test-retest was conducted in 4 studies. Among them, 1 study presented statistics (ICC/weighted Kappa <=0.7 OR Pearson's r <=0.8) that were below the standard, so it was rated as “-,” and 1 study did not present appropriate statistics <Table 3>. In the overall quality evaluation for reliability, no studies were rated as “Excellent,” 3 studies were rated as “Good,” and 1 study was rated as “Poor” <Table 4>. The study was rated as “Poor” because it did not provide the intraclass correlation coefficient (ICC), even though its measurement tool produced continuous scores. In all 4 studies that conducted test-retest, the questionnaire administrations were independent when collecting data (“Were the administrations independent?”), and the interval between the 2 tests was stated (“Was the time interval stated?”).
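As an illustration of the statistic that was missing in one study, the sketch below computes an intraclass correlation coefficient from test and retest scores. The choice of ICC(2,1) (two-way random effects, absolute agreement, single measurement) is an assumption, since the reviewed studies and the text above do not specify an ICC form; the function name and the data are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) for a list of respondents, each a list of scores per administration.

    Example row for a test-retest design: [test_score, retest_score].
    Raises ZeroDivisionError if all scores are identical (no variance to partition).
    """
    n = len(scores)      # number of respondents
    k = len(scores[0])   # number of administrations
    if n < 2 or k < 2:
        raise ValueError("need at least 2 respondents and 2 administrations")
    if any(len(row) != k for row in scores):
        raise ValueError("every respondent needs a score for every administration")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: respondents, administrations, residual.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical test-retest total scores for 5 respondents.
hypothetical_pairs = [[20, 22], [35, 33], [28, 29], [40, 41], [15, 18]]
print(round(icc_2_1(hypothetical_pairs), 3))
```

An absolute-agreement form penalizes a systematic shift between the two administrations, which a Pearson correlation would not detect.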
4. Measurement error
Measurement error was not reported in any of the 10 studies <Table 3>, and the overall quality of the measurement error evaluation was rated as “Poor” in all studies <Table 4>. In other words, none of the studies stated the standard error of measurement (SEM) used in Classical Test Theory or the limits of agreement (LoA).
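The two statistics named above can be sketched as follows. The formulas used here, SEM = SD × sqrt(1 − reliability) and Bland-Altman limits of agreement (mean difference ± 1.96 × SD of the differences), are common textbook forms and are assumed rather than taken from the reviewed studies; the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error_of_measurement(score_sd, reliability):
    """SEM from the score standard deviation and a reliability coefficient (e.g. ICC)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * sqrt(1.0 - reliability)


def limits_of_agreement(test_scores, retest_scores):
    """Bland-Altman 95% limits of agreement for paired test-retest scores."""
    if len(test_scores) != len(retest_scores) or len(test_scores) < 2:
        raise ValueError("need at least 2 paired scores")
    differences = [b - a for a, b in zip(test_scores, retest_scores)]
    mean_difference = mean(differences)
    sd_difference = stdev(differences)
    return (mean_difference - 1.96 * sd_difference,
            mean_difference + 1.96 * sd_difference)


# Hypothetical values: score SD of 10 points and test-retest reliability of 0.75.
print(standard_error_of_measurement(10.0, 0.75))  # 5.0

# Hypothetical paired scores for 5 respondents.
lower, upper = limits_of_agreement([20, 35, 28, 40, 15], [22, 33, 29, 41, 18])
print(round(lower, 2), round(upper, 2))
```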
5. Content validity
Content validity was evaluated for all 10 studies (100.0%). Seven were rated as “+,” since they met all the criteria, such as “whether a clear description is provided of the measurement aim,” the “target population,” the “concepts that are being measured,” the “item selection AND target population,” and “whether investigators OR experts were involved in item selection.” The remaining 3 were rated as “?” <Table 3> for reasons such as “A clear description of the above-mentioned aspects is lacking OR only target population involved OR doubtful design or method.” In the overall quality assessment, the item “Was there an assessment of whether all items together comprehensively reflect the construct to be measured?” was assessed in consideration of 3 aspects (content coverage of items, description of domains, and theoretical foundation); 7 studies (70.0%) were rated as “Excellent,” 2 (20.0%) as “Fair,” and 1 (10.0%) as “Poor” <Table 4>.
6. Structural validity
Structural validity was assessed in all 10 studies; 6 applied only CTT and 2 applied only IRT, whereas the other 2 applied both CTT and IRT to check structural validity. In the 8 studies in which CTT was applied, exploratory or confirmatory factor analysis was performed to verify the validity of the measurement tools. Consequently, 8 studies satisfying the criterion that “Factors should explain at least 50% of the variance” were rated as “+” (sufficient). IRT tests were performed to determine the (uni)dimensionality of the items in only 2 of the 4 studies in which IRT was applied. One study reported an RMSEA of less than 0.06 and was rated as “+” (sufficient), and the other reported CFI = 0.936, TLI = 0.930, and RMSEA = 0.076 <Table 3> and was rated as “-.”
As a result of the overall quality assessment, all 10 studies were rated as “Good.” All studies had sufficient sample sizes required for structural validity, and either exploratory or confirmatory factor analysis was properly utilized. However, none of the studies reported the percentage of missing items or how they were handled. Therefore, they were evaluated as “Good” <Table 4>.
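The rating logic for structural validity described in this section can be sketched as below. The 50% explained-variance criterion and the RMSEA cutoff of 0.06 come from the text; the CFI/TLI cutoff of 0.95 is not stated in this paper and is included as an assumption, as are the function names and the rule that any one satisfied fit index is enough for a “+” rating.

```python
def rate_explained_variance(explained_variance_pct):
    """Rate a factor analysis result: factors should explain at least 50% of the variance."""
    return "+" if explained_variance_pct >= 50.0 else "-"


def rate_model_fit(cfi=None, tli=None, rmsea=None,
                   cfi_tli_cutoff=0.95, rmsea_cutoff=0.06):
    """Rate model fit as "+", "-", or "?" (no fit index reported).

    The CFI/TLI cutoff is an assumed value; the RMSEA cutoff follows the text.
    """
    if cfi is None and tli is None and rmsea is None:
        return "?"
    cfi_ok = cfi is not None and cfi > cfi_tli_cutoff
    tli_ok = tli is not None and tli > cfi_tli_cutoff
    rmsea_ok = rmsea is not None and rmsea < rmsea_cutoff
    return "+" if (cfi_ok or tli_ok or rmsea_ok) else "-"


# Fit indices reported by the study rated as insufficient in this review.
print(rate_model_fit(cfi=0.936, tli=0.930, rmsea=0.076))  # -

# Hypothetical RMSEA below 0.06, as described for the study rated sufficient.
print(rate_model_fit(rmsea=0.05))  # +

# Hypothetical factor solution explaining 62% of the variance.
print(rate_explained_variance(62.0))  # +
```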
7. Hypothesis testing
Hypothesis testing was performed in 2 studies, and the results were partially or generally consistent with the hypotheses <Table 3>. In the quality assessment, none of the studies were rated as “Excellent,” 2 (20.0%) were rated as “Good,” and 8 (80.0%) were rated as “Poor” <Table 4>. Five studies provided evidence of the expected correlation for hypothesis testing or the absolute and relative magnitudes of the mean comparison. Descriptions of comparator instruments for assessing convergent validity were provided in only 2 studies.
8. Cross-cultural validity
Two studies (20.0%) were rated as “+” (sufficient). In 1 study, the DIF analysis by age and gender found no difference in scores between groups; therefore, it was rated as “+” (sufficient). In the other study, there were no differences across the 6 countries, and accordingly it was rated as “+” (sufficient) <Table 3>. In the overall quality evaluation, 1 study (10.0%) was rated as “Good” and the other as “Poor” <Table 4>. The main reason for the “Poor” rating was that neither forward and backward translation nor a pre-test to verify the cultural relevance of the translation and its ease of comprehension was conducted.
9. Criterion validity
Criterion validity was verified in 2 studies (20.0%). Both studies showed correlations above 0.7 with the gold standard <Table 3> and were rated as “+” (sufficient). Of the 2 studies that assessed criterion validity, one was rated as “Excellent” and the other as “Good” <Table 4>. The latter was rated as “Good” because neither the percentage of missing items nor a description of how missing items were handled was provided.
10. Responsiveness
Responsiveness refers to the ability of a measurement tool to detect changes over time in the construct being measured. None of the studies assessed responsiveness. In an experimental or intervention study, it is important to evaluate responsiveness to properly measure changes in treatment outcomes. However, the tools included in this review are not used in clinical practice but to measure the level of health literacy in the public health field. Therefore, responsiveness of the measurement tools was not assessed.
11. Interpretability
Seven studies (70.0%) reported the percentage of missing items for each item in the measurement tools. One study (10.0%) reported floor and ceiling effects, and 1 (10.0%) reported the overall score and subscale score distributions for the measurement tools. Four studies (40.0%) reported the score and score change for each subgroup of the measurement tools. None of the studies reported a minimal important change (MIC) or minimal important difference (MID).
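Floor and ceiling effects, which only 1 study reported, are simple to compute from total scores. The sketch below shows one way to do so; the function names and data are hypothetical, and the 30% cutoff is the standard cited in the discussion of this paper, used here as a default parameter rather than as a universal rule.

```python
def floor_ceiling_effects(total_scores, min_score, max_score):
    """Return (floor %, ceiling %): share of respondents at the lowest/highest possible score."""
    n = len(total_scores)
    if n == 0:
        raise ValueError("at least one score is required")
    floor_pct = 100.0 * sum(s == min_score for s in total_scores) / n
    ceiling_pct = 100.0 * sum(s == max_score for s in total_scores) / n
    return floor_pct, ceiling_pct


def exceeds_cutoff(effect_pct, cutoff_pct=30.0):
    """True when a floor or ceiling effect is larger than the chosen cutoff."""
    return effect_pct > cutoff_pct


# Hypothetical total scores on a tool scored from 0 to 10.
hypothetical_totals = [0, 0, 5, 10]
floor_pct, ceiling_pct = floor_ceiling_effects(hypothetical_totals, 0, 10)
print(floor_pct, ceiling_pct)  # 50.0 25.0
print(exceeds_cutoff(floor_pct), exceeds_cutoff(ceiling_pct))  # True False
```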
Ⅳ. Discussion
Ten studies containing measurement tools assessing population health literacy were included in this systematic review. The COSMIN checklist was used to assess the methodological quality of individual studies on measurement properties, and the overall level of evidence of the measurement properties was evaluated. In previous studies, several reviews on oral and dental health literacy[21], mental health[15], and sleep quality[22] have been performed using the COSMIN checklist. This is the first review to systematically appraise the quality of the measurement properties of health literacy tools in the general population.
Four of the 10 studies selected in this review applied IRT, a theory of testing based on the relationship between an individual's performance on a test item and the test-taker's level of performance[23]. Recently, IRT has been actively applied in the public health field[7][8][24]. Therefore, to measure health literacy in populations, it is necessary to apply IRT, which considers the relationship between the demographic characteristics of the participant and the item responses of measurement tools. Most studies did not report the frequency and percentage of missing items or how missing items were handled. Since this is likely to affect the overall quality evaluation, it should be considered an important aspect when developing health literacy measurement tools in the future.
After evaluating the reliability of the studies selected for this review, internal consistency was found to be the best-rated measurement property. Internal consistency was verified in all studies, and the overall quality was good. However, test-retest was conducted in only 4 studies, and 1 of them did not present an intraclass correlation coefficient (ICC). Measurement error was not reported in any of the studies. Therefore, when developing health measurement tools in the future, it is necessary to go beyond internal consistency and to calculate the ICC and measurement error through test-retest.
Validity includes content validity, the degree to which the content of measurement tools reflects the concept to be measured; structural validity, the degree to which the score of measurement tools reflects the dimensionality of the concept to be measured; hypothesis testing, which examines whether the results match predictions derived from previous studies; cross-cultural validity, the degree to which translated measurement tool items reflect the meaning of the original measurement tool items; and criterion validity, the degree to which the score of measurement tools reflects the gold standard. In the validity evaluation of the health literacy measurement tools, content validity and structural validity were the best-rated measurement properties. Content validity was evaluated in all the studies, and the overall quality was good. Likewise, structural validity was evaluated in all studies through exploratory and confirmatory factor analyses, and the overall quality was good. Hypothesis testing, cross-cultural validity, and criterion validity were each assessed in only 2 studies, which should be considered important when developing health literacy measurement tools in the future. As for hypothesis testing, no studies were rated as “Excellent.” Although hypothesis testing was stated in the 2 studies, there was little description of whether the expected correlation or the absolute and relative magnitude of the mean comparison was shown. In the research planning stage, the hypotheses to be tested should be clearly specified, and the statistics, effect size, and significance level used to interpret the results should be presented so that the quality of hypothesis testing can be improved.
No studies were rated as “Excellent” in the cross-cultural validity evaluation. When developing health literacy measurement tools in the future, it is desirable to have bilingual experts carry out mutually independent forward and backward translations and then to conduct a pre-test in a group similar to the target group of the main study. Health literacy can vary depending on demographic characteristics such as age, gender, and education level. Therefore, sampling that takes demographic characteristics into account is required when planning a validation study, and the DIF for each demographic characteristic should be analyzed.
Next, criterion validity, the degree to which the scores of measurement tools reflect the gold standard, was verified in 2 studies, and 1 study was rated as “Excellent.” According to the COSMIN checklist, the original long version can be considered the gold standard[17]. In this review, there was a case in which correlation was presented using health status or similar measurement tools rather than the long version. A measurement tool developed for the first time may not have a gold standard, which can be a limitation of the COSMIN checklist. Only 1 study (10.0%) reported the floor and ceiling effects of the measurement tool; for ease of interpretation, future studies should report the floor and ceiling effects, the minimal important change (MIC) or minimal important difference (MID), and the missing items. It is also necessary to determine whether the floor and ceiling effects are below the reasonable standard of 30%[25]. To apply measurement tools that are appropriate in terms of reliability, validity, responsiveness, and ease of interpretation in the general population, we suggest that researchers conduct a quality evaluation of measurement properties on the basis of the COSMIN checklist when developing measurement tools in the future.
A limitation of the present review is that only health literacy measurement tools registered in a single database were included, and all included studies were published in English.
This study conducted a systematic review of health literacy measurement tools using the COSMIN checklist. Of the 10 health literacy measurement tools, only 1 covered 10 domains, and the remaining tools measured health literacy domains in a limited way. In terms of overall quality, only 1 study was rated as “excellent” for internal consistency, 7 for content validity, and 1 for criterion validity. The tools were therefore regarded as lacking in reliability and validity.
Ⅴ. Conclusion
Health literacy measurement tools that are universally applicable to the Korean population need to be developed, and their reliability and validity must be verified continuously. Additionally, the development or adaptation of these tools should refer to evaluation frameworks such as the COSMIN checklist.