ISSN : 2093-5986(Print)
ISSN : 2288-0666(Online)
The Korean Society of Health Service Management
Vol.17 No.4 pp.31-47
https://doi.org/10.12811/kshsm.2023.17.4.031

A Review of Health Literacy Measurement Tools from the Health Literacy Tool Shed Database

Soojeong Kim1, Min Kyung Kim2, Jill M. Norris3, Kyoung Won Cho4
1Department of Health Administration, College of Bio-Health Convergence, Dongseo University
2Department of Health and Medical Administration, Busan Health University
3Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Colorado, U.S.A
4Department of Healthcare Administration, Kosin University

Abstract

Objectives:

This study conducted a systematic review of generic health literacy measurement tools, aiming to support the development of effective tools for assessing health literacy levels among Koreans with evidence-based data.


Methods:

We analyzed data on measurement tools registered in the "Health Literacy Tool Shed" (https://healthliteracy.bu.edu) from September 1 to October 10, 2021. The database included 216 tools developed between 1961 and 2021, from which 10 were selected for in-depth review. We assessed the methodological quality of these tools using the COSMIN checklist.


Results:

Among the 10 studies reviewed, 4 implemented item response theory. All studies were evaluated for internal consistency in reliability, with the overall quality being deemed excellent. Additionally, both content and construct validity were rated as excellent in terms of quality.


Conclusions:

There is a need for the development of universally applicable health literacy measurement tools for Koreans. The reliability and validity of these tools should be subject to regular verification.



    Ⅰ. Introduction

    1. Background

    The term "health literacy" was introduced in the 1970s[1], and its importance in public health and healthcare has been increasing. It relates to an individual's ability to navigate the complexities of health in modern societies[2]. Health literacy involves understanding factors that affect individual, family, and community health problems and finding solutions. An individual with appropriate health literacy can take responsibility not only for their own health but also for their family's and community's[3]. Thus, evaluating health literacy levels is crucial before implementing health programs at individual and community levels. In the United States and other developed countries, health literacy measurements inform health policy[4]. The Korean government has recognized health literacy as a strategic component in the 5th National Health Plan 2030[5], prompting the development of measurement tools.

    A 15-question Korean Functional Health Literacy Test (KFHLT) was developed, drawing from the Test of Functional Health Literacy in Adults (TOFHLA) and the U.S. Department of Education's study on Health Literacy of America's Adults. Its reliability was assessed through interviews with 103 senior citizens in Daegu, Kyungpook, and Busan Province[6]. Previously, the Korean Health Literacy Scale (KHLS) was developed, and its reliability and validity were evaluated among the elderly[7]. Kang and Lee created the Korean Health Literacy Instrument (KHLI) with 18 items across three categories of health literacy. Its reliability and validity were tested on 315 adults aged 40-64 years[8]. Kim et al. developed the Korean Health Literacy Assessment Tool (KHLAT), a translation of the REALM that assesses the comprehension of 66 words, and evaluated its reliability[9]. Chun et al.[10] interviewed 725 individuals aged 60-79 years in Seoul and assessed reliability and validity using the Elderly Health and Functional Assessment, a tool based on a clinical screening tool for American White male outpatients[11].

    Recent multidimensional studies on health literacy have utilized the HLS-EU-Q. Kim et al.[12] translated and employed 47 items from HLS-EU-Q47. Another study by Kim et al.[13] translated HLS-EU-Q47, conducted confirmatory factor analysis, and included 39 items with adequate model fit in their measurement tool. Several short versions of HLS-EU-Q have been developed, including the HLS-EU-Q16, a 16-item version used in a recent study on the elderly[14]. HLS-EU-Q16 covers all four domains of accessing, understanding, appraising, and applying health information and includes items on healthcare, disease prevention, and health promotion[14]. However, because existing tools cover limited domains of health literacy and their reliability and validity have been evaluated only to a limited extent, there is a growing need for reliable and valid tools universally applicable to the Korean population. Such tools are essential for the proper future application of health literacy measurement in Korea. Therefore, this study aimed to review health literacy measurement tools and assess their reliability and validity through a systematic literature review.

    2. Objectives

    The purpose of this study was to conduct a systematic review of health literacy measurement tools, aiming to provide evidence-based data for the effective development of tools that assess the health literacy levels of the general population.

    Ⅱ. Methods

    1. Study design

    This study presents a systematic review that utilized the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) checklist of previous studies[15][16][17]. The COSMIN checklist, a consensus-driven tool for systematic reviews, is specifically designed to appraise the methodological quality of studies on measurement properties[18]. In this study, we adhere to the PRISMA guidelines, detailing our review results according to their recommended protocol[19].

    2. Information sources

    All literature was searched through the “Health Literacy Tool Shed” (https://healthliteracy.bu.edu), where measurement tools are registered.

    3. Search strategy

    This study was conducted using measurement tool-related literature registered in the “Health Literacy Tool Shed” (https://healthliteracy.bu.edu). The Health Literacy Tool Shed is a comprehensive, searchable database housing an array of health literacy tools that have been rigorously validated for trustworthiness and accuracy. These tools are described in peer-reviewed articles that report validation procedures including at least 100 participants. Over the past decade, the collection of health literacy evaluation tools has expanded, with over 100 tools currently accessible on this site. The Tool Shed is updated quarterly and provides commonly reported validation data from existing research. A search of the database from September 1 to October 10, 2021 identified a total of 216 measurement tools published from 1961 to 2021. As a search filter, “general,” “general health promotion,” or “health promotion” was used in the tool range to select the measurement tools relevant to this study. For representativeness, the validation study samples were restricted to “over 300 people” and “over 18 years of age.” The languages of the measurement tools were limited to English and Korean, and finally 38 tools were selected.
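
    The search filter described above can be sketched as a simple predicate. This is only an illustration: the record layout (`scope`, `sample_size`, `includes_adults`, `languages`) and the three example tools are hypothetical, since the study does not describe the Tool Shed's export format or field names.

```python
from dataclasses import dataclass

# Hypothetical record layout; the Tool Shed's real field names are not given in the study.
@dataclass(frozen=True)
class Tool:
    name: str
    scope: str                 # tool range, e.g. "general" or a disease name
    sample_size: int           # participants in the validation study
    includes_adults: bool      # validation sample covers people over 18 years of age
    languages: frozenset       # languages of the validated versions

ALLOWED_SCOPES = {"general", "general health promotion", "health promotion"}
ALLOWED_LANGUAGES = {"English", "Korean"}

def matches_filter(tool: Tool) -> bool:
    """Apply the four criteria reported in the search strategy."""
    return (
        tool.scope.lower() in ALLOWED_SCOPES
        and tool.sample_size > 300          # "over 300 people"
        and tool.includes_adults            # "over 18 years of age"
        and bool(tool.languages & ALLOWED_LANGUAGES)
    )

# Made-up records, used only to exercise the predicate.
tools = [
    Tool("Tool A", "General", 315, True, frozenset({"Korean"})),
    Tool("Tool B", "Diabetes", 900, True, frozenset({"English"})),
    Tool("Tool C", "Health promotion", 120, True, frozenset({"English"})),
]
selected = [t.name for t in tools if matches_filter(t)]
print(selected)
```

    Only "Tool A" passes here: "Tool B" fails the tool-range criterion and "Tool C" fails the sample-size criterion.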

    4. Selection criteria

    We included studies that used a study design which described and evaluated all measurement properties of health measurement tools, including reliability, validity, responsiveness, and interpretability. We focused on health measurement tools for the general population, and the 38 selected studies were independently reviewed by 2 researchers. Finally, a total of 10 studies were included in the analysis, excluding 1 study without an original article, 16 studies in which tools were applied to patients, 6 studies using existing tools, and 5 studies not related to health literacy. All 10 studies were published in English <Figure 1>.
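
    The selection arithmetic can be checked with a short tally. The counts are those reported in the paragraph above; the dictionary keys are shorthand labels chosen for this sketch.

```python
# Counts taken from the selection criteria paragraph; labels are shorthand.
identified = 38
exclusions = {
    "no original article": 1,
    "tool applied to patients": 16,
    "used an existing tool": 6,
    "not related to health literacy": 5,
}

included = identified - sum(exclusions.values())
print(f"excluded={sum(exclusions.values())}, included={included}")
```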

    <Figure 1>

    Flow chart of study selection

    5. Data extraction

    Basic information about the characteristics of the included studies, such as population, study type, location, year of publication, and content of each tool, and documented information about 10 measurement properties were extracted based on the COSMIN checklist manual[20]. The 10 measurement properties included 3 properties of reliability (internal consistency, test-retest reliability, and measurement error), 5 properties of validity (content validity, structural validity, hypothesis testing, cross-cultural validity, and criterion validity), responsiveness, and interpretability.

    6. Measurement properties

    Three properties of reliability (internal consistency, test-retest reliability, and measurement error) were included in the measurement properties. Reliability is the degree to which a measurement is free of measurement error. Internal consistency is the degree of correlation between items. Test-retest reliability is the degree to which an assessment is consistent between a test and a retest performed on the same subject. Measurement error is the systematic and random error of a score that is not attributed to true changes in the construct to be measured.

    Five properties of validity (content validity, structural validity, hypothesis testing, cross-cultural validity, and criterion validity) were included in the measurement properties. Validity is the extent to which a tool measures the construct it is intended to measure. Content validity is the extent to which the content of a tool adequately reflects the construct being measured. Structural validity is the degree to which the scores of an instrument are an adequate reflection of the dimensionality of the construct to be measured. Hypothesis testing is the degree to which the scores of an instrument are consistent with hypotheses set in advance, such as expected relationships with the scores of other instruments or differences between relevant groups. Cross-cultural validity refers to the degree to which the performance of items on a translated or culturally adapted instrument is an adequate reflection of the performance of items of the original version of the instrument. Criterion validity refers to the degree to which the scores of an instrument are an adequate reflection of a 'gold standard'.

    Responsiveness and interpretability were also included in the measurement properties. Responsiveness refers to the ability of an instrument to detect change over time in the construct to be measured. Interpretability is the degree to which one can assign qualitative meaning (that is, clinical or commonly understood connotations) to an instrument's quantitative scores or change in scores.

    The author, publication year, participant group and sample size, study design, evaluation purpose of the measurement tools, tool measurement method, subdomain, number of items, and scoring method were checked in the selected studies. Subsequently, the measurement properties of the tools used in each study were evaluated. Each measurement property was rated as sufficient (+), insufficient (-), or indeterminate (?) according to the “2018 COSMIN risk of bias checklist”[16][17]. Finally, methodological quality was rated on a 4-point scale for the reliability and validity reported in each study[20]. For each property, 7 to 18 standards were assessed, and each item was rated as “excellent,” “good,” “fair,” or “poor.”

    The design requirements of all measurement properties included, as common evaluation items, whether the percentage of missing items was reported for each item and whether the handling of missing items was described. All design requirements for the measurement properties, except for content validity and interpretability, assessed whether a sample size suitable for analysis was included. After rating the methodological quality for all measurement properties, the overall quality of each study was rated as “excellent,” “good,” “fair,” or “poor” in accordance with the “worst score counts” principle. Two researchers independently evaluated the measurement properties to ensure consistency. The final evaluation result was determined by comparing the findings over 3 research meetings.
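
    The principle of taking the lowest item rating as the overall rating can be illustrated with a minimal sketch. The function name and the example ratings are hypothetical; the 4-point scale is the one described above.

```python
# Ratings ordered from worst to best, following the 4-point scale in the text.
RATING_ORDER = ["poor", "fair", "good", "excellent"]

def overall_quality(item_ratings):
    """Return the lowest rating among the checklist items of one measurement property."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    return min(item_ratings, key=RATING_ORDER.index)

# Hypothetical example: every item is excellent except the missing-data item.
ratings = ["excellent", "excellent", "good", "excellent"]
print(overall_quality(ratings))  # good
```

    Under this rule a single weaker item, such as an unreported percentage of missing items, lowers the overall rating of an otherwise excellent study.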

    Ⅲ. Results

    1. Study characteristics

    Two of the researchers screened 216 papers and found that 10 studies corresponded to the research requirements <Figure 1>. The sample size varied from 315 to 10,024 people, and the number of items in the measurement tools varied from 5 to 44. A detailed description of the measurement tools is provided in <Table 1>. Of the 10 studies, 2 concerned measurement tools for Koreans. The KHLI comprised 18 items in 3 domains, and its reliability and validity were assessed in 315 people aged 40–64 years. The KHLS comprised 24 items in 3 domains, and its reliability and validity were tested on 411 people aged 65–74 years. Among the 12 assessment domains of health literacy covered by the measurement tools (application/function, appraisal, communication (listener, speaker), comprehension, conceptual knowledge, information seeking (document, interactive media navigation), media literacy, numeracy, and prose (comprehension, pronunciation)), there was 1 study with 10 domains, 1 with 9 domains, 1 with 8 domains, 1 with 6 domains, 2 with 4 domains, 3 with 3 domains, and 1 with 1 domain. In terms of domains, “application/function” and “information seeking (document)” were included in 8 studies, “media literacy” was included in 1 study, and no measurement tool included “prose (pronunciation).” Reliability and validity were tested in all 10 studies <Table 2> and <Table 3>; 6 studies applied Classical Test Theory (CTT) and 4 studies evaluated measurement tools by applying Item Response Theory (IRT)[7][8].

    <Table 1>

    Description of health literacy measurement

    | Author (year) | Instrument | Setting | Characteristics: Domains assessed | Characteristics: Validation sample population age | Characteristics: Modes of administration in validation study | Psychometrics: Number of items | Psychometrics: Sample in validation study | Psychometrics: Language of validated version | Assessment |
    |---|---|---|---|---|---|---|---|---|---|
    | Pleasant A. et al. (2018) | Calgary Charter on Health Literacy Scale | United States of America | Prose: Comprehension, Information seeking: Interactive media navigation, Information seeking: Document, Conceptual Knowledge, Comprehension, Communication: Speaker, Communication: Listener, Application/function | Older Adults: 65+ years, Adults: 18 to 64 years | Paper and pencil, Face-to-face | 5 | 633 | English | Self-reported |
    | Kayser L. et al. (2018) | eHealth Literacy Questionnaire (eHLQ) | Denmark, Taiwan, Germany, France, Norway, Australia | Information seeking: Interactive media navigation, Information seeking: Document, Conceptual Knowledge, Comprehension, Appraisal, Application/function | 16-74 years | Phone-based, Paper and pencil, Mailed survey, Face-to-face, Computer-based | 35 | 475 | English | Self-reported |
    | Chung, S.Y. et al. (2015) | eHealth Literacy Scale - Older Adults - eHEALS | United States of America | Information seeking: Interactive media navigation, Application/function | Older adults: 62.8 ± 8.5 years | Computer-based | 8 | 866 | English | Self-reported |
    | Rouquette A. et al. (2018) | European Health Literacy Survey - HLS-EU-Q6 | France | Comprehension, Communication: Speaker, Communication: Listener | Adults: 18 to 64 years, 53 ± 18 years | Paper and pencil, Face-to-face | 6 | 317 | English | Self-reported |
    | Abel, T. et al. (2014) | Health Literacy Assessment Tool - HLAT-8 | Switzerland, China | Information seeking: Interactive media navigation, Information seeking: Document, Application/function | 18-25 years | Paper and pencil, Mailed survey | 8 | 8,349 | English | Self-reported |
    | Tavousi, M. et al. (2020) | Health Literacy Instrument for Adults - HELIA | Iran | Information seeking: Document, Comprehension, Appraisal, Application/function | 18-65 years | Paper and pencil | 33 | 323 | English | Self-reported |
    | Osborne, R.H. et al. (2013) | Health Literacy Questionnaire (HLQ) | Mali, Slovakia, South Korea, Egypt, Ghana, Reunion, Portugal, Denmark, The Netherlands, Norway, Germany, Nepal, Brunei Darussalam | Prose: Comprehension, Numeracy, Information seeking: Interactive media navigation, Information seeking: Document, Conceptual Knowledge, Comprehension, Communication: Speaker, Appraisal, Application/function | Older Adults: 65+ years, Adults: 18 to 64 years | Phone-based, Paper and pencil, Mailed survey, Face-to-face, Computer-based | 44 | 1,039 | English | Self-reported |
    | Kang, S.J. et al. (2014) | Korean Health Literacy Instrument - KHLI | South Korea | Prose: Comprehension, Numeracy, Information seeking: Document | 40-64 years old | Face-to-face | 18 | 315 | Korean | |
    | Lee, T.W. et al. (2009) | Korean Health Literacy Scale - KHLS | South Korea | Prose: Comprehension, Numeracy, Information seeking: Document, Application/function | Older Adults: 65+ years | Paper and pencil | 24 | 411 | Korean | |
    | Duong, T.V. et al. (2019) | Short Form Health Literacy Questionnaire for Asian populations - HLS-SF12 | Malaysia, Kazakhstan, Taiwan, Myanmar, Vietnam, Indonesia | Prose: Comprehension, Media Literacy, Information seeking: Interactive media navigation, Information seeking: Document, Conceptual Knowledge, Comprehension, Communication: Speaker, Communication: Listener, Appraisal, Application/function | 15 years and older | Phone-based, Paper and pencil, Mailed survey, Face-to-face, Computer-based | 12 | 10,024 | English | |

    <Table 2>

    Dimensions Assessed in Health Literacy Tools

    Author (year) Instrument Application/function Appraisal Communication (Listener, Speaker) Comprehension Conceptual Knowledge Information seeking (Document, Interactive media navigation) Media Literacy Numeracy Prose (Comprehension, Pronunciation)

    Pleasant A. et al. (2018) Calgary Charter on Health Literacy Scale Y Y Y Y Y Y Y Y
    Kayser L. et al. (2018) eHealth Literacy Questionnaire (eHLQ) Y Y Y Y Y Y
    Chung, S.Y. et al. (2015) eHealth Literacy Scale - Older Adults - eHEALS Y
    Rouquette A. et al. (2018) European Health Literacy Survey - HLS-EU-Q6 Y Y Y
    Abel, T. et al. (2014) Health Literacy Assessment Tool - HLAT-8 Y Y Y
    Tavousi, M. et al. (2020) Health Literacy Instrument for Adults - HELIA Y Y Y
    Osborne, R.H. et al. (2013) Health Literacy Questionnaire (HLQ) Y Y Y Y
    Kang, S.J. et al. (2014) Korean Health Literacy Instrument - KHLI Y Y Y Y
    Lee, T.W. et al. (2009) Korean Health Literacy Scale - KHLS Y
    Duong, T.V. et al. (2019) Short Form Health Literacy Questionnaire for Asian populations - HLS-SF12 Y Y Y Y Y Y Y Y

    * Y: Assessed

    <Table 3>

    Methodological quality of measurement properties in the health literacy measurements

    | Author (year) | Instrument | Reliability: Internal consistency | Reliability: Reliability | Reliability: Measurement error | Validity: Content validity | Validity: Structural validity | Validity: Hypothesis testing | Validity: Cross-cultural validity | Validity: Criterion validity | Responsiveness |
    |---|---|---|---|---|---|---|---|---|---|---|
    | Pleasant A. et al. (2018) | Calgary Charter on Health Literacy Scale | + | - | ? | ? | + | ? | ? | ? | ? |
    | Kayser L. et al. (2018) | eHealth Literacy Questionnaire (eHLQ) | + | ? | ? | + | + | ? | + | ? | ? |
    | Chung, S.Y. et al. (2015) | eHealth Literacy Scale - Older Adults - eHEALS | + | ? | ? | + | + | - | ? | ? | ? |
    | Rouquette A. et al. (2018) | European Health Literacy Survey - HLS-EU-Q6 | + | ? | ? | + | + | + | ? | + | ? |
    | Abel, T. et al. (2014) | Health Literacy Assessment Tool - HLAT-8 | - | ? | ? | ? | + | + | ? | ? | ? |
    | Tavousi, M. et al. (2020) | Health Literacy Instrument for Adults - HELIA | + | + | ? | + | + | + | ? | ? | ? |
    | Osborne, R.H. et al. (2013) | Health Literacy Questionnaire (HLQ) | + | ? | ? | + | - | + | ? | ? | ? |
    | Kang, S.J. et al. (2014) | Korean Health Literacy Instrument - KHLI | + | + | ? | + | + | + | ? | ? | ? |
    | Lee, T.W. et al. (2009) | Korean Health Literacy Scale - KHLS | + | ? | ? | + | + | + | ? | ? | ? |
    | Duong, T.V. et al. (2019) | Short Form Health Literacy Questionnaire for Asian populations - HLS-SF12 | + | ? | ? | ? | + | + | + | + | ? |

    * +: sufficient, -: insufficient, ?: indeterminate

    2. Internal consistency

    The overall quality of the internal consistency evaluation was good. All 10 studies (100.0%) reported internal consistency. In 1 of them (10.0%), however, Cronbach's alpha was 0.65, which fell short of the standard of 0.7 <Table 3>. Sample size and statistics for the evaluation of internal consistency were rated as “Excellent.” Regarding the overall quality, 1 study (10.0%) was rated as “Excellent,” and 9 studies (90.0%) were rated as “Good.” The studies rated as “Good” either did not present the percentage of missing items or lacked a description of how missing items were handled <Table 4>.
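
    For readers unfamiliar with the statistic, Cronbach's alpha can be computed from item-level responses as sketched below. The function names and the response matrix are illustrative assumptions, not data from any reviewed study; the 0.7 cutoff is the standard cited above.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a response matrix (one row per respondent, one column per item)."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    item_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

def meets_standard(alpha, threshold=0.7):
    """Compare alpha with the 0.7 standard used in this review."""
    return alpha >= threshold

# Made-up responses from 5 respondents to a 3-item scale.
responses = [[3, 4, 3], [2, 2, 3], [4, 4, 4], [1, 2, 1], [3, 3, 4]]
alpha = cronbach_alpha(responses)
print(round(alpha, 3), meets_standard(alpha))
print(meets_standard(0.65))  # False: the value reported for 1 reviewed study
```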

    <Table 4>

    Frequency and percentage of overall quality of studies

    | Measurement property | Excellent, n (%) | Good, n (%) | Fair, n (%) | Poor, n (%) |
    |---|---|---|---|---|
    | Internal consistency | 1 (10.0) | 9 (90.0) | 0 (0.0) | 0 (0.0) |
    | Reliability | 0 (0.0) | 3 (30.0) | 0 (0.0) | 7 (70.0) |
    | Measurement error | 0 (0.0) | 0 (0.0) | 0 (0.0) | 10 (100.0) |
    | Content validity | 7 (70.0) | 0 (0.0) | 2 (20.0) | 1 (10.0) |
    | Structural validity | 0 (0.0) | 10 (100.0) | 0 (0.0) | 0 (0.0) |
    | Hypothesis testing | 0 (0.0) | 2 (20.0) | 0 (0.0) | 8 (80.0) |
    | Cross-cultural validity | 0 (0.0) | 1 (10.0) | 0 (0.0) | 9 (90.0) |
    | Criterion validity | 1 (10.0) | 1 (10.0) | 0 (0.0) | 8 (80.0) |
    | Responsiveness | 0 (0.0) | 0 (0.0) | 0 (0.0) | 0 (0.0) |

    3. Reliability

    A test–retest was conducted in 4 studies. Among them, 1 study presented statistics below the standard (ICC/weighted Kappa < 0.7 OR Pearson's r < 0.8), so it was indicated as “-,” and 1 study did not present reasonable statistics <Table 3>. In the overall quality evaluation for reliability, no studies were rated as “Excellent,” 3 studies were rated as “Good,” and 1 study was rated as “Poor” <Table 4>. It was rated as “Poor” because the ICC (intraclass correlation coefficient) was not provided, even though the measurement tool produced continuous scores. In all 4 studies conducting test-retest, the administrations were independent (“Were the administrations independent?”), and the interval between the 2 tests was stated (“Was the time interval stated?”).

    4. Measurement error

    Measurement error was not reported in any of the 10 studies <Table 3>, and the overall quality of measurement error was rated as “Poor” in all studies <Table 4>. In other words, none of the studies stated the standard error of measurement (SEM) used in Classical Test Theory or the limits of agreement (LoA).
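
    For reference, the two statistics named above are commonly written as follows. These are the usual textbook forms, given here as an assumption about what a future study might report; none of the reviewed studies reported them, and other estimation approaches exist.

```latex
\mathrm{SEM} = \mathrm{SD} \times \sqrt{1 - r}
\qquad
\mathrm{LoA} = \bar{d} \pm 1.96 \times \mathrm{SD}_{d}
```

    Here SD is the standard deviation of the observed scores, r is a reliability coefficient such as the ICC, \bar{d} is the mean difference between test and retest scores, and SD_d is the standard deviation of those differences.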

    5. Content validity

    Content validity was evaluated for all 10 studies (100.0%). Seven were evaluated as “+,” since they met all the criteria such as “whether a clear description is provided of the measurement aim,” the “target population,” the “concepts that are being measured,” the “item selection AND target population,” and “whether investigators OR experts were involved in item selection.” The remaining 3 were evaluated as “?” <Table 3> for reasons such as, “A clear description of the above-mentioned aspects is lacking OR only target population involved OR doubtful design or method.” In the overall quality assessment, the item “Was there an assessment of whether all items together comprehensively reflect the construct to be measured?” was assessed in consideration of 3 aspects (content coverage of items, description of domains, and theoretical foundation), and 7 studies (70.0%) were rated as “Excellent,” 2 (20.0%) as “Fair,” and 1 (10.0%) as “Poor” <Table 4>.

    6. Structural validity

    Structural validity was assessed in all 10 studies; 6 applied only CTT and 2 applied only IRT, whereas the other 2 applied both CTT and IRT to check structural validity. In the 8 studies in which CTT was applied, exploratory or confirmatory factor analysis was performed to verify the validity of the measurement tools. Consequently, the 8 studies satisfying the criterion that “Factors should explain at least 50% of the variance” were evaluated as “+” (sufficient). IRT tests were performed to determine the (uni)dimensionality of the items in only 2 of the 4 studies in which IRT was applied. One study reported an RMSEA of less than 0.06 and was rated as “+” (sufficient), and the other reported CFI = 0.936, TLI = 0.930, and RMSEA = 0.076 <Table 3> and was rated as “-.”

    As a result of the overall quality assessment, all 10 studies were rated as “Good.” All studies had sufficient sample sizes required for structural validity, and either exploratory or confirmatory factor analysis was properly utilized. However, none of the studies reported the percentage of missing items or how they were handled. Therefore, they were evaluated as “Good” <Table 4>.
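
    The fit-index rating reported in this section can be expressed as a small decision rule. The RMSEA cutoff of 0.06 comes from the text; the cutoff of 0.95 for CFI and TLI is an assumption based on commonly cited criteria and is not stated in this study, and the function name is hypothetical.

```python
def structural_validity_rating(cfi=None, tli=None, rmsea=None):
    """Rate model fit as "+", "-", or "?" from whichever indices a study reports."""
    if cfi is None and tli is None and rmsea is None:
        return "?"  # nothing reported, so the rating is indeterminate
    sufficient = (
        (cfi is not None and cfi > 0.95)        # assumed cutoff
        or (tli is not None and tli > 0.95)     # assumed cutoff
        or (rmsea is not None and rmsea < 0.06)  # cutoff named in the text
    )
    return "+" if sufficient else "-"

# Fit indices reported for the study rated "-" in this review.
print(structural_validity_rating(cfi=0.936, tli=0.930, rmsea=0.076))  # -
```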

    7. Hypotheses testing

    Hypotheses testing was performed in 2 studies, and the results were partially or generally consistent with the hypothesis <Table 3>. After the quality assessment, none of the studies were rated as “Excellent,” but 2 (20.0%) were rated as “Good,” and 8 were rated as “Poor” <Table 4>. Five studies provided evidence of the expected correlation for hypothesis testing or the absolute and relative magnitudes of the mean comparison. Descriptions of comparator instruments were provided only in 2 studies to assess convergent validity.

    8. Cross-cultural validity

    Two studies (20.0%) were evaluated as “+” (sufficient). In 1 study, the DIF analysis by age and gender found no difference in scores between groups; therefore, it was evaluated as “+” (sufficient). In the other study, there were no differences across the 6 countries, and accordingly it was rated as “+” (sufficient) <Table 3>. The overall quality evaluation found that 1 study (10.0%) was rated as “Good” and the other as “Poor” <Table 4>. The main reason for the “Poor” rating was that the study neither carried out forward and backward translation nor conducted a pre-test to verify the cultural relevance of the translation and its ease of comprehension.

    9. Criterion validity

    Criterion validity was verified in 2 studies (20.0%). In both studies, the correlation with the gold standard exceeded 0.7 <Table 3>, and both were rated as “+” (sufficient). In the overall quality evaluation, 1 study was rated as “Excellent” and the other as “Good” <Table 4>. The latter was rated as “Good” because neither the percentage of missing items nor a description of how missing items were handled was provided.

    10. Responsiveness

    Responsiveness refers to the ability of a measurement tool to detect change over time in the construct being measured. None of the studies assessed responsiveness. In an experimental or intervention study, it is important to evaluate responsiveness to properly measure changes in treatment outcomes. However, the measurement tools reviewed in this study are not used in clinical practice; they are used to measure the level of health literacy in the public health field. Therefore, responsiveness of the measurement tools was not assessed.

    11. Interpretability

    Seven studies (70.0%) reported the percentage of missing items for each item in the measurement tools. One study (10.0%) reported floor and ceiling effects, and 1 (10.0%) reported the overall score and subscale score distributions for the measurement tools. Four studies (40.0%) reported the score and score change for each subgroup of the measurement tools. None of the studies reported a minimal important change (MIC) or minimal important difference (MID).
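
    Floor and ceiling effects can be checked directly from the score distribution, as in the sketch below. The function name and the example scores are hypothetical; the default threshold of 30% follows the standard cited in this paper's Discussion and is exposed as a parameter because other cutoffs are also used in practice.

```python
def floor_ceiling(scores, min_score, max_score, threshold=0.30):
    """Share of respondents at the lowest and highest possible score."""
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    floor = sum(s == min_score for s in scores) / n
    ceiling = sum(s == max_score for s in scores) / n
    return {
        "floor": floor,
        "ceiling": ceiling,
        "floor_effect": floor >= threshold,      # not below the cutoff
        "ceiling_effect": ceiling >= threshold,
    }

# Made-up total scores on a scale ranging from 0 to 16.
scores = [0, 0, 0, 0, 5, 8, 10, 12, 16, 16]
result = floor_ceiling(scores, min_score=0, max_score=16)
print(result)
```

    In this made-up sample, 4 of 10 respondents sit at the minimum score, so a floor effect is flagged, while the 2 of 10 at the maximum stay under the cutoff.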

    Ⅳ. Discussion

    Ten studies containing measurement tools assessing population health literacy were included in this systematic review. The COSMIN checklist was used to assess the methodological quality of individual studies on measurement properties, and the overall level of evidence of the measurement properties was evaluated. In previous studies, several reviews on oral and dental health literacy[21], mental health[15], and sleep quality[22] have been performed using the COSMIN checklist. This is the first review to systematically appraise the quality of the measurement properties of health literacy tools in the general population.

    Four of the 10 studies selected in this review applied IRT, a theory of testing based on the relationship between an individual’s performance on a test item and the test-taker’s level of performance[23]. Recently, IRT has been actively applied in the public health field[7][8][24]. Therefore, to measure health literacy in populations, it is necessary to apply item response theory, which considers the relationship between the demographic characteristics of the participant and the item responses of measurement tools. Most studies did not report the frequency and percentage of missing items or how the missing items were handled. Since this is likely to affect the overall quality evaluation, it should be considered an important aspect when developing health literacy measurement tools in the future. In the reliability evaluation of the studies selected for this review, internal consistency was found to be the best measurement property. Internal consistency was verified in all studies, and the overall quality was good. However, test-retest was conducted in only 4 studies, and 1 study did not present an intraclass correlation coefficient (ICC). Measurement error was not reported in any of the studies. Therefore, it is necessary to calculate the intraclass correlation coefficient and measurement error through test-retest to verify reliability when developing health measurement tools in the future.

    Validity includes content validity, the degree to which the content of a measurement tool reflects the concept to be measured; structural validity, the degree to which the scores of a measurement tool reflect the dimensionality of the concept to be measured; hypotheses testing, the degree to which the results agree with predictions derived from previous studies; cross-cultural validity, the degree to which translated measurement tool items reflect the meaning of the original measurement tool items; and criterion validity, the degree to which the scores of a measurement tool reflect the gold standard. In the validity evaluation of the health literacy measurement tools, content validity and structural validity were the best-rated measurement properties. Content validity was evaluated in all the studies, and the overall quality was good. Likewise, structural validity was evaluated in all studies through exploratory and confirmatory factor analyses, and the overall quality was good. Hypotheses testing, cross-cultural validity, and criterion validity were each assessed in only 2 studies, which should be considered important when developing health literacy measurement tools in the future. As for hypothesis testing, no studies were rated as “Excellent.” Although hypothesis testing was stated in the 2 studies, there was little description of whether the expected correlation or the absolute and relative magnitude of the mean comparison was shown. In the research planning stage, the hypotheses to be tested should be clearly specified, and the statistics, size, and significance level used to interpret the results should be presented so that the quality of hypothesis testing can be improved.

    No studies were rated as “Excellent”’ on cross-cultural validity evaluation. When developing health literacy measurement tools in the future, it is desirable to conduct a pre-test in a group similar to the target group of the main study through a mutually independent translation-reverse translation process by bilingual experts. Health literacy can vary depending on demographic characteristics such as age, gender, and education level. Therefore, sampling in consideration of demographic characteristics is required when making a feasibility study plan, and the DIF for each demographic characteristic should be analyzed. Next, the criterion validity, the degree to which the scores of measurement tools reflect the gold standard, was verified in 2 studies, and 1 study was rated as “Excellent.” According to the COSMIN checklist, the original long version can be considered the gold standard[17]. In this review, there was a case in which correlation was presented using health status or similar measurement tools rather than the long version. Therefore, a measurement tool developed for the first time may not have a gold standard, which can be a limitation of the COSMIN checklist. Only 1 study (10.0%) reported the floor and ceiling effects of the measurement tool; however, for ease of interpretation in the future, it is necessary to report the floor and ceiling effects, minimal important change (MIC), or minimal important difference (MID), as well as the missing items. It is necessary to determine whether the floor and ceiling effects are less than the reasonable standard of 30%[25]. To apply an appropriate measurement tool for reliability, validity, reactivity, and ease of interpretation in the general population, we suggest that researchers conduct a quality evaluation of measurement properties on the basis of the COSMIN checklist when developing measurement tools in the future. 
A limitation of the present review is that only health literacy measurement tools registered in a single database were included, and all measurement tools were written in English.
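
The floor and ceiling check discussed above can be sketched in a few lines of Python. This is a minimal illustration, assuming that a floor or ceiling effect is flagged when the share of respondents with the lowest or highest possible score reaches the 30% standard mentioned in the text; the function name, the 0 to 16 score range, and the scores themselves are hypothetical.

```python
def floor_ceiling_effects(scores, min_score, max_score, threshold=0.30):
    """Share of respondents at the lowest and highest possible total score.

    An effect is flagged when the share is not below `threshold`
    (0.30 here, following the 30% standard cited in the text).
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_share = sum(1 for s in scores if s == min_score) / n
    ceiling_share = sum(1 for s in scores if s == max_score) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share >= threshold,
        "ceiling_effect": ceiling_share >= threshold,
    }


# Hypothetical total scores on an assumed 0-16 scale (illustrative data only).
example_scores = [0, 3, 16, 16, 12, 16, 9, 16, 7, 5]
result = floor_ceiling_effects(example_scores, min_score=0, max_score=16)
print(result)
```

In this made-up sample, 1 of 10 respondents scores the minimum and 4 of 10 score the maximum, so only a ceiling effect would be flagged.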

    This study conducted a systematic review of health literacy measurement tools using the COSMIN checklist. Of the 10 health literacy measurement tools, only 1 covered 10 domains, and even these were measured in a limited way. Among the studies evaluated as “excellent” for overall quality, internal consistency was verified in 1 study, content validity in 7 studies, and criterion validity in 1 study. Overall, the tools were regarded as lacking in reliability and validity.

    Ⅴ. Conclusion

    Health literacy measurement tools that are universally applicable to the Korean population need to be developed, and their reliability and validity must be verified continuously. In addition, these tools should be developed or adapted with reference to evaluation frameworks such as the COSMIN checklist.

    Figure

    Figure 1. Flow chart of study selection

    Table

    Description of health literacy measurement
    Dimensions Assessed in Health Literacy Tools
    * Y: Assessed
    Methodological quality of measurement properties in the health literacy measurements
    * +: sufficient, -: insufficient, ?: indeterminate
    Frequency and percentage of overall quality of studies

    Reference

    1. S.K. Simonds (1974), Health education as social policy, Health Education Monographs, 2(1_suppl), pp.1-10.
    2. I. Kickbusch, D. Maag (2008), Health Literacy, International Encyclopedia of Public Health, Edited by: Kris H, Stella Q., Academic Press, pp.204-211.
    3. K. Sørensen, S. Van den Broucke, J. Fullam, G. Doyle, J. Pelikan, Z. Slonska, H. Brand (2012), Health literacy and public health: a systematic review and integration of definitions and models, BMC Public Health, vol.12(1);1-3.
    4. S. Qi, F. Hua, S. Xu, Z. Zhou, F. Liu (2021), Trends of global health literacy research (1995–2020): Analysis of mapping knowledge domains based on citation data mining, PLoS ONE, vol.16(8);e0254988.
    5. https://www.khealth.or.kr/healthplan
    6. S.H. Kim, E.J. Lee (2008), The Influence of Functional Literacy on Perceived Health Status in Korean Older Adults, Journal of Korean Academy of Nursing, vol.38(2);195-203.
    7. T.W. Lee, S.J. Kang, H.J. Lee, S.I. Hyun (2009), Testing health literacy skills in older Korean adults, Patient Education and Counseling, vol.75(3);302-307.
    8. S.J. Kang, T.W. Lee, M.K. Paasche-Orlow, G.S. Kim, H.K. Won (2014), Development and evaluation of the Korean health literacy instrument, Journal of Health Communication, vol.19(sup2);254-266.
    9. S.S. Kim, S.H. Kim, S.Y. Lee (2005), Health Literacy: Development of a Korean Health Literacy Assessment Tool, Korean Journal of Health Education and Promotion, vol.22(4);215-227.
    10. H.R. Chun, S.I. Cho, I.H. Kim (2018), Validation of the Measure of Health Literacy for the Elderly, Korean Public Health Research, vol.44(4);99-109.
    11. L.D. Chew, K.A. Bradley, E.J. Boyko (2004), Brief questions to identify patients with inadequate health literacy, Fam Med, vol.36(8);588-594.
    12. J.H. Kim, C.Y. Park, S.H. Kang (2019), A Survey on the Level and Related Factors of Health Literacy in Korean People, Health Policy and Management, vol.29(2);146-159.
    13. S.E. Kim, D.J. Park, J.H. Choi (2019), The Relationship between Sub-dimensions of Health Literacy and Health-Related Behaviors among Korean Adults, Health and Social Welfare Review, vol.39(1);334-364.
    14. H.R. Chun, J.Y. Lee (2020), Factors associated with health literacy among older adults: Results of the HLS-EU-Q16 measure, Korean J Health Educ Promot, vol.37(1);1-13.
    15. Y. Wei, P.J. McGrath, J. Hayden, S. Kutcher (2017), Measurement properties of mental health literacy tools measuring help-seeking: a systematic review, Journal of Mental Health, vol.26(6);543-555.
    16. C.B. Terwee, S.D. Bot, M.R. de Boer, D.A. van der Windt, D.L. Knol, J. Dekker, L.M. Bouter, H.C. de Vet (2007), Quality criteria were proposed for measurement properties of health status questionnaires, Journal of Clinical Epidemiology, vol.60(1);34-42.
    17. C.A.C. Prinsen, L.B. Mokkink, L.M. Bouter, J. Alonso, D.L. Patrick, H.C.W. de Vet, C.B. Terwee (2018), COSMIN guideline for systematic reviews of patient-reported outcome measures, Quality of Life Research, vol.27(5);1147-1157.
    18. L.B. Mokkink, C.B. Terwee, D.L. Patrick, J. Alonso, P.W. Stratford, D.L. Knol, L.M. Bouter, H.C. de Vet (2010), The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study, Quality of Life Research, vol.19(4);539-549.
    19. D. Moher, A. Liberati, J. Tetzlaff, D.G. Altman, PRISMA Group (2009), Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement, Annals of Internal Medicine, vol.151(4);264-269.
    20. C.B. Terwee, L.B. Mokkink, D.L. Knol, R.W. Ostelo, L.M. Bouter, H.C. de Vet (2012), Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist, Quality of Life Research, vol.21(4);651-657.
    21. M. Ghaffari, S. Rakhshanderou, A. Ramezankhani, Y. Mehrabi, A. Safari-Moradabadi (2020), Systematic review of the tools of oral and dental health literacy: assessment of conceptual dimensions and psychometric properties, BMC Oral Health, vol.20(1);1-2.
    22. G.U. Kim, J.H. Lee (2019), Systematic Review of the Pittsburgh Sleep Quality Index used for Measuring Sleep Quality among Adults with Trauma, Korean Journal of Adult Nursing, vol.31(4);337-350.
    23. R.D. Bock, M.F. Zimowski (1997), Multiple group IRT. In Handbook of modern item response theory, New York, NY : Springer, pp.433-448.
    24. M.O. Edelen, B.B. Reeve (2007), Applying item response theory (IRT) modeling to questionnaire development, evaluation, and refinement, Qual Life Res, vol.16;5-18.
    25. R.L. Kane (2006), Understanding health care outcomes research, 2nd ed. Sudbury, Massachusetts: Jones and Bartlett, pp.131-164.
    December 5, 2023
    December 22, 2023
    December 29, 2023