
Interlaboratory variability of HER2 fluorescence in situ hybridization testing in breast cancer: results of a multicenter proficiency-testing ring study in China

Abstract

Background

Accurate detection of human epidermal growth factor receptor 2 (HER2) gene amplification by fluorescence in situ hybridization (FISH) is necessary to determine HER2 status. Although many efforts have been made to improve the consistency of testing, the current level of interlaboratory agreement remains unclear. To investigate the latest interlaboratory variability of HER2 FISH testing in breast cancer, we conducted a multicenter proficiency-testing ring study in China.

Methods

A total of ten samples, each exhibiting distinct HER2 signal patterns and genetic heterogeneity, were distributed to 169 laboratories for HER2 FISH analysis. Data comprising both the results of the tests and feedback from questionnaires were compiled for comprehensive evaluation.

Results

The overall agreement among the participating laboratories was substantial to almost perfect, with Fleiss’ kappa values of 0.765–0.911. However, cases with HER2 signals near the critical cutoff range or with genetic heterogeneity showed lower congruence, poorer reproducibility, and higher variability (Fleiss’ kappa: 0.582). The questionnaire showed that 51.2% (86/168) of the participants did not perform validation after their operating procedures or interpretation criteria were updated, and 75.6% (121/160) had not established standard interpretation procedures. Because these laboratories performed worse (P < 0.05), the lack of validation and of interpretation procedures was proposed as a possible underlying cause.

Conclusions

This study presents the latest landscape of interlaboratory variability and accuracy of HER2 FISH testing in China and highlights potential causes for the variability. Despite many years of effort, the standardization of HER2 status determination still has a long way to go.

Background

As a representative biomarker for targeted therapy, amplification of the human epidermal growth factor receptor 2 (HER2) gene, which results in overexpression of the HER2 protein, has been demonstrated in 15–25% of breast cancers (BCs) [1, 2]. Because emerging anti-HER2 therapies have been shown to significantly improve the survival of patients with HER2-positive or HER2-low breast cancer [1,2,3], recent practice guidelines from several countries recommend that HER2 status be determined for all patients with invasive BC at diagnosis to guide optimal management, which has brought the accuracy and reproducibility of HER2 testing into sharper focus [4,5,6,7].

HER2 status is commonly assessed by combining immunohistochemistry (IHC), which measures protein levels, with fluorescence in situ hybridization (FISH), which measures gene amplification. To ensure accurate HER2 status determination, standardization of HER2 testing in breast cancer has been promoted for decades [4, 5, 8,9,10]. However, because IHC is subject to considerable operational variability and scoring subjectivity, studies on standardization and on changing laboratory testing practices have focused mainly on IHC [11,12,13,14], and only a few have investigated the performance of HER2 FISH testing [15,16,17,18]. Although FISH is less susceptible to the adverse effects of tissue handling and fixation [19,20,21], differences in reagents, operating procedures, interpretation procedures, and operators among laboratories may still lead to different test results. Moreover, expert groups such as the joint committee of the College of American Pathologists (CAP) and the American Society of Clinical Oncology (ASCO) have published a series of guidelines to standardize HER2 FISH testing, and most cases can be easily interpreted according to the recently updated guidelines [5, 22]; nevertheless, some cases fall into the less common categories (groups 2–4) or exhibit intratumoral genetic heterogeneity. These complexities create unusual diagnostic challenges and place significant demands on pathologists to accurately count and interpret FISH signals [23,24,25,26]. Hence, to guarantee accurate HER2 status determination, it is essential to establish and continuously monitor the current interlaboratory variability of HER2 FISH testing.

Proficiency testing (PT) is an important tool for investigating laboratory performance and improving the accuracy and reliability of tests. To date, only CAP and the UK National External Quality Assessment Scheme (NEQAS) have conducted multicenter trials of HER2 FISH testing, and these were carried out more than a decade ago [15,16,17,18]. Owing to the limited availability of quality control (QC) materials, the CAP and NEQAS PT schemes evaluated only the ability of laboratories to detect typical HER2-amplified or HER2-negative samples. Because the classification and scoring criteria have been substantially revised in recent years, these schemes cannot reflect the current state of HER2 FISH testing. Therefore, to investigate the latest interlaboratory variability, comprehensively evaluate laboratory performance across different case types, and help ensure high-quality HER2 FISH testing in breast cancer, we conducted an updated PT ring study in China. Given that the reproducibility of FISH near the critical cutoff range and the accuracy of complex HER2 genetic heterogeneity (GH) assessment are important issues in HER2 FISH interpretation [27], we used a set of simulated formalin-fixed and paraffin-embedded (FFPE) breast cancer samples with different HER2 FISH signal patterns and different levels of HER2 genetic heterogeneity. With these simulated samples, we assessed the detection and interpretation performance of HER2 FISH testing in different laboratories.

Methods

Scheme design

This PT scheme for HER2 FISH testing comprised two rounds. In each round, five samples representing specific clinical case scenarios were dispatched, and participating laboratories had seven calendar days to complete the analysis using their routine testing procedures. The raw scores for each case (HER2/CEP17 ratio, average HER2 signal/cell ratio, and number of counted cells) and the final HER2 FISH interpretation for each simulated clinical sample were collected for critical appraisal. Each laboratory was also asked to complete a questionnaire on its HER2 FISH testing procedures.

Table 1 presents an overview of the samples distributed in each scheme. The clinical case scenarios for each sample are described in the Supplementary File.

Table 1 The characteristics and homogeneity assessment results of the quality control samples in the PT scheme

Sample preparation and validation

Several types of immortalized human breast carcinoma cell lines containing different HER2 FISH signal patterns were selected for preparing our QC materials, including BT474, HCC1954 (purchased from the American Type Culture Collection, Manassas, USA), and MCF-7 (obtained from the National Infrastructure of Cell Line Resources, Beijing, China) [28, 29]. These cells were mass-cultured, mixed, and subcutaneously injected into the breasts of female nude (nu/nu) mice or CB-17 SCID mice (Vital River, Beijing, China) to produce orthotopic xenografts simulating breast cancer. The xenografts were cut and embedded into small blocks of various shapes to simulate samples from resection, mammotome core biopsy, and core needle biopsy. Sections of 4 μm thickness were cut from each FFPE tumor block as the QC materials.

Hematoxylin and eosin (HE) staining, FISH, and IHC were performed to validate the feasibility and commutability of the QC materials. FISH (PathVysion HER2 DNA Probe Kit [Abbott Molecular, Illinois, USA] and HER2/neu gene amplification Assay Kit by FISH [LBP, Guangdong, China]) served as the gold standard for sample validation. IHC (using an anti-HER2 antibody [MXB Biotechnologies, Fuzhou, China]) was also performed to support the judgment and obtain the initial expected results. In each round, homogeneity and stability were assessed by FISH before the sections were distributed. IHC and FISH assays were performed according to the manufacturers’ protocols and interpreted according to the Chinese guideline for HER2 detection in breast cancer [6] and the 2018 ASCO/CAP HER2 testing guideline [5]. The specific testing procedures and interpretation criteria are detailed in the Supplementary material.

Result assessment

The laboratories were requested to interpret the samples as either HER2 gene amplification positive or negative on the basis of the provided clinical case scenarios and to provide feedback on the HER2/CEP17 ratio, average HER2 signal/cell ratio, and number of counted cells for each sample. Additional information regarding laboratory characteristics, applied kits, and specific testing and interpretation procedures was also requested for further evaluation.
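
The three raw scores requested from each laboratory can be derived from per-cell signal counts. The sketch below is a minimal illustration, assuming the usual dual-probe convention that the HER2/CEP17 ratio is total HER2 signals divided by total CEP17 signals and that the average HER2 copy number is total HER2 signals divided by the number of cells, with at least 20 nonoverlapping nuclei counted. The names `CellCount` and `summarize_counts` are hypothetical and are not part of any kit software or of the scheme's reporting form.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CellCount:
    """Signals counted in one nonoverlapping tumor nucleus (hypothetical record)."""
    her2: int
    cep17: int


def summarize_counts(cells: Sequence[CellCount], min_cells: int = 20) -> dict:
    """Return the raw scores reported per case: ratio, mean HER2/cell, cells counted."""
    if len(cells) < min_cells:
        raise ValueError(f"at least {min_cells} cells must be counted, got {len(cells)}")
    total_her2 = sum(c.her2 for c in cells)
    total_cep17 = sum(c.cep17 for c in cells)
    if total_cep17 == 0:
        raise ValueError("no CEP17 signals counted; the ratio is undefined")
    return {
        # Ratio of totals (equivalently, mean HER2 / mean CEP17).
        "her2_cep17_ratio": round(total_her2 / total_cep17, 2),
        "mean_her2_per_cell": round(total_her2 / len(cells), 2),
        "cells_counted": len(cells),
    }


# Hypothetical count: 20 nuclei, each with 5 HER2 and 3 CEP17 signals.
print(summarize_counts([CellCount(her2=5, cep17=3)] * 20))
# {'her2_cep17_ratio': 1.67, 'mean_her2_per_cell': 5.0, 'cells_counted': 20}
```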

Validated results that agreed with more than 80% of the participant reports were adopted as the intended reference results [15]. The qualitative results reported by the participants were compared with the intended results. The absence of a fluorescence signal was classified as a technical failure, and results that differed from the intended reference results were classified as false negatives (FNs) or false positives (FPs). Any FN or FP result was considered a critical error because it would have affected treatment. In addition, we evaluated the characteristics of the participating laboratories and the accuracy of their HER2 GH, HER2/CEP17 ratio, and average HER2 signal/cell ratio assessments. To investigate potential sources of error, laboratory performance was analyzed by kit, assay conditions, experimental operations, and staff competence.
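
The reference-result rule and the error categories described above can be expressed as two small functions. This is a hedged sketch, not the scheme's actual software: it assumes binary positive/negative calls, represents a missing call (no fluorescence signal) as `None`, and counts such technical failures as non-matching reports in the denominator of the 80% agreement rule. The function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

POSITIVE, NEGATIVE = "positive", "negative"


def reference_result(validated: str, reports: Sequence[Optional[str]],
                     threshold: float = 0.80) -> Optional[str]:
    """Adopt the validated result only if MORE than `threshold` of reports agree.

    Assumption of this sketch: None (technical failure) counts as non-matching.
    Returns None when the validated result is not adopted.
    """
    if not reports:
        return None
    agreement = sum(r == validated for r in reports) / len(reports)
    return validated if agreement > threshold else None


def classify_report(reference: str, reported: Optional[str]) -> str:
    """Categorize one participant result against the intended reference result."""
    if reported is None:
        return "technical failure"
    if reported == reference:
        return "concordant"
    # Any discordant call is a critical error: FN if a positive was missed, else FP.
    return "false negative" if reference == POSITIVE else "false positive"


# Hypothetical sample with 10 reports: 9 negative calls and 1 positive call.
reports = [NEGATIVE] * 9 + [POSITIVE]
ref = reference_result(NEGATIVE, reports)
tally = Counter(classify_report(ref, r) for r in reports)
print(ref, dict(tally))
```

Note that exactly 80% agreement does not meet the "more than 80%" rule, so a sample with 8 of 10 matching reports would not have its validated result adopted in this sketch.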

Statistical analysis

All statistical analyses were performed using SPSS 27.0 (IBM Corp., Armonk, NY, USA). Interlaboratory agreement was assessed using Fleiss’ kappa, and performance was compared using chi-square or Fisher’s exact tests as appropriate. Continuous variables, such as the proportion of HER2 GH, the HER2/CEP17 ratio, and the average HER2 signal/cell ratio, were compared using t tests, one-way ANOVA, or Mann–Whitney U tests as appropriate. Statistical significance was set at P < 0.05.
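
Fleiss’ kappa compares the observed pairwise agreement among raters with the agreement expected by chance. The study computed it in SPSS; the pure-Python sketch below only illustrates the standard calculation and omits the confidence interval. It assumes a samples-by-categories table in which every sample was rated by the same number of laboratories, which in practice would mean excluding technical failures or restricting the analysis to a fixed set of laboratories.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    counts[i][j] is the number of raters (here, laboratories) assigning
    subject i (a PT sample) to category j. All rows must have equal sums.
    """
    if not counts:
        raise ValueError("need at least one subject")
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    n_subjects = len(counts)
    n_categories = len(counts[0])
    total = n_subjects * n_raters
    # Overall share of all assignments that fall in each category.
    p = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Per-subject agreement: proportion of rater pairs that agree.
    p_i = [(sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    p_e = sum(pj * pj for pj in p)  # agreement expected by chance
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical round: 3 samples, 10 laboratories, columns = [positive, negative].
print(round(fleiss_kappa([[10, 0], [0, 10], [9, 1]]), 3))
```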

Results

Validation of quality control materials

HE staining revealed that the typical histological structures of the QC materials were similar to those of specimens derived from BC tumors. The percentage of tumor cells on each FFPE slide ranged from 50 to 90%, further supporting the suitability of the FFPE samples for FISH and IHC. For FISH, two experienced pathologists counted and interpreted the HER2/CEP17 ratio and HER2 signal/cell ratio in the FFPE samples. The HCC1954-derived samples showed high-level HER2 gene amplification; the samples derived from a mixture of HCC1954 and MCF-7 cells, which contained more than 10% HER2-amplified cells, showed heterogeneous amplification; the BT474-derived samples showed equivocal HER2 gene amplification, with more than 90% nonclassical clones and approximately 5% HER2-amplified cells in a clustered form; and the MCF-7-derived samples showed no amplification (Fig. 1). After cross-validation by IHC (scores of 2+, 3+, 0, 3+, and 3+, respectively), the validated results for FFPE samples derived from BT474 (cases 1–5 and 2–5), HCC1954 (cases 1–2 and 2–2), MCF-7 (cases 1–1, 1–3, and 2–1), the mixture of HCC1954 and MCF-7 (cases 2–3 and 2–4), and BC tumors (case 1–4) were interpreted as negative, positive, negative, positive, and positive, respectively (Fig. 1). Notably, the validated results for the BT474-derived samples did not agree with previous studies [30, 31], possibly owing to cross-contamination of cells in culture. However, because the majority of participants interpreted these samples as negative and the discrepancy did not affect the homogeneity, stability, or detectability of the samples or the evaluation results, the BT474-derived samples were retained in this study.

Fig. 1

Validation results of quality control samples by HE staining, IHC, and FISH. HE staining was performed at 100× magnification, IHC at 200× magnification, and FISH at 1000× magnification. All quality control samples displayed a high proportion of tumor cells, appropriate staining, and appropriate hybridization. The results obtained by these methods were broadly consistent with the expected results

In addition to sample validation, homogeneity and stability assessments were conducted using FISH before the sections were distributed for each round. Five sections were randomly sampled from each type of FFPE block. The results showed that the qualitative results and counting results of sections from the same tissue block were consistent, confirming that no significant change in tumor heterogeneity was present (Table 1). Stability studies revealed that samples can be maintained for at least 1 month at ambient temperature (22 °C) and 3 months at 4 °C. Hence, these analyses confirmed that the simulated FFPE samples were homogeneous and stable as QC materials prior to sample delivery.
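
The homogeneity check above compares qualitative calls and counting results across five randomly sampled sections from one block. One way to summarise such data is sketched below, assuming hypothetical per-section HER2/CEP17 ratios; the coefficient of variation is a common summary statistic offered for illustration and is not a criterion reported in this study. The function name is hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def homogeneity_summary(sections: Sequence[Tuple[float, str]]) -> dict:
    """Summarise (HER2/CEP17 ratio, qualitative call) pairs from one FFPE block."""
    if len(sections) < 2:
        raise ValueError("at least two sections are needed to estimate spread")
    ratios = [ratio for ratio, _ in sections]
    calls = {call for _, call in sections}
    mean = statistics.mean(ratios)
    sd = statistics.stdev(ratios)  # sample standard deviation
    return {
        "mean_ratio": mean,
        "sd": sd,
        "cv": sd / mean,  # illustrative summary only, not a study criterion
        "calls_agree": len(calls) == 1,
    }


# Hypothetical results for five sections cut from one block.
sections = [(2.1, "positive"), (2.3, "positive"), (2.2, "positive"),
            (2.2, "positive"), (2.2, "positive")]
summary = homogeneity_summary(sections)
print({k: round(v, 3) if isinstance(v, float) else v for k, v in summary.items()})
```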

Characteristics of the participating laboratories

In total, 169 independent laboratories participated in this study. Of these, 115 laboratories participated in both rounds, 25 only in the first round, and 29 only in the second round (Fig. 2a). Most participants were pathology departments or clinical laboratories in public hospitals (56.80%, 96/169), followed by commercial laboratories (39.05%, 66/169). A few pathology departments and clinical laboratories in private hospitals (2.96%, 5/169) and reagent manufacturers (1.18%, 2/169) also participated (Fig. 2b).

Fig. 2

Distribution of participating laboratories with different characteristics. (a) Number of participating laboratories in each round. (b) Distribution of laboratories of different types. (c) Distribution of laboratories with different seniority levels. (d) The distribution of laboratories using different kits

According to the returned questionnaires, approximately half of the laboratories had performed HER2 FISH testing for more than five years (49.70%, 84/169), and 25.44% (43/169) had 3–5 years of experience, whereas 24.85% (42/169) had performed HER2 FISH testing for less than 3 years (Fig. 2c). The laboratories used a wide variety of test kits (Fig. 2d). The most commonly used kit was the PathVysion HER2 DNA Probe Kit (Abbott Molecular, Illinois, USA; 28.99%, 49/169), followed by the HER2 gene amplification Assay Kit by FISH (Healthcare-bio, Hubei, China; 23.08%, 39/169) and the HER2/neu gene amplification Assay Kit by FISH (LBP, Guangdong, China; 15.98%, 27/169). Notably, only two laboratories used an automated slide treatment system (FAS-1000, Orient Gene Biotech, Zhejiang, China), whereas 167 of the 169 laboratories (98.82%) performed FISH testing manually. Most participants (97.63%, 165/169) also interpreted HER2 results manually; only four used automated image analysis systems.

Performance of participating laboratories

On the basis of the feedback, 284 valid result sets (1 420 results) were reported. In the first and second rounds, 87.86% (123/140) and 72.22% (104/144) of laboratories, respectively, reported results entirely in accordance with the expected results. Overall agreement among the participating laboratories was substantial to almost perfect, with Fleiss’ kappa values of 0.911 (95% CI 0.902–0.919) and 0.765 (95% CI 0.756–0.773) in the two rounds. However, a number of inconsistent results were still identified, including 26 FNs (41.27%, 26/63), 36 FPs (57.14%, 36/63), and 1 technical failure (1.59%, 1/63). These inconsistent results occurred primarily in a subset of cases (1–5, 2–3, 2–4, and 2–5), for which the Fleiss’ kappa value dropped to 0.582 (95% CI 0.571–0.593), indicating lower agreement among laboratories for these cases (Table 2).

Table 2 Testing concordance between laboratories with potential reasons for interlaboratory discordance

Further analysis of the sources of inconsistency identified two distinct categories of discordance, of which intratumoral genetic heterogeneity accounted for the largest share. For case 2–3, which showed heterogeneous amplification (> 10% cohesive, highly HER2-amplified cells), nine laboratories (6.25%, 9/144) incorrectly interpreted the result as no amplification. Of these, four (44.44%, 4/9) did not observe the highly HER2-amplified cells, four (44.44%, 4/9) estimated the size of the amplified clone to be < 10%, and the remaining laboratory (11.11%, 1/9) interpreted the result as no HER2 amplification despite estimating > 10% highly HER2-amplified cells. Similarly, of the 17 laboratories (11.81%, 17/144) that incorrectly interpreted case 2–4 (a simulated core biopsy specimen derived from the same cells as case 2–3), four (23.53%, 4/17) did not observe the highly HER2-amplified cells, and 13 (76.47%, 13/17) estimated the size of the amplified clone to be < 10%. For cases 1–5 and 2–5, which were characterized by both HER2 GH (< 10% HER2-amplified cells) and HER2 signals near the critical cutoff range, seven laboratories (5.00%, 7/140) and 16 laboratories (11.11%, 16/144), respectively, overestimated the proportion of HER2 GH (estimated clone size > 10%). Among these, three and seven laboratories, respectively, even reported case 1–5 and case 2–5 as HER2 positive by simply “cherry picking” the amplified cells.
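
The effect of “cherry picking” can be illustrated with a toy slide. The sketch assumes hypothetical signal counts, not data from this scheme: 200 tumor cells in which a clustered clone of 10 cells (5%) carries 12 HER2 and 2 CEP17 signals and every other cell carries 3 HER2 and 2 CEP17 signals. It compares the clone fraction with the 10% threshold used for the expected results here, and contrasts a count of 20 contiguous cells from a representative area with a count of the 20 cells showing the most HER2 signals.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (HER2 signals, CEP17 signals) in one nucleus


def her2_cep17_ratio(cells: List[Cell]) -> float:
    """Total HER2 signals divided by total CEP17 signals."""
    return sum(h for h, _ in cells) / sum(c for _, c in cells)


# Hypothetical slide: the amplified clone sits in one area among background cells.
background: List[Cell] = [(3, 2)] * 190
clone: List[Cell] = [(12, 2)] * 10
slide = background[:95] + clone + background[95:]

clone_fraction = len(clone) / len(slide)
# True only if the clone exceeds 10% of tumor cells (False for this 5% clone).
needs_separate_count = clone_fraction > 0.10

# Representative count: 20 contiguous cells from an area away from the clone.
representative = slide[:20]
# Cherry-picked count: the 20 cells with the most HER2 signals.
cherry_picked = sorted(slide, key=lambda cell: cell[0], reverse=True)[:20]

print(f"clone fraction: {clone_fraction:.0%}, separate count: {needs_separate_count}")
print(f"whole-slide ratio:    {her2_cep17_ratio(slide):.2f}")
print(f"representative ratio: {her2_cep17_ratio(representative):.2f}")
print(f"cherry-picked ratio:  {her2_cep17_ratio(cherry_picked):.2f}")
```

With these assumed counts, the representative area gives a ratio of 1.5 with 3.0 HER2 signals/cell, whereas the cherry-picked cells give a ratio of 3.75 with 7.5 signals/cell, enough to turn a nonamplified pattern into an apparent positive.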

In addition to intratumoral genetic heterogeneity, HER2 signals near the critical cutoff range also contributed to the lack of consensus in our scheme. For cases 1–5 and 2–5, nine laboratories (6.43%, 9/140) and four laboratories (2.78%, 4/144), respectively, made errors in counting the background cells, excluding errors caused by incorrect heterogeneity estimation. In case 1–5, for example, seven of these laboratories (77.78%, 7/9) classified the tumor cells as group 3 (HER2/CEP17 ratio < 2.0 and average HER2 copy number ≥ 6.0 signals/cell), and two (22.22%, 2/9) classified them as group 1 (HER2/CEP17 ratio ≥ 2.0 and average HER2 copy number ≥ 4.0 signals/cell).
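
The group definitions cited above come from the 2018 ASCO/CAP dual-probe ISH algorithm [5]. As a minimal sketch, assuming the ratio and average HER2 copy number have already been computed from a valid count, group assignment reduces to thresholds on those two values. The function name is hypothetical, and the sketch deliberately stops at the group: in the guideline, groups 2–4 require concurrent IHC review before a final call, which is not modelled here.

```python
def asco_cap_2018_group(ratio: float, mean_her2: float) -> int:
    """Assign a dual-probe ISH result to ASCO/CAP 2018 group 1-5.

    ratio: HER2/CEP17 ratio; mean_her2: average HER2 signals per cell.
    """
    if ratio >= 2.0:
        # Groups 1 and 2 are split by an average HER2 copy number of 4.0.
        return 1 if mean_her2 >= 4.0 else 2
    if mean_her2 >= 6.0:
        return 3
    if mean_her2 >= 4.0:
        return 4
    return 5


# Only groups 1 and 5 have a direct initial call; groups 2-4 need IHC workup.
INITIAL_CALL = {1: "positive", 5: "negative"}

for ratio, mean_her2 in [(2.5, 5.0), (2.2, 3.5), (1.6, 6.4), (1.6, 4.2), (1.6, 3.9)]:
    group = asco_cap_2018_group(ratio, mean_her2)
    print(ratio, mean_her2, group, INITIAL_CALL.get(group, "requires IHC workup"))
```

The last two hypothetical inputs differ by only 0.3 signals/cell yet fall into groups 4 and 5, which illustrates how small enumeration differences in the nonamplified areas of cases 1–5 and 2–5 could change the category.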

Notably, even among laboratories that reported the correct qualitative result, variability in HER2 GH assessment and HER2 signal enumeration was identified for challenging cases. In case 1–5, for example, of the 124 participants whose qualitative results were consistent with the intended results, only 41.94% (52/124) estimated the size of the clone to be between 1 and 10% with clustered signals, whereas 58.06% (72/124) reported that no HER2 GH was observed. Moreover, the reported HER2/CEP17 ratios and average HER2 signal/cell ratios of the nonamplified areas of the tumor ranged from 0.89 to 1.72 and from 2.05 to 5.30, respectively (Table 2), a spread wide enough to place these areas in different categories (group 4 or group 5). Similarly high variability in HER2 GH assessment and HER2 signal enumeration was observed both within and between laboratories for case 2–5. Because case 2–5 was a repeat of case 1–5, the within-laboratory variability further revealed the unsatisfactory reproducibility of HER2 FISH testing (Fig. 3).

Fig. 3

Horizontal and longitudinal comparisons of the proportion of HER2 genetic heterogeneity, average HER2 signal/cell ratio, and HER2/CEP17 ratio reported by the 115 laboratories that submitted interpretations for both case 1–5 and case 2–5. (a) Distribution of the reported proportion of HER2 genetic heterogeneity for cases 1–5 and 2–5. (b) Distribution of the reported average HER2 signals/cell in the nonamplified areas of cases 1–5 and 2–5. (c) Distribution of the reported HER2/CEP17 ratios in the nonamplified areas of cases 1–5 and 2–5

Further statistical analysis of participant results revealed no significant differences among most kits (P > 0.05), with the exception of the HER2 gene amplification assay kit by FISH produced by GpMedical (P < 0.05) (Fig. 4). This suggests that differences in experimental methods, staff competence, and assay conditions are more likely than the kits to cause errors and variability. No association was observed between errors and either laboratory type or seniority level (both P > 0.05; Fig. 4). According to the questionnaires, most laboratories had updated their operating and interpretation criteria on the basis of the 2018 ASCO/CAP guideline; however, only 48.8% (82/168) of the participants performed analytical validation after the changes. Statistical analysis revealed that laboratories that did not perform validation after the change made more errors (P < 0.05) than those that did. Moreover, 75.6% (121/160) of the participants had not established a standard interpretation procedure. Although the error rate did not differ significantly between laboratories with and without interpretation procedures, greater variability in HER2 GH assessment and HER2 signal enumeration was observed in laboratories without them. These findings suggest that nonstandardized experimental operations and interpretation may account for much of the variability.
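
Comparing error frequency between laboratories that did and did not validate their updated procedures is a 2×2 comparison of the kind analysed with Fisher’s exact test when counts are small. The study used SPSS; the pure-Python sketch below only illustrates the two-sided calculation, which sums the probabilities of all tables with the same margins that are no more likely than the observed one. The example uses a textbook table, not this study’s counts, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be 'validated' / 'not validated' laboratories and columns
    'with error' / 'without error'.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included.
    tables = (prob(x) for x in range(lo, hi + 1))
    return min(1.0, sum(p for p in tables if p <= p_observed * (1 + 1e-7)))


print(round(fisher_exact_two_sided(1, 9, 11, 3), 4))  # 0.0028
```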

Fig. 4

Concordance rates with expected results across each PT scheme round. (a) Concordance rates with expected results of laboratories of different types across each PT scheme round. (b) Concordance rates with the expected results of laboratories with different seniority levels across each PT scheme round. (c) Concordance rates with the expected results of laboratories using different kits across each PT scheme round. Errors include false-positive and false-negative results in all quality control samples

Discussion

Strict standardization of HER2 FISH testing is necessary for accurate HER2 status determination, so that the patients most likely to benefit from HER2-targeted therapy are identified and unnecessary treatment of patients unlikely to respond is avoided [7, 32, 33]. To investigate the latest interlaboratory variability and improve the reliability of HER2 FISH testing in Chinese clinical laboratories, we conducted a multicenter PT ring study in China.

Compared with existing PT schemes for HER2 FISH testing [15,16,17,18], our scheme included not only typical HER2-amplified and nonamplified samples but also a series of challenging samples that simulate real BC tissue with different HER2 FISH signal patterns or different levels of HER2 GH. Hence, the testing performance of all participating laboratories, especially the reproducibility of FISH near the critical cutoff range and the accuracy of complex HER2 GH assessment, was comprehensively evaluated.

Over the two rounds of evaluation, 1 420 results reported by 169 laboratories were analyzed. The results show that the real-world performance of HER2 FISH testing may be suboptimal, as only 72.22–87.86% of laboratories reported results fully in accordance with the expected results in each round. Consistent with reports from other PT providers [16, 17], the participating laboratories showed excellent accuracy for samples with a low (nonamplified) or high (amplified) HER2 gene copy number. However, as expected, congruence was low for challenging samples with different HER2 FISH signal patterns or different HER2 GHs, despite the laboratories’ self-reported high accuracy. We also observed high variability in HER2 GH assessment and HER2 signal enumeration among laboratories whose interpretations of challenging cases were consistent with the expected results. Horizontal and longitudinal comparison of the proportion of HER2 GH, average HER2 signal/cell ratio, and HER2/CEP17 ratio reported for case 1–5 and case 2–5 further revealed unsatisfactory reproducibility. Further analysis revealed no significant differences in error frequency, GH assessment, or HER2 signal enumeration among laboratory types or seniority levels, highlighting that the HER2 status of challenging samples is difficult to determine even for experienced laboratories. Because the samples in our study were validated as relatively homogeneous, the confounding influence of spatial heterogeneity within the same tissue block was eliminated. Analysis of the sources and nature of the variability suggested that the unsatisfactory performance and poor reproducibility may be attributable in part to specific poorly performing reagents, but even more to unsatisfactory testing and interpretation practices.

On the basis of the questionnaire feedback, we consider the lack of validation to be one potential contributor to unsatisfactory testing and interpretation. In our study, only 48.8% of the participants performed analytical validation after updating their operating procedures and interpretation criteria. An unvalidated testing system may produce technical failures or fluctuations in staining and thus inaccurate results and interpretations; therefore, as with other diagnostic tests, HER2 FISH detection should be rigorously evaluated before it is used in the diagnostic setting. Notably, samples with borderline HER2 results should be included in validation [34, 35]. In practice, some laboratories treat participation in PT schemes as a substitute for validation. However, participation in conventional PT schemes that use only strongly positive samples cannot ensure adequate performance for cases with low HER2 copy numbers, especially rare scenarios with results near the ASCO/CAP cutoffs. Hence, in addition to regular participation in rigorous PT schemes, robust method validation is essential for high-quality analysis.

The lack of scoring and interpretation procedures is another issue. Our results showed that high variability in HER2 GH assessment and HER2 signal enumeration was concentrated in laboratories that had not developed interpretation procedures. Because unstandardized scoring and interpretation lead to high observer subjectivity in identifying heterogeneity, selecting regions of interest, and enumerating signals among pathologists who score FISH [23], a robust scoring and interpretation procedure should be established. Several points deserve attention when doing so. First, the entire slide must be screened before counting, so that no tumor population with increased HER2 signals is missed and the percentage of heterogeneous cells is estimated accurately. Especially for patients with mild-to-moderate HER2 protein expression, special attention should be paid to genetic heterogeneity during FISH, because marked intratumoral variation in HER2 copy number or the presence of a small but distinct clone may warrant consideration of a follow-up biopsy [5, 24, 27]. Second, once counting begins, representative regions of contiguous, nonoverlapping cells should be scored to avoid “cherry picking”, which can lead to false-positive results. The laboratory should also establish specific processes for interpreting different patterns of heterogeneous amplification, such as simple heterogeneous amplification, complex heterogeneous amplification, and isolated amplified tumor cells [27, 36]. Finally, because every reader is subject to counting variability, it is important to establish a review system in which different FISH readers score some of the same cases. Beyond minimizing scoring variability, such a system can also help the laboratory detect scoring trends among FISH readers.
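
One way to monitor a review system in which two FISH readers score the same cases is to track chance-corrected agreement between them. The sketch below uses Cohen’s kappa as an illustration, assuming each reader assigns one category per case (for example, an ASCO/CAP group or a positive/negative call); the reader data are hypothetical, and this statistic is not a procedure reported in the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(reader_a: Sequence[Hashable], reader_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two readers who each categorize the same cases."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must score the same, non-empty set of cases")
    n = len(reader_a)
    observed = sum(x == y for x, y in zip(reader_a, reader_b)) / n
    counts_a, counts_b = Counter(reader_a), Counter(reader_b)
    # Chance agreement from each reader's marginal category frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in set(counts_a) | set(counts_b)) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when both readers use one identical category")
    return (observed - expected) / (1 - expected)


# Hypothetical audit of 50 cases scored by two readers.
reader_a = ["positive"] * 25 + ["negative"] * 25
reader_b = ["positive"] * 20 + ["negative"] * 5 + ["positive"] * 10 + ["negative"] * 15
print(round(cohen_kappa(reader_a, reader_b), 2))  # 0.4
```

Tracking this value over time, per pair of readers, is one simple way to notice the scoring trends mentioned above.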

Although our study has certain advantages, it also has several limitations. First, the xenograft specimens were relatively homogeneous and cannot fully simulate real clinical specimens, especially the tissue architecture of breast cancer. Because pre-analytical factors (such as identifying invasive tumor) can influence the results, the substantial agreement observed with homogeneous xenograft specimens may not fully reflect the performance of HER2 FISH testing in real-life clinical practice. Although the laboratories also showed high consistency for the clinical specimen in this study, only one clinical specimen was evaluated, so there may still be room for Chinese laboratories to improve in the pre-testing phase. Second, owing to insufficient participant data for digital scoring, we could not analyze concordance between digital and manual FISH scoring. Because these scoring methods may influence interlaboratory concordance, further investigation is necessary. Third, the limited number of participating laboratories may restrict the generalizability of our findings to all laboratories performing HER2 FISH testing; larger future studies could improve representativeness.

Conclusion

Our study evaluated current practices of FISH-based HER2 amplification detection in laboratories in China. Despite many years of effort, standardization of HER2 status assessment is still far from achieved. To improve performance, laboratories should pay close attention to rigorous validation and standardized operating and interpretation procedures. We also call for stronger regulation of these aspects during laboratory accreditation.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

HER2: Human epidermal growth factor receptor 2

BCs: Breast cancers

IHC: Immunohistochemistry

FISH: Fluorescence in situ hybridization

CAP: College of American Pathologists

ASCO: American Society of Clinical Oncology

PT: Proficiency testing

NEQAS: UK National External Quality Assessment Scheme

QC: Quality control

GH: Genetic heterogeneity

FFPE: Formalin-fixed and paraffin-embedded

HE: Hematoxylin and eosin

FNs: False negatives

FPs: False positives

References

  1. Wilcock P, Webster RM. The breast cancer drug market. Nat Rev Drug Discov. 2021;20:339–40.

  2. Molinelli C, Jacobs F, Agostinetto E, et al. Prognostic value of HER2-low status in breast cancer: a systematic review and meta-analysis. ESMO Open. 2023;8:101592.

  3. Mercogliano MF, Bruni S, Mauro FL, Schillaci R. Emerging Targeted Therapies for HER2-Positive Breast Cancer. Cancers (Basel). 2023;15.

  4. Rakha EA, Tan PH, Quinn C, et al. UK recommendations for HER2 assessment in breast cancer: an update. J Clin Pathol. 2023;76:217–27.

  5. Wolff AC, Hammond MEH, Allison KH, et al. Human Epidermal Growth Factor Receptor 2 Testing in Breast Cancer: American Society of Clinical Oncology/College of American Pathologists Clinical Practice Guideline Focused Update. J Clin Oncol. 2018;36:2105–22.

  6. Recommended by Breast Cancer Expert Panel. Guideline for HER2 detection in breast cancer, the 2019 version. Zhonghua Bing Li Xue Za Zhi. 2019;48:169–75.

  7. NCCN. Clinical Practice Guidelines in Oncology: Breast Cancer. 2023.

  8. Wolff AC, Somerfield MR, Dowsett M, et al. Human Epidermal Growth Factor Receptor 2 Testing in Breast Cancer: ASCO-College of American Pathologists Guideline Update. J Clin Oncol. 2023;41:3867–72.

  9. Wolff AC, Hammond ME, Hicks DG, et al. Recommendations for human epidermal growth factor receptor 2 testing in breast cancer: American Society of Clinical Oncology/College of American Pathologists clinical practice guideline update. J Clin Oncol. 2013;31:3997–4013.

  10. Shaaban AM, Purdie CA, Bartlett JM, et al. HER2 testing for breast carcinoma: recommendations for rapid diagnostic pathways in clinical practice. J Clin Pathol. 2014;67:161–7.

  11. Md Pauzi SH, Masir N, Yahaya A, et al. HER2 testing by immunohistochemistry in breast cancer: A multicenter proficiency ring study. Indian J Pathol Microbiol. 2021;64:677–82.

  12. Agersborg S, Mixon C, Nguyen T, et al. Immunohistochemistry and alternative FISH testing in breast cancer with HER2 equivocal amplification. Breast Cancer Res Treat. 2018;170:321–8.

  13. Ohnishi C, Ohnishi T, Ntiamoah P, Ross DS, Yamaguchi M, Yagi Y. Standardizing HER2 immunohistochemistry assessment: calibration of color and intensity variation in whole slide imaging caused by staining and scanning. Appl Microsc. 2023;53:8.

  14. Robbins CJ, Fernandez AI, Han G, et al. Multi-institutional Assessment of Pathologist Scoring HER2 Immunohistochemistry. Mod Pathol. 2023;36:100032.

  15. Geiersbach KB, Bridge JA, Dolan M, et al. Comparative Performance of Breast Cancer Human Epidermal Growth Factor Receptor 2 Fluorescence In Situ Hybridization and Brightfield In Situ Hybridization on College of American Pathologists Proficiency Tests. Arch Pathol Lab Med. 2018;142:1254–9.

  16. Bartlett JM, Ibrahim M, Jasani B, et al. External quality assurance of HER2 fluorescence in situ hybridisation testing: results of a UK NEQAS pilot scheme. J Clin Pathol. 2007;60:816–9.

  17. Persons DL, Tubbs RR, Cooley LD, et al. HER-2 fluorescence in situ hybridization: results from the survey program of the College of American Pathologists. Arch Pathol Lab Med. 2006;130:325–31.

  18. Dowsett M, Hanna WM, Kockx M, et al. Standardization of HER2 testing: results of an international proficiency-testing ring study. Mod Pathol. 2007;20:584–91.

  19. Chen Y, Liu L, Ni R, Zhou W. Advances in HER2 testing. Adv Clin Chem. 2019;91:123–62.

  20. Qaiser T, Mukherjee A, Reddy Pb C, et al. HER2 challenge contest: a detailed assessment of automated HER2 scoring algorithms in whole slide images of breast cancer tissues. Histopathology. 2018;72:227–38.

  21. Furrer D, Sanschagrin F, Jacob S, Diorio C. Advantages and disadvantages of technologies for HER2 testing in breast cancer specimens. Am J Clin Pathol. 2015;144:686–703.

  22. Wolff AC, Somerfield MR, Dowsett M, et al. Human Epidermal Growth Factor Receptor 2 Testing in Breast Cancer: ASCO-College of American Pathologists Guideline Update. J Clin Oncol. 2023:JCO2202864.

  23. Grimm EV, Allison KH, Hicks DG, et al. HER2 Testing: Insights From Pathologists’ Perspective on Technically Challenging HER2 FISH Cases. Appl Immunohistochem Mol Morphol. 2021;29:635–42.

  24. Filho OM, Viale G, Stein S, et al. Impact of HER2 Heterogeneity on Treatment Response of Early-Stage HER2-Positive Breast Cancer: Phase II Neoadjuvant Clinical Trial of T-DM1 Combined with Pertuzumab. Cancer Discov. 2021;11:2474–87.

  25. Crespo J, Sun H, Wu J, et al. Rate of reclassification of HER2-equivocal breast cancer cases to HER2-negative per the 2018 ASCO/CAP guidelines and response of HER2-equivocal cases to anti-HER2 therapy. PLoS ONE. 2020;15:e0241775.

  26. Lin L, Sirohi D, Coleman JF, Gulbahce HE. American Society of Clinical Oncology/College of American Pathologists 2018 Focused Update of Breast Cancer HER2 FISH Testing Guidelines: Results From a National Reference Laboratory. Am J Clin Pathol. 2019;152:479–85.

  27. Hanna WM, Ruschoff J, Bilous M, et al. HER2 in situ hybridization in breast cancer: clinical implications of polysomy 17 and genetic heterogeneity. Mod Pathol. 2014;27:4–18.

  28. Schwarz LJ, Hutchinson KE, Rexer BN, et al. An ERBB1-3 Neutralizing Antibody Mixture With High Activity Against Drug-Resistant HER2+ Breast Cancers With ERBB Ligand Overexpression. J Natl Cancer Inst. 2017;109.

  29. Syed AK, Woodall R, Whisenant JG, Yankeelov TE, Sorace AG. Characterizing Trastuzumab-Induced Alterations in Intratumoral Heterogeneity with Quantitative Imaging and Immunohistochemistry in HER2+ Breast Cancer. Neoplasia. 2019;21:17–29.

  30. Rondon-Lagos M, Verdun Di Cantogno L, Rangel N, et al. Unraveling the chromosome 17 patterns of FISH in interphase nuclei: an in-depth analysis of the HER2 amplicon and chromosome 17 centromere by karyotyping, FISH and M-FISH in breast cancer cells. BMC Cancer. 2014;14:922.

  31. Vargas-Rondon N, Perez-Mora E, Villegas VE, Rondon-Lagos M. Role of chromosomal instability and clonal heterogeneity in the therapy response of breast cancer cell lines. Cancer Biol Med. 2020;17:970–85.

  32. Wang CW, Khalil MA, Lin YJ, Lee YC, Chao TK. Detection of ERBB2 and CEN17 signals in fluorescent in situ hybridization and dual in situ hybridization for guiding breast cancer HER2 target therapy. Artif Intell Med. 2023;141:102568.

  33. Tarantino P, Viale G, Press MF, et al. ESMO expert consensus statements (ECS) on the definition, diagnosis, and management of HER2-low breast cancer. Ann Oncol. 2023;34:645–59.

  34. CLSI. Fluorescence In Situ Hybridization Methods for Clinical Laboratories, 2nd Edition. CLSI MM07-A2. Clinical and Laboratory Standards Institute; 2018.

  35. Cell Markers and Cytogenetics Committees, College of American Pathologists. Clinical laboratory assays for HER-2/neu amplification and overexpression: quality assurance, standardization, and proficiency testing. Arch Pathol Lab Med. 2002;126:803–8.

  36. Marchio C, Annaratone L, Marques A, Casorzo L, Berrino E, Sapino A. Evolving concepts in HER2 evaluation in breast cancer: Heterogeneity, HER2-low carcinomas and beyond. Semin Cancer Biol. 2021;72:123–35.

Acknowledgements

The authors would like to thank LBP, Inc., and Daan Gene, Inc., for their technical support.

Funding

This work was supported by the Dongcheng District Outstanding Talent Nurturing Program (No. 2020-dchrcpyzz-30 to R.P.). The funding sources had no role in the conceptualization, design, data collection, analysis, decision to publish, or preparation of the manuscript.

Author information

Contributions

R.P. and J.L. conceived and designed the study; R.P., K.Z., and G.L. acquired, analyzed, and interpreted the data and performed the statistical analysis; R.P. and J.L. wrote, reviewed, and revised the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Rongxue Peng or Jinming Li.

Ethics declarations

Ethics approval and consent to participate

All animal experiments complied with the ARRIVE guidelines and were performed in accordance with the ethical standards of the institution or practice at which the studies were conducted. All applicable international, national, and/or institutional guidelines for the care and use of animals were followed. The experiments were approved by the Institutional Animal Care and Use Committee (IACUC) of Beijing Vital River Laboratory Animal Technology Co., Ltd. (Ethics approval number: P2019060). As per the IACUC guidelines, the maximum tumor size allowed is 1.5 cm³ to ensure animal welfare. We confirm that no tumor exceeded the IACUC limits in our study.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic Supplementary Material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Peng, R., Zhang, K., Lin, G. et al. Interlaboratory variability of HER2 fluorescence in situ hybridization testing in breast cancer: results of a multicenter proficiency-testing ring study in China. Diagn Pathol 19, 161 (2024). https://doi.org/10.1186/s13000-024-01588-w

  • DOI: https://doi.org/10.1186/s13000-024-01588-w

Keywords