F.J. Broekmans, J. Kwee, D.J. Hendriks, B.W. Mol, C.B. Lambalk, A systematic review of tests predicting ovarian reserve and IVF outcome, Human Reproduction Update, Volume 12, Issue 6, November/December 2006, Pages 685–718, https://doi.org/10.1093/humupd/dml034
Abstract
The age-related decline in the success of IVF is largely attributable to a progressive decline in ovarian oocyte quality and quantity. Over the past two decades, a number of so-called ovarian reserve tests (ORTs) have been designed to determine oocyte reserve and quality and have been evaluated for their ability to predict the outcome of IVF in terms of oocyte yield and occurrence of pregnancy. Many of these tests have become part of the routine diagnostic work-up of infertility patients who undergo assisted reproductive techniques. The unifying goals have traditionally been to find out how a patient will respond to stimulation and what her chances of pregnancy are. Evidence-based medicine has progressively developed as the standard approach for many diagnostic procedures and treatment options in the field of reproductive medicine. Here we provide the first comprehensive systematic literature review, including a priori protocolized information retrieval, on all currently available and applied tests, namely early-follicular-phase blood values of FSH, estradiol, inhibin B and anti-Müllerian hormone (AMH), the antral follicle count (AFC), the ovarian volume (OVVOL) and the ovarian blood flow, as well as the Clomiphene Citrate Challenge Test (CCCT), the exogenous FSH ORT (EFORT) and the gonadotrophin-releasing hormone (GnRH) agonist stimulation test (GAST), all as measures to predict ovarian response and chance of pregnancy. Where possible, we provide an integrated receiver operating characteristic (ROC) analysis and curve of all individual evaluated published papers for each test, as well as a formal judgement of its clinical value. Our analysis shows that the ORTs known to date have only modest-to-poor predictive properties and are therefore far from suitable for meaningful clinical use. The accuracy of testing for the occurrence of poor ovarian response to hyperstimulation appears to be modest.
Whether the a priori identification of actual poor responders in the first IVF cycle has any prognostic value for their chances of conception over a series of IVF cycles remains to be established. The accuracy of predicting the occurrence of pregnancy is very limited. If a high threshold is used to prevent couples from being wrongly refused IVF, only a very small minority of IVF-indicated cases (∼3%) are identified as having unfavourable prospects in an IVF treatment cycle. Although most ORTs are inexpensive and not very demanding, the use of any ORT for outcome prediction cannot be supported. As a poor ovarian response will itself provide some information on ovarian reserve (OR) status, especially if the stimulation is maximal, entering the first cycle of IVF without any prior testing seems to be the preferable strategy.
Introduction
In Western societies, the introduction of reliable methods of contraception in the 1960s has led to fewer children being born per family. Driven by rising levels of female education, growing participation in the labour force and career demands, postponement of childbearing has been a secondary consequence of the so-called sexual revolution (Leridon, 1998). These societal changes in family planning have caused a significant increase in the incidence of unwanted infertility due to female reproductive ageing (Weinstein et al., 1993; Abma et al., 1997; Ventura et al., 2001).
From studies on natural populations in which no consistent methods of birth control are applied, it has been shown that natural fertility starts to decline after the age of 30, accelerates in the mid-30s and leads to sterility at a mean age of 41 (Spira, 1988; Wood, 1989; te Velde and Pearson, 2002) (Figure 1). The reduction in female fertility can also be shown in contemporary population studies. The chance of not conceiving a first child within one year increases from under 5% in women in their early 20s to approximately 30% or more in women aged 35 years and older (Abma et al., 1997). So, although the majority of older women will still obtain the desired pregnancy within one year, their chance of becoming subfertile is ∼6-fold that of very young women.
The age-related effect on female fertility has also been shown in numerous reports on the results of IVF treatment in infertile couples. The probability of live birth obtained through IVF treatment clearly decreases after the age of 35 (Anonymous, 1995; Templeton et al., 1996) and the same has been shown to be true for the implantation rate per embryo (van Kooij et al., 1996). In fact, female age has consistently been shown to be an important predictor of success in IVF treatment.
Over the past two decades, a number of so-called ovarian reserve tests have been studied for their ability to predict the outcome of IVF in terms of oocyte yield and occurrence of pregnancy. Some of these tests have become part of the routine diagnostic work-up of infertility patients who will undergo assisted reproductive techniques. With the current work we aim to answer the question of what the true value of these tests is for patient management. Evidence-based medicine has progressively developed as the standard approach for many diagnostic procedures and treatment options in the field of reproductive medicine (National Collaborating Center for Women’s and Children’s Health, 2004). We therefore provide a comprehensive systematic literature review, including a priori protocolized information retrieval, on all currently available and applied tests to determine ovarian reserve (OR).
What follows is first a general section in which we briefly outline the aims and evaluation of OR testing and the set-up of the systematic review. After this, we describe each currently available test individually, together with its effectiveness in predicting ovarian response and pregnancy after IVF, expressed in generally accepted terms for diagnostic procedures. A unique feature of this systematic review is that, where possible, we also provide an integrated receiver operating characteristic (ROC) analysis and curve of all individual evaluated published papers for each test, as well as a formal judgement of its clinical value.
The assessment of OR
OR can be considered normal when stimulation with exogenous gonadotrophins results in the development of at least 8–10 follicles and the retrieval of a corresponding number of healthy oocytes at follicle puncture (Fasouliotis et al., 2000). With such a yield, the chances of producing a live birth through IVF are considered optimal. In general, as outlined earlier, the woman’s age is a simple way of obtaining information on the extent of her OR, in terms of both quantity and quality (Templeton et al., 1996). However, in view of the substantial variation in the decline of reproductive capacity with age (te Velde and Pearson, 2002) (Figure 2), there is a need to identify relatively young women with clearly diminished reserve, as well as women around the mean age at which natural fertility is lost (41 years) who still have adequate OR. In clinical terms, we aim to identify women with a high risk of a poor response to ovarian stimulation and/or a very low probability of becoming pregnant through IVF, as well as those who still produce enough oocytes to have a good chance of pregnancy despite advanced female age. If such categories of women can be identified, management could be individualized, for instance by adjusting the stimulation dose or treatment scheme (Tarlatzis et al., 2003), by counselling against initiation of IVF treatment or even refusing to initiate it, or by indicating the need to start treatment early, before reserve has diminished too far.
OR is currently defined as the number and quality of the follicles left in the ovary at any given time. An accurate measure of quantitative OR would involve counting all follicles present in both ovaries, as is done in post-mortem studies (Block, 1952). For obvious reasons, the true size of the follicle pool has not been used as the benchmark for evaluation in OR testing (Lass et al., 1997a; Lambalk et al., 2004; Lass, 2004; Sharara and Scott, 2004), with the exception of one study (Gulekli et al., 1999) in which whole-ovary follicle counts served as the reference for several OR tests (ORTs). Instead, studies on diagnostic accuracy use several proxy variables for pool size, such as ovarian response to hyperstimulation with exogenous FSH in IVF and the occurrence of menopause or the menopausal transition, as these events are quantitatively determined. Although related, the quality of the oocyte released from the dominant follicle at ovulation represents the other aspect of ovarian reserve. Proxy variables currently used for oocyte quality are the probability of pregnancy in infertility treatments such as IUI and IVF, or during the follow-up of couples during and after the initial infertility work-up.
We should therefore bear in mind that, in the vast majority of the ORT studies discussed below, either ovarian response or occurrence of pregnancy in IVF serves as the benchmark for judging the accuracy and clinical value of the test under study. Ovarian response to adequate stimulation may be considered the most accurate, though still indirect, representation of the status of the primordial follicle pool, as it is a condition that is continuously present in the individual undergoing the test. In contrast, the occurrence of pregnancy in such an individual may be influenced by many more factors than oocyte (and hence embryo) quality alone. Only if the occurrence of pregnancy is studied over a series of treatment cycles can it serve as a solid proxy variable for ovarian reserve. Most ORTs are quite adequate in predicting ovarian response, but often fail to correctly predict the occurrence of pregnancy, especially if only one IVF cycle was studied.
Properties of test evaluation
ORT evaluation using response and/or pregnancy as reference or outcome variables should involve the assessment of both the predictive accuracy and the clinical value of the test. Accuracy refers to the degree to which the outcome condition is predicted correctly. Summary statistics of accuracy include sensitivity (rate of correct identification of cases with poor response), specificity (rate of correct identification of cases without poor response), likelihood ratio (LR, how many times more likely particular test results are in patients with poor response than in those without poor response) and diagnostic odds ratio (DOR, the odds of a positive test result in cases with poor response over the odds of a positive test result in those without poor response) (Deeks, 2001; Grimes and Schulz, 2005). To identify all cases that will respond poorly to stimulation without wrongly classifying many normal responders, the test must have both high sensitivity and high specificity.
Positive LRs above 10 and negative LRs below 0.1 are considered indicators of an adequate diagnostic test, while positive LRs between 5 and 10 and negative LRs between 0.1 and 0.2 are considered to indicate a moderate test. The LR can thus be considered a clinically useful tool for judging the performance of a test, as its value changes when the threshold for an abnormal test is shifted.
The diagnostic odds ratio is an appropriate measure when combining studies in a systematic review, because a single DOR corresponds to a set of sensitivity–specificity pairs lying on one ROC curve and is therefore considered threshold independent (Figure 3). It can thus be considered a good parameter for comparing the overall accuracy of a test evaluated in different studies. Although DOR values will be higher for tests with better combinations of sensitivity and specificity, the DOR has not been advocated as a single measure of clinical value, as changes in the threshold used are not reflected in a change in DOR value. In the meta-analytic approach, the range of DOR values across studies gives some indication of the homogeneity of those studies.
Finally, the area under the ROC curve (AUC) provides information on the overall discriminatory capacity of the test: a value of 1.0 implies perfect discrimination and a value of 0.5 implies no discrimination at all.
Clinical value concerns the question of whether applying the test at a certain threshold will really change management, costs, safety or success rates on a population basis. It deals with the valuation of false-positive and false-negative test results in relation to the consequences of these results for clinical decisions. It also involves the rate of abnormal test results leading to altered decisions within the population of interest.
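The summary statistics defined above can all be computed from a single 2 × 2 table. The Python sketch below uses hypothetical counts (not taken from any study in this review) and grades the LRs with the cut-offs quoted in the text; the class and function names, and the label "weak" for LRs outside both ranges, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Test result (abnormal/normal) cross-classified against poor response."""
    tp: int  # abnormal test, poor response
    fp: int  # abnormal test, normal response
    fn: int  # normal test, poor response
    tn: int  # normal test, normal response

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def lr_positive(self) -> float:
        # P(abnormal test | poor response) / P(abnormal test | normal response)
        return self.sensitivity / (1 - self.specificity)

    @property
    def lr_negative(self) -> float:
        # P(normal test | poor response) / P(normal test | normal response)
        return (1 - self.sensitivity) / self.specificity

    @property
    def dor(self) -> float:
        # Odds of an abnormal test in poor vs normal responders; equals LR+ / LR-
        return (self.tp * self.tn) / (self.fp * self.fn)


def grade_positive_lr(lr: float) -> str:
    # Cut-offs quoted in the text: >10 adequate, 5 to 10 moderate
    if lr > 10:
        return "adequate"
    if lr >= 5:
        return "moderate"
    return "weak"


def grade_negative_lr(lr: float) -> str:
    # Cut-offs quoted in the text: <0.1 adequate, 0.1 to <0.2 moderate
    if lr < 0.1:
        return "adequate"
    if lr < 0.2:
        return "moderate"
    return "weak"


# Hypothetical example: 40 poor responders and 160 normal responders
table = TwoByTwo(tp=30, fp=20, fn=10, tn=140)
print(table.sensitivity, table.specificity, table.lr_positive, table.lr_negative, table.dor)
print(grade_positive_lr(table.lr_positive), grade_negative_lr(table.lr_negative))
```

With these counts LR+ is 6 (moderate) while LR− is about 0.29, outside the moderate range, so a test can look acceptable on one side and weak on the other.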
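To see why one DOR corresponds to a whole set of sensitivity–specificity pairs, one can fix the DOR and solve for the sensitivity at each specificity. This Python sketch assumes a symmetric ROC curve (constant DOR along the curve) and an arbitrary DOR of 20; both are illustrative assumptions, not values from the review.

```python
def sensitivity_for_dor(dor: float, specificity: float) -> float:
    """Sensitivity that gives the requested DOR at a given specificity."""
    # DOR = odds(sens) / odds(1 - spec)  =>  odds(sens) = DOR * (1 - spec) / spec
    odds_sens = dor * (1 - specificity) / specificity
    return odds_sens / (1 + odds_sens)


def dor_from(sensitivity: float, specificity: float) -> float:
    return (sensitivity / (1 - sensitivity)) / ((1 - specificity) / specificity)


# Moving the threshold trades sensitivity for specificity along one curve
curve = [(spec, sensitivity_for_dor(20.0, spec)) for spec in (0.50, 0.70, 0.90, 0.95)]
for spec, sens in curve:
    print(f"spec={spec:.2f} sens={sens:.3f} DOR={dor_from(sens, spec):.1f}")
```

Every printed point has the same DOR, while sensitivity falls from about 0.95 to about 0.51 as specificity rises; this is why the DOR suits pooling but says nothing about which threshold to choose clinically.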
Design of ORT studies
Studies on the predictive accuracy and clinical value of ORTs should preferably be prospective in design and should examine cohorts of patients in IVF settings without excluding cases with signs of diminished ovarian reserve; moreover, patient management should not have been influenced by the test under study (verification bias). Each case should also be weighted equally in the evaluation, i.e. every case should contribute the same number of cycles to the analysis. In most studies, only one IVF cycle is studied. A case–control design for OR testing has the disadvantages of retrospection and the absence of a reliable estimate of disease prevalence. The tests under study should in principle be reproducible, both at the laboratory level (hormone assays) and at the operator level (ultrasound examination). The outcome of treatment (response and pregnancy), serving as the reference for ovarian reserve, should also be clearly defined.
The accuracy of the test under study in predicting a certain outcome should be evaluated by constructing contingency tables at several threshold levels for an abnormal test. Using the sensitivity and specificity calculated at each threshold level, an ROC curve (Figure 3) can be drawn, and the area under this curve represents the overall predictive accuracy of the test. Assessing clinical value is a complex process in which the applicability of the test in daily practice should become clear. The overall accuracy represented by the ROC curve, the choice of a threshold for abnormality, the rate of abnormal tests at that threshold, the post-test probability of disease (i.e. poor response or non-pregnancy), the valuation of false-positive and false-negative test results and the consequences of an abnormal test for patient management all contribute to deciding whether a test is useful. Finally, the cost of carrying out the test routinely and the burden to the patient, balanced against the reduction in costs achieved by excluding cases with low pregnancy prospects, should contribute to the decision whether or not to apply a test.
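The construction of contingency tables at several thresholds, the resulting ROC points and the area under the curve can be sketched as follows. The marker values and outcomes are invented for illustration, "abnormal" is assumed to mean a value at or above the threshold (as for basal FSH), and the area is computed with the trapezoidal rule; none of this is the review's own software.

```python
def roc_points(values, poor, thresholds):
    """One (1 - specificity, sensitivity) point per threshold; abnormal if value >= threshold."""
    n_poor = sum(poor)
    n_normal = len(poor) - n_poor
    points = []
    for t in thresholds:
        # 2 x 2 table at this threshold (only the abnormal-test row is needed)
        tp = sum(1 for v, p in zip(values, poor) if p and v >= t)
        fp = sum(1 for v, p in zip(values, poor) if not p and v >= t)
        points.append((fp / n_normal, tp / n_poor))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve, anchored at (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical basal marker values and poor-response outcomes for 10 women
values = [4, 6, 7, 8, 9, 10, 11, 12, 14, 15]
poor = [False, False, False, True, False, False, True, False, True, True]
points = roc_points(values, poor, thresholds=sorted(set(values)))
print(points)
print(f"AUC = {auc_trapezoid(points):.3f}")
```

When every observed value is used as a threshold, the trapezoidal area equals the probability that a randomly chosen poor responder has a higher marker value than a randomly chosen normal responder; the diagonal from (0, 0) to (1, 1) gives the 0.5 of a test without discrimination.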
ORTs in relation to other predictors of success
It is important for patients who are considering IVF to know their probability of success over a series of IVF treatment cycles. The chance of a live birth for any couple undergoing treatment will depend on the success rate of the individual clinic. Equally important in the prediction of outcome, however, are the characteristics of the couple seeking treatment (Stolwijk et al., 1996; Templeton et al., 1996; Sharma et al., 2002). Considerable effort has been put into building prediction models that estimate the probability of success before and during subsequent IVF cycles. In general, these models proved inaccurate when external validation studies were carried out (Stolwijk et al., 1998; Smeenk et al., 2000). Intuitively, many IVF centres use factors such as female age, parity, duration of infertility, ovarian response in the first IVF attempt and embryo quality for individual counselling, albeit not through a formal prediction model. Within this practice, ORTs may also play a role, and female age is the one ORT applied almost without exception. The pressing question is to what extent other endocrine- or ultrasound-based ORTs add to the prognostic information already obtained from the infertility work-up or the first IVF cycle. To date, studies specifically addressing this question are scarce or do not include the full range of available prognostic factors.
A number of studies (Eimers et al., 1994; Collins et al., 1995; Snick et al., 1997; Hunault et al., 2004; Hunault et al., 2005) offer models, based on factors such as duration of subfertility, female age, parity, sperm quality and the post-coital test, for predicting live birth among untreated subfertile couples. However, none of these models included ORTs other than female age. Only one study has addressed this, showing that ORTs failed to add relevant information on a couple’s chances of spontaneous pregnancy on top of predictions based on the Eimers model (van Rooij et al., 2005).
General remarks on physiological background of ORTs
Tests that are used to predict some defined outcome related to ovarian reserve almost without exception assess the number of follicles remaining in both ovaries at a given time point. Any marker giving an estimate of the remaining pool will at the same time provide, to some extent, information on oocyte quality. On average, however, prediction studies suggest that some markers give a better indication of quality than others. Female age, for instance, is the basic factor related to both quantity and quality. Basal FSH, through the feedback of inhibin B and estradiol, reflects cohort size but mostly at the extremes, and therefore gives a more thorough indication of quality aspects. This contrasts with the more direct quantitative tests, namely antral follicle count (AFC), anti-Müllerian hormone (AMH) and ovarian volume (OVVOL), which can describe a more complete range of ovarian reserve states. With the right thresholds, these tests may eventually also correctly predict oocyte quality. The true relationship between quantity and quality, however, remains a matter of debate. Quantity is an aspect of ovarian reserve that is continuously present and is therefore more or less continuously measurable. Quality, by contrast, becomes manifest only intermittently, even in the setting of IVF. The relationship between the two aspects of ovarian reserve became more evident when the predictive value of a poor response in a first IVF cycle was examined with respect to the probability of pregnancy in the same or subsequent cycles (Klinkert et al., 2004). While cases with a normal response in additional cycles achieved acceptable pregnancy rates, in repeated poor responders this probability never exceeded 10% (de Boer et al., 2002; Lawson et al., 2003; Klinkert et al., 2004).
It is also important to remember that several factors other than ovarian reserve contribute to the occurrence of pregnancy, such as embryo transfer technique and the number of embryos replaced. Even in young women with normal reserve, the chance of not becoming pregnant remains at least 50%. A failure to conceive after IVF may therefore be attributable to unknown factors unrelated to ovarian reserve.
Approach of the systematic review
The aim of a systematic review of diagnostic tests is to obtain an overall estimate of test accuracy and clinical value based on all available evidence, after assessing the quality of the included studies and evaluating the variation in findings among them (Irwig et al., 1995; Deeks, 2001; Deville et al., 2002; Honest and Khan, 2002; Glas et al., 2003). Systematic review and meta-analysis of diagnostic accuracy and value involve the consecutive steps summarized in Table I (Irwig et al., 1994; Mol et al., 1997; please see addendum).
Step | Action | Details
1 | Define the objective | Test and disease of interest. Reference standard for the disease. Impact of test result on clinical management. Comparison of tests
2 | Literature search | Search, link and MeSH terms. Inclusion and exclusion criteria. Databases used. Cross-references. Contact authors for raw data if appropriate
3 | Data extraction | Contingency table. Quality/Methodology characteristics. Extraction by two independent researchers. Disagreement resolved by a third independent researcher
4 | Homogeneity test | Chi-square test on sensitivity (sens) and specificity (spec); provide ROC plot and sens, spec and diagnostic odds ratio (DOR) plots with 95% CI. Focus on outliers
 | Homogeneity not rejected | Calculate summary point estimates for sens and spec with 95% CI
 | Homogeneity rejected | Logistic regression analysis of the relation between Quality/Methodology characteristics and test accuracy. If present: subgroup analysis. If absent: assume cut-off point effect
5 | Data pooling | Spearman correlation between sens and spec (r < −0.5) or fixed-effect logistic regression of ln DOR with an interaction term for test and study
 | Sens and spec related and/or DORs homogeneous | Summary ROC curve estimation using a random-effects regression model
 | Sens and spec not related and/or DORs heterogeneous | No pooling possible. Subgroup analysis?
6 | Assess clinical value | Positive predictive value of an abnormal test at various prevalence values, using various thresholds based on the summary ROC curve, in correspondence with the abnormal test rate
 | If no estimated curve or point | Comparison of individual sens and spec points with the desired level of sens and spec
For each study finally included in the meta-analysis, sensitivity and specificity are calculated from the contingency tables. Homogeneity of the sensitivity–specificity points is tested by means of the χ2-test statistic. A summary point estimate of sensitivity and specificity and the 95% confidence interval is calculated if homogeneity cannot be rejected. In case of heterogeneity, logistic regression is used to evaluate whether Quality/Methodology characteristics of a study are associated with the discriminative capacity of the test under study. If one of the study characteristics is found to have a statistically significant impact on the performance of the test, further analysis is performed in subgroups of patients. If not, it is explored whether the differences in sensitivity–specificity combinations are because of the use of different threshold levels of the test under study. For this purpose, a Spearman correlation coefficient is calculated to assess the association between sensitivity and specificity. If there is a negative correlation as defined by a correlation coefficient of −0.5 or stronger, the individual pairs of sensitivity and specificity are considered to originate from a single ROC curve. All sensitivity–specificity points are then plotted and a summary ROC curve is estimated using a random-effects regression model (Littenberg and Moses, 1993; Midgette et al., 1993; Moses et al., 1993).
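The two screening steps described above, the χ² homogeneity test on sensitivities (the same function applies to specificities) and the Spearman check for a threshold effect, can be sketched in plain Python. The per-study counts are hypothetical; the function returns the χ² statistic and its degrees of freedom rather than a p-value, which would require the χ² distribution (e.g. from SciPy); the −0.5 cut-off is the one quoted in the text.

```python
def chi_square_homogeneity(successes, totals):
    """Pearson chi-square for equal proportions across studies; returns (statistic, df)."""
    pooled = sum(successes) / sum(totals)
    stat = 0.0
    for s, n in zip(successes, totals):
        # Observed vs expected counts in both cells of this study's row
        for observed, expected in ((s, n * pooled), (n - s, n * (1 - pooled))):
            stat += (observed - expected) ** 2 / expected
    return stat, len(totals) - 1


def average_ranks(xs):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical studies: poor responders detected (tp) out of all poor responders
tp = [18, 30, 12, 25]
poor_total = [20, 40, 20, 50]
print(chi_square_homogeneity(tp, poor_total))

sens = [0.90, 0.75, 0.60, 0.50]
spec = [0.55, 0.70, 0.85, 0.92]
rho = spearman(sens, spec)
print(rho, "threshold effect" if rho <= -0.5 else "no clear threshold effect")
```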
An important issue is that individual studies may produce highly variable sensitivity–specificity points in ROC space. This is generally explained by variation across studies in the threshold level applied for an abnormal test, or by considerable study heterogeneity. Because heterogeneity in design is dealt with in the formal analysis, the remaining variation in sens/spec points is generally attributed to variation in threshold levels, which allows a summary ROC curve to be constructed. At the same time, this threshold variation prevents the identification of a single threshold for a specific test that has generalizable value. That would only become possible if the original database of every study were available, which to date would require an extreme effort.
To assess the clinical value of the test under study for determining disease state (i.e. poor response or non-pregnancy), the positive and negative predictive values are calculated from the estimated summary ROC curve, assuming a range of arbitrary prevalences of the disease in the population. An LR for a positive (or abnormal) test result is calculated for each point on the estimated ROC curve. The post-test probabilities of disease at various LR values are then calculated for these arbitrary pre-test probabilities, assuming independence between the pre-test probability and the performance of the test (Bancsi et al., 2003). The final judgement depends on the overall accuracy, the choice of test threshold, the post-test prediction at that threshold and the valuation of a false-positive test result. If no summary curve can be estimated from the selected studies, the judgement of clinical value is based on a comparison between a preset level of sensitivity and specificity and the levels observed in the various studies.
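Once the spread of points is attributed to threshold variation, a summary ROC curve can be fitted. The review itself uses a random-effects regression model; the Python sketch below swaps in the simpler, unweighted Moses–Littenberg least-squares approach (Moses et al., 1993), which regresses D = ln DOR on S (a proxy for the threshold) and back-transforms the fitted line into a curve. The study tables are hypothetical and a 0.5 continuity correction is assumed.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def expit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def moses_littenberg(tables):
    """Fit D = a + b*S by ordinary least squares over (tp, fp, fn, tn) tables."""
    ds, ss = [], []
    for tp, fp, fn, tn in tables:
        # 0.5 continuity correction keeps logits finite when a cell is zero
        tpr = (tp + 0.5) / (tp + fn + 1)
        fpr = (fp + 0.5) / (fp + tn + 1)
        ds.append(logit(tpr) - logit(fpr))  # D: ln DOR
        ss.append(logit(tpr) + logit(fpr))  # S: proxy for the test threshold
    n = len(ds)
    mean_s, mean_d = sum(ss) / n, sum(ds) / n
    b = sum((s - mean_s) * (d - mean_d) for s, d in zip(ss, ds)) / sum(
        (s - mean_s) ** 2 for s in ss
    )
    a = mean_d - b * mean_s
    return a, b


def sroc_sensitivity(a: float, b: float, fpr: float) -> float:
    """Summary ROC sensitivity at a given 1 - specificity (solving D = a + b*S)."""
    return expit((a + (1 + b) * logit(fpr)) / (1 - b))


# Hypothetical (tp, fp, fn, tn) tables from four studies using different thresholds
studies = [(18, 40, 2, 60), (30, 25, 10, 95), (12, 6, 8, 54), (25, 8, 25, 92)]
a, b = moses_littenberg(studies)
for fpr in (0.05, 0.10, 0.20, 0.40):
    print(f"1-spec={fpr:.2f}  summary sens={sroc_sensitivity(a, b, fpr):.3f}")
```

A slope b near 0 means the DOR is roughly constant across thresholds (a symmetric curve); the back-transformation is undefined at b = 1, which the least-squares fit does not guard against.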
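The conversion from pre-test to post-test probability described above is Bayes' theorem in odds form: post-test odds = pre-test odds × LR. A minimal Python sketch follows; the prevalences and the ROC point giving an LR of about 6 are arbitrary illustrative values, and, as in the text, the LR is assumed not to depend on the pre-test probability.

```python
def post_test_probability(pre_test: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds * LR."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


def lr_positive(sensitivity: float, false_positive_rate: float) -> float:
    """LR for an abnormal result at one point (1 - spec, sens) of an ROC curve."""
    return sensitivity / false_positive_rate


# Probability of poor response after an abnormal test, at several assumed prevalences
lr = lr_positive(0.60, 0.10)  # hypothetical point on a summary ROC curve
for prevalence in (0.05, 0.10, 0.20, 0.30):
    print(f"pre-test {prevalence:.2f} -> post-test {post_test_probability(prevalence, lr):.2f}")
```

With an LR of about 6, a pre-test probability of 10% rises only to about 40%, which illustrates why the valuation of false-positive results weighs so heavily in the final judgement.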
Systematic reviewing of ORTs
The aim of the present series of systematic reviews is to assess the true diagnostic accuracy and clinical value of the ORTs known to date, when applied in an IVF/ICSI population. Reference standards used to valuate the test properties are response to ovarian stimulation and occurrence of pregnancy. No preset definition was used for these standards. For every ORT under study, a computerized MEDLINE search was performed to identify articles on the subject outlined in the previous chapters published until December 2004. Checking of reference lists of articles already obtained was done, all in an iterative fashion. Keywords used for the various searches were ‘in vitro fertilization’ or ‘in vitro fertilisation’ or ‘assisted’ or ‘intracytoplasmatic’ or ‘intracytoplasmic’, in combination with ‘test-specific’ keywords, as mentioned in the tables.
One investigator (DH or JK) read all abstracts of the articles that were identified by the search. Any article reporting on the association of the test with poor ovarian response and/or non-pregnancy after IVF or possibly containing information that was to be transformed into a predictive tabulation was pre-selected. Subsequently, all pre-selected articles were fully read and judged independently by two investigators (DH and JK), and separate 2 × 2 tables were constructed for cross classification of the test result and the occurrence of poor response and/or non-pregnancy, whenever possible. In the event of disagreement on the inclusion or exclusion of pre-selected studies for the meta-analysis or on the calculation of the 2 × 2 table data or the scoring of quality characteristics, the judgement of a third author (FB or CL) was decisive. Studies in which it was not possible to construct 2 × 2 tables were excluded. Cross-references in all selected articles were checked, and, if applicable, studies were added to the analysis.
Each study was scored by the investigators on the following quality and methodology characteristics: (i) sampling (consecutive versus other), (ii) data collection (prospective versus retrospective), (iii) study design (cohort versus case–control), (iv) blinding (present or absent), (v) selection bias, (vi) verification bias, (vii) analysis of one or multiple cycles per couple and (viii) definition of the outcomes (poor response and pregnancy).
The following sections report, for each ORT included in this review, the results of the search, data extraction, quality and methodology assessment, and meta-analysis of the extracted data.
Basal FSH
Systematic review
The search and selection strategy identified 37 studies that reported on the capacity of basal FSH to predict poor ovarian response and/or non-pregnancy after IVF and that were suitable for data extraction and meta-analysis (Scott et al., 1989; Padilla et al., 1990; Toner et al., 1991; Khalifa et al., 1992; Chan et al., 1993; Ebrahim et al., 1993; Fanchin et al., 1994; Huyser et al., 1995; Licciardi et al., 1995; Smotrich et al., 1995; Balasch et al., 1996; Csemiczky et al., 1996; Martin et al., 1996; Pruksananonda et al., 1996; Gurgan et al., 1997; Chang et al., 1998a; Evers et al., 1998; Ranieri et al., 1998; Sharif et al., 1998; Bassil et al., 1999; Hall et al., 1999; Bancsi et al., 2000; Chae et al., 2000; Creus et al., 2000; Fabregues et al., 2000; Jinno et al., 2000; Penarrubia et al., 2000; Mikkelsen et al., 2001; Nahum et al., 2001; van der Stege and van der Linden, 2001; Esposito et al., 2002; Chuang et al., 2003; Fiçicioğlu et al., 2003; Kwee et al., 2003; Yanushpolsky et al., 2003; Akande et al., 2004; Erdem et al., 2004). Characteristics of the included studies are listed in Table II. The studies varied widely in methodology and quality and in their definition of poor ovarian response. Logistic regression analysis showed no significant association between any of these study characteristics and the predictive performance of basal FSH; for example, whether a study was retrospective or prospective did not influence the prognostic capacity of basal FSH.
Author | Consecutive sampling | One cycle per couple | Data per | Definition of poor response/cancellation | Definition of pregnancy | FSH assay
---|---|---|---|---|---|---
Scott et al. | Yes | No | Cycle | Not stated | Clinical/ongoing | RIA: Leeco Diagnostics
Padilla et al. | No | No | Cycle | Not stated | Clinical | RIA: Amersham Corp.
Toner et al. | No | No | Cycle/retrieval | <2 follicles 16 mm | Ongoing | RIA: Leeco Diagnostics
Khalifa et al. | No | No | Retrieval | Not stated | Ongoing | RIA: Leeco Diagnostics
Ebrahim et al. | Yes | Yes | Cycle | <3 oocytes | Term | RIA: Serono Diagnostics
Chan et al. | No | Not stated | Cycle | <3 follicles 15 mm | Clinical/ongoing | RIA: Diag. Products Inc.
Fanchin et al. | Yes | Yes | Cycle | <3 oocytes | Not applicable | Immunometric: Kodak Diag.
Huyser et al. | No | Yes | Cycle | Not stated | Term | IFMA: Delfia
Licciardi et al. | No | Not stated | Retrieval | Not stated | Ongoing | RIA: Leeco Diagnostics
Smotrich et al. | No | No | Cycle | <2 follicles 16 mm | Clinical | RIA: Nichols Inst. Radio.
Martin et al. | Yes | No | Cycle | Not stated | Clinical | ACS-180: Chemilum.
Pruksananonda et al. | No | Yes | Cycle | <3 follicles | Clinical | Fluorescence immunoassay
Csemiczky et al. | No | No | Cycle | Not stated | Clinical | RIA: Diag. Products Inc.
Balasch et al. | Yes | Yes | Cycle | <2 follicles 17 mm or <5 follicles 14 mm | Not applicable | RIA: Immunotech Int.
Gurgan et al. | No | Yes | Cycle | <2 follicles 18 mm | Clinical | RIA: J&J Clin. Diagnostics
Sharif et al. | Yes | Yes | Cycle | <4 follicles 14 mm | Clinical | ACS-180: Chemilum.
Chang et al. | Yes | No | Cycle | Not stated | Clinical | Not stated
Evers et al. | Yes | Yes | Cycle | <4 follicles 14 mm | Clinical | RIA: Delfia
Ranieri et al. | No | Yes | Cycle | <5 follicles 15 mm | Not applicable | Immunometric: Nichols Inst.
Hall et al. | No | No | Patient | Not stated | Clinical | RIA
Bassil et al. | No | No | Cycle | Not stated | Clinical | Not stated
Jinno et al. | Yes | No | Cycle | Not stated | Not stated | Enzyme immunoassay: Abbott
Bancsi et al. | No | Yes | Cycle | Not stated | Ongoing | Immunoan./immunometric: Chiron
Chae et al. | Yes | Yes | Cycle | Not stated | Clinical | IRMA: Jeil Japan
Penarrubia et al. | Yes | Yes | Cycle | <3 follicles 14 mm | Not applicable | Immunoenzymometric: Technicon
Creus et al. | Yes | Yes | Cycle | <3 follicles 14 mm | Not applicable | Immunoenzymometric: Technicon
Fabregues et al. | Yes | Yes | Cycle | <3 follicles 14 mm | Not applicable | IRMA: Immunotech
Mikkelsen et al. | Yes | No | Retrieval | Not stated | Clinical | Immuno I: Bayer
Van der Stege et al. | Yes | Yes | Cycle | <3 follicles 18 mm | Clinical | RIA: Elecsys
Nahum et al. | Yes | No | Cycle | <3 follicles 18 mm | Clinical | MEIA: Abbott
Esposito et al. | No | Yes | Cycle | Not stated | Live birth | Immuno I: Bayer
Fiçicioğlu et al. | Yes | Yes | Cycle | <2 follicles or <5 oocytes | Not applicable | ELISA: Serotec Ltd, UK
Chuang et al. | No | Yes | Cycle | Not stated | Ongoing | Chemilum. immunoassay: Immulite
Yanushpolsky et al. | Yes | No | Retrieval | Not stated | Delivery | Techn. Imm. Syst.: Bayer
Erdem et al. | Yes | No | Cycle | <5 oocytes or <3 follicles 18 mm | Not applicable | Immunometric: Immulite 2000
Akande et al. | Yes | Yes | Cycle | <3 follicles 18 mm | Not applicable | Immunofluorimetric: DELFIA
Kwee et al. | Yes | Yes | Cycle | <6 oocytes | Not applicable | Immunomet.: Amerlite/Delfia
Accuracy of poor response prediction
The sensitivities, specificities, positive LRs of an abnormal test and DORs for the prediction of poor ovarian response, as calculated from each study, are summarized in Table III (see addendum). The sensitivity and specificity points plotted in Figure 4 were heterogeneous between studies (χ2 test: P = 0.001 for sensitivity and P = 0.001 for specificity). Pooling them into a single summary estimate of sensitivity and specificity would therefore not have been meaningful for an overall judgement of accuracy. The Spearman correlation coefficient between sensitivity and specificity was −0.87, which was judged sufficient to estimate a summary ROC curve (Figure 4).
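A strong negative rank correlation between sensitivity and specificity across studies is usually read as a threshold effect, which is what justifies fitting a summary ROC curve instead of pooling a single point. The minimal pure-Python sketch below computes Spearman's coefficient as the Pearson correlation of average ranks. The four points are the Toner et al. thresholds from Table III, chosen only for illustration, so the result does not reproduce the −0.87 computed over all studies. Because ranks are unchanged by monotone transforms such as the logit, the coefficient is the same on the proportion and logit scales.

```python
"""Spearman rank correlation between study sensitivities and specificities."""
import math
from typing import Sequence


def average_ranks(xs: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toner et al. (Table III), thresholds 10, 15, 20 and 25 IU/l.
    sensitivity = [0.72, 0.45, 0.31, 0.22]
    specificity = [0.40, 0.75, 0.90, 0.96]
    print(spearman(sensitivity, specificity))  # -1.0
```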
Author | Cycles (n) | FSH threshold (IU/l) | Sensitivity | Specificity | LR+ | DOR | Pre-FSH probability (%) | Post-FSH probability (%) | Proportion of patients/cycles with abnormal FSH (%)
---|---|---|---|---|---|---|---|---|---
Toner et al. | 1478 | 10 | 0.72 | 0.40 | 1.2 | 1.6 | 7 | 10 | 61
Toner et al. | 1478 | 15 | 0.45 | 0.75 | 1.8 | 2.4 | 7 | 15 | 27
Toner et al. | 1478 | 20 | 0.31 | 0.90 | 3.1 | 3.9 | 7 | 19 | 12
Toner et al. | 1478 | 25 | 0.22 | 0.96 | 5.5 | 6.7 | 7 | 29 | 5
Ebrahim et al. | 111 | 11.5 | 0.80 | 0.93 | 11.4 | 49.0 | 5 | 38 | 11
Chan et al. | 144 | 4.5 | 0.94 | 0.33 | 1.4 | 8.2 | 13 | 17 | 71
Chan et al. | 144 | 6 | 0.72 | 0.71 | 2.5 | 6.3 | 13 | 27 | 35
Fanchin et al. | 52 | 11 | 0.86 | 0.45 | 1.6 | 4.9 | 27 | 37 | 63
Smotrich et al. | 292 | 15 | 0.00 | 0.95 | 0 | 2.8 | 2 | 0 | 4
Pruksananonda et al. | 36 | 4 | 1.00 | 0.26 | 1.4 | 0.7 | 3 | 4 | 75
Pruksananonda et al. | 36 | 8 | 1.00 | 0.71 | 3.5 | 4.7 | 3 | 10 | 31
Balasch et al. | 120 | NS | 0.50 | 0.81 | 2.6 | 4.3 | 33 | 56 | 29
Gurgan et al. | 637 | 10 | 0.47 | 0.82 | 2.6 | 2.9 | 16 | 33 | 23
Gurgan et al. | 637 | 13 | 0.37 | 0.92 | 4.6 | 4.2 | 16 | 47 | 12
Gurgan et al. | 637 | 15 | 0.33 | 0.95 | 6.6 | 4.9 | 16 | 56 | 9
Gurgan et al. | 637 | 20 | 0.11 | 0.99 | 11.0 | 4.4 | 16 | 66 | 3
Sharif et al. | 344 | 5.4 | 0.91 | 0.12 | 1.0 | 1.3 | 9 | 9 | 89
Sharif et al. | 344 | 10.8 | 0.31 | 0.93 | 4.4 | 5.9 | 9 | 31 | 9
Evers et al. | 231 | 17 | 0.26 | 0.97 | 8.7 | 10.5 | 20 | 69 | 8
Ranieri et al. | 177 | 9.5 | 0.81 | 0.65 | 2.3 | 8.2 | 27 | 48 | 47
Penarrubia et al. | 80 | Pmodel > 50% | 0.83 | 0.73 | 3.1 | 4.5 | 25 | 52 | 41
Creus et al. | 120 | 9.45 | 0.65 | 0.81 | 3.4 | 11.0 | 33 | 67 | 35
Fabregues et al. | 80 | Pmodel > 50% | 0.28 | 0.91 | 3.1 | 3.8 | 35 | 62 | 16
Van der Stege et al. | 87 | 10 | 0.60 | 0.85 | 4.1 | 8.8 | 6 | 20 | 17
Nahum et al. | 272 | 10 | 0.22 | 0.93 | 3.2 | 3.8 | 14 | 33 | 9
Fiçicioğlu et al. | 58 | 7 | 0.76 | 0.76 | 3.1 | 9.9 | 43 | 70 | 47
Chuang et al. | 1045 | 10 | 0.32 | 0.87 | 2.4 | 3.1 | 9 | 19 | 15
Erdem et al. | 32 | Logistic model | 0.63 | 0.81 | 3.3 | 9.7 | 50 | 77 | 41
Akande et al. | 536 | 6 | 0.88 | 0.50 | 1.7 | 6.9 | 6 | 10 | 53
Akande et al. | 536 | 9 | 0.59 | 0.87 | 4.5 | 9.7 | 6 | 22 | 16
Akande et al. | 536 | 12 | 0.47 | 0.96 | 11.3 | 20.3 | 6 | 42 | 7
Kwee et al. | 110 | 4 | 1.00 | 0.05 | 1.1 | 1.9 | 26 | 27 | 96
Kwee et al. | 110 | 6 | 0.93 | 0.40 | 1.5 | 8.8 | 26 | 36 | 69
Kwee et al. | 110 | 8 | 0.72 | 0.78 | 3.3 | 9.2 | 26 | 54 | 35
Kwee et al. | 110 | 10 | 0.34 | 0.96 | 9.3 | 8.97 | 26 | 77 | 12
Kwee et al. | 110 | 12 | 0.24 | 1.00 | 21.4 | 28.5 | 26 | 89 | 8
DOR, diagnostic odds ratio; LR+, likelihood ratio for a positive test result; NS, not specified.
If a study reported on multiple threshold values, data for all threshold values are shown.
Accuracy of non-pregnancy prediction
Sensitivities and specificities for the prediction of non-pregnancy, as calculated from each study, are summarized in Table IV (see addendum). Again, the sensitivity and specificity points plotted in Figure 5 were heterogeneous between studies (χ2 test: P = 0.001 for sensitivity and P = 0.001 for specificity). The Spearman correlation coefficient between sensitivity and specificity was −0.82, which was sufficient to estimate a summary ROC curve (Figure 5).
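The text does not specify here how the summary ROC curves were estimated. One widely used method is the Moses–Littenberg approach: for each study compute D = logit(TPR) − logit(FPR), which is the log DOR, and S = logit(TPR) + logit(FPR), fit D = a + b·S by least squares, and back-transform to logit(TPR) = a/(1 − b) + (1 + b)/(1 − b)·logit(FPR). The sketch below is an unweighted version of that method, offered as an illustrative substitute rather than the review's own procedure. It needs sensitivities and specificities strictly between 0 and 1, so points such as a specificity of 1.00 would first need a continuity correction on the underlying counts. The example points are taken from Table IV (Scott et al., Padilla et al., Chan et al.) only to show usage.

```python
"""Unweighted Moses-Littenberg summary ROC fit (illustrative, not the review's own code)."""
import math
from typing import Sequence


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fit_sroc(points: Sequence[tuple[float, float]]) -> tuple[float, float]:
    """Fit D = a + b*S by ordinary least squares from (sensitivity, specificity) pairs."""
    d_vals, s_vals = [], []
    for sens, spec in points:
        if not (0.0 < sens < 1.0 and 0.0 < spec < 1.0):
            raise ValueError("proportions of exactly 0 or 1 need a continuity correction first")
        l_tpr, l_fpr = logit(sens), logit(1.0 - spec)
        d_vals.append(l_tpr - l_fpr)  # log diagnostic odds ratio
        s_vals.append(l_tpr + l_fpr)  # proxy for the positivity threshold
    n = len(d_vals)
    mean_s, mean_d = sum(s_vals) / n, sum(d_vals) / n
    sxx = sum((s - mean_s) ** 2 for s in s_vals)
    if sxx == 0.0:
        raise ValueError("need points with different thresholds (S values)")
    sxy = sum((s - mean_s) * (d - mean_d) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    a = mean_d - b * mean_s
    return a, b


def sroc_sensitivity(a: float, b: float, fpr: float) -> float:
    """Sensitivity on the fitted curve at a false-positive rate (1 - specificity)."""
    if b == 1.0:
        raise ValueError("the back-transformation is undefined for b = 1")
    return expit(a / (1.0 - b) + (1.0 + b) / (1.0 - b) * logit(fpr))


if __name__ == "__main__":
    # (sensitivity, specificity) pairs for non-pregnancy from Table IV.
    points = [(0.85, 0.20), (0.65, 0.53), (0.31, 0.84), (0.08, 0.98),
              (0.40, 0.69), (0.23, 0.90), (0.73, 0.54), (0.37, 0.87)]
    a, b = fit_sroc(points)
    print(f"a={a:.3f} b={b:.3f}")
    for fpr in (0.1, 0.3, 0.5):
        print(f"FPR={fpr:.1f} -> sensitivity {sroc_sensitivity(a, b, fpr):.2f}")
```

A slope b close to 0 means the DOR is roughly constant across thresholds, so the curve is symmetric. Weighted regression and hierarchical (bivariate or HSROC) models are common refinements of this unweighted fit.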
Author | Cycles (n) | FSH threshold (IU/l) | Sensitivity | Specificity | LR+ | DOR | Pre-FSH probability (%) | Post-FSH probability (%) | Proportion of patients/cycles with abnormal FSH (%)
---|---|---|---|---|---|---|---|---|---
Scott et al. | 758 | 5 | 0.85 | 0.20 | 1.1 | 1.5 | 86 | 87 | 85
Scott et al. | 758 | 10 | 0.65 | 0.53 | 1.4 | 1.97 | 86 | 90 | 62
Scott et al. | 758 | 15 | 0.31 | 0.84 | 1.9 | 2.4 | 86 | 92 | 29
Scott et al. | 758 | 25 | 0.08 | 0.98 | 4.6 | 4.9 | 86 | 96 | 7
Padilla et al. | 91 | 15 | 0.40 | 0.69 | 1.3 | 1.5 | 68 | 73 | 37
Padilla et al. | 91 | 20 | 0.23 | 0.90 | 2.3 | 2.5 | 68 | 83 | 19
Toner et al. | 1478 | 10 | 0.61 | 0.43 | 1.1 | 1.2 | 83 | 84 | 60
Toner et al. | 1478 | 15 | 0.29 | 0.89 | 2.6 | 1.9 | 83 | 93 | 25
Toner et al. | 1478 | 20 | 0.13 | 0.95 | 2.6 | 2.5 | 83 | 93 | 10
Toner et al. | 1478 | 25 | 0.07 | 1.00 | 12.0 | 16.5 | 83 | 98 | 4
Khalifa et al. | 1110 | 10 | 0.58 | 0.44 | 1.0 | 1.1 | 83 | 84 | 58
Khalifa et al. | 1110 | 15 | 0.28 | 0.82 | 4.4 | 1.7 | 83 | 88 | 26
Khalifa et al. | 1110 | 20 | 0.08 | 0.93 | 13.6 | 1.1 | 83 | 84 | 9
Khalifa et al. | 1110 | 25 | 0.06 | 1.00 | 11.9 | 12.6 | 83 | 98 | 5
Ebrahim et al. | 111 | 11.5 | 0.12 | 0.94 | 2.0 | 2.1 | 85 | 92 | 11
Chan et al. | 144 | 4.5 | 0.73 | 0.54 | 1.6 | 3.2 | 90 | 94 | 71
Chan et al. | 144 | 6 | 0.37 | 0.87 | 2.8 | 3.9 | 90 | 96 | 35
Huyser et al. | 139 | 11.7 | 0.16 | 0.96 | 4.0 | 4.3 | 83 | 95 | 14
Licciardi et al. | 452 | 17 | 0.19 | 0.91 | 2.1 | 2.3 | 81 | 90 | 17
Smotrich et al. | 292 | 15 | 0.07 | 1.00 | 7.6 | 8.1 | 65 | 93 | 4
Martin et al. | 1868 | 20 | 0.03 | 1.00 | 10.1 | 10.4 | 84 | 98 | 3
Pruksananonda et al. | 36 | 4 | 0.78 | 0.50 | 1.6 | 3.6 | 89 | 93 | 75
Pruksananonda et al. | 36 | 8 | 0.34 | 1.00 | 2.1 | 2.7 | 89 | 94 | 31
Csemiczky et al. | 53 | 7 | 0.26 | 1.00 | 6.8 | 8.6 | 58 | 90 | 15
Gurgan et al. | 637 | 10 | 0.24 | 0.80 | 1.2 | 1.2 | 81 | 84 | 23
Gurgan et al. | 637 | 13 | 0.14 | 0.95 | 2.8 | 3.1 | 81 | 92 | 12
Gurgan et al. | 637 | 15 | 0.11 | 0.97 | 4.3 | 4.6 | 81 | 95 | 8
Gurgan et al. | 637 | 20 | 0.03 | 1.00 | 4.4 | 4.5 | 81 | 95 | 3
Sharif et al. | 344 | 10.8 | 0.12 | 0.97 | 4.0 | 4.6 | 70 | 90 | 9
Chang et al. | 149 | 10 | 0.13 | 0.97 | 4.3 | 5.5 | 74 | 92 | 10
Evers et al. | 231 | 17 | 0.09 | 1.00 | 3.2 | 3.4 | 86 | 95 | 8
Hall et al. | 110 | 9.4 | 0.77 | 0.27 | 1.1 | 1.95 | 39 | 40 | 75
Hall et al. | 110 | 11.2 | 0.60 | 0.57 | 1.4 | 2.0 | 39 | 47 | 50
Hall et al. | 110 | 13.3 | 0.33 | 0.81 | 1.7 | 2.0 | 39 | 52 | 25
Bassil et al. | 83 | 10 | 0.45 | 0.10 | 0.5 | 0.1 | 92 | 85 | 49
Bassil et al. | 83 | 15 | 0.32 | 0.50 | 0.6 | 0.5 | 92 | 88 | 34
Bassil et al. | 83 | 20 | 0.09 | 0.80 | 0.5 | 0.4 | 92 | 85 | 10
Bassil et al. | 83 | 25 | 0.04 | 0.90 | 0.4 | 0.4 | 92 | 83 | 5
Bassil et al. | 83 | 30 | 0.03 | 1.00 | 0.5 | 0.5 | 92 | 83 | 3
Jinno et al. | 271 | 15 | 0.05 | 0.96 | 1.1 | 1.1 | 65 | 67 | 4
Bancsi et al. | 435 | 15 | 0.06 | 1.00 | 3.9 | 4.0 | 86 | 96 | 5
Chae et al. | 118 | 8.5 | 0.46 | 0.85 | 3.0 | 4.6 | 89 | 96 | 42
Mikkelsen et al. | 130 | 15 | 0.34 | 0.73 | 1.3 | 1.4 | 88 | 91 | 33
Van der Stege et al. | 87 | 10 | 0.18 | 0.85 | 1.2 | 1.2 | 70 | 73 | 17
Nahum et al. | 272 | 10 | 0.11 | 0.96 | 2.7 | 2.9 | 65 | 83 | 9
Esposito et al. | 293 | 10 | 0.19 | 0.91 | 2.1 | 2.3 | 74 | 85 | 16
Esposito et al. | 293 | 11.4 | 0.11 | 1.00 | 8.9 | 9.9 | 74 | 96 | 8
Chuang et al. | 1045 | 10 | 0.18 | 0.91 | 2.0 | 2.2 | 70 | 82 | 15
Yanushpolsky et al. | 483 | 10 | 0.22 | 0.88 | 1.9 | 2.1 | 62 | 75 | 18
DOR, diagnostic odds ratio; LR+, likelihood ratio for a positive test result.
If a study reported on multiple threshold values, data for all threshold values are shown.
Clinical value
Based on the summary ROC curves depicted in Figure 4, a range of positive LRs was calculated, and for each ratio the pre-FSH-test probability of poor response and of non-pregnancy was converted into a post-FSH-test probability. Table V (see addendum) shows the probability of obtaining an FSH test result within each LR range and the corresponding post-test probabilities of poor response and non-pregnancy. At a maximum positive LR of 8, the post-FSH-test probability of poor response is about 67% if the pre-FSH-test probability is assumed to be as high as 20%. As Table V also shows, however, the probability of obtaining a test result (FSH level) with an LR of ∼8 is very small (about 1%). Table III shows that in women with an increased FSH level the probability of poor response increases substantially (3-fold or more) only in studies applying a high FSH threshold, and such thresholds leave very few patients with an abnormal test result.
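The conversion behind Table V follows Bayes' theorem in odds form: post-test odds = pre-test odds × LR, with odds = p/(1 − p). The minimal Python sketch below illustrates that arithmetic, assuming the pre-test probabilities stated in Table V (20% for poor response, 80% for non-pregnancy); the function name `post_test_probability` is hypothetical and not taken from the review.

```python
# Illustrative sketch; the function name is hypothetical, not from the review.


def post_test_probability(pre_test: float, lr: float) -> float:
    """Return the post-test probability for a given pre-test probability and LR.

    Uses Bayes' theorem in odds form: post-test odds = pre-test odds * LR.
    """
    if not 0.0 < pre_test < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    if lr < 0.0:
        raise ValueError("a likelihood ratio cannot be negative")
    pre_odds = pre_test / (1.0 - pre_test)  # probability -> odds
    post_odds = pre_odds * lr               # the LR scales the odds
    return post_odds / (1.0 + post_odds)    # odds -> probability


if __name__ == "__main__":
    # Pre-test probabilities assumed in Table V: 20% poor response, 80% non-pregnancy.
    for lr in range(1, 9):
        poor = post_test_probability(0.20, lr)
        non_preg = post_test_probability(0.80, lr)
        print(f"LR {lr}: poor response {poor:.1%}, non-pregnancy {non_preg:.1%}")
```

The loop reproduces, to within rounding, the boundaries of the post-test columns in Table V; for example, an LR of 8 gives about 66.7% for poor response and about 97.0% for non-pregnancy.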
| LR range | Poor response (pre-test probability 20%): occurrence of test results in this range (%) | Poor response: post-test probability (%) | Non-pregnancy (pre-test probability 80%): occurrence of test results in this range (%) | Non-pregnancy: post-test probability (%) |
|---|---|---|---|---|
| 0–1 | 68 | <20 | 63 | <80 |
| 1–2 | 15 | 20–33 | 22 | 80–89 |
| 2–3 | 8 | 33–43 | 9 | 89–93 |
| 3–4 | 3 | 43–50 | 1 | 93–94 |
| 4–5 | 2 | 50–56 | 1 | 94–95 |
| 5–6 | 1 | 56–60 | 1 | 95–96 |
| 6–7 | 1 | 60–64 | 1 | 96–96.5 |
| 7–8 | 1 | 64–67 | 1 | 96.5–97 |
| >8 | 1 | >67 | 1 | >97 |
For the prediction of non-pregnancy the situation is even less favourable: the extremely high FSH levels needed to obtain a moderate positive LR of ∼5, which would lower the post-test pregnancy rate to less than 5% from a pre-test rate of 20%, again occur in only a very small proportion of patients (Table V). Beyond the point defined by a specificity of 0.90 and a sensitivity of 0.20, the summary ROC curve runs almost parallel to the line of equality. Because the slope of an ROC curve equals the LR of test results in that region, this segment of the curve is essentially uninformative (LR ∼1).
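The link between the shape of the ROC curve and the LR can be made explicit. Writing each operating point as (1 − specificity, sensitivity), the LR of test results that fall between two thresholds equals the slope of the chord joining the two points, ΔTPR/ΔFPR. The sketch below assumes this standard interval-LR definition; the point (specificity 0.90, sensitivity 0.20) comes from the text, whereas the second point (specificity 0.50, sensitivity 0.58) is hypothetical, chosen only to mimic a segment running parallel to the line of equality. Function names are likewise hypothetical.

```python
# Illustrative sketch; function names and the second ROC point are hypothetical.


def positive_lr(sensitivity: float, specificity: float) -> float:
    """LR+ for a single threshold: sensitivity / (1 - specificity)."""
    if specificity >= 1.0:
        raise ValueError("LR+ is undefined when specificity is 1")
    return sensitivity / (1.0 - specificity)


def interval_lr(point_a: tuple[float, float], point_b: tuple[float, float]) -> float:
    """LR for test results lying between two ROC operating points.

    Each point is (false-positive rate, true-positive rate), i.e.
    (1 - specificity, sensitivity); the result is the slope of the chord.
    """
    (fpr_a, tpr_a), (fpr_b, tpr_b) = point_a, point_b
    if fpr_b == fpr_a:
        raise ValueError("the two points must differ in false-positive rate")
    return (tpr_b - tpr_a) / (fpr_b - fpr_a)


if __name__ == "__main__":
    point_from_text = (1 - 0.90, 0.20)     # specificity 0.90, sensitivity 0.20
    hypothetical_point = (1 - 0.50, 0.58)  # illustrative only
    print(f"LR+ at the threshold itself: {positive_lr(0.20, 0.90):.2f}")
    print(f"LR of results between the thresholds: "
          f"{interval_lr(point_from_text, hypothetical_point):.2f}")
```

With these inputs the threshold itself has an LR+ of 2.00, yet results falling in the segment beyond it have an LR of about 0.95, so they barely shift the pre-test probability.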
All this leads to the conclusion that, in regularly cycling women, basal FSH predicts poor response and non-pregnancy adequately only at very high threshold levels, and because abnormal results are then very rare, the test has hardly any clinical value. Given, in addition, a false-positive rate of ∼5%, basal FSH is not suitable as a diagnostic test for excluding patients from treatment, but only as a screening test for counselling purposes and for deciding on further diagnostic steps, for which a first IVF attempt may be the step of choice (Roberts et al., 2005).
AMH
Systematic review
Through the search and selection strategy, we identified two studies that reported on the predictive capacity of AMH and were suitable for data extraction and meta-analysis (van Rooij et al., 2002; Muttukrishna et al., 2004). Characteristics of the included studies are listed in Table VI (see addendum).
| Author | Consecutive | One cycle per couple | Data per | Definition of poor response/cancellation | Definition of pregnancy | AMH assay |
|---|---|---|---|---|---|---|
| Van Rooij et al. | Yes | Yes | Cycle | <4 oocytes or <3 follicles | Ongoing | Immuno-enzymometric (Immunotech-Coulter) |
| Muttukrishna et al. | No | Yes | Cycle | <4 follicles 15 mm | Not applicable | Immuno-enzymometric (Immunotech-Coulter) |
Accuracy of poor response prediction
The sensitivities, specificities, positive LRs and DORs for the prediction of poor ovarian response, as calculated from each study, are summarized in Table VII (see addendum) and in Figure 6. Homogeneity of sensitivity and specificity could not be rejected (χ² test: P = 0.12 for sensitivity and P = 0.64 for specificity), but this merely reflects the low power of the test when only two studies are included. As can be seen from Figure 6, the points of the two studies can be thought of as originating from a single ROC curve (Spearman correlation coefficient between sensitivity and specificity −0.81). The summary ROC curve estimated from these points is also shown in Figure 6.
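The review does not detail how its χ² homogeneity test was computed. As a hedged illustration of why two small studies give little power, the sketch below implements a standard Pearson χ² test (without continuity correction) comparing one proportion, such as sensitivity, between two studies; with two studies there is one degree of freedom, so the P-value is erfc(√(χ²/2)). The function name and the counts in the usage example are hypothetical.

```python
import math

# Illustrative sketch; the function name and example counts are hypothetical.


def chi2_two_proportions(events_1: int, n_1: int,
                         events_2: int, n_2: int) -> tuple[float, float]:
    """Pearson chi-square test that two studies share the same proportion.

    events_k / n_k could be, e.g., true positives / poor responders in study k.
    Returns (chi-square statistic, P-value) with 1 degree of freedom.
    """
    observed = [[events_1, n_1 - events_1], [events_2, n_2 - events_2]]
    row_totals = [n_1, n_2]
    total = n_1 + n_2
    col_totals = [events_1 + events_2, total - (events_1 + events_2)]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                raise ValueError("an expected cell count is zero; the test is undefined")
            statistic += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical: 10/20 vs 15/20 poor responders detected in two small studies.
    stat, p = chi2_two_proportions(10, 20, 15, 20)
    print(f"chi2 = {stat:.2f}, P = {p:.3f}")
```

With these made-up counts, sensitivities of 0.50 and 0.75 do not differ significantly (P ≈ 0.10), illustrating how heterogeneity can go undetected when few, small studies are pooled.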
| Author | Cycles (n) | AMH threshold value (µg/l) | Sensitivity | Specificity | LR+ | DOR | Pre-AMH probability of poor response (%) | Post-AMH probability of poor response (%) | Proportion of patients/cycles with abnormal AMH (%) |
|---|---|---|---|---|---|---|---|---|---|
| Van Rooij et al. | 119 | <0.1 | 0.49 | 0.94 | 8.2 | 14.9 | 29 | 77 | 18 |
| | | <0.2 | 0.54 | 0.90 | 5.7 | 11.3 | 29 | 70 | 23 |
| | | <0.3 | 0.60 | 0.89 | 5.6 | 12.5 | 29 | 70 | 25 |
| Muttukrishna et al. | 69 | <0.1 | 0.76 | 0.88 | 6.6 | 24.9 | 25 | 68 | 28 |
DOR, diagnostic odds ratio; LR+, likelihood ratio for a positive test result.
If a study reported on multiple threshold values, data for all threshold values are shown.
Accuracy of non-pregnancy prediction
Sensitivities and specificities for the prediction of non-pregnancy by AMH are summarized in Table VIII. As the study of van Rooij et al. was the only one identified, meta-analysis was not meaningful. The ROC curve derived from the data of van Rooij et al., representing the accuracy of AMH in the prediction of non-pregnancy, is shown in Figure 7.
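The derived columns in Tables VII and VIII follow from sensitivity, specificity and the pre-test probability alone: LR+ = sensitivity/(1 − specificity), DOR = [sensitivity/(1 − sensitivity)]/[(1 − specificity)/specificity], and the post-test probability is obtained by multiplying the pre-test odds by LR+. The minimal sketch below assumes these standard definitions; the class name `Accuracy` is hypothetical. Applied to the van Rooij et al. threshold of <0.1 µg/l in Table VIII (sensitivity 0.22, specificity 0.89, pre-test probability 75%), it gives an LR+ of 2.0, a DOR of about 2.3 and a post-test probability of about 86%, close to the tabulated 1.9, 2.2 and 85%; the small differences presumably reflect rounding of the published sensitivity and specificity.

```python
from dataclasses import dataclass

# Illustrative sketch; the class name is hypothetical, not from the review.


@dataclass(frozen=True)
class Accuracy:
    """Accuracy of a dichotomised test at one threshold (proportions in (0, 1))."""

    sensitivity: float
    specificity: float

    def __post_init__(self) -> None:
        for name in ("sensitivity", "specificity"):
            value = getattr(self, name)
            if not 0.0 < value < 1.0:
                raise ValueError(f"{name} must lie strictly between 0 and 1")

    @property
    def lr_positive(self) -> float:
        """Likelihood ratio for a positive (abnormal) test result."""
        return self.sensitivity / (1.0 - self.specificity)

    @property
    def diagnostic_odds_ratio(self) -> float:
        """Odds of an abnormal test in cases divided by the odds in non-cases."""
        return (self.sensitivity / (1.0 - self.sensitivity)) / (
            (1.0 - self.specificity) / self.specificity
        )

    def post_test_probability(self, pre_test: float) -> float:
        """Probability of the outcome after an abnormal test result."""
        pre_odds = pre_test / (1.0 - pre_test)
        post_odds = pre_odds * self.lr_positive
        return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    amh = Accuracy(sensitivity=0.22, specificity=0.89)  # van Rooij et al., AMH <0.1 µg/l
    print(f"LR+ {amh.lr_positive:.1f}, DOR {amh.diagnostic_odds_ratio:.1f}, "
          f"post-test {amh.post_test_probability(0.75):.0%}")
```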
| Author | Cycles (n) | AMH threshold value (µg/l) | Sensitivity | Specificity | LR+ | DOR | Pre-AMH probability of non-pregnancy (%) | Post-AMH probability of non-pregnancy (%) | Proportion of patients/cycles with abnormal AMH (%) |
|---|---|---|---|---|---|---|---|---|---|
| Van Rooij et al. | 119 | <0.1 | 0.22 | 0.89 | 1.9 | 2.2 | 75 | 85 | 19 |
| | | <0.2 | 0.27 | 0.85 | 1.8 | 2.1 | 75 | 84 | 24 |
| | | <0.3 | 0.28 | 0.81 | 1.5 | 1.7 | 75 | 81 | 25 |
DOR, diagnostic odds ratio; LR+, likelihood ratio for a positive test result.
If a study reported on multiple threshold values, data for all threshold values are shown.
Clinical value
As data from only two studies are available, it is not feasible to derive the relationship between positive LRs, post-test probabilities and the rate of abnormal tests. Nevertheless, if a sensitivity of 75% and a specificity of 85% are taken as the desired levels for the prediction of poor response, AMH performs only moderately, particularly with respect to sensitivity. For non-pregnancy prediction, taking a sensitivity of 40% and a specificity of 95% as the desired levels, the test has hardly any value unless very low threshold levels are used, which would inevitably yield only very small proportions of abnormal tests. Further studies are needed to establish whether AMH proves superior to current tests such as basal FSH and the AFC (Hazout et al., 2004; Muttukrishna et al., 2005; Penarrubia et al., 2005).
Inhibin B
Systematic review
We identified nine studies that reported on the predictive capacity of inhibin B and were suitable for data extraction and meta-analysis (Balasch et al., 1996; Seifer et al., 1997; Hall et al., 1999; Creus et al., 2000; Fabregues et al., 2000; Penarrubia et al., 2000; Bancsi et al., 2002a; Fiçicioğlu et al., 2003; Erdem et al., 2004). Characteristics of the included studies are listed in Table IX (see addendum). Definitions of poor response, study quality and design characteristics clearly varied among the studies, but logistic regression analysis showed that none of these items significantly affected the predictive performance of the test. Subgroup analysis was therefore not indicated.
| Author | Consecutive | One cycle per couple | Data per | Definition of poor response/cancellation | Definition of pregnancy | Inhibin B assay |
|---|---|---|---|---|---|---|
| Balasch et al. | Yes | Yes | Cycle | <3 follicles 14 mm | Not applicable | Immunoenzymometric assay (Medgenix) |
| Seifer et al. | Yes | No | Patient | <4 follicles 15 mm | Clinical | ELISA (Serotec Ltd, UK) |
| Hall et al. | No | No | Cycle | Not stated | Clinical | ELISA (Serotec) |
| Creus et al. | Yes | Yes | Cycle | <3 follicles 14 mm | Not applicable | Enzyme-linked immunosorbent (Serotec) |
| Fabregues et al. | Yes | Yes | Cycle | <3 follicles 14 mm | Not applicable | Immunoenzymatic (Medgenix) |
| Penarrubia et al. | Yes | Yes | Cycle | <3 follicles 14 mm | Not applicable | Immunoenzymometric (Immuno 1; Bayer) |
| Bancsi et al. | Yes | Yes | Cycle | <4 oocytes or <3 follicles 18 mm | Ongoing | Immuno-enzymometric (Serotec) |
| Fiçicioğlu et al. | No | Yes | Cycle | <5 oocytes | Not applicable | ELISA (Serotec) |
| Erdem et al. | Yes | No | Cycle | <5 oocytes (MII) or <3 follicles | Not applicable | Immunosorbent (Serotec) |
Accuracy of poor response prediction
The sensitivities, specificities, positive LRs and DORs for the prediction of poor ovarian response, as calculated from each study, are summarized in Table X (see addendum). Calculating a single summary point estimate for sensitivity and specificity was not meaningful, as both test characteristics, plotted in Figure 8, were heterogeneous among studies (χ² test: P < 0.001 for sensitivity and P = 0.002 for specificity). The Spearman correlation between sensitivity and specificity was strong enough to allow estimation of a summary ROC curve (R = −0.93; Figure 8). The figure shows that all studies but one lie close to the estimated ROC curve; the exception (Fiçicioğlu et al., 2003) reported clearly better accuracy. This study was of good quality but included only a small number of patients.
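This section does not spell out how the summary ROC curve was fitted. One classic approach for studies using different thresholds is the Moses-Littenberg method, sketched below as an assumption rather than as the review's own procedure: for each study, D = logit(sensitivity) − logit(1 − specificity) (the log DOR) and S = logit(sensitivity) + logit(1 − specificity) are computed, D is regressed on S by ordinary least squares, and the fitted line D = a + bS is back-transformed into a curve. The function names are hypothetical. Specificities of exactly 1.00, as reported in several FSH studies, would require a continuity correction on the underlying 2×2 counts before taking logits.

```python
import math

# Illustrative sketch of the Moses-Littenberg method; function names are hypothetical.


def logit(p: float) -> float:
    if not 0.0 < p < 1.0:
        raise ValueError("proportions must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def moses_littenberg(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Fit D = a + b * S by ordinary least squares.

    points: (sensitivity, specificity) pairs, one per study or threshold.
    """
    if len(points) < 2:
        raise ValueError("at least two points are needed")
    d_vals, s_vals = [], []
    for sensitivity, specificity in points:
        logit_tpr = logit(sensitivity)
        logit_fpr = logit(1.0 - specificity)
        d_vals.append(logit_tpr - logit_fpr)  # log diagnostic odds ratio
        s_vals.append(logit_tpr + logit_fpr)  # proxy for the test threshold
    n = len(points)
    mean_s, mean_d = sum(s_vals) / n, sum(d_vals) / n
    sxx = sum((s - mean_s) ** 2 for s in s_vals)
    if sxx == 0:
        raise ValueError("S does not vary across points; the slope is undefined")
    sxy = sum((s - mean_s) * (d - mean_d) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    a = mean_d - b * mean_s
    return a, b


def sroc_sensitivity(fpr: float, a: float, b: float) -> float:
    """Sensitivity on the summary ROC curve at a given false-positive rate."""
    if b == 1.0:
        raise ValueError("the back-transformation is undefined for b = 1")
    logit_tpr = a / (1.0 - b) + (1.0 + b) / (1.0 - b) * logit(fpr)
    return 1.0 / (1.0 + math.exp(-logit_tpr))


if __name__ == "__main__":
    # Inhibin B (sensitivity, specificity) pairs transcribed from Table X.
    inhibin_b = [(0.52, 0.80), (0.53, 0.79), (0.70, 0.63), (0.32, 0.83), (0.89, 0.29),
                 (0.33, 0.95), (0.39, 0.94), (0.81, 0.81), (0.69, 0.63)]
    a, b = moses_littenberg(inhibin_b)
    for fpr in (0.05, 0.10, 0.20, 0.40):
        print(f"1 - specificity {fpr:.2f}: sensitivity {sroc_sensitivity(fpr, a, b):.2f}")
```

The printed curve should not be expected to match Figure 8 exactly, since the review's fitting method, any weighting, and its handling of the two Bancsi et al. thresholds are not stated here.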
| Author | Cycles (n) | Inhibin B threshold (pg/ml) | Sensitivity | Specificity | LR+ | DOR | Pre-inhibin B probability of poor response (%) | Post-inhibin B probability of poor response (%) | Proportion of patients/cycles with abnormal inhibin B (%) |
|---|---|---|---|---|---|---|---|---|---|
| Balasch et al. | 120 | logistic model | 0.52 | 0.80 | 2.6 | 4.4 | 33 | 57 | 31 |
| Seifer et al. | 178 | <45 | 0.53 | 0.79 | 2.6 | 4.3 | 8 | 19 | 24 |
| Creus et al. | 120 | logistic model | 0.70 | 0.63 | 1.9 | 3.9 | 33 | 48 | 48 |
| Fabregues et al. | 80 | logistic model | 0.32 | 0.83 | 1.9 | 2.3 | 35 | 50 | 23 |
| Penarrubia et al. | 80 | logistic model | 0.89 | 0.29 | 1.3 | 3.6 | 25 | 30 | 76 |
| Bancsi et al. | 120 | <45 | 0.33 | 0.95 | 6.9 | 10 | 30 | 75 | 13 |
| | | <53.8 | 0.39 | 0.94 | 6.5 | 10.1 | 30 | 74 | 16 |
| Fiçicioğlu et al. | 58 | <56 | 0.81 | 0.81 | 4.4 | 18.0 | 43 | 77 | 45 |
| Erdem et al. | 32 | logistic model | 0.69 | 0.63 | 1.8 | 3.7 | 50 | 65 | 53 |
DOR, diagnostic odds ratio; LR+, likelihood ratio for a positive test result; NS, not stated.
If a study reported on multiple threshold values, data for all threshold values are shown.
Accuracy of non-pregnancy prediction
Three studies reported on the capacity of inhibin B to predict non-pregnancy. Sensitivities and specificities for the prediction of non-pregnancy, as calculated from each study, are summarized in Table XI. Sensitivity and specificity, as plotted in Figure 9, were heterogeneous between studies (χ2 test: P = 0.004 for sensitivity and P < 0.001 for specificity). The Spearman correlation coefficient between sensitivity and specificity was −0.94, sufficient to estimate a summary ROC curve.
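The χ2 heterogeneity statistics quoted above test whether a proportion (sensitivity among non-pregnant cycles, or specificity among pregnant cycles) differs between studies by more than sampling variation would allow. This section does not list the underlying 2 × 2 counts, so the sketch below uses hypothetical counts. It implements the Pearson χ2 test of homogeneity of k proportions (k − 1 degrees of freedom) and computes the upper-tail probability in pure Python, so no statistics library is assumed; all function names are illustrative.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Uses the power series of the regularized lower incomplete gamma function
    P(df/2, x/2); adequate for the moderate df and x of small meta-analyses.
    """
    if x <= 0.0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (s + n)
        total += term
        n += 1
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


def homogeneity_test(events: list[int], totals: list[int]) -> tuple[float, int, float]:
    """Pearson chi-square test that k studies share one underlying proportion.

    events[i]: e.g. abnormal test results among non-pregnant cycles in study i
    totals[i]: e.g. all non-pregnant cycles in study i
    Returns (chi-square statistic, degrees of freedom, P-value).
    """
    pooled = sum(events) / sum(totals)
    if not 0.0 < pooled < 1.0:
        raise ValueError("pooled proportion must lie strictly between 0 and 1")
    # For a k x 2 table the Pearson statistic reduces to this single sum.
    chi2 = sum(
        (e - n * pooled) ** 2 / (n * pooled * (1.0 - pooled))
        for e, n in zip(events, totals)
    )
    df = len(events) - 1
    return chi2, df, chi2_sf(chi2, df)


# Hypothetical counts for three studies (NOT the data of the studies in Table XI).
stat, df, p = homogeneity_test(events=[10, 30, 12], totals=[40, 40, 40])
print(f"chi2 = {stat:.1f}, df = {df}, P = {p:.2g}")
```

With only a handful of studies, this test has limited power, so a significant result, as here, is a fairly strong signal that a single pooled sensitivity would be misleading.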
| Author | Cycles (n) | Inhibin B threshold (pg/ml) | Sensitivity | Specificity | LR+ | DOR | Pre-inhibin B probability of non-pregnancy (%) | Post-inhibin B probability of non-pregnancy (%) | Proportion of patients/cycles with abnormal inhibin B (%) |
|---|---|---|---|---|---|---|---|---|---|
| Seifer et al. | 178 | <45 | 0.28 | 0.92 | 3.5 | 4.5 | 79 | 93 | 24 |
| Hall et al. | 111 | <53.8 | 0.23 | 0.74 | 0.9 | 0.8 | 39 | 36 | 25 |
| | | <76.5 | 0.60 | 0.56 | 1.4 | 1.9 | 39 | 46 | 50 |
| | | <105.3 | 0.77 | 0.25 | 1.0 | 1.1 | 39 | 39 | 76 |
| Bancsi et al. | 120 | <45 | 0.17 | 1.00 | 5.2 | 6.1 | 78 | 94 | 13 |
| | | <53.8 | 0.19 | 0.96 | 5.2 | 6.2 | 78 | 95 | 16 |
DOR, diagnostic odds ratio; LR+, likelihood ratio for a positive test result.
If a study reported on multiple threshold values, data for all threshold values are shown.
Clinical value
Based on the summary ROC curves in Figures 8 and 9, a range of positive LRs was calculated, and for each LR the pre-inhibin B-test probabilities of poor response (20%) and non-pregnancy (80%) were converted into post-inhibin B-test probabilities. Table XII shows, for each LR range, how often an inhibin B test result falls in that range and the corresponding post-test probabilities of poor response and non-pregnancy. Even at a modest LR of 4–5, the post-inhibin B-test probability of poor response lies only between 50 and 56%, and such test results are very rare (0.2% of results).
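The conversion from pre-test to post-test probability behind Table XII is Bayes' theorem in odds form: post-test odds = pre-test odds × LR, with odds = p/(1 − p). A minimal Python sketch (the function name is illustrative) that reproduces the boundaries of the LR ranges in Table XII:

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Pre-test probabilities used for Table XII: 20% poor response, 80% non-pregnancy.
for lr in (1, 2, 3, 4, 5, 6, 7, 8):
    poor = 100 * post_test_probability(0.20, lr)
    nonpreg = 100 * post_test_probability(0.80, lr)
    print(f"LR {lr}: poor response {poor:.0f}%, non-pregnancy {nonpreg:.1f}%")
```

An LR of 4 turns a 20% pre-test probability of poor response into exactly 50%, and an LR of 5 into about 56%; for non-pregnancy, LRs of 4–5 move 80% to roughly 94–95%, which leaves a post-test chance of pregnancy of about 5–6%.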
| LR range | Poor response (pre-test probability 20%): test results in this range (%) | Poor response: post-test probability (%) | Non-pregnancy (pre-test probability 80%): test results in this range (%) | Non-pregnancy: post-test probability (%) |
|---|---|---|---|---|
| 0–1 | 60 | <20 | 79 | <80 |
| 1–2 | 22 | 20–33 | 13 | 80–89 |
| 2–3 | 10 | 33–43 | 4 | 89–93 |
| 3–4 | 7.8 | 43–50 | 2 | 93–94 |
| 4–5 | 0.2 | 50–56 | 1 | 94–95 |
| 5–6 | 0 | 56–60 | 1 | 95–96 |
| 6–7 | 0 | 60–64 | 0 | 96–96.5 |
| 7–8 | 0 | 64–67 | 0 | 96.5–97 |
| >8 | 0 | >67 | 0 | >97 |
For the prediction of non-pregnancy, extreme threshold levels are needed to obtain even a modest positive LR of ∼4–5, which corresponds to a post-test probability of pregnancy of approximately 5%. Such abnormal test results occur in only a very small proportion of patients, and the false-positive results would lead to unnecessary exclusion of women from IVF programs if the test were used diagnostically.
In regularly cycling women, basal inhibin B predicts poor response and non-pregnancy with only modest accuracy, and only at very low threshold levels. At best, the test may serve as a screening test for counselling purposes or to direct further diagnostic steps, such as a first IVF attempt to observe the response to ovarian stimulation. Used in this way, it may well be inferior to other tests discussed in this review.
Basal estradiol
Systematic review
We identified 11 studies that reported on the predictive capacity of basal estradiol and were suitable for data extraction and meta-analysis (Licciardi et al., 1995; Smotrich et al., 1995; Evers et al., 1998; Vazquez et al., 1998; Hall et al., 1999; Frattarelli et al., 2000; Penarrubia et al., 2000; Phophong et al., 2000; Mikkelsen et al., 2001; Ranieri et al., 2001; Bancsi et al., 2002a). Characteristics of the included studies are listed in Table XIII (see addendum). Again, the definitions of poor response, study quality and design characteristics clearly varied, but logistic regression analysis showed that none of these items significantly affected the predictive performance of the test. Subgroup analysis was therefore not indicated.
| Author | Consecutive | One cycle per couple | Data per | Definition of poor response/cancellation | Definition of pregnancy | Estradiol assay |
|---|---|---|---|---|---|---|
| Smotrich et al. | No | No | Cycle | <2 follicles 16 mm | Clinical | RIA (Diag. Prod., USA) |
| Licciardi et al. | No | Not stated | Retrieval | Not stated | Ongoing | RIA (Pantex, Santa Monica, CA) |
| Ranieri et al. | No | Yes | Cycle | <5 follicles 15 mm | Not stated | RIA (Amersham Int., UK) |
| Evers et al. | Yes | Yes | Cycle | <4 follicles 15 mm | Clinical | RIA (Diag. Prod., USA) |
| Vazquez et al. | | | | | Clinical | |
| Hall et al. | No | No | Patient | Not stated | Clinical | Enzyme immunoassay (Abbott Lab., USA) |
| Frattarelli et al. | Yes | Yes | Cycle | <3 follicles | Clinical | Immulite immunoassay (Diag. Prod., USA) |
| Phophong et al. | Yes | Yes | Cycle | <3 follicles 15 mm | Clinical | RIA (Amersham Int., UK) |
| Penarrubia et al. | Yes | Yes | Cycle | <3 follicles 14 mm | Not stated | Immunoenzymometric (Immuno 1; Bayer) |
| Mikkelsen et al. | Yes | No | Retrieval | Not stated | Clinical | Autoanalyser (Immuno 1; Bayer, Denmark) |
| Bancsi et al. | Yes | Yes | Cycle | <4 oocytes or <3 follicles 18 mm | Ongoing | AxSYM immunoanalyser (Abbott Lab., USA) |
Accuracy of poor response prediction
Eight studies reported on the prediction of poor response. The sensitivities, specificities, positive LRs and DORs for the prediction of poor ovarian response, as calculated from each study, are summarized in Table XIV. A single summary point estimate of sensitivity and specificity was not meaningful, because both test characteristics, as plotted in Figure 10, were heterogeneous among studies (χ2 test: P < 0.001 for sensitivity and P = 0.002 for specificity). The Spearman correlation coefficient between sensitivity and specificity was only −0.50. As Figure 10 shows, this weak correlation may be explained by three outlying data points, derived from the studies of Smotrich et al. and Ranieri et al. Neither clinical nor methodological considerations provided a clear explanation for these outliers. After exclusion of the three outlying data points, sensitivity and specificity were very strongly correlated (−0.94). Figure 10 therefore shows two estimates of the summary ROC curve: one constructed from all data and one constructed after exclusion of the two studies with outlying data.
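The Spearman coefficient used here is the Pearson correlation of the ranks of sensitivity and specificity across data points; a strongly negative value suggests that studies mainly trade sensitivity for specificity by choosing different thresholds. The pure-Python sketch below (average ranks for ties) applies it to the poor-response data points of Table XIV, with and without the three points from Smotrich et al. and Ranieri et al. Exactly which data points entered the review's own calculation is not stated in this section, so the printed coefficients are illustrative and need not equal the reported −0.50 and −0.94. All function names are illustrative.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# (study, sensitivity, specificity) for every threshold listed in Table XIV.
points = [
    ("Smotrich", 0.83, 0.92), ("Smotrich", 0.83, 0.97), ("Ranieri", 0.79, 0.81),
    ("Evers", 0.26, 0.96),
    ("Vazquez", 0.64, 0.38), ("Vazquez", 0.27, 0.71), ("Vazquez", 0.09, 0.88),
    ("Vazquez", 0.05, 0.94),
    ("Frattarelli", 0.76, 0.13), ("Frattarelli", 0.34, 0.56), ("Frattarelli", 0.14, 0.88),
    ("Frattarelli", 0.06, 0.97), ("Frattarelli", 0.03, 0.98),
    ("Phophong", 0.12, 0.86), ("Penarrubia", 0.70, 0.32),
    ("Bancsi", 0.31, 0.74), ("Bancsi", 0.22, 0.92),
]


def rho(rows: list[tuple[str, float, float]]) -> float:
    return spearman([r[1] for r in rows], [r[2] for r in rows])


kept = [r for r in points if r[0] not in {"Smotrich", "Ranieri"}]
print(f"all data points:   rho = {rho(points):.2f}")
print(f"outliers excluded: rho = {rho(kept):.2f}")
```

Because Spearman's coefficient uses ranks only, a few points with both high sensitivity and high specificity are enough to weaken the correlation markedly, which is the pattern described for Figure 10.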
| Author | Cycles (n) | Estradiol threshold (pmol/l) | Sensitivity | Specificity | LR+ | DOR | Pre-estradiol probability of poor response (%) | Post-estradiol probability of poor response (%) | Proportion of patients/cycles with abnormal estradiol (%) |
|---|---|---|---|---|---|---|---|---|---|
| Smotrich et al. | 292 | >294 | 0.83 | 0.92 | 10.8 | 60.0 | 2 | 19 | 9 |
| | | >367 | 0.83 | 0.97 | 23.8 | 138.0 | 2 | 33 | 5 |
| Ranieri et al. | 177 | >350 | 0.79 | 0.81 | 4.1 | 15.8 | 27 | 60 | 36 |
| Evers et al. | 213 | >220 | 0.26 | 0.96 | 6.5 | 8.5 | 16 | 56 | 8 |
| Vazquez et al. | 248 | >92 | 0.64 | 0.38 | 1.0 | 1.1 | 9 | 9 | 62 |
| | | >184 | 0.27 | 0.71 | 0.9 | 0.9 | 9 | 8 | 29 |
| | | >275 | 0.09 | 0.88 | 0.7 | 0.7 | 9 | 7 | 12 |
| | | >367 | 0.05 | 0.94 | 0.7 | 0.7 | 9 | 7 | 6 |
| Frattarelli et al. | 2476 | >73 | 0.76 | 0.13 | 0.9 | 0.5 | 14 | 12 | 86 |
| | | >147 | 0.34 | 0.56 | 0.8 | 0.7 | 14 | 11 | 43 |
| | | >220 | 0.14 | 0.88 | 1.1 | 1.2 | 14 | 15 | 13 |
| | | >294 | 0.06 | 0.97 | 1.95 | 2.0 | 14 | 24 | 4 |
| | | >367 | 0.03 | 0.98 | 2.2 | 2.2 | 14 | 26 | 2 |
| Phophong et al. | 305 | >250 | 0.12 | 0.86 | 0.8 | 0.8 | 9 | 7 | 14 |
| Penarrubia et al. | 80 | logistic model | 0.70 | 0.32 | 1.0 | 1.1 | 25 | 25 | 69 |
| Bancsi et al. | 120 | >200 | 0.31 | 0.74 | 1.2 | 1.2 | 30 | 33 | 28 |
| | | >250 | 0.22 | 0.92 | 2.7 | 3.1 | 30 | 53 | 13 |
DOR, diagnostic odds ratio; LR+, likelihood ratio for a positive test result.
If a study reported on multiple threshold values, data for all threshold values are shown.
Accuracy of non-pregnancy prediction
Nine studies reported on the capacity of basal estradiol to predict non-pregnancy after IVF. Sensitivities and specificities for the prediction of non-pregnancy, as calculated from each study, are summarized in Table XV. Again, sensitivity and specificity, as plotted in Figure 11, were heterogeneous between studies (χ2 test: P < 0.001 for both sensitivity and specificity). The Spearman correlation coefficient between sensitivity and specificity was −0.89, sufficient to estimate a summary ROC curve (Figure 11). This summary ROC curve lies almost on the line x = y, indicating that basal estradiol has virtually no capacity to discriminate between cycles that will and will not result in pregnancy.
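This section does not state how the summary ROC curves were fitted. One widely used option, assumed here purely for illustration, is the Moses–Littenberg method: for each data point compute D = logit(TPR) − logit(FPR) (the log DOR) and S = logit(TPR) + logit(FPR), fit D = a + bS by unweighted least squares, and back-transform to logit(TPR) = (a + (1 + b) logit(FPR))/(1 − b). With a ≈ 0 and b ≈ 0 the curve coincides with x = y, i.e. no discrimination. Proportions of exactly 0 or 1 (such as the specificities of 1.00 in Table XV) give infinite logits, so the sketch clips them to an arbitrary [0.005, 0.995] range; with the original 2 × 2 counts, adding 0.5 to each cell would be the more usual correction. Function names are illustrative.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def fit_moses_littenberg(points: list[tuple[float, float]], eps: float = 0.005) -> tuple[float, float]:
    """Fit D = a + b*S by unweighted least squares (Moses-Littenberg SROC).

    points: (sensitivity, specificity) pairs; TPR = sensitivity, FPR = 1 - specificity.
    eps: clipping bound so that proportions of 0 or 1 give finite logits (an assumption).
    """
    d_vals, s_vals = [], []
    for sens, spec in points:
        tpr = min(max(sens, eps), 1.0 - eps)
        fpr = min(max(1.0 - spec, eps), 1.0 - eps)
        d_vals.append(logit(tpr) - logit(fpr))
        s_vals.append(logit(tpr) + logit(fpr))
    n = len(points)
    s_mean, d_mean = sum(s_vals) / n, sum(d_vals) / n
    sxx = sum((s - s_mean) ** 2 for s in s_vals)
    if sxx == 0.0:
        raise ValueError("S does not vary across points; slope is undefined")
    b = sum((s - s_mean) * (d - d_mean) for s, d in zip(s_vals, d_vals)) / sxx
    return d_mean - b * s_mean, b


def sroc_sensitivity(fpr: float, a: float, b: float) -> float:
    """Summary ROC curve: logit(TPR) = (a + (1 + b) * logit(FPR)) / (1 - b)."""
    if b >= 1.0:
        raise ValueError("back-transformation requires b < 1")
    return 1.0 / (1.0 + math.exp(-(a + (1.0 + b) * logit(fpr)) / (1.0 - b)))


# (sensitivity, specificity) for every threshold in Table XV (basal estradiol, non-pregnancy).
table_xv = [
    (0.12, 0.96), (0.08, 1.00),                              # Smotrich et al.
    (0.76, 0.37), (0.42, 0.69), (0.20, 0.87), (0.08, 1.00),  # Licciardi et al.
    (0.09, 1.00),                                            # Evers et al.
    (0.60, 0.33), (0.29, 0.72), (0.11, 0.85), (0.05, 0.91),  # Vazquez et al.
    (0.71, 0.25), (0.47, 0.49), (0.20, 0.72),                # Hall et al.
    (0.84, 0.12), (0.41, 0.55), (0.12, 0.87), (0.04, 0.97), (0.02, 0.99),  # Frattarelli et al.
    (0.13, 0.83), (0.22, 1.00),                              # Phophong et al., Mikkelsen et al.
    (0.27, 0.70), (0.12, 0.85),                              # Bancsi et al.
]
a, b = fit_moses_littenberg(table_xv)
print(f"a = {a:.2f} (DOR at S = 0: {math.exp(a):.2f}), b = {b:.2f}")
for fpr in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"FPR {fpr:.1f} -> summary sensitivity {sroc_sensitivity(fpr, a, b):.2f}")
```

Because the clipped points with a specificity of 1.00 receive large values of D, the fitted curve is sensitive to the choice of eps; this is one reason why weighted or hierarchical SROC models are often preferred.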
| Author | Cycles (n) | Estradiol threshold (pmol/l) | Sensitivity | Specificity | LR+ | DOR | Pre-estradiol probability of non-pregnancy (%) | Post-estradiol probability of non-pregnancy (%) | Proportion of patients/cycles with abnormal estradiol (%) |
|---|---|---|---|---|---|---|---|---|---|
| Smotrich et al. | 292 | >294 | 0.12 | 0.96 | 3.1 | 3.4 | 65 | 85 | 9 |
| | | >367 | 0.08 | 1.00 | 8.7 | 9.4 | 65 | 94 | 5 |
| Licciardi et al. | 452 | >110 | 0.76 | 0.37 | 1.2 | 1.9 | 81 | 84 | 73 |
| | | >165 | 0.42 | 0.69 | 1.3 | 1.6 | 81 | 85 | 40 |
| | | >220 | 0.20 | 0.87 | 1.6 | 1.8 | 81 | 87 | 19 |
| | | >275 | 0.08 | 1.00 | 7.4 | 8.0 | 81 | 97 | 7 |
| Evers et al. | 213 | >220 | 0.09 | 1.00 | 3.2 | 3.4 | 85 | 94 | 8 |
| Vazquez et al. | 248 | >92 | 0.60 | 0.33 | 0.9 | 0.8 | 70 | 67 | 62 |
| | | >184 | 0.29 | 0.72 | 1.0 | 1.0 | 70 | 70 | 29 |
| | | >275 | 0.11 | 0.85 | 0.7 | 0.7 | 70 | 63 | 12 |
| | | >367 | 0.05 | 0.91 | 0.5 | 0.5 | 70 | 53 | 6 |
| Hall et al. | 120 | >108 | 0.71 | 0.25 | 0.95 | 0.8 | 38 | 36 | 73 |
| | | >136 | 0.47 | 0.49 | 0.92 | 0.9 | 38 | 36 | 49 |
| | | >167 | 0.20 | 0.72 | 0.7 | 0.6 | 38 | 30 | 25 |
| Frattarelli et al. | 2476 | >73 | 0.84 | 0.12 | 0.96 | 0.8 | 54 | 53 | 86 |
| | | >147 | 0.41 | 0.55 | 0.9 | 0.9 | 54 | 52 | 43 |
| | | >220 | 0.12 | 0.87 | 1.0 | 0.99 | 54 | 54 | 13 |
| | | >294 | 0.04 | 0.97 | 1.3 | 1.3 | 54 | 61 | 4 |
| | | >367 | 0.02 | 0.99 | 1.9 | 1.95 | 54 | 70 | 2 |
| Phophong et al. | 305 | >250 | 0.13 | 0.83 | 0.8 | 0.8 | 77 | 72 | 14 |
| Mikkelsen et al. | 132 | >200 | 0.22 | 1.00 | 3.9 | 4.7 | 89 | 96 | 20 |
| Bancsi et al. | 120 | >200 | 0.27 | 0.70 | 0.9 | 0.9 | 78 | 76 | 28 |
| | | >250 | 0.12 | 0.85 | 0.8 | 0.8 | 78 | 73 | 13 |
DOR, diagnostic odds ratio; LR+, likelihood ratio for a positive test result.
If a study reported on multiple threshold values, data for all threshold values are shown.
Clinical value
Based on the two summary ROC curves for all studies depicted in Figure 10, a range of positive LRs was calculated, and for each ratio the pre-estradiol-test probabilities of poor response (20%) and non-pregnancy (80%) were converted into post-estradiol-test probabilities. Table XVI (see addendum) shows, for the prediction of poor response and of non-pregnancy, the probability of obtaining an estradiol test result within each LR range, together with the corresponding post-test probability. Even at a moderate LR of 4–5, the post-estradiol-test probability of poor response rises only to ∼50–56%, while the chance of obtaining such a test result is very small (∼1%).
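The conversion from pre-test to post-test probability used here follows the odds form of Bayes' theorem: post-test odds = pre-test odds × LR. The minimal Python sketch below (the helper name `post_test_probability` is our own) approximately reproduces the range boundaries of the post-test columns; for example, an LR of 4 turns a 20% pre-test probability of poor response into 50%, and an LR of 5 into ∼56%.

```python
def post_test_probability(pre_test: float, lr: float) -> float:
    """Convert a pre-test probability into a post-test probability.

    Odds form of Bayes' theorem: post-test odds = pre-test odds * LR.
    """
    if not 0.0 < pre_test < 1.0:
        raise ValueError("pre_test must lie strictly between 0 and 1")
    if lr < 0.0:
        raise ValueError("lr must be non-negative")
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # Boundaries of the LR ranges at the two pre-test probabilities assumed in
    # the text: 20% for poor response and 80% for non-pregnancy.
    for lr in (1, 2, 3, 4, 5, 6, 7, 8):
        poor = post_test_probability(0.20, lr)
        non_preg = post_test_probability(0.80, lr)
        print(f"LR {lr}: poor response {poor:.0%}, non-pregnancy {non_preg:.1%}")
```

Because the pre-test odds for non-pregnancy are already 4:1, even large LRs move that probability by only a few percentage points, which is why the non-pregnancy columns saturate near 95–97%.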
| LR range | Poor response (pre-test probability 20%): occurrence of test results in this range (%) | Post-test probability of poor response (%) | Non-pregnancy (pre-test probability 80%): occurrence of test results in this range (%) | Post-test probability of non-pregnancy (%) |
|---|---|---|---|---|
| 0–1 | 83 | <20 | 82 | <80 |
| 1–2 | 12 | 20–33 | 17 | 80–89 |
| 2–3 | 3 | 33–43 | 1 | 89–93 |
| 3–4 | 1 | 43–50 | 0 | 93–94 |
| 4–5 | 1 | 50–56 | 0 | 94–95 |
| 5–6 | 0 | 56–60 | 0 | 95–96 |
| 6–7 | 0 | 60–64 | 0 | 96–96.5 |
| 7–8 | 0 | 64–67 | 0 | 96.5–97 |
| >8 | 0 | >67 | 0 | >97 |
For the prediction of non-pregnancy, no threshold level of basal estradiol can be identified that yields an adequate combination of LR, post-test probability and abnormal test rate. This could be anticipated from the shape of the ROC curve in Figure 11.
All this leads to the conclusion that the very low predictive accuracy of basal estradiol, for both poor response and non-pregnancy, precludes its clinical use as a test before starting IVF.
AFC
Systematic review
Through the search and selection strategy, a total of 15 studies reporting on the predictive capacity of basal AFC and suitable for data extraction and meta-analysis were identified (Chang et al., 1998b; Frattarelli et al., 2000; Ng et al., 2000; Sharara and McClamrock, 2000; Hsieh et al., 2001; Nahum et al., 2001; Bancsi et al., 2002a; Erdem et al., 2002; Fisch and Sher, 2002; Fiçicioğlu et al., 2003; Frattarelli et al., 2003; Jarvela et al., 2003; Kupesic et al., 2003; Yong et al., 2003; Durmusoglu et al., 2004). Characteristics of the included studies are listed in addendum Table XVII. The definitions of poor response, as well as study quality and design characteristics, clearly varied among studies, but logistic regression analysis revealed that none of these items significantly affected the predictive performance of the test. Subgroup analysis was therefore not indicated.
| Author | Consecutive | One cycle per couple | Data per | Definition of poor response/cancellation | Definition of pregnancy | Diameter of counted follicles (mm) | Ultrasound equipment |
|---|---|---|---|---|---|---|---|
| Chang et al. | Yes | No | Cycle | <2 follicles 18 mm | Ongoing | 2–5 | Acuson 120XP/10: 7 MHz probe |
| Ng et al. | Yes | Yes | Cycle | <3 follicles 15 mm | Clinical | Not stated | Aloka SSD-620: 5 MHz probe |
| Frattarelli et al. (2000) | No | Yes | Cycle | <3 follicles | Not stated | 2–10 | Acuson 128: 7 MHz probe |
| Sharara et al. | Yes | No | Cycle | Not stated | Clinical | 2–8 | Not stated |
| Hsieh et al. | Yes | No | Cycle | No oocytes or poor follicle growth | Clinical | 2–10 | Acuson Aspen: 4 MHz probe |
| Nahum et al. | Yes | No | Cycle | <3 follicles 18 mm | Clinical | 2–6 | General Electric RT-X200: 6.5 MHz probe |
| Fisch et al. | Yes | Yes | Cycle | Not stated | Clinical | Not stated | Not stated |
| Bancsi et al. | Yes | Yes | Cycle | <4 oocytes or <3 follicles 18 mm | Clinical/ongoing | 2–5 | Toshiba Capasee SSA-220A: 7.5 MHz probe |
| Frattarelli et al. (2003) | Yes | Yes | Cycle | <3 follicles | Not stated | 2–10 | Acuson 128: 7 MHz probe |
| Järvelä et al. | Yes | Yes | Cycle | <4 follicles | Clinical | 2–5 | Kretz Combison 530D |
| Kupesic et al. | Yes | Yes | Cycle | Not stated | Clinical | Not stated | Combison 530D: 7.5 MHz probe |
| Yong et al. | No | Yes | Cycle | <4 oocytes or cancel | Clinical | 2–10 | Toshiba Eccocee: 7 MHz probe |
| Fiçicioğlu et al. | Yes | Yes | Cycle | <5 oocytes | Not stated | ≥2 | General Electric Alfa Logic 200: 5 MHz probe |
| Erdem et al. | Yes | Yes | Cycle | <3 follicles 14 mm or <5 oocytes (MII) | Not stated | Not stated | Aloka SSD-1000: 5 MHz probe |
| Durmusoglu et al. | No | No | Cycle | Poor follicle growth or <3 oocytes (MII) | Not stated | 2–10 | GE Logiq200: 6.5 MHz probe |
Accuracy of poor response prediction
The sensitivities and specificities, the positive LR and the DOR for the prediction of poor ovarian response, as calculated from each study, are summarized in Table XVIII. Calculating a single summary point estimate of sensitivity and specificity was not meaningful, because both test characteristics, plotted in Figure 12, were heterogeneous among studies (χ2 test: P-value for sensitivity 0.001 and P-value for specificity 0.001). The Spearman correlation coefficient between sensitivity and specificity was −0.57, which was judged sufficient to estimate a summary ROC curve (Figure 12).
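The χ2 heterogeneity test mentioned here compares each study's proportion (for example, its sensitivity) with the proportion pooled over all studies. The sketch below is a minimal standard-library version of a Pearson χ2 test of homogeneity; it assumes per-study counts are available (true positives and number of poor responders), whereas the tables in this review report only proportions, so the function name and the example counts are hypothetical.

```python
from typing import Sequence, Tuple


def heterogeneity_chi2(events: Sequence[int], totals: Sequence[int]) -> Tuple[float, int]:
    """Pearson chi-square statistic for homogeneity of proportions across studies.

    events[i] is, e.g., the number of true positives in study i and totals[i]
    the number of poor responders, so events[i] / totals[i] is that study's
    sensitivity. Returns (statistic, degrees of freedom = k - 1).
    """
    if len(events) != len(totals) or len(events) < 2:
        raise ValueError("need at least two studies with matching lengths")
    for x, n in zip(events, totals):
        if n <= 0 or not 0 <= x <= n:
            raise ValueError("each total must be positive and 0 <= events <= total")
    pooled = sum(events) / sum(totals)
    if pooled in (0.0, 1.0):
        # Every study at 0% or 100%: there is no between-study variation.
        return 0.0, len(events) - 1
    stat = 0.0
    for x, n in zip(events, totals):
        expected_pos = n * pooled          # expected events under homogeneity
        expected_neg = n * (1.0 - pooled)  # expected non-events
        stat += (x - expected_pos) ** 2 / expected_pos
        stat += ((n - x) - expected_neg) ** 2 / expected_neg
    return stat, len(events) - 1
```

With k studies, the statistic is referred to a χ2 distribution with k − 1 degrees of freedom (for df = 1 the 5% critical value is 3.84); if SciPy is available, `scipy.stats.chi2.sf(stat, df)` returns the P-value. For instance, two hypothetical studies with 10/40 and 30/40 true positives give a statistic of 20 on 1 df.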
| Author | Cycles (n) | AFC threshold value (n) | Sensitivity | Specificity | LR+ | DOR | Pre-AFC probability of poor response (%) | Post-AFC probability of poor response (%) | Proportion of patients/cycles with abnormal AFC (%) |
|---|---|---|---|---|---|---|---|---|---|
| Chang et al. | 149 | <3 | 0.73 | 0.96 | 19.7 | 65 | 10 | 69 | 11 |
| Ng et al. | 128 | <4 | 0.33 | 0.92 | 4.2 | 5.7 | 2 | 9 | 9 |
| | | <6 | 0.80 | 0.76 | 3.3 | 13 | 2 | 11 | 27 |
| | | <9 | 0.80 | 0.40 | 1.3 | 2.7 | 2 | 5 | 61 |
| Frattarelli et al. (2000) | 278 | <10 | 0.87 | 0.41 | 1.5 | 4.7 | 8 | 12 | 61 |
| Sharara et al. | 127 | <4 | 0.53 | 0.73 | 1.9 | 3.0 | 15 | 26 | 31 |
| Hsieh et al. | 372 | <3 | 0.61 | 0.94 | 10.0 | 23 | 5 | 34 | 9 |
| Nahum et al. | 272 | <6 | 0.95 | 0.69 | 3.1 | 42 | 14 | 33 | 39 |
| Bancsi et al. | 120 | <4 | 0.61 | 0.88 | 5.1 | 12 | 30 | 69 | 27 |
| | | <6 | 0.81 | 0.77 | 3.6 | 14 | 30 | 60 | 40 |
| Frattarelli et al. (2003) | 267 | <4 | 0.30 | 0.96 | 7.4 | 10 | 9 | 41 | 6 |
| Järvelä et al. | 45 | <4 | 0.86 | 0.84 | 5.4 | 32 | 16 | 50 | 27 |
| Yong et al. | 47 | <4 | 0.09 | 0.97 | 3.3 | 3.2 | 23 | 50 | 4 |
| | | <6 | 0.36 | 0.89 | 3.3 | 4.6 | 23 | 50 | 17 |
| Fiçicioğlu et al. | 58 | <7 | 0.77 | 0.41 | 1.3 | 2.3 | 43 | 50 | 66 |
| Erdem et al. | 32 | logistic model | 0.75 | 0.63 | 2.0 | 5.1 | 50 | 67 | 56 |
| Durmusoglu et al. | 91 | <6.5 | 0.85 | 0.74 | 3.3 | 16 | 26 | 53 | 41 |
DOR, diagnostic odds ratio; LR+, likelihood ratio for a positive test result.
If a study reported on multiple threshold values, data for all threshold values are shown.
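The LR+ and DOR columns follow directly from sensitivity and specificity: LR+ = sensitivity / (1 − specificity) and DOR = [sensitivity / (1 − sensitivity)] / [(1 − specificity) / specificity]. The minimal sketch below (helper names are our own) recomputes them; because the tables report sensitivity and specificity rounded to two decimals, recomputed values can differ slightly from the tabulated ones (e.g. for Chang et al.).

```python
def positive_lr(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    if specificity >= 1.0:
        return float("inf")  # no false positives: LR+ is unbounded
    return sensitivity / (1.0 - specificity)


def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """DOR = [sens / (1 - sens)] / [(1 - spec) / spec], i.e. LR+ / LR-."""
    if sensitivity <= 0.0 or specificity <= 0.0:
        return 0.0  # no true positives or no true negatives
    if sensitivity >= 1.0 or specificity >= 1.0:
        return float("inf")  # an empty error cell makes the ratio unbounded
    return (sensitivity / (1.0 - sensitivity)) / ((1.0 - specificity) / specificity)


if __name__ == "__main__":
    # Järvelä et al., AFC threshold < 4 (sensitivity 0.86, specificity 0.84).
    print(round(positive_lr(0.86, 0.84), 1), round(diagnostic_odds_ratio(0.86, 0.84)))
```

For the Järvelä et al. row this gives 5.4 and 32, matching Table XVIII; combining that LR+ with the 16% pre-test probability via the odds form of Bayes' theorem gives a post-test probability of ∼50%, as tabulated.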
Accuracy of non-pregnancy prediction
Sensitivities and specificities for the prediction of non-pregnancy, as calculated from each study, are summarized in Table XIX. Again, sensitivity and specificity as plotted in Figure 13 were heterogeneous between studies (χ2-test statistic: P-value for sensitivity 0.001 and P-value for specificity 0.001). The Spearman correlation between sensitivity and specificity showed a coefficient of −0.66, sufficient to estimate a summary ROC curve (Figure 13).
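A negative Spearman correlation between sensitivity and specificity suggests that the between-study variation reflects a threshold effect, i.e. studies lying along a common ROC curve, which is the premise for fitting a summary ROC curve. The stdlib sketch below computes Spearman's rho as the Pearson correlation of tie-averaged ranks; the function names are our own, and applying it to the rows of Table XIX (which include several thresholds per study) need not reproduce the reported −0.66.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..n; tied values receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / (vx * vy) ** 0.5
```

Usage would be `spearman(sensitivities, specificities)` with one (sensitivity, specificity) pair per study or threshold.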
| Author | Cycles (n) | AFC threshold value (n) | Sensitivity | Specificity | LR+ | DOR | Pre-AFC probability of non-pregnancy (%) | Post-AFC probability of non-pregnancy (%) | Proportion of patients/cycles with abnormal AFC (%) |
|---|---|---|---|---|---|---|---|---|---|
| Chang et al. | 149 | <3 | 0.13 | 0.96 | 3.6 | 3.6 | 83 | 94 | 11 |
| Ng et al. | 128 | <4 | 0.07 | 0.83 | 0.4 | 0.4 | 86 | 73 | 9 |
| | | <6 | 0.26 | 0.78 | 1.2 | 1.2 | 86 | 88 | 26 |
| | | <9 | 0.60 | 0.33 | 0.9 | 0.7 | 86 | 61 | 85 |
| Sharara et al. | 127 | <4 | 0.27 | 0.64 | 0.8 | 0.7 | 56 | 49 | 31 |
| Hsieh et al. | 372 | <3 | 0.12 | 0.98 | 6.9 | 6.7 | 68 | 94 | 9 |
| Nahum et al. | 272 | <6 | 0.54 | 0.87 | 4.0 | 7.9 | 64 | 88 | 39 |
| Fisch et al. | 200 | <10 | 0.24 | 0.89 | 2.2 | 2.6 | 59 | 76 | 19 |
| Bancsi et al. | 107 | <4 | 0.34 | 0.88 | 2.9 | 3.8 | 68 | 86 | 27 |
| | | <6 | 0.45 | 0.68 | 1.4 | 1.7 | 68 | 75 | 41 |
| Järvelä et al. | 45 | <4 | 0.26 | 0.71 | 0.9 | 0.9 | 69 | 67 | 27 |
| Kupesic et al. | 56 | <4 | 0.33 | 0.96 | 8.3 | 11.8 | 61 | 92 | 22 |
| Yong et al. | 47 | <4 | 0.08 | 0.92 | 0.9 | 1.0 | 76 | 75 | 9 |
| | | <6 | 0.16 | 0.90 | 1.6 | 1.7 | 79 | 86 | 27 |
DOR, diagnostic odds ratio; LR+, likelihood ratio for a positive test result.
If a study reported on multiple threshold values, data for all threshold values are shown.
Clinical value
Based on the summary ROC curves depicted in Figures 12 and 13, a range of positive LRs was calculated, and for each ratio the pre-AFC-test probabilities of poor response and non-pregnancy were converted into post-AFC-test probabilities. Table XX shows, for the prediction of poor response and of non-pregnancy, the probability of obtaining an AFC test result within each LR range, together with the corresponding post-test probability. At the maximum positive LR of ∼8, the post-AFC-test probability of poor response approximates 70%, if the pre-AFC-test probability is assumed to be as high as 20%. Because ∼12% of AFC results fall in this LR range (Table XX), the AFC can be considered a clinically valuable test for the prediction of poor response.
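The text does not state which summary ROC model was fitted. As an assumption, the sketch below uses the Moses–Littenberg approach, a common choice: regress D = logit(TPR) − logit(FPR) on S = logit(TPR) + logit(FPR) across studies, back-transform the line into a curve, and read the LR of a test result off the slope of that curve (the slope of an ROC curve at a point equals the LR of a result at that point). The clamp constant `EPS`, the unweighted fit and all function names are our own choices, not the authors'.

```python
import math
from typing import Sequence, Tuple

EPS = 1e-3  # assumed clamp so that proportions of 0 or 1 give finite logits


def logit(p: float) -> float:
    p = min(max(p, EPS), 1.0 - EPS)
    return math.log(p / (1.0 - p))


def expit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_moses_littenberg(sens: Sequence[float], spec: Sequence[float]) -> Tuple[float, float]:
    """Unweighted least-squares fit of D = a + b * S across studies.

    D = logit(TPR) - logit(FPR) is the log diagnostic odds ratio and
    S = logit(TPR) + logit(FPR) acts as a proxy for the test threshold.
    """
    if len(sens) != len(spec) or len(sens) < 2:
        raise ValueError("need at least two (sensitivity, specificity) pairs")
    d = [logit(t) - logit(1.0 - s) for t, s in zip(sens, spec)]
    s_vals = [logit(t) + logit(1.0 - s) for t, s in zip(sens, spec)]
    n = len(d)
    mean_s, mean_d = sum(s_vals) / n, sum(d) / n
    sxx = sum((x - mean_s) ** 2 for x in s_vals)
    if sxx == 0.0:
        raise ValueError("all studies share the same S; the slope is undefined")
    sxy = sum((x - mean_s) * (y - mean_d) for x, y in zip(s_vals, d))
    b = sxy / sxx
    return mean_d - b * mean_s, b


def sroc_tpr(fpr: float, a: float, b: float) -> float:
    """Back-transform: logit(TPR) = (a + (1 + b) * logit(FPR)) / (1 - b)."""
    if not -1.0 < b < 1.0:
        raise ValueError("b must lie in (-1, 1) for an increasing SROC curve")
    return expit((a + (1.0 + b) * logit(fpr)) / (1.0 - b))


def local_lr(fpr: float, a: float, b: float, h: float = 1e-4) -> float:
    """Central-difference slope of the SROC curve at fpr.

    This slope is the LR of a test result at that point of the curve; values
    near 1 mean the result is uninformative. Intended for EPS + h < fpr < 1 - EPS - h.
    """
    return (sroc_tpr(fpr + h, a, b) - sroc_tpr(fpr - h, a, b)) / (2.0 * h)
```

One could fit the sensitivity and specificity columns of Table XVIII and then evaluate `local_lr` along the curve to locate the FPR region where LR ≈ 8; we have not checked whether this reproduces the distribution in Table XX. When a = 0 and b = 0, the fitted curve is the line of equality and `local_lr` returns ∼1 everywhere.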
| LR range | Poor response (pre-test probability 20%): occurrence of test results in this range (%) | Post-test probability of poor response (%) | Non-pregnancy (pre-test probability 80%): occurrence of test results in this range (%) | Post-test probability of non-pregnancy (%) |
|---|---|---|---|---|
| 0–1 | 68 | <20 | 77 | <80 |
| 1–2 | 10 | 20–33 | 16 | 80–89 |
| 2–3 | 4 | 33–43 | 5 | 89–93 |
| 3–4 | 6 | 43–50 | 0 | 93–94 |
| 4–5 | 0 | 50–56 | 2 | 94–95 |
| 5–6 | 0 | 56–60 | 0 | 95–96 |
| 6–7 | 0 | 60–64 | 0 | 96–96.5 |
| 7–8 | 0 | 64–67 | 0 | 96.5–97 |
| >8 | 12 | >67 | 0 | >97 |
For prediction of non-pregnancy, the extremely low AFC needed to obtain a moderate positive likelihood ratio of ∼5 (which, at a pre-test pregnancy rate of 20%, corresponds to a post-test pregnancy rate of less than 5%) occurs in only a very small proportion of patients (Table XX). Beyond the point defined by a specificity of 0.80 and a sensitivity of 0.30, the summary ROC curve runs almost parallel to the line of equality, indicating that this segment of the curve is entirely uninformative (LR ∼1).
Based on these data, it can be concluded that the accuracy of the AFC for predicting poor response in regularly cycling women is adequate at a low threshold level, but that, because abnormal test results are so rare, the AFC has hardly any clinical value for predicting pregnancy. Given, in addition, a false-positive rate of ∼5%, the test is not suitable as a diagnostic test for excluding patients on the basis of a presumed diagnosis of advanced ovarian ageing. It may, however, be used as a screening test for possible poor responders and to direct further diagnostic steps such as a first IVF attempt, in which the ovarian response to hyperstimulation will provide additional information (Hendriks et al., 2005d).
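The post-test probabilities in the LR table above follow from Bayes' theorem in odds form (post-test odds = pre-test odds × LR). As a minimal Python sketch that assumes only this relation, the hypothetical helper `post_test_probability` below reproduces the band limits for the two pre-test probabilities used in the text, 20% for poor response and 80% for non-pregnancy:

```python
def post_test_probability(pre_test: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds * LR."""
    if not 0.0 < pre_test < 1.0:
        raise ValueError("pre_test must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # Upper limits of the LR bands 0-1, 1-2, ..., 7-8 used in the table.
    for lr in range(1, 9):
        poor = post_test_probability(0.20, lr)     # poor response, pre-test 20%
        nonpreg = post_test_probability(0.80, lr)  # non-pregnancy, pre-test 80%
        print(f"LR {lr}: poor response {poor:.1%}, non-pregnancy {nonpreg:.1%}")
```

For instance, an LR of 5 at a pre-test probability of non-pregnancy of 80% gives post-test odds of 4 × 5 = 20, i.e. a post-test probability of 20/21 ≈ 95.2%, which corresponds to the post-test pregnancy rate of less than 5% mentioned above.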
OVVOL
Systematic review
For assessing the predictive value of OVVOL, the search identified 10 studies eligible for data extraction and meta-analysis. Of these, two reported solely on the prediction of poor response (Sharara and McClamrock, 1999; Fiçicioğlu et al., 2003) and eight reported on the prediction of both poor response and pregnancy (Syrop et al., 1995; Lass et al., 1997b; Frattarelli et al., 2000; Schild et al., 2001; Bancsi et al., 2002a; Järvelä et al., 2003; Kupesic et al., 2003; Erdem et al., 2004). The characteristics of the included studies are listed in addendum Table XXI. Selection bias was present in four of the 10 studies (Lass et al., 1997b; Frattarelli et al., 2000; Kupesic et al., 2003; Erdem et al., 2004): in three, patients were selected by basal FSH level (Frattarelli et al., 2000; Kupesic et al., 2003; Erdem et al., 2004), and the study by Lass et al. (1997b) included only patients aged >36 years with an FSH level <15 IU/L. Three studies showed evidence of verification bias (Järvelä et al., 2003; Kupesic et al., 2003; Erdem et al., 2004), in that a small OVVOL altered patient management through the use of higher FSH dosages.
Author | Consecutive | One cycle per couple | Data per | Definition of poor response/cancel | Definition of pregnancy | Ovarian volume (ml) definition | Ultrasonography equipment
---|---|---|---|---|---|---|---
Syrop et al. | Yes | Yes | Cycle | <2 follicles 18 mm | Clinical | Total <8.6 ml, smallest <3 ml | General Electric 3600: 5 MHz probe
Lass et al. | Yes | Yes | Cycle | <3 follicles 17 mm | Clinical | MOV <3 ml | Kretz Comb. 410: 5–7.5 MHz probe
Sharara et al. | Yes | Yes | Cycle | Poor follicle development | Not stated | MOV <3 ml | Performa: 6.5 MHz probe
Schild et al. | Yes | Yes | Cycle | Not stated | Biochemical | MOV <3 ml | Voluson 530D: 7.5 MHz probe
Bancsi et al. | Yes | Yes | Cycle | <4 oocytes or <3 follicles 18 mm | Clinical | Total <7 ml or <8.6 ml | Toshiba SSA: 7.5 MHz probe
Kupesic et al. | Yes | Yes | Cycle | Not stated | Biochemical | Total <7 ml | Combison 530 D: 7.5 MHz probe
Järvelä et al. | Yes | Yes | Cycle | <4 follicles | Clinical | MOV <7 ml or <3 ml | Kretz Comb 530
Fiçicioğlu et al. | Yes | Yes | Cycle | <5 oocytes | Not stated | Total <4.9 ml | General Electric Alfa Logic 200: 5 MHz probe
Erdem et al. | Yes | Yes | Cycle | <3 follicles 14 mm or <5 oocytes (MII) | Clinical | MOV <2.98 ml | Aloka SSD1000: 5 MHz probe
Frattarelli et al. | Yes | Yes | Cycle | <3 follicles | Biochemical | MOV <2 ml or <3 ml | Acuson 128: 7 MHz probe
MOV, mean ovarian volume.
Accuracy of poor response prediction
Sensitivities, specificities, positive LRs and DORs for the prediction of poor ovarian response are summarized in Table XXII. Homogeneity had to be rejected for both sensitivity and specificity (χ²-test: both P < 0.001), so calculating a summary point estimate for sensitivity and specificity was not meaningful. None of the recorded study characteristics had a statistically significant impact on the reported predictive performance of OVVOL. The Spearman correlation coefficient between sensitivity and specificity was −0.55, which was sufficient to estimate a summary ROC curve. As shown in ROC space in Figure 14, this curve indicates a modest overall predictive accuracy.
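The χ² test for homogeneity mentioned above asks whether the per-study sensitivities (or specificities) could share one common value. A minimal stdlib-only Python sketch of one standard form, Pearson's χ² test on a 2 × k table, is given below; the function names are hypothetical, and the per-study event counts it needs are not reported in this section, so real inputs would have to come from the original studies. The p-value uses the power series of the regularized incomplete gamma function, which is adequate for the moderate statistics and degrees of freedom that arise with a handful of studies.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of the chi-square distribution, computed as
    1 - P(df/2, stat/2), with P the regularized lower incomplete gamma
    function evaluated by its power series."""
    if df < 1:
        raise ValueError("df must be at least 1")
    if stat <= 0.0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    term = total = 1.0 / a
    denom = a
    for _ in range(10_000):
        denom += 1.0
        term *= x / denom
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def homogeneity_test(events, totals):
    """Pearson chi-square test that k proportions (e.g. per-study
    sensitivities: true positives / all poor responders) share one common
    value. Returns (statistic, degrees of freedom, p-value)."""
    if len(events) < 2 or len(events) != len(totals):
        raise ValueError("need matching counts for at least two studies")
    pooled = sum(events) / sum(totals)
    if not 0.0 < pooled < 1.0:
        raise ValueError("pooled proportion must lie strictly between 0 and 1")
    stat = sum((e - n * pooled) ** 2 / (n * pooled * (1.0 - pooled))
               for e, n in zip(events, totals))
    df = len(events) - 1
    return stat, df, chi2_sf(stat, df)
```

Rejecting homogeneity for either parameter is what rules out a single pooled (sensitivity, specificity) point and leads to a summary ROC curve instead.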
Author | Cycles (n) | Volume threshold value (ml) | Sensitivity for poor response | Specificity for poor response | LR+ | DOR | Pre-volume probability (%) | Post-volume probability (%) | Proportion of patients/cycles with abnormal volume (%)
---|---|---|---|---|---|---|---|---|---
Syrop et al. | 188 | <8.6 | 0.25 | 0.86 | 1.78 | 2.0 | 13 | 21 | 15
Syrop et al. | 188 | <3 | 0.17 | 0.91 | 1.95 | 2.1 | 13 | 22 | 10
Lass et al. | 140 | <3 | 0.45 | 0.93 | 6.75 | 11.5 | 14 | 53 | 12
Sharara et al. | 73 | <3 | 0.80 | 0.72 | 2.86 | 10.3 | 7 | 17 | 32
Schild et al. | 152 | <3 | 0.11 | 0.90 | 1.10 | 1.1 | 18 | 20 | 10
Bancsi et al. | 120 | <8.6 | 0.61 | 0.73 | 2.23 | 4.2 | 30 | 49 | 38
Bancsi et al. | 120 | <7 | 0.39 | 0.85 | 2.51 | 3.5 | 30 | 52 | 23
Kupesic et al. | 56 | <7 | 0.86 | 0.87 | 6.49 | 39.4 | 12 | 46 | 22
Järvelä et al. | 60 | <3 | 0.08 | 0.94 | 1.30 | 1.3 | 18 | 25 | 6
Järvelä et al. | 60 | <7 | 0.55 | 0.67 | 1.67 | 2.5 | 18 | 27 | 37
Fiçicioğlu et al. | 58 | <4.9 | 0.73 | 0.53 | 1.50 | 2.7 | 43 | 53 | 59
Erdem et al. | 32 | <2.98 | 0.75 | 0.81 | 4.00 | 13.0 | 50 | 80 | 47
Frattarelli et al. | 267 | <2 | 0.17 | 0.94 | 2.83 | 3.2 | 9 | 21 | 7
Frattarelli et al. | 267 | <3 | 0.35 | 0.82 | 1.89 | 1.4 | 9 | 15 | 20
DOR, diagnostic odds ratio; LR+, likelihood ratio for a positive test result.
If a study reported on multiple threshold values, data for all threshold values are shown.
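Each row of Table XXII (and of the analogous tables for the other tests) derives from a 2 × 2 cross-classification of test result against outcome. The Python sketch below shows the standard definitions of the tabulated measures; the `TwoByTwo` class and the counts in the example are hypothetical, since the review reports only the derived measures. Recomputing LR+ or DOR from the rounded sensitivities and specificities in the table may not reproduce the tabulated values exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Test result (abnormal = positive) cross-classified against the outcome
    being predicted (e.g. poor response or non-pregnancy)."""
    tp: int  # abnormal test, outcome present
    fp: int  # abnormal test, outcome absent
    fn: int  # normal test, outcome present
    tn: int  # normal test, outcome absent

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def lr_positive(self) -> float:
        # LR+ = sensitivity / (1 - specificity)
        return self.sensitivity / (1.0 - self.specificity)

    @property
    def diagnostic_odds_ratio(self) -> float:
        # DOR = (TP * TN) / (FP * FN); undefined if FP or FN is zero
        return (self.tp * self.tn) / (self.fp * self.fn)

    @property
    def pre_test_probability(self) -> float:
        return (self.tp + self.fn) / self.total

    @property
    def post_test_probability(self) -> float:
        # Probability of the outcome given an abnormal test result
        return self.tp / (self.tp + self.fp)

    @property
    def proportion_abnormal(self) -> float:
        return (self.tp + self.fp) / self.total


if __name__ == "__main__":
    example = TwoByTwo(tp=10, fp=20, fn=10, tn=160)  # hypothetical counts
    print(example.sensitivity, example.specificity,
          example.lr_positive, example.diagnostic_odds_ratio)
```

The post-test probability computed directly from the table agrees with the odds form of Bayes' theorem: here a pre-test odds of 1/9 multiplied by an LR+ of 4.5 gives odds of 0.5, i.e. a probability of 1/3.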
Accuracy of non-pregnancy prediction
For the prediction of non-pregnancy, the test characteristics of each study are summarized in Table XXIII. As with the data for ovarian response, homogeneity had to be rejected for sensitivity, whereas specificity appeared to be homogeneous (χ²-test: P = 0.11). Because estimating a single summary point requires statistical homogeneity of both sensitivity and specificity, this approach was abandoned. Logistic regression analysis showed that the three studies with verification bias reported a significantly different accuracy from the seven remaining studies (P = 0.01). None of the other study characteristics had a significant impact on the estimates of test accuracy. In the subgroup analysis of the seven studies without verification bias, homogeneity was again rejected for sensitivity, while specificity was again homogeneous. The Spearman correlation coefficient between sensitivity and specificity was −0.94, which was judged sufficient to estimate a summary ROC curve. The curve in Figure 15 indicates that OVVOL has no clear accuracy in predicting non-pregnancy in IVF patients, even if a very low threshold for an abnormal test were chosen.
Author | Cycles (n) | Volume threshold value (ml) | Sensitivity for non-pregnancy | Specificity for non-pregnancy | LR+ | DOR | Pre-volume probability (%) | Post-volume probability (%) | Proportion of patients/cycles with abnormal volume (%)
---|---|---|---|---|---|---|---|---|---
Syrop et al. | 188 | <8.6 | 0.17 | 0.87 | 1.23 | 1.3 | 65 | 69 | 15
Syrop et al. | 188 | <3 | 0.11 | 0.93 | 1.44 | 1.5 | 65 | 72 | 10
Lass et al. | 140 | <3 | 0.12 | 0.88 | 0.97 | 0.96 | 89 | 88 | 12
Schild et al. | 152 | <3 | 0.12 | 0.97 | 3.60 | 3.9 | 80 | 93 | 10
Bancsi et al. | 120 | <8.6 | 0.47 | 0.71 | 1.58 | 2.1 | 68 | 77 | 41
Bancsi et al. | 120 | <7 | 0.27 | 0.79 | 1.33 | 1.5 | 68 | 74 | 25
Kupesic et al. | 56 | <7 | 0.33 | 0.96 | 8.00 | 11.5 | 60 | 92 | 22
Järvelä et al. | 60 | <3 | 0.08 | 0.96 | 1.80 | 1.9 | 63 | 75 | 6
Järvelä et al. | 60 | <7 | 0.42 | 0.73 | 1.54 | 1.9 | 63 | 73 | 37
Erdem et al. | 32 | <2.98 | 0.70 | 0.92 | 8.40 | 25.7 | 63 | 93 | 47
Frattarelli et al. | 267 | <2 | 0.10 | 0.96 | 2.46 | 2.6 | 47 | 68 | 7
Frattarelli et al. | 267 | <3 | 0.22 | 0.82 | 1.27 | 1.4 | 47 | 53 | 20
DOR, diagnostic odds ratio; LR+, likelihood ratio for a positive test result.
If a study reported on multiple threshold values, data for all threshold values are shown.
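The two steps described above, checking for a threshold effect with the Spearman correlation between sensitivity and specificity and then fitting a summary ROC curve, can be sketched in stdlib-only Python. This section does not state which SROC model was fitted, so the unweighted Moses–Littenberg regression used here is an assumption, and all function names are hypothetical. Sensitivities or specificities of exactly 0 or 1 would need a continuity correction before taking logits.

```python
import math


def _ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def logit(p):
    return math.log(p / (1.0 - p))


def moses_littenberg(sens, spec):
    """Unweighted least-squares fit of D = a + b*S, with
    D = logit(TPR) - logit(FPR) (the log diagnostic odds ratio) and
    S = logit(TPR) + logit(FPR) (a proxy for the test threshold)."""
    d = [logit(se) - logit(1.0 - sp) for se, sp in zip(sens, spec)]
    s = [logit(se) + logit(1.0 - sp) for se, sp in zip(sens, spec)]
    ms, md = sum(s) / len(s), sum(d) / len(d)
    b = (sum((si - ms) * (di - md) for si, di in zip(s, d))
         / sum((si - ms) ** 2 for si in s))
    return md - b * ms, b


def sroc_tpr(fpr, a, b):
    """Back-transform D = a + b*S into the summary curve TPR(FPR)."""
    lt = a / (1.0 - b) + logit(fpr) * (1.0 + b) / (1.0 - b)
    return 1.0 / (1.0 + math.exp(-lt))
```

A strongly negative `spearman(sens, spec)` across studies suggests that much of the between-study variation reflects different thresholds, the situation a summary ROC curve is meant to handle; `sroc_tpr` then traces the fitted curve across false-positive rates.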
Clinical value
Based on the estimated ROC curve in Figure 14, Table XXIV shows the probability of obtaining an OVVOL result within each range of LRs for the prediction of poor ovarian response, together with the resulting post-test probability. Only at modest LRs of 2–3 does the post-test probability of poor response rise to 33–43%, with abnormal test results obtained in some 30% of tested cases. Requiring a higher, clinically more useful LR, however, would identify virtually no patients. For non-pregnancy prediction, Table XXIV shows that at higher LRs (>3) the post-test probability of non-pregnancy may rise to ∼93–97%, assuming a pre-test probability of 80%; however, the probability that a test result falls in that range is close to zero. Because false-positive results for either poor response or non-pregnancy are unacceptable if patients are to be refused treatment, OVVOL is hardly suitable as a routine test for ovarian reserve assessment.
LR range (poor response; pre-test probability 20%) | Occurrence of test results in this range (%) | Post-test probability of poor response (%) | LR range (non-pregnancy; pre-test probability 80%) | Occurrence of test results in this range (%) | Post-test probability of non-pregnancy (%)
---|---|---|---|---|---
0–1 | 54 | <20 | 0–1 | 68 | <80
1–2 | 16 | 20–33 | 1–2 | 31 | 80–89
2–3 | 30 | 33–43 | 2–3 | 3 | 89–93
3–4 | 0 | 43–50 | 3–4 | 0 | 93–94
4–5 | 0 | 50–56 | 4–5 | 0 | 94–95
5–6 | 0 | 56–60 | 5–6 | 0 | 95–96
6–7 | 0 | 60–64 | 6–7 | 0 | 96–96.5
7–8 | 0 | 64–67 | 7–8 | 0 | 96.5–97
>8 | 0 | >67 | >8 | 0 | >97
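Tables such as Table XXIV express an ROC curve as bands of likelihood ratios. The section does not describe the exact procedure, but one standard approach, assumed here, treats the stretch between two adjacent operating points as a band of test results: its LR is the slope ΔTPR/ΔFPR of that segment, and the proportion of all tested patients whose result falls in it is prevalence × ΔTPR + (1 − prevalence) × ΔFPR, where prevalence is the pre-test probability. A stdlib-only Python sketch with hypothetical function names:

```python
def segment_lrs(roc_points, prevalence):
    """Split an ROC curve into bands of test results.

    roc_points: (FPR, TPR) operating points of one test at different thresholds.
    prevalence: pre-test probability of the outcome (e.g. 0.20 for poor response).
    Returns (LR, occurrence) per segment, starting at the strictest threshold,
    where LR is the segment's slope and occurrence is the share of all tested
    patients whose result falls in that band."""
    pts = [(0.0, 0.0)] + sorted(roc_points) + [(1.0, 1.0)]
    bands = []
    for (f0, t0), (f1, t1) in zip(pts, pts[1:]):
        d_fpr, d_tpr = f1 - f0, t1 - t0
        if d_fpr == 0.0 and d_tpr == 0.0:
            continue  # duplicate point, e.g. (0, 0) supplied explicitly
        lr = d_tpr / d_fpr if d_fpr > 0.0 else float("inf")
        occurrence = prevalence * d_tpr + (1.0 - prevalence) * d_fpr
        bands.append((lr, occurrence))
    return bands


def occurrence_by_band(bands, edges=(1, 2, 3, 4, 5, 6, 7, 8)):
    """Sum occurrences into LR bands 0-1, 1-2, ..., 7-8 and >8."""
    totals = [0.0] * (len(edges) + 1)
    for lr, occurrence in bands:
        index = sum(lr >= edge for edge in edges)  # edges at or below this LR
        totals[index] += occurrence
    return totals
```

A segment running parallel to the line of equality has a slope of about 1, so results in that band leave the pre-test probability unchanged; this is the sense in which part of a summary ROC curve is uninformative.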
Ovarian vascular flow
Systematic review
The search identified seven studies reporting on the predictive capacity of ovarian vascular flow parameters for ovarian response and/or the occurrence of pregnancy (Zaidi et al., 1996; Engmann et al., 1999a,b; Kim et al., 2002; Kupesic and Kurjak, 2002; Kupesic et al., 2003; Popovic-Todorovic et al., 2003b). In these studies, ovarian flow was assessed either on cycle day 3 or after pituitary suppression with a GnRH agonist had been achieved and before the onset of ovarian stimulation. As only the study by Kupesic et al. (2003) provided data that could be cross-classified in a 2 × 2 table of test result against poor response or non-pregnancy, a formal meta-analysis was not possible (see addendum Tables XXV and XXVI). Moreover, the studies used very different flow-derived predictors: Kupesic et al. (2003) used peak systolic velocity as the main predictor, whereas others used ovarian stromal blood flow obtained by 3D power Doppler (Engmann et al., 1999a).
Author | Consecutive | One cycle per couple | Data per | Definition of poor response/cancel | Definition of pregnancy | OSF parameter | Ultrasonography equipment
---|---|---|---|---|---|---|---
Kupesic et al. | Yes | Yes | Cycle | Not applicable | Biochemical | Peak systolic velocity | Combison 530 D, 7.5 MHz probe
Author | Cycles (n) | OSF threshold value (flow index) | Sensitivity for non-pregnancy | Specificity for non-pregnancy | LR+ | DOR | Pre-OSF probability (%) | Post-OSF probability (%) | Proportion of patients/cycles with abnormal OSF (%)
---|---|---|---|---|---|---|---|---|---
Kupesic et al. | 56 | <11 | 0.31 | 0.96 | 7.7 | 4.1 | 60 | 92 | 20
Kupesic et al. | 56 | ≤13 | 0.85 | 0.23 | 1.1 | 1.5 | 60 | 64 | 82
DOR, diagnostic odds ratio; LR+, likelihood ratio for a positive test result.
If a study reported on multiple threshold values, data for all threshold values are shown.
Ovarian biopsy
Ovarian reserve depends on the number of primordial follicles in the ovarian cortex, which suggests that the obvious way to estimate it would be to measure follicular density in an ovarian biopsy (Lass, 2001; Lass, 2004). Attempts were made to quantify the number of small antral follicles in small, shallow biopsies taken from infertility patients during diagnostic laparoscopy (Lass et al., 1997a), and a clear age-dependent decline in follicular density was found: women over 35 years of age had only 30% of the follicle numbers present in younger women. The number of follicles per unit volume in the biopsies was used to estimate the total, and it was suggested that the method could potentially be applied at the individual level. It was recognized, though, that biopsy follicle density would not accurately represent the density in the whole ovary (Lass, 2001), and this indeed seems to be the case. Several investigators have recently shown that follicle density varies greatly between small pieces of cortex, rendering biopsy information completely unreliable as a measure of an individual's ovarian follicle content, irrespective of the number, size and location of the biopsies taken (Qu et al., 2000; Schmidt et al., 2003; Lambalk et al., 2004; Sharara and Scott, 2004). The technique, which is invasive and potentially harmful in terms of adhesions and other surgical complications, is therefore intrinsically unreliable and should not be used to evaluate individual ovarian reserve. It is probably useful for research purposes, to determine follicle density statistics in patient groups, provided that the groups are large enough to compensate for the inherently extreme inter-biopsy and inter-individual spread (Qu et al., 2000; Schmidt et al., 2003; Webber et al., 2003; Lambalk et al., 2004).
Finally, in the context of the current systematic review, no published studies have evaluated ovarian biopsy follicle density for the prediction of IVF outcome in terms of ovarian response and pregnancy rates.
Clomiphene Citrate Challenge Test
Systematic review
The computerized MEDLINE search identified 12 studies on the capacity of the Clomiphene Citrate Challenge Test (CCCT) to predict poor ovarian response and/or pregnancy after IVF (Tanbo et al., 1989; Loumaye et al., 1990; Tanbo et al., 1990; Tanbo et al., 1992; Csemiczky et al., 1996; Kahraman et al., 1997; van der Stege and van der Linden, 2001; Csemiczky et al., 2002; Kwee et al., 2003; Yanushpolsky et al., 2003; Erdem et al., 2004; Hendriks et al., 2005a). The characteristics of the included studies are listed in addendum Table XXVII, which shows that many studies suffered from various sources of potential bias, especially selection bias. The definitions applied for poor ovarian response and for an abnormal CCCT result (based either on day-10 FSH alone or on both basal and day-10 FSH) also varied considerably. Logistic regression analysis indicated that none of the study characteristics had a statistically significant impact on the reported predictive performance of the CCCT, either for poor response or for non-pregnancy. Consequently, all studies were taken together for further analysis.
| Author | Consecutive | One cycle per couple | Data per | Definition: poor response/cancel | Definition: pregnancy | FSH assay |
|---|---|---|---|---|---|---|
| Tanbo et al. | No | Yes | Cycle | Cancel <3 follicles | Not stated | RIA: Amerlex |
| Tanbo et al. | No | No | Cycle | Cancel <2 follicles | Clinical | Fluoroimmunoassay: Delfia |
| Loumaye et al. | No | Yes | Cycle | Cancel <2 follicles 20 mm | Not stated | Immunoradiometr.: IRMA |
| Tanbo et al. | No | No | Cycle | Cancel <3 follicles | Ongoing | Fluoroimm.assay: Delfia |
| Csemiczky et al. | No | No | Cycle | Not stated | Clinical | RIA: Diagn Prod. Inc. |
| Kahraman et al. | No | Yes | Cycle | Not stated | Ongoing | Immunometr.: Diagn. Prod. Corp. |
| Vd Stege et al. | Yes | Yes | Cycle | Cancel <3 follicles 18 mm | Clinical | RIA: Roche Diagn. |
| Csemiczky et al. | No | Yes | Cycle | Cancel <3 follicles 17 mm | Ongoing | RIA: Farmos Group |
| Yanushpolsky et al. | Yes | No | Retrieval | Not stated | Delivery | Techn. Imm. Syst.: Bayer Corp. |
| Kwee et al. | Yes | Yes | Cycle | Poor response <6 oocytes | Not stated | Immunomet.: Amerlite/Delfia |
| Erdem et al. | Yes | Yes | Cycle | Cancel <4 follicles 15 mm or Poor response <5 oocytes | Clinical | Chemolum. Immunometr. Assay |
| Hendriks et al. | Yes | Yes | Cycle | Poor response <4 oocytes or cancel no follicle growth | Ongoing | AxSYM immunoanal.: Abbott Lab. |
Accuracy of poor response prediction
For the prediction of ovarian response, the sensitivities and specificities of each study are summarized in Table XXVIII. Homogeneity could not be rejected for sensitivity (χ²-test statistic: P-value 0.09), but it had to be rejected for specificity (χ²-test statistic: P-value <0.001). A single summary point estimate for sensitivity and specificity therefore could not be calculated. Moreover, the DOR values from the various studies (range 2.4–38.8) appeared heterogeneous, which indicates that the individual ROC curves were quite heterogeneous. The Spearman correlation coefficient between sensitivity and specificity values was −0.46, which was judged insufficient to estimate a summary ROC curve. Figure 16 plots the sensitivity–specificity points in ROC space. The plot shows considerable heterogeneity, and this did not appear to be attributable to differences in the threshold levels used.
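The homogeneity tests above ask whether one proportion, either sensitivity or specificity, is the same across all studies. The sketch below assumes the standard Pearson χ² test for equality of k binomial proportions, with k − 1 degrees of freedom; this section does not spell out the review's exact procedure. The function names and the toy counts are hypothetical and are not data from the included studies.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in powers of sqrt(x)
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def homogeneity_test(successes, totals):
    """Pearson chi-square test that k binomial proportions are equal (df = k - 1).

    Sensitivity: successes = true positives, totals = poor responders per study.
    Specificity: successes = true negatives, totals = normal responders per study.
    """
    pooled = sum(successes) / sum(totals)
    stat = 0.0
    for x, n in zip(successes, totals):
        expected = n * pooled
        # Both cells of this study's row (x and n - x) combined into one term
        stat += (x - expected) ** 2 / (expected * (1 - pooled))
    return stat, chi2_sf(stat, len(totals) - 1)


# Hypothetical counts for three studies (illustration only, not review data)
stat, p = homogeneity_test([30, 18, 12], [40, 40, 40])
```

With these toy counts, the three sensitivities are 0.75, 0.45 and 0.30. The statistic is 16.8 on 2 degrees of freedom, so P ≈ 0.0002 and homogeneity would be rejected. The closed-form tail probability avoids any third-party dependency. A library routine such as SciPy's chi-square survival function would serve equally well.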
| Author | Cycles (n) | FSH threshold value (IU/L) | Sensitivity | Specificity | LR+ | DOR | Pre-CCCT probability of poor response (%) | Post-CCCT probability of poor response (%) | Proportion of patients/cycles with abnormal CCCT (%) |
|---|---|---|---|---|---|---|---|---|---|
| Tanbo et al. | 109 | Day 7 > 26 | 0.55 | 0.97 | 17.7 | 37.8 | 40 | 92 | 24 |
| Tanbo et al. | 70 | Day 10 > 26 | 0.75 | 0.47 | 1.4 | 2.7 | 46 | 55 | 63 |
| Loumaye et al. | 114 | Day 3 + 10 > 26.03 | 0.83 | 0.86 | 6.0 | 31.0 | 5 | 25 | 18 |
| Tanbo et al. | 165 | Day 10 > 12 | 0.57 | 0.91 | 5.9 | 12.5 | 49 | 85 | 33 |
| Kahraman et al. | 198 | Day 10 > 10 | 0.43 | 0.76 | 1.8 | 2.4 | 25 | 37 | 29 |
| Vd Stege et al. | 51 | Day 3 or 10 > 10 | 0.50 | 0.82 | 2.7 | 4.4 | 4 | 10 | 20 |
| Csemiczky et al. | 279 | Day 10 > 10 | 0.54 | 0.84 | 3.3 | 6.1 | 25 | 53 | 26 |
| Kwee et al. | 56 | Day 3 + 10 > 14 | 0.93 | 0.68 | 2.9 | 30.1 | 27 | 52 | 48 |
| | | Day 3 + 10 > 16 | 0.80 | 0.83 | 4.7 | 19.4 | 27 | 63 | 34 |
| | | Day 3 + 10 > 18 | 0.73 | 0.95 | 15.0 | 53.6 | 27 | 85 | 23 |
| | | Day 3 + 10 > 20 | 0.60 | 0.98 | 24.6 | 60.0 | 27 | 90 | 18 |
| | | Day 3 + 10 > 22 | 0.53 | 0.98 | 21.9 | 45.7 | 27 | 89 | 16 |
| Erdem et al. | 32 | Day 3 or 10 > 10 | 0.69 | 0.88 | 5.5 | 15.4 | 50 | 85 | 41 |
| Hendriks et al. | 63 | Day 10 > 10 | 0.65 | 0.87 | 5.0 | 12.2 | 27 | 65 | 27 |
| | | Day 10 > 15 | 0.35 | 0.96 | 8.1 | 12.0 | 27 | 75 | 13 |
DOR, diagnostic odds ratio; LR+, likelihood ratio for a positive test result.
If a study reported on multiple threshold values, data for all threshold values are shown.
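The LR+, DOR, post-test probability and proportion of abnormal tests in Tables XXVIII and XXIX all follow from three inputs: a study's sensitivity, its specificity and the pre-test probability. The sketch below shows those relationships. The key step is Bayes' theorem in odds form: post-test odds = pre-test odds × LR+. The names are hypothetical. The published sensitivities and specificities are rounded, so recomputed values can differ slightly from the tabulated ones, which were presumably derived from the underlying counts.

```python
from dataclasses import dataclass


@dataclass
class TestSummary:
    lr_pos: float          # likelihood ratio for a positive (abnormal) test
    dor: float             # diagnostic odds ratio = LR+ / LR-
    post_test_prob: float  # probability of the outcome after an abnormal test
    prop_abnormal: float   # expected fraction of patients/cycles testing abnormal


def summarize(sensitivity: float, specificity: float, pre_test_prob: float) -> TestSummary:
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    dor = lr_pos / lr_neg
    # Bayes' theorem in odds form: post-test odds = pre-test odds * LR+
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr_pos
    post_prob = post_odds / (1 + post_odds)
    # Abnormal results = true positives + false positives
    prop_abnormal = pre_test_prob * sensitivity + (1 - pre_test_prob) * (1 - specificity)
    return TestSummary(lr_pos, dor, post_prob, prop_abnormal)


# Hendriks et al., day 10 FSH > 10 IU/L (Table XXVIII)
hendriks = summarize(0.65, 0.87, 0.27)
```

This reproduces several values in the Table XXVIII row: an LR+ of 5.0, a post-test probability of about 65% and about 27% abnormal tests. The recomputed DOR is about 12.4, against the tabulated 12.2.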
Accuracy of non-pregnancy prediction
For the prediction of non-pregnancy, the sensitivities and specificities of each study are summarized in Table XXIX. Homogeneity was rejected for both sensitivity and specificity (χ²-test statistic: P-value <0.001 and 0.04, respectively), so a single summary point estimate for sensitivity and specificity would not be meaningful. The DOR values of the various studies (range 1.0–35.4) also appeared heterogeneous. Figure 17 plots the sensitivity–specificity points in ROC space. The Spearman correlation between sensitivity and specificity was −0.20, which again was judged insufficient to estimate a summary ROC curve.
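The Spearman coefficients quoted here (−0.46 for poor response and −0.20 for non-pregnancy) measure how consistently specificity falls as sensitivity rises across studies. That is the pattern expected when studies differ mainly in the threshold they use. The sketch below computes Spearman's rho, using average ranks for ties. The function names are hypothetical. This section does not state which threshold per study entered the review's calculation, so applying the sketch to the rounded table values need not reproduce the reported coefficients.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)
```

To apply it, pass one list of study sensitivities and the matching list of specificities. A value close to −1 indicates a threshold-driven trade-off along a common ROC curve. A value near 0, such as the −0.20 reported above, does not indicate such a trade-off.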
| Author | Cycles (n) | FSH threshold value (IU/L) | Sensitivity | Specificity | LR+ | DOR | Pre-CCCT probability of non-pregnancy (%) | Post-CCCT probability of non-pregnancy (%) | Proportion of patients/cycles with abnormal CCCT (%) |
|---|---|---|---|---|---|---|---|---|---|
| Tanbo et al. | 70 | Day 10 > 26 | 0.66 | 0.80 | 3.3 | 7.8 | 93 | 98 | 63 |
| Loumaye et al. | 114 | Day 3 + 10 > 26.03 | 0.23 | 0.96 | 6.5 | 8.2 | 76 | 95 | 19 |
| Tanbo et al. | 165 | Day 10 > 12 | 0.34 | 0.86 | 2.4 | 3.1 | 96 | 98 | 33 |
| Csemiczky et al. | 53 | Day 10 > 7 | 0.61 | 0.96 | 14.5 | 35.4 | 58 | 95 | 37 |
| Kahraman et al. | 198 | Day 10 > 10 | 0.30 | 0.85 | 2.4 | 3.0 | 92 | 96 | 29 |
| Vd Stege et al. | 51 | Day 3 or 10 > 10 | 0.22 | 0.84 | 1.4 | 1.5 | 63 | 70 | 20 |
| Csemiczky et al. | 140 | Day 10 > 10 | 0.30 | 0.97 | 8.6 | 11.8 | 79 | 97 | 24 |
| Yanushpolsky et al. | 483 | Day 10 > 10 | 0.36 | 0.82 | 2.0 | 2.5 | 62 | 76 | 29 |
| Erdem et al. | 32 | Day 3 or 10 > 10 | 0.45 | 0.67 | 1.4 | 1.6 | 63 | 69 | 41 |
| Hendriks et al. | 63 | Day 10 > 10 | 0.27 | 0.73 | 1.0 | 1.0 | 76 | 77 | 27 |
| | | Day 10 > 15 | 0.13 | 0.87 | 0.9 | 0.9 | 76 | 75 | 13 |
DOR, diagnostic odds ratio; LR+, likelihood ratio for a positive test result.
If a study reported on multiple threshold values, data for all threshold values are shown.
Clinical value
No summary ROC curves could be estimated for response or non-pregnancy prediction. Because of this, the interrelation between positive LR, post-test probability and percentage of abnormal tests could not be calculated. A challenge test used as a diagnostic tool to identify poor responders should reach certain predefined levels of sensitivity and specificity. If these levels are set at 75 and 85%, respectively, Figure 16 shows that hardly any study fulfils these criteria. Moreover, in comparative studies, the CCCT performed no better than the AFC or FSH in predicting response (Jain et al., 2004; Hendriks et al., 2005c). Non-pregnancy prediction was judged separately, for a test that excludes cases from entering an IVF program. For that purpose, the desired levels were arbitrarily set at 40% for sensitivity and 95% for specificity. As shown in Figure 17, the vast majority of studies fail to reach both criteria. As such, the CCCT performs no better than other tests such as the AFC or basal FSH, especially because of a loss in specificity.
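The clinical-value criteria above are simple minimum levels applied to each study's sensitivity–specificity point. The sketch below applies them to the values transcribed from Tables XXVIII and XXIX. It rests on two assumptions about how Figures 16 and 17 are read: each reported threshold counts as a separate point, and a value equal to a minimum passes. The function and label names are hypothetical.

```python
# (label, sensitivity, specificity) transcribed from Table XXVIII (poor response)
POOR_RESPONSE = [
    ("Tanbo et al. (109)", 0.55, 0.97),
    ("Tanbo et al. (70)", 0.75, 0.47),
    ("Loumaye et al. (114)", 0.83, 0.86),
    ("Tanbo et al. (165)", 0.57, 0.91),
    ("Kahraman et al. (198)", 0.43, 0.76),
    ("Vd Stege et al. (51)", 0.50, 0.82),
    ("Csemiczky et al. (279)", 0.54, 0.84),
    ("Kwee et al. (56), > 14", 0.93, 0.68),
    ("Kwee et al. (56), > 16", 0.80, 0.83),
    ("Kwee et al. (56), > 18", 0.73, 0.95),
    ("Kwee et al. (56), > 20", 0.60, 0.98),
    ("Kwee et al. (56), > 22", 0.53, 0.98),
    ("Erdem et al. (32)", 0.69, 0.88),
    ("Hendriks et al. (63), > 10", 0.65, 0.87),
    ("Hendriks et al. (63), > 15", 0.35, 0.96),
]

# (label, sensitivity, specificity) transcribed from Table XXIX (non-pregnancy)
NON_PREGNANCY = [
    ("Tanbo et al. (70)", 0.66, 0.80),
    ("Loumaye et al. (114)", 0.23, 0.96),
    ("Tanbo et al. (165)", 0.34, 0.86),
    ("Csemiczky et al. (53)", 0.61, 0.96),
    ("Kahraman et al. (198)", 0.30, 0.85),
    ("Vd Stege et al. (51)", 0.22, 0.84),
    ("Csemiczky et al. (140)", 0.30, 0.97),
    ("Yanushpolsky et al. (483)", 0.36, 0.82),
    ("Erdem et al. (32)", 0.45, 0.67),
    ("Hendriks et al. (63), > 10", 0.27, 0.73),
    ("Hendriks et al. (63), > 15", 0.13, 0.87),
]


def meets_criteria(points, min_sens, min_spec):
    """Return labels of points whose sensitivity AND specificity reach the minimums."""
    return [label for label, sens, spec in points
            if sens >= min_sens and spec >= min_spec]


response_ok = meets_criteria(POOR_RESPONSE, 0.75, 0.85)
non_pregnancy_ok = meets_criteria(NON_PREGNANCY, 0.40, 0.95)
```

With these inputs, only Loumaye et al. meets the response criteria. Only the 53-cycle study of Csemiczky et al. meets the non-pregnancy criteria. Both results are consistent with the conclusions drawn from Figures 16 and 17.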
Exogenous FSH ORT
Systematic review
We detected three studies from the literature reporting on the predictive capacity of the exogenous FSH ORT (EFORT) that were suitable for data extraction (Fanchin et al., 1994; Kwee et al., 2003; Yong et al., 2003). The characteristics of these studies are listed in addendum Table XXX.
| Author | Consecutive | One cycle per couple | Data per | Definition: poor response/cancel | Definition: pregnancy | Estradiol assay |
|---|---|---|---|---|---|---|
| Fanchin et al. | Yes | Yes | Cycle | <3 oocytes | Not stated | Estradiol-60 Amerlite (Kodak clin. Diagn. UK) |
| Kwee et al. | Yes | Yes | Cycle | <6 oocytes | Not stated | Amerlite (Amersham UK) |
| Yong et al. | No | Yes | Cycle | <4 oocytes or cancel | Not stated | Radioimmunoassay |
Accuracy of poor response prediction
The individual sensitivity and specificity pairs are summarized in Table XXXI and plotted in Figure 18. As this ROC space shows, the three studies report sensitivities of around 80%. Their specificities, however, vary around 60% in the studies of Kwee et al. and Yong et al. and exceed 90% in the study of Fanchin et al. Given these discrepant results, further assessment of heterogeneity was not considered useful, and no summary point or curve in the ROC space could be constructed.
| Author | Cycles (n) | Estradiol threshold value (pmol/l) | Sensitivity | Specificity | LR+ | DOR | Pre-EFORT probability of poor response (%) | Post-EFORT probability of poor response (%) | Proportion of patients/cycles with abnormal EFORT (%) |
|---|---|---|---|---|---|---|---|---|---|
| Fanchin et al. | 52 | < 110 | 0.79 | 0.92 | 2.7 | 42.8 | 27 | 79 | 27 |
| Kwee et al. | 54 | < 110 | 0.64 | 0.68 | 1.98 | 3.7 | 26 | 41 | 41 |
| | | < 120 | 0.64 | 0.65 | 1.8 | 3.3 | 26 | 39 | 43 |
| | | < 130 | 0.71 | 0.65 | 2.0 | 4.6 | 26 | 42 | 44 |
| | | < 140 | 0.79 | 0.60 | 1.96 | 5.5 | 26 | 41 | 50 |
| | | < 150 | 0.86 | 0.58 | 2.0 | 8.1 | 26 | 41 | 54 |
| Yong et al. | 46 | < 124 | 0.50 | 0.68 | 1.6 | 2.2 | 17 | 25 | 35 |
DOR, diagnostic odds ratio; LR+, likelihood ratio for a positive test result.
If a study reported on multiple threshold values, data for all threshold values are shown.
Accuracy of non-pregnancy prediction
No study reported predictive accuracy with pregnancy as the reference outcome.
Clinical value
No summary ROC curve could be estimated for poor response prediction, so the interrelation between positive LR, post-test probability and percentage of abnormal tests could not be calculated. A challenge test used as a diagnostic tool to identify poor responders should reach certain predefined levels of sensitivity and specificity. If these minimum levels are set at 75 and 85%, respectively, Figure 18 shows that only one study fulfils these criteria (Fanchin et al., 1994). False-positive predictions in particular may hamper the use of this test when a high detection rate is needed and patients are refused IVF on the basis of the test result. Finally, challenge tests should be preferred over basal tests only if they clearly improve prediction.
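The expected counts per 100 tested cycles make the false-positive concern concrete. They follow from sensitivity, specificity and the pre-test probability. The sketch below uses the Table XXXI values for Kwee et al. (< 150 pmol/l) and Fanchin et al. (< 110 pmol/l). The function name and the per-100 framing are illustrative choices, not part of the review.

```python
def flagged_per_100(sensitivity, specificity, pre_test_prob, n=100):
    """Expected true- and false-positive (abnormal) EFORT results among n cycles."""
    poor = n * pre_test_prob                 # cycles that turn out to be poor responses
    normal = n - poor                        # cycles with an adequate response
    true_pos = poor * sensitivity            # poor responders correctly flagged
    false_pos = normal * (1 - specificity)   # adequate responders wrongly flagged
    return true_pos, false_pos


# Values from Table XXXI
kwee_150 = flagged_per_100(0.86, 0.58, 0.26)
fanchin_110 = flagged_per_100(0.79, 0.92, 0.27)
```

For Kwee et al. at < 150 pmol/l, this gives about 22 true and 31 false abnormal results per 100 cycles. Most cycles flagged as poor responses would therefore in fact respond adequately. At the 0.92 specificity reported by Fanchin et al., false positives fall to about 6 per 100.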
Gonadotrophin-releasing hormone agonist stimulation test
Systematic review
The search and selection strategy identified four studies reporting on the predictive capacity of the gonadotrophin-releasing hormone agonist stimulation test (GAST) that were suitable for data extraction and meta-analysis (Padilla et al., 1990; Winslow et al., 1991; Ranieri et al., 1998; Hendriks et al., 2005b). Characteristics of the included studies are listed in addendum Table XXXII. The studies varied considerably in their definitions of poor response and in their quality and design characteristics. However, only three studies reported on each of the two endpoints, so a systematic analysis of these study characteristics was not indicated.
| Author | Consecutive | One cycle per couple | Data per | Definition: poor response/cancel | Definition: pregnancy | Estradiol assay |
|---|---|---|---|---|---|---|
| Padilla et al. | No | No | Cycle | Not stated | Clinical | RIA (Diagnostic Products USA) |
| Winslow et al. | Yes | Yes | Cycle | Not stated | Clinical | Radioimmunoassay (Pantex CA) |
| Ranieri et al. | No | Yes | Cycle | <5 follicles 15 mm | Not stated | RIA (Amersham Int. UK) |
| Hendriks et al. | Yes | Yes | Cycle | <4 oocytes or <3 follicles 18 mm | Ongoing | AxSYM immunoanalyser (Abbott Lab USA) |
Accuracy of poor response prediction
Three studies reported on the prediction of poor response. For each study, the calculated sensitivity, specificity, positive LR and DOR for predicting poor ovarian response are summarized in Table XXXIII. A single summary point estimate for sensitivity and specificity would not be meaningful, because both test characteristics shown in Figure 19 were heterogeneous among studies (χ²-test statistic: P-value <0.001 for sensitivity and 0.014 for specificity). As the Spearman correlation coefficient for sensitivity and specificity was −0.57, it appeared justified to estimate a summary ROC curve, as shown in Figure 19.
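This section does not describe how the summary ROC curve in Figure 19 was fitted. One widely used approach is the Moses–Littenberg method. It regresses the log diagnostic odds ratio, D = logit(TPR) − logit(FPR), on a threshold proxy, S = logit(TPR) + logit(FPR), and back-transforms the fitted line into a curve. The sketch below assumes that method with an unweighted least-squares fit. The function names are hypothetical. It is applied to the Table XXXIII points only for illustration, so its curve need not match Figure 19.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def moses_littenberg(points):
    """Fit D = a + b*S by unweighted least squares.

    points: (sensitivity, specificity) pairs, each strictly between 0 and 1.
    """
    d_vals, s_vals = [], []
    for sens, spec in points:
        lt, lf = logit(sens), logit(1.0 - spec)  # logit(TPR), logit(FPR)
        d_vals.append(lt - lf)                   # log diagnostic odds ratio
        s_vals.append(lt + lf)                   # proxy for the test threshold
    n = len(points)
    mean_s, mean_d = sum(s_vals) / n, sum(d_vals) / n
    sxx = sum((s - mean_s) ** 2 for s in s_vals)
    sxy = sum((s - mean_s) * (d - mean_d) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    a = mean_d - b * mean_s
    return a, b


def sroc_sensitivity(fpr: float, a: float, b: float) -> float:
    """Back-transform: logit(TPR) = a/(1-b) + (1+b)/(1-b) * logit(FPR), for b != 1."""
    lt = a / (1.0 - b) + (1.0 + b) / (1.0 - b) * logit(fpr)
    return 1.0 / (1.0 + math.exp(-lt))


# Sensitivity/specificity points from Table XXXIII (all reported thresholds)
GAST_POINTS = [(0.58, 0.95), (0.89, 0.86), (0.32, 0.97), (0.37, 0.89), (0.68, 0.79)]
a, b = moses_littenberg(GAST_POINTS)
curve = [(fpr, sroc_sensitivity(fpr, a, b)) for fpr in (0.05, 0.10, 0.20, 0.30)]
```

When all points share the same DOR, the fitted slope b is 0 and the curve is symmetric, with log DOR equal to a. A weighted regression, or a hierarchical model, would be the more cautious choice when study sizes differ as much as they do here.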
| Author | Cycles (n) | Estradiol threshold value (pmol/l) | Sensitivity | Specificity | LR+ | DOR | Pre-GAST probability of poor response (%) | Post-GAST probability of poor response (%) | Proportion of patients/cycles with abnormal GAST (%) |
|---|---|---|---|---|---|---|---|---|---|
| Winslow et al. | 228 | E2/E1 < 2 | 0.58 | 0.95 | 11.5 | 26.1 | 5 | 39 | 8 |
| Ranieri et al. | 177 | ΔE2 < 180 | 0.89 | 0.86 | 6.4 | 53.0 | 27 | 70 | 34 |
| Hendriks et al. | 57 | ΔE2 < 80 | 0.32 | 0.97 | 12.0 | 17.1 | 33 | 86 | 12 |
| | | ΔE2 < 100 | 0.37 | 0.89 | 3.5 | 4.6 | 33 | 64 | 19 |
| | | ΔE2 < 180 | 0.68 | 0.79 | 3.3 | 8.1 | 33 | 62 | 37 |
DOR, diagnostic odds ratio; LR+, likelihood ratio for a positive test result.
If a study reported on multiple threshold values, data for all threshold values are shown.
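The derived columns of Table XXXIII follow from sensitivity, specificity and the pre-test probability by standard formulas: LR+ = sensitivity/(1 − specificity); DOR = [sensitivity/(1 − sensitivity)]/[(1 − specificity)/specificity]; post-test odds = pre-test odds × LR+; and the proportion of abnormal tests = sensitivity × prevalence + (1 − specificity) × (1 − prevalence). The class below is a hypothetical helper implementing these definitions; applied to the Winslow et al. row it serves only as a consistency check, not as the authors' own calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticAccuracy:
    """Sensitivity and specificity of a dichotomized test (abnormal result = positive)."""
    sensitivity: float
    specificity: float

    @property
    def lr_positive(self) -> float:
        # LR+ = P(abnormal | poor responder) / P(abnormal | normal responder)
        return self.sensitivity / (1 - self.specificity)

    @property
    def diagnostic_odds_ratio(self) -> float:
        # DOR = odds of an abnormal test among cases / odds among non-cases
        case_odds = self.sensitivity / (1 - self.sensitivity)
        noncase_odds = (1 - self.specificity) / self.specificity
        return case_odds / noncase_odds

    def post_test_probability(self, pre_test: float) -> float:
        # Bayes' theorem in odds form: post-test odds = pre-test odds x LR+
        post_odds = pre_test / (1 - pre_test) * self.lr_positive
        return post_odds / (1 + post_odds)

    def proportion_abnormal(self, pre_test: float) -> float:
        # True positives plus false positives as a share of all tested cycles
        return self.sensitivity * pre_test + (1 - self.specificity) * (1 - pre_test)


# Winslow et al. row of Table XXXIII (pre-GAST probability of poor response = 5%)
winslow = DiagnosticAccuracy(sensitivity=0.58, specificity=0.95)
print(f"LR+ {winslow.lr_positive:.1f}, DOR {winslow.diagnostic_odds_ratio:.1f}, "
      f"post-test {winslow.post_test_probability(0.05):.0%}, "
      f"abnormal {winslow.proportion_abnormal(0.05):.1%}")
```

For this row the sketch gives LR+ ≈ 11.6, DOR ≈ 26.2, a post-test probability of ≈ 38% and ≈ 7.7% abnormal tests, against 11.5, 26.1, 39% and 8% in the table; the small differences presumably reflect rounding of the published sensitivity and specificity.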
Accuracy of non-pregnancy prediction
Three studies also reported on the capacity of the GAST to predict non-pregnancy after IVF. Sensitivities and specificities for the prediction of non-pregnancy, as calculated from each study, are summarized in Table XXXIV. Again, sensitivity and specificity, as shown in Figure 20, were heterogeneous between studies (χ² test: P < 0.001 for sensitivity and P = 0.005 for specificity). The Spearman correlation coefficient between sensitivity and specificity was −0.98, sufficiently strong to justify estimating a summary ROC curve (Figure 20).
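A strongly negative Spearman correlation between sensitivity and specificity indicates a threshold effect (points with higher sensitivity have lower specificity), which is the usual rationale for pooling the points into a summary ROC curve rather than into a single summary point. A minimal sketch, with hypothetical function names, computing the coefficient as the Pearson correlation of tie-averaged ranks for the seven study/threshold points listed in Table XXXIV:

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman correlation = Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Non-pregnancy prediction, in Table XXXIV row order
sensitivity = [0.27, 0.42, 0.66, 0.76, 0.16, 0.24, 0.40]
specificity = [0.90, 0.70, 0.53, 0.38, 1.00, 1.00, 0.75]
rho = spearman_rho(sensitivity, specificity)
print(f"Spearman rho = {rho:.2f}")
```

On these tabulated values the sketch returns ρ ≈ −0.99, close to the reported −0.98; small differences can arise from rounding of the tabulated values or from which points enter the calculation.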
| Author | Cycles (n) | Estradiol threshold value (pmol/l) | Sensitivity | Specificity | LR+ | DOR | Pre-GAST probability (%) | Post-GAST probability (%) | Proportion of patients/cycles with abnormal GAST (%) |
|---|---|---|---|---|---|---|---|---|---|
| Padilla et al. | 97 | E2/E1 < 2 | 0.27 | 0.90 | 2.8 | 3.5 | 68 | 86 | 22 |
| Winslow et al. | 228 | ΔE2 < 50 | 0.42 | 0.70 | 1.4 | 1.69 | 77 | 82 | 39 |
| | | ΔE2 < 75 | 0.66 | 0.53 | 1.4 | 2.2 | 77 | 82 | 62 |
| | | ΔE2 < 100 | 0.76 | 0.38 | 1.2 | 1.92 | 77 | 80 | 73 |
| Hendriks et al. | 57 | ΔE2 < 80 | 0.16 | 1.00 | 2.4 | 2.7 | 79 | 89 | 12 |
| | | ΔE2 < 100 | 0.24 | 1.00 | 3.6 | 4.6 | 79 | 92 | 19 |
| | | ΔE2 < 180 | 0.40 | 0.75 | 1.6 | 2.0 | 79 | 86 | 37 |
DOR, diagnostic odds ratio; LR+, likelihood ratio for a positive test result.
If a study reported on multiple threshold values, data for all threshold values are shown.
Clinical value
Based on the summary ROC curves depicted in Figures 19 and 20, a range of positive LRs was calculated, and for each ratio the pre-GAST probabilities of poor response and of non-pregnancy (set at 20 and 80%, respectively) were converted into post-GAST probabilities. Table XXXV shows, for each LR range, the probability of obtaining a GAST result within that range and the corresponding post-test probability, for the prediction of both poor response and non-pregnancy. At a modest LR of 4–5, the post-GAST probability of poor response is only about 50–56%, while the chance of obtaining a test result with an LR of at least 4 is quite high (49%). Only at an extreme threshold (LR >8) does the post-test probability of poor response approach 70%, and such a result is still obtained in a considerable proportion of cases (30%).
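The review does not state how its summary ROC curves were constructed. One widely used approach, assumed here, is the Moses–Littenberg method: D = logit(TPR) − logit(FPR) is regressed on S = logit(TPR) + logit(FPR), the fitted line is transformed back into a curve in ROC space, and the positive LR at any point on that curve is simply TPR/FPR, which is how a range of LRs can be read off a summary curve. Function names are hypothetical, and the input is the five poor-response points of Table XXXIII, so the printed values illustrate the method rather than reproduce Figure 19.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def fit_moses_littenberg(tpr: list[float], fpr: list[float]) -> tuple[float, float]:
    """Unweighted least-squares fit of D = a + b*S, where
    D = logit(TPR) - logit(FPR) (the log diagnostic odds ratio) and
    S = logit(TPR) + logit(FPR) (a proxy for the positivity threshold)."""
    d = [logit(t) - logit(f) for t, f in zip(tpr, fpr)]
    s = [logit(t) + logit(f) for t, f in zip(tpr, fpr)]
    mean_s, mean_d = sum(s) / len(s), sum(d) / len(d)
    sxx = sum((si - mean_s) ** 2 for si in s)
    sxy = sum((si - mean_s) * (di - mean_d) for si, di in zip(s, d))
    b = sxy / sxx
    return mean_d - b * mean_s, b


def sroc_tpr(fpr: float, a: float, b: float) -> float:
    """Back-transformed curve: logit(TPR) = a/(1-b) + logit(FPR)*(1+b)/(1-b)."""
    return 1 / (1 + math.exp(-(a / (1 - b) + logit(fpr) * (1 + b) / (1 - b))))


# Poor-response points from Table XXXIII; FPR = 1 - specificity
sens = [0.58, 0.89, 0.32, 0.37, 0.68]
fpr_obs = [1 - sp for sp in (0.95, 0.86, 0.97, 0.89, 0.79)]
a, b = fit_moses_littenberg(sens, fpr_obs)
for fpr in (0.05, 0.10, 0.20):
    tpr = sroc_tpr(fpr, a, b)
    # LR+ at a point on the summary curve is the ratio TPR / FPR
    print(f"FPR {fpr:.2f}: TPR {tpr:.2f}, LR+ {tpr / fpr:.1f}")
```

The unweighted fit is the simplest variant; weighted fits and hierarchical (bivariate or HSROC) models are the usual refinements.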
| LR range | Poor response (pre-test probability = 20%): occurrence of test results in this range (%) | Post-test probability of poor response (%) | Non-pregnancy (pre-test probability = 80%): occurrence of test results in this range (%) | Post-test probability of non-pregnancy (%) |
|---|---|---|---|---|
| 0–1 | 31 | <20 | 70 | <80 |
| 1–2 | 8 | 20–33 | 22 | 80–89 |
| 2–3 | 5 | 33–43 | 2 | 89–93 |
| 3–4 | 6 | 43–50 | 6 | 93–94 |
| 4–5 | 3 | 50–56 | 0 | 94–95 |
| 5–6 | 4 | 56–60 | 0 | 95–96 |
| 6–7 | 5 | 60–64 | 0 | 96–96.5 |
| 7–8 | 7 | 64–67 | 0 | 96.5–97 |
| >8 | 30 | >67 | 0 | >97 |
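The post-test probabilities in Table XXXV are Bayes' theorem in odds form applied to the bounds of each LR range: post-test odds = pre-test odds × LR. A short sketch (the function name is hypothetical):

```python
def post_test_probability(pre_test: float, lr: float) -> float:
    """Convert a pre-test probability into a post-test probability via a likelihood ratio."""
    post_odds = pre_test / (1 - pre_test) * lr
    return post_odds / (1 + post_odds)


# Bounds of the LR ranges; pre-test 20% for poor response, 80% for non-pregnancy
for lr in range(1, 9):
    poor = post_test_probability(0.20, lr)
    non_pregnancy = post_test_probability(0.80, lr)
    print(f"LR {lr}: poor response {poor:.1%}, non-pregnancy {non_pregnancy:.1%}")
```

At LR = 4, for example, this gives 50.0% for poor response and 94.1% for non-pregnancy, and at LR = 8 it gives 66.7% and 97.0%, matching the corresponding bounds in the table. It also shows why an LR of 4–5 for non-pregnancy still leaves a pregnancy chance of roughly 5–6%.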
For prediction of non-pregnancy, extreme threshold levels are necessary to obtain even a modest positive LR of 4–5, which corresponds to a post-test pregnancy rate of approximately 5%. Such abnormal test results occur in only a very limited number of patients, while false positive results will lead to unnecessary exclusions from IVF programmes if the test is used in a diagnostic fashion.
It can be concluded that in regularly cycling women the GAST predicts poor response with quite high accuracy, which may match that obtained with the AFC. For the prediction of non-pregnancy, the test may be adequate only at a very low threshold level, at which hardly any abnormal tests are found. These results make the GAST a candidate for more extensive confirmatory research.
Multivariate models
Systematic review
Through the search and selection strategy, a total of nine studies reporting on the predictive capacity of several multivariate models were identified and considered suitable for data extraction and meta-analysis (Balasch et al., 1996; Ranieri et al., 1998; Creus et al., 2000; Fabregues et al., 2000; Bancsi et al., 2002a; van Rooij et al., 2002; Durmusoglu et al., 2004; Erdem et al., 2004; Muttukrishna et al., 2004). Characteristics of the included studies are listed in addendum Table XXXVI. As with most studies on ORTs, definitions of poor response varied considerably. It should be noted that none of the multifactor studies provided usable data on pregnancy prediction. Moreover, the total number of cases included in these studies is modest (n=991).
| Author | Consecutive | One cycle per couple | Data per | Definition of poor response/cancellation | Definition of pregnancy |
|---|---|---|---|---|---|
| Balasch et al. | Yes | Yes | Cycle | <2 follicles 17 mm or <5 follicles 14 mm | Not applicable |
| Fabregues et al. | Yes | Yes | Cycle | <3 follicles 14 mm | Not applicable |
| Ranieri et al. | No | Yes | Cycle | <5 follicles 15 mm | Not applicable |
| Creus et al. | Yes | Yes | Cycle | <3 follicles 14 mm | Not applicable |
| Bancsi et al. | Yes | Yes | Cycle | <4 oocytes or <3 follicles 18 mm | Not applicable |
| Van Rooij et al. | Yes | Yes | Cycle | <4 oocytes or <3 follicles | Not applicable |
| Muttukrishna et al. | No | Yes | Cycle | <4 follicles 15 mm | Not applicable |
| Erdem et al. | Yes | Yes | Cycle | Cancel <4 follicles 15 mm or poor response <5 oocytes | Not applicable |
| Durmusoglu et al. | No | No | Cycle | Poor follicle growth or <3 oocytes (MII) | Not applicable |
Accuracy of poor response prediction
All nine studies reported only on the prediction of poor response. The sensitivities and specificities, the positive LR and the DOR for the prediction of poor ovarian response are summarized in Table XXXVII. A single summary point estimate for sensitivity and specificity could not be calculated meaningfully, as both test characteristics (shown in Figure 21) were heterogeneous among studies (χ² test: P < 0.001 for sensitivity and P = 0.014 for specificity). As the Spearman correlation coefficient for sensitivity and specificity was only −0.45, estimating a summary ROC curve did not appear justified. Regression analysis showed that no particular test model performed better than the others, as can also be seen from the sensitivities and specificities listed in Table XXXVII.
| Author | Cycles (n) | Test model | Sensitivity | Specificity | LR+ | DOR | Pre-test probability (%) | Post-test probability (%) | Proportion of patients/cycles with abnormal test (%) |
|---|---|---|---|---|---|---|---|---|---|
| Balasch et al. | 120 | Age + FSH | 0.53 | 0.81 | 2.8 | 4.8 | 33 | 58 | 30 |
| | | Age + inhibin B | 0.59 | 0.67 | 1.8 | 2.8 | 33 | 48 | 42 |
| | | Inhibin B + FSH | 0.57 | 0.69 | 1.8 | 3.0 | 33 | 48 | 40 |
| | | Age + FSH + inhibin B | 0.39 | 0.89 | 3.5 | 5.2 | 33 | 64 | 21 |
| Fabregues et al. | 80 | FSH + inhibin B | 0.42 | 0.86 | 3.0 | 4.4 | 35 | 63 | 24 |
| Ranieri et al. | 177 | FSH + GAST | 0.97 | 0.55 | 2.2 | 39.5 | 33 | 45 | 59 |
| Creus et al. | 120 | Age + FSH | 0.83 | 0.77 | 3.6 | 16.3 | 33 | 65 | 43 |
| | | Age + inhibin B | 0.74 | 0.50 | 1.5 | 2.8 | 33 | 43 | 58 |
| | | FSH + inhibin B | 0.77 | 0.73 | 2.9 | 9.1 | 33 | 58 | 44 |
| | | Age + FSH + inhibin B | 0.83 | 0.77 | 3.6 | 16.3 | 33 | 65 | 43 |
| Bancsi et al. | 120 | FSH + inhibin B | 0.58 | 0.94 | 9.7 | 21.6 | 30 | 81 | 22 |
| | | AFC + inhibin B | 0.69 | 0.88 | 5.6 | 16.3 | 30 | 71 | 29 |
| | | AFC + FSH | 0.72 | 0.93 | 10.3 | 34.2 | 30 | 81 | 27 |
| | | AFC + inhibin B + FSH | 0.75 | 0.95 | 15 | 57.0 | 30 | 87 | 26 |
| Van Rooij et al. | 119 | AMH + inhibin B + FSH | 0.69 | 0.91 | 7.7 | 22.5 | 29 | 75 | 27 |
| Muttukrishna et al. | 69 | FSH + inhibin B + AMH | 0.63 | 0.83 | 3.7 | 8.3 | 25 | 65 | 29 |
| Erdem et al. | 32 | CCCT + age | 0.81 | 0.69 | 2.6 | 9.5 | 50 | 72 | 56 |
| | | CCCT + age + OVVOL + AFC | 0.81 | 0.75 | 3.2 | 12.8 | 50 | 76 | 53 |
| Durmusoglu et al. | 91 | Age + AFC | 0.52 | 0.88 | 4.3 | 7.9 | 26 | 62 | 23 |
AFC, antral follicle count; CCCT, clomiphene citrate challenge test; DOR, diagnostic odds ratio; GAST, gonadotrophin agonist stimulation test; LR+, likelihood ratio for a positive test result; OVVOL, ovarian volume.
If a study reported on multiple threshold values, data for all threshold values are shown.
Clinical value
Because summary characteristics could not be estimated, it is difficult to assess the interrelation between positive LR, post-test probability and percentage of abnormal tests. As none of the studies provided usable pregnancy data, clinical value can only be discussed with regard to the prediction of poor response. A test used as a diagnostic tool to identify poor responders should reach a certain desired level of sensitivity and specificity. If these levels are set at 75 and 85%, respectively, Figure 21 shows that only one study fulfils these criteria (Bancsi et al., 2002a). False positive predictions in particular may hamper the use of these models if a high detection rate is needed and patients are refused IVF on the basis of the test result. These data suggest that, compared with other ORTs, multifactor models do not clearly improve predictive capacity.
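The screening rule applied to Figure 21 (sensitivity at least 75% and specificity at least 85%) can be checked directly against the values in Table XXXVII. In the sketch below the row type and function are hypothetical, and the thresholds are treated as inclusive, which is an assumption (with strict inequalities no model would pass).

```python
from typing import NamedTuple


class ModelResult(NamedTuple):
    study: str
    model: str
    sensitivity: float
    specificity: float


# Sensitivity and specificity for poor-response prediction, transcribed from Table XXXVII
RESULTS = [
    ModelResult("Balasch et al.", "Age + FSH", 0.53, 0.81),
    ModelResult("Balasch et al.", "Age + inhibin B", 0.59, 0.67),
    ModelResult("Balasch et al.", "Inhibin B + FSH", 0.57, 0.69),
    ModelResult("Balasch et al.", "Age + FSH + inhibin B", 0.39, 0.89),
    ModelResult("Fabregues et al.", "FSH + inhibin B", 0.42, 0.86),
    ModelResult("Ranieri et al.", "FSH + GAST", 0.97, 0.55),
    ModelResult("Creus et al.", "Age + FSH", 0.83, 0.77),
    ModelResult("Creus et al.", "Age + inhibin B", 0.74, 0.50),
    ModelResult("Creus et al.", "FSH + inhibin B", 0.77, 0.73),
    ModelResult("Creus et al.", "Age + FSH + inhibin B", 0.83, 0.77),
    ModelResult("Bancsi et al.", "FSH + inhibin B", 0.58, 0.94),
    ModelResult("Bancsi et al.", "AFC + inhibin B", 0.69, 0.88),
    ModelResult("Bancsi et al.", "AFC + FSH", 0.72, 0.93),
    ModelResult("Bancsi et al.", "AFC + inhibin B + FSH", 0.75, 0.95),
    ModelResult("Van Rooij et al.", "AMH + inhibin B + FSH", 0.69, 0.91),
    ModelResult("Muttukrishna et al.", "FSH + inhibin B + AMH", 0.63, 0.83),
    ModelResult("Erdem et al.", "CCCT + age", 0.81, 0.69),
    ModelResult("Erdem et al.", "CCCT + age + OVVOL + AFC", 0.81, 0.75),
    ModelResult("Durmusoglu et al.", "Age + AFC", 0.52, 0.88),
]


def meets_criteria(rows, min_sensitivity=0.75, min_specificity=0.85):
    """Return the models whose sensitivity and specificity both reach the (inclusive) targets."""
    return [r for r in rows
            if r.sensitivity >= min_sensitivity and r.specificity >= min_specificity]


for r in meets_criteria(RESULTS):
    print(f"{r.study}: {r.model} (sens {r.sensitivity:.2f}, spec {r.specificity:.2f})")
```

Only the Bancsi et al. model combining AFC, inhibin B and FSH (sensitivity 0.75, specificity 0.95) passes, in agreement with the text.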
Implications for daily practice
With the postponement of childbearing, the age-related decline in fertility has been shown to play an important role in the increase in infertility among couples trying to conceive. In IVF treatment, this age effect has been demonstrated in large bodies of accumulated data. Because female fertility varies within any given age category, tests were needed that could better identify women whose ovarian reserve is clearly too low for their age. Because a benchmark for ovarian reserve status, in terms of both quantity and quality, is lacking, the occurrence of poor ovarian response to maximal stimulation and the occurrence of pregnancy in IVF are used as parameters to assess test accuracy. The ideal ORT should identify a substantial percentage of IVF-indicated cases that, because of diminished ovarian reserve, have practically no chance of becoming pregnant in a series of treatment cycles. Such couples could then be advised not to enter the programme, as the very high costs involved would yield only minimal results. If not too expensive and not too demanding for the patient, such a test would be readily embraced by physicians, patients, health policy makers and insurance companies.
From the systematic reviews and meta-analyses presented here, it can be concluded that the ORTs known to date have only very modest predictive properties and are therefore far from suitable for relevant clinical use. Although most of these tests are cheap and not very demanding, their accuracy, especially in predicting the occurrence of pregnancy, is very limited. If a high threshold is used to prevent couples from wrongly being refused IVF, only a very small minority of IVF-indicated cases (∼3%) are identified as having unfavourable prospects in an IVF treatment cycle (pregnancy rate for that cycle = 5%). It should be noted that pregnancy may be an insufficient outcome parameter for assessing ovarian reserve status if only one exposure cycle is taken into account. The possibility of misjudgement on the basis of currently known ORTs is therefore hard to rule out. This implies that ORTs should not be used to deny treatment to women presumed to have aged ovaries; as a consequence, they should not be applied routinely and should be used only for counselling or screening purposes.
Accuracy of testing for the occurrence of poor ovarian response to stimulation appeared to be clearly better than for the occurrence of pregnancy. This may be explained by the following factors: (i) the chance of pregnancy after IVF depends on many more factors than ovarian reserve alone; (ii) the occurrence of pregnancy after an ORT was usually evaluated in only one IVF cycle and as such may not accurately represent a woman's true reproductive capacity; and (iii) the response to stimulation is likely to represent the size of the cohort of FSH-sensitive follicles continuously present in the ovaries, which is directly related to the magnitude of ovarian reserve, i.e. the remaining primordial follicle pool (Gougeon, 1984). Poor ovarian response has been associated with a reduced chance of pregnancy in the actual treatment cycle as well as in subsequent cycles, and as such may well be indicative of ovarian reserve status in both the quantitative and the qualitative sense (Ulug et al., 2003; Klinkert et al., 2004; Klinkert et al., 2005a). Accurate prediction of poor response could therefore have clinical value if the pregnancy prospects are so unfavourable that a predicted poor responder would be denied treatment. Accuracy in response prediction, however, will only be high if false positives are prevented by using extreme threshold levels, implying that only small percentages of abnormal tests will be found and that many future poor responders will go unrecognized. At the same time, it is necessary to know whether the predicted poor responder indeed has very low prospects of success in subsequent cycles.
As much of this is currently unknown, the use of any ORT for poor response prediction cannot be supported, not even for adapting the treatment schedule in anticipated poor responders, as an altered treatment schedule has not been shown to be effective in women with a severely reduced follicle cohort (Tarlatzis et al., 2003; Klinkert, 2005; Klinkert et al., 2005a).
One aspect of clinical value deserves special attention. ORTs are mostly used as diagnostic tests, meaning that an abnormal test result leads to a diagnosis of diminished ovarian reserve (Scott and Hofmann, 1995; Levi et al., 2001). However, the tests are evaluated against proxy variables of true ovarian reserve (poor ovarian response and non-pregnancy), and false positive results may exclude couples from IVF even when they have adequate prospects; ORTs may therefore be better regarded as screening tests. This implies that an abnormal test result requires confirmation by another test. This other test may, for instance, be a first IVF attempt in which ovarian response serves as the additional test. Alternatively, combinations of independent predictive tests or repetition of the initial test could improve the diagnostic performance of a single test (Ng et al., 2000; Bancsi et al., 2002b; van Rooij et al., 2002; Popovic-Todorovic et al., 2003a,b; Bancsi et al., 2004a,b).
As poor ovarian response provides some information on ovarian reserve status, especially if stimulation is maximal, entering a first IVF cycle without any prior testing seems to be the preferable strategy. Once a poor response has occurred, the question arises whether it reflects depleted ovaries or other causes, such as underdosing related to the presence of certain FSH receptor polymorphisms (Perez et al., 2000; Behre et al., 2005; Greb et al., 2005; de Koning et al., 2005). A repeat cycle with adequate, maximal stimulation or a post hoc ORT [basal FSH or AFC (Hendriks et al., 2005c)] may correctly classify the poor responder as having an aged ovary and may correctly support the advice to refrain from further treatment (Klinkert et al., 2004).
It should be remembered that the purpose of any ORT is to identify women with poor ovarian reserve for their age. This implies that chronological age is always the first step in ovarian reserve assessment. In young women, ORTs may help to classify poor responders and to direct their management by estimating the size of the FSH-sensitive cohort. In older women, ORTs may help to identify those who, in spite of their age, may still have acceptable chances of becoming pregnant through IVF because their response to stimulation is anticipated to be normal or even high (Klinkert et al., 2005b).
Future perspectives in this research field may be found in studies that analyse success rates over cumulative treatment cycles or over units of time (e.g. 1-year treatment periods) to determine whether any test correctly identifies the couples who will not become pregnant over such a series of exposures. Novel tests that accurately estimate the age at which an individual woman will reach menopause may facilitate estimation of her remaining reproductive potential at a given age. Such tests will probably be based on family history (the mother's age at menopause) or on genetic markers, which may be discovered through large-scale population genetic screening.
Addendum
Author notes
¹Department of Reproductive Medicine, University Medical Centre Utrecht, Utrecht; ²Division of Reproductive Endocrinology and Fertility and the IVF Centre, Department of Obstetrics and Gynaecology, Vrije Universiteit Medical Centre; and ³Centre for Reproductive Medicine, Department of Obstetrics and Gynecology, Academic Medical Centre, Amsterdam, The Netherlands