Making sense of endometriosis? Exploring one century of scientific research

“Enigmatic”, “elusive”, or even “mysterious”: this is how some people, including medical doctors, have described endometriosis. This chronic gynecological disease is associated with pelvic pain and an increased risk of infertility. It is characterized by the presence of cells resembling those of the innermost layer of the uterus (the endometrium) outside the womb. You might think that such descriptions, which stress the incomprehensible nature of the disease, date back decades or even centuries. In reality, understanding this illness remains a struggle even today. For instance, Professor Tasuku Harada (a leading researcher from Tottori University, Japan) wrote a book chapter literally entitled “Endometriosis: a mysterious disease” in 2014.

There are good reasons for these difficulties in grasping the illness, including diagnostic challenges, the diversity of symptoms and the complex mechanisms at play. But the situation also stems from a fair dose of less noble reasons: limited past interest in investigating the disease, social taboos and meager investment. Even worse, the outright downplaying of patients’ struggles, the blaming of symptoms on purely psychological factors, and discrimination from doctors, other professionals and relatives are partly responsible as well.

However, there is evidence that things are beginning to change in several countries. Social movements centered on endometriosis patients are gaining traction, with many associations involved in raising awareness and supporting scientific research. Media coverage has exposed the general public to vivid testimonies from affected individuals. This increasing attention has also spread to research activities. For instance, a French foundation exclusively dedicated to research on endometriosis was created in 2021.

It is critical to analyze where scientific research is heading, as it will be, among other elements, a key factor in the future management of the disease and in patients’ quality of life. Obviously, research is not synonymous with care. It may take time for the results of scientific research to feed into actual practices on the ground. The focus of research may also not always match operational constraints. Scientific research is far from perfect: it is not the truth, it is not miraculous, it is not carried out by solitary geniuses cut off from the broader context, and it creates power dynamics that can be detrimental to vulnerable actors… However, with the appropriate incentives and checks, it is one of the best ways to approximate the truth and, hopefully, to help improve the situation. It should also be noted that research dedicated to endometriosis is not restricted to medical aspects alone. It also covers the many intertwined aspects linked to this illness, including social, economic, political and psychological considerations.

Getting a grasp of endometriosis research

Despite its importance, it is difficult to obtain a picture of endometriosis research that is complete, accessible and rigorous. In this series of articles, I therefore propose to take a closer look at the scientific research dedicated to endometriosis over the last century. The ambition is mainly descriptive, but I will also offer elements of reflection to inform public debate on the matter. More precisely, I will address a series of questions structured into three themes.

The articles are organized according to these themes for ease of reading, and they can be read independently depending on your interests. I also provide an overall concluding synthesis. The graphs and maps are interactive and can be clicked for further details.

What is the scope of the research effort dedicated to endometriosis, and how has it evolved over time?

  • Which are the main steps in the development of research linked to endometriosis?
  • How many scientific publications have been dedicated to endometriosis?
  • How does it compare to the evolution of scientific research in general?
  • Can we estimate the related funding attributed to research on this disease?

Which topics of interest are covered by the research on endometriosis? How has the focus evolved over time?

  • Which scientific disciplines analyze endometriosis?
  • Which are the main topics/subtopics of endometriosis research? This is particularly critical as the topics under investigation (and symmetrically those NOT under investigation) condition future breakthroughs.
  • Is there a trend towards research diversification over time? Are some areas undergoing consolidation or (relative) decline?
  • How can we summarize the content of the priorities and critical developments in the endometriosis literature (at a broad level)?
  • In particular, to what extent are pain and fertility studied in the scientific literature? (There is a strong focus on infertility among some medical professionals treating endometriosis patients, potentially to the detriment of pain management. It would be interesting to see whether this bias is also present in research.)
  • What is the activity linked to clinical trials (i.e., research explicitly aiming at developing treatments, diagnostic tools or other interventions)? Has it evolved over time (e.g., in terms of scope, tested interventions, etc.)?

Who is studying endometriosis (from a national, institutional and gender perspective)? How has this evolved over time? Does it influence the research carried out?

  • Which countries are most active regarding endometriosis research? How has it changed over time? Are some countries punching above their weight? Are there specific specializations in terms of topics?
  • Which specific institutions are involved in endometriosis research? Are there differences by type of institution in terms of topics/activity (e.g., private companies vs universities…)?
  • What is the gender profile of endometriosis researchers? How has it evolved over time? How is it linked to the broader context?
  • Does gender influence the studied topics in endometriosis R&D?

This series of articles is explicitly focused on better understanding the situation of R&D dedicated to endometriosis. It does not constitute medical advice of any sort. If you or one of your relatives has symptoms or issues that could correspond to endometriosis, please reach out to a medical professional or a patient association.

Methodological summary

To tackle these questions, I built a novel dataset of publications dedicated to endometriosis research from the early 1920s to 2020. It relies on an extraction from the Dimensions database and includes various forms of scientific publications (e.g., individual articles, but also synthesis articles, case studies, conference presentations, book chapters…). It is not necessarily fully comprehensive but covers about 37,000 unique documents written by tens of thousands of authors worldwide. It should thus give a good overview of the situation. I complemented it with an extraction of the website (in October 2021), which lists the studies performed in the context of developing treatments or other interventions aiming at immediate improvements in clinical care.

If you would rather skip the technicalities, move directly to the analytical articles below. Here I briefly explain how the dataset was constructed, mentioning the approach adopted and the tools used.

I first extracted raw data on endometriosis-related articles from the Dimensions database in October 2020. Papers were selected if they had “endometriosis” in their title or abstract (the dataset may thus contain research where endometriosis is only a secondary theme, e.g., papers focused on cancer and aiming to distinguish it from endometriosis). The search was conducted in English as well as in French, German, Portuguese, Spanish, Chinese, Japanese and Russian to capture the existing literature as broadly as possible. The selected publications were checked for duplicates using the unique identifiers provided by Dimensions. Pre-prints that led to subsequent publications were also removed.
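As an illustration only, the selection and de-duplication logic described above could be sketched in Python as follows (the analyses for these articles were run in R, and this is not the script actually used). The record fields `id`, `title` and `abstract` and the sample records are hypothetical, only the English keyword is handled, and the removal of pre-prints is not covered.

```python
KEYWORD = "endometriosis"


def mentions_keyword(record, keyword=KEYWORD):
    """True if the keyword appears in the title or abstract (case-insensitive)."""
    text = f"{record.get('title') or ''} {record.get('abstract') or ''}"
    return keyword.lower() in text.lower()


def select_and_deduplicate(records, keyword=KEYWORD):
    """Keep keyword matches, one record per unique identifier (first one wins)."""
    seen = set()
    kept = []
    for record in records:
        if not mentions_keyword(record, keyword):
            continue
        if record["id"] in seen:
            continue
        seen.add(record["id"])
        kept.append(record)
    return kept


# Hypothetical records, invented for the example.
records = [
    {"id": "pub.1", "title": "Endometriosis and pelvic pain", "abstract": None},
    {"id": "pub.1", "title": "Endometriosis and pelvic pain", "abstract": None},
    {"id": "pub.2", "title": "Ovarian cancer imaging",
     "abstract": "Differential diagnosis with endometriosis."},
    {"id": "pub.3", "title": "Uterine fibroids", "abstract": "No relevant mention."},
]
selected = select_and_deduplicate(records)
print([r["id"] for r in selected])
```

The second record illustrates the caveat mentioned above: a paper focused on cancer is kept because its abstract mentions endometriosis.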

The obtained database was then enriched by collecting additional data from external sources and running ad-hoc analyses.

Abstracts missing from the Dimensions database were retrieved by looking up each publication’s DOI in online resources (e.g., rcrossref, ORCID…). The abstracts and publication titles were then translated into English using Google Translate.
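For readers curious about what such a DOI lookup involves, here is a minimal Python sketch that queries the public Crossref REST API directly (rcrossref is an R client for that same API). It is an assumption about how the step could be done, not the code used for this dataset. The function names and the sample DOI and payload are hypothetical. Crossref only holds abstracts for some records, and they come with JATS markup that has to be stripped.

```python
import json
import re
import urllib.parse
import urllib.request


def crossref_url(doi):
    """URL of the Crossref 'works' record for a DOI."""
    return "https://api.crossref.org/works/" + urllib.parse.quote(doi)


def abstract_from_payload(payload):
    """Extract a plain-text abstract from a Crossref JSON payload, or None."""
    raw = payload.get("message", {}).get("abstract")
    if not raw:
        return None
    text = re.sub(r"<[^>]+>", " ", raw)  # drop JATS/XML tags
    return " ".join(text.split())        # collapse whitespace


def fetch_abstract(doi, timeout=10):
    """Network call: download the record and return its abstract if present."""
    with urllib.request.urlopen(crossref_url(doi), timeout=timeout) as response:
        return abstract_from_payload(json.load(response))


# Offline example with a hand-written payload (not a real record).
sample = {"status": "ok",
          "message": {"DOI": "10.1000/example",
                      "abstract": "<jats:p>Pelvic pain is common.</jats:p>"}}
print(abstract_from_payload(sample))
```

Keeping the parsing separate from the network call makes it possible to check the extraction logic without contacting the service.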

The abstracts were used as the source material to categorize the publications by identifying “topics”. This approach assumes that abstracts genuinely reflect the full content of the publications (which may not always hold). Topics were identified with an LDA (latent Dirichlet allocation) algorithm, which was tuned to set the relevant number of topics.

The scientific disciplines of the different publications were derived from the original Dimensions database. However, they were checked manually and simplified, in line with the abstracts, titles and journals of the publications.
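To give an idea of what an LDA algorithm does, here is a deliberately small Python sketch using collapsed Gibbs sampling on toy "abstracts". It is an illustration under stated assumptions: the text above does not say which package, inference method or hyperparameters were used, so the function name `fit_lda`, the values of `alpha` and `beta`, and the toy documents are all hypothetical. The number of topics is fixed by hand here, whereas the real analysis tuned it.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents.

    Returns (doc_topic_counts, topic_word_counts, vocabulary).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token, then resample its topic given all others.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    return doc_topic, topic_word, vocab


# Toy documents, invented for the example.
docs = [
    "pain pelvic pain chronic".split(),
    "infertility ivf pregnancy infertility".split(),
    "pelvic pain surgery".split(),
    "pregnancy ivf outcome".split(),
]
doc_topic, topic_word, vocab = fit_lda(docs, n_topics=2)
for k, counts in enumerate(topic_word):
    top = sorted(zip(counts, vocab), reverse=True)[:3]
    print(k, [w for c, w in top if c > 0])
```

Each topic is summarized by its most frequent words, and each document by how many of its tokens fall into each topic. On a real corpus, the number of topics would typically be chosen by comparing several candidate values.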

The authors’ names (in particular, the first names) are an essential input to the study. Unfortunately, the Dimensions database does not provide a unique identifier for each author. Missing names were first completed through requests to the ORCID website. Then, an ad-hoc algorithm was run to identify which authors could actually be the same person despite non-identical names (e.g., only initials available instead of first names, presence or absence of middle names, transcription errors…). When the situation was unambiguous, a full name could thus be attributed to incomplete author records in many cases (e.g., initials could be turned into a first name). This process combined the algorithm with manual checks. Each first name was then looked up in the Namespedia database, which provides the probability that a name is male or female. This probability was extracted for each name (when a full first name was available). A “male” or “female” identification was attributed when the first name had a probability above 75% of belonging to a given gender. Sensitivity analyses with a threshold of 90% were carried out for the different gender-related analyses and confirmed that the results were robust.
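The thresholding rule described above can be sketched in a few lines of Python. This is an illustration only: the function name `classify_gender` is hypothetical, and the probabilities below are invented for the example (they are not values taken from Namespedia).

```python
def classify_gender(p_female, threshold=0.75):
    """Return 'female', 'male' or None (no attribution).

    p_female is the probability that a first name is female, or None when
    no full first name is available (e.g., initials only).
    """
    if p_female is None:
        return None
    if p_female > threshold:
        return "female"
    if (1 - p_female) > threshold:
        return "male"
    return None  # too ambiguous at this threshold


# Invented probabilities, for illustration only.
name_probabilities = {"Maria": 0.98, "John": 0.01, "Andrea": 0.60,
                      "Kim": 0.80, "J.": None}

# Main rule (75%) and sensitivity check (90%).
for threshold in (0.75, 0.90):
    labels = {name: classify_gender(p, threshold)
              for name, p in name_probabilities.items()}
    print(threshold, labels)
```

Raising the threshold to 90% leaves more names unattributed (here, the fictional 0.80 case), which is the trade-off a sensitivity analysis examines: fewer classified authors, but more confident labels.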

The affiliations (i.e., the list of organizations) linked to each author in the Dimensions database were extracted and analyzed through a specific algorithm, which matched the affiliation names with institutions from the GRID database. This database covers scientific and medical institutions across the globe. Based on this process, each author could be attributed one or more institutions for a given publication. Where the algorithm could not identify an institution, the data were interpreted manually. Where GRID codes were unavailable, an ad-hoc code was created manually. As these codes carry rich data on locations, types of institutions and countries, they can be used for further analyses (e.g., geographical…). Google Maps was used to geocode institutions that were missing from the GRID database. All this information was then linked to individual publications (i.e., without counting the same institution several times when different authors of a publication worked at the same place).

Once the dataset was enriched and consolidated, analyses and data visualizations were produced using R and various packages, especially plotly for interactive plots.
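The matching and per-publication counting described above could look like the following Python sketch. It is only an assumption about how such a step might work, since the algorithm itself is not detailed here. It uses approximate string matching from the standard library; the reference table, the identifiers (`inst-001`, `inst-002`, which are not real GRID codes) and the 0.85 similarity cutoff are all hypothetical.

```python
from difflib import get_close_matches

# Hypothetical reference table: normalized institution name -> identifier.
REFERENCE = {
    "university of oxford": "inst-001",
    "tottori university": "inst-002",
}


def normalize(name):
    """Lowercase, drop commas and collapse whitespace."""
    return " ".join(name.lower().replace(",", " ").split())


def match_institution(raw, reference=REFERENCE, cutoff=0.85):
    """Return the identifier of the closest reference name, or None."""
    hits = get_close_matches(normalize(raw), list(reference), n=1, cutoff=cutoff)
    return reference[hits[0]] if hits else None


def institutions_for_publication(author_affiliations):
    """Unique matched institutions for one publication, in first-seen order."""
    found = []
    for affiliations in author_affiliations:
        for raw in affiliations:
            code = match_institution(raw)
            if code is not None and code not in found:
                found.append(code)
    return found


# One publication with three authors; the second affiliation has a typo.
publication = [
    ["Tottori University"],
    ["Tottori Univeristy"],
    ["University of Oxford", "Unknown Clinic"],
]
print(institutions_for_publication(publication))
```

The two spellings of the same institution collapse into a single entry for the publication, while the unmatched affiliation returns nothing and would be left for manual interpretation.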

A GitHub repository provides the main scripts used in the analyses for these articles, with my sincere apologies for the poor optimization and awkward coding. Datasets can be obtained upon request. Comments, suggestions and corrections of potential errors are more than welcome.  


