The equation P(R|C)x + P(G|C)y = 1 may change the way rare diseases get diagnosed. That equation is part of find zebra’s information retrieval program, which ranks medical information relevant to rare diseases. As many in the rare disease community know, one of the biggest hurdles facing a patient with a rare disease is getting a diagnosis. One reason a diagnosis is so hard to get is that doctors tend to suspect the patient has a more common ailment. In the same way, a doctor who hears hoofbeats will assume it is a horse, not a zebra. So, to diagnose a rare disease more quickly, one must remove the horses (common ailments) and leave only the zebras (rare diseases) in the search results. That is the basis for the program developed by a group at the Technical University of Denmark. The search engine they have developed ignores the common ailments and includes only the rare ones.
The search engine has a library of 31,000+ medical articles on rare and genetic diseases. The articles come from numerous sources focused on rare diseases, such as Orphanet, NORD, GARD, Madisons Foundation, the Swedish Information Center for Rare Diseases, and many more. These documents are also ranked according to their relevance. In the equation mentioned earlier, x = φy, where φ is the boosting factor, and P(R|C) and P(G|C) denote the probabilities of rare disease and genetic disease documents, respectively, in the collection C.
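To make the equation a little more concrete, here is a minimal sketch of how x and y follow from the two relations quoted above. Substituting x = φy into P(R|C)x + P(G|C)y = 1 gives y = 1 / (φ·P(R|C) + P(G|C)). The function name and the sample numbers below are our own illustrative assumptions, not values taken from find zebra, and this is not the team’s actual code.

```python
def solve_weights(p_rare, p_genetic, phi):
    """Solve p_rare * x + p_genetic * y = 1 subject to x = phi * y.

    p_rare and p_genetic stand in for P(R|C) and P(G|C); phi is the
    boosting factor. Returns the pair (x, y).
    """
    denominator = phi * p_rare + p_genetic
    if denominator <= 0:
        raise ValueError("phi * p_rare + p_genetic must be positive")
    y = 1.0 / denominator
    x = phi * y
    return x, y


# Hypothetical inputs chosen only for illustration (not from find zebra).
x, y = solve_weights(p_rare=0.3, p_genetic=0.7, phi=2.0)
print("x =", x, "y =", y)
```

With any φ greater than 1, x comes out larger than y, which is how a boosting factor would favor the rare disease documents over the genetic disease ones.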
The above equation, along with many more, is at the core of find zebra’s search engine. And at the heart of find zebra is a group from the Technical University of Denmark who wanted to create something that would help transform patients’ lives (and probably get them a passing grade; let’s hope they got an A).
And let’s hope people in the rare disease community start to recognize that programs like this one are the programs that may have the greatest impact on patients. That is, programs that lead to a quicker diagnosis so patients can get the treatment they need.
How does it work? And does it work?
The search engine employed by find zebra has two datasets. One contains over 10,000 documents about rare diseases (referred to as RARE), and the other contains over 31,000 documents about rare and genetic diseases (referred to as RARE&GENET).
The layout of their website is quite sparse. There is a search bar and, below that, some brief information about find zebra. Very Google-esque. The site is designed for a clinician to type in a list of symptoms and nothing else, which is probably for the best.
We did a test run using symptoms for Kleine-Levin syndrome (Jewish, male, 16 years old, monthly seizures, sleep deficiency, aggressive and irritable when woken, highly increased sexual appetite and hunger) and Gaucher disease (osteopenia, hepatomegaly, anemia, fatigue, thrombocytopenia, nosebleed). Unfortunately, no results matched our queries. We also tried a search that was used successfully in an article by the MIT Technology Review (“Boy, normal birth, deformity of both big toes [missing joint], quick development of bone tumor near spine and osteogenesis at biopsy”). Again, that did not bring up the intended result for us (fibrodysplasia ossificans progressiva). However, we attribute these search problems to our not fully understanding the search terms this engine needs, the high traffic the site is getting this week following the MIT Technology Review report, and the fact that the project is still new and has some bugs to work out.
Still, we believe this project has the potential to improve the efficiency with which clinicians diagnose patients with rare diseases. It has a long way to go to get there, but we are cautiously optimistic that this team from the Technical University of Denmark can pull it off.