My Attempt To Make Clinical Trials More Efficient

cr net screenshot

For a few months, on nights and weekends while working at my most recent job, I worked on a project to help make clinical trials more efficient, and even built a prototype (the screenshot above, you can play around with it here)–I gave it the memorable and exciting name “Clinical Research Network”.

Though my project didn’t “succeed” in the traditional sense, I learned a lot about this interesting area of health/biotech, and got to practice several important product development skills. The following are the important parts of my story, but warning, it’s still a long post.

Clinical trials have a hard time recruiting enough patients, which causes a lot of waste.

I received an email from HeroX one day about a competition to see who could come up with the best idea to help clinical trials recruit more patients. Intrigued, I did more research on the problem, and decided to enter the competition: worst case I would spend a little time writing a proposal that didn’t win, but still get to learn more about this fascinating problem.

As discussed in a previous post, roughly 10% of clinical trials terminate unsuccessfully because they’re unable to recruit enough patients for the study. There are roughly a thousand new clinical trials every year, and since a clinical trial costs on average $30M-$40M, a lot of money is spent on clinical trials that don’t end up contributing much to the advancement of science and medicine.*
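As a rough illustration of that sizing (a back-of-the-envelope sketch using only the figures quoted above, not the actual spreadsheet model), the arithmetic looks like this. The function name is made up for the example.

```python
def wasted_spend(trials_per_year, failure_rate, cost_per_trial):
    """Annual spend on trials that terminate because of poor recruitment."""
    return trials_per_year * failure_rate * cost_per_trial


# Figures quoted in the post: ~1,000 new trials a year, ~10% of them
# terminating for lack of patients, $30M-$40M average cost per trial.
low = wasted_spend(1000, 0.10, 30e6)
high = wasted_spend(1000, 0.10, 40e6)

print(f"Roughly ${low / 1e9:.0f}B to ${high / 1e9:.0f}B per year")
```

That puts the waste somewhere around $3B to $4B a year, with all the caveats about rough inputs noted in the footnote below.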

The HeroX competition’s more quantifiable goal was to come up with ideas that could double the patient recruitment rate from 3% to 6%, patient recruitment rate being defined as number of patients who participate in clinical trials / total number of patients out there. The more patients participate in clinical trials, the faster medical research accelerates.

*The numbers used to “size up” the problem are very rough, and taken from various sources. My model also did not account for the fact that a lot of clinical trials that do complete successfully still have trouble recruiting patients fast enough, so go way over-schedule and over-budget. But the order of magnitude should be close. See the model for more details.

Questioning assumptions, asking why

The problem was framed so that solutions tackling recruitment directly came to mind first, e.g. increasing patient awareness of clinical trials through tools and advertising, or connecting patients to clinical trials automatically by leveraging EMR data.

But I wanted to understand the problem at a deeper level, vs. taking things at face value. I put together a simple model in Google Sheets and let the numbers shed some light on the problem. Interestingly, even if all clinical trials were able to recruit enough patients with a wave of a magical wand, the patient recruitment rate would only increase by 4%, much less than the competition’s desired 100% increase, or doubling, of the patient recruitment rate. This suggests that if we really want to accelerate medical research and get more of the patient population to participate in clinical trials, we’re not only going to need to recruit patients better, but we’ll also need a lot more clinical trials, clinical trials that happen faster and more efficiently.
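To make the gap concrete, here is a small sketch that treats the 4% as a relative increase on the 3% baseline (that reading is an assumption based on the comparison with a 100% increase; the Google Sheets model itself isn't reproduced here, and the helper name is made up).

```python
def relative_increase(old, new):
    """Relative change from old to new, e.g. 1.0 means a 100% increase."""
    return (new - old) / old


baseline = 0.03   # current patient recruitment rate (from the competition)
target = 0.06     # the competition's goal: double the rate
uplift = 0.04     # relative gain if every trial recruited enough patients

new_rate = baseline * (1 + uplift)

print(f"Rate with fully recruited trials: {new_rate:.2%}")
print(f"Increase needed to hit the target: {relative_increase(baseline, target):.0%}")
```

Under that assumption the rate only moves from 3% to about 3.12%, nowhere near the 6% target.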

Screenshot of Patient Recruitment Model

I wrote a proposal for the competition, submitted it, and…

What idea did I submit?

An idea for a SaaS product that would mine/learn from all the data we have on previous clinical trials (a lot of it public), and help pharmaceutical companies and investigators learn from the past. This product would essentially be a search engine on top of a “similarity graph”, where pharma and/or doctors/investigators could describe their clinical trial, and see other trials that were similar in some way (perhaps disease treated, or similar inclusion/exclusion criteria), and learn from what made those clinical trials succeed or fail.

Why did I submit that?

  1. There’s a lot of data out there on clinical trials, much of it publicly available. There has to be some sort of knowledge we can learn from all the clinical trials we’ve already conducted, from both the successes and failures.
  2. Clinical trials face many different obstacles to recruiting patients, mostly because they themselves are very different–different populations, different diseases, different treatments, different investigators running the trial, different locations. But this doesn’t mean that trials aren’t similar to other trials in some way, so something that worked for one trial could also work for another, depending on how they’re similar.
  3. As mentioned before, I realized that the actual clinical trial process needs to be faster, more efficient, and cheaper to drive a meaningful acceleration of medical research. This was a tool that pharma and investigators/doctors could use to both plan and run a clinical trial more efficiently.

My idea didn’t win any of the prizes for the competition, but that’s ok.

If interested, you can see the winning entries (as well as the “top 10”, not sure where all the other entries went).

Getting out of the office

I asked for feedback on how my entry was judged, but didn’t get anything back. Still following my curiosity for the problem, I decided to talk to more people actually involved in clinical trials–I had originally found out about the competition two weeks before the deadline, so given some more time I felt I could come up with something more useful.

I developed a script to scrape for investigator contact info, and was able to gather a good list of physicians in the NYC area. I also used Mechanical Turk to fill in what I wasn’t able to scrape, such as a doctor’s research institution. After writing a bunch of emails to request to meet, one doctor actually got back to me! After that it was a bit easier, as I would ask the doctors if they knew anyone else I could talk to, and also name-drop the institutions I had visited already. I got to speak to a couple ex-pharma individuals from this effort too.

The two biggest things I learned from speaking to the handful of physicians and ex-pharma folk:

  1. Physicians don’t really talk to and learn from each other when it comes to clinical trials, e.g. about patient recruitment best practices. They’re extremely busy, and there isn’t really an incentive to help another physician who may be seen as a “competitor” (both in terms of revenue and research).
  2. Though investigators (physicians) recruit patients for a clinical trial, pharma and “contract research organizations” (CROs) recruit the investigators to run a clinical trial (among a ton of other stuff to set up and support the trial). It seemed that industry’s methods for investigator selection were pretty manual: they would rely on their own personal, immediate networks, maybe look at which investigators they worked with in the past.

Building something fast

I decided to build an MVP that was based on my learnings. There’s a lot that can be improved in the clinical trials process, so I thought about leverage, and a decision tree: decisions made earlier in a process can have a big impact on the decisions made later. This early task of “investigator selection” that pharma does when setting up a clinical trial (point 2) sounded like a good one to try and tackle with technology. It also isn’t something that investigators themselves are super concerned with, which would get around the obstacles discovered in point 1. There’s a lot of public data out there on clinical trials and on the research that came out of the trials (PubMed), so I wanted my tool to leverage this data.

I threw together something really quickly using Flask, the Python framework. Use cases: pharma could type in a drug and find the researchers who published the most research on that drug–those physicians might be good candidates as investigators for a clinical trial that used that drug (to perhaps treat a different disease). Patients could type in the disease they had and find the physicians who were perhaps the most knowledgeable on that disease. On the backend, data was scraped from PubMed, and essentially just restructured to be more useful for this particular case.

I started showing the “Clinical Research Network” to people in the biotech space to see what they thought…

The end?

…and I quickly found out that several companies, both small and large, were tackling this exact problem. They had way better credentials, more money, and free snacks at the office–how can I compete with free snacks?

So I put this project on hold, mulled over the possibility of working for them, and decided to move on to other ideas I was thinking about. I like writing post-mortems for my projects, and one of the biggest learnings was that I seemed to have “overextended” myself in a sense: I felt like my struggle was a very steep uphill climb from the beginning because I didn’t have the industry credentials and I didn’t yet have the industry network, very important aspects in an industry like biotech and healthcare.

Overall, the project was a great learning experience, and I got to practice several problem solving skills I find powerful and fun.

Pharma Paid Physicians $6.5B in 2014 – Looking Into The Open Payments Dataset

My friend Jesse introduced me to the Open Payments Dataset, which tracks the details of all payments made by “applicable” healthcare manufacturers (like pharmaceutical companies, medical device manufacturers) to any doctor they work with. A federal program maintains this database, which is a product of the Sunshine Act, part of the Affordable Care Act.

Why does this database exist? Basically because of the incentives created by industry being able to pay doctors to work on things that will ultimately help industry–like new drugs or medical devices. The hope is that more transparency will reduce any harmful influence that industry could have on medical research, education, and clinical decision making. In the words of Senator Grassley, co-author of the Sunshine Act:

Disclosure brings about accountability, and accountability will strengthen the credibility of medical research, the marketing of ideas and, ultimately, the practice of medicine. The lack of transparency regarding payments made by the pharmaceutical and medical device community to physicians has created a culture that this law should begin to change substantially. The reform represented in the Grassley-Kohl Sunshine Law is in patients’ best interest.

The healthcare industry pays physicians a lot, almost $6.5B in 2014 alone. What is being paid for though (or, what does industry report the payments are for)? Who’s getting paid, and how much? I decided to do a quick analysis to start answering these questions and to see if there was anything interesting at a high level.

Most top paid physicians get paid royalties or license fees

The most a single physician got paid in 2014 was almost $44M. The interesting thing is that for this physician and several other top paid physicians, almost the entire total came from payments that were categorized under this unhelpfully named category: “Compensation for services other than consulting, including serving as faculty or as a speaker at a venue other than a continuing education program” (orange).

A large majority of the other top paid physicians got paid primarily from “Royalty or License” (green), which makes sense: a surgeon may invent a new surgical technique and license it to a medical device company.

Another interesting phenomenon is that a handful of doctors in the top 100 earners were paid by industry solely for their research (purple). The status quo of industry having all the money and thus paying/funding research–sometimes both the design of and execution of the research–can create incentives with negative consequences for the validity of the results.

You can play around with the charts like the one below by zooming, mousing over data points to see their values, and showing/hiding different data series by clicking on each one in the legend. Physician names have been replaced with numbers for anonymity.

Chart embedded below, or link

Orthopedic surgeons received the most industry payments, followed by cardiovascular physicians

Orthopedic surgeons received the most money from industry in 2014, almost twice the amount that cardiovascular physicians received. Interestingly, most of the payments to orthopedic surgeons, and other types of surgeons, were for royalties or licenses (green), whereas most payments for physicians–cardiovascular and otherwise–were for “Compensation for services other than consulting” (orange), “Research” (purple), and “Consulting” (purple).

Click to show interactive chart (some labels are crazy long so embedding didn’t look good. “A&O” stands for “Allopathic & Osteopathic Physicians”):
Payment Received by Physician Specialty in 2014 (Top 50)

The healthcare industry pays a lot of money for research

Out of the $6.5B total payments to physicians in 2014, $3.2B, or almost half, were for research. We can see this when aggregating the payments by the name of the drug or device manufacturer: companies like Genentech, Pfizer, and Novartis dominate the dollar amount of payments made to physicians, and most of their payments are for “Research” (brown). Further down the line, you can see medical device manufacturers like Stryker and Medtronic paying physicians mostly for “Royalty or License” (green).

Click to show interactive chart:

Payment Sources in 2014 (Top 50)

Physicians in CA received, by far, the most money from industry

The graph below shows how much money physicians received for research and “general” payments (any payment that isn’t classified as “Research”), grouped by the state they work in; the size of each bubble represents the number of physicians in that state.

CA had significantly more physicians receive payments (8081) than the runner-up state, NY (5981), and thus the physicians that worked in CA received a lot more money from industry, in aggregate.

Payments Received by State
Though drilling into state by state differences in the data (e.g. the dominant “purpose” CA physicians vs. physicians in other states get paid for) is an exercise for another time, we get a hint for why this phenomenon might exist by looking at the teaching hospitals that were affiliated with the physicians who got paid by industry the most.

Click to show interactive chart:

Payment Sources in 2014 (Top 50)

Physicians affiliated with the City of Hope National Medical Center in Los Angeles received the most industry payments, by far, and almost all of it from royalties or license fees (green). Genentech has been known to pay massive royalties for the drugs developed at City of Hope, including the crazy expensive cancer treatments Herceptin and Avastin.

Do physicians get rewarded with fancy dinners and extravagant trips?

By looking at the data, we can find which physicians got paid the most for “Entertainment”, “Food and Beverage”, and “Travel and Lodging”. But we won’t know for sure, because remember, all this payment data is reported by the healthcare industry itself, and while there are some financial penalties for inaccurate reports, I don’t see an easy way for the government to verify the validity of the data.

The “worst offenders” were essentially given, by industry, a $60 meal three times a day for every day of the year, went on $590 per day trips, and spent $43 a day (about $300 a week) on entertainment and fun. Sounds like the life (except a little more on the entertainment and fun please).


There’s a lot of money being transferred from the healthcare industry to physicians, which means a ton of data since all of this has to be reported now. In fact, I didn’t even touch another part of the dataset, how much ownership each physician has in a particular drug or device manufacturer, which could give even more color on misaligned incentives. Also, without aggregation of some of the data fields, the raw, transaction/payment level data took up close to 6GB of space, and I didn’t want to spin up a Spark cluster or something. Luckily, the Open Payments site provides a web service that allowed me to aggregate and filter the raw data, dramatically reducing the dataset’s size.
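For anyone who would rather work from the raw file than the web service, one alternative is to stream the CSV row by row and keep only running totals, so the full 6GB never has to sit in memory. This is a sketch of that approach, not what was done for this post; the function name and the column names (`nature_of_payment`, `amount_usd`) are placeholders, and the real file's headers would need to be substituted.

```python
import csv
import io
from collections import defaultdict


def total_by_category(rows, category_col, amount_col):
    """Sum payment amounts per category from an iterable of dict rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[category_col]] += float(row[amount_col])
    return dict(totals)


# Tiny in-memory stand-in for the raw payments file (hypothetical columns).
sample = io.StringIO(
    "nature_of_payment,amount_usd\n"
    "Royalty or License,1000.50\n"
    "Food and Beverage,60.00\n"
    "Royalty or License,499.50\n"
)

# csv.DictReader yields one row at a time, so memory use stays flat.
totals = total_by_category(csv.DictReader(sample), "nature_of_payment", "amount_usd")
print(totals)
```

With a real file, the `io.StringIO` sample would be replaced by `open(path, newline="")`, and the same loop could group by state or manufacturer by changing the category column.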

With the Sunshine Act being first introduced in 2007, then shot down, then enacted as part of the ACA in 2010, and with the Centers for Medicare and Medicaid Services (CMS) now responsible for collecting this data on top of everything else it does, hopefully we find some useful applications for the Open Payments dataset.

This analysis and post were done pretty quickly. Many thanks to Carol for giving me some immediate ideas and feedback! And to IPython Notebook, and the pandas and plotly libraries.