What I learned from my side project in education technology: Formata

Screenshots of what the student would see, taken from the deck I sent to teachers.

Last winter, I built an MVP for an ed-tech product called Formata. Here’s what it was, why I did it, and what I learned from it.

Why Education

I had been (and still am) trying little side projects in different industries because I like learning about and understanding new things. At the time, I had done projects in productivity and fintech, and I knew I wanted to have an impact on education at some point in my life. Education has been so influential on me, and it’s a huge lever for getting us closer to what I call “opportunity equality” worldwide, so I decided to do a small project in the space this time.

Principles of Educational Impact

I did a little thought experiment: I imagined myself as a middle school kid again and thought about what influenced me the most in my education. “My teachers” was the answer. Students spend the majority of their weekday in school, and it’s the teachers who interact with them and understand each and every child. I saw it firsthand on a farm on the other side of the world: far more than the facilities or the curriculum, it’s the teacher who inspires the student and really has an impact on him or her.

Next, I asked, “OK, so if teachers have the most impact on a child’s education, what makes a good teacher? What does ‘good’ even mean? And how do you measure it?” I did some research and came across the Gates Foundation’s Measures of Effective Teaching project, a project backed by hundreds of millions of dollars and pursuing these exact questions. Awesome!

Some more research led me to the interesting and sometimes controversial world of teacher evaluation. Traditionally, teachers have been evaluated by two methods: student test scores (also known as “value added”) and observations by someone like the principal. The thinking is basically that student test scores, as the outcome of a teacher’s teaching, should correlate with his or her teaching ability. Sometimes the administration has a rubric for what they think makes a teacher good, so a few times a year the principal might sit in on a class for 15 or so minutes to observe and evaluate the teacher.

There are some fundamental issues with both methods, which I’ll mention briefly. It’s hard to believe that a principal observing each teacher for 15 minutes a few times a year bears any strong relationship to how good the teacher actually is. The Gates Foundation has done research showing that teacher observations are less reliable than test scores; however, the tests on which teachers are usually evaluated (typically state-wide standardized ones) only happen once a year, and if teachers know the results are tied to their employment, there’s a strong incentive to “teach to the test”.

Who interacts with teachers the most? Who would be best at evaluating them? The students themselves. Again, the Gates Foundation did a bunch of research on what exactly students should evaluate teachers on, sort of quantifying the aspects of a good teacher. They narrowed the most important characteristics down to what they called the “7 C’s”: caring, control, captivate, clarify, confer, consolidate, and challenge. Structured in the right way (e.g. low-stakes and anonymized, so the students aren’t incentivized to fudge), student perception questionnaires that asked about these characteristics were pretty reliable in discerning high performing teachers from the rest.

Building A Product

I noticed that in the Gates Foundation’s research, the student perception surveys were being administered with pen, paper, envelopes, stickers, etc. I felt like the surveys could be administered much more efficiently with technology; the results could also be tabulated and organized much better for teachers and administrators to learn from.

To further validate my idea, I went to a bunch of ed-tech meet-ups, talking to teachers and asking them what they thought about my idea. They all agreed that having more feedback, more frequently, on their teaching would be helpful.

I thought this would be a pretty quick MVP to build; I could even do some of the analysis of the feedback for the teachers manually myself at first. All a teacher would have to do was give me the email addresses of his or her students, and I could auto-generate emails and questionnaires, send them off, and aggregate the results.

Visualizations of student feedback I could generate for teachers, so they could pinpoint where to improve

Moving On

After a month of reaching out to teachers (both those I already knew and those I didn’t) and sending them my slide deck about Formata and its benefits, I finally got a few who said they were willing to try it. They were extremely busy, though (all teachers are overworked), and had to get permission from their department heads, who in turn had to get permission from the principal, to use it. Their efforts fizzled out, I re-evaluated my own time, and I moved on.

What I Learned

I learned about a lot of different things, but overall, I think this project reinforced two principles for me:

  • Ask better questions when doing customer development, and solve a problem.
    • My idea never really solved an important problem for my target audience, teachers. I should’ve talked to more administrators, who may care more about teacher evaluation. Also, you’re bound to get positive but not very useful answers when you ask someone what they think of your idea; whether it solves a big enough problem for them to actually integrate your product into their life is a different story. Not solving an important enough problem for teachers, coupled with lots of bureaucracy and the fact that they’re overworked, was not a recipe for excited users.
  • Keep doing things, don’t worry about failure.
    • I got to learn about an important and fascinating area of education by doing this project. I also got to learn about the realities of the space. I learned more about the power of customer development: that through observation and/or asking better questions, you can get to true pain points that people will pay you to solve. I learned that some types of problems and tasks excite me more than others. This project was also a great way for me to practice first principles thinking.

Thanks for reading this journal of sorts.

Cancer clinical trials and the problem of low patient accrual

Inspired by this contest to come up with ideas to increase low patient accrual in cancer clinical trials, I decided to look more into the data. Bold, by the way, is one of my all-time favorite books; it was co-authored by Peter Diamandis, creator of the herox.com website, founder of the XPRIZE Foundation, and co-founder of Planetary Resources. Truly someone to look up to.

Anyway, the premise of the contest is that over 20% of cancer clinical trials don’t complete, so the time and effort spent on them is wasted. The most common reason for this termination is the clinical trial not being able to recruit enough patients. Just how common is the low-accrual reason, though? Are there obvious characteristics of clinical trials that can help us better predict which ones will complete successfully, and what does that suggest about building better clinical trial protocols? I saw this as an opportunity to explore an interesting topic while playing around with the trove of data at clinicaltrials.gov and various Python data analysis libraries: seaborn for graphing, scikit-learn for machine learning, and the trusty pandas for data wrangling.

Basic data characteristics

I pulled the trials for a handful of the cancers with the most clinical trials (completed, terminated, and in progress), got around 27,000 trials, and observed the following:

  • close to 60% of the studies are based in the US*
*where a clinical trial is “based” can mean where the principal investigator (the researcher who’s running the clinical trial) is based. clinicaltrials.gov doesn’t give the country of the principal investigator’s institution, so as a proxy, I used the country with the largest number of hospitals at which the study could recruit patients.
  • almost 25% of all US based trials ever (finished and in progress) are still recruiting patients


  • of those trials that are finished and have results, close to 20% terminated early, and 80% completed successfully (which matches the numbers the contest cited)


  • almost 50% of all US based trials are in Phase II, almost 25% are in Phase I


  • and interestingly, the termination rate does not differ very significantly across studies in different phases


Termination reasons

Next, I was interested in finding out just how common insufficient patient accrual was as a trial termination reason vs. other reasons. This was a little tricky, as clinicaltrials.gov gives principal investigators a free-form text field to enter their termination reason. So “insufficient patient accrual” could be described as “Study closed by PI due to lower than expected accrual” or “The study was stopped due to lack of enrollment”. So I used k-means clustering (after term frequency-inverse document frequency feature extraction) of the termination reasons to find groups of reasons that meant similar things, and then manually de-duped the groups (e.g. combining the “lack of enrollment” and “low accrual” groups into the same group because they meant the same thing).
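In code, that clustering step might look something like the following minimal scikit-learn sketch. The example reasons and the choice of k=3 are purely illustrative, not the actual data or settings used in the analysis:

```python
# Vectorize free-form termination reasons with tf-idf, then group
# similar reasons together with k-means. The manual de-duping of
# clusters happens afterwards, by inspecting each cluster's members.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def cluster_reasons(reasons, k=3, seed=0):
    """Return a cluster label for each free-form termination reason."""
    tfidf = TfidfVectorizer(stop_words="english")
    X = tfidf.fit_transform(reasons)
    km = KMeans(n_clusters=k, random_state=seed, n_init=10)
    return km.fit_predict(X)

reasons = [
    "Study closed by PI due to lower than expected accrual",
    "The study was stopped due to lack of enrollment",
    "Terminated because of insufficient patient accrual",
    "Funding for the trial was withdrawn",
    "Sponsor withdrew funding support",
    "Principal investigator left the institution",
]
labels = cluster_reasons(reasons, k=3)
```

On real data there are thousands of reasons, so in practice you'd also inspect the top tf-idf terms per cluster to decide which clusters to merge.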

I found that about 52% of terminated clinical trials end because of insufficient patient accrual. Since roughly 20% of finished trials terminate early, this implies that about 10% of all clinical trials that end (either successfully or through early termination) do so because they can’t recruit enough patients.


Predicting clinical trial termination?

Clinicaltrials.gov provides a bunch of information on each clinical trial–trial description, recruitment locations, eligibility criteria, phase, and sponsor type (industry, institutional, other), to name a few–which raises the question: can this information be used to predict whether a trial will terminate early, specifically because of low patient accrual? Are there visible aspects of a clinical trial that are related to a higher or lower probability that it fails to recruit enough patients? One might think that the complexity of the trial eligibility criteria and the number of hospitals from which the trial can recruit could be related to patient accrual.

Here was my attempt to answer this question analytically: fitting a multiclass logistic regression classifier–predicting whether a trial would be “completed”, “terminated because of insufficient accrual”, or “terminated for other reasons”–on a random partition of the clinical trial data, and measuring its accuracy at classifying out-of-sample clinical trials. The predictors were of two types: characteristic (e.g. phase, number of locations, sponsor type) and “textual”, i.e. features extracted from text-based data like the study’s description and eligibility criteria. Some of these features came from a tf-idf vectorization process similar to the one described in the k-means section above; others were simply the character lengths of these text blocks. Below is a plot showing the relationship between two of these features, the length of the eligibility criteria text and the length of the study’s title: two metrics that perhaps get at the complexity of a clinical trial.
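As a rough sketch of that setup, here is the same train/evaluate/compare-to-baseline loop with synthetic data standing in for the real clinicaltrials.gov features (and the tf-idf text features omitted for brevity). All the numbers below are made up for illustration:

```python
# Toy version of the classifier setup: "characteristic" features
# (phase, number of sites) plus text-length features, a multiclass
# logistic regression, and a majority-class baseline for comparison.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Hypothetical predictors: phase (1-3), number of recruiting sites,
# and character lengths of the eligibility criteria and the title.
X = np.column_stack([
    rng.integers(1, 4, n),        # phase
    rng.integers(1, 50, n),       # number of sites
    rng.integers(200, 5000, n),   # len(eligibility criteria text)
    rng.integers(20, 200, n),     # len(title)
])
# Classes: 0 = completed, 1 = terminated (low accrual), 2 = terminated (other)
y = rng.choice([0, 1, 2], size=n, p=[0.8, 0.1, 0.1])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

accuracy = model.score(X_te, y_te)
# The baseline to beat: always predict the majority class ("completed").
baseline = np.mean(y_te == np.bincount(y_tr).argmax())
```

The comparison against the majority-class baseline is the key step: a model's raw accuracy means little until it's measured against the "just say completed" strategy.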


The result: the logit model could predict whether trials would complete successfully, terminate because of low accrual, or terminate for other reasons only 83.6% of the time. That’s a pretty small improvement over predicting “this trial will complete successfully” for every trial you come across, in which case you would be correct 80.6% of the time (see the Completed vs. Terminated pie chart above). Cancer clinical trials are very diverse, so it makes sense that there don’t seem to be any apparent one-size-fits-all solutions to improving patient accrual.


Free flashcards for the fantastic book on Customer Development, The Mom Test

I made this set of flashcards for the excellent book on Customer Development, The Mom Test, since the process and questions in the book are quite important and worth memorizing. Learn what the pain points are by talking to customers first, before building. Feedback on the flashcards welcome! Enjoy 🙂 http://quizlet.com/_r26z6

How our talented team won $2500 at the TechCrunch Disrupt NYC Hackathon


We had an absolutely amazing and talented team at the TechCrunch Disrupt NYC 2014 Hackathon! Shout outs to our awesome front end designers Amanda Gobaud and Michelle Lee, and our tireless devs, Amine Tourki, Andrew Furman, and Teddy Ku. Here are the lessons that I learned from building a web application that won the $2500 Concur Technologies API first place prize.

  • Our app, CorpSquare (Concur + Foursquare), solved a problem. Several of the team members (me included) had used Concur at the companies we worked for, so we had first-hand experience with the problems and practical use cases an app designed around the Concur API could address. Even the Concur VP of Platform Marketing told us afterwards that he had seen many people with the problem we were trying to solve.
  • But, we also played the game strategically. Concur is a business expense tracking platform; most of their clients are big businesses. We felt that a business expense API wouldn’t seem as “exciting” or “sexy” as some of the other consumer-facing start-up APIs (Evernote, Weather Underground, to name a few). Since the different companies who sponsored the hackathon had API specific rewards for teams that used their API in the coolest way, this implied that there might be less competition for the Concur API reward. We made a “value” bet of sorts, as value investors would say–the strategy seems to have paid off.
  • Our team’s skills were complementary, but not too much so. A good hackathon team probably needs both design and dev skills, and different people should specialize in one or the other to make things most efficient. But everyone should be well versed enough in non-specialty skills (designers in dev, devs in design) to be able to communicate efficiently. For example, our designers were comfortable with both UI/UX design as well as front-end development like CSS. Several of our developers were full-stack, implementing the back end but also helping out with the front end. We used technologies (frameworks, languages) that we were all comfortable with, which, perhaps by coincidence, was also an advantage.
  • Presentation matters, a lot. Our two wonderful front end designers spearheaded the movement to make our web application beautiful. With the help of everyone, beautiful it was. For the actual 60 second demo, we also selected the most energetic and enthusiastic speakers to present. First impressions matter, but when you’re being explicitly judged in comparison to at least 250 other people, and 60 seconds of talking and app visuals is all you’ve got, first impressions really matter.

Hindsight is 20/20, of course. Causally linking our tactics and strategies to our success is fuzzy at best. But learning never stops; whatever happens, success or failure, there is always something to take away and improve yourself, and others, with.

Spreed – the exciting journey so far, and lessons learned


Spreed, the speed reading Chrome extension I developed last year to scratch my own itch, recently took off in popularity. People wrote about it in a few different places, and our installs in Chrome went up dramatically. The journey has just begun, but I’ve already learned some lessons that I wanted to share.

Lessons learned

  • Piggybacking on buzz can be an effective technique to increase awareness
    • We piggybacked (not deliberately) on the buzz created by the launch of Spritz, the speed reading startup. People wanted to learn more about speed reading, and came across our Chrome extension when they searched for it. We could have done better if we had optimized our web presence for the keyword “Spritz” after the launch, but my excitement at going from 2k installs to 20k installs in less than 5 days blinded me. Which leads me to my next lesson…
  • Be aware of emotions, instead of letting them take control
    • My excitement at our growth caused me to naively focus on vanity metrics like installs and visits, which blinded me to the SEO opportunity mentioned above.
    • Another example: I recently almost made a grossly sub-optimal decision regarding the outsourcing of development. Again, I let excitement and optimism tempt me to “forget” to use a disciplined decision making approach. The particular one I like to use is called the WRAP technique (pdf), which I learned from the fantastic book Decisive, by the Heath brothers.
  • To quote Steve Jobs: “A lot of times, people don’t know what they want until you show it to them”
    • We’ve not only developed the features that our users have said would be most helpful to them, we’ve also developed (and are developing) game-changing features that we anticipate users will find immensely helpful. We test our hypotheses by collecting feedback from users and doing small tests/experiments. The lesson here is, I think, applicable to all of life, not just product development: be proactive instead of just reactive.

What has been most exciting has been working with our users to make Spreed the most helpful it can be. Building things that help people, having those people reach out to thank you, and then having conversations with them to make the product even better has been extremely meaningful. Some excerpts from our most enthusiastic and dedicated users:

“Your chrome app is phenomenal. I have been using it for 4 days now, and still find it hard to believe that such a basic app can change one’s life so much.”

“Thank you so much, this has revolutionized my life.”

“I am a dyslexic and I have always had difficulty reading with full comprehension.  I can’t believe how this has changed this for me.  I can read at 350 words with great comprehension.  What happens for dyslexics is the words flow together sometimes forming new words that aren’t there.  With this app I see only the word!  It is going to be a life changer for me.”

There’s still a lot more to do, but I’m looking forward to the future. Learn by doing and building, strive to help others, and the journey will be an exciting one.

Shout out to Ryan Ma for the beautiful redesign of the Spreed Chrome extension!


Weekend hack: AngelList Alumni Bot


Ok, it’s more of a scraper than a “bot”. I developed it because I was looking through NYC startups on AngelList and wanted to find founders who had graduated from my alma mater, the University of Pennsylvania. I didn’t want to click through the AngelList startup pages one by one and then click on every founder. There was no easy way of doing what I wanted, and I also wanted to get to know the AngelList API a little better.

The AngelList Alumni Bot basically gets all startups for an input city (e.g. NYC), grabs each founder’s name, and checks AngelList or LinkedIn to see if the founder is a graduate of an input school (e.g. the University of Pennsylvania).
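The overall flow is simple enough to sketch. In the snippet below, `get_startups`, `get_founders`, and `get_school` are hypothetical stand-ins for the real AngelList/LinkedIn lookups, which aren't shown here:

```python
# Sketch of the bot's pipeline: startups by city -> founders -> filter
# by school. The three lookup callables are hypothetical placeholders
# for the actual AngelList API / LinkedIn scraping code.
def alumni_founders(city, school, get_startups, get_founders, get_school):
    """Yield founders in `city` who graduated from `school`."""
    for startup in get_startups(city):
        for founder in get_founders(startup):
            if get_school(founder) == school:
                yield founder
```

Structuring it this way (pure pipeline, lookups injected) also makes the slow scraping parts easy to swap out or cache independently.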

There are a lot of areas for improvement: it’s not a web app, it’s really slow, it currently supports only two cities/locations (NYC and SV) and one school (UPenn), and it grabs only one founder per start-up, in a very hacky way, by exploiting AngelList page meta tags. You can make contributions to the source code at https://github.com/troyshu/angellistalumnibot.

Everything was done in Python. I used and extended this AngelList API Python wrapper: my extended version is at https://github.com/troyshu/AngelList.

Blast from the past: my first web app built using a framework


www.wtfconverter.appspot.com converts between common units of measurement (e.g. liters, seconds, etc) and silly units (e.g. butts, barns, etc.).

It was the first web application that I had developed using a web framework, in this case the webapp2 framework, on Google App Engine. This was two and a half years ago. Before that, I had developed everything from scratch, using PHP and MySQL for the backend.

This introduction to web frameworks intrigued me, and is what jump-started my journey into Ruby on Rails. Pushing local code to the Google App Engine production server and just having the site work blew my mind. Templating (the GAE tutorial taught how to use jinja2) was like magic, creating and managing dynamic content was so much easier.
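For anyone who hasn't seen templating before, the jinja2 “magic” boils down to substituting dynamic values into a template string at render time. A minimal example (unrelated to the actual tutorial app's templates):

```python
# Minimal jinja2 templating example: the {{ name }} placeholder in the
# template is filled in with a value when render() is called.
from jinja2 import Template

template = Template("<h1>Hello, {{ name }}!</h1>")
html = template.render(name="App Engine")
# html is now "<h1>Hello, App Engine!</h1>"
```

Compared to concatenating HTML strings by hand in PHP, keeping the markup in a template and injecting only the dynamic bits is what made managing dynamic content so much easier.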

I started out by following the GAE Python tutorial word for word, which walked the user through actually building a site. Then I developed my own little webapp that was a little more useful but not much more complicated than what I had learned in the tutorial. This is exactly how I learned Ruby on Rails too: I walked through the Rails tutorial, building a microblogging app along with the author, and then built my own web app, Pomos, a Pomodoro Technique timer, using what I had learned. Pomos has since been deprecated, but here’s a screenshot:



Anyway, I learned a lot from following these tutorials, actually developing something concrete, and then branching off to do my own thing. This is the heart of experiential learning, and what Sal Khan, founder of Khan Academy, talks about in his book One World Schoolhouse: when the student has ownership of his education by actually applying it, e.g. by building something, he is much more likely to enjoy learning new knowledge and skills. But reforming the current state of education is a topic for another post.

My holiday break project: quarterly earnings snapshot webapp

Link: www.qesnapshot.herokuapp.com


Problem: I just wanted to find out how much a company’s current quarterly earnings grew compared to their earnings for the same quarter last year (called year-over-year). I wanted this info for companies that just released their quarterly earnings recently (e.g. within the past few days), so I could generate new investment ideas. “I see AEHR released earnings recently, on December 23. How did its 2013 Q4 earnings grow from 2012 Q4 earnings?”

There’s no easy way to do this. The current options are:

  1. go to an earnings calendar site like Bloomberg’s, look up the symbol, find quarterly earnings on a site like Morningstar, and calculate the growth % yourself, OR manually find and sift through press releases to find the earnings growth %
  2. pay a ton for data that gives you these numbers, through an API or a web interface that isn’t at all user-friendly

Solution: Quarterly Earnings Snapshot is a webapp that scrapes an earnings calendar and then scrapes SEC EDGAR filings for companies’ recently released, and historical, quarterly earnings numbers. It displays earnings per share (EPS) and year-over-year (same quarter) EPS growth in an easy to read format so I can get the relevant numbers I need at a glance.
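The year-over-year growth figure the app displays reduces to simple arithmetic once the two quarterly EPS numbers have been scraped. Something like (the site itself is Rails, but the calculation is the same in any language):

```python
# Year-over-year EPS growth: this quarter's EPS vs. the same quarter
# a year earlier, expressed as a percentage.
def yoy_eps_growth(current_eps, prior_year_eps):
    """Percent growth of current EPS over the prior-year quarter's EPS."""
    if prior_year_eps == 0:
        raise ValueError("prior-year EPS of zero makes growth undefined")
    return (current_eps - prior_year_eps) / abs(prior_year_eps) * 100

# e.g. EPS rising from $0.10 to $0.35 is roughly 250% YoY growth
growth = yoy_eps_growth(0.35, 0.10)
```

Dividing by the absolute value of the prior-year EPS keeps the sign of the growth sensible when the company was losing money a year ago.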

After being in development for only a couple of days, the webapp has already helped me generate new stock investing ideas quickly. For example, a few days ago, I checked the site and saw that KBH (KB Home) had released earnings a week or two ago on December 19, and that earnings per share had grown a whopping 671% (see the screenshot below).


This prompted me to do more research on KBH, as well as its competitors in the home-building industry, an industry that seems to be rebounding from a bottom. Some homebuilder stocks have already risen a lot; others are still undervalued and so present potential investment opportunities.

Feedback and comments are always welcome! I know there are many different features I could add and many different directions I could take this. My short-term goal over the holidays was just to build something simple in both design and usage, and to share it.

Thanks for reading. Happy holidays and happy new year!

PS: the site is Ruby on Rails + Heroku. I’m extremely grateful for them; rapidly prototyping webapps for free/cheap wouldn’t be possible otherwise.

adaptivwealth: the new web app that I made to bring adaptive asset allocation to the masses

adaptivwealth: www.adaptivwealth.herokuapp.com


I recently finished the beta version of a web app I’ve been building, a web app that brings adaptive asset allocation to the masses.

What is adaptive asset allocation?

I’ve written about it in several previous posts. Essentially, it’s the idea that traditional Markowitz mean-variance asset allocation can be improved–generating portfolios that have better risk-adjusted performance–by making the models more adaptive to market changes.

What’s the point of the web app?

adaptivwealth’s goal is to make models that try to improve upon the weaknesses of traditional asset allocation more accessible to individual investors.

Asset allocation–allocating one’s money to different asset classes such as equities, bonds, and commodities–often produces more diversified portfolios than, for example, just picking stocks. Portfolios constructed using asset allocation can have decreased risk and increased returns (see the above screenshot of the performance of the Minimum Variance Portfolio vs. the performance of the S&P 500 for an example). A portfolio’s holdings can be optimized so that return is maximized for a given level of risk. Asset allocation is powerful: the famous Brinson, Hood, and Beebower study showed that asset allocation is responsible for 91.5% of the variability in pension funds’ returns. Not stock selection, not market timing.
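For the curious, the Minimum Variance Portfolio mentioned above has a closed-form solution given a covariance matrix of asset returns: the fully-invested weights are w = Σ⁻¹1 / (1ᵀΣ⁻¹1). A minimal sketch, unconstrained (i.e. ignoring real-world constraints like no short selling) and with a made-up covariance matrix, not the app's actual model:

```python
# Unconstrained minimum variance portfolio from a covariance matrix:
# w = inv(cov) @ 1, normalized so the weights sum to 1.
import numpy as np

def min_variance_weights(cov):
    """Weights of the unconstrained, fully-invested min-variance portfolio."""
    inv = np.linalg.inv(cov)
    ones = np.ones(len(cov))
    w = inv @ ones
    return w / (ones @ w)

# Toy covariance for three asset classes (equities, bonds, commodities)
cov = np.array([
    [0.04,  0.002, 0.01],
    [0.002, 0.01,  0.001],
    [0.01,  0.001, 0.09],
])
w = min_variance_weights(cov)  # weights sum to 1
```

By construction, no other fully-invested weight vector produces a lower portfolio variance than this one, which is easy to sanity-check against, say, equal weights.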

Asset allocation is traditionally not very accessible to individual investors. Individual investors have data, computation, knowledge, and/or time constraints that prevent them from running asset allocation algorithms to optimize their portfolios; asset allocation is usually performed for them by financial advisers, while large institutions like pension funds and hedge funds obviously have the resources to do it themselves. Companies like https://www.wealthfront.com/ are closing this gap, taking out the middleman (financial advisers) and lowering the costs of implementing asset allocation for the individual investor.

Companies like Wealthfront implement traditional asset allocation algorithms. adaptivwealth differentiates itself by using models that try to improve on the weaknesses of traditional asset allocation, one approach being to make the models more adaptive to market changes.

A call for help

adaptivwealth is still very rough around the edges, and I have a whole list of features that I want to implement, ideas for growth, etc. But I wanted to get a minimum viable product out there and collect feedback as quickly as possible. Let me know your thoughts! Questions, suggestions for features, advice, criticisms, anything and everything helps. Thank you.

Naive Bayes classification and Livingsocial deals

Naive Bayes

Problem: I was planning my trip to Florida and looking for fun things (“adventure” activities like jet ski rentals, kayaking, and go karting) to do in Orlando and Miami. I like saving money, so I subscribed to Groupon, Livingsocial, and Google Offers for those cities. Those sites then promptly flooded my inbox with deals for gym membership, in-ear headphones, and anti-cellulite treatment. Not useful. Going to each site and specifying my deal preferences took a while. Plus, if I found a deal that I liked, I had to copy-paste the link to that deal in another document so that I had it for future reference (in case I wanted to buy it later). Too many steps, too much hassle, unhappy email inbox.

Solution: So I wanted to build a site that scraped the fun/adventure deals from these deal sites automatically. Example use case: if a person plans to visit a new city (e.g. Los Angeles), he or she could just visit the site and see at a glance a list of the currently active adventure deals (e.g. scuba diving) in that city. Sure, it seems that aggregator sites like Yipit solve this, but almost all of them require users to give up their email address before showing any deals (and most are also difficult to navigate). More unnecessary steps for the user. Plus, I found that the Yipit deals weren’t the same as the ones displayed on the actual Groupon/Livingsocial/Google Offers sites.

“pre” minimum viable product: I gathered feedback for my idea to see if other people besides me would actually use it. This time, I just made a few quick posts on reddit (in the city subreddits), and got many comments. People said they would use it. Next.

MVP: The site I built scrapes Livingsocial. Groupon generates its pages dynamically with AJAX, which can’t be scraped without a JavaScript engine (a big pain to set up), and Google Offers didn’t have very many quality deals, so I thought I’d simplify by making the MVP for Livingsocial only, for now.

Applying the Naive Bayes classifier

After scraping all the deals, they need to be classified as “adventure” or not. Obviously, doing this by hand is not scalable if I want to scrape deals for more than a couple of cities, so I implemented a Naive Bayes classifier. Naive Bayes is often used in authorship attribution, e.g. determining whether Madison or Hamilton wrote certain disputed essays in the Federalist Papers.

At a high level, Naive Bayes treats each “document” or block of text as a “bag of words”, meaning that it doesn’t care about the order of the words. When given a new “document” to classify, Naive Bayes asks and answers the question, “given each classification/category, what is the probability that this new document belongs to that classification/category?” The category with the highest probability is then the category that Naive Bayes has “predicted” the new document should belong to.

The site currently uses the deal “headline” (e.g. “Five Women’s Fitness Classes” or “Chimney Flue Sweep”) as the document text that Naive Bayes uses. I also tried using the actual deal description (i.e. the paragraph or two of text that Livingsocial writes to describe the deal), and from eyeballing the predictions, it looked like both gave similar prediction accuracy. Using the deal headline is a lot faster though.

Prediction accuracy is still pretty bad. I didn’t want Naive Bayes to automatically assign its predicted categories to the deals, so I decided to keep categorizing the deals manually, but with the help of Naive Bayes’s recommendations. I also decided to make its binary classification decisions more “fuzzy”. Here’s a screenshot of the admin page that tells me the predicted deal type of the scraped deals, with a column called “prediction confidence”, which is a score derived from the Naive Bayes output that signifies how strong its prediction is.
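A minimal sketch of this headline-classification-with-confidence idea, using scikit-learn (the site itself is Ruby, and the training headlines and labels below are made up; the real implementation and training data differ):

```python
# Bag-of-words Naive Bayes over deal headlines, with "prediction
# confidence" taken as the posterior probability of the winning class.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_headlines = [
    "Two-Hour Jet Ski Rental for Two", "Guided Kayaking Tour",
    "Go-Kart Racing Session", "Skydiving Jump with Video",
    "Five Women's Fitness Classes", "Chimney Flue Sweep",
    "Anti-Cellulite Treatment Package", "In-Ear Headphones",
]
labels = ["adventure"] * 4 + ["other"] * 4

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_headlines, labels)

headline = "Jet Ski and Kayak Rental Deal"
predicted = clf.predict([headline])[0]
confidence = clf.predict_proba([headline]).max()  # posterior of the winner
```

Surfacing the posterior probability as a confidence column, rather than a hard yes/no, is exactly what makes the classifier useful as a recommendation aid for manual categorization instead of a fully automatic (and error-prone) labeler.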


No better way to learn than to do

Doing is the best way to learn, because working on your own projects forces you to engage in deliberate practice (Cal Newport’s key to living a remarkable life). Not only do you practice your skills, but you also learn about learning: when faced with an obstacle while working on a personally initiated project, you have only your own resourcefulness, with no boss telling you what to do or professor giving guidelines. For example, this time I encountered the issue of my requests timing out in production on Heroku, since Heroku has a max request time of 30 seconds and some of my requests were taking up to a few minutes (back when my Naive Bayes implementation was inefficient). I googled my problem, found a Stack Overflow post, and learned about worker queues and the Ruby library delayed_job, which fixed my problem by allowing time-intensive requests to run in the background.

The site is at https://adrenalinejunkie.herokuapp.com/