I felt battered around by the seas. I never quite felt like my full self.
Right after college, I went into finance, then moved into software engineering, then data science. And in between, I experimented with freelancing, worked on startup ideas with friends, and even considered going into product management.
By day, I was exploring and steering my career through the waters, trying to find myself in the paths that others presented to me. And I struggled. I got rejected. A lot.
By this point, I had gotten past the gatekeepers and been hired for a cushy tech company job. But by night, I persisted in following a stream of hope, one that, looking back, I had started following in middle school: crafting software as an independent creator to solve problems for myself and other people. I had no idea where this passion would take me, but the joy of using my unique combination of strengths to solve problems, without having to seek anyone else’s permission, propelled me forward.
Then, the COVID pandemic hit. The seismic shifts in the world made me and my wife question some assumptions about our careers. Our work was entirely digital: why did we have to go into an office every day? Carol makes brands come alive because she’s really good at both brand strategy and design: why did she have to offer her valuable services to just one company? A while ago, I had made a free productivity Chrome extension to scratch my own itch, and it now had a couple thousand daily active users: why couldn’t I build out a new set of value-add features and offer a paid plan?
We got to work, putting one foot in front of the other on our uncertain journey to becoming full-time independent creators. Within six months, we had some side income flowing in. It wasn’t much: definitely not enough to cover our expenses in New York City, but enough to cover them if we traveled abroad. Perfect, since we had always wanted to travel the world together. The results of these small experiments gave us enough confidence to take the plunge. So we quit our jobs, and now we’re roaming the world as digital nomads.
I decided to focus on building a meditation app as my main project. I came up with the idea because I saw others around me struggling to stick to meditation, or not knowing how to make progress with it, things I had struggled with myself. We all intuitively know that the largest benefits from any habit come if we stick to it and get better at it. I’ve been meditating for a decade now, and it has made a life-changing difference in my peace, confidence, and openness. The goal of my app is to help others experience the lasting impacts a meditation practice can bring by making it easy for anyone to stick to and deepen a habit of meditation.
We still have down days of doubt and imposter syndrome, and boy is it an emotional roller coaster. We’re also putting ourselves out there, joining communities to meet new people and learn new things, while also sharing our story. All this felt unnatural to us at first, but we’re getting better at it. Our challenges, both real-world and mental, teach us about ourselves, other people, and our place in the world.
Though we are traveling to unfamiliar places, I finally feel like I’ve come “home” in my career. We’re working a lot, but we wouldn’t trade it for the world. We get to work on what we want and use all our strengths to their maximum potential, all while sitting in our underwear and making a living. During the day, we work. On nights and weekends, we explore.
We’re finally free to be our full selves, with no limits on what we can create and what kind of life we can design together.
Listen for that thing that calls you to be yourself, to live life on your terms. Take the next step. Keep learning. And don’t give up hope.
I’m building my meditation app in public and sharing my founder journey on Twitter. Follow me at twitter.com/troyshu.
I hope everyone has been doing well during these unprecedented times.
I spend a lot of time learning about how to improve my experience in life. As a result, I wanted to introduce a new writing experiment I cooked up to share more of what I’ve learned. The series will be short, somewhat regular (monthly?) updates on life-improving ideas or tactics my wife and I have recently learned about or experienced and are trying out ourselves.
I strongly believe that the right seed of information at the right time can be life-changing, as long as you’re curious, open, and action-oriented. Our hope is that at least one seed in this series will become a new tree and eventually a forest of new possibilities for you.
Here are the ideas we’ve been toying with over the past month:
🖥 Standing desks. Sitting all day is bad for your health. I don’t know about all the longer-term cancer and heart disease stuff, but I do know that it messes up my lower back and posture: I see and feel that firsthand now. In the first few weeks of working from home, I sat so much that I actually felt a kind of lower back pain I hadn’t felt in ages (I have and use a standing desk at the office). Sitting causes tight hip flexors, which pull on your lower back, and since I had injured my lower back a while ago, I definitely felt the pain this time. I decided to get a monitor and set up my own standing desk at home (see photo above).
🌻 The Power of Now. My former coworkers at Squarespace got me this book because I had the Kindle version and had always wanted a physical copy to reference (such a sweet gift!). Carol’s been reading it and has had some moments of insight about how attachment to what the author calls “form” causes suffering in her own life.
🤴 How to Live: A Life of Montaigne in One Question and Twenty Attempts at An Answer. I’d never read Montaigne before. I’m enjoying reading him and find his style of philosophy refreshing, entertaining, and thought-provoking. And what better question to try and answer than “how to live”? Except he never answers that question directly. In contrast to other kinds of philosophy, Montaigne doesn’t tell you how you should live. He just observes and studies life’s experiences through his own stories and adds his own flavor of insight, producing gems like “Even on the most exalted throne in the world we are only sitting on our own bottom.”
🏄‍♂️ Living in Flow: The Science of Synchronicity and How Your Choices Shape Your World. A great book about living more authentically and how intention coupled with action produces meaningful outcomes. I’ve already put some of the tactics in this book into practice (namely “LORRAX”: listen, open, reflect, release, act, repeat) and have seen firsthand how it helps me turn initially challenging situations or emotions into opportunities. The book gets a bit too speculative and “woo woo” at times, which loses me for a while, but I’ve already learned at least one tactic for authentic living that I’ll take with me for the rest of my life.
🧠 Learning How to Learn Coursera course. I’ve heard a lot about this course over the years and finally had some time to take it. Learning is the ultimate meta-skill, so I was looking forward to it: getting better at learning means accelerating your development. It only took me a few hours, since I sped through the videos and skipped some of the less interesting topics. Takeaways I was reminded of include the importance of practicing recall (even just taking a moment after reading something to recall what you learned) and the importance of using both focused-mode and diffuse-mode thinking. Which brings me to…
🌁 More diffuse-mode thinking. Diffuse-mode thinking happens when you let your mind wander and make connections “all over” the brain. Focused-mode thinking, by contrast, occurs when you’re concentrating on a specific idea or problem. Both modes of thinking are required for optimal creativity and learning, but I definitely spend way more time in focused mode. I wrote a little about my “most important question” practice, which engages the subconscious and diffuse-mode thinking to help me solve problems. I’ve also started experimenting with hypnagogic naps, like the ones Thomas Edison and Salvador Dali used to take to harness diffuse-mode thinking for insight. I’ve seen some interesting results so far.
Did you know that when I worked at a hedge fund, I used to meditate at my desk after lunch? People who walked by my desk thought I was taking a nap. At the end of the year, my team gave out bogus awards to various people as a joke. I got the “Sleeping Beauty Award”.
😊 Science of Well-Being Coursera course. I sped through this one. Still, it’s a great review of the well-being practices we all know and love (like keeping a gratitude journal, spending time with people, and spending money on experiences over things) and why they work.
This prompted me to buy some Jackbox Games and now I host a fun virtual game night with friends and family every Friday night. Mafia / Secret Hitler in space, anyone (the game’s called Push the Button)?
🍳 Carol and I have basically been cooking every meal and trying out new recipes. Cooking and learning its principles is one of our creative outlets, and it’s healthier and less expensive. I think we’ll be cooking a lot more meals ourselves and being more self-sufficient after this pandemic is over. Here’s Carol’s favorite easy bread recipe. Also, did you know that growing your own scallions is super easy and only requires water? Look it up.
🧘‍♀️ Carol started meditating with me regularly in the mornings. We use Sam Harris’s Waking Up app, which I’ve happily paid for over the past few years. I know I’ve already talked about this app a lot on this blog, but it’s the best meditation app I’ve tried. Sam’s guided meditations are relaxing and easy to follow while containing nuggets of insight about consciousness and how we experience life. He also has a lot of other thought-provoking content on his app, like guided loving-kindness meditations and a lecture on why he thinks we don’t have free will.
🏋️‍♀️ Peloton app for home workouts. I really miss the gym and all the equipment. I also missed the boat on buying dumbbells, kettlebells, and even sandbags before everything sold out. So I’ve resorted to bodyweight workouts. I had built a sort of ritual around going to the gym, but now that I can’t go, I found myself reluctant to do bodyweight workouts at home. Then a co-worker introduced me to the Peloton app, which is offering a 90-day free trial! Catchy music and a person guiding and yelling at you is definitely motivating (I see why “social fitness” like Crossfit and Orange Theory is so popular). They have a good amount of bodyweight workouts, and while I’m likely losing a lot of strength (oh well), the Peloton bodyweight workouts are decently intense and only require a yoga mat.
🤯 The High Existence podcast is one of my favorite podcasts because it’s focused on self-improvement but through a more “thoughtful” lens. Some favorite episodes this time:
Learning The Ultimate Meta-Skill and Bending Reality (HEx Dialogues #3). This one’s all about getting better at learning. It covers concepts similar to the Learning How to Learn Coursera course, but one takeaway that sticks with me is to have a balance of “consumption, production, and stillness” time in my life. Learning requires consuming information, but it also requires putting it into practice and producing. Lastly, periods of stillness (not even meditation or napping, but sitting in silence and staring out the window) induce diffuse-mode thinking, which, as we learned, complements focused-mode thinking for better learning and creativity.
On Engineering Your Own Luck and Surfing Serendipity with Eric James (HEx Podcast #31). The interviewee shares some pretty awesome stories about how he manufactured his own luck to meet Elon Musk and Richard Branson, and how he got his photography featured in National Geographic. The takeaways are: set ambitious goals, live authentically, put yourself out there and don’t be afraid of rejection, be open to the potential opportunities that come your way, and then take bold action.
Have you been experimenting with interesting ways to improve your life? Or just have questions or comments? Reach out!
And remember to subscribe to my newsletter if you want to get these updates and other future posts in your inbox.
What things, experiences, and ideas have impacted me the most in 2019
Similar to what I did in 2018, I wanted to record what things, experiences, and ideas had the most positive impact on me in 2019 (and beyond).
I married my best friend and love of my life! People always ask, “has anything changed since you got married?” And my answer is: “Yes!” Things changed subtly. For example: we both started going to the gym more, eating healthier, and putting more emphasis on our health. We’re both pursuing our passions with even more vigor and alignment to our core selves. I describe some of these changes in more detail elsewhere in this post, but basically we both agree that getting married has encouraged us even more to strive to do all that we’re capable of doing, for each other and for our future together.
In March, I got a new job at Lyft as a Data Scientist, which has definitely impacted my life positively. I have great co-workers, I love that I get to focus on product and user-oriented challenges, and I’m learning a ton about myself, how to work with others, and of course the art and science of learning from data. I joined right before the IPO, and I’d be lying if I said that the success of the company itself, and how well it’s run, doesn’t play a huge role in your experience of working there.
What about my side projects? I launched ShiftReader this year! I’m excited to keep improving it. I’m also exploring a different side project, and want to keep it a secret for now, but let’s just say that I’m more excited about it than almost all other similar experiences in the past. It’s also a direct result of some of the “inner-strength” practices I describe below.
I changed my workout routine and diet a bit this year, and as a result I’ve lost fat, gained muscle, and feel more energized on most days. What did I do?
Firstly, I introduced supersets into my workout routine. The idea of supersets is to move quickly from one exercise to another, without taking a break, thus making your workout more intense and shorter. I pair exercises for opposing muscle groups together, so that my muscles don’t get fatigued too quickly between sets. For example, I’ll do a set of bench presses (7 reps) which work my chest and triceps. Then, I immediately do rows (7 reps) which work my back and biceps.
I used to work each muscle group once a week. I’d have a push day for chest, triceps, and shoulders, a pull day for back and biceps, and a leg day. Now, because supersets let me work opposing muscle groups on the same day, I work the “push” and “pull” muscles twice a week. I’ve experienced noticeable strength gains and physique changes after implementing supersets, while also shortening my workout a little.
I also implemented a form of intermittent fasting this year, technically called time-restricted eating, which has metabolic health and longevity benefits. Basically, it means only eating during a certain window of time every day: I only eat during the 8-hour span from noon to 8pm. I skip breakfast in the morning and bring it to work to eat later, so that I don’t decrease my caloric intake too much (my breakfast was Soylent anyway).
Lastly, I started doing light cardio when I can on my weightlifting “off days”. I only do around 15 minutes on the treadmill or stationary bicycle, making sure to keep my heart rate in zone 2 to burn fat while doing light enough exercise for recovery. Just getting my heart rate up and working up a sweat re-energizes me for the rest of the day.
I call this theme “inner-strength” because I’m referring to the power of your mind (conscious and subconscious), your spirit, and your energy. While I’m a big believer in having balance in life, I think that living a good life starts with inner-strength. Your thoughts manifest themselves as behaviors, which then change your reality.
Here are the experiences that have and continue to have an outsized impact on my inner-strength:
Meditation continues to benefit my life. I still try to do it every day (in the morning). This year, a “newer” benefit of meditation emerged more prominently: being able to recognize potential opportunities better. See point 4 here. I’ve used Sam Harris’s Waking Up app all year.
I also journal more regularly now. I set a goal to journal every morning, and I write in a balanced, structured and unstructured way. For part of my daily journal, I just write what’s on my mind, which is incredibly therapeutic. For the rest, I 1) brainstorm on the “Most Important Question” and 2) write down three things I’m grateful for.
I learned about the “Most Important Question” practice from an interview Josh Waitzkin did with Tim Ferriss (show notes). For the “Most Important Question” practice, you ask yourself a question about where in life you feel stuck, preferably before bed. The next morning, you brainstorm around this question in a journal. By doing so, you train yourself to focus on the “most important questions” throughout your life. At every moment, there is always the One Thing that you can do such that by doing it, everything else will be easier or unnecessary.
This “Most Important Question” practice also helps open the channel between your conscious and subconscious mind. Asking yourself the Most Important Question before you sleep puts your subconscious mind to work on that question.
The Habit app (iOS, maybe Android) has noticeably helped me stick to and develop habits that I’m less consistent at. For me, there’s something about seeing my progress over time as a line that goes up and to the right. I only have a few key habits on it though, including journaling and meditation. Developing too many habits at once overwhelms me and I fall off the bandwagon for all of them.
Tim Ferriss always mentions that he re-reads The Magic of Thinking Big when he feels doubt and fear creeping in. So I re-listened to it on Audible. Afterwards, I immediately felt more confident and optimistic. I noticed it in how I carried myself and interacted with people that day. I realized that I had been living in a “background haze” of doubt and negativity for a while. Listening to The Magic of Thinking Big again brought me out of it.
Lastly, I spent several days journaling and chatting with my wife and brother to discover myself more. I felt like I had lost sight of my true self a little, and that I needed to get closer to it again. I kid you not, I Googled “how to find yourself again” and followed some of the prompts that a WikiHow article suggested. Some particularly helpful ones included “Distinguish your thoughts from the thoughts of others”. Another one: “If I had all the resources in the world — if I didn’t need to make money — what would I be doing with my life and why?”. That got the juices flowing.
Those are the things that changed my life the most in 2019.
What were yours? And what does becoming the best version of yourself look like in 2020?
I love carving out R&R time. It is time for “reflection and re-alignment” (in addition to rest and relaxation), and it always leaves me feeling refreshed and re-energized.
I try to reflect and re-align at the end of every year. But instead of doing the traditional “New Year’s Resolutions” (we all know how well those work), I’ve improved the process for better results.
I go through a process where reflection, not just resolution, is the core. Because to create our future, we must learn from the past. I also ask a few questions that serve the following purposes:
Create pride and gratitude, emotions shown to be associated with more perseverance.
Focus on what matters most, e.g. using the 80/20 principle.
Promote outside-the-box thinking to break out of normal thought patterns.
Major thanks to Tim Ferriss’s tips, and to all the research on how we achieve our goals, for inspiring my version of “New Year’s Resolutions”, or should I say “New Year’s Reflections”.
So what is my process?
I run through the reflection and goal-setting questions in a Google Doc template I created. You can copy and modify it for your own use.
Tips for R&R time
I do it at least a few times a year, and want to do it once a quarter, so that I don’t veer too far from the path I want to be on.
It’s best to try to leave your normal environment for at least a few days. Changing your environment changes your thought patterns.
Do nothing but R&R while away (or at least carve out a few days to do nothing but R&R on a longer trip). Save the minute-granularity itinerary planning, rushing from one destination to another, and adrenaline (or cortisol) producing activities for your other “vacations”.
What does one actually do to R&R? Here’s what I do: pen and paper journaling (stream of consciousness, about the past, about my dreams, anything), meditate, read, walk a lot in nature, eat, spend time by myself but also with loved ones.
Do you have any New Year’s rituals that help you start the year off right? Reach out and share them!
As 2018 comes to an end, I wanted to reflect and write down some of the things that have impacted me this year, and into the future. I made these thoughts brief, as I want to be concise and prioritize what had the most impact. Hopefully readers find my thoughts useful in a practical or thought-provoking way. I’m happy to talk more about any of these topics; just reach out or comment!
The following thoughts are roughly categorized, and not in any particular order. Disclaimer: this page does not contain medical advice, every individual’s body and mind is different.
Stretching (before and after every workout) and continual rehab/strengthening (after every workout), along with avoiding certain exercises that naturally aggravate old injuries, have completely eliminated the re-emergence of weight-training-related injuries *knocks on wood*. Stretching and softening muscles is one of Tom Brady’s secrets to his longevity. And LeBron’s too: “play hard, have fun, and stretch”.
Some of my favorite stretches and rehab/warmup exercises this year:
Medicine ball rolling (as an alternative to foam rolling): self-explanatory
Zinc supplements have staved off oncoming colds several times for me this year. This is my go-to immune health supplement, which I “superdose” (i.e. 3-5 tablets a day) when I feel a cold coming.
A cup of coffee (caffeine) works wonders for me. Before this year, I never realized how much energy and alertness it gave me; I started drinking it more often because it’s free and tasty at work. I can only have one cup, though, and early in the morning, or else I stay up all night. I’ve been using it together with L-theanine. I like to save this combo for special situations (also so that I don’t develop caffeine dependence and withdrawal).
Floating has helped me relax and stay centered. It’s also given me some thought provoking experiences. I like Lift in Brooklyn. Sign up for their mailing list, they have deals/coupons a few times every year.
I’ve really enjoyed Sam Harris’s Waking Up App. His meditations and lessons are educational and thought provoking, in addition to being very relaxing of course.
Speaking of Sam, I found his recent podcast with the TV mentalist and hypnotist Derren Brown fascinating; hypnosis can be powerful. I’m exploring self-hypnosis, as well as acupuncture, after hearing of a friend of a friend having allergies “cured” by it. I expect the placebo effect (namely the power of expectation and belief) to play a huge role in why these things “work”. Even if that’s true, though, these practices can still be beneficial.
The mind and body are so connected that all of this might as well be under Health.
I continue to love building digital products that people use. Some of the things I created in 2018:
[in progress] ShiftReader: a better speed reading training tool than what Spreed was. The link is just a landing page with fake pricing (I’m doing price testing), so click “Sign Up” and enter your email if you’re interested in email updates.
[sorta dead] CryptoMint: was previously a paid subscription newsletter for crypto news with automated sentiment analysis on scraped articles, which actually had a good number of subscribers. After deciding I did not want to be in the business of selling “predictions”, esp. in a market like crypto, I turned it into a free crypto newsletter (where the articles are still being scraped) that I only sometimes send out. I have about 430 people on the mailing list.
[dead] CryptoSaver: a web app that automated dollar cost averaging into crypto. I killed it after realizing that users were still terrified of a web app placing crypto buy orders automatically through Coinbase, even though it went through OAuth, each buy order had to be manually approved, and the app wouldn’t have permission to do anything else on the account, like sell or transfer. I didn’t invest much before talking to users about this idea (and I try not to with most of my ideas): I only put up a legit-looking landing page and did some light Python work to understand how the Coinbase API worked.
I’m really happy to have found “solo entrepreneurship” communities this year, like the Indie Hackers community and Microconf, and specific people in that community I can talk to, like Christian
I’ve been working at Squarespace as a Data Scientist for a little over a year and a half now, working closely to support Product. The thoughts below are primarily about that kind of Data Science, vs. machine learning engineering type roles, or Data Scientists that support other stakeholders like Marketing or Sales. I’ve gotten a good chance to learn and think about:
How Data Scientists and PMs should work together: more of a partnership and less of a conduit for data access. Like any good relationship, it takes time and effort to develop that partnership.
Event data standardization, event tracking “grammars” that are intuitive and self-documenting, and the importance of data governance in a truly data-driven organization. And by data-driven orgs I mean orgs that use data (and Data Science) in a meaningful way to drive product-level and even company strategy-level decisions, not an org that only looks at whether metrics are going up. 📈 Like all things in life, a balance of both is necessary.
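To make the idea of an event tracking “grammar” concrete, here is a minimal sketch of a name checker. It assumes a hypothetical convention (snake_case names ending in a verb from a small allowed list, an “object + action” style); the convention, the ALLOWED_ACTIONS set, and check_event_name are all illustrative, not the grammar any particular company or tool uses.

```python
import re

# Hypothetical grammar: snake_case "object_action" names, e.g. "signup_form_completed".
# The allowed actions below are an illustrative assumption, not a standard list.
ALLOWED_ACTIONS = {"viewed", "clicked", "started", "completed", "shown", "purchased"}
NAME_PATTERN = re.compile(r"^[a-z]+(?:_[a-z]+)+$")


def check_event_name(name: str) -> list:
    """Return a list of problems with an event name (empty if it follows the grammar)."""
    if not NAME_PATTERN.match(name):
        return ["must be snake_case with at least two words"]
    action = name.rsplit("_", 1)[1]  # last token is treated as the action
    if action not in ALLOWED_ACTIONS:
        return [f"unknown action '{action}'"]
    return []


print(check_event_name("signup_form_completed"))  # []
print(check_event_name("button_click"))           # ["unknown action 'click'"]
```

Running a check like this in CI or at the point where events are defined is one way a grammar can stay self-documenting: every name tells you what was acted on and what happened.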
The power of quantitative + qualitative research in understanding users i.e. what Data Scientists (can) do + what User Researchers do. Data shows what users do. User interviews get at why users do what they do, or what they couldn’t do (which you can’t observe with data). Together, they are the voice of the user.
I’m very bullish on Segment, and the massive and growing value they provide for Product orgs that want to be data driven (which is also a growing number). For example, I love what they’ve created with Protocols and Typewriter. Now that they’re the centralized data hub for companies, they can build powerful analytical products like Personas too.
As always, you can follow along with what I’m reading on Goodreads.
A few of the most impactful ones I read this year:
Understanding your users is the best way to keep building a product that they want and ultimately cannot live without. One way to better understand your users and how they experience your product is to talk to them or survey them; another is to dig into data on how they’re using your product (what actions they take, how often they come back) to gain insight into how you might improve their experience. This is often called “product analytics”.
While preparing my first iOS app for release, I thought about how I might track user behavior in my app so that I’d have the data needed to explore how people use it, such as which buttons they press, which screens they visit, etc. In web development projects, I’ve traditionally relied on tools like Mixpanel to track events easily and to explore and visualize user behavior in different ways, but Mixpanel has been too limiting and expensive, so I decided to go with a cheaper and more flexible (but, on the visualization side, less user-friendly) solution for mobile app event tracking and analytics: Google Analytics for Firebase. We all make mistakes when using a new tool, but I came across some nuances of Google Analytics for Firebase (Firebase Analytics for short) that I wish I had known about before I started using it. Here is a list, with a short description of each, which will hopefully help new Firebase Analytics users learn from my mistakes.
List of Firebase Analytics Nuances
“Turn on” parameter reporting from the start if you have dimensions in your events that you want to see numbers for, at a glance, in Firebase.
Link Firebase to BigQuery from the start if you want access to your raw event data.
Firebase’s default Funnel reports are “open” funnels, not “closed” funnels.
“Turn on” parameter reporting from the start if you have dimensions in your events that you want to see numbers for, at a glance, in Firebase Analytics.
Firebase Analytics gives you some basic visualizations out of the box, like how many times a certain event fires, over time. I had an event that would fire whenever an upgrade popup was shown to a user, and I specified a parameter called “source” which would note which action preceded the upgrade screen, so I could see the most common paid features that free-tier users tried to access. However, Firebase Analytics did not report on this “source” dimension at all until I manually set up “parameter reporting” for it. So don’t forget to enable “parameter reporting” for important event parameters/dimensions that you care about!
In the Event view, click the three vertical dots to the far right of your event, then add a parameter of your event to the table by clicking and dragging
Firebase Analytics will start collecting numbers for that parameter (here, “source”), which you’ll be able to see in the report for the parent event (here, “upgrade_popup_show”)
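The breakdown that parameter reporting gives you is essentially a count of one event grouped by one parameter’s value. As a sketch of what that means (not how Firebase computes it internally), here it is over a toy list of raw events; the event dict layout, the “source” values, and the "(not set)" placeholder are made-up assumptions for illustration.

```python
from collections import Counter

# Toy raw events; the dict layout and the "source" values are illustrative assumptions.
events = [
    {"name": "upgrade_popup_show", "params": {"source": "export"}},
    {"name": "upgrade_popup_show", "params": {"source": "themes"}},
    {"name": "upgrade_popup_show", "params": {"source": "export"}},
    {"name": "add_photo_from_camera", "params": {}},
]


def counts_by_param(events, event_name, param):
    """Count occurrences of event_name, grouped by the value of one parameter."""
    return Counter(
        e["params"].get(param, "(not set)")  # placeholder label for a missing parameter
        for e in events
        if e["name"] == event_name
    )


print(counts_by_param(events, "upgrade_popup_show", "source").most_common())
# [('export', 2), ('themes', 1)]
```

Sorting by count like this is exactly the “which paid features do free-tier users try most” question from above.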
Link Firebase to BigQuery from the start if you want access to your raw event data.
By default, your raw event data is collected and made available to you only after you link Firebase to BigQuery. When I first implemented Firebase, launched my app, and got a handful of users, I could see a high-level picture of their behavior via Firebase Analytics’ basic visualizations. A few weeks later, I found out that I had to explicitly link Firebase to BigQuery to tell Firebase to start “saving” my raw event data, and only after doing so did I see that raw data coming in (and saved into tables in BigQuery). So I had “lost” the first several weeks of raw event data. That wasn’t a big deal for my small app, but it could be more costly for a high-profile, heavily marketed app launch, where mobile analytics and being able to mine insights from the data matter more.
Note that when you link Firebase to BigQuery, you’ll need to upgrade to Google Cloud Platform’s Blaze plan, which is a pay-as-you-go plan (you pay only for the bandwidth, storage, etc. that you use). You can visit their calculator to estimate your costs, but so far, collecting the data and running infrequent BigQuery SQL queries for my app has been free.
Firebase’s default Funnel reports are “open” funnels, not “closed” funnels.
If you go into Firebase Analytics’ Funnels page, you’ll see an area where you can create a funnel easily. After trying to do so, I found out that the funnels Firebase creates are “open” funnels, meaning that at each step of the funnel, a user doesn’t have to have completed the previous step to be included in the count for that step. In my opinion, “closed” funnels, where a user counted at each step must have completed the preceding step, are much more informative; they’re also a core feature of other event analytics tools like Mixpanel and Heap. Several others are also confused about Google’s decision to have Firebase only report open funnels.
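The difference between the two definitions is easiest to see on a tiny made-up dataset. This is a simplified model of the open/closed definitions above, not Firebase’s actual implementation; the per-user event lists are invented for illustration, and the steps reuse the same event name so that “1st photo”, “2nd photo”, etc. are successive occurrences of it.

```python
# Made-up per-user event histories: (timestamp, event_name) pairs.
events_by_user = {
    "a": [(1, "first_open"), (2, "add_photo_from_camera"),
          (3, "add_photo_from_camera"), (4, "add_photo_from_camera")],
    "b": [(1, "first_open"), (5, "add_photo_from_camera")],
    "c": [(1, "first_open")],
}
STEPS = ["first_open", "add_photo_from_camera",
         "add_photo_from_camera", "add_photo_from_camera"]


def open_funnel(events_by_user, steps):
    """Open funnel: a user counts at a step if they ever fired that step's event."""
    return [
        sum(1 for evs in events_by_user.values() if any(name == step for _, name in evs))
        for step in steps
    ]


def closed_funnel(events_by_user, steps):
    """Closed funnel: a user counts at step k only after completing steps 0..k-1 in order."""
    counts = [0] * len(steps)
    for evs in events_by_user.values():
        next_step = 0  # index of the step this user still needs to reach
        for _, name in sorted(evs):  # walk events in timestamp order
            if next_step < len(steps) and name == steps[next_step]:
                counts[next_step] += 1
                next_step += 1
    return counts


print(open_funnel(events_by_user, STEPS))    # [3, 2, 2, 2]
print(closed_funnel(events_by_user, STEPS))  # [3, 2, 1, 1]
```

With the open definition, every user who took any photo is counted at every photo step, so steps 2 through 4 look like 100% conversion; the closed version shows that only one of the two photo-takers went on to a second photo.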
For example, I created a funnel in Firebase Analytics to report what percentage of users who open my app for the first time go on to take their 1st photo with my app, then what percentage of those go on to take their 2nd photo, and so on. I expected fewer and fewer users to make it to each step of the funnel, so I was surprised to see what appeared to be 100% of users who take one photo taking two, 100% of users who take two photos taking three, and so on, until I found out that Firebase had constructed an open funnel:
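To make the open vs. closed distinction concrete, here is a small Python sketch on a made-up event log. The user IDs, timestamps, and function names are hypothetical, and the event names mirror the ones used in the SQL query later in this post. It only mimics how the two kinds of funnel counts behave; it is not how Firebase computes funnels internally.

```python
from collections import defaultdict

# Hypothetical toy event log: (user_id, event_name, timestamp).
EVENTS = [
    ("u1", "first_open", 1), ("u1", "add_photo_from_camera", 2),
    ("u1", "add_photo_from_camera", 3), ("u1", "add_photo_from_camera", 4),
    ("u2", "first_open", 1), ("u2", "add_photo_from_camera", 5),
    ("u3", "first_open", 2),
    ("u4", "add_photo_from_camera", 3),  # took a photo but no first_open logged
]

# first open -> 1st photo -> 2nd photo -> 3rd photo
FUNNEL = ["first_open", "add_photo_from_camera",
          "add_photo_from_camera", "add_photo_from_camera"]


def open_funnel(events, steps):
    """Open funnel: each step counts every user who logged that step's event,
    whether or not they completed the earlier steps."""
    users_by_event = defaultdict(set)
    for user, name, _ts in events:
        users_by_event[name].add(user)
    return [len(users_by_event[step]) for step in steps]


def closed_funnel(events, steps):
    """Closed funnel: a user counts at step k only if they completed
    steps 0..k-1 first, in timestamp order."""
    history_by_user = defaultdict(list)
    for user, name, ts in events:
        history_by_user[user].append((ts, name))
    counts = [0] * len(steps)
    for history in history_by_user.values():
        reached = 0  # number of funnel steps this user has completed so far
        for _ts, name in sorted(history):
            if reached < len(steps) and name == steps[reached]:
                reached += 1
        for k in range(reached):
            counts[k] += 1
    return counts


print("open:  ", open_funnel(EVENTS, FUNNEL))
print("closed:", closed_funnel(EVENTS, FUNNEL))
```

On this toy log, the open funnel reports 3 users at every step (the same misleading "100%" pattern), while the closed funnel drops off: 3, 2, 1, 1.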
There isn’t a setting in Firebase Analytics to see closed funnels yet, so I decided to create a closed funnel in BigQuery with SQL, on my raw event data.
I won’t go into the details here, but I tested a few different kinds of SQL queries for constructing closed funnels, and the following “LEFT JOIN”-based one performed much better than a “subqueries”-based one that you may find elsewhere on the internet. If your event data is in BigQuery, you too can create closed funnels to better understand the flow of your users. Here’s my SQL query for the closed funnel “first open -> take 1st photo -> take 2nd photo -> take 3rd photo” (it uses UNNEST to flatten the repeated event_dim field, because BigQuery stores each row’s events as an array):
SELECT
  count(distinct e0.user_dim.app_info.app_instance_id) as first_openers
  , count(distinct e1_user) as photo_taken_1
  , count(distinct e2_user) as photo_taken_2
  , count(distinct e3_user) as photo_taken_3
FROM `youday_IOS.app_events_*` as e0, UNNEST (e0.event_dim) as e0_events
LEFT JOIN (
  SELECT
    events.name as e1_eventname
    , e.user_dim.app_info.app_instance_id as e1_user
    , events.timestamp_micros as e1_ts
  FROM `youday_IOS.app_events_*` as e, UNNEST (e.event_dim) as events
) ON e0.user_dim.app_info.app_instance_id = e1_user
  AND e1_eventname = 'add_photo_from_camera'
LEFT JOIN (
  SELECT
    events.name as e2_eventname
    , e.user_dim.app_info.app_instance_id as e2_user
    , events.timestamp_micros as e2_ts
  FROM `youday_IOS.app_events_*` as e, UNNEST (e.event_dim) as events
) ON e1_user = e2_user
  AND e2_eventname = 'add_photo_from_camera'
  AND e2_ts > e1_ts -- 2nd photo taken after 1st
LEFT JOIN (
  SELECT
    events.name as e3_eventname
    , e.user_dim.app_info.app_instance_id as e3_user
    , events.timestamp_micros as e3_ts
  FROM `youday_IOS.app_events_*` as e, UNNEST (e.event_dim) as events
) ON e2_user = e3_user
  AND e3_eventname = 'add_photo_from_camera'
  AND e3_ts > e2_ts -- 3rd photo taken after 2nd
WHERE e0_events.name = 'first_open'
Firebase for Mobile Product Analytics
Firebase makes it easy to track events and collect all of them in a datastore, so you have the data you need to quantitatively understand how users are using your mobile app. There are just a few “manual switches” that anyone using Firebase Analytics should know about to ensure they’re collecting complete behavioral data from the start. Firebase could also make its visualizations more informative and insightful, so users don’t have to write SQL as often. Firebase certainly has the potential to get there, with its relatively affordable “utility” or “pay-as-you-go” pricing model and its superior data storage and querying capabilities (good luck trying to get your raw data out of the other event analytics platforms). I enjoy learning from my users to build a better product, and having the data to do so, and I’m excited to see what Firebase Analytics can do for the advancement of product analytics over time.
I’ve been reading Data Science for Business, by Provost and Fawcett, a very useful book that explains some of the most important principles and topics in data science. The authors’ language and structure help a lot in developing an intuitive understanding of key data science concepts like model tuning, model evaluation, and various models themselves, such as decision trees, linear models, and k-nearest neighbors. I highly recommend the book if you work with data scientists, if you’re a beginner data scientist, or even if you’re a data science expert looking for a good resource to refresh your fundamentals.
I found one chapter particularly interesting because it describes a framework, or way of thinking, that I haven’t really heard about elsewhere. Specific tactics, such as how different kinds of models work, are definitely important and a large part of what a data scientist needs to know and be able to do, but I think higher-level strategy is also important. Anyway, the framework is highly practical, which fits the authors’ theme for the book: data science isn’t just about analyzing data, but also about understanding the business problem in an analytical way. I wished there were something tangible and interactive to go along with their explanations in this chapter (and others), so I decided to create a guide of sorts: this blog post plus an interactive Jupyter Notebook you can download and play with. The blog post provides context in case you haven’t read the corresponding chapter yet, so the Jupyter Notebook comes near the end.
If you have the book already, this blog post corresponds to the latter “half” of Chapter 7, “Decision Analytic Thinking I: What Makes a Good Model?”. This guide, and especially the Jupyter Notebook, assumes that the reader already has some familiarity with the basic ideas of machine learning, such as supervised learning (specifically classification), data pre-processing, holdout set testing, and model evaluation.
When applying data science to solve business problems: what is the real goal?
As with any problem, you have to uncover the real goal of a data analytic project. It can be tempting to get caught up in the surface-level question or to jump straight into solutions.
For example, questions about customers come up a lot in business: which customers are most likely to churn? Which customers are most receptive to upselling? The idea is that once we can predict which customers are most likely to be upsold, we can call them, try to get them to buy more items like an add-on for the thingamajig they just bought, and generate more revenue for the business. Let’s run with this “upselling” case as an example.
The real business goal behind asking “which customers are most receptive to upselling?” is not only to generate more revenue from upselling customers, but also to maximize the profit generated from our efforts. Not all customers are equally likely to be upsold (some are curmudgeons, while others might have a real need for the other products we’re selling), those whom we do upsell could purchase different amounts of stuff, and the act of upselling costs us time and money (which can also vary). So how do we even structure a problem like this, and then decide what to do?
Introduction to the expected value framework, and how it helps break down problems
Let’s introduce the expected value framework, and weave it into how we’d structure and break down our business objective for this “upselling” project.
As a quick refresher:
expected value (of a variable) – a predicted value of a variable, calculated as the sum of all possible values, each multiplied by the probability of its occurrence
Basically: what do we anticipate, or expect, the value of some variable to be, given that there is some uncertainty in the chances of different outcomes happening?
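The definition translates almost directly into code. This is a minimal sketch; the function name and the die example are my own illustration, not from the book.

```python
def expected_value(outcomes):
    """Sum of each possible value multiplied by its probability.

    `outcomes` is a list of (value, probability) pairs; the
    probabilities must sum to 1 (up to floating-point error).
    """
    total_prob = sum(p for _value, p in outcomes)
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total_prob}, not 1")
    return sum(value * p for value, p in outcomes)


# A fair six-sided die: each face 1..6 has probability 1/6.
die = [(face, 1 / 6) for face in range(1, 7)]
print(expected_value(die))  # about 3.5
```

No single roll ever comes up 3.5; the expected value is the long-run average over many rolls, which is exactly how it is used for the per-customer decisions below.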
Frame the question in terms of expected value
Back to our upselling question. Each customer has his/her own probability of being upsold and a likely amount that they will be upsold for; there’s also a cost to upselling, which we may have to eat if we call a customer who doesn’t want to buy anything else from us. So, thinking in terms of expected value, each customer will have an expected profit, given that we reach out to that customer to try to upsell them. More specifically:
$$E[\text{profit}] = p_u(x) \cdot \text{profit}_{\text{upsold}} + [1 - p_u(x)] \cdot \text{profit}_{\text{not upsold}}$$
That is, assuming we reach out to a customer $x$, the expected value of profit ($E[\text{profit}]$) equals the probability of upselling the customer ($p_u(x)$) times the profit we’d get from upselling the customer, plus the probability of failing to upsell the customer (1 minus the probability of upselling the customer) times the profit we’d get from failing to upsell the customer.
Breaking out profit in each potential outcome:
$$E[\text{profit}] = p_u(x) \cdot (v_u - c) + [1 - p_u(x)] \cdot (0 - c)$$
Here, $v_u$ is the value, or revenue generated, from upselling the customer, and $c$ is the cost of trying to upsell the customer (we assume the cost is constant across customers for simplicity). Notice in the second half of the equation that if we fail to upsell the customer, we get no revenue and eat the cost ($c$) of trying.
Now the path to achieving our original business goal, maximizing total profit, is clear: try to upsell every customer whose expected profit is greater than 0 (assuming we don’t have a budget or other constraint on how many customers we can try to upsell).
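Because the cost is assumed constant, the expected profit simplifies algebraically, which turns this decision rule into a threshold on the predicted probability. In my notation, $p_u(x)$ is the predicted probability of upselling customer $x$, $v_u$ the revenue from a successful upsell, and $c$ the cost of trying:

```latex
\begin{aligned}
E[\text{profit}] &= p_u(x)\,(v_u - c) + [1 - p_u(x)]\,(0 - c) \\
                 &= p_u(x)\,v_u - c \\
E[\text{profit}] > 0 &\iff p_u(x) > \frac{c}{v_u}
\end{aligned}
```

With the numbers used later in this post (a $100 upsell value and a $15 cost), the break-even probability is 15/100 = 0.15: a customer is worth calling whenever the predicted upsell probability exceeds 15%.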
Expected value breaks the problem down for us
Thinking in terms of expected value has also broken the problem up nicely for us. To figure out the expected profit of trying to upsell a customer, we need (1) the probability that upselling will work, $p_u(x)$; (2) the value of a successful upsell, $v_u$; and (3) the cost of trying to upsell a customer, $c$.
Now we can go lower level and think about how we might address each piece analytically. To address (1), we can build a machine learning model, a classifier, on historical data about which kinds of customers were successfully upsold and which kinds weren’t, and use it to generate a predicted $p_u(x)$, the probability that upselling will work, for each customer. For simplicity, we’ll assume that both (2) and (3) are constant across all customers, but you could build another model to predict (2), the value of a successful upsell for a given customer.
More specifically, for (1), our historical customer data is a snapshot, at time t, of all customers we’ve previously tried to upsell to. One column in the data records whether or not (e.g. 1 or -1, or 1 or 0) we were able to successfully upsell each customer by some future date t+1, say 3 months later; this is the target variable. The other columns, or features, contain data on each customer from before time t, such as number of previous purchases, number of times the customer has been back to our online store, shipping zip code (from which we can estimate income level), etc.
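Here is a minimal, stdlib-only sketch of step (1): fitting a logistic regression classifier with gradient descent on a tiny, made-up snapshot, then reading off a predicted upsell probability $p_u(x)$ for a new customer. The feature values, labels, learning rate, and function names are all hypothetical. In practice you would more likely use a library, for example scikit-learn's `LogisticRegression` and its `predict_proba` method; this only shows where the probabilities come from.

```python
import math

# Hypothetical historical snapshot taken at time t.
# Features (measured before t): (previous purchases, visits to the online store)
X = [(0, 1), (1, 0), (1, 2), (2, 1), (5, 8), (6, 5), (7, 9), (8, 7)]
# Target: 1 if the customer was upsold by t+1, else 0.
y = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X, y, lr=0.05, epochs=5000):
    """Fit logistic regression by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that customer x will be upsold, p_u(x)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


w, b = train_logistic(X, y)
new_customer = (6, 6)  # hypothetical customer we are considering calling
print(f"p_u(new customer) = {predict_proba(w, b, new_customer):.2f}")
```

The predicted probability, not just the yes/no label, is what feeds into the expected profit calculation, so a model whose probabilities are reasonably well calibrated matters here.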
Now we have a structure, thanks to EV (expected value), for evaluating whether we should try to upsell any individual customer in order to maximize company profits.
Let’s plug in some numbers to see how we might use our structure to make decisions on whether we should try to upsell a customer or not.
Take Customer A. Based on what we know about other customers who are similar to him, our machine learning model predicts that he has a 91% chance of being upsold if we call him.
Let’s assume that if we upsell a customer, they will spend $100 to buy an add-on to the thingamajig they already bought. Let’s also assume that, on average, it takes a 30-minute phone call by a salesperson earning $30/hour to try to upsell someone, so the cost of upselling is $15.
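Therefore, the expected profit for trying to upsell Customer A will be:
$$E[\text{profit}_A] = 0.91 \cdot (\$100 - \$15) + 0.09 \cdot (\$0 - \$15) = \$77.35 - \$1.35 = \$76$$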
Since the expected profit is positive, it is worth trying to upsell him: on average (if we keep trying to upsell people like him), each attempt will generate $76 in profit for the company.
Now let’s look at Customer B. Based on what we know about other customers who are similar to her, our machine learning model predicts that she has a 4% chance of being upsold if we call her.
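So, the expected profit for trying to upsell Customer B will be:
$$E[\text{profit}_B] = 0.04 \cdot (\$100 - \$15) + 0.96 \cdot (\$0 - \$15) = \$3.40 - \$14.40 = -\$11$$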
We should not try to upsell customers like Customer B, because on average, we will lose $11 each time.
If we do this expected value calculation for each customer we’re thinking about upselling to, we can arrive at a subset of customers where the expected profit of upselling each one is positive, and thus if we try to upsell all of them, our expected total profit will be maximized.
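The per-customer calculation and the "keep only positive expected profit" rule fit in a few lines. This sketch reuses the value, cost, and probabilities from the Customer A and B example; the function and variable names are my own.

```python
def expected_profit(p_upsell, value, cost):
    """E[profit] = p * (value - cost) + (1 - p) * (0 - cost)."""
    return p_upsell * (value - cost) + (1 - p_upsell) * (0 - cost)


VALUE = 100.0  # revenue from a successful upsell, from the example above
COST = 15.0    # a 30-minute call at $30/hour

# Predicted upsell probabilities for the two example customers.
customers = {"A": 0.91, "B": 0.04}

profits = {name: expected_profit(p, VALUE, COST) for name, p in customers.items()}
to_call = [name for name, profit in profits.items() if profit > 0]
total = sum(profits[name] for name in to_call)

for name, profit in profits.items():
    print(f"Customer {name}: expected profit ${profit:.2f}")
print(f"Call {to_call}; expected total profit ${total:.2f}")
```

With real data, `customers` would hold the model's predicted probability for every candidate; the two entries here just reproduce the hand calculations ($76 and -$11), so only Customer A makes the call list.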
See this Jupyter Notebook for a full example of training a machine learning model on historical customer data to predict whether a customer will be upsold, along with the probability of each outcome. These probabilities, combined with the expected value framework, are then used to show which customers we should try to upsell to maximize our company’s profit.
Note that using the expected value framework to calculate something like expected profit depends entirely on two things: the probabilities of different outcomes (e.g. a customer successfully being upsold or not) and the benefit or cost of each outcome. Both can be estimated with models and comprehensive data, but not always very well, and sometimes not at all. This is where both business and data understanding come into play: a data scientist has to understand what data is available and what it can be used for, and also understand how the business works so that accurate cost/benefit numbers can be gathered. It also means that the results of using expected value are sensitive to changes in either type of input, probabilities or cost/benefit numbers. Though the expected value framework can be a practical and structured way to break down a business analytic problem, the data scientist may have to use other methods to inform action if he/she doesn’t have enough confidence in the probability or cost/benefit estimates. As with all things in life, there is no one-size-fits-all approach: the EV framework is just one tool in a data scientist’s big toolbox.
Thanks for reading, I’m always open to questions, suggestions, or other kinds of feedback!
We all know how hard making decisions about our own lives can be sometimes, such as decisions about our careers or our relationships.
Here’s a list of several thought experiments I’ve come across over the years that have given me more perspective and sometimes made hard decisions a little easier. Though they’re all slightly different, they seem to operate similarly, cutting out fear and external influences to drill down into our deepest personal values.
Ruth Chang’s idea that every hard choice is an opportunity to “become the authors of our own lives”. Watch her full TED Talk (15 minutes); it’s amazing.
I’m not sure if any of these will always give the “right” answer, and I also think that these thought experiments are just part of the puzzle to improve decision making about one’s own life. As Kahneman, Mauboussin, and Munger suggest, we should use a rational decision making framework or even a checklist* because humans are very prone to cognitive biases and shortcuts that can lead to bad decisions. Even as just a piece of the puzzle, these thought experiments have allowed me to think about decisions from different perspectives, which is always valuable.
Please add any other relevant thought experiments, and/or thoughts about decision making!
*I personally use a checklist similar to WRAP, which is simple to remember and covers a majority of the most common cognitive traps we can fall into. The Heath brothers describe WRAP more in Decisive. Using their terminology, the above thought experiments could belong to the “A” step of WRAP, or “attaining distance/perspective”.
One of the side projects I worked on in the past handful of months was Mr. Market Feels: a stock market sentiment Twitter bot that used automated image processing to extract and tweet the value of CNN Money’s Fear and Greed Index every day.
There have been attempts to backtest the predictive power of the Fear and Greed Index by buying and selling the overall stock market index depending on its value (the results suggest there isn’t much edge in that particular strategy). Anecdotally, though, I’ve found the CNN Fear and Greed Index (what I’ll call FGI for short) to be a pretty good indicator of when this bull market has bottomed out during a short-term retracement, and back when I had more time, I used it to trade options with decent success. Going to CNN’s website every day to check the FGI was a pain, and I also wanted the numerical values in case I wanted to run some analyses in the future, so I wondered if I could automatically extract the daily Fear and Greed Index values.
I saw this as a fun and short coding project that would help me and others while giving me practice with image processing, so I dove in.
The goal was to extract the FGI “value” and “label” from CNN’s site every day. The value of the Index is 95 and the label is “Extreme Greed” in the screenshot of the FGI below:
Extracting the FGI value and label isn’t as easy as running OCR (optical character recognition) on the image and reading off the results. One: there is a lot of extraneous text in the image. Two: the pixel location of the value and label that we want changes as the FGI changes. Three: the relative position of the value and label also changes as the FGI changes. You can see points two and three in the image below: now, the FGI label (“Extreme Fear”) is to the top left of the FGI value (10). In the original image, the FGI label (“Neutral”) is directly to the right of the FGI value (53).
Why does all of this matter? Because for clean OCR, images need to be standardized, or at least they do for Tesseract, the open source OCR engine originally developed at HP and later maintained by Google. In Tesseract’s case, images of text shouldn’t contain any other artifacts (that the engine might try to interpret as text), should be scaled large enough, should have as much contrast as possible (e.g. black text on white), and should be either horizontally or vertically aligned.
Most of the pre-processing to standardize the FGI images for Tesseract was straightforward enough. Without going into too much detail, I used the Python Pillow library to automatically convert the image to black and white, apply image masks to eliminate extraneous parts of the image (like the “speed dial” and the “historical FGI table” on the right-hand side), and crop the image down to leave only the FGI value and label, like this:
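The actual script used Pillow. As a dependency-free illustration of the same three steps, here is a sketch on a tiny grayscale image stored as nested lists; the threshold of 128, the mask box, and the pixel values are all assumptions. In Pillow, these steps roughly correspond to `Image.convert("L")` plus `Image.point(...)` for thresholding, `ImageDraw.Draw(...).rectangle(...)` for masking, and `Image.crop(...)` for cropping.

```python
# Images are lists of rows of grayscale values (0 = black, 255 = white).


def to_black_and_white(gray, threshold=128):
    """Binarize: pixels darker than `threshold` become black (0), others white (255)."""
    return [[0 if px < threshold else 255 for px in row] for row in gray]


def apply_masks(img, boxes):
    """Paint each (left, top, right, bottom) box white; right and bottom are exclusive."""
    out = [row[:] for row in img]
    for left, top, right, bottom in boxes:
        for y in range(top, bottom):
            for x in range(left, right):
                out[y][x] = 255
    return out


def crop_to_content(img):
    """Crop to the bounding box of the remaining black pixels."""
    coords = [(x, y) for y, row in enumerate(img) for x, px in enumerate(row) if px == 0]
    if not coords:
        return []
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return [row[min(xs):max(xs) + 1] for row in img[min(ys):max(ys) + 1]]


# Hypothetical 4x6 grayscale image: dark "text" on the left, a dark "dial" on the right.
gray = [
    [250, 250, 250, 250, 250, 250],
    [250, 20, 250, 250, 30, 250],
    [250, 20, 250, 250, 30, 250],
    [250, 250, 250, 250, 250, 250],
]
bw = to_black_and_white(gray)
cleaned = apply_masks(bw, [(4, 0, 6, 4)])  # mask out the right-hand region
print(crop_to_content(cleaned))
```

Masking before cropping matters: if the "dial" pixels were left in, the bounding box would stretch to include them.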
Here’s where challenge number three came up: the FGI value and label aren’t always either horizontally or vertically aligned, and this reduced Tesseract’s accuracy. For example, in the first image, the FGI label is diagonal from the FGI value. Running Tesseract OCR on it returns “NOW:[newline]Extreme[newline]Fear”, which completely misses the value “10” because of the diagonal alignment. You can try out Tesseract OCR with the above images, or with your own, here.
An Interdisciplinary Solution of Sorts
One solution to the challenge above was to split the resulting image into two images, one with the FGI value and a separate one with the label, so that Tesseract could be run on each image knowing that its text was either horizontally or vertically aligned. Basically, from a single FGI image, I wanted two images that looked like these:
In thinking about ways to implement that, I first thought about the principles of unsupervised clustering, from the field of machine learning. With clustering, the intermediate, processed FGI image could be segmented and split appropriately by finding the cluster of pixels that corresponded to the FGI value (“10”), and the other cluster of pixels that corresponded to the FGI label (“Now: Extreme Fear”).
It turns out that using the k-means clustering algorithm for image segmentation is pretty common practice.
First, a copy of the image was “pixelated” to ensure that the k-means algorithm would converge on the two correct clusters:
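The post doesn't specify how the pixelation was done. One simple possibility, sketched here as an assumption, is block-wise downsampling where a tile turns black if it contains any black pixel. This merges the thin, separate strokes of each character into solid blobs that k-means can group more reliably.

```python
def pixelate(img, block):
    """Downsample a black (0) / white (255) image into block x block tiles.

    A tile becomes black if it contains any black pixel, so nearby strokes
    merge into one solid blob.
    """
    height, width = len(img), len(img[0])
    out = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            tile = [
                img[y][x]
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width))
            ]
            row.append(0 if 0 in tile else 255)
        out.append(row)
    return out


# Two separate strokes (columns 0 and 2) merge into one blob after pixelation.
strokes = [
    [0, 255, 0, 255],
    [0, 255, 0, 255],
]
print(pixelate(strokes, 2))
```

The block size is the main knob: too small and characters stay fragmented, too large and the value and label could merge into a single blob.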
Then, the code applied k-means to find the centroids of the two clusters (green dots). It then derived the line connecting the two centroids (green line) and calculated its perpendicular bisector (red line), which can be seen as a “partition” between the two clusters of black pixels.
From there, the original black and white FGI image could be split along the partition line, resulting in the desired two images: one for the FGI value and one for the FGI label. Tesseract would then have these two standardized images as inputs and be able to cleanly extract the FGI value and label.
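Here is a dependency-free sketch of the segmentation step, assuming a black (0) / white (255) image stored as nested lists. It runs k-means with k = 2 on the coordinates of the black pixels, then splits the image along the perpendicular bisector of the two centroids. Assigning a pixel to the side of the bisector it falls on is the same as assigning it to the nearer centroid. The initialization (leftmost and rightmost black pixels), the function names, and the toy image are my assumptions. If you cluster a pixelated copy, as described above, scale the centroids back up by the block size before splitting the original.

```python
def black_pixels(img):
    """(x, y) coordinates of every black pixel."""
    return [(x, y) for y, row in enumerate(img) for x, px in enumerate(row) if px == 0]


def kmeans2(points, iters=20):
    """k-means with k = 2, initialized at the leftmost and rightmost points."""
    c0, c1 = min(points), max(points)
    for _ in range(iters):
        g0, g1 = [], []
        for x, y in points:
            d0 = (x - c0[0]) ** 2 + (y - c0[1]) ** 2
            d1 = (x - c1[0]) ** 2 + (y - c1[1]) ** 2
            (g0 if d0 <= d1 else g1).append((x, y))
        if not g0 or not g1:
            break  # degenerate case: everything landed in one cluster
        new_c0 = (sum(x for x, _ in g0) / len(g0), sum(y for _, y in g0) / len(g0))
        new_c1 = (sum(x for x, _ in g1) / len(g1), sum(y for _, y in g1) / len(g1))
        if (new_c0, new_c1) == (c0, c1):
            break  # converged: assignments no longer move the centroids
        c0, c1 = new_c0, new_c1
    return c0, c1


def split_along_bisector(img, c0, c1):
    """Split black pixels by the perpendicular bisector of segment c0-c1."""
    mx, my = (c0[0] + c1[0]) / 2, (c0[1] + c1[1]) / 2  # midpoint of the centroids
    dx, dy = c1[0] - c0[0], c1[1] - c0[1]              # centroid-to-centroid direction
    first = [[255] * len(row) for row in img]
    second = [[255] * len(row) for row in img]
    for x, y in black_pixels(img):
        # A negative projection onto (dx, dy) means the pixel is on c0's side.
        side = (x - mx) * dx + (y - my) * dy
        (first if side < 0 else second)[y][x] = 0
    return first, second


# Hypothetical 6x8 image: a "value" blob top-left and a "label" blob bottom-right.
img = [[255] * 8 for _ in range(6)]
for x, y in [(0, 0), (1, 0), (0, 1), (1, 1), (6, 4), (7, 4), (6, 5), (7, 5)]:
    img[y][x] = 0

c0, c1 = kmeans2(black_pixels(img))
value_img, label_img = split_along_bisector(img, c0, c1)
print(c0, c1)
```

Because the split uses a line at any angle rather than a fixed horizontal or vertical cut, it handles the diagonal layouts that tripped up Tesseract.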
Lastly, I put the script onto a web server, told a cron job to run it daily, and hooked it up to Twitter’s API to automatically post to the Twitter account Mr. Market Feels. I named it after Ben Graham’s moody Mr. Market.
I just finished reading Poor Charlie’s Almanack (an amazing book full of wisdom and life principles) so Charlie Munger’s multidisciplinary approach to life is on my mind. Though this project was probably a little less multidisciplinary than he means because machine learning and image processing are closely related fields, I still saw it as an example of how broad and varied knowledge and skills can come together to solve a problem effectively. To quote Munger on specialized knowledge: “To the man with only a hammer, every problem looks like a nail.”
Thanks for reading!
UPDATE 6/9/2018: Mr. Market Feels has been broken for a handful of months because various financial data APIs that I’ve tried using have been deprecated. I recently found out about IEX’s free and publicly available financial data API, which Mr. Market Feels now uses; it will hopefully make its first post-fix tweet on Monday. I would also highly recommend reading Flash Boys: Michael Lewis tells such an intriguing story about the arms race going on in high-frequency trading and the birth of IEX.
In my downtime, I’ve been using Kaggle to get better at applying machine learning to solve problems. The process is not only teaching me new technical skills, but also reminding me of some useful principles that can be applied elsewhere. To keep things digestible, this is the second post of two (the first one is here).
A short list of important skills for a data scientist
When trying to get better at a skill, I try to tackle the highest-leverage points. Here’s what I’ve been able to gather about three skills that are important for a data scientist*, from talking with others, reading about machine learning, and experiencing it firsthand in the client projects I do.
Feature engineering
Communication (includes visualization)
Ensembling
The first two are relatively self-explanatory; ensembling brings in some pretty interesting concepts that, in my opinion, apply to decision-making.
*I’ll be referring to the “applier of machine learning” aspect of “data science”.
Feature engineering is the process of cleaning, transforming, combining, disaggregating, etc. your data to improve your machine learning model’s predictive performance. Essentially, you’re using existing data to come up with new representations of it in the hopes of providing more signal to the model. (Feature selection, removing less useful features so the model is fed less noise, also helps.) The practitioner’s own domain knowledge and experience are used a lot here to engineer features in a way that improves the model’s performance instead of hurting it.
There are a few tactics that can be applied generally to engineer better features, such as normalizing the data to help certain kinds of machine learning models perform better. But usually, the largest “lift” in performance comes from engineering features in a way that’s specific to the domain, or even to the problem.
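Normalizing is one of those generally applicable tactics. Here is a minimal sketch of two common versions, min-max scaling and z-score standardization, in plain Python; the income values are made up. Distance-based models like k-nearest neighbors and models trained with gradient descent tend to be sensitive to feature scale. In practice, compute the min/max or mean/standard deviation on the training data only and reuse those numbers for the test data.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Z-score: subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


annual_income = [30_000, 60_000, 90_000]  # hypothetical feature values
print(min_max_scale(annual_income))
print(standardize(annual_income))
```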
For example, consider using someone’s financial data to predict their likelihood of defaulting on a loan. You might have the person’s annual income and monthly debt payments (e.g. for auto loans, mortgages, credit cards, and the new loan they’re applying for), but people closer to the lending industry will tell you that a “debt-to-income ratio” is a better metric for predicting default, because it measures, in a single number, how capable the person is of paying off his/her debt. After calculating it, a data scientist would add this feature to the training data and would likely find that their machine learning model gets better at predicting default.
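Computing that engineered feature is nearly a one-liner. The sketch below uses the common definition (total monthly debt payments divided by gross monthly income) on a made-up applicant; the field names are hypothetical.

```python
def debt_to_income(annual_income, monthly_debt_payments):
    """Debt-to-income ratio: total monthly debt payments / gross monthly income."""
    monthly_income = annual_income / 12
    if monthly_income <= 0:
        raise ValueError("income must be positive")
    return sum(monthly_debt_payments.values()) / monthly_income


# Hypothetical loan applicant.
applicant = {
    "annual_income": 60_000,
    "monthly_debt_payments": {
        "auto_loan": 300,
        "mortgage": 1_200,
        "credit_cards": 150,
        "new_loan": 350,
    },
}

# Engineer the new feature and add it to the applicant's row of training data.
applicant["dti"] = debt_to_income(
    applicant["annual_income"], applicant["monthly_debt_payments"]
)
print(applicant["dti"])  # 0.4
```

The raw columns are still there; the new `dti` column simply packages the domain insight into one number the model can use directly.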
As such, feature engineering (and in fact, most of machine learning) is more art than science: a creative spark for an innovative, domain-specific feature is often more effective than hard-and-fast rules. They say feature engineering can’t be taught from books, only through experience, which is why I think Kaggle is in an interesting position: it is essentially crowdsourcing the best machine learning methodologies for all sorts of problems and domains. There’s a treasure trove of knowledge on there, and if it were structured a little better, Kaggle could contribute a lot to machine learning education.
What potentially useful features/data could we engineer from timestamp strings? We could generate year, month, day, day of week, etc. numeric data columns–much more readable by a machine learning model.
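Here is a minimal sketch of that idea using Python's standard `datetime` module. The timestamp format string is an assumption about how the raw data looks, and `hour` and `is_weekend` are extra examples beyond the ones listed above.

```python
from datetime import datetime


def timestamp_features(ts, fmt="%Y-%m-%d %H:%M:%S"):
    """Turn a timestamp string into numeric columns a model can use."""
    dt = datetime.strptime(ts, fmt)
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "day_of_week": dt.weekday(),  # Monday = 0 ... Sunday = 6
        "hour": dt.hour,
        "is_weekend": int(dt.weekday() >= 5),
    }


print(timestamp_features("2018-06-09 14:30:00"))
```

June 9, 2018 was a Saturday, so this example yields `day_of_week` 5 and `is_weekend` 1.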
During a recent chat with one of the core developers of the Python scikit-learn package, I asked what he thought some of the most important skills for a data scientist are. I sort of expected technical skills, but one of the first things that came up was communication, or being able to convey findings and why those findings matter to both internal and external stakeholders, like customers. This one’s self-explanatory: what good is data if you can’t act upon it?
In fact, it seems like communicating well might be even more important for data scientists than for professions like programming or design, because there’s a larger gap between result and action. For example, a decision maker can look at or play around with a design or an app to understand it well enough to make a decision, whereas a decision maker usually can’t just look at a bunch of numbers spit out by a machine learning model and know what to do: how are those numbers actionable, why should anyone believe them, etc. Visualization is a piece of this: choosing the right charts, design, etc. to communicate your data’s message most effectively.
In machine learning, an ensemble is a collection of models that can be combined into something that performs better than the individual models.
An example: one way this is done is via the voting method. The different base, or “level 0”, models each make a prediction on, say, whether a person is going to go into default in the next 90 days. Model A predicts “yes”, model B predicts “yes”, and model C predicts “no”. The final decision then becomes the majority vote, here “yes”.
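The voting method is short enough to write out in full. This sketch uses made-up predictions from three hypothetical level 0 models.

```python
from collections import Counter


def majority_vote(predictions):
    """Most common label among the base models' predictions.

    Counter.most_common breaks ties by first occurrence, so with an even
    number of models a tie goes to whichever label appears first.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical predictions from three level 0 models for three people.
model_a = ["yes", "no", "no"]
model_b = ["yes", "yes", "no"]
model_c = ["no", "yes", "yes"]

ensemble = [majority_vote(votes) for votes in zip(model_a, model_b, model_c)]
print(ensemble)  # ['yes', 'yes', 'no']
```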
There are many other ways of ensembling models together. An important and powerful one is called stacking, which applies another machine learning model, called a “generalizer” or “level 1” model, to the predictions of the base models themselves. This can work better than the voting method because you’re letting the level 1 model learn which level 0 models to believe more than others, based on the training data you feed into the system, instead of arbitrarily saying “the majority rules”.
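Here is a toy sketch of stacking in plain Python. To keep the level 1 model transparent, it is a simple lookup table that learns, for each combination of level 0 predictions, which true label was most common. That choice is mine for illustration; logistic regression is a common level 1 model in practice. The predictions are made up, and in a real setup the level 1 model is trained on out-of-fold or held-out predictions from the base models.

```python
from collections import Counter, defaultdict


def majority_vote(combo):
    """Baseline ensemble: most common prediction wins."""
    return Counter(combo).most_common(1)[0][0]


def fit_lookup_stacker(base_preds, labels):
    """Level 1 model: for each combination of level 0 predictions, predict the
    true label seen most often with that combination in held-out data."""
    seen = defaultdict(Counter)
    for combo, label in zip(base_preds, labels):
        seen[combo][label] += 1
    fallback = Counter(labels).most_common(1)[0][0]  # for unseen combinations
    table = {combo: counts.most_common(1)[0][0] for combo, counts in seen.items()}
    return lambda combo: table.get(combo, fallback)


def accuracy(predict, base_preds, labels):
    """Fraction of rows where the ensemble's prediction matches the true label."""
    return sum(predict(c) == t for c, t in zip(base_preds, labels)) / len(labels)


# Hypothetical held-out predictions (1 = default) from models A, B, C.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 1, 1, 1, 0, 0, 0, 0]  # reliable
model_b = [0, 0, 1, 0, 1, 1, 0, 1]  # noisy
model_c = [0, 1, 0, 0, 1, 0, 1, 1]  # noisy
base_preds = list(zip(model_a, model_b, model_c))

stacker = fit_lookup_stacker(base_preds, labels)
print("majority vote accuracy:", accuracy(majority_vote, base_preds, labels))
print("stacked accuracy:", accuracy(stacker, base_preds, labels))
```

The lookup table learns to side with model A even when B and C outvote it. The stacked accuracy here is measured on the same data the table was fit on, so it is optimistic; a fair comparison would evaluate on separate data.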
A high level flow chart of how stacking works.
Ensembling is a key technique in machine learning for improving predictive performance. Why does it work? We all have an intuitive understanding of why it should work, because it’s a decision-making framework we have probably all used, or been a part of, before. Different people know different things, and so may make different decisions given a particular problem. When we combine their decisions in some way, like a majority vote in Congress or at the company we work at, we “diversify” away the potential biases and randomness that come from following just one decision maker. Then, if you add a mechanism that learns which decision makers’ decisions should be weighted more than others based on past performance, the system can become even more predictive. What areas could benefit from this improved, performance-based decision-making process?*
*Proprietary trading companies, where every trade is a data point and data is therefore generated very frequently, do a version of this more intelligent ensembling by allocating more money to traders who’ve performed better than others historically. A trader who is maybe only slightly profitable but makes uncorrelated trades (for example, by trading in another asset class) will still be given a decently sized allocation, because his trades hedge other traders’ trades, thus improving the overall performance of the prop trading company. Analogously, in machine learning, ensembling models that make uncorrelated predictions improves overall predictive performance.
Here are some resources related to the topics above that were recommended to me and that I found most useful. I hope they’re helpful to you too.
A good overview of the principles of data science and machine learning for non-technical and technical folk alike: Data Science for Business
An important thing for a data scientist to have before any of the stuff above is a good understanding of statistics. The Elements of Statistical Learning is a detailed survey of the statistical underpinnings of machine learning.