“Let’s order Thai.”
“Great, what’s your go-to dish?”
This has bugged me for years and is the genesis of this project.
People need to know they have other choices aside from Pad Thai. Pad Thai is one of 53 individual dishes, and stopping there risks missing out on at least 201 shared Thai dishes (source: Wikipedia).
This project is an opportunity to build a data set of Thai dishes by scraping tables off Wikipedia. We will use Python for web scraping and R for visualization. …
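As a rough illustration of the table-scraping step, here is a minimal sketch that pulls rows out of an HTML table using only Python's standard library. The `TableParser` class and the sample HTML are hypothetical placeholders, not the code or data used in the project; a real run would first download the Wikipedia page and would likely need to handle nested markup.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, if any
        self._cell = None   # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Placeholder HTML standing in for a downloaded page (not real Wikipedia content).
SAMPLE_HTML = """
<table>
  <tr><th>Thai name</th><th>Description</th></tr>
  <tr><td>Dish A</td><td>Placeholder description</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
header, *body = parser.rows  # first row holds the column names
print(header)
print(body)
```

Keeping the parser this small makes the mechanics visible: a row opens on `tr`, cells open on `td` or `th`, and everything else is ignored.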
Build a project portfolio.
Arguably the most pervasive advice in data science.
Listening to the excellent Build a Career in Data Science Podcast, I was surprised to learn few people heed this advice.
A portfolio showcases your interests, skills and abilities to reason about data. It can convince a hiring manager to give you a chance, and it’s also effective for learning and joining a community.
So why do few take this advice?
1. Peak performance happens when you’re stretched beyond your comfort zone, but not too much into the panic zone (see Yerkes-Dodson). …
“Let’s order Thai.”
“Great, what’s your go-to dish?”
This has bugged me for years.
Pad Thai shouldn’t be your first choice of Thai food.
Like turkey on Thanksgiving, most Pad Thai is overrated. Instead of a bang, it’s a whimper.
There, I said it.
Pad Thai was created in the 1930s to cultivate a sense of nationalism and combat rice shortages by promoting a noodle dish. Through a stroke of marketing genius, “Thai” found its way into the name.
So, what’s the alternative?
What you actually want is Kua Gai (คั่วไก่). Unlike Pad Thai, this stir fry dish…
Data suggests student debt bites twice.
First, stalling wealth creation.
Second, if it prevents people from finishing college, this further sets back wealth creation.
Previously, I examined differences in college degree attainment between White, Black and Hispanic Americans.
The Widening Gap led to a hypothesis:
Wealth inequality is positively related to the widening gap in college degree attainment among the three groups.
Data on Families with Student Loan Debt allows us to indirectly support or contradict our hypothesis.
Here are the results.
African American families are shouldering more student debt over the years than Hispanic or White families.
In light of recent euphoria, here’s a compelling bear case.
I’m bullish on Bitcoin and Ethereum.
And any technology to redistribute power, resist censorship and preserve privacy.
In light of the current crypto euphoria, I’d like to entertain the best bear case I’ve heard. Paraphrasing Demetri Kofinas, host of Hidden Forces:
The U.S. dollar’s legitimacy comes from the government’s ability to wield force and violence. Men with guns can demand your private keys.
When a gun is pointed at our face, will cryptography save us? I’d add, this is true for any nation state.
Demetri’s point is well taken.
Using R and Python to visualize the relationship between Market Cap and Hourly Cost to Attack
In this post, I use Python and R to access, parse, manipulate, then visualize data from Crypto51.app to show the strong relationship between Market Capitalization and Cost to Attack among public crypto networks.
The more a network is thought to be worth, the more expensive it is to attack. An important, but often overlooked reason to celebrate price gains.
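To make the access-and-parse step concrete, here is a minimal sketch of turning a JSON response into flat rows ready for plotting. The field names (`coins`, `market_cap`, `attack_hourly_cost`) and the sample values are assumptions made up for illustration; they are not the actual schema or data returned by Crypto51.app.

```python
import json

# Hypothetical payload: field names and numbers are placeholders,
# not the real Crypto51.app response.
SAMPLE_JSON = """
{"coins": [
  {"name": "CoinB", "market_cap": 250000, "attack_hourly_cost": 90},
  {"name": "CoinA", "market_cap": 1000000, "attack_hourly_cost": 500}
]}
"""


def parse_coins(raw):
    """Convert a JSON string into a list of flat row dictionaries."""
    payload = json.loads(raw)
    rows = []
    for coin in payload["coins"]:
        rows.append({
            "name": coin["name"],
            "market_cap": float(coin["market_cap"]),
            "attack_hourly_cost": float(coin["attack_hourly_cost"]),
        })
    return rows


rows = parse_coins(SAMPLE_JSON)

# Order networks from largest to smallest market capitalization.
rows_sorted = sorted(rows, key=lambda r: r["market_cap"], reverse=True)
for row in rows_sorted:
    print(row["name"], row["market_cap"], row["attack_hourly_cost"])
```

Flat rows like these can be written to CSV or handed to a data frame, which is a convenient shape for plotting one column against another.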
In this post, I query an API endpoint set up at Crypto51.app to get JSON data. Then, I use Python to parse and convert to…
NLP is a subfield of linguistics, computer science and artificial intelligence (wiki), and you could spend years studying it.
However, I wanted a quick dive to get an intuition for how NLP works, and we’ll do that via sentiment analysis, categorizing text by its polarity.
We can’t help but feel motivated to see insights about our own social media posts, so we’ll turn to a well-known platform.
To find out, I downloaded 14 years of posts to apply text and sentiment analysis. We’ll use Python to read and parse JSON data from Facebook.
We’ll perform tasks such as tokenization…
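For intuition, tokenization and polarity scoring can be sketched in a few lines. This is a toy example under stated assumptions: the word lists are made up for illustration, and `tokenize` and `polarity` are hypothetical helpers, not the functions used in the post. A real analysis would lean on an established sentiment lexicon or library.

```python
import re

# Toy lexicon, invented for illustration only.
POSITIVE = {"love", "great", "happy"}
NEGATIVE = {"hate", "awful", "sad"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def polarity(text):
    """Label text by counting positive versus negative tokens."""
    tokens = tokenize(text)
    positives = sum(1 for t in tokens if t in POSITIVE)
    negatives = sum(1 for t in tokens if t in NEGATIVE)
    score = positives - negatives
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(tokenize("I LOVE Thai food!"))
print(polarity("I love this, great day"))
```

Counting words is crude (it ignores negation and context), but it shows the basic pipeline: split text into tokens, look each token up, then aggregate into a label.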
This post uses various R libraries and functions to help you explore your Twitter Analytics Data. The first thing to do is download data from analytics.twitter.com. The assumption here is that you’re already a Twitter user and have been using it for at least 6 months.
Once there, you’ll click on the Tweets tab, which should bring you to your Tweet activity with the option to Export data:
In this post, we’ll explore Gradient Descent from the ground up starting conceptually, then using code to build up our intuition brick by brick.
While this post is part of an ongoing series where I document my progress through Data Science from Scratch by Joel Grus, for this post I am drawing on external sources, including Aurélien Géron’s Hands-On Machine Learning, to provide context for why and when gradient descent is used.
We’ll also be using external libraries such as numpy, which are generally avoided in Data Science from Scratch, to help highlight concepts.
While the book introduces gradient…
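As a conceptual starting point, here is a minimal sketch of gradient descent on a one-variable function, f(x) = (x - 3)^2, whose minimum sits at x = 3. The function, learning rate and step count are assumptions chosen for illustration, and the sketch uses plain Python rather than numpy to stay dependency-free; it is not the code from the post or the book.

```python
def gradient(x):
    """Derivative of f(x) = (x - 3) ** 2."""
    return 2 * (x - 3)


def gradient_descent(start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


minimum = gradient_descent(start=10.0)
print(minimum)  # very close to 3, the true minimum
```

Each step shrinks the distance to the minimum by a constant factor here (0.8 with this learning rate), which is why a modest number of steps is enough. A learning rate that is too large would overshoot instead of converging.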
The following code can be run in RMarkdown or an R script. For interactive visuals, you’ll want to use RMarkdown.
The two main libraries are … (dplyr, so you can just load that if you want) and sunburstR. There are other packages for sunburst plots, including plotly and ggsunburst (of ggplot), but we’ll explore sunburstR in this post.
The data is from week 50 of TidyTuesday…