Build a project portfolio.
Arguably the most pervasive advice in data science.
Listening to the excellent Build a Career in Data Science Podcast, I was surprised to learn few people heed this advice.
A portfolio showcases your interests, skills and abilities to reason about data. It can convince a hiring manager to give you a chance, and it’s also effective for learning and joining a community.
So why do few take this advice?
Some theories:
1. Peak performance happens when you’re stretched beyond your comfort zone, but not so far that you land in the panic zone (see the Yerkes-Dodson law). …
“Let’s order Thai.”
“Great, what’s your go-to dish?”
“Pad Thai.”
This has bugged me for years.
Pad Thai shouldn’t be your first choice of Thai food.
Like turkey on Thanksgiving, most Pad Thai is overrated. Instead of a bang, it’s a whimper.
There, I said it.
Pad Thai was created in the 1930s to cultivate a sense of nationalism and combat rice shortages by promoting a noodle dish. Through a stroke of marketing genius, “Thai” found its way into the name.
So, what’s the alternative?
What you actually want is Kua Gai (คั่วไก่). Unlike Pad Thai, this stir fry dish…
Data suggests student debt bites twice.
First, it stalls wealth creation.
Second, if it prevents people from finishing college, it sets wealth creation back even further.
Previously, I examined differences in college degree attainment between White, Black and Hispanic Americans.
The Widening Gap [1] led to a hypothesis:
Wealth inequality is positively related to the widening gap in college degree attainment among the three groups.
Data on Families with Student Loan Debt allows us to indirectly support or contradict our hypothesis.
Here are the results.
African American families have shouldered more student debt over the years than Hispanic or White families [2].
In light of recent euphoria, here’s a compelling bear case.
I’m bullish on Bitcoin and Ethereum.
And any technology to redistribute power, resist censorship and preserve privacy.
In light of the current crypto euphoria, I’d like to entertain the best bear case I’ve heard. Paraphrasing Demetri Kofinas, host of Hidden Forces:
The U.S. dollar’s legitimacy comes from the government’s ability to level force and violence. Men with guns can demand your private keys.
When a gun is pointed at our face, will cryptography save us? I’d add that this is true of any nation-state.
Demetri’s point is well taken.
I don’t…
Using R and Python to visualize the relationship between Market Cap and Hourly Cost to Attack
In this post, I use Python and R to access, parse, manipulate, then visualize data from Crypto51.app to show the strong relationship between Market Capitalization and Cost to Attack among public crypto networks.
The more a network is thought to be worth, the more expensive it is to attack. That is an important but often overlooked reason to celebrate price gains.
First, I query an API endpoint set up at Crypto51.app to get JSON data. Then, I use Python to parse it and convert it to a dataframe.
…
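The parse-then-relate step can be sketched with the standard library alone. This is a minimal sketch under loud assumptions: the payload shape (a `coins` list with `name`, `market_cap_usd` and `attack_hourly_cost_usd` keys) is hypothetical and not Crypto51.app’s documented response, the numbers are made-up placeholders rather than real figures, and `parse_rows`/`pearson` are illustrative helper names. Because both quantities span orders of magnitude, the correlation is computed on a log10 scale.

```python
import json
import math

# HYPOTHETICAL payload shape with made-up placeholder numbers; the real
# Crypto51.app response may use different keys, so adjust them to match it.
SAMPLE_JSON = """
{"coins": [
  {"name": "COIN_A", "market_cap_usd": 1000.0, "attack_hourly_cost_usd": 10.0},
  {"name": "COIN_B", "market_cap_usd": 100.0, "attack_hourly_cost_usd": 1.0},
  {"name": "COIN_C", "market_cap_usd": 10.0, "attack_hourly_cost_usd": 0.2}
]}
"""


def parse_rows(raw):
    """Flatten the JSON payload into (name, market_cap, hourly_cost) tuples."""
    payload = json.loads(raw)
    return [
        (c["name"], float(c["market_cap_usd"]), float(c["attack_hourly_cost_usd"]))
        for c in payload["coins"]
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rows = parse_rows(SAMPLE_JSON)
# Compare on a log10 scale because both columns span orders of magnitude.
log_caps = [math.log10(cap) for _, cap, _ in rows]
log_costs = [math.log10(cost) for _, _, cost in rows]
print(round(pearson(log_caps, log_costs), 3))
```

If pandas is installed, `pandas.DataFrame(rows, columns=["name", "market_cap", "hourly_cost"])` turns the same tuples into a dataframe for plotting.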
NLP is a subfield of linguistics, computer science and artificial intelligence (wiki), and you could spend years studying it.
However, I wanted a quick dive to get an intuition for how NLP works, so we’ll do that via sentiment analysis: categorizing text by its polarity.
We can’t help but feel motivated to see insights about our own social media posts, so we’ll turn to a well-known platform.
To find out, I downloaded 14 years of posts to apply text and sentiment analysis. We’ll use Python to read and parse JSON data from Facebook.
We’ll perform tasks such as tokenization…
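To make tokenization and polarity concrete, here is a minimal lexicon-based sketch. It is not the method used in the post: the six-word `LEXICON` is a hypothetical toy, and real analyses usually rely on a curated lexicon (such as VADER) or a trained model. Each post is lowercased, split into word tokens with a regular expression, and labeled by the sign of its summed token scores.

```python
import re

# HYPOTHETICAL toy polarity lexicon, for illustration only.
LEXICON = {"love": 1, "great": 1, "happy": 1, "hate": -1, "awful": -1, "sad": -1}


def tokenize(text):
    """Lowercase the text and split it into tokens of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def polarity(text):
    """Sum the lexicon score of every token and map the total to a label."""
    score = sum(LEXICON.get(tok, 0) for tok in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for post in ["I love this great city!", "What an awful, sad day.", "Heading to work."]:
    print(post, "->", polarity(post))
```

Summing unweighted scores ignores negation ("not great") and intensity, which is exactly the gap that lexicons like VADER try to close.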
This post uses various R libraries and functions to help you explore your Twitter Analytics data. The first thing to do is download your data from analytics.twitter.com. The assumption here is that you’re already a Twitter user and have been using it for at least six months.
Once there, click on the Tweets tab, which should bring you to your Tweet activity with the option to Export data.
In this post, we’ll explore Gradient Descent from the ground up, starting conceptually and then using code to build up our intuition brick by brick.
While this post is part of an ongoing series where I document my progress through Data Science from Scratch by Joel Grus, for this post I am drawing on external sources, including Aurélien Géron’s Hands-On Machine Learning, to provide context for why and when gradient descent is used.
We’ll also use external libraries such as numpy, which are generally avoided in Data Science from Scratch, to help highlight concepts.
While the book introduces gradient…
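The core loop can be sketched in plain Python. This is a minimal illustration, not code from the book or the post: the helper names are hypothetical, the target function is the sum of squares f(v) = Σ v_i² (minimized at the zero vector), and its gradient is computed analytically as 2·v_i. Each iteration steps against the gradient, scaled by a learning rate.

```python
from typing import List

Vector = List[float]


def sum_of_squares(v: Vector) -> float:
    """f(v) = sum of v_i squared; its minimum is 0 at the zero vector."""
    return sum(x * x for x in v)


def sum_of_squares_gradient(v: Vector) -> Vector:
    """Analytic gradient of sum_of_squares: df/dv_i = 2 * v_i."""
    return [2 * x for x in v]


def gradient_step(v: Vector, gradient: Vector, step_size: float) -> Vector:
    """Move step_size * gradient away from v (a negative step_size descends)."""
    return [x + step_size * g for x, g in zip(v, gradient)]


def gradient_descent(start: Vector, learning_rate: float = 0.1, epochs: int = 200) -> Vector:
    """Repeatedly step against the gradient, starting from `start`."""
    v = start
    for _ in range(epochs):
        grad = sum_of_squares_gradient(v)
        v = gradient_step(v, grad, -learning_rate)
    return v


v_min = gradient_descent([3.0, -4.0, 5.0])
print(v_min, sum_of_squares(v_min))
```

For this function each update multiplies every coordinate by (1 − 2 × learning_rate), so a learning rate of 0.1 shrinks v by a factor of 0.8 per step, while any value above 1.0 makes that factor exceed 1 in magnitude and the iterates diverge. This is a compact way to see why the learning rate matters.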
This is a quick walkthrough of using the sunburstR package to create sunburst plots in R. The original document is written in RMarkdown, which combines markdown text with executable R code chunks.
The following code can be run in RMarkdown or an R script. For interactive visuals, you’ll want to use RMarkdown.
The two main libraries are tidyverse (mostly dplyr, so you can load just that if you prefer) and sunburstR. Other packages for sunburst plots include plotly and ggsunburst (built on ggplot2), but we’ll explore sunburstR in this post.
library(tidyverse)
library(sunburstR)
The data is from week 50 of TidyTuesday…
This is a continuation of my progress through Data Science from Scratch by Joel Grus. We’ll use a classic coin-flipping example in this post because it is simple to illustrate with both concept and code. The goal of this post is to connect the dots between several concepts, including the Central Limit Theorem, hypothesis testing, p-values and confidence intervals, using Python to build our intuition.
Terms like “null” and “alternative” hypothesis are used quite frequently, so let’s set some context. The “null” is the default position. The “alternative”, alt for short, is something we’re comparing to the default (null).
The…
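The coin-flipping setup can be sketched with the normal approximation to the binomial, which is where the Central Limit Theorem enters. This is a minimal sketch with hypothetical helper names: the null hypothesis is a fair coin (p = 0.5) flipped 1,000 times, the 530-heads observation is an illustrative assumption rather than data from the post, and 529.5 applies a continuity correction. The standard normal CDF is built from `math.erf`.

```python
import math


def normal_approximation_to_binomial(n: int, p: float):
    """Mean and standard deviation of a Binomial(n, p) count (CLT approximation)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return mu, sigma


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """CDF of a normal distribution, expressed with the error function."""
    return (1 + math.erf((x - mu) / (sigma * math.sqrt(2)))) / 2


def two_sided_p_value(x: float, mu: float, sigma: float) -> float:
    """Probability, under the null, of a result at least as extreme as x in either direction."""
    if x >= mu:
        tail = 1 - normal_cdf(x, mu, sigma)
    else:
        tail = normal_cdf(x, mu, sigma)
    return 2 * tail


# Null hypothesis: the coin is fair (p = 0.5); flip it 1000 times.
mu_0, sigma_0 = normal_approximation_to_binomial(1000, 0.5)

# HYPOTHETICAL observation: 530 heads (529.5 is the continuity-corrected cutoff).
p_val = two_sided_p_value(529.5, mu_0, sigma_0)
print(round(p_val, 3))

# Approximate 95% confidence interval for p around the observed proportion.
p_hat = 530 / 1000
se = math.sqrt(p_hat * (1 - p_hat) / 1000)
ci_low, ci_high = p_hat - 1.96 * se, p_hat + 1.96 * se
print(ci_low, ci_high)
```

Here the p-value lands just above 0.05, so at the 5% level we would not reject the null, and consistently the 95% interval still contains 0.5. The two views answer the same question from opposite directions.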
Data-Informed People Decisions