In a recent post, I displayed the social network graph that I created using the Twitter API and Plotly. There are a number of interesting applications here. Given my history in education, one application I think shouldn't be overlooked is as an engaging way for an innovative teacher or school to teach graph theory. I taught graph theory myself for several years as part of a discrete mathematics course. While the textbook I used included many examples of "real world" problems that I found engaging, the students didn't always agree.
Using the Twitter API and Plotly with Python, I created a visualization of a recent #EdTechChat on Twitter, held on December 14. If you aren’t familiar with graph theory, the dots in this visualization are referred to as nodes or vertices. They represent the Twitter users that participated in the chat. The line segments connecting them are called edges and represent a relationship between two Twitter users: one user follows the other.
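To make the node and edge vocabulary concrete, here is a minimal sketch in plain Python (no Twitter API or Plotly involved). The usernames and follow relationships are hypothetical, made up for illustration. It also assumes that, as in a plot drawn with simple line segments, edges are undirected: a mutual follow collapses into a single edge.

```python
from collections import defaultdict

# Hypothetical chat participants and "follows" relationships (made up for illustration).
follows = [
    ("alice", "bob"),    # alice follows bob
    ("bob", "alice"),    # bob follows alice back
    ("carol", "alice"),
    ("dave", "carol"),
]

def build_graph(follow_pairs):
    """Return an undirected adjacency map: user -> set of connected users.

    An edge joins two users if either follows the other, matching a
    drawing where edges are plain line segments without arrows.
    """
    graph = defaultdict(set)
    for follower, followed in follow_pairs:
        graph[follower].add(followed)
        graph[followed].add(follower)
    return dict(graph)

graph = build_graph(follows)
nodes = sorted(graph)  # each user is a node (vertex)
# Each undirected edge appears twice in the adjacency map, so halve the total.
edge_count = sum(len(neighbors) for neighbors in graph.values()) // 2

print("Nodes:", nodes)
print("Edges:", edge_count)
for user in nodes:
    # A node's degree is the number of edges touching it.
    print(user, "degree:", len(graph[user]))
```

Here the four users become four nodes, and the four follow relationships produce three edges, because alice and bob follow each other.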
Because I just couldn't get enough of the new Machine Learning Specialization from the University of Washington, I decided to fill my schedule to the brim with another Coursera class, Social and Economic Networks: Models and Analysis, from Stanford University. I took a graph theory course at the University of Illinois while getting my master's degree around the dawn of the new millennium, which, among many other topics, covered Euler circuits, Hamiltonian paths, and graph coloring.
After completing the Data Science Specialization from Johns Hopkins in 2014, my MOOC studies in 2015 have been fairly sporadic, partly as a result of starting a new job, and partly as a result of not seeing something that seemed like the right fit. That's no longer the case, as I've recently jumped into a new specialization, the Machine Learning Specialization from the University of Washington. As great an experience as I had with the JHU specialization, this new specialization checks a couple of continuing education boxes for me that I felt the JHU specialization left unchecked.
Not too long ago, I wrote my first post on Apache Spark, a Spark dataframes tutorial. I've continued to experiment with Spark since taking my first tentative steps with it just a few months ago. One of the challenges with Spark is that it has a reputation for being difficult to deploy at scale. Stepping in to try to solve that problem is Databricks, which lets companies deploy an optimized version of Spark in the cloud, with some very nice extra bells and whistles.
My journey into data science is taking me all sorts of interesting places that I didn't originally expect. That's what I love about it. While I can feel myself accelerating into the learning curve, there's no shortage of new things to learn, and there won't be for years to come. One of the latest has been setting up a "dual boot" environment on my new Dell PC to run both Windows 10 and Ubuntu Linux.
NOTE: I have created an updated version of my Python Spark Dataframes tutorial that is based on Spark 2.1 and uses an easier, updated Spark ML API. I encourage readers to check out that version instead of this older post. A couple of months ago, I got my first experience with Apache Spark. While I am just starting to use it to implement meaningful problems, in my experience when working with a new tool or technology, just getting one's feet wet can be crucial to getting a learning snowball rolling.
I just received my certificate from Stanford's Statistical Learning course, taught by the legendary Trevor Hastie and Rob Tibshirani. This was the first MOOC I've completed since making the jump from education to the corporate world, and I did find it challenging to keep up with the material, even though this class required quite a bit less work per week than most of the courses in the Johns Hopkins Data Science Specialization on Coursera.
One of my favorite learning methods is via podcasts. They allow me to multitask (exercising, driving, or doing chores) while listening to experts on a particular topic. Some of the podcasts I listen to are purely for entertainment (think Serial or StartUp), but many others are for educational purposes. As I've been trying to build up my data science awareness in a variety of areas, I've been putting together a list of podcasts specific to data science.
If you've looked into MOOCs (Massive Open Online Courses) at all, you have probably wondered how successful students are at completing them compared to traditional courses. The short answer? Not very. I've seen various numbers floating around in a variety of studies, citing completion rates as low as 4% and as high as 8%, but I have never seen an aggregate number over 10%. People take MOOCs for a variety of reasons.