It’s been a couple of weeks since Johns Hopkins issued final certificates for their Data Science Specialization on Coursera. I’m glad to say that I am now among the first crop of “alums” of the program. According to the last email we students received from our Johns Hopkins professors, about 2.3 million students have attempted at least one of the courses in the Data Science Specialization. Of those, 68,000 verified certificates were issued for completing a single course.
Overview of the Data Science Capstone Project and Approach
The Johns Hopkins Data Science Capstone project concluded around Christmas last month. It was an interesting experience, and very different from the other classes. The project, a partnership with smartphone app maker SwiftKey, required students to create a predictive text web app that worked much like a smartphone keyboard. I spent much of the almost 2 months of the project getting up to speed on the basic terminology and approaches of Natural Language Processing, a field dedicated to the interaction between computers and human languages.
A process that began 4 months ago, the sequence of 9 Johns Hopkins Data Science Specialization courses on Coursera, wrapped up for me late last week with my last quiz in course 9, Developing Data Products. While I haven’t truly finished the specialization yet (the first-ever capstone project doesn’t launch until late October), I still feel a sense of accomplishment. According to our JHU professors, as of early August, over 800,000 students have attempted at least one course in the sequence.
The ninth and final course prior to the capstone in the Johns Hopkins Data Science Specialization on Coursera is Developing Data Products. This is the third and final course in the sequence taught by Brian Caffo. After taking the lead on two statistics courses, Statistical Inference and Regression Models, this class seemed to bring out a more humorous side in Caffo. On a couple of occasions, including the very first video, he had a bit of fun at his co-instructors’ expense with Go Animate videos.
The eighth course in the Johns Hopkins Data Science Specialization on Coursera is Practical Machine Learning. This is the third and final course in the sequence taught by Jeff Leek. Probably more than any other course in the JHU series of classes, this is the one that feels like it brought the whole sequence together. Students of Practical Machine Learning need the skills developed throughout the rest of the sequence to be successful in this course, from basic R Programming (course 2) through Regression Models (course 7).
The seventh course in the Johns Hopkins Data Science Specialization on Coursera is Regression Models. This is the second course in the sequence taught by Brian Caffo, after Statistical Inference. Much like that course, the emphasis here is on mathematics, and people who have been out of the mathematical loop for a while will probably find this class to be a struggle. In fact, after breezing through most of Statistical Inference, I found significant portions of this class to be more challenging.
The sixth course in the Johns Hopkins Data Science Specialization on Coursera is Statistical Inference. This is the first course in the specialization taught by Brian Caffo. In my review of the R Programming course, I mentioned that there were two places in the sequence that seemed (based solely on my observations of forum comments) to be bogging students down. R Programming was obviously the first. Statistical Inference is the second.
The fifth course in the Johns Hopkins Data Science Specialization on Coursera is Reproducible Research. This is the third and final course in the sequence taught by Roger Peng. Of the first five courses in the specialization (other than The Data Scientist’s Toolbox), Reproducible Research is the one where I spent the least time learning new R code. Instead, the emphasis of this course was more philosophical in nature: writing up your research findings so that they can be shared with others and considered reproducible, though not necessarily replicable.
The fourth course in the Johns Hopkins Data Science Specialization on Coursera is Exploratory Data Analysis. This is the second class in the sequence taught by Roger Peng, after R Programming. This course could just about as well be titled “Visualizing Data,” since almost everything in the class emphasized methods of presenting data visually in R. The bulk of the time in the class was spent on the 3 most popular methods of graphing in R: the base plotting system, lattice, and ggplot2.