Wow, I finally managed to finish the specialization!! I definitely learned a lot and also discovered how difficult it is to build predictors while trying to balance speed, accuracy and memory constraints!!!
by Marcio G•
The whole specialization is a bit of a mixed bag... Many of the courses rely too heavily on teaching R programming and not sufficiently on data science concepts (such as statistics or machine learning). The instructors (especially Peng) spent way too much time detailing R syntax that could have been picked up by the students on their own from other resources available on the web...
The regression models and statistical inference courses are exceptions though: Together with the machine learning course, these are probably the most useful from the whole specialization.
The materials in this capstone project are way sloppier than materials in other courses by the way. They lack structure and feel confusing. I'm not even sure if the instructors tried to implement the proposed project themselves to have a base of reference. Feels like they were already growing tired of the whole thing and put the capstone project together in a hurry without much thought or care.
The theme of the project is indeed interesting (text-mining and NLP), but I think it would have been more productive for me to take an NLP course instead. You are going to use very little of what you have learned from the other courses in the specialization (for the most part the data product course) and you will need to learn text-mining and NLP from scratch on your own to complete the capstone (no videos or materials are available in the course on these subjects).
Also, if I was going to implement the same app on my own these days, I would probably use RNNs, not Katz Back-off and Markov Transition Matrices as in the capstone and I would probably use SparkR. Heck, I might not even use R, probably Scala or Python with Spark instead. In short, data science moves fast and this course already feels very outdated...
The instructors seem quite experienced in statistical analysis, so it's a shame that they decided to focus so heavily on R programming instead... That would have made the specialization more resilient to technological innovations in the field...
The specialization surely could be improved and these issues corrected, but all courses seem pretty much abandoned by the instructors. Most of the courses still have active "mentors" (volunteers not associated with Coursera or Johns Hopkins), but the "mentors" seem to have lost contact with the instructors: For example, a couple of assignments require data that is no longer available (dead links) and "mentors" have provided this data in the discussion fora. I reckon that if "mentors" could contact instructors, the dead links would have been fixed in the materials by now...
The peer-grading doesn't work so well... Most of the submissions I graded were painful to review (extremely low quality). Not surprisingly, the graders were also pretty low-skilled. They can't even understand the requirements (and I suspect not even the English language) and they will take points from correct submissions.
I urge any employers to look at the actual code for this capstone from candidates given the general incompetence and poor skills of the students I graded. The grading criteria are pretty relaxed, so even though I would have liked to fail them, I still had to give them a passing grade. Such weak grading criteria are detrimental to all people who actually have the skills and put hard work into their submissions. Many undeserving people will, unfortunately, pass and receive a certificate.
by Thej K•
I spent 80 hrs on this course. I hated so many things. 1. There was a lot of uncertainty in the course. For example, we didn't know how far to go with NLP. And I constantly came across forum posts where people were complaining that there was 0 guidance and they had no idea what to do. The saviours were those few people who put up help posts on the forum and shared their treacherous experience going down different paths. 3. The topic, NLP, was already hard enough, something I had no clue about, and then there was this additional problem all the fucking time about memory. Jesus! One of the most painful courses, primarily due to overload, lack of clear instructions and their refusal to edit one letter in the course in 5 years! Fuck them!
by Roberto G•
This class is challenging and a lot of people complained, so I'll tell you my approach, since I was able to complete it on the first try in my free time from my full-time job. Not having any knowledge of Natural Language Processing, I found YouTube videos and presentations from the Stanford class taught by Dan Jurafsky and Christopher Manning. Study it up to the explanation of n-grams; it should be enough for the class. I completed the first weeks in a few days so I had more time to actually build the model and the app (you'll need more than the scheduled weeks if you have no prior experience). I found valuable resources in the course forum. After that you're pretty much on your own: identify the best packages, learn how to use them, and look on Stack Overflow when you get stuck. Start with a very small set of data so you can quickly build the model and the app until you get something that works. After that you can improve the model by using more data, finding the balance between processing time, app response time and prediction accuracy. Everyone understands the limitations of the project, so give priority to speed rather than accuracy.
My overall evaluation of the project is a mixed bag. The positive is that it introduces you to a new topic (NLP) and the goal is reasonable: it takes a lot of effort but it's not impossible, and it forces you to learn something meaningful (something easier would not have made me learn something valuable). The negative is that there is no explanation whatsoever about NLP, which was never mentioned in the previous courses, so there's not much teaching or guidance. The involvement of SwiftKey is limited to providing the data.
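As an illustration of the "start small, then scale up" advice above, here is a minimal sketch in Python (the capstone itself is done in R; Python is used here only for illustration). It samples a fraction of the lines, counts n-grams, and predicts the next word by falling back from the longest matching context to shorter ones. The function names `sample_lines`, `build_model` and `predict`, the toy corpus, and the simple fall-through strategy are all assumptions made for this example, not the reviewer's actual code.

```python
import random
from collections import Counter, defaultdict

def sample_lines(lines, fraction, seed=42):
    """Keep roughly `fraction` of the lines so the model builds quickly."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]

def build_model(lines, max_n=3):
    """Map each context (a tuple of 0 to max_n-1 words) to a Counter of next words."""
    model = defaultdict(Counter)
    for line in lines:
        words = line.lower().split()
        for i, word in enumerate(words):
            for k in range(max_n):
                if i - k >= 0:
                    model[tuple(words[i - k:i])][word] += 1
    return model

def predict(model, text, max_n=3):
    """Try the longest context first, then fall back to shorter ones."""
    words = text.lower().split()
    for k in range(max_n - 1, -1, -1):
        context = tuple(words[len(words) - k:]) if k else ()
        if len(context) == k and context in model:
            return model[context].most_common(1)[0][0]
    return None

# Toy corpus (hypothetical); a real run would read the course text files.
corpus = ["i love data science", "i love r programming", "we love data"]
model = build_model(sample_lines(corpus, fraction=1.0))
print(predict(model, "they love"))  # "data": seen twice after "love"
print(predict(model, "zzz"))        # "love": the most frequent single word
```

Raising `fraction` and `max_n` trades build time and memory for accuracy, which is the balance the review describes.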
by Paul R•
The project topic itself is interesting, but the course is longer (structured as 7 weeks); there is not much guidance until you find the right threads from mentors in the discussion forum from a few years ago or repeatedly google Stack Overflow; it is much more technical than the rest of the course; and it doesn't really use much of what was learned during the meat of the specialization's statistics/regression/ML courses, other than data science principles and tools (though new R libraries were needed). These issues aside, the project was an interesting challenge to complete nonetheless. Overall this specialization is now a few years old, and the plethora of 4 and 5 star reviews across all courses seems generous and outdated. Materials are not being updated, forums are a mess of years-old threads with not much current activity; there is a feeling of waning interest and participation. This was clearly a cutting-edge course with cutting-edge material back in 2014-6; if JH/Coursera intend to continue offering it, the material needs some refresh and reordering, tougher grading rubrics (I saw a lot of inconsistent and poor-quality work which met the rubric criteria, alongside great quality work), and more active involvement from lecturers and mentors (and, please fix the typos).
by Jose A V C•
Very disappointed with this final course. Little to no support. Discussion Forum provides some level of help but you are basically on your own.
Very challenging to come up to speed with Natural Language Processing techniques if you have never taken any class about it.
My recommendation to JHU and Coursera is to add a separate course for NLP where you cover all the basics and then have the Capstone.
by Tony W•
In my opinion, this course is a waste of time, it simply throws a bunch of links and terminology for you to google and research. The project is interesting but once again, you have to do tons of research and take up other courses to fill the gaps (might as well do the other courses instead of this one).
I do not recommend this course or the specialization.
by E. C•
NLP is a totally different thing and should be a course by itself. I would prefer a large-scale machine learning capstone where we could build models, which would be a better fit for real-life situations! Through all the courses I worked hard only to reach an NLP capstone? This doesn't feel right! Please fix it!
by Piyush V•
On the Capstone Course, to those who are reading this review I would say: skip everything (the videos) and start writing code and building the app right away. Otherwise this course is unnecessarily stretched out; it could have been cut way shorter. I will tell you what I did: I skipped everything, got the gist of the objective, scanned through the code and worked on my idea.
I started the specialization in December of 2015 and I am ending it today, March of 2018. I remember struggling with R in the beginning (I was a novice programmer writing dirty code). Now I can't stop thinking about the plethora of data product opportunities surrounding me.
by Fulvio B•
It is nice to arrive at the end of the specialization, and I understand that, since this is the final step and has to demonstrate that you are able to put together more or less everything you were presented with during the journey, you have to be left a bit alone. However, I think this capstone is now outdated. It does not mention new packages that are now available and perform very well (e.g. tidytext), and some of the references mentioned in the "lessons" are no longer available at the URL given. I think at least these should be maintained.
by Noel T•
The Capstone did provide a true test of Data Analytics skills. It's like being left alone in a jungle to survive for a month. Either you succumb to nature or you come out alive with a smile and confidence.
by Marcos d S M•
In data science, two of the best specializations offered on the Coursera platform are from JHU and IBM. There are many comparative reviews to help choose the better option for a given applicant's profile. It's probably worth doing both. In any case, I endorse JHU's data science specialization: it is quite rich, goes deeper into topics that are little explored in regular undergraduate courses, and is aimed specifically at statistics and big data. It assumes that the student has a good foundation in higher-level mathematics and statistics and a good knowledge base in R. I say only a good base because R is quite vast, and the R programming classes start practically from scratch but progress at a fast pace. For some more specific topics, I had to supplement my knowledge with focused, targeted parallel courses, such as those on the DataCamp platform, which work well for getting unstuck on a particular subject. Stack Overflow's help and feedback also come in very handy at all levels of learning, which ties in with Professor Roger Peng's first lesson: knowing how and where to seek help to move forward. That is, the first major lesson in data science is the humility of being an astronaut in a virtually infinite universe that expands every day. This is the most fascinating thing about Data Science, Biostatistics and R: the themes never run out and become interlinked in the face of the phenomena that surround us daily. From there, the sequence of 10 courses represents a long road (not so long for some) of development, feedback, evaluations and model building in R. Undoubtedly an excellent specialization, which is worth the investment, especially of time. The final part of the specialization represents the last steps, but also the steepest of the journey.
In the latter, in particular, you are metaphorically confronted with yourself, with a feeling of having been blindfolded in the middle of a dense dark forest, and you now need to find your way back using what you have learned so far. For all specialization graduates it is a stage of relief, rather than celebration. The dropout rate from the specialization, starting from the first course, is very high. For a master's or doctorate, I believe the specialization is a great support for publishing studies and analyzing field data in order to reach sound conclusions from hypothesis tests. Upon completion of the specialization, JHU encourages the publication of the Certificate of Completion on LinkedIn and, with this, you receive an invitation from Professor Brian Caffo, after a brief verification of authenticity (around 1 week), to join a private Data Science group moderated by him, where there are excellent networking opportunities with other scientists, partnerships, job opportunities and project development.
by Carlos S•
I took this specialization a couple of months ago and did not comment as such. Now I turned around to remember some topics and started reading comments.
I found many comments that say the final project has nothing to do with the previous 9 courses and when I did it I thought the same.
Looking at it in perspective, I think the previous courses are absolutely necessary for the final project. The objective of carrying out a project with such characteristics is to apply the knowledge by oneself.
The first courses of programming in R, extraction and cleaning, and exploratory analysis are fundamental to understand the problem. In this case the cleaning has to do with the transformations using regular expressions and tokenization. The exploratory analysis should be done in any data science project, otherwise you may encounter surprises when implementing the models.
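To make the cleaning step concrete, the sketch below shows one plausible way to combine regular expressions and tokenization. It is written in Python for illustration only (the capstone uses R), and the function name `clean_and_tokenize`, the specific patterns, and the sample sentence are assumptions for this example rather than the reviewer's actual transformations.

```python
import re

def clean_and_tokenize(text):
    """Lowercase, strip URLs and non-letter characters, split into word tokens."""
    text = text.lower()
    # Drop URLs before removing punctuation, so they don't leave word fragments.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # Keep only letters, apostrophes and spaces; digits and punctuation become spaces.
    text = re.sub(r"[^a-z' ]+", " ", text)
    # Split on whitespace and trim stray apostrophes from token edges.
    return [tok.strip("'") for tok in text.split() if tok.strip("'")]

tokens = clean_and_tokenize("Check https://example.com NOW!! It's 100% free, isn't it?")
print(tokens)  # ['check', 'now', "it's", 'free', "isn't", 'it']
```

Keeping apostrophes inside words preserves contractions such as "isn't", which matters for next-word prediction; whether to do so is a design choice, not a requirement of the project.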
Statistical inference was necessary and closely linked to exploratory analysis, especially to select samples well and review distributions, since some machine learning methods may be affected by distributions. I must say that I did not see this when I took this course, but it was because of my lack of experience. Maybe there was a lack of guidance.
The algorithm I used was regression on the n-grams, for simplicity and because of the time available and the capacity of my computer, but it could have been combined with other methods such as neural networks or SVM.
Implementing the model in Shiny and then adjusting it because it was very heavy was also interesting.
As a summary, I really liked this specialization and although it was very hard and many times I did not know how to move forward (especially in the capstone), I think the challenge was important for my learning and I was very entertained.
by Jesse S•
Coursera lost my thoughtful 2-star review so I am replacing it with this. I learned a lot through my own efforts and through the efforts of students who bothered to post in the forums. The one mentor disappeared half-way through the course.
by Jerome C•
Capstone very challenging. Minimal instructions force the students to do a lot of research on the subject. But this is extremely rewarding. Doing a good job is possible (well, my grade is still pending at the time of this comment!) and makes students take a huge leap forward in data exploration, data cleaning, setting up a strategy for analysis and algorithm, making an Rpresentation, and creating an online app (by the way, I also created a small app for my company thanks to this training, especially the "Developing Data Products" course).
by Ken K•
This class provided a good background on the principles and process of Data Science and related research. The R material was very good and the assignments and capstone project will force you to become a good R programmer. The statistical analysis materials were also very thorough. Overall, the courses were well taught and the material was relatively easy to follow and learn.
by Fernando S e S•
Honestly, there is very little guidance for the project and it deals with a whole new type of data: text. That's when you find out that working with quantitative data, like all the previous courses, is easy. I got my ass kicked throughout 3 sessions in order to finish this thing. But you know what? Maybe that's how it should be for one to learn something.
by Ben S•
Great times! It took me almost four years to get through this!! I had a child, sold a house, went to graduate school in statistics and I'm about to graduate. The DSS classes gave me a lot of great tips for graduate school and really cool reports, apps, ideas to show off to potential employers. Just got to get that job now!!
by Francesco C•
In my opinion this last course is a great way to conclude the Data Science specialization, not only because it "forces" you to apply a lot of lessons learned during the other 10 courses, but also because it gives you the opportunity to understand how important it is to frame the problem well before trying to solve it.
by Ken W•
To tell you the truth, when I started this capstone, I felt like I was thrown into the deep end of the pool. You are asked to build an NLP app using Shiny and, unfortunately, most, if not all, of the concepts required to design and build the app are not covered in the earlier courses in the specialization. Can you say 'Google'? I would have liked to have seen the instructors walk through the relevant concepts required to successfully complete this project. The videos consist of Dr. Peng basically saying "Good luck!"...a little lame if you ask me.
by Pablo R•
I can't just finish this specialization without commenting on my personal experience with it. I have taken several courses related to data science; however, this specialization has set such a high standard of quality that I feel disappointed with most of the courses around the web.
I'm a technical guy who has most of his professional experience in academia. I know the process of creating new courses and of summarizing complex topics into understandable concepts, but the most important thing is to show the application of the knowledge to real-world projects. Yes, all of this is what I got from these courses.
Is the specialization worth it? Absolutely yes! The amount of knowledge that you will get is huge! As with everything in life, it will depend on how much effort you invest in it. There's a big part of the specialization that depends on you, and if you are thinking of becoming a Data Scientist, you had better get used to spending lots of time doing your own research.
Finally, I just wanna let you know that I have started my career as a data scientist and I had several interviews with many companies/clients that were amazed by the quality of the portfolio that I built using all the final projects of this specialization.
I'm really grateful to all the people involved in creating this specialization and I will absolutely recommend this to anyone who has decided to become a data scientist.
by Zoran K•
Overall this was an excellent track. While there was a difference in level of difficulty between the individual courses, it is probably unavoidable given the range of subject areas.
I think it would be a great improvement if there were an additional 'post-grad' 'course'-like few weeks to connect with the industry that is hiring from this background, with those connections leading the 'grads' into real job interviews. Also, more projects with a direct connection to the industry, like the capstone, where those projects would be done perhaps in some kind of cooperation with industry reps, so that a graduating student here has a direct path and has already worked with people that might hire him/her, and where the time spent working on the capstone project includes meeting with the industry reps who would have an interest in the work. Something along the lines of grants for university projects (not talking about money here), but as a connection to the needs of the industry. Students working on that, if they deliver good and interesting results, would have one foot in the new job. This would also allow for higher fees to be charged for the classes, since there would be a more tangible 'selling' path.
by Fiona E Y•
This course is unlike all the others. Although you will need information gained in the previous nine modules, the Capstone Project requires you to work on a long and difficult problem using your own initiative. Mentors, tutors and SwiftKey employees are lacking throughout this project.
I worked through many different R packages to generate the word prediction N-Grams because R has a tendency to run out of memory. Many students are forced to use a cut-down version of the three million lines of text because of memory issues, but I managed to find the proverbial needle in the R packages haystack that allowed me to use the entire dataset!
I had problems with publishing the presentation to RPubs - it just would not work using either RStudio or RConsole but at least I had a fall back position of placing the presentation on my own website.
It took me three attempts to complete this project, nine months (Jan-Sep 2016) and about 300 hours in total. I didn't give up, so nor should you. You can do it! And Good Luck! Hope to chat with you on the Data Science Specialism LinkedIn Group for Completers!
Finally, was it worth paying for all of the certificates? Yes, it was!
by MEKIE Y R K•
Really liked this course overall. I was able to get directly into data science alongside my job (quantitative analyst). This specialisation helped me make my way in quantitative finance with much more understanding of computing models and much more confidence in the way I will face (I am facing) data/algorithm issues. I really struggled with the last course (capstone); I even sometimes wanted to give up, as I went really deep into NLP and was facing issues with my memory.
Finally, I'm coming out with strength, a smile, confidence and a taste for hard work in data science projects.
Another really important point is to learn to be humble :) . This capstone project shows us well enough how much constant work it takes to be a data scientist.
Really glad to have completed all the courses; going from zero on R to near hero :)
by MUZAFFAR B H -•
Although this course was the most complicated part, it was a really good experience in applying our understanding and trying to develop a practical product. I really like the approach of producing a data product that is presentable to audiences other than data specialists. I will refer to the course content from time to time in the future. I would recommend the course set to my colleagues if they have an interest in data science.
by John H•
This course significantly challenged my skills in programming, probability, machine learning and applied mathematics (e.g. the equations of Katz's back-off theory). The collaboration in the discussion forums and the information online is absolutely critical and is the only way you can succeed in this project. I appreciate all the help from my classmates and from those who took the time to post helpful information online.
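For readers unfamiliar with the equations mentioned above, the Katz back-off estimate is usually written as follows. This is the standard textbook form, given here as background and not taken from the course materials. Here C(·) is an n-gram count, k is a count threshold (often 0), d is a discount factor (typically from Good-Turing estimation), and α is the back-off weight that makes the probabilities sum to one.

```latex
P_{bo}(w_i \mid w_{i-n+1}^{i-1}) =
\begin{cases}
  d_{w_{i-n+1}^{i}} \, \dfrac{C(w_{i-n+1}^{i})}{C(w_{i-n+1}^{i-1})}
    & \text{if } C(w_{i-n+1}^{i}) > k \\[2ex]
  \alpha_{w_{i-n+1}^{i-1}} \, P_{bo}(w_i \mid w_{i-n+2}^{i-1})
    & \text{otherwise}
\end{cases}
```

In words: when an n-gram has been seen often enough, use its discounted relative frequency; otherwise back off to the (n-1)-gram estimate, scaled by α.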