This course provides an introduction to some important concepts and tools for a very important aspect of data science: cleaning and organizing data before any analysis. A must for any data scientist.
Easy, mostly instructive course. The assignments and quizzes are quite good and illustrate the lessons very well.
Watch the videos for the general presentation, but spend your energy on the exercises.
by William G C•
This course is amazing! I have spent the majority of my time in R merely doing analytics. This course taught me the tools needed to go out and grab the data that I need for those analytics.
by Maria S•
I did not get anything out of this course. This course was pointless because it wasn't really a course, just a random scavenger hunt. If I wanted to wander around the Internet aimlessly trying to solve random problems by hacking away, I could have just done that on my own. I signed up for the class because I was looking for a structured way to learn the content and get in some exercises to practice & drill in the skills learned. This course is a waste of time -- if you are interested in learning R, go through some tutorials online. If you want to learn data science principles, try one of the other Data Science specializations. If you want to mimic this class but have more fun, pick some problems that you are interested in, find some data that could help you solve those problems, and try to clean that data.
by Narin P D•
The course is very helpful when it comes to exploring commonly used R packages and learning certain best practices involved in data cleaning. I'd definitely recommend it to any data science enthusiasts. One area with slight scope for improvement could be the final project. The instructions are quite open to interpretation, which means that the final grade which you get via peer review is always going to be debatable. Other than that, I have no complaints whatsoever :)
by Marc H•
It was an effective course, in which we were given the right amount of knowledge to know how to find information. R is a difficult language for me (I'm a C++/Java and Rails developer), but the projects increased my confidence and my ability to find the information I needed. That being said, the course needs to be updated: many of the links were 404s.
by Nima A•
A very useful course. The audio quality of some lectures (especially those by the main instructor) was not good. This course complements its sister course, R Programming, and the two work well together.
Great course. It requires a little programming background, though nothing specific.
by Alessandro V•
I found this course very useful for my learning needs; nevertheless, I have one remark. The time estimates provided for each section are quite inaccurate. For instance, 3h for a swirl exercise is really excessive (45 minutes is more realistic), but the main problem is underestimation! For the final assignment in particular, I spent more than 20h, and part of that time went into convincing myself that a negative standard deviation was acceptable for the assignment's goals. The estimate provided is 2h (<< 20h!!).
by cristian b•
This course really explains all the programming tools available to get and clean data from different sources. However, I feel it is missing some extra activities to consolidate the knowledge shared in the course.
by Raw N•
Would have preferred if there were programming assignments that incorporated reading from data sources on the web.
For those planning to take the course, note the following:
The course covers reading data from a myriad of sources, but largely in passing and in superficial detail. These sources include XML files, MySQL databases, HDF5 files, csv files, txt files with various formats (for example fixed-width files), JSON objects, and web APIs.
However, the course project only involves reading data from several txt files and combining them into a single R dataset.
Course topic order: In the first two weeks of the course, a lot of information is glossed over in passing; this information involves reading from the various file formats mentioned above. Week 3 involves subsetting, sorting, reshaping and merging data. Some of this may be review for you if you've taken the R Programming course or the "R Programming Environment" course in the "Mastering Software Development in R" specialization. Week 4 involves string manipulation, regular expressions and working with dates. A lot of this is covered in Roger Peng's ebooks "R Programming for Data Science" and "Mastering Software Development in R" (both are freely available; google them).
Assessments: The only assessments in the course are 4 quizzes, each of which involves about 5 short programming exercises, and a final project which only involves topics from weeks 3 and 4 (specifically: subsetting data, sorting data, reshaping data, and working with regular expressions). So you can do the course project without understanding anything covered in weeks 1 and 2 of the course.
Mentor David Hood is fantastic for providing valuable resources to aid you with each assessment and so is Xing Su for providing a complete set of course notes. USE THE DISCUSSION FORUMS IF YOU GET STUCK!
by Vladimir C•
Although the subject covered is important, and I learned something, I cannot recommend this course. The course is 7 years old and is badly in need of updating. The R language is very dynamic and rapidly evolving, and the course covers many packages and functions that are deprecated, retired or superseded by newer, more efficient tools. If this is meant to be an online course, it needs to stand the test of time or be updated regularly. Data sources used as examples were from webpages no longer available. There is no expectation that they will be after 7 years. A different approach is needed for an online course. I spent a significant amount of time troubleshooting outdated course material on user forums and searching the web. If you read user forums, you will see lots of frustrated people commenting on this. Unless the course has recently been updated, I recommend learning the material using a more up-to-date course.
by Pamela M•
I would have given just one star except the swirl() assignments are actually very good. The videos are just a (poorly) narrated glossary. Topics I learned in another course were presented here in such a way I actually got confused. Can you imagine? My knowledge was actually worsened, not improved, by this course. (!!) // If the swirl() functions were made the centerpiece of the course, and the videos were described as just a narrated glossary, at least our expectations would be in line with reality. // Even so, I come to Coursera because I WANT to be taught by an instructor. If I'd wanted a curated list of tutorials so I could teach myself, I would have done that already. Anyone who pays for this should get their money back. NOT recommended for beginners. // I'm going to complete it because I'm stubborn that way, but it is an unpleasant experience for me and everyone within earshot, as I have to vent my frustration often just to make it through. // After week 2 I resorted to just reading the pdf of the slides and stopped watching the videos. The videos added NOTHING to my understanding. More often than not they put me to sleep. And what's worse, the narrator mispronounces "attribute". There IS a difference. I atTRIbute certain ATtributes to native speakers who mispronounce important vocabulary.
by Liam C•
Weeks 1 and 2 are completely worthless. They're cursory 5-10m introductions to topics that show you HOW to start to do something, but don't explain any commands or what is going on; it's just instructions to follow. This leaves you completely unprepared to do any actual work. Then you get the assignments and you basically have to go learn everything independently. The course info is useless. I skipped these. When I want to do the type of work they cover, I'll watch some tutorials and read documentation to actually learn it. They need to focus in on one or two topics (e.g. APIs, MySQL) and actually teach you the basics of them. The lecture videos even use weird syntax without explanation (e.g. using = instead of <-, using par(), etc.).
Like the other courses in this specialization, you'll spend almost all of your time learning independently, and not using any of the materials provided. The discussion board is sometimes useful, but you can see how little work is done to improve the course there, as people point out errors and issues which are still outstanding months/years later.
by Md. Z M•
Pros: After putting in many hours of effort in understanding the problem statement and then actually solving it, the sense of achievement is fulfilling. I learnt a lot of skills in this course. Those skills are very important for understanding the data before starting the analyses, but are usually ignored when data science is taught to a beginner.
Cons: The course project is extraordinarily difficult and you won't get any help from the discussion forums as there are no live TAs. However, there are some threads that can help you understand the problem statement. So, sift through the thread dump to find the topics relevant to you.
The quality of the video lectures is very bad; many of the packages referenced in the lectures are outdated and require you to search for their alternatives on your own, which is helpful in the long run, but demands many hours of googling and reading through the documentation.
Overall, I would recommend this course for understanding the skills required in data cleaning.
by Alex F•
The content on downloading files needs to be explained much better. Including more practice with the different file types would have been great. It also needs a demonstration and lecture on what makes a good codebook and readme file. The content with dplyr was really well done though. For something so important in data science I would expect this course to have been done so much better.
by laurent h•
The content is fundamental, but the teaching was below expectations.
by Anthony B K•
This was, by far, one of the worst courses I've ever taken. Considering that I have three degrees and completed military PME, I've taken many. The content of the course was significantly out of date. If you're going to teach a course in computer science or on a programming language, you should be updating your lectures at least annually. The websites referenced in the lectures here were either missing or they had changed to the point that the "examples" that were presented were useless. That means that at best, those lectures were a waste of time. If I wanted to spend hours and hours on the web trying to figure out what you meant to do, then I could have bought a book and taught myself this material. Secondly, some languages (including R) are actively being developed; this means that because the lecture material was so dated, the methods presented were (in some cases) obsolete. From a presentation point of view, the lectures were sub-optimal because the slides themselves were just images. Having to re-type long lines of code where you can easily make "fat finger" mistakes isn't helpful; those slides should contain text that can be copied and pasted into either notes or into an R session. I'd also suggest a better microphone or better sound levels, but that's minor compared to the terrible content.
by Neil J•
R is really just the worst, and the instructors do not make it better. The code in this class is unreadable:
- too many one liners, because "it's faster to write", though harder for other people to read
- variables are named cryptic things like spIns or x, rather than names with meaning (eg, sprays.by.insect), again "because it's faster to type"
- way too many cases of "there is more than one way to do it", which just makes things confusing because the other ways tend not to be equivalent
What I'm most concerned about is that I've seen lots of poorly written code in many different languages: Java, C++, C, Python, Perl, and now R. But I've also seen really well-written code in all the languages *but* R; I have yet to see any code in R that is flexible, maintainable, and clear. Which leads me to think that no such code exists, or it's so rare that it doesn't matter. It is clear to me that if I am to do data analysis, then I will need a different set of tools; but because this specialization is taught entirely around R (the lectures are about R, not about higher-level concepts), this specialization is not useful to me.
by jake s•
There is a lot of fluff in this course and at the same time it assumes that you have knowledge and skills that are not covered in this course or in the previous two (e.g. github). I'm really disappointed in the quality of this course--specifically at how vague many of the instructions were in the quiz questions and the final project--and that most of the time when explanations were asked for on the message board the professors just did some hand waving and said that figuring it out was part of the assignment. That isn't teaching (online or otherwise). And if your instructions aren't clear, you aren't doing the job of an instructor when you pass the buck and try to sell it as "part of the learning experience." I hope this falloff in quality isn't reflective of the rest of the courses in the data spec.
by Mariia D•
If you are wondering whether it is worth paying for, the answer is "NO".
The course is badly outdated; the lectures are useless and do not even help with completing the quizzes. It is too much like real life: information is old, incomplete, or wrong, and you need to sort through dozens of additional sources looking for an answer.
I suppose the reason we want to learn before taking on real tasks is that it is much more productive to go step by step, using reliable instruments, and to move on to troubleshooting only once we have a good knowledge of working solutions.
This is not the case with this course; here you need to troubleshoot from the very beginning. That is an exercise in frustration and googling, seriously.
I was going to take the entire specialization, but I changed my mind and am stopping now.
by Yusof A•
Horrible lectures, which have not been updated even though the referenced websites, and the options on those websites for the data required for the course, may have changed. For example, there is no way to use "download.file" to download as Excel a file that is only available as .csv, since the Excel option has been removed since these lectures were made. I finished the first 2 courses and had high expectations for this one. It started off well, but by the middle of week 1 we realize this can be a very frustrating experience. The PDFs could have been revised and updated, but this is probably the same material from 7 years ago with the same website references from then. Worst course I have encountered on Coursera to date.
by Lindsay E M•
The first two courses in this specialization were good, but the third course, Getting and Cleaning Data, was honestly very disappointing. The lectures are extremely out of date (made in 2013, and it's already June 2020...), and a lot of the code in the lectures and examples no longer works correctly because of this. Beyond that, the "updates" posted by the mentors in the discussion forums are also out of date (2016) and have limited usefulness. This is a course that is meant to teach you how to acquire and clean data in the R program, and methods and technology from 7 years ago are not the standard that I expected - technology constantly changes and updates, and this course should reflect that (but clearly doesn't).
by Ash S•
So much of the material is out of date. As other people in the forums have mentioned, the course doesn't cover the information needed to succeed and is also at a much higher level than listed (the course says beginner level, but it's not). I have since switched to a different course, and so many basic things were explained there that never were in this course. Even after taking a different intro level course, this course is still too difficult for me. This should at least be listed in the information so that people don't waste their time and money.
by Daniel G•
This course was the final straw for me on Coursera. I will not be continuing my subscription if this is the quality of instruction I can expect to receive, which is none. The disconnect between the lectured material and what is expected on the quizzes with no instruction in between is just mind-boggling. And to see that students have been complaining about these issues in the forums FOR YEARS and the instructors have done nothing to update or change their material has put me off entirely.
by Laura G M•
DON'T DO THIS COURSE!!!
THIS IS A SHAME OF A COURSE, A LOSS OF TIME AND MONEY. THE TEACHERS ARE NOT PRESENT IN THE FORUM, THE ASSIGNMENTS ARE ON A WHOLE DIFFERENT LEVEL OF DIFFICULTY AND YOU HAVE TO SEARCH A LOT THROUGH THE INTERNET TO FIND SOLUTIONS TO EACH QUESTION. THE EXAMPLES ARE TOTALLY OUTDATED AND GIVE TONS OF BUGS AND IT'S SO FRUSTRATING TO SEARCH IN PROGRAMMING FORUMS AND FIND OTHER STUDENTS WITH THE SAME QUESTIONS, BECAUSE THE MATERIAL GIVES PREHISTORIC GUIDES.
DEFINITELY A FRAUD.
by Paul Y•
Once again, the projects are way beyond the skill level or tasking in the previous lessons.
The lectures are ALL PowerPoint; the instructor does not open R.
The lectures are based on R as it was 10-15 years ago, when it was 3.0. Nobody at Hopkins can update the lectures, or do an errata sheet?
I can't believe my company is spending money for this.