First things first:
- Top 10 DS courses # http://bigdata-madesimple.com/review-of-top-10-online-data-science-courses/
- http://brettromero.com/wordpress/data-science-a-kaggle-walkthrough-introduction/ Kaggle
- https://github.com/jmschrei/pomegranate pomegranate
- Git: How to set up remote git branch # http://www.gitguys.com/topics/adding-and-removing-remote-branches/
Quick Shortcuts
IPython notes for learning
Lots of quick & interesting slides
- https://speakerdeck.com/jakevdp Statistics for Hackers
- https://www.youtube.com/watch?v=nCPf8zDJ0d0 Introduction to Deep Learning
Data Scientist Workbench:
It’s a free all-in-one solution for people interested in performing data analysis. The Data Scientist Workbench includes:
- OpenRefine to clean up messy data.
- Jupyter notebooks supporting Python, R, and Scala (with access to Apache Spark for Big Data processing).
- Apache Zeppelin notebooks.
- RStudio in your browser.
Quick slides on NLP – Natural Language Processing
- https://www.cse.iitb.ac.in/~neelamadhavg09/docs/dependency_parsing.pdf # Articles on semantic text-parsing – dependency parsing.
Part 1 of this blog post series: Orientation
Part 2b: Ranking and regression metrics
Part 3: Validation and offline testing
Part 4: Hyperparameter tuning
Part 5: A/B testing
Tom Fawcett’s 2006 Pattern Recognition Letters paper, “An Introduction to ROC Analysis.”
Chapter 7 of Data Science for Business discusses expected value as a classification metric, which is especially useful for skewed (class-imbalanced) data sets.
- Do We Need Hundreds of Classifiers to Solve Real World Classification Problems? Evaluates 179 classifiers from 17 families.
- A Data Complexity Analysis of Comparative Advantages of Decision Forest Constructors
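A rough sketch of the expected-value idea noted above, to show why it can rank classifiers differently from accuracy on skewed data. The confusion-matrix counts and the payoff values are made-up numbers for illustration only, and the function names are my own, not taken from the book.

```python
# Expected value of a classifier: the sum over confusion-matrix cells of
# (fraction of instances in the cell) * (payoff assigned to that cell).
# Layout assumed here: rows = actual (pos, neg), columns = predicted (pos, neg).

def expected_value(confusion, payoff):
    """Average payoff per instance for a given confusion matrix."""
    total = sum(sum(row) for row in confusion)
    return sum(
        (count / total) * value
        for c_row, v_row in zip(confusion, payoff)
        for count, value in zip(c_row, v_row)
    )

def accuracy(confusion):
    """Fraction of instances on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in confusion)
    return (confusion[0][0] + confusion[1][1]) / total

# Hypothetical skewed data set: 50 positives, 950 negatives.
model = [[40, 10],
         [60, 890]]
always_negative = [[0, 50],
                   [0, 950]]

# Hypothetical payoffs: a true positive earns 99, a false positive costs 1,
# and predicting "negative" earns or costs nothing.
payoff = [[99.0, 0.0],
          [-1.0, 0.0]]

acc_model = accuracy(model)
acc_baseline = accuracy(always_negative)
ev_model = expected_value(model, payoff)
ev_baseline = expected_value(always_negative, payoff)

print(f"accuracy: model={acc_model:.2f}, always-negative={acc_baseline:.2f}")
print(f"expected value: model={ev_model:.2f}, always-negative={ev_baseline:.2f}")
```

With these assumed numbers the always-negative baseline has the higher accuracy, while the model has the higher expected value, because the rare positives carry most of the payoff.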
Note: This post was updated on April 16, 2015. Thanks to @aatallah for demystifying the origin of the name “ROC curve,” and to Joe McCarthy for the helpful references.
- http://www.tug.org/mactex/ # MacTeX, the Mac distribution of LaTeX. Beamer, a LaTeX class for making slides, is worth trying.