3 – 4 months
20 – 25 hours
Theory + teamwork
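This course is built on Python and covers the many dimensions of landing a data science (DS) job. It spans a wide range of data wrangling, machine learning, and big data tools, such as scikit-learn, XGBoost, Spark, and H2O with its Python API. Students complete projects in small groups, building teamwork and leadership skills. They will learn practices used in real projects, such as version control (GitHub), pair programming, test-driven development, and agile, and will also learn how to keep learning on the job and improve their professional skills.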
Covers the most basic Python concepts and syntax, such as number, string, list, and dictionary.
- Using this basic syntax, we can write equations and functions. We will demonstrate that basic does not equal boring by coding the localization module of a self-driving car. We will then take a step further to show how packages such as NumPy can make data handling a lot easier.
- Starting from the very first lesson, we will follow the test-driven development process and form agile teams.
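As a rough illustration of how lists, numbers, strings, and a dictionary can already express a localization step, here is a minimal sketch of a one-dimensional belief update. The grid world, the sensor probabilities, and the function names `sense` and `sense_np` are all assumptions made for this example, not the course's actual material. The second function shows the same update with NumPy, and the small test at the end hints at the test-driven style.

```python
import numpy as np

# Hypothetical sensor model stored in a dictionary (values are illustrative).
SENSOR = {"p_hit": 0.6, "p_miss": 0.2}


def sense(belief, world, measurement):
    """Update a belief over grid cells after one reading, using plain lists."""
    weighted = [
        b * (SENSOR["p_hit"] if cell == measurement else SENSOR["p_miss"])
        for b, cell in zip(belief, world)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]  # normalize so the belief sums to 1


def sense_np(belief, world, measurement):
    """Same update, vectorized with NumPy."""
    belief = np.asarray(belief, dtype=float)
    hit = np.asarray(world) == measurement
    weighted = belief * np.where(hit, SENSOR["p_hit"], SENSOR["p_miss"])
    return weighted / weighted.sum()


def test_sense_normalizes():
    """A tiny test written alongside the code, in the spirit of TDD."""
    result = sense([0.5, 0.5], ["red", "green"], "red")
    assert abs(sum(result) - 1.0) < 1e-9
    assert result[0] > result[1]


world = ["green", "red", "red", "green", "green"]  # made-up map of cell colors
belief = [0.2] * 5                                  # start with no prior knowledge
posterior = sense(belief, world, "red")
posterior_np = sense_np(belief, world, "red")
test_sense_normalizes()
```

After sensing "red", the cells that are actually red carry more probability than the green ones, and both versions agree.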
Introduces packages, such as pandas, for working with tabular data.
Using pandas, we will learn how to calculate simple statistics and do sorting and sampling.
Each agile team will propose a “product” and build it using pandas as the core package.
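To give a feel for the kind of operations involved, the following sketch computes simple statistics, sorts, and samples a small table. The data and column names are invented for illustration only.

```python
import pandas as pd

# Made-up table; names and scores are purely illustrative.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Dan", "Eve"],
    "team": ["A", "B", "A", "B", "A"],
    "score": [88, 72, 95, 60, 79],
})

# Simple statistics: overall mean and per-team mean.
mean_score = df["score"].mean()
by_team = df.groupby("team")["score"].mean()

# Sorting: highest score first.
ranked = df.sort_values("score", ascending=False)

# Sampling: draw two rows; random_state makes the draw repeatable.
sample = df.sample(n=2, random_state=0)
```

`df.describe()` is another quick way to get counts, means, and quartiles for all numeric columns at once.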
Covers procedures such as imputation, dummification, capping, and flooring.
We will visit one of the most popular websites for data scientists (Kaggle), download the list of passengers who boarded the Titanic, and prepare the data set for machine learning models.
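The four preparation steps named above might look roughly like this in pandas. This is a sketch on a tiny made-up table that borrows a few column names from the Kaggle Titanic data (`Age`, `Fare`, `Sex`); the values, the choice of median imputation, and the 5%/95% cutoffs are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows; only the column names follow the Kaggle Titanic data.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 58.0, np.nan],
    "Fare": [7.25, 71.28, 8.05, 512.33, 13.0],
    "Sex": ["male", "female", "female", "male", "male"],
})

# Imputation: fill missing ages with the median of the observed ages.
df["Age"] = df["Age"].fillna(df["Age"].median())

# Flooring and capping: pull extreme fares in to the 5th and 95th percentiles.
low, high = df["Fare"].quantile([0.05, 0.95])
df["Fare"] = df["Fare"].clip(lower=low, upper=high)

# Dummification: turn the categorical column into indicator columns.
df = pd.get_dummies(df, columns=["Sex"])
```

In a real project the median and the percentile cutoffs would normally be computed on the training split only and then reused on new data.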
In this lesson, using machine learning models, we will travel back in time and take a peek at what happened during the tragic sinking of the Titanic.
We will learn how to create features, fit machine learning models and evaluate model performance.
In addition, we will assemble a team to participate in a Kaggle competition! Have fun and win the prize!
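The create-features, fit, and evaluate loop described in this lesson could be sketched as follows with scikit-learn. The data here is randomly generated, not the real passenger list, and the `FamilySize` feature, the logistic regression model, and the accuracy metric are assumptions chosen for brevity rather than the course's prescribed approach.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared passenger table (not real data).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "Sex_female": rng.integers(0, 2, n),
    "Pclass": rng.integers(1, 4, n),
    "SibSp": rng.integers(0, 4, n),
    "Parch": rng.integers(0, 3, n),
})
# Synthetic label generated from an invented rule plus noise.
logit = 2.0 * df["Sex_female"] - 0.8 * (df["Pclass"] - 2) - 0.5
df["Survived"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Create a feature: family size is siblings/spouses + parents/children + self.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# Hold out a test set so the evaluation uses rows the model has not seen.
features = ["Sex_female", "Pclass", "FamilySize"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["Survived"], test_size=0.25, random_state=0
)

# Fit the model and evaluate it on the held-out rows.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

Swapping in another estimator, such as a tree-based model, only changes the line that constructs `model`; the split, fit, and score steps stay the same.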
There are many advanced topics that we will cover in future classes, such as “Python as an object-oriented language”, “write your own package”, “Spark”, etc.