Resources for the Data Science course designed and taught by Richard Davies with Denes Csala, Charlie Meyrick and Emilien Valat at the University of Bristol.
Examples of projects produced by the 2021-22 cohort are here.
Week 1
data portfolio
In this course we extract data from web sites, often by building our own scrapers. To prepare for that, it is helpful to understand the basics of how the web works. In the first lecture and lab we introduce HTML, CSS and JavaScript as you build and style your first web site. You will also embed your first automated and interactive charts.
Skills and concepts: Text editors, HTML, CSS, GitHub.
Week 2
live data
This week we will build two live charts, embedding them in the web site you built in week 1. The first will run direct from data provided by an API, auto-updating itself daily. The second will run from your GitHub repository. We will discuss the strengths and weaknesses of these two approaches, and how you can use them in your project.
Skills and concepts: APIs, JavaScript, JSON, Vega-Lite, Chart.js.
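As a rough illustration of a chart that runs directly from a data URL, the sketch below builds a minimal Vega-Lite specification in Python and prints it as JSON. The function name, the URL and the field names `date` and `value` are hypothetical placeholders, not part of the course materials; a real API or repository file would need its own URL and field names.

```python
import json

def live_chart_spec(data_url, x_field, y_field):
    """Return a Vega-Lite line-chart spec that loads its data from a URL."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"url": data_url},  # the chart re-reads this URL each time the page loads
        "mark": "line",
        "encoding": {
            "x": {"field": x_field, "type": "temporal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }

# Hypothetical URL and field names, for illustration only.
spec = live_chart_spec("https://example.com/api/series.json", "date", "value")
print(json.dumps(spec, indent=2))
```

Pointing `data_url` at an API gives a chart that follows the provider's updates, while pointing it at a file in your own repository gives you control over what is shown.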
Week 3
api data
Many analysts access data by clicking download icons to get Excel or csv files. As data scientists we want to access data programmatically—without touching our mouse or keyboard—since this avoids error and repetitive tasks, means our work is transparent and verifiable, and makes time-saving automation possible. In our third class we access data from APIs, discussing the benefits, pitfalls and debugging.
Skills and concepts: APIs, CORS, JSON.
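A minimal sketch of programmatic access, using only the Python standard library: one function downloads and decodes JSON, and another keeps just the fields of interest. The function names and the sample payload are assumptions made for illustration; real APIs differ in structure and often need keys or query parameters.

```python
import json
from urllib.request import urlopen

def fetch_json(url, timeout=10):
    """Download a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)

def to_rows(payload, fields):
    """Flatten a list of JSON records into rows holding only the chosen fields."""
    return [[record.get(field) for field in fields] for record in payload]

# Hypothetical payload standing in for what fetch_json(url) might return.
sample = json.loads(
    '[{"date": "2022-01-01", "value": 1.5, "note": "x"},'
    ' {"date": "2022-01-02", "value": 1.7}]'
)
rows = to_rows(sample, ["date", "value"])
print(rows)
```

Using `record.get` means a missing field shows up as `None` rather than stopping the script, which makes gaps in the data easier to spot when debugging.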
Week 4
scraping data
Lots of interesting and useful data is not provided by an API but is embedded in a website. This week we will build our first scrapers to extract data from websites and build our own data sets. We learn the art of inspecting a web site to find the data within it, and use this new skill to extract data from three different websites, comparing the results we get.
Skills and concepts: Python, BeautifulSoup, HTML, Stata.
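The idea of pulling data out of a page's HTML can be sketched as follows. To keep the example self-contained it uses Python's built-in `html.parser` rather than BeautifulSoup, which is the library named above; the class name and the small HTML table are made up for illustration.

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Made-up page fragment; a real scraper would download the HTML first.
html = """
<table>
  <tr><th>City</th><th>Price</th></tr>
  <tr><td>Bristol</td><td>3.20</td></tr>
  <tr><td>Bath</td><td>3.50</td></tr>
</table>
"""
scraper = TableScraper()
scraper.feed(html)
print(scraper.rows)
```

Inspecting the page in a browser first tells you which tags hold the data; here that is `tr`, `th` and `td`, but other sites may use lists or nested `div` elements instead.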
Week 5
automating data
Repetition is dull, slow and a source of error. This is a problem, since you will want to repeat much of what we have learned so far (fetching data from an API, scraping a web site) many times. This week is devoted to loops, one of the most powerful tools in any coder’s arsenal. Our class will show how a loop takes you from a small dataset to the world of big data.
Loops. Layers. Python. JavaScript.
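A small sketch of how a loop turns one request into a larger dataset: the same fetch step runs once per year and the results are stacked together. The `fake_fetch` function is a stand-in that returns made-up records; in practice it would be replaced by an API call or a scraper.

```python
def collect(years, fetch):
    """Call fetch(year) once per year and stack the results into one dataset."""
    dataset = []
    for year in years:
        for record in fetch(year):
            # Tag each record with the year it came from.
            dataset.append({"year": year, **record})
    return dataset

def fake_fetch(year):
    """Stand-in for an API call or scraper; returns made-up records."""
    return [
        {"item": "a", "value": year % 100},
        {"item": "b", "value": year % 100 + 1},
    ]

data = collect(range(2019, 2022), fake_fetch)
print(len(data), data[0])
```

Widening the range of years is the only change needed to grow the dataset, which is what makes the pattern useful for automation.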
Week 6
reading week
There are no classes or office hours this week.
Relax.
Then work on your project!
Week 7
cleaning data
Data is only helpful when it is in a clean and useable form. This week we discuss how to get your dataset into shape for analysis. We focus on three unglamorous skills that are the foundation of data science: cleaning data, matching and merging datasets, and re-shaping data.
Data manipulation functions, Python, Stata, JavaScript.
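The three skills can be sketched in plain Python as below. This is an illustration under simple assumptions, not the course's own code: all function names and data are invented, and in practice a data manipulation library would usually do this work.

```python
def clean_number(text):
    """Turn a string such as ' 1,250.50 ' into the float 1250.5."""
    return float(text.strip().replace(",", ""))

def merge(left, right, key):
    """Inner-join two lists of dicts on a shared key."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]

def wide_to_long(rows, id_field, value_fields):
    """Reshape one row per id into one row per (id, variable) pair."""
    return [
        {id_field: row[id_field], "variable": field, "value": row[field]}
        for row in rows
        for field in value_fields
    ]

# Made-up data: messy numbers in a wide layout, plus a lookup table.
raw = [
    {"city": "Bristol", "2021": " 1,250.50 ", "2022": "1,300"},
    {"city": "Bath", "2021": "1,400", "2022": "1,450.25"},
]
regions = [{"city": "Bristol", "region": "South West"}]

cleaned = [
    {"city": r["city"], "2021": clean_number(r["2021"]), "2022": clean_number(r["2022"])}
    for r in raw
]
merged = merge(cleaned, regions, "city")   # Bath has no match and is dropped
long_rows = wide_to_long(merged, "city", ["2021", "2022"])
print(long_rows)
```

An inner join silently drops unmatched rows, so it is worth comparing row counts before and after a merge.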
Week 8
data learning
By this stage you have the tools to access data and loops that allow you to automate and repeat this. The result is large and interesting datasets. We now build tools to learn from this data. This week we begin to discuss Machine Learning (ML) and the difference between “supervised” and “unsupervised” learning, and the use of labelled and unlabelled data.
Machine learning. Python (PyTorch, TensorFlow)
Week 9
data patterns
We continue building our ML skills by discussing four common tasks that these tools can be used for: classification and regression for supervised learning, and clustering and association for unsupervised learning. We apply these tools to example datasets and discuss how you could use them.
Machine learning. Python (PyTorch, TensorFlow)
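To make the supervised/unsupervised distinction concrete, here is a minimal sketch of two of the four tasks in plain Python: a least-squares regression that learns from labelled pairs, and a one-dimensional k-means clustering that groups unlabelled values. It deliberately avoids PyTorch and TensorFlow so that it stays self-contained; the function names and data are invented for illustration.

```python
def fit_line(xs, ys):
    """Supervised: ordinary least squares for y = a + b*x using labelled pairs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope

def kmeans_1d(values, centres, steps=10):
    """Unsupervised: group unlabelled numbers around the nearest centre."""
    groups = [[] for _ in centres]
    for _ in range(steps):
        groups = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Move each centre to the mean of its group; keep it if the group is empty.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups

# Labelled data generated from y = 1 + 2x.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

# Unlabelled data with two visible clumps; starting centres are arbitrary guesses.
centres, groups = kmeans_1d([1, 2, 3, 10, 11, 12], centres=[0, 5])
print(a, b, centres, groups)
```

The regression needs the `ys` as labels to learn from, whereas the clustering is given only the values and finds the two groups on its own.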
Week 10
data stories
In our final week of analysis we discuss the way data can be used to prove or disprove a point. We re-cap on how to use moments of a distribution (the spread and range of data), on correlation, and on steps to establish causation. We discuss, with examples, ways to calculate and visualise the results of this in-depth analysis.
Machine learning. Python (PyTorch, TensorFlow)
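A short sketch of summarising a dataset's centre and spread and of measuring correlation, using Python's `statistics` module. The `pearson` helper and the numbers are illustrative assumptions; a correlation close to 1 or -1 describes a linear association and does not by itself establish causation.

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))

# Made-up series for illustration.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

print("mean:", mean(rising))
print("spread (population sd):", pstdev(rising))
print("range:", max(rising) - min(rising))
print("correlation with rising:", pearson(xs, rising))
print("correlation with falling:", pearson(xs, falling))
```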
Week 11
interactive data
As the course draws to a close we have the tools to define a research question, build and clean a complex data set and analyse it. In our final session we discuss how to make charts interactive in ways that help users draw their own stories and conclusions. We use large data sets to demonstrate this.
Interactives (filters, toggles, sliders), colour, tone and opacity, Vega-Lite.
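One way a slider can drive a chart is sketched below: a Vega-Lite parameter is bound to a range input and used in a filter, with the specification assembled in Python and printed as JSON. The function name, the parameter name `minYear`, the URL and the field names `year` and `value` are hypothetical choices for this example.

```python
import json

def slider_filter_spec(data_url, year_min, year_max):
    """Return a Vega-Lite spec whose slider hides points before a chosen year."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"url": data_url},
        "params": [
            {
                "name": "minYear",  # hypothetical parameter name
                "value": year_min,
                "bind": {"input": "range", "min": year_min, "max": year_max, "step": 1},
            }
        ],
        # Keep only rows at or after the year currently selected on the slider.
        "transform": [{"filter": "datum.year >= minYear"}],
        "mark": {"type": "point", "opacity": 0.6},
        "encoding": {
            "x": {"field": "year", "type": "quantitative"},
            "y": {"field": "value", "type": "quantitative"},
        },
    }

# Hypothetical URL, for illustration only.
spec = slider_filter_spec("https://example.com/data.json", 2000, 2022)
print(json.dumps(spec, indent=2))
```

Lowering the mark opacity helps when many points overlap, which is common with large data sets.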
Deadline day: Monday 9th January 2023.
Your project
We will discuss project ideas each week, and help you with code and data.
A reminder that the deadline for your DS project is Monday 9th January 2023.
Good luck!
Guidance on coursework, readings, further material and office hours.
build
Coursework
Your project will present between 3 and 8 charts. These must be embedded in your site, hosted by GitHub pages. You must also briefly discuss four topics: (1) the aims of your project; (2) the data you used, how you accessed it, including notes on automation/replication; (3) challenges in data cleaning and/or analysis, and the tools you used to overcome them; (4) your conclusions. Each section must not exceed 200 words.
Get help
office hours
There are four office hours each week, at the following times:
There are no office hours during reading week (W6). The final slots are held in week 11.
Read
books, papers and sites
Some useful books and papers:
Watch
videos
Links to videos that will help you cover the material and with your project:
Resources
links
Tools and links to assist your Data Science project.