This course provides an introduction to the methods and tools of Data Science in a big data context. Delegates will be introduced to a variety of machine learning algorithms and shown how to apply them to practical, real-world problems.
What you will learn
- An overview of the field of Data Science, with example use cases, and the skills required by its practitioners
- An overview of big data and an introduction to the concepts of distributed computing
- An introduction to Python, the language of choice for many Data Scientists, including basic functionality and the most useful libraries for analysing and manipulating data
- An introduction to machine learning describing the different general approaches one can take when building a model
- Explanation and implementation, by way of example, of several of the most widely used machine learning algorithms
- Introduction to graph analysis
Data Science in a Big Data world
- Course overview and introductions
- What is big data? – The four Vs
- Big data in action
- Distributed computing
- Databases – SQL vs. NoSQL
- Data Science – What is Data Science?, What is a Data Scientist?, Data Science tools, Use Cases
- Data Protection and governance
- Why Python?
- Demo 1: Data types and functions
- Demo 2: Data analysis
- Lab 1: Data type manipulation and functions
- Lab 2: Data analysis
- Summarising data
- Data distributions
- Confidence intervals
- Correlations and similarity measures
- Simpson’s paradox
- Demo: Exploring UK weather data
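As a flavour of the Simpson's paradox topic listed above, here is a minimal sketch in plain Python. The counts are illustrative only and are not course data; the names `outcomes`, `subgroup_rates` and `overall_rates` are hypothetical. It shows how one option can have the higher success rate in every subgroup and still have the lower rate once the subgroups are pooled.

```python
# Illustrative counts only (not course data):
# treatment -> subgroup -> (successes, trials)
outcomes = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def subgroup_rates(data):
    """Success rate for each treatment within each subgroup."""
    return {
        treatment: {group: s / n for group, (s, n) in groups.items()}
        for treatment, groups in data.items()
    }


def overall_rates(data):
    """Success rate for each treatment after pooling all subgroups."""
    rates = {}
    for treatment, groups in data.items():
        successes = sum(s for s, _ in groups.values())
        trials = sum(n for _, n in groups.values())
        rates[treatment] = successes / trials
    return rates


by_group = subgroup_rates(outcomes)
overall = overall_rates(outcomes)

for treatment in outcomes:
    print(treatment, by_group[treatment], round(overall[treatment], 3))
```

In this made-up data, A wins within both subgroups but B wins overall, because the two treatments were applied to very different mixes of small and large cases.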
Introduction to machine learning
- What is machine learning?
- Types of machine learning approaches – Supervised learning and unsupervised learning
- Predicting the airspeed velocity of an unladen swallow with linear regression – Linear regression explained, Under- and over-fitting, Cost function, Gradient descent, Demo
- Spam detection using natural language processing and logistic regression – What is classification?, Logistic regression explained, Data preparation, Feature construction, Cost function, Training, testing and validation, Assessing model performance, The accuracy fallacy, Demo
- Other supervised learning methods – k-nearest neighbours, Support vector machines, Support vector regression, Naive Bayes classification
- Scaling supervised learning
- Grouping iris varieties using k-means clustering and principal component analysis – What is clustering?, Cluster analysis using k-means, Other types of clustering, Demo 1: k-means clustering, Dimensionality reduction using principal component analysis, Feature scaling, Demo 2: principal component analysis
- Scaling k-means clustering
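To illustrate the cost function and gradient descent topics listed under linear regression above, here is a minimal sketch in plain Python. It is not the course's demo code: the function names, the learning rate, the step count and the toy data are all assumptions chosen for illustration. It fits a straight line by repeatedly stepping against the gradient of the mean squared error.

```python
def mean_squared_error(xs, ys, slope, intercept):
    """Cost function: average squared gap between prediction and target."""
    n = len(xs)
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / n


def fit_line(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = slope * x + intercept by batch gradient descent (assumed settings)."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Toy data generated from y = 2x + 1, so the fit should recover those values.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(round(slope, 4), round(intercept, 4))
print(mean_squared_error(xs, ys, slope, intercept))
```

In practice a library such as scikit-learn would normally be used; writing out the loop by hand simply makes the roles of the cost function and the learning rate visible.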
Building a recommender system
- Recommender system explanation
- Types of recommender system
- Examples of recommender systems
- Building a movie recommender
- Improving your recommender – Dithering, Cross-recommendation
- Demo: Building a movie recommendation system
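One simple way to build a movie recommender is item co-occurrence counting, sketched below in plain Python. This is an assumption for illustration: the outline does not say which technique the course demo uses, and the users, likes and function names here are hypothetical. Films that are often liked together with films a user already likes are scored higher.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical users and the films each one liked.
likes = {
    "ann": {"Alien", "Blade Runner", "Solaris"},
    "bob": {"Alien", "Blade Runner"},
    "cat": {"Blade Runner", "Solaris"},
    "dan": {"Amelie"},
}


def co_occurrence(user_likes):
    """Count how often each ordered pair of films is liked by the same user."""
    counts = defaultdict(lambda: defaultdict(int))
    for films in user_likes.values():
        for a, b in permutations(films, 2):
            counts[a][b] += 1
    return counts


def recommend(user, user_likes, top_n=2):
    """Rank unseen films by total co-occurrence with the user's liked films."""
    counts = co_occurrence(user_likes)
    seen = user_likes[user]
    scores = defaultdict(int)
    for film in seen:
        for other, count in counts[film].items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:top_n]]


print(recommend("bob", likes))
print(recommend("dan", likes))
```

A user whose likes never overlap with anyone else's gets an empty list here, which is one form of the cold-start problem that a production recommender has to handle separately.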
Social network analysis using graph theory
- What is a graph?
- Social networks
- Demo: Social network analysis using Gephi
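The course demo uses Gephi, but the underlying ideas can also be sketched in a few lines of plain Python. The example below is illustrative only: the friendship list and function names are hypothetical. It stores an undirected social network as an adjacency dictionary, then computes degree centrality (the share of other people each person is directly connected to) and the shortest path length between two people using breadth-first search.

```python
from collections import deque

# Hypothetical friendships; each pair is an undirected edge.
friendships = [
    ("ann", "bob"),
    ("ann", "cat"),
    ("bob", "cat"),
    ("cat", "dan"),
    ("dan", "eve"),
]


def build_graph(edges):
    """Adjacency dictionary: person -> set of direct connections."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    others = len(graph) - 1
    return {node: len(neighbours) / others for node, neighbours in graph.items()}


def shortest_path_length(graph, start, goal):
    """Number of edges on a shortest path, or None if no path exists."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


graph = build_graph(friendships)
centrality = degree_centrality(graph)
print(max(centrality, key=centrality.get))
print(shortest_path_length(graph, "ann", "eve"))
```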
The session includes a variety of instructor demos and guided hands-on exercises for students to work through, to aid understanding and appreciation of the topics. There will be plenty of discussion and interactivity.
The course content can be customised to cover any specialised material you may require for your specific training needs.
This course can be offered as private on-site training hosted at your offices. For more information, please contact us at [email protected]
Related Training Courses
Apache Cassandra – This is a fast-paced, vendor-agnostic technical Apache Cassandra course that focuses on the key aspects of the technology for developers and system operations staff, covering core internal and distributed architecture fundamentals.
HDP Analyst: Apache HBase Essentials – This 2-day workshop introduces HBase basics, structure and operations in an intensely hands-on experience.
Apache Hadoop Essentials – This course is designed to help attendees understand the concepts and benefits of Apache Hadoop and how it can help them meet their business goals.
Machine Learning with Apache Hadoop – This course is designed to help attendees understand the high-level concepts and classifications of machine learning systems with a strong focus on building Recommender Systems.