Data Science in the Cloud

Data science, machine learning and deep learning are the driving forces of the current revolution that is changing the way businesses and people make decisions, work and innovate. Data science is triggering profound innovations in healthcare, finance, transportation, manufacturing and many other sectors.

Data science is evolving at lightning speed, with a multiplication of approaches, tools and platforms. In parallel to script-based data science (think Python and scikit-learn), the major cloud providers are developing platforms to power and facilitate the data scientist’s daily work and to bring data science projects into production. The Google Cloud Platform offers one of the most innovative and user-friendly data science ecosystems.

My name is Alexis Perrier. I am a data science consultant and I’m very excited to be the instructor of this data science course on the Google Cloud Platform. I teach data science in colleges, in bootcamps and in company training sessions. I recently wrote a couple of books on machine learning on the Google Cloud Platform and on AWS, both published by Packt Publishing.

I have a PhD in signal processing from Telecom-ParisTech, followed by over 20 years in software engineering; 5 years ago I went back to data science. Although the data science ecosystem is evolving fast, it is deeply grounded in solid applied mathematics. Case in point: in the mid-90s, my signal processing PhD was on echo cancellation in hands-free phones, and we were already working with the stochastic gradient algorithm, which is widely used nowadays to train deep learning networks.
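To make that link concrete, here is a minimal sketch of the least-mean-squares (LMS) adaptive filter, a classic stochastic gradient method for echo cancellation. It is an illustration, not the method from the PhD work: it assumes a noiseless setting with no near-end speech, a hypothetical 3-tap echo path (`true_path`) and illustrative values for the step size `mu` and the number of taps.

```python
import random


def lms_echo_canceller(far_end, mic, num_taps, mu):
    """Estimate an echo path with the LMS rule, a stochastic gradient method.

    far_end: samples sent to the loudspeaker (the reference signal).
    mic: samples picked up by the microphone (here: pure echo).
    Returns the adapted coefficients and the residual after cancellation.
    """
    w = [0.0] * num_taps        # estimated echo-path coefficients
    history = [0.0] * num_taps  # latest far-end samples, newest first
    residuals = []
    for x_n, d_n in zip(far_end, mic):
        history = [x_n] + history[:-1]
        y_n = sum(wi * xi for wi, xi in zip(w, history))  # predicted echo
        e_n = d_n - y_n                                    # what is left after cancellation
        # One stochastic gradient step on the instantaneous cost e_n**2 / 2
        w = [wi + mu * e_n * xi for wi, xi in zip(w, history)]
        residuals.append(e_n)
    return w, residuals


# Hypothetical setup: 3-tap echo path, white-noise far-end signal, no near-end speech.
random.seed(0)
true_path = [0.6, -0.3, 0.1]
far_end = [random.gauss(0.0, 1.0) for _ in range(2000)]
mic = [
    sum(h * far_end[n - k] for k, h in enumerate(true_path) if n - k >= 0)
    for n in range(len(far_end))
]

w, residuals = lms_echo_canceller(far_end, mic, num_taps=3, mu=0.05)
print("estimated path:", [round(c, 3) for c in w])
```

The estimated coefficients should end up very close to `true_path` while the residual shrinks toward zero. Applied to a linear model with squared error and one sample at a time, stochastic gradient descent reduces to exactly this update; deep learning frameworks apply the same idea to far larger models, using backpropagation to compute the gradient.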

In 2015, Google released TensorFlow, which is now the most popular deep learning framework. In recent years Google has also launched several high-performing services across the whole data science workflow: data storage with Google Storage and BigQuery, distributed computing with Google Compute Engine, specialized deep learning APIs for text, images, videos and speech, and the Google Machine Learning Engine, dedicated to training deep learning models.

Learning Outcomes

In this course, you will focus on the cloud infrastructure, data storage and machine learning services of the Google Cloud Platform.

At the end of this course you will be able to:

  • Launch your own Compute Engine instances and build a data science stack running Jupyter notebooks
  • Use advanced features of Google Storage such as synchronization, access control lists, signed URLs and others
  • Host and query your data in BigQuery, Google’s data warehouse solution
  • Launch Datalab instances to work collaboratively in Jupyter notebooks using data from BigQuery and other sources
  • Use Google deep learning APIs to extract information from text, images, videos and speech
  • Use Google ml-engine to rapidly train TensorFlow models in the cloud without having to configure virtual instances

Throughout the course we will work with the Google Cloud SDK command-line tools in the terminal and develop simple Python scripts to interact with the Google services.

I’ve created this course with the following objectives:

  • To enable you to leverage the powerful Google infrastructure for all your data science projects
  • To demonstrate some of the limitations of these Google services
  • To make sure that the important implementation details stand out from the overall technical documentation
  • To reflect real-world scenarios by using non-trivial datasets whenever possible

Course Requirements

This course is intended for data scientists of all levels who want to get a full understanding as well as hands-on practice of the Google Cloud Platform services for data science projects.

You should be familiar with the overall concepts of data science but, more importantly, have some minimal experience with Python scripting, SQL queries and shell commands. Nothing we do in this course requires deep knowledge of shell or Python scripting, but being comfortable working from the terminal will help.

All the shell commands and scripts in the videos are available on this GitHub repository.


The Google Cloud Platform offers a very powerful set of services for data science, and in my personal experience it is quite a user-friendly environment to work with. Since most of these services are serverless, you can leverage the amazing power of the Google Cloud infrastructure without the pain of setting up, launching and scaling servers manually or programmatically.

Google Cloud is a fast-evolving ecosystem with periodic updates and frequent alpha and beta releases of new features and services. Mastering these Google services will definitely boost your data science knowledge and skills.

Please feel free to drop me a line if you have any questions or comments.

CCIE Bloggers