
"AssertionError: Q2: Your answer is not close enough to ours to be correct."(Introduction to data science in python Coursera)

I need some assistance with my Assignment 4 for Introduction to Data Science in Python.

Question 2 of assignment 4.

For this question, calculate the win/loss ratio's correlation with the population of the city it is in for the NBA using 2018 data.

When I submit the assignment, this is the error I get:

You have failed this test due to an error. The traceback has been removed because it may contain hidden tests. This is the exception that was thrown: AssertionError: Q2: Your answer is not close enough to ours to be correct.

The answer I get is

-0.16087318003059875

I have a feeling the answer must be around

-0.1763635064218294

since that is the value other online resources show for people who have taken the course.

Here is my DataFrame output:

This is my solution for Question 2, run in a Jupyter notebook.

I am not sure how to proceed, since I have worked on this assignment to the best of my ability. Below is my Question 1 code as well, in case it is relevant.

And here is the question to assignment 4.

In this assignment you must read in a file of metropolitan regions and associated sports teams from assets/wikipedia_data.html and answer some questions about each metropolitan region. Each of these regions may have one or more teams from the "Big 4": NFL (football, in assets/nfl.csv), MLB (baseball, in assets/mlb.csv), NBA (basketball, in assets/nba.csv) or NHL (hockey, in assets/nhl.csv). Please keep in mind that all questions are from the perspective of the metropolitan region, and that this file is the "source of authority" for the location of a given sports team. Thus teams which are commonly known by a different area (e.g. "Oakland Raiders") need to be mapped into the metropolitan region given (e.g. San Francisco Bay Area). This will require some human data understanding outside of the data you've been given (e.g. you will have to hand-code some names, and might need to google to find out where teams are)!

For each sport I would like you to answer the question: what is the win/loss ratio's correlation with the population of the city it is in? Win/loss ratio refers to the number of wins over the number of wins plus the number of losses. Remember to calculate the correlation with pearsonr: you are going to send in two ordered lists of values, the populations from the wikipedia_data.html file and the win/loss ratios for a given sport in the same order. Average the win/loss ratios for those cities which have multiple teams of a single sport. Each sport is worth an equal amount (20% * 4 = 80%) of the grade for this assignment. You should only use data from year 2018 for your analysis -- this is important!

Notes:

  • Do not include data about the MLS or CFL in any of the work you are doing; we're only interested in the Big 4 in this assignment.
  • I highly suggest that you first tackle the four correlation questions in order, as they are all similar and worth the majority of grades for this assignment. This is by design!
  • It's fair game to talk with peers about high-level strategy as well as the relationship between metropolitan areas and sports teams. However, do not post code solving aspects of the assignment (including things such as dictionaries mapping areas to teams, or regexes which will clean up names).
  • There may be more teams than the assert statements test; remember to collapse multiple teams in one city into a single value!
  • As this assignment utilizes global variables in the skeleton code, to avoid errors you can either place all of your code within the function definitions for all of the questions (other than import statements), or create copies of all the global variables with the copy() method and proceed as usual.
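For orientation, here is a minimal sketch of the shape this Q2 computation usually takes. It is not a drop-in answer: the column names ("year", "W", "L", "team", and the Wikipedia population column "Population (2016 est.)[8]") are assumptions about the course files, and the hand-coded team-to-metro mapping is deliberately left empty, since posting it is against the course rules.

import pandas as pd
from scipy.stats import pearsonr

def nba_correlation():
    # Metropolitan areas and populations: the "source of authority".
    cities = pd.read_html("assets/wikipedia_data.html")[1]
    cities = cities.iloc[:-1]           # drop the trailing totals row

    nba = pd.read_csv("assets/nba.csv")
    nba = nba[nba["year"] == 2018]      # 2018 data only -- this is important!

    # Strip footnote markers such as "*" or "(4)" from team names, then map
    # each team to its metro area (hand-coded mapping omitted per course rules).
    nba["team"] = nba["team"].str.replace(r"[\*\(].*$", "", regex=True).str.strip()
    team_to_metro = {}                  # e.g. {"Golden State Warriors": "San Francisco Bay Area", ...}
    nba["Metropolitan area"] = nba["team"].map(team_to_metro)

    # Win/loss ratio = W / (W + L); average it where a metro has several teams.
    wl = nba["W"].astype(float) / (nba["W"].astype(float) + nba["L"].astype(float))
    by_metro = wl.groupby(nba["Metropolitan area"]).mean()

    # Align populations and ratios in the same order before correlating.
    pop = cities.set_index("Metropolitan area")["Population (2016 est.)[8]"].astype(float)
    merged = pd.concat([by_metro.rename("WL"), pop.rename("pop")], axis=1, join="inner")
    corr, _ = pearsonr(merged["pop"], merged["WL"])
    return corr

Note that the averaging happens per metropolitan area, not per team: two teams sharing one metro must collapse to a single averaged ratio before pearsonr sees the two ordered lists.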

Here is some sample data to show what the data in the CSV files looks like in a Jupyter notebook:

mlb.csv



Introduction to Data Science with Python

Learn python for data analysis.

Join Harvard University Instructor Pavlos Protopapas in this online course to learn how to use Python to harness and analyze data.

Harvard John A. Paulson School of Engineering and Applied Sciences

What You'll Learn

Every single minute, computers across the world collect millions of gigabytes of data. What can you do to make sense of this mountain of data? How do data scientists use this data for the applications that power our modern world?

Data science is an ever-evolving field, using algorithms and scientific methods to parse complex data sets. Data scientists use a range of programming languages, such as Python and R, to harness and analyze data. This course focuses on using Python in data science. By the end of the course, you'll have a fundamental understanding of machine learning models and of the basic concepts behind Machine Learning (ML) and Artificial Intelligence (AI).

Using Python, learners will study regression models (Linear, Multilinear, and Polynomial) and classification models (kNN, Logistic), utilizing popular libraries such as sklearn, Pandas, matplotlib, and NumPy. The course will cover key concepts of machine learning such as: picking the right complexity, preventing overfitting, regularization, assessing uncertainty, weighing trade-offs, and model evaluation. Participation in this course will build your confidence in using Python, prepare you for more advanced study in Machine Learning (ML) and Artificial Intelligence (AI), and support advancement in your career. Learners must have a minimum baseline of programming knowledge (preferably in Python) and statistics in order to be successful in this course. Python prerequisites can be met with an introductory Python course offered through CS50's Introduction to Programming with Python, and statistics prerequisites can be met via Fat Chance or with Stat110 offered through HarvardX.

The course will be delivered via edX and connect learners around the world. By the end of the course, participants will:

  • Gain hands-on experience and practice using Python to solve real data science challenges
  • Practice Python coding for modeling, statistics, and storytelling
  • Utilize popular libraries such as Pandas, NumPy, matplotlib, and SKLearn
  • Run basic machine learning models using Python, evaluate how those models are performing, and apply those models to real-world problems
  • Build a foundation for the use of Python in machine learning and artificial intelligence, preparing you for future Python study

Your Instructor

Pavlos Protopapas is the Scientific Program Director of the Institute for Applied Computational Science (IACS) at the Harvard John A. Paulson School of Engineering and Applied Sciences. He has had a long and distinguished career as a scientist and data science educator, and currently teaches the CS109 course series for basic and advanced data science at Harvard University, as well as the capstone course (industry-sponsored data science projects) for the IACS master's program at Harvard. Pavlos has a Ph.D. in theoretical physics from the University of Pennsylvania and has focused recently on the use of machine learning and AI in astronomy and computer science. He was Deputy Director of the National Expandable Clusters Program (NSCP) at the University of Pennsylvania, and was instrumental in creating the Initiative in Innovative Computing (IIC) at Harvard. Pavlos has taught multiple courses on machine learning and computational science at Harvard, as well as at summer schools and programs internationally.

Course Overview

  • Linear Regression
  • Multiple and Polynomial Regression
  • Model Selection and Cross-Validation
  • Bias, Variance, and Hyperparameters
  • Classification and Logistic Regression
  • Multi-logistic Regression and Missingness
  • Bootstrap, Confidence Intervals, and Hypothesis Testing
  • Capstone Project

Ways to take this course

When you enroll in this course, you will have the option of pursuing a Verified Certificate or Auditing the Course.

A Verified Certificate costs $299 and provides unlimited access to full course materials, activities, tests, and forums. At the end of the course, learners who earn a passing grade can receive a certificate. 

Alternatively, learners can Audit the course for free and have access to select course material, activities, tests, and forums.  Please note that this track does not offer a certificate for learners who earn a passing grade.

Related Courses

Data Science Professional Certificate

The HarvardX Data Science program prepares you with the necessary knowledge base and useful skills to tackle real-world data analysis challenges.

Machine Learning and AI with Python

Join Harvard University Instructor Pavlos Protopapas to learn how to use decision trees, the foundational algorithm for your understanding of machine learning and artificial intelligence.

Data Science for Business

Designed for managers, this course provides a hands-on approach for demystifying the data science ecosystem and making you a more conscientious consumer of information.


Introduction to Data Science I & II

Introduction

Dan L. Nicolae , Michael J. Franklin , Amanda R. Kube Jotte , Evelyn Campbell, Susanna Lange, Will Trimble, and Jesse London

Forthcoming…

Acknowledgements

Jupyter Book was originally created by Sam Lau and Chris Holdgraf with support of the UC Berkeley Data Science Education Program and the Berkeley Institute for Data Science.


Introduction to Data Science

Introduction to Data Science website: homework assignments and projects.

Table of Contents

Week 1: Introduction to Data Science

In three or four paragraphs:

  • describe why you think the data science field is important. Use information you have learned from the videos, readings and lecture.
  • describe one potential ethical problem related to data science, give an example
  • describe one example of how data science affects your day-to-day life. Does it help or complicate your life? Why or why not?

Week 2: The Data Science Process

You are now informed about the data science process. In three to four paragraphs, explain:

  • which steps are the most important in terms of ethical awareness and societal impact
  • explain why you chose these steps
  • include a current event in which one or multiple steps were ignored or compromised causing an ethical concern. Use newspaper articles, magazines, online news websites or any other legitimate and valid source to cite this example. Cite the news source that you found.

Week 3: Big Data

Big Data has many advantages; however, it can also have a negative impact. Choose one area from the list below where you think Big Data might cause more harm than good:

  • health care
  • personal loans
  • immigration and citizenship
  • incarceration
  • social networks

Then:

  • Describe how you think Big Data can be used to both help and hurt the area you have chosen.
  • Find one current event that backs up your discussion regarding the negative impact of Big Data in the area of your choice. Use newspaper articles, magazines, online news websites or any other legitimate and valid source to cite this example. Cite the news source that you found.

Week 4: Managing Data

  • Read the article: Marr, Bernard. June 15, 2017. "3 Massive Big Data problems everyone should know about."
  • Choose an industry; for example: health care.
  • In a brief paragraph, describe the industry you chose. Cite your sources.
  • In two or three paragraphs discuss how the three Big Data problems cited in the article affect the industry you chose. Give examples. Use newspaper articles, magazines, or online news websites.

Week 5: Statistics for Data Science

Directions: For each question below, show all of your work for full credit. You will not receive full credit for a question if steps are missing.

Question 1: Use the following data set for question 1

82, 66, 70, 59, 90, 78, 76, 95, 99, 84, 88, 76, 82, 81, 91, 64, 79, 76, 85, 90

  • Find the Mean (10pts)
  • Find the Median (10pts)
  • Find the Mode (10pts)
  • Find the Interquartile range (20pts)
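A quick way to sanity-check these values (not a substitute for showing your work) is a few lines of Python; note that np.percentile's default interpolation may differ from the quartile convention your textbook uses:

import numpy as np
from statistics import multimode

scores = [82, 66, 70, 59, 90, 78, 76, 95, 99, 84,
          88, 76, 82, 81, 91, 64, 79, 76, 85, 90]

print("mean:", np.mean(scores))        # 80.55
print("median:", np.median(scores))    # 81.5
print("mode:", multimode(scores))      # [76] -- it appears three times
q1, q3 = np.percentile(scores, [25, 75])
print("IQR:", q3 - q1)                 # with NumPy's default linear interpolation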

Question 2:

  • (20pts) Calculate the variance for the following data set: 10, 15, 5, 12, 20
  • (25pts) Find the linear regression for the following data set: x: 0, 1, 2, 3, 4; y: 2, 3, 5, 4, 6. Show all the steps required to compute the linear regression.
  • (5pts) Use the linear regression calculator to check your answer from (b). Take a screenshot of the entire output from the calculator and include it in your homework for credit.
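The same kind of self-check works for Question 2. Note that np.var divides by n by default and by n - 1 with ddof=1, so confirm which variance convention your course expects:

import numpy as np

data = [10, 15, 5, 12, 20]
print("population variance:", np.var(data))       # 25.04 (divide by n)
print("sample variance:", np.var(data, ddof=1))   # 31.3  (divide by n - 1)

x = np.array([0, 1, 2, 3, 4])
y = np.array([2, 3, 5, 4, 6])
slope, intercept = np.polyfit(x, y, 1)            # least-squares fit
print(f"y = {slope:.1f}x + {intercept:.1f}")      # y = 0.9x + 2.2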

Week 6: Probability for Data Science

Part 1: (Exercise 1 and 2 below)

Exercise 1:

  • 60% of boys play football
  • 36% of boys play ice hockey
  • 40% of boys that play football also play ice hockey
  • What percent of those that play ice hockey also play football?

Exercise 2:

  • 40% of the girls like music
  • 24% of the girls like dance
  • 30% of those that like music also like dance
  • What percent of those that like dance also like music?
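Both exercises are direct applications of Bayes' theorem, P(A|B) = P(B|A) * P(A) / P(B). A minimal check of the arithmetic in Python:

# Exercise 1: P(football | hockey) = P(hockey | football) * P(football) / P(hockey)
p_football, p_hockey, p_hockey_given_football = 0.60, 0.36, 0.40
print(p_hockey_given_football * p_football / p_hockey)   # 0.666... -> about 67%

# Exercise 2: P(music | dance) = P(dance | music) * P(music) / P(dance)
p_music, p_dance, p_dance_given_music = 0.40, 0.24, 0.30
print(p_dance_given_music * p_music / p_dance)           # 0.5 -> 50%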

Part 2: In two paragraphs, explain why Bayes' theorem is important in the Data Science field and how it is used. Cite your sources.

Week 7: Natural Language Processing

Natural Language Processing is used in our everyday lives. Some examples include: Google Translate, chatbots on various websites, Siri, Alexa, or Google Assistant, auto-correct or auto-complete in software and search engines, relevant results on search pages, voice text messaging, and many others. In four or five paragraphs:

  • Choose one such example of NLP use
  • Describe the example you chose: explain its features, its audience or users, and its benefits
  • Find a current event in which the example of your choice had a negative impact. Explain what occurred and discuss the negative impact. Use newspaper articles, magazines, online news websites or any other legitimate and valid source to cite this example. Cite the news source that you found.
  • Did the negative impact change the way you think about this NLP example? Why or why not?

Week 8 and 9: Data Mining

  • Using the knowledge you have gained from the lecture, readings and videos, choose an industry of your liking (for example: finance, healthcare, education, entertainment)
  • In one paragraph give a brief description of the industry
  • Discuss how data mining is used in this industry
  • Discuss how data mining in this industry can result in ethical concerns
  • Discuss one current event in which the ethical concern was realized. Use newspaper articles, magazines, online news websites or any other legitimate and valid source to cite this example. Cite the news source that you found.

Week 10 and 11: Machine Learning

After reviewing the lecture, readings and videos, use the following three Machine Learning tools:

  • Machine Learning for Kids
  • Quick, Draw!
  • Teachable Machine

For each tool:

  • identify the target audience
  • discuss the use of this tool by the target audience
  • identify the tool's benefits and drawbacks

Relate your discussion to the following machine learning concepts:

  • Predictive analytics
  • Descriptive analytics
  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning
  • Neural networks/deep learning

Week 12: Machine Learning Process

Read the article: Ricci, Shaun. September 28, 2017. "4 Transformative Benefits of AI In Hiring." Ideal.

  • There are many advantages to implementing hiring models; however, there are also many disadvantages.
  • Find one current event in which a hiring algorithm was used and produced biased or negative results. Use newspaper articles, magazines, online news websites or any other legitimate and valid source to cite this example. Cite the news source that you found.

Week 13: Data Visualization

There are many advantages to presenting big data visually using data visualization methods. However, there are also some disadvantages. Read the following two short articles:

  • Kakande, Arthur. February 12. "What's in a chart? A Step-by-Step guide to Identifying Misinformation in Data Visualization." Medium
  • Foley, Katherine Ellen. June 25, 2020. "How bad Covid-19 data visualizations mislead the public." Quartz
  • Research a current event which highlights the results of misinformation based on data visualization. Explain how the data visualization method failed to present accurate information. Use newspaper articles, magazines, online news websites or any other legitimate and valid source to cite this example. Cite the news source that you found.

Week 14: Data Science and Ethics

Watch the short KMOV St. Louis news video on YouTube regarding the installation of smart street lights in the St. Louis downtown area. (Zotos, Alexis. January 8, 2020. "'Smart' street lights being installed around downtown in hopes to increase safety.")

  • What are the benefits associated with using these street lights?
  • Explain how these street lights are an example of data science in use. Be specific.

What are the ethical implications of using these types of street lights? Back up your argument with information found in newspaper articles, magazines, online news websites or any other legitimate and valid source. Cite the news source that you found.

Project 1 [tba]

Project 2 [tba]

Project 3 [tba]



Introduction to Data Science

A Python Approach to Concepts, Techniques and Applications

  • © 2024, latest edition
  • Laura Igual, Santi Seguí

Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Barcelona, Spain


  • Describes tools and techniques that demystify data science
  • Discusses Python extensions, techniques and modules to perform statistical analysis and machine learning
  • Includes case studies, and supplies code examples and data at an associated website

Part of the book series: Undergraduate Topics in Computer Science (UTICS)


About this book

This accessible and classroom-tested textbook/reference presents an introduction to the fundamentals of the interdisciplinary field of data science. The coverage spans key concepts from statistics, machine/deep learning and responsible data science, useful techniques for network analysis and natural language processing, and practical applications of data science such as recommender systems or sentiment analysis. 

Topics and features:  

  • Provides numerous practical case studies using real-world data throughout the book 
  • Supports understanding through hands-on experience of solving data science problems using Python 
  • Describes concepts, techniques and tools for statistical analysis, machine learning, graph analysis, natural language processing, deep learning and responsible data science
  • Reviews a range of applications of data science, including recommender systems and sentiment analysis of text data 
  • Provides supplementary code resources and data at an associated website 

This practically-focused textbook provides an ideal introduction to the field for upper-tier undergraduate and beginning graduate students from computer science, mathematics, statistics, and other technical disciplines. The work is also eminently suitable for professionals on continuous education short courses, and to researchers following self-study courses.

  • Data Science
  • Parallel Computing
  • Python Programming
  • Statistical Inference
  • Graph Analysis

Table of contents (12 chapters)

Front matter

  • Introduction to Data Science (Laura Igual, Santi Seguí)
  • Data Science Tools (Eloi Puertas)
  • Descriptive Statistics
  • Statistical Inference
  • Supervised Learning
  • Regression Analysis
  • Unsupervised Learning
  • Network Analysis
  • Recommender Systems
  • Basics of Natural Language Processing
  • Deep Learning
  • Responsible Data Science

Back matter

Authors and Affiliations

About the Authors

Dr. Laura Igual  is an Associate Professor at the Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Spain.  Dr. Santi Seguí  is an Associate Professor at the same institution.

The authors wish to mention that some chapters were co-written by Jordi Vitrià, Eloi Puertas, Petia Radeva, Oriol Pujol, and Sergio Escalera.

Bibliographic Information

Book Title : Introduction to Data Science

Book Subtitle : A Python Approach to Concepts, Techniques and Applications

Authors : Laura Igual, Santi Seguí

Series Title : Undergraduate Topics in Computer Science

DOI : https://doi.org/10.1007/978-3-031-48956-3

Publisher : Springer Cham

eBook Packages : Computer Science , Computer Science (R0)

Copyright Information : The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2024

Softcover ISBN : 978-3-031-48955-6 Published: 13 April 2024

eBook ISBN : 978-3-031-48956-3 Published: 12 April 2024

Series ISSN : 1863-7310

Series E-ISSN : 2197-1781

Edition Number : 2

Number of Pages : XIV, 246

Number of Illustrations : 4 b/w illustrations, 78 illustrations in colour

Topics : Data Structures and Information Theory , Artificial Intelligence , Data Mining and Knowledge Discovery , Python



Hands-On Numerical Derivative with Python, from Zero to Hero

Here’s everything you need to know (beyond the standard definition) to master the numerical derivative world.

Piero Paialunga

Piero Paialunga

Towards Data Science

There is a legendary statement that you can find in at least one lab at every university and it goes like this:

Theory is when you know everything but nothing works. Practice is when everything works but no one knows why. In this lab, we combine theory and practice: nothing works and nobody knows why

I find this sentence so relatable in the data science world. I say this because data science starts as a mathematical problem (theory): you need to minimize a loss function. Nonetheless, when you get to real life (experiment/lab) things start to get very messy and your perfect theoretical world assumptions might not work anymore (they never do), and you don't know why.

For example, take the concept of derivative. Everybody who deals with complex concepts of data science knows (or, even better, MUST know) what a derivative is. But then how do you apply the elegant and theoretical concept of derivative in real life, on a noisy signal, where you don't have the analytic…
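The story is truncated here, but the setup can be illustrated with a short, self-contained sketch (a hypothetical example, not the author's code): a finite-difference derivative of a noisy signal via np.gradient, plus the simplest remedy of smoothing first.

import numpy as np

# A smooth signal plus measurement noise.
t = np.linspace(0, 2 * np.pi, 500)
rng = np.random.default_rng(0)
signal = np.sin(t) + rng.normal(scale=0.05, size=t.size)

# Naive numerical derivative: central differences via np.gradient.
# Differentiation amplifies noise, because a noisy difference gets
# divided by a small step size.
dsignal = np.gradient(signal, t)

# One common remedy: smooth before differentiating (moving average here).
window = 25
smooth = np.convolve(signal, np.ones(window) / window, mode="same")
dsmooth = np.gradient(smooth, t)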


Written by Piero Paialunga

PhD in Aerospace Engineering at the University of Cincinnati. Machine Learning Engineer @ Gen Nine, Martial Artist, Coffee Drinker, from Italy.


shantanuatgit / Assignment3.py (GitHub Gist)
Assignment 3 - More Pandas
This assignment requires more individual learning than the last one did - you are encouraged to check out the pandas documentation to find functions or methods you might not have used yet, or ask questions on Stack Overflow and tag them as pandas and python related. And of course, the discussion forums are open for interaction with your peers and the course staff.
Question 1 (20%)
Load the energy data from the file Energy Indicators.xls, which is a list of indicators of energy supply and renewable electricity production from the United Nations for the year 2013, and should be put into a DataFrame with the variable name of energy.
Keep in mind that this is an Excel file, and not a comma separated values file. Also, make sure to exclude the footer and header information from the datafile. The first two columns are unnecessary, so you should get rid of them, and you should change the column labels so that the columns are:
['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']
Convert Energy Supply to gigajoules (there are 1,000,000 gigajoules in a petajoule). For all countries which have missing data (e.g. data with "...") make sure this is reflected as np.NaN values.
Rename the following list of countries (for use in later questions):
"Republic of Korea": "South Korea",
"United States of America": "United States",
"United Kingdom of Great Britain and Northern Ireland": "United Kingdom",
"China, Hong Kong Special Administrative Region": "Hong Kong"
There are also several countries with numbers and/or parenthesis in their name. Be sure to remove these,
e.g.
'Bolivia (Plurinational State of)' should be 'Bolivia',
'Switzerland17' should be 'Switzerland'.
Next, load the GDP data from the file world_bank.csv, which is a csv containing countries' GDP from 1960 to 2015 from World Bank. Call this DataFrame GDP.
Make sure to skip the header, and rename the following list of countries:
"Korea, Rep.": "South Korea",
"Iran, Islamic Rep.": "Iran",
"Hong Kong SAR, China": "Hong Kong"
Finally, load the Scimago Journal and Country Rank data for Energy Engineering and Power Technology from the file scimagojr-3.xlsx, which ranks countries based on their journal contributions in the aforementioned area. Call this DataFrame ScimEn.
Join the three datasets: GDP, Energy, and ScimEn into a new dataset (using the intersection of country names). Use only the last 10 years (2006-2015) of GDP data and only the top 15 countries by Scimagojr 'Rank' (Rank 1 through 15).
The index of this DataFrame should be the name of the country, and the columns should be ['Rank', 'Documents', 'Citable documents', 'Citations', 'Self-citations', 'Citations per document', 'H index', 'Energy Supply', 'Energy Supply per Capita', '% Renewable', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015'].
This function should return a DataFrame with 20 columns and 15 entries.
import pandas as pd
import numpy as np
def answer_one():
    # Load the UN energy indicators, keeping only the data rows.
    energy = pd.read_excel('Energy Indicators.xls')
    energy = energy[16:243]
    energy.drop(['Unnamed: 0', 'Unnamed: 1'], axis=1, inplace=True)
    energy = energy.rename(columns={'Environmental Indicators: Energy': 'Country',
                                    'Unnamed: 3': 'Energy Supply',
                                    'Unnamed: 4': 'Energy Supply per Capita',
                                    'Unnamed: 5': '% Renewable'})
    energy = energy.replace('...', np.nan)  # np.NaN in older numpy versions
    energy['Energy Supply'] *= 1000000      # petajoules -> gigajoules
    energy['Country'] = energy['Country'].str.replace(r'\d+', '', regex=True)
    def braces(data):
        # Strip any parenthesized suffix, e.g. 'Bolivia (Plurinational State of)'.
        i = data.find('(')
        if i > -1:
            data = data[:i]
        return data.strip()
    energy['Country'] = energy['Country'].apply(braces)
    d = {"Republic of Korea": "South Korea",
         "United States of America": "United States",
         "United Kingdom of Great Britain and Northern Ireland": "United Kingdom",
         "China, Hong Kong Special Administrative Region": "Hong Kong",
         "Bolivia (Plurinational State of)": "Bolivia",
         "Switzerland17": "Switzerland"}
    energy.replace({"Country": d}, inplace=True)
    GDP = pd.read_csv('world_bank.csv', skiprows=4)
    GDP.replace({"Korea, Rep.": "South Korea",
                 "Iran, Islamic Rep.": "Iran",
                 "Hong Kong SAR, China": "Hong Kong"}, inplace=True)
    GDP.rename(columns={'Country Name': 'Country'}, inplace=True)
    ScimEn = pd.read_excel('scimagojr-3.xlsx')
    df1 = pd.merge(energy, GDP, how='inner', on='Country')
    df = pd.merge(df1, ScimEn, how='inner', on='Country')
    df.set_index('Country', inplace=True)
    df = df[['Rank', 'Documents', 'Citable documents', 'Citations', 'Self-citations',
             'Citations per document', 'H index', 'Energy Supply',
             'Energy Supply per Capita', '% Renewable', '2006', '2007', '2008',
             '2009', '2010', '2011', '2012', '2013', '2014', '2015']]
    df = df.sort_values('Rank')  # DataFrame.sort() was removed in newer pandas
    df = df.head(15)
    return df
answer_one()
Rank Documents Citable documents Citations Self-citations Citations per document H index Energy Supply Energy Supply per Capita % Renewable 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
Country
China 1 127050 126767 597237 411683 4.70 138 1.271910e+11 93.0 19.754910 3.992331e+12 4.559041e+12 4.997775e+12 5.459247e+12 6.039659e+12 6.612490e+12 7.124978e+12 7.672448e+12 8.230121e+12 8.797999e+12
United States 2 96661 94747 792274 265436 8.20 230 9.083800e+10 286.0 11.570980 1.479230e+13 1.505540e+13 1.501149e+13 1.459484e+13 1.496437e+13 1.520402e+13 1.554216e+13 1.577367e+13 1.615662e+13 1.654857e+13
Japan 3 30504 30287 223024 61554 7.31 134 1.898400e+10 149.0 10.232820 5.496542e+12 5.617036e+12 5.558527e+12 5.251308e+12 5.498718e+12 5.473738e+12 5.569102e+12 5.644659e+12 5.642884e+12 5.669563e+12
United Kingdom 4 20944 20357 206091 37874 9.84 139 7.920000e+09 124.0 10.600470 2.419631e+12 2.482203e+12 2.470614e+12 2.367048e+12 2.403504e+12 2.450911e+12 2.479809e+12 2.533370e+12 2.605643e+12 2.666333e+12
Russian Federation 5 18534 18301 34266 12422 1.85 57 3.070900e+10 214.0 17.288680 1.385793e+12 1.504071e+12 1.583004e+12 1.459199e+12 1.524917e+12 1.589943e+12 1.645876e+12 1.666934e+12 1.678709e+12 1.616149e+12
Canada 6 17899 17620 215003 40930 12.01 149 1.043100e+10 296.0 61.945430 1.564469e+12 1.596740e+12 1.612713e+12 1.565145e+12 1.613406e+12 1.664087e+12 1.693133e+12 1.730688e+12 1.773486e+12 1.792609e+12
Germany 7 17027 16831 140566 27426 8.26 126 1.326100e+10 165.0 17.901530 3.332891e+12 3.441561e+12 3.478809e+12 3.283340e+12 3.417298e+12 3.542371e+12 3.556724e+12 3.567317e+12 3.624386e+12 3.685556e+12
India 8 15005 14841 128763 37209 8.58 115 3.319500e+10 26.0 14.969080 1.265894e+12 1.374865e+12 1.428361e+12 1.549483e+12 1.708459e+12 1.821872e+12 1.924235e+12 2.051982e+12 2.200617e+12 2.367206e+12
France 9 13153 12973 130632 28601 9.93 114 1.059700e+10 166.0 17.020280 2.607840e+12 2.669424e+12 2.674637e+12 2.595967e+12 2.646995e+12 2.702032e+12 2.706968e+12 2.722567e+12 2.729632e+12 2.761185e+12
South Korea 10 11983 11923 114675 22595 9.57 104 1.100700e+10 221.0 2.279353 9.410199e+11 9.924316e+11 1.020510e+12 1.027730e+12 1.094499e+12 1.134796e+12 1.160809e+12 1.194429e+12 1.234340e+12 1.266580e+12
Italy 11 10964 10794 111850 26661 10.20 106 6.530000e+09 109.0 33.667230 2.202170e+12 2.234627e+12 2.211154e+12 2.089938e+12 2.125185e+12 2.137439e+12 2.077184e+12 2.040871e+12 2.033868e+12 2.049316e+12
Spain 12 9428 9330 123336 23964 13.08 115 4.923000e+09 106.0 37.968590 1.414823e+12 1.468146e+12 1.484530e+12 1.431475e+12 1.431673e+12 1.417355e+12 1.380216e+12 1.357139e+12 1.375605e+12 1.419821e+12
Iran 13 8896 8819 57470 19125 6.46 72 9.172000e+09 119.0 5.707721 3.895523e+11 4.250646e+11 4.289909e+11 4.389208e+11 4.677902e+11 4.853309e+11 4.532569e+11 4.445926e+11 4.639027e+11 NaN
Australia 14 8831 8725 90765 15606 10.28 107 5.386000e+09 231.0 11.810810 1.021939e+12 1.060340e+12 1.099644e+12 1.119654e+12 1.142251e+12 1.169431e+12 1.211913e+12 1.241484e+12 1.272520e+12 1.301251e+12
Brazil 15 8668 8596 60702 14396 7.00 86 1.214900e+10 59.0 69.648030 1.845080e+12 1.957118e+12 2.056809e+12 2.054215e+12 2.208872e+12 2.295245e+12 2.339209e+12 2.409740e+12 2.412231e+12 2.319423e+12
Question 2 (6.6%)
The previous question joined three datasets then reduced this to just the top 15 entries. When you joined the datasets, but before you reduced this to the top 15 items, how many entries did you lose?
This function should return a single number.
%%HTML
<svg width="800" height="300">
<circle cx="150" cy="180" r="80" fill-opacity="0.2" stroke="black" stroke-width="2" fill="blue" />
<circle cx="200" cy="100" r="80" fill-opacity="0.2" stroke="black" stroke-width="2" fill="red" />
<circle cx="100" cy="100" r="80" fill-opacity="0.2" stroke="black" stroke-width="2" fill="green" />
<line x1="150" y1="125" x2="300" y2="150" stroke="black" stroke-width="2" fill="black" stroke-dasharray="5,3"/>
<text x="300" y="165" font-family="Verdana" font-size="35">Everything but this!</text>
</svg>
def answer_two():
    # The merged frames are local to answer_one(), so the join sizes were
    # computed once and hard-coded:
    #   outer = pd.merge(pd.merge(energy, GDP, how='outer', on='Country'),
    #                    ScimEn, how='outer', on='Country')   # 318 rows
    #   inner = pd.merge(pd.merge(energy, GDP, how='inner', on='Country'),
    #                    ScimEn, how='inner', on='Country')   # 162 rows
    return 318 - 162
answer_two()
156
Answer the following questions in the context of only the top 15 countries by Scimagojr Rank (aka the DataFrame returned by answer_one())
Question 3 (6.6%)
What is the average GDP over the last 10 years for each country? (exclude missing values from this calculation.)
This function should return a Series named avgGDP with 15 countries and their average GDP sorted in descending order.
def answer_three():
    Top15 = answer_one()
    years = Top15[['2006', '2007', '2008', '2009', '2010',
                   '2011', '2012', '2013', '2014', '2015']]
    # Row-wise mean skips NaN by default, so missing values are excluded.
    Top15['avgGDP'] = years.mean(axis=1)
    return Top15['avgGDP'].sort_values(ascending=False)
answer_three()
Country
United States 1.536434e+13
China 6.348609e+12
Japan 5.542208e+12
Germany 3.493025e+12
France 2.681725e+12
United Kingdom 2.487907e+12
Brazil 2.189794e+12
Italy 2.120175e+12
India 1.769297e+12
Canada 1.660647e+12
Russian Federation 1.565459e+12
Spain 1.418078e+12
Australia 1.164043e+12
South Korea 1.106715e+12
Iran 4.441558e+11
Name: avgGDP, dtype: float64
Question 4 (6.6%)
By how much had the GDP changed over the 10 year span for the country with the 6th largest average GDP?
This function should return a single number.
def answer_four():
    Top15 = answer_one()
    Top15['avgGDP'] = answer_three()
    Top15.sort_values(['avgGDP'], ascending=False, inplace=True)
    # 6th largest average GDP -> row position 5 after sorting.
    return abs(Top15.iloc[5]['2006'] - Top15.iloc[5]['2015'])
answer_four()
246702696075.3999
Question 5 (6.6%)
What is the mean Energy Supply per Capita?
This function should return a single number.
def answer_five():
    Top15 = answer_one()
    return Top15['Energy Supply per Capita'].mean()
answer_five()
157.59999999999999
Question 6 (6.6%)
What country has the maximum % Renewable and what is the percentage?
This function should return a tuple with the name of the country and the percentage.
def answer_six():
    Top15 = answer_one()
    # idxmax returns the index label (the country); in newer pandas,
    # Series.argmax returns an integer position instead.
    return (Top15['% Renewable'].idxmax(), Top15['% Renewable'].max())
answer_six()
('Brazil', 69.648030000000006)
Question 7 (6.6%)
Create a new column that is the ratio of Self-Citations to Total Citations. What is the maximum value for this new column, and what country has the highest ratio?
This function should return a tuple with the name of the country and the ratio.
def answer_seven():
    Top15 = answer_one()
    Top15['Ratio'] = Top15['Self-citations'] / Top15['Citations']
    return Top15['Ratio'].max(), Top15['Ratio'].idxmax()
answer_seven()
(0.68931261793894216, 'China')
Question 8 (6.6%)
Create a column that estimates the population using Energy Supply and Energy Supply per capita. What is the third most populous country according to this estimate?
This function should return a single string value.
def answer_eight():
    Top15 = answer_one()
    # Population estimate = total energy supply / energy supply per capita.
    Top15['popEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
    Top15.sort_values('popEst', ascending=False, inplace=True)
    return Top15.iloc[2].name  # third most populous
answer_eight()
'United States'
Question 9 (6.6%)
Create a column that estimates the number of citable documents per person. What is the correlation between the number of citable documents per capita and the energy supply per capita? Use the .corr() method (Pearson's correlation).
This function should return a single number.
(Optional: Use the built-in function plot9() to visualize the relationship between Energy Supply per Capita vs. Citable docs per Capita)
def answer_nine():
    Top15 = answer_one()
    Top15['popEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
    Top15['Citable docs per Capita'] = Top15['Citable documents'] / Top15['popEst']
    return Top15['Citable docs per Capita'].corr(Top15['Energy Supply per Capita'])
answer_nine()
0.79400104354429457
def plot9():
    import matplotlib.pyplot as plt   # pyplot, not the bare matplotlib package
    %matplotlib inline
    Top15 = answer_one()
    Top15['PopEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
    Top15['Citable docs per Capita'] = Top15['Citable documents'] / Top15['PopEst']
    Top15.plot(x='Citable docs per Capita', y='Energy Supply per Capita',
               kind='scatter', xlim=[0, 0.0006])

#plot9()  # Be sure to comment out plot9() before submitting the assignment!
Question 10 (6.6%)
Create a new column with a 1 if the country's % Renewable value is at or above the median for all countries in the top 15, and a 0 if the country's % Renewable value is below the median.
This function should return a series named HighRenew whose index is the country name sorted in ascending order of rank.
def answer_ten():
    Top15 = answer_one()
    limit = Top15['% Renewable'].median()
    Top15['HighRenew'] = np.where(Top15['% Renewable'] >= limit, 1, 0)
    Top15.sort_values('Rank', ascending=True, inplace=True)
    return Top15['HighRenew']
answer_ten()
Country
China 1
United States 0
Japan 0
United Kingdom 0
Russian Federation 1
Canada 1
Germany 1
India 0
France 1
South Korea 0
Italy 1
Spain 1
Iran 0
Australia 0
Brazil 1
Name: HighRenew, dtype: int64
Question 11 (6.6%)
Use the following dictionary to group the Countries by Continent, then create a DataFrame that displays the sample size (the number of countries in each continent bin), and the sum, mean, and standard deviation for the estimated population of each country.
ContinentDict = {'China':'Asia',
'United States':'North America',
'Japan':'Asia',
'United Kingdom':'Europe',
'Russian Federation':'Europe',
'Canada':'North America',
'Germany':'Europe',
'India':'Asia',
'France':'Europe',
'South Korea':'Asia',
'Italy':'Europe',
'Spain':'Europe',
'Iran':'Asia',
'Australia':'Australia',
'Brazil':'South America'}
This function should return a DataFrame with index named Continent ['Asia', 'Australia', 'Europe', 'North America', 'South America'] and columns ['size', 'sum', 'mean', 'std']
def answer_eleven():
    Top15 = answer_one()
    ContinentDict = {'China': 'Asia', 'United States': 'North America',
                     'Japan': 'Asia', 'United Kingdom': 'Europe',
                     'Russian Federation': 'Europe', 'Canada': 'North America',
                     'Germany': 'Europe', 'India': 'Asia', 'France': 'Europe',
                     'South Korea': 'Asia', 'Italy': 'Europe', 'Spain': 'Europe',
                     'Iran': 'Asia', 'Australia': 'Australia',
                     'Brazil': 'South America'}
    df = pd.DataFrame(columns=['size', 'sum', 'mean', 'std'])
    Top15['popEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
    # Grouping by a dict maps each index label (country) to its continent.
    for group, frame in Top15.groupby(ContinentDict):
        df.loc[group] = [len(frame), frame['popEst'].sum(),
                         frame['popEst'].mean(), frame['popEst'].std()]
    return df
answer_eleven()
size sum mean std
Asia 5.0 2.898666e+09 5.797333e+08 6.790979e+08
Australia 1.0 2.331602e+07 2.331602e+07 NaN
Europe 6.0 4.579297e+08 7.632161e+07 3.464767e+07
North America 2.0 3.528552e+08 1.764276e+08 1.996696e+08
South America 1.0 2.059153e+08 2.059153e+08 NaN
Question 12 (6.6%)
Cut % Renewable into 5 bins. Group Top15 by the Continent, as well as these new % Renewable bins. How many countries are in each of these groups?
This function should return a Series with a MultiIndex of Continent, then the bins for % Renewable. Do not include groups with no countries.
def answer_twelve():
    Top15 = answer_one()
    ContinentDict = {'China': 'Asia', 'United States': 'North America',
                     'Japan': 'Asia', 'United Kingdom': 'Europe',
                     'Russian Federation': 'Europe', 'Canada': 'North America',
                     'Germany': 'Europe', 'India': 'Asia', 'France': 'Europe',
                     'South Korea': 'Asia', 'Italy': 'Europe', 'Spain': 'Europe',
                     'Iran': 'Asia', 'Australia': 'Australia',
                     'Brazil': 'South America'}
    Top15['Bins'] = pd.cut(Top15['% Renewable'], 5)
    return Top15.groupby([ContinentDict, Top15['Bins']]).size()
answer_twelve()
Bins
Asia (2.212, 15.753] 4
(15.753, 29.227] 1
Australia (2.212, 15.753] 1
Europe (2.212, 15.753] 1
(15.753, 29.227] 3
(29.227, 42.701] 2
North America (2.212, 15.753] 1
(56.174, 69.648] 1
South America (56.174, 69.648] 1
dtype: int64
Question 13 (6.6%)
Convert the Population Estimate series to a string with thousands separator (using commas). Do not round the results.
e.g. 317615384.61538464 -> 317,615,384.61538464
This function should return a Series PopEst whose index is the country name and whose values are the population estimate string.
def answer_thirteen():
    Top15 = answer_one()
    Top15['popEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
    # '{:,}'.format inserts thousands separators without rounding.
    Top15['popEst'] = Top15['popEst'].apply('{:,}'.format)
    return Top15['popEst']
answer_thirteen()
Country
China 1,367,645,161.2903225
United States 317,615,384.61538464
Japan 127,409,395.97315437
United Kingdom 63,870,967.741935484
Russian Federation 143,500,000.0
Canada 35,239,864.86486486
Germany 80,369,696.96969697
India 1,276,730,769.2307692
France 63,837,349.39759036
South Korea 49,805,429.864253394
Italy 59,908,256.880733944
Spain 46,443,396.2264151
Iran 77,075,630.25210084
Australia 23,316,017.316017315
Brazil 205,915,254.23728815
Name: popEst, dtype: object
Optional
Use the built-in function plot_optional() to see an example visualization.
def plot_optional():
    import matplotlib.pyplot as plt   # pyplot, not the bare matplotlib package
    %matplotlib inline
    Top15 = answer_one()
    ax = Top15.plot(x='Rank', y='% Renewable', kind='scatter',
                    c=['#e41a1c', '#377eb8', '#e41a1c', '#4daf4a', '#4daf4a',
                       '#377eb8', '#4daf4a', '#e41a1c', '#4daf4a', '#e41a1c',
                       '#4daf4a', '#4daf4a', '#e41a1c', '#dede00', '#ff7f00'],
                    xticks=range(1, 16), s=6 * Top15['2014'] / 10**10,
                    alpha=.75, figsize=[16, 6])
    for i, txt in enumerate(Top15.index):
        ax.annotate(txt, [Top15['Rank'][i], Top15['% Renewable'][i]], ha='center')
    print("This is an example of a visualization that can be created to help understand the data. "
          "This is a bubble chart showing % Renewable vs. Rank. The size of the bubble corresponds "
          "to the countries' 2014 GDP, and the color corresponds to the continent.")

#plot_optional()
