SQL for Data Science - Notes
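To be completed
Note: for each type of coding problem, only one or two examples are recorded; concept questions are not recorded at all (see the lecture handouts)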
Module 1:Select and Retrieve Data with SQL
In-class exercises
- Retrieve all the data from the tracks table
Select *
From Tracks;
DATA: all the data
TABLE: tracks table
- Return the playlist id, and name from the playlists table
Select Playlistid,
Name
From Playlists;
DATA: playlist id and name
TABLE: playlists table
- Select all the columns from the Playlist Track table and limit the results to 10 records
Select *
From Playlist_track
Limit 10;
DATA: all the columns
TABLE: playlist_track table
FILTER: limit 10 results
Coding assessment
Too simple; nothing recorded
Module 2:Filtering, Sorting, and Calculating Data with SQL
In-class exercises
- Find the distinct values for the extended step from salary_range_by_job_classification
SELECT
distinct Extended_step
From salary_range_by_job_classification
DATA: extended step
TABLE: salary_range_by_job_classification
FILTER: distinct
- Excluding $0.00, what is the minimum bi-weekly high rate of pay (please include the dollar sign and decimal point in your answer)
Select
min(Biweekly_high_Rate)
From salary_range_by_job_classification
Where Biweekly_high_Rate <> '$0.00'
DATA: bi-weekly high rate of pay
CAL: minimum
TABLE: salary_range_by_job_classification
FILTER: excluding $0.00; the <> operator means "not equal", so it excludes that value
- What is the pay type for all the job codes that start with ‘03’
Select
job_code,
pay_type
From salary_range_by_job_classification
Where job_code Like '03%'
DATA: pay type, job codes
TABLE: salary_range_by_job_classification
FILTER: start with ‘03’
- Run a query to find the Effective Date (eff_date) or Salary End Date (sal_end_date) for grade Q90H0
Select
grade,
eff_date,
sal_end_date
From salary_range_by_job_classification
Where grade = 'Q90H0'
DATA: Effective Date (eff_date), Salary End Date (sal_end_date), grade
P.S. Every column mentioned in the question should appear in the SELECT list
TABLE: salary_range_by_job_classification
FILTER: for grade Q90H0
- What Step are Job Codes 0110-0400
SELECT
Step
,Job_Code
From salary_range_by_job_classification
Where Job_Code Between '0110' AND '0400'
DATA: Step, Job Codes
P.S. Every column mentioned in the question should appear in the SELECT list
TABLE: salary_range_by_job_classification
FILTER: Job Codes 0110-0400
- Sort the Biweekly low rate in ascending order
SELECT
Biweekly_Low_Rate
From salary_range_by_job_classification
Order by Biweekly_Low_Rate ASC
DATA: Biweekly low rate
TABLE: salary_range_by_job_classification
ORDER: in ascending order
- What is the Biweekly High Rate minus the Biweekly Low Rate for job Code 0170
SELECT
(Biweekly_High_Rate - Biweekly_Low_Rate) AS Dif
,Job_Code
From salary_range_by_job_classification
Where Job_Code='0170'
DATA: Biweekly High Rate, Biweekly Low Rate, Job Code
CAL: Biweekly High Rate minus the Biweekly Low Rate
TABLE: salary_range_by_job_classification
FILTER: job Code 0170
- What is the Extended Step for Pay Types M, H, and D
SELECT
Extended_Step
,Pay_Type
From salary_range_by_job_classification
Where Pay_Type IN ('M', 'H', 'D')
DATA: Extended Step, Pay Types
TABLE: salary_range_by_job_classification
FILTER: Pay Types M, H, and D
- What is the step for Union Code 990 and a Set ID of SFMTA or COMMN
SELECT
Step
,Union_Code
,SetID
From salary_range_by_job_classification
Where Union_Code = '990' AND (SetID = 'SFMTA' OR SetID = 'COMMN')
DATA: step, Union Code, Set ID
TABLE: salary_range_by_job_classification
FILTER: Union Code 990 and a Set ID of SFMTA or COMMN
Coding assessment
All of the questions in this quiz refer to the open source Chinook Database. Please study the ER diagram to familiarize yourself with the table and column names so that you can write accurate queries and get the appropriate answers.
- Find all the tracks that have a length of 5,000,000 milliseconds or more
Select *
From Tracks
Where Milliseconds >= 5000000
DATA: all the tracks
TABLE: tracks table (in Chinook Database)
FILTER: 5,000,000 milliseconds or more
- Find all the invoices whose total is between $5 and $15 dollars
Select *
From Invoices
Where Total Between 5 AND 15
DATA: all the invoices
TABLE: invoices table (in Chinook Database)
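FILTER: total is between $5 and $15 (5 and 15 are numbers here, not strings, so use BETWEEN ... AND)
These two problems are included to show that, when querying the Chinook Database, the DATA and TABLE are identified a little differently from the earlier exercises; compare them carefully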
- Find all the customers from the following States: RJ, DF, AB, BC, CA, WA, NY
Select *
From Customers
Where State IN ('RJ','DF','AB','BC','CA','WA','NY')
DATA: all the customers
TABLE: customers table (in Chinook Database)
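FILTER: customers from the following States: RJ, DF, AB, BC, CA, WA, NY (these are all strings, so IN is convenient; IN is effectively a chain of ORs, and according to the course it typically executes faster than the equivalent OR list)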
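The claim that IN behaves like a chain of ORs can be checked with a small sketch. It uses Python's built-in sqlite3 module and a made-up customers table (the rows are illustrative, not Chinook data), and it only compares the results, not the speed.

```python
import sqlite3

# Toy stand-in for the customers table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (CustomerId INTEGER PRIMARY KEY, State TEXT);
INSERT INTO customers (State) VALUES ('CA'), ('NY'), ('TX'), (NULL), ('WA');
""")

# Filter with IN ...
in_rows = conn.execute(
    "SELECT CustomerId, State FROM customers "
    "WHERE State IN ('CA', 'NY', 'WA') ORDER BY CustomerId"
).fetchall()

# ... and with the equivalent chain of OR conditions.
or_rows = conn.execute(
    "SELECT CustomerId, State FROM customers "
    "WHERE State = 'CA' OR State = 'NY' OR State = 'WA' ORDER BY CustomerId"
).fetchall()

print(in_rows)
print(or_rows)
```

Both queries return the same three rows; the row with a NULL state is excluded either way.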
- Find all the invoices for customer 56 and 58 where the total was between $1.00 and $5.00
Select *
From Invoices
Where (CustomerId IN (56, 58)) AND (Total Between 1 AND 5)
DATA: all the invoices
TABLE: invoices table (in Chinook Database)
FILTER: customer 56 and 58 (CustomerId is an integer column, so the values are written as numbers); the total was between $1.00 and $5.00 (numbers)
- Find all the tracks whose name starts with ‘All’
Select *
From Tracks
Where Name Like 'All%'
DATA: all the tracks
TABLE: tracks table (in Chinook Database)
FILTER: name starts with ‘All’
- Find all the customer emails that start with “J” and are from gmail.com
Select Email
From Customers
Where Email Like 'J%@gmail.com'
DATA: email (there is no single correct answer here; selecting all columns also works)
TABLE: customers table (in Chinook Database)
FILTER: start with “J” and are from gmail.com
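A small sketch of how the % wildcard behaves in this kind of pattern, using Python's built-in sqlite3 module. The email addresses are made up for illustration. It compares a pattern that ends in gmail.com with one that ends in @gmail.com; only the second one pins down the domain.

```python
import sqlite3

# Invented email addresses; none of them come from the Chinook data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (Email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("Jane@gmail.com",), ("Jack@yahoo.com",), ("Joe@notgmail.com",), ("Amy@gmail.com",)],
)

# Without the @, any domain that merely ends in "gmail.com" also matches.
loose = conn.execute(
    "SELECT Email FROM customers WHERE Email LIKE 'J%gmail.com' ORDER BY Email"
).fetchall()

# With the @, the domain has to be exactly gmail.com.
strict = conn.execute(
    "SELECT Email FROM customers WHERE Email LIKE 'J%@gmail.com' ORDER BY Email"
).fetchall()

print(loose)
print(strict)
```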
- Show the number of orders placed by each customer and sort the result by the number of orders in descending order
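Select CustomerId
,Count(InvoiceId) AS total_order_number
From Invoices
Group by CustomerId
Order by total_order_number DESC
DATA: each customer (the schema shows this means CustomerId), orders (the schema shows this means InvoiceId); only the grouped column and the count are selected, because the other columns that Select * would return are not meaningful per group
CAL: the number of (whenever "number of" appears, use COUNT)
TABLE: invoices table (pick the table that holds the required DATA; only invoices contains both columns)
GROUP: by each customer (without GROUP BY the query returns one total for all customers' orders)
ORDER: the number of orders in descending order (the alias given to the COUNT with AS makes it easy to refer to here)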
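The effect of GROUP BY described above can be seen in a minimal sketch with Python's built-in sqlite3 module. The invoices rows are made up for illustration; only the column names follow the notes.

```python
import sqlite3

# Toy invoices table: customer 1 has 3 invoices, customer 2 has 2, customer 3 has 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER);
INSERT INTO invoices (CustomerId) VALUES (1), (2), (1), (3), (1), (2);
""")

# Without GROUP BY, COUNT collapses the whole table into a single total.
total = conn.execute("SELECT COUNT(InvoiceId) FROM invoices").fetchone()[0]

# With GROUP BY, COUNT is evaluated once per customer.
per_customer = conn.execute(
    "SELECT CustomerId, COUNT(InvoiceId) AS total_order_number "
    "FROM invoices GROUP BY CustomerId "
    "ORDER BY total_order_number DESC, CustomerId"
).fetchall()

print(total)
print(per_customer)
```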
- Find the albums with 12 or more tracks
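Select AlbumId
,Count(TrackId) AS track_count
From Tracks
Group by AlbumId
Having Count(TrackId) >= 12
DATA: any column that identifies the album will do (AlbumId here); the track count is required
CAL: the table has no column holding the number of tracks, so count them with COUNT
TABLE: tracks table (pick the table that holds the required DATA; only Tracks contains both track and album information)
GROUP: each album
FILTER: WHERE can only filter individual rows, while HAVING filters whole groups; WHERE must come before GROUP BY and HAVING must come after it (the groups have to exist before they can be filtered)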
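The row-versus-group distinction between WHERE and HAVING can be sketched as follows, using Python's built-in sqlite3 module. The tracks rows and the thresholds are made up for illustration.

```python
import sqlite3

# Toy tracks table: album 1 has 3 tracks, album 2 has 2, album 3 has 1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, AlbumId INTEGER, Milliseconds INTEGER)"
)
conn.executemany(
    "INSERT INTO tracks (AlbumId, Milliseconds) VALUES (?, ?)",
    [(1, 200000), (1, 310000), (1, 250000), (2, 400000), (2, 100000), (3, 500000)],
)

# WHERE drops individual rows before grouping: only long tracks are counted.
where_rows = conn.execute(
    "SELECT AlbumId, COUNT(TrackId) FROM tracks "
    "WHERE Milliseconds > 240000 GROUP BY AlbumId ORDER BY AlbumId"
).fetchall()

# HAVING drops whole groups after grouping: only albums with 2+ tracks remain.
having_rows = conn.execute(
    "SELECT AlbumId, COUNT(TrackId) AS track_count FROM tracks "
    "GROUP BY AlbumId HAVING COUNT(TrackId) >= 2 ORDER BY AlbumId"
).fetchall()

print(where_rows)
print(having_rows)
```

In the first query every album still appears, but with fewer tracks counted; in the second, album 3 disappears entirely because its group fails the condition.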
Module 3:Subqueries and Joins in SQL
In-class exercises
All of the questions in this quiz pull from the open source Chinook Database. Please refer to the ER Diagram below and familiarize yourself with the table and column names to write accurate queries and get the appropriate answers
- How many albums does the artist Led Zeppelin have?
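Select Count(*) AS album_number
From albums a
Where a.ArtistId in (Select ar.ArtistId
From artists ar
Where ar.Name = 'Led Zeppelin')
DATA: albums, artist name
CAL: the tables have no column holding the number of albums, so count them with COUNT; each row of albums is one album, so COUNT(*) is enough (DISTINCT is only needed when the same AlbumId could appear more than once). Listing artists in the outer FROM without a join condition would pair every album with every artist, so it is left out
TABLE: albums, artists (pick the tables that hold the required DATA; the information is spread over two tables, and since only one column is needed from the second table, a subquery works)
Subquery selects can only retrieve a single column
FILTER: when using a subquery, the linking column matters (the column both tables share, here ArtistId; qualify it with the table alias, e.g. a.ArtistId)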
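A minimal sketch of the subquery pattern, using Python's built-in sqlite3 module. The artist names and album titles are hypothetical placeholders rather than Chinook rows; the inner query returns the single linking column (ArtistId) and the outer query counts the matching albums.

```python
import sqlite3

# Toy artists/albums tables with placeholder names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE albums (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
INSERT INTO artists VALUES (1, 'Band A'), (2, 'Band B');
INSERT INTO albums VALUES (10, 'First', 1), (11, 'Second', 1), (12, 'Other', 2);
""")

# The inner SELECT yields one column of ArtistId values; the outer query
# keeps only the albums whose ArtistId is in that list.
album_number = conn.execute(
    "SELECT COUNT(*) FROM albums a "
    "WHERE a.ArtistId IN (SELECT ar.ArtistId FROM artists ar WHERE ar.Name = 'Band A')"
).fetchone()[0]

print(album_number)
```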
- Create a list of album titles and the unit prices for the artist “Audioslave”
Select a.Title, t.UnitPrice, ar.Name
From ((albums a INNER JOIN artists ar ON a.ArtistId = ar.ArtistId)
INNER JOIN tracks t ON a.AlbumId = t.AlbumId)
Where ar.Name = 'Audioslave'
DATA: album titles, the unit prices, artist name
TABLE: albums, artists, tracks (pick the tables that hold the required DATA; the information is spread over three tables, which is too cumbersome for subqueries, so the tables are combined with JOIN ... ON)
Joins allow data retrieval from multiple tables in one query
Tables are related through common values (keys)
The INNER JOIN keyword selects records that have matching values in both tables
FILTER: after the JOIN everything is a single query, so SELECT, FROM, and WHERE sit at the same indentation level to show that (compare with the nested layout of a subquery)
- Find the first and last name of any customer who does not have an invoice
Select c.FirstName, c.LastName, i.InvoiceId
From (customers c LEFT JOIN invoices i ON c.CustomerId = i.CustomerId)
Where InvoiceId is NULL
DATA: first and last name, InvoiceId
TABLE: customers c LEFT JOIN invoices i (some customers may have no invoice, so the table that can have missing values goes on the right)
LEFT JOIN returns all records from the left table, and the matched records from the right table
The result is NULL from the right side if there is no match.
RIGHT JOIN does not need to be memorized: swap the table order and use LEFT JOIN instead
FILTER: is NULL, i.e. the value is missing
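The LEFT JOIN behaviour above can be sketched with Python's built-in sqlite3 module. The customers and invoices rows are made up for illustration; in Python a SQL NULL comes back as None.

```python
import sqlite3

# Toy tables: the customer 'Ben' has no invoice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (CustomerId INTEGER PRIMARY KEY, FirstName TEXT);
CREATE TABLE invoices (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cleo');
INSERT INTO invoices VALUES (10, 1), (11, 1), (12, 3);
""")

# Every customer is kept; unmatched customers get NULL for the invoice columns.
left_rows = conn.execute(
    "SELECT c.FirstName, i.InvoiceId FROM customers c "
    "LEFT JOIN invoices i ON c.CustomerId = i.CustomerId "
    "ORDER BY c.CustomerId, i.InvoiceId"
).fetchall()

# Filtering on IS NULL leaves only the customers without an invoice.
no_invoice = conn.execute(
    "SELECT c.FirstName FROM customers c "
    "LEFT JOIN invoices i ON c.CustomerId = i.CustomerId "
    "WHERE i.InvoiceId IS NULL"
).fetchall()

print(left_rows)
print(no_invoice)
```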
- Find the total price for each album, What is the total price for the album “Big Ones”?
Select a.Title, SUM(t.UnitPrice)
From (albums a INNER JOIN tracks t ON a.AlbumId = t.AlbumId)
Where a.Title = 'Big Ones'
Group by a.Title
DATA: album title, price
CAL: a total price calls for SUM (do not confuse it with COUNT: COUNT counts rows, SUM adds values)
TABLE: albums a INNER JOIN tracks t
FILTER: album “Big Ones”
GROUP: for each album (words like "for" or "by" hint that grouping is needed)
- How many records are created when you apply a Cartesian join to the invoice and invoice items table?
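Select Count(*) AS record_count
From invoices i CROSS JOIN invoice_items it
A Cartesian join (named after Cartesius, the Latin form of Descartes) pairs every row of one table with every row of the other, so the number of records is the product of the two row counts; it is rarely used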
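A quick check of how many rows a Cartesian (cross) join produces, sketched with Python's built-in sqlite3 module on two made-up tables of 3 and 4 rows. The row counts are illustrative only and say nothing about the real Chinook tables.

```python
import sqlite3

# Toy tables: 3 invoices and 4 invoice items.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (InvoiceId INTEGER PRIMARY KEY);
CREATE TABLE invoice_items (InvoiceLineId INTEGER PRIMARY KEY, InvoiceId INTEGER);
INSERT INTO invoices VALUES (1), (2), (3);
INSERT INTO invoice_items VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")

# CROSS JOIN has no ON condition: every invoice is paired with every item.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM invoices i CROSS JOIN invoice_items it"
).fetchone()[0]

print(cross_count)  # 3 rows x 4 rows
```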
Coding assessment
All of the questions in this quiz refer to the open source Chinook Database. Please study the ER diagram to familiarize yourself with the table and column names so that you can write accurate queries and get the appropriate answers
- Using a subquery, find the names of all the tracks for the album “Californication”
SELECT Name, t.AlbumId
FROM Tracks t
WHERE t.AlbumId in (SELECT a.AlbumId
From Albums a
WHERE a.Title = 'Californication')
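DATA: names, album
TABLE: the data is spread over two tables and only one column is needed from the second one, so a subquery is suitable
FILTER: use the linking column shared by both tables (t.AlbumId and a.AlbumId), then add a.Title = 'Californication'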
- Find the total number of invoices for each customer along with the customer’s full name, city and email
SELECT COUNT(i.InvoiceId) AS total_number
,c.CustomerId
,c.FirstName
,c.LastName
,c.City
,c.Email
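FROM Customers c INNER JOIN Invoices i
ON c.CustomerId = i.CustomerId
GROUP BY c.CustomerId
DATA: invoices, customer’s full name, city and email
CAL: the total number of (whenever "number" appears, use COUNT)
TABLE: the data is spread over two tables and several columns are needed, so a subquery is not suitable; use INNER JOIN with ON the linking column (unless missing values are mentioned, INNER JOIN is the default choice)
GROUP: for each customer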
- Retrieve the track name, album, artistID, and trackID for all the albums. What is the song title of trackID 12 from the “For Those About to Rock We Salute You” album?
SELECT t.Name, t.TrackId, a.Title, ar.ArtistId
FROM ((Tracks t INNER JOIN Albums a ON t.AlbumId = a.AlbumId)
INNER JOIN Artists ar ON a.ArtistId = ar.ArtistId)
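WHERE (t.TrackId = 12) AND (a.Title = 'For Those About to Rock We Salute You')
DATA: track name, album, artistID, and trackID
TABLE: the data is spread over three tables, which is too complex for subqueries, so use INNER JOIN (unless missing values are mentioned, INNER JOIN is the default choice); remember to join each pair ON its linking column
FILTER: trackID 12 and the “For Those About to Rock We Salute You” album. After the joins every result row carries columns from all three tables, so the two conditions can be combined with AND to return only the row that satisfies both; OR would also return every other track on the album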
- Retrieve a list with the managers last name, and the last name of the employees who report to him or her
SELECT m.LastName AS Manager, e.LastName AS Employee
FROM Employees m, Employees e
WHERE e.ReportsTo = m.EmployeeId
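DATA: managers last name, the last name of the employees
TABLE: the data is in one table, but its rows are related to each other (background: relationships come in three kinds, one-to-one, one-to-many, and many-to-one), which is the case for a SELF JOIN; alias the table twice, as m and e
Take the table and treat it like two separate tables
FILTER: employees who report to the manager, i.e. e.ReportsTo = m.EmployeeId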
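A minimal sketch of the self join, using Python's built-in sqlite3 module. The employee names and the reporting structure are hypothetical; the same table is aliased twice so that each employee row can be matched to its manager row.

```python
import sqlite3

# Toy employees table: 'Top' manages 'Mid', who manages 'Staff' and 'Clerk'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (EmployeeId INTEGER PRIMARY KEY, LastName TEXT, ReportsTo INTEGER);
INSERT INTO employees VALUES
    (1, 'Top', NULL), (2, 'Mid', 1), (3, 'Staff', 2), (4, 'Clerk', 2);
""")

# m plays the manager role and e the employee role, both reading the same table.
pairs = conn.execute(
    "SELECT m.LastName AS Manager, e.LastName AS Employee "
    "FROM employees m INNER JOIN employees e ON e.ReportsTo = m.EmployeeId "
    "ORDER BY m.EmployeeId, e.EmployeeId"
).fetchall()

print(pairs)
```

The employee with no manager (ReportsTo is NULL) never appears in the Employee column, because an inner join keeps only matched pairs.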
- Find the name and ID of the artists who do not have albums
SELECT ar.ArtistId, a.AlbumId, ar.Name
FROM (Artists ar LEFT JOIN Albums a ON ar.ArtistId = a.ArtistId)
WHERE a.AlbumId is NULL
DATA: name and ID of the artists, albums
TABLE: looking for missing values calls for LEFT JOIN; the table that can have missing values goes on the right, here Albums
FILTER: a.AlbumId is NULL
- Use a UNION to create a list of all the employee’s and customer’s first names and last names ordered by the last name in descending order
SELECT e.FirstName AS FirstName, e.LastName AS LastName
FROM Employees e
UNION
SELECT c.FirstName, c.LastName
FROM Customers c
ORDER BY LastName DESC
UNION usage:
Each SELECT statement with UNION must have the same number of columns
Columns must have similar data types
The columns in each SELECT statement must be in the same order
UNION selects only distinct values; UNION ALL also keeps duplicate values
The column headers of the combined result come from the first SELECT, which is why the AS aliases are placed there
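The UNION rules above can be tried out in a small sketch with Python's built-in sqlite3 module. The names are made up, and the aliases GivenName and FamilyName are hypothetical choices for this example; one person appears in both tables so that the difference between UNION and UNION ALL is visible.

```python
import sqlite3

# Toy tables: ('Ana', 'Lee') is both an employee and a customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (FirstName TEXT, LastName TEXT);
CREATE TABLE customers (FirstName TEXT, LastName TEXT);
INSERT INTO employees VALUES ('Ana', 'Lee'), ('Ben', 'Park');
INSERT INTO customers VALUES ('Ana', 'Lee'), ('Cleo', 'Zhou');
""")

# Aliases are given only in the first SELECT; the second one inherits them.
query = (
    "SELECT FirstName AS GivenName, LastName AS FamilyName FROM employees "
    "{op} SELECT FirstName, LastName FROM customers "
    "ORDER BY FamilyName DESC, GivenName"
)

cur = conn.execute(query.format(op="UNION"))
union_rows = cur.fetchall()                          # duplicates removed
union_header = [col[0] for col in cur.description]   # header from first SELECT

union_all_rows = conn.execute(query.format(op="UNION ALL")).fetchall()  # duplicates kept

print(union_header)
print(union_rows)
print(union_all_rows)
```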
- See if there are any customers who have a different city listed in their billing city versus their customer city
SELECT COUNT(i.InvoiceId) AS order_amount
,c.CustomerId, c.FirstName, c.LastName, c.City, i.BillingCity
FROM (Customers c INNER JOIN Invoices i
ON c.CustomerId = i.CustomerId)
WHERE c.City != i.BillingCity
GROUP BY c.CustomerId
ORDER BY order_amount DESC
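DATA: any identifying customer columns, billing city, customer city
TABLE: the data is spread over two tables and several columns are needed, so a subquery is not suitable; INNER JOIN is enough
FILTER: have a different city, i.e. c.City != i.BillingCity
GROUP: this line is needed, because "any customers" effectively means "for each customer"
ORDER: optional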
Module 4:Modifying and Analyzing Data with SQL
Coding assessment
All of the questions in this quiz refer to the open source Chinook Database. Please study the ER diagram to familiarize yourself with the table and column names so that you can write accurate queries and get the appropriate answers
- Pull a list of customer ids with the customer’s full name, and address, along with combining their city and country together. Be sure to make a space in between these two and make it UPPER CASE. (e.g. LOS ANGELES USA)
SELECT CustomerId
,FirstName ||' '|| LastName AS FullName
,Address
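,UPPER(City ||' '|| Country) AS City_Country
FROM Customers
DATA: customer ids, customer’s full name (which implies concatenating FirstName and LastName), address, and city and country combined (also a concatenation)
STRING: concatenation is a string function and applies only to strings; it uses the pipe symbols ||, though some database systems use + instead. Putting ' ' between two || inserts a space between the joined strings; if nothing needs to be inserted, a single || is enough. UPPER and LOWER convert to upper case and lower case respectively
TABLE: all the required information is in the customers table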
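A short sketch of || and UPPER/LOWER, run through Python's built-in sqlite3 module on a single made-up row that mirrors the "LOS ANGELES USA" example from the question.

```python
import sqlite3

# One illustrative row; not taken from the Chinook data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (City TEXT, Country TEXT)")
conn.execute("INSERT INTO customers VALUES ('Los Angeles', 'USA')")

row = conn.execute(
    "SELECT City || Country, "                  # single ||: nothing in between
    "       City || ' ' || Country, "           # ' ' between two ||: one space
    "       UPPER(City || ' ' || Country), "    # upper-case the joined string
    "       LOWER(City) "
    "FROM customers"
).fetchone()

print(row)
```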
- Create a new employee user id by combining the first 4 letters of the employee’s first name with the first 2 letters of the employee’s last name. Make the new field lower case and pull each individual step to show your work
SELECT FirstName
,SUBSTR(FirstName, 1,4) AS SFirstName
,LastName
,SUBSTR(LastName, 1,2) AS SLastName
,LOWER(SUBSTR(FirstName, 1,4) || SUBSTR(LastName, 1,2)) AS New_Employee_User_ID
FROM Employees
DATA: employee’s first name, employee’s last name
STRING: "by combining ... the first 4 letters of ..." calls for string concatenation and trimming; nothing is inserted between the pieces here, so a single || is used
SUBSTR returns the specified number of characters from a particular position of a given string
For example, 1,2 means: start at position 1 and keep 2 characters
TABLE: all the required information is in the Employees table
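How SUBSTR counts positions can be seen in a small sketch with Python's built-in sqlite3 module. The name 'Andrew Adams' is used here only as a sample input; positions start at 1, and the third argument is the number of characters to keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

row = conn.execute(
    "SELECT SUBSTR('Andrew', 1, 4), "   # start at position 1, keep 4 characters
    "       SUBSTR('Adams', 1, 2), "    # start at position 1, keep 2 characters
    "       SUBSTR('Andrew', 3, 2), "   # start at position 3, keep 2 characters
    "       LOWER(SUBSTR('Andrew', 1, 4) || SUBSTR('Adams', 1, 2))"
).fetchone()

print(row)
```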
- Show a list of employees who have worked for the company for 15 or more years using the current date function. Sort by lastname ascending
SELECT LastName, FirstName
,DATE('now') - HireDate AS Tenure
FROM Employees
WHERE Tenure >= 15
ORDER BY LastName ASC
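DATA: any identifying employee columns (the full name is the most readable), plus tenure
DATE: the date arithmetic calls for the DATE function; DATE('now') is the current date function
TABLE: all the required information is in the Employees table
FILTER: 15 or more years
ORDER: sort by lastname ascending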
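One caveat worth checking, sketched with Python's built-in sqlite3 module: as far as I understand SQLite's type coercion, subtracting two date strings only uses the leading year of each, so the result is a difference of calendar years rather than of elapsed time. The two dates below are made up and fixed (instead of 'now') so that the output is reproducible; the julianday-based expression is one possible alternative, not the course's answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

row = conn.execute(
    "SELECT "
    # Text minus text: each operand is read as its leading number (2020, 2005).
    "  DATE('2020-01-15') - '2005-12-01 00:00:00', "
    # Elapsed days converted to whole years (approximation using 365.25 days).
    "  CAST((julianday('2020-01-15') - julianday('2005-12-01 00:00:00')) / 365.25 AS INTEGER)"
).fetchone()

year_difference, elapsed_years = row
print(year_difference, elapsed_years)
```

Here the hire date is only about 14 years and 6 weeks before the reference date, yet the plain subtraction reports 15, so an employee just short of the threshold could be included by the year-only calculation.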
- Create a new customer invoice id by combining a customer’s invoice id with their first and last name while ordering your query in the following order: firstname, lastname, and invoiceID. Select all of the correct “AstridGruber” entries
SELECT c.FirstName || c.LastName || i.InvoiceId AS New_INID
FROM Customers c INNER JOIN Invoices i
ON c.CustomerId = i.CustomerId
WHERE New_INID LIKE 'AstridGruber%'
ORDER BY c.FirstName, c.LastName, i.InvoiceId
DATA: new customer invoice id by combining a customer’s invoice id with their first and last name
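STRING: concatenate as requested
TABLE: the required information is spread over two tables and several columns are needed, so use INNER JOIN with ON the linking column
FILTER: “AstridGruber” entries, i.e. values that start with "AstridGruber": LIKE 'AstridGruber%'
ORDER: in the following order: firstname, lastname, and invoiceID; neither ASC nor DESC is specified, so just list the column names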
Source: oschina
Link: https://my.oschina.net/u/4327913/blog/4474198