Data Science/Business Analytics for Small Business Applications › Forums › SQL › Count of a column with Group By and Order By
Tagged: COUNT, GROUP BY, Order By in SQL
- This topic is empty.
-
AuthorPosts
-
April 24, 2022 at 6:23 am #146AdminKeymaster
Here is the problem:
All of the questions in this quiz refer to the open source Chinook Database. Please familiarize yourself with the ER diagram in order to familiarize yourself with the table and column names in order to write accurate queries and get the appropriate answers.
Find the cities with the most customers and rank in descending order.
Source: https://www.coursera.org/learn/sql-for-data-science%5B/caption%5D
The only progress I could make is with this code:
SELECT * FROM customers ORDER BY city
I could see for instance Berlin appearing twice. Now, these are the tasks that I believe will take me closer to the solution:
Count the occurrence of each city in a new column. For instance, Amsterdam with a count of 1. Berlin with count 2…. Next, sort that column in descending order.
Given this problem set after a lecture on CASE statements, there should be the usage of CASE.
Reply
You don’t need to use a CASE statement, just GROUP BY, COUNT and ORDER BY
SELECT COUNT(city) AS countcity, city FROM customers GROUP BY city ORDER BY countcity desc
Are there any NULLs in the city column?
SELECT CASE WHEN COUNT(*) > COUNT(COMPANY) THEN 'COMPANY HAS NULLS' ELSE NULL END, CASE WHEN COUNT(*) > COUNT(PostalCode) THEN 'PostalCode has nulls' ELSE NULL END, CASE WHEN COUNT(*) > COUNT(city) THEN "CITY HAS NULLS" ELSE "NO DOES NOT HAVE NULL" END FROM CUSTOMERS
Company, postal code have nulls. City does not have null.
Now change the problem just slightly
instead of finding the cities with the most customers and rank in descending order
do this —
Find the companies with the fewest customers and rank in ascending order.
it’s important to learn the nuances of NULLs
SELECT COUNT(company) AS countcompany, company, customerid FROM customers GROUP BY company ORDER BY company
ORDER BY company
wasn’t what asked, but it will do.
Look at the first row of results.
Why 0 companies with NULL, when you know there is at least one NULL company, namely for customer 59.
By the way, having customerid in your SELECT clause but not in your GROUP BY clause causes an error. Here is the solution.SELECT company -- this is what we're grouping by , COUNT(*) AS number_of_rows_including_nulls , COUNT(company) AS number_of_non_null_values FROM customers GROUP BY company ORDER BY number_of_rows_including_nulls ASC
Make sure you understand why customerid is not included in the GROUP BY and also not included in the SELECT.
But most important, make sure you understand what COUNT(*) is showing you, and what COUNT(company) is showing you.
If necessary, go back to where you we were using several CASE expressions to figure out which columns have nulls.
-
AuthorPosts
- You must be logged in to reply to this topic.