SQL - Group By Clause
The GROUP BY
clause in SQL Server is used to group rows based on one or more columns in a query.
It is often used in conjunction with aggregate functions like SUM
, COUNT
, AVG
, and MAX
to perform calculations on groups of rows.
The GROUP BY
clause divides the result set into groups based on the specified columns and applies aggregate functions to each group.
The basic syntax of the GROUP BY
clause is as follows:
SELECT column1, column2, ..., aggregate_function(column)
FROM table_name
GROUP BY column1, column2, ...;
In the above syntax, column1, column2, ... represent the columns you want to group by, and aggregate_function(column) represents the aggregate function you want to apply to each group.
Let's explore how you can use the GROUP BY
clause with an "Orders" table that contains fields such as "OrderID", "CustomerID", "ProductID", and "Quantity".
Assuming you have an "Orders" table with the following sample data:
Table: Orders
OrderID |
CustomerID |
ProductID |
Quantity |
1 | 101 | 1 | 5 |
2 | 101 | 2 | 3 |
3 | 102 | 1 | 2 |
4 | 103 | 3 | 4 |
5 | 102 | 2 | 1 |
6 | 101 | 3 | 2 |
7 | 103 | 1 | 3 |
8 | 102 | 3 | 2 |
Now, let's say you want to use GROUP BY
to find the total quantity of each product ordered by customers. You can write a query like this:
SELECT
ProductID,
SUM(Quantity) AS TotalQuantity
FROM
Orders
GROUP BY
ProductID;
In this query:
-
We select the "ProductID" column to identify each product.
- We use the
SUM
function to calculate the total quantity of each product ordered.
- We give an alias to the calculated column using AS TotalQuantity to make the result more readable.
- We use the
GROUP BY
clause to group the rows by "ProductID," so each group represents a unique product.
The result of this query would be:
ProductID |
TotalQuantity |
1 | 10 |
2 | 4 |
3 | 8 |
This result shows the total quantity of each product ordered by customers.
You can include multiple columns in the GROUP BY
clause to create more granular groups. In such cases, the result set will be grouped based on the combination of values in the specified columns.
Here's a new set of sample data that includes multiple records for the same customer ordering the same product, allowing us to demonstrate the GROUP BY
with both columns:
Table: Orders
OrderID |
CustomerID |
ProductID |
Quantity |
1 | 101 | 1 | 5 |
2 | 101 | 1 | 3 |
3 | 102 | 1 | 2 |
4 | 103 | 3 | 4 |
5 | 102 | 1 | 1 |
6 | 101 | 1 | 2 |
7 | 103 | 1 | 3 |
8 | 102 | 3 | 2 |
Here's an example that groups the orders by both CustomerID and ProductID:
SELECT
CustomerID,
ProductID,
SUM(Quantity) AS TotalQuantity
FROM
Orders
GROUP BY
CustomerID,
ProductID;
In this query:
-
We select both "CustomerID" and "ProductID" columns to identify each customer's orders for each product.
- We use the
SUM
function to calculate the total quantity of each product ordered by each customer.
- We give an alias to the calculated column using AS TotalQuantity to make the result more readable.
- We use the
GROUP BY
clause to group the rows by both "CustomerID" and "ProductID," creating unique groups for each combination of customer and product.
The result of this query would be:
CustomerID |
ProductID |
TotalQuantity |
101 | 1 | 10 |
102 | 1 | 3 |
102 | 3 | 2 |
103 | 1 | 3 |
103 | 3 | 4 |
This result displays the total quantity of each product ordered by each customer. Each row represents a unique combination of "CustomerID" and "ProductID."
When using the GROUP BY
clause, it's important to note that any column included in the SELECT clause must either be part of the GROUP BY
clause or an aggregate function. This ensures that the query produces meaningful results by grouping and aggregating the data correctly.
Additionally, you can use other aggregate functions such as COUNT
, AVG
, MIN
, MAX
, etc., in combination with the GROUP BY
clause to perform various calculations on each group of data.