TOP

SQL-Lesson 8. Data grouping (GROUP BY)

Data grouping allows you to divide all data into logical sets, which makes it possible to perform statistical calculations separately in each group.


1. Creating groups (GROUP BY)

Groups are created using the GROUP BY statement of the SELECT operator. Let's consider an example.

Run SQLSELECT Product, SUM(Quantity) AS Product_num 
FROM Sumproduct 
GROUP BY Product

With this request, we extracted information on the number of products sold in each month. The operator SELECT orders to output two columns Product - the name of the product and Product_num - the calculated field that we created to display the quantity of sold products (field formula SUM(Quantity)). The GROUP BY clause tells the DBMS to group data by the Product column.

It is also worth noting that GROUP BY must come after the WHERE clause and before ORDER BY.

2. Filtering groups (HAVING)

Just as we filtered rows in a table, we can filter on grouped data. For this, there is the HAVING operator in SQL. Let's take the previous example and add group filtering.

Run SQLSELECT Product, SUM(Quantity) AS Product_num 
FROM Sumproduct 
GROUP BY Product 
HAVING SUM(Quantity) > 4000

We can see that after the number of sold goods was calculated for each product, the DBMS "cut off" those products that were sold less than 4,000 units.

As you can see, the HAVING operator is very similar to the WHERE operator, but there is a significant difference between them: WHERE filters data before it is grouped, and HAVING filters after grouping. Thus, the rows that were removed by the WHERE clause will not be included in the group. So, WHERE and HAVING operators can be used in the same sentence. Consider an example:

Run SQLSELECT Product, SUM(Quantity) AS Product_num 
FROM Sumproduct 
WHERE Product <> 'Skis Long' 
GROUP BY Product 
HAVING SUM(Quantity) > 4000

We added the WHERE operator to the previous example, where we specified the product "Skis Long", which in turn affected the grouping by the HAVING operator. As a result, we can see that the product "Skis Long" did not make it into the list of groups with more than 4,000 sold products.

3. Grouping and sorting

As with normal data sampling, we can sort the groups after grouping with the HAVING operator. For this, we can use the already familiar operator ORDER BY. In this situation, its application is similar to the previous examples. Example:

Run SQLSELECT Product, SUM(Quantity) AS Product_num 
FROM Sumproduct 
GROUP BY Product 
HAVING SUM(Quantity) > 3000
ORDER BY SUM(Quantity)

or simply specify the field number in the order in which we want to sort:

Run SQLSELECT Product, SUM(Quantity) AS Product_num 
FROM Sumproduct 
GROUP BY Product 
HAVING SUM(Quantity) > 3000
ORDER BY 2

We see that in order to sort the summary results, we just need to write the clause with ORDER BY after the operator HAVING.

MS Access does not support sorting groups by column aliases, that is, in our example, to sort the values, we will not be able to write ORDER BY Product_num at the end of the query.