codeXmonkey - SQL-Group By

SQL

SQL - Group By

The GROUP BY statement in SQL is used to arrange identical data into groups. This is often used with aggregate functions (such as COUNT, SUM, AVG, MAX, MIN) to perform operations on each group of data. Here’s a detailed tutorial on how to use the GROUP BY statement effectively.

Basic Syntax


SELECT column1, aggregate_function(column2)
FROM table_name
WHERE condition
GROUP BY column1;

Steps to Use GROUP BY

Specify the columns: Indicate the columns to be selected. You should include the column you want to group by.
Use aggregate functions: Apply aggregate functions to the columns you want to aggregate.
Group the data: Use the GROUP BY clause to specify the column(s) by which you want to group the data.

Examples

1. Grouping and Counting

Assume you have a table named employees with columns department and salary.

Count the number of employees in each department:


SELECT department, COUNT(*)
FROM employees
GROUP BY department;

2. Grouping and Summing

Calculate the total salary for each department:


SELECT department, SUM(salary)
FROM employees
GROUP BY department;

3. Grouping and Averaging

Calculate the average salary for each department:


SELECT department, AVG(salary)
FROM employees
GROUP BY department;

4. Grouping with Multiple Columns

Assume you also have a column job_title. Group by both department and job_title to get the count of employees for each job title within each department:


SELECT department, job_title, COUNT(*)
FROM employees
GROUP BY department, job_title;

5. Using HAVING with GROUP BY

The HAVING clause is used to filter groups based on a condition, similar to the WHERE clause but applied to groups.

Get departments with more than 10 employees:


SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;

6. Grouping and Finding Maximum/Minimum

Find the maximum salary in each department:


SELECT department, MAX(salary)
FROM employees
GROUP BY department;

Find the minimum salary in each department:


SELECT department, MIN(salary)
FROM employees
GROUP BY department;

Example Table Creation and Data Grouping

1. Create the employees table:


CREATE TABLE employees (
    id INT PRIMARY KEY AUTO_INCREMENT,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    department VARCHAR(50),
    job_title VARCHAR(50),
    salary DECIMAL(10, 2)
);

2. Insert sample data into the employees table:


INSERT INTO employees (first_name, last_name, department, job_title, salary)
VALUES 
('John', 'Doe', 'Engineering', 'Engineer', 70000),
('Jane', 'Smith', 'Marketing', 'Manager', 80000),
('Alice', 'Johnson', 'HR', 'Specialist', 50000),
('Bob', 'Brown', 'Engineering', 'Manager', 90000),
('Tom', 'Harris', 'Marketing', 'Specialist', 60000);

Performing Grouping

Group by Department and Count Employees:


SELECT department, COUNT(*)
FROM employees
GROUP BY department;

Calculate Total Salary by Department:


SELECT department, SUM(salary)
FROM employees
GROUP BY department;

Get Average Salary by Department:


SELECT department, AVG(salary)
FROM employees
GROUP BY department;

Group by Department and Job Title and Count Employees:


SELECT department, job_title, COUNT(*)
FROM employees
GROUP BY department, job_title;

Filter Groups with HAVING Clause:


SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING COUNT(*) > 1;

Find Maximum Salary by Department:


SELECT department, MAX(salary)
FROM employees
GROUP BY department;

Find Minimum Salary by Department:


SELECT department, MIN(salary)
FROM employees
GROUP BY department;

By following these examples, you can effectively use the GROUP BY statement to group and aggregate data in your database tables.

General Tips!

Use Proper Indexing: Index frequently queried columns to improve query performance, but avoid over-indexing as it can slow down write operations.
Write Efficient Queries: Optimize your queries by selecting only the necessary columns, avoiding SELECT *, and using joins wisely.
Normalize Your Data: Organize your database into normalized tables to reduce redundancy and improve data integrity.
Use Transactions: Ensure data consistency and integrity by wrapping multiple related operations in transactions, especially in multi-step processes.
Understand and Use SQL Functions: Leverage built-in SQL functions for data manipulation and retrieval, such as aggregate functions (SUM, COUNT) and string functions (CONCAT, SUBSTRING).