What is data analysis?
This is basically data cleaning, analysis, reporting and dashboard development.
Data analysis revolves around data quality, data warehouses, dashboarding, documentation and using data to improve decision making.
What is a data warehouse?
A data warehouse is a centralized system where data from different sources is collected, cleaned, organized and stored for reporting, analysis, dashboards and decision making.
What does data quality mean?
It mean ensuring data is fit for use. Good quality data should be complete, accurate, consistent, timely, valid and unique. This are the data quality dimensions.
Complete - is the required data available?
Accuracy - is the data correct?
Consistency - does the data match across systems?
Timeliness - was the data submitted on time?
Validity - is it in the right format?
Uniqueness - are there any duplicates?
How to clean a dataset
First you need to understand the structure of the dataset and the expected fields. Then check for missing values, duplicates, incorrect formats, inconsistent names, outliers, and invalid outputs(will be covered in excel and python article)
What makes a good dashboard?
A good dashboard should be simple, accurate, interactive, and action-oriented. It should show the most important indicators clearly, allow users to filter by relevant categories, and help them identify area that need attention.
How do you develop a dashboard?
Start by understanding the users' information needs and key indicators they want to monitor. Then prepare and clean the data, model relationships between tables, create measures where needed, and design visuals that clearly communicate performance. After building the dashboard, validate the numbers against the source data and collect feedback from the users.
What is a data dictionary?
It is a documentation that explains the fields in a dataset. It usually includes the column name, description, data type, allowed values, source, and business rules. It helps users understand and interpret data consistently.
SQL FOR DATA ANALYSIS
SQL is a language used to communicate with a database.
GOAl
Knowing how to explain and use SQL to extract, summarize, join, and validate data from a database.
SELECT
Clause used to select either every column from a table or specific columns from a table.
i.e SELECT *
FROM table_name; this selects everything.
SELECT
column1, column2
FROM table_name; this selects specific columns
WHERE
This filters rows into your specifications
i.e SELECT *
FROM table_name
WHERE column1<80;
ORDER BY
It sorts data either in ascending or descending order.
i.e SELECT
column1, column2
FROM table_name
ORDER BY column1 ASC/DESC
GROUP BY
It groups selected data.
i.e SELECT *
FROM table_name
GROUP BY column1;
HAVING
It filters rows after grouping
i.e SELECT *
FROM table_name
GROUP BY column1
HAVING column1<80;
AGGREGATE FUNCTIONS
SUM - adds up a selected column.
COUNT(*) - counts the number of rows in a column.
AVG - finds the average of a selected column.
GROUP BY - Groups rows.
HAVING - filters rows after grouping.
MIN - finds the minimum value in a row.
MAX - finds the maximum value in a row.
JOINS
They help us connect tables.
For instance you have three tables.
types
Inner join
Returns only rows that have matching values in both tables.
i.e SELECT column1
FROM table_name1
INNER JOIN table_name2
ON table_name1.column1 = table_name2.column1;
Left join
It takes all the records on the left table and attaches matching records from the right table
i.e SELECT column1, column2, column3
FROM table_name1
LEFT JOIN table_name2
ON table_name1_id = table_name2_id;
Right join
Returns all the rows from the right table and only matching rows from the left table.
i.e SELECT column1
FROM table_name1
RIGHT JOIN table_name2
ON table_name1.column = table_name2.column
Full join
Returns all rows when there is match either in the left or the right table.