Introduction About Star Schema Vs Snowflake Schema Vs Galaxy Schema

Here we’re back with another blog from our Dev IT Engineers to help make your development life easier. This blog explores the Data Warehouse Schema types in use today, including Star Schema and Snowflake Schema, and the differences between them. We will also briefly cover Galaxy Schema and discuss how these schemas are used to structure data warehouses.

Data Warehouse Schema

In data warehouses, a schema defines how the system is organized, with all its database entities and their logical associations. This includes structures such as fact tables and dimension tables. In Data Warehousing, the following types of Schemas are popularly used:

  1. Star Schema
  2. Snowflake Schema
  3. Galaxy Schema
  4. Star Cluster Schema

#1. Star Schema

Star Schema is the simplest type of schema in data warehousing and is often the most effective for querying. It consists of a fact table surrounded by multiple dimension tables in a shape that resembles a star. Each dimension table has a one-to-many relationship with the fact table: every row of the fact table references exactly one corresponding row in each dimension table through a foreign key.

Because of this, navigating between the tables is easy when querying the stored data. It also makes the structure of the data warehouse easier for end users to understand, which is why the Star Schema model is so common among Business Intelligence tools. During the design process, the dimension tables are de-normalized so that their attributes can store the contextual data needed for all kinds of analysis and reporting.

Benefits Of Star Schema

  • Queries in Star Schema need only simple joins between the fact table and its dimension tables, which improves query performance.
  • Star Schema makes it easy to retrieve data for any point in time for reporting purposes.

Disadvantages Of Star Schema

  • Changing an existing model is difficult: modifying and reusing it is not recommended, which reduces its long-term lifespan.
  • Data redundancy is high, because the de-normalized dimension tables repeat hierarchy attributes in every row instead of keeping them in separate tables.

An example of a Star Schema is given below.

Star Schema example
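As a rough sketch of what such a model could look like in practice, the snippet below builds a small star schema in an in-memory SQLite database using Python's built-in sqlite3 module. The table names follow the query examples in this blog, but the exact columns (such as store_name, city, brand and day) are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical star schema: one fact table (Sales) and three
# de-normalized dimension tables. Column lists are illustrative.
STAR_DDL = """
CREATE TABLE Product (product_id INTEGER PRIMARY KEY, Name TEXT, brand TEXT);
CREATE TABLE Store   (store_id INTEGER PRIMARY KEY, store_name TEXT,
                      city TEXT, state TEXT);
CREATE TABLE Date    (date_id INTEGER PRIMARY KEY, day INTEGER,
                      month INTEGER, year INTEGER);
CREATE TABLE Sales (
    product_id  INTEGER REFERENCES Product(product_id),
    store_id    INTEGER REFERENCES Store(store_id),
    date_id     INTEGER REFERENCES Date(date_id),
    sales_units INTEGER
);
"""


def build_star(conn):
    """Create the fact table and its dimension tables."""
    conn.executescript(STAR_DDL)
    return conn


conn = build_star(sqlite3.connect(":memory:"))
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)  # ['Date', 'Product', 'Sales', 'Store']
```

Note how the state and city live directly in the Store table. That is the de-normalization described above: it keeps joins simple at the cost of repeating the same state name in many rows.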

Querying A Star Schema

To query a Star Schema data model, the end user requests a report through a Business Intelligence tool. The tool translates each request into a chain of SELECT queries against the warehouse, and the performance of those queries determines how long the report takes to run.

For example, if your business needs to know how many books were sold in Maharashtra in January 2022, you can run the following query against your Star Schema model tables:

SELECT   pdim.Name              AS Product_Name,
         SUM(sfact.sales_units) AS Quantity_Sold
FROM     Product pdim,
         Sales   sfact,
         Store   sdim,
         Date    ddim
-- Join the fact table to each dimension on its key
WHERE    sfact.product_id = pdim.product_id
  AND    sfact.store_id   = sdim.store_id
  AND    sfact.date_id    = ddim.date_id
-- Restrict the report to the requested state, month, year and product
  AND    sdim.state = 'Maharashtra'
  AND    ddim.month = 1
  AND    ddim.year  = 2022
  AND    pdim.Name IN ('Books')
GROUP BY pdim.Name;

Results:

Product_Name   Quantity_Sold
Books          12,702

Hope this example made it easier to understand the query process in Star Schema.
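If you would like to try the query yourself, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows are made up purely for illustration (they are not the data behind the result shown above), and the tables carry only the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (product_id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Store   (store_id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE Date    (date_id INTEGER PRIMARY KEY, month INTEGER, year INTEGER);
CREATE TABLE Sales   (product_id INTEGER, store_id INTEGER,
                      date_id INTEGER, sales_units INTEGER);

-- Made-up sample rows (assumption, for illustration only)
INSERT INTO Product VALUES (1, 'Books'), (2, 'Pens');
INSERT INTO Store   VALUES (1, 'Maharashtra'), (2, 'Gujarat');
INSERT INTO Date    VALUES (1, 1, 2022), (2, 2, 2022);
INSERT INTO Sales   VALUES (1, 1, 1, 10), (1, 1, 1, 5), (1, 2, 1, 7),
                           (1, 1, 2, 3), (2, 1, 1, 4);
""")

# Same shape as the star schema query above; the filter values are
# passed as parameters instead of being hard-coded.
STAR_QUERY = """
SELECT pdim.Name AS Product_Name, SUM(sfact.sales_units) AS Quantity_Sold
FROM   Product pdim, Sales sfact, Store sdim, Date ddim
WHERE  sfact.product_id = pdim.product_id
  AND  sfact.store_id   = sdim.store_id
  AND  sfact.date_id    = ddim.date_id
  AND  sdim.state = ?
  AND  ddim.month = ?
  AND  ddim.year  = ?
  AND  pdim.Name IN ('Books')
GROUP BY pdim.Name
"""

rows = conn.execute(STAR_QUERY, ("Maharashtra", 1, 2022)).fetchall()
print(rows)  # [('Books', 15)]
```

Only the two Maharashtra book sales from January 2022 (10 and 5 units) are counted. The Gujarat sale, the February sale and the pen sale are all filtered out by the WHERE conditions.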

#2. Snowflake Schema

The Snowflake Schema is designed by using a Star Schema as its starting point. Snowflaking is the process of normalizing the dimension tables of a Star Schema model.

The model resembles a snowflake, with the fact table in the centre surrounded by multiple hierarchies of dimension tables. Each row of the fact table is linked to its corresponding rows in the dimension tables through foreign key references.

During the design process of the Snowflake Schema, the dimension tables are normalized, and foreign keys are added at each level to link to the parent attribute. The complexity of this schema grows with the depth of the dimension table hierarchies, which in turn affects query performance.

Benefits of Snowflake Schema:

  • Data redundancy is removed by moving repeated attributes into new dimension tables, thanks to the hierarchical structure.
  • Snowflake Schema tends to use less storage space than Star Schema.
  • It’s rather easy to update Snowflake schema in the long run and this increases its overall lifespan.

Disadvantages of Snowflake Schema:

  • Because the dimension tables are normalized, the ETL system has to load a larger number of tables.
  • You may often need complex joins when querying because of the large number of tables that have been added. This decreases query performance in Snowflake Schema.

An example of a Snowflake Schema is given below.

Snowflake Schema Example

The Dimension Tables in the above Snowflake Diagram are normalized as explained below:

  • The Date dimension is normalized into Quarterly, Monthly, and Weekly tables, leaving their foreign key IDs in the Date table.
  • The store dimension is normalized to comprise the table for State.
  • The product dimension is normalized into Brand.
  • In the Customer dimension, the attributes connected to the city are moved into the new City table by leaving a foreign key id in the Customer table.

In the same way, a single dimension can maintain multiple levels of hierarchy.

Different levels of hierarchies from the above diagram can be referred to as follows:

  • The Quarterly ID, Monthly ID and Weekly ID are all used as surrogate keys that are created for the purpose of date dimension hierarchies, and these have been added as foreign keys in the date dimension table.
  • The State ID is the new surrogate key created for Store dimension hierarchy and it has been added as the foreign key in the Store dimension table.
  • Brand ID is the new surrogate key created for the Product dimension hierarchy and it has been added as the foreign key in the Product dimension table.
  • City ID is the new surrogate key created for Customer dimension hierarchy and it has been added as the foreign key in the Customer dimension table.
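The Store and Product hierarchies described above can be sketched as table definitions. The snippet below is a minimal illustration using Python's built-in sqlite3 module; the column names are assumptions, and the helper function parent_tables is a hypothetical name introduced only for this example.

```python
import sqlite3

# Snowflaked dimensions: State is split out of Store and Brand out of
# Product, each linked back through a surrogate key (illustrative columns).
SNOWFLAKE_DDL = """
CREATE TABLE State   (state_id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE Store   (store_id INTEGER PRIMARY KEY, store_name TEXT,
                      state_id INTEGER REFERENCES State(state_id));
CREATE TABLE Brand   (brand_id INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE Product (product_id INTEGER PRIMARY KEY, Name TEXT,
                      brand_id INTEGER REFERENCES Brand(brand_id));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SNOWFLAKE_DDL)


def parent_tables(conn, table):
    """Return the tables that `table` points to through foreign keys.

    Table names cannot be bound as SQL parameters, so `table` must be a
    trusted identifier here.
    """
    # Column 2 of PRAGMA foreign_key_list is the referenced table name.
    return [row[2] for row in conn.execute(f"PRAGMA foreign_key_list({table})")]


print(parent_tables(conn, "Store"))    # ['State']
print(parent_tables(conn, "Product"))  # ['Brand']
```

Each state name is now stored once in State, and every Store row only carries the small state_id key. The Date and Customer hierarchies would follow the same pattern.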

Querying A Snowflake Schema

For query generation, we will need to generate similar reports for the end-users as shown in the star schema structures. However, the queries do become a little more complicated in the case of Snowflake Schema.

In the above example of a Snowflake Schema model, we will generate the very same query that was designed during the Star Schema example of querying. So, if the business wants to know how many books were sold in the state of Maharashtra in January 2022, you can run the following query for the Snowflake Schema model tables:

SELECT     pdim.Name              AS Product_Name,
           SUM(sfact.sales_units) AS Quantity_Sold
FROM       Sales sfact
INNER JOIN Product pdim ON sfact.product_id = pdim.product_id
INNER JOIN Store sdim   ON sfact.store_id   = sdim.store_id
INNER JOIN State stdim  ON sdim.state_id    = stdim.state_id
INNER JOIN Date ddim    ON sfact.date_id    = ddim.date_id
INNER JOIN Month mdim   ON ddim.month_id    = mdim.month_id
WHERE      stdim.state = 'Maharashtra'
  AND      mdim.month  = 1
  AND      ddim.year   = 2022
  AND      pdim.Name IN ('Books')
GROUP BY   pdim.Name;

Results:

Product_Name   Quantity_Sold
Books          12,702
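To see the extra joins at work, the following sketch runs the same snowflake query against an in-memory SQLite database from Python. As before, the sample rows are invented for illustration and are not the data behind the result shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (product_id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE State   (state_id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE Store   (store_id INTEGER PRIMARY KEY, state_id INTEGER);
CREATE TABLE Month   (month_id INTEGER PRIMARY KEY, month INTEGER);
CREATE TABLE Date    (date_id INTEGER PRIMARY KEY, month_id INTEGER, year INTEGER);
CREATE TABLE Sales   (product_id INTEGER, store_id INTEGER,
                      date_id INTEGER, sales_units INTEGER);

-- Made-up sample rows (assumption, for illustration only)
INSERT INTO Product VALUES (1, 'Books'), (2, 'Pens');
INSERT INTO State   VALUES (1, 'Maharashtra'), (2, 'Gujarat');
INSERT INTO Store   VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO Month   VALUES (1, 1), (2, 2);
INSERT INTO Date    VALUES (1, 1, 2022), (2, 2, 2022);
INSERT INTO Sales   VALUES (1, 1, 1, 10), (1, 2, 1, 5), (1, 3, 1, 7),
                           (1, 1, 2, 3), (2, 1, 1, 4);
""")

# Two more joins than the star version: Store -> State and Date -> Month.
SNOWFLAKE_QUERY = """
SELECT     pdim.Name AS Product_Name, SUM(sfact.sales_units) AS Quantity_Sold
FROM       Sales sfact
INNER JOIN Product pdim ON sfact.product_id = pdim.product_id
INNER JOIN Store sdim   ON sfact.store_id   = sdim.store_id
INNER JOIN State stdim  ON sdim.state_id    = stdim.state_id
INNER JOIN Date ddim    ON sfact.date_id    = ddim.date_id
INNER JOIN Month mdim   ON ddim.month_id    = mdim.month_id
WHERE      stdim.state = ?
  AND      mdim.month  = ?
  AND      ddim.year   = ?
  AND      pdim.Name IN ('Books')
GROUP BY   pdim.Name
"""

rows = conn.execute(SNOWFLAKE_QUERY, ("Maharashtra", 1, 2022)).fetchall()
print(rows)  # [('Books', 15)]
```

The answer has the same form as in the star schema case, but the query needs five joins instead of three because the state and the month now live in their own tables.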

Points To Remember While Querying Star (or) Snowflake Schema Tables

A query for a Star or Snowflake Schema model can be designed with the following tips in mind:

SELECT Clause:

  • The SELECT clause specifies attributes that are shown in the query results.
  • When the SELECT clause computes aggregated values for groups of rows, every non-aggregated column in it must also be listed in a GROUP BY clause, which is written after the WHERE clause.

FROM Clause:

  • The FROM clause lists the fact tables and dimension tables that are needed for the report.

WHERE Clause:

  • The WHERE clause joins the relevant dimension attributes to the fact table attributes. The surrogate keys of the dimension tables are matched with the corresponding foreign keys in the fact table in order to fix the range of data that needs to be queried. For this, you can refer to the above example of the Star Schema query. Alternatively, the join conditions can be written in the FROM clause with inner or outer joins, as in the Snowflake Schema example.
  • Dimension attributes are also used as constraints on the data in the WHERE clause.
  • By filtering the data with all the above steps, appropriate data is returned for the reports.

As per the needs of your business, you will be able to add or remove the facts, dimensions, attributes, and various constraints to the Star Schema or Snowflake Schema query system with the structure mentioned above. This will also allow you to add sub queries or merge different queries and their results in order to generate data for complex reports.
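As a small illustration of the sub query idea, the sketch below wraps an aggregated query inside an outer query that keeps only the states above a threshold. It uses Python's built-in sqlite3 module; the trimmed-down tables, the sample rows and the threshold of 5 units are all assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Store (store_id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE Sales (store_id INTEGER, sales_units INTEGER);

-- Made-up sample rows (assumption, for illustration only)
INSERT INTO Store VALUES (1, 'Maharashtra'), (2, 'Gujarat'), (3, 'Goa');
INSERT INTO Sales VALUES (1, 10), (1, 5), (2, 7), (3, 2);
""")

# Inner query: total units per state (SELECT, FROM, then GROUP BY).
# Outer query: filter and sort the aggregated rows.
REPORT = """
SELECT state, total_units
FROM (
    SELECT     sdim.state AS state, SUM(sfact.sales_units) AS total_units
    FROM       Sales sfact
    INNER JOIN Store sdim ON sfact.store_id = sdim.store_id
    GROUP BY   sdim.state
) AS per_state
WHERE total_units >= ?
ORDER BY total_units DESC
"""

rows = conn.execute(REPORT, (5,)).fetchall()
print(rows)  # [('Maharashtra', 15), ('Gujarat', 7)]
```

The same building blocks apply to both schema types. Only the number of joins inside the inner query changes.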

#3. Galaxy Schema

A Galaxy Schema or Fact Constellation Schema is a warehousing format in which various fact tables are able to share dimension tables among themselves. This Schema ends up resembling a galaxy with all the tables representing the stars.

The dimension tables that are shared between fact tables in this Schema model are called conformed dimensions.

The Galaxy Schema is used when the requirements involve several fact tables, often including aggregated ones, and are too complex to be supported by the Star Schema or Snowflake Schema models. The higher level of complexity in this model makes it more difficult to maintain than the others.

An example of Galaxy Schema is given below.
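A minimal sketch of the idea in code: two fact tables that share one conformed Product dimension, built in an in-memory SQLite database from Python. The Shipments fact table and all sample rows are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One conformed dimension shared by two fact tables
CREATE TABLE Product   (product_id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Sales     (product_id INTEGER REFERENCES Product(product_id),
                        sales_units INTEGER);
CREATE TABLE Shipments (product_id INTEGER REFERENCES Product(product_id),
                        shipped_units INTEGER);

-- Made-up sample rows (assumption, for illustration only)
INSERT INTO Product   VALUES (1, 'Books'), (2, 'Pens');
INSERT INTO Sales     VALUES (1, 10), (1, 5), (2, 4);
INSERT INTO Shipments VALUES (1, 20), (2, 6), (2, 1);
""")

# Each fact table is aggregated on its own and then lined up on the
# shared dimension. Joining the two fact tables directly would multiply
# rows and inflate the sums.
GALAXY_QUERY = """
SELECT pdim.Name,
       (SELECT SUM(s.sales_units)   FROM Sales s
         WHERE s.product_id = pdim.product_id) AS sold,
       (SELECT SUM(h.shipped_units) FROM Shipments h
         WHERE h.product_id = pdim.product_id) AS shipped
FROM   Product pdim
ORDER BY pdim.Name
"""

rows = conn.execute(GALAXY_QUERY).fetchall()
print(rows)  # [('Books', 15, 20), ('Pens', 4, 7)]
```

Because both fact tables use the same product_id keys, a single report can compare units sold with units shipped for each product.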

#4. Star Cluster Schema

Sometimes, you may end up forming a Snowflake Schema that has many dimension tables that may require more complex joins during the querying process. On the other hand, a Star Schema with fewer dimensions would end up being less complex but more prone to data redundancy. As a result, a Star Cluster Schema is created by combining the two in one.

The Star Schema model forms the base of the Star Cluster Schema, and only its essential dimension tables are snowflaked, which gives a more stable structure with the benefits of both.

An example of a Star Cluster Schema is given below.

Which Is Better Snowflake Schema or Star Schema?

The Data Warehousing platform and the Business Intelligence tools used in your DW system play a vital role in deciding which schema is suitable. In practice, the Star and Snowflake Schema models are the ones most commonly used in Data Warehousing.

The Star Schema model is preferred if the Business Intelligence tools allow users to interact with the table structures directly through simple queries. The Snowflake Schema model is preferable when the BI tools are more complex and business users do not work with the table structures directly.

You can use either Schema model depending on whether you want to save storage with the Snowflake Schema or provide easier querying with Star Schema.

Difference Between Star Schema Vs Snowflake Schema

Given below are the key differences between Star schema and Snowflake schema.

S.No. | Star Schema | Snowflake Schema
1 | Data redundancy is higher. | Data redundancy is lower.
2 | Storage space for dimension tables is higher. | Storage space for dimension tables is comparatively lower.
3 | Contains de-normalized dimension tables. | Contains normalized dimension tables.
4 | Single fact table is surrounded by multiple dimension tables. | Single fact table is surrounded by multiple hierarchies of dimension tables.
5 | Queries use direct joins between fact and dimensions to fetch the data. | Queries use complex joins between fact and dimensions to fetch the data.
6 | Query execution time is less. | Query execution time is more.
7 | Anyone can easily understand and design the schema. | It is tough to understand and design the schema.
8 | Uses top down approach. | Uses bottom up approach.

Conclusion

We hope that you loved another explanatory blog from our engineers. If this blog helped you understand the difference between the Schema models and choose the right one for your organization then feel free to browse our site for more amazing knowledge to make your development life easier.