STAR Schema Model Design

Question

pdsqsql 436

Hello,

I am planning to create a STAR schema model, but I need to use existing views. Looking at the data inside the views, they return multiple records for the same entity, so it's hard to find an ID that makes each record unique.

I was thinking of adding ROW_NUMBER() to create a unique key for each view, but I think with ROW_NUMBER() the ID number will change each time a SELECT runs against the view.

What's the best way to create/identify the Unique Key?

I have to carry the primary key from each DIM table into the FACT table, but the DIM data has the multiple-records issue I mentioned above.

I have another question about STAR data model design: what's the best way to place columns into the FACT table, such as primary key columns from the dimension tables, date columns, and other numeric/integer columns?

Thanks for your help!

Answer recommended by moderator

Answer 1

pdsqsql 436

Lakshmi,

Thanks for your detailed suggestion, I appreciate it!

I am following a similar approach: I break each view down into its underlying tables to build DIM tables, keep each original table's primary key ID, and connect them to a FACT table that holds mostly numeric measures and the FK keys from the DIM tables.

I am not using surrogate keys; I use the original table's PK column along with some required descriptive key fields.

Answer 2

Yutaka_K_JP 1,645

You can use ROW_NUMBER() for a lot of things, but it’s not a good fit as a key — it tends to shift whenever the data changes.

If you clean up the dimension and give it a proper surrogate key, the star model stays stable and behaves the way it’s supposed to.

The fact table just carries the dimension keys, the dates, and the measures. That’s really all it needs.

Answer 3

Lakshmi Narayana Garikapati 1,225 Microsoft External Staff Moderator

Hi pdsqsql,

1. Why ROW_NUMBER() Is NOT a Good Unique Key

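You're absolutely right. ROW_NUMBER() is only guaranteed to give the same row the same number across executions when the ORDER BY is deterministic (unique, stable columns), and even then the numbers of all rows sorting after an inserted or deleted row shift.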

This makes it unreliable for:

Primary Keys in Dimension tables
Foreign Keys in Fact tables
Slowly Changing Dimensions (SCD)
Incremental refresh or snapshot tracking

STAR schema requires stable surrogate keys, and ROW_NUMBER() cannot guarantee that.
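A small sketch of the problem, using Python's built-in sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or later). The Customer table and its columns are hypothetical. Even with a deterministic ORDER BY, inserting one new row changes the number assigned to an existing customer:

```python
import sqlite3

def row_numbers(conn):
    """Number customers the way a view using ROW_NUMBER() would."""
    rows = conn.execute(
        "SELECT CustomerCode, ROW_NUMBER() OVER (ORDER BY CustomerName) "
        "FROM Customer"
    ).fetchall()
    return dict(rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (CustomerCode TEXT PRIMARY KEY, CustomerName TEXT UNIQUE)"
)
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?)",
    [("C002", "Bravo"), ("C003", "Charlie")],
)

before = row_numbers(conn)

# A new customer arrives whose name sorts first.
conn.execute("INSERT INTO Customer VALUES ('C001', 'Alpha')")
after = row_numbers(conn)

print(before["C002"], after["C002"])  # prints: 1 2
```

Any fact row that had stored 1 as the key for C002 would now point at the wrong customer.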

2. The Correct Ways to Generate a Unique Key for Dimension Tables

Option A: Create a Surrogate Key (Best Practice)

In modern data modeling (SQL Server, Synapse, Fabric, Power BI), the standard approach is:

Add a surrogate integer key to your Dim table, like:

SQL

DIM_ProductKey INT IDENTITY(1,1) PRIMARY KEY

Even if your source does not have a natural key, you can create one in your ETL process.

Why Surrogate Keys Are Best:

Stable (never changes)
Independent of source-system logic
Small integer → fast joins
Required for SCD Type 2
Ensures dimension integrity even if source data changes
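A minimal sketch of assigning surrogate keys in an ETL step, again with sqlite3 as a stand-in: INTEGER PRIMARY KEY AUTOINCREMENT plays the role of INT IDENTITY(1,1), and DimProduct, ProductCode and load_dim_product are hypothetical names. The load inserts only business keys it has not seen before, so existing keys never change between runs. The same INSERT ... SELECT ... WHERE NOT EXISTS pattern is also valid T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite stand-in for: ProductKey INT IDENTITY(1,1) PRIMARY KEY
conn.execute("""
    CREATE TABLE DimProduct (
        ProductKey  INTEGER PRIMARY KEY AUTOINCREMENT,
        ProductCode TEXT NOT NULL UNIQUE,   -- business key from the source
        ProductName TEXT
    )""")

def load_dim_product(conn, source_rows):
    """Add new business keys only; rows already in the dimension keep their key."""
    conn.executemany(
        "INSERT INTO DimProduct (ProductCode, ProductName) "
        "SELECT ?, ? WHERE NOT EXISTS "
        "(SELECT 1 FROM DimProduct WHERE ProductCode = ?)",
        [(code, name, code) for code, name in source_rows],
    )

def key_map(conn):
    """ProductCode -> ProductKey, in key order."""
    return dict(conn.execute(
        "SELECT ProductCode, ProductKey FROM DimProduct ORDER BY ProductKey"))

load_dim_product(conn, [("P-10", "Widget"), ("P-20", "Gadget")])
first = key_map(conn)

# Next run: the source returns rows in a different order plus a new product.
load_dim_product(conn, [("P-05", "Bolt"), ("P-20", "Gadget"), ("P-10", "Widget")])
second = key_map(conn)

print(first)   # prints: {'P-10': 1, 'P-20': 2}
print(second)  # prints: {'P-10': 1, 'P-20': 2, 'P-05': 3}
```

Because the key comes from the dimension table rather than from the query, row order in the source no longer matters.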

Option B: Use Natural Keys (Only if they never change)

If your view has columns that uniquely identify a business entity (e.g., ProductCode, CustomerID), you can use them as keys only if they are stable and unique.

But avoid natural keys if:

They change over time
The business re-uses them
They come from multiple source systems
They are composite keys

Option C: Create a Hash Key (when natural key exists but is composite)

If your dimension logically uses multiple natural key columns:

SQL

HASHBYTES('SHA2_256', CONCAT(ProductCode, '-', Region, '-', Version))

This works well when:

Views combine multiple tables
Source key is composite
You need a stable deterministic key but don’t want an IDENTITY column

But hash keys:

Are wide (32 bytes for SHA2_256, versus 4 bytes for an INT)
Slow down joins compared to integers

So, they are acceptable but not preferred over surrogate integers.
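To make the hash-key trade-offs concrete, here is a sketch that uses Python's hashlib in place of HASHBYTES('SHA2_256', ...); hash_key and the sample values are hypothetical. It shows that the same composite key always produces the same 32-byte digest, and that the separator matters: without one, different key combinations can concatenate to the same string. The digests will not match SQL Server's byte for byte, since HASHBYTES hashes the column's own encoding (for example UTF-16 for NVARCHAR).

```python
import hashlib

def hash_key(*parts, sep="-"):
    """SHA-256 of the separator-joined key parts (mirrors CONCAT + HASHBYTES)."""
    # CONCAT treats NULL as an empty string, so do the same for None.
    text = sep.join("" if p is None else str(p) for p in parts)
    return hashlib.sha256(text.encode("utf-8")).digest()

k1 = hash_key("P10", "EU", 2)
k2 = hash_key("P10", "EU", 2)
print(k1 == k2, len(k1))  # prints: True 32

# Without a separator, ("AB", "C") and ("A", "BC") both become "ABC".
print(hash_key("AB", "C", sep="") == hash_key("A", "BC", sep=""))  # prints: True
print(hash_key("AB", "C") == hash_key("A", "BC"))                  # prints: False
```

The separator only helps if it never appears inside the key values themselves.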

What NOT to Do

Avoid:

ROW_NUMBER()
RANK()
NEWID() for deterministic dimensional keys
Primary keys based on timestamps (not reliable)

3. How to Fix Your Case (Multiple Records in View)

If your view returns multiple records for the same entity, you must analyze why:

Common reasons:

Missing joining logic
Duplicated business keys
Source system not normalized
Historical data mixing

Fix:

You need to identify the true business key that represents a real-world entity.

For example:

If your dimension is DimCustomer, the business key might be:

CustomerID
EmailAddress
CustomerCode

If none uniquely identifies a row → you must use surrogate DimensionKey.
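Before choosing a business key, it helps to check whether it is really unique in the view's output. A sketch with sqlite3, where vw_CustomerOrders and its columns are a hypothetical stand-in for the view: group by the candidate key and look for groups with more than one row. Here CustomerID repeats because the rows are at order grain, which suggests customer attributes belong in DimCustomer while order rows belong in a fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the view's output: one row per customer per order.
conn.execute(
    "CREATE TABLE vw_CustomerOrders (CustomerID INT, EmailAddress TEXT, OrderID INT)"
)
conn.executemany(
    "INSERT INTO vw_CustomerOrders VALUES (?, ?, ?)",
    [(1, "a@example.com", 100), (1, "a@example.com", 101), (2, "b@example.com", 102)],
)

def duplicate_keys(conn, key_columns):
    """Return candidate key values that appear on more than one row."""
    cols = ", ".join(key_columns)  # trusted, hard-coded column names only
    sql = (f"SELECT {cols}, COUNT(*) FROM vw_CustomerOrders "
           f"GROUP BY {cols} HAVING COUNT(*) > 1")
    return conn.execute(sql).fetchall()

print(duplicate_keys(conn, ["CustomerID"]))             # prints: [(1, 2)]
print(duplicate_keys(conn, ["CustomerID", "OrderID"]))  # prints: []
```

An empty result means the candidate identifies one row per entity at the view's grain.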

4. How to decide what goes into FACT vs DIM tables

FACT Table Should Contain:

Foreign Keys to each dimension (surrogate keys)
Numeric, additive measures (for example: Quantity, SalesAmount, Cost, Duration, Count)
Date Keys (FKs to DimDate)
High-granularity transaction-level data

Examples (FactSales):

SalesKey (PK)
CustomerKey (FK)
ProductKey (FK)
DateKey (FK)
StoreKey (FK)
SalesQuantity
SalesAmount
DiscountAmount
TaxAmount
TotalAmount

DIMENSION Tables Should Contain:

Descriptive attributes
Text columns
Categories, hierarchies, groups
Slowly Changing attributes (SCD1/SCD2)
Business-friendly descriptors

Examples (DimCustomer):

CustomerKey (PK - surrogate)
CustomerID (natural key)
CustomerName
Gender
Region
Address
BirthDate
CustomerType
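A compact sketch of the FactSales / DimCustomer layout described above, using sqlite3 and a trimmed-down, hypothetical set of columns. The fact table holds only surrogate foreign keys, a DateKey and numeric measures; descriptive attributes are reached by joining to the dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimCustomer (
        CustomerKey  INTEGER PRIMARY KEY,  -- surrogate key
        CustomerID   TEXT NOT NULL,        -- natural key from the source
        CustomerName TEXT,
        Region       TEXT
    );
    CREATE TABLE DimDate (
        DateKey      INTEGER PRIMARY KEY,  -- e.g. 20260129
        FullDate     TEXT,
        CalendarYear INTEGER
    );
    -- REFERENCES documents the relationships; SQLite enforces them
    -- only when PRAGMA foreign_keys = ON.
    CREATE TABLE FactSales (
        SalesKey      INTEGER PRIMARY KEY,
        CustomerKey   INTEGER NOT NULL REFERENCES DimCustomer(CustomerKey),
        DateKey       INTEGER NOT NULL REFERENCES DimDate(DateKey),
        SalesQuantity INTEGER,
        SalesAmount   REAL
    );
    INSERT INTO DimCustomer VALUES (1, 'C001', 'Contoso', 'West'), (2, 'C002', 'Fabrikam', 'East');
    INSERT INTO DimDate VALUES (20260129, '2026-01-29', 2026), (20260130, '2026-01-30', 2026);
    INSERT INTO FactSales (CustomerKey, DateKey, SalesQuantity, SalesAmount) VALUES
        (1, 20260129, 2, 20.0), (1, 20260130, 1, 10.0), (2, 20260130, 5, 50.0);
""")

# Typical star query: measures from the fact, labels from the dimensions.
totals = conn.execute("""
    SELECT c.Region, d.CalendarYear, SUM(f.SalesAmount)
    FROM FactSales AS f
    JOIN DimCustomer AS c ON c.CustomerKey = f.CustomerKey
    JOIN DimDate     AS d ON d.DateKey     = f.DateKey
    GROUP BY c.Region, d.CalendarYear
    ORDER BY c.Region
""").fetchall()
print(totals)  # prints: [('East', 2026, 50.0), ('West', 2026, 30.0)]
```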

Hope this helps!

Thanks,

Lakshmi.

pdsqsql 436 Reputation points

2026-01-29T22:18:10.4733333+00:00

Thanks, Laxmi, for your detailed explanations!

While reading "Even if your source does not have a natural key, you create one in your ETL process," I need to understand something. We currently have a view with a complex join across multiple tables, and it doesn't return unique records; of course, the view is just a SELECT query. If I want to build a data-warehouse-style STAR schema with DIM and FACT tables, is building it from the view the easier approach? Or should I take the underlying tables that the view joins, use them as DIM tables, connect them to a FACT table by key, and add DATE and Amount kinds of fields?

Do I have to recreate the view (for example, a new view on top of the existing one that adds a key column) to add a surrogate key?

When you say ETL process, do you mean I can do this directly in Azure Data Factory by connecting to the existing view?

A little more detailed guidance on your approach would be very helpful.

Thank you, and I appreciate your valuable input!
Erland Sommarskog 133.6K Reputation points MVP Volunteer Moderator

2026-01-29T22:39:51.1166667+00:00

Please be aware that there are a lot of things we don't know about your situation. We only know as much as you tell us.

You cannot create surrogate keys in views. If these views produce duplicate rows, you probably need to work more on the views so that you can identify a key in them. Otherwise the data is not going to be very meaningful.

As for the questions in your most recent post, I think the answer is: it depends. To tell you what you should do, we would need to know a lot more: see the queries, see some sample data, know something about the business rules, etc. That may not fit into the format of this forum.
pdsqsql 436 Reputation points

2026-01-30T00:05:15.7366667+00:00

Erland,

Thanks for your response.

They are not duplicate records. The view joins multiple tables with one-to-many relationships, so some fields repeat across records while the other information changes.

Out of 15 columns, roughly 8-10 are the same from record to record.

I am just looking from better design perspective.

I hope that explains my concern.
Erland Sommarskog 133.6K Reputation points MVP Volunteer Moderator

2026-01-30T13:23:28.73+00:00

Maybe then the columns in the view should land in different tables?

Where is this view coming from? Is it already a data warehouse? Or some sort of OLTP system? What sort of data is it? It sounds like you are looking into storing this data in a dimension table, but I am not sure that I interpret you correctly.

I have another question about STAR data model design: what's the best way to place columns into the FACT table, such as primary key columns from the dimension tables, date columns, and other numeric/integer columns?

My understanding is that in a "proper" data warehouse, you have numeric keys for everything: dates, strings, you name it. So if there is no natural numeric key, you introduce a surrogate key. Same if the natural key is composite.

(Full disclosure: The systems I have worked with have been more OLTP-like, and I prefer natural keys and composite keys, but I don't shy away from surrogate keys when a natural key is impractical. What would happen if I actually were to design a data warehouse remains to be seen.)
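To illustrate the suggestion above that the view's columns may belong in different tables: when 8-10 of 15 columns repeat because of a one-to-many join, the repeated columns usually describe the "one" side and the changing columns the "many" side. A Python sketch with hypothetical column names that splits such rows into distinct parent rows and child rows keyed back to the parent; in a star schema the child rows would typically become fact rows, and the repeated columns dimension attributes or foreign keys.

```python
from typing import NamedTuple

class ViewRow(NamedTuple):
    # Repeated ("one" side) columns
    OrderID: int
    CustomerID: str
    OrderDate: str
    # Changing ("many" side) columns
    LineNumber: int
    ProductCode: str
    Quantity: int

def split_rows(rows):
    """Return (parents, children): one parent per OrderID, one child per line."""
    parents = {}
    children = []
    for r in rows:
        parent = (r.OrderID, r.CustomerID, r.OrderDate)
        # The repeated columns must agree for a given OrderID.
        if parents.setdefault(r.OrderID, parent) != parent:
            raise ValueError(f"OrderID {r.OrderID} has conflicting header values")
        children.append((r.OrderID, r.LineNumber, r.ProductCode, r.Quantity))
    return list(parents.values()), children

view_rows = [
    ViewRow(100, "C001", "2026-01-29", 1, "P10", 2),
    ViewRow(100, "C001", "2026-01-29", 2, "P20", 1),
    ViewRow(101, "C002", "2026-01-30", 1, "P10", 5),
]
parents, children = split_rows(view_rows)
print(len(parents), len(children))  # prints: 2 3
```

If the check raises, the repeated columns do not depend only on OrderID, which is worth knowing before choosing the table boundaries.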
Lakshmi Narayana Garikapati 1,225 Reputation points Microsoft External Staff Moderator

2026-02-02T12:05:19.8+00:00

Hi pdsqsql, I'm following up to check whether your issue has been resolved. If you still have any questions or need further assistance, please don't hesitate to reach out; we're happy to continue supporting you.

We truly appreciate your participation and thank you for being an active member of the community.

Best regards,

Lakshmi.
Lakshmi Narayana Garikapati 1,225 Reputation points Microsoft External Staff Moderator

2026-02-05T09:28:01.12+00:00

Hi pdsqsql ,

I'm following up to check whether your issue has been resolved. If you still have any questions or need further assistance, please don't hesitate to reach out; we're happy to continue supporting you.

We truly appreciate your participation and thank you for being an active member of the community.

Best regards,

Lakshmi.
pdsqsql 436 Reputation points

2026-02-06T22:49:07.8933333+00:00

Erland,

Thanks for your response.

Right. My current approach is to take the base tables behind each view and convert them into DIM tables, since most of those tables already have a primary key (usually a generic ID column).

So if I have 5 views, the tables behind each view become DIM tables, their primary keys go into the FACT table, and then I add the numeric and date columns to the FACT table.

Please let me know what you think from a STAR schema modeling perspective.

Thanks once again for your help and support!
pdsqsql 436 Reputation points

2026-02-06T22:52:30.9033333+00:00

Lakshmi,

Thank you for follow up and suggestions.

Here is what I am thinking; please let me know your thoughts.

My current approach is to take the base tables behind each view and convert them into DIM tables, since most of those tables already have a primary key (usually a generic ID column).

So if I have 5 views, the tables behind each view become DIM tables, their primary keys go into the FACT table, and then I add the numeric and date columns to the FACT table.

Please let me know what you think from a STAR schema modeling perspective.

Thanks once again for your help and support!
Lakshmi Narayana Garikapati 1,225 Reputation points Microsoft External Staff Moderator

2026-02-12T05:11:28.1166667+00:00

Hi pdsqsql,

Instead of converting all tables behind each view into dimensions, first identify the core business processes and define Fact tables at the correct grain. Then, model Dimensions based on descriptive business entities used for analysis, regardless of how the original views were built. Use surrogate keys for dimensions, keep facts lean with numeric measures and foreign keys, and use a conformed Date dimension for all date analysis.
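For the conformed Date dimension, a common convention (an assumption here, not something stated in this thread) is an integer DateKey in YYYYMMDD form, generated once for a date range and shared by every fact table. A minimal Python sketch with a hypothetical column set:

```python
from datetime import date, timedelta

def date_key(d: date) -> int:
    """Integer key for a date in YYYYMMDD form, e.g. 2026-02-12 -> 20260212."""
    return d.year * 10000 + d.month * 100 + d.day

def build_dim_date(start: date, end: date):
    """One DimDate row per calendar day from start to end, inclusive."""
    rows = []
    d = start
    while d <= end:
        rows.append({
            "DateKey": date_key(d),
            "FullDate": d.isoformat(),
            "CalendarYear": d.year,
            "MonthNumber": d.month,
            "DayOfWeek": d.strftime("%A"),  # name depends on the current locale
        })
        d += timedelta(days=1)
    return rows

dim_date = build_dim_date(date(2026, 2, 27), date(2026, 3, 2))
print([r["DateKey"] for r in dim_date])
# prints: [20260227, 20260228, 20260301, 20260302]
```

Fact loads would then store date_key(transaction_date) rather than the raw date, so every fact table joins to the same DimDate.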

Thanks,

Lakshmi.
Lakshmi Narayana Garikapati 1,225 Reputation points Microsoft External Staff Moderator

2026-02-19T05:13:03.15+00:00

Hi pdsqsql,

Thanks for the update.

Thanks,

Lakshmi.

Answer 4

To create a unique key for your views in a star schema model, using ROW_NUMBER() can indeed lead to issues since the numbering will change with each execution of the query. Instead, consider the following approaches:

Natural Keys: If your data already has a natural unique identifier (like a combination of columns that can uniquely identify a record), use that as your unique key. This could be a combination of columns that together form a unique value.
Surrogate Keys: If no natural key exists, you can create a surrogate key. This is a unique identifier that is not derived from the data itself. You can generate this key during the ETL process, ensuring it remains consistent across loads. This can be done by creating a separate key column in your dimension tables that will not change.
Composite Keys: If your dimension table has multiple records that can be uniquely identified by a combination of columns, you can use a composite key approach, where you define a unique key based on multiple columns.

Regarding your second question about placing columns into the FACT table, the best practice is to include:

Primary Key Columns from Dimension Tables (stored as foreign keys in the fact table): These are essential for establishing relationships between your fact and dimension tables.
Date Columns: Including a date column is crucial for time-based analysis and allows you to perform time series analysis effectively.
Numeric/Integer Columns: These should represent measurable quantities, such as sales amounts, quantities sold, or any other metrics that you want to analyze.

Ensure that your fact table is designed to store observations or events, and it should contain dimension key columns that relate to dimension tables, along with numeric measure columns for analysis.
