Hi pdsqsql,
Why ROW_NUMBER() Is NOT a Good Unique Key
You're absolutely right—ROW_NUMBER() changes every time the SELECT runs unless you have a very strict, deterministic ORDER BY clause using stable columns.
This makes it unreliable for:
- Primary Keys in Dimension tables
- Foreign Keys in Fact tables
- Slowly Changing Dimensions (SCD)
- Incremental refresh or snapshot tracking
STAR schema requires stable surrogate keys, and ROW_NUMBER() cannot guarantee that.
2. The Correct Ways to Generate a Unique Key for Dimension Tables
Option A: Create a Surrogate Key (Best Practice)
In modern data modeling (SQL Server, Synapse, Fabric, Power BI), the standard approach is:
Add a surrogate integer key to your Dim table, like:
SQL
DIM_ProductKey INT IDENTITY(1,1) PRIMARY KEY
Show more lines
Even if your source does not have a natural key—you create one in your ETL process.
Why Surrogate Keys Are Best:
- Stable (never changes)
- Independent of source-system logic
- Small integer → fast joins
- Required for SCD Type 2
- Ensures dimension integrity even if source data changes
Option B: Use Natural Keys (Only if they never change)
If your view has columns that uniquely identify a business entity (e.g., ProductCode, CustomerID), you can use them as keys only if they are stable and unique.
But avoid natural keys if:
- They change over time
- The business re-uses them
- They come from multiple source systems
- They are composite keys
Option C: Create a Hash Key (when natural key exists but is composite)
If your dimension logically uses multiple natural key columns:
SQL
HASHBYTES ('SHA2_256', CONCAT (ProductCode, '-', Region, '-', Version))
Show more lines
This works well when:
- Views combine multiple tables
- Source key is composite
- You need a stable deterministic key but don’t want an IDENTITY column
But hash keys:
- Are long (up to 32 bytes)
- Slow down joins compared to integers
So, they are acceptable but not preferred over surrogate integers.
What NOT to Do
Avoid:
- ROW_NUMBER()
- RANK()
- NEWID() for deterministic dimensional keys
- Primary keys based on timestamps (not reliable)
3. How to Fix Your Case (Multiple Records in View)
If your view returns multiple records, you must analyze WHY:
Common reasons:
- Missing joining logic
- Duplicated business keys
- Source system not normalized
- Historical data mixing
Fix:
You need to identify the true business key that represents a real-world entity.
For example:
If your dimension is DimCustomer, the business key might be:
- CustomerID
- EmailAddress
- CustomerCode
If none uniquely identifies a row → you must use surrogate DimensionKey.
4. How to decide what goes into FACT vs DIM tables
FACT Table Should Contain:
- Foreign Keys to each dimension (surrogate keys)
- Numeric, additive measures Examples: Quantity, SalesAmount, Cost, Duration, Count
- Date Keys (FKs to DimDate)
- High-granularity transaction-level data
Examples (FactSales):
SalesKey (PK)
CustomerKey (FK)
ProductKey (FK)
DateKey (FK)
StoreKey (FK)
SalesQuantity
SalesAmount
DiscountAmount
TaxAmount
TotalAmount
DIMENSION Tables Should Contain:
- Descriptive attributes
- Text columns
- Categories, hierarchies, groups
- Slowly Changing attributes (SCD1/SCD2)
- Business-friendly descriptors
Examples (DimCustomer):
CustomerKey (PK - surrogate)
CustomerID (natural key)
CustomerName
Gender
Region
Address
BirthDate
CustomerType
Hope This Help!
Thanks,
Lakshmi.