Use batching and bulk-copy style operations rather than a single monolithic INSERT…SELECT over the linked server.
Key points and options:
- Use INSERT…SELECT with batching and minimal logging
If the target test table is (or can be) a heap and the database recovery model allows it, use INSERT INTO … SELECT with TABLOCK and batch the operation:
-- On the TEST server; target table is an empty heap
SET NOCOUNT ON;
DECLARE @rows INT = 1;
DECLARE @maxId BIGINT = 0; -- watermark; assumes an ascending key column (BeastID here is illustrative)
WHILE @rows > 0
BEGIN
    INSERT INTO TestDb.dbo.Beast WITH (TABLOCK)
    SELECT TOP (100000) *
    FROM ProductionServer.ProdDb.dbo.Beast AS B
    WHERE B.BeastID > @maxId
    -- Optional date filter:
    -- AND B.Create_dt >= @FromDate AND B.Create_dt < @ToDate
    ORDER BY B.BeastID;
    SET @rows = @@ROWCOUNT; -- rows inserted in this batch
    IF @rows > 0
        SELECT @maxId = MAX(BeastID) FROM TestDb.dbo.Beast; -- advance the watermark
END;
Relevant behavior:
- INSERT INTO <target_table> WITH (TABLOCK) SELECT <columns> FROM <source_table> can be minimally logged when:
  - the database is in SIMPLE or BULK_LOGGED recovery,
  - the target is a heap,
  - the target is not used in replication, and
  - TABLOCK is specified on the target.
- Minimal logging reduces log growth and can significantly improve throughput.
- @@ROWCOUNT gives the number of rows inserted per batch; the loop terminates when a batch copies no rows.
- TOP (N) caps the batch size, but each iteration must also exclude rows copied by earlier batches (for example via a key watermark or date range); otherwise the same TOP (N) rows are read again on every pass.
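A quick way to confirm the minimal-logging prerequisites on the test side (database and table names taken from the examples above) is a sketch like:

```sql
-- Recovery model must be SIMPLE or BULK_LOGGED for minimal logging
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'TestDb';

-- Target must be a heap: a heap shows up in sys.indexes
-- with index_id = 0 and type_desc = 'HEAP'
SELECT i.index_id, i.type_desc
FROM TestDb.sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'TestDb.dbo.Beast');
```

If the recovery model is FULL, the insert is fully logged regardless of TABLOCK, so expect heavier log growth.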
- Consider adding an index on Create_dt for filtering
To prove the benefit of a Create_dt index for copying "one day" of data:
- Add a nonclustered index on Create_dt in production.
- Re-run the INSERT…SELECT with a WHERE Create_dt BETWEEN ... predicate (or, better, a half-open range such as Create_dt >= @From AND Create_dt < @To, which avoids boundary ambiguity).
- Compare duration and I/O against the no-index run.
For large data moves, INSERT … SELECT is often preferred when source and destination are on the same instance, but the same principle applies across linked servers when the predicate can leverage an index.
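A sketch of that comparison, assuming the production table is ProdDb.dbo.Beast (the index name and dates are illustrative):

```sql
-- On PRODUCTION: nonclustered index to support the date filter
CREATE NONCLUSTERED INDEX IX_Beast_Create_dt
    ON dbo.Beast (Create_dt);

-- On TEST: copy one day, measuring elapsed time and I/O
SET STATISTICS IO, TIME ON;

INSERT INTO TestDb.dbo.Beast WITH (TABLOCK)
SELECT *
FROM ProductionServer.ProdDb.dbo.Beast AS B
WHERE B.Create_dt >= '2024-01-01'
  AND B.Create_dt <  '2024-01-02';

SET STATISTICS IO, TIME OFF;
```

Note that with a linked server, whether the predicate is pushed to the remote side (and can therefore use the new index) depends on the provider; checking the remote query text or plan is worthwhile.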
- Use SqlBulkCopy / bulk-copy semantics instead of SSIS defaults
For large one-time moves, bulk-copy style tools are more efficient than row-by-row SSIS data flows:
- Use SqlBulkCopy (or bcp) from production to a file, or directly to the test server.
- SqlBulkCopy is designed for high-throughput loads and offers significant performance advantages over plain INSERT statements.
- If both tables are on the same instance, INSERT … SELECT is faster; across servers, or when staging through files, bulk copy is preferred.
- With OLE DB / bulk-copy options, TABLOCK on the target improves performance and allows concurrent loads when the target has no indexes.
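As a sketch of the file-based route with bcp (server, database, and file names follow the examples above and are illustrative; -T uses Windows authentication):

```shell
# Export from production in native format (-n)
bcp ProdDb.dbo.Beast out beast.dat -S ProductionServer -T -n

# Import into test: -b commits every 100,000 rows,
# -h "TABLOCK" takes a bulk-load table lock on the target
bcp TestDb.dbo.Beast in beast.dat -S TestServer -T -n -b 100000 -h "TABLOCK"
```

The -b batch size serves the same purpose as the TOP (N) loop: a failure only rolls back the current batch, and the log is given a chance to truncate between batches in SIMPLE recovery.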
- Control commit size without a WHERE clause
Commit size is independent of any business filter:
- TOP (N) in the INSERT…SELECT sets the batch size; loop until @@ROWCOUNT = 0.
- Each loop iteration is one transaction (or explicitly wrap each batch in its own transaction if needed).
- Each batch must still exclude rows copied by earlier batches (for example via a key watermark), or the loop will re-read the same TOP (N) rows indefinitely.
Example with an explicit transaction per batch (again assuming an ascending key column; BeastID is illustrative):
DECLARE @maxId BIGINT = 0;
WHILE 1 = 1
BEGIN
    BEGIN TRAN;
    INSERT INTO TestDb.dbo.Beast WITH (TABLOCK)
    SELECT TOP (100000) *
    FROM ProductionServer.ProdDb.dbo.Beast AS B
    WHERE B.BeastID > @maxId
    ORDER BY B.BeastID;
    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRAN;
        BREAK;
    END;
    SELECT @maxId = MAX(BeastID) FROM TestDb.dbo.Beast; -- advance the watermark
    COMMIT TRAN;
END;
This pattern gives row-count-based commits; the key column is needed only as a watermark so that successive batches do not overlap.
- Why SSIS appeared to stop at 8M rows
Large bulk operations are sensitive to:
- batch sizes and buffer sizes,
- logging and constraints,
- triggers and tempdb usage.
For SSIS:
- Increase DefaultBufferSize and/or DefaultBufferMaxRows to make better use of memory.
- Make the destination a heap and enable Table Lock in the destination component to mimic TABLOCK behavior.
- Disable constraints and triggers on the destination during the load if possible.
If the package "completed" but the table held 0 rows, the likely causes are:
- the data flow failed and rolled back, or
- the destination load ran inside a transaction that was never committed.
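For the constraint/trigger point, a sketch of disabling and re-enabling checks around the load (run on the test server against the target table):

```sql
-- Before the load: disable FK/CHECK constraints and triggers on the target
ALTER TABLE dbo.Beast NOCHECK CONSTRAINT ALL;
DISABLE TRIGGER ALL ON dbo.Beast;

-- ... run the SSIS package or the batched INSERT...SELECT here ...

-- After the load: re-enable; WITH CHECK revalidates the loaded rows
-- so the constraints are trusted again
ALTER TABLE dbo.Beast WITH CHECK CHECK CONSTRAINT ALL;
ENABLE TRIGGER ALL ON dbo.Beast;
```

Re-enabling WITH CHECK scans the table once, which is usually far cheaper than validating row by row during the load.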
- When to avoid BULK INSERT
If the requirement is server-to-server without intermediate files, BULK INSERT from a file may not fit the process. In that case, prefer:
- linked-server INSERT…SELECT with batching and minimal logging, or
- SqlBulkCopy directly from production to test.