
Copying a 32-Million Row Table From Production Server to Test Server

Bobby P 271 Reputation points
2026-03-13T19:43:14.18+00:00

So we believe we're having tempdb issues with a very wide table that has 32 million rows in it. There is no clustered primary key; no indexes are defined on it at all.

We have tried an INSERT SELECT going across a linked server to our test server to copy this beast of a table, and it took 55 minutes to copy one day's worth of data, which was 1.1 million rows. Again... there is NO indexing on this table, so we imagine it's going through every data page to pluck off one day's worth of data.

Our challenge here is that we are being tasked with proving that adding a [Create_dt] index would help our situation.

We then tried an SSIS package to data-pump our data from our production table into test.

That just seems to have stopped after 8 million rows. We didn't play around much with the Data Flow Task properties [DefaultBufferMaxRows] and/or [DefaultBufferSize], but after the SSIS package looked like it had completed with 8 million rows, we queried the table in the test environment and there was nothing there... 0 rows.

We're about ready to punt on this.

Is there any way to do an INSERT SELECT using a ROWCOUNT without a WHERE clause, given that there is currently no defined index, nor an IDENTITY column? Yeah... yeah... yeah... I know, VERY poor design. So we can't really control how the SQL Server optimizer accesses the rows.

We don't think a BULK INSERT is the answer here.

I think I saw in some Google searches that there is a way to do an INSERT SELECT and control a COMMIT by the number of rows. But is that always based on some WHERE Clause filtering?

Any help and/or suggestions would be GREATLY appreciated here.

Thanks in advance for your review; I'm hopeful for a viable solution.

SQL Server | SQL Server Transact-SQL

3 answers

Sort by: Most helpful
  1. Bruce (SqlWork.com) 83,666 Reputation points
    2026-03-14T23:14:43.6533333+00:00

    When I have had to do daily extracts from a large dataset, I have typically used partitions. You create a partition for each day (created in advance), and the partition function allows selecting by day. The partitions also allow fast inserts while you are doing an extract (assuming the dates are different).
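    A minimal sketch of that daily-partition setup; all object names (pf_daily, ps_daily, dbo.Beast) are placeholders for the real ones:

    ```sql
    -- Hypothetical names throughout. RANGE RIGHT puts each boundary
    -- date at the start of its own partition, i.e. one partition per day.
    CREATE PARTITION FUNCTION pf_daily (datetime)
    AS RANGE RIGHT FOR VALUES ('20260312', '20260313', '20260314');

    CREATE PARTITION SCHEME ps_daily
    AS PARTITION pf_daily ALL TO ([PRIMARY]);

    -- Partition the table by Create_dt, here via a clustered index
    -- built on the partition scheme.
    CREATE CLUSTERED INDEX cix_Beast_Create_dt
    ON dbo.Beast (Create_dt)
    ON ps_daily (Create_dt);

    -- Boundaries for future days are created in advance:
    ALTER PARTITION FUNCTION pf_daily() SPLIT RANGE ('20260315');

    -- One day's extract now touches a single partition:
    SELECT *
    FROM   dbo.Beast
    WHERE  Create_dt >= '20260313' AND Create_dt < '20260314';
    ```

    With RANGE RIGHT, adding the next day's boundary before any of that day's rows arrive keeps the SPLIT a metadata-only operation.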

    A million rows is a large extract and causes lots of logging. Do a SELECT INTO a new staging table on the test server; this only logs allocations. I typically add a processed column, and then you can move 10-100k rows at a time from the staging table to the actual table.
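    A sketch of that staging approach, assuming SIMPLE or BULK_LOGGED recovery on the test database; server, table, and column names are placeholders:

    ```sql
    -- SELECT INTO creates the staging heap and only logs allocations.
    SELECT B.*, CAST(0 AS bit) AS processed
    INTO   dbo.Beast_Staging
    FROM   ProdServer.ProdDb.dbo.Beast AS B;  -- linked-server name is a placeholder

    -- Move modest batches into the real table, flagging rows as they go.
    WHILE 1 = 1
    BEGIN
        ;WITH batch AS
        (
            SELECT TOP (50000) *
            FROM   dbo.Beast_Staging
            WHERE  processed = 0
        )
        UPDATE batch
        SET    processed = 1
        OUTPUT inserted.col1, inserted.col2   -- list the real columns here
        INTO   dbo.Beast (col1, col2);

        IF @@ROWCOUNT = 0 BREAK;
    END;
    ```

    The UPDATE ... OUTPUT INTO moves each batch and marks it processed in one statement, so an interrupted run can simply be restarted.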


  2. Erland Sommarskog 133.6K Reputation points MVP Volunteer Moderator
    2026-03-13T20:08:38.61+00:00

    The situation is indeed bleak. Sure, you can do the insert in chunks, assuming that you can find a column to chunk on. For instance, you mentioned a create_dt column. Then you can do:

    INSERT target (...)
       SELECT ...
       FROM   tbl
       WHERE  create_dt BETWEEN @start AND @end

    But beware that there will be a full table scan each time. So unless the bottleneck is the insertion of the rows, this is likely to make things take even longer.

    Do I understand correctly that the reason you want to copy the table is to make a PoC showing that a clustered index will help? Yes, the clustered index is likely to alleviate the situation. But copying the table with INSERT SELECT may also show direct improvements. One problem with heaps - that is, tables without a clustered index - is that frequent insertions and deletions can leave the table with a lot of empty space, where many eight-page extents hold just a single row. After all, 32 million rows is not that startling. (But I note that you say that the table is very wide.)

    Anyway, the only way to make faithful tests is to restore a backup of the database (or at least the filegroup(s) hosting this monster table) so that you preserve any fragmentation. Then you can try different things, like adding a clustered index, which is likely to be the best long-term solution. But you can also try ALTER TABLE tbl REBUILD, which will rebuild the heap to make it more compact. Beware that all operations on this misfit are likely to take a very long time.
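    A sketch of that test setup; file paths, database, and index names are placeholders:

    ```sql
    -- Restore a production backup on the test server so the heap's
    -- fragmentation is preserved exactly.
    RESTORE DATABASE ProdDb_Test
    FROM DISK = N'D:\Backup\ProdDb.bak'
    WITH MOVE 'ProdDb'     TO N'D:\Data\ProdDb_Test.mdf',
         MOVE 'ProdDb_log' TO N'D:\Log\ProdDb_Test.ldf',
         RECOVERY;

    -- Experiment 1: compact the heap in place.
    ALTER TABLE dbo.Beast REBUILD;

    -- Experiment 2 (on a fresh restore): the likely long-term fix.
    CREATE CLUSTERED INDEX cix_Beast_Create_dt ON dbo.Beast (Create_dt);
    ```

    Run each experiment against its own fresh restore so the timings are comparable.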

    You can also run an analysis on the table with

    SELECT * FROM sys.dm_db_index_physical_stats(db_id(), object_id('tbl'), 0, 0, 'DETAILED')
    

    The particular column of interest is avg_page_space_used_in_percent. Beware that this query is likely to take a full hour at least, as it will scan the full table.


  3. Q&A Assist
    2026-03-13T19:43:37.0966667+00:00
    AI generated content. This answer was created with AI from Q&A Assist.

    Use batching and bulk-copy style operations rather than a single monolithic INSERT…SELECT over the linked server.

    Key points and options:

    1. Use INSERT…SELECT with batching and minimal logging

    If the target test table is (or can be) a heap and the database recovery model allows it, use INSERT INTO … SELECT with TABLOCK and batch the operation:

    -- On TEST server, target table is empty heap
    SET NOCOUNT ON;

    DECLARE @FromDate datetime = '20250101';   -- earliest Create_dt to copy
    DECLARE @ToDate   datetime;

    -- Each iteration copies one day's rows. Some advancing filter is
    -- required: TOP (N) alone would re-read the same rows on every pass,
    -- since there is no key to exclude rows already copied.
    WHILE @FromDate < '20260314'               -- day after the latest data
    BEGIN
        SET @ToDate = DATEADD(DAY, 1, @FromDate);

        INSERT INTO TestDb.dbo.Beast WITH (TABLOCK)
        SELECT B.*
        FROM   ProductionServer.ProdDb.dbo.Beast AS B
        WHERE  B.Create_dt >= @FromDate AND B.Create_dt < @ToDate;

        SET @FromDate = @ToDate;
    END;
    

    Relevant behavior:

    • INSERT INTO <target_table> SELECT <columns> FROM <source_table> can be minimally logged when:
      • Database is in SIMPLE or BULK_LOGGED recovery.
      • Target is a heap.
      • Target is not used in replication.
      • TABLOCK is specified on the target.
    • Minimal logging reduces log growth and can significantly improve throughput.
    • @@ROWCOUNT reports the number of rows inserted by each batch, which is useful for progress tracking.

    Note that TOP (N) alone cannot drive the batching here: with no key, identity, or flag on the table, each iteration would re-read the same rows. Some advancing filter, such as a Create_dt window, is what moves the loop forward and bounds each batch.
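    One rough way to check whether a batch was in fact minimally logged, assuming SIMPLE recovery and placeholder table names (sys.fn_dblog is undocumented but widely used for this kind of inspection):

    ```sql
    CHECKPOINT;  -- in SIMPLE recovery this truncates the log, so counts start low

    INSERT INTO dbo.Beast_Copy WITH (TABLOCK)
    SELECT TOP (100000) * FROM dbo.Beast_Source;

    -- Fully logged inserts show roughly one LOP_INSERT_ROWS record per row;
    -- minimally logged ones show mostly allocation-related records instead.
    SELECT Operation, COUNT(*) AS log_records
    FROM   sys.fn_dblog(NULL, NULL)
    GROUP  BY Operation
    ORDER  BY log_records DESC;
    ```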

    2. Consider adding an index on Create_dt for filtering

    To prove benefit of a Create_dt index for copying “one day” of data:

    • Add a nonclustered index on Create_dt in production.
    • Re-run the INSERT…SELECT with a WHERE Create_dt BETWEEN ... predicate.
    • Compare duration and IO vs. the no-index run.

    For large data moves, INSERT … SELECT is often preferred when source and destination are on the same instance, but the same principle applies across linked servers when the predicate can leverage an index.
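    A sketch of that comparison; the index name is hypothetical:

    ```sql
    -- Baseline first: run the one-day extract and note the stats output.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT *
    FROM   dbo.Beast
    WHERE  Create_dt >= '20260312' AND Create_dt < '20260313';

    -- Then build the index and re-run the same query.
    CREATE NONCLUSTERED INDEX ix_Beast_Create_dt
    ON dbo.Beast (Create_dt);
    ```

    Because the table is wide and the query selects every column, the optimizer will weigh an index seek plus RID lookups against a full scan; for one day out of 32 million rows the seek should win, but the STATISTICS IO output is the proof either way.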

    3. Use SqlBulkCopy / bulk copy semantics instead of SSIS defaults

    For large one-time moves, bulk-copy style tools are more efficient than row-by-row SSIS data flows:

    • Use SqlBulkCopy (or bcp) from production to a file or directly to the test server.
    • SqlBulkCopy is designed for high-throughput loads and is generally faster than plain INSERT statements.

    • SqlBulkCopy offers significant performance advantages over individual INSERT statements.
    • If both tables are on the same instance, INSERT … SELECT is faster, but across servers or when loading from files, bulk copy is preferred.

    If using OLE DB / bulk copy options, TABLOCK on the target improves performance and allows concurrent loads when there are no indexes.

    4. Control commit size without a key or identity column

    Commit frequency is controlled by transaction boundaries, not by the filtering itself:

    • Batch on an advancing Create_dt window; TOP (N) alone cannot make progress without some way to exclude rows already copied.
    • Each loop iteration is one transaction (or explicitly wrap each batch in its own transaction if needed).

    Example with explicit transaction per batch:

    DECLARE @FromDate datetime = '20250101';   -- earliest Create_dt to copy

    WHILE @FromDate < '20260314'               -- day after the latest data
    BEGIN
        BEGIN TRAN;

        INSERT INTO TestDb.dbo.Beast WITH (TABLOCK)
        SELECT B.*
        FROM   ProductionServer.ProdDb.dbo.Beast AS B
        WHERE  B.Create_dt >= @FromDate
          AND  B.Create_dt <  DATEADD(DAY, 1, @FromDate);

        COMMIT TRAN;   -- one commit per daily batch

        SET @FromDate = DATEADD(DAY, 1, @FromDate);
    END;
    

    This pattern avoids any need for an identity column and still gives bounded commits, but note that without a key, some advancing filter (such as a Create_dt window) is required for each batch to make progress.

    5. Why SSIS appeared to stop at 8M rows

    Large bulk operations can be sensitive to:

    • Batch sizes and buffer sizes.
    • Logging and constraints.
    • Triggers and tempdb usage.

    For SSIS:

    • Increase DefaultBufferSize and/or DefaultBufferMaxRows to better utilize memory.
    • Ensure the destination is a heap with TableLock enabled in the destination component to mimic TABLOCK behavior.
    • Disable constraints and triggers on the destination during load if possible.

    If the package “completed” but the table had 0 rows, likely:

    • The data flow failed and rolled back, or
    • The destination was in a transaction that was not committed.

    6. When to avoid BULK INSERT

    If the requirement is server-to-server without intermediate files, BULK INSERT from a file may not fit the process. In that case, prefer:

    • Linked-server INSERT…SELECT with batching and minimal logging, or
    • SqlBulkCopy from production to test.


