
Copying a 32-Million Row Table From Production Server to Test Server

Bobby P 271 Reputation points
2026-03-13T19:43:14.18+00:00

So we believe we're having tempdb issues with a very wide table that has 32 million rows in it. There is no clustered primary key; no indexes are defined on it at all.

We have tried an INSERT SELECT going across a linked server to our test server to copy this beast of a table, and it took 55 minutes to copy one day's worth of data, which was 1.1 million rows. Again... there is NO indexing on this table, so we imagine it's going through every data page to pluck off one day's worth of data.

Our challenge here is that we are being tasked with proving that adding a [Create_dt] index would help our situation.

We then tried an SSIS package to data-pump our data from our production table into test.

That just seems to have stopped after 8 million rows. We didn't play around much with the Data Flow Task properties [DefaultBufferMaxRows] and/or [DefaultBufferSize], but after the SSIS package looked like it had completed with 8 million rows, we queried the table in the test environment and there was nothing there... 0 rows.

We're about ready to punt on this.

Is there any way to do an INSERT SELECT using a ROWCOUNT without a WHERE clause, given that there is currently no defined index, nor an IDENTITY column? Yeah... yeah... yeah... I know, VERY poor design. So we can't really control how the SQL Server optimizer accesses the rows.

We don't think a BULK INSERT is the answer here.

I think I saw in some Google searches that there is a way to do an INSERT SELECT and control a COMMIT by the number of rows. But is that always based on some WHERE Clause filtering?

Any help and/or suggestions would be GREATLY appreciated here.

Thanks in advance for your review; I'm hopeful for a viable solution.

SQL Server | SQL Server Transact-SQL

3 answers

Sort by: Most helpful
  1. Bruce (SqlWork.com) 83,666 Reputation points
    2026-03-14T23:14:43.6533333+00:00

    When I have had to do daily extracts from a large dataset, I have typically used partitions. You create a partition for each day (created in advance), and the partition function allows selecting by day. The partitions also allow fast inserts while you are doing an extract (assuming the dates are different).
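    A minimal sketch of that daily-partition setup; all object names (pf_daily, ps_daily, dbo.Beast) are placeholders for the real ones:

    ```sql
    -- Hypothetical names throughout. RANGE RIGHT puts each boundary
    -- date at the start of its own partition, i.e. one partition per day.
    CREATE PARTITION FUNCTION pf_daily (datetime)
    AS RANGE RIGHT FOR VALUES ('20260312', '20260313', '20260314');

    CREATE PARTITION SCHEME ps_daily
    AS PARTITION pf_daily ALL TO ([PRIMARY]);

    -- Partition the table by Create_dt, here via a clustered index
    -- built on the partition scheme.
    CREATE CLUSTERED INDEX cix_Beast_Create_dt
    ON dbo.Beast (Create_dt)
    ON ps_daily (Create_dt);

    -- Boundaries for future days are created in advance:
    ALTER PARTITION FUNCTION pf_daily() SPLIT RANGE ('20260315');

    -- One day's extract now touches a single partition:
    SELECT *
    FROM   dbo.Beast
    WHERE  Create_dt >= '20260313' AND Create_dt < '20260314';
    ```

    With RANGE RIGHT, adding the next day's boundary before any of that day's rows arrive keeps the SPLIT a metadata-only operation.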

    A million rows is a large extract and causes lots of logging. Do a SELECT INTO a new staging table on the test server; this only logs allocations. I typically add a processed column, and then you can move 10-100k rows at a time from the staging table to the actual table.
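    A sketch of that staging approach, assuming SIMPLE or BULK_LOGGED recovery on the test database; server, table, and column names are placeholders:

    ```sql
    -- SELECT INTO creates the staging heap and only logs allocations.
    SELECT B.*, CAST(0 AS bit) AS processed
    INTO   dbo.Beast_Staging
    FROM   ProdServer.ProdDb.dbo.Beast AS B;  -- linked-server name is a placeholder

    -- Move modest batches into the real table, flagging rows as they go.
    WHILE 1 = 1
    BEGIN
        ;WITH batch AS
        (
            SELECT TOP (50000) *
            FROM   dbo.Beast_Staging
            WHERE  processed = 0
        )
        UPDATE batch
        SET    processed = 1
        OUTPUT inserted.col1, inserted.col2   -- list the real columns here
        INTO   dbo.Beast (col1, col2);

        IF @@ROWCOUNT = 0 BREAK;
    END;
    ```

    The UPDATE ... OUTPUT INTO moves each batch and marks it processed in one statement, so an interrupted run can simply be restarted.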


  2. Erland Sommarskog 133.6K Reputation points MVP Volunteer Moderator
    2026-03-13T20:08:38.61+00:00

    The situation is indeed bleak. Sure, you can do the insert in chunks, assuming that you can find a column to chunk on. For instance, you mentioned a create_dt column. Then you can do:

    INSERT target (...)
       SELECT ...
       FROM   tbl
       WHERE  create_dt BETWEEN @start AND @end

    But beware that there will be a full table scan each time. So unless the bottleneck is the insertion of the rows, this is likely to make things take even longer.

    Do I understand correctly that the reason you want to copy the table is to make a PoC showing that a clustered index will help? Yes, the clustered index is likely to alleviate the situation. But copying the table with INSERT SELECT may also show direct improvements. One problem with heaps - that is, tables without a clustered index - is that frequent insertions and deletions can leave the table with a lot of empty space, where many eight-page extents hold just a single row. After all, 32 million rows is not that startling. (But I note that you say that the table is very wide.)

    Anyway, the only way to make faithful tests is to restore a backup of the database (or at least the filegroup(s) hosting this monster table) so that you preserve any fragmentation. Then you can try different things, like adding a clustered index, which is likely to be the best long-term solution. But you can also try ALTER TABLE tbl REBUILD, which will rebuild the heap to make it more compact. Beware that all operations on this misfit are likely to take a very long time.
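    A sketch of that test setup; file paths, database, and index names are placeholders:

    ```sql
    -- Restore a production backup on the test server so the heap's
    -- fragmentation is preserved exactly.
    RESTORE DATABASE ProdDb_Test
    FROM DISK = N'D:\Backup\ProdDb.bak'
    WITH MOVE 'ProdDb'     TO N'D:\Data\ProdDb_Test.mdf',
         MOVE 'ProdDb_log' TO N'D:\Log\ProdDb_Test.ldf',
         RECOVERY;

    -- Experiment 1: compact the heap in place.
    ALTER TABLE dbo.Beast REBUILD;

    -- Experiment 2 (on a fresh restore): the likely long-term fix.
    CREATE CLUSTERED INDEX cix_Beast_Create_dt ON dbo.Beast (Create_dt);
    ```

    Run each experiment against its own fresh restore so the timings are comparable.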

    You can also run an analysis on the table with

    SELECT * FROM sys.dm_db_index_physical_stats(db_id(), object_id('tbl'), 0, 0, 'DETAILED')
    

    The particular column of interest is avg_page_space_used_in_percent. Beware that this query is likely to take a full hour at least, as it will scan the full table.


  3. Q&A Assist
    2026-03-13T19:43:37.0966667+00:00
    AI generated content. This answer was created with AI from Q&A Assist.

    Use batching and bulk-copy style operations rather than a single monolithic INSERT…SELECT over the linked server.

    Key points and options:

    1. Use INSERT…SELECT with batching and minimal logging

    If the target test table is (or can be) a heap and the database recovery model allows it, use INSERT INTO … SELECT with TABLOCK and batch the operation:

    -- On TEST server, target table is empty heap
    SET NOCOUNT ON;

    DECLARE @FromDate datetime = '20250101';   -- earliest Create_dt to copy
    DECLARE @ToDate   datetime;

    -- Each iteration copies one day's rows. Some advancing filter is
    -- required: TOP (N) alone would re-read the same rows on every pass,
    -- since there is no key to exclude rows already copied.
    WHILE @FromDate < '20260314'               -- day after the latest data
    BEGIN
        SET @ToDate = DATEADD(DAY, 1, @FromDate);

        INSERT INTO TestDb.dbo.Beast WITH (TABLOCK)
        SELECT B.*
        FROM   ProductionServer.ProdDb.dbo.Beast AS B
        WHERE  B.Create_dt >= @FromDate AND B.Create_dt < @ToDate;

        SET @FromDate = @ToDate;
    END;
    

    Relevant behavior:

    • INSERT INTO <target_table> SELECT <columns> FROM <source_table> can be minimally logged when:
      • Database is in SIMPLE or BULK_LOGGED recovery.
      • Target is a heap.
      • Target is not used in replication.
      • TABLOCK is specified on the target.
    • Minimal logging reduces log growth and can significantly improve throughput.
    • @@ROWCOUNT reports the number of rows inserted by each batch, which is useful for progress tracking.

    Note that TOP (N) alone cannot drive the batching here: with no key, identity, or flag on the table, each iteration would re-read the same rows. Some advancing filter, such as a Create_dt window, is what moves the loop forward and bounds each batch.
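    One rough way to check whether a batch was in fact minimally logged, assuming SIMPLE recovery and placeholder table names (sys.fn_dblog is undocumented but widely used for this kind of inspection):

    ```sql
    CHECKPOINT;  -- in SIMPLE recovery this truncates the log, so counts start low

    INSERT INTO dbo.Beast_Copy WITH (TABLOCK)
    SELECT TOP (100000) * FROM dbo.Beast_Source;

    -- Fully logged inserts show roughly one LOP_INSERT_ROWS record per row;
    -- minimally logged ones show mostly allocation-related records instead.
    SELECT Operation, COUNT(*) AS log_records
    FROM   sys.fn_dblog(NULL, NULL)
    GROUP  BY Operation
    ORDER  BY log_records DESC;
    ```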

    2. Consider adding an index on Create_dt for filtering

    To prove benefit of a Create_dt index for copying “one day” of data:

    • Add a nonclustered index on Create_dt in production.
    • Re-run the INSERT…SELECT with a WHERE Create_dt BETWEEN ... predicate.
    • Compare duration and IO vs. the no-index run.

    For large data moves, INSERT … SELECT is often preferred when source and destination are on the same instance, but the same principle applies across linked servers when the predicate can leverage an index.
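    A sketch of that comparison; the index name is hypothetical:

    ```sql
    -- Baseline first: run the one-day extract and note the stats output.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT *
    FROM   dbo.Beast
    WHERE  Create_dt >= '20260312' AND Create_dt < '20260313';

    -- Then build the index and re-run the same query.
    CREATE NONCLUSTERED INDEX ix_Beast_Create_dt
    ON dbo.Beast (Create_dt);
    ```

    Because the table is wide and the query selects every column, the optimizer will weigh an index seek plus RID lookups against a full scan; for one day out of 32 million rows the seek should win, but the STATISTICS IO output is the proof either way.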

    3. Use SqlBulkCopy / bulk copy semantics instead of SSIS defaults

    For large one-time moves, bulk-copy style tools are more efficient than row-by-row SSIS data flows:

    • Use SqlBulkCopy (or bcp) from production to a file or directly to the test server.
    • SqlBulkCopy is designed for high-throughput loads and is generally faster than plain INSERT statements.

    • SqlBulkCopy offers significant performance advantages over individual INSERT statements.
    • If both tables are on the same instance, INSERT … SELECT is faster, but across servers or when loading from files, bulk copy is preferred.

    If using OLE DB / bulk copy options, TABLOCK on the target improves performance and allows concurrent loads when there are no indexes.

    4. Control commit size without a key or identity column

    Commit frequency is controlled by transaction boundaries, not by the filtering itself:

    • Batch on an advancing Create_dt window; TOP (N) alone cannot make progress without some way to exclude rows already copied.
    • Each loop iteration is one transaction (or explicitly wrap each batch in its own transaction if needed).

    Example with explicit transaction per batch:

    DECLARE @FromDate datetime = '20250101';   -- earliest Create_dt to copy

    WHILE @FromDate < '20260314'               -- day after the latest data
    BEGIN
        BEGIN TRAN;

        INSERT INTO TestDb.dbo.Beast WITH (TABLOCK)
        SELECT B.*
        FROM   ProductionServer.ProdDb.dbo.Beast AS B
        WHERE  B.Create_dt >= @FromDate
          AND  B.Create_dt <  DATEADD(DAY, 1, @FromDate);

        COMMIT TRAN;   -- one commit per daily batch

        SET @FromDate = DATEADD(DAY, 1, @FromDate);
    END;
    

    This pattern avoids any need for an identity column and still gives bounded commits, but note that without a key, some advancing filter (such as a Create_dt window) is required for each batch to make progress.

    5. Why SSIS appeared to stop at 8M rows

    Large bulk operations can be sensitive to:

    • Batch sizes and buffer sizes.
    • Logging and constraints.
    • Triggers and tempdb usage.

    For SSIS:

    • Increase DefaultBufferSize and/or DefaultBufferMaxRows to better utilize memory.
    • Ensure the destination is a heap with TableLock enabled in the destination component to mimic TABLOCK behavior.
    • Disable constraints and triggers on the destination during load if possible.

    If the package “completed” but the table had 0 rows, likely:

    • The data flow failed and rolled back, or
    • The destination was in a transaction that was not committed.

    6. When to avoid BULK INSERT

    If the requirement is server-to-server without intermediate files, BULK INSERT from a file may not fit the process. In that case, prefer:

    • Linked-server INSERT…SELECT with batching and minimal logging, or
    • SqlBulkCopy from production to test.


