Hot Iterative Read Operations: Definition and how to manage them?

javier cohenar 211 Reputation points
2023-10-17T15:46:58.7+00:00

I posted this question on Stack Overflow but have had no answers so far, so I'm trying here in the hope of better luck.

We are trying to identify the operations behind the 'Hot Iterative Read Operations' line item for our Azure Data Lake Storage Gen2 containers, as it appears to be our main cost driver.

  1. What is the definition of Hot Iterative Read Operations? Our understanding is that the more files are read (for instance, when appending data to an existing file or simply reading it), the higher the number of Hot Iterative Read Operations. Is this correct? However, we noticed that Hot Iterative Read Operations also occur on Sundays and bank holidays, when there is no user activity, which leads me to think they are related to non-user activity.
  2. When exploring the $logs container, we found that most records refer to two operations: ReadFile and GetFileProperties. Are those operations linked to Hot Iterative Read Operations? Are there any others?
  3. Is there any other way to identify the operations under the 'Hot Iterative Read Operations' invoice item?
  4. When we use the Copy Activity in ADF (filtered by last modified date), are we incurring Hot Iterative Read Operations?
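One way to check questions 1 to 3 against the $logs data is to tally the operation type of every log entry. Doing this shows whether ReadFile and GetFileProperties really dominate or whether other operations also appear. Below is a minimal Python sketch. It assumes the classic Storage Analytics log format, where entries are semicolon-delimited and the operation type is the third field, after the version and the request start time. The function name `tally_operations`, the constant `OPERATION_TYPE_FIELD` and the sample entries are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable

# Assumed 0-based position of operation-type in a Storage Analytics log entry:
# <version>;<request-start-time>;<operation-type>;...  (verify against the docs)
OPERATION_TYPE_FIELD = 2


def tally_operations(lines: Iterable[str]) -> Counter:
    """Count $logs entries per operation type (e.g. ReadFile, GetFileProperties)."""
    counts: Counter = Counter()
    # Fields are ';'-delimited; quoted fields such as the request URL are
    # handled by the csv module even if they contain a ';'.
    for row in csv.reader(lines, delimiter=";", quotechar='"'):
        if len(row) > OPERATION_TYPE_FIELD:  # skip blank or truncated lines
            counts[row[OPERATION_TYPE_FIELD]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical, abbreviated entries; real entries carry many more fields.
    sample = [
        '2.0;2023-10-15T02:00:00.1234567Z;GetFileProperties;Success;200',
        '2.0;2023-10-15T02:00:01.1234567Z;ReadFile;Success;200',
        '2.0;2023-10-15T02:00:02.1234567Z;ReadFile;Success;200',
    ]
    for operation, count in tally_operations(sample).most_common():
        print(f"{operation}: {count}")
```

In practice the lines would come from log blobs downloaded from the $logs container, for example by passing an open file handle (`open(path, newline="")`) instead of the sample list.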
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Vahid Ghafarpour 23,595 Reputation points Volunteer Moderator
    2023-10-17T16:26:21.08+00:00

    I hope this KQL query helps you find what is driving the Hot Iterative Read Operations (HIRO).

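    // Query to identify operations contributing to Hot Iterative Read Operations.
    // Assumes diagnostic logs are sent to a Log Analytics workspace. For ADLS Gen2
    // (Azure Storage) accounts, resource logs usually land in the StorageBlobLogs
    // table rather than AzureDiagnostics; adjust the source table accordingly.
    AzureDiagnostics
    | where Category == "AzureDataLakeStorage"
    | where OperationName in ("ReadFile", "GetFileProperties") // Operations of interest
    | summarize TotalOperations = count() by OperationName, bin(TimeGenerated, 1d) // Daily totals per operation
    | order by TimeGenerated asc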
    
    
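The same daily aggregation as the KQL query above can also be reproduced offline on entries exported from the $logs container, without a Log Analytics workspace. This Python sketch assumes each entry has already been reduced to an (ISO 8601 timestamp, operation name) pair. The function `daily_counts` and the sample records are hypothetical. The weekday is printed next to each day so that activity on Sundays stands out.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple

# The two operation names the KQL query filters on.
OPERATIONS_OF_INTEREST = {"ReadFile", "GetFileProperties"}


def daily_counts(records: Iterable[Tuple[str, str]]) -> Counter:
    """Count records per (UTC day, operation), mirroring
    `summarize count() by OperationName, bin(TimeGenerated, 1d)`.

    Each record is an (ISO 8601 timestamp, operation name) pair.
    """
    counts: Counter = Counter()
    for timestamp, operation in records:
        if operation in OPERATIONS_OF_INTEREST:
            # Keep only the YYYY-MM-DD prefix; this avoids parsing the
            # 7-digit fractional seconds that log timestamps can contain.
            day = date.fromisoformat(timestamp[:10])
            counts[(day, operation)] += 1
    return counts


if __name__ == "__main__":
    records = [  # hypothetical sample records
        ("2023-10-14T23:59:00.0000000Z", "ReadFile"),
        ("2023-10-15T02:00:00.1234567Z", "ReadFile"),
        ("2023-10-15T02:00:01.1234567Z", "GetFileProperties"),
        ("2023-10-15T02:00:02.1234567Z", "OtherOperation"),  # filtered out
    ]
    for (day, operation), count in sorted(daily_counts(records).items()):
        # Print the weekday so weekend activity stands out.
        print(day.isoformat(), day.strftime("%A"), operation, count)
```

Bank holidays cannot be detected from the weekday alone; they would need a separate calendar lookup.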
