Hi @Dustin Chavez ,
Welcome to the Microsoft Q&A Platform! Thank you for asking your question here.
Why your function likely stopped at ~8 hours:
-
functionTimeout = "-1"is valid and indicates unbounded execution inhost.json. That only controls the Functions host timeout. It does not guarantee your process will never be recycled by the platform. - The Elastic Premium plan supports unbounded durations, but the platform still has behaviors that can terminate workers: an idle timer that stops a worker after 60 minutes with no executions, scale-in that can shut down a worker (60 minutes grace), and managed shutdowns during platform updates (10 minute grace). Those platform lifecycle events are explicitly called out in the Premium plan docs. A long single invocation that sits on one worker is therefore fragile.
- For HTTP triggers there is an extra constraint (load-balancer idle/idle-timeout) that limits single-request response time regardless of
functionTimeout(see the Functions hosting/scale page). For long work you should avoid relying on a single synchronous HTTP response.
Regardless of the function app time-out setting, 230 seconds is the maximum amount of time that an HTTP triggered function can take to respond to a request. This is because of the default idle time-out of Azure Load Balancer. For longer processing times, consider using the Durable Functions async pattern or defer the actual work and return an immediate response.
Possible Solutions/Suggestions:
- Move long-running jobs to Durable Functions (orchestrator + activities) for reliability.
- Break the job into small, idempotent chunks and process via queues/messages.
- Avoid long synchronous HTTP responses — return
202 Acceptedand run work asynchronously. - Configure minimum/pre-warmed instances (
minimumElasticInstanceCount) on Elastic Premium to reduce scale-in. - Add checkpointing and retries so progress survives host restarts or failures.
- Enable Application Insights, streaming logs and check Activity Log/Kudu to find platform recycles.
- Consider running very long single-process work on a VM/container or WebJob if Functions isn’t suitable.
- Test with shorter runs and correlate logs to identify the exact platform event causing termination.
Reference: