At NetEase Games, we learned a hard lesson about large language model (LLM) inference in production: elastic compute is only useful if data can move just as fast.
On paper, serverless GPU infrastructure looked like a good fit for inference workloads. Game traffic is bursty, peaks differ by title and time of day, and reserving GPU capacity for every possible spike is expensive. But …