Method to reduce costs for LLMs

When people use LLMs for coding, they often ask the model to output the entire file so the result is easy to copy and paste. Some models resist doing this, presumably because of the cost. For tasks where only a small portion of the file changes, regenerating the whole thing seems like an inefficient use of GPU time, especially at scale. It's baffling that none of the major LLMs programmatically insert the small generated portion into the correct location in the larger text using standard techniques (something like the sketch below). Using a reasoning model primarily to copy unchanged text seems incredibly wasteful. Is there a reason this isn't common practice?
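
To make the idea concrete, here is a minimal sketch of what "programmatically insert the generated portion" could look like on the client side. This is not any vendor's actual API or edit format; the search/replace convention and the `apply_edit` function are hypothetical illustrations of the general technique, assuming the model emits only the changed snippet plus enough surrounding text to locate it.

```python
def apply_edit(original: str, search: str, replace: str) -> str:
    """Splice a model-generated edit into the full file locally.

    Replaces the first exact occurrence of `search` with `replace`.
    Raises ValueError if the targeted snippet isn't found, so a bad
    edit fails loudly instead of silently corrupting the file.
    """
    if search not in original:
        raise ValueError("edit target not found in original file")
    return original.replace(search, replace, 1)


if __name__ == "__main__":
    full_file = (
        "def greet(name):\n"
        "    print('Hello ' + name)\n"
        "\n"
        "def farewell(name):\n"
        "    print('Bye ' + name)\n"
    )
    # Instead of regenerating the whole file, the model outputs only the
    # changed line(s); the client pastes them into place.
    patched = apply_edit(
        full_file,
        search="    print('Hello ' + name)\n",
        replace="    print(f'Hello, {name}!')\n",
    )
    print(patched)
```

The point of the sketch is that the expensive model only produces the few changed lines, and cheap local string handling does the "copying" that would otherwise burn output tokens.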