We are using a Cosmos dataset as a sink for a data flow in our pipeline. The Cosmos database is configured to use 400 RUs during normal operations, but we upscale it during the pipeline run. The upscaling works flawlessly.
The pipeline consumes 100% of the provisioned throughput, as expected. We would like to limit this to about 80%, so that our customers don’t experience delays and timeout exceptions. According to the documentation, the "Write throughput budget" setting in the Cosmos sink is supposed to be "An integer that represents the RUs you want to allocate for this Data Flow write operation, out of the total throughput allocated to the collection". Unless I am mistaken, this means that you can set a limit on how many RUs the pipeline is allowed to consume.
However, no matter what value we use for "Write throughput budget", the pipeline always consumes ~10% of the total provisioned throughput. We have tested a wide range of values, and the result is always the same. If we do not set a value, 100% of the RUs are consumed; with any value set, whether 1, 500, 1000, or even 1200 (of a total of 1000), ~10% is used.
Does anyone know if this is a bug with the ADF Cosmos sink, or have I misunderstood what this setting is supposed to be? Is there any other way of capping how many Cosmos RUs an ADF pipeline is allowed to use?
This is definitely related to data size. Setting provisioned throughput to 10000 RUs and write throughput budget to 7500 uses ~85% of total RUs when we test with 300 000 documents. Using the same settings, but 10 000 000 documents, we see a consistent ~10% RU usage for the pipeline run.
The solution to our problem was to set "Write throughput budget" to a much higher value than the provisioned throughput. Data size and the number of partitions used in the pipeline definitely affect which settings you should use. For reference, we had ~10 000 000 documents of 455 bytes each. With provisioned throughput set to 10 000 RUs and write throughput budget set to 60 000, ADF used on average ~90% of the provisioned throughput.
I recommend trial and error for your specific case. Don't be afraid to set the write throughput budget to a much higher value than you think is necessary.
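If you go the trial-and-error route, it helps to record each run and pick the setting that lands closest to your target. Below is a minimal sketch of that bookkeeping in plain Python. It does not call ADF or Cosmos; `Trial` and `closest_to_target` are hypothetical names I made up for illustration, and the utilization figures are the approximate ones from our runs described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One pipeline run (hypothetical record type, not an ADF or Cosmos API)."""
    documents: int               # number of documents written by the run
    provisioned_rus: int         # throughput provisioned on the container
    write_budget: int            # "Write throughput budget" set on the sink
    observed_utilization: float  # approximate fraction of provisioned RUs used


# Approximate observations reported in this post.
TRIALS = [
    Trial(300_000, 10_000, 7_500, 0.85),
    Trial(10_000_000, 10_000, 7_500, 0.10),
    Trial(10_000_000, 10_000, 60_000, 0.90),
]


def closest_to_target(trials, documents, target):
    """Return the recorded trial for this data size whose utilization is nearest the target."""
    candidates = [t for t in trials if t.documents == documents]
    if not candidates:
        raise ValueError(f"no recorded trials for {documents} documents")
    return min(candidates, key=lambda t: abs(t.observed_utilization - target))


if __name__ == "__main__":
    best = closest_to_target(TRIALS, documents=10_000_000, target=0.80)
    ratio = best.write_budget / best.provisioned_rus
    print(f"write budget {best.write_budget} ({ratio:.1f}x provisioned) "
          f"gave ~{best.observed_utilization:.0%} utilization")
```

With only these three data points the result is coarse; adding your own runs at intermediate budget values should narrow it down for your data size and partitioning.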