This is a perfect example of the AWS Batch API ‘leaking’ into your code. The whole point of a compute resource pool is that you don’t have to think about how many jobs you create.
This is true. We’re using AWS Batch because it’s the best tool we could find for other jobs that actually do need hundreds/thousands of spot instances, and this particular job goes in the middle of those. If most of our jobs looked like this one, using Batch wouldn’t make sense.
You get language-level validation either way. The assert statements are superfluous in that sense. What they do add is in effect check_dataset_params(), whose logic probably doesn’t belong in this file.
You’re right. In the explicit example, it makes more sense to have that sort of logic at the call site.
This is true. We’re using AWS Batch because it’s the best tool we could find for other jobs that actually do need hundreds/thousands of spot instances, and this particular job goes in the middle of those. If most of our jobs looked like this one, using Batch wouldn’t make sense.
You’re right. In the explicit example, it makes more sense to have that sort of logic at the call site.