I agree with @faul_sname that the bash version is more readable.
But maybe a better (more readable/maintainable) Python alternative is to explicitly use Amazon’s Python SDK for S3 (boto3) to do the downloads? I’ve never used it myself, but googling suggests:
import gzip
from io import BytesIO

import boto3

try:
    # Download the gzipped object from S3 and decompress it in memory.
    s3 = boto3.resource('s3')
    key = 'YOUR_FILE_NAME.gz'
    obj = s3.Object('YOUR_BUCKET_NAME', key)
    n = obj.get()['Body'].read()                  # compressed bytes
    gzipfile = gzip.GzipFile(fileobj=BytesIO(n))
    content = gzipfile.read()                     # decompressed bytes
    print(content)
except Exception as e:
    print(e)
    raise
You could wrap that in a function to parallelize the download/decompression of path1 and path2 (using your favorite Python parallelization paradigm); a sketch follows below. But this wouldn’t handle piping the decompressed files to cmd without using temp files...
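For concreteness, here is a minimal sketch of that wrapper, assuming concurrent.futures threads for the parallelism and hypothetical bucket/key names standing in for path1 and path2 (untested, same caveat as above):

import gzip
from concurrent.futures import ThreadPoolExecutor
from io import BytesIO

import boto3

def fetch_and_decompress(bucket, key):
    """Download a gzipped S3 object and return its decompressed bytes."""
    s3 = boto3.resource('s3')  # create the resource per thread, to be safe
    compressed = s3.Object(bucket, key).get()['Body'].read()
    return gzip.GzipFile(fileobj=BytesIO(compressed)).read()

# The work is I/O-bound, so threads are enough.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(fetch_and_decompress, 'YOUR_BUCKET_NAME', key)
               for key in ('path1.gz', 'path2.gz')]  # hypothetical keys
    content1, content2 = (f.result() for f in futures)

The decompressed blobs end up in memory, which is exactly why getting them to cmd still means temp files (or extra plumbing).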
I don’t see how that solves any of the problems I have here?