Table of Contents
- Table of Contents
- Introduction
- Calculating an S3 ETag using Python
- Determine the partsize/chunksize
- Verifying a local file
Introduction
For an introduction on S3 ETags and how they are calculated see our first post All about AWS S3 ETags.
The following will detail how to calculate the S3 ETag for a local file. We’ve used Python, however the logic can be applied elsewhere if desired.
Calculating an S3 ETag using Python
Given a file and a partsize/chunksize you can easily calculate the S3 ETag for a local file.
def calc_etag(inputfile, partsize):
md5_digests = []
with open(inputfile, 'rb') as f:
for chunk in iter(lambda: f.read(partsize), b''):
md5_digests.append(md5(chunk).digest())
return md5(b''.join(md5_digests)).hexdigest() + '-' + str(len(md5_digests))
Determine the partsize/chunksize
However if you need to determine the partsize/chunksize you can make an educated guess using the following information:
- The size of the file/object
- The number of parts used in the multipart upload
- The known partsizes preferred by popular clients
partsizes = [ ## Default Partsizes Map
8388608, # aws_cli/boto3
15728640, # s3cmd
factor_of_1MB(filesize, num_parts) # Used by many clients to upload large files
]
Verifying a local file
A quick way to put this all together into a command line script
#!/usr/bin/env python3.6
import os
import sys
from hashlib import md5
from argparse import ArgumentParser
parser = ArgumentParser(description='Compare an S3 etag to a local file')
parser.add_argument('inputfile', help='The local file')
parser.add_argument('etag', help='The etag from s3')
args = parser.parse_args()
def factor_of_1MB(filesize, num_parts):
x = filesize / int(num_parts)
y = x % 1048576
return int(x + 1048576 - y)
def calc_etag(inputfile, partsize):
md5_digests = []
with open(inputfile, 'rb') as f:
for chunk in iter(lambda: f.read(partsize), b''):
md5_digests.append(md5(chunk).digest())
return md5(b''.join(md5_digests)).hexdigest() + '-' + str(len(md5_digests))
def possible_partsizes(filesize, num_parts):
return lambda partsize: partsize < filesize and (float(filesize) / float(partsize)) <= num_parts
def main():
filesize = os.path.getsize(args.inputfile)
num_parts = int(args.etag.split('-')[1])
partsizes = [ ## Default Partsizes Map
8388608, # aws_cli/boto3
15728640, # s3cmd
factor_of_1MB(filesize, num_parts) # Used by many clients to upload large files
]
for partsize in filter(possible_partsizes(filesize, num_parts), partsizes):
if args.etag == calc_etag(args.inputfile, partsize):
print('Local file matches')
sys.exit(0)
print('Couldn\'t validate etag')
sys.exit(1)
if __name__ == "__main__":
main()