When you download a large file, the official site usually publishes an MD5, SHA-1, or SHA-256 value for verifying the file's integrity. A large download can fail in many ways: network jitter may leave the file incomplete, or the file may have been tampered with in transit. Either problem leaves you with a file that cannot be used, so it is worth checking the digest before trusting the download.
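The check itself is a simple string comparison between the digest you compute locally and the value published on the site. A minimal sketch (the `verify` helper and its parameters are illustrative, not from the original post):

```python
import hashlib

def verify(fpath: str, expected: str, algorithm: str = 'sha256') -> bool:
    """Compare a file's digest against the published checksum."""
    with open(fpath, 'rb') as f:
        digest = hashlib.new(algorithm, f.read()).hexdigest()
    # Published checksums are hex strings; compare case-insensitively
    return digest == expected.strip().lower()
```

If `verify` returns `False`, the safest response is to delete the file and download it again.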
For small files (<500 MB)

Open the file in binary read mode (`rb`), load it all at once, and compute the digest.
```python
import hashlib

def encrypt(fpath: str, algorithm: str) -> str:
    with open(fpath, 'rb') as f:
        return hashlib.new(algorithm, f.read()).hexdigest()

if __name__ == '__main__':
    for algorithm in ('md5', 'sha1', 'sha256'):
        hexdigest = encrypt('test.file', algorithm)
        print(f'{algorithm}: {hexdigest}')
```
For large files (500 MB–1 GB)

For a friendlier experience, add a progress bar. (The progress bar is implemented with rich; install it with `pip install rich`.)
```python
import hashlib

import rich.progress

def encrypt(fpath: str, algorithm: str) -> str:
    # rich.progress.open wraps the file and renders a progress bar as it is read
    with rich.progress.open(fpath, 'rb') as f:
        return hashlib.new(algorithm, f.read()).hexdigest()

if __name__ == '__main__':
    for algorithm in ('md5', 'sha1', 'sha256'):
        hexdigest = encrypt('test.file', algorithm)
        print(f'{algorithm}: {hexdigest}')
```
For very large files (>1 GB)

To avoid running out of memory, read the file in chunks and update the hash incrementally.
```python
import hashlib

import rich.progress

def encrypt(fpath: str, algorithm: str) -> str:
    with rich.progress.open(fpath, 'rb') as f:
        hasher = hashlib.new(algorithm)
        # Read 1 MiB at a time so memory use stays constant
        for chunk in iter(lambda: f.read(2**20), b''):
            hasher.update(chunk)
        return hasher.hexdigest()

if __name__ == '__main__':
    for algorithm in ('md5', 'sha1', 'sha256'):
        hexdigest = encrypt('ubuntu-22.04-desktop-amd64.iso', algorithm)
        print(f'{algorithm}: {hexdigest}')
```
Example output:

```text
Reading... ---------------------------------------- 3.7/3.7 GB 0:00:00
md5: 7621da10af45a031ea9a0d1d7fea9643
Reading... ---------------------------------------- 3.7/3.7 GB 0:00:00
sha1: 8a73a36f38397974d5517b861a68577514ef694e
Reading... ---------------------------------------- 3.7/3.7 GB 0:00:00
sha256: b85286d9855f549ed9895763519f6a295a7698fb9c5c5345811b3eefadfb6f07
```
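As a side note beyond the original post: on Python 3.11 and later, the manual chunking loop can be delegated to `hashlib.file_digest`, which streams the file in fixed-size blocks internally. A minimal sketch (the `file_checksum` name is illustrative):

```python
import hashlib

def file_checksum(fpath: str, algorithm: str) -> str:
    # hashlib.file_digest (Python 3.11+) reads the file in blocks itself,
    # so memory use stays constant without an explicit loop
    with open(fpath, 'rb') as f:
        return hashlib.file_digest(f, algorithm).hexdigest()
```

This does not render a progress bar, but it keeps the code shorter when one is not needed.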