Hi,
I think there is a bug in Hasher. The SHA1 hashes for large files (tested with several files of 1GB and larger) do not match the hashes calculated by several other checksum tools.
Other tools that produce identical hashes (presumably the correct ones):
sha1deep
Fsum
HashMyFiles (NirSoft)
Hash (Robin Keir)
Please check your implementation!
Regards,
sha1
Thank you for reporting this. I am aware of the problem with the SHA1 implementation and have replaced the algorithm in the latest development version of Hasher. The only problem is that it is not yet ready for release, because it went through a complete recoding.
ok then, that's fine for me. maybe you should add a warning on the download page that there is a problem...
btw your little tools are very nice and have very clean and straightforward GUIs! Good job.
sha1
Actually, this problem has already been eliminated in HasherBeta.zip. It has been ready for a proper release for a very long time now, but I didn't go for it because I started redesigning Hasher. Since then, priorities have shifted and I never got back to it.
I will now update the official release of Hasher, but it will not be the one that is completely redesigned (version 3.0).
Done. I have made a proper release of Hasher 1.20. Check the downloads section.
cool. the hashes are definitely correct now!
btw I think there is another little problem in your application: the performance is far below expectations. On today's hardware, the calculation of checksums like SHA1 should be limited only by file system access.
Test with a 4GB file:
Hasher 1.20: avg 19.5MB/s, CPU load 65%
HashMyFiles (NirSoft): avg 40MB/s, CPU load 65%
own python(!) script: avg 53MB/s, CPU load 30%
File I/O and CPU load were monitored with Process Explorer (Sysinternals).
Surprisingly, the python script performs best and reaches the limits of the relatively slow hard disk in the test system.
so what's the trick?! I don't really know... as I don't have my own code in "real" programming languages to compare against. My script reads chunks of 1MB from disk and uses the update method from the hashlib library, just as recommended in the docs.
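For reference, the core of it looks roughly like this (the file name is just a placeholder):

import hashlib

CHUNK_SIZE = 1024 * 1024  # 1MB chunks, as described above

def sha1_of_file(path):
    # Hash the file incrementally so it never has to fit into memory at once.
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()

print(sha1_of_file("testfile.bin"))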
How are you processing the files? How large are the chunks you read and process at once?
Maybe you considered this already for version 3.0.
The speed depends mostly on the actual implementation of the algorithm. And, of course, it is limited by the I/O speed.
There are implementations which are designed for easy code maintenance, better portability and readability. Such implementations tend to be object-oriented and generalized, and usually use the most generic functions native to the language/platform.
On the other hand, there are implementations which focus purely on speed/performance. These are usually coded in assembly (if you are not familiar with it, look it up on Wikipedia). They provide outstanding performance, but can be a lot more difficult to integrate and use in other applications.
You have guessed correctly. I have addressed the performance aspect in Hasher v3. It will be a lot faster. I have moved on to more efficient hashing implementations, partly written in assembly. I might be able to give you a tryout version soon, if you are interested?
By the way, the size of the chunks does not matter that much, as long as it is not too small and not too big. Generally speaking, chunks should be somewhere between 16KB and 256KB (powers of two are best), depending on system and hard drive performance. A chunk size of 1MB will usually perform slower than 64KB (for example).
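If you want to verify this on your own system, a quick timing loop along these lines should be enough (the test file path is just a placeholder; note that after the first run the file will come from the OS cache, so you are mostly measuring the hashing loop rather than the disk):

import hashlib
import time

TEST_FILE = "bigfile.bin"  # placeholder: any sufficiently large local file

# Hash the whole file once per candidate chunk size and compare the timings.
for chunk_size in (16 * 1024, 64 * 1024, 256 * 1024, 1024 * 1024):
    h = hashlib.sha1()
    start = time.time()
    with open(TEST_FILE, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
    elapsed = time.time() - start
    print("%4d KB chunks: %.2f s" % (chunk_size // 1024, elapsed))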
Of course I'd be interested in a tryout version!
I can do some beta testing if you like.
Your tip about the chunk size was good... I did some benchmarking and found out that the python implementation I use has a flat performance maximum between 1KB and 32KB chunks.