Fwd: [Cryptography] SHA1 collisions make Git vulnerable to attacks by third-parties, not just repo maintainers
Joan Ioannidas For additional hilarity, look at the mitigation: they just ban the offending hash!
---------- Forwarded message ----------
From: John Gilmore
It's interesting watching git evolve. I have one comment, which is that the code and the contributors are throwing around the term "SHA1 hash" a lot. They shouldn't. SHA1 has been broken; it's possible to generate two different blobs that hash to the same SHA1 hash.
Actually, even the theoretical breaking has not been proven for a
pre-existing SHA1 hash (ie you need to control both the starting point for
it), and more importantly, git really uses the SHA1 as a _hash_, not
necessarily as a cryptographically secure one.
IOW, security doesn't actually depend on the hash being cryptographic, and
all git really wants is to avoid collisions, ie it wants it to hash the
contents well. That, sha1 definitely does, and even an md5sum would
suffice (but having 160 bits instead of "just" 128 obviously adds to the
space, so that's always a bonus).
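To put rough numbers on the 160-versus-128-bit remark, here is a minimal sketch of the standard birthday-bound approximation for purely accidental collisions. The function name and the object count of one billion are illustrative assumptions, not figures from the thread, and the estimate says nothing about deliberately constructed collisions.

```python
import math

def accidental_collision_probability(num_objects: int, hash_bits: int) -> float:
    """Birthday-bound approximation: p ≈ 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -num_objects * (num_objects - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)

# Hypothetical repository size, chosen only for illustration.
NUM_OBJECTS = 10**9

for bits in (128, 160):
    p = accidental_collision_probability(NUM_OBJECTS, bits)
    print(f"{bits}-bit hash, {NUM_OBJECTS} objects: p ≈ {p:.3e}")
```

Under this model both probabilities are vanishingly small, and the extra 32 bits shrink the estimate by a further factor of about 2^32.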
Of course, the fact that sha1 is also very expensive to try to fool is a
big bonus, since it means that it's just another layer on the real
security model. But the _real_ security comes from the fact that git is
distributed, which means that a developer should never actually use a
public tree for his development.
For example, I've got two separate firewall layers (and a NAT) in between
me and the internet, and my personal tree is on that machine. I never
actually trust or use the external trees - I just push the result to them.
This is something you cannot do with a centralized SCM server like SVN or
other traditional crud. A centralized one obviously has to be accessible
to all the developers, which means that it's forced to be open enough to
be much more easily attackable, and also means that there is a single
point of failure also from a security standpoint.
In contrast, even if somebody were to compromise my machine, that does
_not_ automatically compromise the trees of other developers. They'd still
have all the pristine objects, and never even fetch an object from me that
has the same name (ie sha1 hash) as one they already have.
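The "never fetch an object you already have" behaviour can be sketched with a toy content-addressed store. This is an illustration under stated assumptions, not git's actual code: the class `ObjectStore` and its methods are hypothetical, and the colliding object is simulated by writing directly into the remote's table, since no real SHA1 collision is produced here.

```python
import hashlib

class ObjectStore:
    """Toy content-addressed store: an object's name is the SHA-1 of its bytes."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        name = hashlib.sha1(data).hexdigest()
        # Keep the first content ever stored under a given name.
        self._objects.setdefault(name, data)
        return name

    def get(self, name: str) -> bytes:
        return self._objects[name]

    def fetch_from(self, remote: "ObjectStore") -> int:
        """Copy only objects whose names are missing locally; return the count."""
        copied = 0
        for name, data in remote._objects.items():
            if name not in self._objects:
                self._objects[name] = data
                copied += 1
        return copied

local = ObjectStore()
name = local.add(b"pristine source file")

remote = ObjectStore()
# Simulated attack: the remote holds different bytes under the same name.
remote._objects[name] = b"tampered source file"
remote.add(b"genuinely new work")

copied = local.fetch_from(remote)
print(copied, local.get(name))
```

In this sketch the local copy stays pristine because the name is already present, so only the genuinely new object is transferred.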
In other words, to really break a git archive, you need to
- be able to replace an existing SHA1 hash'ed object with one that hashes
to the same thing (_not_ the breakage that has been shown to be
possible already)
- the replacement has to still honor all the other git consistency checks
(even "blob" objects have them: they need to have a valid header with a
valid length, so it's not sufficient to just find another object that
hashes to the right thing, you have to find an object with a valid
header that hashes to the right thing)
- you have to break in to _all_ archives that already have that object
and replace it quietly enough that nobody notices.
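The header constraint in the second point can be made concrete. The sketch below assumes the blob layout used by present-day git (the bytes "blob", a space, the decimal length, a NUL byte, then the content); the object format at the time of this message may have differed in detail, and the helper names are illustrative.

```python
import hashlib

def git_blob_name(content: bytes) -> str:
    """Name a blob the way present-day git does: SHA-1 over header + content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def has_valid_blob_header(raw: bytes) -> bool:
    """Check that raw object bytes start with 'blob <len>\\0' and the length matches."""
    header, sep, body = raw.partition(b"\x00")
    if not sep or not header.startswith(b"blob "):
        return False
    length = header[len(b"blob "):]
    return length.isdigit() and int(length) == len(body)

print(git_blob_name(b""))
print(has_valid_blob_header(b"blob 5\x00hello"))
print(has_valid_blob_header(b"blob 6\x00hello"))
```

A forged replacement would have to match the target hash and also pass a check like `has_valid_blob_header`, which rules out arbitrary colliding byte strings.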
Quite frankly, it's not worth worrying about. It's a hell of a lot easier
to just break a source archive with other means (ie pay a developer ten
million dollars to just insert the back door you want inserted).
Linus
To: David Wagner
> SHA1 isn't totally broken yet. The attack still requires at least 2^60 work to find a collision.

Knew that -- but "Attacks never get harder, only easier."

> No one has publicly reported finding a collision in SHA1 yet.

I thought the Chinese team had reported four pairs of colliding plaintexts -- they just hadn't revealed exactly how they generated them. Or are you distinguishing "finding" from "generating" a collision?

> One question I would have is what is the impact of a SHA1 collision on his system? In other words, what harm can you do if you can find SHA1 collisions efficiently? I'm not familiar with his source mgmt system, but if there is little harm one can do with a collision, then maybe it just doesn't matter very much.
Here's the mailing list for git:

http://kerneltrap.org/mailarchive/15/overview/browse/month

Somewhere in there it told me where to find the sources, which include a design document about how it works. Ah, there it is:

http://www.kernel.org/pub/software/scm/cogito/
http://www.kernel.org/pub/software/scm/cogito/README

Basically, it assumes, deeply embedded, that if two blobs have the same hash, they ARE THE SAME BLOB. You can destroy its integrity by feeding it various blobs which happen to hash to the same values. He seems to think that the only possible attack is that someone would go in and modify the database by hand -- rather than feeding it new input that confuses it.

John

PS (added 25 Feb 2017): If you assume NSA is six months or a year ahead of the open academic/industrial sector in attacking SHA1, what would they have already subverted using a similar attack? Hmm, check the "cmp" and "diff" sources! If you don't trust the SHA1 hashes that say two trees are the same, the second step is comparing the trees of files directly. Making an input pattern that causes cmp and diff to always say, "yup, no differences here!" would allow any fraudulently inserted modifications to spread much further.

_______________________________________________
The cryptography mailing list
cryptography@metzdowd.com
http://www.metzdowd.com/mailman/listinfo/cryptography
participants (1): grarpamp