Opened 2 years ago

Last modified 4 months ago

#7634 new bug

MD5 collision could lead to SafeHaskell violation

Reported by: shachaf Owned by: ekmett
Priority: normal Milestone:
Component: Core Libraries Version: 7.6.1
Keywords: Cc: core-libraries-committee@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Other Test Case:
Blocked By: Blocking:
Related Tickets: Differential Revisions:

Description

The current scheme for computing TypeRep fingerprints is: md5sum (encodeUTF32BE (unwords [moduleName, packageName, typeName])). SafeHaskell doesn't allow custom-written Typeable instances, but this is more or less the code that deriving Typeable generates.

MD5 is broken and not collision-resistant. If someone can make an MD5 collision, they could use it to derive unsafeCoerce and execute arbitrary code. The constraints (UTF-32, names being alphanumeric, etc.) make it pretty tricky to find a valid collision by the standard methods, but I don't know enough about this to say how feasible it is.

It seems to me that, especially with new-typeable, it might not be necessary to use hashing at all, if GHC can figure out fingerprints statically. Or maybe separate compilation requirements make that unworkable (in which case maybe using a hash of the package/module name along with a separate per-module counter, or something along those lines, might be better, since people are less likely to control those? I'm not sure). Maybe the solution is just switching to another hash function, or something else. At any rate, the issue should be considered -- using MD5 isn't a good idea in cases where collisions could have security implications.

Change History (9)

comment:1 Changed 2 years ago by igloo

  • difficulty set to Unknown
  • Milestone set to 7.8.1

comment:2 Changed 14 months ago by thoughtpolice

  • Milestone changed from 7.8.3 to 7.10.1

Moving to 7.10.1

comment:3 Changed 9 months ago by thoughtpolice

  • Component changed from libraries/base to Core Libraries
  • Owner set to ekmett

Moving over to new owning component 'Core Libraries'.

comment:4 Changed 7 months ago by thoughtpolice

  • Milestone changed from 7.10.1 to 7.12.1

Moving to 7.12.1 milestone; if you feel this is an error and should be addressed sooner, please move it back to the 7.10.1 milestone.

comment:5 Changed 4 months ago by ekmett

  • Cc core-libraries-committee@… added
  • Milestone changed from 7.12.1 to

Acknowledging that this probably won't get addressed in the foreseeable future by pushing it out to _|_.

Given that the two strings would have to be legal UTF-32 encodings of text that can actually be entered you wind up with a huge number of fixed bits in the strings on either side, which very quickly raises the cost of an attempted birthday attack and also forbids a number of birthday attack generation techniques.

That said, saying a cryptographic attack isn't possible for hand-wavy reasons doesn't have a great track record for success. :)

Techniques for finding small single block collisions via random walks, like https://eprint.iacr.org/2014/871.pdf are probably the most likely source of such a vulnerability. Notice how similar the colliding document pair are. In theory you could prune the walk to keep the paths always within the space of valid inputs. I lack a few weeks (months?) of HPC cluster time to test this hypothesis, however.

comment:6 Changed 4 months ago by bananu7

Maybe the solution is just switching to another hash function

What about that? Doesn't that improve the chances of the collision not being found even more? It looks like an change that's easy enough. Doesn't moving this ticket to _|_ mean that it's effectively being forgotten?

comment:7 Changed 4 months ago by ekmett

The move out from 7.12 was more an acknowledgement that the CLC wasn't planning on tackling this issue in 7.12. If everything is a priority then nothing is a priority, and we have lots of actionable issues with far more immediate impact.

Switching fingerprints to something like SHA-256 could be done. In fact it is probably a pretty good idea, despite doubling the size of fingerprints and having a measurable performance impact -- if a patch was put forth then it would by all means be considered.

comment:8 Changed 4 months ago by bananu7

A quick investigation shows that there are two parts of the problem:

  • Fingerprint is made of two Word64s; would need to change to four, and all the functions that manipulate on it to take four parts into account; that's an easy part
  • MD5 implementation used internally for hashing is written in C. I suppose the SHA-256 implementation that's necessary for the patch could be taken from cryptohash library, which seems pretty mature already. It would need to be integrated as a C source similarly to MD5; It's probably not feasible to drag the whole library as a GHC dependency; maybe I'm wrong here.

Then the functions that have to be altered are fingerprintData and fingerprintString, the latter needing just to take the larger size of the fingerprint into the account, and the former actually being changed to use the SHA-256 context and hashing function.

comment:9 Changed 4 months ago by ekmett

Pretty much.

There are also dozens if not hundreds of SHA-256 implementations out there in C we could take in more or less unmodified.

Heck, I've even written one before, back in my crypto days.

Note: See TracTickets for help on using tickets.