Olle Mulmo writes:
*Fault tolerance &al is mentioned but I don't think it is discussed (maybe indirectly in config rec on p 9). In part, we can deal with this in the clients by making them more robust as discussed above. Perhaps at the end of #5 a section about fault tolerance or high availability:
Fault tolerance is currently covered by a footnote in section 7... a bit minimalistic. A new section 5.x sounds like a good plan, although I think the text below is not intended for all kinds of responders but rather for those of root CAs and transponders/global redirectors?
How about this: OCSP clients should adopt reliability strategies that take into account the possibility of network partitioning, but do not unduly delay the decision on certificate use. Under some circumstances clients may want to cache OCSP responses for short intervals [see 4.3]. Clients may wish to re-test non-responsive local and high visibility [which is what?] OCSP responders.
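To make the client-side wording above concrete, here is a rough sketch of what such a strategy could look like. It is only an illustration under assumptions: the names OcspStatusCache, check_status, fetch and total_budget are made up for this example and come from no OCSP library, and fetch stands in for whatever code actually sends the request and verifies the signed response. The sketch also ignores the thisUpdate/nextUpdate times carried in a real response, which a real client would have to honour in addition to any local cache lifetime.

```python
import time
from typing import Callable, Dict, Optional, Sequence, Tuple


class OcspStatusCache:
    """Hypothetical short-lived cache of certificate status, keyed by certificate id."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, cert_id: str) -> Optional[str]:
        entry = self._entries.get(cert_id)
        if entry is None:
            return None
        stored_at, status = entry
        if self.clock() - stored_at > self.ttl:
            # Entry is older than the short interval we allow; drop it.
            del self._entries[cert_id]
            return None
        return status

    def put(self, cert_id: str, status: str) -> None:
        self._entries[cert_id] = (self.clock(), status)


def check_status(cert_id: str,
                 responders: Sequence[str],
                 fetch: Callable[[str, str, float], str],
                 cache: OcspStatusCache,
                 total_budget: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> Optional[str]:
    """Return a status string, or None if no responder answered within the budget.

    fetch(url, cert_id, timeout) is an assumed callable that queries one
    responder and raises OSError (which includes timeouts and connection
    errors) when the responder cannot be reached.
    """
    cached = cache.get(cert_id)
    if cached is not None:
        return cached

    deadline = clock() + total_budget
    for url in responders:
        remaining = deadline - clock()
        if remaining <= 0:
            break  # do not unduly delay the decision on certificate use
        try:
            status = fetch(url, cert_id, remaining)
        except OSError:
            continue  # responder unreachable or partitioned; try the next one
        cache.put(cert_id, status)
        return status
    return None  # the caller applies its own policy for "no answer"
```

The overall time budget bounds how long a partitioned network can hold up the decision, and returning None rather than raising leaves the accept/reject policy for the no-answer case to the application.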
OCSP responders intended to provide authoritative and/or high volumes of responses should be configured on a server with high availability capability: redundant, failure-correcting/responding hardware components. The OCSP responder system should be configured to automatically recover and continue from a single failure of disks supporting the current OCSP database, hardware security module, or other critical system component. This might be particularly important for OCSP responders that operate in whole or in part in transponder mode.
In order to deal with site failures or network partitioning, OCSP service providers should provision multiple, topologically and geographically dispersed OCSP responders with mirrored OCSP databases and configuration. If possible, WAN high availability capability should be employed.
It seems to me the last paragraph applies to almost all OCSP responders. It's the same content as the footnote mentioned above which I had missed.