10.5. 掩码异常

减少必须处理异常的地方数量的第二种技术是异常屏蔽。使用这种方法,可以在系统的较低级别上检测和处理异常情况,因此,更高级别的软件无需知道该情况。异常屏蔽在分布式系统中尤其常见。例如,在诸如 TCP 的网络传输协议中,由于各种原因(例如损坏和拥塞),可能会丢弃数据包。TCP 通过在其实现中重新发送丢失的数据包来掩盖数据包的丢失,因此所有数据最终都将通过,并且客户端不知道丢失的数据包。

NFS 网络文件系统中出现了一个更具争议性的屏蔽示例。如果 NFS 文件服务器由于任何原因崩溃或无法响应,客户端将一遍又一遍地向服务器发出请求,直到问题最终得到解决。客户端上的低级文件系统代码不会向调用应用程序报告任何异常。正在进行的操作(以及因此的应用程序)只是挂起,直到操作可以成功完成。如果挂起持续的时间不超过一小段时间,则 NFS 客户端将在用户控制台上以“ NFS 服务器 xyzzy 无法响应仍在尝试响应”的形式打印消息。

NFS 用户经常抱怨这样的事实,即他们的应用程序在等待 NFS 服务器恢复正常运行时挂起。许多人建议 NFS 应该异常终止操作而不是挂起。但是,报告异常会使情况更糟,而不是更好。如果应用程序无法访问其文件,则无能为力。一种可能性是应用程序重试文件操作,但这仍然会使应用程序挂起,并且在 NFS 层中的一个位置执行重试会比在每个应用程序中的每个文件系统调用处执行重试更容易(编译器应不必为此担心!)。另一种选择是让应用程序中止并将错误返回给调用者。呼叫者不太可能知道该怎么做,因此他们也将中止,导致用户工作环境崩溃。用户在文件服务器关闭时仍然无法完成任何工作,并且一旦文件服务器恢复工作,他们将不得不重新启动所有应用程序。

因此,最好的替代方法是让 NFS 掩盖错误并挂起应用程序。通过这种方法,应用程序不需要任何代码来处理服务器问题,并且一旦服务器恢复运行,它们就可以无缝恢复。如果用户厌倦了等待,他们总是可以手动中止应用程序。

异常屏蔽并非在所有情况下都有效,但是在它起作用的情况下它是一个强大的工具。它导致了更深的类,因为它减少了类的界面(用户需要注意的异常更少)并以掩盖异常的代码形式添加了功能。异常屏蔽是降低复杂性的一个例子。

The second technique for reducing the number of places where exceptions must be handled is exception masking. With this approach, an exceptional condition is detected and handled at a low level in the system, so that higher levels of software need not be aware of the condition. Exception masking is particularly common in distributed systems. For instance, in a network transport protocol such as TCP, packets can be dropped for various reasons such as corruption and congestion. TCP masks packet loss by resending lost packets within its implementation, so all data eventually gets through and clients are unaware of the dropped packets.

A more controversial example of masking occurs in the NFS network file system. If an NFS file server crashes or fails to respond for any reason, clients reissue their requests to the server over and over again until the problem is eventually resolved. The low-level file system code on the client does not report any exceptions to the invoking application. The operation in progress (and hence the application) just hangs until the operation can complete successfully. If the hang lasts more than a short time, the NFS client prints messages on the user’s console of the form “NFS server xyzzy not responding still trying.”

NFS users often complain about the fact that their applications hang while waiting for an NFS server to resume normal operation. Many people have suggested that NFS should abort operations with an exception rather than hanging. However, reporting exceptions would make things worse, not better. There’s not much an application can do if it loses access to its files. One possibility would be for the application to retry the file operation, but this would still hang the application, and it’s easier to perform the retry in one place in the NFS layer, rather than at every file system call in every application (a compiler shouldn’t have to worry about this!). The other alternative is for applications to abort and return errors to their callers. It’s unlikely that the callers would know what to do either, so they would abort as well, resulting in a collapse of the user’s working environment. Users still wouldn’t be able to get any work done while the file server was down, and they would have to restart all of their applications once the file server came back to life.

Thus, the best alternative is for NFS to mask the errors and hang applications. With this approach, applications don’t need any code to deal with server problems, and they can resume seamlessly once the server comes back to life. If users get tired of waiting, they can always abort applications manually.

Exception masking doesn’t work in all situations, but it is a powerful tool in the situations where it works. It results in deeper classes, since it reduces the class’s interface (fewer exceptions for users to be aware of) and adds functionality in the form of the code that masks the exception. Exception masking is an example of pulling complexity downward.