有时,即使是一个名称不正确的变量也会产生严重的后果。我曾经修复过的最具挑战性的错误是由于名称选择不当造成的。在 1980 年代末和 1990 年代初,我的研究生和我创建了一个名为 Sprite 的分布式操作系统。在某个时候,我们注意到文件偶尔会丢失数据:即使用户未修改文件,数据块之一突然变为全零。该问题并不经常发生,因此很难追踪。一些研究生试图找到该错误,但他们未能取得进展,最终放弃了。但是,我认为任何未解决的错误都是无法忍受的个人侮辱,因此我决定对其进行跟踪。
花了六个月的时间,但我最终找到并修复了该错误。这个问题实际上很简单(就像大多数错误一样,一旦找出它们)。文件系统代码将变量名块用于两个不同的目的。在某些情况下,块是指磁盘上的物理块号。在其他情况下,块是指文件中的逻辑块号。不幸的是,在代码的某一点上有一个包含逻辑块号的块变量,但是在需要物理块号的情况下意外地使用了它。结果,磁盘上无关的块被零覆盖。
在跟踪该错误时,包括我自己在内的几个人阅读了错误的代码,但我们从未注意到问题所在。当我们看到可变块用作物理块号时,我们反身地假设它确实拥有物理块号。经过很长时间的检测,最终显示出腐败一定是在特定的语句中发生的,然后我才能越过该名称所创建的思维障碍,并查看其价值的确切来源。如果对不同种类的块(例如 fileBlock 和 diskBlock)使用了不同的变量名,则错误不太可能发生;程序员会知道在那种情况下不能使用 fileBlock。
不幸的是,大多数开发人员没有花太多时间在思考名字。他们倾向于使用想到的名字,只要它与匹配的名字相当接近即可。例如,块与磁盘上的物理块和文件内的逻辑块非常接近;这肯定不是一个可怕的名字。即使这样,它仍然要花费大量时间来查找一个细微的错误。因此,您不应该只选择“合理接近”的名称。花一些额外的时间来选择准确,明确且直观的好名字。额外的注意力将很快收回成本,随着时间的流逝,您将学会快速选择好名字。
Sometimes even a single poorly named variable can have severe consequences. The most challenging bug I ever fixed came about because of a poor name choice. In the late 1980’s and early 1990’s my graduate students and I created a distributed operating system called Sprite. At some point we noticed that files would occasionally lose data: one of the data blocks suddenly became all zeroes, even though the file had not been modified by a user. The problem didn’t happen very often, so it was exceptionally difficult to track down. A few of the graduate students tried to find the bug, but they were unable to make progress and eventually gave up. However, I consider any unsolved bug to be an intolerable personal insult, so I decided to track it down.
It took six months, but I eventually found and fixed the bug. The problem was actually quite simple (as are most bugs, once you figure them out). The file system code used the variable name block for two different purposes. In some situations, block referred to a physical block number on disk; in other situations, block referred to a logical block number within a file. Unfortunately, at one point in the code there was a block variable containing a logical block number, but it was accidentally used in a context where a physical block number was needed; as a result, an unrelated block on disk got overwritten with zeroes.
While tracking down the bug, several people, including myself, read over the faulty code, but we never noticed the problem. When we saw the variable block used as a physical block number, we reflexively assumed that it really held a physical block number. It took a long process of instrumentation, which eventually showed that the corruption must be happening in a particular statement, before I was able to get past the mental block created by the name and check to see exactly where its value came from. If different variable names had been used for the different kinds of blocks, such as fileBlock and diskBlock, it’s unlikely that the error would have happened; the programmer would have known that fileBlock couldn’t be used in that situation.
Unfortunately, most developers don’t spend much time thinking about names. They tend to use the first name that comes to mind, as long as it’s reasonably close to matching the thing it names. For example, block is a pretty close match for both a physical block on disk and a logical block within a file; it’s certainly not a horrible name. Even so, it resulted in a huge expenditure of time to track down a subtle bug. Thus, you shouldn’t settle for names that are just “reasonably close”. Take a bit of extra time to choose great names, which are precise, unambiguous, and intuitive. The extra attention will pay for itself quickly, and over time you’ll learn to choose good names quickly.