13.5. 接口文档

注释最重要的作用之一就是定义抽象。回想一下 第四章:模块应该是深的 ,抽象是实体的简化视图,它保留了基本信息,但省略了可以安全忽略的细节。代码不适合描述抽象;它的级别太低,它包含实现细节,这些细节在抽象中不应该看到。描述抽象的唯一方法是使用注释。如果您想要呈现良好抽象的代码,则必须用注释记录这些抽象。

记录抽象的第一步是将接口注释与实现注释分开。接口注释提供了使用类或方法时需要知道的信息。他们定义了抽象。实现注释描述了类或方法如何在内部工作以实现抽象。区分这两种注释很重要,这样接口的用户就不会暴露于实现细节。此外,这两种形式最好有所不同。如果接口注释也必须描述实现,则该类或方法很浅。这意味着撰写注释的行为可以提供有关设计质量的线索; 第十五章:先写注释 将回到这个想法。

类的接口注释提供了该类提供的抽象的高级描述,例如:

/**
 * This class implements a simple server-side interface to the HTTP
 * protocol: by using this class, an application can receive HTTP
 * requests, process them, and return responses. Each instance of
 * this class corresponds to a particular socket used to receive
 * requests. The current implementation is single-threaded and
 * processes one request at a time.
 */
public class Http {...}

注释描述了类的整体功能,没有任何实现细节,甚至没有特定方法的细节。它还描述了该类的每个实例代表什么。最后,注释描述了该类的限制(它不支持从多个线程的并发访问),这对于考虑是否使用它的开发人员可能很重要。

方法的接口注释既包括用于抽象的高层信息,又包括用于精度的低层细节:

  • 注释通常以一两个句子开头,描述调用者感知到的方法的行为。这是更高层次的抽象。
  • 注释必须描述每个参数和返回值(如果有)。这些注释必须非常精确,并且必须描述对参数值的任何约束以及参数之间的依赖关系。
  • 如果该方法有任何副作用,则必须在接口注释中记录这些副作用。副作用是该方法的任何结果都会影响系统的未来行为,但不属于结果的一部分。例如,如果该方法将一个值添加到内部数据结构中,可以通过将来的方法调用来检索该值,则这是副作用。写入文件系统也是一个副作用。
  • 方法的接口注释必须描述该方法可能产生的任何异常。
  • 如果在调用某个方法之前必须满足任何前提条件,则必须对其进行描述(也许必须先调用其他方法;对于二进制搜索方法,必须对要搜索的列表进行排序)。尽量减少前提条件是一个好主意,但是任何保留的条件都必须记录在案。

这是从 Buffer 对象复制数据的方法的接口注释

/**
 * Copy a range of bytes from a buffer to an external location.
 *
 * \param offset
 *        Index within the buffer of the first byte to copy.
 * \param length
 *        Number of bytes to copy.
 * \param dest
 *        Where to copy the bytes: must have room for at least
 *        length bytes.
 *
 * \return
 *        The return value is the actual number of bytes copied,
 *        which may be less than length if the requested range of
 *        bytes extends past the end of the buffer. 0 is returned
 *        if there is no overlap between the requested range and
 *        the actual buffer.
 */
uint32_t
Buffer::copy(uint32_t offset, uint32_t length, void* dest)
...

此注释的语法(例如\ return)遵循 Doxygen 的约定,该程序从 C / C ++代码中提取注释并将其编译为 Web 页。注释的目的是提供开发人员调用该方法所需的所有信息,包括特殊情况的处理方式(请注意,此方法如何遵循 第十章:定义不存在的错误 的建议并定义与范围规范相关的任何错误。 )。开发人员不必为了调用它而阅读方法的主体,并且接口注释不提供有关如何实现该方法的信息,例如它如何扫描其内部数据结构以查找所需的数据。

对于更扩展的示例,让我们考虑一个称为 IndexLookup 的类,该类是分布式存储系统的一部分。存储系统拥有一个表集合,每个表包含许多对象。另外,每个表可以具有一个或多个索引;每个索引都基于对象的特定字段提供对表中对象的有效访问。例如,一个索引可以用于根据对象的名称字段查找对象,而另一个索引可以用于根据对象的年龄字段查找对象。使用这些索引,应用程序可以快速提取具有特定名称的所有对象,或具有给定范围内的年龄的所有对象。

IndexLookup 类为执行索引查找提供了一个方便的接口。这是一个如何在应用程序中使用的示例:

query = new IndexLookup(table, index, key1, key2);
while  (true) {
    object = query.getNext();
    if  (object == NULL) {
        break;
    }
    ... process object ...
}

应用程序首先构造一个类型为 IndexLookup 的对象,并提供用于选择表,索引和索引内范围的参数(例如,如果索引基于年龄字段,则 key1 和 key2 可以指定为 21 和 65 选择年龄介于这些值之间的所有对象)。然后,应用程序重复调用 getNext 方法。每次调用都返回一个位于所需范围内的对象。一旦返回所有匹配的对象,getNext 将返回 NULL。因为存储系统是分布式的,所以此类的实现有些复杂。表中的对象可以分布在多个服务器上,每个索引也可以分布在一组不同的服务器上。

现在,让我们考虑该类的接口注释中需要包含哪些信息。对于下面给出的每条信息,问自己一个开发人员是否需要知道该信息才能使用该类(我对问题的回答在本章的结尾):

  • IndexLookup 类发送给包含索引和对象的服务器的消息格式。
  • 用于确定特定对象是否在所需范围内的比较功能(使用整数,浮点数或字符串进行比较吗?)。
  • 用于在服务器上存储索引的数据结构。
  • IndexLookup 是否同时向多个服务器发出多个请求。
  • 处理服务器崩溃的机制。

这是 IndexLookup 类的接口注释的原始版本;摘录还包括类定义的几行内容,在注释中进行了引用:

/*
 * This class implements the client side framework for index range
 * lookups. It manages a single LookupIndexKeys RPC and multiple
 * IndexedRead RPCs. Client side just includes "IndexLookup.h" in
 * its header to use IndexLookup class. Several parameters can be set
 * in the config below:
 * - The number of concurrent indexedRead RPCs
 * - The max number of PKHashes a indexedRead RPC can hold at a time
 * - The size of the active PKHashes
 *
 * To use IndexLookup, the client creates an object of this class by
 * providing all necessary information. After construction of
 * IndexLookup, client can call getNext() function to move to next
 * available object. If getNext() returns NULL, it means we reached
 * the last object. Client can use getKey, getKeyLength, getValue,
 * and getValueLength to get object data of current object.
 */
 class IndexLookup {
       ...
   private:
       /// Max number of concurrent indexedRead RPCs
       static const uint8_t NUM_READ_RPC = 10;
       /// Max number of PKHashes that can be sent in one
       /// indexedRead RPC
       static const uint32_t MAX_PKHASHES_PERRPC = 256;
       /// Max number of PKHashes that activeHashes can
       /// hold at once.
       static const size_t MAX_NUM_PK = (1 << LG_BUFFER_SIZE);
 }

在进一步阅读之前,请先查看您是否可以使用此注释确定问题所在。这是我发现的问题:

  • 第一段的大部分与实现有关,而不是接口。举一个例子,用户不需要知道用于与服务器通信的特定远程过程调用的名称。在第一段的后半部分中提到的配置参数都是所有私有变量,它们仅与类的维护者相关,而与类的用户无关。所有这些实现信息都应从注释中省略。
  • 该评论还包括一些显而易见的事情。例如,不需要告诉用户包括 IndexLookup.h:任何编写 C ++代码的人都可以猜测这是必要的。另外,“通过提供所有必要的信息”一词毫无意义,因此可以省略。

对此类的简短注释就足够了(并且更可取):

/*
 * This class is used by client applications to make range queries
 * using indexes. Each instance represents a single range query.
 *
 * To start a range query, a client creates an instance of this
 * class. The client can then call getNext() to retrieve the objects
 * in the desired range. For each object returned by getNext(), the
 * caller can invoke getKey(), getKeyLength(), getValue(), and
 * getValueLength() to get information about that object.
 */

注释的最后一段不是严格必需的,因为它主要针对单个方法复制了注释中的信息。但是,在类文档中提供示例来说明其方法如何协同工作可能会有所帮助,特别是对于使用模式不明显的深层类尤其如此。注意,新注释未提及 getNext 的 NULL 返回值。此注释无意记录每种方法的每个细节;它只是提供高级信息,以帮助读者了解这些方法如何协同工作以及何时可以调用每种方法。有关详细信息,读者可以参考接口注释中的各个方法。此注释也没有提到服务器崩溃;这是因为此类服务器的用户看不到服务器崩溃(系统会自动从中恢复)。

接口文档(例如方法的文档)描述了不需要使用要记录的事物的实现详细信息时,就会出现此红色标记。

现在考虑以下代码,该代码显示 IndexLookup 中 isReady 方法的文档的第一版:

/**
 * Check if the next object is RESULT_READY. This function is
 * implemented in a DCFT module, each execution of isReady() tries
 * to make small progress, and getNext() invokes isReady() in a
 * while loop, until isReady() returns true.
 *
 * isReady() is implemented in a rule-based approach. We check
 * different rules by following a particular order, and perform
 * certain actions if some rule is satisfied.
 *
 * \return
 *         True means the next Object is available. Otherwise, return
 *         false.
 */
bool IndexLookup::isReady() { ... }

再一次,本文档中的大多数内容,例如对 DCFT 的引用以及整个第二段,都与实现有关,因此不属于此处。这是接口注释中最常见的错误之一。某些实现文档很有用,但应放在方法内部,在该方法中应将其与接口文档明确分开。此外,文档的第一句话是含糊的(RESULT_READY 是什么意思?),并且缺少一些重要信息。最后,无需在此处描述 getNext 的实现。这是注释的更好版本:

/*
 * Indicates whether an indexed read has made enough progress for
 * getNext to return immediately without blocking. In addition, this
 * method does most of the real work for indexed reads, so it must
 * be invoked (either directly, or indirectly by calling getNext) in
 * order for the indexed read to make progress.
 *
 * \return
 *         True means that the next invocation of getNext will not block
 *         (at least one object is available to return, or the end of the
 *         lookup has been reached); false means getNext may block.
 */

注释版本提供了有关“就绪”含义的更精确信息,并且提供了重要信息,如果要继续进行索引检索,则必须最终调用此方法。

One of the most important roles for comments is to define abstractions. Recall from Chapter 4 that an abstraction is a simplified view of an entity, which preserves essential information but omits details that can safely be ignored. Code isn’t suitable for describing abstractions; it’s too low level and it includes implementation details that shouldn’t be visible in the abstraction. The only way to describe an abstraction is with comments. If you want code that presents good abstractions, you must document those abstractions with comments.

The first step in documenting abstractions is to separate interface comments from implementation comments. Interface comments provide information that someone needs to know in order to use a class or method; they define the abstraction. Implementation comments describe how a class or method works internally in order to implement the abstraction. It’s important to separate these two kinds of comments, so that users of an interface are not exposed to implementation details. Furthermore, these two forms had better be different. If interface comments must also describe the implementation, then the class or method is shallow. This means that the act of writing comments can provide clues about the quality of a design; Chapter 15 will return to this idea.

The interface comment for a class provides a high-level description of the abstraction provided by the class, such as the following:

/**
 * This class implements a simple server-side interface to the HTTP
 * protocol: by using this class, an application can receive HTTP
 * requests, process them, and return responses. Each instance of
 * this class corresponds to a particular socket used to receive
 * requests. The current implementation is single-threaded and
 * processes one request at a time.
 */
public class Http {...}

This comment describes the overall capabilities of the class, without any implementation details or even the specifics of particular methods. It also describes what each instance of the class represents. Finally, the comments describe the limitations of the class (it does not support concurrent access from multiple threads), which may be important to developers contemplating whether to use it.

The interface comment for a method includes both higher-level information for abstraction and lower-level details for precision:

  • The comment usually starts with a sentence or two describing the behavior of the method as perceived by callers; this is the higher-level abstraction.
  • The comment must describe each argument and the return value (if any). These comments must be very precise, and must describe any constraints on argument values as well as dependencies between arguments.
  • If the method has any side effects, these must be documented in the interface comment. A side effect is any consequence of the method that affects the future behavior of the system but is not part of the result. For example, if the method adds a value to an internal data structure, which can be retrieved by future method calls, this is a side effect; writing to the file system is also a side effect.
  • A method’s interface comment must describe any exceptions that can emanate from the method.
  • If there are any preconditions that must be satisfied before a method is invoked, these must be described (perhaps some other method must be invoked first; for a binary search method, the list being searched must be sorted). It is a good idea to minimize preconditions, but any that remain must be documented.

Here is the interface comment for a method that copies data out of a Buffer object:

/**
 * Copy a range of bytes from a buffer to an external location.
 *
 * \param offset
 *        Index within the buffer of the first byte to copy.
 * \param length
 *        Number of bytes to copy.
 * \param dest
 *        Where to copy the bytes: must have room for at least
 *        length bytes.
 *
 * \return
 *        The return value is the actual number of bytes copied,
 *        which may be less than length if the requested range of
 *        bytes extends past the end of the buffer. 0 is returned
 *        if there is no overlap between the requested range and
 *        the actual buffer.
 */
uint32_t
Buffer::copy(uint32_t offset, uint32_t length, void* dest)
...

The syntax of this comment (e.g., \return) follows the conventions of Doxygen, a program that extracts comments from C/C++ code and compiles them into Web pages. The goal of the comment is to provide all the information a developer needs in order to invoke the method, including how special cases are handled (note how this method follows the advice of Chapter 10 and defines out of existence any errors associated with the range specification). The developer should not need to read the body of the method in order to invoke it, and the interface comment provides no information about how the method is implemented, such as how it scans its internal data structures to find the desired data.

For a more extended example, let’s consider a class called IndexLookup, which is part of a distributed storage system. The storage system holds a collection of tables, each of which contains many objects. In addition, each table can have one or more indexes; each index provides efficient access to objects in the table based on a particular field of the object. For example, one index might be used to look up objects based on their name field, and another index might be used to look up objects based on their age field. With these indexes, applications can quickly extract all of the objects with a particular name, or all of those with an age in a given range.

The IndexLookup class provides a convenient interface for performing indexed lookups. Here is an example of how it might be used in an application:

query = new IndexLookup(table, index, key1, key2);
while  (true) {
    object = query.getNext();
    if  (object == NULL) {
        break;
    }
    ... process object ...
}

The application first constructs an object of type IndexLookup, providing arguments that select a table, an index, and a range within the index (for example, if the index is based on an age field, key1 and key2 might be specified as 21 and 65 to select all objects with ages between those values). Then the application calls the getNext method repeatedly. Each invocation returns one object that falls within the desired range; once all of the matching objects have been returned, getNext returns NULL. Because the storage system is distributed, the implementation of this class is somewhat complex. The objects in a table may be spread across multiple servers, and each index may also be distributed across a different set of servers; the code in the IndexLookup class must first communicate with all of the relevant index servers to collect information about the objects in the range, then it must communicate with the servers that actually store the objects in order to retrieve their values.

Now let’s consider what information needs to be included in the interface comment for this class. For each piece of information given below, ask yourself whether a developer needs to know that information in order to use the class (my answers to the questions are at the end of the chapter):

  • The format of messages that the IndexLookup class sends to the servers holding indexes and objects.
  • The comparison function used to determine whether a particular object falls in the desired range (is comparison done using integers, floating-point numbers, or strings?).
  • The data structure used to store indexes on servers.
  • Whether or not IndexLookup issues multiple requests to different servers concurrently.
  • The mechanism for handling server crashes.

Here is the original version of the interface comment for the IndexLookup class; the excerpt also includes a few lines from the class’s definition, which are referred to in the comment:

/*
 * This class implements the client side framework for index range
 * lookups. It manages a single LookupIndexKeys RPC and multiple
 * IndexedRead RPCs. Client side just includes "IndexLookup.h" in
 * its header to use IndexLookup class. Several parameters can be set
 * in the config below:
 * - The number of concurrent indexedRead RPCs
 * - The max number of PKHashes a indexedRead RPC can hold at a time
 * - The size of the active PKHashes
 *
 * To use IndexLookup, the client creates an object of this class by
 * providing all necessary information. After construction of
 * IndexLookup, client can call getNext() function to move to next
 * available object. If getNext() returns NULL, it means we reached
 * the last object. Client can use getKey, getKeyLength, getValue,
 * and getValueLength to get object data of current object.
 */
 class IndexLookup {
       ...
   private:
       /// Max number of concurrent indexedRead RPCs
       static const uint8_t NUM_READ_RPC = 10;
       /// Max number of PKHashes that can be sent in one
       /// indexedRead RPC
       static const uint32_t MAX_PKHASHES_PERRPC = 256;
       /// Max number of PKHashes that activeHashes can
       /// hold at once.
       static const size_t MAX_NUM_PK = (1 << LG_BUFFER_SIZE);
 }

Before reading further, see if you can identify the problems with this comment. Here are the problems that I found:

  • Most of the first paragraph concerns the implementation, not the interface. As one example, users don’t need to know the names of the particular remote procedure calls used to communicate with the servers. The configuration parameters referred to in the second half of the first paragraph are all private variables that are relevant only to the maintainer of the class, not to its users. All of this implementation information should be omitted from the comment.
  • The comment also includes several things that are obvious. For example, there’s no need to tell users to include IndexLookup.h: anyone who writes C++ code will be able to guess that this is necessary. In addition, the text “by providing all necessary information” says nothing, so it can be omitted.

A shorter comment for this class is sufficient (and preferable):

/*
 * This class is used by client applications to make range queries
 * using indexes. Each instance represents a single range query.
 *
 * To start a range query, a client creates an instance of this
 * class. The client can then call getNext() to retrieve the objects
 * in the desired range. For each object returned by getNext(), the
 * caller can invoke getKey(), getKeyLength(), getValue(), and
 * getValueLength() to get information about that object.
 */

The last paragraph of this comment is not strictly necessary, since it mostly duplicates information in the comments for individual methods. However, it can be helpful to have examples in the class documentation that illustrate how its methods work together, particularly for deep classes with usage patterns that are nonobvious. Note that the new comment does not mention NULL return values from getNext. This comment is not intended to document every detail of each method; it just provides high level information to help readers understand how the methods work together and when each method might be invoked. For details, readers can refer to the interface comments for individual methods. This comment also does not mention server crashes; that is because server crashes are invisible to users of this class (the system automatically recovers from them).

This red flag occurs when interface documentation, such as that for a method, describes implementation details that aren’t needed in order to use the thing being documented.

Now consider the following code, which shows the first version of the documentation for the isReady method in IndexLookup:

/**
 * Check if the next object is RESULT_READY. This function is
 * implemented in a DCFT module, each execution of isReady() tries
 * to make small progress, and getNext() invokes isReady() in a
 * while loop, until isReady() returns true.
 *
 * isReady() is implemented in a rule-based approach. We check
 * different rules by following a particular order, and perform
 * certain actions if some rule is satisfied.
 *
 * \return
 *         True means the next Object is available. Otherwise, return
 *         false.
 */
bool IndexLookup::isReady() { ... }

Once again, most of this documentation, such as the reference to DCFT and the entire second paragraph, concerns the implementation, so it doesn’t belong here; this is one of the most common errors in interface comments. Some of the implementation documentation is useful, but it should go inside the method, where it will be clearly separated from interface documentation. In addition, the first sentence of the documentation is cryptic (what does RESULT_READY mean?) and some important information is missing. Finally, it isn’t necessary to describe the implementation of getNext here. Here is a better version of the comment:

/*
 * Indicates whether an indexed read has made enough progress for
 * getNext to return immediately without blocking. In addition, this
 * method does most of the real work for indexed reads, so it must
 * be invoked (either directly, or indirectly by calling getNext) in
 * order for the indexed read to make progress.
 *
 * \return
 *         True means that the next invocation of getNext will not block
 *         (at least one object is available to return, or the end of the
 *         lookup has been reached); false means getNext may block.
 */

This version of the comment provides more precise information about what “ready” means, and it provides the important information that this method must eventually be invoked if the indexed retrieval is to move forward.