精通 protobuf 道理之三：一文彻底搞懂反射道理

zjcba 发表于 2023-11-5 08:11

1 说在前面

因为看到网上都是一些零零散散的 protobuf 相关介绍，加上笔者比来因为项目的原因深入分解了一下 protobuf，所以想做一个系统的《精通 protobuf 道理》系列的分享：

[*]「精通 protobuf 道理之一：为什么要使用它以及如何使用」；
[*]「精通 protobuf 道理之二：编码道理分解」；
[*]「精通 protobuf 道理之三：一文彻底搞懂反射道理」；
[*]「精通 protobuf 道理之四：反射实践，和json的彼此转换」；
[*]「精通 protobuf 道理之五：一文彻底搞懂 RPC 道理」；
[*]「精通 protobuf 道理之六：本身动手写一个 RPC 框架」；
[*]「精通 protobuf 道理之七：一文彻底搞懂 Arena 分配器道理分解」。
[*]后续的待定……
本文是系列文章的第三篇，主讲 protobuf 反射道理。本文适合 protobuf 入门、进阶的开发者阅读，是一篇讲道理的文章，主要是深入介绍了 protobuf 反射的底层道理。通过阅读本文，开发者能够对 protobuf 反射道理有深入的理解，对如何更好的运用 protobuf 反射特性提供很大的参考价值。
如果你还在为protobuf 反射道理存在很多问号？
如果让你本身实现一个反射组件，你还不知道怎么实现？
那么通过这篇文章，可以帮你解决这些问题。
文章内容有点长，可能需要阅读 5~10分钟。
2 什么是反射

这里所说的“反射”，指是法式在运行时能够动态的获取到一个类型的元信息的一种操作。而知道了该类型的元信息，就可以操作元信息构造出该类型的实例，并对该实例进行读写操作。和明确地指定一个变量的类型的区别是，后者是在编译阶段就已经生成了该类型的实例，而前者（反射）的过程是在运行时完成，或者说是在运行时推算出该实例的类型。
3 先上一个示例

还是以系列文章第一篇中的 echo.proto 为例。因为笔者感觉从示例入手，分析每一个法式都用到了什么东西，都做了哪些事情，这样能够让读者更容易代入，更容易理解。如果读者还有什么更好的方式，欢迎交流。
先上 echo.proto 源码：
syntax = ”proto3”;
package self;

option cc_generic_services = true;

enum QueryType {
PRIMMARY = 0;
SECONDARY = 1;
};

message EchoRequest {
QueryType querytype = 1;
string payload = 2;
}

message EchoResponse {
int32 code = 1;
string msg = 2;
}

service EchoService {
rpc Echo(EchoRequest) returns(EchoResponse);
}
测试源代码这么写（http://test_reflection.cc）：
#include <iostream>
#include ”proto/echo.pb.h”

void test_relection() {
const std::string type_name = ”self.EchoRequest”;
/*
* ① 在 DescriptorPool 中检索 self.EchoRequest
* Message 类型的 discriptor 元数据
*/
const google::protobuf::Descriptor* descriptor
= google::protobuf::DescriptorPool::generated_pool()
   ->FindMessageTypeByName(type_name);
if (descriptor == nullptr) {
std::cout << ” Cannot found ” << type_name
      << ” in DescriptorPool” << std::endl;
return;
}
/*
* ② 通过 discriptor 元信息在 MessageFactory 检索类型工厂，
* 用于创建该类型的实例。
*/
const google::protobuf::Message* prototype
= google::protobuf::MessageFactory::generated_factory()
   ->GetPrototype(descriptor);
/*
* ③ 创建 self.EchoRequest 类型的 Message 实例。
* google::protobuf::Message 是所有 Message
* 类型的基类。
*/
google::protobuf::Message* req_msg = prototype->New();
/*
* ④ 因为只知道基类的实例指针，需要 Reflection 信息协助判断
* 具体类型。
*/
const google::protobuf::Reflection* req_msg_ref
= req_msg->GetReflection();
/*
* ⑤ 作为开发者，是知道该 Message 是对应哪个类型的，但是法式不知道，
* 开发者告诉法式，试着获取其 payload 字段。
*/
const google::protobuf::FieldDescriptor *req_msg_ref_field_payload
= descriptor->FindFieldByName(”payload”);

/*
* ⑥ Field 信息 + Reflection 信息配合读取 payload 的数据。
*/
std::cout << ”before set, ref_req_msg_payload: ”
         << req_msg_ref->GetString(*req_msg, req_msg_ref_field_payload)
         << std::endl;
/*
* ⑦ Field 信息 + Reflection 信息配合写入 payload 的数据。
*/
req_msg_ref->SetString(req_msg, req_msg_ref_field_payload, ”my payload”);
/*
* ⑧ Field 信息 + Reflection 信息配合再次读取 payload 的数据。
*/
std::cout << ”after set, ref_req_msg_payload: ”
         << req_msg_ref->GetString(*req_msg, req_msg_ref_field_payload)
         << std::endl;
}

int main() {
test_relection();
return 0;
}
看似写了很多源代码，但是其实就做了一个事情，定义个 self::EchoRequest 变量，然后对其进行读和写。编译执行得到成果：
$ ./test_reflection
before set, ref_req_msg_payload:
after set, ref_req_msg_payload: my payloadPS：需要注意的是这里其实有一个坑，笔者猜测可能是编译器优化的原因造成的。现象是会输出 “ Cannot foundin DescriptorPool”。因为main函数中没有使用到 echo.proto 中的任何类型，编译器认为没有使用到echo.proto 的代码，所以不让法式执行以下变量的初始化，从而导致索引没有初始化：
PROTOBUF_ATTRIBUTE_INIT_PRIORITY static ::PROTOBUF_NAMESPACE_ID::internal::AddDescriptorsRunner dynamic_init_dummy_echo_2eproto(&descriptor_table_echo_2eproto);
其道理会在后续的章节中涉及到。
解决法子：在 main 函数中加上一行 self::EchoRequest req 的代码即可，让编译器以为使用到了echo.proto 中的类型。那么接下来笔者的思路就是通过分析 ① ~ ⑧ 各个法式的实现道理来了解反射是如何工作的。

4 道理分析

4.1 DescriptorPool 索引

4.1.1 google::protobuf::Descriptor

前面示例中有使用到 DescriptorPool 的 FindMessageTypeByName 接口函数（如下代码），这里的目的是获取该Message 的元信息。这里获取元信息的过程是一个查表的过程，本节主要了解一下此表的索引是什么道理，以及如何构建的。
const google::protobuf::Descriptor* descriptor
= google::protobuf::DescriptorPool::generated_pool()
   ->FindMessageTypeByName(type_name);
4.1.2 DescriptorPool 索引的构建时机

这里会用到 google::protobuf::internal::AddDescriptorsRunner（下简称 AddDescriptorsRunner），它的实现斗劲简单，如下代码，先看看它的原型。
struct PROTOBUF_EXPORT AddDescriptorsRunner {
explicit AddDescriptorsRunner(const DescriptorTable* table);
};
AddDescriptorsRunner::AddDescriptorsRunner(const DescriptorTable* table) {
AddDescriptors(table);
}
从如上代码中，可以看出执行构造函数的时候会触发 Descriptor 表的构建。那什么情况下会执行构造函数呢？我们在 http://echo.pb.cc 源代码文件中找到这样一行代码（如下），这行代码的感化是定义一个静态类型的 AddDescriptorsRunner 类型的变量，因为是静态类型的，所以在法式启动是生成，在法式退出时销毁。而在定义该变量时回触发构造函数的调用，所以我们不难理解，DescriptorPool 索引的构建时机是法式启动的时候，销毁时机是在法式退出的时候。
PROTOBUF_ATTRIBUTE_INIT_PRIORITY static ::PROTOBUF_NAMESPACE_ID::internal::AddDescriptorsRunner dynamic_init_dummy_echo_2eproto(&descriptor_table_echo_2eproto);
但是看到后面的章节你会发现，这里法式启动生成的索引只是很小一部门，在使用的时候（也就是查询的时候）还会触发构建一个完成的索引数据。具体原因也在该处说明。
4.1.3 DescriptorPool索引的构建道理

我们先从 AddDescriptors 函数开始分析。
void AddDescriptors(const DescriptorTable* table) {
if (table->is_initialized) return;
table->is_initialized = true;
AddDescriptorsImpl(table);
}

void AddDescriptorsImpl(const DescriptorTable* table) {
// Reflection refers to the default fields so make sure they are initialized.
internal::InitProtobufDefaults();

// Ensure all dependent descriptors are registered to the generated descriptor
// pool and message factory.
int num_deps = table->num_deps;
for (int i = 0; i < num_deps; i++) {
// In case of weak fields deps could be null.
if (table->deps) AddDescriptors(table->deps);
}

// Register the descriptor of this file.
DescriptorPool::InternalAddGeneratedFile(table->descriptor, table->size);
MessageFactory::InternalRegisterGeneratedFile(table);
}
可以看出，总共三个法式：

[*]初始化变量：反射需要使用的变量，先确保其已经初始化了；
[*]解析依赖：如果有import 其他proto 源文件，那么先解析其他proto源文件，存在 deps 中。
[*]注册 Descriptor：

[*]构建 DescriptorPool 索引（DescriptorPool::InternalAddGeneratedFile）；
[*]构建MessageFactory 索引（MessageFactory::InternalRegisterGeneratedFile）。

本节我们主要分析构建 DescriptorPool 索引的实现道理，至于MessageFactory 索引的实现道理我们不才一节中再详细分析。
database 是一个斗劲抽象的名称，database 底层实现其实就是索引 index_，这里会先把文件定义信息先解析，主要是确定 encoded_file_descriptor 信息是正确的。
void DescriptorPool::InternalAddGeneratedFile(
const void* encoded_file_descriptor, int size) {
GOOGLE_CHECK(GeneratedDatabase()->Add(encoded_file_descriptor, size));
}

bool EncodedDescriptorDatabase::Add(const void* encoded_file_descriptor, int size) {
FileDescriptorProto file;
if (file.ParseFromArray(encoded_file_descriptor, size)) {
return index_->AddFile(file, std::make_pair(encoded_file_descriptor, size));
} else {
GOOGLE_LOG(ERROR) << ”Invalid file descriptor data passed to ”
               ”EncodedDescriptorDatabase::Add().”;
return false;
}
}
index_ 的类型是 DescriptorIndex的指针。
std::unique_ptr<DescriptorIndex> index_;
DescriptorIndex 的索引如下：
/*数据表，存的是最终数据，包罗：
/* - 文件元数据
* - 标签元数据
* - 扩展元数据 */
std::vector<EncodedEntry> all_values_;
/* 文件元数据索引，指向 all_values_ 的位置（即下标） */
std::set<FileEntry, FileCompare> by_name{FileCompare{*this}};
std::vector<FileEntry> by_name_flat;
/* 标签元数据索引，包罗：
* - message
* - enum
* - externion
* - service */
std::set<SymbolEntry, SymbolCompare> by_symbol{SymbolCompare{*this}};
std::vector<SymbolEntry> by_symbol_flat;
/* 扩展元数据索引 */
std::set<ExtensionEntry, ExtensionCompare> by_extension{ExtensionCompare{*this}};
std::vector<ExtensionEntry> by_extension_flat;
关系图如下图所示：

AddFile 实现如下：
template <typename FileProto>
bool EncodedDescriptorDatabase::DescriptorIndex::AddFile(const FileProto& file,
                                                      Value value) {
// We push `value` into the array first. This is important because the AddXXX
// functions below will expect it to be there.

//############## 数据表 ##############
all_values_.push_back({value.first, value.second, {}});

if (!ValidateSymbolName(file.package())) {
GOOGLE_LOG(ERROR) << ”Invalid package name: ” << file.package();
return false;
}
all_values_.back().encoded_package = EncodeString(file.package());

//############## 索引表 ##############
// 1. 文件元信息索引
if (!InsertIfNotPresent(
      &by_name_, FileEntry{static_cast<int>(all_values_.size() - 1),
                           EncodeString(file.name())}) ||
   std::binary_search(by_name_flat_.begin(), by_name_flat_.end(),
                     file.name(), by_name_.key_comp())) {
GOOGLE_LOG(ERROR) << ”File already exists in database: ” << file.name();
return false;
}

// 2. 类型索引
// - 所有类型城市进 symbol 表
// - externsion 会进入 externsion 表
for (const auto& message_type : file.message_type()) {
if (!AddSymbol(message_type.name())) return false;
if (!AddNestedExtensions(file.name(), message_type)) return false;
}
for (const auto& enum_type : file.enum_type()) {
if (!AddSymbol(enum_type.name())) return false;
}
for (const auto& extension : file.extension()) {
if (!AddSymbol(extension.name())) return false;
if (!AddExtension(file.name(), extension)) return false;
}
for (const auto& service : file.service()) {
if (!AddSymbol(service.name())) return false;
}

return true;
}
讲到这里，构建索引的实现道理就告了一段落，但是你以为构建索引已经结束了吗？当然没有！不才一节中分析。
4.1.4 DescriptorPool索引的查询过程

我们还是从一行代码开始（DescriptorPool 的 FindMessageTypeByName 接口函数）。
const google::protobuf::Descriptor* descriptor
= google::protobuf::DescriptorPool::generated_pool()
->FindMessageTypeByName(”self.EchoRequest”);
FindMessageTypeByName 函数实际上调用了 tables_ 成员的 FindByNameHelper 成员函数。
const Descriptor* DescriptorPool::FindMessageTypeByName(
ConstStringParam name) const {
Symbol result = tables_->FindByNameHelper(this, name);
return (result.type == Symbol::MESSAGE) ? result.descriptor : nullptr;
}
FindByNameHelper 函数也并不复杂，首先查表，如果miss，就会调用 TryFindSymbolInFallbackDatabase 进行索引构建（这里会用到之前讲过的 DescriptorIndex的信息）。源代码如下：
这里刻意略过 underlay，underlay 这个特性笔者猜测是为了效率实现的多层cache，underlay 也就是下层的意思，逻辑都是一样的，这里我们没有涉及 underlay，就先不展开分析。Symbol DescriptorPool::Tables::FindByNameHelper(const DescriptorPool* pool,
                                             StringPiece name) {
if (pool->mutex_ != nullptr) {
// Fast path: the Symbol is already cached.This is just a hash lookup.
ReaderMutexLock lock(pool->mutex_);
if (known_bad_symbols_.empty() && known_bad_files_.empty()) {
   Symbol result = FindSymbol(name);
   if (!result.IsNull()) return result;
}
}
MutexLockMaybe lock(pool->mutex_);
if (pool->fallback_database_ != nullptr) {
known_bad_symbols_.clear();
known_bad_files_.clear();
}
Symbol result = FindSymbol(name);

if (result.IsNull() && pool->underlay_ != nullptr) {
// Symbol not found; check the underlay.
result = pool->underlay_->tables_->FindByNameHelper(pool->underlay_, name);
}

if (result.IsNull()) {
// Symbol still not found, so check fallback database.
if (pool->TryFindSymbolInFallbackDatabase(name)) {
   result = FindSymbol(name);
}
}

return result;
}
分析 FindSymbol ，发现其查的是 symbols_by_name_ 这个索引表（其定义如下），但是这个表我们还没有构建啊。是的，之前没有构建过，但是为什么需要等待这个时候才构建呢？笔者认为有两个原因：一个是内存占用原因，如果没有改proto文件没有被使用到，就不需要前置构建，占用内存；另一个是启动效率原因，没有必要为了没有被使用到的proto文件做无用功，而且就算后续使用到了再构建，也只是第一个使用者会牺牲一些效率（如果读者有认为是其他什么原因导致这样设计，欢迎来交流和探讨）。
typedef HASH_MAP<StringPiece, Symbol, HASH_FXN<StringPiece>> SymbolsByNameMap;
class DescriptorPool::Tables {
...
SymbolsByNameMap symbols_by_name_;
}
最终会使用 DescriptorBuilder来进行 symbol 索引的构建：TryFindSymbolInFallbackDatabase -> BuildFileFromDatabase -> DescriptorBuilder().BuildFile(proto) ，BuildFile -> BuildFileImpl，BuildFileImpl 如下：
FileDescriptor* BuildFileImpl(const FileDescriptorProto& proto) {
...
BUILD_ARRAY(proto, result, message_type, BuildMessage, nullptr);
BUILD_ARRAY(proto, result, enum_type, BuildEnum, nullptr);
BUILD_ARRAY(proto, result, service, BuildService, nullptr);
BUILD_ARRAY(proto, result, extension, BuildExtension, nullptr);
...
}
BuildFileImpl 会针对每个 message_type、enum_type、service、extension 构建索引，举一个 BuildMessage 的例子。
void BuildMessage(const DescriptorProto& proto,
               const Descriptor* parent,
               Descriptor* result) {
...
BUILD_ARRAY(proto, result, oneof_decl, BuildOneof, result);
BUILD_ARRAY(proto, result, field, BuildField, result);
BUILD_ARRAY(proto, result, nested_type, BuildMessage, result);
BUILD_ARRAY(proto, result, enum_type, BuildEnum, result);
BUILD_ARRAY(proto, result, extension_range, BuildExtensionRange, result);
BUILD_ARRAY(proto, result, extension, BuildExtension, result);
BUILD_ARRAY(proto, result, reserved_range, BuildReservedRange, result);
...
AddSymbol(...);
)
再看 AddSymbol 函数，从其代码可以看出会写两个表：一个是 symbols_by_name_ ，另一个是 symbols_by_parernt_ ，前者是通过定名来查找，后者是通过父类型调用触发的查找。
bool DescriptorBuilder::AddSymbol(const std::string& full_name,
                              const void* parent, const std::string& name,
                              const Message& proto, Symbol symbol) {
...
if (tables_->AddSymbol(full_name, symbol)) {
if (!file_tables_->AddAliasUnderParent(parent, name, symbol)) {
..
}
这两个表分布在分歧的处所，symbols_by_name_ 是 DescriptorPool::Tables 类中，一般是全局搜索某个类型需要调用到，而symbols_by_parent_ 是在 FileDescriptorTables 类中，一般我们用来查询当前类型的某个字段（即field）用到斗劲多。
typedef HASH_MAP<StringPiece, Symbol, HASH_FXN<StringPiece>> SymbolsByNameMap;
class DescriptorPool::Tables {
...
SymbolsByNameMap symbols_by_name_;
}

class FileDescriptorTables {
...
SymbolsByParentMap symbols_by_parent_;
}
Symbol 是个抽象的类型，可以暗示proto文件中的所有类型。所以，symbols_by_name_ 和 symbols_by_parent_ 这两个表也用来存储所有的类型。

4.2 MessageFactory 索引

4.2.1 google::protobuf::Message

这是所有Message 类型的基类，所以用他来暗示索引的类型。
4.2.2 MessageFactory 索引的构建时机

和DescriptorPool索引的构建时机不异，法式启动的时候构建了一部门索引，而在使用（也就是查询的时候）还会触发构建完整的Message索引数据。
4.2.3 MessageFactory 索引的构建道理

还是从AddDescriptorsImpl函数接口开始。这个函数是在法式启动的时候触发执行的。
void AddDescriptorsImpl(const DescriptorTable* table) {
...
MessageFactory::InternalRegisterGeneratedFile(table);
}
void MessageFactory::InternalRegisterGeneratedFile(
const google::protobuf::internal::DescriptorTable* table) {
GeneratedMessageFactory::singleton()->RegisterFile(table);
}
GeneratedMessageFactory 类的定义如下，我们主要存眷两个成员 file_map_和 type_map_ ，但实际上最有用的是type_map_ ， file_map_ 只是辅助感化。那为什么这里只是构建了 type_map_ 呢？笔者认为和 DescriptorPool 索引的原因是一样的，一个是内存占用原因，另一个是启动效率原因（如果有认为是其他什么原因导致这样设计，欢迎探讨）。
class GeneratedMessageFactory final : public MessageFactory {
public:
//构建 file_map_
void RegisterFile(const google::protobuf::internal::DescriptorTable* table);
//构建 type_map_
void RegisterType(const Descriptor* descriptor, const Message* prototype);
const Message* GetPrototype(const Descriptor* type) override;

private:
// Only written at static init time, so does not require locking.
HASH_MAP<StringPiece, const google::protobuf::internal::DescriptorTable*,
      STR_HASH_FXN> file_map_;
...
std::unordered_map<const Descriptor*, const Message*> type_map_;
};

4.2.4 MessageFactory 索引的查询过程

从开发者怎么使用说起吧。开发者一般是调用 GetPrototype 函数来获取Messgae 实例。
const google::protobuf::Message* prototype
= google::protobuf::MessageFactory::generated_factory()
   ->GetPrototype(descriptor);
GetPrototype函数的逻辑也很简单，先查 type_map_ 表，如果查不到，再按照 file_map_ 中的信息构建 type_map_ 表索引（见internal::RegisterFileLevelMetadata 函数）。
const Message* GeneratedMessageFactory::GetPrototype(const Descriptor* type) {
{
/* 如果是第一次查询，那这里的查询成果是 Miss */
ReaderMutexLock lock(&mutex_);
const Message* result = FindPtrOrNull(type_map_, type);
if (result != NULL) return result;
}
...
// Apparently the file hasn't been registered yet.Let's do that now.
const internal::DescriptorTable* registration_data =
   FindPtrOrNull(file_map_, type->file()->name().c_str());
...
WriterMutexLock lock(&mutex_);

/* 如果查询成果为 Miss
* 那么需要调用 internal::RegisterFileLevelMetadata 构建 type_map_ 索引 */

// Check if another thread preempted us.
const Message* result = FindPtrOrNull(type_map_, type);
if (result == NULL) {
// Nope.OK, register everything.
internal::RegisterFileLevelMetadata(registration_data);
// Should be here now.
result = FindPtrOrNull(type_map_, type);
}
...
return result;
}
RegisterFileLevelMetadata 函数的一系列实现如下：
void RegisterFileLevelMetadata(const DescriptorTable* table) {
AssignDescriptors(table);
RegisterAllTypesInternal(table->file_level_metadata, table->num_messages);
}
void RegisterAllTypesInternal(const Metadata* file_level_metadata, int size) {
for (int i = 0; i < size; i++) {
const Reflection* reflection = file_level_metadata.reflection;
MessageFactory::InternalRegisterGeneratedMessage(
   file_level_metadata.descriptor,
   reflection->schema_.default_instance_);
}
}
void MessageFactory::InternalRegisterGeneratedMessage(
const Descriptor* descriptor, const Message* prototype) {
GeneratedMessageFactory::singleton()->RegisterType(descriptor, prototype);
}
void GeneratedMessageFactory::RegisterType(const Descriptor* descriptor,
                                       const Message* prototype) {
...
if (!InsertIfNotPresent(&type_map_, descriptor, prototype)) {
GOOGLE_LOG(DFATAL) << ”Type is already registered: ” << descriptor->full_name();
}
}
最终是把 prototype（也就是reflection->schema.default_instance ）插入 type_map_ 表中，我们回到 http://echo.pb.cc 源代码文件，见以下源代码：
struct EchoRequestDefaultTypeInternal {
constexpr EchoRequestDefaultTypeInternal()
: _instance(::PROTOBUF_NAMESPACE_ID::internal::ConstantInitialized{}) {}
~EchoRequestDefaultTypeInternal() {}
union {
EchoRequest _instance;
};
};
PROTOBUF_ATTRIBUTE_NO_DESTROY PROTOBUF_CONSTINIT EchoRequestDefaultTypeInternal _EchoRequest_default_instance_;
default_instance_ 指向的就是 instance。因为Message 都实现了 New 函数，可以通过 default_instance->New()创建出 Message 实例，即使不知道其真实类型是 EchoRequest。
inline EchoRequest* New() const final {
return new EchoRequest();
}

4.3 实例创建接口

通过 New 函数接口实现，实际上调用的 EchoRequest 的 New 函数，返回值为EchoRequest *，而EchoRequest担任了 google::protobuf::Message 类。
google::protobuf::Message* req_msg = prototype->New();
4.4 Reflection 成员

还是以EchoRequest为例子。
message EchoRequest {
QueryType querytype = 1;
string payload = 2;
}
如果需要对 payload 字段读写，那我们直接使用 set_payload 和 get_payload 这两个函数接口就可以了。但是如果是使用 google::protobuf::Message 基类指针类型来操作，它是没有 set_payload 和 get_payload 这两个接口函数的。这个时候 Reflection （即 google::protobuf::Reflection）呈现了，它类似一个代办代理人的角色，可以辅佐做一些读写的操作。如下 SetString、GetString 函数。Reflection 类过于复杂，这里就不详细分析，感兴趣的读者可以自行阅读源代码。
class PROTOBUF_EXPORT Reflection final {
public:
...
void SetString(Message* message, const FieldDescriptor* field,
            std::string value) const;
std::string GetString(const Message& message,
                     const FieldDescriptor* field) const;
...
};
4.5 字段索引（Field）

前面「4.1.4 DescriptorPool索引的查询过程」章节中介绍了构建symbol 索引的过程，字段（即field）索引也是在阿谁时候解析并构建的。如下使用到了 BuildField 函数进行字段索引构建。
void BuildField(const FieldDescriptorProto& proto, Descriptor* parent,
            FieldDescriptor* result) {
BuildFieldOrExtension(proto, parent, result, false);
}

void BuildFieldOrExtension(const FieldDescriptorProto& proto,
                        Descriptor* parent, FieldDescriptor* result,
                        bool is_extension);
...
AddSymbol(result->full_name(), parent, result->name(), proto, Symbol(result));
}
protobuf 中使用FieldDescriptor来描述字段（field），如下所示：
class PROTOBUF_EXPORT FieldDescriptor {
public:
typedef FieldDescriptorProto Proto;

// Identifies a field type.0 is reserved for errors.The order is weird
// for historical reasons.Types 12 and up are new in proto2.
enum Type {
TYPE_DOUBLE = 1, // double, exactly eight bytes on the wire.
TYPE_FLOAT = 2, // float, exactly four bytes on the wire.
TYPE_INT64 = 3, // int64, varint on the wire.Negative numbers
                     // take 10 bytes.Use TYPE_SINT64 if negative
                     // values are likely.
TYPE_UINT64 = 4, // uint64, varint on the wire.
TYPE_INT32 = 5, // int32, varint on the wire.Negative numbers
                     // take 10 bytes.Use TYPE_SINT32 if negative
                     // values are likely.
TYPE_FIXED64 = 6, // uint64, exactly eight bytes on the wire.
TYPE_FIXED32 = 7, // uint32, exactly four bytes on the wire.
TYPE_BOOL = 8,    // bool, varint on the wire.
TYPE_STRING = 9, // UTF-8 text.
TYPE_GROUP = 10, // Tag-delimited message.Deprecated.
TYPE_MESSAGE = 11,// Length-delimited message.

TYPE_BYTES = 12, // Arbitrary byte array.
TYPE_UINT32 = 13, // uint32, varint on the wire
TYPE_ENUM = 14,    // Enum, varint on the wire
TYPE_SFIXED32 = 15,// int32, exactly four bytes on the wire
TYPE_SFIXED64 = 16,// int64, exactly eight bytes on the wire
TYPE_SINT32 = 17, // int32, ZigZag-encoded varint on the wire
TYPE_SINT64 = 18, // int64, ZigZag-encoded varint on the wire

MAX_TYPE = 18,// Constant useful for defining lookup tables
               // indexed by Type.
};

// Specifies the C++ data type used to represent the field.There is a
// fixed mapping from Type to CppType where each Type maps to exactly one
// CppType.0 is reserved for errors.
enum CppType {
CPPTYPE_INT32 = 1, // TYPE_INT32, TYPE_SINT32, TYPE_SFIXED32
CPPTYPE_INT64 = 2, // TYPE_INT64, TYPE_SINT64, TYPE_SFIXED64
CPPTYPE_UINT32 = 3, // TYPE_UINT32, TYPE_FIXED32
CPPTYPE_UINT64 = 4, // TYPE_UINT64, TYPE_FIXED64
CPPTYPE_DOUBLE = 5, // TYPE_DOUBLE
CPPTYPE_FLOAT = 6, // TYPE_FLOAT
CPPTYPE_BOOL = 7,    // TYPE_BOOL
CPPTYPE_ENUM = 8,    // TYPE_ENUM
CPPTYPE_STRING = 9, // TYPE_STRING, TYPE_BYTES
CPPTYPE_MESSAGE = 10,// TYPE_MESSAGE, TYPE_GROUP

MAX_CPPTYPE = 10,// Constant useful for defining lookup tables
                  // indexed by CppType.
};

// Identifies whether the field is optional, required, or repeated.0 is
// reserved for errors.
enum Label {
LABEL_OPTIONAL = 1,// optional
LABEL_REQUIRED = 2,// required
LABEL_REPEATED = 3,// repeated

MAX_LABEL = 3,// Constant useful for defining lookup tables
               // indexed by Label.
};
...
//因为一个field 只有一个类型，
//所以使用内联布局，节省内存，
union {
int32 default_value_int32_;
int64 default_value_int64_;
uint32 default_value_uint32_;
uint64 default_value_uint64_;
float default_value_float_;
double default_value_double_;
bool default_value_bool_;

mutable const EnumValueDescriptor* default_value_enum_;
const std::string* default_value_string_;
mutable std::atomic<const Message*> default_generated_instance_;
};
...
};
4.6 操作反射访谒字段的API

结合 Reflection 来分析一下 field 的使用。看 field->default_value_string() 这一行，其实是返回了上述 union 中的 default_value_string_ 成员。
4.6.1 Reflection::GetString

std::string Reflection::GetString(const Message& message,
                              const FieldDescriptor* field) const {
USAGE_CHECK_ALL(GetString, SINGULAR, STRING);
if (field->is_extension()) {
return GetExtensionSet(message).GetString(field->number(),
                                          field->default_value_string());
} else {
if (schema_.InRealOneof(field) && !HasOneofField(message, field)) {
   return field->default_value_string();
}
switch (field->options().ctype()) {
   default:// TODO(kenton):Support other string reps.
   case FieldOptions::STRING: {
   if (auto* value =
            GetField<ArenaStringPtr>(message, field).GetPointer()) {
      return *value;
   }
   return field->default_value_string();
   }
}
}
}
4.6.2 Reflection::SetString

void Reflection::SetString(Message* message, const FieldDescriptor* field,
                        std::string value) const {
USAGE_CHECK_ALL(SetString, SINGULAR, STRING);
if (field->is_extension()) {
return MutableExtensionSet(message)->SetString(
   field->number(), field->type(), std::move(value), field);
} else {
switch (field->options().ctype()) {
   default:// TODO(kenton):Support other string reps.
   case FieldOptions::STRING: {
   // Oneof string fields are never set as a default instance.
   // We just need to pass some arbitrary default string to make it work.
   // This allows us to not have the real default accessible from
   // reflection.
   const std::string* default_ptr =
         schema_.InRealOneof(field)
            ? nullptr
            : DefaultRaw<ArenaStringPtr>(field).GetPointer();
   if (schema_.InRealOneof(field) && !HasOneofField(*message, field)) {
      ClearOneof(message, field->containing_oneof());
      MutableField<ArenaStringPtr>(message, field)
         ->UnsafeSetDefault(default_ptr);
   }
   MutableField<ArenaStringPtr>(message, field)
         ->Set(default_ptr, std::move(value),
               message->GetArenaForAllocation());
   break;
   }
}
}
}
5 小结一下

[*]法式启动时会先初始化一部门索引，这是轻量级的，只是一些基础的数据，为下一步构建全量索引做筹备。

[*]DescriptorTool 的 EncodedEntry、FileEntry、SymbolEntry、ExtensionEntry；
[*]MessageFactory的 name->DescriptorTable。

[*]当使用到某个类型的时候，会触发对该proto文件的全量索引的构建；

[*]DescriptorTool 的 name-> Symbol、parent -> Symbol；
[*]MessageFactory 的 Descriptor->Message。

[*]Reflection 作为 Message 的一个代办代理人，结合 Descriptor 和 Message的接口，对 Message 进行读写操作。
[*]反射的一般使用场景：

[*]和其他数据布局比如 json、xml 等的彼此转换；
[*]保举系统中的特征抽取（平台化，所以需要对数据类型进行可配置化）。

页: [1]

Unity开发者联盟's Archiver

精通 protobuf 道理之三：一文彻底搞懂反射道理