Fuzz Testing Notes | WuJing's Blog

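Fuzz testing (fuzzing) is a software testing technique. The core idea is to feed automatically or semi-automatically generated random data into a program and monitor it for abnormal behavior, such as crashes or failed assertions, in order to uncover potential bugs such as memory leaks. Fuzzing is commonly used to find security vulnerabilities in software and computer systems.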

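Fuzzers fall into two main categories: mutation-based and generation-based. Fuzzing can be applied as white-box, gray-box, or black-box testing.

1) Mutation-based: new test cases are produced by mutating known data samples;

2) Generation-based: test cases are generated from a model of a known protocol or interface specification.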
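
To make the two strategies concrete, here is a minimal sketch. The function names and the toy "KEY=VALUE" format are hypothetical illustrations and do not come from any fuzzing library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Mutation-based: start from a known sample and flip one random bit.
Bytes MutateSample(const Bytes& sample, std::mt19937& rng) {
  assert(!sample.empty());
  Bytes out = sample;
  std::uniform_int_distribution<size_t> pick_byte(0, out.size() - 1);
  std::uniform_int_distribution<int> pick_bit(0, 7);
  out[pick_byte(rng)] ^= static_cast<uint8_t>(1u << pick_bit(rng));
  return out;
}

// Generation-based: build an input from a model of the format.
// The hypothetical format is "KEY=VALUE\n" with a random lowercase value.
std::string GenerateFromModel(std::mt19937& rng) {
  static const char* kKeys[] = {"host", "port", "mode"};
  std::uniform_int_distribution<size_t> pick_key(0, 2);
  std::uniform_int_distribution<size_t> pick_len(0, 16);
  std::uniform_int_distribution<int> pick_char('a', 'z');
  std::string value(pick_len(rng), ' ');
  for (char& c : value) c = static_cast<char>(pick_char(rng));
  return std::string(kKeys[pick_key(rng)]) + "=" + value + "\n";
}
```

The mutation function needs an existing sample but no knowledge of the format, while the generator needs no sample but encodes the format by hand.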

libFuzzer

libFuzzer is an in-process, coverage-guided, evolutionary fuzzing engine that is part of the LLVM project.

in-process: we mean that we don't launch a new process for every test case, and that we mutate inputs directly in memory.
coverage-guided: we mean that we measure code coverage for every input, and accumulate test cases that increase overall coverage.
evolutionary: fuzzers can be divided into three types, and this is the third one.

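The first type, generation-based, builds test cases from scratch by modeling the target protocol or file format, with no prior state;

The second type, mutation-based, derives test cases from existing data samples or existing state by applying a set of rules;

The last type, evolutionary, combines the two above and additionally uses code coverage feedback to guide mutation.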
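
The coverage feedback idea can be sketched as a toy loop. This is not libFuzzer's actual algorithm: RunTarget, Mutate, and FuzzLoop are hypothetical names, and the "coverage" is simply a set of branch IDs that the toy target reports itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Hypothetical instrumented target: returns the IDs of the branches it took.
// A real fuzzer gets this information from compiler instrumentation instead.
std::set<int> RunTarget(const Bytes& in) {
  std::set<int> hit = {0};
  if (in.size() > 0 && in[0] == 'F') {
    hit.insert(1);
    if (in.size() > 1 && in[1] == 'U') {
      hit.insert(2);
      if (in.size() > 2 && in[2] == 'Z') hit.insert(3);
    }
  }
  return hit;
}

// Mutation step: overwrite one random byte with a random value.
Bytes Mutate(Bytes in, std::mt19937& rng) {
  if (in.empty()) in.push_back(0);
  std::uniform_int_distribution<size_t> pos(0, in.size() - 1);
  std::uniform_int_distribution<int> val(0, 255);
  in[pos(rng)] = static_cast<uint8_t>(val(rng));
  return in;
}

// Evolutionary loop: keep a mutated input only if it reaches new coverage.
// Returns the total number of distinct branches covered.
size_t FuzzLoop(std::vector<Bytes>& corpus, int iterations, unsigned seed) {
  assert(!corpus.empty());
  std::mt19937 rng(seed);
  std::set<int> covered;
  for (const Bytes& in : corpus) {
    std::set<int> hit = RunTarget(in);
    covered.insert(hit.begin(), hit.end());
  }
  for (int i = 0; i < iterations; ++i) {
    std::uniform_int_distribution<size_t> pick(0, corpus.size() - 1);
    Bytes candidate = Mutate(corpus[pick(rng)], rng);
    size_t before = covered.size();
    std::set<int> hit = RunTarget(candidate);
    covered.insert(hit.begin(), hit.end());
    if (covered.size() > before) corpus.push_back(candidate);  // new coverage
  }
  return covered.size();
}
```

Because inputs that reach a new branch are added to the corpus, later mutations start from them, so the nested checks can be solved one byte at a time instead of all at once.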

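libFuzzer is linked with the library under test and feeds it test cases through a special fuzzing entry point (the target function). The fuzzer tracks which areas of the code have been reached and generates mutations on the corpus of input data to maximize code coverage. The coverage information is provided by LLVM's SanitizerCoverage instrumentation.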

// fuzz_target.cc
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  DoSomethingInterestingWithMyAPI(Data, Size);
  return 0;  // Non-zero return values are reserved for future use.
}

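Data and Size are the test data generated by libFuzzer and its length. The fuzz target passes this data to the code under test for processing and should try to exercise as much of the code's logic as possible.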
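
As a more concrete sketch of such a target, the example below fuzzes a toy parser. ParseRecord and its "<len>:<payload>" format are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical function under test: parses "<len>:<payload>" where <len> is a
// single decimal digit. Returns true when the payload length matches <len>.
bool ParseRecord(const std::string& s) {
  if (s.size() < 2 || s[0] < '0' || s[0] > '9' || s[1] != ':') return false;
  size_t declared = static_cast<size_t>(s[0] - '0');
  return s.size() - 2 == declared;
}

// libFuzzer calls this once per generated input. Data is not NUL-terminated,
// so it is copied into an owning std::string before being parsed.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* Data, size_t Size) {
  assert(Data != nullptr || Size == 0);
  std::string input(reinterpret_cast<const char*>(Data), Size);
  ParseRecord(input);
  return 0;
}
```

With a recent clang, a target like this is typically built with something like `clang++ -g -fsanitize=fuzzer,address fuzz_target.cc -o fuzz_exec`; libFuzzer then supplies main().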

Dictionaries

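A dictionary can improve fuzzing performance and efficiency, because every interface expects data in a specific format. By listing byte sequences of interest in a dictionary ahead of time, the fuzzer can combine them directly, which greatly reduces wasted attempts and makes it easier to reach more and deeper branches.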

# Lines starting with '#' and empty lines are ignored.

# Adds "blah" (w/o quotes) to the dictionary.
kw1="blah"
# Use \\ for backslash and \" for quotes.
kw2="\"ac\\dc\""
# Use \xAB for hex values
kw3="\xF7\xF8"
# the name of the keyword followed by '=' may be omitted:
"foo\x0Abar"

# Specify the dictionary and the corpus
./fuzz_exec <path_to_corpus> -dict=<path_to_dict> -runs=1000000 -max_total_time=3600
# Minimize the corpus
./fuzz_exec -merge=1 <path_to_corpus_min> <path_to_corpus>
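
As an illustration of where a dictionary pays off, consider the sketch below. CountKeywords is a hypothetical function whose branches depend on multi-byte keywords, which byte-level mutation alone can be slow to produce.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical parser: counts how many known keywords occur in the input.
// Each keyword guards a branch of its own.
int CountKeywords(const std::string& s) {
  int found = 0;
  if (s.find("SELECT") != std::string::npos) ++found;
  if (s.find("FROM") != std::string::npos) ++found;
  if (s.find("WHERE") != std::string::npos) ++found;
  return found;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* Data, size_t Size) {
  assert(Data != nullptr || Size == 0);
  std::string input(reinterpret_cast<const char*>(Data), Size);
  CountKeywords(input);
  return 0;
}

// A matching dictionary file, in the syntax shown earlier, could contain:
//   "SELECT"
//   "FROM"
//   "WHERE"
```

Passing that file via -dict= lets the fuzzer insert whole keywords into inputs rather than having to discover them byte by byte.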

Mutating Multiple Inputs

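A fuzzing engine usually provides a single input buffer, whereas the API under test often takes several inputs. We therefore need to transform or split the input so that it matches the API's parameters.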

libprotobuf-mutator（LPM）

https://github.com/google/libprotobuf-mutator/blob/master/src/libfuzzer/libfuzzer_macro.h

// Registers the callback as a potential mutation performed on the parent
// message of a field. This must be called inside an initialization code block.
// libFuzzer suggests putting one-time-initialization in a function used to
// initialize a static variable inside the fuzzer target. For example:
//
// static bool Modify(
//     SomeMessage* message /* Fix or additionally modify the message */,
//     unsigned int seed /* If random generator is needed use this seed */) {
//   ...
// }
//
// DEFINE_PROTO_FUZZER(const SomeMessage& msg) {
//   static PostProcessorRegistration reg(&Modify);
// }
#define DEFINE_TEST_ONE_PROTO_INPUT_IMPL(use_binary, Proto)                 \
  extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) { \
    using protobuf_mutator::libfuzzer::LoadProtoInput;                      \
    Proto input;                                                            \
    if (LoadProtoInput(use_binary, data, size, &input))                     \
      TestOneProtoInput(input);                                             \
    return 0;                                                               \
  }

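Note: this approach works for APIs and data structures of any complexity, but it requires writing a .proto definition. The raw input (data, size) is converted into a protobuf message (a C++ class generated from the .proto file), which is then passed to the API being fuzzed; the target receives a fuzzed protobuf message instead of a (data, size) buffer.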

Magic separator

#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>

#include <algorithm>
#include <vector>

// Splits [data,data+size) into a vector of strings using a "magic" Separator.
std::vector<std::vector<uint8_t>> SplitInput(const uint8_t *Data, size_t Size,
                                     const uint8_t *Separator,
                                     size_t SeparatorSize) {
  std::vector<std::vector<uint8_t>> Res;
  assert(SeparatorSize > 0);
  auto Beg = Data;
  auto End = Data + Size;
  // Using memmem here. std::search may be harder for libFuzzer today.
  while (const uint8_t *Pos = (const uint8_t *)memmem(Beg, End - Beg,
                                     Separator, SeparatorSize)) {
    Res.push_back({Beg, Pos});
    Beg = Pos + SeparatorSize;
  }
  if (Beg < End)
    Res.push_back({Beg, End});
  return Res;
}

static volatile int *Nil = nullptr;

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  if (Size > 10) return 0;  // To make the test quick.
  const uint8_t Separator[] = {0xDE, 0xAD, 0xBE, 0xEF};
  auto Inputs = SplitInput(Data, Size, Separator, sizeof(Separator));
  std::vector<uint8_t> Fuzz({'F', 'u', 'z', 'z'});
  std::vector<uint8_t> Me({'m', 'e'});
  if (Inputs.size() == 2 && Inputs[0] == Fuzz && Inputs[1] == Me)
    *Nil = 42;  // crash.
  return 0;
}
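
memmem is a non-standard extension, so it may be unavailable on some platforms. The sketch below shows the same splitting logic with std::search. SplitPortable is a hypothetical name, and as the comment in SplitInput notes, std::search may be harder for libFuzzer to see through, so this variant trades fuzzing efficiency for portability.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Splits data on every occurrence of sep. As in SplitInput, a piece before a
// separator is kept even when it is empty, but an empty trailing piece is
// dropped.
std::vector<Bytes> SplitPortable(const Bytes& data, const Bytes& sep) {
  assert(!sep.empty());
  std::vector<Bytes> parts;
  auto beg = data.begin();
  const auto end = data.end();
  while (true) {
    auto pos = std::search(beg, end, sep.begin(), sep.end());
    if (pos == end) break;  // no further separator
    parts.emplace_back(beg, pos);
    beg = pos + static_cast<std::ptrdiff_t>(sep.size());
  }
  if (beg != end) parts.emplace_back(beg, end);
  return parts;
}
```

For the target above, the crashing input is the ten bytes "Fuzz", 0xDE 0xAD 0xBE 0xEF, "me", which this function splits into the two pieces "Fuzz" and "me".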

Fuzzed Data Provider

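FuzzedDataProvider (FDP) can be used to split the fuzz input into multiple parts of various types. To use the class in a project, simply #include <fuzzer/FuzzedDataProvider.h>; if the header is not available, copy the class into the project.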

// In addition to the comments below, the API is also briefly documented at
// https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#fuzzed-data-provider
class FuzzedDataProvider {
 public:
  // |data| is an array of length |size| that the FuzzedDataProvider wraps to
  // provide more granular access. |data| must outlive the FuzzedDataProvider.
  FuzzedDataProvider(const uint8_t *data, size_t size)
      : data_ptr_(data), remaining_bytes_(size) {}
  ~FuzzedDataProvider() = default;

  // See the implementation below (after the class definition) for more verbose
  // comments for each of the methods.

  // Methods returning std::vector of bytes. These are the most popular choice
  // when splitting fuzzing input into pieces, as every piece is put into a
  // separate buffer (i.e. ASan would catch any under-/overflow) and the memory
  // will be released automatically.
  template <typename T> std::vector<T> ConsumeBytes(size_t num_bytes);
  template <typename T>
  std::vector<T> ConsumeBytesWithTerminator(size_t num_bytes, T terminator = 0);
  template <typename T> std::vector<T> ConsumeRemainingBytes();

  // Methods returning strings. Use only when you need a std::string or a null
  // terminated C-string. Otherwise, prefer the methods returning std::vector.
  std::string ConsumeBytesAsString(size_t num_bytes);
  std::string ConsumeRandomLengthString(size_t max_length);
  std::string ConsumeRandomLengthString();
  std::string ConsumeRemainingBytesAsString();

  // Methods returning integer values.
  template <typename T> T ConsumeIntegral();
  template <typename T> T ConsumeIntegralInRange(T min, T max);

  // Methods returning floating point values.
  template <typename T> T ConsumeFloatingPoint();
  template <typename T> T ConsumeFloatingPointInRange(T min, T max);

  // 0 <= return value <= 1.
  template <typename T> T ConsumeProbability();

  bool ConsumeBool();

  // Returns a value chosen from the given enum.
  template <typename T> T ConsumeEnum();

  // Returns a value from the given array.
  template <typename T, size_t size> T PickValueInArray(const T (&array)[size]);
  template <typename T> T PickValueInArray(std::initializer_list<const T> list);

  // Writes data to the given destination and returns number of bytes written.
  size_t ConsumeData(void *destination, size_t num_bytes);

  // Reports the remaining bytes available for fuzzed input.
  size_t remaining_bytes() { return remaining_bytes_; }

 private:
  FuzzedDataProvider(const FuzzedDataProvider &) = delete;
  FuzzedDataProvider &operator=(const FuzzedDataProvider &) = delete;

  void CopyAndAdvance(void *destination, size_t num_bytes);

  void Advance(size_t num_bytes);

  template <typename T>
  std::vector<T> ConsumeBytes(size_t size, size_t num_bytes);

  template <typename TS, typename TU> TS ConvertUnsignedToSigned(TU value);

  const uint8_t *data_ptr_;
  size_t remaining_bytes_;
};

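Methods for extracting individual values

ConsumeBool, ConsumeIntegral, and ConsumeIntegralInRange extract a single boolean or integer value (the exact type is given by the template parameter).
ConsumeProbability, ConsumeFloatingPoint, and ConsumeFloatingPointInRange extract floating-point values.
ConsumeEnum and PickValueInArray use the fuzz input to choose from a predefined set of values, such as an enum or an array.

Methods for extracting sequences of bytes

Many of these methods take a length argument. You can always find out how many bytes are left by calling remaining_bytes() on the provider object.

ConsumeBytes and ConsumeBytesWithTerminator return a std::vector of the requested length.
ConsumeBytesAsString returns a std::string of the requested length.
ConsumeRandomLengthString returns a std::string of random length and accepts a maximum length argument.
ConsumeRemainingBytes and ConsumeRemainingBytesAsString return a std::vector and a std::string, respectively, initialized with all bytes of the fuzz input that have not been consumed yet.
ConsumeData copies the requested number of bytes from the fuzz input to the given pointer (void *destination).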
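
To show the idea behind these methods without depending on the real header, here is a simplified stand-in. MiniProvider, ConsumeByteInRange, and SplitExample are hypothetical; the real FuzzedDataProvider offers many more methods and differs in details such as which end of the buffer integers are read from.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified, hypothetical stand-in for FuzzedDataProvider. It only shows the
// core idea: each Consume* call takes bytes off the input and advances.
class MiniProvider {
 public:
  MiniProvider(const uint8_t* data, size_t size) : ptr_(data), left_(size) {}

  size_t remaining_bytes() const { return left_; }

  // Returns up to num_bytes bytes; fewer if the input is exhausted.
  std::vector<uint8_t> ConsumeBytes(size_t num_bytes) {
    size_t n = std::min(num_bytes, left_);
    std::vector<uint8_t> out(ptr_, ptr_ + n);
    ptr_ += n;
    left_ -= n;
    return out;
  }

  // Maps the next byte into [min, max]; returns min when no input is left.
  uint8_t ConsumeByteInRange(uint8_t min, uint8_t max) {
    assert(min <= max);
    if (left_ == 0) return min;
    unsigned range = static_cast<unsigned>(max - min) + 1u;
    uint8_t raw = *ptr_++;
    --left_;
    return static_cast<uint8_t>(min + raw % range);
  }

  // Returns everything that has not been consumed yet.
  std::string ConsumeRemainingBytesAsString() {
    std::string out(reinterpret_cast<const char*>(ptr_), left_);
    ptr_ += left_;
    left_ = 0;
    return out;
  }

 private:
  const uint8_t* ptr_;
  size_t left_;
};

// Hypothetical usage: a length byte, then that many bytes, then the rest.
void SplitExample(const uint8_t* data, size_t size,
                  std::vector<uint8_t>* head, std::string* tail) {
  MiniProvider p(data, size);
  size_t head_len = p.ConsumeByteInRange(1, 4);
  *head = p.ConsumeBytes(head_len);
  *tail = p.ConsumeRemainingBytesAsString();
}
```

Every call degrades gracefully when the input runs out, so a fuzz target built this way never reads past the end of the buffer no matter how short the input is.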

Example 1: net_verify_name_match_fuzzer splits the fuzz input into two parts.

#include <stddef.h>
#include <stdint.h>

#include <vector>

#include "net/der/input.h"
#include "third_party/libFuzzer/src/utils/FuzzedDataProvider.h"

// Entry point for LibFuzzer.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  FuzzedDataProvider fuzzed_data(data, size);

  // Intentionally using uint16_t here to avoid empty |second_part|.
  size_t first_part_size = fuzzed_data.ConsumeIntegral<uint16_t>();
  std::vector<uint8_t> first_part =
      fuzzed_data.ConsumeBytes<uint8_t>(first_part_size);
  std::vector<uint8_t> second_part =
      fuzzed_data.ConsumeRemainingBytes<uint8_t>();

  net::der::Input in1(first_part.data(), first_part.size());
  net::der::Input in2(second_part.data(), second_part.size());
  bool match = net::VerifyNameMatch(in1, in2);
  bool reverse_order_match = net::VerifyNameMatch(in2, in1);
  // Result should be the same regardless of argument order.
  CHECK_EQ(match, reverse_order_match);
  return 0;
}

Example 2: net_http2_frame_decoder_fuzzer reads small chunks of data to simulate a sequence of frames arriving over a network connection.

#include <stddef.h>
#include <stdint.h>

#include <list>
#include <vector>

#include "net/third_party/quiche/src/http2/decoder/http2_frame_decoder.h"
#include "third_party/libFuzzer/src/utils/FuzzedDataProvider.h"

// Entry point for LibFuzzer.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  FuzzedDataProvider fuzzed_data_provider(data, size);
  http2::Http2FrameDecoder decoder;

  // Store all chunks in a function scope list, as the API requires the caller
  // to make sure the fragment chunks data is accessible during the whole
  // decoding process. |http2::DecodeBuffer| does not copy the data, it is just
  // a wrapper for the chunk provided in its constructor.
  std::list<std::vector<char>> all_chunks;
  while (fuzzed_data_provider.remaining_bytes() > 0) {
    size_t chunk_size = fuzzed_data_provider.ConsumeIntegralInRange(1, 32);
    all_chunks.emplace_back(
        fuzzed_data_provider.ConsumeBytes<char>(chunk_size));
    const auto& chunk = all_chunks.back();

    // http2::DecodeBuffer constructor does not accept nullptr buffer.
    if (chunk.data() == nullptr)
      continue;

    http2::DecodeBuffer frame_data(chunk.data(), chunk.size());
    decoder.DecodeFrame(&frame_data);
  }
  return 0;
}

Example 3: net_crl_set_fuzzer initializes several parameters and uses the rest of the fuzz input as the main argument.

#include <stddef.h>
#include <stdint.h>

#include "net/cert/crl_set.h"
#include "third_party/libFuzzer/src/utils/FuzzedDataProvider.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  if (size < 32 + 32 + 20)
    return 0;

  FuzzedDataProvider data_provider(data, size);
  std::string spki_hash = data_provider.ConsumeBytesAsString(32);
  std::string issuer_hash = data_provider.ConsumeBytesAsString(32);
  size_t serial_length = data_provider.ConsumeIntegralInRange(4, 19);
  std::string serial = data_provider.ConsumeBytesAsString(serial_length);
  std::string crlset_data = data_provider.ConsumeRemainingBytesAsString();

  scoped_refptr<net::CRLSet> out_crl_set;
  net::CRLSet::Parse(crlset_data, &out_crl_set);

  if (out_crl_set) {
    out_crl_set->CheckSPKI(spki_hash);
    out_crl_set->CheckSerial(serial, issuer_hash);
    out_crl_set->IsExpired();
  }

  return 0;
}

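Example 4: net_parse_cookie_line_fuzzer is a slightly more complex fuzz target; it uses different parameters initialized from the fuzz input to simulate different actions.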

// Copyright 2016 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

#include <stddef.h>
#include <stdint.h>

#include "base/logging.h"
#include "net/cookies/parsed_cookie.h"
#include "third_party/libFuzzer/src/utils/FuzzedDataProvider.h"

const std::string GetArbitraryString(FuzzedDataProvider* data_provider) {
  // Adding a fudge factor to kMaxCookieSize so that both branches of the bounds
  // detection code will be tested.
  return data_provider->ConsumeRandomLengthString(
      net::ParsedCookie::kMaxCookieSize + 10);
}

// Entry point for LibFuzzer.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  FuzzedDataProvider data_provider(data, size);
  const std::string cookie_line = GetArbitraryString(&data_provider);
  net::ParsedCookie parsed_cookie(cookie_line);

  // Call zero or one of ParsedCookie's mutator methods.  Should not call
  // anything other than SetName/SetValue when !IsValid().
  const uint8_t action = data_provider.ConsumeIntegralInRange(0, 10);
  switch (action) {
    case 1:
      parsed_cookie.SetName(GetArbitraryString(&data_provider));
      break;
    case 2:
      parsed_cookie.SetValue(GetArbitraryString(&data_provider));
      break;
  }

  if (parsed_cookie.IsValid()) {
    switch (action) {
      case 3:
        if (parsed_cookie.IsValid())
          parsed_cookie.SetPath(GetArbitraryString(&data_provider));
        break;
      case 4:
        parsed_cookie.SetDomain(GetArbitraryString(&data_provider));
        break;
      case 5:
        parsed_cookie.SetExpires(GetArbitraryString(&data_provider));
        break;
      case 6:
        parsed_cookie.SetMaxAge(GetArbitraryString(&data_provider));
        break;
      case 7:
        parsed_cookie.SetIsSecure(data_provider.ConsumeBool());
        break;
      case 8:
        parsed_cookie.SetIsHttpOnly(data_provider.ConsumeBool());
        break;
      case 9:
        parsed_cookie.SetSameSite(GetArbitraryString(&data_provider));
        break;
      case 10:
        parsed_cookie.SetPriority(GetArbitraryString(&data_provider));
        break;
    }
  }

  // Check that serialize/deserialize inverse property holds for valid cookies.
  if (parsed_cookie.IsValid()) {
    const std::string serialized = parsed_cookie.ToCookieLine();
    net::ParsedCookie reparsed_cookie(serialized);
    const std::string reserialized = reparsed_cookie.ToCookieLine();

    // RFC6265 requires semicolons to be followed by spaces. Because our parser
    // permits this rule to be broken, but follows the rule in ToCookieLine(),
    // it's possible to serialize a string that's longer than the original
    // input. If the serialized string exceeds kMaxCookieSize, the parser will
    // reject it. For this fuzzer, we are considering this situation a false
    // positive.
    if (serialized.size() <= net::ParsedCookie::kMaxCookieSize) {
      CHECK(reparsed_cookie.IsValid());
      CHECK_EQ(serialized, reserialized);
    }
  }

  return 0;
}

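Hash-based argument

If your API accepts a buffer of data together with some integer value (for example, a bitwise combination of flags), you can compute a hash of (data, size) and use it to fuzz the additional integer argument. For example: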

#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  std::string str = std::string(reinterpret_cast<const char*>(data), size);
  std::size_t data_hash = std::hash<std::string>()(str);
  APIToBeFuzzed(data, size, data_hash);
  return 0;
}
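
When the extra argument is a set of flags, the hash can be masked down to the valid bits. The sketch below assumes three hypothetical flag constants; FlagsFromInput is likewise a made-up helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical flag bits accepted by the API under test.
constexpr unsigned kFlagStrict = 1u << 0;
constexpr unsigned kFlagVerbose = 1u << 1;
constexpr unsigned kFlagFast = 1u << 2;
constexpr unsigned kAllFlags = kFlagStrict | kFlagVerbose | kFlagFast;

// Derives a flag combination from the input bytes. Within one run the same
// bytes always map to the same flags; the C++ standard only guarantees
// std::hash stability within a single program execution.
unsigned FlagsFromInput(const uint8_t* data, size_t size) {
  assert(data != nullptr || size == 0);
  std::string str(reinterpret_cast<const char*>(data), size);
  std::size_t h = std::hash<std::string>()(str);
  return static_cast<unsigned>(h) & kAllFlags;  // keep only valid flag bits
}
```

Masking keeps the fuzzer from spending time on flag values the API would reject outright, at the cost of never exercising that rejection path.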
