Finding three bugs in the markdown crate with cargo-fuzz

created on 2022-01-08

Fuzzing is the act of testing software with a “fuzzer” which generates random or possibly invalid input, in order to find bugs like OOM conditions, crashes, or other errors where the input is expected to fit a specific format. 

It's best used with parsers; one example is the markdown crate which I've found two different bugs within (though they are similar). Both of these bugs cause panicking of the thread, which is a pretty big problem. I've submitted issues for both of these fuzzing errors, and I'll go over them here, then I'll show how I found them. 

First markdown parsing bug

fn main() {
    markdown::to_html("- ");
}

This code looks pretty straight-forward. A simple beginning bullet point and nothing else. Let's look at the error.

matt@matt ~/r/m/src (main) [101]> RUST_BACKTRACE=full mold -run cargo r
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running `/home/matt/rust/markdown-crash/target/debug/markdown-crash`
thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0', /home/matt/.cargo/registry/src/github.com-1ecc6299db9ec823/markdown-0.3.0/src/parser/block/unordered_list.rs:84:44

If you want to read the full backtrace (RUST_BACKTRACE=full), generally for each item in the list you want to find the code that's in the project directory, or in .cargo/registry if it's a library crashing. Above I just included the first line (backtrace=full not needed). Let's take a look at the offending code.

    let mut list_contents = vec![];

    for c in contents {
        if is_paragraph || c.len() > 1 {
            list_contents.push(ListItem::Paragraph(c));
        } else if let Paragraph(content) = c[0].clone() {       //LINE 84
            list_contents.push(ListItem::Simple(content));
        }
    }

contents is a Vec<Vec>, or a list of list of blocks. So, for each list of blocks c within the list of lists, if the is_paragraph bool has been flipped or if c's length is greater than 1, push it as a paragraph. Then to the problem line: if the first item of c (which would be a single block within the list of blocks) is a paragraph, push a list item containing the content. 

Again, the iterator here iterates over a Vec<Vec>, meaning each time it runs, c is a list. When this code is ran with the erroneous input, the iterator runs once, because a list was started yet never finished. So the loop runs once with an empty Vec as its input. This means when line 84 asks for c[0], there's nothing to give; there is no first item in the list. The solution is to limit that iterator to lists that aren't empty. This is a standard out-of-bounds error though it's very common when parsing untrusted input. A parser like this should never panic, though: it should always throw an error or simply return erroneous input. Let's move onto the…

Second markdown parsing bug

Oddly enough, when returning to this bug for this post, I was unable to reproduce this specific bug. However, I did find a different one!

fn main() {
    markdown::to_html("0 ");
}

Okay, so at first glance, this looks similar. Let's look at the crash message, except this time, excluding the RUST_BACKTRACE=full environment variable.

thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0', /home/matt/.cargo/registry/src/github.com-1ecc6299db9ec823/markdown-0.3.0/src/parser/block/ordered_list.rs:88:44

Hmm, another out of bounds in a different place (see the ordered_list.rs which was previously in unordered_list.rs) I suspect this error is the same: an iterator over a list of lists, of which there is one list, of which is empty. Let's look at the code.

    for c in contents {
        if is_paragraph || c.len() > 1 {
            list_contents.push(ListItem::Paragraph(c));
        } else if let Paragraph(content) = c[0].clone() {
            list_contents.push(ListItem::Simple(content));
        }
    }

Yep; it's the same code except in a different module. Let's try a quick fix:

    for c in contents.into_iter().filter(|x| !x.is_empty()) {
        if is_paragraph || c.len() > 1 {
            list_contents.push(ListItem::Paragraph(c));
        } else if let Paragraph(content) = c[0].clone() {
            list_contents.push(ListItem::Simple(content));
        }
    }

This converts contents into an iterator. Notice the usage of into_iter(), which converts contents rather than returning an iter over the slice (which would make c a &Vec<Block>); we want to consume contents so the values are not borrowed. Then, it filters the list of lists to only iterate over lists that are not empty. Let's test: 

matt@matt ~/r/markdown-crash (main) [101]> mold -run cargo r
   Compiling markdown v0.3.0 (/home/matt/markdown.rs)
   Compiling markdown-crash v0.1.0 (/home/matt/rust/markdown-crash)
    Finished dev [unoptimized + debuginfo] target(s) in 1.50s
     Running `target/debug/markdown-crash`
<ol></ol>

Hooray! The bug is fixed. This filter also fixes the previous bug.

How I found these bugs

  1. Follow the instructions given here: https://rust-fuzz.github.io/book/cargo-fuzz/tutorial.html. The book tutorial can explain it far better than I can.
  2. Then, begin fuzzing. Run cargo fuzz run fuzz_target_1.

My setup: I've got three windows open. On the left is VSCodium open in the source directory. On the right, I've got one window on the top running cargo fuzz and on the bottom inside of a test project that simply takes the crashing input and runs it (so I can get a full error with offending line of code and all). 

Whenever a crash is found, I try to minimize it with cargo fuzz tmin fuzz_target_1 fuzz/artifacts/fuzz_target_1/(file). This tries messing with the input to find the smallest input that still crashes; this often reduces it from some gibberish to something that probably is easier to understand. 

Here's one input that crashes; after minimizing it, it turns into [;]:

Then, I enter the input into the project on the bottom right and running it so Rust can give me a more accurate error when it crashes (libFuzzer doesn't show me line where it crashed).

fn main() {
    markdown::to_html("[;]:");
}
   Compiling markdown-crash v0.1.0 (/home/matt/rust/markdown-crash)
    Finished dev [unoptimized + debuginfo] target(s) in 0.28s
     Running `target/debug/markdown-crash`
thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', /home/matt/markdown.rs/src/parser/block/link_reference.rs:27:92

Here, I can see where the code crashed. Let's look at the code.

This looks like another standard out-of-bounds; let's try adding a simple if lines.len() < 2 { return None } before the offending line. Of course, there's better ways to fix it, but simply to identify that this is the crash for now.

Yep, that fixes it. 

Next, I'll try running cargo-fuzz on a PR I got that refactors a lot of parsing logic, and uploading a video of me fuzzing and fixing the bugs until the fuzzer runs for a few minutes unperturbed. Until next time!