My First Microsoft PR

2 min read

Cover image for My First Microsoft PR

Last week something cool happened — my pull request got merged into microsoft/RustTraining, a 14k+ star Rust training repo maintained by Microsoft engineers.


starter

The repo has a cargo xtask serve command — basically a tiny HTTP server you run locally to preview the docs. The file-serving code looked like this:

RUST
let mut file_path = site.join(path.trim_start_matches('/'));
if file_path.is_dir() {
    file_path = file_path.join("index.html");
}

if file_path.is_file() {
    let body = fs::read(&file_path).unwrap_or_default();
    // serve it...
}

It works for the happy path. But it had three problems I wanted to fix.


fix 1: URLs with spaces (and weird characters) were silently breaking

When you click a link in a browser, the URL might look like /path%20with%20spaces. That %20 is just how browsers encode a space — it's called percent-encoding. The old code passed the raw URL string directly to the file system, so it would try to literally find a file named path%20with%20spaces instead of path with spaces. Not found. Silent 404.

I wrote a small decoder for this:

RUST
fn hex_val(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'a'..=b'f' => Some(c - b'a' + 10),
        b'A'..=b'F' => Some(c - b'A' + 10),
        _ => None,
    }
}

fn percent_decode_path(input: &str) -> String {
    let mut decoded = Vec::with_capacity(input.len());
    let b = input.as_bytes();
    let mut i = 0;
    while i < b.len() {
        if b[i] == b'%' && i + 2 < b.len() {
            if let (Some(hi), Some(lo)) = (hex_val(b[i + 1]), hex_val(b[i + 2])) {
                decoded.push(hi << 4 | lo);
                i += 3;
                continue;
            }
        }
        decoded.push(b[i]);
        i += 1;
    }
    String::from_utf8_lossy(&decoded).into_owned()
}

It walks through the URL byte by byte. When it sees a % followed by two hex digits, it converts those into the actual character. Everything else just passes through unchanged. No external crate needed.


fix 2: Nothing was stopping path traversal

Path traversal is when someone requests something like /../../../etc/passwd — basically using .. to climb up out of the folder you're supposed to be serving from. For a local dev server it's not exactly a crisis, but it's still sloppy to leave open.

The old code just did site.join(path) and trusted that the path was fine. I added a function that walks through each segment of the path and bails out the moment it sees ..:

RUST
fn resolve_site_file(site_canon: &Path, request_target: &str) -> Option<PathBuf> {
    let path_only = request_target.split('?').next()?.split('#').next()?;
    let decoded = percent_decode_path(path_only);

    if decoded.as_bytes().contains(&0) {
        return None;
    }

    let rel = decoded.trim_start_matches('/');
    let mut file_path = site_canon.to_path_buf();

    if !rel.is_empty() {
        for seg in rel.split('/').filter(|s| !s.is_empty()) {
            if seg == ".." {
                return None;
            }
            file_path.push(seg);
        }
    }

    if file_path.is_dir() {
        file_path.push("index.html");
    }

    let real = fs::canonicalize(&file_path).ok()?;
    if !real.starts_with(site_canon) {
        return None;
    }

    real.is_file().then_some(real)
}

There's also fs::canonicalize at the end that's just a standard OS call that resolves the "real" absolute path of a file, following any symlinks along the way. A symlink is like a shortcut on your desktop — it looks like a file in one place but actually points somewhere else. Without canonicalization, a symlink inside site/ could point outside it and we'd happily serve whatever it pointed to. After canonicalizing, we do a starts_with check to confirm the final resolved path is still inside site/. If it isn't, we return None.


the handler became much cleaner

After pulling the logic into resolve_site_file, the actual request handler went from this:

RUST
let mut file_path = site.join(path.trim_start_matches('/'));
if file_path.is_dir() {
    file_path = file_path.join("index.html");
}
if file_path.is_file() {

to this:

RUST
if let Some(file_path) = resolve_site_file(&site_canon, path) {

All the defensive stuff lives in one place now. The function returns Option<PathBuf> either it found a valid safe file, or it didn't and Rust makes you handle both cases at the callsite.


That's it. Not a huge change, but it's the kind of thing that makes a codebase a little more solid. Pretty happy it got merged.