Andrzej unjello Lichnerowicz

Rust Macros: A Cautionary Tale

2024-11-09T18:41:15+02:00

I pitched the idea of rewriting one of the Kubernetes services I work with in Rust, and I got the green light. Long story short, I’ve been writing a lot of Rust lately. It’s obviously not Zig (<3), but it’s way better than some alternatives :)

One of the trickiest parts of Rust – even for experienced Rustaceans – is definitely the macro system. So, how did I find myself neck-deep in Rust macros? I was building a tool that needed to communicate with a few internal services, each with its own client connection pattern. The goal was to create multiple clients that followed a similar structure: connect to a service and fetch data. This seemed like a perfect use case for a Trait.

But here’s the catch: each service had its own connection requirements. Some used Basic Auth, others relied on Bearer Tokens, and a few needed full OAuth authorization. Some required a username and password, some wanted a token and username, and others just needed a token. It was a bit all over the place. This felt like the ideal situation for the builder pattern: set up each client with a builder, and then interact with them through a Trait.
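This is a minimal std-only sketch of the "interact through a Trait" half of that design. All of these names are hypothetical stand-ins: the trait `ServiceClient`, the two client structs and the `fetch_all` helper. `fetch` is stubbed and only describes the request it would make, so no HTTP or authentication happens. The builders are left out, and each client is constructed directly.

```rust
/// Hypothetical common interface: every client can fetch data from its service.
pub trait ServiceClient {
    fn name(&self) -> &str;
    /// Stubbed: describes the request instead of performing it.
    fn fetch(&self, path: &str) -> String;
}

/// A client that authenticates with a username and password (Basic Auth).
pub struct PasswordClient {
    pub base_url: String,
    pub username: String,
    pub password: String,
}

/// A client that only needs a bearer token.
pub struct TokenClient {
    pub base_url: String,
    pub token: String,
}

impl ServiceClient for PasswordClient {
    fn name(&self) -> &str {
        "password"
    }
    fn fetch(&self, path: &str) -> String {
        // A real client would send an `Authorization: Basic ...` header here.
        format!("GET {}{} as {}", self.base_url, path, self.username)
    }
}

impl ServiceClient for TokenClient {
    fn name(&self) -> &str {
        "token"
    }
    fn fetch(&self, path: &str) -> String {
        // A real client would send `Authorization: Bearer <token>` here.
        format!("GET {}{} with bearer token", self.base_url, path)
    }
}

/// Callers only see the trait, regardless of how each client authenticates.
pub fn fetch_all(clients: &[Box<dyn ServiceClient>], path: &str) -> Vec<String> {
    clients.iter().map(|c| c.fetch(path)).collect()
}
```

Each client can need completely different connection settings. Once it is built, the rest of the code treats every client the same way.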

My initial drafts, though, were repetitive and clunky. The solution? Macros!

Overview of Existing Crates

First, I looked for crates I could reuse. There’s an excellent and popular crate, derive_builder, and another one, builder_macro. Unfortunately, neither of them did exactly what I had in mind. I wanted a feature where if I defined a field as an Option in my struct, it would be optional. This might be a questionable pattern, but I wanted to defer calculating certain default values until later.
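Here is the kind of builder I wanted to end up with, written out by hand as a std-only sketch. `ClientBuilder`, the `with_` setters and the plain `String` error are illustrative choices, not any crate's API. The intended behavior has three parts:

- A non-`Option` field is required when `build` runs.
- An `Option` field can be left unset.
- The setter for an `Option` field takes the inner value.

```rust
#[derive(Debug, PartialEq)]
pub struct Client {
    pub required: bool,
    // Left as None when unset, so a default can be computed later.
    pub optional: Option<bool>,
}

/// Every field is stored as an Option until `build` is called.
#[derive(Debug, Default)]
pub struct ClientBuilder {
    required: Option<bool>,
    optional: Option<bool>,
}

impl ClientBuilder {
    pub fn with_required(&mut self, value: bool) -> &mut Self {
        self.required = Some(value);
        self
    }

    // Takes the inner type: `.with_optional(true)`, not `.with_optional(Some(true))`.
    pub fn with_optional(&mut self, value: bool) -> &mut Self {
        self.optional = Some(value);
        self
    }

    pub fn build(&self) -> Result<Client, String> {
        Ok(Client {
            // A missing required field is an error...
            required: self
                .required
                .ok_or_else(|| "Field 'required' is required".to_string())?,
            // ...while a missing optional field simply stays None.
            optional: self.optional,
        })
    }
}
```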

At first glance, derive_builder looked perfect because it has a setting to strip the Option from fields:

use derive_builder::Builder;

#[derive(Builder, Default, Debug)]
#[builder(setter(strip_option))]
pub struct Client {
    pub required: bool,
    pub optional: Option<bool>
}

fn main() {
    let client = ClientBuilder::default()
        .required(true)
        .build()
        .unwrap(); // panics: with strip_option, `optional` has become a required field
    println!("{:?}", client);
}

But when I looked at the generated code:

pub fn build(
    &self,
) -> ::derive_builder::export::core::result::Result<Client, ClientBuilderError> {
    Ok(Client {
        // [...]
        optional: match self.optional {
            Some(ref value) => {
                ::derive_builder::export::core::clone::Clone::clone(value)
            }
            None => {
                return ::derive_builder::export::core::result::Result::Err(
                    ::derive_builder::export::core::convert::Into::into(
                        ::derive_builder::UninitializedFieldError::from("optional"),
                    ),
                );
            }
        },
    })
}

It turns out that stripping the Option makes the field required, which was not what I wanted at all. The other crate, builder_macro, also didn’t work for me; it kept Option in the setters, creating code like .with_optional(Some(true)), which felt messy. So, of course, I thought, “I should write a macro to automate this!”


Type Alias and Struct Limitations in Macros

Initially, I hoped that a macro could declare a brand-new struct and hand it back as a type, something like:

type Builder = builder!({
    example: bool,
});

Unfortunately, as described in The Rust Reference, a type alias only gives a new name to an existing type, so it can’t stand in for a struct that the macro would declare on the spot. A type alias also can’t be used as the constructor of a tuple struct, so direct initialization like this doesn’t work:

let _ = Builder();

A macro invoked in type position can expand to an existing type, but it can’t declare a new struct (or any other new type) there. Struct definitions have to come from a macro invoked in item position. While this isn’t a huge problem, it’s something to keep in mind while designing complex macro patterns.
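This std-only sketch shows the three rules from the paragraphs above. All names (`optional!`, `declare!`, `Settings`, `Wrapper`) are made up for illustration.

- A macro in type position may expand to an existing type.
- A new struct must come from a macro invoked in item position.
- An alias of a tuple struct can’t be called like a constructor.

```rust
// A macro in type position may expand to an *existing* type...
macro_rules! optional {
    ($t:ty) => {
        Option<$t>
    };
}
pub type MaybeFlag = optional!(bool);

// ...but declaring a brand-new struct requires invoking a macro in item position.
macro_rules! declare {
    ($name:ident { $( $fname:ident : $ftype:ty ),* $(,)? }) => {
        #[derive(Debug, Default, PartialEq)]
        pub struct $name {
            $( pub $fname: $ftype, )*
        }
    };
}
declare!(Settings { example: bool });

// A type alias of a tuple struct is usable as a type, but not as a constructor.
pub struct Wrapper(pub bool);
pub type WrapperAlias = Wrapper;
// let w = WrapperAlias(true); // does not compile: only `Wrapper(true)` constructs it
```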

Macro-by-Example

My first approach was pretty direct… and it failed spectacularly. So I took a step back and actually read through the documentation, namely the excellent The Little Book of Rust Macros. If you read it carefully, it explains everything you need to know, though without prior experience (and failures), it’s easy to overlook details.

Here’s what my first attempt looked like:

macro_rules! builder {
    (@builder_field_type Option<$ftype:ty>) => { Option<$ftype> };
    (@builder_field_type $ftype:ty) => { Option<$ftype> };
    ($builder:ident -> $client:ident { $( $fname:ident: $ftype:ty+ $(,)? )* }) => {
        #[derive(Debug)]
        pub struct $client {
            $( $fname: $($ftype)+, )*
        }

        #[derive(Debug)]
        pub struct $builder {
            $( $fname: $crate::builder!(@builder_field_type $($ftype)+), )*
        }
    };
}

builder!(Builder -> Client {
    example: bool,
    other: String
})

This approach uses a neat trick called Internal Rules, which allows calling a macro within itself without polluting the global namespace. It also features a Builder -> Client syntax I borrowed from builder_macro. I liked this syntax because it was a good compromise between readability and macro hygiene.

Unfortunately, it didn’t work. Rust treats a matched fragment like $ftype:ty as opaque. Once it’s matched, it can’t be broken down further, so forwarding it to the @builder_field_type rules never matches the literal Option<$ftype:ty> arm. Instead of $ftype:ty, we’d need to capture the raw tokens with $($ftype:tt)+. But that repetition is greedy: it has no way to know where the type ends, so it runs on into the comma and the next field. The solution? Use [], {}, or () as a natural boundary, like this:

#[macro_export]
macro_rules! builder {
    (@builder_field_type Option<$ftype:ty>) => { Option<$ftype> };
    (@builder_field_type $ftype:ty) => { Option<$ftype> };
    ($builder:ident -> $client:ident { $( $fname:ident{$($ftype:tt)+} $(,)? )* }) => {
        #[derive(Debug)]
        pub struct $client {
            $( $fname: $($ftype)+, )*
        }

        #[derive(Debug)]
        pub struct $builder {
            $( $fname: $crate::builder!(@builder_field_type $($ftype)+), )*
        }
    };
}

builder!(Builder -> Client{
  example{bool},
  other{Option<String>}
});

Here, I used {} as a delimiter because it resembles C++11 initializer syntax.
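The opaque-fragment problem is easy to reproduce in isolation. This std-only sketch uses hypothetical helpers `is_option!`, `via_ty!` and `via_tt!`. It forwards the same `Option<u8>` tokens twice: once as a `ty` fragment and once as raw `tt` tokens. Only the `tt` version still reaches the literal `Option<...>` arm, which is why the fields above are captured as `tt` inside `{}`.

```rust
// Detects `Option<...>` by matching the literal `Option` token.
macro_rules! is_option {
    (Option<$inner:ty>) => {
        true
    };
    ($other:ty) => {
        false
    };
}

// Forwards a `ty` fragment: it arrives at `is_option!` as one opaque type,
// so the literal `Option<...>` arm can't look inside it and the fallback arm wins.
macro_rules! via_ty {
    ($t:ty) => {
        is_option!($t)
    };
}

// Forwards raw token trees, which `is_option!` can still match literally.
macro_rules! via_tt {
    ($($t:tt)+) => {
        is_option!($($t)+)
    };
}
```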

The next hurdle was generating methods like with_example from a field named example. macro_rules! can’t concatenate identifiers. concat_idents!, which is only available on nightly, can’t be used to name a function definition. Thankfully, there’s a crate called paste that can handle this. The final result was satisfying:

#[macro_export] // required so the internal rules can call back in via `$crate::builder!`
macro_rules! builder {
    (@builder_field_type Option<$ftype:ty>) => { Option<$ftype> };
    (@builder_field_type $ftype:ty) => { Option<$ftype> };
    (@builder_field_setter_type Option<$ftype:ty>) => { $ftype };
    (@builder_field_setter_type $ftype:ty) => { $ftype };
    (@builder_unwrap_field $self:ident $fname:ident Option<$ftype:ty>) => { $self.$fname.clone() };
    (@builder_unwrap_field $self:ident $fname:ident $ftype:ty) => { $self.$fname.clone().ok_or_else(|| format!("Field '{}' is required", stringify!($fname)))? };
    ($builder:ident -> $client:ident { $( $fname:ident{$($ftype:tt)+} $(,)? )* }) => {
        #[derive(Debug)]
        pub struct $client {
            $( $fname: $($ftype)+, )*
        }

        #[derive(Debug)]
        pub struct $builder {
            $( $fname: $crate::builder!(@builder_field_type $($ftype)+), )*
        }

        impl $builder {
            $(
                paste::paste! {
                pub fn [<with_ $fname>](&mut self, $fname: $crate::builder!(@builder_field_setter_type $($ftype)+)) -> &mut Self {
                    self.$fname = Some($fname);
                    self
                }
                }
            )*

            pub fn build(&self) -> Result<$client, std::boxed::Box<dyn std::error::Error>> {
                Ok($client {
                    $( $fname: $crate::builder!(@builder_unwrap_field self $fname $($ftype)+), )*
                })
            }
        }

        impl $client {
            pub fn builder() -> $builder {
                $builder {
                    $( $fname: None, )*
                }
            }
        }
    };
}

builder!(Builder -> Client{
    field{bool},
});

fn main() {
    let client = Client::builder().with_field(true).build().unwrap();
    assert!(client.field);
}
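If you would rather avoid the extra paste dependency, a std-only workaround is to have the caller spell out each setter name. This is a different technique from the one used above. macro_rules! can then accept the setter name as an ordinary ident. Here is a minimal sketch with a hypothetical setters! macro:

```rust
// Without `paste`, the setter name has to be passed in: `field => setter: Type`.
macro_rules! setters {
    ($builder:ident { $( $fname:ident => $setter:ident : $ftype:ty ),* $(,)? }) => {
        #[derive(Debug, Default)]
        pub struct $builder {
            $( pub $fname: Option<$ftype>, )*
        }

        impl $builder {
            $(
                /// Stores the value for one field and returns the builder for chaining.
                pub fn $setter(&mut self, value: $ftype) -> &mut Self {
                    self.$fname = Some(value);
                    self
                }
            )*
        }
    };
}

setters!(ExampleBuilder {
    example => with_example: bool,
    name => with_name: String,
});
```

The trade-off is a noisier invocation in exchange for one fewer dependency.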

Procedural Macros

With macro-by-example working, I decided to tackle proc-macros. Right now, you’re probably thinking: but why? I don’t have a good answer to that except: why not? :) To be honest, I’d been kind of wary of them. But since I was already deep into macros, it felt like the right time to dive in. I was also curious which approach would turn out better, and how different the two would be. And I already had a pretty good use case.

Proc-macros need to live in a separate crate, one with proc-macro = true in the [lib] section of its Cargo.toml. They take in a TokenStream and return another TokenStream. Because that transformation is ordinary Rust code, they allow for a bit more flexibility. Here’s what a basic derive macro looks like:

use proc_macro::TokenStream;
use quote::quote;

#[proc_macro_derive(Builder)]
pub fn builder_derive(input: TokenStream) -> TokenStream {
    // ...
    let expanded = quote! {
        #[derive(Debug)]
        pub struct Builder {
        }
    };
    TokenStream::from(expanded)
}

Now, in our main crate, we could do something like this:

use builder_option::Builder;
#[derive(Debug, Builder)]
pub struct Client {
    field: bool,
}

In this example, the Builder macro would generate a Builder struct for us automatically. Since we have access to the entire TokenStream, we can dynamically create names and fields for the Builder struct based on the struct being derived.

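Hard-coding the name Builder has an obvious problem: deriving it on two structs in the same module would emit two structs with the same name. One neat thing about proc-macros is that we can build names programmatically from the struct we’re deriving. For example, let’s generate the name of our builder struct by appending “Builder” to the name of the original struct: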

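use proc_macro::TokenStream;
use proc_macro2::Span;
use quote::quote;
use syn::{parse_macro_input, DeriveInput, Ident};

#[proc_macro_derive(Builder)]
pub fn builder_derive(input: TokenStream) -> TokenStream {
    let input = parse_macro_input!(input as DeriveInput);

    let name = input.ident;
    // `Span::call_site()` keeps the new name (e.g. `ClientBuilder`) usable from the
    // deriving code; `Span::def_site()` is not available on stable Rust.
    let builder_name = Ident::new(&format!("{}Builder", name), Span::call_site());
    let expanded = quote! {
        #[derive(Debug)]
        pub struct #builder_name {
        }
    };

    TokenStream::from(expanded)
}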

To make our builder actually useful, we’ll want to populate the Builder struct with the same fields as the original struct, but with a twist: if a field is an Option, we’ll strip the Option to have a nice setter syntax. Here’s how we can achieve that:

1.	First, let’s make sure the macro is only applied to structs and extract their named fields.
2.	Then, we’ll check each field’s type and strip Option if needed.

Here’s what that looks like in code:

    let fields = match input.data {
        Data::Struct(ref data_struct) => match data_struct.fields {
            Fields::Named(ref fields_named) => &fields_named.named,
            _ => {
                return syn::Error::new_spanned(
                    &data_struct.fields,
                    "Builder can only be derived for structs with named fields",
                )
                .to_compile_error()
                .into();
            }
        },
        _ => {
            return syn::Error::new_spanned(&name, "Builder can only be derived for structs")
                .to_compile_error()
                .into();
        }
    };
    
    // Returns (true, T) for a bare `Option<T>` and (false, ty) otherwise.
    // Only the first path segment is checked, so `std::option::Option<T>` is not detected.
    fn inner_type(ty: &Type) -> (bool, &Type) {
        if let Type::Path(type_path) = ty {
            if let Some(segment) = type_path.path.segments.first() {
                if segment.ident == "Option" {
                    if let syn::PathArguments::AngleBracketed(ref angle_bracketed) =
                        segment.arguments
                    {
                        if let Some(syn::GenericArgument::Type(ref inner_ty)) =
                            angle_bracketed.args.first()
                        {
                            return (true, inner_ty);
                        }
                    }
                }
            }
        }
        (false, ty)
    }

    let builder_fields = fields.iter().map(|f| {
        let name = &f.ident;
        let ty = &f.ty;
        // Store Option<inner>, so an `Option<T>` field doesn't become Option<Option<T>>.
        let (_, inner_ty) = inner_type(ty);

        quote! {
            #name: std::option::Option<#inner_ty>,
        }
    });
    

Finally, we can implement a build method on the builder, which checks each field and either takes the provided value or returns an error if a required field is missing:

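use proc_macro::TokenStream;
use proc_macro2::Span;
use quote::quote;
use syn::{parse_macro_input, Data, DeriveInput, Fields, Ident, Type};

#[proc_macro_derive(Builder)]
/// A macro to create a corresponding builder for an annotated struct
pub fn builder_derive(input: TokenStream) -> TokenStream {
    let input = parse_macro_input!(input as DeriveInput);

    let name = input.ident;
    // call_site (not the unstable def_site) keeps `#builder_name` nameable from user code
    let builder_name = Ident::new(&format!("{}Builder", name), Span::call_site());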

    let fields = match input.data {
        Data::Struct(ref data_struct) => match data_struct.fields {
            Fields::Named(ref fields_named) => &fields_named.named,
            _ => {
                return syn::Error::new_spanned(
                    &data_struct.fields,
                    "Builder can only be derived for structs with named fields",
                )
                .to_compile_error()
                .into();
            }
        },
        _ => {
            return syn::Error::new_spanned(&name, "Builder can only be derived for structs")
                .to_compile_error()
                .into();
        }
    };

    fn inner_type(ty: &Type) -> (bool, &Type) {
        if let Type::Path(type_path) = ty {
            if let Some(segment) = type_path.path.segments.first() {
                if segment.ident == "Option" {
                    if let syn::PathArguments::AngleBracketed(ref angle_bracketed) =
                        segment.arguments
                    {
                        if let Some(syn::GenericArgument::Type(ref inner_ty)) =
                            angle_bracketed.args.first()
                        {
                            return (true, inner_ty);
                        }
                    }
                }
            }
        }
        (false, ty)
    }
    let builder_fields = fields.iter().map(|f| {
        let name = &f.ident;
        let ty = &f.ty;
        let (_, inner_ty) = inner_type(ty);

        quote! {
            #name: std::option::Option<#inner_ty>,
        }
    });

    let builder_init = fields.iter().map(|f| {
        let name = &f.ident;
        quote! {
            #name: None,
        }
    });

    let build_fields = fields.iter().map(|f| {
        let name = &f.ident;
        let ty = &f.ty;
        let field_name_str = name.as_ref().unwrap().to_string();
        let (is_option, _) = inner_type(ty);

        if is_option {
            quote! {
                #name: self.#name.clone(),
            }
        } else {
            quote! {
                #name: self.#name.clone().ok_or_else(|| format!("Field '{}' is required", #field_name_str))?,
            }
        }
    });

    let builder_methods = fields.iter().map(|f| {
        let name = &f.ident;
        let ty = &f.ty;
        let (_, inner_ty) = inner_type(ty);
        let method_name = syn::Ident::new(
            &format!("with_{}", name.as_ref().unwrap()),
            Span::call_site(),
        );

        quote! {
            pub fn #method_name(&mut self, #name: #inner_ty) -> &mut Self {
                self.#name = std::option::Option::Some(#name);
                self
            }
        }
    });

    let expanded = quote! {
        #[derive(Debug)]
        pub struct #builder_name {
             #(#builder_fields)*
        }

        impl #builder_name {
           #(#builder_methods)*

           pub fn build(&self) -> Result<#name, std::boxed::Box<dyn std::error::Error>> {
                Ok(#name {
                    #(#build_fields)*
                })
           }
        }

        impl #name {
            pub fn builder() -> #builder_name {
                #builder_name {
                    #(#builder_init)*
                }
            }
        }
    };

    TokenStream::from(expanded)
}

This final version of our proc-macro generates a builder struct with a with_ setter for each field. It also generates a build method that enforces the required fields while leaving Option fields optional. Now we have a working builder for any non-generic struct with named fields that is decorated with #[derive(Builder)], thanks to the flexibility of proc-macros!
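Both versions rely on one detail. build returns Box<dyn Error>, yet each missing-field error starts life as a String from format!. The ? operator bridges the two through the standard library’s From<String> for Box<dyn Error> impl. Here is a std-only sketch of that single step, using a hypothetical require helper:

```rust
use std::error::Error;

/// Hypothetical helper mirroring one generated `build` line:
/// clone the value out, or fail with a "Field '...' is required" message.
pub fn require<T: Clone>(field: &Option<T>, name: &str) -> Result<T, Box<dyn Error>> {
    // `ok_or_else` yields Result<T, String>; `?` converts the String into Box<dyn Error>.
    let value = field
        .clone()
        .ok_or_else(|| format!("Field '{}' is required", name))?;
    Ok(value)
}
```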

Packaging Both Macros

To package both the macro-by-example and the proc-macro versions, I created two crates and made the proc-macro a dependency of the main crate, conditionally enabled by a feature flag:

[package]
name = "builder_option"
version = "0.1.0"
edition = "2021"

[dependencies]
paste = "1.0.15"
builder_option_derive = { version = "0.1.0", path = "../builder_option_derive", optional = true }

[features]
derive = ["builder_option_derive"]

In lib.rs, we can conditionally export the appropriate macro depending on the feature flag:

#[cfg(not(feature = "derive"))]
#[macro_export]
/// Macro to declare a struct and a corresponding builder.
macro_rules! builder {
    // ...
}

#[cfg(feature = "derive")]
/// re-export the `Builder` macro from `builder_option_derive` here if a `derive` feature is enabled
pub use builder_option_derive::Builder;

With this setup, users get the macro-by-example builder! by default, or can enable the derive feature to switch to the proc-macro #[derive(Builder)]. Because of the cfg(not(feature = "derive")) guard, the two are mutually exclusive: enabling derive hides builder!.

Wrapping Up

There are still a few advanced concepts I haven’t fully grasped, like all of the intricacies of builder_macro, but I’m now far more comfortable diving into Rust’s macro system. The journey through both macro-by-example and proc-macro approaches was challenging, but the flexibility and power of Rust macros are impressive once you get the hang of them.

I hope this helps someone who’s just starting out with Rust macros or needs to generate some repetitive code more elegantly. Happy coding, and remember: sometimes the macro rabbit hole is worth it!
