Have Data, Need Iterator

As part of the Embedded Demo Project, I am implementing a more full-featured i2c driver for the STM32. It would be nice if the driver’s write function could operate on any iterable container. Give the write operation the destination address and a series of bytes, and bam… you have a blinky LED or maybe data stored to an SD card.

Communication on slow serial interfaces like i2c benefits from hardware accelerators. The particulars of these accelerators vary but the generic version is:

configure hardware/communication particulars
load data into the output buffers
tell the hardware to go

You will often want to write more data than the output buffer can hold at once. In that case, the software iterates through the data and loads the output buffer one chunk at a time. This is where DMA comes in handy … but that will be a future post.

In our i2c write example, we will:

1. set up the controller
2. write 1 byte to the output buffer
3. wait for an event to indicate the output buffer is empty
4. return to step 2 until all data is written
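
To make the four steps concrete, here is a minimal sketch in plain C++ with no sender library involved. Everything in it is hypothetical: mock_i2c stands in for the peripheral, byte_writer holds the iterators between events, and run_mock_transfer fakes the stream of "buffer empty" events that real hardware would raise as interrupts.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an i2c peripheral with a one-byte output buffer.
struct mock_i2c {
    std::vector<std::uint8_t> wire;  // bytes "shifted out" so far
    bool configured = false;

    void configure() { configured = true; }           // step 1
    void load(std::uint8_t b) { wire.push_back(b); }  // step 2
};

// Keeps the iteration state alive between "buffer empty" events.
template <typename Iter>
class byte_writer {
public:
    byte_writer(mock_i2c& hw, Iter first, Iter last)
        : hw_(hw), iter_(first), end_(last) {}

    // Steps 1 and 2: configure, then load the first byte (if there is one).
    void start() {
        hw_.configure();
        load_next();
    }

    // Steps 3 and 4: call this from the "buffer empty" event.
    // Returns true if another byte was loaded, false once the data is gone.
    bool on_buffer_empty() { return load_next(); }

private:
    bool load_next() {
        if (iter_ == end_) { return false; }
        hw_.load(*iter_);
        ++iter_;
        return true;
    }

    mock_i2c& hw_;
    Iter iter_;
    Iter end_;
};

// Drives a whole transfer; the loop is a stand-in for repeated interrupts.
template <typename Iter>
std::vector<std::uint8_t> run_mock_transfer(Iter first, Iter last) {
    mock_i2c hw;
    byte_writer<Iter> writer{hw, first, last};
    writer.start();
    while (writer.on_buffer_empty()) {}
    return hw.wire;
}
```

Nothing blocks between events here, but the price is a hand-written object whose only job is to remember where we are in the data.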

Now I hear you saying, “I thought you already demonstrated this in your CppCon 2025 Groov talk.” In that talk I took advantage of knowing the size of the data array at compile-time. I was able to simply construct the sender chain at compile-time with the proper number of writes:

auto values = stdx::bit_unpack<std::uint8_t>(data);
auto write_data =
    std::apply([](auto ... v, auto last) {
        return
            (async::seq(write_byte(v))
             | ...
             |async::seq(write_last_byte(last) ));
    }, values);

This is great because the entire write chain is composed at compile-time, offering additional optimizations. But what if we don’t know the size at compile-time? What if all we have is the start and end iterators? What can we do?

If this were plain-old sequential code, we might write something like:

while (iter != iter_end) {
   write_byte(*iter);
   ++iter;
}

This is fine if we don’t mind blocking inside write_byte, but in an event-driven system that has to handle many activities at once, blocking is a non-starter.

Asynchronous Functions

Senders provide a nice abstraction so that we can reason about asynchronous functions.

  A | B | C

A runs first, then B runs with the results of A, then C runs with the results of B. Anyone who has dealt with a Unix shell will feel at home. As the user of this construct we don’t need to think about the asynchronicity (more on this in a future blog post).
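
As a rough mental model only (this is not how the library is implemented), the pipe can be pictured as deferred function composition. The stage wrapper and run_pipeline_example below are made-up names, and the sketch is synchronous where real senders are not:

```cpp
#include <utility>

// Hypothetical wrapper that marks a callable as one stage of a pipeline.
template <typename F>
struct stage {
    F fn;
};
template <typename F>
stage(F) -> stage<F>;

// Composing two stages builds a new stage; nothing is executed here.
template <typename F, typename G>
auto operator|(stage<F> a, stage<G> b) {
    return stage{[a = std::move(a), b = std::move(b)](auto&&... args) {
        // Run the left stage, then feed its result to the right stage.
        return b.fn(a.fn(std::forward<decltype(args)>(args)...));
    }};
}

inline int run_pipeline_example() {
    auto a = stage{[] { return 2; }};
    auto b = stage{[](int x) { return x + 3; }};
    auto c = stage{[](int x) { return x * 10; }};

    auto chain = a | b | c;  // describes the work, runs none of it
    return chain.fn();       // A, then B with A's result, then C with B's
}
```

The point to carry forward is the gap between building the chain and running it: anything the chain refers to has to still be alive when it finally runs.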

We are left with some questions for a sender chain:

how are we going to loop?
how will we evaluate when we are done looping?
how do we control the scope of variables?

What seems trivial in our sequential code now seems a bit more daunting.

Write, Increment, Repeat

Perusing the Intel C++ Baremetal Senders and Receivers documentation, we come across the repeat algorithm. This seems promising.

auto send_data = write_byte() | repeat();

If we can:

get the current data word to write_byte
increment to the next data word
repeat if we aren’t at the end of the data list

then this might be a direction.

There are three variations of repeat:

repeat : repeats forever
repeat_n : repeats N times, where N is known at the time of construction
repeat_until : repeats until the passed callable returns true

It seems that we might be able to construct a solution with either repeat_n or repeat_until. Let’s try both and see what the differences are in ease-of-writing and generated code.

`repeat_n`

The repeat_n algorithm runs the sender once and then repeats it N times, so the sender runs N+1 times in total. So if we wanted the chain to run 5 times we might have something like:

auto send_data = write_byte() | repeat_n(4);

Or maybe we could have:

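auto send_data(auto iter, auto iter_end) {
    // repeat_n(N) runs the sender N+1 times, so d bytes need d-1 repeats
    // (assumes a non-empty range)
    auto d = std::distance(iter, iter_end);
    return write_byte() | repeat_n(d - 1);
}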

This certainly looks like it will do the repeating job.
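
The N+1 rule is easy to get wrong by one, so it is worth checking the arithmetic with a synchronous stand-in. model_repeat_n, runs_for and repeats_for_range are hypothetical helpers that only mimic the counting behaviour described above; they are not the library’s repeat_n:

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Stand-in for the counting rule: run once, then repeat n more times.
template <typename Op>
void model_repeat_n(Op op, std::size_t n) {
    op();
    for (std::size_t i = 0; i < n; ++i) { op(); }
}

// How many times does the operation run for a given repeat count?
inline std::size_t runs_for(std::size_t n) {
    std::size_t runs = 0;
    model_repeat_n([&runs] { ++runs; }, n);
    return runs;
}

// Repeat count that visits every element of a non-empty range exactly once.
// An empty range yields 0 here, which would still run the operation once,
// so a real caller has to handle that case separately.
template <typename Iter>
std::size_t repeats_for_range(Iter first, Iter last) {
    auto d = std::distance(first, last);
    return d > 0 ? static_cast<std::size_t>(d - 1) : 0;
}
```

Under this model a range of d bytes needs d - 1 repeats; passing the distance itself would write one byte too many.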

`repeat_until`

The repeat_until algorithm takes a predicate that will determine when to stop. Using our previous example:

auto send_data(auto iter, auto iter_end) {
    return write_byte() | repeat_until([&](){return ++iter == iter_end;});
}

This might seem nice at first glance, but senders are lazily evaluated: by the time the chain actually runs, send_data has already returned, so the references to iter and iter_end are dangling. We somehow need to get the state into the sender chain so that it stays in scope.

Welcome to structured concurrency. As we build up a sender chain, we want to put the state inside the chain.

auto send_data(auto iter, auto iter_end) {
    return let_value([iter, iter_end]() mutable {
            return
                write_byte()
              | repeat_until([&](){return ++iter == iter_end;})
              ;
    });
}

let_value takes a callable that will return a sender when called. The callable is copied into the operation state when the sender connects to the receiver. This means the callable will stay in scope during the lifetime of the sender chain execution.

In the above example, iter and iter_end are captured into the closure object which is stored in the op-state of the let_value. repeat_until is referencing the captured data.
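
The ownership story can be modelled without any senders at all. In this sketch model_op_state and make_send_model are hypothetical names, and the loop in run() is a synchronous stand-in for the repeating chain; the only thing it is meant to show is who owns the iterators and who merely refers to them:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for an operation state: it owns the callable, so the callable's
// captures live exactly as long as the operation does.
template <typename F>
struct model_op_state {
    F make_inner;  // the closure object, with iter and iter_end inside it

    // Builds the inner predicate from the stored closure, then drives it.
    // Returns how many times the (imaginary) body ran.
    std::size_t run() {
        auto done = make_inner();  // refers to captures stored in *this
        std::size_t runs = 1;      // the body runs once before the first test
        while (!done()) { ++runs; }
        return runs;
    }
};

template <typename Iter>
auto make_send_model(Iter iter, Iter iter_end) {
    // The outer closure owns copies of both iterators.
    auto closure = [iter, iter_end]() mutable {
        // The inner predicate refers to the copies held by the outer closure,
        // not to the parameters of make_send_model.
        return [&] { return ++iter == iter_end; };
    };
    return model_op_state<decltype(closure)>{std::move(closure)};
}
```

The references are only formed inside run(), after the closure has reached its final home, which is why the by-reference capture in the inner lambda is safe here and was not in the earlier version.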

This is pretty nice and also a bigger hammer than we need right now. When the callable to let_value takes no arguments it is usually an indicator that the less-powerful and lighter-weight sequence can be used:

auto send_data(auto iter, auto iter_end) {
    return sequence([iter, iter_end]() mutable {
            return
                write_byte()
              | repeat_until([&](){return ++iter == iter_end;})
              ;
    });
}

Another way to capture values in the sender chain is to use just.

auto send_data(auto iter, auto iter_end) {
    return
       just(iter, iter_end)
     | let_value([](auto & iter, auto & iter_end) {
           return
              write_byte()
            | repeat_until([&](){return ++iter == iter_end;})
            ;
       })
     ;
}

Capturing the values in the just op-state is probably more canonical in the S/R world. The resulting codegen and memory usage are basically the same.

Putting it Together

We now have some repeating, asynchronous functions but we haven’t sent any data. Let’s build on the lighter-weight sequence variation to iterate through the data.

auto send_data(auto iter, auto iter_end) {
    return sequence([iter, iter_end]() mutable {
            return
               just()
             | then([&iter]() -> std::uint8_t {
                 return *iter++;
               })
             | write_byte()
             | wait_write_done()
             | repeat_until([&iter, &iter_end]() {
                 return iter == iter_end;
               })
             ;
    });
}

sequence takes a callable that takes no arguments and returns a sender. We capture the iter and iter_end within the closure object, as previously discussed.

just is a factory that gets us going … we need a sender to start the chain.
then is an adapter that takes a callable. In our case, we don’t need anything from the value channel, so we just capture iter by reference. We return the value the iterator currently points at and increment the iterator. That returned value is sent on the value channel.
Let’s assume write_byte() is an adapter that returns a sender that takes the value to write from the value channel and puts it into the hardware buffer.
wait_write_done is an adapter that returns a sender that completes when the hardware has finished shifting out the written value. We are going to leave this hand-wavy until the future post on interrupts (optionally… see any of my past conference talks on baremetal senders).
repeat_until repeats the chain until the iterator reaches the end.
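
To sanity-check the order of operations, here is a synchronous stand-in for the chain. It assumes repeat_until runs the body first and only then asks the predicate, which is how the chain above is written; model_repeat_until and model_send_data are hypothetical names and the vector plays the part of the hardware buffer:

```cpp
#include <cstdint>
#include <vector>

// Stand-in for the assumed control flow: run the body, then test the predicate.
template <typename Body, typename Pred>
void model_repeat_until(Body body, Pred done) {
    do {
        body();
    } while (!done());
}

// Mirrors the chain: fetch and post-increment, "write", then test for the end.
// Precondition: the range is not empty, because the first fetch happens
// before the predicate is ever consulted.
template <typename Iter>
std::vector<std::uint8_t> model_send_data(Iter iter, Iter iter_end) {
    std::vector<std::uint8_t> written;  // stand-in for the hardware buffer
    model_repeat_until(
        [&] { written.push_back(*iter++); },  // then + write_byte
        [&] { return iter == iter_end; });    // repeat_until predicate
    return written;
}
```

Each byte is written exactly once, in order. The precondition is worth noticing: if the real chain behaves like this model, a caller handing it an empty range would need a guard in front of it.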