Iteratees in C README The real quick intro First of all, this is all modelled closely after Oleg's original formulation in Haskell which can be found at http://okmij.org/ftp/Streams.html#iteratee so that acts as kind of an authorative reference to the background. Iteratees and Streams ======================= An 'Iteratee' is a data structure that represents a stream processor. It consumes a stream of input elements and at some point produces a result or fails with an error. Streams may consist of elements of any type. Processing proceeds one chunk of input at a time. Accordingly, the 'Stream' type encodes the current "state" of the stream rather than the entire stream. It contains either the next chunk or the end of input. Likewise, 'Iteratee' encodes the current state of stream processing: 'DONE', 'CONT', or 'STOP'. 'DONE' means that the iteratee is finished and will accept no more input. The 'Iteratee' holds only the result. 'CONT' is the "continuation" state in which the iteratee is suspended waiting for more input. The 'Iteratee' holds an arbitrary continuation function. The function takes the input Stream and returns an 'Iteratee' representing the new state.[1] This function is supposed to be a closure which is represented by pairing it with a pointer to an arbitrary environment. Any state information of the iteratee is held in the continuation function. 'STOP' is the error state. For the proof-of-concept, it contains an error message as a plain string. It is also possible to resume from a 'STOP' by including a suitable continuation. A non-resumable error should return itself from the continuation. [1]: 'Iteratee's are treated as immutable values, 'Stream's are modified in place. There are constructor functions for the three states of iteratees: 'done', 'cont', 'stop', as well as a convenience function 'fail' that produces a non-resumable error. In general, it is desirable to define complex iteratees in terms of more primitive ones by means of combinators. Unfortunately, this is not always practical or even possible, so some amount of bare construction is expected. Some primitive iteratees defined in the code: 'head' returns the first element of input. 'chunk' returns the first chunk of input. 'count' consumes all input and yields the number of elements in the stream. 'drop' consumes all input and returns NULL. 'take' consumes all input and returns it as one big chunk. 'take1' is like take but fails on empty input. 'drop1' is like drop but fails on empty input. Combinators defined in the code: 'bind' is like Haskell's monadic binding operator (>>=). It takes an iteratee 'iter' and "binds" it to a continuation function 'k'. The resulting iteratee runs 'iter' first and when that finishes, passes the result to 'k' which returns the iteratee to continue with. Again, the continuation is supposed to be an arbitrary closure which we emulate by passing an arbitrary environment as a void-pointer. 'bind_' is like Haskell's (>>), a variant of bind where the continuation doesn't depend on the first argument's result. I.e. it simply runs its arguments one after the other. 'seq' takes an arbitrary number of iteratees, runs them one after another, and yields each one's result in an array. 'seq_' is like 'seq' but discards the results. Enumerators ============= Iteratees perform their work when they are fed input. This is referred to as "enumerating" a stream (or streams) to them. An 'Enumerator' represents a function that given an iteratee, will feed some input to it. The only reason 'Enumerator' is a separate data type is to be combined with an environment once again. This allows enumerator combinators to be defined. The 'apply' function calls the enumerator on the given 'Iteratee'. Like 'Iteratee', 'Enumerator' is treated as an immutable value type including a constructor function 'enumerator' for convenience. Enumerators defined: 'enumf' enumerates a file descriptor as a stream of bytes. 'enumfile' opens and enumerates a file given by name. Enumerator combinators defined: 'append' constructs an enumerator that runs two enumerators after another. Enumeratees ============= Enumeratees are conceptually enumerators that work over an 'Iteratee'. This is somewhat tricky to explain. Think of an iteratee as representing a certain process (or procedure) broken chunk-wise over some input. So if a normal 'Enumerator' is a procedure that feeds input to an iteratee, an 'Enumeratee' is that process broken over chunks of input. Applying an enumeratee to an iteratee yields, at first, only another iteratee, which we call the outer iteratee, that consumes the input to the enumerator. Feeding the outer iteratee lets the enumerator run. That feeds the inner iteratee (the one the enumeratee was applied to). When the outer iteratee finishes, the result is the state of the inner iteratee. This nesting of iteratees looks confusing but makes it possible to feed the inner iteratee through multiple different enumeratees, and to differentiate an outer from an inner error. Unfortunately, the C type system does not allow any of this to be described properly, so note that 'Enumeratee' is just a typedef to 'Enumerator' and the user must keep track of what to expect for results. More severely, the C type system cannot distinguish between enumerators that run directly when applied and enumeratees that run indirectly through an iteratee. In Haskell parlance, no polymorphism wrt. the base monad. :( This means that enumerator combinators cannot directly be used on enumeratees, even though this is intended conceptually. Instead we define the normal enumerator combinators as convenience wrappers over general counterparts that return a monad. For that purpose, plain "IO actions" are wrapped in 'Iteratee' and run via 'finish', which is in fact aliased to 'run'. There is a combinator 'fuse' for the common case where one wants the inner iteratee to finish with the outer one. The convenience function 'wrap' combines this with 'apply'. For instance, wrap(decode(utf8), iter) = apply(fuse(decode(utf8)), iter) turns 'iter' from an iteratee that consumes decoded characters into one that consumes a byte stream in UTF-8 encoding. Enumeratees defined: 'decode' runs its iteratee argument repeatedly on the input and feeds the stream of results to the inner iteratee. 'filter' passes only those elements of the input that satisfy a predicate to the inner iteratee. 'ewhile' passes elements to the inner iteratee while they satisfy a predicate. (prefixed with 'e' to avoid keyword clash.) 'until' passes elements while they do not satisfy a predicate.