HTTP chunked transfer-coded data read / decode example in C++

Uditha Atukorala

I was writing some code for the bitz-server project the other day and I wanted to read data passed in using the Chunked Transfer Coding (RFC 2616, section 3.6.1). I did some quick googling but I couldn't find a coding example in C++ (or in C).

RFC 2616, section 19.4.6, includes a pseudo-code example of how to decode the "chunked" transfer-coding. It seemed simple enough, so I decided to implement it myself.

This is my approach to implementing a "chunked" transfer-coding decoder in C++.

First I needed a couple of helper functions. I started off with two functions to read data from the socket.

Listing 1 - read data from a socket

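#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/socket.h>

// Read exactly `size` bytes from the socket; the result is shorter only if
// the peer closes the connection or recv() fails first.
std::string read_data( int socket, std::size_t size ) {

        std::vector<char> buffer( size );
        std::string data;

        while ( data.size() < size ) {
                ssize_t n = recv( socket, buffer.data(), size - data.size(), 0 );
                if ( n <= 0 ) {
                        break; // connection closed or error
                }
                data.append( buffer.data(), n ); // append only the bytes received
        }

        return data;

}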

Listing 2 - read a line from a socket

// Read one line terminated by \r\n; the terminator is kept only if incl_endl is true.
std::string read_line( int socket, bool incl_endl = true ) {

        std::string line;
        char c = '\0';

        while ( recv( socket, &c, 1, 0 ) > 0 ) {

                if ( c == '\r' ) {

                        // peek at the next byte without consuming it
                        char next = '\0';
                        ssize_t n = recv( socket, &next, 1, MSG_PEEK );

                        if ( ( n > 0 ) && ( next == '\n' ) ) {

                                // consume the \n
                                recv( socket, &next, 1, 0 );

                                if ( incl_endl ) {
                                        line += "\r\n";
                                }

                                break; // end of line
                        }

                }

                line += c; // ordinary byte, or a bare \r not followed by \n

        }

        return line;

}
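
The line reader depends on recv() with the MSG_PEEK flag returning a byte without removing it from the socket's receive queue. The following is a minimal sketch of that behaviour, assuming a POSIX system; peek_then_read is a hypothetical name used only for this illustration, and the socketpair() is just a convenient stand-in for a real connection.

```cpp
#include <string>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical demo (not part of the decoder): returns the peeked byte
// followed by the bytes that ordinary recv() calls return afterwards.
std::string peek_then_read() {

        int fds[2];
        if ( socketpair( AF_UNIX, SOCK_STREAM, 0, fds ) != 0 ) {
                return "";
        }

        std::string result;
        const char payload[] = "\nX";

        if ( send( fds[0], payload, 2, 0 ) == 2 ) {
                char c = '\0';
                if ( recv( fds[1], &c, 1, MSG_PEEK ) == 1 ) result += c; // look, do not consume
                if ( recv( fds[1], &c, 1, 0 ) == 1 ) result += c;        // the same byte again
                if ( recv( fds[1], &c, 1, 0 ) == 1 ) result += c;        // the following byte
        }

        close( fds[0] );
        close( fds[1] );

        return result;

}
```

The first two reads see the same '\n' byte, which is why the line reader has to issue a second, ordinary recv() to actually consume it.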

 

Now I could read from the socket, but I needed to know what to read. Let's take a look at the following example.

1e\r\n
I am posting this information.\r\n
0\r\n
\r\n

In the above, line 1 contains the length of the "chunk" in hexadecimal: 1e is 30, the number of bytes in the data on line 2. According to RFC 2616, section 3.6.1, this first line can also carry "chunk-extension" information after a semicolon, which can safely be ignored if you don't support any extensions. So I wrote a few more functions to extract the chunk size.
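
To make the framing concrete, here is a rough sketch that goes the other way and produces the wire format from a plain string. It is only an illustration: encode_chunked and its max_chunk parameter are hypothetical names, and it emits no chunk extensions or trailer headers.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the decoder): wrap `body` in the chunked
// transfer-coding, using chunks of at most `max_chunk` bytes.
std::string encode_chunked( const std::string &body, std::size_t max_chunk = 1024 ) {

        std::ostringstream out;

        if ( max_chunk == 0 ) {
                max_chunk = 1; // avoid an endless loop
        }

        for ( std::size_t pos = 0; pos < body.size(); pos += max_chunk ) {
                const std::string chunk = body.substr( pos, max_chunk );
                out << std::hex << chunk.size() << "\r\n"; // chunk size in hex
                out << chunk << "\r\n";                    // chunk data + CRLF
        }

        out << "0\r\n"; // last chunk has size zero
        out << "\r\n";  // empty line ends the (empty) trailer

        return out.str();

}
```

Passing the 30-byte string "I am posting this information." yields exactly the four lines of the example.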

Listing 3 - split a string

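#include <string>
#include <vector>

// Split `str` at any character in `delimiter`; always returns at least one element.
std::vector<std::string> split( const std::string &str, const std::string &delimiter ) {

        std::vector<std::string> result;
        std::string::size_type current = 0;
        std::string::size_type next;

        do {
                next = str.find_first_of( delimiter, current );
                result.push_back( str.substr( current, next - current ) );
                current = next + 1;
        } while ( next != std::string::npos );

        return result;

}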

Listing 4 - convert hex to decimal

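#include <sstream>
#include <string>

// Convert a hex string to its value; parsing stops at the first non-hex
// character, and 0 is returned if no hex digits are found.
unsigned int hextodec( const std::string &hex ) {

        unsigned int dec = 0;
        std::stringstream ss( hex );

        if ( !( ss >> std::hex >> dec ) ) {
                return 0;
        }

        return dec;

}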

Listing 5 - read chunk size

// Read a chunk header line and return the chunk size, ignoring any chunk-extension.
unsigned int read_chunk_size( int socket ) {

        std::string line = read_line( socket );
        std::vector<std::string> chunk_header = split( line, ";" );

        return hextodec( chunk_header.at( 0 ) );

}
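
As an aside, the same header parsing can be sketched without a socket, which makes it easy to check against sample lines. This version assumes C++11 and uses std::stoul with base 16 instead of a string stream; parse_chunk_size is a hypothetical name, not part of the listings above.

```cpp
#include <string>

// Hypothetical helper: parse the chunk size from a header line such as
// "1e\r\n" or "1e;name=value\r\n". std::stoul throws std::invalid_argument
// if the line does not start with a number.
unsigned long parse_chunk_size( const std::string &line ) {

        // drop any chunk-extension that follows the first ';'
        const std::string size_field = line.substr( 0, line.find( ';' ) );

        // base 16; conversion stops at the first non-hex character (e.g. '\r')
        return std::stoul( size_field, nullptr, 16 );

}
```

Note that std::stoul is lenient: it accepts leading whitespace, a sign and an optional 0x prefix, so input from an untrusted peer would need stricter validation than this sketch performs.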

 

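In the earlier example, line 2 contains the chunk data, followed by its own \r\n. Lines 3 and 4 mark the end of the chunked body: a last chunk of size zero, then an empty line (any trailer headers would appear between the two). Keeping that in mind, I wrote the following function to read / decode chunked data.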

Listing 6 - read chunked transfer-coded data

std::string read_chunked( int socket ) {

        unsigned int chunk_size = 0;
        std::string chunked_data;

        while ( ( chunk_size = read_chunk_size( socket ) ) > 0 ) {

                std::string::size_type offset = chunked_data.size();

                // read chunk-data
                chunked_data.append( read_data( socket, chunk_size ) );

                // sanity check
                if ( ( chunked_data.size() - offset ) != chunk_size ) {
                        // short read: return what we have instead of waiting for a trailer
                        return chunked_data;
                }

                // consume the \r\n that follows the chunk-data
                read_data( socket, 2 );

        }

        // skip any trailer headers, up to and including the empty line that ends the body
        while ( read_line( socket, true ).size() > 2 ) ;

        return chunked_data;

}
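
For testing without a live connection, the same decoding steps can be sketched over a buffer that already holds the complete chunked body. This is an illustration under that assumption, not a replacement for the socket version: decode_chunked is a hypothetical name, it requires C++11 for std::stoul, and it skips chunk extensions and trailer headers rather than interpreting them.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a complete chunked body held in memory.
std::string decode_chunked( const std::string &raw ) {

        std::string body;
        std::size_t pos = 0;

        while ( true ) {

                // chunk header line: hex size, optional ";extension", then \r\n
                const std::size_t eol = raw.find( "\r\n", pos );
                if ( eol == std::string::npos ) {
                        throw std::runtime_error( "missing chunk header" );
                }

                const std::string header = raw.substr( pos, eol - pos );
                const std::size_t size =
                        std::stoul( header.substr( 0, header.find( ';' ) ), nullptr, 16 );
                pos = eol + 2;

                if ( size == 0 ) {
                        break; // last chunk; anything after it is trailer
                }

                // the chunk data must be present and followed by \r\n
                const std::size_t remaining = raw.size() - pos;
                if ( ( size > remaining ) || ( remaining - size < 2 ) ||
                     ( raw.compare( pos + size, 2, "\r\n" ) != 0 ) ) {
                        throw std::runtime_error( "truncated or malformed chunk" );
                }

                body.append( raw, pos, size );
                pos += size + 2;

        }

        return body;

}
```

Feeding it the example from earlier, "1e\r\nI am posting this information.\r\n0\r\n\r\n", returns the 30-byte sentence with all of the framing removed.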