I need some general information about OpenSSL BIO. Some kind of introduction to it. What is OpenSSL BIO? What is its general idea? I know that it is some kind of API for input/o
What is OpenSSL BIO?
OpenSSL BIO is an API that provides input/output related functionality. The acronym BIO stands for Basic Input/Output.
What is its general idea? How is it different from stdio API or sockets API?
The first idea is that it is not an API for one specific type of I/O (for example, files or the network). It is a generalized API for various types of entities capable of input/output operations. It is similar to C++ abstract classes with pure virtual functions: you use a single interface, but the behavior depends on which specific BIO object is used. For example, it can be a socket object or a file object. If you use the BIO_write function with a socket object, the data will be sent over the network. If you use the BIO_write function with a file object, the data will be written to a file.
The second idea behind the OpenSSL BIO API is that BIO objects can be stacked together into a single linear chain. This lets you process data through different filters before sending it to the final output (sink) or after reading it from the initial input (source). Filters are also BIO objects.
What is a filter BIO? What is a source BIO? What is a sink BIO?
An OpenSSL filter BIO is a BIO that takes data, processes it and passes it to another BIO.
An OpenSSL source BIO is a BIO that doesn't take data from another BIO but takes it from somewhere else (from a file, network, etc.).
An OpenSSL sink BIO is a BIO that doesn't pass data to another BIO but transfers it to somewhere else (to a file, network, etc.).
As regards source and sink BIOs, there are no BIOs that are only sources and no BIOs that are only sinks; there are only "source-sink" BIOs. A BIO that is a source BIO is a sink BIO as well. For example, a socket BIO is a source BIO and a sink BIO at the same time. When data is written to a socket BIO, the BIO works as a sink. When data is read from a socket BIO, the BIO works as a source. Source-sink BIOs are always the terminating section of a BIO chain. This is different from a usual data processing pipeline, where the source is the start of the pipeline and the sink is the end of the pipeline.
How can I run data through a filter BIO? Do I have to feed the data to the filter BIO using the BIO_write function and get processed data using the BIO_read function?
If you put data into a filter using the BIO_write function, you can't get the processed data by simply calling the BIO_read function on that BIO. Filter BIOs work in a different way. A filter BIO may avoid storing processed data in a buffer. It may just take the input data, process it and immediately pass it to the next BIO in the chain using the same BIO_write function you used to put your data into the BIO. The next BIO, in turn, may, after processing, write the data to the next BIO in the chain. The process stops if either some BIO stores the data in its internal buffer (if it doesn't have enough data to generate output for the next BIO) or if the data reaches the sink.
If you just need to run data through a filter BIO without sending it over the network or writing it to a file, you can attach the filter BIO to an OpenSSL memory BIO (i.e. make the following chain: filter bio <-> memory bio). A memory BIO is a source-sink BIO, but it doesn't send data anywhere; it just stores the data in a memory buffer. After you write data to the filter BIO, the data is written to the memory BIO, which stores it in the memory buffer. A memory BIO has a special interface to get the data directly from the buffer (though you can also use BIO_read to get the data that was written to a memory BIO, see below).
Reading from a filter BIO works the opposite way. If you request to read data from a filter BIO, the filter BIO may, in turn, request to read data from the next BIO in the chain. The process stops if either some BIO has enough buffered data to return or if the process reaches the source BIO. A single call to the BIO_read function on a filter BIO may result in multiple calls to the BIO_read function inside the filter BIO to get data from the next BIO. A filter BIO will continue to call BIO_read until it gets enough data to generate a processed result.
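The situation is more complicated if the source-sink BIO of a chain works in non-blocking mode, for example when non-blocking sockets or a memory BIO are used (memory BIOs are non-blocking by nature). In that case a read or write may fail simply because the operation cannot complete yet, and the caller has to retry it later.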
Also note that reading from a filter BIO performs the reverse of the processing done when writing to that BIO. For example, if you use a cipher BIO, then writing to the BIO will encipher the written data, but reading from that BIO will decipher the input data. This makes a chain such as the following possible: your code <-> cipher BIO <-> socket BIO. You write unencrypted data to the cipher BIO, which encrypts it and sends it to the socket. When you read from the cipher BIO, it first gets encrypted data from the socket, then decrypts it and returns unencrypted data to you. This allows you to set up an encrypted channel over the network. You just use BIO_write and BIO_read, and all encryption/decryption is done automatically by the BIO chain.
In general, a BIO chain looks like the following diagram:
/------\                 /--------\                 /---------\                 /-------------\
| your | -- BIO_write -> | filter | -- BIO_write -> | another | -- BIO_write -> | source/sink |
|      |                 |        |                 | filter  |                 |             |
| code | <- BIO_read  -- |  BIO   | <- BIO_read  -- |   BIO   | <- BIO_read  -- |     BIO     |
\------/                 \--------/                 \---------/                 \-------------/
Why are BIOs needed in OpenSSL? How are they used when programming with OpenSSL? Any examples?
OpenSSL uses BIOs to communicate with the remote side when running the SSL/TLS protocol. The SSL_set_bio function sets up the BIOs used for communication by a specific instance of an SSL/TLS link. You can use a socket BIO, for example, to run the SSL/TLS protocol over a network connection. But you can also develop your own BIO (yes, it is possible) or use a memory BIO to run the SSL/TLS protocol over your own type of link.
You can also wrap an instance of an SSL/TLS link as a BIO itself (BIO_f_ssl). Calling BIO_write on an SSL BIO will result in calling SSL_write. Calling BIO_read will result in calling SSL_read.
Although an SSL BIO is a filter BIO, it is a little different from other filter BIOs. Calling BIO_write on an SSL BIO may result in a series of both BIO_read and BIO_write calls on the next BIO in the chain. This is because SSL_write (which is used inside BIO_write of an SSL BIO) not only sends data but also runs the SSL/TLS protocol, which may require multiple data exchange steps between the two sides to perform some negotiation. The same is true for BIO_read of an SSL BIO. That is how SSL BIOs differ from ordinary filter BIOs.
Also note that you are not required to use an SSL BIO. You can still use SSL_read and SSL_write directly.
Which BIOs does OpenSSL provide? Can you provide examples of BIOs and tell about the differences between them?
Here are examples of source-sink BIOs that OpenSSL provides:
- File BIO (BIO_s_file). It is a wrapper around stdio's FILE* object. It is used for writing to and reading from a file.
- File descriptor BIO (BIO_s_fd). It is similar to the file BIO but works with POSIX file descriptors instead of stdio files.
- Socket BIO (BIO_s_socket). It is a wrapper around POSIX sockets. It is used for communicating over the network.
- Null BIO (BIO_s_null). It is similar to the /dev/null device in POSIX systems. Writing to this BIO just discards data; reading from it results in EOF (end of file).
- Memory BIO (BIO_s_mem). It is a loopback BIO in essence. Reading from this type of BIO returns the data that was previously written to the BIO. But the data can also be extracted from (or placed into) the internal buffer by calling functions that are specific to this type of BIO (every type of BIO has functions that are specific to that type only).
- Pipe BIO (BIO_s_bio). It is a pipe-like BIO. A pair of such BIOs can be created. Data written to one BIO in the pair becomes available for reading from the second BIO in the pair, and vice versa. It is similar to the memory BIO, but a memory BIO places data into itself while a pipe BIO places data into the BIO it is paired with.
Some information about the similarity between BIO_s_mem and BIO_s_bio can be found here: OpenSSL “BIO_s_mem” VS “BIO_s_bio”.
And here are examples of filter BIOs:
- Base64 BIO (BIO_f_base64). BIO_write through this BIO encodes data to base64 format. BIO_read through this BIO decodes data from base64 format.
- Cipher BIO (BIO_f_cipher). It encrypts/decrypts data passed through it. Different cryptographic algorithms can be used.
- Digest BIO (BIO_f_md). It doesn't modify data passed through it. It only calculates a digest of the data that flows through it, leaving the data itself unchanged. Different digest algorithms can be used. The calculated digest can be retrieved using special functions.
- Buffering BIO (BIO_f_buffer). It also doesn't change data passed through it. Data written to this BIO is buffered, so not every write operation on this BIO results in writing data to the next BIO. Reading works in a similar way. This reduces the number of I/O operations on the BIOs located behind the buffering BIO.
- SSL BIO (BIO_f_ssl). This type of BIO was described above. It wraps an SSL link.