Serialising OpenCV matrices using Boost and cereal

I’ve recently switched from Boost Serialization to the cereal serialisation library in one of my projects and wanted to share my serialisation code for OpenCV matrices (cv::Mat) for both libraries since it might be useful for anybody using OpenCV and one of these two libraries.

For those not familiar, a serialisation library takes care of storing and loading built-in datatypes as well as custom classes to and from the filesystem. Most libraries require minimal changes to existing classes and handle things like serialising std::vectors of varying length, so it’s much easier and less error-prone than rolling your own. Most libraries also take care of things like different newline styles on Windows/Linux or endianness, and allow serialising to different formats, for example binary archives or JSON.

I’ve been using Boost Serialization in my landmark detection library to store trained models, but I’ve recently encountered a show-stopper. I tend to be on a pretty recent Boost version in my development environment, and any model that I train with my version of Boost (1.58.0 right now) can’t be loaded by people using an earlier version of Boost (which is pretty much everybody). There are a few possible workarounds. First, I could use XML archives, but as we need to store quite a lot of floating point values, the XML files ended up too big (>100MB). Second, I could switch to an earlier Boost version to train models, but that’s a) cumbersome to do each time and b) I couldn’t get earlier versions of Boost (I tried 1.54.0) to compile under Visual Studio 2015 RC.

While thinking about a solution, I remembered cereal, a header-only serialisation library that has quite an active community. It can easily be included in the project and anybody will be able to load and exchange models. However, there’s one potential show-stopper here as well: cereal makes no guarantee that archives created with newer versions are compatible with archives created with older versions. This means that if I update the cereal headers in my project, it could make all earlier created models invalid. Well, the choice seems to be between the lesser of two evils here and I decided to give cereal a try. I should point out that I had a look at OpenCV’s FileStorage as well, but I didn’t really like it, and I also suspect their YAML files would suffer the same size issue that I had with XML files.

In the rest of this post I’ll describe my serialisation functions for OpenCV’s cv::Mat matrices for both cereal and Boost Serialization and some choices and problems I faced on the way. If you just want to grab the ready-to-use header files, here you go:

Serialising a cv::Mat using cereal

To enable serialisation, we have to define either a single serialize() function or two separate load() and save() functions. Since the mechanics for saving and loading are slightly different, I split them up. Given a cv::Mat, we need to store its rows, cols and type, a continuous flag, and of course the actual data. In the case of non-contiguous data, we store it row by row.

template<class Archive>
void save(Archive& ar, const cv::Mat& mat)
{
    int rows, cols, type;
    bool continuous;

    rows = mat.rows;
    cols = mat.cols;
    type = mat.type();
    continuous = mat.isContinuous();

    ar & rows & cols & type & continuous;

    if (continuous) {
        const int data_size = rows * cols * static_cast<int>(mat.elemSize());
        auto mat_data = cereal::binary_data(mat.ptr(), data_size);
        ar & mat_data;
    }
    else {
        const int row_size = cols * static_cast<int>(mat.elemSize());
        for (int i = 0; i < rows; i++) {
            auto row_data = cereal::binary_data(mat.ptr(i), row_size);
            ar & row_data;
        }
    }
}

cereal::binary_data allows us to quite neatly wrap around the matrix data and serialise it directly, without unnecessary copies. It makes the load() function look similarly simple:

template<class Archive>
void load(Archive& ar, cv::Mat& mat)
{
    int rows, cols, type;
    bool continuous;

    ar & rows & cols & type & continuous;

    if (continuous) {
        mat.create(rows, cols, type);
        const int data_size = rows * cols * static_cast<int>(mat.elemSize());
        auto mat_data = cereal::binary_data(mat.ptr(), data_size);
        ar & mat_data;
    }
    else {
        mat.create(rows, cols, type);
        const int row_size = cols * static_cast<int>(mat.elemSize());
        for (int i = 0; i < rows; i++) {
            auto row_data = cereal::binary_data(mat.ptr(i), row_size);
            ar & row_data;
        }
    }
}

If you’re interested in how the loading and saving in your application looks, it’s as easy as:

cv::Mat data;
{
    std::ofstream file("file.bin", std::ios::binary);
    cereal::BinaryOutputArchive ar(file);
    ar(data);
}
// Load the data from the disk again:
cv::Mat loaded_data;
{
    std::ifstream file("file.bin", std::ios::binary);
    cereal::BinaryInputArchive ar(file);
    ar(loaded_data);
}

What’s a bit of a shame is that cereal::binary_data<uchar*> only works with Binary[Input|Output]Archive, and prevents serialisation to JSON or XML. For small matrices, it would be great to have the option to serialise to a human readable format. But in any case this would be a bit more tricky as OpenCV always stores the data as uchar* and the type of the matrix data is dynamically encoded in a member variable (for example, a valid type would be CV_32FC1). From what I’ve seen in cereal’s documentation, it would allow us to write specialised load()/save() functions for the case of XML or JSON archives, and we could then loop over the matrix and extract the float (or whatever type) values. But I think it would require a non-trivial amount of code to handle all of OpenCV’s possible types. It would probably be slow and result in large files, thus it’ll only be useful for small matrices. I’m happy with just having a small and efficient binary storage for now.

Serialising a cv::Mat using Boost Serialization

The code here is based on this stackoverflow answer and on a blog post from Christoph Heindl and its comments – thanks to all of them! I combined the best parts of each of their ideas. I omitted the unnecessary elemSize field, made it work with non-contiguous matrices as well, added support for XML serialisation and used binary_object instead of make_array. And we don’t have to split the load() and save() functions, we can get by with a single serialize() function.

template<class Archive>
void serialize(Archive& ar, cv::Mat& mat, const unsigned int /*version*/)
{
    int rows, cols, type;
    bool continuous;

    if (Archive::is_saving::value) {
        rows = mat.rows;
        cols = mat.cols;
        type = mat.type();
        continuous = mat.isContinuous();
    }

    ar & BOOST_SERIALIZATION_NVP(rows) & BOOST_SERIALIZATION_NVP(cols) & BOOST_SERIALIZATION_NVP(type) & BOOST_SERIALIZATION_NVP(continuous);

    if (Archive::is_loading::value)
        mat.create(rows, cols, type);

    if (continuous) {
        const int data_size = rows * cols * static_cast<int>(mat.elemSize());
        boost::serialization::binary_object mat_data(mat.data, data_size);
        ar & BOOST_SERIALIZATION_NVP(mat_data);
    }
    else {
        const int row_size = cols * static_cast<int>(mat.elemSize());
        for (int i = 0; i < rows; i++) {
            boost::serialization::binary_object row_data(mat.ptr(i), row_size);
            std::string row_name("mat_data_row_" + std::to_string(i));
            ar & make_nvp(row_name.c_str(), row_data);
        }
    }
}

The code looks similar to cereal’s code; this is because cereal deliberately kept its interfaces similar to Boost’s. Storing and loading a cv::Mat looks very similar as well:

cv::Mat data;
{
    std::ofstream file("file.txt");
    boost::archive::text_oarchive ar(file);
    ar << data;
}
// Load the data from the disk again:
cv::Mat loaded_data;
{
    std::ifstream file("file.txt");
    boost::archive::text_iarchive ar(file);
    ar >> loaded_data;
}

In contrast to cereal, Boost Serialization supports writing binary_object to XML files (and reading them). Using binary_object is much more efficient than the alternative, boost::serialization::make_array. make_array puts each matrix entry (actually, each uchar, which results in even more elements) inside its own <item> tag, which is incredibly inefficient. Using binary_object results in much smaller files and much faster storing and reading of the archive. As far as I can see there are no drawbacks either: the uchar data written by make_array is not really human readable anyway, so we might as well use binary_object, which uses some sort of encoding.

Conclusion

While writing this blog post, I found out that Boost Serialization’s xml_[i|o]archive actually seems to be agnostic to the Boost version that is used, in contrast to text and binary archives. I was able to store a model to an XML file with the above serialisation code using Boost 1.58.0, and successfully read it with Boost 1.57.0. I’m not sure that this is guaranteed behaviour; I may have to dig into that. But it would solve a lot of the versioning issues and might even be better than using cereal.

The two header files with the full code and doxygen comments are available here for cereal (matcerealisation.hpp) and here for Boost Serialization (matserialisation.hpp).

Any comments about the code, potential bugs, or better solutions in general (maybe I’ve overlooked something obvious) are very welcome. Best drop me an email for now; I will probably add a way to comment on the blog at a later time. Also, I’d welcome just a “Hi” if you found any of this useful. Update 12/08/2015: I finally added a comment form.

All code in this post is licensed under the Apache License, Version 2.0.
