
Writing C++ Code That Works Well with Memory Mapped Files

Memory-mapped files provide an efficient mechanism for file I/O by mapping a file’s contents directly into the process’s address space. This technique is particularly valuable when working with large datasets, enabling access to file data as if it were in memory. In C++, integrating memory-mapped file handling requires careful attention to system calls, memory management, and platform-specific APIs. Here’s how to write performant and portable C++ code that works well with memory-mapped files.

Understanding Memory-Mapped Files

A memory-mapped file maps a region of a file into the virtual memory space of a process. Instead of reading and writing file data through traditional I/O functions (read, write), the file can be accessed like an array in memory. This can reduce I/O overhead, because data is not copied between kernel and user-space buffers on every call, and it often speeds up access, especially for random reads and writes.

Use Cases for Memory-Mapped Files

  • Large data processing: Useful in scientific computing, big data analytics, and high-performance computing.

  • Inter-process communication: Allows multiple processes to share memory by mapping the same file.

  • Database systems: Can improve performance by letting the OS page cache serve reads directly, avoiding an extra copy into user-space buffers.

Platform Considerations

The implementation of memory-mapped files is OS-specific:

  • On Unix/Linux, use mmap, munmap, and related system calls.

  • On Windows, use CreateFileMapping, MapViewOfFile, and related functions.

A well-written C++ program should ideally abstract platform-specific code to maintain portability.

C++ Code Design for Memory-Mapped Files

Include the Required Headers

For Linux:

cpp
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdexcept>
#include <string>

For Windows:

cpp
#include <windows.h>
#include <stdexcept>
#include <string>

Encapsulating Memory-Mapped Files in a Class

Creating a C++ class to manage memory-mapped files helps improve modularity and manage resources efficiently using RAII (Resource Acquisition Is Initialization).

Linux Example

cpp
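class MemoryMappedFile {
private:
    int fd;
    size_t fileSize;
    void* data;

public:
    explicit MemoryMappedFile(const std::string& filepath)
        : fd(-1), fileSize(0), data(nullptr) {
        fd = open(filepath.c_str(), O_RDONLY);
        if (fd == -1)
            throw std::runtime_error("Failed to open file");

        struct stat st;
        if (fstat(fd, &st) == -1) {
            close(fd);  // the destructor does not run if the constructor throws
            throw std::runtime_error("Failed to get file size");
        }
        fileSize = static_cast<size_t>(st.st_size);

        // mmap rejects a zero-length mapping, so leave empty files unmapped.
        if (fileSize > 0) {
            data = mmap(nullptr, fileSize, PROT_READ, MAP_PRIVATE, fd, 0);
            if (data == MAP_FAILED) {
                close(fd);
                throw std::runtime_error("Memory mapping failed");
            }
        }
    }

    // Copying would unmap the same region twice.
    MemoryMappedFile(const MemoryMappedFile&) = delete;
    MemoryMappedFile& operator=(const MemoryMappedFile&) = delete;

    ~MemoryMappedFile() {
        if (data) munmap(data, fileSize);
        if (fd != -1) close(fd);
    }

    const char* getData() const { return static_cast<const char*>(data); }
    size_t size() const { return fileSize; }
};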

Usage

cpp
MemoryMappedFile mmf("largefile.dat");
std::cout.write(mmf.getData(), mmf.size());  // requires <iostream>

This class maps a file into memory and unmaps it automatically when the object goes out of scope, allowing the data to be accessed through a raw pointer.
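Because the mapping is a contiguous byte range, standard algorithms can run over it directly. The following is a minimal sketch: `countLines` is a hypothetical helper, not part of the class above, and it assumes its arguments are the pointer and length returned by `getData()` and `size()`.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: counts '\n' bytes in a mapped region.
// `data` and `size` are assumed to come from getData() and size().
std::size_t countLines(const char* data, std::size_t size) {
    if (data == nullptr || size == 0) return 0;  // empty or unmapped file
    return static_cast<std::size_t>(std::count(data, data + size, '\n'));
}
```

A call would look like `countLines(mmf.getData(), mmf.size())`. No copy of the file is made; the OS pages data in as the scan touches it.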

Windows Equivalent

cpp
class MemoryMappedFile {
private:
    HANDLE hFile, hMapping;
    LPVOID data;
    size_t fileSize;

    // Releases whatever has been acquired so far; safe on partial state.
    void release() {
        if (data) UnmapViewOfFile(data);
        if (hMapping) CloseHandle(hMapping);
        if (hFile != INVALID_HANDLE_VALUE) CloseHandle(hFile);
    }

public:
    explicit MemoryMappedFile(const std::wstring& filepath)
        : hFile(INVALID_HANDLE_VALUE), hMapping(NULL), data(nullptr), fileSize(0) {
        hFile = CreateFileW(filepath.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL,
                            OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (hFile == INVALID_HANDLE_VALUE)
            throw std::runtime_error("Failed to open file");

        LARGE_INTEGER size;
        if (!GetFileSizeEx(hFile, &size)) {
            release();  // the destructor does not run if the constructor throws
            throw std::runtime_error("Failed to get file size");
        }
        fileSize = static_cast<size_t>(size.QuadPart);
        if (fileSize == 0) return;  // CreateFileMapping rejects empty files

        hMapping = CreateFileMappingW(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
        if (!hMapping) {
            release();
            throw std::runtime_error("Failed to create file mapping");
        }
        data = MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, 0);
        if (!data) {
            release();
            throw std::runtime_error("Failed to map view of file");
        }
    }

    // Copying would unmap the view and close the handles twice.
    MemoryMappedFile(const MemoryMappedFile&) = delete;
    MemoryMappedFile& operator=(const MemoryMappedFile&) = delete;

    ~MemoryMappedFile() { release(); }

    const char* getData() const { return static_cast<const char*>(data); }
    size_t size() const { return fileSize; }
};

Design Considerations

Alignment and Page Size

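On POSIX systems, the file offset passed to mmap must be a multiple of the page size (commonly 4 KB; query it with sysconf(_SC_PAGESIZE)). On Windows, the offset passed to MapViewOfFile must be a multiple of the allocation granularity (commonly 64 KB; query it with GetSystemInfo). If you plan to map only part of a file, round your offset down to that boundary and adjust the returned pointer by the difference.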

Synchronization

When mapping a file with write access across threads or processes, take care to synchronize access. Race conditions and data corruption can occur if synchronization mechanisms such as mutexes are not used properly.
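For a simple shared counter, a lock-free atomic placed inside the shared mapping is one alternative to a mutex. The sketch below swaps in that technique and is only an illustration: it assumes Linux (`fork`, `MAP_ANONYMOUS`) and that `std::atomic<std::uint32_t>` is lock-free on the target, which the C++ standard recommends should also make it usable across processes. The function name `sharedCounterDemo` is hypothetical.

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <atomic>
#include <cstdint>
#include <new>

// Hypothetical demo: parent and child each add `perProcess` to a counter
// that lives in a shared anonymous mapping. Returns the final count,
// or -1 if the mapping or the fork fails.
long sharedCounterDemo(int perProcess) {
    void* mem = mmap(nullptr, sizeof(std::atomic<std::uint32_t>),
                     PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;

    // Construct the atomic in place inside the shared pages.
    auto* counter = new (mem) std::atomic<std::uint32_t>(0);

    pid_t child = fork();
    if (child == -1) {
        munmap(mem, sizeof(*counter));
        return -1;
    }
    // Both processes run this loop against the same physical memory.
    for (int i = 0; i < perProcess; ++i)
        counter->fetch_add(1, std::memory_order_relaxed);
    if (child == 0) _exit(0);  // child: finished, skip atexit handlers

    waitpid(child, nullptr, 0);  // parent: wait until the child has exited
    long total = static_cast<long>(counter->load());
    munmap(mem, sizeof(*counter));
    return total;
}
```

With a plain `std::uint32_t` in place of the atomic, increments from the two processes could be lost. Critical sections larger than a single variable still need a real lock, such as a process-shared mutex or a file lock.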

Error Handling

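Always check the return values of system calls. Memory mapping can fail for various reasons, such as insufficient permissions, an empty file, or a lack of available address space. Note that mmap reports failure by returning MAP_FAILED, not a null pointer.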
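A generic "failed" message hides the reason a call failed. One option on POSIX systems is to wrap `errno` in `std::system_error` so the caller can inspect the error code. The helper name `openReadOnlyOrThrow` below is hypothetical, and the same pattern would apply to `fstat` and `mmap`.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <string>
#include <system_error>

// Hypothetical helper: opens a file read-only or throws std::system_error
// carrying the errno value reported by open().
int openReadOnlyOrThrow(const std::string& path) {
    int fd = open(path.c_str(), O_RDONLY);
    if (fd == -1) {
        // Capture errno immediately, before any other call can overwrite it.
        const int err = errno;
        throw std::system_error(err, std::generic_category(),
                                "open(" + path + ")");
    }
    return fd;
}
```

A caller can then distinguish cases, for example by comparing `e.code().value()` against `ENOENT` or `EACCES`, while `e.what()` includes the system's description of the error.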

Performance Optimization Tips

  1. Access Pattern Awareness: Access data sequentially when possible, to take advantage of OS-level read-ahead mechanisms.

  2. Use MAP_PRIVATE vs. MAP_SHARED: Choose MAP_PRIVATE for copy-on-write behavior (changes stay in your process and never reach the file) and MAP_SHARED for changes to be visible to other processes and written back to the file.

  3. Prefetching: On Linux, madvise with MADV_WILLNEED can be used to inform the OS about future access.

  4. Unmapping: Always unmap the file when done to release resources immediately.

  5. Mapping Only Necessary Portions: For extremely large files, map only the segments you need rather than the entire file.

Testing and Portability

Use conditional compilation with #ifdef _WIN32 to maintain portability between Linux and Windows. Write wrapper functions that abstract the OS-specific implementation details.

cpp
#ifdef _WIN32
// Windows implementation
#else
// Linux/Unix implementation
#endif

Advanced Use Cases

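  • Writable Memory Maps: For editing file content directly in memory, open the file with O_RDWR and map it with PROT_READ | PROT_WRITE and MAP_SHARED (with MAP_PRIVATE, writes never reach the file). Call msync when you need the changes flushed to disk at a known point.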

  • Shared Memory between Processes: Map the same file in multiple processes to establish shared memory.

  • Anonymous Mappings: Useful for creating large arrays in memory that do not back onto a file, often used in high-performance applications.

Conclusion

Memory-mapped files offer a powerful tool for high-performance I/O in C++. Properly abstracted and carefully managed, they can simplify code and vastly improve performance in data-heavy applications. By designing cross-platform classes, handling errors thoroughly, and optimizing access patterns, developers can leverage this capability in a safe and efficient way.
