crow

Utility library


Project maintained by CaptainCrowbar

File System Path

Crow Library by Ross Smith

#include "crow/path.hpp"
namespace Crow;

Path class

class Path;

The Path class holds a file system path as a string. Internally, the file name is stored as a std::string on Unix, or a std::wstring on Windows, but Path objects on any system can be constructed from a UTF-8 string. The class provides a set of file system operations, such as file renaming and deletion, directory search, and so on.

File handling in Unicode always has the problem that the actual file names on most systems are not guaranteed to be valid Unicode. On most Unix systems, a file name is an arbitrary sequence of bytes, with no restrictions other than excluding the null and slash characters; while file names on systems such as Linux are usually expected to be encoded in UTF-8 when they contain non-ASCII characters, this is not enforced in any way. On Microsoft Windows, using the NTFS file system, file names are nominally UTF-16 (with a short list of disallowed characters), but not all of the wide character API functions that deal with files check for valid encoding; it’s not hard to create a file whose name is an arbitrary sequence of 16-bit integers.

The HFS+ file system formerly used by Apple on macOS and iOS appears to be the only widely used file system that actually does enforce valid Unicode names at the file system level (although it then proceeds to complicate matters by using a proprietary normalization scheme that does not match any of the four standard ones). In any case the current APFS file system has returned to the usual Unix convention of allowing any sequence of bytes without checking encoding.

To deal with this situation, the default behaviour of Path when constructed from a native file name string (std::string on Unix, std::wstring on Windows) is to copy the string verbatim, without checking for or correcting any UTF encoding. When a name has to be converted from UTF-8 to UTF-16 (constructing a Path from an 8-bit string on Windows), an exception will be thrown if the name is not valid UTF-8.
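
As a rough illustration of the kind of check implied by "valid UTF-8", here is a minimal standalone sketch using only the standard library. It is not the library's implementation; the names is_valid_utf8() and require_utf8() are hypothetical, and the choice of std::invalid_argument simply mirrors the exception documented for the constructors below.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Hypothetical helper: true if s is well-formed UTF-8
// (no overlong forms, no surrogates, nothing above U+10FFFF).
inline bool is_valid_utf8(std::string_view s) noexcept {
    std::size_t i = 0;
    while (i < s.size()) {
        auto b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0, min = 0;
        if (b0 < 0x80)                { len = 1; cp = b0;        min = 0; }
        else if ((b0 & 0xe0) == 0xc0) { len = 2; cp = b0 & 0x1f; min = 0x80; }
        else if ((b0 & 0xf0) == 0xe0) { len = 3; cp = b0 & 0x0f; min = 0x800; }
        else if ((b0 & 0xf8) == 0xf0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
        else return false;                      // stray continuation or invalid lead byte
        if (s.size() - i < len) return false;   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            auto b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xc0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3f);
        }
        if (cp < min) return false;                      // overlong encoding
        if (cp >= 0xd800 && cp <= 0xdfff) return false;  // surrogate code point
        if (cp > 0x10ffff) return false;                 // beyond the Unicode range
        i += len;
    }
    return true;
}

// Hypothetical helper: reject the name instead of silently repairing it.
inline std::string_view require_utf8(std::string_view name) {
    if (!is_valid_utf8(name))
        throw std::invalid_argument("File name is not valid UTF-8");
    return name;
}
```

A name such as "hello.txt" passes unchanged, while a lone 0xFF byte or an encoded surrogate is rejected.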

The examples in the documentation below mostly use Unix-style file names for simplicity; the equivalent code on Windows will make the obvious substitutions of backslashes for slashes, and drive roots such as "C:\" for "/".

The term “leaf name” is used here to mean the name of an individual file within a directory, with no directory path prefix (for example, the leaf name of "/foo/bar/hello.txt", or "C:\foo\bar\hello.txt", is "hello.txt").

Exceptions are documented for individual functions only where they differ from the following rules:

Member types

class Path::directory_iterator;
class Path::search_iterator;
using directory_range = Irange<directory_iterator>;
using search_range = Irange<search_iterator>;

Directory search iterators, and range types containing a pair of iterators, returned by the directory search functions.

using Path::id_type = std::pair<uint64_t, uint64_t>;

Type used for a file’s unique ID.

using Path::os_char = [char on Unix, wchar_t on Windows];
using Path::os_string = std::basic_string<os_char>;

Native OS string types.

using Path::time_point = std::chrono::system_clock::time_point;

Type used to represent file times.

enum class Path::cmp: uint32_t {
    cased,
    icase,
    native
};
class Path::equal {
    equal() noexcept;
    explicit equal(cmp mode) noexcept;
    bool operator()(const Path& lhs, const Path& rhs) const;
};
class Path::less {
    less() noexcept;
    explicit less(cmp mode) noexcept;
    bool operator()(const Path& lhs, const Path& rhs) const;
};

Path comparison objects. These have equality and less-than semantics; the constructor argument indicates whether to do a case insensitive comparison. Path::cmp::native is the default, and will be equal to one of the other two options, reflecting the native OS convention.

Case insensitivity here applies only to ASCII characters; the semantics of Unicode case sensitivity are too complicated to emulate here.
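
To make "ASCII only" concrete, the following is a small sketch of what such a comparison might look like on plain 8-bit strings. The function names are hypothetical and this is not the code behind Path::equal or Path::less; it only shows that bytes outside A-Z are compared as they stand.

```cpp
#include <algorithm>
#include <cassert>
#include <string_view>

// Fold only ASCII 'A'-'Z'; every other code unit is left untouched.
constexpr char ascii_lower(char c) noexcept {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c + ('a' - 'A')) : c;
}

// Equality that ignores ASCII case only.
inline bool ascii_iequal(std::string_view a, std::string_view b) noexcept {
    return std::equal(a.begin(), a.end(), b.begin(), b.end(),
        [] (char x, char y) { return ascii_lower(x) == ascii_lower(y); });
}

// Lexicographical less-than that ignores ASCII case only.
inline bool ascii_iless(std::string_view a, std::string_view b) noexcept {
    return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end(),
        [] (char x, char y) {
            return static_cast<unsigned char>(ascii_lower(x))
                < static_cast<unsigned char>(ascii_lower(y));
        });
}
```

With this rule "Hello.TXT" and "hello.txt" compare equal, but the UTF-8 encodings of "É" and "é" do not.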

enum class Path::flag_type: uint32_t {
    no_flags = 0,
    append,        // If file exists, append instead of overwriting
    bottom_up,     // Search directory tree in bottom up order
    legal_name,    // Fail if file name is illegal for OS
    may_copy,      // Fall back to copying if normal operation is not possible
    may_fail,      // Return default value instead of throwing on failure
    no_backtrack,  // Don't search upward through directories
    no_follow,     // Don't follow symlinks
    no_hidden,     // Ignore hidden files
    overwrite,     // Replace file if it already exists
    recurse,       // Perform directory operations recursively
    stdio,         // If path is empty or "-", use stdin/stdout instead
    unicode,       // Ignore files whose names are not valid UTF
};
using enum Path::flag_type;

Bitmask flags controlling the behaviour of some of the path functions. More details of the resulting behaviour are described with the relevant functions. As a general rule, flags that are not relevant to a specific function are ignored.
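
The declaration above does not show the numeric values or the operators used to combine flags. As a general illustration of how a scoped enum can act as a bitmask, here is a self-contained sketch; demo_flag, its values, and has_flag() are all hypothetical and are not taken from the Crow headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag type: each enumerator occupies its own bit.
enum class demo_flag: std::uint32_t {
    no_flags  = 0,
    overwrite = 1u << 0,
    recurse   = 1u << 1,
    no_follow = 1u << 2,
};

// Combine two flag sets.
constexpr demo_flag operator|(demo_flag a, demo_flag b) noexcept {
    return static_cast<demo_flag>(
        static_cast<std::uint32_t>(a) | static_cast<std::uint32_t>(b));
}

// Intersect two flag sets.
constexpr demo_flag operator&(demo_flag a, demo_flag b) noexcept {
    return static_cast<demo_flag>(
        static_cast<std::uint32_t>(a) & static_cast<std::uint32_t>(b));
}

// True if any bit of f is present in set.
constexpr bool has_flag(demo_flag set, demo_flag f) noexcept {
    return (set & f) != demo_flag::no_flags;
}
```

A function written this way can test only the bits it cares about, which matches the rule that irrelevant flags are ignored.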

enum class Path::form: uint32_t {
    empty,
    absolute,
    drive_absolute,
    drive_relative,
    relative
};
std::ostream& operator<<(std::ostream& out, Path::form f);

These are the possible formats for a path; any path will have one of these forms, which can be queried using path_form() and related functions such as is_absolute().

The concepts of a “drive absolute” path (e.g. "\foo" with no leading drive letter), or a “drive relative” path (e.g. "C:foo" with no backslash after the drive letter), are specific to Windows. On Unix, path_form() will never return one of those values; is_drive_absolute() and is_drive_relative() are always false.
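
The Windows forms can be told apart purely by syntax. The sketch below classifies a backslash-delimited path string along the lines of the examples above. It is an approximation under stated assumptions: the names are hypothetical, a bare drive prefix such as "C:" is treated as drive relative, and a network path starting with two backslashes is treated as absolute; the library's actual rules may differ in edge cases.

```cpp
#include <cassert>
#include <string_view>

enum class demo_form { empty, absolute, drive_absolute, drive_relative, relative };

constexpr bool is_ascii_letter(char c) noexcept {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

// Hypothetical classifier for Windows-style paths; no file system access.
constexpr demo_form classify_windows(std::string_view p) noexcept {
    if (p.empty())
        return demo_form::empty;
    if (p.size() >= 2 && p[0] == '\\' && p[1] == '\\')
        return demo_form::absolute;            // network path (assumed absolute)
    if (p[0] == '\\')
        return demo_form::drive_absolute;      // "\foo": rooted, but no drive
    if (p.size() >= 2 && is_ascii_letter(p[0]) && p[1] == ':') {
        if (p.size() >= 3 && p[2] == '\\')
            return demo_form::absolute;        // "C:\foo"
        return demo_form::drive_relative;      // "C:foo"
    }
    return demo_form::relative;
}
```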

enum class Path::kind: uint32_t {
    none,
    directory,
    file,
    special,
    symlink
};
std::ostream& operator<<(std::ostream& out, Path::kind k);

These are used to indicate what kind of file a path refers to (none indicates that the file does not exist), which can be queried using file_kind().

Constants

static constexpr char Path::delimiter = ['/' on Unix, '\\' on Windows];
static constexpr Path::os_char Path::os_delimiter = [same];

The native path delimiter character.

static constexpr bool Path::os_case_sensitive;

Indicates whether path names are case sensitive by default on the host OS. False on macOS and Windows, true on all other systems (including iOS).

Life cycle functions

Path::Path() noexcept;

Default constructor, equivalent to constructing from an empty string.

Path::Path(const std::string& file, flag_type flags = no_flags);
Path::Path(const std::wstring& file, flag_type flags = no_flags);
Path::Path(std::string_view file, flag_type flags = no_flags);
Path::Path(std::wstring_view file, flag_type flags = no_flags);
Path::Path(const char* file, flag_type flags = no_flags);
Path::Path(const wchar_t* file, flag_type flags = no_flags);

Constructors (and implicit conversions) from a string. The constructors that take a wide string are only defined on Windows.

If the argument is a native string, the name will simply be copied verbatim with no UTF encoding checks. On Windows, the constructor from an 8-bit string will throw std::invalid_argument if the name is not valid UTF-8.

If the legal_name flag is used, the constructor will throw std::invalid_argument if a file name whose format is illegal for the operating system is supplied.

If the unicode flag is used, the constructor will throw std::invalid_argument if a file name containing invalid UTF is supplied. This is implicitly always set when the 8-bit string constructor is used on Windows.

On construction, paths are brought into a canonical form according to the following rules:

Path::~Path() noexcept;
Path::Path(const Path& p);
Path::Path(Path&& p) noexcept;
Path& Path::operator=(const Path& p);
Path& Path::operator=(Path&& p) noexcept;

Other life cycle functions.

Comparison operators

std::strong_ordering operator<=>(const Path& lhs, const Path& rhs) noexcept;
bool operator==(const Path& lhs, const Path& rhs) noexcept;
bool operator!=(const Path& lhs, const Path& rhs) noexcept;
bool operator<(const Path& lhs, const Path& rhs) noexcept;
bool operator>(const Path& lhs, const Path& rhs) noexcept;
bool operator<=(const Path& lhs, const Path& rhs) noexcept;
bool operator>=(const Path& lhs, const Path& rhs) noexcept;

Comparison operators. These perform lexicographical comparison by code unit, not by code point, because the path string may not be a valid encoding. On Windows this means that paths that contain characters outside the BMP will not necessarily sort in Unicode order.

These operators always perform case sensitive comparison regardless of the operating system. You can use the Path::equal and Path::less function objects if you need case insensitive comparison, but please note that even with these, no attempt is made to exactly emulate the native file name sorting conventions.
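
The note about characters outside the BMP can be demonstrated with the standard string types alone. In this sketch (variable and function names are illustrative only), U+1F600 is stored in UTF-16 as the surrogate pair D83D DE00, so it sorts before U+FFFD by code unit even though it sorts after it by code point.

```cpp
#include <cassert>
#include <string>

// Two one-character names: U+FFFD (inside the BMP) and U+1F600 (outside it).
inline const std::u16string bmp_utf16 = u"\uFFFD";         // one code unit: FFFD
inline const std::u16string astral_utf16 = u"\U0001F600";  // surrogate pair: D83D DE00
inline const std::u32string bmp_utf32 = U"\uFFFD";
inline const std::u32string astral_utf32 = U"\U0001F600";

// Code unit order, as seen by a comparison on UTF-16 data: D83D < FFFD.
inline bool astral_first_by_code_unit() { return astral_utf16 < bmp_utf16; }

// Code point order: FFFD < 1F600.
inline bool bmp_first_by_code_point() { return bmp_utf32 < astral_utf32; }
```

Both functions return true, which is the mismatch the paragraph above describes.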

Path name functions

These operate on the path purely as a string, making no contact with the actual file system.

std::string Path::name() const;
std::string Path::str() const;
explicit Path::operator std::string() const;
std::ostream& operator<<(std::ostream& out, const Path& p);

These return the full path as an 8-bit string. On Windows, they will throw std::invalid_argument if the actual file name contains invalid UTF-16.

Path::os_string Path::os_name() const;
explicit Path::operator os_string() const;
const Path::os_char* Path::c_name() const noexcept;

These return the full path in its native form, with no conversion. The os_string operator is defined (as a separate operator from std::string) only on Windows.

std::string Path::as_url() const;

Returns the file name as a UTF-8 "file://..." URL. This will throw std::invalid_argument if the path is not absolute, or if the file name contains invalid UTF.

std::vector<std::string> Path::breakdown() const;
std::vector<os_string> Path::os_breakdown() const;

Break the path down into its directory and file elements. If the path is absolute, the first element will be the root path.

Path Path::change_ext(std::string_view new_ext) const;

Replace the file extension with a new one. The new extension can be supplied with or without a leading dot, or can be an empty string to remove the extension. This will throw std::invalid_argument if the path is empty or is a root path, or (on Windows) if the new extension contains invalid UTF-8.

bool Path::is_empty() const noexcept;
bool Path::is_absolute() const noexcept;
bool Path::is_relative() const noexcept;
bool Path::is_drive_absolute() const noexcept;
bool Path::is_drive_relative() const noexcept;
Path::form Path::path_form() const noexcept;

Query which syntactic form the path has. Exactly one of the first five functions will always be true for any path. On Unix, is_drive_absolute() and is_drive_relative() will always be false.

size_t Path::hash() const noexcept;
struct std::hash<Path>;

Hash of the file name. This returns the same value as std::hash of the os_name().

bool Path::is_leaf() const noexcept;

True if the path is a leaf name (one with no directory prefix).

bool Path::is_legal() const noexcept;

True if the path is legal for this operating system. This does not check for valid UTF, only for the OS-specific file naming rules.

bool Path::is_root() const noexcept;

True if the path is the root of a logical file system. This is a purely syntactic check and does not attempt to check against actual mount points. On Unix this is true only for "/" and no other paths. On Windows, this checks for a drive letter (e.g. "C:\") or a network path (e.g. "\\server\").

bool Path::is_unicode() const noexcept;

True if the name is valid UTF. This checks the name of the file, not its contents.

Path Path::relative_to(const Path& base, flag_type flags = no_flags) const;

Returns a relative path locating the current path relative to the given base path (which is assumed, without checking, to be a directory). The current and base paths must be either both absolute or both relative; if they are relative, they are assumed to be relative to the same root. If the no_backtrack flag is set, paths that are not within the base directory will be returned as absolute paths, instead of using ".." elements to backtrack up the directory tree.

On Windows, if the two paths are absolute and have different drive prefixes, the original absolute path will be returned unchanged, regardless of the no_backtrack flag.

This will throw std::invalid_argument if one of the two paths is absolute and the other relative, or if both paths are relative, the current path is not below the base directory, and the no_backtrack flag is set.

std::pair<std::string, std::string> Path::split_leaf() const;
std::pair<Path::os_string, Path::os_string> Path::split_os_leaf() const;
std::string Path::base() const;
std::string Path::ext() const;
Path::os_string Path::os_base() const;
Path::os_string Path::os_ext() const;

Split the path’s leaf name into a base and extension; the extension begins with the last dot, provided this is not the first or last character of the leaf name. The part of the path before the leaf name (i.e. up to the last path delimiter) is not included in the return value. If the path is a root path, both strings will be empty. For example, "/foo/bar/hello.txt" will return {"hello",".txt"}.
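
The extension rule is easy to get wrong at the edges, so here is a minimal sketch of it applied to a leaf name that has already been separated from its directory. The function name is hypothetical; this restates the rule in the text and is not the library's code.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper: split a leaf name into {base, extension}.
inline std::pair<std::string, std::string> split_leaf_sketch(std::string_view leaf) {
    auto dot = leaf.rfind('.');
    // No extension if there is no dot, or if the last dot is the
    // first or the last character of the leaf name.
    if (dot == std::string_view::npos || dot == 0 || dot + 1 == leaf.size())
        return {std::string(leaf), std::string()};
    return {std::string(leaf.substr(0, dot)), std::string(leaf.substr(dot))};
}
```

Under this rule "hello.txt" gives {"hello", ".txt"}, "archive.tar.gz" gives {"archive.tar", ".gz"}, and ".bashrc" or "trailing." are returned whole with an empty extension.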

std::pair<Path, Path> Path::split_path() const;
Path Path::parent() const;
Path Path::leaf() const;

Split the path into a directory path and a leaf name. If the path is a root path, the leaf name will be empty. For example, "/foo/bar/hello.txt" will return {"/foo/bar","hello.txt"}.

std::pair<Path, Path> Path::split_root() const;
Path Path::root() const;
Path Path::from_root() const;

Split the path into a root path and a path relative to the root. For example, "/foo/bar/hello.txt" will return {"/","foo/bar/hello.txt"}.

static Path Path::common(const Path& lhs, const Path& rhs);
template <typename Range> static Path Path::common(const Range& files);

Return the longest common prefix of two or more paths. The return value will be a complete path (i.e. it will not split a path element in half if two path elements happen to share a substring). The returned path will be empty if the argument paths are a mixture of absolute and relative paths, or (on Windows) if they are absolute but do not all have the same root. For example, common("/a/b/c/d/e/f","/a/b/c/x/y/z") will return "/a/b/c".
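
The element-wise nature of the comparison can be sketched as follows for Unix-style paths held in plain strings. The helper names are hypothetical and the code ignores Windows drive roots; it is meant only to show why "/foo/bar" and "/foo/barn" share "/foo" rather than "/foo/bar".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Split a Unix-style path into elements; an absolute path gets "/" first.
inline std::vector<std::string> path_elements(std::string_view path) {
    std::vector<std::string> out;
    if (!path.empty() && path[0] == '/')
        out.push_back("/");
    std::size_t i = 0;
    while (i < path.size()) {
        auto j = path.find('/', i);
        if (j == std::string_view::npos)
            j = path.size();
        if (j > i)
            out.emplace_back(path.substr(i, j - i));
        i = j + 1;
    }
    return out;
}

// Longest common prefix, compared one whole element at a time.
inline std::string common_sketch(std::string_view lhs, std::string_view rhs) {
    auto a = path_elements(lhs);
    auto b = path_elements(rhs);
    std::string result;
    for (std::size_t i = 0; i < a.size() && i < b.size() && a[i] == b[i]; ++i) {
        if (!result.empty() && result.back() != '/')
            result += '/';
        result += a[i];
    }
    return result;
}
```

A mixture of absolute and relative paths yields an empty result here because the first elements ("/" versus a name) already differ.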

static Path Path::join(const Path& lhs, const Path& rhs);
template <typename Range> static Path Path::join(const Range& files);
Path Path::operator/(const Path& lhs, const Path& rhs);
Path& Path::operator/=(Path& lhs, const Path& rhs);

Join two paths. This will discard the LHS and return the RHS unchanged if the LHS is empty or the RHS is absolute; otherwise, the resulting path is obtained by assuming the RHS is relative to the LHS.
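
The join rule can be written out in a few lines for Unix-style strings. This is a sketch under assumptions, not the library's code: the function name is hypothetical, and the handling of an empty RHS (return the LHS unchanged) is a guess that the text above does not confirm.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: join two Unix-style paths.
inline std::string join_sketch(std::string_view lhs, std::string_view rhs) {
    const bool rhs_absolute = !rhs.empty() && rhs[0] == '/';
    if (lhs.empty() || rhs_absolute)
        return std::string(rhs);    // LHS is discarded
    if (rhs.empty())
        return std::string(lhs);    // assumption: nothing to append
    std::string out(lhs);
    if (out.back() != '/')
        out += '/';                 // avoid doubling the delimiter after "/"
    out += rhs;
    return out;
}
```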

File system query functions

These require read-only access to the file system. Any function not marked noexcept can throw std::system_error if the underlying system API fails, in addition to any other documented exceptions.

Query functions that are paired with a corresponding update function are listed separately under the query/update functions category.

Path::directory_range Path::directory(flag_type flags = no_flags) const;
Path::search_range Path::deep_search(flag_type flags = no_flags) const;

Return an iterator range over the files within a directory, either the immediate directory members (directory()) or a recursive search of all child directories (deep_search()). If the path is empty, this will iterate over the current directory (yielding the same results as Path(".").directory(), but without the "./" prefix).

The range will be empty if the path does not exist or is not a directory. The order in which files are returned is unspecified. If the directory’s contents are changed while iteration is in progress, it is unspecified whether the changes will be visible to the caller. If the no_hidden flag is set, hidden files will not be included. If the unicode flag is set, files whose names are not valid UTF will not be included (this test is applied to the whole path, so a directory whose own name is invalid will appear to be empty).

By default, deep_search() visits files in top down order (a directory is seen before its contents); if the bottom_up flag is set, bottom up order is used instead (a directory’s contents are seen before the directory itself). Deep search iterators otherwise take the same flags, and follow the same rules, as directory iterators.
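
The difference between the two orders can be illustrated with std::filesystem instead of the Crow iterators. The walk() function below is hypothetical; it collects entries either top down or bottom up, does not descend into symlinked directories (an assumption made here to avoid cycles), and yields nothing for a missing or non-directory path.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: append every entry below dir to out.
inline void walk(const fs::path& dir, bool bottom_up, std::vector<fs::path>& out) {
    std::error_code ec;
    fs::directory_iterator it(dir, ec), end;
    for (; !ec && it != end; it.increment(ec)) {
        const fs::path entry = it->path();
        std::error_code ignore;
        const bool descend = it->is_directory(ignore) && !it->is_symlink(ignore);
        if (!bottom_up)
            out.push_back(entry);           // top down: directory before contents
        if (descend)
            walk(entry, bottom_up, out);
        if (bottom_up)
            out.push_back(entry);           // bottom up: contents before directory
    }
}
```

Bottom up order is the one to use when deleting a tree, since each directory is already empty by the time it is reached.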

bool Path::exists(flag_type flags = no_flags) const noexcept;

Query whether a file exists. This may give a false negative if the file exists but is not accessible to the calling process. The no_follow flag prevents this function from following symbolic links.

Path::kind Path::file_kind(flag_type flags = no_flags) const noexcept;

Returns the type of the file, or none if the file does not exist. This may give a false negative if the file exists but is not accessible to the calling process. If the no_follow flag is set, this will return symlink if the path names a symlink (symlink will never be returned in the absence of this flag).

Path::id_type Path::id(flag_type flags = no_flags) const noexcept;

Returns a unique file identifier, intended to identify the file even if it is referred to by different paths. It will return a null ID if the file does not exist or the caller does not have permission to query its properties. This is based on the file’s device and inode numbers (or the equivalent on Windows), but completely reliable file identification cannot be guaranteed in the presence of shenanigans like parallel NFS mounts. The no_follow flag prevents this function from following symbolic links.

bool Path::is_directory(flag_type flags = no_flags) const noexcept;
bool Path::is_file(flag_type flags = no_flags) const noexcept;
bool Path::is_special(flag_type flags = no_flags) const noexcept;

Query the file type. Exactly one of these will be true if exists() is true. These may give a false negative if the file exists but is not accessible to the calling process. The no_follow flag prevents this function from following symbolic links.

bool Path::is_hidden() const noexcept;

True if the file is normally hidden. On Unix this is based on the file name (a file is hidden if its name starts with a dot), and the file’s existence is not checked; on Windows this is a metadata property, and will return false if the file does not exist or is not accessible to the caller.

bool Path::is_symlink() const noexcept;

True if the file is a symbolic link. I haven’t tried to support the complicated symlink-like concepts on NTFS; this simply always returns false on Windows.

Path Path::resolve() const;

This resolves a relative path to an absolute one. If the path is empty, this will return the current directory. On Unix, if the path is absolute, it will be returned unchanged, but on Windows any short filenames in the original path will be resolved to long names even if the path was absolute.

Path Path::resolve_symlink() const;

Returns the file pointed to by a symlink. This will just return the original path unchanged if it is not a symlink (on Windows this will always happen).

uint64_t Path::size(flag_type flags = no_flags) const;

Returns the size of a file in bytes. If the no_follow flag is set, and the path refers to a symlink, this will return the size of the symlink itself instead of the target file. If the recurse flag is set, and the path refers to a directory, this will recursively determine the total size of the directory and everything in it (symlinks below the outermost directory will never be followed regardless of the flags).

File system update functions

These require write access to the file system. Any of these functions can throw std::system_error if the underlying system API fails, in addition to any other documented exceptions.

Update functions that are paired with a corresponding query function are listed separately under the query/update functions category.

void Path::copy_to(const Path& dst, flag_type flags = no_flags) const;

Copy a file from the current path to the destination path. If the overwrite flag is set, this will replace any existing file at the destination. If the recurse flag is set, and the path refers to a directory, it will be copied recursively.

This will throw std::system_error if the source file does not exist, if the source and destination are the same file, if the destination already exists and overwrite was not set, or if the source is a directory and recurse was not set.

void Path::create() const;

If the file does not exist, an empty file with default permissions is created.

void Path::make_directory(flag_type flags = no_flags) const;

Create a directory, with default permissions, at the current path. If the overwrite flag is set, and the path exists but is not a directory, it will be replaced. If the recurse flag is set, any missing parent directories will also be created (like mkdir -p).

This will do nothing if the directory already exists. It will throw std::system_error if the path exists but is not a directory, and the overwrite flag was not set.

void Path::make_symlink(const Path& linkname,
    flag_type flags = no_flags) const;

Create a symlink at linkname, pointing to the current path. The existence or validity of the target is not checked. If the may_copy flag is set, this will copy the file (recursively) if a symlink can’t be created. If the overwrite flag is set, any existing file at linkname will be replaced.

This will do nothing if the link already exists and points to the correct file. It will throw std::system_error if a file already exists at linkname, it is not a symlink already pointing to the target path, and the overwrite flag was not set. On Windows this will always fail unless the may_copy flag is set.

void Path::move_to(const Path& dst, flag_type flags = no_flags) const;

Move a file from the current path to the destination path. If the may_copy flag is set, this will attempt to copy the file and then delete the original if the move fails (this may leave two copies of a file if the operation fails partway through). If the overwrite flag is set, this will replace any existing file at the destination.

This will do nothing if the source and destination are the same path. It will throw std::system_error if the source file does not exist, or if the destination already exists and overwrite was not set.

void Path::remove(flag_type flags = no_flags) const;

Delete a file. This will do nothing if the file does not exist. If the recurse flag is set, this will behave like rm -rf and delete directories recursively.

This will throw std::system_error if the path refers to a non-empty directory and the recurse flag is not used, if the caller does not have permission to remove the file, or in some circumstances, if the file is in use by another process.

File system query/update functions

Path::time_point Path::access_time(flag_type flags = no_flags) const noexcept;
Path::time_point Path::create_time(flag_type flags = no_flags) const noexcept;
Path::time_point Path::modify_time(flag_type flags = no_flags) const noexcept;
Path::time_point Path::status_time(flag_type flags = no_flags) const noexcept;
void Path::set_access_time(flag_type flags = no_flags) const;
void Path::set_access_time(Path::time_point t,
    flag_type flags = no_flags) const;
void Path::set_create_time(flag_type flags = no_flags) const;
void Path::set_create_time(Path::time_point t,
    flag_type flags = no_flags) const;
void Path::set_modify_time(flag_type flags = no_flags) const;
void Path::set_modify_time(Path::time_point t,
    flag_type flags = no_flags) const;

Query or modify the file’s time metadata. The four possible time properties are detailed below, along with which operations are possible on each operating system. The query functions will return the epoch if the file does not exist or is not accessible. If the no_follow flag is set, and the path refers to a symlink, these will operate (if possible) on the symlink rather than the target file.

Property     Interpretation              Posix  Linux          Mac         Windows
-----------  --------------------------  -----  -------------  ----------  -------------
Access time  File was last read          atime  Read/write     Read/write  Read/write
Create time  File was created            N/A    Not supported  Read only   Read/write
Modify time  File was last modified      mtime  Read/write     Read/write  Read/write
Status time  Metadata was last modified  ctime  Read only      Read only   Not supported

The update functions that do not take a time argument will set it to the current time. If the no_follow flag is set, and the path refers to a symlink, these will operate (if possible) on the symlink rather than the target file.

The update functions will throw std::system_error if the file does not exist, if the calling process does not have permission to modify the requested time field, or if that field is not defined, or not modifiable, on this operating system.

No known operating system supports explicitly setting the file metadata modification time. The set_status_time() functions exist but will always throw std::system_error.

I/O functions

bool Path::is_binary(flag_type flags = no_flags) const;
bool Path::is_text(flag_type flags = no_flags) const;

Try to determine whether this is a text or binary file. This is a best guess based on reading a limited amount of data from the file, and should not be relied on to be guaranteed correct. It does not attempt to detect UTF-16, UTF-32, or any other wide encoding; only 8-bit text is checked for. Both functions will return false if the file is empty, does not exist, or is not a regular file.

Symlinks are followed by default. If the no_follow flag is set, symlinks will always return false for both functions.
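
The text does not say which heuristic is used. Purely as an illustration of a "best guess from a limited sample" approach, the sketch below classifies a byte buffer as binary if it contains control characters that rarely occur in text. The names and the exact rule are assumptions, not the library's behaviour.

```cpp
#include <cassert>
#include <string_view>

enum class content_guess { unknown, text, binary };

// Hypothetical heuristic applied to the first bytes read from a file.
inline content_guess guess_content(std::string_view sample) noexcept {
    if (sample.empty())
        return content_guess::unknown;   // empty input is neither text nor binary
    for (char c : sample) {
        const auto u = static_cast<unsigned char>(c);
        const bool text_control =
            u == '\t' || u == '\n' || u == '\r' || u == '\f' || u == 0x1b;
        if (u == 0x7f || (u < 0x20 && !text_control))
            return content_guess::binary;  // NUL or other unusual control byte
    }
    return content_guess::text;            // includes bytes >= 0x80 (e.g. UTF-8)
}
```

A rule like this misreads UTF-16 text as binary because of its NUL bytes, which is consistent with the limitation stated above.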

std::string Path::load(size_t maxlen = npos,
    flag_type flags = no_flags) const;
void Path::load(std::string& content, size_t maxlen = npos,
    flag_type flags = no_flags) const;

Read the contents of a file into a string. Optionally, a maximum number of bytes can be specified. In the second version, by default this will overwrite the string’s former contents; set the append flag to append instead (the append flag has no effect on the first version). If the may_fail flag is set, this will return an empty string (or append nothing) if the file does not exist or a read error occurs. If the stdio flag is set, this will read from standard input if the path is an empty string or "-".

If the may_fail flag is not set, this will throw std::system_error if the file does not exist or an I/O error occurs.

void Path::save(std::string_view content, flag_type flags = no_flags) const;

Writes the contents of a string to a file. If the append flag is set and the file already exists, the new data will be appended to it; otherwise, if the overwrite flag is set, it will be overwritten. If the stdio flag is set, this will write to standard output if the path is an empty string or "-".

This will throw std::system_error if the file already exists and neither append nor overwrite are set, or if an I/O error occurs. It will throw std::invalid_argument if both append and overwrite are set.
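
The interaction between append and overwrite can be summarised in code. The following is a sketch built on std::ofstream, with the two flags modelled as plain bools; save_sketch() is hypothetical, the stdio flag is not modelled, and the error codes chosen for std::system_error are assumptions.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string_view>
#include <system_error>

// Hypothetical helper mirroring the documented flag rules.
inline void save_sketch(const std::filesystem::path& file, std::string_view content,
        bool append, bool overwrite) {
    if (append && overwrite)
        throw std::invalid_argument("append and overwrite are mutually exclusive");
    std::error_code ec;
    const bool present = std::filesystem::exists(file, ec);
    if (present && !append && !overwrite)
        throw std::system_error(std::make_error_code(std::errc::file_exists),
            file.string());
    // Append keeps existing data; otherwise start from an empty file.
    const auto mode = std::ios::binary | (append ? std::ios::app : std::ios::trunc);
    std::ofstream out(file, mode);
    out.write(content.data(), static_cast<std::streamsize>(content.size()));
    out.flush();
    if (!out)
        throw std::system_error(std::make_error_code(std::errc::io_error),
            file.string());
}
```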

Process state functions

void Path::change_directory() const;
static Path Path::current_directory();

Query or set the calling process’s current working directory. Remember that the CWD concept is process-global, so neither of these functions is thread safe. These will throw std::system_error if the underlying system API fails.
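
One common way to limit the damage from a process-global CWD is a scoped guard that restores the previous directory on exit. The sketch below uses std::filesystem::current_path() rather than the Crow functions, and scoped_cwd is a hypothetical class. Its mutex only serialises code that goes through the guard; other threads that resolve relative paths in the meantime are still affected.

```cpp
#include <cassert>
#include <filesystem>
#include <mutex>
#include <system_error>

// Hypothetical RAII guard: change directory now, restore it on destruction.
class scoped_cwd {
public:
    explicit scoped_cwd(const std::filesystem::path& dir):
        lock_(cwd_mutex()), saved_(std::filesystem::current_path()) {
        std::filesystem::current_path(dir);  // throws filesystem_error on failure
    }
    ~scoped_cwd() {
        std::error_code ec;
        std::filesystem::current_path(saved_, ec);  // best effort, never throws
    }
    scoped_cwd(const scoped_cwd&) = delete;
    scoped_cwd& operator=(const scoped_cwd&) = delete;
private:
    static std::mutex& cwd_mutex() {
        static std::mutex m;
        return m;
    }
    std::unique_lock<std::mutex> lock_;   // held for the guard's lifetime
    std::filesystem::path saved_;         // directory to return to
};
```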