vfs
Package vfs provides a pluggable, extensible, and opinionated set of file system
functionality for Go across a number of file system types such as os, Azure, S3, GCS
and SFTP.
Philosophy
When building our platform, initially we wrote a library that was something to
the effect of
if config.DISK == "S3" {
} else if config.DISK == "mock" {
} else {
}
Not only was this ugly, but the behaviors of each "file system" were
different, and we had to constantly alter the file locations and pass a bucket string (even
if the file system didn't know what a bucket was).
We found a handful of third-party libraries that were interesting but none of
them had everything we needed/wanted. Of particular inspiration was
https://github.com/spf13/afero in its composition of the super-powerful stdlib
io.* interfaces. Unfortunately, it didn't support Google Cloud Storage and there
was still a lot of passing around of strings and structs. Few, if any, of the
vfs-like libraries provided interfaces to easily and confidently create new
file system backends.
What we needed/wanted was the following (and more):
- self-contained set of structs that could be passed around like a file/dir handle
- the struct would represent an existing or nonexistent file/dir
- provide common (and only common) functionality across all file systems so that after initialization, we don't care
what the underlying file system is and can therefore write our code agnostically/portably
- use io.* interfaces such as io.Reader and io.Writer without needing to call a separate function
- extensibility to easily add other needed file systems like Microsoft Azure Cloud File Storage
- prefer native atomic functions when possible (i.e., moving from S3 to S3 would use the native move API call rather than
copy-delete)
- a uniform way of addressing files regardless of file system. This is why we use complete URIs in vfssimple
- fmt.Stringer interface so that the file struct passed to a log message (or other Stringer use) would show the URI
- mockable file system
- pluggability so that third-party implementations of our interfaces could be used
Install
Pre 1.17:
go get -u github.com/hibrid/vfs/v6
Post 1.17:
go install github.com/hibrid/vfs/v6
Upgrading
Upgrading from v5 to v6
With v6.0.0, the sftp.Options struct changed to accept an array of Key Exchange algorithms rather than a string. To update, change the syntax of the auth commands.
"keyExchanges":"diffie-hellman-group-a256"
becomes
"keyExchanges":["diffie-hellman-group-a256"]
Usage
We provide vfssimple as a basic way of initializing file system backends (see each
implementation's docs about authentication). vfssimple pulls in every c2fo/vfs
backend. If you need to reduce the backend requirements (and app memory
footprint) or add a third-party backend, you'll need to implement your own
"factory". See backend doc for more info.
You can then use those file systems to initialize locations which you'll be
referencing frequently, or initialize files directly:
osFile, err := vfssimple.NewFile("file:///path/to/file.txt")
s3File, err := vfssimple.NewFile("s3://bucket/prefix/file.txt")
osLocation, err := vfssimple.NewLocation("file:///tmp/")
s3Location, err := vfssimple.NewLocation("s3://bucket/")
osTmpFile, err := osLocation.NewFile("anotherFile.txt")
You can perform a number of actions without any consideration for the system's API or implementation details:
osFileExists, err := osFile.Exists()
s3FileExists, err := s3File.Exists()
err = osFile.CopyToFile(s3File)
s3FileExists, err = s3File.Exists()
movedOsFile, err := osFile.MoveToLocation(osLocation)
osFileExists, err = osFile.Exists()
movedOsFileExists, err := movedOsFile.Exists()
s3FileUri := s3File.URI()
s3FileName := s3File.Name()
s3FilePath := s3File.Path()
File's io.* interfaces may be used directly:
reader := strings.NewReader("Clear is better than clever")
gsFile, err := vfssimple.NewFile("gs://somebucket/path/to/file.txt")
byteCount, err := io.Copy(gsFile, reader)
err = gsFile.Close()
Note: io.Copy() doesn't strictly define what happens if a reader is empty. This is complicated because io.Copy
first tries to delegate the actual copying, in this order:
1. if the io.Reader also implements io.WriterTo, WriteTo() will do the copy
2. if the io.Writer also implements io.ReaderFrom, ReadFrom() will do the copy
3. finally, if neither 1 nor 2 applies, io.Copy will do its own buffered copy
In case 3, and in most implementations of cases 1 and 2, if the reader is empty, Write() never gets called. What that means for
vfs is that there is no way for us to ensure that an empty file does or doesn't get written on an io.Copy(). For instance,
OS always creates a file, regardless of whether Write() is called, whereas S3 must Write() and Close().
As such, vfs cannot guarantee copy behavior except in our own CopyToFile, MoveToFile, CopyToLocation, and MoveToLocation
functions. If you need to ensure a file gets copied/moved with io.Copy(), you must do so yourself OR use vfs's utils.TouchCopy.
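The empty-reader behavior described in this note can be observed with the standard library alone. The following is a minimal sketch, not vfs code: `countingWriter`, `plainReader`, and `writeCalls` are hypothetical names introduced for illustration, and the wrapper type exists only to force io.Copy onto its own buffered copy path (case 3).

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// countingWriter records how many times Write is called.
// It deliberately does not implement io.ReaderFrom.
type countingWriter struct {
	calls int
}

func (w *countingWriter) Write(p []byte) (int, error) {
	w.calls++
	return len(p), nil
}

// plainReader hides any io.WriterTo on the wrapped reader so that io.Copy
// falls through to its own buffered copy.
type plainReader struct{ r io.Reader }

func (p plainReader) Read(b []byte) (int, error) { return p.r.Read(b) }

// writeCalls copies src into a countingWriter and reports how many times
// Write was invoked during the copy.
func writeCalls(src string) (int, error) {
	w := &countingWriter{}
	_, err := io.Copy(w, plainReader{strings.NewReader(src)})
	return w.calls, err
}

func main() {
	for _, src := range []string{"", "Clear is better than clever"} {
		n, err := writeCalls(src)
		fmt.Printf("source %q: Write called %d time(s), err=%v\n", src, n, err)
	}
}
```

With an empty source, Write is never invoked, so a backend that only creates the file on Write() has nothing to act on.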
Third-party Backends
Feel free to send a pull request if you want to add your backend to the list.
Ideas
Things to add:
- Provide better List() functionality with more abstracted filtering and paging (iterator?) Return File structs vs URIs?
- Add better/any context.Context() support
Contributors
Brought to you by the Enterprise Pipeline team at C2FO:
https://github.com/c2fo/
Contributing
1. Fork it (<https://github.com/c2fo/vfs/fork>)
2. Create your feature branch (`git checkout -b feature/fooBar`)
3. Commit your changes (`git commit -am 'Add some fooBar'`)
4. Push to the branch (`git push origin feature/fooBar`)
5. Create a new Pull Request
License
Distributed under the MIT license. See http://github.com/c2fo/vfs/License.md
for more information.
Definitions
absolute path
- A path is said to be absolute if it provides the entire context
needed to find a file, including the file system root. An absolute path must
begin with a slash and may include . and .. directories.
file path
- A file path ends with a filename and therefore may not end with a slash. It may be relative or absolute.
location path
- A location/directory path must end with a slash. It may be relative or absolute.
relative path
- A relative path is a way to locate a directory or file relative to
another directory. A relative path may not begin with a slash but may include .
and .. directories.
URI
- A Uniform Resource Identifier (URI) is a string of characters that
unambiguously identifies a particular resource. To guarantee uniformity, all
URIs follow a predefined set of syntax rules, but also maintain extensibility
through a separately defined hierarchical naming scheme (e.g. http://).
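The path definitions above reduce to two checks: a leading slash (absolute vs. relative) and a trailing slash (location vs. file). The following is a minimal sketch of those rules only; `isAbsolute`, `isLocationPath`, and `describe` are hypothetical helpers, not part of the vfs API, and the sketch does not validate empty strings or resolve . and .. segments.

```go
package main

import (
	"fmt"
	"strings"
)

// isAbsolute reports whether p begins with a slash.
func isAbsolute(p string) bool { return strings.HasPrefix(p, "/") }

// isLocationPath reports whether p ends with a slash; by the definitions
// above, anything else is treated as a file path.
func isLocationPath(p string) bool { return strings.HasSuffix(p, "/") }

// describe labels a path using the two checks above.
func describe(p string) string {
	kind := "relative"
	if isAbsolute(p) {
		kind = "absolute"
	}
	target := "file"
	if isLocationPath(p) {
		target = "location"
	}
	return kind + " " + target + " path"
}

func main() {
	for _, p := range []string{"/tmp/", "/tmp/file.txt", "../other/", "sub/file.txt"} {
		fmt.Printf("%q is a %s\n", p, describe(p))
	}
}
```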
Interfaces
type File
type File interface {
io.Closer
io.Reader
io.Seeker
io.Writer
fmt.Stringer
Exists() (bool, error)
Location() Location
CopyToLocation(location Location) (File, error)
CopyToFile(file File) error
MoveToLocation(location Location) (File, error)
MoveToFile(file File) error
Delete() error
LastModified() (*time.Time, error)
Size() (uint64, error)
Path() string
Name() string
Touch() error
URI() string
}
File represents a file on a file system. A File may or may not actually exist on
the file system.
type FileSystem
type FileSystem interface {
NewFile(volume string, absFilePath string) (File, error)
NewLocation(volume string, absLocPath string) (Location, error)
Name() string
Scheme() string
Retry() Retry
}
FileSystem represents a file system with any authentication accounted for.
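Because each FileSystem reports its own Scheme(), a custom "factory" (an alternative to vfssimple that pulls in only the backends you need) can be as small as a map from URI scheme to backend. The following is a minimal sketch under stated assumptions, not the vfssimple implementation: the `FileSystem` interface here is a local, trimmed-down stand-in for the interface above, and `fakeFS`, `registry`, `Register`, and `Lookup` are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// FileSystem is a local stand-in for vfs.FileSystem (assumption: only the
// two methods needed for this illustration are included).
type FileSystem interface {
	Name() string
	Scheme() string
}

// fakeFS is a hypothetical backend used only to make the sketch runnable.
type fakeFS struct{ name, scheme string }

func (f fakeFS) Name() string   { return f.name }
func (f fakeFS) Scheme() string { return f.scheme }

// registry maps a URI scheme (e.g. "s3") to the backend that handles it.
var registry = map[string]FileSystem{}

// Register adds a backend under the scheme it reports.
func Register(fs FileSystem) { registry[fs.Scheme()] = fs }

// Lookup parses a URI and returns the backend registered for its scheme.
func Lookup(uri string) (FileSystem, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return nil, err
	}
	fs, ok := registry[u.Scheme]
	if !ok {
		return nil, fmt.Errorf("no backend registered for scheme %q", u.Scheme)
	}
	return fs, nil
}

func main() {
	// Register only the backends this application needs.
	Register(fakeFS{name: "os", scheme: "file"})
	Register(fakeFS{name: "s3", scheme: "s3"})

	for _, uri := range []string{"s3://bucket/prefix/file.txt", "gs://bucket/file.txt"} {
		fs, err := Lookup(uri)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(uri, "is handled by", fs.Name())
	}
}
```

In a real factory, the registered values would be the vfs backends themselves, and Lookup would go on to call NewFile or NewLocation with the volume and path taken from the parsed URI.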
type Location
type Location interface {
fmt.Stringer
List() ([]string, error)
ListByPrefix(prefix string) ([]string, error)
ListByRegex(regex *regexp.Regexp) ([]string, error)
Volume() string
Path() string
Exists() (bool, error)
NewLocation(relLocPath string) (Location, error)
ChangeDir(relLocPath string) error
FileSystem() FileSystem
NewFile(relFilePath string) (File, error)
DeleteFile(relFilePath string) error
URI() string
}
Location represents a file system path which serves as a start point for
directory-like functionality. A location may or may not actually exist on the
file system.
type Options
type Options interface{}
Options are structs that contain various options specific to the file system.
type Retry
type Retry func(wrapped func() error) error
Retry is a function that can be used to wrap any operation into a definable
retry operation. The wrapped argument is called by the underlying VFS
implementation.
Ex:
var retrier Retry = func(wrapped func() error) error {
    var ret error
    // Try up to 5 times, stopping at the first success.
    for i := 0; i < 5; i++ {
        if ret = wrapped(); ret == nil {
            return nil
        }
    }
    // Every attempt failed; return the last error.
    return ret
}
func DefaultRetryer
func DefaultRetryer() Retry
DefaultRetryer returns a no-op retryer which simply calls the wrapped command
without looping.
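To make the difference between a no-op retryer and a looping one concrete, here is a self-contained sketch. It redeclares `Retry` locally with the same signature so that it runs without importing vfs; `noopRetryer` is an assumed approximation of what DefaultRetryer is described as doing, and `maxAttempts` and `flaky` are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
)

// Retry mirrors the vfs Retry signature so the sketch is self-contained.
type Retry func(wrapped func() error) error

// noopRetryer calls wrapped exactly once and returns its result
// (assumption: this matches the described behavior of DefaultRetryer).
func noopRetryer() Retry {
	return func(wrapped func() error) error { return wrapped() }
}

// maxAttempts returns a Retry that calls wrapped up to n times, stopping at
// the first success and otherwise returning the last error.
func maxAttempts(n int) Retry {
	return func(wrapped func() error) error {
		var err error
		for i := 0; i < n; i++ {
			if err = wrapped(); err == nil {
				return nil
			}
		}
		return err
	}
}

// flaky returns an operation that fails the given number of times before
// succeeding, plus a pointer to its call counter.
func flaky(failures int) (func() error, *int) {
	calls := 0
	op := func() error {
		calls++
		if calls <= failures {
			return errors.New("transient failure")
		}
		return nil
	}
	return op, &calls
}

func main() {
	op, calls := flaky(2)
	err := noopRetryer()(op)
	fmt.Printf("no-op retryer: err=%v after %d call(s)\n", err, *calls)

	op, calls = flaky(2)
	err = maxAttempts(3)(op)
	fmt.Printf("up to 3 attempts: err=%v after %d call(s)\n", err, *calls)
}
```

A production retrier would usually also wait between attempts and retry only on errors known to be transient; both are left out here to keep the sketch short.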