File system abstraction with implementations for GCP GCS, AWS S3, Azure, SMB, HTTP, and Local file systems. Provides atomic primitives enabling multiple readers and writers.
- LocalFileSystem employs content hashing to approximate GCS Object Versioning.
- GoogleCloudFileSystem provides consistent parallel access patterns.
- S3FileSystem provides basic file system primitives.
- SMBFileSystem provides basic file system primitives.
- HTTPFileSystem provides a basic HTTP file system.
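As a rough illustration of how content hashing can stand in for object versions on a local file system, the sketch below derives a version token from a file's contents. The `contentVersion` helper and the FNV-1a hash (computed over UTF-16 code units) are assumptions chosen to keep the example dependency-free; the hash that LocalFileSystem actually uses is not stated here.

```typescript
// Hypothetical helper (not part of @wholebuzz/fs): derive a version token from contents.
// FNV-1a (32-bit) is used only to keep the sketch free of dependencies.
function contentVersion(content: string): string {
  let hash = 0x811c9dc5 // FNV offset basis
  for (let i = 0; i < content.length; i++) {
    hash ^= content.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0 // multiply by the FNV prime, keep unsigned
  }
  return hash.toString(16)
}

// Identical contents give the same token; any edit changes the token.
const before = contentVersion('{"foo":"bar"}')
const unchanged = contentVersion('{"foo":"bar"}')
const after = contentVersion('{"foo":"baz"}')
```

A writer that remembers `before` can detect that the file changed underneath it by comparing against a freshly computed token, which is the role an object generation plays on GCS.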
Provides file format implementations for:
- Lines
- CSV (via csv)
- JSON, ND-JSON / JSONL (via JSONStream and ndjson)
- Parquet, including streamingParquetcodec and parquetjs
- TFRecord, including tfrecord-stream
Additionally provides sharding & merging utilities.
The FileSystem implementations require peer dependencies:
- AnyFileSystem: None. URL resolution as a FileSystem. Files have URLs and HTTP is a file system.
- AzureBlobStorageFileSystem: @azure/storage-blob and @azure/identity
- AzureFileShareFileSystem: @azure/storage-file-share
- GoogleCloudFileSystem: @google-cloud/storage
- HTTPFileSystem: axios
- LocalFileSystem: fs-ext, glob, and glob-stream
- S3FileSystem: aws-sdk, s3-stream-upload, and athena-express
- SMBFileSystem: @marsaud/smb2
Built with the tree-stream primitives ReadableStreamTree and WritableStreamTree.
The project started to support @wholebuzz/archive, a terabyte-scale archive for GCS. The focus has since expanded to include powering dbcp and @wholebuzz/mapreduce with a collection of file system implementations under a common interface. The atomic primitives are only available for Google Cloud Storage and local.
```typescript
import { AnyFileSystem } from '@wholebuzz/fs/lib/fs'
import { GoogleCloudFileSystem } from '@wholebuzz/fs/lib/gcp'
import { HTTPFileSystem } from '@wholebuzz/fs/lib/http'
import { LocalFileSystem } from '@wholebuzz/fs/lib/local'
import { S3FileSystem } from '@wholebuzz/fs/lib/s3'
import { readJSON, writeJSON } from '@wholebuzz/fs/lib/json'

// One HTTPFileSystem instance serves both http:// and https:// URLs.
const httpFileSystem = new HTTPFileSystem()

// Map URL prefixes to file system implementations; the empty prefix is the local fallback.
const fs = new AnyFileSystem([
  { urlPrefix: 'gs://', fs: new GoogleCloudFileSystem() },
  { urlPrefix: 's3://', fs: new S3FileSystem() },
  { urlPrefix: 'http://', fs: httpFileSystem },
  { urlPrefix: 'https://', fs: httpFileSystem },
  { urlPrefix: '', fs: new LocalFileSystem() },
])

// Write a JSON object to S3, then read it back.
await writeJSON(fs, 's3://bucket/file', { foo: 'bar' })
const foobar = await readJSON(fs, 's3://bucket/file')
```
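The prefix table passed to AnyFileSystem suggests a first-match routing rule. The sketch below shows one way such routing could work; `Route` and `resolveRoute` are hypothetical names, the backends are plain labels rather than FileSystem instances, and this is not the library's implementation.

```typescript
// Hypothetical routing entry, mirroring the { urlPrefix, fs } shape used above.
interface Route<T> {
  urlPrefix: string
  fs: T
}

// Returns the backend of the first route whose prefix matches the URL.
function resolveRoute<T>(routes: Route<T>[], url: string): T {
  for (const route of routes) {
    if (url.startsWith(route.urlPrefix)) return route.fs
  }
  throw new Error(`No file system registered for ${url}`)
}

// Backends are labels here; real entries would be FileSystem instances.
const routes: Route<string>[] = [
  { urlPrefix: 'gs://', fs: 'gcs' },
  { urlPrefix: 's3://', fs: 's3' },
  { urlPrefix: 'https://', fs: 'http' },
  { urlPrefix: '', fs: 'local' }, // the empty prefix matches everything, so it goes last
]

const remote = resolveRoute(routes, 's3://bucket/file')
const local = resolveRoute(routes, '/tmp/file.json')
```

Under this assumed rule the order of entries matters: placing the empty prefix first would send every URL to the local backend.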
```
node lib/cli.js ls .
node lib/cli.js --help
```

- appendToFile
- copyFile
- createFile
- ensureDirectory
- fileExists
- getFileStatus
- moveFile
- openReadableFile
- openWritableFile
- queueRemoveFile
- readDirectory
- readDirectoryStream
- removeDirectory
- removeFile
- replaceFile
+ new FileSystem(): FileSystem
Returns: FileSystem
▸ Abstract appendToFile(urlText: string, writeCallback: (stream: WritableStreamTree) => Promise<boolean>, createCallback?: (stream: WritableStreamTree) => Promise<boolean>, createOptions?: CreateOptions, appendOptions?: AppendOptions): Promise<null | FileStatus>
Appends to the file, safely. Either writeCallback or createCallback is called.
For simple appends, the same callback can be supplied for both writeCallback and createCallback.
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the file to append to. |
| writeCallback | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for appending to the file. |
| createCallback? | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for initializing the file, if necessary. |
| createOptions? | CreateOptions | Initial metadata for initializing the file, if necessary. |
| appendOptions? | AppendOptions | - |
Returns: Promise<null | FileStatus>
Defined in: src/fs.ts:209
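The append-or-create contract described above can be modeled with an in-memory map. This is a synchronous sketch under stated assumptions: `store`, `appendOrCreate`, and the `mem://` URL are hypothetical, and the real method works on WritableStreamTree streams and returns a Promise.

```typescript
// In-memory stand-in for a file store; keys are URLs, values are file contents.
const store = new Map<string, string>()

// Hypothetical synchronous analogue of appendToFile: exactly one callback runs.
function appendOrCreate(
  url: string,
  write: (existing: string) => string,
  create: () => string = () => write('') // default: reuse the append logic on an empty file
): 'appended' | 'created' {
  const existing = store.get(url)
  if (existing === undefined) {
    store.set(url, create())
    return 'created'
  }
  store.set(url, write(existing))
  return 'appended'
}

const addLine = (existing: string) => existing + 'entry\n'
const first = appendOrCreate('mem://log', addLine) // file is missing, so it is created
const second = appendOrCreate('mem://log', addLine) // file exists, so it is appended to
```

Defaulting `create` to the append logic mirrors the note that simple appends can pass the same callback for both roles.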
▸ Abstract copyFile(sourceUrlText: string, destUrlText: string): Promise<boolean>
Copies the file.
| Name | Type | Description |
|---|---|---|
| sourceUrlText | string | The URL of the source file to copy. |
| destUrlText | string | The destination URL to copy the file to. |
Returns: Promise<boolean>
Defined in: src/fs.ts:178
▸ Abstract createFile(urlText: string, createCallback?: (stream: WritableStreamTree) => Promise<boolean>, options?: CreateOptions): Promise<boolean>
Creates file, failing if the file already exists.
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the file to create. |
| createCallback? | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for initializing the file. |
| options? | CreateOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:155
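A create that fails when the file already exists can serve as a simple mutual-exclusion primitive between writers. The sketch below models that behavior in memory; `createExclusive` and the `mem://` URL are hypothetical, and reporting failure as `false` (rather than a rejected Promise) is an assumption.

```typescript
// In-memory stand-in for a file store.
const files = new Map<string, string>()

// Hypothetical analogue of createFile: refuses to overwrite an existing entry.
function createExclusive(url: string, content: string): boolean {
  if (files.has(url)) return false
  files.set(url, content)
  return true
}

// Two writers race to claim the same lock file; only the first one succeeds.
const writerA = createExclusive('mem://job.lock', 'owner=A')
const writerB = createExclusive('mem://job.lock', 'owner=B')
```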
▸ Abstract ensureDirectory(urlText: string, options?: EnsureDirectoryOptions): Promise<boolean>
Ensures the directory exists.
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the directory. |
| options? | EnsureDirectoryOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:109
▸ Abstract fileExists(urlText: string): Promise<boolean>
Returns true if the file exists.
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the file to check for existence. |
Returns: Promise<boolean>
Defined in: src/fs.ts:121
▸ Abstract getFileStatus(urlText: string, options?: GetFileStatusOptions): Promise<FileStatus>
Determines the file status. The file version is used to implement atomic mutations.
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the file to retrieve the status for. |
| options? | GetFileStatusOptions | - |
Returns: Promise<FileStatus>
Defined in: src/fs.ts:127
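Using a file version for atomic mutations is a form of optimistic concurrency: read the version, compute the new contents, and write only if the version is unchanged. The sketch below illustrates the idea in memory. `VersionedFile`, `replaceIfVersion`, `increment`, and the integer version are assumptions for illustration, not the library's types.

```typescript
interface VersionedFile {
  content: string
  version: number // stand-in for a GCS generation number or a local content hash
}

const objects = new Map<string, VersionedFile>()
objects.set('mem://counter', { content: '0', version: 1 })

// Hypothetical compare-and-swap: writes only if the caller saw the latest version.
function replaceIfVersion(url: string, expected: number, content: string): boolean {
  const current = objects.get(url)
  if (!current || current.version !== expected) return false
  objects.set(url, { content, version: current.version + 1 })
  return true
}

// Read-modify-write loop that retries when another writer got there first.
function increment(url: string): number {
  for (;;) {
    const seen = objects.get(url)
    if (!seen) throw new Error(`Missing ${url}`)
    const next = Number(seen.content) + 1
    if (replaceIfVersion(url, seen.version, String(next))) return next
  }
}

const staleVersion = objects.get('mem://counter')!.version
const value = increment('mem://counter')
// A writer still holding the old version is rejected instead of clobbering the update.
const staleWrite = replaceIfVersion('mem://counter', staleVersion, '100')
```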
▸ Abstract moveFile(sourceUrlText: string, destUrlText: string): Promise<boolean>
Moves the file.
| Name | Type | Description |
|---|---|---|
| sourceUrlText | string | The URL of the source file to move. |
| destUrlText | string | The destination URL to move the file to. |
Returns: Promise<boolean>
Defined in: src/fs.ts:185
▸ Abstract openReadableFile(url: string, options?: OpenReadableFileOptions): Promise<ReadableStreamTree>
Opens a file for reading.
If the optional version is supplied, fails if the version doesn't match (GCS URLs).
| Name | Type | Description |
|---|---|---|
| url | string | The URL of the file to read from. |
| options? | OpenReadableFileOptions | - |
Returns: Promise<ReadableStreamTree>
Defined in: src/fs.ts:134
▸ Abstract openWritableFile(url: string, options?: OpenWritableFileOptions): Promise<WritableStreamTree>
Opens a file for writing.
If the optional version is supplied, fails if the version doesn't match (GCS URLs).
| Name | Type | Description |
|---|---|---|
| url | string | The URL of the file to write to. |
| options? | OpenWritableFileOptions | - |
Returns: Promise<WritableStreamTree>
Defined in: src/fs.ts:144
▸ Abstract queueRemoveFile(urlText: string): Promise<boolean>
Queues deletion, e.g. after DaysSinceCustomTime.
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the file to remove. |
Returns: Promise<boolean>
Defined in: src/fs.ts:171
▸ Abstract readDirectory(urlText: string, options?: ReadDirectoryOptions): Promise<DirectoryEntry[]>
Returns the URLs of the files in a directory.
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the directory to list files in. |
| options? | ReadDirectoryOptions | - |
Returns: Promise<DirectoryEntry[]>
Defined in: src/fs.ts:94
▸ Abstract readDirectoryStream(urlText: string, options?: ReadDirectoryOptions): Promise<ReadableStreamTree>
Returns a stream of the URLs of the files in a directory.
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the directory to list files in. |
| options? | ReadDirectoryOptions | - |
Returns: Promise<ReadableStreamTree>
Defined in: src/fs.ts:100
▸ Abstract removeDirectory(urlText: string, options?: RemoveDirectoryOptions): Promise<boolean>
Removes the directory.
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the directory. |
| options? | RemoveDirectoryOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:115
▸ Abstract removeFile(urlText: string): Promise<boolean>
Deletes the file.
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the file to remove. |
Returns: Promise<boolean>
Defined in: src/fs.ts:165
▸ Abstract replaceFile(urlText: string, writeCallback: (stream: WritableStreamTree) => Promise<boolean>, options?: ReplaceFileOptions): Promise<boolean>
Replaces the file, failing if the file version doesn't match.
| Name | Type | Description |
|---|---|---|
| urlText | string | The URL of the file to replace. |
| writeCallback | (stream: WritableStreamTree) => Promise<boolean> | Stream callback for replacing the file. |
| options? | ReplaceFileOptions | - |
Returns: Promise<boolean>
Defined in: src/fs.ts:194

The json module (imported from '@wholebuzz/fs/lib/json') provides the following functions:
- newJSONLinesFormatter
- newJSONLinesParser
- parseJSON
- parseJSONLines
- pipeJSONFormatter
- pipeJSONLinesFormatter
- pipeJSONLinesParser
- pipeJSONParser
- readJSON
- readJSONHashed
- readJSONLines
- serializeJSON
- serializeJSONLines
- writeJSON
- writeJSONLines
- writeShardedJSONLines
• Const JSONStream: any
Defined in: src/json.ts:11
▸ Const newJSONLinesFormatter(): Transform
Returns: Transform
Defined in: src/json.ts:146
▸ Const newJSONLinesParser(): ThroughStream
Returns: ThroughStream
Defined in: src/json.ts:147
▸ parseJSON(stream: ReadableStreamTree): Promise<unknown>
Parses a JSON object from stream. Used to implement readJSON.
| Name | Type | Description |
|---|---|---|
| stream | ReadableStreamTree | The stream to read a JSON object from. |
Returns: Promise<unknown>
Defined in: src/json.ts:72
▸ parseJSONLines(stream: ReadableStreamTree): Promise<unknown[]>
Parses JSON Lines from stream into an array. Used to implement readJSONLines.
| Name | Type | Description |
|---|---|---|
| stream | ReadableStreamTree | The stream to read JSON Lines from. |
Returns: Promise<unknown[]>
Defined in: src/json.ts:80
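For readers unfamiliar with the format, JSON Lines (ND-JSON) stores one JSON document per line. The sketch below shows the format on plain strings; `toJSONLines` and `fromJSONLines` are hypothetical helpers, whereas the library's functions operate on streams.

```typescript
// Serializes each array element as one JSON document per line.
function toJSONLines(rows: unknown[]): string {
  return rows.map((row) => JSON.stringify(row) + '\n').join('')
}

// Parses newline-delimited JSON, skipping blank lines.
function fromJSONLines(text: string): unknown[] {
  return text
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line))
}

const text = toJSONLines([{ id: 1 }, { id: 2 }])
const rows = fromJSONLines(text)
```

Because each line is an independent document, a file can be appended to, split, or merged line by line without re-parsing the whole file.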
▸ pipeJSONFormatter(stream: WritableStreamTree, isArray: boolean): WritableStreamTree
Create JSON formatter stream.
| Name | Type | Description |
|---|---|---|
| stream | WritableStreamTree | - |
| isArray | boolean | Accept array objects or property tuples. |
Returns: WritableStreamTree
Defined in: src/json.ts:127
▸ pipeJSONLinesFormatter(stream: WritableStreamTree): WritableStreamTree
Create JSON-lines formatter stream.
| Name | Type |
|---|---|
| stream | WritableStreamTree |
Returns: WritableStreamTree
Defined in: src/json.ts:142
▸ pipeJSONLinesParser(stream: ReadableStreamTree): ReadableStreamTree
Create JSON-lines parser stream.
| Name | Type |
|---|---|
| stream | ReadableStreamTree |
Returns: ReadableStreamTree
Defined in: src/json.ts:119
▸ pipeJSONParser(stream: ReadableStreamTree, isArray: boolean): ReadableStreamTree
Create JSON parser stream.
| Name | Type |
|---|---|
| stream | ReadableStreamTree |
| isArray | boolean |
Returns: ReadableStreamTree
Defined in: src/json.ts:110
▸ readJSON(fileSystem: FileSystem, url: string): Promise<unknown>
Reads a serialized JSON object or array from a file.
| Name | Type | Description |
|---|---|---|
| fileSystem | FileSystem | - |
| url | string | The URL of the file to parse a JSON object or array from. |
Returns: Promise<unknown>
Defined in: src/json.ts:17
▸ readJSONHashed(fileSystem: FileSystem, url: string): Promise<[unknown, null | string]>
Reads a serialized JSON object from a file, and also hashes the file.
| Name | Type | Description |
|---|---|---|
| fileSystem | FileSystem | - |
| url | string | The URL of the file to parse a JSON object from. |
Returns: Promise<[unknown, null | string]>
Defined in: src/json.ts:25
▸ readJSONLines(fileSystem: FileSystem, url: string): Promise<unknown[]>
Reads a serialized JSON-lines array from a file.
| Name | Type | Description |
|---|---|---|
| fileSystem | FileSystem | - |
| url | string | The URL of the file to parse a JSON-lines array from. |
Returns: Promise<unknown[]>
Defined in: src/json.ts:35
▸ serializeJSON(stream: WritableStreamTree, obj: object | any[]): Promise<boolean>
Serializes a JSON object or array to stream. Used to implement writeJSON.
| Name | Type | Description |
|---|---|---|
| stream | WritableStreamTree | The stream to write a JSON object to. |
| obj | object \| any[] | - |
Returns: Promise<boolean>
Defined in: src/json.ts:88
▸ serializeJSONLines(stream: WritableStreamTree, obj: any[]): Promise<boolean>
Serializes an array to stream as JSON Lines. Used to implement writeJSONLines.
| Name | Type | Description |
|---|---|---|
| stream | WritableStreamTree | The stream to write JSON Lines to. |
| obj | any[] | - |
Returns: Promise<boolean>
Defined in: src/json.ts:103
▸ writeJSON(fileSystem: FileSystem, url: string, value: object | any[]): Promise<boolean>
Serializes object or array to a JSON file.
| Name | Type | Description |
|---|---|---|
| fileSystem | FileSystem | - |
| url | string | The URL of the file to serialize a JSON object or array to. |
| value | object \| any[] | The object or array to serialize. |
Returns: Promise<boolean>
Defined in: src/json.ts:44
▸ writeJSONLines(fileSystem: FileSystem, url: string, obj: object[]): Promise<boolean>
Serializes array to a JSON Lines file.
| Name | Type | Description |
|---|---|---|
| fileSystem | FileSystem | - |
| url | string | The URL of the file to serialize a JSON array to. |
| obj | object[] | - |
Returns: Promise<boolean>
Defined in: src/json.ts:53
▸ writeShardedJSONLines(fileSystem: FileSystem, url: string, obj: object[], shards: number, shardFunction?: (x: object, modulus: number) => number): Promise<boolean>
| Name | Type |
|---|---|
| fileSystem | FileSystem |
| url | string |
| obj | object[] |
| shards | number |
| shardFunction | (x: object, modulus: number) => number |
Returns: Promise<boolean>
Defined in: src/json.ts:57
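The `shardFunction` signature maps a record and a modulus to a shard index. The sketch below shows one possible shard function and how records might be grouped before each group is written out. `byId`, `partition`, and the `id` field are assumptions for illustration; how the library names or writes the shard files is not covered here.

```typescript
type ShardFunction = (x: object, modulus: number) => number

// Hypothetical shard function: hashes the record's `id` field (an assumed key).
const byId: ShardFunction = (x, modulus) => {
  const id = String((x as { id?: unknown }).id)
  let hash = 0
  for (let i = 0; i < id.length; i++) {
    hash = (Math.imul(hash, 31) + id.charCodeAt(i)) >>> 0
  }
  return hash % modulus
}

// Groups records into `shards` buckets; each bucket would become one output file.
function partition(rows: object[], shards: number, shardFunction: ShardFunction): object[][] {
  const buckets: object[][] = Array.from({ length: shards }, () => [])
  for (const row of rows) buckets[shardFunction(row, shards)].push(row)
  return buckets
}

const buckets = partition([{ id: 'a' }, { id: 'b' }, { id: 'c' }, { id: 'a' }], 2, byId)
```

A deterministic shard function keeps records with the same key in the same shard, which is what makes later per-shard merging possible.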