module Plasma_client: sig .. end
Client access to the Plasma Filesystem
See also Plasmafs_protocol. It
explains all background concepts of the PlasmaFS protocol.
Functions whose names end in _e return engines. There is always a
"normal", i.e. synchronous, variant that does not return an engine
computing the result, but the result itself. The engines make
it possible to send queries asynchronously. For more information
about engines, see the module Uq_engines of Ocamlnet. It is
generally not possible to use the client in a synchronous way while
an engine is still running.
Plasma_rpcapi_aux,
especially
type plasma_cluster
type plasma_trans
type inode = int64
type errno = [ `eaccess
| `ebadpath
| `econflict
| `ecoord
| `eexist
| `efailed
| `efailedcommit
| `efbig
| `efhier
| `einval
| `eio
| `eisdir
| `elongtrans
| `eloop
| `enametoolong
| `enoent
| `enonode
| `enospc
| `enotdir
| `enotempty
| `enotrans
| `eperm
| `erofs
| `estale
| `etbusy ]
type topology = [ `Chain | `Star ]
See copy_in.
type copy_in_flags = [ `Late_datasync | `No_datasync ]
See copy_in.
type copy_out_flags = [ `No_truncate ]
See copy_out.
exception Plasma_error of errno
exception Cluster_down of string
val open_cluster : string ->
(string * int) list -> Unixqueue.event_system -> plasma_cluster
open_cluster name namenodes: Opens the cluster with these namenodes
(given as (hostname,port) pairs). The client automatically
determines which is the coordinator.
val open_cluster_cc : Plasma_client_config.client_config ->
Unixqueue.event_system -> plasma_cluster
Opens the cluster described by a Plasma_client_config.client_config object, which
can in turn be obtained via Plasma_client_config.get_config.
val event_system : plasma_cluster -> Unixqueue.event_system
val sync : ('a -> 'b Uq_engines.engine) -> 'a -> 'b
val dump_buffers : plasma_cluster -> unit
val close_cluster : plasma_cluster -> unit
val abort_cluster : plasma_cluster -> unit
val cluster_name : plasma_cluster -> string
val cluster_namenodes : plasma_cluster -> (string * int) list
See open_cluster.
val configure_buffer : plasma_cluster -> int -> unit
configure_buffer c n: Configures the client to use n buffers. Each buffer
is one block. These buffers are only used for buffered I/O, i.e.
for Plasma_client.read and Plasma_client.write, but not for
Plasma_client.copy_in and Plasma_client.copy_out.
val configure_pref_nodes : plasma_cluster -> string list -> unit
Configures preferred nodes (see local_identities below), i.e. for enforcing that blocks
are allocated on the same machine, as far as possible.
val configure_shm_manager : plasma_cluster -> Plasma_shm.shm_manager -> unit
val shm_manager : plasma_cluster -> Plasma_shm.shm_manager
val blocksize_e : plasma_cluster -> int Uq_engines.engine
val blocksize : plasma_cluster -> int
val params_e : plasma_cluster -> (string * string) list Uq_engines.engine
val params : plasma_cluster -> (string * string) list
val fsstat_e : plasma_cluster -> Plasma_rpcapi_aux.fsstat Uq_engines.engine
val fsstat : plasma_cluster -> Plasma_rpcapi_aux.fsstat
val local_identities_e : plasma_cluster -> string list Uq_engines.engine
val local_identities : plasma_cluster -> string list
See configure_pref_nodes.
On the filesystem level, the client can take over any user ID independent of what ID was used on the RPC level. If "proot" is the RPC user, one can just become any filesystem user without credentials. If "pnobody" is the RPC user, one needs an authentication ticket to become a certain user on the filesystem level.
There are two ways of getting a ticket:
With Plasma_client.get_auth_ticket in a session one can obtain
a ticket, and use it in further sessions to become the same
user again (via Plasma_client.impersonate).
val configure_auth : plasma_cluster ->
string -> string -> (string -> string) -> unit
configure_auth c nn_user dn_user get_password: Configures that accesses
to the namenode are authenticated on the RPC level as nn_user and
accesses to datanodes
are authenticated as dn_user. The function get_password is called
to obtain the password for a user.
nn_user can be set to "proot" or "pnobody".
dn_user is normally set to "pnobody".
This type of authentication does not imply any impersonation on the
filesystem level. One should run Plasma_client.impersonate to
set something.
val configure_auth_daemon : plasma_cluster -> unit
This mode also impersonates on the filesystem level.
val configure_auth_ticket : plasma_cluster -> string -> unit
This mode also impersonates on the filesystem level.
val impersonate_e : plasma_cluster ->
string -> string -> string list -> string option -> unit Uq_engines.engine
val impersonate : plasma_cluster ->
string -> string -> string list -> string option -> unit
impersonate c user group supp_groups ticket:
Become the user on the filesystem level. The main group is group,
and the supplementary groups are supp_groups.
The ticket is necessary when the new privileges are not implied
by the existing privileges (i.e. one can only give up rights when
a ticket is lacking). Examples when a ticket is not required:
The user does not change, but only the main group is set
to a different member of the supplementary groups.
See Plasma_client.get_auth_ticket.
impersonate must not be used inside transactions.
val get_auth_ticket_e : plasma_cluster -> string -> string Uq_engines.engine
val get_auth_ticket : plasma_cluster -> string -> string
get_auth_ticket c user: Get an authentication ticket for this user
val current_user_e : plasma_cluster ->
(string * string * string list) Uq_engines.engine
val current_user : plasma_cluster -> string * string * string list
let (user,group,supp_groups) = current_user c: Get the identity
of the current client principal on the filesystem level.
Indicates `efailed if no impersonation has been done yet.
val configure_default_user_group : plasma_cluster -> string -> string -> unit
configure_default_user_group c user group: Normally, new files are
created as the user and group corresponding to the current
impersonation. If privileges permit it, this can be changed here so that
files are created as user and group. Each string can be empty,
in which case the value is taken from the impersonation.
This is especially useful if one authenticates as "proot" and does not do any impersonation, i.e. the superuser privileges are still in effect. Another use is to create files with a group that is different from the main group of the current impersonation.
This affects not only files, but also new directories and symlinks.
Functions taking a plasma_trans value as argument must be
run inside a transaction. This means one first has to call start
to open the transaction, then call the functions covered by the
transaction, and finally either commit or abort.
It is allowed to open several transactions simultaneously.
If you use the engine-based interface, it is important to
ensure that the next function in a transaction is only
called after the current function has returned its result.
This restriction only applies within the same transaction;
other transactions are totally independent in this respect.
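The start/commit/abort protocol described above can be sketched as a small wrapper. This is only an illustration: in_transaction is a hypothetical helper, not part of Plasma_client, and it takes the three transaction functions as labelled arguments so that it does not depend on the library. The module's own with_trans offers this behaviour directly.

```ocaml
(* Hypothetical helper, not part of Plasma_client.
   Opens a transaction with [start], runs [f] on it, then commits if [f]
   returns normally, or aborts and re-raises if [f] raises an exception. *)
let in_transaction ~start ~commit ~abort c f =
  let t = start c in
  match f t with
  | result -> commit t; result
  | exception e -> abort t; raise e
```

With the library available, a call could look like in_transaction ~start:Plasma_client.start ~commit:Plasma_client.commit ~abort:Plasma_client.abort c (fun t -> Plasma_client.lookup t path false), where c is an open cluster and path is a placeholder for an absolute file name.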
val start_e : plasma_cluster -> plasma_trans Uq_engines.engine
val start : plasma_cluster -> plasma_trans
val commit_e : plasma_trans -> unit Uq_engines.engine
val commit : plasma_trans -> unit
val abort_e : plasma_trans -> unit Uq_engines.engine
val abort : plasma_trans -> unit
val cluster : plasma_trans -> plasma_cluster
val create_inode_e : plasma_trans ->
Plasma_rpcapi_aux.inodeinfo -> inode Uq_engines.engine
val create_inode : plasma_trans ->
Plasma_rpcapi_aux.inodeinfo -> inode
At the end of the transaction, inodes that do not have a name
are automatically deleted. Use link_e to assign names (below).
See also Plasma_client.create_file below, which immediately
links the inode to a name. See also Plasma_client.regular_ii,
Plasma_client.dir_ii, and Plasma_client.symlink_ii for
how to create inodeinfo values.
val delete_inode_e : plasma_trans -> inode -> unit Uq_engines.engine
val delete_inode : plasma_trans -> inode -> unit
val get_inodeinfo_e : plasma_trans ->
inode -> Plasma_rpcapi_aux.inodeinfo Uq_engines.engine
val get_inodeinfo : plasma_trans ->
inode -> Plasma_rpcapi_aux.inodeinfo
val get_cached_inodeinfo_e : plasma_cluster ->
inode -> bool -> Plasma_rpcapi_aux.inodeinfo Uq_engines.engine
val get_cached_inodeinfo : plasma_cluster ->
inode -> bool -> Plasma_rpcapi_aux.inodeinfo
The bool argument can be set to true to enforce that the
newest version is retrieved. However, there is no guarantee that
the returned version is still the newest one when this function
returns.
Note that get_inodeinfo also implicitly refreshes the cache when
the transaction is (still) only used for read accesses.
The returned inodeinfo does not include modifications caused by
block writes that were not yet flushed to disk.
val set_inodeinfo_e : plasma_trans ->
inode -> Plasma_rpcapi_aux.inodeinfo -> unit Uq_engines.engine
val set_inodeinfo : plasma_trans ->
inode -> Plasma_rpcapi_aux.inodeinfo -> unit
val truncate_e : plasma_trans ->
inode -> int64 -> unit Uq_engines.engine
val truncate : plasma_trans -> inode -> int64 -> unit
copy_in writes a local file to the cluster. copy_out
reads a file from the cluster and copies it into a local file.
Note that copy_in in particular works only in units of whole blocks. The
function never reads a block from the filesystem, modifies it,
and writes it back. Instead, it writes the block with the data it
has, and if there is still space to fill, it pads the block with
zero bytes. If you need support for updating parts of a block
only, better use the buffered access below.
val copy_in_e : ?flags:copy_in_flags list ->
plasma_cluster ->
inode ->
int64 ->
Unix.file_descr -> int64 -> topology -> int64 Uq_engines.engine
val copy_in : ?flags:copy_in_flags list ->
plasma_cluster ->
inode ->
int64 -> Unix.file_descr -> int64 -> topology -> int64
copy_in_e c inode pos fd len topology: Copies the data from the file descriptor
fd to the file given by inode. The data is taken from the current
position of the descriptor. Up to len bytes are copied. The data
is written to position pos of the file referenced by the inode. If
it is written past the EOF position of the destination file, the EOF
position is advanced. The function returns the number of copied
bytes.
For seekable descriptors, len specifies the exact number of bytes
to copy. If the input file is shorter, null bytes are appended to
the file until len is reached.
For non-seekable descriptors, an additional buffer needs to be
allocated. Also, len is ignored for non-seekable descriptors -
data is always copied until EOF is seen. (However, in the
future this might be changed. It is better to pass
Int64.max_int as len if unlimited copying is required.)
topology says how to transfer data from the client to the data nodes.
`Star means the client organizes the writes to the data nodes as
independent streams. `Chain means that the data is first written to
one of the data nodes, and the replicas are transferred from there to
the next data node.
flags:
`No_datasync: Data blocks are not synchronized to disk.
`Late_datasync: Only the last block is synchronized to disk.
This also includes all preceding blocks. If an error occurs, though,
nothing is guaranteed.
Without these flags, when copy_in commits, all blocks are guaranteed to be on disk.
Limitation: pos must be a multiple of the blocksize. The file
is written in units of the blocksize (i.e. blocks are never partially
updated).
copy_in performs its operations always in separate transactions.
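The alignment limitation and the zero padding can be checked with plain arithmetic before calling copy_in. The sketch below uses only the standard library; is_aligned and padding are hypothetical helper names, not part of Plasma_client, and the block size of 65536 is just an example value (in real code it would come from blocksize c). Both helpers assume a positive block size and non-negative arguments.

```ocaml
(* Hypothetical helpers, not part of Plasma_client. *)

(* True if [pos] is a multiple of [blocksize], as copy_in requires for
   its position argument. *)
let is_aligned ~blocksize pos =
  Int64.rem pos (Int64.of_int blocksize) = 0L

(* Number of zero bytes needed to fill up the last block when [len] bytes
   are written starting at an aligned position. *)
let padding ~blocksize len =
  let bs = Int64.of_int blocksize in
  let r = Int64.rem len bs in
  if r = 0L then 0L else Int64.sub bs r

let () =
  let blocksize = 65536 in  (* example value only *)
  Printf.printf "aligned: %b, padding: %Ld\n"
    (is_aligned ~blocksize 131072L)
    (padding ~blocksize 100000L)
```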
val copy_in_from_buf_e : ?flags:copy_in_flags list ->
plasma_cluster ->
inode ->
int64 ->
Netsys_mem.memory -> int -> topology -> int Uq_engines.engine
val copy_in_from_buf : ?flags:copy_in_flags list ->
plasma_cluster ->
inode ->
int64 -> Netsys_mem.memory -> int -> topology -> int
copy_in_from_buf c inode pos buf len topology: Copies the data from
buf to the file denoted by inode. The data is taken from the
beginning of buf, and the length is given by len. The data is
written to position pos of inode.
copy_in_from_buf works much in the same way as copy_in, only
that the data is taken from a buffer and not from a file descriptor.
val copy_out_e : ?flags:copy_out_flags list ->
plasma_cluster ->
inode ->
int64 -> Unix.file_descr -> int64 -> int64 Uq_engines.engine
val copy_out : ?flags:copy_out_flags list ->
plasma_cluster ->
inode -> int64 -> Unix.file_descr -> int64 -> int64
copy_out_e c inode pos fd len: Copies the data from the file referenced
by inode to file descriptor fd. The data is taken from position
pos to pos+len-1 of the file, and it is written to the current
position of fd. The number of copied bytes is returned.
Seekable output files may only be extended, but are never truncated.
For non-seekable descriptors, an additional buffer needs to be allocated.
If there are holes in the input file, the corresponding byte region is filled with zero bytes in the output. An attempt to read past EOF is not prevented, but is handled as if the region past EOF were a file hole.
Limitation: pos must be a multiple of the blocksize.
copy_out performs its operations always in separate transactions.
Flags:
`No_truncate: The descriptor fd is not truncated to the real
file size.
val copy_out_to_buf_e : ?flags:copy_out_flags list ->
plasma_cluster ->
inode ->
int64 -> Netsys_mem.memory -> int -> int Uq_engines.engine
val copy_out_to_buf : ?flags:copy_out_flags list ->
plasma_cluster ->
inode -> int64 -> Netsys_mem.memory -> int -> int
copy_out_to_buf_e c inode pos buf len: Copies the data from the
file denoted by inode to the buffer buf. The data is taken from
position pos to pos+len-1 of the file, and it is written to the beginning
of buf.
Buffered access uses the buffers configured with Plasma_client.configure_buffer.
type strmem = [ `Memory of Netsys_mem.memory | `String of string ]
The buffer for read and write can be given as string or as bigarray
(memory). The latter is advantageous, because there are some
optimizations that are only applicable to bigarrays.
val read_e : ?lazy_validation:bool ->
plasma_cluster ->
inode ->
int64 ->
strmem ->
int -> int -> (int * bool * Plasma_rpcapi_aux.inodeinfo) Uq_engines.engine
val read : ?lazy_validation:bool ->
plasma_cluster ->
inode ->
int64 ->
strmem ->
int -> int -> int * bool * Plasma_rpcapi_aux.inodeinfo
read_e c inode pos s spos len: Reads data from inode, and returns
(n,eof,ii) where n is the number of read bytes, and eof the indicator
that EOF was reached. The number n may be less than len only
if EOF is reached. ii is the current inodeinfo.
Before a read is served from a clean buffer, it is checked whether
the buffer is still up to date.
By default, read updates the metadata from the namenode before starting
any transaction. By setting lazy_validation, one can demand a different
mode, where these updates can be delayed by a short period of time
(useful when several reads are done in sequence).
type read_request = int64 * strmem * int * int
type read_response = int * bool * Plasma_rpcapi_aux.inodeinfo
type multi_read_task = (read_request * (read_response -> unit)) option
Uq_engines.engine
val multi_read_e : ?lazy_validation:bool ->
plasma_cluster ->
inode ->
multi_read_task Stream.t -> unit Uq_engines.engine
multi_read_e c inode stream: This version of read makes it possible
to read multiple times from the same file. All reads are done in
the same transaction.
The function gets the next task from stream when the previous
task is done (if any). A task is always an engine which results
either in None (ending the stream), or in Some(req,pass_resp).
The request req = (pos, s, spos, len) says from where to take
the data and where to store it (like in read_e). The
response resp = (n,eof,ii) is the argument of pass_resp.
val write_e : plasma_cluster ->
inode ->
int64 -> strmem -> int -> int -> int Uq_engines.engine
val write : plasma_cluster ->
inode -> int64 -> strmem -> int -> int -> int
write_e c inode pos s spos len: Writes data to inode and returns
the number of written bytes. This number n may be less than len for
arbitrary reasons (unlike read - to be fixed).
A write that is not aligned to a block implies that the old version
of the block is read first (if not available in a buffer). This is
a big performance penalty, and best avoided.
It is not ensured that the write is completed when the return value
becomes available. The write is actually done in the background,
and can be explicitly triggered with the flush_e operation. Also,
note that the write happens in a separate transaction. (With
"background" we do not mean a separate kernel thread, but an
execution thread modeled with engines.)
Writing also ensures that the EOF position is set at least to the
position after the last written position. However, this is only
done when the blocks are flushed in the background. (Use get_write_eof
to get this value immediately, before flushing.)
As writing happens in the background, special attention has to be
paid to the way errors are reported. At the first error the write thread
stops, and an error code is set. This code is reported at the next
write or flush. After being reported, the code is cleared again.
Writing is not automatically resumed - only further write and
flush invocations will restart the writing thread. Also, the
data buffers are kept intact after errors - so another attempt is made
to write everything (which may run into the same error).
The function drop_inode can be invoked to drop all dirty buffers
of the inode in the near future.
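Because write may accept fewer bytes than requested, callers usually loop until everything has been handed over. The following is a minimal sketch of such a loop. write_all is a hypothetical helper, not part of Plasma_client. It takes the write function as an argument, which is expected to behave like Plasma_client.write c inode. Treating a return value of 0 or less as an error is an assumption made here to avoid looping forever.

```ocaml
(* Hypothetical helper, not part of Plasma_client.
   Calls [write pos buf bufpos len] repeatedly until all [len] bytes have
   been accepted. [write] returns the number of bytes it took, which may
   be smaller than the requested length. *)
let write_all write pos buf bufpos len =
  let rec loop pos bufpos remaining =
    if remaining > 0 then begin
      let n = write pos buf bufpos remaining in
      (* Assumption: a result <= 0 means no progress is possible. *)
      if n <= 0 then failwith "write_all: no progress";
      loop (Int64.add pos (Int64.of_int n)) (bufpos + n) (remaining - n)
    end
  in
  loop pos bufpos len
```

With the library, this could be used as write_all (Plasma_client.write c inode) 0L buf 0 len, followed by Plasma_client.flush c inode 0L 0L to make sure the data is really written.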
val get_write_eof : plasma_cluster -> inode -> int64
Raises Not_found if nothing is known.
val get_write_mtime : plasma_cluster -> inode -> Plasma_rpcapi_aux.time
Raises Not_found if nothing is known.
val flush_e : plasma_cluster ->
inode -> int64 -> int64 -> unit Uq_engines.engine
val flush : plasma_cluster -> inode -> int64 -> int64 -> unit
flush_e c inode pos len: Flushes all buffered data of inode from
pos to pos+len-1, or to the end of the file if len=0. This
ensures that data is really written.
val drop_inode : plasma_cluster -> inode -> unit
val flush_all_e : plasma_cluster -> unit Uq_engines.engine
val flush_all : plasma_cluster -> unit
val snapshot_e : ?append:bool -> plasma_trans -> int64 -> unit Uq_engines.engine
val snapshot : ?append:bool -> plasma_trans -> int64 -> unit
snapshot trans inode: Takes a snapshot of the file, affecting
buffered reads and writes, and a few other functions. Reads and
writes now use the transaction trans instead of creating
transactions automatically. Also, the block list is completely
buffered up. The main effects:
[...] trans are visible. Taking a snapshot
is an atomic operation.
When trans is committed (by calling commit),
the changes are made permanent (atomically).
There are a few other effects of the snapshot mode:
If trans is aborted, the dirty buffers are also dropped.
If trans is committed, the buffers for inode remain intact,
of course, because they now reflect the latest state of the file.
Note that it is strongly recommended to flush the buffers
before committing.
[...] ECONFLICT if there are
other transactions writing to the same file.
[...] trans is either committed or aborted.
The following functions also see/modify the snapshot if trans is used:
get_inodeinfo, set_inodeinfo, truncate, get_write_eof, get_write_mtime, flush, flush_all
The append flag enables an optimization if new data is only
appended to the file. In this case, it is sufficient to take
only a snapshot of the last block of the file, because the
previous blocks can be considered as immutable.
val lookup_e : plasma_trans ->
string -> bool -> inode Uq_engines.engine
val lookup : plasma_trans -> string -> bool -> inode
The bool says whether to keep the last component of the path
as symbolic link (lstat semantics).
val dir_lookup_e : plasma_trans ->
inode ->
string -> bool -> inode Uq_engines.engine
val dir_lookup : plasma_trans ->
inode -> string -> bool -> inode
If the filename is absolute the inode number is ignored.
The bool says whether to keep the last component of the path as symbolic link (lstat semantics).
dir_lookup trans inode "" _ is legal and just returns inode.
val rev_lookup_e : plasma_trans ->
inode -> string list Uq_engines.engine
val rev_lookup : plasma_trans -> inode -> string list
val rev_lookup_dir_e : plasma_trans -> inode -> string Uq_engines.engine
val rev_lookup_dir : plasma_trans -> inode -> string
It is possible to get an `econflict error when the lock requirement
cannot be satisfied.
val namelock_e : plasma_trans ->
inode -> string -> unit Uq_engines.engine
val namelock : plasma_trans -> inode -> string -> unit
namelock trans dir name: Acquires an existence lock on the member
name of directory dir. name must not contain slashes.
A namelock prevents the entry name of the directory dir
from being moved or deleted. This protection lasts until the end of
the transaction. If a concurrent transaction tries to move or
delete the file, it will get an `econflict error.
It is not allowed to lock a not yet existing entry.
It is not prevented that the directory dir is moved, and thus it
is possible that the absolute path of the protected file changes.
val link_count_e : plasma_trans -> inode -> int Uq_engines.engine
val link_count : plasma_trans -> inode -> int
val link_e : plasma_trans ->
string -> inode -> unit Uq_engines.engine
val link : plasma_trans -> string -> inode -> unit
For directories there is the restriction that at most one name
may be linked with the inode.
val link_at_e : plasma_trans ->
inode ->
string -> inode -> unit Uq_engines.engine
val link_at : plasma_trans ->
inode -> string -> inode -> unit
link_at trans dir_inode name inode: Adds the entry name into
the directory dir_inode and connects the entry with inode.
name must not contain slashes.
val unlink_e : plasma_trans -> string -> unit Uq_engines.engine
val unlink : plasma_trans -> string -> unit
This also works for directories! (They must be empty, of course.)
val unlink_at_e : plasma_trans ->
inode -> string -> unit Uq_engines.engine
val unlink_at : plasma_trans -> inode -> string -> unit
unlink_at trans dir_inode name: Removes the entry name from
the directory dir_inode. name must not contain slashes.
val rename_e : plasma_trans -> string -> string -> unit Uq_engines.engine
val rename : plasma_trans -> string -> string -> unit
rename trans old_path new_path: Renames/moves the file or directory
identified by old_path to the location identified by new_path.
There must not be a file at new_path (i.e. you cannot move into
a directory).
val rename_at_e : plasma_trans ->
inode ->
string -> inode -> string -> unit Uq_engines.engine
val rename_at : plasma_trans ->
inode -> string -> inode -> string -> unit
rename_at trans old_dir_inode old_name new_dir_inode new_name:
Moves the file old_name in old_dir_inode to the new location
which is given by new_name in new_dir_inode.
Neither old_name nor new_name may contain slashes.
val list_inode_e : plasma_trans ->
inode -> (string * inode) list Uq_engines.engine
val list_inode : plasma_trans ->
inode -> (string * inode) list
val list_e : plasma_trans ->
string -> (string * inode) list Uq_engines.engine
val list : plasma_trans -> string -> (string * inode) list
val create_file_e : plasma_trans ->
string ->
Plasma_rpcapi_aux.inodeinfo -> inode Uq_engines.engine
val create_file : plasma_trans ->
string -> Plasma_rpcapi_aux.inodeinfo -> inode
[...] `ftype_regular or `ftype_symlink.
val mkdir_e : plasma_trans ->
string ->
Plasma_rpcapi_aux.inodeinfo -> inode Uq_engines.engine
val mkdir : plasma_trans ->
string -> Plasma_rpcapi_aux.inodeinfo -> inode
val regular_ii : plasma_cluster -> int -> Plasma_rpcapi_aux.inodeinfo
regular_ii c mode: Creates an inodeinfo record for a new empty
regular file, where the mode field is set to mode modulo
the current mask
val symlink_ii : plasma_cluster -> string -> Plasma_rpcapi_aux.inodeinfo
symlink_ii c target: Creates an inodeinfo record for a symlink
pointing to target
val dir_ii : plasma_cluster -> int -> Plasma_rpcapi_aux.inodeinfo
dir_ii c mode: Creates an inodeinfo record for a new
directory, where the mode field is set to mode modulo
the current mask
val get_blocklist_e : plasma_trans ->
inode ->
int64 -> int64 -> bool -> Plasma_rpcapi_aux.blockinfo list Uq_engines.engine
val get_blocklist : plasma_trans ->
inode ->
int64 -> int64 -> bool -> Plasma_rpcapi_aux.blockinfo list
get_blocklist_e t inode block n keep_flag: Returns the list of blocks for
blocks block to block+n-1. This is useful for analyzing where
the blocks are actually physically stored.
If keep_flag is set, the blocks are protected for the duration of the
transaction.
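When starting from a byte range rather than block numbers, the block and n arguments have to be computed first. The sketch below shows one way to do this, under the assumption (not stated in this documentation) that blocks are numbered from 0 and that block i covers the bytes from i*blocksize to (i+1)*blocksize-1. block_range is a hypothetical helper, not part of Plasma_client.

```ocaml
(* Hypothetical helper, not part of Plasma_client.
   Assumption: block i covers bytes i*blocksize .. (i+1)*blocksize-1.
   Returns (block, n) such that blocks block .. block+n-1 cover the byte
   range pos .. pos+len-1. Requires len >= 1 and blocksize >= 1. *)
let block_range ~blocksize pos len =
  let bs = Int64.of_int blocksize in
  let first = Int64.div pos bs in
  let last = Int64.div (Int64.add pos (Int64.pred len)) bs in
  (first, Int64.succ (Int64.sub last first))
```

The resulting pair could then be passed on as in get_blocklist t inode block n false.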
val read_admin_table_e : plasma_cluster -> string -> string Uq_engines.engine
val read_admin_table : plasma_cluster -> string -> string
read_admin_table_e c key: Returns the admin table key as text.
Possible keys: "passwd", "group".
val write_admin_table_e : plasma_cluster -> string -> string -> unit Uq_engines.engine
val write_admin_table : plasma_cluster -> string -> string -> unit
write_admin_table_e c key file: Sets the admin table key to file.
Possible keys: "passwd", "group".
val read_ug_admin_e : plasma_cluster -> Plasma_ug.ug_admin Uq_engines.engine
val read_ug_admin : plasma_cluster -> Plasma_ug.ug_admin
Returns a Plasma_ug.ug_admin object.
val write_ug_admin_e : plasma_cluster -> Plasma_ug.ug_admin -> unit Uq_engines.engine
val write_ug_admin : plasma_cluster -> Plasma_ug.ug_admin -> unit
val with_trans_e : plasma_cluster ->
(plasma_trans -> 'a Uq_engines.engine) -> 'a Uq_engines.engine
val with_trans : plasma_cluster -> (plasma_trans -> 'a) -> 'a
with_trans c f: Starts a new transaction t and runs f t.
The transaction is committed if f returns normally, and aborted
if f raises an exception.
val retry_e : plasma_cluster ->
string -> ('a -> 'b Uq_engines.engine) -> 'a -> 'b Uq_engines.engine
val retry : plasma_cluster -> string -> ('a -> 'b) -> 'a -> 'b
retry c name f arg: Executes f arg and returns the result.
If an ECONFLICT error or timeout occurs the execution is repeated,
until a general timeout is reached.
Errors are logged (Netlog). name is used in log output.
It is common to combine retry and with_trans, e.g.
  retry c "create_file"
    (fun filename ->
       with_trans c
         (fun trans ->
            create_file trans filename (regular_ii c 0o666)
         )
    )
implementing the general convention that retry means to retry
whole transactions.