Character string manipulation library
July, 2008
Introduction
This is a string library that is intended to be compatible with the class string
library in the C++ standard. My version is for strings of characters of type char
only.
It is for people who do not have access to an official version of the string library or
wish to use a version without templates.
It follows the standard class string as I understand it, except that a few functions
that are relevant only to the template version are omitted, and all the functions involving
iterators are omitted.
I use the name String rather than string to prevent conflicts with
other string libraries (as in BC 5.0).
The initial version was taken from Tony Hansen's
book The C++ Answer Book, but very little of Tony's code remains.
Permission is granted to use/modify/distribute this. If you distribute it or put it on your web
site please include a link to my site. If you distribute a modified version
please make it clear which bits are mine and which are yours. I take no responsibility for
errors, omissions etc, but please tell me about them.
This library links into my exception package. If you are using a very old
compiler, you may need to edit the file include.h to
determine whether to use simulated exceptions or compiler supported exceptions or simply
to disable exceptions. More information on the exception package is given in the
documentation for my matrix library, newmat11.
The package uses a limited form of copy-on-write (see Tony Hansen's book for
more details) and also attempts to avoid repeated reallocation of the string storage
during a multiple sum. This results in some saving in space and time for some operations
at the expense of an increase in the complexity of the program and an increase in the time
used by a few operations. As with newmat, it is still an open question whether the extra
complexity is really warranted, or under what circumstances it is warranted.
This package includes simple functions for
manipulating strings and a class for extracting
information from the command line.
It also includes class libraries to help format numerical output
and to edit ASCII files. They are documented in separate
files (format.htm and gstring.htm).
Files in this package
The following files are included in this package:
str.h         header file for the string library
str.cpp       function bodies for the string library
str_fns.h     header file for string functions
str_fns.cpp   bodies for string functions
commline.h    command line class header
commline.cpp  command line class bodies
myexcept.h    header for the exceptions simulator
myexcept.cpp  bodies for the exceptions simulator
include.h     options header file (see documentation in newmat11)
strtst.cpp    test program
strtst.dat    data file used by test program
strtst.txt    output from the test program
test_exs.cpp  test exceptions
test_exs.txt  output from test_exs
readme.txt    readme file
string.htm    this file
rbd.css       style sheet for use with htm files
st_gnu.mak    make file for Gnu C++
st_cc.mak     make file for CC compiler
st_b55.mak    make file for Borland C++ 5.5
st_b56.mak    make file for Borland C++ 5.6
st_b58.mak    make file for Borland C++ 5.8
st_m6.mak     make file for Visual C++ version 6 or 7
st_m8.mak     make file for Visual C++ version 8
st_i8.mak     make file for Intel compiler for Windows, v8, 9
st_i10.mak    make file for Intel compiler for Windows, v10
st_il8.mak    make file for Intel compiler for Linux, v8, 9, 10
st_ow.mak     make file for Open Watcom compiler
str.lfl       library file list for make file generator
st_targ.txt   target file for make file generator
format.h      header file for format program
format.cpp    bodies for format program
formtest.cpp  test program for format program
formtest.txt  output from test program
format.htm    documentation for format program
gstring.h     header file for gstring ASCII file editor
gstring.cpp   bodies for gstring program
liststr.cpp   bodies for gstring program
lstst.cpp     test program
fox.dat       test data file
lstst.dat     test data file
lstst.txt     output from test program
gstring.htm   documentation for gstring program
Testing and getting started
I have tested this program on recent versions of the Borland, Microsoft, Gnu,
Intel, Sun and Open Watcom compilers.
You may need to edit include.h - but it will probably work for you as
is. See the
newmat documentation for more information about editing include.h.
Activate the _STANDARD_ option to use the form of include statements
used in standard C++ (automatic for recent versions of Borland, Microsoft, Gnu
and Intel compilers).
Activate the use_namespace option to put the string library in namespace
RBD_STRING.
The GString library which is included in this
package uses nested classes and will not compile under older compilers.
Some CC compilers generate 33 error messages when running the strtst test program. I suspect
these are due to a slightly different convention in deleting temporaries and don't matter.
For the indexes, lengths etc I use unsigned int (typedefed to uint). This is
instead of size_type in the official package.
You will need to #include files include.h and str.h in your programs that use this
package. Don't forget to edit include.h to determine whether exceptions are to be used,
simulated or disabled. If you use the simulated exceptions you should turn off the
exception capability of a compiler that does support exceptions.
I have included make files for a variety of compilers for compiling the test
programs. Make files for some
other compilers can be generated using my genmake
utility. The file st_targ.txt gives the list of targets for genmake and
str.lfl has the list of names of the libraries. See the genmake
documentation for more details about the make files.
The public member functions
Static variable
static uint npos
String::npos is the largest possible value of uint and is
used to indicate that a find function has failed to find its target. All Strings must have
length strictly less than String::npos
Constructors, destruction, operator=
String()
construct a String of zero length
String(const String&str)
copy constructor (not explicitly in standard)
String(const String&str, uint pos, uint n = npos)
construct a String from str starting at location pos (first
location = 0) and continuing for the length of the String or for n characters, whichever
occurs first
String(const char* s, uint n)
construct a String from s taking n characters or the length of s,
whichever is smaller
String(const char* s)
construct a String from s
String(uint n, char c)
construct a String consisting of n copies of the character c
~String()
the destructor
String& operator=(const String& str)
copy a String (except that it may be able to avoid copying)
String& operator=(const char* s)
set a String equal to a c-style character string pointed to
by s
String& operator=(const char c)
set a String equal to a character
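Because String is meant to mirror the standard string class, the constructors can be illustrated with std::string so that the example compiles without this package. On the assumption that the two behave alike, replacing std::string by String and including str.h should give the same results. CheckConstructors is a hypothetical name. The String(s, n) case keeps n within the length of s, since std::string reads exactly n characters whereas String, as described above, stops at the end of s.

```cpp
#include <cassert>
#include <string>

// Illustrates the constructor forms above using std::string.
void CheckConstructors()
{
   const std::string digits("0123456789");  // String(const char* s)
   std::string none;                         // String()
   const std::string copy(digits);           // String(const String& str)
   const std::string middle(digits, 3, 4);   // String(str, pos, n)
   const std::string tail(digits, 8);        // n defaults to npos: rest of str
   const std::string prefix("hello", 3);     // String(s, n) with n <= strlen(s)
   const std::string stars(5, '*');          // String(n, c)

   assert(none.size() == 0);
   assert(copy == digits);
   assert(middle == "3456");
   assert(tail == "89");
   assert(prefix == "hel");
   assert(stars == "*****");
}
```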
Storage control
uint size() const
the length of the String (does not include a trailing zero -
in most cases there isn't one)
uint length() const
same as size
uint max_size() const
the maximum size of a String; I have set it to npos-1
void resize(uint n, char c = 0)
change the size of a String, either by truncating it or by filling
it out with copies of the character c (the standard provides the default through a separate overload)
uint capacity() const
the total space allocated for a String (always >= size())
void reserve(uint res_arg = 0)
change the capacity of a String to the maximum of res_arg and
size(). This may be an increase or a decrease in the capacity.
void clear()
erase the contents of the String
bool empty() const
true if the String is empty; false otherwise
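The storage functions can likewise be sketched with std::string (CheckStorage is a hypothetical name). One difference is worth noting: this library's reserve can reduce the capacity, while the standard's reserve is not guaranteed to do so, so the sketch only checks that capacity() is at least the amount requested.

```cpp
#include <cassert>
#include <string>

// Exercises resize, reserve, capacity, clear and empty on std::string.
void CheckStorage()
{
   std::string s("abc");
   s.resize(5, '*');                 // fill out with copies of '*'
   assert(s == "abc**");
   s.resize(2);                      // truncate
   assert(s == "ab" && s.size() == s.length());
   s.reserve(100);                   // request room for at least 100 characters
   assert(s.capacity() >= 100);
   assert(s == "ab");                // reserve does not change the contents
   s.clear();
   assert(s.empty() && s.size() == 0);
}
```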
Character access
char operator[](uint pos) const
return the pos-th character; return 0 if pos = size()
char& operator[](uint pos)
return a reference to the pos-th character; the standard leaves this undefined if
pos >= size(), and this library throws an exception. This reference may become invalid after almost any
manipulation of the String
char at(uint n) const
same as operator[] const
char& at(uint n)
same as operator[]; throw an exception if n >= size()
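The difference between the two access functions can be sketched with std::string, whose at() throws std::out_of_range for a bad index. This library throws an exception from the author's exception package instead, so the catch clause below is an assumption that would need adapting for String. CharAtOr and CheckAccess are hypothetical names.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Return the character at pos, or fallback when pos is out of range.
// std::string::at reports a bad index by throwing std::out_of_range;
// String throws from the author's exception package instead, so this
// catch clause would have to change for String (assumption).
char CharAtOr(const std::string& s, std::string::size_type pos, char fallback)
{
   try
   {
      return s.at(pos);
   }
   catch (const std::out_of_range&)
   {
      return fallback;
   }
}

void CheckAccess()
{
   const std::string s("abc");
   assert(s[0] == 'a');
   assert(s[s.size()] == '\0');         // const operator[] at pos == size()
   assert(CharAtOr(s, 1, '?') == 'b');
   assert(CharAtOr(s, 3, '?') == '?');  // at(3) throws, so fallback is used
}
```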
The editing functions
For conditions under which references and pointers to data are invalidated by these
functions, see the Reallocation policy section below.
String& operator+=(const String& rhs)
append rhs to a String
String& operator+=(const char* s)
append the c-string defined by s to a String
String& operator+=(char c)
append the character c to a String
String& append(const String& str)
append str to a String
String& append(const String& str, uint pos, uint
n)
append String(str,pos,n)
String& append(const char* s, uint n)
append String(s,n)
String& append(const char* s)
append String(s)
String& append(uint n, char c)
append n copies of the character c
void push_back(char c)
same as operator+=(c)
String& assign(const String& str)
replace the String by str (this function is not explicitly in
the standard)
String& assign(const String& str, uint pos, uint
n)
replace the String by String(str,pos,n)
String& assign(const char* s, uint n)
replace the String by String(s, n)
String& assign(const char* s)
replace the String by String(s)
String& assign(uint n, char c)
replace the String by String(n,c)
String& insert(uint pos1, const String& str)
insert str before character pos1
String& insert(uint pos1, const String& str, uint
pos2, uint n)
insert String(str,pos2,n) before character pos1
String& insert(uint pos, const char* s, uint n =
npos)
insert String(s,n) before character pos (the standard provides the
default through a separate overload)
String& insert(uint pos, uint n, char c)
insert String(n,c) before character pos
String& erase(uint pos = 0, uint n = npos)
erase characters starting at pos and continuing for n
characters or till the end of the String. This was originally called remove
String& replace(uint pos1, uint n1, const String&
str)
erase(pos1,n1); insert(pos1,str)
String& replace(uint pos1, uint n1, const String&
str, uint pos2, uint n2)
erase(pos1,n1); insert(pos1,str,pos2,n2)
String& replace(uint pos, uint n1, const char* s,
uint n2 = npos)
erase(pos,n1); insert(pos,s,n2) (the standard provides the default
through a separate overload)
String& replace(uint pos, uint n1, uint n2, char c)
erase(pos,n1); insert(pos,n2,c)
uint copy(char* s, uint n, uint pos = 0) const
copy a maximum of n characters from a string starting at
position pos to memory starting at location given by s. Return the number of characters
copied. I assume that the program has already allocated space for the characters
void swap(String&)
a.swap(b) swaps the contents of Strings a and b. The standard
also provides a non-member function swap(a,b); see
The binary String functions below.
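The following sketch runs the editing functions on std::string and checks each intermediate value, which may help in reading the erase/insert equivalences given above. CheckEditing is a hypothetical name. Note that std::string::copy does not append a terminating zero, so the caller's buffer is not a c-string afterwards.

```cpp
#include <cassert>
#include <string>

// Walks through the editing functions on std::string, checking each result.
void CheckEditing()
{
   std::string s("0123456789");
   s.erase(2, 3);                 // remove 3 characters starting at position 2
   assert(s == "0156789");
   s.insert(2, "abc");            // insert before character 2
   assert(s == "01abc56789");
   s.replace(2, 3, "XY");         // same as erase(2,3); insert(2,"XY")
   assert(s == "01XY56789");
   s.append(3, '!');              // append 3 copies of '!'
   assert(s == "01XY56789!!!");
   s.push_back('?');              // same as s += '?'
   assert(s == "01XY56789!!!?");

   char buf[3];
   std::string::size_type n = s.copy(buf, 3, 2);  // no trailing zero written
   assert(n == 3 && buf[0] == 'X' && buf[1] == 'Y' && buf[2] == '5');

   std::string t("tail");
   s.swap(t);
   assert(s == "tail" && t == "01XY56789!!!?");
}
```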
Pointer to data
const char* c_str() const
return a pointer to the contents of a String after appending
(char)0 to the String. This pointer will be invalidated by almost any operation on the
String
const char* data() const
return a pointer to the contents of a String. This pointer
will be invalidated by almost any operation on the String
The find functions
uint find(const String& str, uint pos = 0) const
find the first location of str in a String starting at
position pos. The location is relative to the beginning of the parent String. Return
String::npos if not found
uint find(const char* s, uint pos, uint n) const
find(String(s,n),pos)
uint find(const char* s, uint pos = 0) const
find(String(s),pos)
uint find(const char c, uint pos = 0) const
find(String(1,c),pos)
uint rfind(const String& str, uint pos = npos) const
find the last location of str in a String, searching backwards from
position pos; i.e., begin the search with the first character of str at position pos of the
target String. The location is relative to the beginning of the parent String. Return
String::npos if not found
uint rfind(const char* s, uint pos, uint n) const
rfind(String(s,n),pos)
uint rfind(const char* s, uint pos = npos) const
rfind(String(s),pos)
uint rfind(const char c, uint pos = npos) const
rfind(String(1,c),pos)
uint find_first_of(const String& str, uint pos = 0)
const
find the first character, at or after pos, that occurs in str. Return
String::npos if not found
uint find_first_of(const char* s, uint pos, uint n) const
find_first_of(String(s,n),pos)
uint find_first_of(const char* s, uint pos = 0) const
find_first_of(String(s),pos)
uint find_first_of(const char c, uint pos = 0) const
find_first_of(String(1,c),pos)
uint find_last_of(const String& str, uint pos = npos)
const
find the last character, at or before pos, that occurs in str. Return
String::npos if not found
uint find_last_of(const char* s, uint pos, uint n) const
find_last_of(String(s,n),pos)
uint find_last_of(const char* s, uint pos = npos) const
find_last_of(String(s),pos)
uint find_last_of(const char c, uint pos = npos) const
find_last_of(String(1,c),pos)
uint find_first_not_of(const String& str, uint pos =
0) const
find the first character, at or after pos, that does not occur in str. Return
String::npos if not found
uint find_first_not_of(const char* s, uint pos, uint n)
const
find_first_not_of(String(s,n),pos)
uint find_first_not_of(const char* s, uint pos = 0) const
find_first_not_of(String(s),pos)
uint find_first_not_of(const char c, uint pos = 0) const
find_first_not_of(String(1,c),pos)
uint find_last_not_of(const String& str, uint pos =
npos) const
find the last character, at or before pos, that does not occur in str. Return
String::npos if not found
uint find_last_not_of(const char* s, uint pos, uint n)
const
find_last_not_of(String(s,n),pos)
uint find_last_not_of(const char* s, uint pos = npos)
const
find_last_not_of(String(s),pos)
uint find_last_not_of(const char c, uint pos = npos)
const
find_last_not_of(String(1,c),pos)
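A common use of find_first_of and find_first_not_of is splitting a string into tokens. The sketch below is written for std::string; with this library one would presumably write String, uint and String::npos in place of std::string, std::string::size_type and std::string::npos, but that substitution is an untested assumption. Split and CheckFind are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split s into tokens separated by any of the characters in delims.
// find_first_not_of skips delimiters, find_first_of finds the end of a
// token, and npos signals "not found".
std::vector<std::string> Split(const std::string& s, const std::string& delims)
{
   std::vector<std::string> tokens;
   std::string::size_type start = s.find_first_not_of(delims);
   while (start != std::string::npos)
   {
      std::string::size_type end = s.find_first_of(delims, start);
      // if end == npos, end - start is huge and substr stops at the end of s
      tokens.push_back(s.substr(start, end - start));
      start = s.find_first_not_of(delims, end);  // npos in, npos out
   }
   return tokens;
}

void CheckFind()
{
   const std::string s("hello world");
   assert(s.find('o') == 4);
   assert(s.find('o', 5) == 7);
   assert(s.rfind('o') == 7);
   assert(s.rfind('o', 6) == 4);            // match must start at or before 6
   assert(s.find_first_of("aeiou") == 1);
   assert(s.find_last_not_of("dl") == 8);
   assert(s.find("xyz") == std::string::npos);

   const std::vector<std::string> t = Split("  a bb  c ", " ");
   assert(t.size() == 3 && t[0] == "a" && t[1] == "bb" && t[2] == "c");
}
```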
The substring function
String substr(uint pos = 0, uint n = npos) const
return String(*this, pos, n)
The compare functions
int compare(const String& str) const
a.compare(b) compares a and b in normal sort order. Return
-1, 0 or 1
int compare(uint pos, uint n, const String& str)
const
a.compare(pos,n,b) compares String(a,pos,n) and b in normal
sort order. Return -1, 0 or 1
int compare(uint pos1, uint n1, const String& str,
uint pos2, uint n2) const
a.compare(pos1,n1,b,pos2,n2) compares String(a,pos1,n1) and
String(b,pos2,n2) in normal sort order. Return -1, 0 or 1
int compare(const char* s) const
return compare(String(s))
int compare(uint pos1, uint n1, const char* s, uint n2 =
npos) const
return compare(pos1, n1, String(s,n2))
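String::compare is documented to return exactly -1, 0 or 1, whereas std::string::compare only promises a negative, zero or positive value. Code meant to work with either class can therefore test the sign rather than compare against -1 or 1. The small helper below (CompareSign, a hypothetical name) reduces any result to -1, 0 or 1.

```cpp
#include <cassert>
#include <string>

// Reduce any compare() result to -1, 0 or 1.
int CompareSign(int result)
{
   return (result > 0) - (result < 0);
}

void CheckCompare()
{
   const std::string a("apple"), b("apricot");
   assert(CompareSign(a.compare(b)) == -1);             // 'p' < 'r' at position 2
   assert(CompareSign(b.compare(a)) == 1);
   assert(CompareSign(a.compare(0, 2, b, 0, 2)) == 0);  // "ap" versus "ap"
   assert(CompareSign(a.compare(1, 3, "ppl")) == 0);    // "ppl" versus "ppl"
}
```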
The binary String functions
+ means concatenate; the other operators have their usual meanings, comparing in the same sort order as compare.
String operator+(const String& lhs, const String& rhs)
String operator+(const char* lhs, const String& rhs)
String operator+(char lhs, const String& rhs)
String operator+(const String& lhs, const char* rhs)
String operator+(const String& lhs, char rhs)
bool operator==(const String& lhs, const String& rhs)
bool operator==(const char* lhs, const String& rhs)
bool operator==(const String& lhs, const char* rhs)
bool operator!=(const String& lhs, const String& rhs)
bool operator!=(const char* lhs, const String& rhs)
bool operator!=(const String& lhs, const char* rhs)
bool operator<(const String& lhs, const String& rhs)
bool operator<(const char* lhs, const String& rhs)
bool operator<(const String& lhs, const char* rhs)
bool operator>(const String& lhs, const String& rhs)
bool operator>(const char* lhs, const String& rhs)
bool operator>(const String& lhs, const char* rhs)
bool operator<=(const String& lhs, const String& rhs)
bool operator<=(const char* lhs, const String& rhs)
bool operator<=(const String& lhs, const char* rhs)
bool operator>=(const String& lhs, const String& rhs)
bool operator>=(const char* lhs, const String& rhs)
bool operator>=(const String& lhs, const char* rhs)
void swap(String& A, String& B)
The stream functions - slightly rough implementation as yet:
istream& operator>>(istream& is, String& str)
read a token from is into str
ostream& operator<<(ostream& os, const String& str)
write str to os
istream& getline(istream& is, String& str, char delim = '\n')
read a line (up to the delimiter delim) from is into str
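The stream functions follow the pattern of their std::string counterparts. The sketch below uses std::string with std::getline and operator>> to collect the first word of each line. Since the String versions are described as a rough implementation, whether they treat whitespace and a final unterminated line in exactly the same way is an assumption. FirstWords and CheckStreams are hypothetical names.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Collect the first whitespace-delimited word of each line of is.
// Blank lines contribute nothing.
std::vector<std::string> FirstWords(std::istream& is)
{
   std::vector<std::string> words;
   std::string line;
   while (std::getline(is, line))      // delimiter defaults to '\n'
   {
      std::istringstream ls(line);
      std::string word;
      if (ls >> word)                   // skips leading whitespace
         words.push_back(word);
   }
   return words;
}

void CheckStreams()
{
   std::istringstream in("alpha beta\n  gamma\n\ndelta epsilon");
   const std::vector<std::string> w = FirstWords(in);
   assert(w.size() == 3);
   assert(w[0] == "alpha" && w[1] == "gamma" && w[2] == "delta");
}
```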
The policies
Reallocation policy
This section discusses under what circumstances the String data in a String object will
be moved. It is unclear to me what the standard allows. Moving the String data invalidates
the const char* returned by .data() and .c_str() and any reference returned by the
non-const versions of .at() or operator[] (and any iterators referring to the string).
I describe here what my program does. Another implementation of the standard string class may
(and probably does) follow different rules.
The value returned by .c_str() will most likely become invalid after almost any operation
that changes the value of the String. Also a call to .c_str() will invalidate
a const char* returned by .data() and any reference returned by .at() or operator[].
If A is a String that has been assigned a capacity with the reserve function then the
following functions will not cause a reallocation (so the value returned by .data() etc.
will remain valid)
A += ...
A.assign(...)
A.append(...)
A.insert(...)
A.erase(...)
A.replace(...)
where ... denotes a legitimate argument, providing the resulting String will fit in the
assigned capacity (as set by a call to reserve).
If the resulting String will not fit into the assigned capacity the String data will be
moved (so the value returned by .data() etc. will not remain valid). Also the String will
no longer be regarded as having an assigned capacity.
The concept of having an assigned capacity is important in considering the behaviour of
assign, erase and replace when the parameters are such that the length of the String is
reduced. For example
String A = "0123456789";
A.reserve(1); // will set capacity to A.size() = 10
const char* d = A.data();
A.erase(1,9);
will leave a valid value in d whereas
String A = "0123456789";
const char* d = A.data();
A.erase(1,9);
will not leave a valid value in d since the storage of the String data will have been
moved.
The operator= does not conform to these rules. A = something will always
remove any assigned capacity for A (and will not pick up any capacity from the something).
In this package A.reserve() or A.reserve(0) will remove any assigned capacity; that is, it
will be as though no capacity had ever been assigned. So an erase or a replace that
changes the length will then cause a reallocation.
But don't expect anyone else's package to follow these rules.
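Because these rules are specific to this package, and the C++ standard allows any non-const member call such as erase to invalidate a pointer obtained from data() or c_str(), portable code can keep a position (an index) across a modification and fetch the pointer again afterwards. A minimal sketch with std::string, where CheckRefetch is a hypothetical name:

```cpp
#include <cassert>
#include <string>

// Keep an index, not a pointer, across a change to the string, then call
// data() again once the change is complete.
void CheckRefetch()
{
   std::string s("0123456789");
   std::string::size_type pos = s.find('7');  // pos == 7
   s.erase(1, 5);                             // s is now "06789"
   pos -= 5;                                  // '7' moved 5 places to the left
   const char* d = s.data();                  // pointer taken after the erase
   assert(d[pos] == '7');
}
```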
Policy on operator+, operator+= and append
The evaluation of the concatenation expression A+B is delayed until the expression is
used or until the value is referred to twice. This means the expressions such as A+B+C are
evaluated in one sweep rather than having A+B formed as a temporary before evaluating
A+B+C.
Unfortunately, this means that in expressions such as A + c_string the
c-string c_string will be converted to a String object, before the overall String
is formed. Since c-strings will usually be small I don't see this as a serious problem.
Likewise A+=X or A.append(X) will not be evaluated until the result is used (unless A
has been assigned a capacity that is large enough to accommodate X). This means that
sequences like
A += X1;
A += X2;
...
will not cause repeated reallocations of the space used by the String data.
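std::string does not promise this delayed evaluation. When several pieces are to be joined into a std::string, a similar saving can be had by hand: add up the lengths, reserve that much once, and then append. The sketch below (Concat and CheckConcat are hypothetical names) shows the idea; it is not how this package implements its delayed sums.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Join parts into one std::string with a single reservation: sum the
// lengths first, reserve that much, then append each piece in turn.
std::string Concat(const std::vector<std::string>& parts)
{
   std::string::size_type total = 0;
   for (const std::string& p : parts)
      total += p.size();

   std::string result;
   result.reserve(total);          // room for the whole result up front
   for (const std::string& p : parts)
      result += p;
   return result;
}

void CheckConcat()
{
   const std::vector<std::string> parts = { "ab", "", "cde" };
   const std::string joined = Concat(parts);
   assert(joined == "abcde");
   assert(joined.size() == 5);
}
```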
String functions
These are a set of simple functions for manipulating strings. You need the
header file str_fns.h and body file str_fns.cpp.
String ToString(int i)
Convert int to string
String ToString(long i)
Convert long to string
String ToString(double f, int ndec = 4)
Convert double to string; ndec determines the number of
decimal places
void UpperCase(String& S)
Convert string to upper case
void LowerCase(String& S)
Convert string to lower case
bool IsInt(const String& S)
Does a string represent an integer?
bool IsFloat(const String& S)
Does a string represent a floating point number (includes
integer, does allow for E format)?
inline bool Contains(const String& S, const String& str)
inline bool Contains(const String& S, const char* s)
inline bool Contains(const String& S, char c)
Does S contain str, s or c, respectively?
inline bool ContainsAnyOf(const String& S, const String& str)
inline bool ContainsAnyOf(const String& S, const char* s)
inline bool ContainsAnyOf(const String& S, char c)
Does S contain any of the characters of str, s or c,
respectively?
inline bool ContainsOnly(const String& S, const String& str)
inline bool ContainsOnly(const String& S, const char* s)
inline bool ContainsOnly(const String& S, char c)
Does S contain only characters of str, s or c, respectively?
int sf(String& S, const String& s1, const String& s2);
int sl(String& S, const String& s1, const String& s2);
int sa(String& S, const String& s1, const String& s2);
Suppose S contains one or more copies of s1. The function sf
replaces the first copy by s2, sl replaces the last copy and sa replaces all
copies. Return the number of changes (0 or 1 for sf and sl).
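The behaviour of sf, sl and sa can be sketched with std::string using find, rfind and replace. These bodies are hypothetical and are not taken from str_fns.cpp; in particular, returning 0 when s1 is empty, and resuming the search after each inserted s2 in sa (so that s2 is never rescanned), are choices made for the sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical std::string versions of sf, sl and sa (not the library's code).

// Replace the first copy of s1 in S by s2; return 1 if a change was made.
int sf(std::string& S, const std::string& s1, const std::string& s2)
{
   if (s1.empty()) return 0;                      // sketch's choice
   std::string::size_type pos = S.find(s1);
   if (pos == std::string::npos) return 0;
   S.replace(pos, s1.size(), s2);
   return 1;
}

// Replace the last copy of s1 in S by s2; return 1 if a change was made.
int sl(std::string& S, const std::string& s1, const std::string& s2)
{
   if (s1.empty()) return 0;
   std::string::size_type pos = S.rfind(s1);
   if (pos == std::string::npos) return 0;
   S.replace(pos, s1.size(), s2);
   return 1;
}

// Replace every copy of s1 in S by s2, scanning left to right and resuming
// after each inserted s2; return the number of replacements.
int sa(std::string& S, const std::string& s1, const std::string& s2)
{
   if (s1.empty()) return 0;
   int count = 0;
   std::string::size_type pos = S.find(s1);
   while (pos != std::string::npos)
   {
      S.replace(pos, s1.size(), s2);
      ++count;
      pos = S.find(s1, pos + s2.size());
   }
   return count;
}

void CheckReplaceFunctions()
{
   std::string S("a-b-c");
   assert(sf(S, "-", "+") == 1 && S == "a+b-c");
   assert(sl(S, "-", "=") == 1 && S == "a+b=c");
   std::string T("aXbXc");
   assert(sa(T, "X", "XX") == 2 && T == "aXXbXXc");
   assert(sf(T, "Q", "?") == 0 && T == "aXXbXXc");
}
```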
Command line class
This is a simple class for extracting the information from the command line
(when you call a program from a text window). See the
genmake program as an example. I assume you call your program with a
command like
program -options A B C
where program is the name of the program, options is a sequence
of single letter options with no spaces and A B C is a sequence of names separated by
spaces.
Start your main program with
#include "str.h"
#include "commline.h"
int main(int argc, char** argv)
{
   CommandLine CL(argc, argv);
   ...   // the rest of your program
}
Here are the member functions for the CommandLine class.
CommandLine(int argc, char** argv)
Constructor: argc, argv from main(int argc, char** argv)
int argc()
Return argc
char** argv()
Return argv
String GetArg(int i)
Get the i-th name; i=1 for first name after
options
String GetOptions()
Get option sequence
int NumberOfArgs()
Return number of arguments excluding options
bool Options()
True if there are options
bool HasOption(const String& s)
True if the option sequence contains any character in s
bool HasOptionCI(const String& s)
Case independent version of HasOption
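To make the assumed command-line layout concrete, here is a hypothetical std::string class, SimpleCommandLine, that parses program -options A B C in the way described above and offers some of the same member functions. It is not the package's CommandLine class; details such as returning an empty string from GetArg for an out-of-range index are assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical sketch, not the package's CommandLine class.
// Expects: program -options A B C
class SimpleCommandLine
{
public:
   SimpleCommandLine(int argc, const char* const* argv)
   {
      int i = 1;                                  // argv[0] is the program name
      if (i < argc && argv[i][0] == '-')
         options_ = argv[i++] + 1;                // option letters after '-'
      for (; i < argc; ++i)
         args_.push_back(argv[i]);
   }

   std::string GetOptions() const { return options_; }
   bool Options() const { return !options_.empty(); }
   int NumberOfArgs() const { return static_cast<int>(args_.size()); }

   // i = 1 for the first name after the options; "" if out of range (assumption)
   std::string GetArg(int i) const
   {
      if (i < 1 || i > NumberOfArgs()) return std::string();
      return args_[i - 1];
   }

   // True if any character of s is one of the option letters
   bool HasOption(const std::string& s) const
   {
      return options_.find_first_of(s) != std::string::npos;
   }

   // Case-independent version: compare lower-case copies
   bool HasOptionCI(const std::string& s) const
   {
      return ToLower(options_).find_first_of(ToLower(s)) != std::string::npos;
   }

private:
   static std::string ToLower(std::string s)
   {
      for (char& c : s)
         c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
      return s;
   }

   std::string options_;
   std::vector<std::string> args_;
};

void CheckCommandLine()
{
   const char* argv[] = { "program", "-vX", "A", "B" };
   SimpleCommandLine CL(4, argv);
   assert(CL.Options() && CL.GetOptions() == "vX");
   assert(CL.NumberOfArgs() == 2);
   assert(CL.GetArg(1) == "A" && CL.GetArg(2) == "B");
   assert(!CL.HasOption("xyz"));    // 'X' does not match 'x'
   assert(CL.HasOptionCI("xyz"));
}
```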
To do list
- Can there be memory leaks following an exception?
- Tests for failure to allocate memory (?)
- Avoid virtual call by operator[] and at
- Inline functions where appropriate
- Try to reduce the number of virtual function calls
- Try to remove repeated pieces of code
- Check all code is exercised by test program
- Redo IO routines
- Implement iterators
- Better comments
- Document policy on operator[] and at.
History
- April, 1996: First version
- November, 1996: Fix problems with find and rfind, change string
to String, update simulated exceptions.
- August, 1998: Minor fixes, align with January '96 working paper.
- September, 1998: Align with official standard, minor improvements.
- July, 2001: Fix IO, minor changes, include format and gstring programs and
string functions.
- January, 2002: Updates to gstring and string functions.
- October, 2002: Fixes for modern compilers, update include.h, namespace
option
August, 1998 changes
- remove replaced by erase
- order of arguments changed in compare
- replace(str, pos, n, c) changed to replace(str, pos, n1, n2, c)
[make sure you update your program if you are using this, since the old version will still
compile and will give wrong answers]
- update behaviour of reserve (see notes).
- clear()
- swap(A,B)
- some defaults for arguments deleted
- updated exception library
- fixed global variable problem (?)
- html version of documentation
September, 1998 changes
- align with C++ standard - delete a number of default values for parameters
- minor improvements - particularly remove some unnecessary copies
July, 2001 changes
January, 2002 changes
April, 2004 changes
June, 2004 changes
May, 2005 changes
- Additional make files, additional option in format
program, reinstate simulated booleans
April, 2006 changes
- Compatibility fix for Gnu G++ 4.1.0, make file for Visual C++ 8.
July, 2008 changes
- Additional make files, update include.h