1,073 questions
3
votes
0
answers
132
views
Go regex word boundary marker doesn't work with non-ASCII characters [duplicate]
I would like to match a string that may contain non-ASCII characters using a regular expression in Go. After writing some tests, I discovered some surprising behavior that I'd like to check if it's ...
1
vote
1
answer
96
views
Translate UTF-8 punctuation with normal ascii punctuation marks
I'm trying to clean up raw data that has embedded \r\n or \n in CSV lines. The line terminator is \r\n.
I'm trying to translate UTF-8 punctuation marks to normal ASCII punctuation marks.
cleaning up any ...
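One possible way to do the punctuation translation described in this excerpt is a lookup table. The sketch below is Python, which is an assumption because the question does not name a language. The mapping `PUNCT_MAP` and the helper `to_ascii_punctuation` are hypothetical names, and the table covers only a few common marks.

```python
# Hypothetical mapping from typographic punctuation to ASCII equivalents.
# It is deliberately small; extend it with whatever marks occur in the data.
PUNCT_MAP = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def to_ascii_punctuation(text: str) -> str:
    """Replace the mapped punctuation marks; leave every other character as-is."""
    return text.translate(PUNCT_MAP)


if __name__ == "__main__":
    print(to_ascii_punctuation("\u201cquoted\u201d \u2013 it\u2019s done\u2026"))
```

Characters outside the table, including accented letters, pass through unchanged, so this step alone does not guarantee pure ASCII output.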
0
votes
0
answers
48
views
Spark file with corrupted header
I have a zipped parquet file with a corrupted header, i.e. it contains weird characters making it impossible to read the table in a standard way.
So I created a cleaning function that reads in the ...
0
votes
0
answers
76
views
Rails way to elegantly avoid Zero Width Space character problems
We had a Zero Width Space character problem with our Rails app. Somebody copied and pasted a configuration value (a URL) into a form in our Rails app, which later caused confusing error messages. It ...
2
votes
1
answer
156
views
How does C determine whether a character is lower case (islower or isupper)?
I was looking into GNU tr in bash on Debian Linux. The regex engine appears to have a [:lower:] and [:upper:] shorthand. The regex matches on "lowercase" and "uppercase" letters. ...
3
votes
0
answers
106
views
PHP script with accented characters or spaces in filename path not found by Apache
On my Apache web server I have a few (file system) paths that contain spaces or accented characters, such as é, á, or ö. The web server returns "File not found." if (1) the path contains a ...
-1
votes
2
answers
99
views
Detecting and converting string containing ASCII [closed]
I have this string:
Miami, Florida
I would like to find a regex to help detect whether this string contains ASCII code.
I have tried these regexes: \\p{ASCII}, ^[\\u0000-\\u007F]*$, ^\p{ASCII}*...
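As a rough illustration of the check this excerpt describes, the sketch below tests whether a string consists only of ASCII characters. It uses Python, which is an assumption: the escaped patterns in the question suggest another language. The names `ASCII_ONLY` and `is_ascii` are hypothetical.

```python
import re

# Matches zero or more characters in the ASCII range U+0000..U+007F.
ASCII_ONLY = re.compile(r"[\x00-\x7F]*")


def is_ascii(text: str) -> bool:
    """Return True if every character in text is in the ASCII range."""
    # fullmatch anchors the pattern at both ends of the string.
    return ASCII_ONLY.fullmatch(text) is not None


if __name__ == "__main__":
    print(is_ascii("Miami, Florida"))
    print(is_ascii("Miami, Flor\u00edda"))
```

In Python 3.7 and later, the built-in `str.isascii()` performs the same test without a regex.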
2
votes
1
answer
1k
views
python - How to get Unicode characters to display as boxes instead of accented letters - "x96\x88" and "x96\x80"
I have a table that is returning the characters "â\x96\x88" and "â\x96\x80"
These are displaying as "â" and "â"
However, what I need is for them to display as &...
0
votes
2
answers
149
views
Reading UTF-8 texts in PowerPoint via VBA, for export to another software [duplicate]
I want to read all the text in a PowerPoint file using VBA and write it to an external file (or export it some other way) for use in another piece of software.
I wrote this code:
Sub ReadFileText()
On Error Resume Next
...
1
vote
1
answer
110
views
How to manage non ASCII characters inside sh/bash scripts
My terminal.txt file in the sequel
shows the output of my tmpPdfFile.sh and tmpPdfFile1.sh scripts:
both scripts are unable to properly manage the file
"6._ANbertà_di_scelta.docx.pdf"
The ...
0
votes
1
answer
138
views
I am getting extra character  when i run sql file
I have a .sql file to create a Postgres function:
CREATE FUNCTION id_generator°generateid() RETURNS SETOF integer AS
$BODY$
BEGIN
RETURN QUERY SELECT max(ua_id)
FROM user_attribute;...
0
votes
0
answers
59
views
WinmergeU.exe mangles non-ASCII characters
I have a file with this text (on Windows 10):
'content' => '<h3>This is Schüler Doc1</h3>
<p>glurk</p>
<p>Straße</p>
<p>Europäisch</p>
<p&...
0
votes
0
answers
23
views
Why is GNU strings not detecting chars followed by null char?
There are unprintable characters in an AppleScript file. However, when I attempt to run strings on the file, it doesn't output all printable characters. It only outputs a portion of the file as plain ...
0
votes
0
answers
161
views
English character frequency calculation
I've written code to perform character frequency analysis on strings using the standard ETAOIN SRHD... frequencies of English character occurrence, and I have two questions.
Is this the most ...
0
votes
0
answers
121
views
How can I escape all unicode characters in a QString so that only ASCII is used and the result is as short as possible?
I'm searching for a safe method to escape all non-ASCII characters in a QString (and of course to un-escape them later) that will result in pure (printable) ASCII but yield the shortest possible ...