Questions tagged [character-encoding]

Questions that deal with various representations of characters and character sets, such as ASCII, UTF-8, and EBCDIC. Often encountered when moving files between operating systems that encode line endings differently, using carriage returns and/or newline characters.

423 questions

3 votes

2 answers

168 views

Embedded special characters skewing sed output

The Issue: I've been parsing a file with sed, trying to tweeze out the desired data. This has worked fine for most lines in the file, but there appear to be some embedded special characters that are ...

Gandalf

asked Sep 16 at 16:16

1 vote

1 answer

89 views

Invalid characters in ssh sessions

I'm opening an SSH session from Fedora to Raspberry Pi OS. Accented and special characters are replaced with question marks. Preferably I would like to learn to solve this without changing the server'...

Cutter

asked Aug 28 at 19:05

2 votes

1 answer

89 views

Tmux pane with long-running session using wrong character set?

Today I connected to a long-running process in tmux over ssh for work, to find that the pane the process was running in seems to have started using the wrong character encoding for its output, leading ...

Patronics

asked Aug 9 at 19:31

6 votes

1 answer

358 views

Revert filenames after they were garbled by using different encoding

I have a file СМП бваг™вга†. The first three letters are proper Cyrillic and the remaining part is mojibake. "Mojibake is the garbled or gibberish text that is the result of text being decoded ...

jsx97

1,357

asked Mar 14 at 15:58
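
Mojibake of this kind can often be undone by repeating the wrong decoding step in reverse. The following is a minimal sketch of that idea only: the code-page pair (CP866 read as CP1251), the sample word, and the helper name `fix_mojibake` are assumptions chosen for illustration, not a diagnosis of the filename in the question.

```python
def fix_mojibake(text: str, wrong: str, right: str) -> str:
    """Re-encode with the codec that was wrongly used for decoding,
    then decode with the codec the bytes were actually written in."""
    return text.encode(wrong).decode(right)


# Hypothetical sample; the real encodings involved must be determined
# by experiment for any given file name.
original = "Привет"

# Simulate the damage: bytes written as CP866 are read back as CP1251.
garbled = original.encode("cp866").decode("cp1251")

# Reverse it by swapping the roles of the two codecs.
repaired = fix_mojibake(garbled, wrong="cp1251", right="cp866")

print(repaired == original)
```

The round trip only works when the wrong codec maps every byte involved to some character; if bytes were dropped or replaced along the way, the original cannot be recovered this way.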

0 votes

1 answer

76 views

Output of echo uses different encoding than the one specified according to LANG and LC_CTYPE

It is my understanding that the LANG and LC_CTYPE environment variables define the encoding used by shell commands when writing to stdout. However, after executing LANG=de_DE.iso88591 LC_CTYPE=de_DE....

userAcgJllhSe

asked Mar 6 at 9:26

0 votes

2 answers

209 views

To have or not to have a Byte Order Mark (BOM) in UTF-8 text files? (Linux)

Is it advisable to have a Byte Order Mark (BOM) in UTF-8 text files on Linux or not? Is it correct to say that byte order (even for multi-byte characters) is already strictly defined/fixed in the UTF-8 standard? ...

strider

asked Feb 28 at 23:43
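
For context on the BOM question: UTF-8 is defined as a sequence of single bytes, so it has no byte-order ambiguity, and the three-byte BOM acts only as an optional signature. The sketch below shows how that signature can be detected and dropped; the helper name `strip_utf8_bom` and the sample text are illustrative assumptions.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical file content that starts with the UTF-8 signature.
with_bom = codecs.BOM_UTF8 + "héllo".encode("utf-8")

# The plain "utf-8" codec keeps the BOM as U+FEFF at the start of the text,
# while "utf-8-sig" removes it during decoding.
kept = with_bom.decode("utf-8")
dropped = with_bom.decode("utf-8-sig")

print(kept.startswith("\ufeff"), dropped == strip_utf8_bom(with_bom).decode("utf-8"))
```

Decoding with `utf-8-sig` is usually the simpler choice when reading, since it accepts input both with and without the signature.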

0 votes

0 answers

92 views

Advanced CLI tool/code to determine text encoding (besides enca)

Looking for an advanced CLI tool/code to determine a text's codepage/language (besides enca). Goal: automate as much as possible the conversion of hundreds/thousands of 8-bit text files (including non-ASCII ...

strider

asked Feb 27 at 18:34

0 votes

0 answers

33 views

file -i provides two different charsets for the same file on the same FS

I'm a bit confused about a behavior of the file -i command. I searched for a while and gave up, since I don't have sufficient knowledge of encodings or of the Linux file command (to stay concise ...

ollie314

asked Nov 20, 2024 at 13:34

-2 votes

1 answer

82 views

Convert subtitles so they are coded correctly (Polish and `"` even gets wrongly coded)

Wrong encoding: 1 00:01:27,879 --> 00:01:31,216 No i dupa. Koniec z darmowym wi-fi. 2 00:01:33,009 --> 00:01:34,972 - Ki-jung! - No? 3 00:01:35,219 --> 00:01:39,183 Kobieta z góry ...

jirafey

asked Aug 2, 2024 at 14:22

2 votes

1 answer

170 views

removing hidden control characters in filenames

I have a huge number of files spread across a large directory structure that have hidden control characters in their names. ls lists them as, e.g.: '614.7-4-F1-00-090-007-RozvadØ'$'\302\237'' RP1-...

atapaka

asked Jun 26, 2024 at 14:27
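
One way to find control characters in a name is to check each character's Unicode general category, where `Cc` covers both the C0 and C1 control ranges. This is a minimal sketch under assumptions: the byte string and the helper name `strip_control_chars` are hypothetical, and the code only computes a cleaned name without renaming anything on disk.

```python
import unicodedata


def strip_control_chars(name: str) -> str:
    """Drop every character whose Unicode category is Cc (control)."""
    return "".join(ch for ch in name if unicodedata.category(ch) != "Cc")


# Hypothetical name: the bytes 0xC2 0x9F are the UTF-8 encoding of U+009F,
# a C1 control character that most terminals do not display.
raw = b"report\xc2\x9f.txt"
name = raw.decode("utf-8")

clean = strip_control_chars(name)
print(clean)
```

Before renaming real files it is worth checking that two different names do not collapse to the same cleaned name.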

1 vote

1 answer

160 views

How can I set the character set to Latin-1 or MCS when using serial-getty?

I'd like to use my old VT420 terminal as system console. Adding RS232 ports and setting up serial-getty are not a problem, but: For years, almost all Linux distros have been using UTF-8 as the ...

Neppomuk

asked Jun 20, 2024 at 19:38

11 votes

3 answers

2k views

UTF-8 characters in POSIX shell script comments - anything against it?

I would like to include a couple of non-ASCII characters in my POSIX shell script comments. Note this is in no way a duplicate of e.g. "Which character encodings are supported by posix?" as ...

Vlastimil Burián

30.9k

asked Jun 18, 2024 at 22:21

0 votes

1 answer

156 views

regex: how come the trademark symbol matches to a-z?

Sorry if this is a repeat or basic question, but it is hard to search for a ™. I'm writing a script to remove weird characters from file names. How come the trademark symbol ™ matches [^a-z]? $ ...

codywohlers

asked Apr 7, 2024 at 8:43

0 votes

1 answer

319 views

nmtui is not rendering correctly

When using nmtui, â"Œ and â"‚ are added in places where they definitely should not be (see attached screenshot): How can I solve this?

user20695956

asked Mar 4, 2024 at 16:55

4 votes

2 answers

1k views

How can I convert full-width characters to half-width characters (and vice versa)?

Here is my simple problem: how can I convert half-width to full-width from the command line? I thought this would be built into my iconv command-line tool, but I did not find anything here: $ iconv -l | ...

malat

3,459

asked Mar 1, 2024 at 8:18
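
For the printable ASCII range, the full-width forms sit at a fixed offset in Unicode (U+FF01 to U+FF5E mirror U+0021 to U+007E, and the ideographic space U+3000 pairs with the ordinary space), so a translation table is enough for both directions. The sketch below assumes only that range matters; the function names are illustrative, and half-width katakana or Hangul would need separate handling.

```python
# Offset between an ASCII character and its full-width counterpart.
OFFSET = 0xFEE0

# Map printable ASCII (except space) to the full-width block.
_TO_FULL = {code: code + OFFSET for code in range(0x21, 0x7F)}
# Space is a special case: its wide form is the ideographic space U+3000.
_TO_FULL[0x20] = 0x3000

# The reverse table is the same mapping with keys and values swapped.
_TO_HALF = {full: half for half, full in _TO_FULL.items()}


def to_fullwidth(text: str) -> str:
    """Convert printable ASCII characters to their full-width forms."""
    return text.translate(_TO_FULL)


def to_halfwidth(text: str) -> str:
    """Convert full-width ASCII forms back to plain ASCII."""
    return text.translate(_TO_HALF)


wide = to_fullwidth("ABC 123")
print(to_halfwidth(wide))
```

`unicodedata.normalize("NFKC", text)` also folds full-width ASCII to plain ASCII, but it rewrites many other compatibility characters as well, which is why an explicit table is used here.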
