Linked Questions

Question 1

I have looked all around and only found solutions for python 2.6 and earlier, NOTHING on how to do this in python 3.X. (I only have access to Win7 box.) I HAVE to be able to do this in 3.1 and ...
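`html.unescape()` only exists from Python 3.4 onward, so on 3.1 a fallback has to be assembled by hand. Below is a minimal sketch, assuming only the standard library available in early Python 3: `re` plus `html.entities.name2codepoint`. It covers HTML 4 named entities (so, for example, `&apos;` is not covered) and decimal or hex numeric references. On the narrow builds common before Python 3.3 (including the Windows installers), `chr()` rejects code points above U+FFFF, and out-of-range numbers are not handled here. `unescape_entities` is a hypothetical helper name.

```python
import re
from html.entities import name2codepoint  # HTML 4 names -> code points

# Matches &name;, &#233; and &#xE9; style references.
_ENTITY_RE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def unescape_entities(text):
    """Hypothetical fallback for Python versions without html.unescape()."""
    def replace(match):
        ref = match.group(1)
        if ref[:2] in ("#x", "#X"):        # hexadecimal reference
            return chr(int(ref[2:], 16))
        if ref.startswith("#"):            # decimal reference
            return chr(int(ref[1:]))
        if ref in name2codepoint:          # known named entity
            return chr(name2codepoint[ref])
        return match.group(0)              # unknown name: leave untouched
    return _ENTITY_RE.sub(replace, text)


print(unescape_entities("caf&eacute; &#233; &#xE9; &amp; &bogus;"))
```

On 3.4 or later, prefer the standard `html.unescape()`, which also knows the HTML5 entity names.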

Question 2

I'm doing some web scraping and sites frequently use HTML entities to represent non-ASCII characters. Does Python have a utility that takes a string with HTML entities and returns a unicode type? For ...
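In Python 3 every `str` is already Unicode, so the standard-library `html.unescape()` (3.4+) is the usual answer. A small sketch with a made-up scraped fragment mixing named, hex and markup entities:

```python
import html

scraped = "Caf&eacute; &amp; cr&egrave;me &#x20AC;5 &lt;b&gt;"
text = html.unescape(scraped)  # standard library since Python 3.4
print(text)  # Café & crème €5 <b>
```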

Question 3

Does anyone know an easy way in Python to convert a string with HTML entity codes (e.g. &lt; &amp;) to a normal string (e.g. < &)? cgi.escape() will escape strings (poorly), but there ...
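`cgi.escape()` was deprecated in Python 3.2 and removed in 3.8; the `html` module now provides both directions. A sketch of the round trip on an illustrative string (with the default `quote=True`, double and single quotes are escaped as well):

```python
import html

original = '<a href="x">Tom & Jerry</a>'
escaped = html.escape(original)  # replaces &, <, > and (by default) quotes
print(escaped)  # &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;

restored = html.unescape(escaped)  # the inverse operation
print(restored == original)  # True
```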

Question 4

Possible Duplicate: Decode HTML entities in Python string? I have a string full of HTML escape characters such as ", ”, and —. Do any Python libraries offer reliable ...

Question 5

print u'<' How can I print < print '>' How can I print >

Question 6

I want to scrape some information off a football (soccer) web page using simple python regexp's. The problem is that players such as the first chap, ÄÄRITALO, comes out as ÄÄRITALO! ...

Question 7

I've a huge csv file of tweets. I read them both into the computer and stored them in two separate dictionaries - one for negative tweets, one for positive. I wanted to read the file in and parse it ...
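Decoding each field as it is read avoids a separate cleanup pass. A sketch, assuming a two-column `sentiment,text` layout (the real file's columns are not shown) and an in-memory `io.StringIO` standing in for the file; with a real file, open it with `newline=""` as the `csv` docs recommend. It collects both groups in one dict of lists rather than two dicts.

```python
import csv
import html
import io

# Stand-in for the real file; the column layout is an assumption.
raw = io.StringIO(
    "sentiment,text\n"
    "positive,I &lt;3 this &amp; that\n"
    "negative,Don&#39;t like it\n"
)

tweets = {"positive": [], "negative": []}
for row in csv.DictReader(raw):
    # Decode HTML entities in the tweet text before storing it.
    tweets[row["sentiment"]].append(html.unescape(row["text"]))

print(tweets)
```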

Question 8

Possible Duplicate: Decode HTML entities in Python string? I have parsed some HTML text. But some punctuations like apostrophe are replaced by ’. How to revert them back to ` P.S: I am ...

Question 9

I'm trying to write a json to csv - using Python 3.6 - and the json contains &. How can I write just plain ampersands (&) instead of &? I've tried str.replace using a variety of ...
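Rather than chaining `str.replace` calls, one option is to walk the parsed JSON once and decode every string value before handing the records to `csv`. A sketch with a made-up one-record document; `unescape_all` is a hypothetical helper, and dict keys are deliberately left alone. For a real output file, open it with `newline=""`.

```python
import csv
import html
import io
import json


def unescape_all(value):
    """Hypothetical helper: decode HTML entities in every string value of parsed JSON."""
    if isinstance(value, str):
        return html.unescape(value)
    if isinstance(value, list):
        return [unescape_all(item) for item in value]
    if isinstance(value, dict):
        return {key: unescape_all(item) for key, item in value.items()}
    return value  # numbers, booleans and None pass through unchanged


records = unescape_all(json.loads('[{"name": "Tom &amp; Jerry", "year": 1940}]'))

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "year"])
writer.writeheader()
writer.writerows(records)
print(out.getvalue())
```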

Question 10

I scraped news article titles and URLs, and stored the titles and urls in a tsv file as plain text. For some reason, the scraper I use converts some characters (€ for example) into hexacode. I have ...

Question 11

I am using requests to request a page. The task is very simple, but I have a problem with encoding. The page contains non-ascii, Turkish characters, but in the HTML source, the result is as below: ...

Question 12

I've been banging my head against the wall with this for a while. I'm trying to parse an RSS feed with Python's BeautifulSoup, and every now and then I get errors like: I don't know what I ...

Question 13

Possible Duplicate: Decode HTML entities in Python string? I have a malformed string in Python: Muhammad Ali's fight with Larry Holmes where ' is an apostrophe. Firstly what ...

Question 14

I'm trying to split on a lookahead, but it doesn't work for the last occurrence. How do I do this? my_str = 'HRCâs' import re print(re.split(r'.(?=&)', my_str)) My ...

Question 15

My python string consists of ' instead of ' (single quotes). My current objective is to expand compound words like It's to It is, Haven't to Have not. "This has been great for me. I&#...
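Decoding the entities first turns the encoded apostrophe into a plain `'`, after which contractions can be matched as ordinary words. A sketch with a deliberately tiny, hypothetical contraction table; it only handles the straight apostrophe, so curly ones such as `&#8217;` would need their own entries.

```python
import html
import re

# Tiny illustrative table (hypothetical); a real one would be much larger.
CONTRACTIONS = {"it's": "it is", "haven't": "have not", "don't": "do not"}
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, CONTRACTIONS)) + r")\b", re.IGNORECASE
)


def expand(text):
    text = html.unescape(text)  # turn &#39; into ' before matching

    def replace(match):
        word = match.group(0)
        expanded = CONTRACTIONS[word.lower()]
        # Keep a leading capital, e.g. "It's" -> "It is".
        return expanded[0].upper() + expanded[1:] if word[0].isupper() else expanded

    return _PATTERN.sub(replace, text)


print(expand("It&#39;s fine. I haven&#39;t left."))  # It is fine. I have not left.
```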

Question 16

The task is to convert Де to Де. Does Python 3 have a builtin function, or do I need to parse this string and then use the builtin chr method to convert each number to a string?

Question 17

Possible Duplicate: Convert XML/HTML Entities into Unicode String in Python Decode HTML entities in Python string? I am using Python 2.7 and am fairly lost in unicode type. I looked up variety ...

Question 18

I have used the strip_tags function. It removes tags like "<p>, <b>", etc but things like "  “" and other html encodings remain. How to remove them ?
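Decoding the leftover entities usually preserves more meaning than deleting them. One detail: `&nbsp;` decodes to U+00A0 (a non-breaking space), not an ordinary space, so it may need normalising afterwards. A sketch on an illustrative fragment, assuming the tags are already gone:

```python
import html

fragment = "Price:&nbsp;&ldquo;free&rdquo;"
decoded = html.unescape(fragment)       # '&nbsp;' becomes '\xa0', not ' '
cleaned = decoded.replace("\xa0", " ")  # normalise non-breaking spaces
print(cleaned)  # Price: “free”
```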

Question 19

I'm getting data from a website and this is an example of a sentence I retrieved : PHA+Q29ycmlnJmVhY3V0ZTtzIGV4ZXJjaWNlcyBlbnRyYWluZW1lbnQgY2hhcGl0cmUgbW91dmVtZW50IGV0IGZvcmNlczwvcD4K The sentence is ...
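The quoted sentence appears to be Base64 rather than entity-encoded text: its first 24 characters decode to `<p>Corrig&eacute;s`. A sketch that decodes the Base64 layer first and the HTML entities second, assuming the payload is UTF-8 and using only that prefix as input:

```python
import base64
import html

# First 24 characters of the string quoted in the question.
encoded = "PHA+Q29ycmlnJmVhY3V0ZTtz"
raw = base64.b64decode(encoded).decode("utf-8")  # '<p>Corrig&eacute;s'
text = html.unescape(raw)
print(text)  # <p>Corrigés
```

Tags such as `<p>` would still need stripping separately.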

Question 20

I have a string like below: THE SMASH-HIT, CRITICALLY ACCLAIMED SERIES RETURNS! Now that you've read the first two bestselling collections of SAGA , you're all caught up and ready to ...

Question 21

I want to convert in Python 2.7 string like "", "ż" and similar to UTF-8 string. How to do it?

Question 22

I am trying to scrape a webpage whose charset is declared like this <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> and when I get the page source using python requests, I get content ...

Question 23

I have a string "hello[ World]" and I want to convert it to "hello[World]" I tried something like this: a.encode("utf-8").decode("ascii") I got back same string as input.

Question 24

I am trying to scrape a page and ran into an issue where words show as '' and ''. I searched the whole web but found no answer about how to decode them, so I came here to ask for ...

Question 25

I have an email address which is encoded subi.bhaskaran@in.ibm.com I have to ...

Question 26

I have a database of texts, some of which contain unconverted hex codes: import pandas as pd example = pd.DataFrame({"content": ["Zwischen den Parteien ist unstreitig, dass Sch&#...

Question 27

I used python to get html page from a japanese comic site, and used regex to only extract some titles of chapters of the comics. I can get most of them correctly as it is but some of them comes in ...

Question 28

I am having problems with code written in Python: when I generate some texts in JSON, the accents are not displayed correctly. This is the code I'm using: import requests url = requests.get(f&...
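The question is truncated, so this is a guess: if the accents show up as `\u00f3`-style sequences, that is `json.dumps` escaping non-ASCII by default, not an HTML-entity problem. A sketch with an illustrative record; if the output is written to a file, open it with `encoding="utf-8"`.

```python
import json

data = {"title": "Canción"}
print(json.dumps(data))                      # {"title": "Canci\u00f3n"}
print(json.dumps(data, ensure_ascii=False))  # {"title": "Canción"}
```

If the accents instead arrive as entities such as `&oacute;`, `html.unescape()` on the strings is the fix.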

Question 29

I have a curious problem where I am given many strings in a list. For instance, the list may be: [ "Cartier 'Déclaration d'un Soir'", "Hue Cool", "Lagos Caviar™ Hoop" ] ...

Question 30

I'm parsing a large XML file using lxml in Python 3 that has HTML character codes (e.g. [ and ]). Here's an example of the problem and an example of my attempt to use html....
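An XML parser resolves numeric character references like `&#91;` on its own, so they normally never reach your code. If they do survive, the usual cause is double escaping (`&amp;#91;`), which leaves one more layer for `html.unescape()`. This sketch uses the standard library's `xml.etree.ElementTree` in place of lxml; the sample documents are made up.

```python
import html
import xml.etree.ElementTree as ET

# Numeric character references are resolved by the XML parser itself.
print(ET.fromstring("<t>see &#91;1&#93;</t>").text)  # see [1]

# Double-escaped references survive parsing as literal text...
text = ET.fromstring("<t>see &amp;#91;1&amp;#93;</t>").text
print(text)                 # see &#91;1&#93;
# ...and need one more decoding pass.
print(html.unescape(text))  # see [1]
```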

Question 31

I am fetching a string from a website and it is returning the string with some sort of weird characters in it Pokémon & Magic the Gathering How can I easily convert it to say ...

Question 32

I have a list with results from re.findall from a web page. List items contain & # 171; & # 8211; etc. How do I remove these substrings from list items?
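If the goal really is to delete the references rather than decode them, a regular expression over each item is enough. A sketch assuming the real data contains them without the spaces shown above (`&#171;`, not `& # 171;`); the list items are made up.

```python
import re

# Matches decimal (&#171;) and hexadecimal (&#xAB;) character references.
NUMERIC_REF = re.compile(r"&#(?:\d+|[xX][0-9a-fA-F]+);")

items = ["&#171;Title&#187;", "A &#8211; B"]
cleaned = [NUMERIC_REF.sub("", item) for item in items]
print(cleaned)  # ['Title', 'A  B']
```

Swapping `NUMERIC_REF.sub("", item)` for `html.unescape(item)` keeps the characters instead of dropping them.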

Question 33

I'm using beautifulsoup with html5lib, it puts the html, head and body tags automatically: BeautifulSoup('<h1>FOO</h1>', 'html5lib') # => <html><head></head><body&...

Question 34

I've got a string from an HTTP header, but it's been escaped. What function can I use to unescape it? myemail%40gmail.com -> myemail@gmail.com Would urllib.unquote() be the way to go?
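`%40` is URL percent-encoding, not an HTML entity, so `html.unescape()` will not touch it. In Python 3 the function asked about lives at `urllib.parse.unquote`:

```python
from urllib.parse import unquote

decoded = unquote("myemail%40gmail.com")
print(decoded)  # myemail@gmail.com
```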

Question 35

I am downloading HTML pages that have data defined in them in the following way: ... <script type= "text/javascript"> window.blog.data = {"activity":{"type":"read"}}; </script> ... I ...
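One way to get at the embedded object is to capture the assignment with a regular expression and hand it to `json.loads`. This sketch assumes, since the question is truncated, that the page serves the JSON with entity-escaped quotes (`&quot;`), hence the `html.unescape()` step. That step would also decode any `&amp;` meant literally inside the JSON strings, so drop it if the quotes arrive unescaped.

```python
import html
import json
import re

page = (
    '<script type="text/javascript">'
    'window.blog.data = {&quot;activity&quot;:{&quot;type&quot;:&quot;read&quot;}};'
    '</script>'
)

# Capture the object literal assigned to window.blog.data.
match = re.search(r"window\.blog\.data\s*=\s*(\{.*?\});\s*</script>", page, re.DOTALL)
data = json.loads(html.unescape(match.group(1)))
print(data["activity"]["type"])  # read
```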

Question 36

I have a html text like this: &lt;xml ... &gt; and I want to convert it to something readable: <xml ...> Any easy (and fast) way to do it in Python?
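A single `html.unescape()` pass turns escaped markup back into real markup, which can then be parsed normally. The sketch uses a made-up `<doc>` snippet and `xml.etree.ElementTree`. Note that a doubly escaped ampersand (`&amp;amp;`) correctly becomes `&amp;` after one pass; unescaping twice would leave a bare `&` and the parse would fail.

```python
import html
import xml.etree.ElementTree as ET

escaped = "&lt;doc&gt;&lt;item id=&quot;1&quot;&gt;R&amp;amp;D&lt;/item&gt;&lt;/doc&gt;"
markup = html.unescape(escaped)
print(markup)  # <doc><item id="1">R&amp;D</item></doc>

root = ET.fromstring(markup)
print(root.find("item").text)  # R&D
```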

Question 37

I'd like to be able to use unicode in my python string. For instance I have an icon: icon = '▲' print icon which should create icon = '▲' but instead it literally returns it in ...

Question 38

I would like to convert HTML entities back to its human readable format, e.g. '£' to '£', '°' to '°' etc. I've read several posts regarding this question Converting html source ...

Question 39

The solutions in other answers do not work when I try them, the same string outputs when I try those methods. I am trying to do web scraping using Python 2.7. I have the webpage downloaded and it has ...

Question 40

I have a reference list containing function names in this format: list_1_ref = ['Name1<abc>' , 'Name2<abc>'] From another function I get a return list containing elements with html format:...
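Decoding the returned names before comparing makes them directly comparable with the reference list. The returned values below are hypothetical, since the question is cut off before showing them.

```python
import html

list_1_ref = ["Name1<abc>", "Name2<abc>"]
returned = ["Name1&lt;abc&gt;", "Name3&lt;abc&gt;"]  # hypothetical return value

decoded = [html.unescape(name) for name in returned]
missing = [name for name in list_1_ref if name not in decoded]
print(decoded)  # ['Name1<abc>', 'Name3<abc>']
print(missing)  # ['Name2<abc>']
```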

Question 41

I'd hate to open a new question even though many questions have been opened on this same topic, but I'm literally at my wits' end as to why this isn't working. I am attempting to create a JSON object ...

Question 42

There's an XML file: <body> <entry> I go to <hw>to</hw> to school. </entry> </body> For some reason, I changed <hw> to &lt;hw&gt; and ...

Question 43

I am trying to use the Google Translate API as below. Translation seems OK except for the apostrophe chars, which are returned as ' instead. Is it possible to fix those? I can of course make a ...

Question 44

I'm trying to read a CSV file with pandas read_csv. The data looks like this (example) thing;weight;price;colour apple;1;2;red m & m's;0;10;several cherry;0,5;2;dark red Because of the HTML-...
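With `;` as the delimiter, the semicolon that ends an entity such as `&amp;` gets read as a field separator. This sketch assumes the raw row is `m &amp; m's;...` and decodes the text before parsing. It uses the standard `csv` module instead of pandas; `pd.read_csv(io.StringIO(decoded), sep=";", decimal=",")` should accept the same decoded text. Caveat: entities like `&quot;` would decode to the CSV quote character and could change how fields are split.

```python
import csv
import html
import io

raw = (
    "thing;weight;price;colour\n"
    "apple;1;2;red\n"
    "m &amp; m's;0;10;several\n"
    "cherry;0,5;2;dark red\n"
)

# Decode entities first so the ';' inside '&amp;' is not read as a separator.
decoded = html.unescape(raw)
rows = list(csv.reader(io.StringIO(decoded), delimiter=";"))
print(rows[2])  # ["m & m's", '0', '10', 'several']
```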

Question 45

When I'm processing HTML code in Python I have to use the following code because of special characters. line = string.replace(line, """, "\"") line = string.replace(line, "'", "'") ...

Question 46

I'm using Ajax and Django to dynamically populate a combo box. The Ajax component works fine and passes the data to the view, but in the view, when I'm using the split function, it gives me a ...

Question 47

Input text: Ell és la víctima que expia els nostres pecats, i no tan sols els nostres, sinó els del món sencer. Expected output: Ell és la víctima que expia ...

Question 48

I receive HTML files and they contain strings like " ("), ü (ü) and so on. I need them human-readable. So I could use str.replace() for that. But isn't there a package/library ...

Question 49

I'm currently learning how to parse XML data using ElementTree. I got an error that says: ParseError: not well-formed (invalid token): line 1, column 2. My code is right below, and a bit of the xml ...

Question 50

Say I have the following HTML emoji entity: '&#x1f604 ;' Note there isn't actually a space between the 4 and the ; it's just there so that it doesn't show up as a smiley The emoji's Python form ...
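Since Python 3.3 a `str` holds full code points, so a hex reference above U+FFFF decodes to a single character with no surrogate-pair handling needed. A sketch of the entity from the question:

```python
import html

smiley = html.unescape("&#x1f604;")
print(smiley == "\U0001F604", len(smiley))  # True 1
```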