I have the following HTML structure, but I don't know how to extract the values of text1 and text2 from <td><a href=".....">text1</a>text2</td>:
<tbody>
<tr class="trBgGrey"><td nowrap="nowrap">1</td><td nowrap="nowrap">11</td><td class="tdAlignL font13 fontStyle" nowrap="nowrap"><a href="http://www.hkjc.com/english/racing/horse.asp?horseno=S205">SWEET BEAN</a>(S205)</td><td class="tdAlignL font13 fontStyle" nowrap="nowrap"><a href="http://www.hkjc.com/english/racing/jockeyprofile.asp?jockeycode=MOJ&season=Current">J Moreira</a></td><td class="tdAlignL font13 fontStyle" nowrap="nowrap"><a href="http://www.hkjc.com/english/racing/trainerprofile.asp?trainercode=FC&season=Current">C Fownes</a></td><td nowrap="nowrap">121</td><td nowrap="nowrap">1034</td><td nowrap="nowrap">7</td><td nowrap="nowrap">-</td><td align="center" nowrap="nowrap"><table width="80" border="0" cellSpacing="0" cellPadding="0"><tr><td width="16" align="center">8</td><td width="16" align="center">8</td><td width="16" align="center">8</td><td width="16" align="center">3</td><td width="16" align="center">1</td></tr></table></td><td nowrap="nowrap">1.51.13</td><td nowrap="nowrap">5.3</td></tr>
</tr><tr class="trBgGrey"><td nowrap="nowrap">3</td><td nowrap="nowrap">2</td><td class="tdAlignL font13 fontStyle" nowrap="nowrap"><a href="http://www.hkjc.com/english/racing/horse.asp?horseno=V311">CITY WINNER</a>(V311)</td><td class="tdAlignL font13 fontStyle" nowrap="nowrap"><a href="http://www.hkjc.com/english/racing/jockeyprofile.asp?jockeycode=RN&season=Current">N Rawiller</a></td><td class="tdAlignL font13 fontStyle" nowrap="nowrap"><a href="http://www.hkjc.com/english/racing/trainerprofile.asp?trainercode=TYS&season=Current">Y S Tsui</a></td><td nowrap="nowrap">132</td><td nowrap="nowrap">978</td><td nowrap="nowrap">6</td><td nowrap="nowrap">1</td><td align="center" nowrap="nowrap"><table width="80" border="0" cellSpacing="0" cellPadding="0"><tr><td width="16" align="center">9</td><td width="16" align="center">9</td><td width="16" align="center">9</td><td width="16" align="center">10</td><td width="16" align="center">3</td></tr></table></td><td nowrap="nowrap">1.51.30</td><td nowrap="nowrap">22</td></tr>
</tbody>
I tried the following code, but it does not return the text values:
import urllib.request
from bs4 import BeautifulSoup

race_link = 'http://racing.hkjc.com/racing/info/meeting/Results/English/Local/20171227/HV'
sauce1 = urllib.request.urlopen(race_link).read()
soup1 = BeautifulSoup(sauce1, 'html.parser')
# Print the text of every cell in each result row
for link in soup1.find_all('tr', {'class': 'trBgGrey'}):
    for ilink in link.find_all('td'):
        print(ilink.string)
But my output is:
1
11
None
J Moreira
C Fownes
121
1034
7
-
None
8
8
8
3
1
1.51.13
5.3
.....
My expected output is:
1
11
SWEET BEAN
(S205)
J Moreira
C Fownes
121
1034
7
-
None
8
8
8
3
1
1.51.13
5.3
......
I can get the values from an HTML structure like
<td>text1</td><td>text2</td>
but I don't know how to get the values from a structure like
<td><a href="....">text1</a>text2</td>
How can I get the values from the second structure?
-
I mean, I would like to extract text1 and text2 from the following html structure: – how2code, Jan 1, 2018 at 8:35
-
You want the horse name and ID? – cs95, Jan 1, 2018 at 8:38
-
Sorry, this is my first post here and I left something out. I have amended the question. In fact, I want to know how to get the values (text1 and text2) inside an HTML structure like this: <td><a href="......">text1</a>text2</td> – how2code, Jan 1, 2018 at 8:38
-
@cᴏʟᴅsᴘᴇᴇᴅ: In fact, I need all values, including the horse name and ID. Right now I can get every value except the horse name and ID, and I want those as well. Thanks! – how2code, Jan 1, 2018 at 8:40
-
1. Please add an example of the expected output. 2. The code you added does not produce the output you gave. For example, there is no <tr> and there is no class trBgGrey. – Elisha, Jan 1, 2018 at 8:47
1 Answer
Try something like this:
from bs4 import element

def print_strings(node):
    # Recursively print every text node under `node`, separated by spaces
    for c in node.children:
        if isinstance(c, element.Tag):
            print_strings(c)
        else:
            print(c, end=" ")

for link in soup1.find_all('tr', {'class': 'trBgGrey'}):
    for ilink in link.find_all('td'):
        print_strings(ilink)
        print()  # newline after each cell
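For context, `.string` returns None whenever a tag has more than one child. That is the case for the horse cell (an <a> plus trailing text) and for the cell that wraps the nested table. As a possible alternative to the recursive helper, here is a minimal sketch using BeautifulSoup's `stripped_strings`. It runs on a trimmed copy of one row from the question rather than the live page, so the markup shape is an assumption, and the helper name `cell_texts` is made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of one row from the question (assumption: the live page has
# the same shape; most attributes and columns are dropped for brevity).
html = """
<table><tbody>
<tr class="trBgGrey"><td>1</td><td>11</td>
<td><a href="http://www.hkjc.com/english/racing/horse.asp?horseno=S205">SWEET BEAN</a>(S205)</td>
<td><a href="http://www.hkjc.com/english/racing/jockeyprofile.asp?jockeycode=MOJ&season=Current">J Moreira</a></td>
<td><table><tr><td>8</td><td>3</td><td>1</td></tr></table></td>
<td>1.51.13</td></tr>
</tbody></table>
"""


def cell_texts(row):
    """Return one list of text fragments per direct <td> child of a row.

    `cell_texts` is a hypothetical helper, not part of BeautifulSoup.
    """
    # recursive=False keeps only the row's own cells, so the <td> elements
    # of the nested table are not visited a second time.
    cells = row.find_all("td", recursive=False)
    # stripped_strings yields every non-empty text node under the cell,
    # with surrounding whitespace removed.
    return [list(cell.stripped_strings) for cell in cells]


soup = BeautifulSoup(html, "html.parser")
rows = [cell_texts(tr) for tr in soup.find_all("tr", class_="trBgGrey")]

for row in rows:
    for texts in row:
        print(*texts)  # fragments of one cell on one line
```

With this approach the horse cell yields two fragments, the name and the ID, so they can be printed together or unpacked separately. Dropping `recursive=False` would also return the inner table's cells as separate entries, which may or may not be what you want.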
answered Jan 1, 2018 at 8:53
Elisha
1 Comment
Elisha
@how2code Please accept the answer if it helped solve your question.