I have a string containing the source of an HTML file, fetched with the mechanize library.
The HTML will always contain a table like the one below, and I want to convert that table to CSV.
Several SO questions address the same problem, but their tables have a class name. My table doesn't have a class attribute, so how can I select and convert it?
<table border=1 cellPadding="2" cellSpacing="0" width="75%" bordercolor="#000000" >
<tr bgcolor="mediumblue">
<td width="20%"><p align="center"><font face="Arial" color="white" size="2"><strong>SUB CODE</strong></font></p></td>
<td width="26%"><p align="left"><font face="Arial" color="white" size="2"><strong>SUB NAME</strong></font></p></td>
<td width="13%"><p align="left"><font face="Arial" color="white" size="2"><strong>THEORY</strong></font></p> </td>
<td width="10%"><p align="left"><font face="Arial" color="white" size="2"><strong>PRACTICAL</strong></font></p> </td>
<td width="17%"><p align="left"><font face="Arial" color="white" size="2"><strong>MARKS</strong></font></p></td>
<td width="14%"><p align="center"><font face="Arial" color="white" size="2"><strong>GRADE</strong></font></p></td>
</tr>
<tr bgColor="#ffffff">
<td align="middle"><font face="Arial" size=2> 301</font></td>
<td align="left" ><font face="Arial" size=2>ENGLISH CORE</font></td>
<td align="left" ><font face="Arial" size=2>067</font></td>
<td align="left" ><font face="Arial" size=2></font></td>
<td align="left" ><font face="Arial" size=2>067 </font></td>
<td align="middle"><font face="Arial" size=2>C2</font></td>
</tr>
</table>
asked May 27, 2015 at 9:33
Rohith R
- Is there a reason why you cannot give it a class? (Colum, May 27, 2015 at 9:37)
- Because I am getting the code directly from a library called mechanize. (Rohith R, May 27, 2015 at 9:38)
- If there is only going to be one table in the string that you need to convert, can you identify it by the table element tag, rather than a class name? (Colum, May 27, 2015 at 9:40)
- No, there is more than one table. (Rohith R, May 27, 2015 at 9:41)
1 Answer
pandas has a neat way to read HTML tables: pd.read_html returns a list of DataFrames, one per table it finds.
import pandas as pd
html_data = '''
<table border=1 cellPadding="2" cellSpacing="0" width="75%" bordercolor="#000000" >
<tr bgcolor="mediumblue">
<td width="20%"><p align="center"><font face="Arial" color="white" size="2"><strong>SUB CODE</strong></font></p></td>
<td width="26%"><p align="left"><font face="Arial" color="white" size="2"><strong>SUB NAME</strong></font></p></td>
<td width="13%"><p align="left"><font face="Arial" color="white" size="2"><strong>THEORY</strong></font></p> </td>
<td width="10%"><p align="left"><font face="Arial" color="white" size="2"><strong>PRACTICAL</strong></font></p> </td>
<td width="17%"><p align="left"><font face="Arial" color="white" size="2"><strong>MARKS</strong></font></p></td>
<td width="14%"><p align="center"><font face="Arial" color="white" size="2"><strong>GRADE</strong></font></p></td>
</tr>
<tr bgColor="#ffffff">
<td align="middle"><font face="Arial" size=2> 301</font></td>
<td align="left" ><font face="Arial" size=2>ENGLISH CORE</font></td>
<td align="left" ><font face="Arial" size=2>067</font></td>
<td align="left" ><font face="Arial" size=2></font></td>
<td align="left" ><font face="Arial" size=2>067 </font></td>
<td align="middle"><font face="Arial" size=2>C2</font></td>
</tr>
</table>
'''
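from io import StringIO  # recent pandas versions deprecate passing a literal HTML string
# read_html returns a list of DataFrames; [0] is the first table in the document
print(pd.read_html(StringIO(html_data))[0].to_csv(index=False, header=False))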
When there are multiple tables in the HTML, you can check each table's column names (or its first row, if the header cells were read as data) to discard the ones you don't need.
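If pandas is not available, the same idea (pick the table by its header cells rather than by a class) can be sketched with only the standard library. This is a minimal illustration, not the answer's method: the names TableCollector and table_to_csv and the shortened sample HTML are made up for this example, and it assumes tables are not nested and that every cell has a closing tag.

```python
import csv
import io
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "TD" and "td" both arrive as "td".
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell; markup such as <font> is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def table_to_csv(html, required_header):
    """Return CSV text for the first table whose first row contains required_header."""
    parser = TableCollector()
    parser.feed(html)
    for rows in parser.tables:
        if rows and required_header in rows[0]:
            out = io.StringIO()
            csv.writer(out, lineterminator="\n").writerows(rows)
            return out.getvalue()
    raise ValueError("no table with header %r" % required_header)


# Hypothetical page with two tables; only the second has the wanted header.
sample = """
<table><tr><td>menu</td></tr></table>
<table border=1>
<tr><td><strong>SUB CODE</strong></td><td>SUB NAME</td><td>MARKS</td></tr>
<tr><td> 301</td><td>ENGLISH CORE</td><td>067 </td></tr>
</table>
"""
print(table_to_csv(sample, "SUB CODE"))
```

Matching on a header cell such as "SUB CODE" keeps the selection stable even if the page adds or reorders other tables, which an index like [0] would not.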
answered May 27, 2015 at 9:45
tdihp