I'm using bs4 to parse an HTML page and extract a table (sample table given below), and I'm trying to load it into pandas, but when I call pddataframe = pd.read_html(LOT
This exact code works for me.
htm = """<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th colspan="2">
Learning Outcomes
</th>
</tr>
<tr>
<td class="info" colspan="2">
On successful completion of this module the learner will be able to:
</td>
</tr>
<tr>
<td style="width:10%;">
LO1
</td>
<td>
Demonstrate an awareness of the important role of Financial Accounting information as an input into the decision making process.
</td>
</tr>
<tr>
<td style="width:10%;">
LO2
</td>
<td>
Display an understanding of the fundamental accounting concepts, principles and conventions that underpin the preparation of Financial statements.
</td>
</tr>
<tr>
<td style="width:10%;">
LO3
</td>
<td>
Understand the various formats in which information in relation to transactions or events is recorded and classified.
</td>
</tr>
<tr>
<td style="width:10%;">
LO4
</td>
<td>
Apply a knowledge of accounting concepts,conventions and techniques such as double entry to the posting of recorded information to the T accounts in the Nominal Ledger.
</td>
</tr>
<tr>
<td style="width:10%;">
LO5
</td>
<td>
Prepare and present the financial statements of a Sole Trader in prescribed format from a Trial Balance accompanies by notes with additional information.
</td>
</tr>
</table>
"""
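from io import StringIO
import pandas as pd

# Recent pandas versions deprecate passing a literal HTML string, so wrap it in StringIO.
# flavor='bs4' needs both beautifulsoup4 and html5lib installed.
pd.read_html(StringIO(htm), skiprows=2, flavor='bs4')[0]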
Pandas can guess the parser and the table layout on its own, so neither skiprows nor flavor is needed.
HTML = '''\
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th colspan="2">
Learning Outcomes
</th>
... omitting most of what you had here
Prepare and present the financial statements of a Sole Trader in prescribed format from a Trial Balance accompanies by notes with additional information.
</td>
</tr>
</table>'''
from io import StringIO
import pandas as pd

# read_html returns a list of DataFrames, one per <table> found in the markup
df = pd.read_html(StringIO(HTML))
print(df)
Result:
[                                                   0  \
0                                  Learning Outcomes
1  On successful completion of this module the le...
2                                                LO1
3                                                LO2
4                                                LO3
5                                                LO4
6                                                LO5

                                                   1
0                                                NaN
1                                                NaN
2  Demonstrate an awareness of the important role...
3  Display an understanding of the fundamental ac...
4  Understand the various formats in which inform...
5  Apply a knowledge of accounting concepts,conve...
6  Prepare and present the financial statements o...  ]
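Because read_html hands back a list rather than a single DataFrame, a small follow-up step is usually needed before the data is convenient to work with. The snippet below is only a sketch under stated assumptions: the table is a trimmed-down stand-in with plain td rows, and the names html, tables, outcomes, by_code and the column labels "code" and "outcome" are illustrative, not taken from the original post.

```python
from io import StringIO

import pandas as pd

# Hypothetical, trimmed-down table (td-only rows) standing in for the real one.
html = """<table>
<tr><td>LO1</td><td>First outcome</td></tr>
<tr><td>LO2</td><td>Second outcome</td></tr>
</table>"""

# read_html always returns a list, even when the markup holds a single table.
tables = pd.read_html(StringIO(html))
outcomes = tables[0]

# Give the two positional columns readable (illustrative) names.
outcomes.columns = ["code", "outcome"]

# Optional: look outcomes up by their code.
by_code = outcomes.set_index("code")["outcome"]
print(by_code["LO2"])
```

With the real table, the title and introduction rows would still have to be dropped first, for example with skiprows as in the other answer.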
Thanks for the pointers in all the suggested answers and comments. My rookie mistake was that, after extracting the table with bs4, I was holding it in a variable as a bs4 object rather than as a string, and pd.read_html expects a string, a path, or a file-like object.
I was running pd.read_html(LOTable, skiprows=2, flavor='bs4')
when I needed to run pd.read_html(LOTable.prettify(), skiprows=2, flavor='bs4')
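For anyone hitting the same problem, here is a minimal sketch of the fix, assuming a made-up page with a simplified table. The names page, soup, lo_table and outcomes are placeholders, and str(tag) is used as an alternative to prettify(); both return the tag's markup as text.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (simplified to two outcome rows).
page = """
<html><body>
<table class="borders">
<tr><td>LO1</td><td>First outcome</td></tr>
<tr><td>LO2</td><td>Second outcome</td></tr>
</table>
</body></html>
"""

soup = BeautifulSoup(page, "html.parser")
lo_table = soup.find("table", class_="borders")  # a bs4 Tag, not a string

# Convert the Tag back to markup text before handing it to pandas.
markup = str(lo_table)  # lo_table.prettify() also returns a string
outcomes = pd.read_html(StringIO(markup))[0]
print(outcomes)
```

Wrapping the markup in StringIO keeps the call working on recent pandas versions, which deprecate passing literal HTML strings.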