Using pd.read_html to return a specific table from a webpage of multiple tables

68
April 12, 2022, at 09:40 AM

I am trying to return a specific table from this webpage.

I have tried to scrape it using beautifulsoup but that got way too complicated so I an trying to use pd.read_html instead. The table I'm after is the one with "XYZ Substantial Shareholders" (note that this is just one particular page, I would be using this code to return tables from other stocks as well).

This is the code that I'm currently using to search for the tables - my initial idea was to find the word 'Holding' in the header of the title I'm after, but it has also picked up another table which I didn't want because the word 'Holdings' appeared in an earlier table.


# Scrape the Substantial Shareholder table if it exists
sub_table = pd.read_html(current_url, match='Holding')
print(sub_table)

Current output

                    Company   Mkt Cap         2018 A  2019 A   2020 A    2018 A     2019 A    2020 A             2018 A 2019 A 2020 A
0              Altium (ALU)   $4,528M         0.1377  0.3989   0.2353   96.9595    69.3097   56.1095              1.55%  1.28%  1.46%
1  Brainchip Holdings (BRN)   $2,974M         0.6838  0.0000   0.0000    0.0000     0.0000    0.0000              0.00%  0.00%  0.00%
2      Technology One (TNE)   $3,299M         0.1484  0.1679   0.1574   45.3375    38.8213   33.5414              1.36%  1.59%  1.84%
3     Wisetech Global (WTC)  $14,241M         1.2490  0.3624   0.2610  124.5434    91.4118   72.4917              0.15%  0.20%  0.30%
4                Xero (XRO)  $16,610M         2.6697 -0.8109  18.9607  920.3138  4866.8122  243.8197              0.00%  0.00%  0.00%,                 Holding                                        Name
0  176,305,508 (10.82%)                          Peter Van Der Made
1   119,205,377 (7.31%)  Merrill Lynch (Australia) Nominees Pty Ltd
2      114,093,915 (7%)   HSBC Custody Nominees (Australia) Limited]
Process finished with exit code 0

Desired output

Holding                                        Name
0  176,305,508 (10.82%)                          Peter Van Der Made
1   119,205,377 (7.31%)  Merrill Lynch (Australia) Nominees Pty Ltd
2      114,093,915 (7%)   HSBC Custody Nominees (Australia) Limited]

Is there a way for me to specifically match only looking at headers rather than the entire page content?

Answer 1

You simply need to do this:

current_url = 'https://www.intelligentinvestor.com.au/shares/asx-brn/brainchip-holdings-ltd'
sub_table = pd.read_html(current_url, match='Holding')
print(sub_table[1])

Which returns:

          Holding                                        Name
0  176,305,508 (10.82%)                          Peter Van Der Made
1   119,205,377 (7.31%)  Merrill Lynch (Australia) Nominees Pty Ltd
2      114,093,915 (7%)   HSBC Custody Nominees (Australia) Limited
​

while

current_url = 'https://www.intelligentinvestor.com.au/shares/asx-brn/brainchip-holdings-ltd'
sub_table = pd.read_html(current_url, match='Holding')
print(sub_table[0])

gives

               Company   Mkt Cap EPS Growth (%)                   \
                    Company   Mkt Cap         2018 A  2019 A   2020 A   
0              Altium (ALU)   $4,536M         0.1377  0.4001   0.2353   
1  Brainchip Holdings (BRN)   $2,794M         0.6838  0.0000   0.0000   
2      Technology One (TNE)   $3,296M         0.1484  0.1679   0.1574   
3     Wisetech Global (WTC)  $14,456M         1.2490  0.3624   0.2610   
4                Xero (XRO)  $16,684M         2.6697 -0.8101  18.8826   
    P/E (%)                      Dividend Yield (%)                
     2018 A     2019 A    2020 A             2018 A 2019 A 2020 A  
0   97.1284    69.3746   56.1615              1.54%  1.28%  1.46%  
1    0.0000     0.0000    0.0000              0.00%  0.00%  0.00%  
2   45.2931    38.7833   33.5085              1.36%  1.60%  1.84%  
3  126.4269    92.7943   73.5880              0.15%  0.20%  0.29%  
4  924.4426  4867.3913  244.8065              0.00%  0.00%  0.00%  
​
Rent Charter Buses Company
READ ALSO
Django - get data from form and return to user so they can edit/delete

Django - get data from form and return to user so they can edit/delete

New to Django and making a restuarant booking systemI have everything going to my database but now i'm trying to return the info back to the user, ideally in a similar form view to what i have, so they can edit or delete the booking

79
For python's built-in sort()'s key parameter, when to use base class vs object as identifier?

For python's built-in sort()'s key parameter, when to use base class vs object as identifier?

Let me take an example to clarify what I am talking aboutSay I'm trying to sort() a string s with a key of count (frequency of letters) and in a different case, with a key of lower

87
Get cpu temperature from python? (windows)

Get cpu temperature from python? (windows)

PleaseI have tried every google search result and none of them worked

128
How to go through a Django dictionary in HTML template

How to go through a Django dictionary in HTML template

Im new to Django and I don't really have a clear idea how the Django Template language works but I have a dictionary that is something like this:

106