Importing table/array from website to Matlab
Hello. I am trying to import the central table from the page 'https://tennisabstract.com/reports/atp_elo_ratings.html', from the player name column through the "grass" column, into MATLAB as an array or table. I want to read it directly from the website because the page is updated every week, but I do not know how to approach the problem and would appreciate your help.
Thank you, Erik
Accepted Answer
Guillaume
3 Jul 2019
First, note that importing data from HTML is always going to be very iffy. HTML is a presentation format designed to display things to humans; it's not designed for data transfer, and you're going to have to remove all the presentation cruft to get at your data.
So, the first thing you should try is contacting the website to see if they have a direct interface to the underlying database.
Bearing this in mind, the following will import your data with the current format of the website. Any change to the format of that page, even a minor one, may break the code.
%definition of patterns used to locate required information within a table row:
intpattern = '<td[^>]*>(\d+)</td>';
linkpattern = '<td[^>]*><a[^>]+>([^<]+)</a></td>';
numberpattern = '<td[^>]*>(\d+(\.\d+)?)</td>';
anypattern = '<td[^>]*>([^<]+)</td>';
emptypattern = '<td[^>]*></td>';
%read and parse html
html = webread('https://tennisabstract.com/reports/atp_elo_ratings.html');
tabledata = regexp(html, ['<tr[^>]*>', ...
intpattern, ...
linkpattern, ...
numberpattern, ...
numberpattern, ...
emptypattern, ...
numberpattern, ...
numberpattern, ...
numberpattern, ...
emptypattern, ...
anypattern, ...
numberpattern, ...
numberpattern], 'tokens');
assert(~isempty(tabledata), 'Failed to parse html according to pattern. The format of the page may have changed');
tabledata = cell2table(vertcat(tabledata{:}), 'VariableNames', {'Rank', 'Player', 'Age', 'ELO', 'Hard', 'Clay', 'Grass', 'Peak_Match', 'Peak_Age', 'Peak_ELO'});
tabledata = convertvars(tabledata, [1, 3:7, 9, 10], @str2double);
tabledata.Player = strrep(tabledata.Player, '&nbsp;', ' ')
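To see how the token patterns above turn table rows into a table, here is a minimal sketch that runs the same kind of expression on a small in-memory string instead of the live page. The two-row HTML snippet, the player names and the variable name sample are made up for illustration; only three of the column patterns are used.

```matlab
% Hypothetical two-row HTML snippet standing in for the real page (made-up data).
html = ['<tr><td align="right">1</td><td><a href="#">Player&nbsp;One</a></td><td>31.5</td></tr>', ...
        '<tr><td align="right">2</td><td><a href="#">Player&nbsp;Two</a></td><td>24</td></tr>'];

% Same patterns as in the answer above.
intpattern    = '<td[^>]*>(\d+)</td>';
linkpattern   = '<td[^>]*><a[^>]+>([^<]+)</a></td>';
numberpattern = '<td[^>]*>(\d+(\.\d+)?)</td>';

% One cell per matched row; each cell holds that row's tokens.
% For nested parentheses, only the outermost group is returned as a token.
rows = regexp(html, ['<tr[^>]*>', intpattern, linkpattern, numberpattern], 'tokens');

% Stack the rows into a cell array, then convert to a table.
sample = cell2table(vertcat(rows{:}), 'VariableNames', {'Rank', 'Player', 'Age'});
sample.Rank = str2double(sample.Rank);                 % text -> numeric
sample.Age = str2double(sample.Age);
sample.Player = strrep(sample.Player, '&nbsp;', ' ');  % decode the non-breaking space entity
disp(sample)
```

Testing against a fixed string like this makes it easier to tell whether a failure comes from the pattern itself or from a change in the page layout.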
Comments: 5
Guillaume
3 Jul 2019
Don't use c as a variable name. It's meaningless and doesn't say anything about what it contains.
Assuming you've imported the data as tabledata:
>> elo(tabledata, 'Ivan Nedelko', 'Kevin King')
ans =
0.230209216637309
0.769790783362691
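The elo function called above is not reproduced in this part of the thread. As a rough sketch only, a helper with the same calling pattern could be built on the standard Elo expected-score formula. The names eloSketch, rating and demo are hypothetical, and the ratings are made up; this is not necessarily the function that produced the numbers above.

```matlab
% Made-up ratings for two hypothetical players, just to exercise the sketch.
demo = table({'Player A'; 'Player B'}, [1600; 1800], 'VariableNames', {'Player', 'ELO'});

% Hypothetical lookup helper: rating of the player with the given name.
rating = @(t, name) t.ELO(strcmp(t.Player, name));

% Hypothetical helper, not the elo function from the thread.
% Returns [P(p1 wins); P(p2 wins)] using the standard Elo expected score:
% P(p1 wins) = 1 / (1 + 10^((R2 - R1) / 400)).
eloSketch = @(t, p1, p2) 1 ./ (1 + 10 .^ ( ...
    ([rating(t, p2); rating(t, p1)] - [rating(t, p1); rating(t, p2)]) / 400));

p = eloSketch(demo, 'Player A', 'Player B')
```

The two returned values always sum to 1, which matches the shape of the output shown above.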
Guillaume
3 Jul 2019
Note: you could calculate the odds for matches of every player against any player with:
Q = 10 .^ (tabledata.ELO / 400);
odds = Q ./ (Q + Q.');
odds(r, c) is then the odds (win probability) of tabledata.Player{r} winning against tabledata.Player{c}.
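A small sketch of the same calculation on made-up ratings may help check the orientation of the matrix. The player names and ratings here are hypothetical; the expression relies on implicit expansion, so it assumes R2016b or later.

```matlab
% Made-up ratings for three hypothetical players.
players = {'Player A'; 'Player B'; 'Player C'};
ELO = [1600; 1800; 2000];

Q = 10 .^ (ELO / 400);
% Column vector against row vector expands to a 3x3 matrix:
% odds(r, c) = Q(r) / (Q(r) + Q(c))
odds = Q ./ (Q + Q.');

% Sanity checks: a player against themselves gives 0.5,
% and the two sides of any matchup sum to 1.
assert(all(abs(diag(odds) - 0.5) < 1e-12));
assert(all(all(abs(odds + odds.' - 1) < 1e-12)));

% Look up one matchup by name: row is the player whose win chance is wanted.
r = find(strcmp(players, 'Player A'));
c = find(strcmp(players, 'Player B'));
pAbeatsB = odds(r, c)
```

Because the lower-rated player is in the row position here, the value comes out below 0.5.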