Convert an HTML table to CSV format

FATEMEH on 10 Apr 2013
I need to convert the table at the following URL to CSV format. Since I have to convert many tables, I can't use copy and paste. http://climate.weatheroffice.gc.ca/climate_normals/results_e.html?stnID=2046&lang=e&dCode=0&province=ALTA&provBut=Search&month1=0&month2=12
Comments: 1
Matt Kindig on 10 Apr 2013
Do you have to use MATLAB for this purpose? I ask because other languages that are more commonly used for website development have good HTML parsing capabilities, whereas such features are more limited in MATLAB, where you'd basically have to resort to complex regexp statements.
I would recommend Python and the BeautifulSoup package to do this, actually.


Answers (2)

Jan on 11 Apr 2013
You can first import the HTML table into MATLAB with one of the File Exchange submissions FEX: htmltableToCell or FEX: get-html-table-data-into-matlab. How to export to CSV afterwards depends on the contents of the data.
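To illustrate why the export depends on the contents, here is a minimal sketch, not taken from either File Exchange submission. It assumes the import step returns a cell array of strings; the variable name cells, the sample values, and the file name numeric.csv are made up for illustration.

```matlab
% Hypothetical cell array of strings, as an HTML table import might return.
cells = {'1.5', '2' ; '3', '4.25'} ;

% str2double returns NaN for any cell that is not a plain number.
% Note: a cell containing the text 'NaN' is also treated as non-numeric here.
values = str2double(cells) ;

if ~any(isnan(values(:)))
    % Purely numeric contents: the matrix can be written directly.
    % dlmwrite uses a comma as its default delimiter.
    dlmwrite('numeric.csv', values, 'precision', '%.10g') ;
else
    % Mixed contents: a numeric writer is not enough, and the strings
    % have to be written field by field (e.g. with fopen/fprintf).
    disp('Non-numeric cells found; write the text fields with fprintf instead.') ;
end
```

If the table mixes text and numbers (headers, footnote codes), only the second branch applies, which is why no single export call fits every table.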

Cedric on 10 Apr 2013
Edited: Cedric on 10 Apr 2013
As Matt mentions, Python plus a parsing package would be perfect for this part. Here is one way to do it using REGEXP in MATLAB. It is not a complete solution, but it is enough to illustrate the approach.
% - Get HTML page.
url = 'http://climate.weatheroffice.gc.ca/climate_normals/results_e.html?stnID=2046&lang=e&dCode=0&province=ALTA&provBut=Search&month1=0&month2=12' ;
buffer = urlread(url) ;
% - Extract horizontal header.
p = '(?<=<td class="dataTableColHeader">).*?(?=</td>)' ;
hheader = regexp(buffer, p, 'match') ;
% - Extract vertical header.
p = '(?<=<td class="dataTableRowHeader">).*?(?=</td>)' ;
vheader = regexp(buffer, p, 'match') ;
% - Extract/reshape data.
p = '(?<=<td class="dataTableRowData">).*?(?=</td>)' ;
data = regexp(buffer, p, 'match') ;
data = reshape(data, 12+2, []).' ;
% - Build and export the whole.
content = [vheader.',[hheader; data]] ;
xlswrite('example.xlsx', content) ;
Let me know if you want to go this way and I can improve this code a little. There would still be quite a bit of work to do on your side, e.g. managing some inconsistencies in the way they build the HTML table, detecting and handling failures in the processing, exporting to CSV instead of XLSX, etc.
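For the CSV export mentioned above, one possible approach is to write the cell array line by line with fopen/fprintf. This is only a sketch, assuming content is a cell array of strings like the one built by the code above; the sample values and the file name example.csv are made up for illustration and do not come from the actual page.

```matlab
% Hypothetical sample standing in for the 'content' cell array built above.
content = {'',              'Jan',   'Feb' ; ...
           'Daily Average', '-10.4', '-7.2'} ;

% Quote each field and double any embedded quotes, so that commas inside
% a field do not break the CSV structure.
quoteField = @(s) ['"', strrep(s, '"', '""'), '"'] ;
quoted = cellfun(quoteField, content, 'UniformOutput', false) ;

fid = fopen('example.csv', 'w') ;
if fid == -1
    error('Could not open example.csv for writing.') ;
end
for r = 1 : size(quoted, 1)
    % Concatenate the fields of row r with commas, then drop the last comma.
    line = sprintf('%s,', quoted{r,:}) ;
    fprintf(fid, '%s\n', line(1:end-1)) ;
end
fclose(fid) ;
```

Quoting every field is a conservative choice: it costs a few extra characters but avoids having to check each cell for commas or quotes.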
