How to extract specific rows with requirements from multiple CSV files?

조회 수: 3 (최근 30일)
Hi,
I am having a question regarding to data extracting and need your help.
I have roughly 700 CSV files under three folders with the same format. Each CSV contains 40 columns, and the columns I am interested in are the 32th and 33th column, which are "Latitude" and "Longitude".
Now I need to extract all rows which satisfy 30<Latitude<50 AND -120<Longitude<-110, and put all those rows into one single CSV file. What I have known so far is to open the file, textscan those data into 40 different cells, and close the file.
Could someone help me on this issue? Thanks!
Typical CSV files are like:
Date Time ...... Latitude Longitude ......
08/27 7:00 ...... 33.44 110.43 ......
08/28 7:00 ...... 35.44 160.43 ......
08/29 7:00 ...... 33.44 120.48 ......
08/29 7:00 ...... 32.44 150.42 ......
08/29 7:00 ...... 38.44 130.46 ......
08/29 7:00 ...... 53.44 110.63 ......
......
Two out of 600 sample CSV files have been attached.
  댓글 수: 2
Akira Agata
Akira Agata 2017년 8월 28일
To import CSV file, readtable function would be more easier, because it reads csv data and saves it to table format.
Regarding an extraction of specific rows in table, Access Data in a Table page will be helpful.
If you could upload your csv file here, I (...or someone) can provide more detailed solution. So, I would recommend uploading your sample data.
Ryan Wang
Ryan Wang 2017년 8월 28일
Thank you very much Akira. I just uploaded two csv files, where I have nearly 600 of them. Specifically, the latitude and longitude columns are the 32th and the 33th, and their requirements are 33.975~33.982, and -117.373~-117.331, respectively.

댓글을 달려면 로그인하십시오.

채택된 답변

Akira Agata
Akira Agata 2017년 8월 28일
Hi Ziran-san, thank you for uploading some sample csv files. Thanks to these files, I could understand the point!
Based on these files, I believe the solution for your problem will be like the following. In this code, I assumed that your csv files are stored in ./data folder.
But looking at your sample csv files, there are no rows that matches your condition. So, using your two csv files, the output of the following code is empty (I hope this is simply due to these two sample files...). Anyway, I would be happy if my answer could be your help! :-)
% List of CSV files
dataFolder = './data/'
list = dir([dataFolder,'*.csv']);
% CSV data format
format = ['%{MM/dd/yy HH:mm:ss}D',repmat('%f',1,39)];
% Extraction condition
latLim = [33.975, 33.982];
lonLim = [-117.373, -117.331];
% Read each CSV file, extract rows and store them to 'Output'
Output = table();
for kk = 1:numel(list)
data = readtable(...
fullfile(list(kk).folder, list(kk).name),...
'HeaderLines',2,...
'Format', format);
idxLat = (data{:,32} >= latLim(1)) & (data{:,32} <= latLim(2));
idxLon = (data{:,33} >= lonLim(1)) & (data{:,33} <= lonLim(2));
idx = idxLat & idxLon;
Output = [Output; data(idx,:)];
end
  댓글 수: 3
Ryan Wang
Ryan Wang 2017년 9월 4일
Hi Akira,
Just one further question. I am getting an error "Reference to non-existent field 'folder' today. Before this happened, the code works pretty well. Do you know why does this happen?
Thanks,
Ziran

댓글을 달려면 로그인하십시오.

추가 답변 (0개)

카테고리

Help CenterFile Exchange에서 Cell Arrays에 대해 자세히 알아보기

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by