Deriving specific rows from a large text files
이전 댓글 표시
Hello! I am very new at MATLAB and am still learning. I have a 66 million row dataset in the form of a text file. I need to pull certain rows from the list to create a new list. Specifically, the data presents as:
Time,Pressure,Sea pressure,Depth
2018-08-25 16:10:26.000,10.2011833,0.0686833,0.0681232
The data isn't in separate columns, it's comma separated (except date and time, which are separated by a space). The new list needs to contain anything with a time stamp that ends in milliseconds of:
.000
.125
.375
.500
.625
.750
.875
I am not sure what the best way would be to address this. Any help would be greatly appreciated!
댓글 수: 14
jonas
2018년 10월 28일
The best way to approach it would be to upload a sample of your textfile and let someone figure it out for you!
Walter Roberson
2018년 10월 28일
Perhaps you should create a timetable object and use retime()
lostatsea
2018년 10월 28일
dpb
2018년 10월 28일
Open it in the editor and select a few lines and paste them in another file...
lostatsea
2018년 10월 28일
dpb
2018년 10월 28일
Are the asterisks actually in the file or is that a fignewton of editing the data done in posting?
lostatsea
2018년 10월 28일
Walter Roberson
2018년 10월 28일
TT1 = table2timetable(readtable('YourFilenameGoesHere.txt'));
TT2 = retime(TT1,'regular', 'TimeStep', seconds(1/8));
That's good thought to use the 1/8th second for the time vector, Walter. That's newer syntax than I have, however...only way w/ R2017b would be to generate the full-length newtime vector explicitly, hence the selection process.
I wasn't aware of the seconds(1/8) syntax, either.
lostatsea
2018년 10월 28일
lostatsea
2018년 10월 28일
jonas
2018년 10월 28일
That's quite common, it just means that the variable headers are not valid identifiers, i.e. they cannot be used as variable names. For example "Sea pressure" is not valid.
lostatsea
2018년 10월 28일
dpb
2018년 10월 28일
The function Walter used is retime not etime.
채택된 답변
추가 답변 (1개)
dpb
2018년 10월 28일
Read as table or timetable; deal with the asterisks as needs must depending on whether they're real or not...
Two alternatives, retime to a specific vector means making a new time vector; the alternative would be to find the locations matching the desired times..
t=readtable('yourfile');
ix=ismember((t.Time-fix(t.Time))*1000,[0:7]*125)); % locate millisec multiples of 125
t=t(ix,:); % save those into the desired table
댓글 수: 15
Walter Roberson
2018년 10월 28일
I would probably add a round() around the millisecond calculation.
If it weren't being read from a file that appears to have three significant fractional digits I'd've done so, Walter. Looks like (at least from the sample record posted) that there are only three digits in the time stamp fractional seconds so the output routine will have already done the rounding for us here...
lostatsea
2018년 10월 28일
Walter Roberson
2018년 10월 28일
Because datetimes are not internally represented in decimal, input in decimal to 3 digits might not match to representation, so better to round() to be sure.
lostatsea
2018년 10월 28일
lostatsea
2018년 10월 28일
dpb
2018년 10월 28일
Oh. I presumed you had a header line like the one shown in the original question.
If there aren't, ML will use Var1, Var2, etc., ...
After readtable use\
t.Properties.VariableNames
to see what actually were imported as.
You can also do things like
t(1:4,:)
to just see the first lines of the table...
Poke around... :)
lostatsea
2018년 10월 29일
dpb
2018년 10월 29일
Use the optional 'ReadVariableNames' parameter if you need to...read the documentation; there's a lot of information there plus examples to guide...
dpb
2018년 10월 29일
"Because datetimes are not internally represented in decimal, ..."
But N*(1/8) is exactly representable and since that's the value that was wanted I used the fact... :)
In absolute general terms, there may be some value that isn't but I'd guess the rounding in the i/o routines from the internal datetime conversion process will be the same rounding so won't actually be an issue. But it couldn't hurt, granted, to use both the belt and the suspenders. :)
lostatsea
2018년 11월 1일
Walter Roberson
2018년 11월 2일
편집: dpb
2018년 11월 2일
"But N*(1/8) is exactly representable and since that's the value that was wanted I used the fact... :)"
However, you are assuming that the f[r]action of a seconds are stored as a separate floating point field, which is not the case. The time is represented as a "double-double stored as a complex" counting seconds since some epoch.
Not exactly. I was just counting on the end decimal fraction of 1/2 being stored as exact in whatever form it is internally(*); how TMW had chosen to do it internally, specifically, I didn't know. That the input was coming from a fixed format being converted rather than some other possibly subject to other rounding floating point calculation made me believe it would have all the rounding it needed from the rtl i/o conversion routines.
"Assumptions"? Yes, but I think very reasonable ones in the case.
As noted, "belt and suspenders" can't hurt even if it doesn't make any difference in a given case.
(*) And I was comfortable there is more than sufficient precision allocated in that the documentation guarantees precision to six digits (last one rounded) beyond the milliseconds needed:
--S, SS,...,SSSSSSSSS Fractional second, using the number of digits specified
by the number of instances of 'S' (up to 9 digits).
in the format descriptors.
Walter Roberson
2018년 11월 3일
t = readtable('sample1.txt'); %up to R2018b still cannot autodetect times with fractions of a second, even if detectImportOptions is used
tt = datetime(t.Time, 'inputformat', 'yyyy-MM-dd HH:mm:ss.SSS');
s = mod(second(tt),1) * 1000;
mask = ismember(s, 0:125:875);
selected = t(mask,:);
However, with that particular data, this can be shorted to
t = readtable('sample1.txt');
selected = t(1:2:end, :);
dpb
2018년 11월 4일
Trudat if is complete dataset...
카테고리
도움말 센터 및 File Exchange에서 Cell Arrays에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!