How do I find one string within another?

Laurentiu Galan on 10 Jan 2012
Hi Guys,
I am trying to sequentially look for a string in a document and was wondering how I would go about doing that.
Essentially, I have a large file called A.csv with a bunch of columns: [date], [Open], [Low], [Close], [Volume], [Adj.Close], [Ret].
I want to write a script that will find a date such as 4/5/2000 and pull the corresponding return for that date.
Here is the tricky part: the day is unknown. Each date has a different month and year, so the dates look something like 1/3/2000, 2/4/2000, 3/1/2000. How do I find a match using only the year and month? For example, I want to pull 1/*/2006 and 7/*/2007, but I don't know what the * is (it could be 1, 2, 3, 4, 5, etc.).
The first row (skipping the header), for example, looks like: 1/3/2000,78.75,78.94,58.13,66.19,1642300,62.37,0.569183903
Thank you for all of your help guys!
-Larry G.
  1 Comment
Laurentiu Galan on 10 Jan 2012
I want to be able to pull the return (0.569183903) and place it in a vector.


Accepted Answer

Andrew Newell on 11 Jan 2012
The trick is to use regular expressions. The first line below defines a pattern that matches any string with one or more digits between '1/' and '/2000'. The file is examined one line at a time, and the return is extracted whenever a line matches.
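match_str = '^1/[0-9]+/2000,'; % '^' anchors at the line start so 11/3/2000 is not matched
match_vector = zeros(32000,1); % Use whatever size you're sure is large enough
fid = fopen('A.csv'); % the data file from the question
count = 0;
tline = fgetl(fid);
while ischar(tline)
    if ~isempty(regexp(tline, match_str, 'once'))
        % Skip the first seven fields and read the eighth (the return)
        A = textscan(tline, '%*s %*f %*f %*f %*f %*d %*f %f', 'delimiter', ',');
        count = count + 1;
        match_vector(count) = A{1};
    end
    tline = fgetl(fid); % read the next line; without this the loop never ends
end
fclose(fid);
match_vector = match_vector(1:count); % drop the unused preallocated entries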
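As a quick check of the pattern idea, here is a minimal sketch that applies it to the sample row from the question without opening a file. The variable names month_num and year_num are hypothetical, and splitting on commas with strsplit is a swapped-in alternative to textscan, not part of the answer above.

```matlab
% Sample row copied from the question
tline = '1/3/2000,78.75,78.94,58.13,66.19,1642300,62.37,0.569183903';

% Hypothetical inputs: the month and year to look for
month_num = 1;
year_num  = 2000;

% Build the pattern; '^' and the trailing ',' pin it to the date field
match_str = sprintf('^%d/[0-9]+/%d,', month_num, year_num);

ret = NaN;
if ~isempty(regexp(tline, match_str, 'once'))
    fields = strsplit(tline, ',');      % split the row at the commas
    ret = str2double(fields{end});      % the return is the last field
end

% A November date must not match the January pattern
is_false_hit = ~isempty(regexp('11/3/2000,1,2', match_str, 'once'));
```

Building the pattern with sprintf makes it easy to loop over different month and year pairs later.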
  5 Comments
Laurentiu Galan on 11 Jan 2012
Attached an update below.


More Answers (2)

Walter Roberson on 11 Jan 2012
regexp(STRING, '^1/\d+/2006,.*,([^,\n]+)$', 'tokens', 'dotexceptnewline', 'lineanchors')
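A minimal sketch of this whole-text approach, assuming STRING holds the entire file contents (for a real file it could come from fileread). It uses a capture group with 'tokens' rather than a lookbehind, which is a choice made here for illustration. The two data rows are made up.

```matlab
% Made-up file contents: two rows, only the first is dated 1/*/2006
STRING = sprintf('%s\n%s', ...
    '1/3/2006,1,2,3,4,5,6,0.25', ...
    '2/1/2006,1,2,3,4,5,6,0.50');

% '^' and '$' match at line boundaries because of 'lineanchors';
% '.' stops at newlines because of 'dotexceptnewline'.
% The group captures the last comma-separated field of each matching line.
tok = regexp(STRING, '^1/\d+/2006,.*,([^,\n]+)$', 'tokens', ...
    'dotexceptnewline', 'lineanchors');

% tok is a cell array with one inner cell per match; convert to numbers
returns = cellfun(@(c) str2double(c{1}), tok);
```

This avoids the line-by-line loop, at the cost of holding the whole file in memory.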

Laurentiu Galan on 11 Jan 2012
Thanks Andrew!! This is great. Now comes the really tough part: I want to loop the whole process.
Basically, I want to open a file 'A' which contains a series of ticker symbols (A.csv, B.csv, and so on; about 8000 of them) along with month and date information.
Then I want to open each individual file in a directory; the files are named after the ticker symbols in file 'A'. Finally, I want to pull returns from each file using the month and date information that is also located in file 'A'.
I don't expect you to be able to help me with this task, as it is really extensive, but I was wondering if you wouldn't mind sharing some insight as to how I can improve the whole process.
Your answer was more than sufficient, as it has helped organize some of my thinking. Thanks a bunch!! Any additional insight is greatly appreciated.
I attached my existing code to try to better explain what I mean:
% Code to get matrix
fid = fopen('C:\Users\Laurentiu Galan\Desktop\pca1.csv');
C = textscan(fid, '%s %s %s %*s %*s %*s %*s', 'delimiter', ',', ...
    'HeaderLines', 1); % HeaderLines takes a number, not a string
fclose(fid);
% Strcat identifier
tickername = C{1};
year = C{2};
month = C{3};
% Get size of loop for filepath
numval = numel(tickername);
% Create loop for filepaths
for i = 1:numval
    filepath(i,1) = strcat('C:\Users\Laurentiu Galan\Desktop\', tickername(i,1), '.csv');
end
% Create matching value ('^' and ',' pin the pattern to the date field)
for i = 1:numval
    ssearch(i,1) = strcat('^', month(i,1), '/[0-9]+/', year(i,1), ',');
end
% Open file where search will take place (path name will be looped)
match_str = '^1/[0-9]+/2000,'; % (This will also be looped based on ssearch)
match_vector = zeros(32000,1);
fid = fopen('-------'); % <- Loop for each file goes here
count = 0;
tline = fgetl(fid);
while ischar(tline)
    if ~isempty(regexp(tline, match_str, 'once'))
        A = textscan(tline, '%*s %*f %*f %*f %*f %*d %*f %f', 'delimiter', ',');
        count = count + 1;
        match_vector(count) = A{1};
    end
    tline = fgetl(fid); % advance to the next line so the loop terminates
end
fclose(fid);
% Then I want to output a file with all the returns and ticker symbols on
% the desktop
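One way the per-file loop might be structured is sketched below. This is an illustration under stated assumptions, not a tested solution for the real data: the temporary folder, the second data row, and the names data_dir, month_str, year_str, and results are all hypothetical, and the ticker list is written inline instead of being read from the index file.

```matlab
% Hypothetical setup: one small ticker file in a temporary folder.
% The first data row is the sample from the question; the second is made up.
data_dir = tempname;
mkdir(data_dir);
fid = fopen(fullfile(data_dir, 'A.csv'), 'w');
fprintf(fid, 'header line (made up)\n');
fprintf(fid, '1/3/2000,78.75,78.94,58.13,66.19,1642300,62.37,0.569183903\n');
fprintf(fid, '2/1/2000,1,1,1,1,100,1,0.1\n');
fclose(fid);

% One entry per request; in practice these would come from the index file
tickername = {'A'};
month_str  = {'1'};
year_str   = {'2000'};

results = cell(numel(tickername), 1);   % returns found for each ticker
for i = 1:numel(tickername)
    % Pattern for this request, pinned to the date field
    pattern = ['^' month_str{i} '/[0-9]+/' year_str{i} ','];
    fid = fopen(fullfile(data_dir, [tickername{i} '.csv']));
    if fid == -1
        continue                        % skip tickers whose file is missing
    end
    vals = [];
    tline = fgetl(fid);
    while ischar(tline)
        if ~isempty(regexp(tline, pattern, 'once'))
            fields = strsplit(tline, ',');
            vals(end+1, 1) = str2double(fields{end}); %#ok<AGROW>
        end
        tline = fgetl(fid);             % advance to the next line
    end
    fclose(fid);
    results{i} = vals;
end
```

Keeping the results in a cell array allows each ticker to contribute a different number of matches; checking the fopen result guards against tickers that have no file.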
  1 Comment
Andrew Newell on 11 Jan 2012
I don't see anything obvious. For any code I suggest the following sequence: (1) test it thoroughly to make sure it works; (2) run it with the MATLAB Profiler and see where the code is spending most of its time; and (3) look for ways to speed up that part of the code.
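Step (2) might look like the following minimal sketch. The loop is only a stand-in workload (hypothetical); in practice it would be replaced by the code that reads the files.

```matlab
profile on                      % start collecting timing data

% Stand-in workload; replace with the real file-reading loop
total = 0;
for k = 1:1e4
    total = total + str2double('0.5');
end

profile off                     % stop collecting
stats = profile('info');        % struct; FunctionTable holds per-function timings
% profile viewer                % uncomment to open the interactive report
```

The report shows which functions account for most of the run time, which is where optimization effort is best spent.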
