Data manipulation of a financial time series

Hi all I have a 1x1 Struct called "ans". Within this I have a textdata matrix of size 5 million x 4.
Now in this textdata I have 4 columns, 1st column is dates, 2nd column is time, 3rd column is prices and 4th column is symbol.
OBJECTIVE
I need to rearrange the 2nd column times so it gives me the prices in 30 second intervals. Currently it is not regular.
Example of current format of column 2 is:'18:42:20.000' '18:42:22.000' '18:42:30.000' '18:42:40.000' '18:42:40.000' '18:42:40.000' '18:42:40.000' '18:42:40.000' '18:42:43.000' '18:42:49.000' '18:43:00.000'
I need to do it so that it gives me the times only every 30 seconds and the corresponding price (column 3) at that specific time. The goal is to have regular 30 second intervals alongside the price at that current time. So we only need to work with columns 2 and 3.
Please message me if I never explained properly, and thanks so much for the help in advance.

댓글 수: 4

José-Luis
José-Luis 2012년 10월 5일
So you have 5e6 data points. If they are taken every second, that gives around 57 days of data. How do you distinguish between days? How do you want to interpolate when you have several data with the same time stamp?
'18:42:40.000' '18:42:40.000' '18:42:40.000'
Mate 2u
Mate 2u 2012년 10월 5일
I do not need to distinguish between days. In your example the price at 18:43:00 would be the last price at 18:42:40 as it stays the same from the last 42.40 to 43......so we have to think of it as a financial application
José-Luis
José-Luis 2012년 10월 5일
Well, you would still need to distinguish between days then. Say the price in Monday at 18:40:02 is 412 bananas and on Tuesday at 18:40:02 it is 24 bananas. That would have an impact in your calculations. Similary, you have several data with the same time stamp:
  • 18:42:00 12 bananas
  • 18:42:00 13 bananas
  • 18:42:00 0 bananas
Which is the valid one? Are the prices guaranteed to be the same? Or how do you calculate it? The average? The minimum? The maximum? The last one? Or maybe I am missing something?
Mate 2u
Mate 2u 2012년 10월 5일
Hi, you are right 1st column is dates in the format of 01/03/2007...
The prices are not guaranteed to be the same.
Let me explain properly: I need the prices in 30 second even intervals. Currently it is uneven. If there are multiple prices for a certain time that corresponds to many trades occuring in a second and the last price is what the asset would remain until the next trade? Does this help?

댓글을 달려면 로그인하십시오.

답변 (1개)

José-Luis
José-Luis 2012년 10월 5일
편집: José-Luis 2012년 10월 5일

0 개 추천

Here is what you can do, assuming your_data is a cell matrix containing strings.
your_data{1} = '01/08/2007';
your_data{1,2} = '18:04:01';
your_data{2,1} = '04/08/2007';
your_data{2,2} = '18:05:01';
your_data{1,3} = '12';
your_data{2,3} = '13';
%Getting Matlab's date number
your_dates = cellfun(@(a,b) datenum([a ' ' b],'dd/mm/yy HH:MM:SS'),...
your_data(:,1),your_data(:,2));
%Creating time stamp with 1/2 hour interval
interpDates = (floor(your_dates(1)):1/2880:ceil(your_dates(end)));
interpDates(interpDates < your_dates(1)) = [];
interpDates(interpDates > your_dates(end)) = [];
%Keeping only the last time stamp, from the data
[dummy idx dummy] = unique(your_dates,'last');
your_dates = your_dates(idx);
%Getting data values
your_bananas = cellfun(@(a) str2double(a), your_data(:,3));
your_bananas = your_bananas(idx);
%Function to find the nearest value
find_banana = @(x) find(your_dates>=x,1,'first'); %There is probably a more efficient way
%Interpolating
your_interp_bananas = arrayfun(@(x) your_bananas(find_banana(x)),interpDates);

댓글 수: 4

Mate 2u
Mate 2u 2012년 10월 5일
편집: Mate 2u 2012년 10월 5일
Hi there, thanks,
It does not seem to work. Could you please see my data sample and desired output below. It is only a sample.
Mate 2u
Mate 2u 2012년 10월 5일
편집: Mate 2u 2012년 10월 5일
Data sample:
'01/03/2007' '15:30:06.000' '55.90' 'CTH07'
'01/03/2007' '15:30:30.000' '55.75' 'CTH07'
'01/03/2007' '15:30:42.000' '55.80' 'CTH07'
'01/03/2007' '15:30:53.000' '55.85' 'CTH07'
'01/03/2007' '15:30:57.000' '55.75' 'CTH07'
'01/03/2007' '15:31:17.000' '55.70' 'CTH07'
'01/03/2007' '15:31:23.000' '55.65' 'CTH07'
'01/03/2007' '15:31:36.000' '55.55' 'CTH07'
'01/03/2007' '15:31:38.000' '55.60' 'CTH07'
'01/03/2007' '15:31:43.000' '55.55' 'CTH07'
'01/03/2007' '15:31:44.000' '55.60' 'CTH07'
'01/03/2007' '15:31:50.000' '55.70' 'CTH07'
'01/03/2007' '15:32:07.000' '55.55' 'CTH07'
'01/03/2007' '15:32:07.000' '55.90' 'CTH07'
'01/03/2007' '15:40:41.000' '55.30' 'CTH07'
'01/03/2007' '15:40:43.000' '55.40' 'CTH07'
'01/03/2007' '15:40:52.000' '55.30' 'CTH07'
'01/03/2007' '15:40:54.000' '55.50' 'CTH07'
'01/03/2007' '15:41:33.000' '55.15' 'CTH07'
OUTPUT REQUIRED
'01/03/2007' '15:30:30.000' '55.75' 'CTH07'
'01/03/2007' '15:31:00.000' '55.75' 'CTH07'
'01/03/2007' '15:31:30.000' '55.65' 'CTH07'
'01/03/2007' '15:32:00.000' '55.70' 'CTH07'
'01/03/2007' '15:32:30.000' '55.90' 'CTH07'
'01/03/2007' '15:33:00.000' '55.90' 'CTH07'
'01/03/2007' '15:33:30.000' '55.90' 'CTH07'
etc.......
José-Luis
José-Luis 2012년 10월 5일
편집: José-Luis 2012년 10월 5일
Modified original code. I thought it was 30 min interval. To do a 30 second interval, use:
interpDates = (floor(your_dates(1)):1/2880:ceil(your_dates(end)));
Also I realized an error in the find_bananas function, it should have been written the other way around:
find_banana = @(x) find(your_dates>=x,1,'first');
Mate 2u
Mate 2u 2012년 10월 5일
OK, I am running it on my 5,000,000 x 4 cell array. It is Busy, I will let you know when it has run and if it works.

댓글을 달려면 로그인하십시오.

카테고리

도움말 센터File Exchange에서 Financial Toolbox에 대해 자세히 알아보기

질문:

2012년 10월 5일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by