Deleting X-Y points that are not near other points on a field of data points

조회 수: 11 (최근 30일)
I have a set of data. This data is around 900 rows of two columns. Each row has an X and a Y value which specifies a point on the X-Y plane. The X-Y plane is from 0 to 100 and 0 to 100 respectively. All of these points are randomly scattered throughout the X-Y plane. My problem is there are too many X-Y points cluttering up the scatter plot. So what I want to do is have Matlab look at each point and say: Is this point a distance of 10 or less to another point. If it is then keep it. If it isn’t then delete the row containing that X, Y value. A shortened example of my data:
X=[1 2 3 4 20];
Y=[1 3 4 3 59];
Since (20,59) is more than a distance of 10 away from the other points, delete it and return the following:
X2=[1 2 3 4];
Y2=[1 3 4 3];
If anyone knows how I could do this, It would be a very great thing.

채택된 답변

per isakson
per isakson 2012년 6월 6일
See Doug's video Advanced: making a 2d or 3d histogram to visualize data density and search the FEX for "hist2"
I failed to find a solution in the FEX. Here is a naive code with "10" hard-coded in the magic number "100".
X=[1,2,3,4,20];
Y=[1,3,4,3,59];
to_be_removed = false(size(X));
for ii = 1 : length(X)
is = (X-X(ii)).^2+(Y-Y(ii)).^2 <= 100;
is(ii) = false;
if not( any( is ) )
to_be_removed(ii) = true;
end
end
X(to_be_removed)=[];
Y(to_be_removed)=[];
  댓글 수: 1
charles atlas
charles atlas 2012년 6월 7일
I had to tweak the code a little bit to fit my actual data,.. but it worked well, Thank you

댓글을 달려면 로그인하십시오.

추가 답변 (2개)

Geoff
Geoff 2012년 6월 6일
Naive (brute force) implementation given by per isakson looks sufficient for this problem. O(N^2) is okay for 900 rows. For larger sets, I'd consider partitioning the points into a quad tree.
However, without making things complicated, I would say that the number of candidates for removal will be small due to your X and Y range. You could easily speed up the naive algorithm by first approximating the local point-density into a 21x21 array (cel-sizes of 5 with extra one for the ends) and then only do a search on points that are unique to a cel address.
  댓글 수: 2
charles atlas
charles atlas 2012년 6월 7일
Sorry I havent been able to get into the office and test the code until today.
the read data is latitude and longitudes but for simplicity's sake, I said it was 0 to 100 on the X and Y axis (which would actually be the longitude and latitude axes respectively.
The code did what it was supposed to do when I tested it, but It neglected half the values that were jumbled together (that is at a distance of about 600 yards away, aka <= .005 as a difference in lat and long squared, added and then square rooted.
charles atlas
charles atlas 2012년 6월 7일
That is; I used isakson's code from above.

댓글을 달려면 로그인하십시오.


Image Analyst
Image Analyst 2012년 6월 7일
If you displayed it as an image instead of a scatterplot, you wouldn't have that problem. Why not give it a try?

카테고리

Help CenterFile Exchange에서 Data Distribution Plots에 대해 자세히 알아보기

태그

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by