## Deleting X-Y points that are not near other points on a field of data points

### charles atlas

Asked on 6 Jun 2012. Accepted answer: per isakson
I have a set of data of around 900 rows and two columns. Each row has an X and a Y value that specify a point on the X-Y plane, where X and Y both range from 0 to 100. All of these points are randomly scattered throughout the plane. My problem is that there are too many points cluttering up the scatter plot. So what I want is for MATLAB to look at each point and ask: is this point within a distance of 10 of another point? If it is, keep it. If it isn't, delete the row containing that X, Y value. A shortened example of my data:
```matlab
X = [1 2 3 4 20];
Y = [1 3 4 3 59];
```
Since (20, 59) is more than 10 away from every other point, it should be deleted, giving:
```matlab
X2 = [1 2 3 4];
Y2 = [1 3 4 3];
```
If anyone knows how I could do this, I would greatly appreciate it.


## 3 Answers

### per isakson

Answered on 6 Jun 2012 (accepted answer)

See Doug's video "Advanced: making a 2d or 3d histogram to visualize data density" and search the File Exchange (FEX) for "hist2".

I failed to find a ready-made solution in the FEX, so here is a naive implementation. The radius 10 is hard-coded as the magic number 100, because the code compares squared distances (10^2 = 100).
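```matlab
X = [1, 2, 3, 4, 20];
Y = [1, 3, 4, 3, 59];
to_be_removed = false(size(X));
for ii = 1 : length(X)
    % true for every point whose squared distance to point ii is <= 10^2
    is = (X-X(ii)).^2 + (Y-Y(ii)).^2 <= 100;
    is(ii) = false;                      % a point is not its own neighbour
    if not( any( is ) )
        to_be_removed(ii) = true;        % no other point within 10
    end
end
X(to_be_removed) = [];
Y(to_be_removed) = [];
```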
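For data of this size the same test can also be written without a loop. This is a minimal sketch, assuming MATLAB R2016b or later (or Octave) for implicit expansion; the radius variable `r` replaces the magic number and is not part of the original answer. It builds the full N-by-N matrix of squared distances, which for 900 points is 810,000 doubles (about 6.5 MB), so it is only meant for modest N.

```matlab
% Vectorized variant: all pairwise squared distances at once.
X = [1, 2, 3, 4, 20];
Y = [1, 3, 4, 3, 59];
r = 10;                                  % neighbour radius, same units as X and Y

x = X(:);  y = Y(:);                     % work with column vectors
D2 = (x - x.').^2 + (y - y.').^2;        % N-by-N squared distances (implicit expansion)
D2(1:numel(x)+1:end) = Inf;              % ignore each point's distance to itself
has_neighbour = any(D2 <= r^2, 2);       % true if some other point is within r

X2 = X(has_neighbour.');                 % keep only points that have a neighbour
Y2 = Y(has_neighbour.');
```

With the example data this gives the same result as the loop: `X2 = [1 2 3 4]`, `Y2 = [1 3 4 3]`.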

### charles atlas

7 Jun 2012

I had to tweak the code a little to fit my actual data, but it worked well. Thank you!


### Geoff

Answered on 6 Jun 2012

The naive (brute-force) implementation given by per isakson looks sufficient for this problem: its O(N^2) cost is fine for 900 rows (about 810,000 distance evaluations). For larger sets, I'd consider partitioning the points into a quadtree.

Without complicating things, though, I would expect the number of candidates for removal to be small, given your X and Y range. You could easily speed up the naive algorithm by first approximating the local point density in a 21x21 array (cells of size 5, plus one extra for the ends) and then searching only the points that are alone in their cell. Any two points in the same 5-by-5 cell are at most 5√2 ≈ 7.1 apart, so they are within 10 of each other and can be kept without a search.
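Geoff's cell-based speed-up could look roughly like this. It is a sketch under my own assumptions: cells are `cell_size` = 5 wide, cell indices are counted from the smallest coordinate (so negative values would also work), and only points that are alone in their cell get the brute-force search. The shortcut is only valid while `cell_size*sqrt(2) <= r`; with larger cells, sharing a cell no longer guarantees a neighbour within `r`.

```matlab
% Grid prefilter: points sharing a cell are kept without a search.
X = [1, 2, 3, 4, 20];
Y = [1, 3, 4, 3, 59];
r = 10;
cell_size = 5;                           % requires cell_size*sqrt(2) <= r

x = X(:);  y = Y(:);
ix = floor((x - min(x)) / cell_size) + 1;   % cell indices start at 1
iy = floor((y - min(y)) / cell_size) + 1;   % (at most 21 per axis for a 0..100 range)
counts = accumarray([ix, iy], 1);           % points per cell: the density array
lonely = counts(sub2ind(size(counts), ix, iy)) == 1;

keep = ~lonely;                          % a shared cell implies a neighbour within r
for ii = find(lonely).'                  % brute-force check only for lonely points
    d2 = (x - x(ii)).^2 + (y - y(ii)).^2;
    d2(ii) = Inf;                        % skip the point itself
    keep(ii) = any(d2 <= r^2);
end

X2 = X(keep.');
Y2 = Y(keep.');
```

The saving depends on how many points are lonely; in the worst case (every point alone in its cell) this is still O(N^2).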

### charles atlas

7 Jun 2012

Sorry, I haven't been able to get into the office and test the code until today.
The real data is latitudes and longitudes, but for simplicity's sake I said it was 0 to 100 on the X and Y axes (which are actually the longitude and latitude axes, respectively).
The code did what it was supposed to do when I tested it, but it neglected about half of the values that were clustered together (that is, points within about 600 yards of each other: the square root of the sum of the squared differences in latitude and longitude is <= 0.005).
### charles atlas

7 Jun 2012

That is, I used per isakson's code from above.
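When adapting the loop to latitude/longitude, two details may be worth checking (the thread does not say whether either one caused the missing points). First, the loop compares *squared* distances, so a 0.005-degree radius has to be written as 0.005^2 = 2.5e-5, just as 10 became 100; comparing the squared distance with 0.005 would instead mean a radius of sqrt(0.005) ≈ 0.07 degrees. Second, a degree of longitude is shorter than a degree of latitude by a factor of cos(latitude). The sketch below assumes decimal degrees and a small area (a flat-earth approximation around the mean latitude), and the coordinates are made up for illustration. For reference, 0.005 degrees of latitude is roughly 550 m, close to the 600 yards mentioned.

```matlab
% Hypothetical sample points in decimal degrees (not the poster's data).
lat = [40.0000, 40.0010, 40.0020, 40.1000];
lon = [-75.0000, -75.0010, -75.0020, -75.2000];
r_deg = 0.005;                            % radius in degrees of latitude

% Assumption: small area, so scale longitude by cos(mean latitude) to express
% both axes in "degrees of latitude" (equirectangular approximation).
lon_scaled = lon .* cosd(mean(lat));

to_be_removed = false(size(lat));
for ii = 1 : numel(lat)
    % squared distance compared with the SQUARED radius (0.005^2, not 0.005)
    is = (lon_scaled - lon_scaled(ii)).^2 + (lat - lat(ii)).^2 <= r_deg^2;
    is(ii) = false;                       % a point is not its own neighbour
    to_be_removed(ii) = ~any(is);
end
lat(to_be_removed) = [];
lon(to_be_removed) = [];
```

With these made-up points only the last one, about 0.1 degrees away from the others, is removed.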


### Image Analyst

Answered on 7 Jun 2012

If you displayed the data as an image (a 2-D histogram of point counts) instead of a scatter plot, you wouldn't have the clutter problem. Why not give it a try?
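Displaying the data as an image can be done in base MATLAB with `accumarray`: count the points that fall in each bin and show the count matrix with `imagesc`. This is a minimal sketch; the random data and the bin width of 5 are placeholders, not values from the thread.

```matlab
% Show point density as an image instead of a scatter plot.
X = 100 * rand(900, 1);                  % stand-in for the real data (0..100)
Y = 100 * rand(900, 1);
bin = 5;                                 % bin width, an arbitrary choice
nbins = 100 / bin;                       % 20 bins per axis

ix = min(floor(X / bin) + 1, nbins);     % column index; x = 100 goes in the last bin
iy = min(floor(Y / bin) + 1, nbins);     % row index
counts = accumarray([iy, ix], 1, [nbins, nbins]);  % rows follow Y, columns follow X

% The two-element vectors give the centres of the first and last bins.
imagesc([bin/2, 100 - bin/2], [bin/2, 100 - bin/2], counts);
axis xy;                                 % put Y = 0 at the bottom
axis image;                              % square bins
colorbar;
xlabel('X'); ylabel('Y');
```

Rows are indexed by Y and columns by X so that the image has the same orientation as the scatter plot.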

### per isakson

7 Jun 2012

That's what Doug shows in his video.
