Minimize Difference in Partition Sums from Experimental Data

Question

0 개 추천

I have a 2,000 row dataset representing measured data. I would like to create 60 partitions of 30 measurements (discarding the 200 outliers or "worst contributors") with the partitions created to minimize the differences in the sum of the measurements in each partition. Or, perhaps more simply, I want each partition to have as close as possible to the same sum of the measurements in partition.

My first attempt was based on a random sampling approach, which was inefficient as expected. I am considering a histogram-based approach for my next attempt, but wanted to sample the community for ideas or best practices first.

Thanks!

댓글 수: 2
없음 표시 없음 숨기기

Bjorn Gustavsson 2020년 11월 26일

This sounds like a variant of the knapsack-problem - that similarity makes me think that it is a hard problem, but that also means that there should be algorithms for this available...

Bruno Luong 2020년 11월 27일

편집: Bruno Luong 2020년 11월 27일

Do you really want to find the best partitions (which is very hard to solve) or you just want to have partitions having the sums that are "close enough"?

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Follow Question

Answer 1

Bjorn Gustavsson 2020년 11월 26일

0 개 추천

Yup, you will find the algorithm and information here: partition problem (wikipedia).

HTH

댓글 수: 2
없음 표시 없음 숨기기

Jeffrey Corbets 2020년 11월 27일

I looked at those algorithms and they seem limited to the sets of positive integers without easy ways to generalize to real numbers. In my situation the measurements are real numbers and the differences between the measurements are to the right of the decimal place.

The knapsack problem is a good generalization of my situation. I will read up and see if any of the algorithms available will meet my needs.

Thanks!

Bjorn Gustavsson 2020년 11월 27일

That's a bit of a bummer - and also slightly confusing to me, I cannot see why they should be that restricted, perhaps that is somewhat of a artificial limitation (since you will be using finite precision numbers they are not really real numbers anyway?). Perhaps some of the algorithms can be adapted anyway....

댓글을 달려면 로그인하십시오.

Minimize Difference in Partition Sums from Experimental Data

댓글 수: 2
없음 표시 없음 숨기기

답변 (1개)

댓글 수: 2
없음 표시 없음 숨기기

카테고리

제품

릴리스

태그

Community Treasure Hunt

Minimize Difference in Partition Sums from Experimental Data

댓글 수: 2 없음 표시 없음 숨기기

답변 (1개)

댓글 수: 2 없음 표시 없음 숨기기

카테고리

제품

릴리스

태그

참고 항목

Community Treasure Hunt

댓글 수: 2
없음 표시 없음 숨기기

댓글 수: 2
없음 표시 없음 숨기기