Fast Gene Sequence Search for Very Large Data File

Version 1.6.0.0 (3.85 KB) by Binlin Wu
Search for a specific sequence and record a neighboring code sequence with an offset position.
Downloads: 262
Updated: 2011/7/1


A research fellow at Harvard asked me to write a program that searches a data file for a gene sequence, such as ‘TCC’, and records the next 4 codes that follow each occurrence. The data file was 14 GB. He had tried some MATLAB code, but the system either froze or kept running without ever finishing.
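To make the task concrete, here is a minimal Python sketch of the search as a direct character-by-character loop, similar in spirit to a loop-based first version. It is an illustration, not the author's MATLAB code. The function name `find_following` is hypothetical, and so is the `offset` parameter: it is one possible reading of "offset position", namely how many codes to skip after the pattern before recording. Matches too close to the end to have a full set of following codes are skipped, which is also an assumption.

```python
def find_following(seq: str, pattern: str = "TCC", n_next: int = 4,
                   offset: int = 0) -> list[str]:
    """Hypothetical helper: return the n_next codes found `offset` positions
    after each (possibly overlapping) occurrence of pattern in seq."""
    results = []
    m = len(pattern)
    start_shift = m + offset                 # recorded codes begin here, relative to a match
    last = len(seq) - start_shift - n_next   # last match start with a complete window
    # Plain loop over every start position: clear, but slow on gigabyte inputs.
    for i in range(last + 1):
        if seq[i:i + m] == pattern:
            results.append(seq[i + start_shift:i + start_shift + n_next])
    return results


if __name__ == "__main__":
    print(find_following("ATCCGATCTCCAAGT"))  # ['GATC', 'AAGT']
```

With the default `offset=0` this records exactly "the next 4 codes"; a Python-level loop like this touches every position one at a time, which is why it scales so poorly to a 14 GB file.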

I first tested a loop-based method (V1.0). At that rate, processing the 14 GB file would have taken about a month on my 1.8 GHz Core 2 Duo PC with 3 GB of RAM. I then rewrote it to use matrix operations instead of the loop, which brought the run time down to 1.3 hours on the same 1.8 GHz PC, or 40 minutes on my 2.33 GHz Core 2 Duo PC with 2 GB of RAM. It was faster than any of the code he had obtained in Python or other languages.
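The speedup described in the paragraph above came from replacing a loop with MATLAB matrix operations. The following is not that code; it is a Python sketch of two ideas that matter at this file size: reading the file in fixed-size chunks so memory use stays bounded, and handing the scan to a compiled search (a `re` lookahead pattern) instead of a per-character loop. Assumptions: the file holds raw ASCII codes, line breaks are only layout and can be dropped, and the 16 MiB default chunk size is arbitrary. The name `stream_following` is hypothetical.

```python
import io
import re
from typing import BinaryIO, Iterator


def stream_following(f: BinaryIO, pattern: bytes = b"TCC", n_next: int = 4,
                     chunk_size: int = 1 << 24) -> Iterator[bytes]:
    """Hypothetical helper: yield the n_next bytes after each (possibly
    overlapping) match of pattern, reading f one chunk at a time."""
    # A zero-width lookahead lets overlapping matches be found;
    # group 1 captures the codes that follow the pattern.
    regex = re.compile(b"(?=" + re.escape(pattern)
                       + b"(.{" + str(n_next).encode() + b"}))", re.DOTALL)
    keep = len(pattern) + n_next - 1  # one byte less than a full match window
    carry = b""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        # Assumption: CR/LF are layout only, so delete them before searching.
        buf = carry + chunk.translate(None, b"\r\n")
        for m in regex.finditer(buf):
            yield m.group(1)
        # A window starting in the last `keep` bytes cannot be complete yet,
        # so those bytes are carried into the next chunk and searched again.
        carry = buf[max(0, len(buf) - keep):]


if __name__ == "__main__":
    data = io.BytesIO(b"ATCCGAT\nCTCCAAGT\n")
    # A tiny chunk size forces matches to straddle chunk boundaries.
    print(list(stream_following(data, chunk_size=5)))  # [b'GATC', b'AAGT']
```

Because every reported match ends inside the current buffer and the carried tail is too short to hold a complete window, a match split across two chunks is found exactly once, in the chunk where it completes.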

I have posted the file here in the hope that it will be useful to others in the same situation.

Cite As

Binlin Wu (2024). Fast Gene Sequence Search for Very Large Data File (https://www.mathworks.com/matlabcentral/fileexchange/31966-fast-gene-sequence-search-for-very-large-data-file), MATLAB Central File Exchange. Retrieved .

MATLAB Release Compatibility
Created with R2010b
Compatible with any release
Platform Compatibility
Windows macOS Linux
Categories
Find more on Large Files and Big Data in Help Center and MATLAB Answers

Version history
1.6.0.0

Function renamed and all names made consistent.

1.2.0.0

Added a parameter, nHL, which specifies how many header lines to remove from the data. It is 0 by default.

1.0.0.0
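For readers porting the idea elsewhere, the header-removal step that nHL controls (added in 1.2.0.0) can be sketched in Python as below. This assumes a "header line" is any newline-terminated line at the start of the file, such as a FASTA `>` line. The helper name `skip_header_lines` is hypothetical; only its default of 0 mirrors nHL, and nothing else about the MATLAB parameter is implied. After the call, the file object is positioned at the first data line, so any search can start from there.

```python
import io
from typing import BinaryIO


def skip_header_lines(f: BinaryIO, n_header_lines: int = 0) -> int:
    """Hypothetical helper: advance f past its first n_header_lines lines.
    Returns how many lines were actually skipped (fewer if EOF is reached)."""
    skipped = 0
    for _ in range(n_header_lines):
        if not f.readline():  # b"" means end of file
            break
        skipped += 1
    return skipped


if __name__ == "__main__":
    f = io.BytesIO(b">header line 1\n>header line 2\nATCCGATCTCCAAGT\n")
    skip_header_lines(f, 2)
    print(f.read())  # b'ATCCGATCTCCAAGT\n'
```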