Read csv strings, keep or create surrounding whitespace

조회 수: 6 (최근 30일)
Ben
Ben 2014년 6월 20일
편집: Cedric 2014년 6월 23일
I have a list of stop words that currently exists as a comma-separated list in a .txt file. The goal is to use that list to remove those words from some target text, but only when a given word (e.g. "and") appears by itself - remove "and", but don't make "sand" into "s". To that end, I tried manually putting spaces around all the words in the list, so "a,able,about" became " a , able , about ". However, the txtscan function stripped the spaces out. Is there a way to prevent it from doing that? Alternatively, if I use the original form of the list, can I tell txtscan to surround each string with spaces?
  댓글 수: 1
Cedric
Cedric 2014년 6월 20일
편집: Cedric 2014년 6월 20일
Could you give an example, like a sample file, and indicate precisely what you want to achieve? This seems to be a task for REGEXPREP.

댓글을 달려면 로그인하십시오.

채택된 답변

Cedric
Cedric 2014년 6월 20일
편집: Cedric 2014년 6월 20일
Here is an example that I can refine if you provide more information. It writes some keywords in upper case..
key = {'lobster', 'and'} ;
str = 'Lobster anatomy includes the cephalothorax which fuses the head and the thorax, both of which are covered by a chitinous carapace, and the abdomen. The lobster''s head bears antennae, antennules, mandibles, the first and second maxillae, and the first, second, and third maxillipeds. Because lobsters live in a murky environment at the bottom of the ocean, they mostly use their antennae as sensors.' ;
for kId = 1 : length( key )
pat = sprintf( '(?<=\\W?)%s(?=(s |\\W))', key{kId} ) ;
str = regexprep( str, pat, upper( key{kId} ), 'ignorecase' ) ;
end
Running this, you get
>> str
str =
LOBSTER anatomy includes the cephalothorax which fuses the head AND the thorax, both of which are covered by a chitinous carapace, AND the abdomen. The LOBSTER's head bears antennae, antennules, mandibles, the first AND second maxillae, AND the first, second, AND third maxillipeds. Because LOBSTERs live in a murky environment at the bottom of the ocean, they mostly use their antennae as sensors.
The REXEXP-based approach makes it possible to code for..
  • only if framed by non alphanumeric characters (e.g. ,),
  • unless following character is an 's',
  • unless at the beginning of the string.
  댓글 수: 21
Ben
Ben 2014년 6월 23일
Ah, I hadn't realized that regexp functions don't do their work all at once, as stringrep does. That should do it. Thank you so much!
Cedric
Cedric 2014년 6월 23일
편집: Cedric 2014년 6월 23일
You're welcome! Note that it could do its job all at once if you were passing a pattern which contains all keywords in an OR operation. Yet, it's often more efficient to apply several times a simple pattern than passing once an extra-long/complex one. That could/should be profiled for your specific case though if you wanted to optimize.

댓글을 달려면 로그인하십시오.

추가 답변 (0개)

카테고리

Help CenterFile Exchange에서 Language Support에 대해 자세히 알아보기

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by