How to access data of regexp output (without temporary variable)?

Question

Simon on 26 Apr 2023

https://kr.mathworks.com/matlabcentral/answers/1953139-how-to-access-data-of-regexp-output-without-temporary-variable

Commented: Stephen23 on 13 May 2023

I know how to convert the output of regexp( ) to a table by going through a temporary variable, as in this solution:

strT = ["apple 001"; "banana 102"; "orange 344 001"];
C = regexp(strT, '\s', 'split', 'once'); % temporary variable
array2table(vertcat(C{:,:})) % desired output, except I hope there's a way to avoid braces.
ans = 3×2 table
      Var1        Var2   
    ________    _________

    "apple"     "001"    
    "banana"    "102"    
    "orange"    "344 001"

I wonder if there is a way to cascade/pipeline function calls in a single line. It seems that similar questions have been asked a zillion times (this post says so). But I want to learn more about MATLAB, so forgive me for asking again.

% I tried this, but it doesn't give the desired result.
cell2table(regexp(strT, '\s', 'split', 'once'))
ans = 3×1 table
            Var1         
    _____________________

    "apple"     "001"    
    "banana"    "102"    
    "orange"    "344 001"

Answer 1

Stephen23 on 26 Apr 2023

https://kr.mathworks.com/matlabcentral/answers/1953139-how-to-access-data-of-regexp-output-without-temporary-variable#answer_1223414

Edited: Stephen23 on 26 Apr 2023

The general answer to the question has not changed much since 2012.

Using a temporary variable is not "less efficient" as many beginners might imagine (most likely MATLAB will generate the intermediate cell array internally anyway).

https://www.mathworks.com/matlabcentral/answers/344486-generate-comma-separated-list-in-single-line-of-code

https://www.mathworks.com/matlabcentral/answers/1656435-tutorial-comma-separated-lists-and-how-to-use-them

str = ["apple 001"; "banana 102"; "orange 344 001"];
tbl = splitvars(cell2table(regexp(str, '\s', 'split', 'once')))
tbl = 3×2 table
     Var1_1      Var1_2  
    ________    _________

    "apple"     "001"    
    "banana"    "102"    
    "orange"    "344 001"

Note that this creates and discards an intermediate table, so it uses more memory.

Personally I would recommend using the intermediate variable: while code golf is very entertaining, it does not serve the purpose of clarity and efficiency. In one year you will come back to this code and wonder what it does, how it works, and what side-effects it has. Code is read more times than it is written, so clarity when reading is valuable.

On top of that, hiding intermediate arrays within nested function calls makes debugging harder.

Simon on 26 Apr 2023

There are several pieces of advice on good coding habits in your answer. Thanks again. (I am gradually realizing that a couple of fantasies I picked up from Python coding turn out to be not quite OK, one-liner functions being one.)

Answer 2

albara on 26 Apr 2023

https://kr.mathworks.com/matlabcentral/answers/1953139-how-to-access-data-of-regexp-output-without-temporary-variable#answer_1223409

Edited: per isakson on 26 Apr 2023

You can access the data of the regexp output directly without using a temporary variable by using the syntax [~,~,~,match] = regexp(str, expr), where str is the input string and expr is the regular expression you want to match.

The regexp function returns multiple outputs: by default, the start indices, the end indices, the token extents, and then the matched text. By using the ~ symbol as a placeholder for the outputs you don't need, you can ignore them and keep only the matched text.

Here is an example:

% Define input string and regular expression
str = 'The quick brown fox jumps over the lazy dog';
expr = 'brown';
% Use regexp to find the matching substring and access it directly
[~, ~, ~, match] = regexp(str, expr);
% Display the matching substring
disp(match)

In this example, regexp finds every occurrence of the word "brown" in the input str (here there is only one). The ~ symbol is used as a placeholder for the outputs we don't need, and the match variable stores the matched text directly.

This code displays a 1×1 cell array containing 'brown', because without the 'once' option regexp returns all matches in a cell array. Note that you can keep any of the other outputs by naming them instead of using ~, such as [startIdx, endIdx, tokExt, match] = regexp(str, expr) to store the start indices, end indices, and token extents in separate variables (end is a reserved keyword in MATLAB, so it cannot be used as a variable name).
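
For comparison, a minimal sketch of asking for the matched text by name with the documented 'match' and 'once' options, rather than skipping positional outputs with ~. The variable names allMatches and firstMatch are illustrative and not part of the answer above.

```matlab
str = 'The quick brown fox jumps over the lazy dog';
expr = 'brown';

% 'match' returns every match, collected in a cell array of character vectors
allMatches = regexp(str, expr, 'match');

% adding 'once' returns only the first match, as a plain character vector
firstMatch = regexp(str, expr, 'match', 'once');
```

Note that this selects which output regexp returns; it does not by itself reshape the per-row results of a 'split' call into table columns.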

albara on 26 Apr 2023

Here's another example:

strT = ["apple 001"; "banana 102"; "orange 344 001"];
array2table(cell2mat(cellfun(@(x) str2double(x), regexp(strT, '\s', 'split', 'once'), 'UniformOutput', false)))

This code will split the strings using regexp and then convert the resulting cell array of strings to a cell array of doubles using str2double. Then, cell2mat is used to convert the cell array to a matrix, and finally array2table is used to convert the matrix to a table. The output will be:

ans =
     Var1     Var2
    ______    ____

    apple        1
    banana     102
    orange     344

Note that the second column contains only the first number in the original strings. If you want to keep the entire second column, you can modify the regular expression to capture the entire second field, like this:

array2table(regexp(strT, '(\S+)\s+(\S+)', 'tokens', 'once'))

This will give you the desired output:

ans =
      Var1        Var2
    ________    _________

    "apple"     "001"
    "banana"    "102"
    "orange"    "344 001"

Simon on 13 May 2023

Your code didn't return the same result in my MATLAB version (R2023a). What it returns for me is shown below. I just checked the array2table( ) documentation page, and its behavior has not changed. Why we have different results from the same codes?

strT = ["apple 001"; "banana 102"; "orange 344 001"];
array2table(cell2mat(cellfun(@(x) str2double(x), regexp(strT, '\s', 'split', 'once'), 'UniformOutput', false)))
ans = 3×2 table
    Var1    Var2
    ____    ____

    NaN       1 
    NaN     102 
    NaN     NaN 
array2table(regexp(strT, '(\S+)\s+(\S+)', 'tokens', 'once'))
ans = 3×1 table
            Var1         
    _____________________

    {["apple"    "001" ]}
    {["banana"    "102"]}
    {["orange"    "344"]}

Stephen23 on 13 May 2023

"Why we have different results from the same codes?"

Within a regular expression the meta-character \S only matches non-whitespace characters:

https://www.mathworks.com/help/matlab/matlab_prog/regular-expressions.html#f0-42723

"orange"    "344 001"
%               ^ so there is no way it could return this
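
To make the point concrete, here is a sketch of a pattern that does keep the rest of each line: replacing the second \S+ with .* lets the second token contain whitespace. The anchors ^ and $ and the variable names C and T are additions for illustration, and the sketch still uses an intermediate variable.

```matlab
strT = ["apple 001"; "banana 102"; "orange 344 001"];

% (\S+) captures the first word; (.*) captures everything after the
% separating whitespace, including any further spaces
C = regexp(strT, '^(\S+)\s+(.*)$', 'tokens', 'once');

% each cell holds a 1x2 string array, so stacking them gives a 3x2 string array
T = array2table(vertcat(C{:}));
```

With this pattern the last row of the second column is "344 001" rather than "344".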

Answer 3

Rik on 26 Apr 2023

https://kr.mathworks.com/matlabcentral/answers/1953139-how-to-access-data-of-regexp-output-without-temporary-variable#answer_1223419

If you insist on a single line, you can even use subsref to index the result of regexp. That will make your code hard to read and hard to modify.
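
As a rough sketch of what the subsref approach looks like (the input string and variable name here are made up for illustration), indexing the first cell of a regexp result without a temporary variable could be written as:

```matlab
str = 'abc 123 def 456';

% substruct('{}', {1}) describes the indexing operation "{1}";
% subsref applies it to the cell array returned by regexp
firstNumber = subsref(regexp(str, '\d+', 'match'), substruct('{}', {1}));
```

This is equivalent to assigning the regexp result to a variable C and then reading C{1}, which is considerably easier to follow.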

Why not write a wrapper function instead, called for example regexp2table, like this:

strT = ["apple 001"; "banana 102"; "orange 344 001"];
regexp2table(strT, '\s', 'split', 'once') % Look, ma, one line
ans = 3×2 table
      Var1        Var2   
    ________    _________

    "apple"     "001"    
    "banana"    "102"    
    "orange"    "344 001"
function tab=regexp2table(varargin)
% All inputs are piped to regexp().
C = regexp(varargin{:});
tab = array2table(vertcat(C{:}));
end

Simon on 13 May 2023

Thanks. Your code and the idea behind it might come in handy for solving more difficult regexp problems.

Answer 4

Dyuman Joshi on 26 Apr 2023

https://kr.mathworks.com/matlabcentral/answers/1953139-how-to-access-data-of-regexp-output-without-temporary-variable#answer_1223444

Experimenting led to this:

str = ["apple 001"; "banana 102 3579"; "orange 344 001"]
str = 3×1 string array
    "apple 001"
    "banana 102 3579"
    "orange 344 001"
C = array2table([extractBefore(str,' ') extractAfter(str,' ')])
C = 3×2 table
      Var1         Var2   
    ________    __________

    "apple"     "001"     
    "banana"    "102 3579"
    "orange"    "344 001" 

Simon on 13 May 2023

I was not aware of extractBefore( ) or extractAfter( ). Very nice solution. Thanks!
