Looking for a regular expression to parse sparse data

Question


Hi,

I have a sparse mass matrix exported from ANSYS, and the data looks as follows:

[ 1, 1]: 1.157e-07 [ 1, 4]: 2.332e-08 [ 1, 7]: 2.146e-08 [ 1, 10]: 5.835e-08 [ 1, 13]: 4.043e-08 [ 1, 16]: 1.011e-08 [ 1, 19]: 8.211e-09 [ 1, 22]: 2.590e-08 [ 1, 25]:-3.475e-08 [ 1, 28]:-2.854e-08 [ 1, 31]:-2.987e-08 [ 1, 34]:-8.897e-08 [ 1, 37]:-1.351e-08 [ 1, 40]:-8.564e-09 [ 1, 43]:-9.072e-09 [ 1, 46]:-3.556e-08 [ 1, 49]:-6.093e-08 [ 1, 52]:-1.343e-08 [ 1, 55]:-8.914e-09 [ 1, 58]:-3.609e-08 [ 1, 61]:-3.609e-08 [ 1, 64]:-6.093e-08 [ 1, 67]:-1.343e-08 [ 1, 70]:-8.914e-09 [ 1, 118]: 5.625e-08 [ 1, 121]: 2.883e-08 [ 1, 130]: 2.507e-08 [ 1, 133]: 1.102e-08 [ 1, 142]:-3.891e-08 [ 1, 154]:-1.175e-08 [ 1, 166]:-3.459e-08 [ 1, 169]:-1.171e-08 [ 1, 181]:-1.171e-08 [ 1, 184]:-3.459e-08 [ 1, 187]:-8.513e-08 [ 1, 190]:-3.947e-08 [ 1, 193]:-3.466e-08 [ 1, 196]:-1.196e-08 [ 1, 958]: 1.944e-08 [ 1, 964]: 7.516e-09 [ 1, 970]:-2.705e-08 [ 1, 979]:-8.340e-09 [ 1, 988]:-7.965e-09 [ 1, 994]:-7.965e-09 [ 1, 1021]: 2.166e-08 [ 1, 1024]: 9.467e-09 [ 1, 1027]:-2.557e-08 [ 1, 1030]:-3.156e-08 [ 1, 1033]:-7.830e-09 [ 1, 1036]:-1.295e-08 [ 1, 1039]:-1.246e-08 [ 1, 1042]:-1.246e-08

I'm looking to put this into a dense matrix, but it would be good enough to store all the items in an N-by-3 cell array with columns x, y, and data, where the regular expression reads to the end of the file.

I would then search the cell array for the largest index (X,Y) and initialize an array of that size, then copy the data over from the cell array to the matrix.

Is this possible?


Answer 1

Star Strider on 13 Nov 2020


This uses one regexp call to split the data into cells, which are then read with sscanf and partitioned into individual columns using the reshape function in the ‘Out’ assignment. It may not be exactly what you intended (I doubt that is possible), however it has the virtue of producing the desired result:

M = '[     1,     1]: 1.157e-07 [     1,     4]: 2.332e-08 [     1,     7]: 2.146e-08 [     1,    10]: 5.835e-08 [     1,    13]: 4.043e-08 [     1,    16]: 1.011e-08 [     1,    19]: 8.211e-09 [     1,    22]: 2.590e-08 [     1,    25]:-3.475e-08 [     1,    28]:-2.854e-08 [     1,    31]:-2.987e-08 [     1,    34]:-8.897e-08 [     1,    37]:-1.351e-08 [     1,    40]:-8.564e-09 [     1,    43]:-9.072e-09 [     1,    46]:-3.556e-08 [     1,    49]:-6.093e-08 [     1,    52]:-1.343e-08 [     1,    55]:-8.914e-09 [     1,    58]:-3.609e-08 [     1,    61]:-3.609e-08 [     1,    64]:-6.093e-08 [     1,    67]:-1.343e-08 [     1,    70]:-8.914e-09 [     1,   118]: 5.625e-08 [     1,   121]: 2.883e-08 [     1,   130]: 2.507e-08 [     1,   133]: 1.102e-08 [     1,   142]:-3.891e-08 [     1,   154]:-1.175e-08 [     1,   166]:-3.459e-08 [     1,   169]:-1.171e-08 [     1,   181]:-1.171e-08 [     1,   184]:-3.459e-08 [     1,   187]:-8.513e-08 [     1,   190]:-3.947e-08 [     1,   193]:-3.466e-08 [     1,   196]:-1.196e-08 [     1,   958]: 1.944e-08 [     1,   964]: 7.516e-09 [     1,   970]:-2.705e-08 [     1,   979]:-8.340e-09 [     1,   988]:-7.965e-09 [     1,   994]:-7.965e-09 [     1,  1021]: 2.166e-08 [     1,  1024]: 9.467e-09 [     1,  1027]:-2.557e-08 [     1,  1030]:-3.156e-08 [     1,  1033]:-7.830e-09 [     1,  1036]:-1.295e-08 [     1,  1039]:-1.246e-08 [     1,  1042]:-1.246e-08';
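V = regexp(M, '\[', 'split');        % split at each '[', leaving 'row, col]: value' pieces
R = sscanf([V{:}], '%d,%d]: %f');    % rejoin the pieces and read row, column, value in turn
Out = reshape(R, 3, []);             % one column per entry: [row; column; value]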

with:

FirstFiveColumns = Out(:,1:5)

producing:

FirstFiveColumns =
            1            1            1            1            1
            1            4            7           10           13
    1.157e-07    2.332e-08    2.146e-08    5.835e-08    4.043e-08

with ‘x’ being the first row, ‘y’ being the second row, and the floating-point variables (I have no idea what they represent) the third row.
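
For the second step described in the question (size a matrix from the largest indices, then copy the values over), a minimal sketch might look like the following. It assumes a 3-by-N array laid out like ‘Out’ above; the small array here is a stand-in built from the first three entries of the question's data, and the names rows, cols, vals and A are illustrative only.

```matlab
% Stand-in for the 3-by-N 'Out' array (first three entries of the question's data).
Out = [1         1         1;
       1         4         7;
       1.157e-07 2.332e-08 2.146e-08];

rows = Out(1,:);    % row indices
cols = Out(2,:);    % column indices
vals = Out(3,:);    % matrix entries

% Size the dense matrix from the largest indices, as proposed in the question.
A = zeros(max(rows), max(cols));

% Convert the (row, column) pairs to linear indices and copy the values across.
A(sub2ind(size(A), rows, cols)) = vals;
```

A dense matrix of this kind only makes sense if the largest indices are small enough for the full array to fit in memory.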


Stephen23 on 14 Nov 2020

Edited: Stephen23 on 14 Nov 2020


Without regexp or reshape, sscanf can parse it directly:

format long
str = '[     1,     1]: 1.157e-07 [     1,     4]: 2.332e-08 [     1,     7]: 2.146e-08 [     1,    10]: 5.835e-08 [     1,    13]: 4.043e-08 [     1,    16]: 1.011e-08 [     1,    19]: 8.211e-09 [     1,    22]: 2.590e-08 [     1,    25]:-3.475e-08 [     1,    28]:-2.854e-08 [     1,    31]:-2.987e-08 [     1,    34]:-8.897e-08 [     1,    37]:-1.351e-08 [     1,    40]:-8.564e-09 [     1,    43]:-9.072e-09 [     1,    46]:-3.556e-08 [     1,    49]:-6.093e-08 [     1,    52]:-1.343e-08 [     1,    55]:-8.914e-09 [     1,    58]:-3.609e-08 [     1,    61]:-3.609e-08 [     1,    64]:-6.093e-08 [     1,    67]:-1.343e-08 [     1,    70]:-8.914e-09 [     1,   118]: 5.625e-08 [     1,   121]: 2.883e-08 [     1,   130]: 2.507e-08 [     1,   133]: 1.102e-08 [     1,   142]:-3.891e-08 [     1,   154]:-1.175e-08 [     1,   166]:-3.459e-08 [     1,   169]:-1.171e-08 [     1,   181]:-1.171e-08 [     1,   184]:-3.459e-08 [     1,   187]:-8.513e-08 [     1,   190]:-3.947e-08 [     1,   193]:-3.466e-08 [     1,   196]:-1.196e-08 [     1,   958]: 1.944e-08 [     1,   964]: 7.516e-09 [     1,   970]:-2.705e-08 [     1,   979]:-8.340e-09 [     1,   988]:-7.965e-09 [     1,   994]:-7.965e-09 [     1,  1021]: 2.166e-08 [     1,  1024]: 9.467e-09 [     1,  1027]:-2.557e-08 [     1,  1030]:-3.156e-08 [     1,  1033]:-7.830e-09 [     1,  1036]:-1.295e-08 [     1,  1039]:-1.246e-08 [     1,  1042]:-1.246e-08';
mat = sscanf(str,'[%d,%d]:%f ',[3,Inf]).'
mat = 52×3
   1.000000000000000   1.000000000000000   0.000000115700000
   1.000000000000000   4.000000000000000   0.000000023320000
   1.000000000000000   7.000000000000000   0.000000021460000
   1.000000000000000  10.000000000000000   0.000000058350000
   1.000000000000000  13.000000000000000   0.000000040430000
   1.000000000000000  16.000000000000000   0.000000010110000
   1.000000000000000  19.000000000000000   0.000000008211000
   1.000000000000000  22.000000000000000   0.000000025900000
   1.000000000000000  25.000000000000000  -0.000000034750000
   1.000000000000000  28.000000000000000  -0.000000028540000
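
Since the data is a sparse matrix, the N-by-3 result can also be passed to sparse rather than copied into a dense array. This is only a sketch, assuming the three columns are row index, column index and value as above; S and D are illustrative names.

```matlab
% First three entries of the question's data, parsed as in the comment above.
str = '[     1,     1]: 1.157e-07 [     1,     4]: 2.332e-08 [     1,     7]: 2.146e-08';
mat = sscanf(str, '[%d,%d]:%f ', [3, Inf]).';   % N-by-3: row, column, value

% sparse() sizes the matrix from the largest indices and stores only the nonzeros.
% Entries that share the same (row, column) pair are summed.
S = sparse(mat(:,1), mat(:,2), mat(:,3));

% Convert to a dense matrix only if the full array fits in memory.
D = full(S);
```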

Stephen23 on 3 Jan 2021


"both of the answers above work if i have the data in a 'string'. However... it comes in as a 1x270000000 character vector. ... it still wont work."

I very much doubt that it would make any difference.

The code in my comment already uses a character vector, not a string. Using the equivalent string would give exactly the same output, because sscanf accepts either a character vector or a string scalar. Let's try it:

Character vector:

str = '[     1,     1]: 1.157e-07 [     1,     4]: 2.332e-08'; % char vector
mat = sscanf(str,'[%d,%d]:%f ',[3,Inf]).'
mat = 2×3
    1.0000    1.0000    0.0000
    1.0000    4.0000    0.0000

String:

str = "[     1,     1]: 1.157e-07 [     1,     4]: 2.332e-08"; % string
mat = sscanf(str,'[%d,%d]:%f ',[3,Inf]).'
mat = 2×3
    1.0000    1.0000    0.0000
    1.0000    4.0000    0.0000

Most likely your character vector does not have the exact format that you showed us in your original question, e.g. it contains some leading characters, non-printing characters, or some other difference. Both Star Strider's and my code rely on the input having the exact format that you showed in your question.
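
One way to check where the input departs from the expected format is to request the additional outputs of sscanf, which report how many values were read and where scanning stopped. The sketch below uses a hypothetical input with a made-up header line in front of the data; the header text is an assumption for illustration only.

```matlab
% Hypothetical input: a made-up header line ahead of data in the question's format.
str = sprintf('HEADER (made up)\n[     1,     1]: 1.157e-07 [     1,     4]: 2.332e-08');

% Request the extra outputs: number of values read, any error message,
% and the position at which scanning stopped.
[vals, count, errmsg, nextindex] = sscanf(str, '[%d,%d]:%f ');

if nextindex <= numel(str)
    % Scanning stopped before the end; show the text at the stopping point.
    stop = min(numel(str), nextindex + 19);
    fprintf('Read %d values, stopped at character %d: "%s"\n', ...
            count, nextindex, str(nextindex:stop));
    disp(errmsg)
end
```

If scanning stops before the end of the text, the characters printed at the stopping point usually show what the format string did not expect.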

Tyler on 3 Jan 2021

Thank you, this is correct. There was one line of header in the file.

Thanks so much
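
For a file with a single header line, one possible approach is to drop the first line before calling sscanf. This is a minimal sketch, not the code used in the thread: the header text and the file name in the comment are hypothetical, and it assumes the data begins immediately after the first line break.

```matlab
% Stand-in for the file contents. With a real file this would be something like
% txt = fileread('mass_matrix.txt'), where the file name is hypothetical.
txt = sprintf('HEADER (made up)\n[     1,     1]: 1.157e-07 [     1,     4]: 2.332e-08\n');

% Skip the first line so the text starts with '[' as the format expects.
firstBreak = find(txt == newline, 1);
body = txt(firstBreak+1:end);

mat = sscanf(body, '[%d,%d]:%f ', [3, Inf]).';   % N-by-3: row, column, value
```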

Star Strider on 4 Jan 2021

As always, my pleasure!
