0

I have a string of characters (an CSV file) that contains \r\n (or chr(13)||chr(10) in PLSQL) embedded in the data. I need to ingore these embedded Carriage Return Line Feeds so I can find the actual end of the CSV row.

Example:

This is line 1,"abc\r\nxyz","and, comma, some more","abc\r\n123\r\nzyx",oh more text,"let's see","33,333.01"\r\nThis is line 2,"Hello, 321","Four\r\nand five...\r\nand six",and 123456\r\nThis is line 3

or in PLSQL

'This is line 1,"abc' || chr(13) || chr(10) || 'xyz","and, comma, some more","abc' || chr(13) || chr(10) || '123' || chr(13) || chr(10) || 'zyx",oh more text,"let's see","33,333.01"' || chr(13) || chr(10) || 'This is line 2,"Hello, 321","Four' || chr(13) || chr(10) || 'and five...' || chr(13) || chr(10) || 'and six",and 123456' || chr(13) || chr(10) || 'This is line 3'

Using https://regex101.com/ I was able to find the following regular expression that works on the first set of text above.

[\r\n](?![^\"]*(\",))

https://regex101.com/r/ju6qE0/1 this replaces the actual row ending \r\n with the text "CRLF"

When trying the same regex in Oracle, it does not work, using v_TEXT as variable holding the second set of text above.

select
REGEXP_REPLACE( v_TEXT, '[' || chr(13) || chr(10) || '](?![^\"]*(\",))', 'CRLF' )
from dual;

Can anyone help me convert the regex above to work in Oracle? Or, alternatively, point me to a different regex combo that would do the same work as above (only replace row ending \r\n with CRLF while ignoring \r\n that occurs between double quotes?

The intended out put would be this:

This is line 1,"abc\r\nxyz","and, comma, some more","abc\r\n123\r\nzyx",oh more text,"let's see","33,333.01"CRLFThis is line 2,"Hello, 321","Four\r\nand five...\r\nand six",and 123456CRLFThis is line 3
Fravadona
17.6k1 gold badge29 silver badges50 bronze badges
asked Jan 10, 2025 at 18:57
4
  • According to the doc, there's no lookahead nor lookbehind so it's kind of impossible to only target the relevant CRLFs. On the other hand, It should be possible to write a regexp that matches a full record with its trailing CRLF Commented Jan 10, 2025 at 21:43
  • 1
    No answer to "how to do x in oracle regexp" but maybe that is not needed if you use a parser function. Have a look at the package APEX_DATA_PARSER. It is a part of APEX which is a no-cost option of the Oracle Database. You could install APEX in your database and just use the API's that come with it. This post has and example of how to run it in sql. It handles line breaks pretty well. Commented Jan 11, 2025 at 20:36
  • ^("[^"]*"|[^\r\n,]*)?((,("[^"]*"|[^\r\n,]*))*)(\r\n) with glocal and multiline options and 1円2円CRLF as replacement should be quite close to what you want Commented Jan 12, 2025 at 0:56
  • Hi, @Day I can help you. But your question confusing me what you exactly want. Please simplify your question with input & expected output. Better share the sample dbfiddle inputs & query for Oracle/PLsql - dbfiddle.uk Commented Jan 29, 2025 at 18:55

1 Answer 1

0

@Koen, thank you for suggestion. Since Oracle does not support lookahead nor lookbehind, as pointed out by @Fravodona, we implemented APEX_DATA_PARSER and this did the trick. Thanks All.

answered Feb 9, 2025 at 15:53
Sign up to request clarification or add additional context in comments.

2 Comments

Please don't add "thank you" as an answer. Instead, accept the answer that you found most helpful. - From Review
@PatPanda there is no answer to accept, Koen's suggestion was a comment.

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.