Former Member

I recently needed to extract URLs from a string or paragraph, and I noticed that others on SCN were asking for the same logic. This document includes a sample report program that extracts URLs from a string using a regular expression. Depending on your business need, you can reuse the code in a function module or BAdI to check URLs. By changing the regular expression, the same code can also be used to extract e-mail addresses from a string.

TYPES : BEGIN OF lt_table_types,
        idx TYPE i,
        type TYPE c,
        tdline TYPE string,
        END OF lt_table_types.
DATA : valid_url TYPE abap_bool,
       regex   TYPE REF TO cl_abap_regex,
       lv_prob_desc TYPE string,
       lv_prob_desc1 TYPE string,
       lv_index TYPE i,
       lv_length TYPE i,
       lv_stindex TYPE i,
       lv_new_start TYPE i,
       lv_last_end TYPE i,
       lv_temp TYPE i,
       lv_temp1 TYPE i,
       lv_char  TYPE c,
       lv_string TYPE string,
       lv_flag TYPE c,
       lv_tblidx TYPE i VALUE 0.
DATA : lt_result TYPE TABLE OF lt_table_types,
       ls_result TYPE lt_table_types.

* LOWER CASE keeps the input as typed; otherwise it is converted to upper case.
PARAMETERS lv_prob TYPE string LOWER CASE.

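* ABAP text literals do not process backslash escapes, so each regex
* escape (\w, \d, \., \\) is written with a single backslash.
CREATE OBJECT regex
  EXPORTING
    pattern     = '((https?|ftp|gopher|telnet|file):((//)|(\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)'
    ignore_case = abap_true.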

lv_prob_desc = lv_prob.
lv_prob_desc1 = lv_prob_desc.
REPLACE ALL OCCURRENCES OF '{' IN lv_prob_desc WITH 'a'.
REPLACE ALL OCCURRENCES OF '}' IN lv_prob_desc WITH 'a'.
lv_length = strlen( lv_prob_desc ).
lv_stindex = lv_index.
lv_new_start = lv_index.
WHILE lv_index < lv_length.
  lv_char = lv_prob_desc+lv_index(1).
  IF lv_char EQ ' ' OR lv_index EQ lv_length - 1.
    lv_temp = lv_index.
    IF lv_char EQ ' '.
      SUBTRACT lv_stindex FROM lv_temp.
    ELSE.
      SUBTRACT lv_stindex FROM lv_temp.
      ADD 1 TO lv_temp.
    ENDIF.
    lv_string = lv_prob_desc+lv_stindex(lv_temp).
    CALL METHOD cl_http_utility=>is_valid_url
      EXPORTING
        url           = lv_string
        white_pattern = regex
      RECEIVING
        is_ok         = valid_url.
    lv_string = lv_prob_desc1+lv_stindex(lv_temp).
    SORT lt_result BY idx.
    LOOP AT lt_result INTO ls_result.
      lv_tblidx = ls_result-idx.
    ENDLOOP.
    IF valid_url EQ 'X'.
      IF lv_flag NE 'X'.
        ls_result-idx = lv_tblidx + 1.
        ls_result-type = 'U'.
        SHIFT lv_string RIGHT DELETING TRAILING '.'.
        SHIFT lv_string RIGHT DELETING TRAILING ','.
        SHIFT lv_string LEFT DELETING LEADING ' '.
        ls_result-tdline = lv_string.
        APPEND ls_result TO lt_result.
      ELSE.
        ls_result-idx = lv_tblidx + 2.
        ls_result-type = 'U'.
        SHIFT lv_string RIGHT DELETING TRAILING '.'.
        SHIFT lv_string RIGHT DELETING TRAILING ','.
        SHIFT lv_string LEFT DELETING LEADING ' '.
        ls_result-tdline = lv_string.
        APPEND ls_result TO lt_result.
      ENDIF.
      IF lv_flag EQ 'X'.
        lv_temp1 = lv_last_end - lv_new_start.
        lv_string   = lv_prob_desc+lv_new_start(lv_temp1).
        ls_result-idx = lv_tblidx + 1.
        ls_result-type = 'S'.
        ls_result-tdline = lv_string.
        APPEND ls_result TO lt_result.
        CLEAR lv_flag.
      ENDIF.
      lv_new_start = lv_index + 1.
    ELSE.
      lv_flag = 'X'.
      IF lv_index EQ lv_length - 1.
        lv_temp1 = lv_length - lv_new_start.
        lv_string = lv_prob_desc+lv_new_start(lv_temp1).
        ls_result-idx = lv_tblidx + 1.
        ls_result-type = 'S'.
        ls_result-tdline = lv_string.
        APPEND ls_result TO lt_result.
      ENDIF.
      lv_last_end = lv_index.
    ENDIF.
    lv_stindex = lv_index.
    ADD 1 TO lv_stindex.
  ENDIF.
  ADD 1 TO lv_index.
ENDWHILE.
SORT lt_result BY idx.
WRITE: 9 'No', 12 'TY', 15 'Value'.
LOOP AT lt_result INTO ls_result.
  WRITE: / ls_result-idx , ls_result-type , ls_result-tdline.
ENDLOOP.
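
If only the URLs are needed, without the surrounding text segments, a shorter route is possible. The following is a minimal sketch rather than a replacement for the report above: it uses the FIND ALL OCCURRENCES OF REGEX statement instead of walking the string character by character. The report name zdemo_find_urls, the sample text, and the simplified pattern are assumptions chosen for illustration only.

```abap
REPORT zdemo_find_urls.

DATA: lv_text    TYPE string,
      lv_url     TYPE string,
      lt_matches TYPE match_result_tab,
      ls_match   TYPE match_result.

* Hypothetical sample input.
lv_text = `See http://www.sap.com/index.html, or ftp://files.example.org for details.`.

* Simplified pattern (an assumption): a scheme followed by any run of
* non-whitespace characters.
FIND ALL OCCURRENCES OF REGEX '(https?|ftp)://[^[:space:]]+'
     IN lv_text IGNORING CASE
     RESULTS lt_matches.

LOOP AT lt_matches INTO ls_match.
  " Cut the match out of the text using its offset and length.
  lv_url = lv_text+ls_match-offset(ls_match-length).
  " Drop trailing dots and commas that belong to the sentence, not the URL.
  REPLACE REGEX '[.,]+$' IN lv_url WITH ``.
  WRITE: / lv_url.
ENDLOOP.
```

Unlike the report above, this variant does not collect the non-URL text between matches, so it only fits cases where the type 'S' rows are not required.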

Output :

Here, type 'S' indicates that the value is a plain string and type 'U' indicates that the value is a URL.
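
As mentioned in the introduction, changing the regular expression allows the same idea to be used for e-mail addresses. The following is a rough sketch under stated assumptions: the report name zdemo_find_mails, the sample text, and the deliberately simple e-mail pattern are illustrative and do not cover every valid address format.

```abap
REPORT zdemo_find_mails.

DATA: lv_text    TYPE string,
      lv_mail    TYPE string,
      lt_matches TYPE match_result_tab,
      ls_match   TYPE match_result.

* Hypothetical sample input.
lv_text = `Contact john.doe@example.com or support@example.org.`.

* Simple pattern (an assumption): local part, @, domain, and a
* top-level domain of at least two letters.
FIND ALL OCCURRENCES OF REGEX '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}'
     IN lv_text
     RESULTS lt_matches.

LOOP AT lt_matches INTO ls_match.
  " Cut each address out of the text using its offset and length.
  lv_mail = lv_text+ls_match-offset(ls_match-length).
  WRITE: / lv_mail.
ENDLOOP.
```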

I hope this is useful. Please share your comments and feedback, and let me know if you have any questions. Thank you.
