
I needed to extract URLs from a string or paragraph and saw that others on SCN were asking for the same logic. This document contains a sample report program that extracts URLs from a string using a regular expression. Depending on your business need, you can reuse the code in a function module or BAdI to check URLs. By changing the regular expression, the same code can also be used to extract e-mail addresses from a string.

TYPES : BEGIN OF lt_table_types,
        idx TYPE i,
        type TYPE c,
        tdline TYPE string,
        END OF lt_table_types.
DATA : valid_url TYPE abap_bool,
       regex   TYPE REF TO cl_abap_regex,
       lv_prob_desc TYPE string,
       lv_prob_desc1 TYPE string,
       lv_index TYPE i,
       lv_length TYPE i,
       lv_stindex TYPE i,
       lv_new_start TYPE i,
       lv_last_end TYPE i,
       lv_temp TYPE i,
       lv_temp1 TYPE i,
       lv_char  TYPE c,
       lv_string TYPE string,
       lv_flag TYPE c,
       lv_tblidx TYPE i VALUE 0.
DATA : lt_result TYPE TABLE OF lt_table_types,
       ls_result TYPE lt_table_types.

PARAMETERS lv_prob TYPE string.

* Whitelist pattern: a scheme, then // or \\, then the allowed URL characters.
CREATE OBJECT regex
  EXPORTING
    pattern     = '((https?|ftp|gopher|telnet|file):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)'
    ignore_case = abap_true.

lv_prob_desc = lv_prob.
lv_prob_desc1 = lv_prob_desc.
* Mask curly braces in the working copy; lv_prob_desc1 keeps the original text.
REPLACE ALL OCCURRENCES OF '{' IN lv_prob_desc WITH 'a'.
REPLACE ALL OCCURRENCES OF '}' IN lv_prob_desc WITH 'a'.
lv_length = strlen( lv_prob_desc ).
lv_stindex = lv_index.
lv_new_start = lv_index.
WHILE lv_index < lv_length.
  lv_char = lv_prob_desc+lv_index(1).
  " A token ends at a blank or at the last character of the text.
  IF lv_char EQ ' ' OR lv_index EQ lv_length - 1.
    lv_temp = lv_index.
    IF lv_char EQ ' '.
      SUBTRACT lv_stindex FROM lv_temp.
    ELSE.
      SUBTRACT lv_stindex FROM lv_temp.
      ADD 1 TO lv_temp.
    ENDIF.
    lv_string = lv_prob_desc+lv_stindex(lv_temp).
    CALL METHOD cl_http_utility=>is_valid_url
      EXPORTING
        url           = lv_string
        white_pattern = regex
      RECEIVING
        is_ok         = valid_url.
    lv_string = lv_prob_desc1+lv_stindex(lv_temp).
    SORT lt_result BY idx.
    LOOP AT lt_result INTO ls_result.
      lv_tblidx = ls_result-idx.
    ENDLOOP.
    IF valid_url EQ 'X'.
      " If plain text precedes this URL (lv_flag set), keep idx + 1 free for it.
      IF lv_flag NE 'X'.
        ls_result-idx = lv_tblidx + 1.
        ls_result-type = 'U'.
        SHIFT lv_string RIGHT DELETING TRAILING '.'.
        SHIFT lv_string RIGHT DELETING TRAILING ','.
        SHIFT lv_string LEFT DELETING LEADING ' '.
        ls_result-tdline = lv_string.
        APPEND ls_result TO lt_result.
      ELSE.
        ls_result-idx = lv_tblidx + 2.
        ls_result-type = 'U'.
        SHIFT lv_string RIGHT DELETING TRAILING '.'.
        SHIFT lv_string RIGHT DELETING TRAILING ','.
        SHIFT lv_string LEFT DELETING LEADING ' '.
        ls_result-tdline = lv_string.
        APPEND ls_result TO lt_result.
      ENDIF.
      " Store the pending plain text in the slot before the URL.
      IF lv_flag EQ 'X'.
        lv_temp1 = lv_last_end - lv_new_start.
        lv_string   = lv_prob_desc+lv_new_start(lv_temp1).
        ls_result-idx = lv_tblidx + 1.
        ls_result-type = 'S'.
        ls_result-tdline = lv_string.
        APPEND ls_result TO lt_result.
        CLEAR lv_flag.
      ENDIF.
      lv_new_start = lv_index + 1.
    ELSE.
      " Not a URL: remember that plain text is pending.
      lv_flag = 'X'.
      IF lv_index EQ lv_length - 1.
        lv_temp1 = lv_length - lv_new_start.
        lv_string = lv_prob_desc+lv_new_start(lv_temp1).
        ls_result-idx = lv_tblidx + 1.
        ls_result-type = 'S'.
        ls_result-tdline = lv_string.
        APPEND ls_result TO lt_result.
      ENDIF.
      lv_last_end = lv_index.
    ENDIF.
    lv_stindex = lv_index.
    ADD 1 TO lv_stindex.
  ENDIF.
  ADD 1 TO lv_index.
ENDWHILE.
SORT lt_result BY idx.
WRITE: AT 9 'No', AT 12 'TY', AT 15 'Value'.
LOOP AT lt_result INTO ls_result.
  WRITE: / ls_result-idx , ls_result-type , ls_result-tdline.
ENDLOOP.

Output:

[Screenshots of the report output]

Here, type ‘S’ indicates a plain string segment and ‘U’ indicates a URL.
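For comparison, here is a minimal sketch of a different technique from the report above: instead of splitting the text at blanks and validating each token, it lets the FIND statement collect every regex match directly. The report name, the sample text, and both patterns are assumptions made for illustration only. The patterns are deliberately simplified and are not the whitelist pattern used in the report. The second pattern illustrates the e-mail case mentioned in the introduction.

```abap
REPORT zdemo_find_regex.

DATA: lv_text    TYPE string,
      lv_hit     TYPE string,
      lv_pattern TYPE string,
      lt_pattern TYPE TABLE OF string,
      lt_matches TYPE match_result_tab,
      ls_match   TYPE match_result.

" Hypothetical sample text.
lv_text = `Mail jane.doe@example.com or see http://scn.sap.com/docs/DOC-10319 for details.`.

" Simplified URL pattern: scheme, ://, then everything up to the next whitespace.
APPEND `(https?|ftp)://[^[:space:]]+` TO lt_pattern.
" Simplified e-mail pattern: local part, @, domain, dot, top-level domain.
APPEND `[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}` TO lt_pattern.

LOOP AT lt_pattern INTO lv_pattern.
  " RESULTS returns one row per match with its offset and length.
  FIND ALL OCCURRENCES OF REGEX lv_pattern
    IN lv_text IGNORING CASE
    RESULTS lt_matches.

  LOOP AT lt_matches INTO ls_match.
    lv_hit = lv_text+ls_match-offset(ls_match-length).
    WRITE: / lv_hit.
  ENDLOOP.
ENDLOOP.
```

Unlike the report, this sketch does not strip trailing punctuation from a match and does not return the plain-text segments between matches, so it only fits cases where the matches themselves are needed.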

I hope this is useful. Please share your comments or feedback, and let me know if you have any questions. Thank you.


1 Comment


  1. Katan Patel

    I think your blog needs more work than just pasting your ztest program. Try to elaborate on the key features and add comments explaining the key points in your code.

    As you hint, the crux of this code is the regular expression. There is some good content out there already in this space.

    http://scn.sap.com/docs/DOC-10319

    Keep pushing, I’m sure you can create some great content, which is what we are all striving to see on SCN.   
