Former Member

Extracting URL from string

I came across a situation where I had to extract URLs from a string or paragraph, and I noticed that others on SCN were looking for this logic as well. This document contains a sample report program that extracts URLs from a string using a regular expression. Depending on your business needs, you can reuse the code in a function module or BAdI to check the URL. By changing the regular expression, the same code can also be used to extract e-mail addresses from a string.
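As a small illustration of the e-mail remark above, here is a minimal sketch that collects e-mail addresses with FIND ALL OCCURRENCES OF REGEX ... RESULTS instead of validating blank-separated tokens. It is not the author's program: the report name, the sample text and the simplified e-mail pattern are assumptions for illustration only, and the pattern does not accept every address the e-mail standards allow.

```abap
REPORT z_extract_email_sketch. " hypothetical report name

" Sketch only: find all e-mail-like substrings in a text.
" The pattern below is a simplified assumption, not a complete
" e-mail address validator.
DATA: lv_text    TYPE string,
      lt_matches TYPE match_result_tab,
      ls_match   TYPE match_result,
      lv_mail    TYPE string,
      lt_mails   TYPE string_table.

" Hypothetical sample text
lv_text = `Contact john.doe@example.com or support@sap.com.`.

" Each result row holds the offset and length of one match
FIND ALL OCCURRENCES OF REGEX `[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`
     IN lv_text RESULTS lt_matches.

" Copy every match out of the text
LOOP AT lt_matches INTO ls_match.
  lv_mail = lv_text+ls_match-offset(ls_match-length).
  APPEND lv_mail TO lt_mails.
ENDLOOP.

LOOP AT lt_mails INTO lv_mail.
  WRITE / lv_mail.
ENDLOOP.
```

Because the pattern must end with at least two letters, the full stop after the second address is not part of the match.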

TYPES : BEGIN OF lt_table_types,
        idx TYPE i,          " running number of the output line
        type TYPE c,         " 'S' = plain text, 'U' = URL
        tdline TYPE string,  " text of the segment
        END OF lt_table_types.
DATA : valid_url TYPE abap_bool,
       regex   TYPE REF TO cl_abap_regex,
       lv_prob_desc TYPE string,
       lv_prob_desc1 TYPE string,
       lv_index TYPE i,
       lv_length TYPE i,
       lv_stindex TYPE i,
       lv_new_start TYPE i,
       lv_last_end TYPE i,
       lv_temp TYPE i,
       lv_temp1 TYPE i,
       lv_char  TYPE c,
       lv_string TYPE string,
       lv_flag TYPE c,
       lv_tblidx TYPE i VALUE 0.
DATA : lt_result TYPE TABLE OF lt_table_types,
       ls_result TYPE lt_table_types.

PARAMETERS lv_prob TYPE string.

" Pattern a token must match to be accepted as a URL
CREATE OBJECT regex
  EXPORTING
    pattern     = '((https?|ftp|gopher|telnet|file):((//)|(\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)'
    ignore_case = abap_true.

" lv_prob_desc is used for the URL check (curly braces replaced by 'a'),
" lv_prob_desc1 keeps the original text for the output
lv_prob_desc = lv_prob.
lv_prob_desc1 = lv_prob_desc.
REPLACE ALL OCCURRENCES OF '{' IN lv_prob_desc WITH 'a'.
REPLACE ALL OCCURRENCES OF '}' IN lv_prob_desc WITH 'a'.
lv_length = strlen( lv_prob_desc ).
lv_stindex = lv_index.
lv_new_start = lv_index.
" Scan the text; a blank or the last character closes the current token
WHILE lv_index < lv_length.
  lv_char = lv_prob_desc+lv_index(1).
  IF lv_char EQ ' ' OR lv_index EQ lv_length - 1.
    " Length of the token that starts at lv_stindex
    lv_temp = lv_index.
    IF lv_char EQ ' '.
      SUBTRACT lv_stindex FROM lv_temp.
    ELSE.
      SUBTRACT lv_stindex FROM lv_temp.
      ADD 1 TO lv_temp.
    ENDIF.
    lv_string = lv_prob_desc+lv_stindex(lv_temp).
    " Check the token against the URL pattern
    CALL METHOD cl_http_utility=>is_valid_url
      EXPORTING
        url           = lv_string
        white_pattern = regex
      RECEIVING
        is_ok         = valid_url.
    lv_string = lv_prob_desc1+lv_stindex(lv_temp).
    " Determine the highest index used so far
    SORT lt_result BY idx.
    LOOP AT lt_result INTO ls_result.
      lv_tblidx = ls_result-idx.
    ENDLOOP.
    IF valid_url EQ 'X'.
      " Token is a URL; if plain text is pending, index + 1 is kept free for it
      IF lv_flag NE 'X'.
        ls_result-idx = lv_tblidx + 1.
        ls_result-type = 'U'.
        SHIFT lv_string RIGHT DELETING TRAILING '.'.
        SHIFT lv_string RIGHT DELETING TRAILING ','.
        SHIFT lv_string LEFT DELETING LEADING ' '.
        ls_result-tdline = lv_string.
        APPEND ls_result TO lt_result.
      ELSE.
        ls_result-idx = lv_tblidx + 2.
        ls_result-type = 'U'.
        SHIFT lv_string RIGHT DELETING TRAILING '.'.
        SHIFT lv_string RIGHT DELETING TRAILING ','.
        SHIFT lv_string LEFT DELETING LEADING ' '.
        ls_result-tdline = lv_string.
        APPEND ls_result TO lt_result.
      ENDIF.
      " Store the pending plain text in front of the URL
      IF lv_flag EQ 'X'.
        lv_temp1 = lv_last_end - lv_new_start.
        lv_string   = lv_prob_desc+lv_new_start(lv_temp1).
        ls_result-idx = lv_tblidx + 1.
        ls_result-type = 'S'.
        ls_result-tdline = lv_string.
        APPEND ls_result TO lt_result.
        CLEAR lv_flag.
      ENDIF.
      lv_new_start = lv_index + 1.
    ELSE.
      " Token is plain text; it is stored with the next URL
      " or when the end of the text is reached
      lv_flag = 'X'.
      IF lv_index EQ lv_length - 1.
        lv_temp1 = lv_length - lv_new_start.
        lv_string = lv_prob_desc+lv_new_start(lv_temp1).
        ls_result-idx = lv_tblidx + 1.
        ls_result-type = 'S'.
        ls_result-tdline = lv_string.
        APPEND ls_result TO lt_result.
      ENDIF.
      lv_last_end = lv_index.
    ENDIF.
    " The next token starts after the current position
    lv_stindex = lv_index.
    ADD 1 TO lv_stindex.
  ENDIF.
  ADD 1 TO lv_index.
ENDWHILE.
SORT lt_result BY idx.
WRITE: 9 'No', 12 'TY', 15 'Value'.
LOOP AT lt_result INTO ls_result.
  WRITE: / ls_result-idx , ls_result-type , ls_result-tdline.
ENDLOOP.

Output:

[Output screenshots]

In the output, type 'S' means the value is a plain text string and type 'U' means the value is a URL.
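To make the S/U list easier to picture, here is a sketch that builds the same kind of table with FIND ALL OCCURRENCES OF REGEX ... RESULTS: text between two matches becomes an 'S' row and each match becomes a 'U' row. This swaps in a different technique from the author's program (which checks every blank-separated token with CL_HTTP_UTILITY=>IS_VALID_URL). The report name, the sample text and the simplified pattern (a scheme, then "://", then everything up to a blank or comma) are assumptions.

```abap
REPORT z_split_text_urls_sketch. " hypothetical report name

" Sketch only: split a text into 'S' (text) and 'U' (URL) segments.
" The URL pattern is a simplified assumption.
TYPES: BEGIN OF ty_segment,
         idx    TYPE i,
         type   TYPE c LENGTH 1,   " 'S' = plain text, 'U' = URL
         tdline TYPE string,
       END OF ty_segment,
       ty_segments TYPE STANDARD TABLE OF ty_segment WITH DEFAULT KEY.

DATA: lv_text     TYPE string,
      lt_matches  TYPE match_result_tab,
      ls_match    TYPE match_result,
      lt_segments TYPE ty_segments,
      ls_segment  TYPE ty_segment,
      lv_pos      TYPE i,          " end of the previous match
      lv_len      TYPE i.

" Hypothetical sample text
lv_text = `Visit http://www.sap.com or ftp://ftp.example.org today`.

FIND ALL OCCURRENCES OF REGEX `(https?|ftp|gopher|telnet|file)://[^ ,]+`
     IN lv_text IGNORING CASE RESULTS lt_matches.

LOOP AT lt_matches INTO ls_match.
  " Text between the previous match and this one becomes an 'S' row
  lv_len = ls_match-offset - lv_pos.
  IF lv_len > 0.
    ls_segment-idx    = lines( lt_segments ) + 1.
    ls_segment-type   = 'S'.
    ls_segment-tdline = lv_text+lv_pos(lv_len).
    APPEND ls_segment TO lt_segments.
  ENDIF.
  " The match itself becomes a 'U' row
  ls_segment-idx    = lines( lt_segments ) + 1.
  ls_segment-type   = 'U'.
  ls_segment-tdline = lv_text+ls_match-offset(ls_match-length).
  APPEND ls_segment TO lt_segments.
  lv_pos = ls_match-offset + ls_match-length.
ENDLOOP.

" Text after the last match becomes a final 'S' row
lv_len = strlen( lv_text ) - lv_pos.
IF lv_len > 0.
  ls_segment-idx    = lines( lt_segments ) + 1.
  ls_segment-type   = 'S'.
  ls_segment-tdline = lv_text+lv_pos(lv_len).
  APPEND ls_segment TO lt_segments.
ENDIF.

LOOP AT lt_segments INTO ls_segment.
  WRITE: / ls_segment-idx, ls_segment-type, ls_segment-tdline.
ENDLOOP.
```

For the sample text this yields five rows in the order S, U, S, U, S. The 'S' rows keep their surrounding blanks.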

I hope it is useful. Please share your comments or feedback, and let me know if you have any questions. Thank you.

      2 Comments
      Katan Patel

      I think your blog needs more work than just pasting your ztest program. Try to elaborate on the key features and add comments explaining the key points in your code.

      As you allude to, the crux of this code is the regular expression. There is already some good content out there in this space.

      http://scn.sap.com/docs/DOC-10319

      Keep pushing; I'm sure you can create some great content, which is what we are all striving to see on SCN.

      Kurt Wagner

      I find this code very useful; however, I'd like to share some improvements.

      The code accepts 2HTTP://A.B.DE as a valid URL. Therefore, please use the following regex, which anchors the pattern at the start of the token with ^:

      pattern     = '^((https?|ftp|gopher|telnet|file):((//)|(\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)'
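The effect of the leading ^ can be reproduced with a plain FIND on the same token. This sketch does not show how CL_HTTP_UTILITY=>IS_VALID_URL applies its pattern internally; it only shows what the anchor changes in the regex itself. The report name is hypothetical, and the pattern is shortened to the scheme part as an assumption for brevity.

```abap
REPORT z_regex_anchor_sketch. " hypothetical report name

" Sketch only: search the same token with and without a leading ^.
" The pattern is shortened to the scheme part (assumption).
DATA: lv_token       TYPE string,
      lv_rc_plain    TYPE i,
      lv_rc_anchored TYPE i.

lv_token = `2HTTP://A.B.DE`.

" Unanchored: the scheme may start anywhere, so the leading '2' is skipped
FIND REGEX `(https?|ftp|gopher|telnet|file)://` IN lv_token IGNORING CASE.
lv_rc_plain = sy-subrc.      " 0 = match found

" Anchored: the scheme must start at the first character of the token
FIND REGEX `^(https?|ftp|gopher|telnet|file)://` IN lv_token IGNORING CASE.
lv_rc_anchored = sy-subrc.   " 4 = no match

WRITE: / 'without ^:', lv_rc_plain,
       / 'with ^:   ', lv_rc_anchored.
```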

       

      In addition, I'd like to allow the user to embed the URL in SAPscript text formatting, e.g.

      <U>HTTP://A.B.DE</>

       

      Therefore, you need to change the following:

      ...
      lv_char = iv_string_desc+lv_index(1).
      IF lv_char CA ' <>' OR lv_index EQ lv_length - 1.
        lv_temp = lv_index.
        IF lv_char CA ' <>'.
          SUBTRACT lv_stindex FROM lv_temp.
      ...
            IF lv_flag EQ 'X'.
              lv_temp1 = lv_last_end - lv_new_start.
              ADD 1 TO lv_temp1. "consider separator
              lv_string = iv_string_desc+lv_new_start(lv_temp1).
      ...
            lv_new_start = lv_index + 1.
            SUBTRACT 1 FROM lv_new_start. "consider separator
          ELSE.
      
      

      In addition, I'd like to return the surrounding strings (table lt_result) and check for a dot after the URL. Here is the full source code:

       

FUNCTION z_extract_url.
*"----------------------------------------------------------------------
*"*"Local Interface:
*"  IMPORTING
*"     VALUE(IV_STRING) TYPE  STRING
*"  EXPORTING
*"     REFERENCE(STRING_TABLE) TYPE  STRING_TABLE
*"----------------------------------------------------------------------

  TYPES: BEGIN OF lt_table_types,
           idx    TYPE i,
           type   TYPE c,
           tdline TYPE string,
         END OF lt_table_types.
  DATA: valid_url       TYPE abap_bool,
        regex           TYPE REF TO cl_abap_regex,
        iv_string_desc  TYPE string,
        iv_string_desc1 TYPE string,
        lv_index        TYPE i,
        lv_length       TYPE i,
        lv_stindex      TYPE i,
        lv_new_start    TYPE i,
        lv_last_end     TYPE i,
        lv_temp         TYPE i,
        lv_temp1        TYPE i,
        lv_char         TYPE c,
        lv_string       TYPE string,
        lv_flag         TYPE c,
        lv_tblidx       TYPE i VALUE 0.
  DATA: lt_result        TYPE TABLE OF lt_table_types,
        ls_result        TYPE lt_table_types,
        lv_end_of_string TYPE abap_bool.

  " Anchored pattern: the token must start with the scheme
  CREATE OBJECT regex
    EXPORTING
      pattern     = '^((https?|ftp|gopher|telnet|file):((//)|(\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)'
      ignore_case = abap_true.

  iv_string_desc = iv_string.
  iv_string_desc1 = iv_string_desc.
  REPLACE ALL OCCURRENCES OF '{' IN iv_string_desc WITH 'a'.
  REPLACE ALL OCCURRENCES OF '}' IN iv_string_desc WITH 'a'.
  lv_length = strlen( iv_string_desc ).
  lv_stindex = lv_index.
  lv_new_start = lv_index.
  WHILE lv_index < lv_length.
    lv_char = iv_string_desc+lv_index(1).
    " Blank, '<', '>' and ',' separate tokens
    IF lv_char CA ' <>,' OR lv_index EQ lv_length - 1.
      lv_temp = lv_index.
      IF lv_char CA ' <>,'.
        SUBTRACT lv_stindex FROM lv_temp.
        lv_end_of_string = abap_false.
      ELSE.
        SUBTRACT lv_stindex FROM lv_temp.
        ADD 1 TO lv_temp.
        lv_end_of_string = abap_true.
      ENDIF.
      lv_string = iv_string_desc+lv_stindex(lv_temp).
      CALL METHOD cl_http_utility=>is_valid_url
        EXPORTING
          url           = lv_string
          white_pattern = regex
        RECEIVING
          is_ok         = valid_url.
      lv_string = iv_string_desc1+lv_stindex(lv_temp).
      SORT lt_result BY idx.
      LOOP AT lt_result INTO ls_result.
        lv_tblidx = ls_result-idx.
      ENDLOOP.
      IF valid_url EQ 'X'.
        " check for additional dot after URL
        lv_temp1 = strlen( lv_string ) - 1.
        IF lv_string+lv_temp1(1) = '.'.
          SUBTRACT 1 FROM lv_index.
        ENDIF.
        IF lv_flag NE 'X'.
          ls_result-idx = lv_tblidx + 1.
          ls_result-type = 'U'.
          SHIFT lv_string RIGHT DELETING TRAILING '.'.
          SHIFT lv_string LEFT DELETING LEADING ' '.
          ls_result-tdline = lv_string.
          APPEND ls_result TO lt_result.
        ELSE.
          ls_result-idx = lv_tblidx + 2.
          ls_result-type = 'U'.
          SHIFT lv_string RIGHT DELETING TRAILING '.'.
          SHIFT lv_string LEFT DELETING LEADING ' '.
          ls_result-tdline = lv_string.
          APPEND ls_result TO lt_result.
        ENDIF.
        IF lv_flag EQ 'X'.
          lv_temp1 = lv_last_end - lv_new_start.
          ADD 1 TO lv_temp1. "consider separator
          lv_string = iv_string_desc+lv_new_start(lv_temp1).
          ls_result-idx = lv_tblidx + 1.
          ls_result-type = 'S'.
          ls_result-tdline = lv_string.
          APPEND ls_result TO lt_result.
          CLEAR lv_flag.
        ENDIF.
        lv_new_start = lv_index + 1.
        IF lv_end_of_string = abap_false.
          SUBTRACT 1 FROM lv_new_start. "consider separator
        ENDIF.
      ELSE.
        lv_flag = 'X'.
        IF lv_index EQ lv_length - 1.
          lv_temp1 = lv_length - lv_new_start.
          lv_string = iv_string_desc+lv_new_start(lv_temp1).
          ls_result-idx = lv_tblidx + 1.
          ls_result-type = 'S'.
          ls_result-tdline = lv_string.
          APPEND ls_result TO lt_result.
        ENDIF.
        lv_last_end = lv_index.
      ENDIF.
      lv_stindex = lv_index.
      ADD 1 TO lv_stindex.
    ENDIF.
    ADD 1 TO lv_index.
  ENDWHILE.
  SORT lt_result BY idx.
* prepare return table
  REFRESH string_table.
  LOOP AT lt_result INTO ls_result WHERE type = 'U'.
    APPEND ls_result-tdline TO string_table.
  ENDLOOP.
ENDFUNCTION.