I have a URL:
url = "http://timesofindia.feedsportal.com/fy/8at2EuL0ihSIb3s7/story01.htmA"
There are some unwanted characters, like A or TRE, at the end of it.
If your URL always ends with .htm, .aspx or .php, you can solve it with a simple regex:
url = url[/^(.+\.(htm|aspx|php))(:?.*)$/, 1]
Tests here at Rubular.
First I use String#[] with a regex and a capture-group index to get a substring; it works like slice. Then comes the regex. From left to right:
^ # Start of line
( # Capture everything wanted enclosed
.+ # 1 or more of any character
\. # With a dot after it
(htm|aspx|php) # htm or aspx or php
) # Close the group holding the URL asked for in the question
( # Capture undesirable part
:? # Optional colon
.* # 0 or more of any character
) # Close undesirable part
$ # End of line
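To make the breakdown above concrete, here is a minimal sketch that applies the same regex to a few inputs. The helper name `strip_trailing_junk`, the constant `SUFFIX_RE`, and the `example.com` URLs are made up for illustration; only the first URL comes from the question.

```ruby
# The regex from the answer, stored in a constant (the name is hypothetical).
SUFFIX_RE = /^(.+\.(htm|aspx|php))(:?.*)$/

# Hypothetical helper: returns capture group 1 (the wanted URL),
# or nil when the string contains none of the three extensions.
def strip_trailing_junk(url)
  url[SUFFIX_RE, 1]
end

urls = [
  "http://timesofindia.feedsportal.com/fy/8at2EuL0ihSIb3s7/story01.htmA",
  "http://example.com/page.aspxTRE", # illustrative URL, not from the question
  "http://example.com/index.php",    # already clean, returned unchanged
  "http://example.com/no-extension"  # no match, so the result is nil
]

urls.each { |u| puts strip_trailing_junk(u).inspect }
```

One caveat with this approach: because everything after the extension is discarded, a URL ending in .html would be cut down to .htm, so it only fits when the three extensions listed are the only ones you expect.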