Friday, February 26, 2010

RegEx Strip of HTML

So, for some reason, I have real trouble with regular expressions. Luckily, the magic of Google usually gives me what I need.

But here is something I tend to look up over and over.

From http://www.4guysfromrolla.com/webtech/042501-1.shtml:

Function stripHTML(strHTML)
'Strips the HTML tags from strHTML

  Dim objRegExp, strOutput
  Set objRegExp = New Regexp

  objRegExp.IgnoreCase = True
  objRegExp.Global = True
  objRegExp.Pattern = "<(.|\n)+?>"

  'Replace all HTML tag matches with the empty string
  strOutput = objRegExp.Replace(strHTML, "")
  
  'Replace all < and > with &lt; and &gt;
  strOutput = Replace(strOutput, "<", "&lt;")
  strOutput = Replace(strOutput, ">", "&gt;")
  
  stripHTML = strOutput    'Return the value of strOutput

  Set objRegExp = Nothing
End Function
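For anyone working in JavaScript rather than classic ASP, here is a rough port of the same function. It is my own sketch, not code from the 4GuysFromRolla article. It keeps the same pattern and the same two steps: strip anything that looks like a tag, then escape any leftover angle brackets.

```javascript
// Rough JavaScript port of the VBScript stripHTML above (a sketch, not the original code).
function stripHTML(strHTML) {
  // Same pattern as the VBScript version: "<", then one or more characters
  // (any character or \n), matched lazily up to the first ">".
  // The "g" flag plays the role of objRegExp.Global = True; IgnoreCase is
  // left out because the pattern contains no letters.
  const tagPattern = /<(.|\n)+?>/g;

  // Replace all HTML tag matches with the empty string.
  let output = strHTML.replace(tagPattern, "");

  // Escape any stray < and > that were not part of a matched tag.
  output = output.replace(/</g, "&lt;").replace(/>/g, "&gt;");
  return output;
}

console.log(stripHTML("<p>Hello <b>world</b></p>")); // Hello world
console.log(stripHTML("a < b")); // a &lt; b
```

One difference to be aware of: in JavaScript the . does not match \r, so a tag broken across a Windows line ending (\r\n) will not be stripped. Swapping (.|\n) for [\s\S] covers that case.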

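There are some issues with this. The pattern treats everything from a < to the next > as a tag, so if you have text such as 50 < 10 and 9 > 100, your result is going to be 50 100: the "< 10 and 9 >" in the middle is thrown away as if it were markup.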
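One common workaround, which is my own assumption and not something from the article above, is to treat a < as the start of a tag only when it is immediately followed by a letter (optionally after a / for closing tags). The function name stripTagsStrict is hypothetical.

```javascript
// Hypothetical stricter variant: "<" only starts a tag when it is followed
// directly by a letter, or by "/" and then a letter (closing tags).
function stripTagsStrict(html) {
  const tagPattern = /<\/?[a-zA-Z][^>]*>/g;
  return html
    .replace(tagPattern, "")   // remove things that look like real tags
    .replace(/</g, "&lt;")     // escape any remaining "<"
    .replace(/>/g, "&gt;");    // escape any remaining ">"
}

console.log(stripTagsStrict("<p>50 < 10 and 9 > 100</p>"));
// 50 &lt; 10 and 9 &gt; 100
```

This is still not a real HTML parser. Text written without spaces, such as x<y and z>w, still looks like a tag and comes out as xw, and a > inside a quoted attribute value will end the match early.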

At some point, I came across this C# pattern:
string _pattern = @"";

That solves the problem, but it seems to be quite limited.

Here is a site where you can test regular expressions: http://www.regular-expressions.info/javascriptexample.html

In case this site goes away, here is the code behind the widget:

function demoMatchClick() {
  var re = new RegExp(document.demoMatch.regex.value);
  if (document.demoMatch.subject.value.match(re)) {
    alert("Successful match");
  } else {
    alert("No match");
  }
}

function demoShowMatchClick() {
  var re = new RegExp(document.demoMatch.regex.value);
  var m = re.exec(document.demoMatch.subject.value);
  if (m == null) {
    alert("No match");
  } else {
    var s = "Match at position " + m.index + ":\n";
    // m[0] is the whole match; m[1], m[2], ... are the capturing groups
    for (var i = 0; i < m.length; i++) {
      s = s + m[i] + "\n";
    }
    alert(s);
  }
}

function demoReplaceClick() {
  var re = new RegExp(document.demoMatch.regex.value, "g");
  document.demoMatch.result.value = 
    document.demoMatch.subject.value.replace(re, 
      document.demoMatch.replacement.value);
}
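The widget functions depend on a page form named demoMatch. If you want to run the same three checks outside that page (in Node.js or a browser console), a DOM-free sketch could look like the following. The function names testMatch, showMatch and replaceMatches are hypothetical; they are not part of the original widget.

```javascript
// Like demoMatchClick: does the pattern match anywhere in the subject?
function testMatch(pattern, subject) {
  return new RegExp(pattern).test(subject);
}

// Like demoShowMatchClick: position of the first match plus the whole match
// and each capturing group, or null when there is no match.
function showMatch(pattern, subject) {
  const m = new RegExp(pattern).exec(subject);
  if (m === null) {
    return null;
  }
  return { index: m.index, parts: Array.from(m) };
}

// Like demoReplaceClick: replace every match (the "g" flag).
function replaceMatches(pattern, subject, replacement) {
  return subject.replace(new RegExp(pattern, "g"), replacement);
}

console.log(testMatch("b+", "abbbc")); // true
const found = showMatch("(\\d+)-(\\d+)", "call 555-1234 now");
console.log(found.index, found.parts.join(" | ")); // 5 555-1234 | 555 | 1234
console.log(replaceMatches("<(.|\\n)+?>", "<b>bold</b> text", "")); // bold text
```

Because the patterns are written as JavaScript string literals here, backslashes have to be doubled ("\\d"); in the widget's text box you would type \d directly.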

Here is another site to test regular expressions:
http://www.fileformat.info/tool/regex.htm
