INTELLIGENT WORK FORUMS
FOR ENGINEERING PROFESSIONALS

Log In

Come Join Us!

Are you an
Engineering professional?
Join Eng-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

*Eng-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Jobs

sed-Strip HTML tags (or XML tags)

sed-Strip HTML tags (or XML tags)

(OP)
So far I have this:

CODE

sed -r 's/(<[^>\n]*>)//g'
It does pretty good with tags all on one line, but things like <img src=blah blah blah that may extend over more than one line are not being caught.

Likewise things like <style type=text/css> ... </style> where I want to remove not just the tags, but the text between the tags are not being caught. Again, <style></style> tag pairs run over multiple lines in the general case.

Is there a way to accomplish this in sed? I can do it in awk already.

TOP
CSWP, BSSE
www.engtran.com  www.niswug.org
www.linkedin.com/in/engineeringtransport
Phenom IIx6 1100T = 8GB = FX1400 = XP64SP2 = SW2009SP3
"Node news is good news."

RE: sed-Strip HTML tags (or XML tags)

kellnerp,

   Quite a few years ago, I wrote a crude SGML parser in Perl.  Is there any reason you are not using Perl to do this?

   Perl's switch command searches a string for an arbitrary sequence, and it returns everything in the string up to the sequence, and a everything in the string after the sequence.  If you are messing with text, this is an awesome tool.

               JHG

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Eng-Tips Forums free from inappropriate posts.
The Eng-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Eng-Tips forums is a member-only feature.

Click Here to join Eng-Tips and talk with other members!


Resources


Close Box

Join Eng-Tips® Today!

Join your peers on the Internet's largest technical engineering professional community.
It's easy to join and it's free.

Here's Why Members Love Eng-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close