sed-Strip HTML tags (or XML tags)

So far I have this:


sed -r 's/(<[^>\n]*>)//g'
It works well when a tag sits entirely on one line, but tags such as <img src=blah blah blah ...> that extend over more than one line are not caught.

Likewise, blocks such as <style type=text/css> ... </style>, where I want to remove not just the tags but also the text between them, are not caught. In the general case, <style></style> pairs also run over multiple lines.

Is there a way to accomplish this in sed? I can do it in awk already.
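
For reference, one possible sed-only approach is sketched below. sed reads one line at a time, so a pattern can never see a tag that continues on the next line. The sketch first gathers the whole input into the pattern space with an N loop, and only then substitutes. Because sed has no non-greedy match, each </style> is temporarily replaced with a single marker byte so that a negated bracket expression can stop at the first closing tag. The function name strip_html and the \001 marker are illustrative choices; the sketch assumes the input contains no \001 bytes, that the tags are lowercase, and that the file fits comfortably in memory.

```shell
#!/bin/sh
# Sketch: strip tags that may span several lines, plus whole
# <style>...</style> blocks. strip_html is a hypothetical helper name.
# Assumption: the input never contains the byte \001 (used as a marker).
strip_html() {
  marker=$(printf '\001')
  # 1. ':a ... ba' loop: append every following line (N) until the last one,
  #    so the whole input sits in the pattern space, newlines included.
  # 2. Turn each </style> into the one-byte marker.
  # 3. Delete from <style ...> up to the nearest marker.
  # 4. Drop any leftover markers (unmatched </style>).
  # 5. Delete every remaining tag; [^>] also matches embedded newlines.
  sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' \
      -e "s|</style>|${marker}|g" \
      -e "s|<style[^>]*>[^${marker}]*${marker}||g" \
      -e "s|${marker}||g" \
      -e 's|<[^>]*>||g' "$@"
}

# Example use (page.html is a placeholder file name):
#   strip_html page.html > page.txt
```

This is still pattern matching rather than parsing, so a literal > inside an attribute value or an HTML comment containing markup will confuse it.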

www.engtran.com  www.niswug.org
Phenom IIx6 1100T = 8GB = FX1400 = XP64SP2 = SW2009SP3
"Node news is good news."

RE: sed-Strip HTML tags (or XML tags)


   Quite a few years ago, I wrote a crude SGML parser in Perl.  Is there any reason you are not using Perl to do this?

   Perl's split function searches a string for an arbitrary sequence and returns everything in the string up to the sequence and everything in the string after it.  If you are messing with text, this is an awesome tool.

