
The -m minify option should remove blank lines #33

Closed
simonw opened this issue Feb 28, 2025 · 3 comments
Labels
enhancement New feature or request


Owner

simonw commented Feb 28, 2025

Currently the -m output ends up with a LOT of lines that are just whitespace:

curl -s 'https://apnews.com/article/trump-federal-employees-firings-a85d1aaf1088e050d39dcf7e3664bb9f' | \
  strip-tags -m

Output includes:



Copyright 2025 The Associated Press. All Rights Reserved.







     
twitter
  



     
instagram

Replacing each space character with . to make the whitespace visible gives:



Copyright 2025 The Associated Press  All Rights Reserved.







.....
twitter
..



.....
instagram
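The issue doesn't say how the dots were produced. Below is a minimal sketch of one way to get a similar view in Python. It assumes, as the excerpt suggests, that only whitespace-only lines are rewritten and only space characters (not tabs) are replaced; `show_whitespace` is a hypothetical helper name, not part of strip-tags:

```python
def show_whitespace(text: str) -> str:
    """Hypothetical helper: make whitespace-only lines visible.

    Assumption: only lines that are empty or all-whitespace are touched,
    and only the space character is swapped for '.'; lines containing
    real text are returned unchanged.
    """
    out = []
    for line in text.splitlines():
        if line.strip() == "":
            # Whitespace-only (or empty) line: turn each space into a dot
            out.append(line.replace(" ", "."))
        else:
            out.append(line)
    return "\n".join(out)


sample = "twitter\n  \n\n     \ninstagram"
print(show_whitespace(sample))
```

Empty lines stay empty, so they still show up as gaps; only lines that contained spaces become visible as runs of dots.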
simonw added the enhancement (New feature or request) label Feb 28, 2025
Owner Author

simonw commented Feb 28, 2025

Prototype for a fix:

    if remove_blank_lines:
        # Remove any line that is just whitespace
        final = "\n".join(line for line in final.splitlines() if line.strip())
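For reference, the prototype above can be wrapped into a self-contained function and tried on a sample resembling the output in the report. `drop_blank_lines` is a hypothetical name chosen for illustration, not the name used in strip-tags:

```python
def drop_blank_lines(text: str) -> str:
    """Hypothetical wrapper around the prototype: drop every line that is
    empty or contains only whitespace.

    str.splitlines() splits on any line boundary, line.strip() is falsy
    for whitespace-only lines, and the surviving lines are re-joined
    with "\n".
    """
    return "\n".join(line for line in text.splitlines() if line.strip())


final = "Copyright 2025\n\n\n     \ntwitter\n  \n\n     \ninstagram"
print(drop_blank_lines(final))
# Copyright 2025
# twitter
# instagram
```

Two side effects are worth knowing. Leading indentation on lines that do contain text is kept, which matches the indented newsletter lines in the later output. And because splitlines() discards line endings while join() only puts "\n" between lines, any trailing newline is dropped.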

Owner Author

simonw commented Feb 28, 2025

I'm going to keep the Python library's behavior unchanged by adding a new opt-in remove_blank_lines=True option, and have the CLI -m option use it.
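The opt-in design described above (library default unchanged, CLI -m switches the new behavior on) can be sketched as follows. `clean_text` and `run_cli` are hypothetical names, not the real strip-tags API; `argparse` is used only to keep the sketch self-contained, and it does not implement the rest of what -m minification does:

```python
import argparse


def clean_text(text: str, remove_blank_lines: bool = False) -> str:
    """Hypothetical library function (not the real strip-tags API).

    remove_blank_lines defaults to False, so existing library callers get
    exactly the same output as before and must opt in to the new behavior.
    """
    if remove_blank_lines:
        # Remove any line that is just whitespace
        text = "\n".join(line for line in text.splitlines() if line.strip())
    return text


def run_cli(argv: list[str], stdin_text: str) -> str:
    """Hypothetical CLI layer: -m turns the new option on for the user."""
    parser = argparse.ArgumentParser(description="Hypothetical CLI sketch")
    parser.add_argument("-m", dest="minify", action="store_true",
                        help="minify output (here: only removes blank lines)")
    args = parser.parse_args(argv)
    return clean_text(stdin_text, remove_blank_lines=args.minify)


print(run_cli(["-m"], "Be Well\n\n     \nNewsletters"))
```

Keeping the default in the library and flipping it in the CLI means scripts that import the library see no change, while command-line users get the cleaner output from the flag they already use.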

simonw closed this as completed in e0e2f7a Feb 28, 2025
Owner Author

simonw commented Feb 28, 2025

This is much more pleasing now:

curl -s 'https://apnews.com/article/trump-federal-employees-firings-a85d1aaf1088e050d39dcf7e3664bb9f' | \
  strip-tags -m

Partial output:

Be Well
Newsletters
    Newsletters
  AP News Alerts Keep your pulse on the news with breaking news alerts from The AP.  
  The Morning Wire Our flagship newsletter breaks down the biggest headlines of the day.  
  Ground Game Exclusive insights and key stories from the world of politics.  
  Beyond the Story Executive Editor Julie Pace brings you behind the scenes of the AP newsroom.  
  AP Entertainment Wire Get AP's first personalized newsletter delivering you entertainment news twice a week.  
  AP Top 25 Women's Basketball Poll Alerts Women's college basketball poll alerts and updates.  

simonw added a commit that referenced this issue Feb 28, 2025