Access issues by domain names (Atom feed) #132
Comments
This is our RSS feature.

#788 will help.

No published branches yet, but @karlcow has a prototype in progress on his laptop. Assigning to him.

Closing #60 (comment) as a dupe of this.
Preserving some things I had done for #60 so I can delete my local branch. For webcompat/views.py:

```python
@app.route('/feeds/<domain_name>')
def domain_feed(domain_name):
    '''Route to display a feed for a domain name.

    - domain_name would be `mozilla.org`.
    - should make a search of all titles, numbers, latest comment date
    '''
    # The user is probably not necessary here.
    if g.user:
        get_user_info()
    # Search the domain_name and return a JSON with relevant data
    # (to be defined in helpers).
    domain_data = feed_summary(domain_name)
    # render_template expects template context as keyword arguments.
    return render_template('feed.atom', domain_data=domain_data)
```
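The `feed_summary` helper above doesn't exist yet. A minimal sketch of what it might return, assuming issues are available as a list of dicts; the field names and matching rule here are hypothetical, not the project's actual data model:

```python
# Hypothetical sketch of the feed_summary helper named in the route above.
# Assumes issues are plain dicts; field names are made up for illustration.

def feed_summary(domain_name, issues):
    """Return feed-ready data for all issues matching a domain name."""
    matching = [issue for issue in issues
                if issue['title'].lower().startswith(domain_name.lower())]
    return {
        'domain': domain_name,
        'entries': [{'title': issue['title'],
                     'number': issue['number'],
                     'updated': issue['updated_at']}
                    for issue in matching],
    }

issues = [
    {'title': 'mozilla.org - layout is broken', 'number': 100,
     'updated_at': '2017-03-01T12:00:00Z'},
    {'title': 'example.com - video does not play', 'number': 101,
     'updated_at': '2017-03-02T09:30:00Z'},
]
summary = feed_summary('mozilla.org', issues)
```

A real version would query the issue store instead of taking a list, but the returned shape is all the `feed.atom` template needs.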
Made the first comment more descriptive with the list of things to do.
Note to self (it will grow with time): there are a couple of ways to do this, and they need exploring. I need to assess the impact of each choice and the likelihood of a performance impact on the application.

Some possibilities:

Some additional issues:

Some possible dependencies/information:
Ahaha. Brace for impact and its controversy.
- pep257
- order of imports
- ignore webcompat.views check

- Adds a /feed Blueprint
- Prepares for the main request feed function

Let's start the experiment. Code! 🚨
The main thing I will be experimenting with is the creation of static files, either generated on first request or by a cron job.
- Fixes tests for feeds/ home page
- Creates shells for prose
- Defines routes for feeds/

- Handles non-existent domain names
- Creates a helper file for all things strictly related to feeds
- Adjusts tests for the right routes and content
There is a feed feature in Werkzeug to keep in mind.
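The Werkzeug feature in question was the `AtomFeed` helper in `werkzeug.contrib.atom` (since removed in Werkzeug 1.0). For reference, a dependency-free sketch of the kind of Atom document the feed route would serve, built with the standard library; the feed title, URL, and entry data here are placeholders:

```python
from xml.etree import ElementTree as ET

ATOM_NS = 'http://www.w3.org/2005/Atom'


def build_atom_feed(domain, entries):
    """Build a minimal Atom document for a domain's issues."""
    ET.register_namespace('', ATOM_NS)
    feed = ET.Element('{%s}feed' % ATOM_NS)
    ET.SubElement(feed, '{%s}title' % ATOM_NS).text = (
        'Web Compatibility issues for %s' % domain)
    ET.SubElement(feed, '{%s}id' % ATOM_NS).text = (
        'https://webcompat.com/feeds/%s' % domain)
    for entry_data in entries:
        entry = ET.SubElement(feed, '{%s}entry' % ATOM_NS)
        ET.SubElement(entry, '{%s}title' % ATOM_NS).text = entry_data['title']
        ET.SubElement(entry, '{%s}updated' % ATOM_NS).text = entry_data['updated']
    return ET.tostring(feed, encoding='unicode')


xml = build_atom_feed('mozilla.org', [
    {'title': 'mozilla.org - layout is broken',
     'updated': '2017-03-01T12:00:00Z'},
])
```

A complete Atom document needs a few more required elements (per-entry `id`, feed-level `updated`, author), but this shows the shape.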
Dumping ideas. Notebook style. 📓 🐍 pseudo-code:

```python
@feeds.route('/<domain>', methods=['GET'])
def domain_feed(domain):
    """Serve a feed for a specific domain name."""
    # Have we handled this domain already?
    if is_known_domain(domain):
        # Do we have a static Atom feed file for it?
        if not is_static_feed(domain):
            # Let's create the feed in data/feed/
            create_feed(domain)
        # We can serve the feed to users.
        return serve_domain_feed(domain)
    else:
        # If we don't know anything, we return a 404.
        return (
            '{domain} has no feed'.format(domain=domain),
            404,
            {'Content-Type': 'text/plain'})
```

I want to minimize the impact of badly behaved feed readers. No matter how much caching you set on feed resources, many feed readers ignore it and request again every couple of minutes. So, to avoid generating a feed on each request, I want to serve a static file generated on the first request. Another benefit is that we get files only for domains that people are interested in. An interesting question will come up with updating, but let's say it's an issue we have to deal with later.

Some issues 🚨
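The generate-once-then-serve-static idea can be sketched like this. The helper names match the pseudo-code, but the file layout and stub content are assumptions; a real version would sit behind the Flask route and write real Atom markup into data/feed/:

```python
import os
import tempfile

FEED_DIR = tempfile.mkdtemp()  # stand-in for the data/feed/ directory

generation_count = 0  # track regenerations, for illustration only


def feed_path(domain):
    """Path of the static Atom file for a domain."""
    return os.path.join(FEED_DIR, '%s.atom' % domain)


def create_feed(domain):
    """Generate the static Atom file for a domain (content is a stub)."""
    global generation_count
    generation_count += 1
    with open(feed_path(domain), 'w') as feed_file:
        feed_file.write('<feed><!-- issues for %s --></feed>' % domain)


def serve_domain_feed(domain):
    """Serve the static file, generating it only on the first request."""
    if not os.path.exists(feed_path(domain)):
        create_feed(domain)
    with open(feed_path(domain)) as feed_file:
        return feed_file.read()


first = serve_domain_feed('mozilla.org')
second = serve_domain_feed('mozilla.org')  # served from disk, no regeneration
```

However often an impatient feed reader polls, only the first request pays the generation cost; every later request is a plain file read.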
Data quality is interesting… I have found 370 issues so far with bogus domains.
Current version. Will evolve.

```python
import urlparse  # Python 2; in Python 3 this is urllib.parse


def extract_domain_name(title, issue_number):
    """Extract the domain name from the title string."""
    # A domain name doesn't contain spaces.
    candidate = title.split(' ', 1)[0]
    # Domain names are lowercase.
    candidate = candidate.lower()
    # A domain name contains at least one ".".
    if '.' not in candidate:
        return 'BOGUS', issue_number
    # Tuple of bogus patterns to check against.
    bogus_start_patterns = ('resource://', 'file://', 'chrome://')
    if candidate.startswith(bogus_start_patterns):
        return 'BOGUS', issue_number
    # It contains a domain name.
    if candidate.startswith('view-source:'):
        candidate = candidate.split('view-source:')[1]
    if ':' in candidate and not candidate.startswith('http'):
        candidate = candidate.split(':')[0]
        candidate = 'http://{}'.format(candidate)
    # Some issues start with http; clean that up.
    if candidate.startswith('http://') or candidate.startswith('https://'):
        candidate = urlparse.urlsplit(candidate).netloc
        candidate = candidate.split(':')[0]
    # Some domains come with a path.
    if '/' in candidate:
        candidate = candidate.split('/')[0]
    # Some bogus domains contain "&".
    if '&' in candidate:
        candidate = candidate.split('&')[0]
    # Handle local domains.
    local_patterns = ('10.', '127.0.0.1', '192.168.', '172.')
    if candidate.startswith(local_patterns):
        return 'BOGUS', issue_number
    # return issue_number, candidate.encode('utf-8')
    # return candidate.encode('utf-8'), title
    return candidate.encode('utf-8')
```

Some of the issues I'm fixing on the fly have an opportunity to be fixed once and for all. I will re-run it soon with a fresh issue dump. There are still some issues where the domain name is different from the URL. This is just for dumping a DB of domain names to generate feeds, but it could ultimately be reused for normalizing the data we receive from people.
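The "dumping a DB of domain names" step could be as simple as counting extracted names and writing them to JSON. A sketch using a deliberately simplified, Python 3 extractor (the real function above is Python 2 and far more thorough; the titles here are made up):

```python
import json
from collections import Counter


def simple_domain(title):
    """Simplified take on the extraction above: first token, lowercased."""
    candidate = title.split(' ', 1)[0].lower()
    return candidate if '.' in candidate else 'BOGUS'


titles = [
    'mozilla.org - layout is broken',
    'Mozilla.org - video does not play',
    'garbage title without a domain',
]

# Count issues per extracted domain; Counter is a dict, so it
# serializes straight to JSON for the feed-generation DB dump.
counts = Counter(simple_domain(title) for title in titles)
dump = json.dumps(counts, sort_keys=True)
```

The counts also make the data-quality problem visible at a glance: a large `BOGUS` bucket means lots of titles to clean up.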
Google properties:
Do we create a feed when there is no valid issue associated with this domain?
Ah… crap… Once the BOGUS titles are removed, we still have quite a lot of differences between titles and URLs, and a lot of recent issues. That comes from Softvision not entering the same domain for the title and the URL. I think they fixed it after I mentioned it, but I didn't realize we had so many bad ones. I need to fix this, automatically if I prepare the data well. 😭 Below
Recording here so it's not lost. This could be done, aka instead of:

we provide only

I personally prefer the more granular version for different reasons, but I think we could do both. Some of my reasons:
Let me kill this with fire. :) And let's revive it once/if, one day, we have a DB with issues.
It can be useful for a Web site to be able to know the status of all issues impacting a domain name.

Searching by domain name, e.g. `example.org`, should give the list of all the issues related to this domain name. (Added in March 2017:)

`/feeds/<domain_name>`