Access issues by domain names (Atom feed) #132

Closed
2 of 12 tasks
karlcow opened this issue Jun 3, 2014 · 17 comments

@karlcow
Member

karlcow commented Jun 3, 2014

It can be useful for a website to know the status of all issues impacting a domain name.

Searching by a domain name, e.g. example.org, should give the list of all issues related to that domain name.
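Since webcompat.com issues live in a GitHub repository, a first approximation of that search is a GitHub issue search restricted to the repo. A minimal sketch, assuming the webcompat/web-bugs repo and an in:title qualifier (both illustrative, not the final implementation):

import requests

def search_issues_for_domain(domain):
    """Return GitHub issues whose title mentions the domain."""
    resp = requests.get(
        'https://api.github.com/search/issues',
        params={'q': '{0} in:title repo:webcompat/web-bugs'.format(domain)},
        headers={'Accept': 'application/vnd.github.v3+json'})
    resp.raise_for_status()
    return resp.json()['items']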

(added on March 2017)

  • Needs unit tests for 200 OK
  • Needs unit tests for 404
  • Needs unit tests for the helper functions grabbing the data
  • Needs unit tests for valid feeds and valid domains
  • Requires a route for /feeds/<domain_name>
  • Makes sure the domain is not a random string (XSS); see the validation sketch after this list
  • Checks the performance impact
  • Helper functions for gathering the data from a domain_name search string:
    • issue summary
    • latest comment date
    • issue number
    • (latest comment text?)
    • status
    • only open issues?
  • Templates for the feed (entry and feed.atom)
  • Adds a home page for feeds explaining the mechanism and the updates policy
  • Creates a system for delivering feeds on a static basis
  • (?) Auto-create on the fly or generate daily with a cron
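A minimal sketch of the XSS check from the list above, assuming we reject anything that does not match a conservative hostname grammar; the regex and helper name are illustrative:

import re

# Labels of 1-63 chars, letters/digits/hyphens, no leading/trailing
# hyphen, and at least one dot separating two labels.
DOMAIN_RE = re.compile(
    r'^(?!-)[a-z0-9-]{1,63}(?<!-)(\.(?!-)[a-z0-9-]{1,63}(?<!-))+$')

def is_valid_domain(candidate):
    """Accept only strings that look like a dotted hostname."""
    return bool(DOMAIN_RE.match(candidate.lower()))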
@miketaylr
Member

This is our RSS feature.

@miketaylr miketaylr self-assigned this Sep 30, 2015
@hallvors
Contributor

#788 will help

@miketaylr miketaylr assigned karlcow and unassigned miketaylr Oct 17, 2016
@miketaylr
Member

No published branches yet, but @karlcow has a prototype in progress on his laptop. Assigning to him.

@miketaylr
Member

Closing #60 (comment) as a dupe of this.

@miketaylr miketaylr changed the title Access issues by domain names Access issues by domain names (RSS feed) Oct 17, 2016
@miketaylr miketaylr changed the title Access issues by domain names (RSS feed) Access issues by domain names (Atom feed) Oct 17, 2016
@miketaylr miketaylr reopened this Oct 17, 2016
@karlcow
Member Author

karlcow commented Mar 30, 2017

Preserving some things I had done for #60 so I can delete my local branch.

for webcompat/views.py

@app.route('/feeds/<domain_name>')
def domain_feed(domain_name):
    """Route to display a feed for a domain name.

    - domain_name would be `mozilla.org`.
    - should make a search of all titles, numbers, latest comment dates.
    """
    # The user is probably not necessary here.
    if g.user:
        get_user_info()
    # Search the domain_name and return the relevant data;
    # the helper is still to be defined in helpers.
    domain_data = feed_summary(domain_name)
    return render_template('feed.atom', domain_data=domain_data)
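feed_summary does not exist yet; a hypothetical shape for it, reusing the fields from the first comment (summary, number, status, latest comment date) and the hypothetical search_issues_for_domain helper sketched earlier:

def feed_summary(domain_name):
    """Gather the feed-relevant data for every issue matching a domain."""
    # search_issues_for_domain is the hypothetical GitHub-search helper
    # from the earlier sketch; the returned field names are GitHub's.
    issues = search_issues_for_domain(domain_name)
    return [{'summary': issue['title'],
             'number': issue['number'],
             'status': issue['state'],
             'updated': issue['updated_at']}
            for issue in issues]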

@karlcow
Member Author

karlcow commented Mar 30, 2017

Made the first comment more descriptive with the list of things to do.

@karlcow
Member Author

karlcow commented May 17, 2017

Note to self (it will grow with time):

There are a couple of approaches to explore here. I need to assess the trade-offs of each and the likelihood of creating a performance impact on the application.

Some possibilities:

  1. Generate the feed through a search query each time the feed is requested.
    • Pro: the information is always fresh.
    • Con: a search query is issued on every request. Even with caching headers, feed readers are not very respectful of HTTP best practices, so they will hit the server every time. That might exhaust our search rate limit.
  2. Generate a static feed at the first request, once an hour or once a day, and deliver the static file for each subsequent request (see the sketch after this list). There might even be a Flask extension that already does this; to research.
    • Pro: cache/performance friendly. We cache only the domain-name feeds which have been requested, not all domain names.
    • Con: the information age is defined by the cache we create (one day old, for example).
  3. Generate feeds once a day with a cron, for every known domain we currently have on webcompat.com.
    • Pro: cache/performance friendly.
    • Con: same issues as 2, plus useless feeds kept around.
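A minimal sketch of option 2, assuming feeds are written under data/feed/ and are considered fresh for one day; the paths, MAX_AGE, and the build_feed generator are assumptions:

import os
import time

FEED_DIR = 'data/feed'
MAX_AGE = 24 * 60 * 60  # one day, in seconds

def feed_path(domain):
    return os.path.join(FEED_DIR, '{0}.atom'.format(domain))

def is_fresh(domain):
    """True when a static feed exists and is younger than MAX_AGE."""
    path = feed_path(domain)
    return (os.path.exists(path) and
            time.time() - os.path.getmtime(path) < MAX_AGE)

def get_feed(domain):
    """Serve the cached feed, regenerating at most once per MAX_AGE."""
    if not is_fresh(domain):
        build_feed(domain, feed_path(domain))  # hypothetical generator
    with open(feed_path(domain)) as f:
        return f.read()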

Some additional issues:

  • Domain names are not always the source of the issue. Think about a website embedding Disqus comments: the real issue is on Disqus, which is different from the domain name that has been reported.
  • Changing URIs in the bug report after analysis.
  • Domain names of the same family (music.yandex.ru and radio.yandex.ru), or blogspot.* (yes, blogspot spans all country TLDs).
  • The information relevant to the feed has to be determined:
    • domain name
    • URL
    • steps to reproduce
    • screenshot?
    • how the feed <item> evolves with time:
      • Do we advertise the change of status?
      • Do we keep an item once the issue is closed, or do we remove it?
      • Do we advertise comments when there is a new one on the issue? The comment content, or just a link to the comment?
      • Do we create individual items for each new event, or just one item we refresh with new information?

Some possible dependencies/information:

@karlcow
Member Author

karlcow commented May 18, 2017

karlcow added a commit to karlcow/webcompat.com that referenced this issue Aug 16, 2017
karlcow added a commit to karlcow/webcompat.com that referenced this issue Aug 16, 2017
karlcow added a commit to karlcow/webcompat.com that referenced this issue Aug 16, 2017
karlcow added a commit to karlcow/webcompat.com that referenced this issue Aug 16, 2017
- pep257
- orders of import
- ignore webcompat.views check
karlcow added a commit to karlcow/webcompat.com that referenced this issue Aug 16, 2017
- Adds a /feed Blueprint
- Prepares for the main request feed function
@karlcow
Member Author

karlcow commented Aug 16, 2017

Let's start the experiment. Code! 🚨
And we will see if we have to throw everything away. 🗑

@karlcow
Member Author

karlcow commented Aug 16, 2017

  • I created a Blueprint for /feeds
  • and added a couple of tests

The main thing I will be experimenting with is the creation of static files, either generated on first request or based on a cron.
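For the record, a minimal sketch of what that Blueprint wiring could look like; the module layout and names are assumptions, not necessarily the committed code:

from flask import Blueprint

feeds = Blueprint('feeds', __name__, url_prefix='/feeds')

@feeds.route('/')
def feeds_home():
    """Home page explaining the feeds mechanism and updates policy."""
    return 'Feeds for webcompat.com domains.'

# and in the application setup:
# app.register_blueprint(feeds)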

karlcow added a commit to karlcow/webcompat.com that referenced this issue Aug 16, 2017
- Fixes tests for feeds/ home page
- Creates shells for prose
- Defines routes for feeds/
karlcow added a commit to karlcow/webcompat.com that referenced this issue Aug 16, 2017
- Handles non existent domain names.
- Creates a helper file for all things strictly related to feeds
- Adjusts test for the right routes and content
karlcow added a commit to karlcow/webcompat.com that referenced this issue Aug 17, 2017
karlcow added a commit to karlcow/webcompat.com that referenced this issue Aug 17, 2017
@karlcow
Member Author

karlcow commented Aug 17, 2017

There is a feed feature in Werkzeug to keep in mind.
http://flask.pocoo.org/snippets/10/
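That snippet builds feeds with AtomFeed from werkzeug.contrib.atom (it shipped with Werkzeug at the time; it was removed in Werkzeug 1.0). A minimal sketch of how it could render our domain data; the issue dict shape is an assumption:

from flask import request
from werkzeug.contrib.atom import AtomFeed

def render_atom(domain, issues):
    """Build an Atom response from a list of issue dicts."""
    feed = AtomFeed('webcompat.com issues for {0}'.format(domain),
                    feed_url=request.url, url=request.url_root)
    for issue in issues:
        feed.add(issue['summary'],
                 content_type='text',
                 url=issue['url'],
                 updated=issue['updated'])  # must be a datetime
    return feed.get_response()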

@karlcow
Member Author

karlcow commented Aug 17, 2017

Dumping ideas. Notebook style. 📓

🐍 pseudo-code

@feeds.route('/<domain>', methods=['GET'])
def domain_feed(domain):
    """Serve a feed for a specific domain name."""
    # Have we handled this domain already?
    if is_known_domain(domain):
        # Do we have a static atom feed file for it?
        if not is_static_feed(domain):
            # Let's create the feed in data/feed/
            create_feed(domain)
        # we can serve the feed to users.
        return serve_domain_feed(domain)
    else:
        # if we don't know anything we return 404
        return (
            '{domain} has no feed'.format(domain=domain),
            404,
            {'Content-Type': 'text/plain'})

I want to minimize the impact of badly behaved feed readers. No matter how much caching you set on feed resources, many feed readers ignore it and request every couple of minutes. So, to avoid generating the feed on every request, I want to serve a static file generated at the first request.

Another benefit is that we get files only for the domains people are interested in.

An interesting question will come up with updating, but let's say it's an issue we have to deal with later.
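Serving the static file itself is close to a one-liner with Flask's send_from_directory; a minimal sketch of serve_domain_feed, assuming the feeds live under data/feed/ as <domain>.atom:

from flask import send_from_directory

def serve_domain_feed(domain):
    """Send the pre-generated static feed with the Atom media type."""
    return send_from_directory(
        'data/feed', '{0}.atom'.format(domain),
        mimetype='application/atom+xml')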

Some issues 🚨

  • Data quality. For example, the domain is not always the domain of the issue. I can think of Marfeel issues, or of the issues recently reported for the YouTube iframe, which show up on plenty of domain names.
  • Managing duplicates, so we do not have a feed with tens of copies of the same issue.
  • Defining the type of information that would be useful for domain owners. Is it the change of status that is interesting, or the new comments? This is the most interesting and challenging part; it ties into which data we have and what we can share that will be useful to others.

karlcow added a commit to karlcow/webcompat.com that referenced this issue Aug 24, 2017
@karlcow
Member Author

karlcow commented Aug 30, 2017

Data quality is interesting…
From a dump I have of all the issues as of July 2017, around 7920 issues, the domain names are not always there, or are bogus, or follow irregular patterns.

So far I have found 370 issues with bogus domains.
I tried to cover as many of the possible patterns as I could.

title is the issue title, so something à la www.nytimes.com - desktop site instead of mobile site

Current version; it will evolve.

import urlparse  # Python 2; urllib.parse in Python 3

def extract_domain_name(title, issue_number):
    """Extract the domain name from the title string."""
    # a domain name doesn't contain spaces
    candidate = title.split(' ', 1)[0]
    # domain names are lowercase
    candidate = candidate.lower()
    # a domain name contains at least one "."
    if '.' not in candidate:
        return 'BOGUS', issue_number
    # tuple of bogus patterns to check against
    bogus_start_patterns = ('resource://', 'file://', 'chrome://')
    if candidate.startswith(bogus_start_patterns):
        return 'BOGUS', issue_number
    # view-source:… still contains a domain name.
    if candidate.startswith('view-source:'):
        candidate = candidate.split('view-source:')[1]
    if ':' in candidate and not candidate.startswith('http'):
        candidate = candidate.split(':')[0]
        candidate = 'http://{}'.format(candidate)
    # some titles start with http(s); clean up down to the host.
    if candidate.startswith('http://') or candidate.startswith('https://'):
        candidate = urlparse.urlsplit(candidate).netloc
        candidate = candidate.split(':')[0]
    # some domains come with a path
    if '/' in candidate:
        candidate = candidate.split('/')[0]
    # some bogus domains with &
    if '&' in candidate:
        candidate = candidate.split('&')[0]
    # handling local/private addresses
    local_patterns = ('10.', '127.0.0.1', '192.168.', '172.')
    if candidate.startswith(local_patterns):
        return 'BOGUS', issue_number
    return candidate.encode('utf-8')

Some of the things I'm fixing on the fly have an opportunity to be fixed once and for all.
I could spit out a FIXME for those, so the data quality improves for the next run.

I will re-run it soon with a fresh issue dump.

There are still some issues where the domain name in the title is different from the URL: in the body.
I can probably create an additional check to extract these and compare (sketched below).

This is just for dumping a DB of domain names to generate feeds, but it could ultimately be reused for normalizing the data we receive from people.
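A hypothetical sketch of that title-vs-URL cross-check, reusing extract_domain_name above; extract_url_domain and the issue dict shape are assumptions:

def extract_url_domain(body):
    """Pull the host out of the URL line of an issue body."""
    for line in body.splitlines():
        # webcompat issue bodies carry a '**URL**: http…' line
        if line.startswith('**URL**') and 'http' in line:
            url = 'http' + line.split('http', 1)[1].strip()
            return urlparse.urlsplit(url).netloc
    return ''

def domain_mismatches(issues):
    """Yield (number, title_domain, url_domain) when the two disagree."""
    for issue in issues:
        title_domain = extract_domain_name(issue['title'], issue['number'])
        url_domain = extract_url_domain(issue['body'])
        if title_domain != url_domain:
            yield (issue['number'], title_domain, url_domain)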

  • 7920 domain names (July 2017)
  • 7550 valid domain names, aka 95% (minus all the small things I missed)
  • 370 bogus domain names (in issue titles)
  • 4141 unique domain names, aka 55% (4141 potential feeds)
  • top 50:
 259 www.youtube.com
 125 www.facebook.com
 112 www.google.com
 110 vk.com
  76 www.netflix.com
  76 m.youtube.com
  65 web.whatsapp.com
  62 m.facebook.com
  55 webcompat.com
  53 addons.mozilla.org
  47 www.coco.fr
  40 twitter.com
  34 music.yandex.ru
  33 www.mozilla.org
  33 s0.2mdn.net
  32 www.twitch.tv
  32 support.mozilla.org
  32 mail.google.com
  27 github.com
  21 www.reddit.com
  20 www.amazon.com
  20 mega.nz
  19 www.pandora.com
  19 www.amazon.in
  19 play.google.com
  18 www.amazon.de
  18 mobile.twitter.com
  17 www.amazon.co.jp
  17 apps.facebook.com
  16 www.hulu.com
  16 www.bing.com
  15 www.primevideo.com
  15 www.linkedin.com
  15 radio.garden
  15 accounts.google.com
  14 www.yahoo.com
  14 outlook.live.com
  13 mailmanager.cityweb.de
  13 inbox.google.com
  13 imgur.com
  13 docs.google.com
  13 chaturbate.com
  12 www.theverge.com
  12 www.nasa.gov
  12 www.google.co.in
  12 video.corriere.it
  12 sj.myie9.com
  12 g1.globo.com
  12 drive.google.com
  12 developer.apple.com

Google properties:

 112 www.google.com
  32 mail.google.com
  19 play.google.com
  15 accounts.google.com
  13 inbox.google.com
  13 docs.google.com
  12 www.google.co.in
  12 drive.google.com
  11 google.com
   7 www.google.ca
   7 news.google.com
   5 www.google.ro
   5 www.google.fr
   5 support.google.com
   5 images.google.com
   4 www.google.com.mx
   4 www.google.co.uk
   4 translate.google.com
   4 tpc.googlesyndication.com
   4 plus.google.com
   4 hangouts.google.com
   4 fonts.google.com
   3 www.google.se
   3 www.google.it
   3 www.google.com.br
   3 www.google.co.jp
   3 groups.google.com
   3 developers.google.com
   2 www.googleadservices.com
   2 www.google.ru
   2 www.google.de
   2 www.google.com.pk
   2 www.google.com.eg
   2 voice.google.com
   2 santatracker.google.com
   2 photos.google.com
   2 news.google.co.in
   2 keep.google.com
   2 insideabbeyroad.withgoogle.com
   2 gmail.google.com
   2 calendar.google.com
   1 www.google.sk
   1 www.google.pt
   1 www.google.me
   1 www.google.hu
   1 www.google.es
   1 www.google.com.vn
   1 www.google.com.ua
   1 www.google.com.sa
   1 www.google.com.co
   1 www.google.com.bd
   1 www.google.co.th
   1 www.google.co.id
   1 www.google.ch
   1 www.google.bg
   1 www.drive.google.com
   1 trends.google.com
   1 translate.googleusercontent.com
   1 translate.google.ro
   1 translate.google.co.kr
   1 testmysite.thinkwithgoogle.com
   1 svg-edit.googlecode.com
   1 streetart.withgoogle.com
   1 storage.googleapis.com
   1 sites.google.com
   1 scholar.google.com
   1 r4---sn-4g5edn7s.googlevideo.com
   1 r3---sn-gwpa-itqd.googlevideo.com
   1 r2---sn-4g5edned.googlevideo.com
   1 productforums.google.com
   1 privacy.google.com
   1 opensource.google.com
   1 news.google.com.tw
   1 news.google.com.br
   1 myaccount.google.com
   1 googleweblight.com
   1 google.co.in
   1 enterprise.google.com
   1 encrypted.google.com
   1 earth.google.com
   1 console.cloud.google.com
   1 com.google
   1 codelabs.developers.google.com
   1 chrome.google.com
   1 books.google.de
   1 books.google.ca
   1 apps.google.com
   1 analytics.googleblog.com

@karlcow
Member Author

karlcow commented Aug 30, 2017

Do we create a feed when there is no valid issue associated with this domain?

@karlcow
Member Author

karlcow commented Aug 30, 2017

Ah… crap…

Once the BOGUS titles are removed, we still have quite a lot of differences between title domains and URL domains. And a lot of recent issues. That comes from Softvision not entering the same domain in the title as in the URL. I think they fixed it after I mentioned it, but I didn't realize we had so many bad ones.

I need to fix this, automatically if I prepare the data well. 😭

Below: (issue_number, title_domain, URL_domain)

(1005, 'jal.co.jp', 'sp5971.jal.co.jp')
(1052, 'excite.co.jp', 'a.excite.co.jp')
(1053, 'excite.co.jp', 'a.excite.co.jp')
(1083, 'btv.cat', 'www.btv.cat')
(110, 'webcrawler.com', 'www.webcrawler.com')
(1139, 'lastampa.it', '')
(1145, 'smo.suumo.jp', 'smp.suumo.jp')
(1161, 'bosch-home.pl', 'www.bosch-home.pl')
(1182, 'video.gazzetta.it', '')
(1183, 'video.gazzetta.it', '')
(1184, 'sportmediaset.mediaset.it', '')
(1185, 'video.corriere.it', '')
(1242, 'menshealth.com', 'www.menshealth.com')
(1257, 'menshealth.com', 'www.menshealth.com')
(1267, 'womenshealthmag.com', 'www.womenshealthmag.com')
(1285, 'm.facebook.com', 'spam-removed')
(1301, 'menshealth.com', 'www.menshealth.com')
(139, 'moleskine.com', 'www.moleskine.com')
(1409, 'webcompat.com', 'support.mozilla.org')
(141, 'virginamerica.com', 'www.virginamerica.com')
(1528, 'docs.google.com', 'goo.gl')
(1591, 'www.facebook.com', 'spam-removed')
(1592, 'www.facebook.com', 'spam-removed')
(1593, 'www.facebook.com', 'spam-removed')
(1595, 'www.facebook.com', 'spam-removed')
(1596, 'www.facebook.com', 'spam-removed')
(1597, 'www.facebook.com', 'spam-removed')
(1598, 'www.facebook.com', 'spam-removed')
(1601, 'www.facebook.com', 'spam-removed')
(1602, 'www.facebook.com', 'spam-removed')
(1603, 'www.facebook.com', 'spam-removed')
(1604, 'www.facebook.com', 'spam-removed')
(1605, 'www.facebook.com', 'spam-removed')
(1611, 'www.facebook.com', 'spam-removed')
(1612, 'www.facebook.com', 'spam-removed')
(1614, 'www.facebook.com', 'spam-removed')
(1616, 'www.facebook.com', 'spam-removed')
(1617, 'www.facebook.com', 'spam-removed')
(1652, 'm.facebook.com', '')
(1687, 'www.fb.com', 'spam-removed')
(1688, 'm.fb.com', '')
(1689, 'www.fb.com', 'spam-removed')
(1690, 'm.fb.com', 'spam-removed')
(1691, 'www.fb.com', 'spam-removed')
(1692, 'www.fb.com', 'spam-removed')
(174, 'www.jetblue.com', 'jetblue.com')
(1807, 'amazon.com', 'https:')
(1850, 'mozillafestival.org', '2015.mozillafestival.org')
(1917, 'www.flipkart.com', 'www')
(1995, '8888.186tcye.pw', '')
(20, 'crosswalkdp.com', 'www.crosswalkdp.com')
(2001, 'glasses.com', 'www.glasses.com')
(2007, 'bioskop21.id', '')
(2008, 'bioskop21.id', '')
(2017, 'appinstallsmobi.com', '')
(2019, 'webcompat.com', 'www.6666hh.com')
(207, 'nfl.com', 'www.nfl.com')
(2107, 'allindiaradio.govt.in', 'allindiaradio.gov.in')
(2181, 'm.facebook.com', '')
(2232, 'm.facebook.com', '')
(2240, 'yuku.com', 'www.yuku.com')
(2243, 'www.sz-runxin.com', '')
(2314, 'saa.qualtrics.com', '')
(2396, 'hotmoza.com', '')
(2433, 'www.marcoborla.it', '')
(2476, 'barbershop.org', 'ebiz.barbershop.org')
(2498, 'video.js', 'github.com')
(2499, 'discovery.com', 'www.discovery.com')
(2502, '1g22.com', '')
(2503, '1g22.com', '')
(2505, 'jornada.una.mx', 'www.jornada.unam.mx')
(2740, 'oneviewcalendar.com', 'www.oneviewcalendar.com')
(28, 'webcompat.com', 'github.com')
(2822, 'dragon8.troyhero.com', '')
(2823, '546r.com', '')
(2884, 'www.tangerine.ca', 'secure.tangerine.ca')
(2891, 'www.luludai.cc', '')
(3, 'volcanicpixels.com', 'www.volcanicpixels.com')
(3066, 'chromestatus.com', 'www.chromestatus.com')
(3146, 'mobile22.gameassists.co.uk', '`http')
(3464, 'codepen.io', '')
(3623, 'largepenissociety.tumblr.com', 'large*society.tumblr.com')
(372, 'cbc.ca', '')
(3835, 'outlook.live.com', '')
(385, 'pch.sweeps.com', 'pch sweeps.com')
(399, 'www.hwbank.it,', 'www.hwbank.it, www.netxhs.it')
(4119, 'www.', 'www. webcompat.com')
(45, 'http.req.url.http_url_safe', 'www.ibm.com')
(4729, 'm.weibo.cn', 'm.weibo.cn -  swipe gesture issue')
(490, 'm.spiegel.de', 'm.spiegel.de   or   spiegel.de')
(4979, 'www.facebook.com', '')
(4987, 'answers.yahoo.com', '')
(5007, 'www.reddit.com', '')
(5008, 'www.reddit.com', '')
(5009, 'www.reddit.com', '')
(5011, 'www.reddit.com', '')
(5012, 'www.twitter.com', '')
(5070, 'www.linkedin.com', '')
(5073, 'www.linkedin.com', '')
(51, 'expedia.co.jp', 'www.expedia.co.jp')
(521, 'okcupid.com', 'www.okcupid.com')
(53, 'nascarwagers.com', 'www.nascarwagers.com')
(54, 'xvideos.com', 'www.xvideos.com')
(5488, 'www.xvideos.com', '')
(5489, 'www.indeed.com', 'indeed.com')
(5509, 'www.spotify.com', 'open.spotify.com')
(5566, 'www.bestbuy.com', 'www.bestbuy-jobs.com')
(5568, 'www.bestbuy.com', 'www.bestbuy-jobs.com')
(5573, 'www.deals.bestbuy.com', 'deals.bestbuy.com')
(5589, 'www.baidu.com.com', 'goo.gl')
(5591, 'www.baidu.com', 'music.baidu.com')
(5592, 'www.baidu.com', 'voice.baidu.com')
(5593, 'www.baidu.com', 'voice.baidu.com')
(5602, 'www.baidu.com', 'goo.gl')
(5604, 'www.disney.com', 'm.disneystore.com')
(5605, 'www.disney.com', 'm.disneystore.com')
(5654, 'www.homedepot.com', 'm.homedepot.com')
(5656, 'www.homedepot.com', 'm.homedepot.com')
(57, 'momondo.com', 'm.momondo.com')
(59, 'independent.co.uk', 'www.independent.co.uk')
(5906, 'www.rumble.com', 'rumble.com')
(5910, 'www.rumble.com', 'rumble.com')
(5936, 'm.privacy2browsing.com', '[removed]')
(5949, 'www.rumble.com', 'rumble.com')
(5953, 'www.rumble.com', 'rumble.com')
(5957, 'www.rumble.com', 'rumble.com')
(6016, 'www.gomovies.to', 'gomovies.to')
(6021, 'www.gomovies.to', 'gomovies.to')
(6023, 'www.gomovies.to', 'gomovies.to')
(6049, 'www.gomovies.to', 'gomovies.to')
(6141, 'youtube.com', 'www.youtube.com')
(617, 'grammarly.com', '')
(6183, 'www.citi.com', 'online.citi.com')
(6184, 'www.citi.com', 'www.privatebank.citibank.com')
(6186, 'www.citi.com', 'www.privatebank.citibank.com')
(6195, 'www.businessinsider.com', 'intelligence.businessinsider.com')
(6216, 'www.wikipedia.org', 'goo.gl')
(6217, 'www.wikipedia.org', 'en.m.wikipedia.org')
(6218, 'www.wikipedia.org', 'en.m.wikipedia.org')
(6219, 'www.wikipedia.org', 'en.m.wikipedia.org')
(6248, 'www.wikipedia.org', 'en.m.wikivoyage.org')
(6250, 'www.wikipedia.org', 'm.mediawiki.org')
(6254, 'www.yahoo.com', 'fr.yahoo.com')
(6255, 'www.yahoo.com', 'research.yahoo.com')
(6256, 'www.yahoo.com', 'research.yahoo.com')
(6397, 'www.yahoo.com', 'fr.yahoo.com')
(6398, 'www.yahoo.com', 'login.yahoo.com')
(6400, 'www.yahoo.com', 'fr.sports.yahoo.com')
(6402, 'www.yahoo.com', 'fr.finance.yahoo.com')
(6403, 'www.yahoo.com', 'fr.finance.yahoo.com')
(6411, 'www.lemonde.fr', 'abo.lemonde.fr')
(6412, 'www.lemonde.fr', 'secure.lemonde.fr')
(6435, 'www.lemonde.fr', 'moncompte.lemonde.fr')
(644, 'inbox.google.com', '')
(6447, 'www.ebay.fr', 'm.ebay.fr')
(6463, 'www.ebay.fr', 'csr.ebay.fr')
(6471, 'www.ebay.fr', 'csr.ebay.fr')
(6475, 'www.ebay.fr', 'm.ebay.fr')
(6477, 'www.ebay.fr', 'm.ebay.fr')
(6499, 'www.allocine.fr', 'secure.allocine.fr')
(6567, 'www.sfr.fr', 'assistance.sfr.fr')
(6576, 'www.lequipe.fr', 'm.lequipe.fr')
(6577, 'www.ebay.fr', 'csr.ebay.fr')
(6584, 'youtube.com', 'www.youtube.com')
(6593, 'www.lequipe.fr', 'm.lequipe.fr')
(6595, 'www.lequipe.fr', 'm.lequipe.fr')
(6628, 'www.aliexpress.com', 'm.fr.aliexpress.com')
(6629, 'www.aliexpress.com', 'm.fr.aliexpress.com')
(6633, 'www.aliexpress.com', 'm.fr.aliexpress.com')
(6656, 'www.aliexpress.com', 'm.fr.aliexpress.com')
(6658, 'www.aliexpress.com', 'm.fr.aliexpress.com')
(666, 'mint.com', 'javascript')
(6663, 'www.aliexpress.com', 'm.fr.aliexpress.com')
(6668, 'www.tumblr.com', 'goo.gl')
(6725, 'www.stackoverflow.com', 'stackoverflow.com')
(6728, 'disqus.com', 'stackoverflow.blog')
(6786, 'www.bfmtv.com', 'rmc.bfmtv.com')
(6789, 'www.leparisien.fr', 'm.leparisien.fr')
(6790, 'www.leparisien.fr', 'm.leparisien.fr')
(6791, 'www.leparisien.fr', 'connect.leparisien.fr')
(6830, 'www.fnac.com', 'secure.fnac.com')
(6893, 'www.societegenerale.fr', 'm.particuliers.societegenerale.fr')
(6896, 'www.societegenerale.fr', '3qv7.la1-c1-frf.salesforceliveagent.com')
(6912, 'www.bouyguestelecom.fr', 'www.mon-compte.bouyguestelecom.fr')
(6914, 'www.bouyguestelecom.fr', 'www.assistance.bouyguestelecom.fr')
(6916, 'www.bouyguestelecom.fr', 'forum.bouyguestelecom.fr')
(6945, 'www.laposte.net', 'compte.laposte.net')
(6955, 'www.ok.ru', 'm.ok.ru')
(696, 'ign.com', 'in.ign.com')
(6973, 'www.ok.ru', 'm.ok.ru')
(6979, 'www.ok.ru', 'm.ok.ru')
(6999, 'www.ouest-france.fr', 'www.ouestfrance-immo.com')
(7, 'youtube.com', 'm.youtube.com')
(7002, 'www.ouest-france.fr', 'www.ouestfrance-immo.com')
(7006, 'www.ouest-france.fr', 'www.ouestfrance-immo.com')
(7012, 'www.deezer.com', 'support.deezer.com')
(7041, 'youtube.com', 'https:')
(707, 'myatt.com', 'myatt.com or http')
(7071, 'www.leroymerlin.fr', 'communaute.leroymerlin.fr')
(7099, 'www.libertyland.co', 'libertyland.co')
(7118, 'www.libertyland.co', 'libertyland.co')
(7119, 'www.mabanque.bnpparibas', 'mabanque.bnpparibas')
(7126, 'www.mabanque.bnpparibas', 'mabanque.bnpparibas')
(7127, 'www.mabanque.bnpparibas', 'mabanque.bnpparibas')
(7186, 'www.liberation,fr', 'www.liberation.fr')
(7188, 'www.vimeo.com', 'vimeo.com')
(7289, 'www.google.com', 'google.com')
(7290, 'www.google.com', 'google.com')
(7292, 'www.google.com', 'google.com')
(7293, 'www.google.com', 'google.com')
(7296, 'www.google.com', 'google.com')
(7298, 'www.google.com', 'google.com')
(7299, 'www.google.com', 'google.com')
(7304, 'www.google.com', 'google.com')
(7305, 'www.google.com', 'google.com')
(7309, 'www.google.com', 'google.com')
(7311, 'www.google.com', 'google.com')
(7319, 'www.google.com', 'google.com')
(7323, 'www.google.com', 'google.com')
(7356, 'www.google.com', 'google.com')
(74, 'www.fresno.courts.ca.gov', '')
(7409, 'www.hotstart.com', 'www.hotstar.com')
(7422, 'www.ntd.tv', 'mb.ntd.tv')
(7424, 'www.ntd.tv', 'mb.ntd.tv')
(7441, 'www.torrentz2.eu', 'torrentz2.eu')
(7443, 'www.ndtv.com', 'm.ndtv.com')
(7451, 'www.ndtv.com', 'm.ndtv.com')
(7468, 'www.ndtv.com', 'auto.ndtv.com')
(7470, 'www.rediff.com', 'm.rediff.com')
(7479, 'www.rediff.com', 'labs.rediff.com')
(7480, 'www.rediff.com', 'labs.rediff.com')
(7481, 'www.rediff.com', 'register.rediff.com')
(7482, 'www.rediff.com', 'ishare.rediff.com')
(7514, 'www.rediff.com', 'zarabol.rediff.com')
(7516, 'www.rediff.com', 'm.rediff.com')
(7522, 'www.rediff.com', 'mypage.rediff.com')
(7585, 'www.moneycontrol.com', 'm.moneycontrol.com')
(7587, 'www.moneycontrol.com', 'm.moneycontrol.com')
(7593, 'www.moneycontrol.com', 'm.moneycontrol.com')
(7594, 'www.snapdeal.com', 'm.snapdeal.com')
(7615, 'www.msn.com-', 'www.msn.com')
(7639, 'www.makemytrip.com', 'holidayz.makemytrip.com')
(7647, 'www.justdial.com', 't.justdial.com')
(7648, 'www.justdial.com', 't.justdial.com')
(7653, 'www.justdial.com', 't.justdial.com')
(7749, 'www.justdial.com', 't.justdial.com')
(7752, 'www.softonic.com', 'features.en.softonic.com')
(7757, 'www.indianexpress.com', 'indianexpress.com')
(7758, 'www.indianexpress.com', 'indianexpress.com')
(7795, 'www.indianexpress.com', 'indianexpress.com')
(7796, 'www.indianexpress.com', 'indianexpress.com')
(78, 'comptoir-hardware.com', 'www.comptoir-hardware.com')
(7803, 'www.xhamster.com', 'm.xhamster.com')
(7804, 'www.shopclues.com', 'm.shopclues.com')
(7858, 'www.oneindia.com', 'recharge.oneindia.com')
(7902, 'www.filehippo.com', 'filehippo.com')
(7904, 'www.indiamart.com', 'm.indiamart.com')
(7909, 'www.indiamart.com', 'm.indiamart.com')
(81, 'citroen.ru', 'www.citroen.ru')
(86, 'outlook.com', 'www.outlook.com')
(89, 'ovh.com', 'www.ovh.com')
(900, 'tastebuds.fm', 'tastebuds.fm and naukri.com')
(918, 'deceeeu.ro', 'deceeu.ro')
(955, 'tastebuds.fm', 'tastebuds.fm and naukri.com , webcompat')
(964, 'www.weibo.com.com', 'www.weibo.com')

@karlcow
Member Author

karlcow commented Aug 31, 2017

Recording here so it's not lost.
Yesterday @denschub suggested that we provide feeds only per second-level domain name, to maximize the outreach to the web developers of one company.

This could be done, aka instead of:

/feed/www.example.org
/feed/lab.example.org

we provide only:

/feed/example.org

I personally prefer the more granular version for different reasons, but I think we could do both (a grouping sketch follows this list). Some of my reasons:

  • foo.tumblr.com != bar.tumblr.com; in some cases these reflect individual contributors' choices.
  • Grouping domain names is sometimes difficult (google.fr and google.ro); second-level domain names will not catch those.
  • Sometimes a local version or a specific project is handled by a completely different team, or even a different company.
  • Companies usually know which domains they want to track.
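A minimal sketch of that second-level grouping, using the third-party tldextract package (an assumption, chosen because it consults the public suffix list so hosts under suffixes such as .co.uk are handled correctly):

import tldextract

def registrable_domain(host):
    """Collapse www.example.org and lab.example.org to example.org."""
    parts = tldextract.extract(host)
    return '{0}.{1}'.format(parts.domain, parts.suffix)

# registrable_domain('music.yandex.ru')   -> 'yandex.ru'
# registrable_domain('www.google.co.uk')  -> 'google.co.uk'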

@karlcow
Member Author

karlcow commented Feb 16, 2018

Let me kill this with fire. :) And let's revive it one day if/once we have a DB of issues.
