Dynamics of Hacker News

April 28th, 2013

Hacker News is an interesting microcosm of Silicon Valley, where startups and side projects are regularly launched. Users submit links with short descriptions, which are shown on a single page here. Other users vote on submissions, and each vote gives a submission a better chance of hitting the coveted front page and benefiting from a surge in visitor traffic. During peak hours, a submitted link may fall off the newest submissions page quite quickly, which has led to many analyses of the "best time" to submit to Hacker News. Some examples: HNPickup (apparently defunct), one here (without source data), a Quora question, and another analysis by a Hacker News regular. This is another addition to that list, but one with downloadable source data.

  1. Collecting the data
  2. The best time to submit
  3. Voting behavior
  4. A dubious trick
  5. TL;DR

Collecting the Data

The robots.txt file specifies a minimum crawl delay of 30 seconds, so I chose a conservative interval of 5 minutes between crawls of just the /newest page (this took some tweaking to get right, as can be seen in the dataset). I let the crawler run from March 18th to April 16th, during which it collected 5,790 snapshots of the newest submissions page. There were some periods where my server went down, or my crawler was (presumably) banned from Hacker News, which explains why the snapshot count falls short of what an uninterrupted crawl would have produced.
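To put a rough number on that shortfall, here is a minimal sketch of the arithmetic. It assumes an uninterrupted crawl at a fixed 5-minute interval running midnight to midnight between the two dates; the post gives only the dates, and the interval was tweaked along the way, so treat the result as a ballpark figure only.

```python
from datetime import datetime, timedelta

# Assumed crawl window: only the dates are known, so midnight-to-midnight
# is a guess made for illustration.
CRAWL_START = datetime(2013, 3, 18)
CRAWL_END = datetime(2013, 4, 16)
CRAWL_INTERVAL = timedelta(minutes=5)  # nominal interval; it was tweaked in practice
SNAPSHOTS_COLLECTED = 5790  # figure quoted in the post


def expected_snapshots(start, end, interval):
    """Number of crawls that fit between start and end at a fixed interval."""
    return (end - start) // interval


expected = expected_snapshots(CRAWL_START, CRAWL_END, CRAWL_INTERVAL)
coverage = SNAPSHOTS_COLLECTED / expected
print(f"expected {expected}, collected {SNAPSHOTS_COLLECTED} ({coverage:.0%})")
```

Under these assumptions, roughly two thirds of the possible snapshots were actually captured.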

Each snapshot is parsed and put into a MySQL table with the following columns:

Download the MySQL table dump (2.6 MB compressed, 22 MB uncompressed), or continue reading for some analysis.

The Best Time to Submit

I'll be the first to admit that any stated "best time to submit" is a specious assertion. No amount of careful timing will magically boost linkspam to the front page, but it's also probably true that the amount of time a link spends on the /newest page affects its chances of making it to the front page, all other factors being equal.

The simplest analysis is to measure the number of positions each submission falls between snapshots. This should be correlated with the submission rate, but is affected by the fact that upvotes increase the amount of time a submission spends on the /newest page. With almost four weeks of data, Fig. 1 shows the mean and median "fall rates" for submissions broken down by hour, in Pacific Standard Time.
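As a rough illustration of the "fall rate" measurement, the sketch below compares the position of each submission in two consecutive snapshots. The snapshot representation (a dict from a submission id to its 1-based position) and the function name are assumptions made for this example, not the format of the actual dataset or the code used for Fig. 1.

```python
from statistics import mean, median


def fall_rates(earlier, later, minutes_apart):
    """Positions dropped per minute for each submission seen in both snapshots.

    `earlier` and `later` map a submission id to its 1-based position on /newest.
    """
    rates = []
    for item_id, old_pos in earlier.items():
        new_pos = later.get(item_id)
        if new_pos is None:
            continue  # no longer on the page, so the drop cannot be measured
        rates.append((new_pos - old_pos) / minutes_apart)
    return rates


# Two hypothetical snapshot pairs from the same hour, each 5 minutes apart.
pair_one = fall_rates({"a": 1, "b": 2}, {"a": 4, "b": 5}, minutes_apart=5)
pair_two = fall_rates({"c": 1}, {"c": 7}, minutes_apart=5)

hourly_rates = pair_one + pair_two
print(mean(hourly_rates), median(hourly_rates))
```

Pooling the per-submission rates by hour of the week and taking the mean and median of each pool would give one point per hour for each curve.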

Fig 1. Mean and median positions dropped per minute by hour of week.

On a Tuesday at 8am Pacific, new submissions drop an average (and median) of more than 1 position per minute. This means a story submitted around 8am will be off the /newest page in under half an hour, since there are 30 slots on the page. Fig. 2 is a more human-readable version of Fig. 1 that shows the amount of time a new submission lingers on the first of the /newest pages.
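The conversion from a fall rate to time on the page is simple division: a new submission starts in the top slot and leaves the page once it has dropped 30 positions. A minimal sketch, assuming the fall rate stays constant while the submission is on the page (the function name is made up for this example):

```python
PAGE_SLOTS = 30  # submissions shown on the first /newest page


def minutes_on_page(positions_per_minute, slots=PAGE_SLOTS):
    """Estimated minutes before a new submission drops off the first page."""
    if positions_per_minute <= 0:
        raise ValueError("fall rate must be positive")
    return slots / positions_per_minute


print(minutes_on_page(1.0))   # one position per minute
print(minutes_on_page(0.25))  # a quieter hour
```

At exactly 1 position per minute the estimate is 30 minutes, so any rate above that gives less than half an hour.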

Fig 2. Mean and median time spent on /newest homepage by hour of week.

Voting Behavior

Assuming that a submission is worthy of upvotes, how does it go about procuring them? All submissions start with a score of 1, which increases by 1 for each upvote received. Hacker News does not allow submissions to be downvoted. From Fig 3(a), we can see that 60% of stories that are upvoted (that is, reach a score of at least 2) fall off the /newest page with just that single upvote (from a kindly friend of the submitter, perhaps). About 90% fall off with 10 or fewer upvotes.

What's more interesting is the amount of time until a submission acquires its first upvote. We can break this down by two types of submissions for convenience: those that received at least 1 upvote, and those that received at least 10 upvotes before they fell off the /newest page (the latter set is contained within the former). Fig 3(b) shows the distribution of times until the first upvote for both these classes of stories. Interesting tidbit: 50% of submissions that are going to receive at least 1 upvote do so within 11 minutes of submission. However, more than 50% of submissions that are going to receive at least 10 upvotes receive their first upvote within 5 minutes of submission. Hacker News sniffs out interesting stuff quickly.

If you submit to Hacker News, you probably only need to wait half an hour to figure out how well your submission is going to fare.
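One way the time until the first upvote could be measured from snapshot data is sketched below. The input shape (a list of snapshot time and score pairs per submission) and the function name are assumptions for illustration. Because snapshots are several minutes apart, the result is the time of the first snapshot that shows the upvote, which is an upper bound on the true delay.

```python
from datetime import datetime
from statistics import median


def minutes_to_first_upvote(submitted_at, observations):
    """Minutes from submission to the first snapshot showing a score above 1.

    `observations` is a list of (snapshot_time, score) pairs for one submission.
    Returns None if the submission was never seen with an upvote.
    """
    upvoted = [seen_at for seen_at, score in observations if score > 1]
    if not upvoted:
        return None
    return (min(upvoted) - submitted_at).total_seconds() / 60


# Hypothetical submissions, all posted at the same time.
t0 = datetime(2013, 4, 2, 8, 0)
stories = [
    (t0, [(datetime(2013, 4, 2, 8, 5), 1), (datetime(2013, 4, 2, 8, 10), 2)]),
    (t0, [(datetime(2013, 4, 2, 8, 5), 3)]),
    (t0, [(datetime(2013, 4, 2, 8, 5), 1)]),  # never upvoted
]
delays = [minutes_to_first_upvote(posted, seen) for posted, seen in stories]
upvoted_delays = [d for d in delays if d is not None]
print(median(upvoted_delays))
```

Filtering the stories by their highest observed score before taking the median would separate the "at least 1 upvote" class from the "at least 10 upvotes" class.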

Fig 3(a). The cumulative distribution of the highest score attained by a submission while still on the /newest page.
Fig 3(b). For submissions that were upvoted, the time in minutes from submission till the first upvote was seen.

A Dubious Trick

In the course of sanity-checking my parser, I discovered a number of instances of an interesting phenomenon: a URL submission seen at time T with score S would appear at a later time with a score less than S. It would appear that some submissions are decreasing in score, i.e., being downvoted, even though Hacker News allows neither duplicate submissions nor downvotes on submissions. Here are three examples (usernames omitted):

Snapshot time  Score  Submission time  URL
Apr 2 05:42    2      Apr 2 04:54      http://www.youtube.com/watch?v=zPrEgGAXdhI
Apr 2 06:41    1      Apr 2 06:38      http://www.youtube.com/watch?v=zPrEgGAXdhI
Mar 28 07:01   5      Mar 28 06:43     http://demo.peerkit.com/static/index_demo.html
Mar 28 12:47   1      Mar 28 12:47     http://demo.peerkit.com/static/index_demo.html
Mar 22 06:51   2      Mar 22 06:18     http://blog.smartbear.com/software-quality/bid/275689/What-Makes-Beautiful-Software
Mar 22 07:07   1      Mar 22 07:01     http://blog.smartbear.com/software-quality/bid/275689/What-Makes-Beautiful-Software

In many of these cases, users are clearly using a loophole that allows submissions to be deleted by the person who submitted them. These users are then free to re-submit the same links, effectively increasing the amount of time that their submissions stay on the /newest page. There are on the order of hundreds of instances of this type of behavior, indicating that it's a well-known but not widely prevalent trick. It's probably a loophole that should be patched.
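A check along these lines can be sketched as follows: group observations by URL, order them by snapshot time, and flag any URL whose score goes down. The tuple layout and function name are assumptions for illustration rather than the actual table schema; the first four rows come from the examples above (with the year 2013 assumed from the crawl period) and the last row is a made-up control.

```python
from collections import defaultdict
from datetime import datetime


def find_score_drops(rows):
    """Return URLs whose score decreases between consecutive snapshots.

    `rows` is an iterable of (snapshot_time, score, url) tuples.
    """
    by_url = defaultdict(list)
    for snapshot_time, score, url in rows:
        by_url[url].append((snapshot_time, score))

    flagged = []
    for url, seen in by_url.items():
        seen.sort()  # chronological order
        scores = [score for _, score in seen]
        if any(later < earlier for earlier, later in zip(scores, scores[1:])):
            flagged.append(url)
    return flagged


rows = [
    (datetime(2013, 4, 2, 5, 42), 2, "http://www.youtube.com/watch?v=zPrEgGAXdhI"),
    (datetime(2013, 4, 2, 6, 41), 1, "http://www.youtube.com/watch?v=zPrEgGAXdhI"),
    (datetime(2013, 3, 28, 7, 1), 5, "http://demo.peerkit.com/static/index_demo.html"),
    (datetime(2013, 3, 28, 12, 47), 1, "http://demo.peerkit.com/static/index_demo.html"),
    (datetime(2013, 3, 28, 7, 1), 1, "http://example.com/control"),  # hypothetical
]
print(find_score_drops(rows))
```

A score drop alone does not prove a deletion and re-submission, so a stricter version might also require the submission time to change between the two observations, as it does in all three examples.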