
Instagram: @inthefrow
Recently, a friend of mine, fashion blogger @inthefrow, asked me if I’d help her work out who her most popular Instagram followers are. I knew Instagram had an API, so I assumed the task would be trivial. Of course, it wasn’t as straightforward as I first thought…
TL;DR
From a sample of 21,239 Instagram users:
- 6,133 had protected accounts: that’s 28%
- Average number of followers: 843
- Median number of followers: 194
- Average number of people followed: 822
- Median number of people followed: 265
- Average follower/follows ratio: 1:1.76
- 533 (2.5%) are disabled or otherwise invalid accounts
Instagram API
Since doing the Facebook Election 2012 piece, I’ve become a real fan of data APIs and JSON. After skimming the Instagram API documentation, I figured I’d just have to get an access token and pull all the followers of a user. So I registered an application and set up simple client-side user authentication.
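With client-side authentication, Instagram’s v1 API sent the user back to your redirect URI with the token in the URL fragment, as `#access_token=...`. A minimal sketch of reading it in the page, assuming that fragment format; `readAccessToken` is a hypothetical helper, not part of any Instagram library:

```javascript
// Hypothetical helper: pull access_token out of a URL fragment such as
// "#access_token=abc123" (the format assumed for the client-side flow).
// Returns null if no token is present.
function readAccessToken(hash) {
  // strip the leading "#" and parse the rest like a query string
  var params = new URLSearchParams(hash.replace(/^#/, ""));
  return params.get("access_token");
}

// In the browser you would pass the current fragment:
// var token = readAccessToken(window.location.hash);
```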
Early frustrations included:
- the number of followers Instagram shows you is not accurate; the actual number is probably lower. It’s a known Instagram issue: I’m missing 10 followers, my friend is missing around 500
- the Instagram API only returns 45-50 followers per request, not all of a user’s followers at once. So you have to make multiple requests, but fortunately each response includes the URL you need to fetch the next set of results.
- the documentation is a little like the Facebook documentation: it’s mostly up to date but not always totally accurate.
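The paging described above boils down to a loop that keeps requesting the “next” URL until the API stops returning one. A sketch, assuming each page has the v1 shape `{ data: [...], pagination: { next_url } }`; `fetchAllFollowers` is a hypothetical name, and `fetchPage` stands in for whatever request function you already use (the original used jQuery), returning a promise for one parsed page:

```javascript
// Follow pagination.next_url until it is missing, collecting every page's data.
// fetchPage(url) is injected so the loop works with any request method.
async function fetchAllFollowers(firstUrl, fetchPage) {
  var followers = [];
  var url = firstUrl;
  while (url) {
    var page = await fetchPage(url);
    followers = followers.concat(page.data || []);
    // next_url is absent on the last page, which ends the loop
    url = page.pagination && page.pagination.next_url;
  }
  return followers;
}
```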
This simple thing is getting more complex.
Processing the data
I’d seen the JSON output, so I decided to generate an HTML table and a downloadable CSV file from it:
```javascript
// Assumes jQuery is loaded and the page has a <table> with a <tbody>
// and an <a class="download_link"> element.
var csv_prefix = "data:text/csv;charset=utf-8,";
var csv_content = "";
var html = [];

// all_followers is an array of the followers from the Instagram API request
$.each(all_followers, function (index, follower) {
  // some accounts have no bio or website, so fall back to empty strings
  var follower_bio = follower.bio || "";
  var website = follower.website || "";
  var follower_website = (website.length > 0) ? "<a href='" + website + "' target='_blank'>" + website + "</a>" : "";

  // clean up bio string: remove line breaks and quotes that would break the CSV
  follower.bio = follower_bio.replace(/(\r\n|\n|\r|"|')/g, "");
  // add a follower number
  follower.number = index + 1;

  // build table row
  html.push("<tr><td>" + follower.number + "</td>");
  html.push("<td><img src='" + follower.profile_picture + "'></td>");
  html.push("<td><a href='http://instagram.com/" + follower.username + "' target='_blank'>" + follower.username + "</a></td>");
  html.push("<td>" + follower.full_name + "</td>");
  html.push("<td>" + follower_website + "</td>");
  html.push("<td>" + follower.bio + "</td>");
  // followers/follows are missing from the follower-list response (see below),
  // so these cells show "undefined"
  html.push("<td>" + follower.followers + "</td>");
  html.push("<td>" + follower.follows + "</td></tr>");

  // build CSV row
  csv_content += '"' + follower.username + '",';
  csv_content += '"' + follower.full_name + '",';
  csv_content += '"' + website + '",';
  csv_content += '"' + follower.bio + '",';
  csv_content += '"' + follower.followers + '",';
  csv_content += '"' + follower.follows + '"';
  csv_content += '\n';
});

// append generated html
$("table tbody").append(html.join(''));

// encode the CSV (encodeURIComponent also escapes "#", which would otherwise end
// the data: URI early), put it on the download link href and give it a filename
$("a.download_link").attr("href", csv_prefix + encodeURIComponent(csv_content)).attr("download", "followers.csv");
```
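Stripping quotes and line breaks from bios works, but CSV has its own escaping rule (RFC 4180): wrap each field in double quotes and double any quote inside it. A small sketch of that alternative; `csvField` and `csvRow` are hypothetical helpers, not from the original script:

```javascript
// Quote one CSV field the RFC 4180 way: wrap it in double quotes and double
// any embedded quotes. undefined/null become empty fields.
function csvField(value) {
  var s = (value === undefined || value === null) ? "" : String(value);
  return '"' + s.replace(/"/g, '""') + '"';
}

// Join one follower's fields into a CSV line ending in CRLF,
// the line ending RFC 4180 specifies.
function csvRow(fields) {
  return fields.map(csvField).join(",") + "\r\n";
}
```

With this, a bio can keep its quotes and line breaks; spreadsheet software reads a quoted field containing a newline as a single cell.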
I knew this wouldn’t be very efficient, but it was meant to be a quick job and I figured it would work (at least in Chrome on my machine).
For my account, with around 80 followers, this worked fine. It made 2 requests to Instagram, displayed the content quickly and easily, and the CSV file worked like a charm. For my friend’s account, with 21,000 followers, the script needed to make 430 requests, which took a few minutes, and you could feel the browser struggling to render a table with 21,000 rows. The CSV link? Forget about it. Crash. Oh, and it didn’t return the followers’ own follower and follow counts. This “simple thing” is getting way more complex.
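One thing that may help with the CSV step is building the file as a `Blob` and linking to it with `URL.createObjectURL`, instead of packing the whole file, URI-encoded, into a `data:` href; very long data URIs can run into browser limits. A sketch, assuming a browser with Blob support; `buildCsvBlob` is a hypothetical helper and was not part of the original script:

```javascript
// Build the CSV as a Blob from an array of already-formatted rows.
// The browser stores the bytes once and the link only needs a short object URL.
function buildCsvBlob(lines) {
  return new Blob(lines, { type: "text/csv;charset=utf-8" });
}

// Browser usage, with the same a.download_link element as the original script:
// var url = URL.createObjectURL(buildCsvBlob(rows));
// $("a.download_link").attr("href", url).attr("download", "followers.csv");
```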
Follower followers
To get those counts, you have to request each follower’s basic information. So for my 80 followers, that’s 80 requests. For 21,000 followers, that’s 21,000 requests. And the Instagram API rate limits you to 5,000 requests an hour.
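The rate limit translates directly into a minimum spacing between requests. A quick check of the numbers in this section (the helper names are just for illustration):

```javascript
// With a limit of N requests per hour, requests must be at least
// 3600 / N seconds apart on average.
function minIntervalMs(requestsPerHour) {
  return Math.ceil((3600 * 1000) / requestsPerHour);
}

// How long `count` requests take at a fixed interval, in hours.
function runHours(count, intervalMs) {
  return (count * intervalMs) / (3600 * 1000);
}

// minIntervalMs(5000)  -> 720   (0.72 s, so 0.8 s leaves a little headroom)
// runHours(21000, 800) -> 4.67  (roughly 4 hours 40 minutes)
```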
So I set about modifying my JavaScript to make these additional calls, one every 0.8 seconds so as not to exceed the rate limit. I also decided to push the output to the page user by user: the download link wasn’t going to work, so I changed the output to go into a textarea and built the table row by row. I knew this would take some time to run, so I left it overnight.
Eventually it used 100% of my processor and 5 GB of RAM, and the inevitable happened: it crashed the browser. So I ditched the table and tried again. Same result. This simple task has taken way more effort and energy than I expected. Very frustrating.
In the end I decided to batch 2,000 followers at a time, with a smaller interval between requests, outputting to a textarea and then manually combining the batches into one CSV. I guess doing this server-side, rather than client-side, would have been more reliable. But I’m a front-end developer, and naturally solve all problems with CSS, JavaScript and HTML.
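A sketch of the throttling idea: run the per-user requests strictly one after another with a pause between them, and hand each result to a callback as soon as it arrives so it can be written out immediately. `runThrottled` and `sleep` are hypothetical helpers; each task is assumed to be a function returning a promise for one user’s data.

```javascript
// A promise that resolves after `ms` milliseconds.
function sleep(ms) {
  return new Promise(function (resolve) { setTimeout(resolve, ms); });
}

// Run each task (a function returning a promise) strictly in order, pausing
// intervalMs between them. onResult(error, result, index) is called as soon as
// each task settles, so output can be written out incrementally.
async function runThrottled(tasks, intervalMs, onResult) {
  for (var i = 0; i < tasks.length; i++) {
    if (i > 0) {
      await sleep(intervalMs); // pause between requests, not before the first one
    }
    var result = null;
    var error = null;
    try {
      result = await tasks[i]();
    } catch (err) {
      error = err; // e.g. a protected account: record it and carry on
    }
    onResult(error, result, i);
  }
}
```

Because the pause starts after each response arrives, the real spacing is the interval plus the request time. That keeps you under the limit at the cost of a slightly longer run than a fixed `setInterval` would give.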
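The batching step is a simple split of the follower list into fixed-size slices, each processed in its own run. A sketch (`chunk` is a hypothetical helper):

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  var batches = [];
  for (var i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

With the 21,239 followers the API returned, batches of 2,000 give 11 runs, the last one holding 1,239 users.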
The data
The dataset can’t be seen as representative of all Instagram users, as my friend specifically blogs, tweets and posts pictures to Instagram about female fashion and make-up, so it’s no surprise that most of the names in the list are girls’ names; I’d guess around 99%. Instagram doesn’t store gender, so there is no way to be sure.
Instagram reports that my friend has 21,772 followers, but the API returned 533 fewer. Perhaps it is computationally very heavy for Instagram to keep these counts up to date as accounts are made inactive or disabled for whatever reason.
A quick note about protected accounts: around 28% of the users I requested didn’t return a follower count, and instead returned an error message saying I was not authorized to see that information. After some quick investigation, this seems to be because they have protected accounts, though there may be other reasons why Instagram doesn’t return follower counts, so I don’t know how accurate this figure is.
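Separating users with counts from users whose request was refused can be done per response. A sketch, assuming the v1 user endpoint returned `{ meta: { code, error_message }, data: { counts: { followed_by, follows } } }` and a non-200 `meta.code` for accounts I wasn’t authorized to see; treat those field names as assumptions, and `classifyUserResponse` / `unavailableShare` as hypothetical helpers:

```javascript
// Classify one user-info response. Field names are assumed, not confirmed.
function classifyUserResponse(response) {
  var meta = response && response.meta;
  if (meta && meta.code === 200 && response.data && response.data.counts) {
    return {
      status: "ok",
      followers: response.data.counts.followed_by,
      follows: response.data.counts.follows
    };
  }
  // not authorized (e.g. a protected account) or some other failure
  return { status: "unavailable", error: meta ? meta.error_message : undefined };
}

// Fraction of classified results that came back without counts.
function unavailableShare(results) {
  var missing = results.filter(function (r) { return r.status !== "ok"; }).length;
  return results.length ? missing / results.length : 0;
}
```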
Further to the facts above:
- 34 had no followers
- 45 had only 1 follower
- 461 had 10 or fewer followers
On the other end of the spectrum:
- 1 has more than 1,000,000 followers
- 5 have more than 250,000 followers
- 10 have more than 100,000 followers
- 21 have more than 50,000 followers
- 70 have more than 20,000 followers
- 120 have more than 10,000 followers
- 1,105 have more than 1,000 followers
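These thresholds are cumulative: each line includes the accounts in the lines above it. A sketch of producing such a list from the follower counts, with protected accounts (no count) filtered out first; `countAbove` is a hypothetical helper:

```javascript
// For each threshold, count the accounts with strictly more followers than it.
function countAbove(followerCounts, thresholds) {
  return thresholds.map(function (t) {
    return {
      threshold: t,
      count: followerCounts.filter(function (c) { return c > t; }).length
    };
  });
}
```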
The averages for follower and follow counts are skewed towards larger numbers: a few really big accounts are enough to push the averages up. So I think the medians, 194 followers and 265 people followed, give a more realistic picture of a typical Instagram user.
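A toy example (made-up numbers, not the real dataset) of why a few very large accounts move the mean so much more than the median:

```javascript
// Arithmetic mean of an array of numbers.
function mean(values) {
  return values.reduce(function (sum, v) { return sum + v; }, 0) / values.length;
}

// Middle value of the sorted array; for an even length, average the two middle values.
function median(values) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Nine modest accounts and one big one (illustrative only)
var toy = [50, 120, 150, 180, 190, 200, 230, 260, 300, 100000];
// mean(toy)   -> 10168
// median(toy) -> 195
```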
Conclusions
A large chunk of Instagram users are private, most users follow and are followed by around 200 other people and I’m way less popular than my friend on social networks.
Hi! Any chance you could make this app available for using? Have been trying to do the same analysis for my account on IG.
Leo, get in touch with me on Twitter (@13twelve) and I’ll work out a way to get you the final JS I used. It ended up being a much more manual process than I expected, though.
Hi! Interesting read! Have you tried retrieving info about how many followers users that use a certain hashtag have?
Fascinating thanks.
Where are the peeps number
Give me a shoutout on instagram? I have really been doing poorly and losing followers.
did you find an average or how many posts there were?
Very informative, thanks for posting.