Hello. Hi everyone.
My name is Chris Wanstrath. I go by @defunkt online.
inside github: And today I’m going to talk about GitHub.
inside github: That’s me.
GitHub is what we like to call “social coding.”
You can see what your friends are doing from your dashboard or news feed
Everyone has a profile showing off their code and activity
And you can do things like leave comments on commits.
But it wasn’t always like this.
Originally we just wanted to make a git hosting site. In fact, that was the first tagline.
git repository hosting: git repository hosting. That’s what we wanted to do: give us and our friends a place to share git rep...
a brief history: let’s start with a brief history.
It’s not easy to set up a git repository. It never was. But back in 2007 I really wanted to.
I had seen Torvalds’ talk on YouTube about git. But it wasn’t really about git - it was more about distributed version cont...
CVS is stupid: But when Torvalds says “CVS is stupid”
and so are you: “and so are you,” the natural reaction for me is...
To start learning git.
At the time the biggest and best free hosting site was repo.or.cz.
Right after I had seen the Torvalds video, the god project was posted up on repo.or.cz. I was interested in the project so I...
Namely this guy, Tom Preston-Werner. Seen here in his famous “I put ketchup on my ketchup” shirt.
I managed to make a few contributions to god before realizing that was not different. git was not different. Just...
This is what I always imagined. No rules. Project belongs to you, not the site. Share, fork, change - do what you want. Give...
So, we set off to create our own site. A git hub - learning, code hosting, etc.
We started with the code browsing and commit viewing...
But once we added the current version of the dashboard, we knew this was different.
And eventually “git repository hosting” gave way to “social coding”
Unleash Your Code: Join 500,000 coders with over 1,500,000 repos...
2007 october: The first commit was on a Friday night in October, around 10pm.
2008 january: We launched the beta in January at Steff’s on 2nd street in San Francisco’s SOMA district. The first non-github ...
2008 april: A few short months after that we launched to the public.
Along the way we managed to pick up Scott Chacon, our VP of R&D
Tekkub, our level 80 support druid
Melissa Severini, who keeps us all in check
Kyle Neath, who makes the site pretty
Ryan Tomayko, who helps keep the site running smoothly.
Zach Holman, head of enterprise
Rick Olson, Rails extraordinaire
Eston Bond, Design Generalissimo
Corey Donohoe, Director of Shipology
And Brian Lopez, our bleeding edge cowboy
Oh yeah, and the other founders: PJ and Tom.
github.com: That’s where we’re at today. So let’s talk about the technical details of the website:
.com as opposed to fi, which I’m not going to get into today. You’ll have to invite PJ out if you want to hear about that.
We also have a store
A job board
And do git training
the web site: As everyone knows, a web “site” is really a bunch of different components. Some of them generate and deliver HT...
rails # Gist, etc (1)
resque # Background processing, 50ish different job types curre...
smoke # All git calls happen over the wire (3)
utils # Exception logging, stats, helper apps, etc ...
rails: We use Ruby on Rails 2.2.2 as our web framework. It’s kept up to date with all the security patches and includes custo...
rails: GitHub is about 20,000 lines of Rails code, not counting Rails itself, plugins, or gems.
We found out Rails was moving to GitHub in March 2008, after we had reached out to them and they had turned us down. So it w...
rails plugins: We currently have 27 Rails plugins installed, and that number is always changing.
shopify / active_merchant
lgn21st / s3_swf_upload
technoweenie / serialized_attributes
rubygems: GitHub depends on about 50 RubyGems.
rack: One of the big features in Rails 2.3 is Rack support.
We badly wanted this, but didn’t want to invest the time upgrading. So using a few open source libraries we’ve wrapped our ...
Now we can use awesome Rack middleware like Rack::Bug in GitHub
Coders created and submitted dozens of Rack middleware for the Coderack competition last year. I was a judge so I got the s...
nerdEd / rack-validate
webficient / rack-tidy
talison / rack-mobile-detect: sets the X_MOBILE_DEVICE header to the mobile device, if recognized
unicorn: We use unicorn as our application server
- master / worker
- 16 workers
- preforking
unicorn:
- instant restart after kill
- hard 30s request timeouts
- control ram growth
unicorn:
- 0 downtime deploys
- protects against bad rails startup
- migrations handled old fashioned way
nginx: For serving static content and slow clients, we use nginx. nginx is pretty much the greatest http server ever. it’s simpl...
nginx: Limit Zone. Limit simultaneous connections from a client.
nginx: Limit Requests. Limit frequency of connections from a client. Anti-DDOS.
nginx: I see many people using Rack to do what the Limit modules do. Don’t.
nginx: memcached. memcached support, can serve directly from memcached.
nginx: Push Module. comet!
git: The next major part of GitHub is git.
grit: We wrote an open source library called Grit, which lets us use git from Ruby.
mojombo / grit: you can get it here. it originally shelled out to git and just parsed the responses, which worked well for a lo... we realized, however, that can be 100 times faster
grit: system(). Than shelling out.
One of the first things Scott worked on was rewriting the core parts of Grit to be pure Ruby. Basically a Ruby implementation ...
mojombo / grit: And that’s what we run now.
smoke: Kinda. Eventually we needed to move our git repositories off of our web servers. Today our HTTP servers are distinct ...
smoke: “Grit in the cloud”. Instead of reading and writing from the disk, Grit makes Smoke calls. The reading and writing then h...
bert-rpc: Rather than use Protocol Buffers or Thrift or JSON-RPC, Smoke uses BERT-RPC.
bert-rpc: bert : erlang :: json : javascript. BERT is an erlang-based protocol. BERT-RPC is really great at dealing with large bi...
bert-rpc: we have four file servers, each running bert-rpc servers. our front ends and job queue make RPC calls to the backend ...
mojombo / bertrpc: You can grab bert-rpc on GitHub.
mojombo / bert: Or if you just want to play with BERT.
chimney: We have a proprietary library called chimney. It routes the smoke. I know, don’t blame me.
chimney: All user routes are kept in Redis. Chimney is how our BERT-RPC clients know which server to hit. It falls back to a loc...
chimney: It can also be told a backend is down. Optimized for connection refused but in reality that wasn’t the real problem ...
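Chimney itself is proprietary and the notes above are cut off, so the following is only a rough sketch of the two behaviours that are described: looking up which backend holds a user's repositories, and being told a backend is down. A Hash stands in for Redis, and `RouteTable`, its methods, and the host names `fs1` and `fs2` are all hypothetical. The fallback mentioned above is deliberately not modelled because the transcript does not say what it is.

```ruby
# Route table: user name => file server host. A Hash stands in for Redis here.
class RouteTable
  def initialize(routes)
    @routes = routes
    @down = {}
  end

  # Record that a backend should not be used.
  def mark_down(host)
    @down[host] = true
  end

  # Record that a backend is usable again.
  def mark_up(host)
    @down.delete(host)
  end

  # Returns the host for a user, or nil when there is no route
  # or the routed host is currently marked down.
  def host_for(user)
    host = @routes[user]
    return nil if host.nil? || @down[host]
    host
  end
end

routes = RouteTable.new({ "defunkt" => "fs1", "mojombo" => "fs2" })
before = routes.host_for("defunkt")
routes.mark_down("fs1")
during = routes.host_for("defunkt")
routes.mark_up("fs1")
after = routes.host_for("defunkt")
```

A caller that gets `nil` back knows not to attempt the RPC at all, rather than waiting on a connection that will fail.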
proxymachine: All anonymous git clones hit the front end machines. the git-daemon connects to proxymachine, which uses chimney...
mojombo / proxymachine: proxymachine can be used to proxy any kind of tcp connection. open source.
ssh: Sometimes you need to access a repository over ssh. In those instances, you ssh to an fe and we tunnel your connection to...
node.js: downloads, http => https <img>, event streams
jobs: We do a lot of work in the background at GitHub.
resque: Currently we use a system called Resque.
defunkt / resque: You can grab it on GitHub.
resque:
- dealing with pushes
- web hooks
- creating events in the database
- generating GitHub Pages
- clearing & war...
queues: In Resque, a queue is used as both a priority and a localization technique. By localization I mean, “where your worker...
queues: critical, high, low. these three run on our front end servers. Resque processes them in this order.
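The ordering described above can be sketched as a worker that always checks queues in the listed order and takes the first job it finds. This is a minimal illustration, assuming in-memory arrays instead of Redis; `next_job` and the job names are hypothetical and are not Resque's API.

```ruby
# Queues in priority order; earlier names are always checked first.
QUEUE_ORDER = %w[critical high low].freeze

# Returns [queue_name, job] for the first non-empty queue, or nil if all are empty.
# `queues` is a plain Hash of queue name => Array of pending jobs
# (a stand-in for Redis lists).
def next_job(queues, order = QUEUE_ORDER)
  order.each do |name|
    job = queues.fetch(name, []).shift
    return [name, job] if job
  end
  nil
end

queues = {
  "low"      => ["warm_cache"],
  "critical" => ["handle_push"],
  "high"     => ["deliver_web_hook"]
}

# Drain everything, recording the order in which jobs were picked.
processed = []
while (picked = next_job(queues))
  processed << picked
end
```

Because the scan restarts from the first queue on every pick, a `low` job only runs when `critical` and `high` are both empty.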
queues: page. GitHub Pages are generated on their own machine using the `page` queue.
queues: archive. And tarball and zip downloads are created on the fly using the `archive` queue on our archiving machines.
search: On GitHub, you can search code, repositories, and people.
solr: Solr is basically an HTTP interface on top of Lucene. This makes it pretty simple to use in your code. We use solr becau...
Here I am searching for my name in source code
solr: We’ve had some problems making it stable but luckily the guys at Pivotal have given us some tips. Like bumping the Java h...
database: Our database story is pretty uninteresting.
mysql: We use mysql 5.
master / slave: All reads and writes go to the master. We use the slave for backups and failover.
caching: On the site we do a ton of caching using memcached.
fragments: We cache chunks of HTML all over. Usually they are invalidated by some action.
fragments: Formerly we invalidated most of our fragments using a generation scheme, where you put a number into a bunch of re...
fragments: But we had high cache eviction due to low ram and hardware constraints, and found that scheme did more harm than g...
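A generation scheme of the kind mentioned above is commonly built by embedding a counter in every related cache key, so that bumping the counter abandons all of those keys at once. The following is a minimal sketch under that assumption. A Hash stands in for memcached, and `GenerationCache` and its method names are hypothetical, not GitHub's code.

```ruby
# Generation-based invalidation: keys embed a per-group counter.
class GenerationCache
  attr_reader :store

  def initialize
    @store = {}                 # stand-in for memcached
    @generations = Hash.new(0)  # group name => current generation number
  end

  # Builds the versioned key for a fragment within a group.
  def key_for(group, name)
    "#{group}:v#{@generations[group]}:#{name}"
  end

  # Returns the cached value, or computes and stores it via the block.
  def fetch(group, name)
    key = key_for(group, name)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end

  # Invalidates every fragment in the group by moving to a new generation.
  # Old entries are not deleted; they are simply never read again.
  def expire(group)
    @generations[group] += 1
  end
end

cache = GenerationCache.new
calls = 0
render = -> { calls += 1; "<li>commit list, render #{calls}</li>" }

first  = cache.fetch("repo:1", "commits") { render.call }
second = cache.fetch("repo:1", "commits") { render.call }  # served from cache
cache.expire("repo:1")
third  = cache.fetch("repo:1", "commits") { render.call }  # re-rendered under the new generation
```

Note that after `expire` the store still holds the old entry alongside the new one. Stale entries only leave when the cache evicts them, so the approach consumes extra memory on every invalidation.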
page: We cache entire pages using nginx’s memcached module. Lots of HTML, but also other data which gets hit a lot and changes...
page:
- network graph json
- participation graph data
Always looking to stick more into page caches.
object: We do basic object caching of ActiveRecord objects such as repositories and users all over the place. Caches are invali...
associations: We also cache associations as arrays of IDs. Grab the array, then do a get_multi on its contents to get a list o...
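The pattern above can be sketched as two cache round trips: one `get` for the ID array, then one `get_multi` for the objects. This is a minimal illustration, assuming a Hash-backed `FakeCache` whose `get_multi` returns a Hash of only the keys it found, as typical memcached clients do. The key names, `cached_association`, and the miss-filling block are hypothetical.

```ruby
# Tiny in-memory stand-in for a memcached client.
class FakeCache
  def initialize
    @data = {}
  end

  def set(key, value)
    @data[key] = value
  end

  def get(key)
    @data[key]
  end

  # Returns a Hash containing only the keys that were found.
  def get_multi(keys)
    keys.each_with_object({}) do |key, found|
      found[key] = @data[key] if @data.key?(key)
    end
  end
end

# Loads a cached association: one get for the ID array, one get_multi for the objects.
# Any object missing from the cache is loaded by the block (a stand-in for a
# database lookup) and written back to the cache.
def cached_association(cache, list_key, prefix)
  ids = cache.get(list_key) || []
  keys = ids.map { |id| "#{prefix}:#{id}" }
  hits = cache.get_multi(keys)
  ids.zip(keys).map do |id, key|
    hits.fetch(key) { yield(id).tap { |obj| cache.set(key, obj) } }
  end
end

cache = FakeCache.new
cache.set("user:1:repo_ids", [10, 11, 12])
cache.set("repo:10", { id: 10, name: "grit" })
cache.set("repo:12", { id: 12, name: "resque" })

db_lookups = []
repos = cached_association(cache, "user:1:repo_ids", "repo") do |id|
  db_lookups << id
  { id: id, name: "repo-#{id}" }
end
```

Storing IDs rather than full objects means each object lives in the cache once, so updating a repository does not require rewriting every list that contains it.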
walker: We also have a proprietary caching library called Walker.
walker: It originally walked trees and cached them when someone pushed. But now it caches everything related to git:
walker:
- commits
- diffs
- commit listing
- branches
- tags
- everything
Every git-related page load hits Walker a lot
walker: For most big apps, you need to write a caching layer that knows your business domain. Generic, catch-all caching librar...
events: An example of this is our events system.
This is one fragment
Each of these is a fragment
They’re also cached as objects
As well as a list of ids
And that’s just for the dashboard...
optimizations: So what other optimizations have we done?
asset servers: Well we do the common trick of serving assets from multiple subdomains.
asset servers: assets0.github.com, assets1.github.com, and so forth.
sha asset id: Instead of using timestamps for asset ids, which may end up hitting the disk multiple times on each request, we...
sha asset id: /css/bundle.css?197d742e9fdec3f7 and /js/bundle.js?197d742e9fdec3f7. Now simple code changes won’t force everyone to ...
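The notes are cut off before saying where the id comes from, so this sketch makes an explicit assumption: the id is a truncated SHA-1 of the bundle's contents, computed once (for example at deploy time) rather than by checking file timestamps on each request. `asset_id` and `asset_path` are hypothetical helpers, and the 16-character length simply matches the ids shown on the slide.

```ruby
require "digest/sha1"

# Computes a short, content-derived id for an asset body.
# Assumption: a truncated SHA-1 of the bytes; identical bytes always give the same id.
def asset_id(contents)
  Digest::SHA1.hexdigest(contents)[0, 16]
end

# Appends the id as a query string, matching the URL shape shown on the slide.
def asset_path(path, id)
  "#{path}?#{id}"
end

css_v1 = 'body { color: #333; }'
css_v2 = 'body { color: #444; }'

id_v1  = asset_id(css_v1)  # computed once, not per request
url_v1 = asset_path("/css/bundle.css", id_v1)
url_v2 = asset_path("/css/bundle.css", asset_id(css_v2))
```

With a content-derived id, the URL changes only when the bundle itself changes, so browsers keep their cached copy across deploys that do not touch it.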
bundling: For bundling itself, we use yui’s compressor for css and google’s closure compiler for javascript. we don’t use the most aggressive setting because it means changing your jav...
scripty 301: Again, for most of these tricks you need to really pay attention to your app. One example is scriptaculous’ wiki.
scripty 301: When we changed our wiki URL structure, we set up dynamic 301 redirects for the old urls. Scriptaculous’ old wiki ...
ajax loading: We also load data in via ajax in many places. Sometimes a piece of information will just take too long to retri...
If Walker sees that it doesn’t have all the information it needs, it kicks off a job to stick that information in memcached.
We then periodically hit a URL which checks if the information is in memcached or not. If it is, we get it and rewrite the ...
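The flow above can be sketched as a status check that answers from the cache when it can, and otherwise enqueues a background job (at most once) and reports that the data is still pending. This is a minimal illustration: a Hash stands in for memcached, an Array for the job queue, and `fetch_or_enqueue`, `run_worker`, the status symbols, and the cache key are all hypothetical names.

```ruby
# Returns [:ready, value] when the value is cached.
# Otherwise enqueues the job (only once) and returns [:pending, nil].
def fetch_or_enqueue(cache, queue, key)
  return [:ready, cache[key]] if cache.key?(key)
  queue << key unless queue.include?(key)
  [:pending, nil]
end

# Stand-in for a background worker: computes each queued value
# and stores it in the cache.
def run_worker(cache, queue)
  while (key = queue.shift)
    cache[key] = "computed:#{key}"
  end
end

cache = {}
queue = []

first_poll  = fetch_or_enqueue(cache, queue, "network_graph:42")  # miss: job queued
second_poll = fetch_or_enqueue(cache, queue, "network_graph:42")  # still pending, not queued twice
queued_jobs = queue.dup
run_worker(cache, queue)
third_poll  = fetch_or_enqueue(cache, queue, "network_graph:42")  # now served from the cache
```

The page can call the status check on a timer and swap the placeholder for the real content as soon as it sees `:ready`.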
We use this same trick on the Network Graph
Fork Queue
ajax loading: and anywhere else it makes sense.
comet loading: very soon this will all be comet, though.
monitoring: what do we use for monitoring?
nagios: Our support team monitors the health of our machines and core services using nagios. I don’t really touch the thing.
Here’s a screenshot from my IE browser, complete with the ICQ plugin
resque web: We monitor our queue using Resque’s included Sinatra app.
haystack: We use an in-house app called Haystack to monitor arbitrary information, tracked as JSON.
Here’s an example of Haystack’s “exceptions” view
collectd: We also use collectd to monitor load, RAM usage, CPU usage, and other app-related metrics.
pingdom: pingdom sends us SMSes when the site is down. it’s nice.
tender: tender is what we use for customer support.
it works incredibly well, and they’re constantly improving it
testing: Our testing setup is pretty standard.
test unit: We mostly use Ruby’s test/unit. We’ve experimented with other libraries including test/spec, shoulda, and RSpec, b...
git fixtures: As many of our fixtures are git repositories, we specify in the test what sha we expect to be the HEAD of that fix...
machinist: We use machinist for our fixtures.
notahat / machinist
running_man: Gives us setup_once. Use it to cache machinist fixtures on a per-test-class basis.
technoweenie / running_man
ci joe: We use ci joe, a continuous integration server, to run our tests after each push. He then notifies us if the tests fail.
defunkt / cijoe: You can grab him at github.
staging: We also always deploy the current branch to staging. This means you can be working on your branch, someone else can b...
security: a security page really helps.
security@github.com: we get weekly emails to our security email (that people find on the security page) and people are always ...
regular audits: if you can, find a security consultant to poke your site for XSS vulnerabilities. having your target audience b...
24/7 monitoring: 24/7 monitoring is cool too.
backups: backups are incredibly important. don’t just make backups: ensure you can restore them, as well.
sql: we keep nightly, off-site backups of our sql databases.
git: and the same for all our git repositories.
the future
pull requests
...and more
Questions? thanks for coming
Thanks. thanks for coming
Published in: Technology