The Portfolio Website’s Stack
For the first blog post on this website, I felt it would be appropriate to discuss how the website works and the technology under the hood.
The main requirement is a low-cost, scalable and highly available website that is not too complex to manage. Therefore, a ‘serverless’ solution was chosen, where the heavy and complex parts of the web infrastructure are managed by a cloud provider (AWS).
As a serverless and scalable website, it uses the following technology stack:
- Front-end:
  - HTML
  - CSS
  - Vanilla JavaScript
  - Bootstrap Framework
- Back-end:
  - AWS Simple Storage Service (S3) Static Web Hosting
- Content Management System and Static Site Generator:
  - Jekyll
- Content Delivery Network (CDN) and Security:
  - AWS CloudFront
  - AWS Shield
- Web Security:
  - AWS Lambda@Edge
- Domain Registrar and Domain Name System (DNS) Management:
  - AWS Route 53
- SSL Certificate Manager:
  - AWS Certificate Manager
- Continuous Integration/Continuous Delivery (CI/CD) and Infrastructure-as-Code (IaC):
  - AWS Cloud Development Kit (CDK)
- Traffic and Service Analytics:
  - Google Analytics
Serverless highly-available and scalable hosting
AWS S3
The website uses a simple static web page framework - no dynamic routing or single-page application framework is used. Therefore, AWS S3 Static Web Hosting is used.
As a serverless cloud service, AWS S3 provides scalable object storage, as well as scalable access to the content stored in it. There’s also no requirement to manage bandwidth and server capacity!
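AWS S3 is designed for 99.999999999% durability (also known as the ‘eleven nines’). Note that this figure describes how unlikely it is for a stored object to be lost; it is a durability design target rather than an availability SLA.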
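To put ‘eleven nines’ in perspective, here is a rough back-of-the-envelope calculation. It assumes the figure is read as an annual, per-object durability of 99.999999999%, and the count of ten million stored objects is an illustrative number rather than anything measured on this site.

```typescript
// Assumption: eleven nines is an annual, per-object durability figure,
// so the chance of losing any one object in a year is 1 - 0.99999999999.
const annualLossProbability = 1e-11;

// Illustrative number of stored objects (hypothetical, not this site's count).
const objectsStored = 10000000;

// Expected number of objects lost per year across the whole bucket.
const expectedLossesPerYear = objectsStored * annualLossProbability;

// Average number of years between single-object losses.
const yearsPerLostObject = 1 / expectedLossesPerYear;

console.log(Math.round(yearsPerLostObject)); // prints 10000
```

In other words, under these assumptions a bucket holding ten million objects would be expected to lose a single object roughly once every 10,000 years.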
Best of all, it is pay-as-you-use, with no upfront costs.
AWS Certificate Manager
AWS Certificate Manager issues and renews public SSL/TLS certificates for CloudFront distributions free of charge. The certificate proves the authenticity of the website’s domain and allows content to be accessed over secure HTTPS.
AWS CloudFront
AWS CloudFront, a Content Delivery Network (CDN), is used to:
- ensure secure HTTPS access to content (HTTP connections are redirected to HTTPS)
- protect against attacks, such as distributed denial of service (DDoS) attacks, via AWS Shield (free for all AWS accounts)
- improve load times, as the content is cached in AWS Edge Network locations globally (over 200 locations worldwide)
- manage geographical routing and load balancing. Depending on where the user is, CloudFront will automatically route to the user’s nearest edge location cache to minimise load times.
As CloudFront is a highly available managed service, I don’t need to manage the global Edge Network, the actual propagation or caching. I just need to select an AWS S3 bucket as the Origin and the data is automagically globally cached and propagated!
CloudFront caches the static data for 24 hours, so if a user visits the site within that 24-hour period, content will be served from the edge location rather than the origin. Fortunately, AWS does not charge for data transfer between AWS S3 and Amazon CloudFront.
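The 24-hour caching behaviour can be pictured with a toy time-to-live (TTL) cache. This is only a conceptual sketch of the idea, not how CloudFront is implemented; the `EdgeCache` class, its `fetchOrigin` callback and the `originFetches` counter are hypothetical names made up for illustration.

```typescript
type Entry = { body: string; cachedAtMs: number };

// Toy model of a single edge location with a fixed TTL.
class EdgeCache {
  private entries: Record<string, Entry> = {};
  originFetches = 0; // how many times the origin (e.g. S3) was contacted

  constructor(
    private fetchOrigin: (path: string) => string,
    private ttlMs: number = 24 * 60 * 60 * 1000 // 24 hours
  ) {}

  get(path: string, nowMs: number): string {
    const hit = this.entries[path];
    // Serve from the edge while the cached copy is younger than the TTL.
    if (hit !== undefined && nowMs - hit.cachedAtMs < this.ttlMs) {
      return hit.body;
    }
    // Otherwise go back to the origin and refresh the cached copy.
    this.originFetches += 1;
    const body = this.fetchOrigin(path);
    this.entries[path] = { body, cachedAtMs: nowMs };
    return body;
  }
}
```

With this model, a first visit fetches from the origin, a second visit 23 hours later is served from the edge, and a visit after 25 hours triggers a fresh origin fetch.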
Lambda@Edge for Security
Lambda@Edge is a feature of CloudFront that allows you to run serverless functions (Lambda functions) at the edge locations where your CloudFront distribution is.
The function is event-driven, i.e. it only runs when a user visits the website and hits the CloudFront cache. Best of all, there is no server administration required, as the code is automatically distributed to the edge locations.
If a user hits, for example, the Irish edge location, the function code sitting in that location will be invoked.
I created a Lambda@Edge function that adds the below security-related HTTP headers every time a response is sent back to the user.
What HTTP headers are added?
Whenever a user visits a website, the user’s web browser requests a web page, and the server responds with the content along with HTTP headers. Lambda@Edge is used to add special types of HTTP headers (i.e. security headers).
HTTP headers are needed because, by default, web browsers are very trusting - they just load anything that is sent. This makes users vulnerable to malicious attacks, such as cross-site scripting (XSS) and Clickjacking.
The main HTTP security headers added/enforced as part of my Lambda@Edge implementation are:
- Content Security Policy (CSP) - prevents injection-based attacks (e.g. cross-site scripting). Basically, it only allows whitelisted sources to load CSS, images, JavaScript, etc.
- HTTP Strict Transport Security (HSTS) - forces the web browser to only connect via HTTPS
- X-Content-Type-Options - forces the web browser not to load scripts and stylesheets unless the server indicates the correct MIME type
- X-Frame-Options - prevents clickjacking, so the user is protected from clicking on invisible iframes on the page
- X-XSS-Protection - forces the web browser to stop loading a page when it detects a reflected cross-site scripting (XSS) attack
- Referrer-Policy - controls how much referrer information (i.e. the user’s originating website) is sent to the web server from the web browser
You can use the Mozilla Observatory to see how this website implements the above.
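As a rough illustration of the headers listed above, the core of such a Lambda@Edge function might look like the sketch below. The header values are generic examples and assumptions, not this website’s actual policy, and `SECURITY_HEADERS` and `addSecurityHeaders` are hypothetical names. The small local types only mirror the relevant part of the CloudFront response event so the snippet stands alone.

```typescript
// Minimal local types mirroring the part of the CloudFront response event
// used here, so the sketch compiles without any extra type packages.
type CloudFrontHeaders = Record<string, { key: string; value: string }[]>;

interface CloudFrontResponseEvent {
  Records: { cf: { response: { status: string; headers: CloudFrontHeaders } } }[];
}

// Assumed example values; the site's real policy is not given in this post.
const SECURITY_HEADERS: Record<string, string> = {
  "Content-Security-Policy": "default-src 'self'",
  "Strict-Transport-Security": "max-age=63072000; includeSubDomains",
  "X-Content-Type-Options": "nosniff",
  "X-Frame-Options": "DENY",
  "X-XSS-Protection": "1; mode=block",
  "Referrer-Policy": "same-origin",
};

// Adds (or overwrites) each security header on the outgoing response.
// CloudFront keys the header map by the lowercased header name.
function addSecurityHeaders(event: CloudFrontResponseEvent) {
  const response = event.Records[0].cf.response;
  for (const key of Object.keys(SECURITY_HEADERS)) {
    response.headers[key.toLowerCase()] = [{ key: key, value: SECURITY_HEADERS[key] }];
  }
  return response;
}
```

In a deployed function, `addSecurityHeaders` would be called from the exported handler attached to a CloudFront response trigger, and the modified response returned to CloudFront.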
Domain Name System (DNS) and SSL Certificate Management
AWS Route 53 (DNS routing) and AWS Certificate Manager (SSL certificate management) integrate nicely with the above AWS S3 and CloudFront implementation.
Because they are integrated, these services cost little or nothing extra to use.
As Route 53 is a highly available managed service, I don’t need to manage the underlying DNS servers. Furthermore, because of the integration, I can create DNS records that route to CloudFront and S3 resources via their AWS alias, rather than the underlying IP addresses.
That way, I don’t need to manage routing parameters or routing tables that handle the underlying DNS or IP addresses of the CloudFront and S3 endpoints. Pretty neat!
Furthermore, AWS will act as the registrar, registering and maintaining your domain under a top-level domain (such as .com).
CI/CD - AWS Cloud Development Kit (CDK)
CDK is an infrastructure-as-code framework that allows developers to programmatically provision AWS infrastructure (via TypeScript, Python, etc.).
It is open-source and allows me to manage infrastructure that would otherwise take hundreds of lines of template code. CDK uses high-level object-oriented programming to create abstractions of AWS resources, so they become much easier to reason about.
Under the hood, CDK will compile the code into a CloudFormation template and apply it. This allows other non-CDK users to also see the status of deployments via the CloudFormation web UI.
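For example, in TypeScript, this is how I would create a Lambda function and an S3 bucket (including granting the function the relevant IAM permissions on the bucket):
// CDK v2 import paths are shown here; CDK v1 used '@aws-cdk/aws-lambda' and '@aws-cdk/aws-s3'
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { Bucket } from 'aws-cdk-lib/aws-s3';
// Named 'fn' so the variable does not shadow the imported 'lambda' module
const fn = new lambda.Function(this, 'Lambda', { /* ... */ });
const bucket = new Bucket(this, 'MyBucket');
/* This grants the Lambda function's execution role the IAM permissions
for reading from and writing to the S3 bucket
*/
bucket.grantReadWrite(fn);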
CDK is definitely a time-saver - 200 lines of CDK code is equivalent to up to 1,000 lines of CloudFormation template code!
Google Analytics
Google Analytics (GA) is a free tool used to analyse traffic to, and behaviour on, the website. It provides aggregated and anonymised data, which helps with compliance with privacy laws.
The main use-case for Google Analytics in this website is to analyse user behaviour. That is, data collected using a user’s session on the website, such as:
- How long they stayed on each webpage
- Which pages they visited
- How often new visitors come to the website
- The common ‘pathways’ in which they traverse the website
- Whether they are accessing the website through desktop vs mobile device
These web analytics are essential to ensure the website is optimised for the most common use cases. For example, as part of responsive web design, the website should be optimised for viewing on a mobile device as well.
Tracking is kept to a minimum, and therefore the website does not opt in to tracking of more specific details, such as demographics and interests (e.g. age and gender).
As a note, this website’s privacy policy is accessible here.
Google Analytics Dashboards
A ‘one-stop shop’ approach can be taken with Google Analytics metrics - you can create your own custom dashboards. GA is a service that gets you to insights more quickly, without having to worry about data collection, web logging, data aggregations and databases.
Costings
Putting it all together, the total cost of running this blog works out to be less than US$1/month.
Note that the CloudFront Distribution uses ‘Price Class ALL’ (i.e. all edge locations). However, for simplicity, only the most expensive edge location pricing is used.
Edge Location | India | Australia |
---|---|---|
Route 53 | | |
DNS Hosted Zone | US$0.50 | US$0.50 |
DNS Standard Queries to CloudFront is Free | US$0.00 | US$0.00 |
AWS Certificate Manager | | |
SSL Cert for CloudFront is free | US$0.00 | US$0.00 |
AWS Shield | | |
Standard is free | US$0.00 | US$0.00 |
CloudFront | | |
Region 'Transfer Out' (i.e. serving to visitors), First 10 TB / Month | US$0.170 /GB | US$0.114 /GB |
HTTPS requests | US$0.0120 /10,000 requests | US$0.0125 /10,000 requests |
First 1,000 Cache Invalidations free | US$0.00 | US$0.00 |
Transfer between S3 Origin and CloudFront Free | US$0.00 | US$0.00 |
Lambda@Edge | | |
Invocation (Global) | US$0.0000006 /request | US$0.0000006 /request |
Compute @ Edge (128 MB, 3 seconds) | US$0.00000625125 /second | US$0.00000625125 /second |
S3 | | |
First 50 TB / Month | US$0.025 /GB | US$0.025 /GB |
GET requests | US$0.00044 /1,000 requests | US$0.00044 /1,000 requests |
PUT requests | US$0.0055 /1,000 requests | US$0.0055 /1,000 requests |
Therefore, the costs (using the most expensive region) will be:
Using May 2020 web traffic (GA report) | 1,500 visits |
---|---|
Cache Site Content Size (CloudFront report) | 15 MB |
CloudFront Data Egress Costs | US$0.0026 |
HTTP requests | US$0.0019 |
Lambda@Edge | US$0.0103 |
S3 GET requests | US$0.0007 |
Route 53 Hosted Zone | US$0.5000 |
Total | US$0.5154 |
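For anyone who wants to check the arithmetic, the sketch below reproduces the total from the unit prices in the pricing table. Some inputs are assumptions inferred from the published figures rather than stated in the post: one HTTPS request, one Lambda@Edge invocation and one S3 GET per visit, one second of compute per invocation, 15 MB of data egress in total, and 1 GB taken as 1,000 MB.

```typescript
// Traffic figures from the summary table (May 2020).
const visits = 1500;
const egressGB = 15 / 1000; // 15 MB, taking 1 GB = 1,000 MB (assumption)

// Highest per-unit price across the two regions in the pricing table.
const egressPerGB = 0.170;               // CloudFront transfer out (India)
const httpsPerRequest = 0.0125 / 10000;  // CloudFront HTTPS requests (Australia)
const edgeInvocation = 0.0000006;        // Lambda@Edge, per request
const edgeComputePerSecond = 0.00000625125;
const s3GetPerRequest = 0.00044 / 1000;
const hostedZone = 0.50;                 // Route 53, flat monthly fee

// Assumptions: one request of each kind per visit, one second of compute each.
const secondsPerInvocation = 1;

const costs = {
  egress: egressGB * egressPerGB,
  https: visits * httpsPerRequest,
  lambdaEdge: visits * (edgeInvocation + edgeComputePerSecond * secondsPerInvocation),
  s3Get: visits * s3GetPerRequest,
  route53: hostedZone,
};

const total = costs.egress + costs.https + costs.lambdaEdge + costs.s3Get + costs.route53;

console.log(total.toFixed(4)); // prints "0.5154"
```

The Route 53 hosted zone dominates the bill; everything that scales with traffic adds up to well under two US cents at this volume.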
Closing Remarks
That was a short introduction to the underlying technology of this website, as well as the analytics done on top of it. I figured it was a good way to start a website on data blogging!