Simple Storage Service (S3)

  1. Simple, scalable key-value object storage in the cloud
  2. Limits
    1. The total volume of data and number of objects you can store are unlimited.
    2. Individual Amazon S3 objects can range in size from a minimum of 0 bytes to a maximum of 5 terabytes.
    3. The largest object that can be uploaded in a single PUT is 5 gigabytes.
    4. For objects larger than 100 megabytes, you should use Multipart Upload.
    5. Multi-Object Delete API: delete up to 1,000 objects in a single HTTP request.
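The size limits above can be sketched as a part-planning helper for a multipart upload. This is a minimal sketch: the function name `plan_multipart` and the 100 MB default part size are illustrative, not an AWS API.

```python
# Sketch: plan a multipart upload against S3's documented limits.
# plan_multipart and the 100 MB default part size are illustrative.

MAX_SINGLE_PUT = 5 * 1024**3   # 5 GB: largest object in a single PUT
MAX_PARTS = 10_000             # multipart uploads allow at most 10,000 parts

def plan_multipart(object_size: int, part_size: int = 100 * 1024**2):
    """Return a list of (part_number, offset, length) tuples for an upload."""
    if object_size <= MAX_SINGLE_PUT and object_size < part_size:
        return [(1, 0, object_size)]   # small enough for a single PUT
    parts = []
    offset, part_number = 0, 1
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    if len(parts) > MAX_PARTS:
        raise ValueError("increase part_size: S3 allows at most 10,000 parts")
    return parts

# A 250 MB object with 100 MB parts splits into 100 + 100 + 50 MB.
print(len(plan_multipart(250 * 1024**2)))  # 3
```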
  3. Objects are stored in Buckets as key value pairs
    1. Each bucket has a globally unique name
    2. arn:aws:s3:::bucket/resourcekey
    3. Key: Folder1/hello.html; value: the content of that file
    4. Objects also carry metadata and an optional version ID
  4. Data Consistency
    1. Read-after-write consistency for PUTs of new objects (immediate)
    2. Eventual consistency for DELETEs and overwrite PUTs (changes may take some time to propagate)
  5. Keys are stored and sorted in lexicographical (alphabetical) order
    1. For performance, use randomized key names (add a salt before the filename if it is timestamp-based)
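The salting tip above can be sketched as follows; the helper name `salted_key` and the 4-character prefix length are illustrative choices, not an AWS convention.

```python
# Sketch: prefix a timestamp-based key with a short hash so keys spread
# across S3's lexicographically sorted keyspace. salted_key is hypothetical.
import hashlib

def salted_key(timestamp_key: str, salt_len: int = 4) -> str:
    prefix = hashlib.md5(timestamp_key.encode()).hexdigest()[:salt_len]
    return f"{prefix}-{timestamp_key}"

print(salted_key("2018-04-26-15-00-00/photo1.jpg"))
```

Because the hash is derived from the key itself, the salt is deterministic, so the full key can be reconstructed from the original name when reading.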
  6. Tiered Storage
    A comparison of S3 storage classes:

    1. Standard
      1. Availability 99.99% (4 nines)
      2. Durability 99.999999999% (11 nines)
    2. Standard-IA (Infrequent Access)
      1. Cheaper than S3 Standard, but a retrieval fee is charged
      2. Availability 99.9% (3 nines)
    3. One Zone-IA (One Zone Infrequent Access), released April 2018
      1. Stored in a single Availability Zone only; no cross-AZ redundancy
      2. 20% cheaper than Standard-IA
      3. Use case: reproducible, infrequently accessed data, e.g. second or third backup copies kept for compliance
    4. Reduced Redundancy Storage
      1. Availability 99.99% (4 nines)
      2. Durability 99.99% also
    5. Glacier
      1. Cheap, but a standard retrieval takes several hours (typically 3-5)
      2. Use Bulk retrieval for lower cost
      3. Use Expedited retrieval (minutes) when you need data fast
  7. Life-cycle policies
    1. Specify rules that transition objects across storage classes at specified ages and finally delete them.
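A lifecycle rule like the one described above can be written as a JSON configuration; this sketch follows the shape the S3 PutBucketLifecycleConfiguration API accepts, with an illustrative rule ID, prefix, and day counts.

```python
# Sketch: a lifecycle rule moving objects to Standard-IA after 30 days,
# to Glacier after 90, and deleting them after 365. The rule ID and the
# "logs/" prefix are placeholders.
import json

lifecycle = {
    "Rules": [{
        "ID": "archive-then-expire",
        "Filter": {"Prefix": "logs/"},
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": 365},
    }]
}
print(json.dumps(lifecycle, indent=2))
```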
  8. Versioning
    1. Total data storage across all versions is billed
    2. Once enabled, versioning cannot be disabled, only suspended for future updates. To turn versioning off completely you must delete and recreate the bucket (objects keep their version IDs).
    3. While versioning is on, deleting an object only adds a delete marker; remove the delete marker to get the deleted file back.
  9. Access Control Lists
    1. S3 ACLs are a legacy access control mechanism that predates IAM. However, if you already use S3 ACLs and find them sufficient, there is no need to change. As a general rule, AWS recommends using S3 bucket policies or IAM policies for access control.
    2. An S3 ACL is a sub-resource that’s attached to every S3 bucket and object. It defines which AWS accounts or groups are granted access and the type of access. When you create a bucket or an object, Amazon S3 creates a default ACL that grants the resource owner full control over the resource.
  10. Bucket policies
      1. Use IAM policies if:
        1. You need to control access to AWS services other than S3. IAM policies will be easier to manage since you can centrally manage all of your permissions in IAM, instead of spreading them between IAM and S3.
        2.  You have numerous S3 buckets each with different permissions requirements. IAM policies will be easier to manage since you don’t have to define a large number of S3 bucket policies and can instead rely on fewer, more detailed IAM policies.
        3. You prefer to keep access control policies in the IAM environment.
      2. Use S3 bucket policies if:
        1. You want a simple way to grant cross-account access to your S3 environment, without using IAM roles.
        2. Your IAM policies bump up against the size limit (up to 2 KB for users, 5 KB for groups, and 10 KB for roles). S3 supports bucket policies of up to 20 KB.
        3. You prefer to keep access control policies in the S3 environment.
        4. You want to make a bucket and its objects public.
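A public-read bucket policy of the kind mentioned in point 4 might look like this sketch; the bucket name `example-bucket` is a placeholder.

```python
# Sketch: a bucket policy granting anonymous read access to every object
# in a hypothetical bucket named "example-bucket".
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}
print(json.dumps(policy, indent=2))
```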
  11. S3 is AWS's object storage service in the cloud. It stores key/value pairs: within a bucket, the filename/path is the key and the content of the object/file is the value.
  12. S3 access is global, but each bucket must be created in a specific region
  13. Encryption
    1. Client side encryption
    2. Server Side encryption
      1. SSE-S3 using S3 managed Keys
      2. SSE-KMS using KMS keys
      3. SSE-C using client provided keys
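For SSE-C, the customer-provided key travels with each request as documented `x-amz-server-side-encryption-customer-*` headers. A minimal sketch of building those headers (the key below is a random dummy, not a real credential):

```python
# Sketch: the extra request headers SSE-C requires. The AES-256 key is
# sent base64-encoded along with a base64 MD5 digest so S3 can verify
# the key arrived intact. The key here is a throwaway dummy.
import base64
import hashlib
import os

key = os.urandom(32)  # 256-bit customer-provided key
headers = {
    "x-amz-server-side-encryption-customer-algorithm": "AES256",
    "x-amz-server-side-encryption-customer-key":
        base64.b64encode(key).decode(),
    "x-amz-server-side-encryption-customer-key-MD5":
        base64.b64encode(hashlib.md5(key).digest()).decode(),
}
for name, value in headers.items():
    print(f"{name}: {value}")
```

Note that with SSE-C, S3 discards the key after use, so the same headers must be supplied again on every GET.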
  14. Security
    1. Control access to a bucket using bucket ACL or bucket policy
  15. All buckets and objects are private by default
  16. Two ways to prevent accidental deletion of objects:
    1. Enable versioning
    2. Enable MFA delete
  17. Cross region replication
    1. You must first enable versioning on both the source and destination buckets
    2. Then go to Management and choose Cross-region replication
    3. Create a rule to replicate all or a subset of objects to a destination bucket.
    4. You can specify a different storage class for the replication target bucket
    5. Only new objects (not the existing ones) are replicated
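The replication rule described above can be expressed as a JSON configuration; this sketch follows the shape the S3 PutBucketReplication API accepts, with a placeholder IAM role ARN and bucket names.

```python
# Sketch: a cross-region replication configuration. The role ARN and
# bucket names are placeholders; versioning must already be enabled on
# both the source and destination buckets.
import json

replication = {
    "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
    "Rules": [{
        "ID": "replicate-all",
        "Status": "Enabled",
        "Prefix": "",                       # empty prefix = all objects
        "Destination": {
            "Bucket": "arn:aws:s3:::destination-bucket",
            "StorageClass": "STANDARD_IA",  # optional different class
        },
    }],
}
print(json.dumps(replication, indent=2))
```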
  18. S3 transfer acceleration
    1. Uploads are sent to the nearest CloudFront edge location instead of directly to the S3 bucket, reducing latency because the edge location is closer to you than the bucket's region
  19. Static website hosting on S3
    1. Create a bucket whose name matches your domain name (e.g. example.com)
    2. Go to static website hosting and enable
    3. Grant public read access
    4. The URL will be http://<bucket>.s3-website-<region>.amazonaws.com, where <region> can be us-east-1, etc.
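The website endpoint pattern can be sketched as a small helper; note the exact hostname format varies by region (older regions use a dash before the region, some newer ones use a dot), so `website_url` below assumes the classic dash form.

```python
# Sketch: build the static-website endpoint URL for a bucket, assuming
# the classic dash-form hostname used by regions like us-east-1.
def website_url(bucket: str, region: str) -> str:
    return f"http://{bucket}.s3-website-{region}.amazonaws.com"

print(website_url("example", "us-east-1"))
# http://example.s3-website-us-east-1.amazonaws.com
```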
  20. S3 is global but buckets reside in regions; since bucket names are globally unique, there is no need to include the region in the URL or ARN
  21. Requester Pays Option: Can be used to pass on request/transfer costs to another AWS account
  22. Events:
    1. The bucket owner (or others, as permitted by an IAM policy) can arrange for notifications to be issued to Amazon Simple Queue Service (SQS) or Amazon Simple Notification Service (SNS) when a new object is added to the bucket or an existing object is overwritten. Notifications can also be delivered to AWS Lambda for processing by a Lambda function.
    2. The following events are supported: s3:ObjectCreated:Put, s3:ObjectCreated:Post, s3:ObjectCreated:Copy, s3:ObjectCreated:CompleteMultipartUpload, s3:ObjectCreated:*, s3:ReducedRedundancyObjectLost.
    3. Each notification is delivered as a JSON object with the following fields: Region, Timestamp, Event Type (as listed above), Request, Actor, Principal ID, Source IP of the request, Request ID, Host ID, Notification Configuration Destination ID, Bucket Name, Bucket ARN, Bucket Owner Principal ID, Object Key, Object Size, Object ETag, Object Version ID (if versioning is enabled on the bucket).
    4. Notifications are delivered to the target in well under a second.
    5. Cost – There is no charge for this feature.
    6. Regions – The bucket and the target must reside in the same AWS Region.
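A consumer (e.g. a Lambda function) typically just pulls the bucket and key out of the notification JSON. This sketch uses a trimmed-down record in the documented event shape; the bucket and key values are illustrative.

```python
# Sketch: extract the event name, bucket, and object key from an S3
# event notification. The record below is a minimal illustrative sample.
import json

event = json.loads("""{
  "Records": [{
    "eventName": "ObjectCreated:Put",
    "s3": {
      "bucket": {"name": "example-bucket"},
      "object": {"key": "photos/photo1.jpg", "size": 1024}
    }
  }]
}""")

for record in event["Records"]:
    s3 = record["s3"]
    print(record["eventName"], s3["bucket"]["name"], s3["object"]["key"])
```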
  23. Optimizing S3 performance: if you consistently exceed 100+ PUT/DELETE requests per second or 300+ GET requests per second, you should optimize your key naming.
    1. For GET only performance use CloudFront
      2. For PUT/DELETE performance, use a hexadecimal hash as the key prefix. This forces S3 to spread keys across different index partitions, which enhances performance:
      1. examplebucket/232a2013-26-05-15-00-00/cust123423/photo1.jpg
      2. examplebucket/7b542013-26-05-15-00-00/cust385742/photo2.jpg
      3. examplebucket/921c2013-26-05-15-00-00/cust124843/photo2.jpg
      4. examplebucket/ba652013-26-05-15-00-00/cust874937/photo2.jpg
  24. S3 pricing: you are charged for storage, number of requests, and data transfer (tiered, so the more you use, the less you pay per GB)
  25. Bucket names must be lowercase (letters, numbers, and hyphens)
  26. Individual objects inside the same bucket can have different storage classes, and you can turn on server-side encryption at the object level.
  28. You can turn on SSL (HTTPS) by putting CloudFront in front of your bucket
  29. S3 access
    1. Every non-anonymous request to S3 must contain authentication information to establish the identity of the principal making the request. In REST, this is done by first putting the headers in a canonical format, then signing the headers using your AWS Secret Access Key.
    2. You can also use pre-signed URLs to grant temporary access
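The signing step described above boils down to an HMAC-SHA256 over a canonical string. This is only a sketch of that hashing step with dummy credentials; real Signature Version 4 additionally derives a chained signing key and canonicalizes the full request.

```python
# Sketch: the core hashing step of request signing - HMAC-SHA256 over a
# canonical string using the secret access key. The key and canonical
# string below are simplified dummies, not real SigV4 inputs.
import hashlib
import hmac

secret_key = "wJalrFEMI/EXAMPLEKEY"   # dummy credential for illustration
canonical = "GET\n/example-bucket/photo1.jpg\n\nhost:s3.amazonaws.com\n"

signature = hmac.new(
    secret_key.encode(), canonical.encode(), hashlib.sha256
).hexdigest()
print(signature)
```

Because only the holder of the secret key can produce a valid signature, S3 can authenticate the request (or a pre-signed URL) without the key ever being transmitted.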
  30. Amazon S3 Select is a capability released in April 2018
    1. It is designed to pull out only the data you need from an object, which can dramatically improve the performance and reduce the cost of applications that need to access data in S3.
    2. Previously, most applications had to retrieve the entire object and then filter out only the required data for further analysis.
    3. Now S3 Select enables applications to offload the heavy lifting of filtering and accessing data inside objects to the Amazon S3 service.
    4. By reducing the volume of data that has to be loaded and processed by your applications, S3 Select can improve the performance of most applications that frequently access data from S3 by up to 400%.
    5. You can use S3 Select from the AWS SDK for Java, AWS SDK for Python, and AWS CLI.
    6. Use the S3 Select API (a SELECT expression) as opposed to a plain GET request
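What S3 Select does server-side can be simulated locally: apply an SQL-style filter to a CSV object and return only the matching column. The column names and data here are illustrative; with S3 Select the equivalent expression would be something like `SELECT s.name FROM S3Object s WHERE s.age > 30`, executed inside the S3 service.

```python
# Sketch: a local simulation of an S3 Select query over a small CSV
# object. S3 Select performs this filtering server-side so only the
# matching rows travel over the network.
import csv
import io

data = "name,age\nalice,34\nbob,28\ncarol,41\n"
rows = csv.DictReader(io.StringIO(data))
names = [row["name"] for row in rows if int(row["age"]) > 30]
print(names)  # ['alice', 'carol']
```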
  31. Amazon Athena
    1. is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL expressions.
    2. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries you run.
    3. Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL expressions.
Copyright 2005-2016 KnowledgeHills.