
  (AWS) Elastic File System (EFS)


Early in my experience with AWS I discovered EFS and used a persistent parallel file system as a tool for launching a clone of the LAM Alaska website on demand. When I started in 2017 I chose the us-west-2 Oregon region to operate in and didn't use another region until 2024. By that time success was defined as the ability to launch a fully functional clone of the LAM Alaska website with data from backups created from the main website within the previous 24 hours.

Amazon Elastic File System (EFS) pricing

Amazon EFS is built to scale on demand to petabytes without disrupting applications, growing and shrinking automatically as you add and remove files. You pay for usage, which includes storage, reads, writes, and storage class transitions, depending on the parameters of the filesystem.

The pricing figures used are for the us-west-2 Oregon region which is my main region and one of the least expensive regions for AWS resources.

Standard, General Purpose, Bursting EFS, as available in 2017

An EFS can be accessed by one to thousands of EC2 instances concurrently. A default Regional EFS can be accessed by instances in multiple AZs. EFS can also be used as a network file system for on-premises servers using AWS Direct Connect.

One Zone and Lifecycle Managed EFS storage classes

Since I first started using EFS, AWS has added One Zone and Lifecycle Managed EFS storage classes. There are also Performance and Throughput mode options that may not have been available in 2017 when I created my first EFS.

After investigating the new EFS options I decided that the Zz backup would cost me less if moved from EBS to EFS. In testing I forced almost all of the files into the Infrequent Access storage class, verified that the daily backup does not bring unchanged target files back online, and confirmed that access from the web interface still has acceptable latency. I was surprised at the cost of Elastic Throughput reads and writes to the One Zone Lifecycle Managed EFS, which is significant and could really cost me if access isn't actually infrequent.

Although storage prices can be reduced by the One Zone and Lifecycle Managed EFS storage classes, the charges for throughput and data access could easily exceed any savings if access is too frequent.
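A rough back-of-the-envelope comparison makes the trade-off concrete. The sketch below compares a month of Standard storage against Infrequent Access storage plus data access charges; the per-GB figures are placeholder assumptions, not current AWS rates, so substitute the numbers from the EFS pricing page for your region.

# Break-even sketch: Standard vs Infrequent Access plus data access charges.
# All prices are placeholder assumptions, NOT current AWS rates.
standard_gb_month=0.30   # Standard storage $/GB-month (assumed)
ia_gb_month=0.016        # Infrequent Access storage $/GB-month (assumed)
ia_access_gb=0.01        # Infrequent Access data access $/GB transferred (assumed)

stored_gb=10             # data kept on the filesystem
read_gb_month=500        # data read back from the IA class each month

awk -v std="$standard_gb_month" -v ia="$ia_gb_month" -v acc="$ia_access_gb" \
    -v gb="$stored_gb" -v rd="$read_gb_month" 'BEGIN {
  printf "Standard:     $%.2f / month\n", gb * std
  printf "IA + access:  $%.2f / month\n", gb * ia + rd * acc
}'

With these assumed numbers the IA copy costs more than Standard once reads reach a few hundred GB a month, which is exactly the "too frequent" case above.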

Regional EFS with Elastic Throughput and Lifecycle Management

Standard storage cost is the same as in an EFS without Lifecycle Management, but the Infrequent Access storage class can save nearly 95% and the Archive storage class even more. Read and Tiering charges, along with the minimum billable file size and minimum storage duration, could eat up the savings.

Storage

Throughput and data access

One Zone EFS with Elastic Throughput and Lifecycle Management

Standard storage in a One Zone EFS is the same price as Infrequent Access in a Regional EFS.

Storage

Throughput and data access

The (AWS) Elastic File System (EFS) metadata directory is obsolete

As of August 2024 the aws-efs-mount.bash script replaces the (AWS) Elastic File System (EFS) metadata directory method. That method dated from when I was addressing the IPv6-only workaround for git, before I put my public git repos on GitLab, which can be accessed with IPv6 as well as IPv4.

Although I no longer use the metadata directory for initialization, it is still the published location for this page.

Mount the LAM aws-efs for the region and Availability Zone the instance was launched in

AWS LAM VPC Elastic File Systems

In 2025 I expanded to three filesystems for each instance (where supported):

All 32 Regions enabled for my AWS account as of January 2025 support Standard and Lifecycle Managed EFS, but a number of Availability Zones, including all of the Availability Zones in 11 Regions, do not yet support One Zone Lifecycle Managed EFS. I have detailed those with no One Zone EFS support in the AWS Availability Zone subnet id page I created while testing One Zone EFS in every Availability Zone that supports it.

In 2017, when I created the EFS in the us-west-2 Oregon region, its usage in my LAM Alaska Clone project was based on the $0.30 / GB-month pricing, which was the most expensive storage available but was extremely convenient. EFS is mounted on Linux systems as a Network File System (NFS) with the ability to be accessed by thousands of instances, and I don't have to manage the server. It is persistent storage that doesn't require a running instance, and although expensive, I only pay for the storage used from the very large capacity available.

Filesystems using Lifecycle Management allow simple policies that move data between Standard storage and the newer Infrequent Access (IA) and Archive storage classes, which are less expensive than Standard. One Zone EFS is also cheaper than Regional Standard storage, is available to instances in a single Availability Zone instead of all Availability Zones in the region, and includes an IA storage class that is cheaper still.

I use "cloud-init query" to retrieve the REGION and Availability_Zone and then the aws-efs-mount.bash script to mount the filesystem.

The cloud-init section to determine the region this instance was launched in and mount the EFS:

 - echo
 - echo 'AWS LAM Get Availability_Zone from cloud-init values'
 - export Availability_Zone=$(cloud-init query availability-zone)
 - echo
 - echo 'AWS LAM Adding nfs4 mount to AWS LAM VPC Elastic File System'
 - export REGION=$(cloud-init query region)
 - /var/www/aws/aws-efs-mount.bash
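
The aws-efs-mount.bash script itself is not reproduced here, but a minimal sketch of the approach looks like the following. The file system IDs in the case statement are placeholders, REGION is the value exported by the cloud-init section above, and the mount options are the NFSv4.1 settings recommended in the EFS documentation.

#!/bin/bash
# Minimal sketch of an aws-efs-mount.bash style script (file system IDs are placeholders)
REGION=${REGION:-$(cloud-init query region)}

# Map each region to the EFS that was created in that region
case "$REGION" in
  us-west-2) efs_id=fs-01234567 ;;
  us-east-1) efs_id=fs-89abcdef ;;
  *) echo "No EFS configured for region $REGION" >&2; exit 1 ;;
esac

mkdir -p /mnt/efs

# NFSv4.1 mount options recommended in the EFS documentation
mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport \
  "${efs_id}.efs.${REGION}.amazonaws.com:/" /mnt/efs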

The mount EFS section has been included in my cloud-init files since I became dependent on its availability and ease of use, but until 2024, when I started expanding to other regions, the file system name was hard coded. When an EFS is created within a region a unique file system ID is assigned, along with a DNS name like <file system ID>.efs.<region>.amazonaws.com, which is only usable in the VPC where the EFS was created and by default is accessible only in that region.

During the process of modifying my scripts to work in any region I reduced the EFS dependency to the ubuntu.tgz and ec2-user.tgz user resource files. These files contain some private resources, including aws and duckdns credentials specific to an AWS LAM clone.

Many of the other dependencies can be accessed from the us-west-2 Oregon EFS using the aws scp connectivity I set up during initialization. That connectivity requires credentials which are in the ubuntu.tgz and ec2-user.tgz user resource files, so mounting the EFS for the region is a prerequisite for reaching the us-west-2 Oregon EFS over aws scp.

Create locate database for NFS mounted EFS

The find and locate commands are two powerful tools for searching for one or more particular files.

The find command searches for files in real time, meaning that it crawls the specified directory for your search query when you execute it. The find command works well on fast local filesystems, but it can be really slow on large remote filesystems.

The locate command is fast because it uses a database which is optimized for locate searches. Its main limitation is that it depends on that database and only reports files that were present the last time the database was updated. It also can't perform as granular a search as find, since it simply matches files based on their name, although it does accept more complicated syntax such as regex.
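
On an NFS mounted EFS the difference is dramatic: the first command below walks every directory over the network each time it runs, while the second only reads a prebuilt local database (the database path is an assumption, matching the scripts described later):

# find crawls the whole EFS over NFS on every run
find /mnt/efs -name 'ubuntu.tgz'

# locate reads a local database built ahead of time
locate -d /var/lib/plocate/efs-plocate.db ubuntu.tgz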

Both the find and locate commands have considerations and constraints when searching a remote filesystem. Since an EFS is mounted using NFS and can be of massive size, how to search it is worth consideration. I have decided to support searching the AWS EFS using locate and custom database files for the specific filesystems mounted.

An NFS mounted filesystem is not included in a locate database by default.

There are some good reasons that NFS mounted filesystems are not included in the locate database. One is that this IO intensive operation could slam the NFS server and/or use a lot of network bandwidth. A second, closely related reason is that an NFS filesystem is usually mounted on many remote systems; that is the main reason for setting up an NFS server in the first place. Bad as it might be to have one remote system add entries for all the files on a remote filesystem to its local database, it is worse to have multiple systems do this.

Create/Update locate database upon initialization

Support both plocate and mlocate utilities

Amazon Linux 2023 and Amazon Linux 2 are still using mlocate. Ubuntu 24.04 Noble, 22.04 Jammy, and Debian all use the newer plocate utility.

Depending on the Operating System and which locate utility is installed, one of the following sets of programs is used during initialization.

The Create-update-efs-{p,m}locate-db.bash scripts also create a /etc/profile.d/plocate.sh script that is sourced to include the {p,m}locate-db files in the LOCATE_PATH for all users.
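
The scripts themselves are not reproduced here, but a minimal sketch of the plocate variant, with assumed paths and database names, looks like this:

#!/bin/bash
# Sketch of a Create-update-efs-plocate-db.bash style script (paths and names are assumptions)
EFS_ROOT=/mnt/efs
EFS_DB=/var/lib/plocate/efs-plocate.db

mkdir -p "$(dirname "$EFS_DB")"

# updatedb from the plocate package accepts the same --database-root and
# --output options as mlocate's updatedb
updatedb --database-root "$EFS_ROOT" --output "$EFS_DB"

# Have every login shell search the EFS database in addition to the default one
cat > /etc/profile.d/plocate.sh <<EOF
export LOCATE_PATH=$EFS_DB
EOF

With the profile script sourced a plain locate also consults the EFS database; locate -d with the database path searches only that database.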

A long running instance, such as my main aws instance, can include the appropriate scripts in a daily-backup job or in the cron.daily job to have the databases rebuilt daily.
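
On most distributions dropping a copy of the script into /etc/cron.daily is enough; the name must not contain a dot or run-parts will skip it (the source script name is an assumption):

# Rebuild the EFS locate database once a day
sudo install -m 755 Create-update-efs-plocate-db.bash /etc/cron.daily/efs-plocate-db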

An EFS by default can be accessed only from the region it is created in

By default, EC2 instances running in multiple Availability Zones within the same AWS Region can access the Elastic File System (EFS) for that region. An EFS can be created to serve only one Availability Zone, but by default a mount target is created for each Availability Zone within the region.

A Domain Name Service (DNS) name is assigned for each EFS which resolves to the IP address of the EFS mount target in the same Availability Zone as the EC2 instance querying the DNS. Connectivity and DNS name resolution are only available on the private IPv4 subnets of the VPC.
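
From an instance inside the VPC the name resolves to the mount target IP for that instance's Availability Zone, which can be confirmed with a quick lookup (the file system ID is a placeholder):

# Returns the mount target IP for this Availability Zone; fails outside the VPC
dig +short fs-01234567.efs.us-west-2.amazonaws.com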

I have decided not to use AWS Backup, EFS Replication or encryption on the filesystems I am creating in additional regions. I am not sure any of these features were available when I created the Oregon EFS years ago. The Lifecycle Management and One Zone features were introduced since I started using EFS.

Create an EFS in a region
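
One way to script the creation with the AWS CLI, with placeholder region, file system ID, subnet, and security group values, is sketched below; a One Zone EFS would add --availability-zone-name to the first call.

# Create an unencrypted, Elastic Throughput EFS in the target region
aws efs create-file-system \
  --region us-east-2 \
  --creation-token lam-efs-us-east-2 \
  --performance-mode generalPurpose \
  --throughput-mode elastic \
  --tags Key=Name,Value=lam-efs-us-east-2

# Turn on Lifecycle Management so cold files move to the cheaper storage classes
aws efs put-lifecycle-configuration \
  --region us-east-2 \
  --file-system-id fs-0123456789abcdef0 \
  --lifecycle-policies '[{"TransitionToIA":"AFTER_30_DAYS"},{"TransitionToArchive":"AFTER_90_DAYS"}]'

# Create a mount target in each Availability Zone the instances will run in
aws efs create-mount-target \
  --region us-east-2 \
  --file-system-id fs-0123456789abcdef0 \
  --subnet-id subnet-01234567 \
  --security-groups sg-01234567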

Populate EFS with the ubuntu.tgz and ec2-user.tgz user resource files

alias ll='ls -lF --time-style=long-iso'
# Inspect what is already on the mounted EFS
ll -daRh /mnt/efs{,/*,/*/*}
ll -daRh /mnt/efs{,/aws-lam1-ubuntu{,/*},/Amazon-Linux-2023{,/*}}
ll -daRh /mnt/efs/aws-lam1-ubuntu/ubuntu.t* /mnt/efs/Amazon-Linux-2023/ec2-user.t*
# Give ownership of the EFS paths to the appropriate user (use whichever matches the instance)
sudo chown ubuntu:ubuntu /mnt/efs
sudo chown ec2-user:ec2-user /mnt/efs
sudo chown admin:admin /mnt/efs
sudo chown lam:staff /mnt/efs{,/aws-lam1-ubuntu{,/*},/Amazon-Linux-2023{,/*}}
# Copy the user resource archives from the lam@aws host over scp
mkdir /mnt/efs/aws-lam1-ubuntu
scp -p lam@aws:/mnt/efs/aws-lam1-ubuntu/ubuntu.t* \
/mnt/efs/aws-lam1-ubuntu
mkdir /mnt/efs/Amazon-Linux-2023
scp -p lam@aws:/mnt/efs/Amazon-Linux-2023/ec2-user.t* \
/mnt/efs/Amazon-Linux-2023

For every region other than the us-west-2 Oregon region only the ubuntu.t* and ec2-user.t* archives are installed. The instance may update the mlocate.db or plocate.db files depending on which version of locate is used by the Operating System of the instance. This only uses about 100K of aws-efs storage per region, which should incur no monthly charges.

The original us-west-2 Oregon EFS is backed up to my main server

As of 2024 I am back up to 1 GB of aws efs storage in the us-west-2 Oregon region, costing me $0.30 per month or $3.60 per year. I also have a copy of most files in the aws s3://lamurakami bucket, also in the us-west-2 Oregon region, costing me an additional $0.28 per year. In March and April of 2024 I had begun eliminating the efs copies, but in April of 2024 I paused this when I realized I was using copies on the Bk2 backup of aws efs for initialization of qemu instances on the LAM Alaska LAN.

Files in this backup are used by QEMU (Quick Emulator) instances on the LAM Alaska LAN during initialization.

Log