
How to get most out of Windows Azure Tables



Introduction

Windows Azure Storage is a scalable and durable cloud storage system in which applications can store data and access it from anywhere and at any time. Windows Azure Storage provides a rich set of data abstractions:

  • Windows Azure Blob – provides storage for large data items like files and allows you to associate metadata with them.
  • Windows Azure Drives – provides a durable NTFS volume for applications running in Windows Azure cloud.
  • Windows Azure Table – provides structured storage for maintaining service state.
  • Windows Azure Queue – provides asynchronous work dispatch to enable service communication.

This post will concentrate on Windows Azure Table, which supports massively scalable tables in the cloud. A single table can contain billions of entities and terabytes of data, and the system automatically scales out to meet the table’s traffic needs. However, the scale you can achieve depends on the schema you choose and the application’s access patterns. One of the goals of this post is to cover the best practices to follow and the pitfalls to avoid so that your application gets the most out of Table storage.

Table Data Model

For those who are new to Windows Azure Table, we would like to start off with a quick description of the data model. Since it is a non-relational storage system, a few concepts differ from a conventional database system.

To store data in Windows Azure Storage, you first need to get an account by signing up on the Windows Azure portal with your Live ID. Once you have completed registration, you can create storage and hosted services. The storage service creation process will request a storage account name, and this name becomes part of the host name you use to access Windows Azure Storage. The host name for accessing Windows Azure Table is <accountName>.table.core.windows.net.

While creating the account you also get to choose the geo location in which the data will be stored. We recommend that you collocate it with your hosted services. This is important for a couple of reasons – 1) applications will have fast network access to your data, and 2) the bandwidth usage in the same geo location is not charged.

Once you have created a storage service account, you will receive two 512 bit secret keys called the primary and secondary access keys. Either of these secret keys can be used to authenticate requests to the storage system by creating an HMAC SHA256 signature for the request. The signature is passed with each request to authenticate it. The reason for having two access keys is that it allows you to regenerate a key, by rotating between the primary and secondary access keys, without taking your live applications down.
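
For illustration only, here is a minimal sketch of how a request signature is computed with HMAC SHA256 from the base64-encoded account key. The exact string-to-sign canonicalization (HTTP verb, headers, canonicalized resource, and so on) is defined by the Shared Key authentication scheme and is normally built for you by the StorageClient library, so the stringToSign parameter below is a placeholder.

using System;
using System.Security.Cryptography;
using System.Text;

public static class RequestSigner
{
    // Sketch: compute the HMAC SHA256 signature that goes into the Authorization header.
    public static string SignRequest(string base64AccountKey, string stringToSign)
    {
        using (HMACSHA256 hmac = new HMACSHA256(Convert.FromBase64String(base64AccountKey)))
        {
            byte[] signature = hmac.ComputeHash(Encoding.UTF8.GetBytes(stringToSign));
            // The base64-encoded signature is sent with the request, for example:
            // Authorization: SharedKeyLite <accountName>:<signature>
            return Convert.ToBase64String(signature);
        }
    }
}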

Using this storage account, you can create tables that store structured data. A Windows Azure table is analogous to a table in conventional database system in that it is a container for storing structured data. But an important differentiating factor is that it does not have a schema associated with it. If a fixed schema is required for an application, the application will have to enforce it at the application layer. A table is scoped by the storage account and a single account can have multiple tables.

The basic data item stored in a table is called an entity. An entity is a collection of properties that are name-value pairs. Each entity has 3 fixed properties called PartitionKey, RowKey and Timestamp. In addition to these, a user can store up to 252 additional properties in an entity. If we were to map this to concepts in a conventional database system, an entity is analogous to a row and a property is analogous to a column. Figure 1 shows the above described concepts in a picture and more details can be found in our documentation “Understanding the Table Service Data Model”.


Figure 1 Table Storage Concepts

Every entity has 3 fixed properties:

  • PartitionKey – The first key property of every table. The system uses this key to automatically distribute the table’s entities over many storage nodes.
  • RowKey – A second key property for the table. This is the unique ID of the entity within the partition it belongs to. The PartitionKey combined with the RowKey uniquely identifies an entity in a table. The combination also defines the single sort order that is provided today, i.e. all entities are sorted (in ascending order) by (PartitionKey, RowKey).
  • Timestamp – Every entity has a version maintained by the system which is used for optimistic concurrency. Update and Delete requests by default send an ETag using the If-Match condition and the operation will fail if the timestamp sent in the If-Match header differs from the Timestamp property value on the server.

The PartitionKey and RowKey together form the clustered index for the table and by definition of a clustered index, results are sorted by <PartitionKey, RowKey>. The sort order is ascending.

Operations on Table

The following are the operations supported on tables:

  • Create a table or entity
  • Retrieve a table or entity, with filters
  • Update an entity
  • Delete a table or entity
  • Entity Group Transactions - These are transactions across entities in the same partition in a single table

Note: We currently do not support Upsert (Insert an entity or Update it if it already exists). We recommend that an application issue an update/insert first depending on what has the highest probability to succeed in the scenario and handle an exception (Conflict or ResourceNotFound) appropriately. Supporting Upsert is in our feature request list.
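
As a rough illustration of that recommendation, the sketch below (using the Movie entity and tableClient from the “Creating Tables and Inserting/Updating Entities” section later in this post) tries the insert first and falls back to an unconditional update when the service reports a Conflict; a real application should inspect the status code on the exception rather than treating every failure as a Conflict.

// Sketch: emulate "Upsert" by inserting first and updating if the entity already exists.
TableServiceContext insertContext = tableClient.GetDataServiceContext();
Movie movie = new Movie("Action", "Cop Out") { Rating = 4.5 };

try
{
    insertContext.AddObject(tableName, movie);
    insertContext.SaveChangesWithRetries();
}
catch (DataServiceRequestException)
{
    // Assume the failure is a Conflict (409): the entity already exists, so
    // update it instead. Use a fresh context so the failed insert is not re-sent.
    TableServiceContext updateContext = tableClient.GetDataServiceContext();
    updateContext.AttachTo(tableName, movie, "*");   // "*" skips the ETag check
    updateContext.UpdateObject(movie);
    updateContext.SaveChangesWithRetries(SaveChangesOptions.ReplaceOnUpdate);
}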

For more details on each of the operations, please refer to the MSDN documentation. Windows Azure Table uses WCF Data Services to implement the OData protocol. The wire protocol is ATOM-Pub. We also provide a StorageClient library in the Windows Azure SDK that provides some convenience in handling continuation tokens for queries (See “Continuation Tokens” below) and retries for operations.

The schema used for a table is defined as a .NET class with the additional DataServiceKey attribute specified which informs WCF Data Services that our key is <PartitionKey, RowKey>. Also note that all public properties are sent over the wire as properties for the entity and stored in the table.

[DataServiceKey("PartitionKey", "RowKey")]public class Movie {/// Movie Category is the partition key public string PartitionKey { get; set; }/// Movie Title is the row key public string RowKey { get; set; }public DateTime Timestamp { get; set; }public int ReleaseYear { get; set; }public double Rating { get; set; }public string Language { get; set; }public bool Favorite { get; set; }
 }

An example wire protocol (ATOM Pub) for inserting an entity with the above “Movie” definition is:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<entry xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
       xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
       xmlns="http://www.w3.org/2005/Atom">
  <title />
  <author>
    <name />
  </author>
  <updated>2010-10-16T15:48:53.0011614Z</updated>
  <id />
  <content type="application/xml">
    <m:properties>
      <d:Favorite m:type="Edm.Boolean">false</d:Favorite>
      <d:Language>English</d:Language>
      <d:PartitionKey>Action</d:PartitionKey>
      <d:Rating m:type="Edm.Double">4.5</d:Rating>
      <d:ReleaseYear m:type="Edm.Int32">2010</d:ReleaseYear>
      <d:Revenue m:type="Edm.Double">0</d:Revenue>
      <d:RowKey>Cop Out</d:RowKey>
      <d:Timestamp m:type="Edm.DateTime">0001-01-01T00:00:00</d:Timestamp>
    </m:properties>
  </content>
</entry>

Creating Tables and Inserting/Updating Entities

Below is a snippet of C# code that illustrates how to create a table and insert and update entities.

// Connection string is of the format:
// "DefaultEndpointsProtocol=http;AccountName=myaccount;AccountKey=mykey"
CloudStorageAccount account = CloudStorageAccount.Parse(cxnString);
CloudTableClient tableClient = account.CreateCloudTableClient();

// Create Movie Table
string tableName = "Movies";
tableClient.CreateTableIfNotExist(tableName);

TableServiceContext context = tableClient.GetDataServiceContext();

// Add movie object to the context
context.AddObject(tableName, new Movie("Action", "White Water Rapids Survival"));

// The object is not sent to the Table service until SaveChanges is called.
// SaveChangesWithRetries wraps SaveChanges but, as the name suggests, it also
// provides retries.
context.SaveChangesWithRetries();

// We should use a new DataServiceContext for this operation, but for brevity
// we will skip this best practice in the code snippet.
// Query for action movies that are rated > 4
var q = (from movie in context.CreateQuery<Movie>(tableName)
         where movie.PartitionKey == "Action" && movie.Rating > 4.0
         select movie).AsTableServiceQuery<Movie>();

// Make each movie that is returned in the result set my favorite.
// Using the AsTableServiceQuery extension above means that the iteration below
// handles continuation tokens since this is not a single point query.
// See the Queries section for more details on query efficiency and continuation tokens.
foreach (Movie movieToUpdate in q)
{
    // This sets the entity to be updated in the context; no request is sent
    // until SaveChanges is called. This issues an update with an optimistic
    // concurrency check. With the above query, the context tracks this entity
    // with the associated ETag value. The following update sets the If-Match
    // header so that the entity is updated only if the ETag matches the entity
    // representation on the server.
    movieToUpdate.Favorite = true;
    context.UpdateObject(movieToUpdate);
}

// The Batch SaveChangesOptions ensures an atomic transaction for all the updates
context.SaveChangesWithRetries(SaveChangesOptions.Batch);

Options for saving changes

SaveChangesOptions has the following options:

  • None - This default option results in sending each pending CUD (Create/Update/Delete) operation as an individual request to the service. If a single operation fails, an exception is thrown and the remaining pending operations are not sent to the service.
  • Batch - This option results in sending all CUD operations currently tracked by the context as a single atomic transaction to the Windows Azure Table service (see Entity Group Transaction for more details and the rules for a batch).
  • ContinueOnError - Results in sending each pending CUD operation as an individual request to the service. If an operation fails, the context records it and the remaining operations continue to be sent to the service.
  • ReplaceOnUpdate - By default, all updates are sent as “Merge” requests, i.e. if an update operation for an entity does not send a property, the existing property value is retained, since a merge replaces only those property values that are sent in the request. ReplaceOnUpdate instead replaces the entire entity, and hence any properties missing from the update are no longer stored. ReplaceOnUpdate can therefore be used to remove properties from an entity.

Some of these options can be combined for a given SaveChanges. For example:

SaveChangesWithRetries(SaveChangesOptions.ContinueOnError | SaveChangesOptions.ReplaceOnUpdate) implies that each pending CUD operation is sent as an individual request to the server and all update operations result in the entity being replaced on the server end rather than merged.

However, some combinations are invalid. For example, SaveChangesWithRetries(SaveChangesOptions.ContinueOnError | SaveChangesOptions.Batch) is rejected, since a batch is an atomic transaction and cannot continue past a failed operation.

Note: SaveChangesWithRetries is provided in the StorageClient library as a wrapper over WCF Data Services’ SaveChanges method. SaveChangesWithRetries, as the name suggests, handles retries on failures.

Entity Tracking

The above example shows an update of an entity that was first retrieved via a query. Entities need to be tracked by the context for updates or deletes to be issued, and hence we retrieved the entity to allow the context to track it (which it does by default). The purpose of tracking is to send the ETag condition, using the If-Match header, when an update or delete is issued. The operation succeeds only if the ETag that the context tracks for an entity matches the one on the server. The ETag for an entity is its Timestamp, as explained above, and the service changes it with every update.

But what if I want to skip the ETag check, i.e. unconditionally update/delete the entity? Or what if I already know the ETag and want to avoid issuing a query before I update or delete the entity? The WCF Data Services API provides an AttachTo method for this purpose. This method takes the table name, the entity and the ETag to track it with, and the context starts tracking the entity, which then allows updates and deletes to be issued on it. If “*” is used as the ETag value in AttachTo, an unconditional update request is sent to the service. A common mistake to be aware of is that AttachTo can throw an exception if the entity is already being tracked by the context. Here is a snippet of code that shows an unconditional delete:

Movie existingEntity = new Movie("Action", "Cop Out");

// AttachTo can throw an exception if the entity is already being tracked.
// We would need to handle that exception, which is excluded here for brevity.
context.AttachTo(tableName, existingEntity, "*");
context.DeleteObject(existingEntity);
context.SaveChangesWithRetries();

Null Values

The Windows Azure Table service ignores properties that are passed in with null values; they are not saved to the stored entity. For example, if Language is null when using the above entity definition, the OData protocol marks this property as having a “null” value and the wire protocol (OData’s ATOM Pub) will contain:

<d:Language m:null="true"/>

The Azure Table service ignores this “Language” property when it comes in and the entity representation in Windows Azure Tables will not have this property. If the SaveChangesOption is ReplaceOnUpdate, then since all null values are ignored, the end result is that the entity saved on the server side will not contain this property.
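
Putting these two behaviors together, one way to remove a property from a stored entity is to set it to null and replace the entity. The following is a minimal sketch, reusing the Movie entity, context and tableName from the earlier snippets:

// Return an empty result instead of throwing when the entity does not exist
// (see the Best Practices section below).
context.IgnoreResourceNotFoundException = true;

var q = (from m in context.CreateQuery<Movie>(tableName)
         where m.PartitionKey == "Action" && m.RowKey == "Cop Out"
         select m);

foreach (Movie movie in q)
{
    // Null properties are ignored by the service; combined with ReplaceOnUpdate,
    // the Language property is removed from the stored entity.
    movie.Language = null;
    context.UpdateObject(movie);
}

context.SaveChangesWithRetries(SaveChangesOptions.ReplaceOnUpdate);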

Queries

Windows Azure Table supports querying with various operators over key and non-key properties (see the MSDN documentation for more details). Query results are always sorted by PartitionKey, RowKey, since Windows Azure Table supports just one key, (PartitionKey, RowKey), and does not currently support secondary indexes. Queries to Windows Azure Table can be categorized into the following:

  • Single Entity (a.k.a. Point Queries): A point query retrieves a single entity by specifying a single PartitionKey and RowKey using equality predicates. Example:

var q = (from movie in context.CreateQuery<Movie>(tableName)
         where movie.PartitionKey == "Action" && movie.RowKey == "Terminator"
         select movie);

NOTE: Such queries by default throw “DataServiceQueryException” with error code “ResourceNotFound” when the entity does not exist.

  • Range Queries: A range query involves scanning a range of rows. It can be further categorized into the following:
    • Row Range Scan: The query results in scanning a range of rows within a single partition. Example:

var q = (from movie in context.CreateQuery<Movie>(tableName)
         where movie.PartitionKey == "Action"
               && movie.RowKey.CompareTo("Alien") >= 0
               && movie.RowKey.CompareTo("Terminator") <= 0
               && movie.Favorite
         select movie);

    • Partition Range Scan: The query results in scanning a range of rows that may span several partitions. Example:

var q = (from movie in context.CreateQuery<Movie>(tableName)
         where movie.PartitionKey.CompareTo("Action") >= 0
               && movie.PartitionKey.CompareTo("War") < 0
               && movie.Favorite
         select movie);

  • Full Table Scan: The query results in scanning the entire table, i.e. all rows across all partitions of the table. Four examples:

var q = (from movie in context.CreateQuery<Movie>(tableName)
         select movie);

var q = (from movie in context.CreateQuery<Movie>(tableName)
         where movie.PartitionKey.CompareTo("Action") != 0
         select movie);

var q = (from movie in context.CreateQuery<Movie>(tableName)
         where movie.Favorite
         select movie);

var q = (from movie in context.CreateQuery<Movie>(tableName)
         where movie.RowKey.CompareTo("Sh") >= 0 && movie.RowKey.CompareTo("Si") < 0
         select movie);

From a performance perspective, point queries are the best since they are clustered index lookups. The performance of a range query depends on the number of rows that have to be iterated over, not just the size of the final result set.

Here are some example queries, listed along with a description of their efficiency:

  1. PartitionKey == “SciFi” and RowKey == “Star Wars”: Single entity lookup, which is the most efficient query.
  2. PartitionKey == “SciFi” and “Sphere” ≤ RowKey ≤ “Star Wars”: Scans entities in a single partition; efficiency depends on the number of entities within this RowKey range that need to be iterated over.
  3. “Action” ≤ PartitionKey ≤ “Thriller”: Scans entities across multiple partitions; efficiency depends on the number of entities across these partitions that need to be iterated over.
  4. PartitionKey == “Action” || PartitionKey == “Thriller”: The current implementation of the LINQ OR predicate is not optimized to scan just the two partitions and will result in a full table scan. It is recommended to execute the two queries in parallel and merge the results on the client end (see the sketch after this list).
  5. “Cars” ≤ RowKey ≤ “Star Wars”: Scans the entire table since the PartitionKey has not been specified.
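
As a hedged sketch of the workaround for query 4 above, the two single-partition queries below are executed concurrently on separate contexts and their results merged on the client. It assumes the tableClient and tableName from the earlier snippets and .NET 4.0’s Task Parallel Library.

// Sketch: replace (PartitionKey == "Action" || PartitionKey == "Thriller") with
// two parallel single-partition queries whose results are merged on the client.
Func<string, List<Movie>> queryPartition = partitionKey =>
{
    // Use a separate context per query since DataServiceContext is not thread safe.
    TableServiceContext ctx = tableClient.GetDataServiceContext();
    return (from movie in ctx.CreateQuery<Movie>(tableName)
            where movie.PartitionKey == partitionKey
            select movie).AsTableServiceQuery().ToList();
};

Task<List<Movie>> actionTask = Task.Factory.StartNew(() => queryPartition("Action"));
Task<List<Movie>> thrillerTask = Task.Factory.StartNew(() => queryPartition("Thriller"));

List<Movie> merged = actionTask.Result.Concat(thrillerTask.Result).ToList();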

We mentioned earlier that query results are always returned sorted by combination of PartitionKey, RowKey. This sort characteristic along with the fact that there is no native support for prefix operations brings out certain interesting cases for queries which we shall try to cover via examples.

Example 1

Retrieve all action movies whose name begins with “Sh”

var q = (from movie in context.CreateQuery<Movie>(tableName)
         where movie.PartitionKey == "Action"
               && movie.RowKey.CompareTo("Sh") >= 0
               && movie.RowKey.CompareTo("Si") < 0
         select movie);

In this example we use “Sh” as the inclusive lower limit and “Si” as the upper limit. This gives us all action movies whose name starts with “Sh”. This query is efficient, since it is going to a single partition and doing a scan over a range of rows from “Sh” to “Si”.

Example 2

A table “RecentMovies” is used to store the latest movies, and the purpose is to display the movies such that the most recently released movie is displayed first. The PartitionKey is the release date in the format YYYY-MM-DD.

Let us take a small sample set of release dates sorted by ascending order:

PartitionKey = ReleaseDate    Movie Title
2009-01-20                    Movie1
2009-05-01                    Movie2
2010-01-31                    Movie3
2010-06-20                    Movie4
2010-10-23                    Movie5

Table 1 Rows sorted by ASC (Released Date)

Now, if the release date in Table 1 were used as the PartitionKey as is, the result set would come back in the wrong order for the intended query, i.e. descending by release date.

To meet the query requirement from this table, we need to use a PartitionKey sorted in the correct order we want the queries to be returned in. One option is to use a PartitionKey to represent the movie’s release date subtracted from a max value. This would result in restructuring the partition key as YYYY’-MM’-DD’ where:

YYYY’ = MaxYear-YYYY; MaxYear = 9999

MM’ = MaxMonth-MM; MaxMonth = 12

DD’ = MaxDate-DD; MaxDate = 31

PartitionKey = Formatted(ReleaseDate)    Movie Title
7989-02-08                               Movie5
7989-06-11                               Movie4
7989-11-00                               Movie3
7990-07-30                               Movie2
7990-11-11                               Movie1

Table 2 Rows sorted correctly to meet query need – Sorted By Formatted (Release Date)

An interesting thing to note here is that we format the month as MM and not just M (and similarly for the day), i.e. the components are fixed length. This is again driven by the fact that rows are sorted lexically. Let us assume that we did not use fixed length – Table 3 shows the resulting sort order, which is incorrect because lexically “7989-11-0” < “7989-2-8”.

PartitionKey = Formatted(ReleaseDate)    Movie Title
7989-11-0                                Movie3
7989-2-8                                 Movie5
7989-6-11                                Movie4
7990-11-11                               Movie1
7990-07-30                               Movie2

Table 3 Rows sorted incorrectly when fixed length is not used for the date components
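
The formatted PartitionKey can be produced with simple fixed-width formatting. Below is a minimal sketch; the helper name is hypothetical and the MaxYear/MaxMonth/MaxDate constants follow the convention above.

// Sketch: build a PartitionKey that sorts most-recent-first by subtracting each
// date component from its maximum value and padding to a fixed width.
static string ToDescendingReleaseKey(DateTime releaseDate)
{
    int year = 9999 - releaseDate.Year;    // YYYY' = MaxYear - YYYY
    int month = 12 - releaseDate.Month;    // MM'   = MaxMonth - MM
    int day = 31 - releaseDate.Day;        // DD'   = MaxDate - DD

    // Fixed-width formatting (D4/D2) is what keeps the lexical sort correct.
    return string.Format("{0:D4}-{1:D2}-{2:D2}", year, month, day);
}

// Example: ToDescendingReleaseKey(new DateTime(2010, 10, 23)) returns "7989-02-08".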

Large result sets and Continuation Tokens

Range queries can require multiple round trips to get the entire result set, because the service may return partial results along with a continuation token and expect the client to resend the query with that token to get the next set of results. The response that carries a continuation includes the token as custom headers. For a query over your entities, the custom headers representing a continuation token are:

  • x-ms-continuation-NextPartitionKey
  • x-ms-continuation-NextRowKey

The client should pass both these values back into the next query as HTTP query options, with the rest of the query remaining the same. The client will then get the next set of entities starting at the continuation token. The next query looks as follows:

http://<serviceUri>/TableName?<originalQuery>&NextPartitionKey=<someValue>&NextRowKey=<someOtherValue>

Note: when using AsTableServiceQuery(), the continuation tokens are hidden and are automatically passed back to the service by the storage client library to continue the query.
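
If you are not using AsTableServiceQuery(), the tokens can be handled by hand. The following is a rough sketch using the WCF Data Services QueryOperationResponse with the header names listed above; error handling is omitted and the Movie query is just an example.

// Sketch: manually follow continuation tokens for a range query.
var baseQuery = (DataServiceQuery<Movie>)
                (from movie in context.CreateQuery<Movie>(tableName)
                 where movie.PartitionKey == "Action"
                 select movie);

string nextPartitionKey = null;
string nextRowKey = null;

do
{
    DataServiceQuery<Movie> pageQuery = baseQuery;
    if (nextPartitionKey != null)
    {
        pageQuery = pageQuery.AddQueryOption("NextPartitionKey", nextPartitionKey);
        if (nextRowKey != null)
        {
            pageQuery = pageQuery.AddQueryOption("NextRowKey", nextRowKey);
        }
    }

    var response = (QueryOperationResponse<Movie>)pageQuery.Execute();
    foreach (Movie movie in response)
    {
        // process each entity...
    }

    // The continuation headers are absent when there are no more results.
    response.Headers.TryGetValue("x-ms-continuation-NextPartitionKey", out nextPartitionKey);
    response.Headers.TryGetValue("x-ms-continuation-NextRowKey", out nextRowKey);
} while (nextPartitionKey != null);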

Continuation tokens can be returned for multiple reasons:

  1. A response can contain at most 1000 entities. If your query results in more than that, a continuation token is returned.
  2. When $top (Take(N)) is used, a continuation token is returned that allows an application to implement pagination.
  3. If query execution does not complete within 5 seconds, partial results are returned with a continuation token.
  4. A continuation token may also be returned when a partition boundary is crossed. A partition boundary is the logical barrier between two partitions in the storage service. If the next entity to return is in a different partition which is served by a different server, a continuation token is returned with the partial results. It is expected that the application will resend the request with the continuation token to get the subsequent entities.

An application should expect continuation tokens and it should continue to reissue the request with the received continuation tokens to get the entire result set. It is normal to receive just a continuation token without any results. This can happen when:

  1. The filter does not match any entities in the partition range served by a server.
  2. The start of the query lands on an empty Range Partition. The service can maintain empty Range Partitions, either because all of the entities in an existing Range Partition were deleted or because the Range Partition was created to enable future scalability (load) on the table across more Range Partitions. If the query spans more than one partition and starts on such an empty Range Partition, a continuation token pointing to the next Range Partition is returned. This does not occur often, but it can occur as the service load balances itself, which is why your application must handle continuation tokens appropriately. In particular, when the query filter does not use an equality predicate on a single PartitionKey value, the request can be served by a server whose assigned partition range is valid for the query but contains no matching data.

Best Practices and Tips for Programming Windows Azure Tables

The following are best practices and tips for programming Windows Azure Tables:

  • DataServiceContext by default tracks entities that are added via AddObject or AttachTo, or returned via query results. This tracking is not thread safe, and hence it is highly recommended that a new DataServiceContext be used for every logical operation. This also reduces the overhead of having the context track the millions of entities that may be queried or inserted over the lifetime of an application.
    Note: A logical operation may include a query for an entity followed by an update/delete operation on the queried entity. This requires two requests to the service, but it is considered one logical operation.
  • Expect InvalidOperationException exception (with message “Context is already tracking a different entity with the same resource Uri”) when AttachTo is called on an entity that is already being tracked since an entity can be added for tracking only once. See this post for more details.
  • A DataServiceQueryException with error code “ResourceNotFound” is thrown by default when a point query is issued for an entity that does not exist (i.e. a query of the form PartitionKey == “foo” && RowKey == “bar”). This exception can be avoided by setting IgnoreResourceNotFoundException on the context, so that a failed entity lookup does not throw; instead, one can rely on an empty result set to indicate that the addressed entity does not exist.
  • When using the same DataServiceContext instance across all inserts/updates/deletes, if there is a failure in SaveChanges, the entity that failed continues to be tracked in the DataServiceContext and re-tried the next time SaveChanges is invoked. Example:
    try
    {
        // Add movie object that already exists
        context.AddObject(tableName, new Movie("Action", "Cop Out"));

        // Operation fails with “Conflict” error
        context.SaveChangesWithRetries();
    }
    catch (Exception e)
    {
        // exception handling excluded for brevity...
    }

    context.AddObject(tableName, new Movie("Action", "Law Abiding Citizen"));

    // Operation fails with “Conflict” error again since "Cop Out" is issued again
    context.SaveChangesWithRetries();

The above is an example where, if your application dealt with the conflict error of “Cop Out” in the try/catch, you would want to use a new DataServiceContext for the next SaveChangesWithRetries to have a clean set of changes to apply.

  • When using the StorageClient library to insert entities with SaveChangesWithRetries(), the operation is retried if there is a failure or timeout. If an insert succeeds on the Azure Table service, but the client issuing the request times out due to a network error, the client will retry the request. Since the entity was already inserted, the retry can result in a Conflict error being sent back; therefore SaveChangesWithRetries() can return a Conflict for an insert even though it actually succeeded. Similarly, if the client application deletes an entity with SaveChangesWithRetries() and the operation succeeds at the Azure Table service but a retry occurs from the client, the client may receive a ResourceNotFound. Therefore, when an application uses SaveChangesWithRetries(), or does its own retries after a timeout or intermittent network error, it needs to expect and handle these potentially confusing errors. Applications may have their own specific way of handling such errors. For example, some applications may ignore “Conflict” and “ResourceNotFound” errors, since as long as the entity was inserted or deleted nothing else needs to be done. However, certain applications may want to update the entity rather than insert on “Conflict” errors. For such an application, it may be beneficial to add a transaction-id property to the entity. The application can then retrieve the entity on Conflict and examine the transaction-id; if it matches, the prior SaveChanges was successful.
  • DataServiceContext has a MergeOption property which is used to control how the context handles the tracked entities.
    Example: context.MergeOption = MergeOption.NoTracking;
    The possible values are:
    • AppendOnly - When MergeOption on the context is set to AppendOnly and the entity is already tracked (by any of the previous retrieve or add operation with the same DataServiceContext object), then retrieving an entity again from server will not update the already tracked entity in the context. Since the ETag is not updated for the tracked entity, a subsequent update may result in PreCondition failure if there is an ETag mismatch.
    • OverwriteChanges - DataServiceContext always loads the entity instance from the server hence keeping it up-to-date and overwriting the previously tracked entity. Any changes done to properties are overwritten with the values returned from the server.
    • PreserveChanges - When an entity instance exists in the DataServiceContext, the properties are not overwritten with those returned from the server. So any property changes made to objects currently tracked in the DataServiceContext are preserved but the ETag is updated hence making it a good option when recovering from optimistic concurrency errors.
    • NoTracking - Entity instances are not tracked by the DataServiceContext. To update an entity on a context that is not tracking it requires the use of AttachTo, as shown in the code snippet above. Tracking is mandatory for Insert, Update and Delete operations. Since not tracking entities reduces client overhead (memory footprint), this is a good option when the scenario is to query entities with no intention of updating/deleting them or inserting new entities using this context. A short sketch of this pattern follows this list.
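
Here is a brief, hedged sketch of the NoTracking option combined with AttachTo for a later update; it reuses the Movie entity, tableClient and tableName from the earlier snippets.

// Sketch: query with NoTracking to reduce client overhead, then attach the
// entity to a fresh context only when an update is actually needed.
TableServiceContext queryContext = tableClient.GetDataServiceContext();
queryContext.MergeOption = MergeOption.NoTracking;

var favorites = (from movie in queryContext.CreateQuery<Movie>(tableName)
                 where movie.PartitionKey == "Action" && movie.Favorite
                 select movie).AsTableServiceQuery();

foreach (Movie movie in favorites)
{
    // The query context does not track entities, so use a new context and
    // AttachTo before issuing the update; "*" skips the ETag check here.
    TableServiceContext updateContext = tableClient.GetDataServiceContext();
    updateContext.AttachTo(tableName, movie, "*");
    movie.Rating = 5.0;
    updateContext.UpdateObject(movie);
    updateContext.SaveChangesWithRetries();
}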

Partitions

Our previous post on “Scalability Targets” described why the concept of a “Partition” is critical to scalability. Our system load balances partitions to meet the demands of the traffic, but a given partition is served by only a single server, and hence a single partition has some limits. In Windows Azure Table, the PartitionKey property is used as the partition key. All entities with the same PartitionKey value are clustered together and are served from a single server node. This allows the user to control entity locality by setting the PartitionKey values, and to perform Entity Group Transactions over entities in the same partition. Because a single partition has limitations, application owners should understand how their entities are accessed and what the workload requirements are in order to select a PartitionKey that allows the application to scale. Applications need to understand the workload directed at a given partition and stress test with a simulated peak workload to make sure they will get the desired results.


Figure 2 Concept of Partitioning

Figure 2 shows an example of a movie listing stored in a Windows Azure Table. The category is used as the PartitionKey. We see in this example that entities are sorted by <PartitionKey, RowKey> and that all entities with same PartitionKey values are stored together and served by a single server. All movies that are categorized as “Action” belong to a single partition. In addition each server has a range of partitions that it serves – Server A serves all entities with PartitionKey up to but not including “Comedy” and Server B serves all entities starting from “Comedy”.

Key Selection

There are well documented steps for key selection in a conventional database system. However, when dealing with a NoSQL system like Windows Azure Table, the procedure can be different and challenging due to the absence of features like secondary indexes, relationships, joins, order by, group by, etc. Choosing a key is important for an application to be able to scale well, which is why we have dedicated an entire section to it. In this section we will discuss the impact that different options can have and walk through some example scenarios.

The things to consider while selecting keys are:

  1. Entity Group Transactions
  2. Scalability
  3. Efficient queries

Entity Group Transactions

An Entity Group Transaction provides an atomic transaction over a group of entities in a single table that have the same PartitionKey value. Here are the rules for Entity Group Transactions:

  • A single batch transaction can have up to 100 entities.
  • Request size for a batch transaction should not exceed 4MB in size.
  • In a single batch transaction, an entity cannot repeat itself (i.e. the RowKey property should be different).

In many applications, multiple entities have to be inserted, updated, or deleted for a single logical request as an atomic transaction. For such scenarios, it is beneficial to have the same value for the PartitionKey across the group of entities that need to belong to an atomic transaction. Let us walk through an example where we consider Entity Group Transactions in the key selection process. The scenario is a video rental application which should track video rentals and also maintain a total count of rentals per user. In a conventional database system, this is usually maintained in two tables: Rentals and Members. The Rentals table stores the rental information of currently checked out movies and the Members table maintains the number of movies rented by each member. A stored procedure would then implement the transaction that inserts the rental information of currently checked out movies into the Rentals table and updates the rental count in the Members table. In Windows Azure Table, Entity Group Transactions, along with the fact that an Azure table is schema-less, can be utilized to store entities that belong to Rentals and Members in a single table. The PartitionKey is the MemberId and the RowKey depends on the type of entity –

  • “Member” - stores the member’s rental count and other personal member information
  • “Rental_<Movie Name>” – stores information about a currently checked out movie for that member.

Now, since the PartitionKey is the same for the entities representing member and rental information and they are stored in a single table, an Entity Group Transaction can be used to update the member information and keep the running count of the number of movies a member rents in a single request. Since Entity Group Transactions provide an atomic transaction for entities in the same partition (i.e. entities with the same PartitionKey value) in a single table, the application does not have to handle the scenario of one command succeeding while the other fails.
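
Below is a minimal sketch of such an Entity Group Transaction; the Member and Rental classes and the “MemberRentals” table name are hypothetical illustrations of the schema described above, and tableClient is assumed from the earlier snippets.

// Hypothetical entity shapes sharing the same PartitionKey (the MemberId).
[DataServiceKey("PartitionKey", "RowKey")]
public class Member
{
    public string PartitionKey { get; set; }   // MemberId
    public string RowKey { get; set; }         // "Member"
    public int RentalCount { get; set; }
}

[DataServiceKey("PartitionKey", "RowKey")]
public class Rental
{
    public string PartitionKey { get; set; }   // MemberId
    public string RowKey { get; set; }         // "Rental_<Movie Name>"
    public DateTime CheckedOutOn { get; set; }
}

// Sketch: record a rental and bump the member's rental count atomically.
TableServiceContext rentalContext = tableClient.GetDataServiceContext();
string memberId = "Member0001";   // hypothetical member id

// Retrieve the member row so the context tracks it (point query).
Member member = (from m in rentalContext.CreateQuery<Member>("MemberRentals")
                 where m.PartitionKey == memberId && m.RowKey == "Member"
                 select m).AsTableServiceQuery().First();

member.RentalCount += 1;
rentalContext.UpdateObject(member);

rentalContext.AddObject("MemberRentals", new Rental
{
    PartitionKey = memberId,
    RowKey = "Rental_Cop Out",
    CheckedOutOn = DateTime.UtcNow
});

// Batch sends the update and the insert as a single Entity Group Transaction,
// which is atomic because both entities share the same PartitionKey and table.
rentalContext.SaveChangesWithRetries(SaveChangesOptions.Batch);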

Scalability

As noted in our “Scalability Targets” post, a single partition can serve up to 500 entities per second, depending upon the workload. When an application has requirements that surpass that limit, it will need to be designed so that requests are distributed across a range of partitions rather than a single partition. Hence, scale is important to consider while selecting keys. The choices range from (a) having a single partition, by using the same PartitionKey value for all entities, to (b) having a unique PartitionKey value for every entity (i.e. every entity is in its own partition). There are advantages and disadvantages to each choice. Let us first evaluate the two extreme choices:

Single PartitionKey Value

Advantages:

  • Since all entities belong to a single partition, Entity Group Transactions (batch) can be used to insert/update/delete 100 entities at a time. Batching can considerably reduce costs, and the application can perform atomic operations across each batch of entities.

Disadvantages:

  • Does not scale beyond the limits of that of a single server since a partition can be served only by a single server. This is not suitable for scenarios where large workloads are expected.

For queries, range row scans can be fast, depending upon the size of the range, and they will be processed by a single server.

New PartitionKey Value for Every Entity

Advantages:

  • Scales very well since our system has the option to load balance several partitions – potentially every entity can be load balanced effectively (provided “Append only” pattern is avoided).

Disadvantages:

  • A row range query is expensive since it will result in a full table scan.
  • Entity Group Transaction (batch) is allowed only on the same partition in a single table - since all entities have unique PartitionKey value, batch requests cannot be issued.

For partition range queries, the partition range scans can be efficient, if the ranges are small, though more than one server may need to be visited to satisfy the query, and the query may require using continuation tokens to retrieve all of the results.

Both of the above choices are valid under different scenarios. Most applications have a variety of potential keys, and you need to ensure that the partitioning you select is scalable. Here are some procedures that can be followed to test whether the selected key allows your application to scale:

  • Once a PartitionKey is selected, go over the access patterns for the data to see whether this choice will result in partitions that become too hot to be served efficiently by a single server. One way to determine whether you would end up with a hot partition is to implement a table partition stress test. For this test, create a sample table using your keys and then exert your workload’s peak stress on a single partition to ensure that the table partition can provide the desired throughput for your application. You may wish to leave some margin of error in order to anticipate additional load your application may experience in the future. This is a common best practice for scale testing your application.
    NOTE: A single partition can serve up to 500 entities per second. However, this number depends on your access patterns. Also note that this scalability target is expressed in terms of entities, not requests to the server: a single batch request (Entity Group Transaction) can contain up to 100 entities, so a partition can handle up to 500/(# of entities in the batch) batch requests per second.
  • If the table partition stress test shows that your scenario requires more than 500 entities per second from a single partition, you should revisit your key choices in order to allow for more granularity. The manner in which the Azure Table service performs throttling is based on the current load of the service. This means it is possible to run a stress test and achieve more than 500 entities per second (as seen in many of our example charts below) with little or no throttling; however, this may not always be the case, as the service may experience higher traffic volumes in the future that could result in throttling. Overall, it is considered best practice to avoid requiring more than 500 entities per second from an individual partition.
  • If the Table partition stress test passes and you have not exceeded the recommended limits on each partition, then you are done. A test can be considered as successful if throttling is not experienced and the throughput was within the established limits. See “Timeouts and ServerBusy – Is it normal?” later in this post for more details.
  • If the table partition stress test does not pass, select a more fine-grained PartitionKey. This could be done either by choosing a different PartitionKey or by modifying the existing PartitionKey (for example, by concatenating it with the next key property). The purpose of this is to create more partitions so that a single partition does not become so hot that a large percentage of requests are continuously throttled.

Let us now take an example to see how scale can influence the key selection process. Our example scenario is a distributed application made up of various components (i.e. customer portal web roles, worker roles for data aggregation, web roles for reporting, etc.) with multiple instances of each role in a given deployment. Each component (role type) is assigned a unique name. Additionally, each instance of a given component has a unique instance id. We want to build a logging system that allows a user to perform error and usage reporting over each instance and each component. If we were to build such a logging system for our application to record messages into a table, we have a couple of options when choosing an appropriate PartitionKey:

  1. PartitionKey = Component Name: This seems like a good idea, since an application that has many components will be using multiple partitions – one for each component. This allows the service to scale these partitions out when various components start to log heavily. However, if a single component logs a lot of messages and it has many instances running, this partition will become hot and eventually be limited by the throughput of a single partition (i.e. a single partition is limited by the scale of a single server). Though each component has a unique name, all instances of one component share the same PartitionKey.
  2. PartitionKey = Component Name + Instance Id: Here the combination of component name and instance id allows us to scale better, since each instance of a component gets its own partition for logging. If the customer portal web role component has 100 instances, each instance will use its own value for PartitionKey. This allows the Windows Azure Table system to better load balance the partitions based on the traffic it sees.
  3. PartitionKey = Component Name + Instance Id + Bucket Id: Let us assume that even option 2 does not scale well, because every instance of this component logs more than a single partition can handle. Here, one option is to predetermine a number of buckets to distribute the messages into and then select a bucket at random for each log statement. The number of buckets can be determined by estimating the high end of the required throughput.

Note: The section titled “Queries” covers query efficiency in detail, but we want to quickly point out here that when using a PartitionKey made up of concatenated fields, you should choose an ordering based upon your dominant queries. In the 3rd example above, you are able to query for all entities with a given component name, instance id or bucket id by utilizing lexical comparisons; however, if you were to choose PartitionKey = Bucket Id + Component Name + Instance Id, you would be unable to efficiently query for all entities with a given instance id, since entities with that instance id span multiple buckets and components. You would be forced to perform a full table scan over non-indexed fields, which would have an adverse effect on performance.

Here are some examples of queries when using ComponentName + Instance Id + BucketId:

  • Query all entities that belong to component “ReportingRole”
var query = from entity in context.CreateQuery<Logs>(tableName)
            where entity.PartitionKey.CompareTo("ReportingRole;") >= 0
                  && entity.PartitionKey.CompareTo("ReportingRole<") < 0
            select entity;

  • Query all entities that belong to component “ReportingRole” and Instance 2
var query = from entity in context.CreateQuery<Logs>(tableName)
            where entity.PartitionKey.CompareTo("ReportingRole;Instance0002;") >= 0
                  && entity.PartitionKey.CompareTo("ReportingRole;Instance0002<") < 0
            select entity;
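
For completeness, here is a hedged sketch of how the concatenated PartitionKey might be composed on the write path; the Logs entity shape, the ‘;’ delimiter and the bucket count are illustrative assumptions consistent with the queries above.

// Sketch: compose PartitionKey = ComponentName;InstanceId;BucketId, picking a
// random bucket so writes from a chatty instance spread across partitions.
const int BucketCount = 8;               // sized from the estimated peak throughput
Random random = new Random();

string componentName = "ReportingRole";
string instanceId = "Instance0002";
int bucketId = random.Next(0, BucketCount);

Logs logEntry = new Logs
{
    PartitionKey = string.Format("{0};{1};Bucket{2:D2}", componentName, instanceId, bucketId),
    RowKey = DateTime.UtcNow.Ticks.ToString("D19"),   // add a unique suffix in practice
    Message = "Something happened"
};

context.AddObject(tableName, logEntry);
context.SaveChangesWithRetries();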

To summarize, depending on the estimated traffic, an application can be designed to use an appropriate number of partitions, which allows it to scale. This is a good time to mention the “Append Only” pattern, which one should avoid while designing the partitioning scheme.

Stay Away From “Append (or Prepend) Only” Pattern (if possible) For High Scale Access

Let us use an example to explain what an “Append Only” pattern is (a “Prepend Only” pattern differs only in the sort order). Assume that you have the following successive values for partition keys:

  1. 12:00:00
  2. 12:00:01
  3. 12:00:02
  4. 12:00:03
  5. 12:00:04
  6. 12:00:05
  7. 12:00:06
  8. 12:00:07

Then of course you know the next write will be 12:00:08, then 12:00:09, and so on. Each of the above represents a unique entity, and each entity is in its own partition. One may ask: since this follows our golden rule of scaling by spreading entities across different partitions, why should we avoid it for high scale scenarios?

For query efficiency, we group the entities into Range Partitions. For example, we might group them into the following 3 partitions (just as an example) based upon the load on the table:

Range Partition 1

  1. 12:00:00
  2. 12:00:01
  3. 12:00:02
  4. 12:00:03

Range Partition 2

     5. 12:00:04
     6. 12:00:05
     7. 12:00:06

Range Partition3

     8. 12:00:07

We do not by default create a Range Partition for every partition key value, because if we did, we would not get the efficiencies of range partition key queries. If you wanted to query for rows between 12:00:01 and 12:00:03, that would be efficient in the above example, since we have grouped those entities into a single range partition. When using range partitions, the writes for 12:00:08, then 12:00:09, all go to the same partition, which is the last partition in the table (Range Partition 3 in the example). This is what we call the “Append Only” pattern on a table. If the keys were inserted in descending order, it would be a “Prepend Only” pattern and all requests would fall in the first partition range.

The “Append Only” (or “Prepend Only”) pattern means using a lexically increasing (or decreasing) key for successive writes. Data is stored sorted by PartitionKey (and RowKey for tables). When the entities being inserted have partition keys that increase (or decrease) alphanumerically, the requests are always served by the last (or first) range partition of the table, which is hosted by a single server. Since the keys keep increasing (or decreasing), it is always the tail end (or the beginning) of this partition range that gets all the requests, so the write throughput achieved is limited to that of a single server, and load balancing will not help increase it. Bottom line: it is important to know that the “Append Only” (or “Prepend Only”) pattern limits write throughput to that of a single partition, in comparison to what can be achieved by writing across the whole key range.
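
One common way to keep time-ordered keys while avoiding the append-only pattern is to prefix the PartitionKey with a bucket derived from a hash, so successive writes land on different range partitions and time-range queries fan out across the buckets; the helper below is a hedged sketch with an illustrative bucket count and key layout.

// Sketch: spread time-ordered writes across a fixed number of buckets instead
// of always appending to the tail partition of the table.
static string ToBucketedPartitionKey(string timestampKey)
{
    const int BucketCount = 16;

    // A hash of the key picks the bucket. (string.GetHashCode is not guaranteed
    // to be stable across runtimes; use your own hash function for real data.)
    int bucket = Math.Abs(timestampKey.GetHashCode()) % BucketCount;

    // Queries over a time range then issue one range query per bucket and merge
    // the results on the client.
    return string.Format("{0:D2};{1}", bucket, timestampKey);
}

// Example: ToBucketedPartitionKey("12:00:08") might return "05;12:00:08".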

Selecting Partition Key for Efficient Queries

Windows Azure Tables provides one clustered index, and results are always sorted in ascending order of PartitionKey, RowKey. The absence of secondary indexes and of server-side sorting means that selecting PartitionKey values that provide efficient queries requires forethought about the queries your application will need. In the “Queries” section we discussed the efficiency of the various query types, the most efficient being a point (single entity) query. We recommend that high-frequency, latency-critical queries use a point query (i.e. a PartitionKey and RowKey lookup) or a PartitionKey filter over a small range of entities. Using the PartitionKey and RowKey in the query filter limits the query execution to a single entity or a subset of contiguous entities (depending on the condition used), thereby improving query performance. The following are some rough guidelines and suggestions for how to choose a PartitionKey for your table for efficient querying:

  1. First determine the important properties for your table. These are the properties frequently used as query filters.
  2. Pick the potential keys from these important properties.
  3. It is important to identify the dominant query for your application workload. From your dominant query, pick the properties that are used in the query filters.
  4. This is your initial set of key properties.
  5. Order the key properties by order of importance in your query.
  6. Do the key properties uniquely identify the entity?  If not, include a unique identifier in the set of keys.
  7. If you have only 1 key property, use it as the PartitionKey.
  8. If you have only 2 key properties, use the first as the PartitionKey, and the second as the RowKey.
  9. If you have more than 2 key properties, you can try to concatenate them into two groups – the first concatenated group is the PartitionKey, and the second one is the RowKey. With this approach, your application would need to understand that the PartitionKey for example contained two keys separated by a “-“. The key that is always known for frequently executed, latency sensitive, queries can be chosen as the prefix so that efficient prefix range queries can be executed on PartitionKey and/or RowKey.

In some cases, you may have more than one dominant query, and each may need to filter over a different set of properties. For example, in a micro-blogging site, the dominant queries for a given author may be:

  1. Retrieve blogs authored by a user by date: RowKey is <Date Published>;<Blog Title>
  2. Retrieve blogs authored by a user by category: RowKey is <Category>;<Blog Title>

In this case, the PartitionKey is the author, but there are two candidates for the RowKey. Using Entity Group Transactions, an application can easily maintain its own secondary index. There are two options for storing the data:

Option 1

Application can store the blog twice in a single transaction in the same partition – one entity with RowKey <Date Published>;<Blog Title> and second entity with RowKey <Category>;<Blog Title>.

Example:

PartitionKey    RowKey                     Date Published    Category    Text
Brady           20100920;RandyToVikings    2010/09/20        Sports      Randy has left the building…
Brady           Sports;RandyToVikings      2010/09/20        Sports      Randy has left the building…

Table 4 Entities stored twice to provide fast lookup by category and date

This allows a fast and easy way to query by date range or category. For example, a query for all “Sports” blogs by Brady is easy. For this, we use the prefix search pattern described in the Queries section.

var blogs = from blog in context.CreateQuery<Blogs>("Blogs")
            where blog.PartitionKey == "Brady"
                  && blog.RowKey.CompareTo("Sports;") >= 0
                  && blog.RowKey.CompareTo("Sports<") < 0
            select blog;

The trick is to use “delimiter + 1” (the character immediately after the delimiter) as the upper bound. In the above example, ‘;’ was the delimiter, and hence we used ‘<’ (the next character after ‘;’) in the last predicate.

Now to get all Brady’s blogs in December, we can again do an efficient search:

var blogs = from blog in context.CreateQuery<Blogs>("Blogs")
            where blog.PartitionKey == "Brady"
                  && blog.RowKey.CompareTo("20101201;") >= 0
                  && blog.RowKey.CompareTo("20101231<") < 0
            select blog;

Option 2

If storing entities multiple times is not feasible because of the size of each entity, then an application can insert an additional index row that provides the required lookup to retrieve the primary row that contains all the data. We can benefit from the fact that a table is schema-less by not storing all the properties in the index row.

PartitionKey    RowKey                     Date Published    Category    Text
Brady           20100920;RandyToVikings    2010/09/20        Sports      Randy has left the building…
Brady           Sports;RandyToVikings      2010/09/20

Table 5 Entity and index stored in the same table to provide fast lookup by category and date

For the above example, let us assume that the entity with RowKey <Date Published>;<Blog Title> is the primary row that contains all the data. We then store a second entity with RowKey <Category>;<Blog Title> and Date Published as its only non-key property (the second row in Table 5). The purpose of this second entity is to provide a lookup for the “Date Published”. Now when a query by category is executed, it can first retrieve the “Date Published” and then retrieve the entire entity containing the blog. The two queries are lookups and a good alternative to scanning large partitions. This is effectively storing and using a second index in the same partition of the table, and the index can be easily maintained by using Entity Group Transactions, as sketched below.
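
Here is a rough sketch of maintaining that index row with an Entity Group Transaction; the Blogs and BlogIndex classes are hypothetical shapes of the primary and index rows described above, and tableClient is assumed from the earlier snippets.

[DataServiceKey("PartitionKey", "RowKey")]
public class Blogs
{
    public string PartitionKey { get; set; }   // Author
    public string RowKey { get; set; }         // <Date Published>;<Blog Title>
    public string DatePublished { get; set; }
    public string Category { get; set; }
    public string Text { get; set; }
}

[DataServiceKey("PartitionKey", "RowKey")]
public class BlogIndex
{
    public string PartitionKey { get; set; }   // Author
    public string RowKey { get; set; }         // <Category>;<Blog Title>
    public string DatePublished { get; set; }  // the only non-key property stored
}

// Sketch: insert the primary row and its index row atomically.
TableServiceContext blogContext = tableClient.GetDataServiceContext();

blogContext.AddObject("Blogs", new Blogs
{
    PartitionKey = "Brady",
    RowKey = "20100920;RandyToVikings",
    DatePublished = "2010/09/20",
    Category = "Sports",
    Text = "Randy has left the building..."
});

blogContext.AddObject("Blogs", new BlogIndex
{
    PartitionKey = "Brady",
    RowKey = "Sports;RandyToVikings",
    DatePublished = "2010/09/20"
});

// Same table, same PartitionKey: Batch makes this a single atomic Entity Group Transaction.
blogContext.SaveChangesWithRetries(SaveChangesOptions.Batch);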

Performance Case Study Methodology

We are now going to examine the performance of Windows Azure Tables. Before we do that, we will briefly describe the methodology used to perform the experiments.

The following provides a description of the framework. A Silverlight application running as a web role is used to submit parameters for a test run. The test parameters are stored as a row in a table. A worker role with multiple instances continuously monitors this table, and when it sees a test submitted, it executes the test run. Depending on the test run parameters and the number of VMs and threads to use, the appropriate number of worker role instances queue up N threads that then issue requests against Windows Azure tables, blobs, or queues, depending on what is being tested. We ran the tests against small (called small in the graphs) and extra-large VMs (called XL in the graphs) in production.

A few notes on our application used here:

  • GUID was used for PartitionKey. For table tests that span across partitions, the partition key is a random GUID.
  • Each thread is provided a PartitionKey to work with. The row key used is JobID_VMID_WorkerId_Index.
    • JobId is an ID that uniquely identifies the test.
    • VMID is the id for the VM working
    • WorkerID is the id that each thread is assigned (1 through N where N is the maximum threads assigned per VM for the test)
    • Index is the unique id for each entity. Each thread maintains a counter which is incremented after every insert
  • Table entity has one non key property called “Payload” of string type.
  • All entities are of 1024 bytes and the payload was randomly generated
  • We use synchronous Table APIs.
  • We do not retry a request in these tests.
  • Each thread performs a series of synchronous requests, as soon as one request finishes another is sent out, except for requests that result in an exception being thrown.
  • In a typical application it is recommended to utilize an exponential backoff strategy in order to respond to the throughput of the service. However in the test below we wanted to concentrate on available throughput of the service for stressing the system, as such when the test encounters an exception it will only wait for 3 seconds before submitting the next request.
  • We use a new DataServiceContext for every logical operation. WCF Data Services recommends use of a new context for each logical operation since it is not thread safe and also reduces the overhead of tracking vast number of entities that a client may want to store. A logical operation may include a query for the entity followed by update/delete operation for the queried entity.
  • For Queries, we perform single entity (i.e. equality filter on PartitionKey and RowKey) GETs to compare throughputs. Each thread works on an assigned GUID as the partition key and retrieves each entity sequentially since the RowKey suffix is essentially a counter.
  • For multi-partition tests, each unique worker thread both reads and writes to its own partition using a randomly generated GUID as the partition key; for single-partition tests, all worker threads in all VMs use the same randomly generated GUID as the partition key. The multi-partition setup results in a well-distributed load across partitions, thus allowing our system to load balance.
  • We ran all tests in live production environment over a period of several days. As such there is some degree of variability in the data collected that reflects the behavior of the production environment; however we believe this provides an accurate picture of what to expect when designing solutions targeting Azure Storage (Blobs/ Tables / Queues).
  • We have followed the best practices posted here except for “To improve performance of ADO.NET Data Service deserialization” since this issue has been resolved in recent releases of WCF Data Services and in .NET 4.0.
    • We use the following setting for ServicePointManager:
      • ServicePointManager.Expect100Continue = false;
      • ServicePointManager.UseNagleAlgorithm = false;
      • ServicePointManager.DefaultConnectionLimit = 100;
    • We turn off expect 100 so that PUT and POST requests do not send the expect header and wait for 100-continue from the server before sending the payload body.
    • We turn off Nagling; we have covered in depth in this post how Nagling impacts small packets.
    • We set the default concurrent connection limit to 100.
  • We monitor and track throttling errors in all tests. We observed throttling in results for only Figures 8, 9 and 15, which stressed the load on a single partition over the scalability targets.
  • Our results were collected after our system reached a steady state. This warm up allowed our service to load balance enough times to meet its throughput needs, which is what would occur for a continuously running live hosted service. When you perform your own scalability tests, you should warm up your service to its steady state and run it that way for a while, then run your scalability tests using those storage accounts. We have seen some studies compare results when running from a cold start for just a few minutes, and hence they did not see as much throughput as we show here.

Performance Results and Tips

Selecting the Appropriate Request Size

A common question we get asked is how can we upload 1+ million entities into Windows Azure Tables as fast as possible from a worker role? The answer is Entity Group Transaction (aka batch). The follow-up question is how many entities should be batched together in a single request? To gain some insight into these questions we will examine the performance differences for various sized batch requests and also compare them with single entity requests.
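As a rough sketch of what such a batch upload looks like with the storage client library, the code below inserts a group of entities that share the same PartitionKey as a single Entity Group Transaction; the LogEntity type and helper method are hypothetical.

using System.Collections.Generic;
using System.Data.Services.Client;
using Microsoft.WindowsAzure.StorageClient;

// Hypothetical entity type; all entities in one batch must share the same PartitionKey.
public class LogEntity : TableServiceEntity
{
    public string Message { get; set; }
}

public static class BatchUploader
{
    // Inserts up to 100 entities (the Entity Group Transaction limit) in a single request.
    public static void InsertBatch(CloudTableClient tableClient, string tableName, IEnumerable<LogEntity> entities)
    {
        TableServiceContext context = tableClient.GetDataServiceContext();
        foreach (LogEntity entity in entities)
        {
            context.AddObject(tableName, entity);
        }

        // SaveChangesOptions.Batch sends all pending inserts as one Entity Group Transaction.
        context.SaveChangesWithRetries(SaveChangesOptions.Batch);
    }
}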

Let us start by comparing the latencies for single entity inserts against batch inserts using a single extra-large VM as the client. Figure 3 shows the latency observed for single entity gets and inserts. The y-axis shows the number of transactions during the run, and the x-axis shows the latency in milliseconds. The results show that the majority of gets complete under 40 milliseconds and the majority of inserts complete under 50 milliseconds.

Figure 3 Single entity latencies for inserts and gets

 
Figure 4 shows the latency for a batch of 100 entities executed serially on a single thread on a small VM. The x-axis shows latency in milliseconds and the y-axis shows the number of transactions during the run. We see that the majority of batch inserts complete within 1.2 seconds. For these results, we insert 100 entities into a table and all entities belong to the same partition. Comparing Figure 3 and Figure 4 shows the expected result: batching operations together reduces the latency to process a set of operations versus sending them to the server serially one at a time.

Figure 4 Insert Latency distribution for Batch of 100 Entities

Since batching helps for bulk inserts, let us look at what the right size for batch request may be. Figure 5 shows the average latency for batch requests of various sizes when executed serially using a single thread on small and extra-large VM. The x-axis shows the various batch sizes used and y-axis shows the latency for the requests in milliseconds.

Figure 5 Compare latencies of requests with various sized batch requests to a single partition

Figure 6 shows the average latency it takes for a single entity when part of a batch request. The x-axis shows the various batch sizes used and y-axis shows the latency for each entity (and not the entire batch request) in milliseconds. This data is derived from Figure 5 by dividing the batch transaction latency by number of entities in the batch transactions.

If an application is sensitive to latency, issuing multiple concurrent batch requests containing 20 inserts will be better than issuing a single request containing 40+ entities, because the per entity latency savings peaks out at batch requests of size 20. However, an application may want to batch more entities together to reduce transaction costs. This is because there is an economic benefit when utilizing higher batch sizes as they directly reduce the number of transactions against your storage account by a factor of N. Meaning a batch size of 50 is twice as expensive as batch size of 100 for the same number of aggregate entities, since it will result in twice the number of transactions. Overall, consider throughput, latency and costs when evaluating the batch size to use for your scenario.

Figure 6 Compare per entity latencies of requests for various sized batch requests to a single partition

In Figure 7, we compare the impact of batching on multiple partitions. The test compares different batch sizes on extra-large VM with small VM using 30 threads. The results show a similar trend as seen in the prior results; even with multiple partitions and multiple threads, the performance benefit of batching tends to peak out at around a batch size of 20 for our experimental workload.

The results also show a performance difference between using an extra-large VM and a small VM. A single extra-large VM provides a little more than 1.5 times the throughput of a small VM. The reason for the higher throughput on the extra-large VM is that extra-large VMs have more resources (memory, cores and network bandwidth) than a small VM to handle the 30 threads. In addition, because the small VM has a single core, .NET 3.5 uses the workstation GC, which suspends all threads during collection, and the lower available memory on small VMs can trigger GC more frequently than on extra-large VMs. Even so, there are cost differences between the small and extra-large VM, and it depends upon your application’s workload whether it will achieve a lower cost per transaction using several small VMs vs. fewer extra-large VMs.

Figure 7 Compare the performance of different sized batch requests to multiple partitions using 30 threads on 1 VM

Scalability of the table

The storage system monitors the usage patterns of the partitions. When it recognizes that there is a lot of traffic to some partitions, the system responds by automatically spreading them out across many storage servers. This results in lower latencies and higher overall throughput as the traffic load is now distributed across many servers. However, a partition (i.e. all entities with the same partition key) will be served by a single server to provide transaction ordering and strong consistency for the operations to that partition. It is important to note that the amount of data stored within a partition is not limited by the storage capacity of one storage node (we will have more information on this in a future post that provides an overview of our storage architecture).

Since entities with the same PartitionKey are stored together and served by a single server, it allows efficient querying within a partition. Your application can also benefit from efficient caching and other performance optimizations that are provided by data locality within a partition. There is a tradeoff here between trying to benefit from entity locality, where you get efficient queries over entities in the same partition, and the scalability of your table, where the more partitions your table has the easier it is for Windows Azure Table to spread the load out over many servers.
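As one illustration of this tradeoff, a hypothetical message-log schema might collocate all entities for a customer in one partition (efficient range queries for that customer) while spreading different customers across partitions (better load distribution); this is just one possible key choice, not a prescription.

using Microsoft.WindowsAzure.StorageClient;

// Hypothetical schema illustrating the locality vs. scalability tradeoff.
public class CustomerMessage : TableServiceEntity
{
    public CustomerMessage() { } // required for serialization

    public CustomerMessage(string customerId, string messageId)
    {
        PartitionKey = customerId; // locality: one customer's messages live in one partition
        RowKey = messageId;        // must be unique within that partition
    }

    public string Body { get; set; }
}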

Important: This prior posting gave the scalability targets of Windows Azure Tables. There we say that we are targeting a storage account to perform up to 5,000 operations per second (where an operation here is a single entity), and a single partition to perform up to 500 entities per second. The results we show in this section are higher than both of these. Our service allows a storage account and partition to go above these targets, based upon the excess load that the storage service can handle. But it is important to design your system within these targets, because if the storage service is under heavy load it will start to throttle the storage account and the partition for excess load over these targets. Also, these targets are not exact; they depend upon the exact mix and size of operations for your service. Therefore it is important to perform stress tests to see where the limit of a single partition is and whether your expected access patterns allow you to reach the expected targets.

To understand the impact of number of partitions on scalability, we compared the throughput of inserting single entities using 15 threads per extra-large VM into a single partition and multiple partitions. The results for Inserts and Gets are shown in Figure 8 and Figure 9 respectively. The x-axis shows the number of extra-large VMs used to generate the traffic and y-axis shows the entities inserted/retrieved per second. Figure 8 shows single entity inserts, whereas we will look at the throughput using batch requests later in the blog post.

Figure 8 shows that a single partition in a table served around 1000 entities per second at 5 VMs, with 0.0004% of the requests throttled (no throttling occurred at 2 VMs doing 600 entities per second). Then as we increased the load on the single partition, we saw around 0.03% of the requests throttled at 10 VMs, and 0.2% of the requests throttled at 16 VMs, all for the single partition results. These results show the importance of understanding the scalability targets of a single partition, and staying within them. Whereas for the multiple partition results, the throughput continued to scale up as the load increased as we added VMs, since the application distributed the requests across many partitions, and then the Table service was able to appropriately distribute the traffic across many storage nodes to meet the additional traffic demands.

Figure 8 Compare single entity insert throughput between single and multi-partition (Using 15 threads per extra-large VM)

Figure 9 compares the throughput we get for single entity GETs for single and multiple partitions. The results show that distributing the requests over multiple partitions provides higher throughput than a single partition for the same reasons as described for Figure 8. Comparing Figure 8 with Figure 9, we see as expected that GETs provide higher throughput than Inserts since writes are more expensive than reads.

Figure 9 Compare single entity GET throughput between single partition and multi-partition. 15 threads per extra-large VM

Now that we have established the throughput limits of a single partition and the importance of distributing the load across multiple partitions with single entity operations, let us examine the benefits of using batch operations.

Let us see the impact that concurrency (i.e. for our scenario the number of threads) has on throughput. We compare throughput by varying the number of threads issuing batch inserts of 100 entities, with each of the batch requests targeting a different partition. We issue all requests from a single VM. The x-axis in Figure 10 shows the number of threads that concurrently issue the batch insert requests and the y-axis shows the throughput in terms of entities/second. As we see in Figure 10, the throughput improves as we increase the number of concurrent batch insert requests from a single VM with batch requests targeting multiple partitions.

Figure 10 shows that beyond a certain threshold of concurrency (after 15 threads), the improvement flattens out and increasing concurrency does not help much for a single small VM. This is not due to a limit on the server end; rather, it illustrates the point at which the available processing resources of the VM become saturated (in this case a single core VM), whereas the throughput continues to go up with the increase in threads for the extra-large VM size. (A sketch of this test pattern follows Figure 10.)

Figure 10 Compare impact of threads issuing batch inserts (100 entities) targeting multiple partitions on a single VM
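A simplified sketch of this test pattern, several threads each issuing batch inserts against its own partition, is shown below; it reuses the hypothetical LogEntity type from the earlier sketch, and the thread count and batch size are illustrative.

using System;
using System.Data.Services.Client;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public static class ConcurrentBatchTest
{
    // Each thread writes batches of 100 entities to its own partition (a random GUID).
    public static void Run(CloudTableClient tableClient, string tableName, int threadCount, int batchesPerThread)
    {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                string partitionKey = Guid.NewGuid().ToString();
                for (int b = 0; b < batchesPerThread; b++)
                {
                    // New context per logical operation, as recommended earlier.
                    TableServiceContext context = tableClient.GetDataServiceContext();
                    for (int r = 0; r < 100; r++)
                    {
                        context.AddObject(tableName, new LogEntity
                        {
                            PartitionKey = partitionKey,
                            RowKey = string.Format("{0:D10}-{1:D3}", b, r),
                            Message = "payload"
                        });
                    }

                    context.SaveChangesWithRetries(SaveChangesOptions.Batch);
                }
            });
            threads[i].Start();
        }

        foreach (Thread t in threads)
        {
            t.Join();
        }
    }
}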

Figure 11 performs the same experiment as in Figure 10, but compares the impact of threads on GET throughput on multiple partitions from a single VM. Here we see a similar leveling off of throughput, but at a much smaller number of threads (6), for the same reasons.

Figure 11 Compare impact of threads on single entity GETs targeting multiple partitions on a single VM

In Figure 12 and Figure 13 we want to see the impact of increasing the number of VMs on batch inserts. We execute batch requests targeting multiple partitions and these batch requests are issued concurrently from 15 threads in Figure 12 and 30 threads in Figure 13. The x-axis shows the number of VMs from which requests were issued concurrently and y-axis shows the throughput in terms of entities/second.

Figure 12 Compare 100 batch upload throughput with varying VMs on multiple partitions, 15 threads per VM

Figure 12 and Figure 13 show that using batch requests and increasing the number of VMs continue to give higher throughput. This shows that the lower throughput seen in Figure 10 is due to client side bottleneck. Please note that the achieved throughput of 17,000 entities per second is a lot higher than the scalability targets we state for an account. As described earlier, the intention here is to show that the system can scale to higher throughputs if the excess throughput is available to be used in the system, but you should design your application within the scalability targets, since that is what is enforced for throttling when the system is under heavy load.

Figure 13 Compare 100 batch upload throughput with varying VMs on multiple partitions, 30 threads per VM

As shown in prior results, we observed high CPU usage on small VMs that limited their throughput as we increased the number of threads. In Figure 14 we compare the throughput for GETs on small and extra-large VMs with 30 threads per VM. The results show the higher throughputs achieved by using more VMs or having more processing power per VM (when using extra-large vs. small VM).

Figure 14 Single entity get throughput over multiple partitions, 30 threads per VM

A few questions these results may prompt:

  • What size of VM should my application use to get the best cost per transaction?
  • Is it better, in terms of cost, to use many smaller sized VMs or fewer larger VMs to achieve the same throughput?

The answer to these questions is very application dependent, since some applications may optimize their costs by using many smaller VMs, whereas others may do so with fewer larger VMs, depending upon the amount of VM cycles needed for processing each application-level transaction, or other application factors. The only way to tell is to stress test your application while varying the type of VM, the amount of concurrency in each VM, and other factors that can influence your design (e.g., how much batching to perform). Due to the various sizes of VMs available it is possible to create a custom solution that best meets the demands of your scenario, and to do so in an economical way.

Scalability Summary

To quickly summarize, for scalability:

  1. Collocate entities that need to be accessed together by assigning them the same PartitionKey value
  2. Understand the scalability of a single partition for your application’s traffic
  3. Distribute entities that have high traffic demands across multiple partitions – our system will monitor and automatically load balance to meet traffic needs
  4. Watch for client side bottlenecks. When looking at the scalability of the application, it is important to understand whether the application is experiencing a server side bottleneck or a client side bottleneck. If the client side is the bottleneck, then increasing the number of concurrent requests, the number of VMs used or the size of the VM can increase overall throughput.
  5. It is important to note that in our case study we utilized blocking synchronous calls and increased concurrency by increasing the number of threads for simplicity, since our focus was on showing the server side throughput Windows Azure Tables provides. However, for truly scalable applications it is recommended to use asynchronous operations rather than their synchronous counterparts. Asynchronous operations provide significant benefits in terms of scale, CPU utilization, and memory utilization.

Timeouts and ServerBusy – Is it normal?

A single server may serve many partitions. Our system monitors these partitions, and when a server is heavily loaded, our system load balances these partitions to meet the traffic needs of clients. The server may take 5-10 minutes to decide to load balance in order to not overreact to very small spikes in traffic. Therefore it is possible for clients to experience “Server Busy” or “Timeout” errors when a limit is hit, and it is expected that clients back off on such errors to allow the load to be reduced on the server. Once a server recognizes that a limit is being hit, load balancing kicks in, which involves reassignment of partition(s) to other servers. This reassignment may take 10-20 seconds before the partition is available again for serving requests, during which time a client may experience “Server Busy” or “Timeout” errors. The retries and exponential back-off strategy provided by the StorageClient (for SaveChangesWithRetries and CloudTableQuery) will help in handling such errors. Once the partitions are load balanced, the system provides higher throughput, since the partitions are served by different servers and can therefore process more requests across these partitions. However, since a single partition is always served by a single server, its limit is critical to understand for an application’s scale, as described earlier. A client that requires high throughput should design its application so that it uses multiple partitions to meet its scale needs. It is worthwhile to reiterate that “Server Busy” and “Timeout” errors can be experienced in the following cases:

  1. When a server reaches a limit on requests it can process. These requests can be to any/all partitions that it has been assigned to serve. Since load balancing does not kick in at the very instant the server sees spikes, a client can experience “Server busy” until the partitions are load balanced.
  2. During load balancing, which can take up to 10-20 seconds for a partition to be reassigned and made available again for processing requests
  3. A single partition is being asked to serve more requests than what a single partition can serve (see the target scalability limits described earlier)

Cases 1 and 2 should be considered normal; the system load balances to meet the traffic needs. However, in case 3, there is not much that the Azure Storage system can do, since a partition can be served only by a single server in order to efficiently provide strong consistency for your Windows Azure Table operations.

We recommend that clients inspect their data access patterns and schema to see under what load it experiences “Server Busy” and “Timeout”, as well as stress test a single partition for the application to see how many entities per second it can process. When scale is a concern, designing the schema such that traffic is distributed across various partitions allows our system to load balance.

To illustrate this behavior we designed a test to purposely “saturate” a single partition with numerous requests to force the service into the throttling behavior. In the test below we ran each VM with 10 concurrent worker threads, adding additional VMs to increase the overall workload on the partition. The x-axis in Figure 15 indicates the total number of threads issuing requests across all VMs. The left y-axis is the throughput in terms of successful entities/second, and the right y-axis is the percent of the overall requests that were throttled.

Figure 15 Saturation of a single partition with 100 batch requests (10 threads per VM and using 1, 2, 3, 4 and 5 VMs)

Figure 15 shows that between 20 and 30 threads the backend begins to throttle ~11% of the incoming requests. This is after it has reached 2,300 entities per second for 20 threads with almost no throttling. The results also show that there is little performance gained by overloading a single partition, as the service will respond to increased load by throttling a higher percentage of incoming requests. At 50 concurrent requests, we see approximately 2,500 successful entities per second, but also around 700 expected throttling failures per second (22% of the total 3,200 requests/second being sent by the client). As such it is best to observe best practices and utilize an exponential backoff strategy as well as multiple partitions (if applicable) to minimize throttled requests. This can be seen as the percentage of throttling continues to go up, whereas the successful request rate for the partition holds steady at above 2,300 entities per second.

Note: Another important point regarding this result is that, for billing purposes, the storage account includes transactions that result in expected throttling in addition to successful ones. All of these throttled requests are classified as expected and will be billed, since they are going over the scalability targets for a single partition and the client is not backing off. This provides an additional economic incentive to design an application’s tables in a scalable way, as well as to observe an exponential backoff strategy, in order to achieve the most economical throughput (see “Understanding Windows Azure Storage Billing” for more on billing).

When throttling occurs, applications should follow the best practices for backoff. To aid applications, the storage client library in the Windows Azure SDK provides retries for operations when using the SaveChangesWithRetries method and the CloudTableQuery class (see the Creating Tables and Inserting/Updating Entities section for example code). When using this method, the default exponential backoff strategy provided will retry a request 3 times, waiting an exponentially increasing amount of time between attempts. The default implementation is to use 3, 27, and 81 seconds as the backoffs between the 3 retries, with a maximum backoff of 90 seconds before failing the request. The storage client library also randomizes the backoff time by approximately +/- 20% for each interval to ensure that multiple simultaneous clients distribute their requests over time.
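For example, a client could be configured along the following lines; the retry count and backoff below simply mirror the defaults described above and can be tuned for your workload.

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class RetryConfiguration
{
    public static CloudTableClient CreateTableClient(CloudStorageAccount account)
    {
        CloudTableClient tableClient = account.CreateCloudTableClient();

        // 3 retries with an exponential backoff starting around 3 seconds; SaveChangesWithRetries
        // and CloudTableQuery honor this policy when requests are throttled or time out.
        tableClient.RetryPolicy = RetryPolicies.RetryExponential(3, TimeSpan.FromSeconds(3));

        return tableClient;
    }
}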

Conclusion

Windows Azure Table is a scalable and durable structured storage solution. It differs from a conventional database system and hence the rules of the game are different when designing your schema and selecting your keys. In Windows Azure Storage, a partition is the smallest unit of scale and the system load balances these partitions to meet the traffic demands. Every data object has a partition key which defines the partition. The PartitionKey property in a table entity is its partition key. All entities with the same PartitionKey value will belong to the same partition and will be served by a single server.

The key selection process is critical to scale and we covered some important concepts that aid in the selection process:

  • Entity Group Transactions a.k.a. batch helps in achieving higher throughput
    • Collocate entities that will be accessed together by assigning them the same PartitionKey value.
    • Select appropriate size of batch depending on what is critical – throughput or latency or cost.
    • This along with concurrency is a great option to upload a large number of entities, since you get high throughput, and batches are cost efficient since you pay for each batch request and not for the individual entities.
  • Scalability
    • Entities that demand high traffic throughput should be distributed across multiple partitions.
    • Adding more concurrent requests and VMs increases throughput
  • Queries
    • Strive to use single entity and small range based lookups which are the most efficient queries.
    • Avoid scans for dominant queries.
    • Expect continuation tokens when a query is not a single entity get (point query)
  • VM Size
    • For some scenarios your operations can become CPU bound, so ensure your deployment provides adequate CPU to maximize throughput. To constrain our tests, we did as many continuous synchronous operations as possible by adding threads, to focus on the throughput and the latency of the Windows Azure Table service. If your application requires dense computation on the entity data, then the number of VMs you need to achieve similar throughput will be larger. It is recommended to run stress tests to determine the throughput that a single VM, or multiple VMs, of different sizes gives you for your workload.
    • VM size dictates the CPU and memory resources available to the system. In the throughput case study above we compared small and extra-large VM sizes in order to show the performance on both the client side and the service side. In reality your scenario may not require the full capabilities of an extra-large VM, so you will want to examine the cost per transaction you get from using different sized VMs in order to get the most economical overall throughput for your application.
  • Use Asynchronous operations to better scale applications
    • Using asynchronous operations will reduce context switching and reduce CPU load, allowing your application to scale to a higher level of concurrency. This improvement will be more noticeable in more transactionally dense scenarios. (A brief sketch follows this list.)
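The following is a brief sketch of the asynchronous pattern, assuming the Begin/EndSaveChangesWithRetries overloads available in the storage client library; error handling is omitted for brevity.

using System.Data.Services.Client;
using Microsoft.WindowsAzure.StorageClient;

public static class AsyncInsertExample
{
    // Starts a batch insert without blocking the calling thread.
    public static void BeginBatchInsert(TableServiceContext context)
    {
        context.BeginSaveChangesWithRetries(
            SaveChangesOptions.Batch,
            asyncResult =>
            {
                // Runs on an I/O completion thread when the request finishes.
                DataServiceResponse response = context.EndSaveChangesWithRetries(asyncResult);
            },
            null);
    }
}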

Additional Resources

Joe Giardino, Jai Haridas, and Brad Calder


Windows Azure Storage Client Library: CloudBlob.DownloadToFile() may not entirely overwrite file contents



Update 3/09/2011:  The bug is fixed in the Windows Azure SDK March 2011 release.

Summary

There is an issue in the Windows Azure Storage Client Library that can lead to unexpected behavior when utilizing the CloudBlob.DownloadToFile() methods.

The current implementation of CloudBlob.DownloadToFile() does not erase or clear any preexisting data in the file. Therefore, if you download a blob which is smaller than the existing file, preexisting data at the tail end of the file will still exist.

Example: Let’s say I have a blob titled movieblob that currently contains all the movies that I would like to watch in the future. I want to download this blob to a local file, moviesToWatch.txt, which currently contains a lot of romantic comedies that my wife recently watched. However, when I overwrite that file with the action movies I want to watch (which happens to be a smaller list), the existing text is not completely overwritten, which may lead to a somewhat random movie selection.

moviesToWatch.txt

You've Got Mail;P.S. I Love You.;Gone With The Wind;Sleepless in Seattle;Notting Hill;Pretty Woman;The Runaway Bride;The Holiday;Little Women;When Harry Met Sally

movieblob

The Dark Knight;The Matrix;Braveheart;The Core;Star Trek 2:The Wrath of Khan;The Dirty Dozen;

moviesToWatch.txt (updated)

The Dark Knight;The Matrix;Braveheart;The Core;Star Trek 2:The Wrath of Khan;The Dirty Dozen;Woman;The Runaway Bride;The Holiday;Little Women;When Harry Met Sally

As you can see in the updated local moviesToWatch.txt file, the last section of the previous movie data still exists on the tail end of the file.

This issue will be addressed in a forthcoming release of the Storage Client Library.

Workaround

In order to avoid this behavior you can use the CloudBlob.DownloadToStream() method and pass in the stream for a file that you have already called File.Create on, see below.

using (var stream = File.Create("myFile.txt"))
{
    myBlob.DownloadToStream(stream);
}

To reiterate, this issue only affects scenarios where the file already exists, and the downloaded blob contents are less than the length of the previously existing file. If you are using CloudBlob.DownloadToFile() to write to a new file then you will be unaffected by this issue. Until the issue is resolved in a future release, we recommend that users follow the pattern above.

Joe Giardino

Windows Azure Storage Client Library: Potential Deadlock When Using Synchronous Methods


Update 11/06/11:  The bug is fixed in the Windows Azure SDK September release.


Summary

In certain scenarios, using the synchronous methods provided in the Windows Azure Storage Client Library can lead to deadlock. Specifically, scenarios where the system is using all of the available ThreadPool threads while performing synchronous method calls via the Windows Azure Storage Client Library. Examples of such behavior are ASP.NET requests served synchronously, as well as a simple worker client where you queue up N worker threads to execute. Note that if you are manually creating threads outside of the managed ThreadPool or executing code off of a ThreadPool thread then this issue will not affect you.

When calling synchronous methods, the current implementation of the Storage Client Library blocks the calling thread while performing an asynchronous call. This blocked thread will be waiting for the asynchronous result to return. As such, if one of the asynchronous results requires an available ThreadPool thread (e.g. MemoryStream.BeginWrite, FileStream.BeginWrite, etc.) and no ThreadPool threads are available, its callback will be added to the ThreadPool work queue to wait until a thread becomes available for it to run on. This leads to a condition where the calling thread is blocked until that asynchronous result (callback) unblocks it, but the callback will not execute until threads become unblocked; in other words the system is now deadlocked.

Affected Scenarios

This issue could affect you if your code is executing on a ThreadPool thread and you are using the synchronous methods from the Storage Client Library. Specifically, this issue will arise when the application has used all of its available ThreadPool threads. To find out if your code is executing on a ThreadPool thread you can check System.Threading.Thread.CurrentThread.IsThreadPoolThread at runtime. Some specific methods in the Storage Client Library that can exhibit this issue include the various blob download methods (file, byte array, text, stream, etc.)

Example

For example let’s say that we have a maximum of 10 ThreadPool threads in our system which can be set using ThreadPool.SetMaxThreads. If each of the threads is currently blocked on a synchronous method call waiting for the async wait handle to be set which will require a ThreadPool thread to set the wait handle, we are deadlocked since there are no available threads in the ThreadPool that can set the wait handle.

Workarounds

The following workarounds will avoid this issue:

  • Use asynchronous methods instead of their synchronous equivalents (i.e. use BeginDownloadByteArray/ EndDownloadByteArray rather than DownloadByteArray). Utilizing the Asynchronous Programming Model is really the only way to guarantee performant and scalable solutions which do not perform any blocking calls. As such, using the Asynchronous Programming Model will avoid the deadlock scenario detailed in this post.
  • If you are unable to use asynchronous methods, limit the level of concurrency in the system at the application layer to reduce the number of simultaneous in-flight operations, using a semaphore construct such as System.Threading.Semaphore or Interlocked.Increment/Decrement (a sketch follows below). An example of this would be to have each incoming client perform an Interlocked.Increment on an integer and a corresponding Interlocked.Decrement when the synchronous operation completes. With this mechanism in place you can limit the number of simultaneous in-flight operations below the limit of the ThreadPool and return “Server Busy” rather than blocking more worker threads. When setting the maximum number of ThreadPool threads via ThreadPool.SetMaxThreads, be cognizant of any additional ThreadPool threads you are using in the app domain via ThreadPool.QueueUserWorkItem or otherwise so as to accommodate them in your scenario. The goal is to limit the number of blocked threads in the system at any given point; therefore, make sure that you do not block the thread prior to calling synchronous methods, since that would result in the same number of overall blocked threads. Instead, when the application reaches its concurrency limit, ensure that no additional ThreadPool threads become blocked.
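A minimal sketch of such a counter-based gate is shown below; the limit of 48 is arbitrary and should be tuned to stay below your ThreadPool limit.

using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public static class ConcurrencyGate
{
    private static int inFlight = 0;
    private const int MaxInFlight = 48; // keep well below the ThreadPool thread limit

    // Returns false (caller should respond "Server Busy") instead of blocking yet another thread.
    public static bool TryDownload(CloudBlob blob, out byte[] data)
    {
        data = null;
        if (Interlocked.Increment(ref inFlight) > MaxInFlight)
        {
            Interlocked.Decrement(ref inFlight);
            return false;
        }

        try
        {
            data = blob.DownloadByteArray();
            return true;
        }
        finally
        {
            Interlocked.Decrement(ref inFlight);
        }
    }
}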

Mitigations

If you are experiencing this issue and the options above are not viable in your scenario, you might try one of the options below. Please ensure you fully understand the implications of the actions below as they will result in additional threads being created on the system.

  • Increasing the number of ThreadPool threads can mitigate this to some extent; however deadlock will always be a possibility without a limit on simultaneous operations.
  • Offload work to non ThreadPool threads (make sure you understand the implications before doing this, the main purpose of the ThreadPool is to avoid the cost of constantly spinning up and killing off threads which can be expensive in code that runs frequently or in a tight loop).

Summary

We are currently investigating long term solutions for this issue for an upcoming release of the Windows Azure SDK. As such if you are currently affected by this issue please follow the workarounds contained in this post until a future release of the SDK is made available. To summarize, here are some best practices that will help avoid the potential deadlock detailed above.

  • Use asynchronous methods for applications that scale – simply stated, synchronous calls do not scale well since they imply that the system must block a thread for some amount of time. For applications with low demand this is acceptable; however, threads are a finite resource in a system and should be treated as such. Applications that desire to scale should use simultaneous asynchronous calls so that a given thread can service many calls.
  • Limit concurrent requests when using synchronous APIs - Use semaphores/counters to control concurrent requests.
  • Perform a stress test - where you purposely saturate the ThreadPool workers to ensure your application responds appropriately.

References

MSDN ThreadPool documentation: http://msdn.microsoft.com/en-us/library/y5htx827(v=VS.90).aspx

Implementing the CLR Asynchronous Programming Model: http://msdn.microsoft.com/en-us/magazine/cc163467.aspx

Developing High-Performance ASP.NET Applications: http://msdn.microsoft.com/en-us/library/aa719969(v=VS.71).aspx

Joe Giardino

Changes in Windows Azure Storage Client Library – Windows Azure SDK 1.3


We recently released an update to the Storage Client library in SDK 1.3. We wanted to take this opportunity to go over some breaking changes that we have introduced and also list some of the bugs we have fixed (compatible changes) in this release.

Thanks for all the feedback you have been providing us via forums and this site, and as always please continue to do so as it helps us improve the client library.

Note: We have used Storage Client Library v1.2 to indicate the library version shipped with Windows Azure SDK 1.2 and v1.3 to indicate the library shipped with Windows Azure SDK 1.3

Breaking Changes

1. Bug: FetchAttributes ignores blob type and lease status properties

In Storage Client Library v1.2, a call to FetchAttributes never checks if the blob instance is of valid type since it ignores the BlobType property. For example, if a CloudPageBlob instance refers to a block blob in the blob service, then FetchAttributes will not throw an exception when called.

In Storage Client Library v1.3, FetchAttributes records the BlobType and LeaseStatus properties. In addition, it will throw an exception when the blob type property returned by the service does not match with type of class being used (i.e. when CloudPageBlob is used to represent block blob or vice versa).

2. Bug: ListBlobsWithPrefix can display same blob Prefixes multiple times

Let us assume we have the following blobs in a container called photos:

  • photos/Brazil/Rio1.jpg
  • photos/Brazil/Rio2.jpg
  • photos/Michigan/Mackinaw1.jpg
  • photos/Michigan/Mackinaw2.jpg
  • photos/Seattle/Rainier1.jpg
  • photos/Seattle/Rainier2.jpg

Now to list the photos hierarchically, I could use the following code to get the list of folders under the container “photos”. I would then list the photos depending on the folder that is selected.

IEnumerable<IListBlobItem> blobList = client.ListBlobsWithPrefix("photos/");
foreach (IListBlobItem item in blobList)
{
    Console.WriteLine("Item Name = {0}", item.Uri.AbsoluteUri);
}

The expected output is:

  • photos/Brazil/
  • photos/Michigan/
  • photos/Seattle/

However, assume that the blob service returns Rio1.jpg through Mackinaw1.jpg along with a continuation marker which an application can use to continue listing. The client library would then continue the listing with the server using this continuation marker, and with the continuation assume that it receives the remaining items. Since the prefix photos/Michigan is repeated again for Mackinaw2.jpg, the client library incorrectly duplicates this prefix. If this happens, then the result of the above code in Storage Client v1.2 is:

  • photos/Brazil/
  • photos/Michigan/
  • photos/Michigan/
  • photos/Seattle/

Basically, Michigan would be repeated twice. In Storage Client Library v1.3, we collapse this to always provide the same result for the above code irrespective of how the blobs may be returned in the listing.

3. Bug: CreateIfNotExist on a table, blob or queue container does not handle container being deleted

In Storage Client Library v1.2, when a deleted container is recreated before the service’s garbage collection finishes removing the container, the service returns HttpStatusCode.Conflict with StorageErrorCode as ResourceAlreadyExists and extended error information indicating that the container is being deleted. This error code is not handled by the Storage Client Library v1.2, and it instead returns false, giving the perception that the container exists.

In Storage Client Library v1.3, we throw a StorageClientException exception with ErrorCode = ResourceAlreadyExists and exception’s ExtendedErrorInformation’s error code set to XXXBeingDeleted (ContainerBeingDeleted, TableBeingDeleted or QueueBeingDeleted). This exception should be handled by the client application and retried after a period of 35 seconds or more.

One approach to avoid this exception while deleting and recreating containers/queues/tables is to use dynamic (new) names when recreating instead of using the same name.
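A rough sketch of handling this exception when you must reuse the same container name is shown below; the error code string and wait time follow the description above.

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public static class ContainerRecreation
{
    // Retries container creation while the service finishes deleting the old container.
    public static void CreateContainerWithRetry(CloudBlobContainer container)
    {
        while (true)
        {
            try
            {
                container.CreateIfNotExist();
                return;
            }
            catch (StorageClientException e)
            {
                if (e.ExtendedErrorInformation != null &&
                    e.ExtendedErrorInformation.ErrorCode == "ContainerBeingDeleted")
                {
                    // The old container is still being garbage collected; wait and retry.
                    Thread.Sleep(TimeSpan.FromSeconds(35));
                    continue;
                }

                throw;
            }
        }
    }
}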

4. Bug: CloudTableQuery retrieves up to 100 entities with Take rather than the 1000 limit

In Storage Client Library v1.2, using CloudTableQuery limits the query results to 100 entities when Take(N) is used with N > 100. We have fixed this in Storage Client Library v1.3 by setting the limit appropriately to Min(N, 1000) where 1000 is the server side limit.

5. Bug: CopyBlob does not copy metadata set on destination blob instance

As noted in this post, Storage Client Library v1.2 has a bug in which metadata set on the destination blob instance in the client is ignored, so it is not recorded with the destination blob in the blob service. We have fixed this in Storage Client Library v1.3, and if the metadata is set, it is stored with the destination blob. If no metadata is set, the blob service will copy the metadata from the source blob to the destination blob.

6. Bug: Operations that return PreconditionFailed and NotModified return BadRequest as the StorageErrorCode

In Storage Client Library v1.2, PreconditionFailed and NotModified errors lead to a StorageClientException with StorageErrorCode mapped to BadRequest.

In Storage Client Library v1.3, we have correctly mapped the StorageErrorCode to ConditionFailed.

7. Bug: CloudBlobClient.ParallelOperationThreadCount > 64 leads to NotSupportedException

ParallelOperationThreadCount controls the number of concurrent block uploads. In Storage Client Library v1.2, the value can be between 1 and int.MaxValue. But when a value greater than 64 was set, the UploadByteArray, UploadFile and UploadText methods would start uploading blocks in parallel but eventually fail with a NotSupportedException. In Storage Client Library v1.3, we have reduced the max limit to 64. A value greater than 64 will cause an ArgumentOutOfRangeException right upfront when the property is set.

8. Bug: DownloadToStream implementation always does a range download that results in md5 header not returned

In Storage Client Library v1.2, DownloadToStream, which is used by other variants – DownloadText, DownloadToFile and DownloadByteArray, always does a range GET by passing the entire range in the range header “x-ms-range”. However, in the service, using range for GETs does not return content-md5 headers even if the range encapsulates the entire blob.

In Storage Client Library v1.3, we now do not send the “x-ms-range” header in the above mentioned methods which allows the content-md5 header to be returned.

9. CloudBlob retrieved using container’s GetBlobReference, GetBlockBlobReference or GetPageBlobReference creates a new instance of the container.

In Storage Client Library v1.2, the blob instance always creates a new container instance that is returned via the Container property in CloudBlob class. The Container property represents the container that stores the blob.

In Storage Client Library v1.3, we instead use the same container instance which was used in creating the blob reference. Let us explain this using an example:

CloudBlobClient client = account.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference("blobtypebug");
container.FetchAttributes();
container.Attributes.Metadata.Add("SomeKey", "SomeValue");

CloudBlob blockBlob = container.GetBlockBlobReference("blockblob.txt");
Console.WriteLine("Are instances same={0}", blockBlob.Container == container);
Console.WriteLine("SomeKey metadata value={0}", blockBlob.Container.Attributes.Metadata["SomeKey"]);

For the above code, in Storage Client Library v1.2, the output is:

Are instances same=False
SomeKey metadata value=

This signifies that the blob creates a new instance that is then returned when the Container property is referenced. Hence the metadata “SomeKey” set is missing on that instance until FetchAttributes is invoked on that particular instance.

In Storage Client Library v1.3, the output is:

Are instances same=True
SomeKey metadata value=SomeValue

We set the same container instance that was used to get a blob reference. Hence the metadata is already set.

Due to this change, any code relying on the instances to be different may break.

10. Bug: CloudQueueMessage is always Base64 encoded allowing less than 8KB of original message data.

In Storage Client Library v1.2, the queue message is Base64 encoded which increases the message size by 1/3rd (approximately). The Base64 encoding ensures that message over the wire is valid XML data. On retrieval, the library decodes this and returns the original data.

However, we want to provide an alternative where data that is already valid XML is transmitted and stored in its raw format, so an application can store a full-size 8KB message.

In Storage Client Library v1.3, we have provided a flag “EncodeMessage” on the CloudQueue that indicates if it should encode the message using Base64 encoding or send it in raw format. By default, we still Base64 encode the message. To store the message without encoding one would do the following:

CloudQueue queue = client.GetQueueReference("workflow");
queue.EncodeMessage = false;

One should be careful when using this flag, so that your application does not turn off message encoding on an existing queue with encoded messages in it. The other thing to ensure when turning off encoding is that the raw message has only valid XML data since PutMessage is sent over the wire in XML format, and the message is delivered as part of that XML.

Note: When turning off message encoding on an existing queue, one can prefix the raw message with a fixed version header. Then when a message is received, if the application does not see the new version header, then it knows it has to decode it. Another option would be to start using new queues for the un-encoded messages, and drain the old queues with encoded messages with code that does the decoding.
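To sketch the version-header approach, the code below prefixes raw messages with a marker and falls back to Base64 decoding for older messages; the "v2|" prefix is purely illustrative.

using System;
using System.Text;
using Microsoft.WindowsAzure.StorageClient;

public static class QueueMigration
{
    private const string RawPrefix = "v2|"; // hypothetical marker for un-encoded messages

    public static void EnqueueRaw(CloudQueue queue, string xmlSafePayload)
    {
        queue.EncodeMessage = false;
        queue.AddMessage(new CloudQueueMessage(RawPrefix + xmlSafePayload));
    }

    // Handles both new raw messages and older Base64-encoded messages in the same queue.
    public static string Decode(CloudQueueMessage message)
    {
        string text = message.AsString;
        if (text.StartsWith(RawPrefix, StringComparison.Ordinal))
        {
            return text.Substring(RawPrefix.Length);
        }

        return Encoding.UTF8.GetString(Convert.FromBase64String(text));
    }
}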

11. Inconsistent escaping of blob name and prefixes when relative URIs are used.

In Storage Client Library v1.2, the rules used to escape a blob name or prefix provided to APIs like constructors, GetBlobReference, ListBlobXXX etc. are inconsistent when relative URIs are used. The relative URI for the blob or prefix passed to these methods is treated as an escaped or un-escaped string based on the input.

For example:

a) CloudBlob blob1 = new CloudBlob("container/space test", service);
b) CloudBlob blob2 = new CloudBlob("container/space%20test", service);
c) ListBlobsWithPrefix("container/space%20test");

In the above cases, v1.2 treats the first two as "container/space test". However, in the third, the prefix is treated as "container/space%20test". To reiterate, relative URIs are inconsistently evaluated as seen when comparing (b) with (c). (b) is treated as already escaped and stored as "container/space test". However, (c) is escaped again and treated as "container/space%20test".

In Storage Client Library v1.3, we treat relative URIs as literal, basically keeping the exact representation that was passed in. In the above examples, (a) would be treated as "container/space test" as before. The latter two i.e. (b) and (c) would be treated as "container/space%20test". Here is a table showing how the names are treated in v1.2 compared to in v1.3.

Method | BlobName or Prefix | V1.2 | V1.3
CloudBlobClient::GetBlobReference, CloudBlobDirectory::GetBlobReference, CloudBlobContainer::GetBlobReference | space test/blob1 | space test/blob1 | space test/blob1
CloudBlobClient::GetBlobReference, CloudBlobDirectory::GetBlobReference, CloudBlobContainer::GetBlobReference | space%20test/blob1 | space test/blob1 | space%20test/blob1
CloudBlobClient::ListBlobWithPrefix | space test/ | space test/ | space test/
CloudBlobClient::ListBlobWithPrefix | space%20test/ | space%20test/ | space%20test/
CloudBlobDirectory::ListBlob | space test/ | space test/ | space test/
CloudBlobDirectory::ListBlob | space%20test/ | space test/ | space%20test/

12. Bug: ListBlobsSegmented does not parse the blob names with special characters like ‘ ‘, #, : etc.

In Storage Client Library v1.2, folder names with certain special characters like #, ‘ ‘, : etc. are not parsed correctly for list blobs, leading to files not being listed. This gave the perception that blobs were missing. The problem was with the parsing of the response, and we have fixed it in v1.3.

13. Bug: DataServiceContext timeout is in seconds but the value is set as milliseconds

In Storage Client Library v1.2, the timeout is incorrectly set as milliseconds on the DataServiceContext. DataServiceContext treats the integer as seconds and not milliseconds. We now correctly set the timeout in v1.3.

14. Validation added for CloudBlobClient.WriteBlockSizeInBytes

WriteBlockSizeInBytes controls the block size when using block blob uploads. In Storage Client Library v1.2, there was no validation done for the value.

In Storage Client Library v1.3, we have restricted the valid range for WriteBlockSizeInBytes to be [1MB-4MB]. An ArgumentOutOfRangeException is thrown if the value is outside this range.
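For example, explicitly setting the block size to the top of the allowed range might look like the following; "account" is assumed to be an existing CloudStorageAccount.

CloudBlobClient blobClient = account.CreateCloudBlobClient();

// Must now fall within [1 MB, 4 MB]; values outside this range throw ArgumentOutOfRangeException.
blobClient.WriteBlockSizeInBytes = 4 * 1024 * 1024;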

Other Bug Fixes and Improvements

1. Provide overloads to get a snapshot for blob constructors and methods that get reference to blobs.

Constructors include CloudBlob, CloudBlockBlob, CloudPageBlob and methods like CloudBlobClient’s GetBlobReference, GetBlockBlobReference and GetPageBlobReference all have overloads that take snapshot time. Here is an example:

CloudStorageAccount account = CloudStorageAccount.Parse(Program.ConnectionString);
CloudBlobClient client = account.CreateCloudBlobClient();
CloudBlob baseBlob = client.GetBlockBlobReference("photos/seattle.jpg");
CloudBlob snapshotBlob = baseBlob.CreateSnapshot();

// save the snapshot time for later use
DateTime snapshotTime = snapshotBlob.SnapshotTime.Value;

// Use the saved snapshot time to get a reference to the snapshot blob
// by utilizing the new overload and then delete the snapshot
CloudBlob snapshotReference = client.GetBlobReference("photos/seattle.jpg", snapshotTime);
snapshotReference.Delete();

2. CloudPageBlob now has a ClearPage method.

We have provided a new API to clear a page in page blob:

public void ClearPages(long startOffset, long length)

3. Bug: Reads using BlobStream issues a GetBlockList even when the blob is a Page Blob

In Storage Client Library v1.2, a GetBlockList is always invoked even when reading page blobs, leading to an expected exception that has to be handled by the code.

In Storage Client Library v1.3 we issue FetchAttributes when the blob type is not known. This avoids an erroneous GetBlockList call on a page blob instance. If the blob type is known in your application, please use the appropriate blob type to avoid this extra call.

Example: when reading a page blob, the following code will not incur the extra FetchAttributes call since the BlobStream was retrieved using a CloudPageBlob instance.

CloudPageBlob pageBlob = container.GetPageBlobReference("mypageblob.vhd");
BlobStream stream = pageBlob.OpenRead();
// Read from stream…

However, the following code would incur the extra FetchAttributes call to determine the blob type:

CloudBlob pageBlob = container.GetBlobReference("mypageblob.vhd");
BlobStream stream = pageBlob.OpenRead();
// Read from stream…

4. Bug: WritePages does not reset the stream on retries

As we had posted on our site, in Storage Client Library v1.2 the WritePages API does not reset the stream on a retry, leading to exceptions. We have fixed this when the stream is seekable, and we reset the stream back to the beginning when doing the retry. For non-seekable streams, we throw a NotSupportedException.

5. Bug: Retry logic retries on all 5XX errors

In Storage Client Library v1.2, HttpStatusCode NotImplemented and HttpVersionNotSupported result in a retry. There is no point in retrying on such failures. In Storage Client Library v1.3, we throw a StorageServerException and do not retry.

6. Bug: SharedKeyLite authentication scheme for Blob and Queue throws NullReference exception

We now support SharedKeyLite scheme for blobs and queues.

7. Bug: CreateTableIfNotExist has a race condition that can lead to an exception “TableAlreadyExists” being thrown

CreateTableIfNotExist first checks for the table existence before sending a create request. Concurrent requests of CreateTableIfNotExist can lead to all but one of them failing with TableAlreadyExists and this error is not handled in Storage Client Library v1.2 leading to a StorageException being thrown. In Storage Client Library v1.3, we now check for this error and return false indicating that the table already exists without throwing the exception.

8. Provide property for controlling block upload for CloudBlob

In Storage Client Library v1.2, block upload is used only if the upload size is 32MB or greater. Up to 32MB, all blob uploads via UploadByteArray, UploadText, UploadFile are done as a single blob upload via PutBlob.

In Storage Client Library v1.3, we have preserved the default behavior of v1.2 but have provided a property called SingleBlobUploadThresholdInBytes which can be set to control what size of blob will upload via blocks versus a single put blob. For example, the following setting will upload all blobs up to 8MB as a single put blob and will use block upload for blob sizes greater than 8MB.

CloudBlobClient.SingleBlobUploadThresholdInBytes = 8 * 1024 *1024;

The valid range for this property is [1MB – 64MB).

Jai Haridas

Page Blob Writes in Windows Azure Storage Client Library does not support Streams with non-zero Position



Update 3/09/2011:  The bug is fixed in the Windows Azure SDK March 2011 release.

The current Windows Azure Storage Client Library does not support passing in a stream to CloudPageBlob.[Begin]WritePages where the stream position is a non-zero value. In such a scenario the Storage Client Library will incorrectly calculate the size of the data range which will cause the server to return HTTP 500: Internal Server Error. This is surfaced to the client via a StorageServerException with the message "Server encountered an internal error. Please try again after some time.”

HTTP 500 errors are generally retryable by the client as they relate to an issue on the server side, however in this instance it is the client which is supplying the invalid request. As such this request will be retried by the storage client library N times according to the RetryPolicy specified on the CloudBlobClient (default is 3 retries with an exponential backoff delay). However these requests will not succeed and all subsequent retries will fail with the same error.

Workarounds

In the code below I have included a set of extension methods that provide a safe way to invoke [Begin]WritePages which will throw an exception if the source stream is at a non-zero position.  You can alter these methods for your specific scenario in case you wish to possibly rewind the stream. Future releases of Storage Client Library will accommodate for this scenario as we continue to expand support for PageBlob at the convenience Layer.

public static class StorageExtensions
{
    /// <summary>
    /// Begins an asynchronous operation to write pages to a page blob while enforcing that the stream is at the beginning
    /// </summary>
    /// <param name="pageData">A stream providing the page data.</param>
    /// <param name="startOffset">The offset at which to begin writing, in bytes. The offset must be a multiple of 512.</param>
    /// <param name="callback">The callback delegate that will receive notification when the asynchronous operation completes.</param>
    /// <param name="state">A user-defined object that will be passed to the callback delegate.</param>
    /// <returns>An <see cref="IAsyncResult"/> that references the asynchronous operation.</returns>
    public static IAsyncResult BeginWritePagesSafe(this CloudPageBlob blobRef, Stream pageData, long startOffset, AsyncCallback callback, object state)
    {
        if (pageData.Position != 0)
        {
            throw new InvalidOperationException("Stream position must be set to zero!");
        }

        return blobRef.BeginWritePages(pageData, startOffset, callback, state);
    }

    /// <summary>
    /// Writes pages to a page blob while enforcing that the stream is at the beginning
    /// </summary>
    /// <param name="pageData">A stream providing the page data.</param>
    /// <param name="startOffset">The offset at which to begin writing, in bytes. The offset must be a multiple of 512.</param>
    public static void WritePagesSafe(this CloudPageBlob blobRef, Stream pageData, long startOffset)
    {
        if (pageData.Position != 0)
        {
            throw new InvalidOperationException("Stream position must be set to zero!");
        }

        blobRef.WritePages(pageData, startOffset, null);
    }

    /// <summary>
    /// Writes pages to a page blob while enforcing that the stream is at the beginning
    /// </summary>
    /// <param name="pageData">A stream providing the page data.</param>
    /// <param name="startOffset">The offset at which to begin writing, in bytes. The offset must be a multiple of 512.</param>
    /// <param name="options">An object that specifies any additional options for the request.</param>
    public static void WritePagesSafe(this CloudPageBlob blobRef, Stream pageData, long startOffset, BlobRequestOptions options)
    {
        if (pageData.Position != 0)
        {
            throw new InvalidOperationException("Stream position must be set to zero!");
        }

        blobRef.WritePages(pageData, startOffset, options);
    }
}

Summary

The current Storage Client Library requires an additional check prior to passing in a Stream to CloudPageBlob.[Begin]WritePages in order to avoid producing an invalid request. Using the code above or applying similar checks at the application level can avoid this issue. Please note that other types of blobs are unaffected by this issue (i.e. CloudBlob.UploadFromStream) and we will be addressing this issue in a future release of the Storage Client Library.

Joe Giardino

Overview of Retry Policies in the Windows Azure Storage Client Library


The RetryPolicies in the Storage Client Library are used to allow the user to customize the retry behavior when an exception occurs. There are a few key points to take into consideration when using RetryPolicies: the first is when they are evaluated, and the second is what the ideal behavior for your scenario is.

When the Storage Client Library processes an operation which results in an exception, this exception is classified internally as either “retryable” or “non-retryable”.

  • “Non-retryable” exceptions are all 400-class (>= 400 and < 500) exceptions (Bad Request, Not Found, etc.) as well as 501 and 505.
  • All other exceptions are “retryable”. This includes client side timeouts.

Once an operation is deemed retryable, the Storage Client Library evaluates the RetryPolicy to see if the operation should be retried, and if so, how long it should back off (sleep) before executing the next attempt. One thing to note is that if an operation fails the first two times and succeeds on the third, the client will not see the exceptions, as all previous exceptions will have been caught. If the operation still fails on its last attempt, the last caught exception is rethrown to the client.

Also, please note that the specified timeout is applied to each attempt of a transaction; as such, an operation with a timeout of 90 seconds can actually take up to 90 * (N+1) seconds, where N is the number of retry attempts following the initial attempt.

Standard Retry Policies

There are three default RetryPolicies that ship with the Storage Client Library listed below. See http://msdn.microsoft.com/en-us/library/microsoft.windowsazure.storageclient.retrypolicies_members.aspx for full documentation

  • RetryPolicies.NoRetry– No retry is used
  • RetryPolicies.Retry– Retries N number of times with the same backoff between each attempt.
  • RetryPolicies.RetryExponential (Default) – Retries N number of times with an exponentially increasing backoff between each attempt. Backoffs are randomized with a +/- 20% delta to avoid numerous clients all retrying simultaneously. Additionally, each backoff is between 3 and 90 seconds per attempt (RetryPolicies.DefaultMinBackoff and RetryPolicies.DefaultMaxBackoff, respectively); as such, an operation can take longer than RetryPolicies.DefaultMaxBackoff. For example, let’s say you are on a slow edge connection and you keep hitting a timeout error. The first retry will occur ~3 seconds after the first failed attempt, the second will occur ~30 seconds after the first retry, and the third will occur roughly 90 seconds after that.

Creating a custom retry policy

In addition to using the standard retry policies detailed above, you can construct a custom retry policy to fit your specific scenario. Good examples are retrying only on specific exceptions or results, or providing an alternate backoff algorithm.

The RetryPolicy is actually a delegate that, when evaluated, returns a Microsoft.WindowsAzure.StorageClient.ShouldRetry delegate. This syntax may be a bit unfamiliar for some users; however, it provides a lightweight mechanism to construct stateful retry instances in a controlled manner. When each operation begins it will evaluate the RetryPolicy, which causes the CLR to create a state object behind the scenes containing the parameters used to configure the policy.

Example 1: Simple linear retry policy
public static RetryPolicy LinearRetry(int retryCount, TimeSpan intervalBetweenRetries)
{
    return () =>
    {
        return (int currentRetryCount, Exception lastException, out TimeSpan retryInterval) =>
        {
            // Do custom work here

            // Set backoff
            retryInterval = intervalBetweenRetries;

            // Decide if we should retry, return bool
            return currentRetryCount < retryCount;
        };
    };
}

The outer lambda conforms to the Microsoft.WindowsAzure.StorageClient.RetryPolicy delegate type; that is, a function that accepts no parameters and returns a Microsoft.WindowsAzure.StorageClient.ShouldRetry delegate.

The inner lambda conforms to the signature of the Microsoft.WindowsAzure.StorageClient.ShouldRetry delegate and contains the specifics of your implementation.

Once you have constructed a retry policy as above, you can configure your client to use it via Cloud[Table/Blob/Queue]Client.RetryPolicy = LinearRetry(retryCount, intervalBetweenRetries).

Example 2: Complex retry policy which examines the last exception and does not retry on 502 errors
public static RetryPolicy CustomRetryPolicy(int retryCount, TimeSpan intervalBetweenRetries, List<HttpStatusCode> statusCodesToFail)
{
    return () =>
    {
        return (int currentRetryCount, Exception lastException, out TimeSpan retryInterval) =>
        {
            retryInterval = intervalBetweenRetries;

            if (currentRetryCount >= retryCount)
            {
                // Retries exhausted, return false
                return false;
            }

            WebException we = lastException as WebException;
            if (we != null)
            {
                HttpWebResponse response = we.Response as HttpWebResponse;
                if (response != null && statusCodesToFail.Contains(response.StatusCode))
                {
                    // Found a status code to fail on, return false
                    return false;
                }
            }

            return currentRetryCount < retryCount;
        };
    };
}

 

Note the additional argument statusCodesToFail, which illustrates the point that you can pass in whatever additional data to the retry policy that you may require.
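
For example, a client could be configured to stop retrying as soon as the service returns a 502 while other errors continue to be retried; the retry count, interval, and the blobClient reference below are illustrative assumptions:

// Give up immediately on 502 (Bad Gateway) responses.
List<HttpStatusCode> stopOnStatusCodes = new List<HttpStatusCode>() { HttpStatusCode.BadGateway };

blobClient.RetryPolicy = CustomRetryPolicy(4, TimeSpan.FromSeconds(2), stopOnStatusCodes);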

Example 3: A custom Exponential backoff retry policy
public static RetryPolicy RetryExponential(int retryCount, TimeSpan minBackoff, TimeSpan maxBackoff, TimeSpan deltaBackoff)
{
    // Do any argument pre-validation here, i.e. enforce max retry count etc.
    return () =>
    {
        return (int currentRetryCount, Exception lastException, out TimeSpan retryInterval) =>
        {
            if (currentRetryCount < retryCount)
            {
                Random r = new Random();

                // Calculate exponential backoff with +/- 20% tolerance
                int increment = (int)((Math.Pow(2, currentRetryCount) - 1) * r.Next((int)(deltaBackoff.TotalMilliseconds * 0.8), (int)(deltaBackoff.TotalMilliseconds * 1.2)));

                // Enforce backoff boundaries
                int timeToSleepMsec = (int)Math.Min(minBackoff.TotalMilliseconds + increment, maxBackoff.TotalMilliseconds);

                retryInterval = TimeSpan.FromMilliseconds(timeToSleepMsec);
                return true;
            }

            retryInterval = TimeSpan.Zero;
            return false;
        };
    };
}

In Example 3 above we see code similar to the default exponential retry policy used by the Windows Azure Storage Client Library. Note the parameters minBackoff and maxBackoff: the policy calculates a desired backoff and then enforces the min/max boundaries on it. For example, the default minimum and maximum backoffs are 3 and 90 seconds respectively, which means that regardless of deltaBackoff or the calculated increment, the policy will only yield a backoff time between 3 and 90 seconds.

Summary

We strongly recommend using the exponential backoff retry policy provided by default whenever possible in order to gracefully back off the load to your account, especially if throttling occurs due to going over the scalability targets posted here. You can set this manually via [Client].RetryPolicy = RetryPolicies.RetryExponential(RetryPolicies.DefaultClientRetryCount, RetryPolicies.DefaultClientBackoff).

Generally speaking, high-throughput applications that make simultaneous requests and can absorb infrequent delays without adversely impacting user experience are recommended to use the exponential backoff strategy detailed above. However, for user-facing scenarios such as websites and UI you may wish to use a linear backoff in order to maintain a responsive user experience.
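
As a rough sketch of that guidance (the client names and values below are assumptions, not prescriptions):

// Background/batch worker: use the default exponential backoff to absorb throttling gracefully.
backgroundClient.RetryPolicy = RetryPolicies.RetryExponential(
    RetryPolicies.DefaultClientRetryCount, RetryPolicies.DefaultClientBackoff);

// Interactive front end: a short linear retry keeps worst-case latency predictable.
webFrontEndClient.RetryPolicy = RetryPolicies.Retry(3, TimeSpan.FromSeconds(1));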

Joe Giardino

References

Windows Azure Storage Team Blog

RetryPolicies on MSDN

Windows Azure Storage Client Library: Parallel Single Blob Upload Race Condition Can Throw an Unhandled Exception


Update 11/06/11:  The bug is fixed in the Windows Azure SDK September release.

There is a race condition in the current Windows Azure Storage Client Library that could potentially throw an unhandled exception under certain circumstances. Essentially, the parallel upload executes by dispatching up to N (N = CloudBlobClient.ParallelOperationThreadCount) simultaneous block uploads at a time and waiting on one of them to return via WaitHandle.WaitAny (note: CloudBlobClient.ParallelOperationThreadCount is initialized by default to the number of logical processors on the machine, meaning an XL VM will be initialized to 8). Once an operation returns it will attempt to kick off more operations until it satisfies the desired parallelism or there is no more data to write. This loop continues until all data is written and a subsequent PutBlockList operation is performed.

The bug is that there is a race condition in the parallel upload feature resulting in the termination of this loop before it gets to the PutBlockList. The net result is that some blocks will be added to a blob's uncommitted block list, but the exception will prevent the PutBlockList operation. Subsequently it will appear to the client as if the blob exists on the service with a size of 0 bytes. However, if you retrieve the block list you will be able to see the blocks that were uploaded to the uncommitted block list.
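
If you suspect a blob has hit this condition, one way to confirm it is to inspect its block lists. The following is a minimal sketch assuming the standard DownloadBlockList overloads of the current library, an existing blobClient, and an illustrative blob path:

CloudBlockBlob blob = blobClient.GetBlockBlobReference("mycontainer/myblob");

// A blob affected by this issue typically reports a size of 0 with no committed blocks,
// while the uncommitted block list contains the blocks that were uploaded.
foreach (ListBlockItem block in blob.DownloadBlockList(BlockListingFilter.Uncommitted))
{
    Console.WriteLine("Uncommitted block {0}, size {1} bytes", block.Name, block.Size);
}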

Mitigations

When looking at performance, it is important to distinguish between throughput and latency. If your scenario requires low latency for a single blob upload, then the parallel upload feature is designed to meet this need. To get around the above issue, which should be a rare occurrence, you could catch the exception and retry the operation using the current Storage Client Library. Alternatively, the following code can be used to perform the necessary PutBlock / PutBlockList operations for a parallel blob upload and work around this issue:

///Joe Giardino, Microsoft 2011

/// <summary>
/// Extension class to provide ParallelUpload on CloudBlockBlobs.
/// </summary>
public static class ParallelUploadExtensions
{
    /// <summary>
    /// Performs a parallel upload operation on a block blob using the associated service client configuration.
    /// </summary>
    /// <param name="blobRef">The reference to the blob.</param>
    /// <param name="sourceStream">The source data to upload.</param>
    /// <param name="blockIdSequenceNumber">The initial block ID; each subsequent block will increment this value.</param>
    /// <param name="options">BlobRequestOptions to use for each upload, can be null.</param>
    public static void ParallelUpload(this CloudBlockBlob blobRef, Stream sourceStream, long blockIdSequenceNumber, BlobRequestOptions options)
    {
        // Parameter Validation & Locals
        if (null == blobRef.ServiceClient)
        {
            throw new ArgumentException("Blob Reference must have a valid service client associated with it");
        }

        if (sourceStream.Length - sourceStream.Position == 0)
        {
            throw new ArgumentException("Cannot upload empty stream.");
        }

        if (null == options)
        {
            options = new BlobRequestOptions()
            {
                Timeout = blobRef.ServiceClient.Timeout,
                RetryPolicy = RetryPolicies.RetryExponential(RetryPolicies.DefaultClientRetryCount, RetryPolicies.DefaultClientBackoff)
            };
        }

        bool moreToUpload = true;
        List<IAsyncResult> asyncResults = new List<IAsyncResult>();
        List<string> blockList = new List<string>();

        using (MD5 fullBlobMD5 = MD5.Create())
        {
            do
            {
                int currentPendingTasks = asyncResults.Count;

                for (int i = currentPendingTasks; i < blobRef.ServiceClient.ParallelOperationThreadCount && moreToUpload; i++)
                {
                    // Step 1: Create block streams in a serial order as stream can only be read sequentially
                    string blockId = null;

                    // Dispense Block Stream
                    int blockSize = (int)blobRef.ServiceClient.WriteBlockSizeInBytes;
                    int totalCopied = 0, numRead = 0;
                    MemoryStream blockAsStream = null;
                    blockIdSequenceNumber++;

                    int blockBufferSize = (int)Math.Min(blockSize, sourceStream.Length - sourceStream.Position);
                    byte[] buffer = new byte[blockBufferSize];
                    blockAsStream = new MemoryStream(buffer);

                    do
                    {
                        numRead = sourceStream.Read(buffer, totalCopied, blockBufferSize - totalCopied);
                        totalCopied += numRead;
                    }
                    while (numRead != 0 && totalCopied < blockBufferSize);

                    // Update Running MD5 Hashes
                    fullBlobMD5.TransformBlock(buffer, 0, totalCopied, null, 0);
                    blockId = GenerateBase64BlockID(blockIdSequenceNumber);

                    // Step 2: Fire off consumer tasks that may finish on other threads
                    blockList.Add(blockId);
                    IAsyncResult asyncresult = blobRef.BeginPutBlock(blockId, blockAsStream, null, options, null, blockAsStream);
                    asyncResults.Add(asyncresult);

                    if (sourceStream.Length == sourceStream.Position)
                    {
                        // No more upload tasks
                        moreToUpload = false;
                    }
                }

                // Step 3: Wait for 1 or more put blocks to finish and finish operations
                if (asyncResults.Count > 0)
                {
                    int waitTimeout = options.Timeout.HasValue ? (int)Math.Ceiling(options.Timeout.Value.TotalMilliseconds) : Timeout.Infinite;
                    int waitResult = WaitHandle.WaitAny(asyncResults.Select(result => result.AsyncWaitHandle).ToArray(), waitTimeout);

                    if (waitResult == WaitHandle.WaitTimeout)
                    {
                        throw new TimeoutException(String.Format("ParallelUpload Failed with timeout = {0}", options.Timeout.Value));
                    }

                    // Optimize away any other completed operations
                    for (int index = 0; index < asyncResults.Count; index++)
                    {
                        IAsyncResult result = asyncResults[index];
                        if (result.IsCompleted)
                        {
                            // Dispose of memory stream
                            (result.AsyncState as IDisposable).Dispose();
                            asyncResults.RemoveAt(index);
                            blobRef.EndPutBlock(result);
                            index--;
                        }
                    }
                }
            }
            while (moreToUpload || asyncResults.Count != 0);

            // Step 4: Calculate MD5 and do a PutBlockList to commit the blob
            fullBlobMD5.TransformFinalBlock(new byte[0], 0, 0);
            byte[] blobHashBytes = fullBlobMD5.Hash;
            string blobHash = Convert.ToBase64String(blobHashBytes);
            blobRef.Properties.ContentMD5 = blobHash;
            blobRef.PutBlockList(blockList, options);
        }
    }

    /// <summary>
    /// Generates a unique Base64 encoded blockID.
    /// </summary>
    /// <param name="seqNo">The block's sequence number in the given upload operation.</param>
    /// <returns></returns>
    private static string GenerateBase64BlockID(long seqNo)
    {
        // 9 bytes needed since base64 encoding requires 6 bits per character (6*12 = 8*9)
        byte[] tempArray = new byte[9];

        for (int m = 0; m < 9; m++)
        {
            tempArray[8 - m] = (byte)((seqNo >> (8 * m)) & 0xFF);
        }

        return Convert.ToBase64String(tempArray);
    }
}

Note: In order to prevent potential block collisions when uploading to a pre-existing blob, use a non-constant blockIdSequenceNumber. To generate a random starting ID you can use the following code.

Random rand = new Random();
long blockIdSequenceNumber = (long)rand.Next() << 32;
blockIdSequenceNumber += rand.Next();

Instead of uploading a single blob in parallel, if your target scenario is uploading many blobs you may consider enforcing parallelism at the application layer. This can be achieved by performing a number of simultaneous uploads on N blobs while setting CloudBlobClient.ParallelOperationThreadCount = 1 (which causes the Storage Client Library not to use the parallel upload feature). When uploading many blobs simultaneously, applications should be aware that the largest blob may take longer than the smaller blobs; consider starting the upload of the larger blobs first. In addition, if the application is waiting on all blobs to be uploaded before continuing, then the last blob to complete may be the critical path and parallelizing its upload could reduce the overall latency.
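
A minimal sketch of that approach is shown below; it assumes .NET 4's Parallel.ForEach, an existing CloudStorageAccount (account), and illustrative paths and container names:

CloudBlobClient client = account.CreateCloudBlobClient();
client.ParallelOperationThreadCount = 1;   // each individual blob is uploaded serially

string[] files = Directory.GetFiles(@"C:\upload");

// Parallelism is applied across blobs rather than within a single blob.
Parallel.ForEach(
    files,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    file =>
    {
        CloudBlockBlob blob = client.GetBlockBlobReference("backups/" + Path.GetFileName(file));
        blob.UploadFile(file);
    });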

Lastly, it is important to understand the implications of using the parallel single blob upload feature at the same time as parallelizing multiple blob uploads at the application layer. If your scenario initiates 30 simultaneous blob uploads using the parallel single blob upload feature, the default CloudBlobClient settings will cause the Storage Client Library to use potentially 240 simultaneous put block operations (8 x 30) on a machine with 8 logical processors. In general it is recommended to use the number of logical processors to determine parallelism; in this case setting CloudBlobClient.ParallelOperationThreadCount = 1 should not adversely affect your overall throughput, as the Storage Client Library will already be performing 30 operations (in this case a put block) simultaneously. Additionally, an excessively large number of concurrent operations will have an adverse effect on overall system performance due to ThreadPool demands as well as frequent context switches. In general, if your application is already providing parallelism you may consider avoiding the parallel upload feature altogether by setting CloudBlobClient.ParallelOperationThreadCount = 1.

This race condition in parallel single blob upload will be addressed in a future release of the SDK. Please feel free to leave comments or questions,

Joe Giardino

Windows Azure Storage Client Library: Rewinding stream position less than BlobStream.ReadAheadSize can result in lost bytes from BlobStream.Read()



Update 3/09/2011: The bug is fixed in the Windows Azure SDK March 2011 release.

In the current Windows Azure storage client library, BlobStream.Read() may read less than the requested number of bytes if the user rewinds the stream position. This occurs when using the seek operation to a position which is equal or less than BlobStream.ReadAheadSize byte(s) away from the previous start position. Furthermore, in this case, if BlobStream.Read() is called again to read the remaining bytes, data from an incorrect position will be read into the buffer.

What does ReadAheadSize property do?

BlobStream.ReadAheadSize is used to define how many extra bytes to prefetch in a get blob request when BlobStream.Read() is called. This design is supposed to ensure that the storage client library does not need to send another request to the blob service if BlobStream.Read() is called again to read N bytes from the current position, where N < BlobStream.ReadAheadSize. It is an optimization for reading blobs in the forward direction, which reduces the number of get blob requests to the blob service when reading a blob.

This bug impacts users only if their scenario involves rewinding the stream to read, i.e. using the Seek operation to seek to a position that is BlobStream.ReadAheadSize bytes or less before the previous read's start offset.

The root cause of this issue is that the number of bytes to read is incorrectly calculated in the storage client library when the stream position is rewound by N bytes using Seek, where N <= BlobStream.ReadAheadSize bytes away from the previous read's start offset. (Note: if the stream is rewound more than BlobStream.ReadAheadSize bytes away from the previous start offset, the stream reads work as expected.)

To understand this issue better, let us explain this using an example of user code that exhibits this bug.

We begin by getting a BlobStream that we can use to read the blob, which is 16MB in size. We set the ReadAheadSize to 16 bytes. We then seek to offset 100 and read 16 bytes of data:

BlobStream stream = blob.OpenRead();
stream.ReadAheadSize = 16;
int bufferSize = 16;
int readCount;
byte[] buffer1 = new byte[bufferSize];
stream.Seek(100, System.IO.SeekOrigin.Begin);
readCount = stream.Read(buffer1, 0, bufferSize);

BlobStream.Read() works as expected here: buffer1 is filled with 16 bytes of the blob data from offset 100. Because ReadAheadSize is set to 16, the Storage Client issues a read request for 32 bytes of data, as seen in the “x-ms-range” header set to 100-131 in the request trace. The response, as we see in the Content-Length, returns the 32 bytes:

Request and response trace:

Request header:
GET http://foo.blob.core.windows.net/test/blob?timeout=90 HTTP/1.1
x-ms-version: 2009-09-19
User-Agent: WA-Storage/0.0.0.0
x-ms-range: bytes=100-131

Response header:
HTTP/1.1 206 Partial Content
Content-Length: 32
Content-Range: bytes 100-131/16777216
Content-Type: application/octet-stream

We will now rewind the stream to 10 bytes before the previous read's start offset (the previous start offset was 100, so the new offset is 90). It is worth noting that 10 is < ReadAheadSize, which exhibits the problem (note: if we had seeked more than ReadAheadSize bytes back from 100, then everything would work as expected). We then issue a Read for 16 bytes starting from offset 90.

byte[] buffer2 = new byte[bufferSize];
stream.Seek(90, System.IO.SeekOrigin.Begin);
readCount = stream.Read(buffer2, 0, bufferSize);

BlobStream.Read() does not work as expected here. It is called to read 16 bytes of the blob data from offset 90 into buffer2, but only 9 bytes of blob data are downloaded because the Storage Client has a bug in calculating the size it needs to read, as seen in the trace below. We see that x-ms-range is set to a 9 byte range (90-98 rather than 90-105) and the Content-Length in the response is set to 9.

Request and response trace:

Request header:
GET http://foo.blob.core.windows.net/test/blob?timeout=90 HTTP/1.1
x-ms-version: 2009-09-19
User-Agent: WA-Storage/0.0.0.0
x-ms-range: bytes=90-98

Response header:
HTTP/1.1 206 Partial Content
Content-Length: 9
Content-Range: bytes 90-98/16777216
Content-Type: application/octet-stream

Now, since the previous request for reading 16 bytes just returned 9 bytes, the client will issue another Read request to continue reading the remaining 7 bytes:

readCount = stream.Read(buffer2, readCount, bufferSize - readCount);

BlobStream.Read() still does not work as expected. It is called to read the remaining 7 bytes into buffer2, but the whole blob is downloaded, as seen in the request and response trace below. Due to a bug in the Storage Client, an invalid range (99-98, as shown in the request trace) is sent to the service, which causes the Windows Azure Blob service to return the entire blob content. Since the client does not check the returned range and expects the starting offset to be 99, it copies the 7 bytes from the beginning of the stream, which is incorrect.

Request and response trace:

Request header:
GET http://foo.blob.core.windows.net/test/blob?timeout=90 HTTP/1.1
x-ms-version: 2009-09-19
User-Agent: WA-Storage/0.0.0.0
x-ms-range: bytes=99-98

Response header:
HTTP/1.1 200 OK
Content-Length: 16777216
Content-Type: application/octet-stream

Mitigation

The workaround is to set the value of BlobStream.ReadAheadSize to 0 before BlobStream.Read() is called if a rewind operation is required:

BlobStream stream = blob.OpenRead();
stream.ReadAheadSize = 0;

As we explained above, the property BlobStream.ReadAheadSize is an optimization which can reduce the number of the requests to send when reading blobs in the forward direction, and setting it to 0 removes that benefit.

Summary

To summarize, the bug in the client library can result in data from an incorrect offset being read. This happens only when the user code seeks to a position less than the previous offset where the distance is <= ReadAheadSize. The bug will be fixed in a future release of the Windows Azure SDK and we will post a link to the download here once it is released.

Justin Yu


Blob Download Bug in Windows Azure SDK 1.5


Update: We have now released a fix for this issue.  The download blob methods in this version throw an IOException if the connection is closed while downloading the blob, which is the same behavior seen in versions 1.4 and earlier of the StorageClient library. 

We strongly recommend that users using SDK version 1.5.20830.1814 upgrade their applications immediately to the new version 1.5.20928.1904. You can determine if you have the affected version of the SDK by going to Programs and Features in the Control Panel and verifying the version of the Windows Azure SDK. If version 1.5.20830.1814 is installed, please follow these steps to upgrade:

  1. Click “Get Tools & SDK” on the  Windows Azure SDK download page.  You do not need to uninstall the previous version first.
  2. Update your projects to use the copy of Microsoft.WindowsAzure.StorageClient.dll found in C:\Program Files\Windows Azure SDK\v1.5\bin\

 

We found a bug in the StorageClient library in Windows Azure SDK 1.5 that impacts the DownloadToStream, DownloadToFile, DownloadText, and DownloadByteArray methods for Windows Azure Blobs. 

If a client is doing a synchronous blob download using the SDK 1.5 and its connection is closed, then the client can get a partial download of the blob. The problem is that the client does not get an exception when the connection is closed, so it thinks the full blob was downloaded.  For example, if the blob was 15MB, and the client downloaded just 1MB and the connection was closed, then the client would only have 1MB (instead of 15MB) and think that it had the whole blob.  Instead, the client should have gotten an exception. The problem only occurs when the connection to the client is closed, and only for synchronous downloads, but not asynchronous downloads. 

The issue was introduced in version 1.5 of the Azure SDK when we changed the synchronous download methods to call the synchronous Read API on the web response stream. We see that once response headers have been received, the synchronous read method on the .NET response stream does not throw an exception when a connection is lost and the blob content has not been fully received yet. Since an exception is not thrown, this results in the Download method behaving as if the entire download has completed and it returns successfully when only partial content has been downloaded.

The problem only occurs when all of the following are true:

  • A synchronous download method is used
  • At least the response headers are received by the client after which the connection to the client is closed before the entire content is received by the client

Notably, one scenario where this can occur is if the request timeout happens after the headers have been received, but before all of the content can be transferred. For example, if the client set the timeout to 30 seconds for download of a 100GB blob, then it’s likely that this problem would occur, because 30 seconds is long enough for the response headers to be received along with part of the blob content, but is not long enough to transfer the full 100GB of content.

This does not impact asynchronous downloads, because asynchronous reads from a response stream throw an IOException when the connection is closed.  In addition, calls to OpenRead() are not affected as they also use the asynchronous read methods.

We will be releasing an SDK hotfix for this soon and apologize for any inconvenience this may have caused. Until then, we recommend that customers use SDK 1.4 or the async methods to download blobs in SDK 1.5. Additionally, customers who have already started using SDK 1.5 can work around this issue by replacing their DownloadToStream, DownloadToFile, DownloadText, and DownloadByteArray calls with BeginDownloadToStream/EndDownloadToStream. This will ensure that an IOException is thrown if the connection is closed, similar to SDK 1.4. The following is an example showing you how to do that:

CloudBlob blob = new CloudBlob(uri);
blob.DownloadToStream(myFileStream); // WARNING: Can result in partial successful downloads

// NOTE: Use async method to ensure an exception is thrown if connection is 
// closed after partial download
blob.EndDownloadToStream(
blob.BeginDownloadToStream(myFileStream, null /* callback */, null /* state */));

If you rely on the text/file/byte array versions of download, we have the below extension methods for your convenience, which wraps a stream to work around this problem.

using System.IO;
using System.Text;
using Microsoft.WindowsAzure.StorageClient;

public static class CloudBlobExtensions
{
    /// <summary>
    /// Downloads the contents of a blob to a stream.
    /// </summary>
    /// <param name="target">The target stream.</param>
    public static void DownloadToStreamSync(this CloudBlob blob, Stream target)
    {
        blob.DownloadToStreamSync(target, null);
    }

    /// <summary>
    /// Downloads the contents of a blob to a stream.
    /// </summary>
    /// <param name="target">The target stream.</param>
    /// <param name="options">An object that specifies any additional options for the request.</param>
    public static void DownloadToStreamSync(this CloudBlob blob, Stream target, BlobRequestOptions options)
    {
        // Use the asynchronous download internally so that a closed connection surfaces as an IOException.
        blob.EndDownloadToStream(blob.BeginDownloadToStream(target, options, null, null));
    }

    /// <summary>
    /// Downloads the blob's contents.
    /// </summary>
    /// <returns>The contents of the blob, as a string.</returns>
    public static string DownloadTextSync(this CloudBlob blob)
    {
        return blob.DownloadTextSync(null);
    }

    /// <summary>
    /// Downloads the blob's contents.
    /// </summary>
    /// <param name="options">An object that specifies any additional options for the request.</param>
    /// <returns>The contents of the blob, as a string.</returns>
    public static string DownloadTextSync(this CloudBlob blob, BlobRequestOptions options)
    {
        Encoding encoding = Encoding.UTF8;
        byte[] array = blob.DownloadByteArraySync(options);
        return encoding.GetString(array);
    }

    /// <summary>
    /// Downloads the blob's contents to a file.
    /// </summary>
    /// <param name="fileName">The path and file name of the target file.</param>
    public static void DownloadToFileSync(this CloudBlob blob, string fileName)
    {
        blob.DownloadToFileSync(fileName, null);
    }

    /// <summary>
    /// Downloads the blob's contents to a file.
    /// </summary>
    /// <param name="fileName">The path and file name of the target file.</param>
    /// <param name="options">An object that specifies any additional options for the request.</param>
    public static void DownloadToFileSync(this CloudBlob blob, string fileName, BlobRequestOptions options)
    {
        using (var fileStream = File.Create(fileName))
        {
            blob.DownloadToStreamSync(fileStream, options);
        }
    }

    /// <summary>
    /// Downloads the blob's contents as an array of bytes.
    /// </summary>
    /// <returns>The contents of the blob, as an array of bytes.</returns>
    public static byte[] DownloadByteArraySync(this CloudBlob blob)
    {
        return blob.DownloadByteArraySync(null);
    }

    /// <summary>
    /// Downloads the blob's contents as an array of bytes.
    /// </summary>
    /// <param name="options">An object that specifies any additional options for the request.</param>
    /// <returns>The contents of the blob, as an array of bytes.</returns>
    public static byte[] DownloadByteArraySync(this CloudBlob blob, BlobRequestOptions options)
    {
        using (var memoryStream = new MemoryStream())
        {
            blob.DownloadToStreamSync(memoryStream, options);
            return memoryStream.ToArray();
        }
    }
}

Usage Examples:

blob.DownloadTextSync();
blob.DownloadByteArraySync();
blob.DownloadToFileSync(fileName);

Joe Giardino

SOSP Paper - Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency


We recently published a paper describing the internal details of Windows Azure Storage at the 23rd ACM Symposium on Operating Systems Principles (SOSP).

The paper can be found here. The conference also posted a video of the talk here, and the slides can be found here.

The paper describes how we provision and scale out capacity within and across data centers via storage stamps, and how the storage location service is used to manage our stamps and storage accounts. Then it focuses on the details for the three different layers of our architecture within a stamp (front-end layer, partition layer and stream layer), why we have these layers, what their functionality is, how they work, and the two replication engines (intra-stamp and inter-stamp). In addition, the paper summarizes some of the design decisions/tradeoffs we have made as well as lessons learned from building this large scale distributed system.

A key design goal for Windows Azure Storage is to provide Consistency, Availability, and Partition Tolerance (CAP) (all 3 of these together, instead of just 2) for the types of network partitioning we expect to see for our architecture. This is achieved by co-designing the partition layer and stream layer to provide strong consistency and high availability while being partition tolerant for the common types of partitioning/failures that occur within a stamp, such as node-level and rack-level network partitioning.

In this short conference talk we try to touch on the key details of how the partition layer provides an automatically load balanced object index that is scalable to 100s of billions of objects per storage stamp, how the stream layer performs its intra-stamp replication and deals with failures, and how the two layers are co-designed to provide consistency, availability, and partition tolerance for node- and rack-level network partitioning and failures.

Brad Calder

Windows Azure Storage Client for Java Blob Features


We have released the Storage Client for Java with support for Windows Azure Blobs, Queues, and Tables. Our goal is to continue to improve the development experience when writing cloud applications using Windows Azure Storage. As such, we have incorporated feedback from customers and forums for the current .NET libraries to help create a more seamless API that is both powerful and simple to use. This blog post serves as an overview of a few new features for Blobs that are currently unique to the Storage Client for Java, which are designed to address common scenarios when working with Cloud workloads.

MD5

One of the key pieces of feedback we get is to make working with MD5 easier and more seamless. For Java we have simplified this scenario to provide consistent behavior and simple configuration.

There are two different ways to use Content-MD5s in the Blob service: a transactional MD5 is used to provide data integrity during transport of blocks or pages of a blob, which is not stored with the blob, and an MD5 of the entire blob, which is stored with the blob and returned on subsequent GET operations (see the blog post here for information on what the server provides).

To make this easy, we have designed high level controls for common cross cutting scenarios that will be respected by every API. For example, no matter which API a user chooses to upload a blob (page or block) the MD5 settings will be honored. Additionally we have decoupled transactional MD5 (which is useful to ensure transport integrity of individual blocks and pages) and Blob Level MD5 which sets the MD5 value on the entire blob, which is then returned on subsequent GETs. 

The following example illustrates how to use BlobRequestOptions to utilize transactional content md5 to ensure that uploads and downloads are validated correctly. Note: transactional MD5 is not needed when using HTTPS as HTTPS provides its own integrity mechanism. Both the transactional MD5 and the full blob level MD5 are set to false (turned off) by default.  The following shows how to turn both of them on.

// Define BlobRequestOptions to use transactional MD5
BlobRequestOptions options = new BlobRequestOptions();
options.setUseTransactionalContentMD5(true);
options.setStoreBlobContentMD5(true); // Set full blob level MD5

blob.upload(sourceStream,
            blobLength,
            null /* AccessCondition */,
            options,
            null /* OperationContext */);

blobRef.download(outStream,
            null /* AccessCondition */,
            options,
            null /* OperationContext */);

 

Sparse Page Blob

The most common use for page blobs among cloud applications is to back a VHD (Virtual Hard Drive) image.  When a page blob is first created it exists as a range of zero filled bytes. The Windows Azure Blob service provides the ability to write in increments of 512 byte pages and keep track of which pages have been written to. As such it is possible for a client to know which pages still contain zero filled data and which ones contain valid data. 

We are introducing a new feature in this release of the Storage Client for Java which can omit 512 byte aligned ranges of zeros when uploading a page blob, and subsequently intelligently download only the non-zero data. During a download when the library detects that the current data being read exists in a zero’d region the client simply generates these zero’d bytes without making additional requests to the server. Once the read continues on into a valid range of bytes the library resumes making requests to the server for the non-zero’d data.

The following example illustrates how to use BlobRequestOptions to use the sparse page blob feature.

// Define BlobRequestOptions to use sparse page blob
BlobRequestOptions options = new BlobRequestOptions();
options.setUseSparsePageBlob(true);

blob.create(length);

// Alternatively could use blob.openOutputStream
blob.upload(sourceStream,
            blobLength,
            null /* AccessCondition */,
            options,
            null /* OperationContext */);

// Alternatively could use blob.openInputStream
blobRef.download(outStream,
            null /* AccessCondition */,
            options,
            null /* OperationContext */);

Please note this optimization works in chunked read and commit sizes (configurable via CloudBlobClient.setStreamMinimumReadSizeInBytes and CloudBlobClient.setPageBlobStreamWriteSizeInBytes respectively). If a given read or commit consists entirely of zeros then the operation is skipped altogether. Alternatively, if a given read or commit chunk consists of only a subset of non-zero data then the library may “shrink” the read or commit chunk by ignoring any beginning or ending pages which consist entirely of zeros. This allows us to optimize both cost (fewer transactions) and speed (less data) in a predictable manner.

Download Resume

Another new feature to this release of the Storage Client for Java is the ability for full downloads to resume themselves in the event of a disconnect or exception.  The most cost efficient way for a client to download a given blob is in a single REST GET call.  However if you are downloading a large blob, say of several GB, an issue arises on how to handle disconnects and errors without having to pre-buffer data or re-download the entire blob.  

To solve this issue, the blob download functionality will now check the retry policy specified by the user and determine if the user desires to retry the operation. If the operation should not be retried it will simply throw as expected, however if the retry policy indicates the operation should be retried then the download will revert to using a BlobInputStream positioned to the current location of the download with an ETag check. This allows the user to simply “resume” the download in a performant and fault-tolerant way. This feature is enabled for all downloads via the CloudBlob.download method.
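
Because the resume behavior is driven by the retry policy, no special code is required to opt in; a minimal sketch (the blobClient reference, blob path, and file name are assumptions) is simply:

// The client's retry policy (exponential backoff by default) determines whether an
// interrupted download is resumed from the current position with an ETag check.
CloudBlockBlob blob = blobClient.getBlockBlobReference("vhds/system.vhd");

OutputStream target = new FileOutputStream("system.vhd");
blob.download(target);
target.close();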

Best Practices

We’d also like to share some best practices for using blobs with the Storage Client for Java:

  • Always provide the length of the data being uploaded if it is available; alternatively a user may specify -1 if the length is not known. The length is needed to authenticate the request. Uploads that specify -1 will cause the Storage Client to pre-read the data to determine its length (and potentially to calculate MD5 if enabled). If the InputStream provided is not markable, BlobOutputStream is used.
  • Use markable streams (i.e. BufferedInputStream) when uploading blobs; see the sketch following this list. In order to support retries and to avoid having to prebuffer data in memory, a stream must be markable so that it can be rewound in the case of an exception to retry the operation. When the stream provided does not support mark, the Storage Client will use a BlobOutputStream which will internally buffer individual blocks until they are committed. Note: uploads that are over CloudBlobClient.getSingleBlobPutThresholdInBytes() (default is 32 MB, but can be set up to 64 MB) will also be uploaded using the BlobOutputStream.
  • If you already have the MD5 for a given blob you can set it directly via CloudBlob.getProperties().setContentMd5 and it will be sent on a subsequent Blob upload or by calling CloudBlob.uploadProperties(). This can potentially increase performance by avoiding a duplicate calculation of MD5.
  • Please note MD5 is disabled by default, see the MD5 section above regarding how to utilize MD5.
  • BlobOutputStreams commit size is configurable via CloudBlobClient.setWriteBlockSizeInBytes() for BlockBlobs and CloudBlobClient.setPageBlobStreamWriteSizeInBytes() for Page Blobs
  • BlobInputStreams minimum read size is configurable via CloudBlobClient.setStreamMinimumReadSizeInBytes()
  • For lower latency uploads BlobOutputStream can execute multiple parallel requests. The concurrent request count is default to 1 (no concurrency) and is configurable via CloudBlobClient.setConcurrentRequestCount(). BlobOutputStream is accessible via Cloud[Block|Page]Blob.openOutputStream or by uploading a stream that is greater than CloudBlobClient.getSingleBlobPutThresholdInBytes() for BlockBlob or 4 MB for PageBlob.
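
The following sketch illustrates the markable stream guidance from the list above (the blobClient reference, file name, and blob path are assumptions): wrap the file stream in a BufferedInputStream and pass the known length so the client can authenticate each request and retry failed blocks without prebuffering the whole upload in memory.

File source = new File("payload.bin");

// BufferedInputStream supports mark/reset, so individual blocks can be retried
// without buffering the entire upload in memory.
InputStream data = new BufferedInputStream(new FileInputStream(source));

CloudBlockBlob blob = blobClient.getBlockBlobReference("uploads/payload.bin");
blob.upload(data, source.length());
data.close();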

Summary

This post has covered a few interesting features in the recently released Windows Azure Storage Client for Java. We very much appreciate all the feedback we have gotten from customers and through the forums; please keep it coming. Feel free to leave comments below,

Joe Giardino
Developer
Windows Azure Storage

Resources

Get the Windows Azure SDK for Java

Learn more about the Windows Azure Storage Client for Java

Learn more about Windows Azure Storage

Windows Azure Storage Client for Java Tables Deep Dive


This blog post serves as an overview to the recently released Windows Azure Storage Client for Java which includes support for the Azure Table Service. Azure Tables is a NoSQL datastore. For detailed information on the Azure Tables data model, see the resources section below.

Design

There are three key areas we emphasized in the design of the Table client: usability, extensibility, and performance. The basic scenarios are simple and “just work”; in addition, we have also provided three distinct extension points to allow developers to customize the client behaviors to their specific scenario. We have also maintained a degree of consistency with the other storage clients (Blob and Queue) so that moving between them feels seamless. There are also some features and requirements that make the table service unique.

For more on the overall design philosophy and guidelines of the Windows Azure Storage Client for Java see the related blog post in the Links section below.

Packages

The Storage Client for Java is distributed in the Windows Azure SDK for Java jar (see below for locations). The Windows Azure SDK for Java jar also includes a “service layer” implementation for several Azure services, including storage, which is intended to provide a low level interface for users to access various services in a common way. In contrast, the client layer provides a much higher level API surface that is more approachable and has many conveniences that are frequently required when developing scalable Windows Azure Storage applications. For the optimal development experience avoid importing the base package directly and instead import the client sub package (com.microsoft.windowsazure.services.table.client). This blog post refers to this client layer.

Common

com.microsoft.windowsazure.services.core.storage – This package contains all storage primitives such as CloudStorageAccount, StorageCredentials, Retry Policies, etc.

Tables

com.microsoft.windowsazure.services.table.client – This package contains all the functionality for working with the Windows Azure Table service, including CloudTableClient, TableServiceEntity, etc.

Object Model

A diagram of the table object model is provided below. The core flow of the client is that a user defines an action (TableOperation, TableBatchOperation, or TableQuery) over entities in the Table service and executes these actions via the CloudTableClient. For usability, these classes provide static factory methods to assist in the definition of actions.

For example, the code below inserts a single entity:

tableClient.execute([Table Name], TableOperation.insert(entity));


Figure 1: Table client object model

Execution

CloudTableClient

Similar to the other Azure storage clients, the table client provides a logical service client, CloudTableClient, which is responsible for service wide operations and enables execution of other operations. The CloudTableClient class can update the Storage Analytics settings for the Table service, list all the tables in the account, and execute operations against a given table, among other operations.
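
For example, listing every table in the account is a single call on the service client; the sketch below assumes the listTables method exposed by CloudTableClient:

CloudTableClient tableClient = storageAccount.createCloudTableClient();

// Enumerate the tables in the account; results are fetched lazily as you iterate.
for (String tableName : tableClient.listTables()) {
    System.out.println(tableName);
}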

TableRequestOptions

The TableRequestOptions class defines additional parameters which govern how a given operation is executed, specifically the timeout and RetryPolicy that are applied to each request. The CloudTableClient provides default timeout and RetryPolicy settings; TableRequestOptions can override them for a particular operation.
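
As a sketch of overriding the defaults for a single call (the execute overload and the setter and constructor names below are assumptions based on the object model described here):

// Use a linear retry instead of the client-wide defaults for this one operation.
TableRequestOptions options = new TableRequestOptions();
options.setRetryPolicyFactory(new RetryLinearRetry(2000 /* deltaBackoff in ms */, 3 /* max attempts */));

tableClient.execute("people", TableOperation.insert(entity), options, null /* OperationContext */);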

TableResult

The TableResult class encapsulates the result of a single TableOperation. This object includes the HTTP status code, the ETag and a weakly typed reference to the associated entity.

Actions

TableOperation

The TableOperation class encapsulates a single operation to be performed against a table. Static factory methods are provided to create a TableOperation that will perform an insert, delete, merge, replace, retrieve, insertOrReplace, and insertOrMerge operation on the given entity. TableOperations can be reused so long as the associated entity is updated. As an example, a client wishing to use table storage as a heartbeat mechanism could define a merge operation on an entity and execute it to update the entity state to the server periodically.

Sample – Inserting an Entity into a Table

// You will need the following imports
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.TableOperation;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;

// Create the table client.
CloudTableClient tableClient = storageAccount.createCloudTableClient();
    
tableClient.createTableIfNotExists("people");

// Create a new customer entity.
CustomerEntity customer1 = new CustomerEntity("Harp", "Walter");
customer1.setEmail("Walter@contoso.com");
customer1.setPhoneNumber("425-555-0101");

// Create an operation to add the new customer to the people table.
TableOperation insertCustomer1 = TableOperation.insert(customer1);

// Submit the operation to the table service.
tableClient.execute("people", insertCustomer1);

TableBatchOperation

The TableBatchOperation class represents multiple TableOperation objects which are executed as a single atomic action within the table service. There are a few restrictions on batch operations that should be noted:

  • You can perform batch updates, deletes, inserts, merge and replace operations.
  • A batch operation can have a retrieve operation, if it is the only operation in the batch.
  • A single batch operation can include up to 100 table operations.
  • All entities in a single batch operation must have the same partition key.
  • A batch operation is limited to a 4MB data payload.

The CloudTableClient.execute overload which takes as input a TableBatchOperation will return an ArrayList of TableResults which will correspond in order to the entries in the batch itself. For example, the result of a merge operation that is the first in the batch will be the first entry in the returned ArrayList of TableResults. In the case of an error the server may return a numerical id as part of the error message that corresponds to the sequence number of the failed operation in the batch unless the failure is associated with no specific command such as ServerBusy, in which case -1 is returned. TableBatchOperations, or Entity Group Transactions, are executed atomically meaning that either all operations will succeed or if there is an error caused by one of the individual operations the entire batch will fail.

Sample – Insert two entities in a single atomic Batch Operation

// You will need the following imports
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.TableBatchOperation;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;

// Create the table client.
CloudTableClient tableClient = storageAccount.createCloudTableClient();
    
tableClient.createTableIfNotExists("people");

// Define a batch operation.
TableBatchOperation batchOperation = new TableBatchOperation();

// Create a customer entity and add to the table
CustomerEntity customer = new CustomerEntity("Smith", "Jeff");
customer.setEmail("Jeff@contoso.com");
customer.setPhoneNumber("425-555-0104");
batchOperation.insert(customer);

// Create another customer entity and add to the table
CustomerEntity customer2 = new CustomerEntity("Smith", "Ben");
customer2.setEmail("Ben@contoso.com");
customer2.setPhoneNumber("425-555-0102");
batchOperation.insert(customer2);        

// Submit the operation to the table service.
tableClient.execute("people", batchOperation);

TableQuery

The TableQuery class is a lightweight query mechanism used to define queries to be executed against the table service. See “Querying” below.

Entities

TableEntity interface

The TableEntity interface is used to define an object that can be serialized and deserialized with the table client. It contains getters and setters for the PartitionKey, RowKey, Timestamp, Etag, as well as methods to read and write the entity. This interface is implemented by the TableServiceEntity and subsequently the DynamicTableEntity that are included in the library; a client may implement this interface directly to persist different types of objects or objects from 3rd-party libraries. By overriding the readEntity or writeEntity methods a client may customize the serialization logic for a given entity type.

TableServiceEntity

The TableServiceEntity class is an implementation of the TableEntity interface and contains the RowKey, PartitionKey, and Timestamp properties. The default serialization logic TableServiceEntity uses is based off of reflection where an entity “property” is defined by a class which contains corresponding get and set methods where the return type of the getter is the same as that of the input parameter of the setter. This will be discussed in greater detail in the extension points section below. This class is not final and may be extended to add additional properties to an entity type.

Sample – Define a POJO that extends TableServiceEntity

// This class defines one additional property of String type; since it extends
// TableServiceEntity it will be automatically serialized and deserialized.
public class SampleEntity extends TableServiceEntity {
    private String SampleProperty;

    public String getSampleProperty() {
      return this.SampleProperty;
    }

    public void setSampleProperty (String sampleProperty) {
      this.SampleProperty= sampleProperty;
    }
}

DynamicTableEntity

The DynamicTableEntity class allows clients to update heterogeneous entity types without the need to define base classes or special types. The DynamicTableEntity class defines the required properties for RowKey, PartitionKey, Timestamp, and Etag; all other properties are stored in a HashMap form. Aside from the convenience of not having to define concrete POJO types, this can also provide increased performance by not having to perform serialization or deserialization tasks. We have also provided sample code that demonstrates this.

Sample – Retrieve a single property on a collection of heterogeneous entities

// You will need the following imports
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.DynamicTableEntity;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.TableQuery;

// Define the query to retrieve the entities, notice in this case we
// only need to retrieve the Count property.
TableQuery<DynamicTableEntity> query = TableQuery.from(tableName, DynamicTableEntity.class).select(new String[] { "Count" });

// Note the TableQuery is actually executed when we iterate over the
// results. Also, this sample uses the DynamicTableEntity to avoid
// having to worry about various types, as well as avoiding any
// serialization processing.
for (DynamicTableEntity ent : tableClient.execute(query)) {
    EntityProperty countProp = ent.getProperties().get("Count");

    // Users should always assume property is not there in case another
    // client removed it.
    if (countProp == null) {
        throw new IllegalArgumentException("Invalid entity, Count property not found!");
    }

    // Display Count property, however you could modify it here and persist it back to the service.
    System.out.println(countProp.getValueAsInteger());
}

EntityProperty

The EntityProperty class encapsulates a single property of an entity for the purposes of serialization and deserialization. The only time the client has to work directly with EntityProperties is when using DynamicTableEntity or implementing the TableEntity.readEntity and TableEntity.writeEntity methods. The EntityProperty stores the given value in its serialized string form and deserializes it on each subsequent get.

Please note, when using a non-String type property in a tight loop or performance critical scenario, it is best practice to not update an EntityProperty directly, as there will be a performance implication in doing so. Instead, a client should deserialize the entity into an object, update that object directly, and then persist that object back to the table service (See POJO Sample below).

The samples below show two approaches that can be used to update a player's score property. The first approach uses DynamicTableEntity to avoid having to declare a client-side object and updates the property directly, whereas the second deserializes the entity into a POJO and updates that object directly.

Sample –Update of entity property using EntityProperty

// You will need the following imports
import com.microsoft.windowsazure.services.core.storage.StorageException;
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.DynamicTableEntity;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.TableOperation;
import com.microsoft.windowsazure.services.table.client.TableResult;

// Retrieve entity
TableResult res = tableClient.execute("gamers", TableOperation.retrieve("Smith", "Jeff", DynamicTableEntity.class));
DynamicTableEntity player = res.getResultAsType();

// Retrieve Score property
EntityProperty scoreProp = player.getProperties().get("Score");
    
if (scoreProp == null) {
    throw new IllegalArgumentException("Invalid entity, Score property not found!");
}
    
scoreProp.setValue(scoreProp.getValueAsInteger() + 1);

// Store the updated score
tableClient.execute("gamers", TableOperation.merge(player));

 

Sample – Update of entity property using POJO

// You will need the following imports
import com.microsoft.windowsazure.services.core.storage.StorageException;
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.DynamicTableEntity;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.TableOperation;
import com.microsoft.windowsazure.services.table.client.TableResult;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;

// Entity type with a score property
public class GamerEntity extends TableServiceEntity {
    private int score;

    public int getScore() {
        return this.score;
    }

    public void setScore(int Score) {
        this.score = Score;
    }
}

// Retrieve entity
TableResult res = tableClient.execute("gamers", TableOperation.retrieve("Smith", "Jeff", GamerEntity.class));
GamerEntity player = res.getResultAsType();

// Update Score
player.setScore(player.getScore() + 1);


// Store the updated score
tableClient.execute("gamers", TableOperation.merge(player));

Serialization

There are three main extension points in the table client that allow a user to customize serialization and deserialization of entities. Although completely optional, these extension points enable a number of use-specific or NoSQL scenarios.

EntityResolver

The EntityResolver interface defines a single method (resolve) and allows client-side projection and processing for each entity during serialization and deserialization. This interface is designed to be implemented by an anonymous inner class to provide custom client side projections, query-specific filtering, and so forth. This enables key scenarios such as deserializing a collection of heterogeneous entities from a single query.

Sample – Use EntityResolver to perform client side projection

// You will need the following imports
import java.util.Date;
import java.util.HashMap;

import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.EntityResolver;
import com.microsoft.windowsazure.services.table.client.TableQuery;

// Define the query to retrieve the entities; notice in this case we
// only need to retrieve the Email property.
TableQuery<Customer> query = TableQuery.from(tableName, Customer.class).select(new String[] { "Email" });

// Define an EntityResolver to mutate the entity payload upon retrieval.
// In this case we will simply return a String representing the customer's Email
// address.
EntityResolver<String> emailResolver = new EntityResolver<String>() {
    @Override
    public String resolve(String partitionKey, String rowKey, Date timeStamp, HashMap<String, EntityProperty> props, String etag) {
        return props.get("Email").getValueAsString();
    }
};

// Display the results of the query, note that the query now returns
// Strings instead of entity types since this is the type of
// EntityResolver we created.
for (String projectedString : tableClient.execute(query, emailResolver)) {
    System.out.println(projectedString);
}

Annotations

@StoreAs

The @StoreAs annotation is used by a client to customize the serialized property name for a given property. If @StoreAs is not used, the property name is used in table storage. The @StoreAs annotation cannot be used to store the PartitionKey, RowKey, or Timestamp; if a property is annotated as such, it will be skipped by the serializer. Two common scenarios are to reduce the length of the property name for performance reasons, or to override the default name the property would otherwise have.

Sample – Alter a property name via the @StoreAs Annotation

// You will need the following imports
import com.microsoft.windowsazure.services.table.client.StoreAs;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;

// This entity will store the CustomerPlaceOfResidence property as "cpor" on the service.
public class StoreAsEntity extends TableServiceEntity {
    private String cpor;

    @StoreAs(name = "cpor")
    public String getCustomerPlaceOfResidence() {
        return this.cpor;
    }

    @StoreAs(name = "cpor")
    public void setCustomerPlaceOfResidence(String customerPlaceOfResidence) {
        this.cpor = customerPlaceOfResidence;
    }
}

@Ignore

The @Ignore annotation is used on the getter or setter to indicate to the default reflection-based serializer that it should ignore the property during serialization and deserialization.

Sample – Use @Ignore annotation to expose friendly client side property that is backed by PartitionKey

// You will need the following imports
import com.microsoft.windowsazure.services.table.client.Ignore;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;

// In this sample, the Customer ID is used as the PartitionKey.  A property 
// CustomerID is exposed on the client side to allow friendly access, but
// is annotated with @Ignore to prevent it from being duplicated in the
// table entity.
public class OnlineStoreBaseEntity extends TableServiceEntity {
    @Ignore
    public String getCustomerID() {
        return this.getPartitionKey();
    }

    @Ignore
    public void setCustomerID(String customerID) {
        this.setPartitionKey(customerID);
    }
}

TableEntity.readEntity and TableEntity.writeEntity methods

While they are part of the TableEntity interface, the TableEntity.readEntity and TableEntity.writeEntity methods provide the third major extension point for serialization. By implementing or overriding these methods on an object, a client can customize how entities are stored and potentially improve performance compared to the default reflection-based serializer. See the javadoc for the respective method for more information.
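As a rough illustration, the following sketch (an assumption based on the readEntity/writeEntity contract described above, not code from the shipped samples) overrides both methods on a TableServiceEntity subclass to combine two client-side fields into a single persisted property; the "FullName" property and the firstName/lastName split are illustrative.

// A minimal sketch, assuming the readEntity/writeEntity signatures documented in the javadoc.
import java.util.HashMap;

import com.microsoft.windowsazure.services.core.storage.OperationContext;
import com.microsoft.windowsazure.services.core.storage.StorageException;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;

public class CustomSerializedEntity extends TableServiceEntity {
    private String firstName;
    private String lastName;

    @Override
    public HashMap<String, EntityProperty> writeEntity(OperationContext opContext) throws StorageException {
        // Persist both client side fields as one combined property
        HashMap<String, EntityProperty> properties = new HashMap<String, EntityProperty>();
        properties.put("FullName", new EntityProperty(this.firstName + " " + this.lastName));
        return properties;
    }

    @Override
    public void readEntity(HashMap<String, EntityProperty> properties, OperationContext opContext) throws StorageException {
        // Split the combined property back into the client side fields, if present
        EntityProperty fullName = properties.get("FullName");
        if (fullName != null) {
            String[] parts = fullName.getValueAsString().split(" ", 2);
            this.firstName = parts[0];
            this.lastName = parts.length > 1 ? parts[1] : "";
        }
    }
}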

For more on the overall design object model of the Windows Azure Storage Client for Java see the related blog post in the Links section below.

Querying

There are two query constructs in the table client: a retrieve TableOperation which addresses a single unique entity, and a TableQuery which is a standard query mechanism used against multiple entities in a table. Both querying constructs need to be used in conjunction with either a class type that implements the TableEntity interface or with an EntityResolver which will provide custom deserialization logic.

Retrieve

A retrieve operation is a query which addresses a single entity in the table by specifying both its PartitionKey and RowKey. This is exposed via TableOperation.retrieve and TableBatchOperation.retrieve and executed like a typical operation via the CloudTableClient.

Sample – Retrieve a single entity

// You will need the following imports
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.TableOperation;

// Create the table client.
CloudTableClient tableClient = storageAccount.createCloudTableClient();

// Retrieve the entity with partition key of "Smith" and row key of "Jeff"
TableOperation retrieveSmithJeff = TableOperation.retrieve("Smith", "Jeff", CustomerEntity.class);

// Submit the operation to the table service and get the specific entity.
CustomerEntity specificEntity = tableClient.execute("people", retrieveSmithJeff).getResultAsType();

TableQuery

Unlike TableOperation and TableBatchOperation, a TableQuery requires a source table name as part of its definition. TableQuery contains a static factory method, from, used to create a new query, and it provides methods for fluent query construction. The code below produces a query that takes the top 5 results from the customers table which have a RowKey greater than or equal to 5.

Sample – Query top 5 entities with RowKey greater than or equal to 5

// You will need the following imports
import com.microsoft.windowsazure.services.table.client.TableQuery;
import com.microsoft.windowsazure.services.table.client.TableQuery.QueryComparisons;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;

TableQuery<TableServiceEntity> query =
    TableQuery.from("customers", TableServiceEntity.class)
        .where(TableQuery.generateFilterCondition("RowKey", QueryComparisons.GREATER_THAN_OR_EQUAL, "5"))
        .take(5);

The TableQuery is strongly typed and must be instantiated with a class type that is accessible and contains a nullary constructor; otherwise an exception will be thrown. The class type must also implement the TableEntity interface. If the client wishes to use a resolver to deserialize entities, they may specify one via execute on CloudTableClient and specify the TableServiceEntity class type as demonstrated above.

The TableQuery object provides methods for take, select, where, and the source table name. There are also static helper methods, such as generateFilterCondition and combineFilters, which construct filter strings. Note that generateFilterCondition provides several overloads that can handle all supported types; some examples are listed below:

// 1. Filter on String
TableQuery.generateFilterCondition("Prop", QueryComparisons.GREATER_THAN, "foo");

// 2. Filter on UUID
TableQuery.generateFilterCondition("Prop", QueryComparisons.EQUAL, uuid);

// 3. Filter on Long
TableQuery.generateFilterCondition("Prop", QueryComparisons.GREATER_THAN, 50L);

// 4. Filter on Double
TableQuery.generateFilterCondition("Prop", QueryComparisons.GREATER_THAN, 50.50);

// 5. Filter on Integer
TableQuery.generateFilterCondition("Prop", QueryComparisons.GREATER_THAN, 50);

// 6. Filter on Date
TableQuery.generateFilterCondition("Prop", QueryComparisons.LESS_THAN, new Date());

// 7. Filter on Boolean
TableQuery.generateFilterCondition("Prop", QueryComparisons.EQUAL, true);

// 8. Filter on Binary
TableQuery.generateFilterCondition("Prop", QueryComparisons.EQUAL, new byte[] { 0x01, 0x02, 0x03 });

Sample – Query all entities with a PartitionKey="samplePK" and RowKey greater than or equal to "5"

// You will need the following imports
import com.microsoft.windowsazure.services.table.client.TableConstants;
import com.microsoft.windowsazure.services.table.client.TableQuery;
import com.microsoft.windowsazure.services.table.client.TableQuery.Operators;
import com.microsoft.windowsazure.services.table.client.TableQuery.QueryComparisons;

String pkFilter = TableQuery.generateFilterCondition(TableConstants.PARTITION_KEY, QueryComparisons.EQUAL, "samplePK");

String rkFilter = TableQuery.generateFilterCondition(TableConstants.ROW_KEY, QueryComparisons.GREATER_THAN_OR_EQUAL, "5");

String combinedFilter = TableQuery.combineFilters(pkFilter, Operators.AND, rkFilter);

TableQuery<SampleEntity> query = TableQuery.from(tableName, SampleEntity.class).where(combinedFilter);

Note: There is no logical expression tree provided in the current release, and as a result repeated calls to the fluent methods on TableQuery overwrite the relevant aspect of the query.
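For example, a brief sketch (reusing the pkFilter, rkFilter, SampleEntity, and tableName names from the sample above) of how a second where call replaces, rather than composes with, the first:

// Because there is no expression tree, the second where(...) call below replaces
// the first filter rather than combining with it.
TableQuery<SampleEntity> overwritten = TableQuery.from(tableName, SampleEntity.class)
    .where(pkFilter)   // this filter is discarded...
    .where(rkFilter);  // ...and only this one is sent to the service

// To apply both conditions, combine them explicitly instead:
TableQuery<SampleEntity> combined = TableQuery.from(tableName, SampleEntity.class)
    .where(TableQuery.combineFilters(pkFilter, Operators.AND, rkFilter));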

Scenarios

NoSQL

A common pattern in a NoSQL datastore is to store related entities with different schemas in the same table. A frequent example involves customers and orders stored in the same table. In our case, the PartitionKey for both Customer and Order will be a unique CustomerID, which will allow us to retrieve and alter a customer and their respective orders together. The challenge becomes how to work with these heterogeneous entities on the client side in an efficient and usable manner. We discuss this here, and you can also download sample code.

The table client provides an EntityResolver interface which allows client side logic to execute during deserialization. In the scenario detailed above, let’s use a base entity class named OnlineStoreEntity which extends TableServiceEntity.

// You will need the following imports
import com.microsoft.windowsazure.services.table.client.Ignore;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;

public abstract class OnlineStoreEntity extends TableServiceEntity {
    @Ignore
    public String getCustomerID() {
       return this.getPartitionKey();
    }

    @Ignore
    public void setCustomerID(String customerID) {
        this.setPartitionKey(customerID);
    }
}

Let’s also define two additional entity types, Customer and Order which derive from OnlineStoreEntity and prepend their RowKey with an entity type enumeration, “0001” for customers and “0002” for Orders. This will allow us to query for just a customer, their orders, or both—while also providing a persisted definition as to what client side type is used to interact with the object. Given this, let’s define a class that implements the EntityResolver interface to assist in deserializing the heterogeneous types.
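For reference, here is a minimal sketch of what such a type might look like (an illustrative assumption; the complete Customer and Order definitions ship with the OnlineStoreSample referenced below). Order is analogous, using the "0002" prefix:

// Illustrative sketch only: the RowKey is formed by the application as the type
// prefix ("0001" for Customer) followed by a unique identifier.
public class Customer extends OnlineStoreEntity {
    private String email;

    public Customer() {
        // Nullary constructor required by the default serializer
    }

    public Customer(String customerID, String uniqueRowId) {
        this.setCustomerID(customerID);
        this.setRowKey("0001" + uniqueRowId);
    }

    public String getEmail() {
        return this.email;
    }

    public void setEmail(String email) {
        this.email = email;
    }
}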

Sample – Using EntityResolver to deserialize heterogeneous entities

// You will need the following imports
import com.microsoft.windowsazure.services.core.storage.StorageException;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.EntityResolver;

EntityResolver<OnlineStoreEntity> webStoreResolver = new EntityResolver<OnlineStoreEntity>() {
@Override
public OnlineStoreEntity resolve(String partitionKey, String rowKey, Date timeStamp, HashMap<String, EntityProperty> properties, String etag) throws StorageException {
     OnlineStoreEntity ref = null;
     
     if (rowKey.startsWith("0001")) {
          // Customer
          ref = new Customer();
     }
     else if (rowKey.startsWith("0002")) {
         // Order
         ref = new Order();
     }
     else {
         throw new IllegalArgumentException(String.format("Unknown entity type detected! RowKey: %s", rowKey));
     }

     ref.setPartitionKey(partitionKey);
     ref.setRowKey(rowKey);
     ref.setTimestamp(timeStamp);
     ref.setEtag(etag);
     ref.readEntity(properties, null);
     return ref;
     }
};

Now, on iterating through the results with the following code:

for (OnlineStoreEntity entity : tableClient.execute(customerAndOrderQuery, webStoreResolver)) {
     System.out.println(entity.getClass());
}

It will output:

class tablesamples.NoSQL$Customer
class tablesamples.NoSQL$Order
class tablesamples.NoSQL$Order
class tablesamples.NoSQL$Order
class tablesamples.NoSQL$Order
….

For the complete OnlineStoreSample sample please see the Samples section below.

Heterogeneous update

In some cases it may be required to update entities regardless of their type or other properties. Let’s say we have a table named “employees”. This table contains entity types for developers, secretaries, contractors, and so forth. The example below shows how to query all entities in a given partition (in our example the state the employee works in is used as the PartitionKey) and update their salaries regardless of job position. Since we are using merge, the only property that is going to be updated is the Salary property, and all other information regarding the employee will remain unchanged.

// You will need the following imports
import com.microsoft.windowsazure.services.core.storage.StorageException;
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.DynamicTableEntity;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.TableBatchOperation;
import com.microsoft.windowsazure.services.table.client.TableQuery;

TableQuery<DynamicTableEntity> query = TableQuery.from("employees", DynamicTableEntity.class).where("PartitionKey eq 'Washington'").select(new String[] { "Salary" });

// Note: for brevity's sake this sample assumes there are 100 or fewer employees; in general the client should ensure batches are kept to 100 operations or fewer.

TableBatchOperation mergeBatch = new TableBatchOperation();
for (DynamicTableEntity ent : tableClient.execute(query)) {
    EntityProperty salaryProp = ent.getProperties().get("Salary");

    // Check to see if salary property is present
    if (salaryProp != null) {
        double currentSalary = salaryProp.getValueAsDouble();

        if (currentSalary < 50000) {
            // Give a 10% raise
            salaryProp.setValue(currentSalary * 1.1);
        } else if (currentSalary < 100000) {
            // Give a 5% raise
            salaryProp.setValue(currentSalary * 1.05);
        }

        mergeBatch.merge(ent);
    }
    else {
         throw new IllegalArgumentException("Entity does not contain salary!");
    }
}

// Execute batch to save changes back to the table service
tableClient.execute("employees", mergeBatch);

Complex Properties

The Windows Azure Table service provides two indexed columns that together provide the key for a given entity (PartitionKey and RowKey). A common best practice is to include multiple aspects of an entity in these keys since they can be queried efficiently. Using the @Ignore annotation, it is possible to define friendly client-side properties that are part of this complex key without persisting them individually.

Let’s say that we are creating a directory of all the people in America. By creating a complex key such as [STATE];[CITY] I can enable efficient queries for all people in a given state or city using a lexical comparison while utilizing only one indexed column. This optimization is exposed in a convenient way by providing friendly client properties on an object that mutate the key appropriately but are not actually persisted to the service.

Note: Take care when providing setters on properties that are backed by the keys, since changing them effectively changes the identity of the entity and could cause failures for some operations (delete, merge, replace).

The sample below illustrates how to provide friendly accessors to complex keys. When only providing getters the @Ignore annotation is optional, since the serializer will not use properties that do not expose a corresponding setter.

Sample – Complex Properties on a POJO using the @Ignore Annotation

// You will need the following imports
import com.microsoft.windowsazure.services.table.client.Ignore;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;

public class Person extends TableServiceEntity {
    @Ignore
    public String getState() {
        return this.getPartitionKey().substring(0, this.getPartitionKey().indexOf(";"));
    }

    @Ignore
    public String getCity() {
        return this.getPartitionKey().substring(this.getPartitionKey().indexOf(";") + 1);
    }
}
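To show the benefit of the composite key, here is a hedged sketch (assuming an existing tableClient and a "people" table populated with Person entities; imports follow the earlier filter sample: TableConstants, TableQuery, Operators, and QueryComparisons) of an efficient query for everyone in a given state using a lexical range over the PartitionKey:

// All partition keys for Washington start with "Washington;", so the lexical range
// ["Washington;", "Washington<") covers exactly that prefix ('<' immediately
// follows ';' in ASCII ordering).
String lowerBound = TableQuery.generateFilterCondition(
    TableConstants.PARTITION_KEY, QueryComparisons.GREATER_THAN_OR_EQUAL, "Washington;");
String upperBound = TableQuery.generateFilterCondition(
    TableConstants.PARTITION_KEY, QueryComparisons.LESS_THAN, "Washington<");

TableQuery<Person> stateQuery = TableQuery.from("people", Person.class)
    .where(TableQuery.combineFilters(lowerBound, Operators.AND, upperBound));

for (Person person : tableClient.execute(stateQuery)) {
    System.out.println(person.getCity());
}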

 

Persisting 3rd party objects

In some cases we may need to persist objects exposed by 3rd party libraries, or those which do not fit the requirements of a TableEntity and cannot be modified to do so. In such cases, the recommended best practice is to encapsulate the 3rd party object in a new client object that implements the TableEntity interface, and provide the custom serialization logic needed to persist the object to the table service via TableEntity.readEntity and TableEntity.writeEntity.

Note: when implementing readEntity/writeEntity, TableServiceEntity provides two static helper methods (readEntityWithReflection and writeEntityWithReflection) that expose the default reflection based serialization which will use the same rules as previously discussed.
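As a concrete illustration, the sketch below wraps a hypothetical, unmodifiable third-party class (LegacyCustomer, with getName/setName accessors) and hand-writes the serialization logic. This is an assumption-based example rather than code from the shipped samples:

// A minimal sketch: the wrapper supplies the keys via TableServiceEntity and maps the
// wrapped object's state to table properties by hand.
import java.util.HashMap;

import com.microsoft.windowsazure.services.core.storage.OperationContext;
import com.microsoft.windowsazure.services.core.storage.StorageException;
import com.microsoft.windowsazure.services.table.client.EntityProperty;
import com.microsoft.windowsazure.services.table.client.TableServiceEntity;

public class LegacyCustomerWrapper extends TableServiceEntity {
    // The wrapped third party object (hypothetical)
    private LegacyCustomer inner = new LegacyCustomer();

    public LegacyCustomer getInner() {
        return this.inner;
    }

    @Override
    public HashMap<String, EntityProperty> writeEntity(OperationContext opContext) throws StorageException {
        // Map the wrapped object's state into table properties
        HashMap<String, EntityProperty> properties = new HashMap<String, EntityProperty>();
        properties.put("Name", new EntityProperty(this.inner.getName()));
        return properties;
    }

    @Override
    public void readEntity(HashMap<String, EntityProperty> properties, OperationContext opContext) throws StorageException {
        // Rehydrate the wrapped object from the returned properties
        EntityProperty name = properties.get("Name");
        if (name != null) {
            this.inner.setName(name.getValueAsString());
        }
    }
}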

Best Practices

  • When persisting inner classes they must be marked static and provide a nullary constructor to enable deserialization.
  • Consider batch restrictions when developing your application. While a single entity may be up to 1 MB and a batch can contain up to 100 operations, the 4 MB payload limit on a batch operation may decrease the total number of operations allowed in a single batch. All operations in a given batch must address entities that have identical PartitionKey values (a sketch of keeping batches within these limits follows this list).
  • Class types should initialize property values to null / default. The Table service does not send null / removed properties to the client, so these properties will not be overwritten on the client side. As such, it is possible to perceive a data loss in this scenario, because the non-default client-side properties will hold values that do not exist in the received entity.
  • The take count on a TableQuery is applied to each request and is not adjusted between requests. If used in conjunction with the non-segmented execute method, this effectively alters the page size rather than the maximum number of results. For example, if we define a TableQuery with take(5) and execute it via executeSegmented, we will receive 5 results (potentially fewer if a continuation token is involved). However, if we enumerate results via the Iterator returned by the execute method, we will eventually receive all results in the table, 5 at a time. Please be aware of this distinction.
  • When implementing readEntity or working with DynamicTableEntity the user should always assume a given property does not exist in the HashMap as it may have been removed by another client or not selected via a projected query. Therefore, it is considered best practice to check for the existence of a property in the HashMap prior to retrieving it.
  • The EntityProperty class is utilized during serialization to encapsulate a given property for an entity and stores data in its serialized String form. Subsequently, each call to a get method will deserialize the data and each call to a setter / constructor will serialize it. Avoid repeated updates directly on an EntityProperty wherever possible. If your application needs to make repeated updates / reads to a property on a persisted type, use a POJO object directly.
  • The @StoreAs annotation is provided to customize serialization, which can be used to provide friendly client-side property names and potentially increase performance by decreasing payload size. For example, if an entity has many long-named properties such as customerEmailAddress, we could use the @StoreAs annotation to persist this property under the name "cea", which would decrease every payload by 17 bytes for this single property alone. For large entities with numerous properties the latency and bandwidth savings can become significant. Note: the @StoreAs annotation cannot be used to write the PartitionKey, RowKey, or Timestamp as these properties are written separately; attempting to do so will cause the annotated property to be skipped during serialization. To accomplish this scenario, provide a friendly client-side property annotated with @Ignore and set the PartitionKey, RowKey, or Timestamp property internally.
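The following is a minimal sketch (an assumption rather than shipped sample code) of keeping batches at or below the 100 operation limit while inserting a large collection of entities that share the same PartitionKey. It reuses the GamerEntity type and "gamers" table from the earlier samples; gamerList is an illustrative List<GamerEntity>. Since TableBatchOperation behaves as a collection of operations, size() and clear() are used to flush in chunks:

// You will need the following import
import com.microsoft.windowsazure.services.table.client.TableBatchOperation;

TableBatchOperation batch = new TableBatchOperation();
for (GamerEntity gamer : gamerList) {
    batch.insert(gamer);

    // Flush once the per-batch operation limit is reached
    if (batch.size() == 100) {
        tableClient.execute("gamers", batch);
        batch.clear();
    }
}

// Flush any remaining operations
if (batch.size() > 0) {
    tableClient.execute("gamers", batch);
}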

Table Samples

As part of the release of the Windows Azure Storage Client for Java we have provided a series of samples that address some common scenarios that users may encounter when developing cloud applications.

Setup
  1. Download the samples jar
  2. Configure the classpath to include the Windows Azure Storage Client for Java, which can be downloaded here.
  3. Edit the Utility.java file to specify your connection string in storageConnectionString. Alternatively if you want to use local storage emulator that ships as part of the Windows Azure SDK you can uncomment the specified key in Utility.java.
  4. Execute each sample via eclipse or command line. For some blob samples some command line arguments are required.
Samples
  • TableBasics - This sample illustrates basic use of the Table primitives provided. Scenarios covered are:
    • How to create a table client
    • Insert an entity and retrieve it
    • Insert a batch of entities and query against them
    • Projection (server and client side)
    • DynamicUpdate – update entities regardless of types using DynamicTableEntity and projection to optimize performance.
  • OnlineStoreSample – This sample illustrates a common scenario when using a schema-less datastore. In this example we define both customers and orders which are stored in the same table. By utilizing the EntityResolver we can query against the table and retrieve the heterogeneous entity collection in a type safe way.

Summary

This blog post has provided an in-depth overview of the table client in the recently released Windows Azure Storage Client for Java. We continue to maintain and evolve the libraries we provide based on upcoming features and customer feedback. Feel free to leave comments below,

Joe Giardino
Developer
Windows Azure Storage

Resources

Get the Windows Azure SDK for Java

Learn more about the Windows Azure Storage Client for Java

Learn more about Windows Azure Storage

Windows Azure Storage Client for Java Overview


We released the Storage Client for Java with support for Windows Azure Blobs, Queues, and Tables. Our goal is to continue to improve the development experience when writing cloud applications using Windows Azure Storage. This release is a Community Technology Preview (CTP) and will be supported by Microsoft. As such, we have incorporated feedback from customers and forums for the current .NET libraries to help create a more seamless API that is both powerful and simple to use. This blog post serves as an overview of the library and covers some of the implementation details that will be helpful to understand when developing cloud applications in Java. Additionally, we’ve provided two additional blog posts that cover some of the unique features and programming models for the blob and table service.

Packages

The Storage Client for Java is distributed in the Windows Azure SDK for Java jar (see below for locations). For the optimal development experience import the client sub package directly (com.microsoft.windowsazure.services.[blob|queue|table].client). This blog post refers to this client layer.

The relevant packages are broken up by service:

Common

com.microsoft.windowsazure.services.core.storage – This package contains all storage primitives such as CloudStorageAccount, StorageCredentials, Retry Policies, etc.

Services

com.microsoft.windowsazure.services.blob.client – This package contains all the functionality for working with the Windows Azure Blob service, including CloudBlobClient, CloudBlob, etc.

com.microsoft.windowsazure.services.queue.client – This package contains all the functionality for working with the Windows Azure Queue service, including CloudQueueClient, CloudQueue, etc.

com.microsoft.windowsazure.services.table.client – This package contains all the functionality for working with the Windows Azure Table service, including CloudTableClient, TableServiceEntity, etc.

Services

While this document describes the common concepts for all of the above packages, it’s worth briefly summarizing the capabilities of each client library. Blob and Table each have some interesting features that warrant further discussion. For those, we’ve provided additional blog posts linked below. The client API surface has been designed to be easy to use and approachable, however to accommodate more advanced scenarios we have provided optional extension points when necessary.

Blob

The Blob API supports all of the normal Blob operations (upload, download, snapshot, set/get metadata, and list), as well as the normal container operations (create, delete, list blobs). However, we have gone a step further and also provided some additional conveniences such as Download Resume, Sparse Page Blob support, simplified MD5 scenarios, and simplified access conditions.

To better explain these unique features of the Blob API, we have published an additional blog post which discusses these features in detail. You can also see additional samples in our article How to Use the Blob Storage Service from Java.

Sample – Upload a File to a Block Blob

// You will need these imports
import java.io.File;
import java.io.FileInputStream;

import com.microsoft.windowsazure.services.blob.client.CloudBlobClient;
import com.microsoft.windowsazure.services.blob.client.CloudBlobContainer;
import com.microsoft.windowsazure.services.blob.client.CloudBlockBlob;
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;

// Initialize Account
CloudStorageAccount account = CloudStorageAccount.parse([ACCOUNT_STRING]);

// Create the blob client
CloudBlobClient blobClient = account.createCloudBlobClient();

// Retrieve reference to a previously created container
CloudBlobContainer container = blobClient.getContainerReference("mycontainer");

// Create or overwrite the "myimage.jpg" blob with contents from a local file
CloudBlockBlob blob = container.getBlockBlobReference("myimage.jpg");
File source = new File("c:\\myimages\\myimage.jpg");
blob.upload(new FileInputStream(source), source.length());

(Note: It is best practice to always provide the length of the data being uploaded if it is available; alternatively a user may specify -1 if the length is not known)
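A brief hedged sketch of that note, uploading from a stream whose length is not known in advance (the URL source below is purely illustrative):

// You will need these imports
import java.io.InputStream;
import java.net.URL;

// A stream whose total length is not known ahead of time (illustrative source)
InputStream source = new URL("http://example.com/somefile").openStream();

// Pass -1 so the client library computes the length on the fly (convenient, but slower)
blob.upload(source, -1);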

Table

The Table API provides a minimal client surface that is incredibly simple to use but still exposes enough extension points to allow for more advanced “NoSQL” scenarios. These include built in support for POJO, HashMap based “property bag” entities, and projections. Additionally, we have provided optional extension points to allow clients to customize the serialization and deserialization of entities which will enable more advanced scenarios such as creating composite keys from various properties etc.

Due to some of the unique scenarios listed above the Table service has some requirements and capabilities that differ from the Blob and Queue services. To better explain these capabilities and to provide a more comprehensive overview of the Table API we have published an in depth blog post which includes the overall design of Tables, the relevant best practices, and code samples for common scenarios. You can also see more samples in our article How to Use the Table Storage Service from Java.

Sample – Upload an Entity to a Table

// You will need these imports
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;
import com.microsoft.windowsazure.services.table.client.CloudTableClient;
import com.microsoft.windowsazure.services.table.client.TableOperation;

// Retrieve storage account from connection-string
CloudStorageAccount storageAccount = CloudStorageAccount.parse([ACCOUNT_STRING]);

// Create the table client.
CloudTableClient tableClient = storageAccount.createCloudTableClient();

// Create a new customer entity.
CustomerEntity customer1 = new CustomerEntity("Harp", "Walter");
customer1.setEmail("Walter@contoso.com");
customer1.setPhoneNumber("425-555-0101");

// Create an operation to add the new customer to the people table.
TableOperation insertCustomer1 = TableOperation.insert(customer1);

// Submit the operation to the table service.
tableClient.execute("people", insertCustomer1);

Queue

The Queue API includes convenience methods for all of the functionality available through REST. Namely creating, modifying and deleting queues, adding, peeking, getting, deleting, and updating messages, and also getting the message count. Here is a sample of creating a queue and adding a message, and you can also read How to Use the Queue Storage Service from Java.

Sample – Create a Queue and Add a Message to it

// You will need these imports
import com.microsoft.windowsazure.services.core.storage.CloudStorageAccount;
import com.microsoft.windowsazure.services.queue.client.CloudQueue;
import com.microsoft.windowsazure.services.queue.client.CloudQueueClient;
import com.microsoft.windowsazure.services.queue.client.CloudQueueMessage;

// Retrieve storage account from connection-string
CloudStorageAccount storageAccount = CloudStorageAccount.parse([ACCOUNT_STRING]);

// Create the queue client
CloudQueueClient queueClient = storageAccount.createCloudQueueClient();

// Retrieve a reference to a queue
CloudQueue queue = queueClient.getQueueReference("myqueue");

// Create the queue if it doesn't already exist
queue.createIfNotExist();

// Create a message and add it to the queue
CloudQueueMessage message = new CloudQueueMessage("Hello, World");
queue.addMessage(message);

 

Design

When designing the Storage Client for Java, we set up a series of design guidelines to follow throughout the development process. In order to reflect our commitment to the Java community working with Azure, we decided to design an entirely new library from the ground up that would feel familiar to Java developers. While the basic object model is somewhat similar to our .NET Storage Client Library there have been many improvements in functionality, consistency, and ease of use which will address the needs of both advanced users and those using the service for the first time.

Guidelines

  • Convenient and performant – The default implementation is simple to use, while still supporting performance-critical scenarios. For example, Blob upload APIs require the length of the data for authentication purposes. If this is unknown a user may pass -1, and the library will calculate it on the fly. However, for performance-critical applications it is best to pass in the correct number of bytes.
  • Users own their requests – We have provided mechanisms that will allow users to determine the exact number of REST calls being made, the associated request ids, HTTP status codes, etc. (See OperationContext in the Object Model discussion below for more). We have also annotated every method that will potentially make a REST request to the service with the @DoesServiceRequest annotation. This all ensures that you, the developer, are able to easily understand and control the requests made by your application, even in scenarios like Retry, where the Java Storage Client library may make multiple calls before succeeding.
  • Look and feel –
    • Naming is consistent. Logical antonyms are used for complementary actions (i.e. upload and download, create and delete, acquire and release)
    • get/set prefixes follow Java conventions and are reserved for local client side “properties”
    • Minimal overloads per method: one with the minimum set of required parameters and one overload including all optional parameters, which may be null. The one exception is that listing methods have two minimal overloads to accommodate the common scenario of listing with a prefix.
  • Minimal API Surface – In order to keep the API surface smaller we have reduced the number of extraneous helper methods. For example, Blob contains a single upload and download method that use Input / OutputStreams. If a user wishes to handle data in text or byte form, they can simply pass in the relevant stream.
  • Provide advanced features in a discoverable way – In order to keep the core API simple and understandable advanced features are exposed via either the RequestOptions or optional parameters.
  • Consistent Exception Handling - The library immediately will throw any exception encountered prior to making the request to the server. Any exception that occurs during the execution of the request will subsequently be wrapped inside a StorageException.
  • Consistency – objects are consistent in their exposed API surface and functionality. For example, a Blob, Container, or Queue all expose an exists() method.

Object Model

The Storage Client for Java uses local client side objects to interact with objects that reside on the server. There are additional features provided to help determine if an operation should execute, how it should execute, as well as provide information about what occurred when it was executed. (See Configuration and Execution below)

Objects

StorageAccount

The logical starting point is a CloudStorageAccount which contains the endpoint and credential information for a given storage account. This account then creates logical service clients for each appropriate service: CloudBlobClient, CloudQueueClient, and CloudTableClient. CloudStorageAccount also provides a static factory method to easily configure your application to use the local storage emulator that ships with the Windows Azure SDK.

A CloudStorageAccount can be created by parsing an account string which is in the format of:

"DefaultEndpointsProtocol=http[s];AccountName=<account name>;AccountKey=<account key>"

Optionally, if you wish to specify a non-default DNS endpoint for a given service you may include one or more of the following in the connection string.

“BlobEndpoint=<endpoint>”, “QueueEndpoint=<endpoint>”, “TableEndpoint=<endpoint>”

Sample – Creating a CloudStorageAccount from an account string

// Initialize Account
CloudStorageAccount account = CloudStorageAccount.parse([ACCOUNT_STRING]);
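To target the local storage emulator mentioned above, the static factory method can be used instead of parsing a connection string. A hedged sketch; the method name getDevelopmentStorageAccount() is an assumption based on later releases of the library, so verify it against the SDK version you are using:

// Target the local storage emulator that ships with the Windows Azure SDK
// (method name assumed; see the note above)
CloudStorageAccount devAccount = CloudStorageAccount.getDevelopmentStorageAccount();
CloudTableClient devTableClient = devAccount.createCloudTableClient();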

ServiceClients

Any service wide operation resides on the service client. Default configuration options such as timeout, retry policy, and other service specific settings that objects associated with the client will reference are stored here as well.

For example:

  • To turn on Storage Analytics for the blob service a user would call CloudBlobClient.uploadServiceProperties(properties)
  • To list all queues a user would call CloudQueueClient.listQueues()
  • To set the default timeout to 30 seconds for objects associated with a given client a user would call Cloud[Blob|Queue|Table]Client.setTimeoutInMs(30 * 1000)

Cloud Objects

Once a user has created a service client for the given service, it is time to start working directly with the Cloud Objects of that service. Cloud Objects include CloudBlockBlob, CloudPageBlob, CloudBlobContainer, and CloudQueue, each of which contains methods to interact with the resource it represents in the service.

Below are basic samples showing how to create a Blob Container, a Queue, and a Table. See the samples in the Services section for examples of how to interact with these Cloud Objects.

Blobs

// Retrieve reference to a previously created container
CloudBlobContainer container = blobClient.getContainerReference("mycontainer");

// Create the container if it doesn't already exist
container.createIfNotExist();

Queues

// Retrieve a reference to a queue
CloudQueue queue = queueClient.getQueueReference("myqueue");

// Create the queue if it doesn't already exist
queue.createIfNotExist();

Tables

Note: You may notice that unlike blob and queue, the table service does not use a CloudObject to represent an individual table; this is due to the unique nature of the table service, which is covered in more depth in the Tables deep dive blog post. Instead, table operations are performed via the CloudTableClient object:

// Create the table if it doesn't already exist
tableClient.createTableIfNotExists("people");

 

Configuration and Execution

In the maximum overload of each method provided in the library, you will note there are two or three extra optional parameters (depending on the service), all of which accept null so that users can utilize just the subset of features they require. For example, to utilize only RequestOptions, simply pass null for AccessCondition and OperationContext. The objects supplied for these optional parameters provide the user an easy way to determine if an operation should execute, how it should execute, and to retrieve additional information about how it was executed once it completes.

AccessCondition

An AccessCondition's primary purpose is to determine if an operation should execute, and is supported when using the Blob service. Specifically, AccessCondition encapsulates Blob leases as well as the If-Match, If-None-Match, If-Modified-Since, and If-Unmodified-Since HTTP headers. An AccessCondition may be reused across operations as long as the given condition is still valid. For example, a user may only wish to delete a blob if it hasn't been modified since last week. By using an AccessCondition, the library will send the HTTP If-Unmodified-Since header to the server, which may not process the operation if the condition is not true. Additionally, blob leases can be specified through an AccessCondition so that only operations from users holding the appropriate lease on a blob may succeed.

AccessCondition provides convenient static factory methods to generate an AccessCondition instance for the most common scenarios (IfMatch, IfNoneMatch, IfModifiedSince, IfNotModifiedSince, and Lease) however it is possible to utilize a combination of these by simply calling the appropriate setter on the condition you are using.

The following example illustrates how to use an AccessCondition to only upload the metadata on a blob if it is a specific version.

blob.uploadMetadata(AccessCondition.generateIfMatchCondition(currentETag), null /* RequestOptions */, null/* OperationContext */);

Here are some Examples:

// Perform Operation if the given resource is not a specified version:
AccessCondition.generateIfNoneMatchCondition(eTag);

// Perform Operation if the given resource has been modified since a given date:
AccessCondition.generateIfModifiedSinceCondition(lastModifiedDate);

// Perform Operation if the given resource has not been modified since a given date:
AccessCondition.generateIfNotModifiedSinceCondition(date);

// Perform Operation with the given lease id (Blobs only):
AccessCondition.generateLeaseCondition(leaseID);

// Perform Operation with the given lease id if it has not been modified since a given date:
AccessCondition condition = AccessCondition.generateLeaseCondition(leaseID);
condition.setIfUnmodifiedSinceDate(date);

RequestOptions

Each Client defines a service specific RequestOptions (i.e. BlobRequestOptions, QueueRequestOptions, and TableRequestOptions) that can be used to modify the execution of a given request. All service request options provide the ability to specify a different timeout and retry policy for a given operation; however some services may provide additional options. For example the BlobRequestOptions includes an option to specify the concurrency to use when uploading a given blob. RequestOptions are not stateful and may be reused across operations. As such, it is common for applications to design RequestOptions for different types of workloads. For example an application may define a BlobRequestOptions for uploading large blobs concurrently, and a BlobRequestOptions with a smaller timeout when uploading metadata.

The following example illustrates how to use BlobRequestOptions to upload a blob using up to 8 concurrent operations with a timeout of 30 seconds each.

BlobRequestOptions options = new BlobRequestOptions();

// Set ConcurrentRequestCount to 8
options.setConcurrentRequestCount(8);

// Set timeout to 30 seconds
options.setTimeoutIntervalInMs(30 * 1000);

blob.upload(new ByteArrayInputStream(buff),
    blobLength,
    null /* AccessCondition */,
    options,
    null /* OperationContext */);

OperationContext

The OperationContext is used to provide relevant information about how a given operation executed. This object is by definition stateful and should not be reused across operations. Additionally the OperationContext defines an event handler that can be subscribed to in order to receive notifications when a response is received from the server. With this functionality, a user could start uploading a 100 GB blob and update a progress bar after every 4 MB block has been committed.

Perhaps the most powerful function of the OperationContext is to provide the ability for the user to inspect how an operation executed. For each REST request made against a server, the OperationContext stores a RequestResult object that contains relevant information such as the HTTP status code, the request ID from the service, start and stop date, etag, and a reference to any exception that may have occurred. This can be particularly helpful to determine if the retry policy was invoked and an operation took more than one attempt to succeed. Additionally, the Service Request ID and start/end times are useful when escalating an issue to Microsoft.

The following example illustrates how to use OperationContext to print out the HTTP status code of the last operation.

OperationContext opContext = new OperationContext();
queue.createIfNotExist(null /* RequestOptions */, opContext);
System.out.println(opContext.getLastResult().getStatusCode());

 

Retry Policies

Retry Policies have been engineered so that the policies can evaluate whether to retry on various HTTP status codes. Although the default policies will not retry 400 class status codes, a user can override this behavior by creating their own retry policy. Additionally, RetryPolicies are stateful per operation which allows greater flexibility in fine tuning the retry policy for a given scenario.

The Storage Client for Java ships with 3 standard retry policies which can be customized by the user. The default retry policy for all operations is an exponential backoff with up to 3 additional attempts as shown below:

new RetryExponentialRetry(
    3000 /* minBackoff in milliseconds */,
    30000 /* deltaBackoff in milliseconds */,
    90000 /* maxBackoff in milliseconds */,
    3 /* maxAttempts */);

With the above default policy, the retry will approximately occur after: 3,000ms, 35,691ms and 90,000ms

If the number of attempts should be increased, one can use the following:

new RetryExponentialRetry(
    3000 /* minBackoff in milliseconds */,
    30000 /* deltaBackoff in milliseconds */,
    90000 /* maxBackoff in milliseconds */,
    6 /* maxAttempts */);

With the above policy, the retry will approximately occur after: 3,000ms, 28,442ms, 80,000ms, 90,000ms, 90,000ms and 90,000ms.

NOTE: the time provided is an approximation because the exponential policy introduces a +/-20% random delta as described below.

NoRetry - Operations will not be retried

LinearRetry - Represents a retry policy that performs a specified number of retries, using a specified fixed time interval between retries.

ExponentialRetry (default) - Represents a retry policy that performs a specified number of retries, using a randomized exponential backoff scheme to determine the interval between retries. This policy introduces a +/- 20% random delta to even out traffic in the case of throttling.

A user can configure the retry policy for all operations directly on a service client, or specify one in the RequestOptions for a specific method call. The following illustrates how to configure a client to use a linear retry with a 3 second backoff between attempts and a maximum of 3 additional attempts for a given operation.

serviceClient.setRetryPolicyFactory(new RetryLinearRetry(3000, 3));

Or

TableRequestOptions options = new TableRequestOptions();
options.setRetryPolicyFactory(new RetryLinearRetry(3000, 3));

Custom Policies

There are two aspects of a retry policy: the policy itself and an associated factory. To implement a custom policy, a user must derive from the abstract base class RetryPolicy and implement the relevant methods. Additionally, an associated factory class must be provided that implements the RetryPolicyFactory interface to generate unique instances for each logical operation. For simplicity's sake, the policies mentioned above implement the RetryPolicyFactory interface themselves; however, it is possible to use two separate classes.

Note about .NET Storage Client

During the development of the Java library we have identified many substantial improvements in the way our API can work. We are committed to bringing these improvements back to .NET while keeping in mind that many clients have built and deployed applications on the current API, so stay tuned.

Summary

We have put a lot of work into providing a truly first class development experience for the Java community to work with Windows Azure Storage. We very much appreciate all the feedback we have gotten from customers and through the forums, please keep it coming. Feel free to leave comments below,

Joe Giardino
Developer
Windows Azure Storage

Resources

Get the Windows Azure SDK for Java

Learn more about the Windows Azure Storage Client for Java

Learn more about Windows Azure Storage

PartitionKey or RowKey containing the percent ‘%’ character causes some Windows Azure Tables APIs to fail


Description and Symptoms

We have identified an issue that would affect services using Windows Azure Tables whenever the percent character ‘%’ appears as part of the PartitionKey or RowKey.

The affected APIs are Get Entity, Merge Entity, Update Entity, Delete Entity, Insert Or Merge Entity and Insert Or Replace Entity. If any of these APIs are invoked with a PartitionKey or RowKey that contains the '%' character, the user could erroneously receive a 404 Not Found or 400 Bad Request error code. In addition, in the case of upsert (the Insert Or Merge Entity and Insert Or Replace Entity APIs), the request might succeed but the stored string might not be what the user intended.

Note that the Insert Entity, Entity Group Transaction and Query Entities APIs are not affected, since the PartitionKey and RowKey are not part of the URL path segment.

Root Cause

The Windows Azure Table Service is double decoding the URL path segment when processing a request which is resulting in an erroneous interpretation of the string whenever the ‘%’ character appears. Note that the query string portion of the URL is not affected by this issue nor is any URL that appears as part of the HTTP body. Therefore, any other property filters used in a query will be unaffected by this issue – only PartitionKey and RowKey are affected.

Here is an example of how this issue occurs: Inserting an entity with PartitionKey = “Metric%25” and RowKey = “Count” would succeed, since PartitionKey, RowKey and custom values are part of the request payload and not the URL path segment. Now, when you intend to retrieve this existing entity, the Get Entity HTTP URL will look like:

http://foo.table.core.windows.net/Metrics(PartitionKey='Metric%2525',RowKey='Count')

However due to the double decoding bug, the PartitionKey is getting interpreted as “Metric%” on the server side which is not what the user intended. In this case, a 404 Not Found is returned.

Workarounds

If you did not currently commit any entities where ‘%’ is used as part of the PartitionKey or RowKey we suggest that you consider the following:

  1. Avoid using ‘%’ as part of your PartitionKey and RowKey and consider replacing it with another character, for example ‘-‘.
  2. Consider using URL safe Base64 encoding for your PartitionKey and RowKey values.

Note: Do not double encode your PartitionKey and RowKey values as a workaround, since this would not be compatible with future Windows Azure Tables releases when a fix is applied on the server side.

In case you already have inserted entities where ‘%’ appears as part of the PartitionKey or RowKey, we suggest the following workarounds:

  1. For Get Entity:
    • Use the Entity Group Transaction with an inner GET Entity command (refer to the example in the subsequent section)
    • Use the Query Entities API by relying on the $Filter when retrieving a single entity. While this is not possible for users of the Windows Azure Storage Client library or the WCF Data Services Client library, this workaround is available to users who have control over the wire protocol. As an example, consider the following URL syntax when querying for the same entity mentioned in the “Root Cause” section above:
      http://foo.table.core.windows.net/Metrics()?$filter=(PartitionKey%20eq%20'Metric%2525')%20and%20(RowKey%20eq%20'Count')
  2. For Update Entity, Merge Entity, Delete Entity, Insert Or Merge Entity and Insert Or Replace Entity APIs, use the Entity Group Transaction with the inner operation that you wish to perform. (refer the example in the subsequent section)

Windows Storage Client Library Workaround Code Example

Consider the case where the user has already inserted an entity with PartitionKey = “Metric%25” and RowKey = “Count”. The following code shows how to use the Windows Azure Storage Client Library in order to retrieve and update that entity. The code uses the Entity Group Transaction workaround mentioned in the previous section. Note that both the Get Entity and Update Entity operations are performed as a batch operation.

// Creating a Table Service Context
TableServiceContext tableServiceContext = new TableServiceContext(tableClient.BaseUri.ToString(), tableClient.Credentials);

// Create a single point query
DataServiceQuery<MetricEntity> getEntityQuery = (DataServiceQuery<MetricEntity>)
    from entity in tableServiceContext.CreateQuery<MetricEntity>(customersTableName)
    where entity.PartitionKey == "Metric%25" && entity.RowKey == "Count"
    select entity;

// Create an entity group transaction with an inner Get Entity request
DataServiceResponse batchResponse = tableServiceContext.ExecuteBatch(getEntityQuery);

// There is only one response as part of this batch
QueryOperationResponse response = (QueryOperationResponse)batchResponse.First();

if (response.StatusCode == (int)HttpStatusCode.OK)
{
    IEnumerator queryResponse = response.GetEnumerator();
    queryResponse.MoveNext();

    // Read this single entity
    MetricEntity singleEntity = (MetricEntity)queryResponse.Current;

    // Updating the entity
    singleEntity.MetricValue = 100;
    tableServiceContext.UpdateObject(singleEntity);

    // Make sure to save with the Batch option
    tableServiceContext.SaveChanges(SaveChangesOptions.Batch);
}

Java Storage Client Workaround Code Example

As the issue discussed above is related to the service, the same behavior will be exhibited when performing single entity operations using the Storage Client Library for Java. However, it is also possible to use an Entity Group Transaction to work around this issue. The latest version of the library that can be used to implement the proposed workaround can be found here.

// Define a batch operation.
TableBatchOperation batchOperation = new TableBatchOperation();

// Retrieve the entity
batchOperation.retrieve("Metric%25", "Count", MetricEntity.class);

// Submit the operation to the table service.
tableClient.execute("foo", batchOperation);
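The same Entity Group Transaction approach covers the update workarounds (item 2 above). A brief sketch, assuming updatedMetricEntity is a MetricEntity whose PartitionKey is "Metric%25" and RowKey is "Count":

// Wrap the single merge in an Entity Group Transaction so the keys travel in the
// request body rather than the URL path segment.
TableBatchOperation mergeBatch = new TableBatchOperation();
mergeBatch.merge(updatedMetricEntity);
tableClient.execute("foo", mergeBatch);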

For more on working with Tables via the Java Storage Client see: http://blogs.msdn.com/b/windowsazurestorage/archive/2012/03/05/windows-azure-storage-client-for-java-tables-deep-dive.aspx

Long Term Fix

We will be fixing this issue as part of a version change in a future release. We will update this post with the storage version that contains the fix.

We apologize for any inconvenience this may have caused.

Jean Ghanem

Character Encoding Issues Related to Copy Blob API


This blog applies to the 2011-08-18 storage version or earlier of the Copy Blob API and the Windows Azure Storage Client Library version 1.6.

Two separate problems are discussed in this blog:

  1. Over REST, the service expects the ‘+’ character appearing as part of the x-ms-copy-source header to be percent encoded. When the ‘+’ is not URL encoded, the service would interpret it as space ‘ ’ character.
  2. The Windows Azure Storage Client Library is not URL percent encoding the x-ms-copy-source header value. This leads to a misinterpretation of x-ms-copy-source blob names that include the percent ‘%’ character.

When using Copy Blob, character ‘+’ appearing as part of the x-ms-copy-source header must be URL percent encoded

When using the Copy Blob API, the x-ms-copy-source header value must be URL percent encoded. However, when the server is decoding the string, it is converting character ‘+’ to a space which might not be compatible with the encoding rule applied by the client and in particular, the Windows Azure Storage Client Library.

Example: Assume that an application wants to copy from a source blob with the following key information: AccountName = “foo” ContainerName = “container” BlobName = “audio+video.mp4”

Using the Windows Azure Storage Client Library, the following value for the x-ms-copy-source header is generated and transmitted over the wire:

x-ms-copy-source: /foo/container/audio+video.mp4

When the data is received by the server, the blob name would then be interpreted as “audio video.mp4” which is not what the user intended. A compatible header would be:

x-ms-copy-source: /foo/container/audio%2bvideo.mp4

In that case, the server when decoding this header would interpret the blob name correctly as “audio+video.mp4”

NOTE: The described server behavior in this blog does not apply to the request URL but only applies to the x-ms-copy-source header that is used as part of the Copy Blob API with version 2011-08-18 or earlier.

To get correct Copy Blob behavior, please consider applying the following encoding rules for the x-ms-copy-source header:

  1. URL percent encode character ‘+’ to “%2b”.
  2. URL percent encode space i.e. character ‘ ‘ to “%20”. Note that if you currently happen to encode character space to character ‘+’, the current server behavior will interpret it as a space when decoding. However, this behavior is not compatible with the rule to decode request URLs where character ‘+’ is kept as a ‘+’ after decoding.
  3. In case you are using the Windows Azure Storage Client Library, please apply the workaround at the end of this post.

Windows Azure Storage Client Library is not URL encoding the x-ms-copy-source header

As described in the previous section, x-ms-copy-source header must be URL percent encoded. However the Windows Azure Storage Client Library is transmitting the blob name in an un-encoded manner. Therefore any blob name that has percent ‘%’ in its name followed by a hex number will be misinterpreted on the server side.

Example: Assume that an application wants to copy from a source blob with the following key information: AccountName = “foo” ContainerName = “container” BlobName = “data%25.txt”

Using the Windows Azure Storage Client Library, the following un-encoded value for the x-ms-copy-source header is generated and transmitted over the wire:

x-ms-copy-source: /foo/container/data%25.txt

Data received by the server will be URL decoded and therefore the blob name would be interpreted as “data%.txt” which is not what the user intended. A compatible header would be:

x-ms-copy-source: /foo/container/data%2525.txt

In that case, the server when decoding this header would interpret the blob name correctly as “data%25.txt”

Note that this bug exists in Version 1.6 of the client library and will be fixed in future releases.

Windows Azure Storage Client Library Code Workaround

As described in the previous sections, the current behavior of Copy Blob APIs exposed by the client library will not work properly in case the characters ‘+’ or ‘%’ appear as part of the source blob name. The affected APIs are CloudBlob.CopyFromBlob and CloudBlob.BeginCopyFromBlob.

To get around this issue, we have provided the following extension method which creates a safe CloudBlob object that can be used as the sourceBlob with any of the copy blob APIs. Please note that the returned object should not be used to access the blob or to perform any action on it.

Note: This workaround is needed for Windows Azure Storage Library version 1.6.

public static class CloudBlobCopyExtensions
{
    /// <summary>
    /// Converts a CloudBlob to a version that can be safely used as a source for the
    /// CopyFromBlob or BeginCopyFromBlob APIs only.
    /// The returned object must not be used to access the blob, nor should any of its APIs be invoked.
    /// This method should only be used against storage version 2011-08-18 or earlier
    /// and with Windows Azure Storage Client version 1.6.
    /// </summary>
    /// <param name="originBlob">The source blob that is being copied</param>
    /// <returns>CloudBlob that can be safely used as a source for the CopyFromBlob or BeginCopyFromBlob APIs only.</returns>
    public static CloudBlob GetCloudBlobReferenceAsSourceBlobForCopy(this CloudBlob originBlob)
    {
        Uri srcUri = originBlob.Uri;

        // Double encode the blob name segment so that the server-side decode yields the intended name
        string encodedBlobName = HttpUtility.UrlEncode(
                                    HttpUtility.UrlEncode(
                                        originBlob.Name));

        // Replace the original (escaped) blob name segment of the URI with the double-encoded name
        string firstPart = srcUri.OriginalString.Substring(
            0, srcUri.OriginalString.Length - Uri.EscapeUriString(originBlob.Name).Length);
        string encodedUrl = firstPart + encodedBlobName;

        return new CloudBlob(
            encodedUrl,
            originBlob.SnapshotTime,
            originBlob.ServiceClient);
    }
}

Here is how the above method can be used:

// Create a blob by uploading data to it
CloudBlob someBlob = container.GetBlobReference("a+b.txt");
someBlob.UploadText("test");
CloudBlob destinationBlob = container.GetBlobReference("a+b(copy).txt");

// The object below should only be used when issuing a copy; do not use sourceBlobForCopy to access the blob
CloudBlob sourceBlobForCopy = someBlob.GetCloudBlobReferenceAsSourceBlobForCopy();
destinationBlob.CopyFromBlob(sourceBlobForCopy);

We will update this blog once we have fixed the service. We apologize for any inconvenience that this may have caused.

Jean Ghanem


Windows Azure Storage – 4 Trillion Objects and Counting

$
0
0

Windows Azure Storage has had an amazing year of growth. We have over 4 trillion objects stored, process an average of 270,000 requests per second, and reach peaks of 880,000 requests per second.

About a year ago we hit the 1 trillion object mark. Over the past 12 months we have seen an impressive 4x increase in the number of objects stored and a 2.7x increase in average requests per second.

The following graph shows the number of stored objects in Windows Azure Storage over the past year. The number of stored objects is counted on the last day of the month shown. The object count is the number of unique user objects stored in Windows Azure Storage, so the counts do not include replicas.

[Figure: number of objects stored in Windows Azure Storage, by month, over the past year]

The following graph shows the average and peak requests per second. The average requests per second is the average over the whole month shown, and the peak requests per second is the peak for the month shown.

[Figure: average and peak requests per second, by month, over the past year]

We expect this growth rate to continue, especially since we just lowered the cost of requests to storage by 10x. It now costs $0.01 per 100,000 requests regardless of request type (same cost for puts and gets). This makes object puts and gets to Windows Azure Storage 10x to 100x cheaper than other cloud providers.

In addition, we now offer two types of durability for your storage – Locally Redundant Storage (LRS) and Geo Redundant Storage (GRS). GRS is the default storage that we have always provided, and now we are offering a new type of storage called LRS. LRS is offered at a discount and provides locally redundant storage, where we maintain the equivalent of 3 copies of your data within a given location. GRS provides geo-redundant storage, where we maintain the equivalent of 6 copies of your data spread across 2 locations at least 400 miles apart from each other (3 copies are kept in each location). This allows you to choose the desired level of durability for your data. And of course, if your data does not require the additional durability of GRS, you can use LRS at a 23% to 34% discounted price (depending on how much data is stored). In addition, we employ a sophisticated erasure coding scheme for storing data that provides higher durability than just storing 3 (for LRS) or 6 (for GRS) copies of your data, while at the same time keeping the storage overhead low, as described in our USENIX paper.

We are also excited about our recent release of Windows Azure Virtual Machines, where the persistent disks are stored as objects (blobs) in Windows Azure Storage. This allows the OS and data disks used by your VMs to leverage the same LRS and GRS durability provided by Windows Azure Storage. With that release we also provided access to Windows Azure Storage via easy to use client libraries for many popular languages (.NET, Java, Node.js, PHP, and Python), as well as REST.

Windows Azure Storage uses a unique approach of storing different object types (Blobs, Disks/Drives, Tables, Queues) in the same store, as described in our SOSP paper. The total number of blobs (disk/drives are stored as blobs), table entities, and queue messages stored account for the 4+ trillion objects in our unified store. By blending different types of objects across the same storage stack, we have a single stack for replicating data to keep it durable, a single stack for automatic load balancing and dealing with failures to keep data available, and we store all of the different types of objects on the same hardware, blending their workloads, to keep prices low. This allows us to have one simple pricing model for all object types (same cost in terms of GB/month, bandwidth, as well as transactions), so customers can focus on choosing the type of object that best fits their needs, instead of being forced to use one type of object over another due to price differences.

We are excited about the growth ahead and continuing to work with customers to provide a quality service. Please let us know if you have any feedback, questions or comments! If you would like to learn more about Windows Azure, click here.

Brad Calder

Introducing Windows Azure Storage Client Library 2.0 for .NET and Windows Runtime

$
0
0

Today we are releasing version 2.0 of the Windows Azure Storage Client Library. This is our largest update to our .NET library to date, and it includes new features, broader platform compatibility, and revisions to address the great feedback you’ve given us over time. The code is available on GitHub now. The libraries are also available through NuGet, and are included in the Windows Azure SDK for .NET - October 2012; for more information and links, see below. In addition to the .NET 4.0 library, we are also releasing two libraries for Windows Store apps as a Community Technology Preview (CTP) that fully support the Windows Runtime platform and can be used to build modern Windows Store apps for both Windows RT (which supports ARM-based systems) and Windows 8, in any of the languages supported by Windows Store apps (JavaScript, C++, C#, and Visual Basic). This blog post serves as an overview of these libraries and covers some of the implementation details that will be helpful to understand when developing cloud applications in .NET, regardless of platform.

What’s New

We have introduced a number of new features in this release of the Storage Client Library including:

  • Simplicity and Usability - A greatly simplified API surface which will allow developers new to storage to get up and running faster while still providing the extensibility for developers who wish to customize the behavior of their applications beyond the default implementation.
  • New Table Implementation - An entirely new Table Service implementation which provides a simple interface that is optimized for low latency/high performance workloads, as well as providing a more extensible serialization model to allow developers more control over their data.
  • Rich debugging and configuration capabilities – One common piece of feedback we receive is that it’s too difficult to know what happened “under the covers” when making a call to the storage service. How many retries were there? What were the error codes? The OperationContext object provides rich debugging information, real-time status events for parallel and complex actions, and extension points that allow users to customize requests or enable end-to-end client tracing (see the sketch after this list).
  • Windows Runtime Support - A Windows Runtime component with support for developing Windows Store apps using JavaScript, C++,C#, and Visual Basic; as well as a Strong Type Tables Extension library for C++, C#, and Visual Basic
  • Complete Sync and Asynchronous Programming Model (APM) implementation - A complete synchronous API for .NET 4.0. Previous releases of the client implemented synchronous methods by simply surrounding the corresponding APM methods with a ManualResetEvent; this was not ideal, as extra threads remained blocked during execution. In this release all synchronous methods complete their work on the thread in which they are called, with the notable exception of the stream implementations available via Cloud[Page|Block]Blob.Open[Read|Write], due to parallelism.
  • Simplified RetryPolicies - Easy and reusable RetryPolicies
  • .NET Client Profile – The library now supports the .NET Client Profile. For more on the .NET Client Profile see here.
  • Streamlined Authentication Model - There is now a single StorageCredentials type that supports Anonymous, Shared Access Signature, and Account and Key authentication schemes
  • Consistent Exception Handling - The library immediately will throw any exception encountered prior to making the request to the server. Any exception that occurs during the execution of the request will subsequently be wrapped inside a single StorageException type that wraps all other exceptions as well as providing rich information regarding the execution of the request.
  • API Clarity - All methods that make requests to the server are clearly marked with the [DoesServiceRequest] attribute
  • Expanded Blob API - Blob DownloadRange allows user to specify a given range of bytes to download rather than rely on a stream implementation
  • Blob download resume - A feature that will issue a subsequent range request(s) to download only the bytes not received in the event of a loss of connectivity
  • Improved MD5 - Simplified MD5 behavior that is consistent across all client APIs
  • Updated Page Blob Implementation - Full Page Blob implementation including read and write streams
  • Cancellation - Support for Asynchronous Cancellation via the ICancellableAsyncResult. Note, this can be used with .NET CancellationTokens via the CancellationToken.Register() method.
  • Timeouts - Separate client and server timeouts which support end to end timeout scenarios
  • Expanded Azure Storage Feature Support – It supports the 2012-02-12 REST API version, with implementations for Blob & Container Leases; Blob, Table, and Queue Shared Access Signatures; and Asynchronous Cross-Account Copy Blob
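
To illustrate the OperationContext item above, here is a minimal sketch (assuming an existing CloudBlobContainer named container and a readable Stream named fileStream; the trace id value is hypothetical) that attaches a client trace id and logs each response:

// using Microsoft.WindowsAzure.Storage;
// using Microsoft.WindowsAzure.Storage.Blob;

// Attach a client trace id and observe each response for this logical operation.
OperationContext ctx = new OperationContext { ClientRequestID = "photo-upload-42" };
ctx.ResponseReceived += (sender, args) =>
    Console.WriteLine("HTTP {0}, service request id {1}",
        args.RequestInformation.HttpStatusCode,
        args.RequestInformation.ServiceRequestID);

CloudBlockBlob blob = container.GetBlockBlobReference("photo.jpg");

// The OperationContext is accepted as the last parameter of the operation overloads.
blob.UploadFromStream(fileStream, null /* accessCondition */, null /* options */, ctx);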

Design

When designing the new Storage Client for .NET and Windows Runtime, we set up a series of design guidelines to follow throughout the development process. In addition to these guidelines, there are some unique requirements when developing for Windows Runtime, and specifically when projecting into JavaScript, that have driven some key architectural decisions.

For example, our previous RetryPolicy was based on a delegate that the user could configure; however as this cannot be supported on all platforms we have redesigned the RetryPolicy to be a simple and consistent implementation everywhere. This change has also allowed us to simplify the interface in order to address user feedback regarding the complexity of the previous implementation. Now a user who constructs a custom RetryPolicy can re-use that same implementation across platforms.

Windows Runtime

A key driver in this release was expanding platform support, specifically targeting the upcoming releases of Windows 8, Windows RT, and Windows Server 2012. As such, we are releasing the following two Windows Runtime components to support Windows Runtime as Community Technology Preview (CTP):

  • Microsoft.WindowsAzure.Storage.winmd - A fully projectable storage client that supports JavaScript, C++, C#, and VB. This library contains all core objects as well as support for Blobs, Queues, and a base Tables Implementation consumable by JavaScript
  • Microsoft.WindowsAzure.Storage.Table.dll – A table extension library that provides generic query support and strong type entities. This is used by non-JavaScript applications to provide strong type entities as well as reflection based serialization of POCO objects

Breaking Changes

With the introduction of Windows 8, Windows RT, and Windows Server 2012 we needed to broaden the platform support of our current libraries. To meet this requirement we have invested significant effort in reworking the existing Storage Client codebase to broaden platform support, while also delivering new features and significant performance improvements (more details below). One of the primary goals in this version of the client libraries was to maintain a consistent API across platforms so that developers’ knowledge and code could transfer naturally from one platform to another. As such, we have introduced some breaking changes from the previous version of the library to support this common interface. We have also used this opportunity to act on user feedback we have received via the forums and elsewhere regarding both the .NET library as well as the recently released Windows Azure Storage Client Library for Java. For existing users we will be posting an upgrade guide for breaking changes to this blog that describes each change in more detail.

Please note the new client is published under the same NuGet package as previous 1.x releases. As such, please check any existing projects as an automatic upgrade will introduce breaking changes.

Additional Dependencies

The new table implementation depends on three libraries (collectively referred to as ODataLib), which are resolved through the ODataLib (version 5.0.2) packages available through NuGet rather than the WCF Data Services installer, which currently contains the 5.0.0 versions. The ODataLib libraries can be downloaded directly or referenced by your code project through NuGet. The specific ODataLib packages are:

http://nuget.org/packages/Microsoft.Data.OData/5.0.2

http://nuget.org/packages/Microsoft.Data.Edm/5.0.2

http://nuget.org/packages/System.Spatial/5.0.2

Namespaces

One particular breaking change of note is that the name of the assembly and root namespace has moved to Microsoft.WindowsAzure.Storage instead of Microsoft.WindowsAzure.StorageClient. In addition to aligning better with other Windows Azure service libraries, this change allows developers to use the legacy 1.X versions of the library and the 2.0 release side-by-side as they migrate their applications. Additionally, each Storage Abstraction (Blob, Table, and Queue) has now been moved to its own sub-namespace to provide a more targeted developer experience and cleaner IntelliSense experience. For example the Blob implementation is located in Microsoft.WindowsAzure.Storage.Blob, and all relevant protocol constructs are located in Microsoft.WindowsAzure.Storage.Blob.Protocol.

Testing, stability, and engaging the open source community

We are committed to providing a rock solid API that is consistent, stable, and reliable. In this release we have made significant progress in increasing test coverage as well as breaking apart large test scenarios into more targeted ones that are more consumable by the public.

Microsoft and Windows Azure are making great efforts to be as open and transparent as possible regarding the client libraries for our services. The source code for all the libraries can be downloaded via GitHub under the Apache 2.0 license. In addition we have provided over 450 new Unit Tests for the .NET 4.0 library alone. Now users who wish to modify the codebase have a simple and lightweight way to validate their changes. It is also important to note that most of these tests run against the Storage Emulator that ships with the Windows Azure SDK for .NET, allowing users to execute tests without incurring any usage on their storage accounts. We will also be providing a series of higher level scenarios and How-To’s to get users up and running with both simple and advanced topics relating to Windows Azure Storage.

Summary

We have put a lot of work into providing a truly first class development experience for the .NET community to work with Windows Azure Storage. In addition to the content provided in these blog posts we will continue to release a series of additional blog posts which will target various features and scenarios in more detail, so check back soon. Hopefully you can see your past feedback reflected in this new library. We really do appreciate the feedback we have gotten from the community, so please keep it coming by leaving a comment below or participating on our forums.

Joe Giardino
Serdar Ozler
Justin Yu
Veena Udayabhanu

Windows Azure Storage

Resources

Get the Windows Azure SDK for .Net

Windows Azure Storage Client Library 2.0 Breaking Changes & Migration Guide

$
0
0

The recently released Windows Azure Storage Client Library for .NET includes many new features, expanded platform support, extensibility points, and performance improvements. In developing this version of the library we made some distinct breaks with Storage Client 1.7 and prior in order to support common paradigms across .NET and Windows Runtime applications. Additionally, we have addressed distinct pieces of user feedback from the forums and users we’ve spoken with. We have made great effort to provide a stable platform for clients to develop their applications on and will continue to do so. This blog post serves as a reference point for these changes as well as a migration guide to assist clients in migrating existing applications to the 2.0 release. If you are new to developing applications using the Storage Client in .NET, you may want to refer to the overview here to get acquainted with the basic concepts. This blog post focuses on the changes; future posts will introduce the concepts that the Storage Client supports.

Namespaces

The core namespaces of the library have been reworked to provide a more targeted IntelliSense experience, as well as to more closely align with the programming experience provided by other Windows Azure Services. The root namespace as well as the assembly name itself have been changed from Microsoft.WindowsAzure.StorageClient to Microsoft.WindowsAzure.Storage. Additionally, each service has been broken out into its own sub namespace. For example the blob implementation is located in Microsoft.WindowsAzure.Storage.Blob, and all protocol relevant constructs are in Microsoft.WindowsAzure.Storage.Blob.Protocol. Note: The Windows Runtime component will not expose the Microsoft.WindowsAzure.Storage.[Blob|Table|Queue].Protocol namespaces as they contain dependencies on .NET-specific types and are therefore not projectable.

The following is a detailed listing of client accessible namespaces in the assembly.

  • Microsoft.WindowsAzure.Storage – Common types such as CloudStorageAccount and StorageException. Most applications should include this namespace in their using statements.
  • Microsoft.WindowsAzure.Storage.Auth – The StorageCredentials object that is used to encapsulate multiple forms of access (Account & Key, Shared Access Signature, and Anonymous).
  • Microsoft.WindowsAzure.Storage.Auth.Protocol – Authentication handlers that support SharedKey and SharedKeyLite for manual signing of requests
  • Microsoft.WindowsAzure.Storage.Blob – Blob convenience implementation, applications utilizing Windows Azure Blobs should include this namespace in their using statements
    • Microsoft.WindowsAzure.Storage.Blob.Protocol – Blob Protocol layer
  • Microsoft.WindowsAzure.Storage.Queue – Queue convenience implementation, applications utilizing Windows Azure Queues should include this namespace in their using statements
    • Microsoft.WindowsAzure.Storage.Queue.Protocol – Queue Protocol layer
  • Microsoft.WindowsAzure.Storage.Table – New lightweight Table Service implementation based on ODataLib. We will be posting an additional blog that dives into this new Table implementation in greater detail.
    • Microsoft.WindowsAzure.Storage.Table.DataServices – The legacy Table Service implementation based on System.Data.Services.Client. This includes TableServiceContext, CloudTableQuery, etc.
    • Microsoft.WindowsAzure.Storage.Table.Protocol – Table Protocol layer implementation
  • Microsoft.WindowsAzure.Storage.RetryPolicies - Default RetryPolicy implementations (NoRetry, LinearRetry, and ExponentialRetry) as well as the IRetryPolicy interface
  • Microsoft.WindowsAzure.Storage.Shared.Protocol – Analytics objects and core HttpWebRequestFactory

What’s New

  • Added support for the .NET Client Profile, allowing for easier installation of your application on machines where the full .NET Framework has not been installed.
  • There is a new dependency on the three libraries released as ODataLib, which are available via NuGet and CodePlex.
  • A reworked and simplified codebase that shares a large amount of code between platforms
  • Over 450 new unit tests published to GitHub
  • All APIs that execute a request against the storage service are marked with the DoesServiceRequest attribute
  • Support for custom user headers
  • OperationContext – Provides an optional source of diagnostic information about how a given operation is executing. Provides a mechanism for E2E tracing by allowing clients to specify a client trace id per logical operation to be logged by the Windows Azure Storage Analytics service.
  • True “synchronous” method support. SDK 1.7 implemented synchronous methods by simply wrapping a corresponding Asynchronous Programming Model (APM) method with a ManualResetEvent. In this release all work is done on the calling thread. This excludes stream implementations available via Cloud[Page|Block]Blob.OpenRead and OpenWrite and parallel uploads.
  • Support for Asynchronous cancellation via ICancellableAsyncResult. Note this can be hooked up to .NET cancellation tokens via the Register() method as illustrated below:

ICancellableAsyncResult result = container.BeginExists(callback, state);

token.Register((o) => result.Cancel(), null /* state */);

  • Timeouts – The library now allows two separate timeouts to be specified. These timeouts can be specified directly on the service client (i.e. CloudBlobClient) or overridden via the RequestOptions. These timeouts are nullable and therefore can be disabled. See the sketch after this list.
    • The ServerTimeout is the timeout given to the server for each request executed in a given logical operation. An operation may make more than one request (in the case of a retry, parallel upload, etc.); the ServerTimeout is sent for each of these requests. This is set to 90 seconds by default.
    • The MaximumExecutionTime provides a true end to end timeout. This timeout is a client side timeout that spans all requests, including any potential retries, a given operation may execute. This is disabled by default.
  • Full PageBlob support including lease, cross account copy, and read/write streams
  • Cloud[Block|Page]Blob DownloadRange support
  • Blobs support download resume; in the event of an error, the subsequent request will be truncated to specify a range starting at the correct byte offset.
  • The default MD5 behavior has been updated to utilize a FIPS compliant implementation. To use the default .NET MD5 please set CloudStorageAccount.UseV1MD5 = true;
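
As a small illustration of the timeout items above, here is a minimal sketch (assuming an existing CloudStorageAccount named account; the values shown are illustrative only):

CloudBlobClient blobClient = account.CreateCloudBlobClient();

// Timeout sent to the server for each individual request within a logical operation.
blobClient.ServerTimeout = TimeSpan.FromSeconds(90);

// Client-side end-to-end timeout spanning all requests (including retries) of an operation.
blobClient.MaximumExecutionTime = TimeSpan.FromMinutes(5);

// Both values can also be overridden per call via the request options.
BlobRequestOptions options = new BlobRequestOptions
{
    ServerTimeout = TimeSpan.FromSeconds(30),
    MaximumExecutionTime = TimeSpan.FromMinutes(2)
};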

Breaking Changes

General

  • Dropped support for .NET Framework 3.5, Clients must use .Net 4.0 or above
  • Cloud[Blob|Table|Queue]Client.ResponseReceived event has been removed, instead there are SendingRequest and ResponseReceived events on the OperationContext which can be passed into each logical operation
  • All Classes are sealed by default to maintain consistency with Windows RT library
  • ResultSegments are no longer generic. For example, in Storage Client 1.7 there is a ResultSegment<CloudTable>, while in 2.0 there is a TableResultSegment to maintain consistency with Windows RT library.
  • RetryPolicies
    • The Storage Client will no longer prefilter certain types of exceptions or HTTP status codes prior to evaluating the user’s RetryPolicy. The RetryPolicies contained in the library will by default not retry 400 class errors, but this can be overridden by implementing your own policy
    • A retry policy is now a class that implements the IRetryPolicy interface. This is to simplify the syntax as well as provide commonality with the Windows RT library (see the sketch after this list)
  • StorageCredentials
    • CloudStorageAccount.SetConfigurationSettingPublisher has been removed. Instead, the members of StorageCredentials are now mutable, allowing users to accomplish similar scenarios in a more streamlined manner by simply mutating the StorageCredentials instance associated with a given client via the provided UpdateKey methods.
    • All credentials types have been simplified into a single StorageCredentials object that supports Anonymous requests, Shared Access Signature, and Account and Key authentication.
  • Exceptions
    • StorageClientException and StorageServerException are now simplified into a single Exception type: StorageException. All APIs will throw argument exceptions immediately; once a request is initiated all other exceptions will be wrapped.
    • StorageException no longer directly contains ExtendedErrorInformation. This has been moved inside the RequestResult object which tracks the current state of a given request
  • Pagination has been simplified. A segmented result will simply return up to the maximum number of results specified. If a continuation token is received it is left to the user to make any subsequent requests to complete a given page size.
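
To illustrate the simplified retry policy and credentials surfaces above, here is a minimal sketch (the account name, key placeholder, and retry values are illustrative only):

// using Microsoft.WindowsAzure.Storage;
// using Microsoft.WindowsAzure.Storage.Auth;
// using Microsoft.WindowsAzure.Storage.Blob;
// using Microsoft.WindowsAzure.Storage.RetryPolicies;

// A single StorageCredentials type covers account & key (shown here), SAS, and anonymous access.
StorageCredentials credentials = new StorageCredentials("myaccount", "<account-key>");
CloudStorageAccount account = new CloudStorageAccount(credentials, true /* useHttps */);

CloudBlobClient blobClient = account.CreateCloudBlobClient();

// Retry policies are now classes implementing IRetryPolicy.
blobClient.RetryPolicy = new ExponentialRetry(TimeSpan.FromSeconds(2), 5);

// To disable retries entirely, assign new NoRetry() rather than null.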

Blobs

  • All blobs must be accessed via CloudPageBlob or CloudBlockBlob, the CloudBlob base class has been removed. To get a reference to the concrete blob class when the client does not know the type please see the GetBlobReferenceFromServer on CloudBlobClient and CloudBlobContainer
  • In an effort to be more transparent to the application layer, the default parallelism is now set to 1 for blob clients (this can be configured via CloudBlobClient.ParallelOperationThreadCount). In previous releases of the SDK, we observed many users scheduling multiple concurrent blob uploads to more fully exploit the parallelism of the system. However, when each of these operations was internally processing up to 8 simultaneous operations itself, there were some adverse side effects on the system. By setting parallelism to 1 by default, it is now up to the user to opt in to this concurrent behavior.
  • CloudBlobClient.SingleBlobUploadThresholdInBytes can now be set as low as 1 MB.
  • StreamWriteSizeInBytes has been moved to CloudBlockBlob and can now be set as low as 16KB. Please note that the maximum number of blocks a blob can contain is 50,000 meaning that with a block size of 16KB, the maximum blob size that can be stored is 800,000KB or ~ 781 MB.
  • All upload and download methods are now stream based, the FromFile, ByteArray, Text overloads have been removed.
  • The stream implementation available via CloudBlockBlob.OpenWrite will no longer encode MD5 into the block id. Instead the block id is now a sequential block counter appended to a fixed random integer in the format of [Random:8]-[Seq:6].
  • For uploads if a given stream is not seekable it will be uploaded via the stream implementation which will result in multiple operations regardless of length. As such, when available it is considered best practice to pass in seekable streams.
  • MD5 has been simplified, all methods will honor the three MD5 related flags exposed via BlobRequestOptions
    • StoreBlobContentMD5 – Stores the Content MD5 on the Blob on the server (default to true for Block Blobs and false for Page Blobs)
    • UseTransactionalMD5 – Will ensure each upload and download provides transactional security via the HTTP Content-MD5 header. Note: When enabled, all Download Range requests must be 4MB or less. (default is disabled, however any time a Content-MD5 is sent by the server the client will validate it unless DisableContentMD5Validation is set)
    • DisableContentMD5Validation – Disables any Content-MD5 validation on downloads. This is needed to download any blobs that may have had their Content-MD5 set incorrectly
  • Cloud[Page|Block]Blob no longer exposes BlobAttributes. Instead the BlobProperties, Metadata, Uri, etc. are exposed on the Cloud[Page|Block]Blob object itself
  • The stream available via Cloud[Page|Block]Blob.OpenRead() does not support multiple Asynchronous reads prior to the first call completing. You must first call EndRead prior to a subsequent call to BeginRead.
  • Protocol
    • All blob Protocol constructs have been moved to the Microsoft.WindowsAzure.Storage.Blob.Protocol namespace. BlobRequest and BlobResponse have been renamed to BlobHttpWebRequestFactory and BlobHttpResponseParsers respectively.
    • Signing Methods have been removed from BlobHttpWebRequestFactory, alternatively use the SharedKeyAuthenticationHandler in the Microsoft.WindowsAzure.Storage.Auth.Protocol namespace

Tables

  • New Table Service Implementation - A new lightweight table implementation is provided in the Microsoft.WindowsAzure.Storage.Table namespace. Note: For backwards compatibility the Microsoft.WindowsAzure.Storage.Table.DataServices.TableServiceEntity was not renamed; however, this entity type is not compatible with Microsoft.WindowsAzure.Storage.Table.TableEntity as it does not implement the ITableEntity interface.
  • DataServices
    • The legacy System.Data.Services.Client based implementation has been migrated to the Microsoft.WindowsAzure.Storage.Table.DataServices namespace.
    • The CloudTableClient.Attach method has been removed. Alternatively, use a new TableServiceContext
    • TableServiceContext will now protect concurrent requests against the same context. To execute concurrent requests please use a separate TableServiceContext per logical operation.
    • TableServiceQueries will no longer rewrite the take count in the URI query string to take smaller amounts of entities based on the legacy pagination construct. Instead, the client side Lazy Enumerable will stop yielding results when the specified take count is reached. This could potentially result in retrieving a larger number of entities from the service for the last page of results. Developers who need a finer grained control over the pagination of their queries should leverage the segmented execution methods provided.
  • Protocol
    • All Table protocol constructs have been moved to the Microsoft.WindowsAzure.Storage.Table.Protocol namespace. TableRequest and TableResponse have been renamed to TableHttpWebRequestFactory and TableHttpResponseParsers respectively.
    • Signing Methods have been removed from TableHttpWebRequestFactory, alternatively use the SharedKeyLiteAuthenticationHandler in the Microsoft.WindowsAzure.Storage.Auth.Protocol namespace

Queues

  • Protocol
    • All Queue protocol constructs have been moved to the Microsoft.WindowsAzure.Storage.Queue.Protocol namespace. QueueRequest and QueueResponse have been renamed to QueueHttpWebRequestFactory and QueueHttpResponseParsers respectively.
    • Signing Methods have been removed from QueueHttpWebRequestFactory, alternatively use the SharedKeyAuthenticationHandler in the Microsoft.WindowsAzure.Storage.Auth.Protocol namespace

Migration Guide

In addition to the detailed steps above, below is a simple migration guide to help clients begin migrating existing applications.

Namespaces

A legacy application will need to update its using directives to include the following (a sketch appears after this list):

  • Microsoft.WindowsAzure.Storage
  • If using credentials types directly add a using statement to Microsoft.WindowsAzure.Storage.Auth
  • If you are using a non-default RetryPolicy add a using statement to Microsoft.WindowsAzure.Storage.RetryPolicies
  • For each Storage abstraction add the relevant using statement Microsoft.WindowsAzure.Storage.[Blob|Table|Queue]
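
A typical set of using directives for a blob application, following the list above (include only the namespaces your code actually uses):

using Microsoft.WindowsAzure.Storage;                // CloudStorageAccount, StorageException
using Microsoft.WindowsAzure.Storage.Auth;           // StorageCredentials, if used directly
using Microsoft.WindowsAzure.Storage.RetryPolicies;  // only for non-default RetryPolicies
using Microsoft.WindowsAzure.Storage.Blob;           // CloudBlobClient, CloudBlockBlob, ...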

Blobs

  • Any code that accesses a blob via CloudBlob will have to be updated to use the concrete types CloudPageBlob and CloudBlockBlob. The listing methods will return the correct object type; alternatively, you may determine the type via FetchAttributes(). To get a reference to the concrete blob class when the client does not know the type, see GetBlobReferenceFromServer on the CloudBlobClient and CloudBlobContainer objects
  • Be sure to set the desired parallelism via CloudBlobClient.ParallelOperationThreadCount (see the sketch after this list)
  • Any code that may rely on the internal MD5 semantics detailed here should be updated to set the correct MD5 flags via BlobRequestOptions
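
A minimal sketch covering the three points above (assuming an existing CloudStorageAccount named account and a container that already holds a blob named "myblob"; note that GetBlobReferenceFromServer makes a service call to determine the blob type):

CloudBlobClient blobClient = account.CreateCloudBlobClient();

// Parallelism now defaults to 1; opt in explicitly if concurrent block uploads are desired.
blobClient.ParallelOperationThreadCount = 1;

CloudBlobContainer container = blobClient.GetContainerReference("mycontainer");

// CloudBlob is gone; resolve the concrete type when it is not known up front.
ICloudBlob blob = container.GetBlobReferenceFromServer("myblob");
CloudBlockBlob blockBlob = blob as CloudBlockBlob;

// Make the MD5 behavior explicit rather than relying on the 1.x defaults.
BlobRequestOptions options = new BlobRequestOptions
{
    StoreBlobContentMD5 = true,
    UseTransactionalMD5 = false,
    DisableContentMD5Validation = false
};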

Tables

  • If you are migrating an existing Table application you can choose to re-implement it via the new simplified Table Service implementation, otherwise add a using to the Microsoft.WindowsAzure.Storage.Table.DataServices namespace

DataServiceContext (the base implementation of the TableServiceContext) is not thread safe; consequently, it has been considered best practice to avoid concurrent requests against a single context, though this was not explicitly prevented. The 2.0 release will now protect against simultaneous operations on a given context. Any code that may rely on concurrent requests on the same TableServiceContext should be updated to execute serially, or to utilize multiple contexts.

Summary

This blog post serves as a guide to the changes introduced by the 2.0 release of the Windows Azure Storage Client libraries.

We very much appreciate all the feedback we have gotten from customers and through the forums; please keep it coming. Feel free to leave comments below.

Joe Giardino
Serdar Ozler
Veena Udayabhanu
Justin Yu

Windows Azure Storage

Resources

Get the Windows Azure SDK for .Net

Windows Azure Storage Emulator 1.8

$
0
0

In our continuous endeavor to enrich the development experience, we are extremely pleased to announce the new Storage Emulator, which has much improved parity with the Windows Azure Storage cloud service.

What is Storage Emulator?

The Storage Emulator emulates the Windows Azure Storage Blob, Table, and Queue cloud services on the local machine, which helps developers get started and do basic testing of their storage applications locally without incurring the cost associated with the cloud service. This version of the Windows Azure Storage Emulator supports Blobs, Tables, and Queues up to REST version 2012-02-12.

How does it work?

[Figure: Storage Emulator architecture]

The Storage Emulator exposes different HTTP endpoints (port 10000 for the blob, 10001 for the queue, and 10002 for the table service) on localhost to receive and serve storage requests. Upon receiving a request, the emulator validates the request for correctness, authenticates it, authorizes it (if necessary), works with the data in SQL tables and the file system, and finally sends a response to the client.
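
For example, here is a minimal sketch (assuming the Storage Client Library 2.0 that ships with the Windows Azure SDK; the container name is illustrative) that points a client at these local endpoints via the built-in development account:

// using Microsoft.WindowsAzure.Storage;
// using Microsoft.WindowsAzure.Storage.Blob;

// The development account routes requests to the local emulator endpoints, e.g.
// http://127.0.0.1:10000/devstoreaccount1/<container>/<blob> for the blob service.
CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount;
CloudBlobClient blobClient = account.CreateCloudBlobClient();

CloudBlobContainer container = blobClient.GetContainerReference("samplecontainer");
container.CreateIfNotExists();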

Delving deeper into the internals, the Storage Emulator efficiently stores the data associated with queues and tables in SQL tables. However, for blobs, it stores the metadata in SQL tables and the actual data on the local disk, one file for each blob, for better performance. When deleting blobs, the Storage Emulator does not synchronously clean up unreferenced blob data on disk while performing blob operations. Instead, it compacts and garbage collects such data in the background for better scalability and concurrency.

Storage Emulator Dependencies:

  • SQL Express or LocalDB
  • .NET 4.0 or later with SQL Express or .NET 4.0.2 or later with LocalDB

Installing Storage Emulator

The Storage Emulator can work with LocalDB, SQL Express, or even a full-blown SQL Server as its SQL store.

The following steps will help you get started with the emulator using LocalDB.

  1. Install .NET framework 4.5 from here.
  2. Install X64 or X86 LocalDB from here.
  3. Install the Windows Azure Emulator from here.

Alternatively, if you have Storage Emulator 1.7 installed, you can do an in-place update to the existing emulator. Please note that Storage Emulator 1.8 uses a new SQL schema, so a DB reset is required for an in-place update, which will result in the loss of your existing data.

The following steps will help in performing an in-place update.

  1. Shutdown the storage emulator, if running
  2. Replace the binaries ‘Microsoft.WindowsAzure.DevelopmentStorage.Services.dll’, ‘Microsoft.WindowsAzure.DevelopmentStorage.Store.dll’ and ‘Microsoft.WindowsAzure.DevelopmentStorage.Storev4.0.2.dll’, located at storage emulator installation path (Default path is "%systemdrive%\Program Files\Microsoft SDKs\Windows Azure\Emulator\devstore") with those available here.
  3. Open up the command prompt in admin mode and run ‘dsinit /forceCreate’ to recreate the DB. You can find the ‘dsinit’ tool at the storage emulator installation path.
  4. Start the storage emulator

What’s new in 1.8?

Storage Emulator 1.8 supports REST version 2012-02-12, along with earlier versions. Below are the service-specific enhancements.

Blob Service Enhancements:

In 2012-02-12 REST version, Windows Azure Storage cloud service introduced support for container leases, improved blob leases and asynchronous copy blob across different storage accounts. Also, there were enhancements for blob shared access signatures and blob leases in the 2012-02-12 version. All those new features are supported in Storage Emulator 1.8.

Since the emulator has just one built-in account, one can initiate a cross-account copy blob by providing a valid cloud-based URL. The emulator serves such cross-account copy blob requests asynchronously by downloading the blob data in chunks of 4MB and updating the copy status.

To know more about the new features in general, the following links would be helpful:

Storage Emulator 1.8 also garbage collects the unreferenced page blob files that may be produced as a result of delete blob requests, failed copy blob requests, etc.

Queue Service Enhancements:

In 2012-02-12 REST version, Windows Azure Storage cloud service introduced support for Queue shared access signatures (SAS). Storage Emulator 1.8 supports Queue SAS.

Table Service Enhancements:

In 2012-02-12 REST version, Windows Azure Storage cloud service introduced support for table shared access signatures (SAS). Storage Emulator 1.8 supports Table SAS.

In order to achieve full parity with the Windows Azure Storage table service APIs, the table service in the emulator has been completely rewritten from scratch to support truly schema-less tables and to expose data for querying and updating using the OData protocol. As a result, Storage Emulator 1.8 fully supports the following table operations, which were not supported in Emulator 1.7:

  • Query Projection: You can read more about it here.
  • Upsert operations: You can read more about it here.

Known Issues/Limitations

  • The storage emulator supports only a single fixed account and a well-known authentication key. They are: Account name: devstoreaccount1, Account key: Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==
  • The URI scheme supported by the storage emulator differs from the URI scheme supported by the cloud storage services. The development URI scheme specifies the account name as part of the hierarchical path of the URI, rather than as part of the domain name. This difference is due to the fact that domain name resolution is available in the cloud but not on the local computer. For more information about URI differences in the development and production environments, see “Using storage service URIs” section in Overview of Running a Windows Azure Application with the Storage Emulator.
  • The storage emulator does not support Set Blob Service Properties or SetServiceProperties for blob, queue and table services.
  • Date properties in the Table service in the storage emulator support only the range supported by SQL Server 2005 (For example, they are required to be later than January 1, 1753). All dates before January 1, 1753 are changed to this value. The precision of dates is limited to the precision of SQL Server 2005, meaning that dates are precise to 1/300th of a second.
  • The storage emulator supports partition key and row key property values of less than 900 bytes. The total size of the account name, table name, and key property names together cannot exceed 900 bytes.
  • The storage emulator does not validate that the size of a batch in an entity group transaction is less than 4 MB. Batches are limited to 4 MB in Windows Azure, so you must ensure that a batch does not exceed this size before transitioning to the Windows Azure storage services.
  • Avoid using ‘PartitionKey’ or ‘RowKey’ that contains ‘%’ character due to the double decoding bug
  • Get Messages from a queue might not return messages in strictly increasing order of the messages’ ‘Insertion TimeStamp’ + ‘visibilitytimeout’

Summary

Storage Emulator 1.8 has a great degree of parity with the Windows Azure Storage cloud service in terms of API support and usability, and we will continue to improve it. We hope you like it; please share your feedback with us to make it better.

Nagarjun Guraja

Windows Azure Storage

Known Issues for Windows Azure Storage Client Library 2.0 for .NET and Windows Runtime

$
0
0

 

An updated Known Issues blog post can be found by clicking on this link

The client issues detailed in this blog have been resolved in version 2.0.4 or earlier, and you can obtain the latest NuGet Package here.

We recently released the 2.0 version of the Windows Azure Storage Client Library. This is our largest update to our .NET library to date which includes new features, broader platform compatibility, and revisions to address the great feedback you’ve given us over time. For more about this release see here. For information regarding breaking changes see here.

This SDK 2.0 release contains a few known issues that will be addressed in the next release of the libraries and are detailed below.

Known Issues

Service Client Retry Policy does not support null

The Cloud[Blob|Queue|Table]Client.RetryPolicy does not support null; if you wish to disable retries, simply use RetryPolicies.NoRetry (client.RetryPolicy = new NoRetry()).

CloudStorageAccount.Parse cannot parse DevelopmentStorageAccount strings if a proxy is not specified.

CloudStorageAccount.Parse() and TryParse() do not support DevelopmentStorageAccount strings if a proxy is not specified. CloudStorageAccount.DevelopmentStorageAccount.ToString() will serialize to the string “UseDevelopmentStorage=true”, which illustrates this particular issue. Passing this string into CloudStorageAccount.Parse() or TryParse() will throw a KeyNotFoundException.

The example below illustrates this issue:

// Will serialize to "UseDevelopmentStorage=true"
CloudStorageAccount myAccount = CloudStorageAccount.DevelopmentStorageAccount;
CloudStorageAccount.Parse(myAccount.ToString());  // Will Throw KeyNotFoundException

To work around this issue you may specify a proxy Uri as below:

// Will serialize to "UseDevelopmentStorage=true;DevelopmentStorageProxyUri=http://ipv4.fiddler"
CloudStorageAccount myAccount = CloudStorageAccount.GetDevelopmentStorageAccount(new Uri("http://ipv4.fiddler"));
CloudStorageAccount.Parse(myAccount.ToString());

 

Summary

We continue to work hard on delivering a first class development experience for the .Net community to work with Windows Azure Storage. We will address these issues in upcoming releases of the SDK and on GitHub.

Joe Giardino
Windows Azure Storage

Resources

Get the Windows Azure SDK for .Net