Selvatech.com : in the jungle of financial I.T.

Monday, June 27, 2005

An in-depth examination of Yahoo! Financial Web Services

As I was building a statistical tools on stock quotes, I realized that I didn't find any reliable source or article describing the services provided by Yahoo! Finance.

I am using these services for three years now, and I have to admit that it is one of the best tool for anyone working on stock quotations (all over the world) and... it's free (massive argument for testing) !! There are so many data sources (Telekurs, Comstock...), that it's best to have a look and try to see wich company provides it.

All datas are available on web page (see
http://finance.yahoo.com/q?s=GE for a summary of financial information on stock code GE : General electric for example), some datas are available as Excel spreadsheet download, and, finally, some datas are available as "Web Services". Well, it's more a proprietary service on internet than a Web Services (no web service description file...).

I will go into details about 3 services : quotation service, index components service and historical data service.


Quotation Web Service

As you can see on http://finance.yahoo.com/q?s=GE page, there is a small "Download data" link. This link will download a CSV file using the hyperlink http://finance.yahoo.com/d/quotes.csv?s=GE&f=sl1d1t1c1ohgv&e=.csv .

The data received look like the following :

GE,34.74,"6/27/2005","1:25pm",-0.04,34.72,34.86,34.60,14050200

Let's try to understand how it works :

Obviously the GE parameter in the URL (s=GE) is related to the stock code, in fact on New Yorck Stock Exchange, General electric stocks are symbolized by GE code. You will notice that the quotation place is not showed. You can write comma separated stock quotes to retrieve several quotations at the same time. For example : http://finance.yahoo.com/d/quotes.csv?s=GE,MSFT&f=sl1d1t1c1ohgv&e=.csv .

Once the stock code is references, the f parameter is a long string composed of letters and figures. Strangely, I found no web site describing precisely these parameters. I found their meaning empirically :

aAska2Average Daily Volumea5Ask Size
bBidb2Ask (Real-time)b3Bid (Real-time)
b4Book Value b6Bid SizecChange & Percent Change
c1Quotation Net Changec3Commissionc6Change (Real-time)
c8After Hours Change (Real-time)dDividend/Shared1Last Trade Date
d2Trade DateeEarnings/Sharee1Error Indication (returned for symbol changed / invalid)
e7EPS Estimate Current Yeare8EPS Estimate Next Yeare9EPS Estimate Next Quarter
f6Float SharesgDay's LowhDay's High
j52-week Lowk52-week High g1Holdings Gain Percent
g3Annualized Gaing4Holdings Gaing5Holdings Gain Percent (Real-time)
g6Holdings Gain (Real-time)iMore Infoi5Order Book (Real-time)
j1Market Capitalizationj3Market Cap (Real-time)j4EBITDA
j5Change From 52-week Lowj6Percent Change From 52-week Lowk1Last Trade (Real-time) With Time
k2Change Percent (Real-time)k3Last Trade Sizek4Change From 52-week High
k5Percent Change From 52-week HighlLast Trade (With Time)l1Last Trade (Price Only)
l2High Limitl3Low LimitmDay's Range
m2Day's Range (Real-time)m350-day Moving Averagem4200-day Moving Average
m5Change From 200-day Moving Average m6Percent Change From 200-day Moving Averagem7Change From 50-day Moving Average
m8Percent Change From 50-day Moving AveragenNamen4Notes
oOpen pPrevious Closep1Price Paid
p2Change in Percentp5Price/Salesp6Price/Book
qEx-Dividend DaterP/E Ratior1Dividend Pay Date
r2P/E Ratio (Real-time)r5PEG Ratior6Price/EPS Estimate Current Year
r7Price/EPS Estimate Next YearsSymbols1Shares Owned
s7Short Ratiot1Last Trade Timet6Trade Links
t7Ticker Trendt81 yr Target PricevVolume
v1Holdings Valuev7Holdings Value (Real-time)w52-week Range
w1Day's Value Changew4Day's Value Change (Real-time)xStock Exchange
y Dividend Yield


The only web site giving this information (as far as I searched it) is http://www.gummy-stuff.org/Yahoo-data.htm.

It e querystring parameter (e for extension?) looks like being optional (e=.csv).

We will see later in this article how to retrieve quotation data from a VB.Net Application (ASP.Net or Windows Application). As said before, you can't declare the quotation place when you call the web service. In fact, the stock code contains this information. For General electric (GE), this is not obvious, but if you take a stock on the french market, L'Oreal for example, its code will be OR.PA where PA means Paris. Some companies can be quoted on several stock exchange, and each quotation is different. For example, let's take Microsoft shares : on XETRA Stock Exchange (Germany), its code is MSF.DE, the quotation is in Euro, and on NASDAQ, the code is MSFT ans the quotation in Dollars. Of course, to make things more difficult, codes are not standardized ! So you should pay the highest attention in choosing the Stock Symbol to use in the quotation web service. Unfortunately, Yahoo doesn't provide any web service for symbol lookup.

Indices components Web Service

This is probably the simplest web service provided, but it can be very helpful. It's goal is to provide a list of all stocks composing an indice : Dow Jones Industrial, Nasdaq 100, CAC40... These indices quotations are calculated as an average of its components quotation. This web service maintains the list of components (a huge work in itself) and their quotation.

Its call is very similar to the regular quotation service, as if you were adding all stock symbols in the querystring. This is web page of the service : http://finance.yahoo.com/q/cp?s=%5ENDX for Nasdaq 100 components which symbol is ^NDX. To download the information, we call http://finance.yahoo.com/d/quotes.csv?s=@%5ENDX&f=sl1d1t1c1ohgv&e=.csv where we recognize the parameters of the quotation web service. The result is a csv stream that can be considered as a web service and looks like :

AAPL,37.31,"6/28/2005","4:00pm",+0.21,37.49,37.59,37.17,12555196
ADBE,28.70,"6/28/2005","4:00pm",-0.14,29.05,29.05,28.25,7916337
ADSK,34.81,"6/28/2005","4:00pm",+0.86,34.00,34.92,33.49,2695397
ALTR,19.52,"6/28/2005","4:00pm",-0.19,20.00,20.00,19.41,9181970
AMAT,16.31,"6/28/2005","4:00pm",+0.08,16.36,16.42,16.24,16429051


However, this web service has a strange behavior depending on the URL called. For American indices, you have to deal with pagination : you can only get 50 stocks in your file. So, to download all data for Nasdaq 100, you have to call the service 6twice (there are 100 stocks composing the indice !). This is highly inefficient but understandable as a provider point of view. But what strikes me, is that it is not the same policy on other countries stock markets ! Considering French Yahoo, if you download the information from http://fr.old.finance.yahoo.com/d/quotes.csv?s=@^NDX&f=snl1d1t1c1ohgv&e=.csv , you will get the whole information with a single call !!!

Historical data Web Service

A list of historical quotations (and dividend payment) is available through a web service. This web service takes as parameters 2 dates , a stock symbol and a frequency (daily, weekly, monthlyor dividend only). In the case of weekly and monthly frequency, an average on each period is done.

At this URl , http://finance.yahoo.com/q/hp?s=GE&a=00&b=1&c=2004&d=05&e=28&f=2005&g=d, a list of daily quotations for General Electric is given from january 1st 2004 to june 28th 2005. The parameters are almost strightforward :

a : first date month (0-based array : january =0, february = 1,... december =11)
b : first date day
c : fist date year
d : last date month (0-based array : january =0, february = 1,... december =11)
e : last date day
f : last date year
g : frequency (d = daily, w = weekly, m = monthly, v = dividends only)

The URL to call as a web service is http://ichart.finance.yahoo.com/table.csv?s=GE&a=00&b=1&c=2004&d=05&e=28&f=2005&g=d&ignore=.csv

No pagination problem here, the result is a table with the first row describing the columns :

Date,Open,High,Low,Close,Volume,Adj. Close*
27-Jun-05,34.72,34.86,34.59,34.61,23402900,34.61
24-Jun-05,34.75,35.98,34.15,34.78,67540496,34.78
23-Jun-05,35.50,35.63,34.52,34.66,48926900,34.66
22-Jun-05,36.20,36.34,35.68,35.72,29200400,35.50

Brightly, Yahoo guys provides us with the Adjusted Close which takes into consideration dividends payment and stock splits.

Tuesday, May 31, 2005

Passing ID with querystring made secure

Sometimes you would love to pass some parameters as querystring to your web page : for example, you want to send an email with a link to a specific page with parameters, or you want to make sure that your page can be added to the Favorites.
But security concerns will refrain from using this simple technic...


Do you really want to reveal your secrets ?

The main reason to avoid passing parameters as querystring is that your web site users will be able to guess other parameters value based on what they see. The worst example would be to make a database user id visible in the URL. Changing the URL, and setting a new id would probably allow to impersonate another user... Of course there are many ways to check the user id in the destination web page, but wouldn't it be nice to refrain the "I-have-always-dreamed-of-being-a-hacker" temptation ?
Don't feel either to confident in passing parameters as POST : if you really want to sniff the communication with the server, you will find the information as clear text.

The only solution : Encryption !

Using encryption, it will be almost impossible for the user to "understand" the crypted text passed as parameters, but you (and only you) will still be able to decrypt it and retrieve the correct parameter. We would like to translate http://www.selvatech.com/user_id=194 into http://www.selvatech.com/user_id=AZRZERVBFVF945FFERF566T3 (anyone will admit that guessing another user id based on AZRZERVBFVF945FFERF566T3 is harder than on 194!). Isn't it what we want ?

Let's take a second to talk about cryptology, and more precisely .Net cryptology.

All native .Net cryptology classes are in the System.Security.Cryptography namespace.
Many articles deal will the security aspect in .Net, so I will make it short and give you the link to the best articles at the end of this post.

There are various available methods in the .Net Framework to encrypt piece of plain text, the most famous are Rijndael (RijndaelManaged) , DES (DESCryptoServiceProvider), Triple Des (TripleDESCryptoServiceProvider), RC2 (RC2CryptoServiceProvider). All this classes have some caracteristics in common :

- They use a key with a determined length (expressed in bits) to encrypt and decrypt a piece of text,
- They can use an Initial Vector (IV) , different Padding modes, Cipher modes and Block Size,
- The Array of Bytes is the best type to use within the encryption classes.

In this post, we will only deal with symmetric encryption which uses a secret key value to encrypt and decrypt the data. Both the sender and receiver need the same key in order to encrypt or decrypt. In our case the program which sends the email containing the link, and the destination web page must access the secret key, and use exactly the same parameters for encryption and decryption.

After reading Using encryption in .Net by Steve Johnson, I think it's good to rewrite the principles exposed in his conclusion :

  • Always use proven, public technology
  • Use Rijndael, if possible
  • Use 256-bit keys and blocks
  • Take care how you derive keys
  • Use PKCS7 padding in .NET 1.1, ISO10126 in .NET 2.0
  • Don't use ECB mode unless you have a good reason and know what you're doing
  • Always use a unique IV for each message
  • Take the time to be explicit in your code for important details, even when you're using defaults

In the rest of this post, I will try to put in practice Steve principles (except the "Always use a unique IV for each message" which I think is not adapted to symmetric encryption because both the sender and the receiver must know the Initial Vector).

Now let's see some VB.Net coding in action !

  • The first problem we have to solve is : how do I generate a 256-bits key ?

This is a very good question ! 256 bits means 32 bytes (1 byte = 8 bits) to provide to the encryption algorithm. As the key is all you need to know to "understand" your secrets (well you need to know the encryption method, the IV used, the padding too ...), it's a good practice to think twice before choosing.

We could try to use a "simple password" like one of my favorite : iloveselvatech and generate a 256-bits key from it. How can we do this ?

So simply : we only need to use some hashing methods included in the .Net Framework. Hashing is the transformation of a string of characters into a fixed-length value or key that represents the original string. In other words a hash is like a fingertip : it identifies uniquely the string from which it has been generated. Different hashing algorithms exist in .Net : the most famous are MD5 and SHA. MD5 produces 128-bits hash, whereas SHA can produce 160, 256, 384 and 512-bits hash. The good point in using hashing with encryption in .Net is that both use array of bytes as native type.

In our example, we can use a SHA256 hash from our iloveselvatech password. This is our simple hash-SHA256 class (the whole thing is in the ComputeHash method) :

Public Class hash_SHA256

Private _dhashSHA256 As System.Security.Cryptography.SHA256Managed
Public Sub New()
_dhashSHA256 = New System.Security.Cryptography.SHA256Managed
End Sub

Public Function ComputeHash(ByVal data As String) As Byte()
Return _dhashSHA256.ComputeHash(string2array(data))
End Function

Public Function string2array(ByVal data As String) As Byte()
Return System.Text.Encoding.Default.GetBytes(data)
End Function

Public Function array2string(ByVal data As Byte()) As string
Return System.Text.Encoding.Default.GetString(data)
End Function

End Class

Then, we will generate our 256-bits length message :

Private Function Generate256bitskey (byval str as string) as byte()
dim hashclass As hash_SHA256
hashclass = New hash_SHA256
return hashclass.ComputeHash(str)

End Function

In our case, a SHA256 hash of iloveselvatech will be : {246 , 70 , 154 , 164 , 75 , 50 , 110 , 43 , 92 , 82 , 207 , 70 , 192 , 125 , 121 , 100 , 102 , 222 , 109 , 9 , 192 , 236 , 14 , 131 , 139 , 146 , 110 , 243 , 65 , 126 , 149 , 192} as an array of bytes.

But, even if you can find this key difficult to read, it's easy to break if you guess that you are using a SHA256 algorithm. Using a brute force strategy, it is relatively easy (but time consuming) to find the original 14 letters code which produce this unreadable key.

Another technic, and better approach, is to generate your own 256-bits key. This is the same approach I will use for generating the Initial Vector later. Here is my code :

Private Function Generate256bitsKey() As Byte()
Dim table() As Byte
Dim str As String
dim i as integer
str="WWW.SELVATECH.COM"
For i=1 To 100
str = str & Chr(3 * i Mod 255)
Next
table=System.Text.Encoding.Default.GetBytes(str)
Dim tabiv(32) As Byte
Array.Copy(table, tabiv, 31)
return tabiv
End Function

In this case, I generate a long string with a determined, but difficult to find algorithm, that I convert into a 32-bytes long array of byte (remember that 32-bytes means 256 bits...)

  • How do I build my encrypting class ?

We choosed to use a Rijndael 256-bits key and blocks algorithm, using PKCS7 padding and not ECB cipher mode. I won't go into details with these specifications, that's what MSDN is here for ! Happily, .Net classes are smart enough, to instanciate easily our object :

Public Class crypto_Rijndael

Private _rijndael As System.Security.Cryptography.RijndaelManaged

Public Sub New()
_rijndael = New System.Security.Cryptography.RijndaelManaged
_rijndael.Mode = System.Security.Cryptography.CipherMode.CBC
_rijndael.Padding= System.Security.Cryptography.PaddingMode.PKCS7
_rijndael.BlockSize=256
_rijndael.KeySize=256
_rijndael.IV=GenerateIV()
_rijndael.Key=GenerateCryptoKey()
End Sub

End Class

Where GenerateIV() and GenerateCryptoKey() produce 32-bytes-length array as seens above.

Then we need 2 methods : Encrypt and Decrypt.

Public Function Encrypt(ByVal data As String) As Byte()
Dim rijEncrypt As System.Security.Cryptography.ICryptoTransform = _rijndael.CreateEncryptor()
Dim buff() As Byte = string2array(data)
Return rijEncrypt.TransformFinalBlock(buff, 0, buff.Length)
End Function

Public Function Decrypt(ByVal sEncryptedData As String) As Byte()
Dim rijEncrypt As System.Security.Cryptography.ICryptoTransform = _rijndael.CreateDecryptor()
Dim buff() As Byte = Convert.FromBase64String(sEncryptedData)
Return rijEncrypt.TransformFinalBlock(buff, 0, buff.Length)
End Function

As you can see, these methods are very simple, however, there is an important point we have to speak about : encoding. This point will make clear the use of Convert.FromBase64String in the Decrypt method.

  • What is the problem with encoding ?
We want to cipher a string into another string that can be used in a querystring. This point should help us to focus on Base64 and HEX formats.

An HEX-formated text will look like : 414F664141644D383962336874425933714 whereas Base64-formated text will look like : AOfAAdM89b3htBY3qAehu5omm0QvLXLrr0YHo/tUAfo=.
Base64 encoding format is about 33% more efficient than HEX encoding, however, passing Base64-encoded information withtin the URL can generate problem because of URLencoding mechanism, for example, + in the URL is re-encoded as ' '. This problem doesn't exist with HEX-encoding.

We will first transform the array of bytes which is returned by the Encrypt method as a Base64-string, and we will translate this string into a HEX-formated string before writing it to the querystring (there is no simple way to convert an array of byte into a HEX-string).

Our method for writing the querysting will be :

url = string_TO_hexa(Convert.ToBase64String(crypto_Rijndael.Encrypt(encryptedtext)))

with

Private Function string_TO_hexa(ByVal data As String) As String
Dim i As Integer
string_TO_hexa = ""
For i = 1 To Len(data)
string_TO_hexa = string_TO_hexa & Strings.Right("0" + Hex(Asc(Mid(data, i, 1))), 2)
Next
End Function


Then, at the destination page, we get the querystring parameter HEX-encoded.

secrettext = array2string(crypto_Rijndael.Decrypt(hexa_TO_string(querystring)))

with

Public Function hexa_TO_string(ByVal data As String) As String
Dim i As Integer
For i = 1 To Len(data) Step 2
hexa_TO_string = hexa_TO_string & Chr(CType("&H" & Mid(data, i, 2), Integer))
Next
End Function

With the function above, we translate the HEX-formated text back into a Base64-string. And during the Decrypt process, we use :

Dim buff() As Byte = Convert.FromBase64String(sEncryptedData)

which translates a Base64-string into an array of Bytes.

Conclusion

First of all, I spent a long time fighting with Cipher.Mode when I built my instances of Rijndael, the reason is if you don't pick up ECB mode (less secure, cf articles), don't forget to initialise the IV property of the
RijndaelManaged class (otherwise, it will generate each time you instanciate the class a random new IV, and in this case, you won't be able to decrypt your text).

The .Net classes related to Cryptography (at least for the symmetric encryption) are very user friendly so there's no point of not using them !

Some interesting articles about cryptology and .Net:
An Overview of Cryptography : long article about cryptology in general
Cryptography in .NET : the use of private and public keys in .Net
Using Symmetric Cryptography in an ASP.NET Web Page : using cryptology in an ASP.Net web page
Cryptography Simplified in Microsoft .NET : Microsoft article about basic cryptology (hashing, keys...) with .Net
Keeping Secrets: A Guide to VB .NET Cryptography : A guide for private key generation using vectors
Encrypting QueryStrings with .NET : Exactly what we want to do, but if the article is good, I've found some formating problems in the code
Performance Comparison: Security Design Choices : Compare hashing and encrypting techniques performances
Using Encryption in .NET : Excellent article on .Net encryption (keys, encryption algorithm, detailled study of Rijndael Provider)