— Updated on 15 July 2020 with links to the latest documentation
With MiFID II becoming business-as-usual, now may be a good time to discuss the industry noise on FIRDS data quality, and the perceived lack of a robust industry-wide approach to address the challenges.
I should start by stating that I am much more optimistic about the data quality issue than some other commentators, both in terms of the scale of the challenges being faced and in terms of the effort required to fix them. Hopefully, by the end of this article, you will have all the information you need to make up your own mind on this matter.
Let’s start with the CFI
In this article, I have focused on a single attribute, in order to provide a sufficiently detailed analysis of the issue without requiring readers to wade through a book-sized article.
I have selected the CFI code for the analysis, given its criticality to MiFID II. As a quick reminder, transparency reporting, transaction reporting and reference data reporting are all heavily dependent on the CFI code:
- For transparency reports (RTS2), see para 9.11 on page 79 of the FIRDS transparency reporting instructions as an example.
- For transaction reports (RTS22), see para 19 on page 18 of the ESMA reporting instructions as an example.
- For instrument reference data reports (RTS23), see the FIRDS validation spreadsheet as an example.
So how good (or bad) is the quality of the CFI code submissions to FIRDS? And how easy is it to improve the quality? The short answer is that the quality is surprisingly high already, and the issues that do exist are surprisingly easy to fix!
For the longer answer, read on…
CFI Data Quality – Overview
Let’s start by reviewing the quality of the current submissions.
I am aware that much of the evidence driving the discussion to date has been anecdotal, which has occasionally added to industry confusion. Therefore, in this article, I will present an objective analysis of the FIRDS reference data that is rigorous, detailed and systematic.
As usual, I’ll get my disclaimers out of the way first: The scope is OTC derivatives and the analysis has been performed rapidly (our current focus is on technical stability, which I have blogged on separately). If you find anything wrong in this article, then do get in touch, and I will be delighted to update the findings.
OK, so with the formalities out of the way, let’s look at some overview statistics first, before we delve into the details. The analysis below is from FIRDS data as of 10th March:
- A total of 68 MICs submitted OTC derivatives reference data reports, consisting of 1,716,367 ISINs.
- 1,548,171 submissions contained correct CFI codes.
- The remaining 168,196 incorrect CFI codes represent 9.8% of the total submissions.
In other words, 90.2% of all OTC derivative CFI codes in FIRDS are correct.
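The headline figures hang together arithmetically. As a quick sanity check, the short Python snippet below reproduces the error count and both percentages using only the two counts quoted in the list above.

```python
# Headline FIRDS OTC derivative figures as of 10 March, copied from the list above.
TOTAL_ISINS = 1_716_367
CORRECT_CFIS = 1_548_171

incorrect = TOTAL_ISINS - CORRECT_CFIS        # submissions with a wrong CFI code
error_rate = incorrect / TOTAL_ISINS          # share of submissions in error
accuracy_rate = CORRECT_CFIS / TOTAL_ISINS    # share of submissions that are correct

print(f"incorrect: {incorrect:,}")            # incorrect: 168,196
print(f"error rate: {error_rate:.1%}")        # error rate: 9.8%
print(f"accuracy rate: {accuracy_rate:.1%}")  # accuracy rate: 90.2%
```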
CFI Data Quality – the Detail
Slicing the data by asset class highlights the interesting fact that only the FX asset class has a material problem:
Asset Class | Total Errors | % of All Submissions |
Rates | 0 | 0.0% |
Commodities | 1 | 0.0% |
Credit | 31 | 0.0% |
Equity | 28 | 0.0% |
Foreign_Exchange | 168,136 | 9.8% |
Total | 168,196 | 9.8% |
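Note that the percentage column in this table divides each asset class's error count by all 1,716,367 submissions, not by that class's own submissions (168,136 / 1,716,367 ≈ 9.8%), which is why the FX row and the total row show the same figure. A minimal sketch of this slicing is below. The `Submission` record layout and the `error_table` function are hypothetical, and the sketch assumes each record carries both the submitted CFI and the CFI assigned to the ISIN by the DSB.

```python
from collections import Counter
from typing import NamedTuple


class Submission(NamedTuple):
    """Hypothetical, simplified view of one FIRDS OTC derivative record."""
    isin: str
    asset_class: str
    submitted_cfi: str  # CFI code the venue sent to FIRDS
    expected_cfi: str   # CFI code assigned to the ISIN by the DSB (assumed available)


def error_table(subs: list[Submission]) -> dict[str, tuple[int, float]]:
    """Per asset class: (number of CFI errors, errors as a share of ALL submissions)."""
    total = len(subs)
    errors = Counter(s.asset_class for s in subs if s.submitted_cfi != s.expected_cfi)
    return {cls: (n, n / total) for cls, n in errors.items()}


# The three FX example ISINs discussed later in this article, all expected to be HFTAVP.
examples = [
    Submission("EZ11F28QTBX5", "Foreign_Exchange", "HFTDVP", "HFTAVP"),
    Submission("EZ5TB0WDH4P5", "Foreign_Exchange", "HFVAVP", "HFTAVP"),
    Submission("EZ21N1C2Y187", "Foreign_Exchange", "HFVDVP", "HFTAVP"),
]
print(error_table(examples))  # {'Foreign_Exchange': (3, 1.0)}
```

Dividing by the grand total rather than by each class's own count is what lets the individual rows add up to the overall 9.8% error rate.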
So let’s break down the FX asset class into its constituent products to see which FX products are causing the CFI data quality issues:
FX Product | Forward | Option | Swap | Total Errors | % of All Submissions |
Vanilla_Option | | 83,272 | | 83,272 | 4.9% |
NDO | | 64,660 | | 64,660 | 3.8% |
Forward | 10,723 | | | 10,723 | 0.6% |
NDF | 6,933 | | | 6,933 | 0.4% |
Non_Standard | | 2,499 | | 2,499 | 0.1% |
Barrier_Option | | 33 | | 33 | 0.0% |
Digital_Option | | 11 | | 11 | 0.0% |
FX_Swap | | | 5 | 5 | 0.0% |
Total Errors | 17,656 | 150,475 | 5 | 168,136 | 9.8% |
% of All Submissions | 1.0% | 8.8% | 0.0% | 9.8% | |
The table shows that the main issue lies with FX options, with the vanilla option product single-handedly responsible for roughly half (49.5%) of all the CFI code errors submitted to FIRDS.
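The FX table's row and column totals are internally consistent, and the vanilla option share can be checked directly. The snippet below simply re-adds the counts from the table; the Forward/Option/Swap type of each product is inferred from the column totals (for example, Non_Standard must sit in the Option column for that column to sum to 150,475).

```python
from collections import defaultdict

# FX CFI errors per product, copied from the table above.
# The instrument type of each product is inferred from the column totals.
FX_ERRORS = {
    "Vanilla_Option": ("Option", 83_272),
    "NDO": ("Option", 64_660),
    "Forward": ("Forward", 10_723),
    "NDF": ("Forward", 6_933),
    "Non_Standard": ("Option", 2_499),
    "Barrier_Option": ("Option", 33),
    "Digital_Option": ("Option", 11),
    "FX_Swap": ("Swap", 5),
}
ALL_CFI_ERRORS = 168_196  # CFI errors across every asset class

# Column totals by instrument type.
by_type = defaultdict(int)
for kind, count in FX_ERRORS.values():
    by_type[kind] += count

fx_total = sum(by_type.values())
vanilla_share = FX_ERRORS["Vanilla_Option"][1] / ALL_CFI_ERRORS

print(dict(by_type))           # {'Option': 150475, 'Forward': 17656, 'Swap': 5}
print(fx_total)                # 168136
print(f"{vanilla_share:.1%}")  # 49.5%
```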
So let’s continue our dissection by looking in detail at the FX Vanilla_Option product.
Vanilla FX Option Details
The table below breaks down the Vanilla FX option errors by the CFI code that should have been submitted.
Expected CFI (Vanilla_Option) | Total Errors | % of All Submissions |
HFTAVP | 37,423 | 2.2% |
HFTDVP | 37,389 | 2.2% |
HFTAVC | 6,338 | 0.4% |
HFTAVE | 1,061 | 0.1% |
HFTDVE | 1,061 | 0.1% |
Total Errors | 83,272 | 4.9% |
Looking at page 5 of the DSB FX user guide, we can see that the first CFI code above, HFTAVP, represents a vanilla spot FX European call option. This is the code that users should have reported but did not in 37,423 cases.
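To make the letter positions concrete, here is a small decoder for six-letter HF* codes. It is a sketch, not the DSB's own logic: for positions 3 to 5 it only knows the letters discussed in this article (T/V, A/D and V), and the delivery letters in position 6 (P, C, E) are our reading of ISO 10962 that should be confirmed against the DSB FX user guide before relying on them. The function name is hypothetical.

```python
# Partial decoding tables for six-letter HF* (FX option) CFI codes.
# Positions 3-5 use the meanings described in this article; the position-6
# delivery letters are an assumption based on our reading of ISO 10962.
UNDERLYING = {"T": "spot", "V": "volatility"}        # 3rd letter
STYLE = {"A": "European call", "D": "European put"}  # 4th letter
VALUATION = {"V": "vanilla"}                         # 5th letter
DELIVERY = {"P": "physical", "C": "cash", "E": "elect at exercise"}  # 6th letter


def decode_fx_option_cfi(cfi: str) -> dict[str, str]:
    """Decode a six-letter HF* CFI code; unknown letters are reported as '?'."""
    if len(cfi) != 6 or not cfi.startswith("HF"):
        raise ValueError(f"not a six-letter HF* CFI code: {cfi!r}")
    return {
        "underlying": UNDERLYING.get(cfi[2], "?"),
        "style": STYLE.get(cfi[3], "?"),
        "valuation": VALUATION.get(cfi[4], "?"),
        "delivery": DELIVERY.get(cfi[5], "?"),
    }


print(decode_fx_option_cfi("HFTAVP"))
# {'underlying': 'spot', 'style': 'European call', 'valuation': 'vanilla', 'delivery': 'physical'}
```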
So let’s see which CFI codes were actually reported instead of HFTAVP:
Reported CFI (expected HFTAVP) | Total Errors | % of All Submissions |
HFTDVP | 21,689 | 1.3% |
HFVAVP | 9,047 | 0.5% |
HFVDVP | 6,687 | 0.4% |
Total Errors | 37,423 | 2.2% |
The table shows that users submitted three separate CFI codes to FIRDS in error, instead of the correct code of HFTAVP. Let’s cover each one in turn:
- The largest number of errors is a result of users submitting HFTDVP, which page 5 of the DSB FX user guide shows to be the classification for a vanilla spot FX European put option. You will recall that the correct CFI code represents a vanilla spot FX European call option. Hence the error. Let’s validate this by taking an example ISIN, EZ11F28QTBX5, which was submitted by MIC GFBO (GFI Brokers OTF). The instrument short name is correctly stated as ‘NA/O Van Call SGD USD 20330211’, which confirms that the put classification is invalid.
- The next error is users submitting HFVAVP. Page 5 of the DSB FX user guide identifies this CFI code as a volatility option, not a spot FX option. Again, looking at a specific ISIN makes the point quite clear: ISIN EZ5TB0WDH4P5, which was submitted by MICs BGCO, GFBO and GFBM, has the short name ‘NA/O Van Call EUR JPY 20180423’, which shows that this is a DSB Vanilla product. A quick look at page 10 of the DSB FX user guide shows that the 3rd letter of the CFI code for a vanilla FX option MUST be T and can never be V (see the second-to-last row in the table).
- The last error is users submitting HFVDVP. This is simply a combination of both of the above errors at the same time. ISIN EZ21N1C2Y187, which was submitted by MICs GFBO and HSBC, provides our example this time: the instrument short name is ‘NA/O Van Call RUB USD 20180718’, which, as above, implies that the correct CFI code should be HFTAVP.
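The three error patterns above amount to a letter-by-letter comparison of the submitted code against the expected one. A minimal sketch of that comparison is below; the function name and the position labels are illustrative, not part of any DSB or ESMA tooling.

```python
# Illustrative labels for the letter positions (0-based index) of an FX option CFI code.
POSITION_NAMES = {
    2: "underlying (3rd letter)",
    3: "option style (4th letter)",
    4: "valuation (5th letter)",
    5: "delivery (6th letter)",
}


def cfi_differences(expected: str, reported: str) -> list[str]:
    """Name every letter position where the reported CFI departs from the expected one."""
    if len(expected) != len(reported):
        raise ValueError("CFI codes must have the same length")
    return [
        POSITION_NAMES.get(i, f"letter {i + 1}")
        for i, (e, r) in enumerate(zip(expected, reported))
        if e != r
    ]


for reported in ("HFTDVP", "HFVAVP", "HFVDVP"):
    print(reported, cfi_differences("HFTAVP", reported))
# HFTDVP ['option style (4th letter)']
# HFVAVP ['underlying (3rd letter)']
# HFVDVP ['underlying (3rd letter)', 'option style (4th letter)']
```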
What have we learnt?
The key takeaway from this painfully detailed dissection of a single CFI code is that the generation of a correct CFI code can be non-trivial (I have not even delved into nuances such as the normalisation of Put/Call based on the specific currency pair).
On the other hand, it is heartening that fewer than 10% of OTC derivative CFI code submissions to FIRDS have any issues. So what explains this discrepancy?
The answer is very simple, and it is the reason for my optimism that these issues can be tackled and remediated easily.
But first, I want to explain the role of the DSB in CFI code generation: the DSB is the only official source of the CFI code for each OTC-ISIN. This is such an important point that I will copy and paste the relevant ESMA Q&A on the matter (see section 4, page 15):
Question 1 [Last update: 02/02/2017]
What ISINs, CFI codes and FISNs can be used to identify financial instruments?
Answer 1
For the purpose of reporting reference data under the requirements of MiFIR Article 27, ISO 6166 ISINs, ISO 10962 CFI codes and ISO 18774 FISNs issued by the relevant National Numbering Agency (NNA) should be used. For further information please refer to the following link: https://www.anna-web.org/standards/about-identification-standards/.
Since the DSB is the numbering agency that issues ISINs for OTC derivatives, the above paragraph implies that users should not be making up their own CFI codes (and occasionally getting them wrong), but should instead simply take the DSB-generated CFI code and send that through to ESMA.
Such an approach would immediately improve the CFI accuracy rate from the current 90.2% to 100%!
The majority of DSB users already delegate the detailed nuances of CFI generation to the DSB golden source, which explains the overall 90%+ accuracy rate for OTC derivatives. CFI generation is part of our core mission, and we are in regular conversations with regulators and ISO to ensure their accuracy and suitability. What’s more, there is no additional cost for using the DSB CFI code. Indeed, it is freely available within the end of day file that is generated at 1am every morning.
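In practice, "just take the DSB code" is a lookup-and-overwrite step before submission. The sketch below assumes the DSB end-of-day data has already been parsed into an ISIN-to-CFI mapping (the parsing itself is not shown); the function name and return shape are hypothetical. ISINs with no DSB record are kept as submitted and flagged for review rather than silently passed through.

```python
from collections.abc import Mapping


def apply_dsb_cfis(
    submissions: Mapping[str, str], dsb_cfis: Mapping[str, str]
) -> tuple[dict[str, str], list[str], list[str]]:
    """Overwrite locally derived CFI codes with the DSB's CFI code for each ISIN.

    submissions: ISIN -> CFI code the venue derived itself
    dsb_cfis:    ISIN -> CFI code taken from the DSB (assumed already parsed)

    Returns (corrected, changed, missing): the CFI codes to report, the ISINs whose
    code was changed, and the ISINs with no DSB record (left as-is for manual review).
    """
    corrected: dict[str, str] = {}
    changed: list[str] = []
    missing: list[str] = []
    for isin, local_cfi in submissions.items():
        dsb_cfi = dsb_cfis.get(isin)
        if dsb_cfi is None:
            missing.append(isin)
            corrected[isin] = local_cfi
        else:
            if dsb_cfi != local_cfi:
                changed.append(isin)
            corrected[isin] = dsb_cfi  # the DSB value always wins
    return corrected, changed, missing


# The three example ISINs from this article, with HFTAVP as the correct code.
local = {"EZ11F28QTBX5": "HFTDVP", "EZ5TB0WDH4P5": "HFVAVP", "EZ21N1C2Y187": "HFVDVP"}
dsb = {isin: "HFTAVP" for isin in local}
corrected, changed, missing = apply_dsb_cfis(local, dsb)
print(changed)  # ['EZ11F28QTBX5', 'EZ5TB0WDH4P5', 'EZ21N1C2Y187']
print(missing)  # []
```

Returning the list of changed ISINs, rather than overwriting silently, gives a venue a simple way to measure how often its own derivation disagreed with the golden source.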
And finally…
So why are 9.8% of CFI codes still incorrect? To be honest, within the DSB, we are somewhat surprised, not to say bemused, by both the industry noise and the lack of implementation of such an obvious solution by the 28 submitting MICs.
One observation is that many of the CFI errors are on ISIN submissions from the larger Trading Venues (as opposed to Systematic Internalisers or smaller trading venues). As regular readers will know, the DSB has achieved very good engagement with SIs and the smaller trading venues, which may explain the good CFI accuracy from this group of DSB users. We hope over time to establish similar engagement with the large trading venues, which will assist them in driving down their error rates in regulatory reference data submissions to close to zero as well.
So I will conclude by re-iterating the DSB’s core mantra: we stand ready to assist our users. Come and speak to us!
Sassan Danesh, Management Team, DSB.
PS I hope you found this blog useful and informative. This is the first of a series of occasional blogs on the current state of OTC derivatives instrument reference data in industry.
4 April 2018: The content of this blog was updated on 4 April to reflect the fact that all FIRDS reference data for a given ISIN are guaranteed to be the same, even across different MICs. This is stated on page 60 of the FIRDS Reference Data Reporting Instructions:
Even if the system detects an inconsistency, the system will incorporate to the list of instruments published the submitted ISIN, MIC and associated dates of request for admission, first trade, and termination. The rest of the non-free text fields will be aligned to the values set by the Relevant Competent Authority for that instrument, when ESMA distributes/publishes the data.
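The quoted rule can be read as follows: per-venue fields (ISIN, MIC and the listed dates) are kept as submitted, while non-free-text fields such as the CFI code are overwritten with the value set by the Relevant Competent Authority. The sketch below is our simplified model of that behaviour, not ESMA's implementation; the `Record` type, the placeholder ISIN and the venue names are all hypothetical, and the dates are omitted for brevity.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified FIRDS record with only the fields needed here."""
    isin: str
    mic: str  # per-venue field, kept as submitted
    cfi: str  # non-free-text field, aligned to the RCA's value


def align_to_rca(records: list[Record], rca_values: dict[str, str]) -> list[Record]:
    """Replace each record's CFI with the value set by the RCA for that ISIN, if any."""
    return [replace(r, cfi=rca_values.get(r.isin, r.cfi)) for r in records]


# Hypothetical: two venues submitted different CFI codes for the same ISIN.
submitted = [
    Record("EZ_EXAMPLE_ISIN", "VENUE_A", "HFTAVP"),
    Record("EZ_EXAMPLE_ISIN", "VENUE_B", "HFTDVP"),
]
published = align_to_rca(submitted, {"EZ_EXAMPLE_ISIN": "HFTAVP"})
print({r.mic: r.cfi for r in published})  # {'VENUE_A': 'HFTAVP', 'VENUE_B': 'HFTAVP'}
```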