BING PHONE BOOK SERVICE
Ashish Shah, SunitaS, VinaSura, BhJoshi
Abstract
An overview of the Bing Phone Book Service. The first version is meant to support the Skype Dialer scenarios.
1 Contents
2 Goals
3 Caller ID Functionality and Graph Inference
4 Spam Inference for Blocked Callers
5 High Level Workflows
6 Architecture Overview
7 High Level Design Overview
7.1 Client Service Workflows
7.2 Inferencing Engine Workflow
8 Performance and Scale Targets for V1 Release
8.1 Lookup Performance Goals
8.2 Server Call Load
8.3 Address book Sync Load
8.4 Storage Capacity
9 Bing Phone Book Data Storage Model
9.1 Target Storage Capacity Requirements
9.2 Bing Phone Book Service Storage Design
9.3 Bing Phone Book Service Data Storage
10 Geo Scale-Out Model
10.1 Centralized
10.2 Federated Model
11 Security
11.1 Overview
11.2 Data Privacy
11.3 Authenticating the Client Application and User (RPS)
11.4 Authenticating the Client Application and User (OAUTH)
11.5 Transport Security
12 Backup/Disaster Recovery Requirements
13 Bing Phone Book Client Design Details
13.1 Overview - Periodic Uploads and Downloads
13.2 Caching
13.3 Name Lookup
13.4 Blocked Phone Number handling
13.5 Special Scenario Handling
13.5.1 Low Battery Conditions
13.5.2 Shutdown
13.5.3 Lost Phone Scenario
13.5.4 Dual Sim Scenario
13.6 Client Side Performance Metrics
13.6.1 Metrics
13.6.2 Client Side Metrics Framework
13.7 Other Client Side Telemetry
14 Bing Phone Book Service Design Details
14.1 Deployment Topology and Sizing
14.2 REST APIs
14.3 Name Inferencing Engine
14.4 Spammer Prediction
14.5 SQL Azure Storage Data Layer Overview
14.5.1 Options
14.5.2 Phone Number Entity Table Schema
14.5.3 PhoneNumberDataTable Schema
14.5.4 PhoneNumberUpdated Table Schema
14.5.5 SQL Maintenance
14.6 Security
14.6.1 Implicit Grant Type Sequence
14.6.2 Using Skype Dialer Authentication – Multiple Resource Server Support
15 Server Side Performance Metrics
16 Monitoring
Bing Phone Book Service for Skype Dialer
2 Goals
The main purpose of this service for the V1 release is to provide the following functionality to the Skype Dialer:
1) Provide caller ID lookup by phone number for incoming and outgoing calls (for people as well as local business phone numbers).
2) Provide a crowd-sourced spam detection and tagging mechanism for incoming callers.
3) Provide round-trippable storage of the incoming-call block list for all registered users, with implicit portability across the user's devices.
Design goals for the service:
1. Be agnostic of the calling application. There is no implicit or explicit dependency in the design on Skype infrastructure, and no coupling with the Skype Dialer. The same service can be leveraged in the future for another dialer endpoint, e.g., the native Windows Phone dialer.
2. Be agnostic of the device ecosystem and client OS; specifically, the service takes no dependency on Google services.
3. Keep the design scalable to a global audience and not specific to India. The design and implementation should scale beyond the India market without any code-level changes. There may be market-specific config files, which can be extended to new markets as needed.
4. Client API components should be designed so that the calling application can enforce the required SLAs for battery and network/data usage.
5. Ensure appropriate security of user private data.
3 Caller ID Functionality and Graph Inference
In order to serve its primary purpose of providing caller ID information for phone numbers, the phone book service relies on three inputs:
1. Information from registered users (as entered when the user first signs up for Skype Dialer).
2. Local business information seeded from the Bing Local index. E.g., we have already seeded ~5M local business numbers for India into the caller ID service. More can be seeded for other markets as we scale the service globally.
3. High-confidence name inference on the phone-number-linked graph crowdsourced from registered users' address books. Because different users may store the same number under different names in their phone books, we cannot implicitly trust the name association uploaded from a single address book. However, a high-confidence inference can be drawn from a group of names uploaded from many address books.
The following picture demonstrates this process. In it, names of registered users and high-confidence inferred names are distinguished by color.
4 Spam Inference for Blocked Callers
The Dialer provides functionality to block incoming calls from specific phone numbers. The phone book service leverages this functionality to create a crowd-sourced inference for spam callers. Every time a user blocks an incoming number, they contribute one spam vote for that number to the inference system. We upload the user's block list to the phone book service at regular intervals.
Aggregate information from user block lists is used to do a simple inference based on the number of users that have blocked a specific number. As the system matures, we can do more sophisticated spam inference based on whether the number is a toll-free number, the trust level of the originating user, known telemarketing prefixes, etc.
Once the spam inference runs, it adds a spam flag and spam vote count to all numbers that were inferred as spam originators. The same information is returned from the caller ID lookup call to the phone book service and used by the Dialer to show spam status for incoming callers.
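A minimal sketch of the vote counting described above, written in Java to match the client library. It assumes each uploaded block list arrives as a list of normalized phone numbers keyed by user, and counts at most one vote per user per number. The class name SpamVoteInference and the threshold value are hypothetical; the spec has not fixed a threshold yet.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/**
 * Sketch of the vote-count spam inference: one vote per user per blocked number.
 * SPAM_THRESHOLD is hypothetical; the spec has not fixed the value yet.
 */
final class SpamVoteInference {
    static final long SPAM_THRESHOLD = 2; // hypothetical: a number is flagged when votes exceed this

    private SpamVoteInference() {}

    /** blockListsByUser: user ID -> that user's uploaded block list of normalized numbers. */
    static Map<String, Long> countVotes(Map<String, List<String>> blockListsByUser) {
        Map<String, Long> votes = new HashMap<>();
        for (List<String> blockList : blockListsByUser.values()) {
            // A user blocking the same number twice still contributes a single vote.
            for (String number : new HashSet<>(blockList)) {
                votes.merge(number, 1L, Long::sum);
            }
        }
        return votes;
    }

    /** Spam flag stored alongside the vote count (IsSpam / SpamCount in the entity table). */
    static boolean isSpam(long voteCount) {
        return voteCount > SPAM_THRESHOLD;
    }
}
```

De-duplicating per user keeps a single user from inflating a number's spam count by re-blocking it.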
The service also has the ability to pre-load a set of known top spammers as a curated, non-inferred list. For India, we have sourced a list of ~500 top spam numbers and seeded it into the phone book service.
5 High Level Workflows
Based on the above functionality, there are four basic app workflows that are relevant to the caller ID service:
- New User Registration
  o The Dialer will register new users with the caller ID service using a normalized phone number as the user ID; no password is required.
  o The user is authenticated with MSA using an MSA token.
- Phone Book Sync
  o The Dialer will sync (upload) the user's local phone book to the caller ID service at regular intervals, including the user's contacts with phone numbers from all sources (local phone book, Outlook.com address book, Facebook contacts, LinkedIn contacts, etc.).
- Caller ID Lookup
  o For all incoming and outgoing calls where the number is not in the user's local phone book, the Dialer will call the service for a caller ID lookup (with the normalized phone number as the key).
  o If an entry for the phone number exists in the cloud directory, the caller ID service will return:
    * Registered name (if available)
    * Inferred name (if available)
    * Spam flag and spam count (if available)
- Phone Number Blocking
  o Every time the user blocks an incoming number, the Dialer will update the caller ID service with that information. Internally, every block action translates into a spam count increment for the blocked number.
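All of these workflows key on a normalized phone number. The following is a simplified sketch of such normalization, assuming an E.164-style "+<country code><number>" key and a per-market default country code (e.g. 91 for India) taken from the market config files mentioned under the design goals. PhoneNumberNormalizer and its rules are hypothetical; a production client would more likely rely on a dedicated phone-number parsing library.

```java
/**
 * Hypothetical, simplified normalizer producing "+<country code><subscriber number>" keys.
 * defaultCountryCode (e.g. "91") is assumed to come from per-market configuration.
 */
final class PhoneNumberNormalizer {
    private PhoneNumberNormalizer() {}

    static String normalize(String raw, String defaultCountryCode) {
        String trimmed = raw.trim();
        boolean hasPlus = trimmed.startsWith("+");
        // Keep digits only: drops spaces, dashes, dots and parentheses.
        String digits = trimmed.replaceAll("[^0-9]", "");
        if (digits.isEmpty()) {
            throw new IllegalArgumentException("no digits in: " + raw);
        }
        if (hasPlus) {
            return "+" + digits;                  // already in international form
        }
        if (digits.startsWith("00")) {
            return "+" + digits.substring(2);     // international access prefix
        }
        if (digits.startsWith("0")) {
            digits = digits.substring(1);         // national trunk prefix
        }
        return "+" + defaultCountryCode + digits; // assume a number in the user's own market
    }
}
```

For example, "098450 12345", "+91 98450-12345" and "0091 9845012345" all normalize to "+919845012345" with default country code "91".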
6 Architecture Overview
For interface simplicity and wire-protocol abstraction, we have provided a client Java library for the phone book service, which under the covers calls RESTful web service APIs hosted on Azure. Currently we have a client library implementation for Android (4.0+). The same design can be ported to Windows Phone, iOS, etc.
The Bing Phone Book Service is an Azure PaaS service hosting a RESTful endpoint in Azure Web Roles. Multiple service instance roles will be used for load balancing and high availability, leveraging Azure's auto-scale feature for load elasticity while maintaining end-to-end performance SLAs. The inferencing engine will be hosted in an Azure Worker Role. The SNR APIs provide a REST endpoint.
The current version of the service uses SQL Azure as its data store (to be swapped out with Bing ObjectStore for higher scalability after the V1 release). SQL Azure provides high availability and reliability by design; however, it has limits on the storage it can support. The SQL store will be partitioned and sized according to V1 scale and performance requirements.
[Figure: Architecture overview. Components: Skype Dialer; Bing PhoneBook Client; REST APIs; Bing Phone Book Service (Web Roles, a Worker Role hosting the Inferencing Engine, and an MFC Test Role); DB (SQL Azure); Bing Service SNR REST APIs.]
In addition, an Azure Storage Account is provisioned to support persistence of logging/diagnostic data. Other scale-out and storage design options are discussed later.
7 High Level Design Overview
The Bing Phone Book Service must support the four workflows described earlier in the spec. Essentially, it needs to:
1) Store the phone book data, both contact information and block list information, from the registered users
2) Host an inferencing engine that does name inferencing for numbers uploaded by the registered users
3) Build the top spammer list
The Bing Phone Book Data Layer abstracts the storage from the service. The PhoneNumberDataTable stores the contact data and block list information that is synced to the cloud. The PhoneNumberEntityTable is the table that the name inferencing engine creates and the one that is used for caller ID lookup. The top spammer list is built and refreshed on the server, where it is also cached.
7.1 Client Service Workflows
The high-level interactions between the client, the Bing Phone Book Service and the Bing Phone Book Data Layer are captured here. They are discussed in detail in the client design section.
[Figure: Client service workflows (sequence diagram). Participants: Skype Dialer App, BingPhoneBookClient, BingPhoneBook Service, Bing Phone Book Service Data Layer. Workflows: New User Registration, Phone Book Sync, Caller Id Lookup (Incoming Call), Blocking, Provide Name For Number, Low Battery Notification. Client and service calls: GetCallerIdServiceInstance(phonenumber, appcontext); RegisterUserAsync(UserName, UserPhoneNumber); RegisterUser(UserName, UserPhone); SyncWithServiceAsync; GetTopSpammers(UserPhoneNumber, N) returning ArrayOf(PhoneNumberEntity); [diff not empty] SyncContacts(UserPhoneNumber, Contacts); SyncBlockedLists(UserPhoneNumber, BlockedList); LookupPhoneBookEntryAsync(PhoneNumber); [not in cache, not in blocked list, not in top spammer list] LookupPhoneNumber(UserPhoneNumber, PhoneNumber) returning [PhoneNumberEntity]; SuggestNameForPhoneNumber; AddtoBlockList; UpdateNetworkStatus; GetPhoneLocation (cached location for phone); Shutdown. Data layer operations: AddToPhNumUpdatedTable, AddToPhNumDataTable, AddToPhNumEntityTable, GetTopSpammersFromEntityTable, MarkAsSpamInPhNumEntityTable.]
7.2 Inferencing Engine Workflow
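[Figure: Inferencing engine workflow (sequence diagram). Participants: Inferencing Engine, Bing Phone Book Data Layer. Calls: ReadPhoneNumbersForNameInferencing; UpdatePhoneEntityNumberTableWithName.]
Open question: is UpdatePhoneEntityNumberTableWithName a point-to-point update?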
8 Performance and Scale Targets for V1 Release
Below is a list of assumptions that will be used to validate the design and drive the scale and performance test metrics. Note that the numbers are purposely kept at the higher end to ensure that the query latency for lookup calls is acceptable under adverse load conditions.
We will assume a scale target of 10M registered users for the V1 release.
8.1 Lookup Performance Goals
The initial end-to-end latency goal for the caller ID lookup is for 90th-percentile lookups to be within 150 ms round-trip time on Wi-Fi and LTE/3G network connections, and within 250 ms round-trip time on 2G network connections. Round-trip times are measured from the time the client library API is invoked to the time the caller ID response is available in the Dialer.
8.2 Server Call Load
We will assume that, on average, a single user will receive or make 20 calls per day. Of these, we will further assume that at most 10 calls will result in a caller ID lookup (i.e., the number is not in the user's local phone book). Lastly, we will assume that most of these calls will be made during the daytime, spanning a ~12-hour period.
The above assumptions dictate an average rate of ~250 QPS (queries per second) for the caller ID lookup operation for every 1M registered users (10 * 1M / (12 * 3600) ≈ 231 QPS, rounded up). For 10M registered users, that translates to a target capacity of ~2500 QPS.
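Written out with explicit grouping, the lookup-rate estimate above is:

```latex
\[
\mathrm{QPS}_{1\mathrm{M}}
  = \frac{10\ \text{lookups/user/day} \times 10^{6}\ \text{users}}{12\ \text{h} \times 3600\ \text{s/h}}
  = \frac{10^{7}}{43\,200\ \text{s}} \approx 231\ \text{QPS} \;\rightarrow\; \sim 250\ \text{QPS (rounded up)}
\]
\[
\mathrm{QPS}_{10\mathrm{M}} = 10 \times \mathrm{QPS}_{1\mathrm{M}} \approx 2315\ \text{QPS} \;\rightarrow\; \sim 2500\ \text{QPS}
\]
```

The exact figure is ~231 QPS per 1M users; ~250 QPS is a rounded-up planning number, in line with keeping the assumptions at the higher end.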
8.3 Address book Sync Load
We will upload/download phone book data at most twice a day. For 10M users, this would imply 20 million sync calls per day. Assuming that all the sync calls are staggered, the sync rate at the service would be at most ~230 Sync TopSpammer and Sync BlockedList calls per second.
However, the client-to-service sync is designed to upload only incremental changes to the user's address book; we expect only 1/4 of the sync calls on the client to actually have data to upload (and hence result in a service call). Assuming that all the sync calls are staggered with equal distribution over a 24-hour period, the above implies an average rate of ~60 QPS for sync calls to the service.
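The sync-rate figures above follow from the same kind of arithmetic, assuming 10M users, two syncs per day spread evenly over 24 hours, and 1 in 4 syncs carrying data:

```latex
\[
\text{sync calls/day} = 10^{7}\ \text{users} \times 2 = 2 \times 10^{7}
\]
\[
\text{sync rate} = \frac{2 \times 10^{7}}{24 \times 3600\ \text{s}} = \frac{2 \times 10^{7}}{86\,400\ \text{s}} \approx 231\ \text{calls/s}
\]
\[
\text{service sync rate} \approx \tfrac{1}{4} \times 231 \approx 58 \;\rightarrow\; \sim 60\ \text{QPS}
\]
```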
8.4 Storage Capacity
We will assume that, on average, users will have 250 uploadable contacts with phone numbers in their device phone book (including contacts synced from various cloud services such as email services, Facebook, etc.). Of these, we will assume that a user will contribute up to 100 unique new contacts to the service. This number will start to trail down as the service and the size of the phone directory grow.
9 Bing Phone Book Data Storage Model
9.1 Target Storage Capacity Requirements
The total predicted size for the PhoneNumberDataTable is around 250 GB (10M users * 250 entries * 100 bytes/entry). The total predicted size for the PhoneNumberEntityTable is calculated based on the assumption that each of the 10M users will contribute at most 100 unique contacts, and is expected to be around 200 GB (10M users * 100 unique entries per user * 200 bytes per entry). Including the other table sizes, we predict the storage capacity requirement to be around 500 GB.
9.2 Bing Phone Book Service Storage Design
The PhoneNumberEntityTable must be able to serve queries at low latency to meet the latency SLAs. The bulk of writes will go into the PhoneNumberDataTable. Note that only numbers for which an inferred name is calculated are stored in or inserted into the PhoneNumberEntityTable. The PhoneNumberEntityTable insertion rate after an inferencing run should reduce over time, but may be detrimental to lookup performance during the initial onboarding of new users.
9.3 Bing Phone Book Service Data Storage
The current V1 implementation is based on SQL Azure. An appropriate number of Standard or Premium instances will be configured to support the latency requirements.
Based on the limited calculations for the V1 release, it is clear that SQL Azure is not the correct long-term storage solution for the Bing Phone Book Service, since the SQL Azure data size limit is 250 GB for Standard and 500 GB for Premium. The Bing Phone Book Service Data Layer abstraction is important for this reason. If size and latency requirements dictate, an Azure Table Storage plugin will be built for V1.
The schema and other storage-related details are discussed in the Bing Phone Book Service Design Details section.
Azure Table Store and ObjectStore are under evaluation for the long-term storage strategy of the Bing Phone Book Service.
10 Geo Scale-Out Model
As the Bing Phone Book Service grows to serve different geographies, the following options can be considered for geo scale-out.
10.1 Centralized
In this model, the inference engine will run in one region, but the lookup data is replicated to different regions for read scale-out and to keep lookup latencies low.
[Figure: Centralized model. Regions A, B and C each host a Bing Phone Book Service with Web Roles and Worker Roles; the Inferencing Engine runs in Region A only. Tables shown: PhoneNumberDataTable, PhoneNumberDataTable_RegionB, PhoneNumberDataTable_RegionB (Replica), PhoneNumberDataTable_RegionC, PhoneNumberEntityTable_Global, and PhoneNumberEntityTable_Global (Replica) in the other regions.]
The phone data is uploaded to the service stamp in the closest region. This data needs to be made available to the stamp that hosts the inferencing engine. Another way to think about this is as two services: a front-end service that performs caller ID lookup and phone data collection, and an (offline) back-end service that hosts the inferencing engine and prepares the global phone number entity table. In the picture shown above, the phone number entity table needs to be replicated from Region A to the other regions.
10.2 Federated Model
In this model, the inferencing engine would run in each region. One option is to use a phone-number-prefix-to-region map (is this feasible to build in the mobile world?), where each phone number is mapped to a region. A phone number upload is forwarded to a stamp based on this map. A further simplification would be to simply drop the numbers that are not owned by the region, since it is unlikely that people will store a significant number of non-local numbers in their contacts.
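A minimal sketch of the prefix-to-region idea, using longest-prefix matching over normalized numbers so that a more specific prefix wins over a shorter one. The class PrefixRegionMap, the prefixes and the region names are hypothetical; whether such a map can be built reliably for mobile numbers is the open question raised above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical longest-prefix router from normalized phone numbers to owning regions. */
final class PrefixRegionMap {
    private final Map<String, String> regionByPrefix = new HashMap<>();

    void put(String prefix, String region) {
        regionByPrefix.put(prefix, region);
    }

    /** Returns the region of the longest prefix that matches the normalized number, if any. */
    Optional<String> regionFor(String normalizedNumber) {
        for (int len = normalizedNumber.length(); len > 0; len--) {
            String region = regionByPrefix.get(normalizedNumber.substring(0, len));
            if (region != null) {
                return Optional.of(region);
            }
        }
        return Optional.empty();
    }

    /** The "drop numbers not owned by this region" simplification: keep only local numbers. */
    boolean isOwnedBy(String normalizedNumber, String localRegion) {
        return regionFor(normalizedNumber).map(localRegion::equals).orElse(false);
    }
}
```

An upload handler in a regional stamp could forward numbers whose region differs from its own, or, under the simplification above, discard them.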
[Figure: Federated model. Regions A, B and C each host a Bing Phone Book Service with Web Roles and a Worker Role running its own Inferencing Engine. Each region holds its own tables: PhoneNumberDataTable_Region A/B/C and PhoneNumberEntityTable_Region A/B/C.]
The federated model seems suitable, since name inferencing should largely run locally and most numbers uploaded from phone contacts are likely to be local. This implies that potentially little data has to flow across data centers compared to the centralized model. The disadvantage is that name inference would only use data from the local region. Cost-wise, the federated model seems more desirable.
11 Security
11.1 Overview
11.2 Data Privacy
Data privacy is ensured through storage layer access restrictions. Access to the data store is restricted to the service and a few restricted users. The public APIs that the service exposes do not provide access to any tenant-specific data, but only to the data that the inferencing and spam prediction engines generate. The Bing Phone Book service will follow the recommended security guidelines.
For SQL Azure DBs, firewall rules can be set up to restrict access to a set of IP addresses, so that only the Bing Phone Book service web and worker roles have access to the DB. The V1 implementation supports firewalling. In addition, SQL Azure only supports SQL Server authentication, so care needs to be taken that the password is not compromised.
With Azure Tables, standard Azure protection mechanisms around the use of primary/secondary keys, key rollover and key regeneration will be used to protect access to the data. Other operational procedures will be used to restrict key management operations and key visibility to a small set of personnel. In addition, investigations into configuration within a private VNET and Azure RBAC security are in progress to harden access to the data store.
11.3 Authenticating the Client Application and User (RPS)
The Skype Dialer uses the Relying Party Suite (RPS) authentication protocol.
With RPS, the Dialer app can get tickets for a given service once the user is logged into MSA and their access token has been retrieved.
Next steps:
- Choose authentication policy: MBI_SSL (24-hour refresh, no forced sign-in on expiry, ticket type compact)
- <AuthPolicy>MBI</AuthPolicy> in rpcserver.xml
- Live Auth configuration properties in web.config
- Smart Client Protocol setup
11.4 Authenticating the Client Application and User (OAUTH)
The phone book service needs to authenticate the client application and the user to ensure that unauthorized or rogue users are not invoking the service endpoints with potentially malicious intent. We assume that the Skype Dialer application requires a Microsoft Account and that the Bing Phone Book Service will use the Windows Live service as the authorization server.
In OAuth terminology, the Bing Phone Book client library would be the client and the Bing Phone Book Service would be the resource server, as shown in the picture below. However, it is the Skype Dialer application that actually drives the user authentication process with the authorization server. Investigations are in progress into how OAuth supports multiple resource servers and whether we can leverage that.
[Figure: OAuth implicit grant flow. Participants: Bing Phone Book (Client), User Agent, Windows Live (Authorization Server), Bing Phone Book Service (Resource Server). Steps shown: Access App (Client Id, Redirect URI); Login via Auth Server; Redirect to Client Redirect URI + Access Token; Access Redirect URI + Access Token; HTML doc with embedded script; Extract the Access Token.]
For desktop and mobile applications, the OAuth implicit grant type sequence is recommended.
For the Bing Phone Book Service, the resources made available through its public API do not relate to a specific user. In that context, a grant type of client credentials for generic application access might suffice. If user authorization is required, then the implicit grant request sequence for OAuth, as described above, may be used. [TBD. Does the client credentials grant type require the client secret?]
Refer to Support for Implicit Grant OAuth 2.0 for Windows Live/Microsoft Account Service for a description of the implicit grant flow.
The details of the OAuth implicit grant type are discussed in the Security section under Service Design Details.
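For reference, a minimal sketch of the last step of the implicit grant flow ("Extract the Access Token"): per RFC 6749 (section 4.2.2), the authorization server returns the token in the fragment of the redirect URI. The class ImplicitGrantRedirect and the example redirect URI are hypothetical; the actual redirect URI would be the one registered for the app.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: pull the access token out of an OAuth 2.0 implicit-grant redirect, whose
 * fragment looks like "access_token=...&token_type=bearer&expires_in=3600".
 */
final class ImplicitGrantRedirect {
    private ImplicitGrantRedirect() {}

    /** Parses the URI fragment into decoded key/value pairs. */
    static Map<String, String> parseFragment(String redirectUri) {
        Map<String, String> params = new HashMap<>();
        String fragment = URI.create(redirectUri).getRawFragment();
        if (fragment == null || fragment.isEmpty()) {
            return params; // no fragment: nothing to parse
        }
        for (String pair : fragment.split("&")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip malformed or empty entries
            }
            String key = URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8);
            String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            params.put(key, value);
        }
        return params;
    }

    /** Returns the access token or fails if the redirect did not carry one. */
    static String accessToken(String redirectUri) {
        String token = parseFragment(redirectUri).get("access_token");
        if (token == null) {
            throw new IllegalArgumentException("redirect URI has no access_token");
        }
        return token;
    }
}
```

Because the token travels in the fragment, it is not sent to the redirect URI's server; the client reads it locally, which is the point of the embedded-script step in the flow.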
11.5 Transport Security
HTTPS is used to provide server authentication, which prevents man-in-the-middle attacks, and to encrypt communication between the client and server, which ensures that there cannot be any eavesdropping on or tampering with contents. OAuth 2.0, in any case, requires TLS/SSL for the reasons described above.
12 Backup/Disaster Recovery Requirements
TBD
13 Bing Phone Book Client Design Details
13.1 Overview - Periodic Uploads and Downloads
A mobile device (or SIM) is registered with the Bing Phone Book Service (BPBS) when the user first uses the Skype Dialer. Subsequently, the Bing Phone Book Client connects to the Bing Phone Book Service to upload phone book data, to resolve phone numbers to names, and to get spammer information.
The Skype Dialer application calls the Bing Phone Book Client periodically to upload the user's contacts and block list to the Bing Phone Book Service and to download the top spammer list. Diffs are maintained on the client so that subsequently only changes in the contact list are uploaded.
Currently there is no support for change notifications, which would let the client avoid calculating the diff between the current contact list and the last uploaded contact list. Also, the frequency of these uploads and downloads is not configurable independently for the contact list, block list and top spammer list.
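A minimal sketch of the client-side diff, assuming contacts are represented as a map from normalized phone number to contact name and that the client keeps a copy of the last uploaded snapshot. ContactDiff is hypothetical, and whether removals are uploaded at all is not specified here; the sketch simply reports them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Sketch of the client-side contact diff. Both maps are keyed by normalized phone
 * number with the contact name as value (a simplified, hypothetical contact model).
 */
final class ContactDiff {
    final Map<String, String> addedOrChanged = new HashMap<>(); // to upload
    final Map<String, String> removed = new HashMap<>();        // uploaded last time, gone now

    static ContactDiff compute(Map<String, String> lastUploaded, Map<String, String> current) {
        ContactDiff diff = new ContactDiff();
        for (Map.Entry<String, String> e : current.entrySet()) {
            // New numbers and numbers whose stored name changed both need uploading.
            if (!lastUploaded.containsKey(e.getKey())
                    || !Objects.equals(lastUploaded.get(e.getKey()), e.getValue())) {
                diff.addedOrChanged.put(e.getKey(), e.getValue());
            }
        }
        for (Map.Entry<String, String> e : lastUploaded.entrySet()) {
            if (!current.containsKey(e.getKey())) {
                diff.removed.put(e.getKey(), e.getValue());
            }
        }
        return diff;
    }

    /** When this is true, the client can skip the SyncContacts call entirely. */
    boolean isEmpty() {
        return addedOrChanged.isEmpty() && removed.isEmpty();
    }
}
```

After a successful upload, the client would replace its stored snapshot with the current contact map.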
These uploads and downloads are staggered across users by using a hash of the user's phone number to derive a time of day for syncing the phone book data to the Bing Phone Book Service.
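A minimal sketch of hash-based staggering, assuming SHA-256 as the hash (the spec only says "a HASH") and up to two syncs per day, 12 hours apart, per the sync-load assumptions. SyncScheduler and its method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.LocalTime;

/**
 * Sketch of hash-based sync staggering: each user gets a stable, evenly spread
 * time of day derived from a hash of their normalized phone number.
 */
final class SyncScheduler {
    static final int SECONDS_PER_DAY = 24 * 60 * 60;

    private SyncScheduler() {}

    static LocalTime firstSyncTime(String normalizedPhoneNumber) {
        byte[] h = sha256(normalizedPhoneNumber);
        // Fold the first four digest bytes into an int, then map it onto a second of the day.
        int v = ((h[0] & 0xff) << 24) | ((h[1] & 0xff) << 16) | ((h[2] & 0xff) << 8) | (h[3] & 0xff);
        return LocalTime.ofSecondOfDay(Math.floorMod(v, SECONDS_PER_DAY));
    }

    /** Second of the (at most) two daily syncs, 12 hours later; LocalTime wraps past midnight. */
    static LocalTime secondSyncTime(String normalizedPhoneNumber) {
        return firstSyncTime(normalizedPhoneNumber).plusHours(12);
    }

    private static byte[] sha256(String s) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }
}
```

Because the slot depends only on the phone number, it survives app restarts without storing any schedule, and a well-mixed hash spreads users roughly uniformly over the day.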
13.2 Caching
The Phone Book Service Client caches the last X numbers looked up, as well as the top spammer list, on the client. The spammer list in the cache is updated every time the client syncs with the service. [Vinay, what is the caching mechanism? Can you add the details? Why is the top spammer list not cached on the server as well?]
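The caching mechanism is still open. One plausible option for "the last X numbers looked up" is an LRU cache; the sketch below uses LinkedHashMap in access order, which evicts the least recently used entry once the capacity X is exceeded. LookupCache and the chosen capacity are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache of the last X lookup results, keyed by normalized phone number. */
final class LookupCache<V> extends LinkedHashMap<String, V> {
    private final int maxEntries; // "X" in the text; the actual value is not specified

    LookupCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: iteration runs least- to most-recently accessed
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        return size() > maxEntries; // evict the least recently used entry once over capacity
    }
}
```

LinkedHashMap is not thread-safe; if lookups can arrive on several threads, the cache would need to be wrapped with Collections.synchronizedMap or guarded by a lock.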
13.3 Name Lookup
On an incoming call, the Phone Book Client checks the local cache to see whether the number is on the local blocked list or the spam list, and whether the lookup can be resolved locally. If not, it calls the Azure service, provided the network mode allows it.
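A minimal sketch of the local-first order described above. The remote LookupPhoneNumber call is modeled as a plain Function so the sketch is self-contained, and the networkAllowed flag stands in for whatever SetNetworkStatus configures; LocalFirstLookup and its Outcome values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

/** Sketch of the local-first incoming-call lookup: local lists and cache before the service. */
final class LocalFirstLookup {
    enum Outcome { BLOCKED, SPAM, CACHED, FROM_SERVICE, UNRESOLVED }

    private final Set<String> blockedList;
    private final Set<String> topSpammers;
    private final Map<String, String> cache = new HashMap<>();          // number -> name
    private final Function<String, Optional<String>> remoteLookup;       // stands in for the REST call
    private volatile boolean networkAllowed = true;                     // e.g. cleared on low battery

    LocalFirstLookup(Set<String> blockedList, Set<String> topSpammers,
                     Function<String, Optional<String>> remoteLookup) {
        this.blockedList = blockedList;
        this.topSpammers = topSpammers;
        this.remoteLookup = remoteLookup;
    }

    void setNetworkAllowed(boolean allowed) {
        networkAllowed = allowed;
    }

    Outcome lookup(String number) {
        if (blockedList.contains(number)) return Outcome.BLOCKED;
        if (topSpammers.contains(number)) return Outcome.SPAM;
        if (cache.containsKey(number)) return Outcome.CACHED;
        if (!networkAllowed) return Outcome.UNRESOLVED;   // skip the service call entirely
        Optional<String> name = remoteLookup.apply(number);
        name.ifPresent(n -> cache.put(number, n));        // remember for the next call
        return name.isPresent() ? Outcome.FROM_SERVICE : Outcome.UNRESOLVED;
    }
}
```

Ordering the cheap local checks first means blocked and known-spam callers never cost a network round trip.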
13.4 Blocked Phone Number handling
When a number is marked as blocked, the Bing Phone Book Client calls the Bing Phone Book Service synchronously. If the number hasn't been uploaded yet, will AddToBlockList handle that appropriately?
13.5 Special Scenario Handling
13.5.1 Low Battery Conditions
The Dialer application will call SetNetworkStatus with the appropriate settings.
13.5.2 Shutdown
The Dialer application calls Shutdown. The current CallerIdService instance becomes invalid after this call; a new instance should be requested via GetCallerIdServiceInstance.
13.5.3 Lost Phone Scenario
13.5.4 Dual Sim Scenario
13.6 Client Side Performance Metrics
13.6.1 Metrics
Here is a list of client-side performance metrics that the instrumentation will support:
13.6.2 Client Side Metrics Framework
13.7 Other Client Side Telemetry
The Bing Phone Book client will use the same client telemetry framework as the Skype Dialer client.
14 Bing Phone Book Service Design Details
14.1 Deployment Topology and Sizing
- 3 instances of Web Role: A1
- 1 instance of Worker Role: A0
- 1 instance of MFC Role: A0 (could this be folded into the Web Role as an option?)
14.2 REST APIs
API | Type | Request | Response
Register | POST | UserName, UserPhoneNumber, UserAppToken (UserSkypeToken) | -
SyncContacts | POST | UserPhoneNumber, Contacts | -
SyncBlockedList | POST | UserPhoneNumber, BlockedList | -
GetTopSpammers | GET | UserPhoneNumber, N | PhoneNumberEntityList
LookupPhoneNumber | GET | UserPhoneNumber, PhoneNumberToLookup | PhoneNumberEntity
The REST GET APIs, in this case GetTopSpammers, will be designed to leverage HTTP and server-side caching. [Are we sending the Last-Modified date or an ETag header? Not critical initially, since the frequency of the call is not that high, but with a high volume of users it might still be useful.]
Comment: Currently, the order of parameters is switched. Is there a reason for that? It would be nice to be consistent.
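One way a client could use HTTP caching for GetTopSpammers is a conditional GET: send the ETag from the previous response in If-None-Match, and treat a 304 Not Modified response as "keep the cached list". The sketch below only builds the request. It uses the Java 11 java.net.http API to stay self-contained (the shipping Android client would use its own HTTP stack), and the base URI, the route getTopSpammers and the query parameter names are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Optional;

/** Sketch of a conditional GET for GetTopSpammers; route and parameter names are hypothetical. */
final class TopSpammersRequest {
    private TopSpammersRequest() {}

    static HttpRequest build(URI baseUri, String userPhoneNumber, int n, Optional<String> cachedEtag) {
        String query = "userPhoneNumber=" + URLEncoder.encode(userPhoneNumber, StandardCharsets.UTF_8)
                + "&n=" + n;
        HttpRequest.Builder builder = HttpRequest.newBuilder(baseUri.resolve("getTopSpammers?" + query))
                .timeout(Duration.ofSeconds(10))
                .header("Accept", "application/json")
                .GET();
        // With a cached ETag, the server can answer 304 Not Modified instead of resending the list.
        cachedEtag.ifPresent(tag -> builder.header("If-None-Match", tag));
        return builder.build();
    }
}
```

On the server side, this only pays off if the service emits a stable ETag (or Last-Modified) for the cached top spammer list, which is the open question above.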
14.3 Name Inferencing Engine
For each phone number listed in the PhoneNumbersUpdatedTable, the name inferencing engine evaluates the multiple rows for that number in the PhoneNumberDataTable and generates an inferred name.
The current algorithm is fairly simple. It generates an inferred name for both the full name and the first name if more than a threshold number of entities (the current threshold is 5) have the same name set as the full name or the first name.
Currently we do not receive or use the first and last name information from the local contact list.
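A minimal sketch of the threshold rule for a single phone number, taking the contact names uploaded for it. It follows the text literally (a name needs more than 5 matching rows) and derives first names from the first token of the full name, since first and last names are not uploaded separately. NameInference is hypothetical, and only whitespace is normalized when comparing names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Sketch of threshold-based name inference for one phone number. */
final class NameInference {
    static final int THRESHOLD = 5; // current threshold per the text; a name needs MORE than this many rows

    private NameInference() {}

    /** Returns the most common name if strictly more than THRESHOLD rows agree on it. */
    static Optional<String> inferFullName(List<String> uploadedNames) {
        Map<String, Integer> counts = new HashMap<>();
        for (String name : uploadedNames) {
            if (name == null || name.isBlank()) continue;
            // Only whitespace is normalized; case and spelling variants count as different names.
            counts.merge(name.trim().replaceAll("\\s+", " "), 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) { // ties between equally common names are broken arbitrarily
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return bestCount > THRESHOLD ? Optional.of(best) : Optional.empty();
    }

    /** Assumes the first whitespace-separated token of each uploaded name is the first name. */
    static Optional<String> inferFirstName(List<String> uploadedNames) {
        List<String> firstTokens = new ArrayList<>();
        for (String name : uploadedNames) {
            if (name == null || name.isBlank()) continue;
            firstTokens.add(name.trim().split("\\s+")[0]);
        }
        return inferFullName(firstTokens);
    }
}
```

For example, six rows of "Ravi Kumar" plus "Ravi" and "Ravi K" yield the full name "Ravi Kumar" (6 > 5) and the first name "Ravi" (8 > 5).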
Q: Is the inferencing engine doing bulk reads/updates? It is important that the Bing Phone Book Service serves lookups at low latency even while the inferencing engine is updating entries; bulk updates could potentially cause latency issues.
14.4 Spammer Prediction
The spam count for a number is incremented if it is found on the blocked list of a given user. If the spam count for a number is greater than the threshold (what is this value currently?), the number is assumed to be spam. The top spammer list is built by the Bing Phone Book Service. Is this done periodically? Is it cached on the server side?
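Whatever the refresh policy turns out to be, computing the list itself is a bounded top-N selection. The sketch below keeps a size-n min-heap over numbers whose spam count exceeds the threshold, which fits the Azure Tables worker-role option discussed under Options. TopSpammers and the threshold parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Sketch: top-N spammers by spam count using a size-bounded min-heap. */
final class TopSpammers {
    private TopSpammers() {}

    /**
     * spamCountByNumber: SpamCount per number; threshold is hypothetical (the spec leaves it open).
     * Returns up to n spam-flagged numbers, highest spam count first.
     */
    static List<String> topN(Map<String, Long> spamCountByNumber, long threshold, int n) {
        // Min-heap: the head is the weakest of the current top n and is evicted first.
        PriorityQueue<Map.Entry<String, Long>> heap =
                new PriorityQueue<>(Map.Entry.<String, Long>comparingByValue());
        for (Map.Entry<String, Long> e : spamCountByNumber.entrySet()) {
            if (e.getValue() <= threshold) {
                continue; // not flagged as spam
            }
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll();
            }
        }
        List<String> result = new ArrayList<>(heap.size());
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey()); // ascending by count
        }
        Collections.reverse(result);          // descending by count
        return result;
    }
}
```

The heap keeps memory at O(n) regardless of how many numbers are scanned, which matters if the worker role streams spam counts in batches.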
14.5 SQL Azure Storage Data Layer Overview
Do we have an estimate of what percentage of these numbers would be spam?
14.5.1 Options
The following options are plausible and were evaluated:
SQL Azure
SQL Azure is really not a desirable option for over 500 GB of data, from both a performance and a COGS perspective.
Azure Tables
Azure Tables essentially provide a simple table indexed by partition key and row key. A single batch request may contain only 100 entities (and must not exceed 4 MB in size). Azure Tables support a query filter option but not an ordering operation.
For the lookup table, the partition key could be a hash of the first digits of the phone number, and the row key the phone number.
For the data table as well, the partition key could be the hash of the phone number, and the row key the row ID.
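A minimal sketch of the lookup-table key scheme above, plus grouping of writes into Azure Table batches. Every entity in one batch must share a partition key, so numbers are grouped by partition first and then split into chunks of at most 100. The prefix length, the partition count, the key format and the class name LookupTableKeys are hypothetical; the 4 MB payload limit is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of lookup-table partition/row keys and per-partition batching (hypothetical values). */
final class LookupTableKeys {
    static final int MAX_BATCH_ENTITIES = 100; // Azure Tables batch limit; payload must also stay under 4 MB
    static final int PREFIX_DIGITS = 4;        // hypothetical: leading digits (incl. country code) that feed the hash
    static final int PARTITION_COUNT = 64;     // hypothetical number of partitions

    private LookupTableKeys() {}

    static String partitionKey(String normalizedNumber) {
        String digits = normalizedNumber.replaceAll("[^0-9]", "");
        String prefix = digits.substring(0, Math.min(PREFIX_DIGITS, digits.length()));
        // String.hashCode is specified by the Java platform, so the mapping is stable across processes.
        int bucket = Math.floorMod(prefix.hashCode(), PARTITION_COUNT);
        return String.format("p%02d", bucket);
    }

    static String rowKey(String normalizedNumber) {
        return normalizedNumber; // the phone number itself, as proposed above
    }

    /** Groups numbers by partition key, then splits each group into batches of at most 100. */
    static Map<String, List<List<String>>> batchesByPartition(List<String> numbers) {
        Map<String, List<List<String>>> batches = new TreeMap<>();
        for (String number : numbers) {
            List<List<String>> partitionBatches =
                    batches.computeIfAbsent(partitionKey(number), k -> new ArrayList<>());
            if (partitionBatches.isEmpty()
                    || partitionBatches.get(partitionBatches.size() - 1).size() == MAX_BATCH_ENTITIES) {
                partitionBatches.add(new ArrayList<>());
            }
            partitionBatches.get(partitionBatches.size() - 1).add(number);
        }
        return batches;
    }
}
```

Hashing only a short prefix keeps numbers from the same range together, but it can also concentrate load on a few partitions; the prefix length is the knob that trades locality against spread.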
To calculate top spammers, a worker role would need to read the spammers into memory. The reads would need to be done as batched reads. The worker role would eventually write this top spammer list into a separate table. The top spammer list can also be cached on the server side.
ObjectStore
TBD.
Three tables in SQL Azure are used by the Bing Phone Book Service:
- PhoneNumberEntityTable: lookups are served from this table.
- PhoneNumberDataTable: all sync data (contacts, suggestions) is inserted into this table.
- PhoneNumberUpdated: used by the inferencing engine to detect whether a phone number entry in the data table has updates.
14.5.2 Phone Number Entity Table Schema
Salient Details
- A filtered nonclustered index on the ‘IsSpam’ column is used to improve performance when ordering spammers by spam count.
- A separate table called PhoneNumberUpdated is used by the inferencing engine to limit re-inferencing operations to only the impacted numbers.
Column Name | Column Type | Nullable | Comments
PhoneNumber | Nvarchar(20) | Not null, primary key |
RegisteredName | Nvarchar(max) | Null | Registered name of the phone number if registered, otherwise null
InferredName | Nvarchar(max) | Null | Inferred name of the phone number, if any, from the inference model
IsSpam | Bit | Not null | Whether the phone number was detected as spam by spam inference (is this a sparse column?)
SpamCount | Bigint | Not null | Spam count for the phone number
Locality | Nvarchar(50) | Null | Locality of the phone number (mainly inserted for Bing Local entities)
City | Nvarchar(30) | Null | City of the phone number (mainly inserted for Bing Local entities)
UserAppToken | Nvarchar(50) | Null | A unique token given by the app for the corresponding phone number. WHAT IS THIS FOR?
LastUpdatedTime | Datetime | Not null | The time when this row was last updated
Indexes on PhoneNumberEntityTable:
Column         IndexType                                Comments
PhoneNumber    Clustered                                Primary key
UserAppToken   Non-clustered                            To look up the phone number and other info for a given user app token
IsSpam         Non-clustered, filtered (IsSpam = 1)     To easily get the top spammers
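The schema and the filtered index can be tried out locally with SQLite, whose partial indexes are the closest analogue to SQL Server filtered indexes. This is an illustration under that assumption, not the SQL Azure DDL: types are mapped to SQLite affinities, and the index is keyed on `SpamCount` with the filter `IsSpam = 1`. Keying on SpamCount is a deliberate choice in the sketch: an index whose only key is IsSpam holds the same value for every filtered row, so it cannot return rows in spam-count order by itself. The function names are hypothetical.

```python
import sqlite3

def create_entity_table(conn):
    """Create a SQLite approximation of PhoneNumberEntityTable plus a partial spam index."""
    conn.executescript("""
        CREATE TABLE PhoneNumberEntityTable (
            PhoneNumber     TEXT PRIMARY KEY NOT NULL,
            RegisteredName  TEXT,
            InferredName    TEXT,
            IsSpam          INTEGER NOT NULL DEFAULT 0,
            SpamCount       INTEGER NOT NULL DEFAULT 0,
            Locality        TEXT,
            City            TEXT,
            UserAppToken    TEXT,
            LastUpdatedTime TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        -- Partial (filtered) index: only spam rows are indexed, ordered by SpamCount.
        CREATE INDEX IX_Entity_TopSpam ON PhoneNumberEntityTable (SpamCount DESC)
            WHERE IsSpam = 1;
    """)

def top_spammers_sql(conn, n):
    """Return (PhoneNumber, SpamCount) for the n highest-count spam numbers."""
    return conn.execute(
        "SELECT PhoneNumber, SpamCount FROM PhoneNumberEntityTable "
        "WHERE IsSpam = 1 ORDER BY SpamCount DESC LIMIT ?",
        (n,),
    ).fetchall()
```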
14.5.3 PhoneNumberDataTable Schema
ColumnName          ColumnType      IsNullable              Comments
RowId               Bigint          Not null, Primary Key   Required so that the table has a primary key
PhoneNumber         Nvarchar(20)    Not null
ContactName         Nvarchar(max)   Null
TypeOfPhoneNumber   Int             Not null                Type of phone number: Unknown = 0, Mobile = 1, Home = 2, Work = 3, Company = 4, HomeFax = 5, WorkFax = 6
Source              Nvarchar(max)   Null                    Source of the phone number, e.g., Facebook, Outlook/Spam
LastUpdatedTime     DateTime        Not null                When this row was last updated
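Code that reads or writes TypeOfPhoneNumber can mirror the codes above in an enum. The class and the `parse_phone_type` fallback are hypothetical; only the names and values come from the schema.

```python
from enum import IntEnum

class PhoneNumberType(IntEnum):
    """Values stored in PhoneNumberDataTable.TypeOfPhoneNumber (as listed in the schema)."""
    UNKNOWN = 0
    MOBILE = 1
    HOME = 2
    WORK = 3
    COMPANY = 4
    HOME_FAX = 5
    WORK_FAX = 6

def parse_phone_type(value):
    """Map a stored int to the enum, falling back to UNKNOWN for values outside the schema."""
    try:
        return PhoneNumberType(value)
    except ValueError:
        return PhoneNumberType.UNKNOWN
```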
Indexes on PhoneNumberDataTable:
Column        IndexType       Comments
RowId         Non-clustered   Primary key
PhoneNumber   Clustered       For fast aggregation by phone number during inferencing
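The clustered index on PhoneNumber means an inferencing pass can scan rows already grouped by number. The sketch below shows that access pattern with a simple majority vote over contact names; the voting rule and the `MIN_VOTES` and `MIN_SHARE` thresholds are hypothetical, since this section does not specify the inference model.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

MIN_VOTES = 3    # hypothetical: minimum number of address books agreeing on a name
MIN_SHARE = 0.6  # hypothetical: minimum fraction of all votes for the winning name

def infer_names(rows, min_votes=MIN_VOTES, min_share=MIN_SHARE):
    """Infer names from (PhoneNumber, ContactName) pairs.

    rows must be sorted by PhoneNumber, as a clustered-index scan would return them.
    Returns {PhoneNumber: InferredName} only for numbers that pass both thresholds.
    """
    inferred = {}
    for number, group in groupby(rows, key=itemgetter(0)):
        votes = Counter(name.strip() for _, name in group if name)  # ContactName is nullable
        total = sum(votes.values())
        if not total:
            continue
        name, count = votes.most_common(1)[0]
        if count >= min_votes and count / total >= min_share:
            inferred[number] = name
    return inferred
```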
14.5.4 PhoneNumberUpdated Table Schema
ColumnName    ColumnType      IsNullable              Comments
PhoneNumber   Nvarchar(20)    Not null, Primary Key
HasUpdates    Bit             Not null                Set to 1 when there are updates in the data table; reset to 0 by the inferencing engine once done
Indexes:
Column        IndexType   Comments
PhoneNumber   Clustered   Primary key
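The HasUpdates flag acts as a dirty bit between the sync path and the inferencing engine. This SQLite sketch (columns trimmed to the ones needed; function names hypothetical) sets the flag in the same transaction as the data insert and resets it after a number has been re-inferred.

```python
import sqlite3

def create_tables(conn):
    """Create trimmed SQLite versions of PhoneNumberDataTable and PhoneNumberUpdated."""
    conn.executescript("""
        CREATE TABLE PhoneNumberDataTable (
            RowId       INTEGER PRIMARY KEY,
            PhoneNumber TEXT NOT NULL,
            ContactName TEXT
        );
        CREATE TABLE PhoneNumberUpdated (
            PhoneNumber TEXT PRIMARY KEY NOT NULL,
            HasUpdates  INTEGER NOT NULL
        );
    """)

def record_upload(conn, phone_number, contact_name):
    """Sync path: insert the data row and set HasUpdates = 1 in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO PhoneNumberDataTable (PhoneNumber, ContactName) VALUES (?, ?)",
            (phone_number, contact_name))
        conn.execute(
            "INSERT OR REPLACE INTO PhoneNumberUpdated (PhoneNumber, HasUpdates) VALUES (?, 1)",
            (phone_number,))

def run_inference_pass(conn, infer):
    """Inference path: re-infer only flagged numbers, then reset each flag to 0."""
    numbers = [r[0] for r in conn.execute(
        "SELECT PhoneNumber FROM PhoneNumberUpdated WHERE HasUpdates = 1 ORDER BY PhoneNumber")]
    for number in numbers:
        names = [r[0] for r in conn.execute(
            "SELECT ContactName FROM PhoneNumberDataTable WHERE PhoneNumber = ?", (number,))]
        infer(number, names)
        with conn:
            conn.execute(
                "UPDATE PhoneNumberUpdated SET HasUpdates = 0 WHERE PhoneNumber = ?", (number,))
    return numbers
```

With concurrent writers, resetting the flag after inference can drop an upload that lands between the read and the reset; comparing a version counter or LastUpdatedTime before resetting would close that gap.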
14.5.5 SQL Maintenance
Do we force-run UPDATE STATISTICS? When? How do we manage the index fragmentation that will be caused by repeated updates? In our case, how bad will the index fragmentation be?
14.6 Security
14.6.1 Implicit Grant Type Sequence
Before a client application can request access to a resource owner's resources, the client application must register with the authorization server associated with the resource server. At the time of registration, the authorization server assigns the client application a client ID and a client secret. The client ID and the secret are unique to the client application on that authorization server. It is important that the client identity and the secret generated by the provider are not shared with anyone.
In our case, since the app is deployed to users' devices and is a Java application, it is preferable not to use the secret. For this reason, the OAuth Implicit Grant type is recommended for mobile or desktop applications.
During registration, the client also registers a redirect URI. This redirect URI is used when a resource owner grants authorization to the client application. When a resource owner has successfully authorized the client application via the authorization server, the resource owner is redirected back to the client application at the redirect URI.
Note that the authorization service will only redirect users to a registered URI, which helps prevent some attacks. Any HTTP redirect URIs must be protected with TLS, so the service will only redirect to URIs beginning with "https". This prevents tokens from being intercepted during the authorization process.
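On the authorization server side, the two redirect rules above amount to a check like the following. It is a sketch, assuming exact string matching against the registered URIs; `is_allowed_redirect` is a hypothetical helper, not part of any Windows Live API.

```python
from urllib.parse import urlsplit

def is_allowed_redirect(requested_uri, registered_uris):
    """Allow a redirect only to an exactly registered URI that uses the https scheme."""
    if urlsplit(requested_uri).scheme.lower() != "https":
        return False  # plain http would expose the token in transit
    return requested_uri in registered_uris
```

Exact matching is the strictest option; prefix or pattern matching is more convenient but widens the set of URIs an attacker could try to register against.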
An implicit authorization grant is similar to an authorization code grant, except that the access token is returned to the client application directly once the user has finished the authorization. The access token is thus returned (in the URI fragment) when the user agent is redirected to the redirect URI.
This, of course, means that the access token is accessible in the user agent or native application participating in the implicit authorization grant; the access token is not stored securely on a web server.
Furthermore, the client application need only send its client ID to the authorization server. If the client were to send its client secret too, the client secret would have to be stored in the user agent or native application as well, which would make it vulnerable to hacking.
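From the client's side, the implicit grant reduces to building an authorization request that carries only the client ID (`response_type=token`, per RFC 6749 section 4.2) and reading the token from the fragment of the redirect URI. `AUTHORIZE_ENDPOINT`, the scope value and the helper names are hypothetical placeholders; the real endpoint and scopes come from the Microsoft Account documentation. The `state` parameter ties the response to the request.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical placeholder, not the real Microsoft Account endpoint.
AUTHORIZE_ENDPOINT = "https://login.example.com/oauth2/authorize"

def new_state():
    """Unguessable value echoed back by the authorization server."""
    return secrets.token_urlsafe(16)

def build_authorization_url(client_id, redirect_uri, scope, state):
    """Implicit grant request: response_type=token and no client secret."""
    query = urlencode({
        "response_type": "token",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{AUTHORIZE_ENDPOINT}?{query}"

def parse_token_from_redirect(redirect_url, expected_state):
    """Read access_token, token_type and expires_in from the redirect URI fragment."""
    params = {k: v[0] for k, v in parse_qs(urlsplit(redirect_url).fragment).items()}
    if "error" in params:
        raise PermissionError(params["error"])
    if params.get("state") != expected_state:
        raise ValueError("state mismatch; response does not belong to this request")
    return params["access_token"], params.get("token_type"), params.get("expires_in")
```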
14.6.2 Using Skype Dialer Authentication – Multiple Resource Server Support
15 Server Side Performance Metrics
16 Monitoring
The Azure Diagnostic Framework and Azure Monitoring Framework will be used to instrument, persist and monitor the service.

More Related Content

PDF
Report on dotnetnuke
PDF
Linux training
PDF
School software
DOC
Sap hr implementation config rc - Aditi Tarafdar
PDF
Rails Cookbook
PDF
Akeeba backup-guide
PDF
MXIE Phone User's Manual
PDF
I ntools+v+7+tutorial
Report on dotnetnuke
Linux training
School software
Sap hr implementation config rc - Aditi Tarafdar
Rails Cookbook
Akeeba backup-guide
MXIE Phone User's Manual
I ntools+v+7+tutorial

What's hot (20)

PDF
Fimmda
PDF
Google Search Quality Rating Program General Guidelines 2011
PDF
Glogster edu-users-guide
PDF
SchoolAdmin - School Fees Collection & Accounting Software
DOCX
Bhuma learning portal_ui
PDF
Analysis Of The Modern Methods For Web Positioning
PDF
Test and target book
PDF
Drools expert-docs
PDF
Manual
PDF
First7124911 visual-cpp-and-mfc-programming
PDF
How to manage future grid dynamics: system value of Smart Power Generation in...
PDF
Cv 22 user manual v1.0220081021124358
PDF
Cognos v10.1
PDF
Openbravo for Retail Solution Description (RMP19)
PDF
Ppm7.5 demand cg
PDF
Ppm7.5 cmd tokval
PDF
Abcsubmit User Manual - Documentation
PDF
R Ints
PDF
Course lab 2_guide_eng
Fimmda
Google Search Quality Rating Program General Guidelines 2011
Glogster edu-users-guide
SchoolAdmin - School Fees Collection & Accounting Software
Bhuma learning portal_ui
Analysis Of The Modern Methods For Web Positioning
Test and target book
Drools expert-docs
Manual
First7124911 visual-cpp-and-mfc-programming
How to manage future grid dynamics: system value of Smart Power Generation in...
Cv 22 user manual v1.0220081021124358
Cognos v10.1
Openbravo for Retail Solution Description (RMP19)
Ppm7.5 demand cg
Ppm7.5 cmd tokval
Abcsubmit User Manual - Documentation
R Ints
Course lab 2_guide_eng
Ad

Viewers also liked (17)

PDF
Graduation projects in Crispico
PDF
REVERSE MORTGAGES AND RETIREMENT FINANCIAL PLANNING
PPTX
Centro de triagem do lumiar
PDF
Planificando bajo TPACK
PPT
小姪女 090709
PPTX
BEE KEEPER CANDLES POWER POINT 4
DOCX
PFuller Portfolio
PPT
小姪女
PPTX
Mercado romano
PDF
GM Certificate
PPT
Social media for communal leaders (3)
PPTX
Valor sul amadora
KEY
Consumentengedrag perceptie
ODP
Tópicos literarios
PPS
O lapis
PPTX
"Конфетти" коллекция Bremani
PPTX
Sistem Informasi Manajemen
Graduation projects in Crispico
REVERSE MORTGAGES AND RETIREMENT FINANCIAL PLANNING
Centro de triagem do lumiar
Planificando bajo TPACK
小姪女 090709
BEE KEEPER CANDLES POWER POINT 4
PFuller Portfolio
小姪女
Mercado romano
GM Certificate
Social media for communal leaders (3)
Valor sul amadora
Consumentengedrag perceptie
Tópicos literarios
O lapis
"Конфетти" коллекция Bremani
Sistem Informasi Manajemen
Ad

Similar to Bing Phone Book Service Arch Spec (20)

PDF
Solving the BYOD Problem with Open Standards
PPTX
Real time voice call integration - Confoo 2012
PPTX
Session Initiation Protocol - In depth analysis
PPTX
Avaya Sip Within Your Enterprise
PPTX
Session initiation protocol
KEY
Sip2012 :: outbound
PPT
Introduction To SIP
PPT
Si pp introduction_2
PDF
Rethinking the PBX
PDF
FreeSBC How To - Advanced SIP Routing
PDF
Vo Ip Partner Information Pack
PPTX
FreeSBC How To - Advanced SIP Routing
PDF
White label reseller pdf, v10
PDF
Scanning The Intertubes For Voip
DOC
Ring Central Hosted Vo Ip
PDF
Sip for mobile applications
PDF
Nllug 2010 - Web-services bootcamp
PDF
Nllug 2010-web-services
PDF
Difference between class 4 and class 5 softswitch
PDF
3CX Basic Notes
Solving the BYOD Problem with Open Standards
Real time voice call integration - Confoo 2012
Session Initiation Protocol - In depth analysis
Avaya Sip Within Your Enterprise
Session initiation protocol
Sip2012 :: outbound
Introduction To SIP
Si pp introduction_2
Rethinking the PBX
FreeSBC How To - Advanced SIP Routing
Vo Ip Partner Information Pack
FreeSBC How To - Advanced SIP Routing
White label reseller pdf, v10
Scanning The Intertubes For Voip
Ring Central Hosted Vo Ip
Sip for mobile applications
Nllug 2010 - Web-services bootcamp
Nllug 2010-web-services
Difference between class 4 and class 5 softswitch
3CX Basic Notes

More from Sunita Shrivastava (7)

DOCX
Cognito Unified API Specification
PPTX
Dev Analytics Overview
PPTX
Dev Analytics Aggregate DB Design Analysis
PPTX
Logical Architecture for Protection
DOCX
Search Approach - ES, GraphDB
PPTX
Index Provisioning for ALM Search - My Presentation
PPTX
ALM Search Presentation for the VSS Arch Council
Cognito Unified API Specification
Dev Analytics Overview
Dev Analytics Aggregate DB Design Analysis
Logical Architecture for Protection
Search Approach - ES, GraphDB
Index Provisioning for ALM Search - My Presentation
ALM Search Presentation for the VSS Arch Council

Bing Phone Book Service Arch Spec

  • 1. BING PHONE BOOK SERVICE [Document subtitle] Ashish Shah, SunitaS, VinaSura, BhJoshi [Email address] Abstract An overview of the BingPhone Book Service. The firstversion is meantto supportthe Skype Dialer scenarios.
  • 2. 1 Contents 2 Goals..........................................................................................................................................3 3 Caller ID Functionality and Graph Inference..................................................................................3 4 Spam Inference for Blocked Callers...............................................................................................4 5 High Level Workflows..................................................................................................................4 6 Architecture Overview.................................................................................................................5 7 High Level Design Overview .........................................................................................................6 7.1 Client Service Workflows......................................................................................................6 7.2 Inferencing Engine Workflow................................................................................................8 8 Performance and Scale Targets for V1 Release..............................................................................8 8.1 Lookup Performance Goals...................................................................................................8 8.2 Server Call Load...................................................................................................................8 8.3 Address book Sync Load.......................................................................................................9 8.4 Storage Capacity..................................................................................................................9 9 Bing Phone Book Data Storage Model...........................................................................................9 9.1 Target Storage Capacity 
Requirements..................................................................................9 9.2 Bing Phone Book Service Storage Design...............................................................................9 9.3 Bing Phone Book Service Data Storage..................................................................................9 10 Geo Scale-Out Model.............................................................................................................10 10.1 Centralized........................................................................................................................10 10.2 Federated Model...............................................................................................................11 11 Security.................................................................................................................................11 11.1 Data Privacy ......................................................................................................................11 11.2 Authenticating the Client Application and User....................................................................12 11.3 Transport Security .............................................................................................................13 12 Backup/Disaster Recovery Requirements................................................................................13 13 Bing Phone Book Client Design Details....................................................................................14 13.1 Overview - Periodic Uploads and Downloads.......................................................................14 13.2 Caching.............................................................................................................................14 13.3 NameLookup.....................................................................................................................14 13.4 Blocked Phone Number 
handling........................................................................................14 13.5 Special Scenario Handling...................................................................................................14 13.5.1 LowBatteryConditions.................................................................................................14
  • 3. 13.5.2 Shutdown ..................................................................................................................14 13.5.3 Lost Phone Scenario....................................................................................................14 13.5.4 Dual Sim Scenario.......................................................................................................14 13.6 Client Side Performance Metrics.........................................................................................15 13.6.1 Metrics......................................................................................................................15 13.6.2 Client Side Metrics Framework....................................................................................15 13.7 Other Client Side Telemetry................................................................................................15 14 Bing Phone Book Service Design Details..................................................................................15 14.1 Deployment Topology and Sizing........................................................................................15 14.2 REST APIs ..........................................................................................................................15 14.3 Name Inferencing Engine ...................................................................................................15 14.4 Spammer Prediction ..........................................................................................................16 14.5 SQL Azure Storage Data Layer Overview..............................................................................16 14.5.1 Options......................................................................................................................16 14.5.2 Phone Number Entity Table Schema............................................................................17 14.5.3 PhoneNumberDataTable 
Schema................................................................................18 14.5.4 PhoneNumberUpdated Table Schema........................................................................19 14.5.5 SQL Maintenance .......................................................................................................19 14.6 Security.............................................................................................................................19 14.6.1 Implicit Grant Type Sequence......................................................................................19 14.6.2 Using Skype Dialer Authentication – Multiple Resource Server Support.........................20 15 Server Side Performance Metrics............................................................................................20 16 Monitoring............................................................................................................................20
  • 4. Bing Phone Book Service for Skype Dialer 2 Goals The main purpose forthisservice forthe V1 release istoprovide the followingfunctionalitytothe Skype Dialer: 1) Provide CallerIdlookup byphone numberforincomingandoutgoingcalls (forpeopleaswell as local businessphone numbers) 2) Provide acrowd-sourcedspamdetectionandtaggingmechanismforincomingcallers. 3) Provide round-trippable storage forincomingcall blocklistfor all registeredusers,withimplicit portabilityacross the users devices. Designgoalsforthe service: 1. Be agnosticof the callingapplication.There’snoimplicitorexplicitdependencyinthe designon Skype infrastructure orcouplingwiththe Skype Dialer.The same service canbe leveragedinthe future foranotherdialerendpoint,e.g.,for the native WindowsPhone dialer. 2. Be agnosticof the device ecosystemandclientOS – specificallynodependencyhasbeentaken on Google servicesinthe service. 3. Keepdesignscalabletoglobal audienceandnotspecifictoIndia.Designandimplementation shouldscale beyondIndiamarketwithoutanycode level changes.There maybe marketspecific configfileswhichcanbe extendedtonew marketsasneeded. 4. ClientAPIcomponentsshouldbe designedinaway thatcallingapplicationcanenforce required SLAsfor batteryand network/datausage. 5. Ensure appropriate securityof userprivate data. 3 Caller ID Functionality and Graph Inference In orderto serve the primarypurpose of providingcallerIDinformationforphone numbers,the phone bookservice reliesonthree inputs: 1. Informationfromregisteredusers(asenteredwhenthe userfirstsignsupforSkype Dialer). 2. Local businessinformationseededfromthe BingLocal index.E.g.,we have seeded~5Mlocal businessnumbersforIndiaintothe callerIDservice already. More can be seededforother marketsas we scale the service globally. 3. 
Highconfidence name inferenceonthe phone-numberlinkedgraphcrowdsourcedfrom registereduser’saddressbooks.Due tothe factthat differentusersmaystore the same number withdifferentnames intheirphone bookswe cannotimplicitlytrustthe uploadedname associationfroma single addressbook.However,ahighconfidence inference canbe drawn froma groupof uploadednamesfrommanyaddressbooks. Followingpicturedemonstratesthisprocess.Usernamesin thiscolorare registeredusers,namesin this colorare highconfidence inferrednames.
  • 5. 4 Spam Inference for Blocked Callers The Dialerproviderfunctionalitytoblockincomingcallsfromspecificphone numbers.The phone book service leveragesthisfunctionalitytocreate acrowd-sourcedinferenceforspamcallers.Everytime a userblocksan incomingnumber,theycontributeone spamvote forthatnumberintothe inference system.We uploadthe user’sblocklisttothe phone bookservice atregularintervals. Aggregate informationfromuserblocklistsisusedtodoa simple inference basedonnumberof users that have blocked aspecificnumber.Asthe systemmatures,we candomore sophisticatedspam inference basedonwhethernumberistoll-free number,trustlevel forthe originatinguser,knowntele- marketingprefixes,etc. Once the spam inference runs,itaddsa spamflag and spamvote count for all numbersthatwere inferredasspamoriginators.The same informationisreturnedfromthe callerIdlookupcall tothe phone bookservice andusedbythe Dialerto show spamstatusfor incomingcallers. The service alsohasan abilitytopre-loadaset of knowntopspammersas a white list(non-inferred). For India,we have sourceda listof ~500 top spamnumbersandseededintothe phone bookservice. 5 High Level Workflows Basedon above functionality,thereare 4 basicapp workflowsthatare relevantforthe CallerIdservice: - NewUserRegistration o Dialerwill registernewuserswiththe callerIDservice usinganormalizedphone numberas the userID, no passwordrequired. o User authenticatedwithMSA usingMSA token. - Phone Book Sync
  • 6. o Dialerwill sync(upload) the user’slocal phone booktothe callerIDservice atregular intervals,includinguser’scontactswithphone numbersfromall sources(local phone book,outlook.comaddressbook,facebookcontacts,linkedincontacts,etc.). - CallerIdLookup o For all incomingandoutgoingcallswhere the incomingoroutgoingcall numberisnotin the user’slocal phone book,Dialerwill call the service foracallerId lookup(with normalizedphone numberaskey). o If phone numberentryexistsinthe clouddirectory,CallerIDservice willreturn:  Registeredname (if available)  Inferredname (if available)  Spam flagandspam count(if available) - Phone NumberBlocking o Everytime the userblocksan incomingnumber,Dialerwill updatethe caller IDservice withthat information.Internally,everyblockactiontranslatesintoaspamcount incrementforthe blockednumber. 6 Architecture Overview For interface simplicityandwire protocol abstraction,we have providedaclient javalibraryforthe phone bookservice,whichunderthe coverscallsRESTful webservice APIshostedonAzure.Currently we have a clientlibraryimplementationforAndroid(4.0+).Same canbe easilyportedovertoWindows Phone,iOS,etc. The Bing Phone Service isanAzure PaaSservice hostingRESTful endpointin Azure WebRoles. Multiple service instance roleswill be usedforload-balancingandhighavailability,leveragingAzure’sauto-scale feature forloadelasticitywhile maintainingend-to-endperformance SLAs. The inferenceengine will be hostedinan Azure WorkerRole. The SNR APIsprovide aREST endpoint Currentversionof the service usesSQLAzure asits data store (tobe swappedoutwithBingObjectStore for higherscalabilitypost-V1release).SQLAzure provideshighavailabilityandreliabilitybydesign, howeverithaslimitsonthe storage itcan support. The SQL store will be partitionedandsized accordingto V1 scale and performance requirements.
  • 7. Bing Phone Book Service Worker Role Web Role Web RoleWeb Role Skype Dialer DB(SQL Azure) REST APIS Bing PhoneBook Client MFC Test Role Inferencing Engine Bing Service SNR REST APIS In addition,anAzure Storage Accountisprovisionedtosupportpersistence of logging/diagnosticdata. Otherscale out optionsandstorage designoptionsare discussedlater. 7 High Level Design Overview The Bing Phone BookService mustsupportthe fourworkflowsdescribedearlierinthe spec. Essentially it needsto 1) Store the phone bookdata, bothcontact informationandblocklistinformationfromthe registeredusers 2) Host an Inferencingenginethatdoesname inferencingfornumbersuploadedbythe registered users 3) Buildthe Top Spammerlist The Bing Phone BookData Layerabstracts the storage from the service. The PhoneNumberUpdateTable storesthe storesthe contact data and blocklistinformation thatissyncedtothe cloud. The PhoneNumberEntityTable isthe table thatthe name inferencingenginecreatesandthe one thatis used for CallerIDlookup. The TopSpammerlistisbuiltandrefreshed onthe server. Itisalsocached onthe server. 7.1 Client Service Workflows The highlevel interactionsbetweenthe Client,the BingPhone BookService andthe BingPhone Book Data Layer are capturedhere. Theyare discussedindetail inthe ClientDesignsection.
  • 8. SkypeDialer App BingPhoneBookClient BingPhoneBook Service GetCallerIdServiceInstance(phonenumber, appcontext) RegisterUserAsync(UserName, UserPhoneNumber) RegisterUser(UserName, UserPhone) SyncWithServiceAsync GetTopSpammers( UserPhoneNumber,N) ArrayOf(PhoneNumberEntity) [diffnotEmpty] SyncContacts(UserPhoneNumber, Contacts) SyncBlockedLists(UserPhoneNumber, BlockedList) LookupPhoneBookEntryAsync(PhoneNumber) SuggestNameForPhoneNumber ArrayOf(PhoneNumberEntity) [notincache, notinblockedlist, notintopspammerlist] LookupPhoneNumber(UserPhoneNumber, PhoneNumber) [PhoneNumberEntity] Shutdown [PhoneNumberEntity] Bing Phone Book Service Data Layer GetPhoneLocation cachedlocationforphone UpdateNetworkStatus AddtoBlockList AddToPhNumUpdatedTable AddtoPhNumDataTable AddToPhNumDataTable AddToPhNumEntityTable GetTopSpammersFromEntityTable MarkAsSpaminPhNumEntityTablePhone Book Sync New User Registration Caller Id Lookup Incoming Call Blocking Provide Name For Number Low Battery Notification
  • 9. 7.2 Inferencing Engine Workflow Inferencing Engine Bing Phone Book Data Layer ReadPhoneNumbersForNameInferencing UpdatePhoneEntityNumberTableWithName(Is this a point to point update?) 8 Performance and Scale Targets for V1 Release Belowisa listof assumptionsthatwill be usedtovalidate the designanddrive the scale and performance testmetrics. Note thatthe numbersare purposefullykeptatthe higherendtoensure that the querylatencyforlookupcallsisacceptable underadverse loadconditions. We will assumea scale target of 10M registered users for the V1 release. 8.1 Lookup Performance Goals The initial end-to-endlatencygoal forthe callerIDlookupisfor90th percentile lookupstobe within 150ms round-triptime onWi-Fi andLTE/3G networkconnections,and250ms round-triptime for2G networkconnections.Round-triptimesare measuredfromthe time the clientlibraryAPIisinvokedto havingthe callerIdresponse availableinthe Dialer. 8.2 Server Call Load We will assume thatonan average a single userwill receiveormake 20 callsperday.Of these we will furtherassume thatat most 10 callswill resultsinacalleridlookup(i.e.numbernotinthe user’slocal phone book). Lastlywe will assume thatmostof these callswill be made duringdaytime,spanninga ~12 hourtime period. Above assumptionsdictateanaverage rate of ~250 qps (queriespersecond)for the callerIDlookup operation forevery1Mregisteredusers(10* 1M / 12 * 3600). For 10M registeredusers,that translates to a target capacity of~2500 QPS.
  • 10. 8.3 Address book Sync Load We will upload/downloadphone bookdatatwice aday at most. Thiswouldimply20 millionsync calls. Assumingthatall the sync callsare staggered,the syncrate at the service wouldbe ~230 (Max) Sync TopSpammer and Sync BlockedList calls per second. However,the clienttoservice syncisdesignedtoonlyuploadincremental changestothe user’saddress book – we expectonly1/4th of the sync callson the clienttoactuallyhave data to upload(andhence resultsina service call). Assumingthatall the synccallsare staggered withequal distributionovera24 hour period,above impliesanaverage rate of ~60 QPS for sync calls to the service. 8.4 Storage Capacity We will assume onanaverage userswill have 250 uploadable contactswithphone numbersintheir device phone book(includingcontactssyncedfromvariouscloudservicessuchasemail services, facebook,etc.).Of these we willassume thatauserwill contribute upto100 unique new contactsto the service.Thisnumberwillstarttotrail downas the service andsize of the phone directorygrows. 9 Bing Phone Book Data Storage Model 9.1 Target Storage Capacity Requirements The total predictedSize forthe Phone NumberDataTable isaround 250 GB (10M users* 250 Entries* 100 Bytes/Entry). The total predictedsizeforPhone NumberEntityiscalculatedbased onthe assumptionthateachof the 10M userwill contribute atmost100 unique contactsandis expectedtobe around200GB (10 Users * 100 Unique EntriesPerUser * 200 Bytesperentry). Includingthe othertable sizes,we predictthe Storage Capacityrequirementstobe around500 GB. 9.2 Bing Phone Book Service Storage Design The Phone NumberEntityTable mustbe able to serve queriesata low latencytobe able tomeetthe latencySLAs. The bulkof writeswill be intothe Phone NumberDataTable. Note thatonlynumbersfor whichan inference name iscalculatedare stored orinserted intothe Phone NumberEntityTable. 
The phone numberentitytable insertionrate afteraninferencingrunshouldreduce overtime butmaybe detrimental tothe lookupperformance duringthe initialonboardingof new users. 9.3 Bing Phone Book Service Data Storage The current V1 implementation isbasedon SQLAzure. Appropriate numberof Standardor Premium Instanceswill be configuredtosupportthe latencyrequirements. Basedon the limitedcalculationsforthe V1release itisclearthat SQL Azure isnotthe correct longterm storage solutionforthe BingPhone BookService since the SQLAzure Data Size limitis250GB for Standardand 500 forPremium. The BingPhone BookService DataLayer abstractionisimportantfor thisreason. If size andlatencyrequirementsdictate, anAzure File Table Storage plugin will be builtfor V1. The schema andotherstorage relatedare discussed inmore detail inthe BingPhone BookService DesignDetails.
  • 11. Azure Table Store and ObjectStore are underthe scannerfor longtermstorage strategy for the Bing Phone Service. 10 Geo Scale-Out Model As the BING Phone bookservice growstoserve different geographies,the followingare optionscanbe consideredforGeoScale. 10.1 Centralized In thismodel,the inference enginewill runinone region,butthe lookupdataisreplicatedtodifferent regionsforreadscale out and forkeepingthe lookuplatencies low. PhoneNumberDataTable_RegionB Bing Phone Book ServiceBing Phone Book Service Web Role Bing Phone Book Service Worker Role Inferencing Engine Web Role Web RoleWeb Role Web Role Web Role Web Role Web RoleWeb Role Worker RoleWorker Role Region A Region B Region C PhoneNumberDataTable PhoneNumberEntityTable_Global PhoneNumberEntityTable_Global(Replica) PhoneNumberDataTable_RegionB(Replica) PhoneNumberDataTable_RegionC PhoneNumberEntityTable_Global(Replica) PhoneNumberDataTable_RegionB(Replica) The phone data isuploadedtothe Service Stampinthe closestregion. Thisdataneedstobe made available tothe stampthat will hostthe inferencingengine. Anotherwaytothinkaboutthisisto think of thisas twoservices:aFront End service thatperformscalleridlookupandphone datacollectionand a Backendservice(offline) thathoststhe inferencingengine andpreparesthe global phonenumber entitytable. The phone numberentitytableneedstobe replicatedfromRegionA tootherregions,in the picture shownabove.
  • 12. 10.2 Federated Model In thismodel,the InferencingEngine wouldrunineachregion. One optionisto use a Phone Number Prefix toRegionMap(is thisfeasible tobuildinthe mobileworld?) whereeachphone numberis mappedtoa region. A phone numberuploadisforwardedtothe stampbasedonthisPhone Number Map. A furthersimplificationtothiswouldbe tosimplydropthe numbersthatare not ownedbythe region. Itis unlikelythatpeople will store asignificantnumberof non-local numbersintheircontacts. Bing Phone Book ServiceBing Phone Book ServiceBing Phone Book Service Worker Role Inferencing Engine Web Role Web RoleWeb Role Web Role Web RoleWeb Role Web Role Web RoleWeb Role Worker RoleWorker Role Inferencing Engine Inferencing Engine Region A Region B Region C PhoneNumberDataTable_Region A PhoneNumberEntityTable_Region A PhoneNumberDataTable_Region B PhoneNumberEntityTable_Region B PhoneNumberDataTable_Region C PhoneNumberEntityTable_Region C The federatedmodel seemssuitablesince the name inferencingshouldlargelyrunlocallyandmost numbers thatare uploadedfromthe Phone Contactsare likelytobe local. Thiswouldimplythat potentiallylittle datahasto flowacrossdata centersas comparedtothe centralizedmodel. The disadvantage wouldbe thatthe inferencingof nameswouldbe local. Costwise,the federatedmodel seemstobe more desirable. 11 Security 11.1 Overview 11.2 Data Privacy Data privacyis ensured thoughStorage LayerAccessrestrictions. Accesstothe Data Store isrestricted to the Service anda fewrestrictedusers. The publicAPIsthatthe service exposesdonotprovide access to any tenantspecificdata,butonlytothe data thatthe inferencingandspampredictionengine generate. The BingPhone Bookservice willfollow the recommendedsecurityguidelines. For SQL Azure DBs, firewall rulescanbe setuptorestrictaccess to a setof IPAddresses,sothatonlythe BingPhone Bookservice,WebandWorkerroleshave accessto the DB. The V1 implementation
  • 13. supportsfirewalling. Inaddition, SQLAzure onlysupportsSQLserverauthenticationsocare needstobe takenthat the passwordisnot compromised. WithAzure Tables,standardAzure Protectionmechanismsarounduse of primary/secondarykey,key rolloverandkeyregenerationwill be usedtoprotectaccessto the data. Otheroperational procedures will be usedtocontainkeymanagementoperationsandvisibilitytoarestrictedsetof personnel. In addition, configurationwithinaprivate VNETandAzure RBACsecurityinvestigationsare inprogressto hardenthe access to the Data Store. 11.3 Authenticating the Client Application and User (RPS) The Skype Dialerisusingthe RelyingPartySuite of authenticationprotocol. WithRPS, the DialerAppcan getticketsfora givenservice once the userisloggedintoMSA andhis access token hasbeenretrieved. NextSteps: Chose AuthenticationPolicy:MBI_SSL (24 hourrefresh,noforce signinonexpiry,tickettype compact) <AuthPolicy>MBI</AuthPolicy> in rpcserver.xml Live Authconfigurationpropertiesinweb.config Smart ClientProtocol setup 11.4 Authenticating the Client Application and User (OAUTH) The phone bookservice needstoauthenticate the clientapplicationandthe userto ensure that authorizedusersorrogue usersare notinvokingthe service endpoints withpotential maliciousintent. We assume thatthe Skype DialerapplicationrequiresaMicrosoftAccountand that the BingPhone Book Service will use the WindowsLive Service asthe AuthenticationServer. In the OAuthTerminology,the BingPhoneBookclientlibrary wouldbe the ClientandBingPhone Book Service wouldbe the Resource Server,asshowninthe picture below. However,itisthe Skype Dialer Applicationwhichactuallydrivesthe userauthenticationprocesswiththe AuthorizationServer. Investigations are inprogressasto howOAUTH supportsmultiple resourceserversandif we can leverage that.
[Figure: OAuth implicit grant flow between the Bing Phone Book client (Client), Windows Live (Authorization Server) and the Bing Phone Book Service (Resource Server). The app (Client Id, Redirect URI) logs in via the Authorization Server through the user agent, is redirected to the redirect URI with the access token, receives an HTML document with an embedded script, and extracts the access token.]
For desktop and mobile applications, the OAUTH implicit grant type sequence is recommended. For the Bing Phone Book Service, the resources made available through its Public API do not relate to a specific user. In that context, a grant type of Client Credentials for generic application access might suffice. If user authorization is required, then the Implicit Grant request sequence for OAUTH, as described above, may be used. [TBD. Does the grant type of client credentials require the client secret?]
Refer to Support for Implicit Grant OAuth 2.0 for Windows Live/Microsoft Account Service for a description of the implicit grant flow. The details of the OAuth Implicit grant type are discussed in the Security sections under Service Design Details.
11.5 Transport Security
HTTPS is used to provide server authentication, which prevents man-in-the-middle attacks, and to encrypt communication between the client and server, which ensures that there cannot be any eavesdropping on or tampering with contents. OAUTH 2.0, in any case, requires SSL security for the reasons described above.
12 Backup/Disaster Recovery Requirements
TBD
13 Bing Phone Book Client Design Details
13.1 Overview - Periodic Uploads and Downloads
A mobile device (or SIM) is registered with the Bing Phone Book Service (BPBS) when the user first uses the Skype Dialer. Subsequently, the Bing Phone Book Client connects to the Bing Phone Book Service to upload phone book data, to resolve phone numbers to names and to get spammer information.
The Skype Dialer application calls the Bing Phone Book Client periodically to upload the user's Contacts and Blocked List to the Bing Phone Book Service and to download the Top Spammer List. Diffs are maintained on the client so that subsequently only changes in the contact list are uploaded. Currently there is no support for change notifications, which would let the client avoid calculating the diff between the current Contact List and the last uploaded Contact List. Also, the frequency of these uploads and downloads is not configurable independently for the Contact List, Blocked List and Top Spammer List. These uploads and downloads are staggered across users by hashing the user's phone number and deriving a time of day for syncing the phone book data to the Bing Phone Book Service.
13.2 Caching
The Phone Book Service Client caches the last X numbers looked up and the top spammer list on the client. The spammer list in the cache is updated every time the client syncs with the service. [Vinay, what is the caching mechanism? Can you add the details? Why is the top spammer list not cached on the server as well?]
13.3 Name Lookup
On an incoming call, the Phone Book Client checks the local cache to see if the number is on the local blocked list or the spam list and whether the lookup can be resolved locally. If not, it calls the Azure Service if the network mode allows it.
13.4 Blocked Phone Number Handling
When a number is marked as blocked, the Bing Phone Book client calls the Bing Phone Book Service synchronously. If the number hasn't been uploaded yet, will AddToBlockList handle that appropriately?
13.5 Special Scenario Handling
13.5.1 Low Battery Conditions
The Dialer Application will call SetNetworkStatus with the appropriate settings.
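The lookup order described in 13.2, 13.3 and 13.5.1 (local blocked list and spam list first, then the cache of recent lookups, then the service only if the network mode allows it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the client's actual API: `CallerIdClient`, `lookup_remote`, `network_allowed`, the status strings and the default cache size are all hypothetical, and an LRU policy is assumed because the caching mechanism is still an open question above.

```python
from collections import OrderedDict
from typing import Callable, Optional, Set


class CallerIdClient:
    """Hypothetical client-side lookup: local lists and LRU cache first, service second."""

    def __init__(self, lookup_remote: Callable[[str], Optional[str]], cache_size: int = 100):
        self._lookup_remote = lookup_remote  # stands in for the LookupPhoneNumber REST call
        self._cache: "OrderedDict[str, Optional[str]]" = OrderedDict()  # last X lookups
        self._cache_size = cache_size  # "X" in 13.2; 100 is an arbitrary placeholder
        self.blocked: Set[str] = set()  # the user's local blocked list
        self.spammers: Set[str] = set()  # top spammer list, refreshed on each sync
        self.network_allowed = True  # assumed to be driven by SetNetworkStatus

    def lookup(self, number: str) -> str:
        # 1. Resolve locally if the number is blocked or a known spammer.
        if number in self.blocked:
            return "BLOCKED"
        if number in self.spammers:
            return "SPAM"
        # 2. Serve recent lookups from the LRU cache (misses are cached as None too).
        if number in self._cache:
            self._cache.move_to_end(number)
            return self._cache[number] or "UNKNOWN"
        # 3. Only call the service when the current network mode allows it.
        if not self.network_allowed:
            return "UNKNOWN"
        name = self._lookup_remote(number)
        self._cache[number] = name
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return name or "UNKNOWN"
```

Caching misses as well as hits avoids repeated service calls for numbers the service cannot name, at the cost of not picking up a newly inferred name until the entry is evicted.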
13.5.2 Shutdown
The Dialer application calls shutdown. The current CallerIdService instance becomes invalid after this call; a new instance should be requested via GetCallerIdServiceInstance.
13.5.3 Lost Phone Scenario
13.5.4 Dual SIM Scenario
13.6 Client Side Performance Metrics
13.6.1 Metrics
Here is a list of client side performance metrics that the instrumentation will support:
13.6.2 Client Side Metrics Framework
13.7 Other Client Side Telemetry
The Bing Phone Book client will use the same client telemetry framework as the Skype Dialer Client does.
14 Bing Phone Book Service Design Details
14.1 Deployment Topology and Sizing
• 3 instances of Web Role: A1
• 1 instance of Worker Role: A0
• 1 instance of MFC Role: A0 (could this be folded in as optional in the Web Role?)
14.2 REST APIs
API | Type | Request | Response
Register | POST | UserName, UserPhoneNumber, UserAppToken (UserSkypeToken) |
SyncContacts | POST | UserPhoneNumber, Contacts |
SyncBlockedList | POST | UserPhoneNumber, BlockedList |
GetTopSpammers | GET | UserPhoneNumber, N | PhoneNumberEntityList
LookupPhoneNumber | GET | UserPhoneNumber, PhoneNumberToLookup | PhoneNumberEntity
The REST GET APIs, in this case GetTopSpammers, will be designed to leverage HTTP and server side caching. [Are we sending the last modified date or an ETag header? Not critical initially, since the frequency of the call is not that high, but with a high volume of users it might still be useful.]
Comment: Currently, the order of parameters is switched? Is there a reason for that? It would be nice to be consistent.
14.3 Name Inferencing Engine
The name inferencing engine evaluates the multiple rows for a given phone number in the PhoneNumberDataTable, for each phone number listed in the PhoneNumbersUpdatedTable, and generates an inferred name. The current algorithm is fairly simple. It generates an inferred name for both the full name and the first name if more than a threshold number of entries (the current threshold is 5) have the same name set as the full name or the first name.
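A minimal sketch of the threshold rule in 14.3, assuming the engine has already fetched the ContactName values stored for one phone number. The function names are hypothetical; "more than the threshold" is read as strictly greater than 5; names are compared after collapsing whitespace only; and the first name is assumed to be the first token of ContactName, since separate first/last name fields are not currently uploaded.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

INFERENCE_THRESHOLD = 5  # "current threshold is 5" per 14.3


def _winner(names: Iterable[str], threshold: int) -> Optional[str]:
    """Return the most common name if it occurs in more than `threshold` rows."""
    counts = Counter(n for n in names if n)
    if not counts:
        return None
    name, count = counts.most_common(1)[0]
    return name if count > threshold else None


def infer_names(contact_names: Iterable[Optional[str]],
                threshold: int = INFERENCE_THRESHOLD) -> Tuple[Optional[str], Optional[str]]:
    """Infer (full_name, first_name) from the ContactName rows for one phone number."""
    # Assumption: compare names after collapsing whitespace; no other normalization.
    cleaned = [" ".join(n.split()) for n in contact_names if n and n.strip()]
    full = _winner(cleaned, threshold)
    # Assumption: the first name is the first whitespace-separated token of ContactName.
    first = _winner((n.split()[0] for n in cleaned), threshold)
    return full, first
```

Because the two names are voted on independently, a number saved as "John Smith" by three users and "John Doe" by three others gets no full name but does get the first name "John".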
Currently we do not receive or use the first and last name information from the local contact list.
Q: Is the inferencing engine doing bulk reads/updates? It is important that the Bing Phone Book Service serves lookups at low latency even while the Inferencing Engine is updating entries. Potentially, this can cause latency issues.
14.4 Spammer Prediction
The spam count for a number is incremented if it is found on the blocked list of a given user. If the spam count for a number is greater than the threshold (what is this value currently?), it is assumed to be spam. The top spammer list is computed by the Bing Phone Book Service. Is this done periodically? Does it cache this on the server side?
14.5 SQL Azure Storage Data Layer
Overview
Do we have an estimate of what percentage of these numbers would be spam?
14.5.1 Options
The following options are plausible and were evaluated:
SQL Azure
SQL Azure is really not a desirable option for over 500 GB of data, both from a performance and from a COGS perspective.
Azure Tables
Azure Tables essentially provide a simple table indexed by partition key and row key. A single batch request may contain only 100 entities (and may not exceed 4 MB in size). Azure Tables support a query filter option but not an ordering operation. For the lookup table, the partition key could be a HASH of the first digits of the phone number and the row key the phone number. For the data table as well, the partition key could be the HASH of the phone number and the row key the row id. To calculate top spammers, a worker role would need to read the spammers into memory. The reads would need to be done as batched reads. The worker role would eventually write this top spammers list into a separate table. The top spammer list can also be cached on the server side.
Object Store
TBD.
Three tables in SQL Azure are used by the Bing Phone Book Service:
• PhoneNumberEntityTable: Lookups are served from this table.
• PhoneNumberDataTable: All sync data (contacts, suggestions) is inserted into this table.
• PhoneNumberUpdated: Used by the Inferencing Engine to detect whether a phone number entry in the data table has updates or not.
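The spam counting in 14.4 and the worker-role computation of top spammers in 14.5.1 can be sketched together. This is an illustration only: the function names are hypothetical, `SPAM_THRESHOLD = 10` is a placeholder because the real threshold is still an open question, and `read_in_batches` merely stands in for whatever batched reads the storage layer provides (its page size is an assumption, not the 100-entity batch limit quoted above).

```python
import heapq
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Set, Tuple

SPAM_THRESHOLD = 10  # hypothetical value; 14.4 leaves the real threshold open


def compute_spam_counts(blocked_lists: Iterable[Set[str]]) -> Counter:
    """Each user's blocked list adds 1 to the spam count of every number on it."""
    counts: Counter = Counter()
    for blocked in blocked_lists:
        counts.update(blocked)  # a set, so one user counts at most once per number
    return counts


def is_spam(spam_count: int, threshold: int = SPAM_THRESHOLD) -> bool:
    return spam_count > threshold


def read_in_batches(counts: Dict[str, int], batch_size: int) -> Iterator[List[Tuple[str, int]]]:
    """Stands in for batched reads from storage (page size is an assumption)."""
    items = list(counts.items())
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def top_spammers(batches: Iterable[List[Tuple[str, int]]], n: int,
                 threshold: int = SPAM_THRESHOLD) -> List[Tuple[str, int]]:
    """Stream through the batches keeping only an n-sized min-heap of spammers."""
    if n <= 0:
        return []
    heap: List[Tuple[int, str]] = []
    for batch in batches:
        for number, count in batch:
            if not is_spam(count, threshold):
                continue
            if len(heap) < n:
                heapq.heappush(heap, (count, number))
            elif count > heap[0][0]:
                heapq.heapreplace(heap, (count, number))
    return [(number, count) for count, number in sorted(heap, reverse=True)]
```

Keeping only an n-sized heap bounds the worker role's memory by the length of the top spammer list rather than by the total number of spammers, which matters when the reads are paged.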
14.5.2 Phone Number Entity Table Schema
Salient Details
• A filtered nonclustered index on the 'IsSpam' column is used to improve performance when ordering the spammers by spam count.
• A separate table called PhoneNumberUpdated is used by the inferencing engine to limit re-inferencing operations to the impacted numbers only.
Column Name | Column Type | Is Nullable | Comments
PhoneNumber | Nvarchar(20) | Not null, Primary Key |
RegisteredName | Nvarchar(max) | Null | Registered name of the phone number if registered, otherwise null
InferredName | Nvarchar(max) | Null | Inferred name of the phone number, if any, from the inference model
IsSpam | Bit | Not null | Whether the phone number is spam, as detected by spam inference (is this a sparse column?)
SpamCount | Bigint | Not null | Spam count for the phone number
Locality | Nvarchar(50) | Null | Locality of the phone number (mainly inserted for Bing Local Entities)
City | Nvarchar(30) | Null | City of the phone number (mainly inserted for Bing Local Entities)
UserAppToken | Nvarchar(50) | Null | A unique token given by the app for the corresponding phone number - WHAT IS THIS FOR?
LastUpdatedTime | Datetime | Not null | The time when this row was last updated
Indexes on PhoneNumberEntityTable:
Column | Index Type | Comments
PrimaryPhone | Clustered | As it's a primary key
UserAppToken | Non-clustered | To look up a given user app token and get the phone number and other info
IsSpam | Non-clustered, filtered (where IsSpam = 1) | To easily get top spammers
14.5.3 PhoneNumberDataTable Schema
Column Name | Column Type | Is Nullable | Comments
RowId | Bigint | Not null, Primary Key | Required for having at least one primary key
PhoneNumber | Nvarchar(20) | Not null |
ContactName | Nvarchar(max) | Null |
TypeOfPhoneNumber | Int | Not null | The type of phone number, defined as: Unknown = 0, Mobile = 1, Home = 2, Work = 3, Company = 4, HomeFax = 5, WorkFax = 6
Source | Nvarchar(max) | Null | Source of the phone number, such as Facebook, Outlook/Spam
LastUpdatedTime | DateTime | Not null | When this row was last updated
Indexes on PhoneNumberData:
Column | Index Type | Comments
RowId | Non-clustered | Primary key
PhoneNumber | Clustered | For fast aggregate operations on PhoneNumber used in inferencing
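On the client or in the worker role, the TypeOfPhoneNumber integer column could be mirrored by an enum so that raw values do not leak through the code. The sketch below uses the value mapping from the schema above; the class mirrors the column name, while the `from_column` helper and its fallback to Unknown for out-of-range values are assumptions, not part of the documented schema.

```python
from enum import IntEnum


class TypeOfPhoneNumber(IntEnum):
    """Values of the TypeOfPhoneNumber column in PhoneNumberDataTable (14.5.3)."""

    Unknown = 0
    Mobile = 1
    Home = 2
    Work = 3
    Company = 4
    HomeFax = 5
    WorkFax = 6

    @classmethod
    def from_column(cls, value: int) -> "TypeOfPhoneNumber":
        # Assumption: values outside the documented range are treated as Unknown.
        try:
            return cls(value)
        except ValueError:
            return cls.Unknown
```

Because `IntEnum` members compare equal to plain integers, the enum can be written back to the Int column without conversion.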
14.5.4 PhoneNumberUpdated Table Schema
Column Name | Column Type | Is Nullable | Comments
PhoneNumber | Nvarchar(20) | Not null, Primary Key |
HasUpdates | Bit | Not null | Set to 1 if there are updates in the other table; reset to 0 by the inferencing engine once done
Indexes:
Column | Index Type | Comments
PhoneNumber | Clustered | As it is a primary key
14.5.5 SQL Maintenance
Do we force-run update statistics? When? How do we manage the index fragmentation that will be caused by repeated updates? In our case, how bad will the index fragmentation be?
14.6 Security
14.6.1 Implicit Grant Type Sequence
Before a client application can request access to resources of a resource owner, the client application must register with the authorization server associated with the resource server. At the time of registration, the client application is assigned a client id and a client secret by the authorization server. The client id and the secret are unique to the client application on that authorization server. It is important that the client identity and the secret generated by the provider are not shared with anyone. In our case, since the app is deployed to devices and is a Java application, it is preferable not to use the secret. For this reason, the OAUTH Implicit Grant Type is recommended for mobile or desktop applications.
During registration, the client also registers a redirect URI. This redirect URI is used when a resource owner grants authorization to the client application. When a resource owner has successfully authorized the client application via the authorization server, the resource owner is redirected back to the client application, to the redirect URI. Note that the authorization service will only redirect users to a registered URI, which helps prevent some attacks. Any HTTP redirect URIs must be protected with TLS security, so the service will only redirect to URIs beginning with "https". This prevents tokens from being intercepted during the authorization process.
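To illustrate the two redirect rules just described (HTTPS only, registered URIs only), here is a minimal sketch of the check an authorization server might apply. The function name is hypothetical and exact string matching against the registered URIs is an assumption; the actual Windows Live matching rules are not described in this document. The fragment check follows the OAuth 2.0 specification (RFC 6749, section 3.1.2), which does not allow a fragment in the redirection endpoint URI.

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_allowed_redirect(redirect_uri: str, registered_uris: Iterable[str]) -> bool:
    """Hypothetical check: HTTPS scheme, no fragment, and an exactly registered URI."""
    parts = urlsplit(redirect_uri)
    if parts.scheme.lower() != "https":
        return False  # a plain HTTP redirect could expose the token in transit
    if parts.fragment:
        return False  # RFC 6749 3.1.2: the redirection endpoint must not have a fragment
    return redirect_uri in set(registered_uris)  # assumption: exact string match
```

Exact matching is the strictest option; any looser rule (prefix or host-only matching) widens the set of URIs an attacker could try to register or abuse.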
An implicit authorization grant is similar to an authorization code grant, except that the access token is returned to the client application directly once the user has finished the authorization, without an intermediate authorization code. The access token is thus returned when the user agent is redirected to the redirect URI.
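In the implicit grant, the OAuth 2.0 specification (RFC 6749, section 4.2.2) returns the access token and related parameters (`access_token`, `token_type`, `expires_in`, `state`) form-encoded in the fragment of the redirect URI, and error responses (`error`) come back the same way. The sketch below, whose function name is hypothetical, shows how a client could extract these parameters and check the `state` value it originally sent; how the Skype Dialer actually receives the redirect is not specified in this document.

```python
from typing import Dict
from urllib.parse import parse_qs, urlsplit


def extract_access_token(redirect_uri: str, expected_state: str) -> Dict[str, str]:
    """Pull the implicit-grant response parameters out of the redirect URI fragment."""
    fragment = urlsplit(redirect_uri).fragment
    params = {key: values[0] for key, values in parse_qs(fragment).items()}
    if "error" in params:
        raise PermissionError(f"authorization failed: {params['error']}")
    if params.get("state") != expected_state:
        raise ValueError("state mismatch; possible cross-site request forgery")
    if "access_token" not in params:
        raise ValueError("no access_token in redirect URI fragment")
    return params
```

Because the parameters travel in the fragment, browsers do not send them to the server hosting the redirect URI; only the client-side script or native app sees them.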
Returning the token this way means that the access token is accessible in the user agent, or in the native application participating in the implicit authorization grant; it is not stored securely on a web server. Furthermore, the client application need only send its client ID to the authorization server. If the client were to send its client secret too, the client secret would have to be stored in the user agent or native application as well, which would make it vulnerable to hacking.
14.6.2 Using Skype Dialer Authentication – Multiple Resource Server Support
15 Server Side Performance Metrics
16 Monitoring
The Azure Diagnostic Framework and Azure Monitoring Framework will be used to instrument, persist and monitor the service.