Machine Learning in PHP
Poland, Warsaw, October 2016
"Learn, someday this pain will be useful to you"
Agenda
• How to teach tricks to your PHP
• Application : searching for code in comments
• Complex learning
Speaker
• Damien Seguy
• Exakat CTO
• Static analysis of PHP code
Machine Learning
• Teaching the machine
• Supervised learning : learning then applying
• The application builds its own model : training phase
• It applies its model to real cases : applying phase
Applications
• Play go, chess, tic-tac-toe and beat everyone else
• Fraud detection and risk analysis
• Automated translation or automated transcription
• OCR and face recognition
• Medical diagnostics
• Robots that walk, welcome guests at hotels, play football
• Finding good PHP code
PHP Applications
• Recommendation systems
• Predicting user behavior
• SPAM
• Converting users into customers
• ETA
• Detect code in comments
Real Use Case
• Identify code in comments
• Classic problem
• Good problem for machine learning
• Complex, no simple solution
• A lot of data and expertise are available
Supervised Training
History data → Training → Model
Real data → Model → Results
The FANN Extension
• ext/fann (https://pecl.php.net/package/fann)
• Fast Artificial Neural Network
• http://leenissen.dk/fann/wp/
• Neural networks in PHP
• Works on PHP 7, thanks to the hard work of Jakub Zelenka
• https://github.com/bukka/php-fann
Neural Networks
• Imitation of nature
• Input layer
• Output layer
• Intermediate layers
<?php 
// Total number of layers, counting the input, hidden and output layers
$num_layers         = 3; 
$num_input          = 5; 
$num_neurons_hidden = 3; 
$num_output         = 1; 
$ann = fann_create_standard($num_layers, $num_input, 
                            $num_neurons_hidden, $num_output); 
// Activation function
fann_set_activation_function_hidden($ann, 
                                  FANN_SIGMOID_SYMMETRIC); 
fann_set_activation_function_output($ann, 
                                   FANN_SIGMOID_SYMMETRIC); 
Initialisation
Preparing Data
Raw data → Extract → Filter → Human review → FANN ready
• Extract data from raw source
• Remove any useless data from extract
• Apply some human review to filtered data
• Format data for FANN
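The extract and filter steps could look like the sketch below. It assumes the raw source is PHP code and that comments are collected with the tokenizer. `extractComments()`, `filterComments()` and the minimum length of 5 are hypothetical, not taken from the talk.

```php
<?php
// Hypothetical helper: collect every comment found in a PHP source string.
function extractComments(string $source): array
{
    $comments = [];
    foreach (token_get_all($source) as $token) {
        // Tokens are either single characters or [id, text, line] arrays.
        if (is_array($token) && in_array($token[0], [T_COMMENT, T_DOC_COMMENT], true)) {
            $comments[] = trim($token[1]);
        }
    }
    return $comments;
}

// Hypothetical filter: drop comments too short to carry any signal.
function filterComments(array $comments, int $minLength = 5): array
{
    return array_values(array_filter(
        $comments,
        function (string $comment) use ($minLength) {
            return strlen($comment) >= $minLength;
        }
    ));
}

$source   = "<?php\n// \$a = 1;\n/* note */\n\$b = 2; //\n";
$comments = filterComments(extractComments($source));
```

The human review step stays manual: an expert marks each remaining comment as code (1) or not code (0).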
Expert At Work
// Test if the if is in a compressed format
// nie mowie po polsku
// There is a parser specified in `Parser::$KEYWORD_PARSERS`
// $result should exist, regardless of $_message
// TODO : fix this; var_dump($var);
// $a && $b and multidimensional
// numGlyphs + 1
//$annots .= ' /StructParent ';
// $cfg['Servers'][$i]['controlpass'] = 'pmapass';
// if(ob_get_clean()){
Input Vector
• 'length' : size of the comment
• 'countDollar' : number of $
• 'countEqual' : number of =
• 'countObjectOperator' : number of -> operator ($o->p)
• 'countSemicolon' : number of semicolons ;
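A minimal sketch of how the five characteristics above could be computed. The talk calls a `makeVector()` helper later on but does not show its body, so this is a hypothetical reconstruction, with the values in the order of the list.

```php
<?php
// Hypothetical reconstruction of makeVector(): the original helper is not shown.
function makeVector(string $comment): array
{
    return [
        strlen($comment),              // 'length'
        substr_count($comment, '$'),   // 'countDollar'
        substr_count($comment, '='),   // 'countEqual'
        substr_count($comment, '->'),  // 'countObjectOperator'
        substr_count($comment, ';'),   // 'countSemicolon'
    ];
}

$vector = makeVector('//$gvars = $this->getGraphicVars();');
```

Note that counting `=` this way also counts the `=` inside `==` and `=>`. Whether the original did the same is unknown.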
Input Data
47 5 1
825 0 0 0 1
0
37 2 0 0 0
0
55 2 2 0 1
1
61 2 1 3 1
1
...
First line : number of input (47 cases), number of incoming data (5), number of outgoing data (1)
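A sketch of how reviewed vectors could be written in this layout: one header line, then one line of incoming data and one line of outgoing data per case. `toFannData()` is a hypothetical helper, and the file name in the comment is the one used in the training step.

```php
<?php
// Hypothetical helper: serialise labelled cases into the FANN training layout.
// Each case is [incoming data, outgoing data].
function toFannData(array $cases): string
{
    if (count($cases) === 0) {
        throw new InvalidArgumentException('At least one training case is required.');
    }
    // Header: number of cases, size of incoming data, size of outgoing data.
    $lines = [sprintf('%d %d %d', count($cases), count($cases[0][0]), count($cases[0][1]))];
    foreach ($cases as $case) {
        $lines[] = implode(' ', $case[0]); // incoming data
        $lines[] = implode(' ', $case[1]); // outgoing data
    }
    return implode("\n", $lines) . "\n";
}

$data = toFannData([
    [[37, 2, 0, 0, 0], [0]],
    [[55, 2, 2, 0, 1], [1]],
]);
// file_put_contents('incoming.data', $data);
```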
 * (at your option) any later v
 * 
 * Exakat is distributed in the
 * but WITHOUT ANY WARRANTY; wi
 * MERCHANTABILITY or FITNESS F
 * GNU Affero General Public Li
 * 
 * You should have received a c
 * along with Exakat.  If not, 
 * 
 * The latest code can be found
 * 
*/ 
// $x[3] or $x[] and multidimen
//if ($round == 3) { die('Round
//$this->errors[] = $this->lang
Black Magic
151
372000
0
// $x[3] or $x[] and multidimensional
ext/fann
It's a comment
Training
<?php
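$max_epochs             = 500000; 
// Assumed value: the slide does not define $epochs_between_reports
$epochs_between_reports = 1000; 
$desired_error          = 0.001; 
// the actual training, on the network created at initialisation
if (fann_train_on_file($ann, 
                       'incoming.data', 
                       $max_epochs, 
                       $epochs_between_reports, 
                       $desired_error)) {
        fann_save($ann, 'model.out'); 
}
fann_destroy($ann); 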
?>
Training
• 47 cases
• 5 characteristics
• 3 hidden neurons
• + 5 input + 1 output
• Duration : 5.711 s
Application
<?php  
$ann = fann_create_from_file('model.out');  
$comment = '//$gvars = $this->getGraphicVars();'; 
// makeVector() turns the comment into the 5 input characteristics
$input   = makeVector($comment); 
$results = fann_run($ann, $input);  
if ($results[0] > 0.8) { 
     print "\"$comment\" -> $results[0] \n";  
}  
?>
Results > 0.8
• Answer between 0 and 1
• Values range from -14 to 0.999
• The closer to 1, the safer. The closer to 0, the safer.
• Is this a percentage? Is this a carrot count?
• It's a mix of counts…
Scores Distribution
[Chart: distribution of scores; axis ticks from -16 to 0 and from 60.000000 to 100.000000]
Real Cases
• Tested on 14093 comments
• Duration 68.01ms
• Found 1960 issues (14%)
0.99999893
// $cfg['Servers'][$i]['controlhost'] = '';    
0.99999928
//$_SESSION['Import_message'] = $message->getDisplay();    
0.99999928
/*
if (defined('SESSIONUPLOAD')) { 
    // write sessionupload back into the loaded PMA session 
    $sessionupload = unserialize(SESSIONUPLOAD); 
    foreach ($sessionupload as $key => $value) { 
        $_SESSION[$key] = $value; 
    } 
    // remove session upload data that are not set anymore 
    foreach ($_SESSION as $key => $value) { 
        if (mb_substr($key, 0, mb_strlen(UPLOAD_PREFIX)) 
            == UPLOAD_PREFIX 
            && ! isset($sessionupload[$key]) 
0.98780382
//LEAD_OFFSET = (0xD800 - (0x10000 >> 10)) = 55232    
0.99361396
// We have server(s) => apply default configuration
    
0.98383027
// Duration = as configured    
0.99999928
// original -> translation mapping    
0.97590065
// = (   59 x 84   ) mm  = (  2.32 x 3.31  ) in 
TRUE POSITIVE FALSE POSITIVE
TRUE NEGATIVE FALSE NEGATIVE
FOUND BY FANN (MACHINE LEARNING)
TARGET (EXPERT WORK)
0.99999923
0.73295981
0.99999851
0.2104115
// $cfg['Servers'][$i]['table_coords'] = 'pma__
//(isset($attribs['height'])?$attribs['height']
// if ($key != null) did not work for index "0"
// the PASSWORD() function  
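To put numbers on the four quadrants, the scores can be compared with the expert's verdicts. A sketch, assuming the 0.8 threshold used at application time. `confusionMatrix()` is a hypothetical helper, and the scores and labels below are illustrative, not data from the talk.

```php
<?php
// Hypothetical helper: compare scores with expert verdicts.
// Each case is [score, isCode], where isCode is the expert's label.
function confusionMatrix(array $cases, float $threshold = 0.8): array
{
    $counts = ['TP' => 0, 'FP' => 0, 'TN' => 0, 'FN' => 0];
    foreach ($cases as $case) {
        $found  = $case[0] > $threshold; // found by FANN
        $isCode = $case[1];              // target (expert work)
        if ($found) {
            $counts[$isCode ? 'TP' : 'FP']++;
        } else {
            $counts[$isCode ? 'FN' : 'TN']++;
        }
    }
    return $counts;
}

// Illustrative data only.
$counts = confusionMatrix([
    [0.99, true],   // code, found        : true positive
    [0.70, true],   // code, missed       : false negative
    [0.95, false],  // not code, found    : false positive
    [0.20, false],  // not code, rejected : true negative
]);

// Share of reported issues that really are code.
$precision = $counts['TP'] / ($counts['TP'] + $counts['FP']);
```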
Results
• 1960 issues
• More than 50% false positives
• With an easy clean, 822 issues reported
• 14k comments, analyzed in 68 ms (367 ms in PHP 5)
• Total time of coding : 27 mins.
// = (   59 x 84   ) mm  = (  2.32 x 3.31  ) in    
/* vim: set expandtab sw=4 ts=4 sts=4: */
Learn Better, Not Harder
• Better training data
• Improve characteristics
• Configure the neural network
• Change algorithm
• Automate learning
• Update constantly
History data → Training → Model
Real data → Model → Results
Feedback
Better Training Data
• More data, more data, more data
• Varied situations, real case situations
• Include specific cases
• Experience is capital
• https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
Improve Characteristics
• Add new characteristics
• Remove the ones that are less interesting
• Find the right set of characteristics
Network Configuration
• Input vector
• Intermediate neurons
• Activation function
• Output vector
[Chart: time of training (ms), from 0 to 20000, plotted from 1 to 10 for networks with 1, 2, 3 and 4 layers]
Change Algorithm
• First add more data before changing algorithm
• Try cascade2 algorithm from FANN
• 0.6 => 0 found
• 0.5 => 2 found
• Not found by the first algorithm
• Ant colony, genetic algorithm, gravitational search,
artificial immune, nie mowie po polsku, annealing,
harmony search, interior point search, tabu search
Finding The Best
• Test with 2-4 layers
• 10 neurons
• Measure results
[Chart: measurements from 0 to 9000, plotted from 1 to 13 for networks with 1, 2, 3 and 4 layers]
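One way to organise such a search is to list the layouts first, then train and time each one. A sketch under assumptions: `candidateTopologies()` is a hypothetical helper, every hidden layer gets the same number of neurons, and the training parameters (including 1000 epochs between reports) are placeholders. The training loop only runs where ext/fann and a training file are available.

```php
<?php
// Hypothetical helper: list the layer layouts to try, as arrays of layer sizes.
function candidateTopologies(int $inputs, int $outputs, array $hiddenLayerCounts, array $neuronCounts): array
{
    $topologies = [];
    foreach ($hiddenLayerCounts as $hiddenLayers) {
        foreach ($neuronCounts as $neurons) {
            // Assumption: same size for every hidden layer, to keep the search small.
            $topologies[] = array_merge(
                [$inputs],
                array_fill(0, $hiddenLayers, $neurons),
                [$outputs]
            );
        }
    }
    return $topologies;
}

$topologies = candidateTopologies(5, 1, [1, 2], [3, 10]);

// Skipped unless ext/fann is loaded and the training file exists.
if (extension_loaded('fann') && is_file('incoming.data')) {
    foreach ($topologies as $layers) {
        $ann   = fann_create_standard_array(count($layers), $layers);
        $start = microtime(true);
        fann_train_on_file($ann, 'incoming.data', 500000, 1000, 0.001);
        printf("%s : %.1f ms\n", implode('-', $layers), (microtime(true) - $start) * 1000);
        fann_destroy($ann);
    }
}
```

Training time is only one measure. The quality of the results on reviewed comments matters at least as much.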
Deep Learning
• Chaining the neural networks
• Translators, scorers, auto-encoders
• Unsupervised Learning
Other Tools
• PHP ext/fann
• R language
• https://github.com/kachkaev/php-r
• Scikit-learn
• https://github.com/scikit-learn/scikit-learn
• Mahout
• https://mahout.apache.org/
Conclusion
• Machine learning is about data, not code
• There are tools to use it with PHP
• Fast to try, easy results or fast fail
• Use it for complex problems that tolerate errors
http://www.exakat.io/
@exakat
http://www.slideshare.net/dseguy/
PHP 7.1 preparation workshop
Dzięki Czemu
