2025-03-03-Philly-AAAI-GoodData-Build Secure RAG Apps With Open LLM
https://aaai.org/conference/aaai/aaai-25/workshop-list/#ws14
2. Tim Spann
paasdev.bsky.social
@PaasDev // Blog: datainmotion.dev
Senior Solutions Engineer, Snowflake
NY/NJ/Philly - Cloud Data + AI Meetups
ex-Zilliz, ex-Pivotal, ex-Cloudera, ex-HPE,
ex-StreamNative, ex-EY, ex-Hortonworks.
https://medium.com/@tspann
https://github.com/tspannhw
3. This week in Apache NiFi, Apache Polaris,
Apache Flink, Apache Kafka, ML, AI,
Streamlit, Jupyter, Apache Iceberg, Python,
Java, LLM, GenAI, Snowflake, Unstructured
Data and Open Source friends.
https://bit.ly/32dAJft
AI + Streaming Weekly by Tim Spann
7. Unstructured Data
● Lots of formats
● Text, Documents, PDF
● Images, Videos, Audio
● Email
● Variants
Unstructured
8. ● Open Data like OpenAQ - Air Quality Data
● Location, Time, Sensors
● Apache Avro, Parquet, ORC
● JSON and XML
● Hierarchical Data
● Logs
● Key-Value
Semi-Structured Data
https://docs.snowflake.com/en/sql-reference/data-types-semistructured
Semi-structured
14. • Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull
models
• Hundreds of processors
• Visual command and
control
• Hundreds of sources
• Flow templates
• Pluggable/multi-role
security
• Designed for extension
• Clustering
• Version Control
Apache NiFi for Data Ingest, Movement and Routing
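To make the buffering and backpressure bullets concrete, here is a minimal sketch in plain Python. It is not NiFi code: NiFi stops scheduling the upstream processor once a connection passes its configured threshold, while this stand-in simply blocks the producer on a bounded queue. The function name `run_flow` and the `max_queued` parameter are made up for illustration.

```python
import queue
import threading
import time


def run_flow(items, max_queued=3):
    """Push items through a bounded connection and return them in arrival order."""
    # The maxsize plays the role of a backpressure threshold on a connection.
    connection = queue.Queue(maxsize=max_queued)
    delivered = []

    def consumer():
        while True:
            item = connection.get()
            if item is None:  # sentinel: upstream has finished
                break
            time.sleep(0.001)  # simulate a slow downstream step
            delivered.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()
    for item in items:
        # put() blocks while the queue is full, so the producer slows down
        # instead of dropping data or growing memory without bound.
        connection.put(item)
    connection.put(None)
    worker.join()
    return delivered
```

Nothing is lost and order is preserved, even though the consumer is slower than the producer; the producer just waits.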
15. • Moving Binary, Unstructured, Image
and Tabular Data
• Enrichment
• Universal Visual Processor
• Simple Event Processor
• Routing
• Feeding data to Central Messaging
• Support for modern protocols
• Kafka Protocol Source/Sink
• Pulsar Protocol Source/Sink
The Power of Apache NiFi
16. APACHE NIFI 2.0 FEATURES
Major Updates:
● Python Integration
● Parameterization
● JDK 21+
● Provenance / Data Lineage
● Rules Engine for Development Assistance
● Additional Azure Processors
● Integration with Zendesk, Slack,
● Database Tables as Schemas
● Amazon Glue Schema Registry
● OpenTelemetry Support
Real-Time Integration and AI
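As an illustration of the Python integration bullet, below is a minimal sketch of a NiFi 2.x Python processor built on the `FlowFileTransform` base class. The processor name `CountJsonFields` and the attribute key `json.field.count` are invented for this example, and the fallback classes in the `except` branch are hypothetical stand-ins so the file can also be exercised outside NiFi.

```python
import json

try:
    # Available when the file is loaded by NiFi 2.x's Python extension runtime.
    from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult
except ImportError:
    # Hypothetical stand-ins, only so the sketch can run outside NiFi.
    class FlowFileTransform:
        pass

    class FlowFileTransformResult:
        def __init__(self, relationship, attributes=None, contents=None):
            self.relationship = relationship
            self.attributes = attributes
            self.contents = contents


class CountJsonFields(FlowFileTransform):
    class Java:
        implements = ['org.apache.nifi.python.processor.FlowFileTransform']

    class ProcessorDetails:
        version = '0.0.1-SNAPSHOT'
        description = 'Adds the number of top-level JSON fields as an attribute.'

    def __init__(self, **kwargs):
        pass

    def transform(self, context, flowfile):
        # Parse the FlowFile content; route unparseable input to "failure".
        try:
            record = json.loads(flowfile.getContentsAsBytes().decode('utf-8'))
        except (UnicodeDecodeError, json.JSONDecodeError):
            return FlowFileTransformResult(relationship='failure')
        count = len(record) if isinstance(record, dict) else 0
        # Attribute values are strings in NiFi.
        return FlowFileTransformResult(
            relationship='success',
            attributes={'json.field.count': str(count)},
        )
```

Inside NiFi, a file like this would go in the Python extensions directory; the exact location depends on the installation's configuration.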
20. RECORD-ORIENTED DATA WITH NIFI
• Record Readers - Avro, CSV, Grok, IPFIX, JASN1, JSON, Parquet,
Scripted, Syslog5424, Syslog, WindowsEvent, XML
• Record Writers - Avro, CSV, FreeFormText, JSON, Parquet, Scripted, XML
• Record Readers and Writers can reference a schema registry to
retrieve schemas when necessary.
• Lets processors accept any data format without having to worry about
the parsing and serialization logic.
• Lets us keep FlowFiles larger, each consisting of multiple records, which
spreads per-FlowFile overhead across many records and performs far better.
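The reader/writer split can be sketched in plain Python. This is only an analogy to NiFi's Record Readers and Writers, not their API: the function names `read_csv`, `write_json` and `convert` are invented here. The point is that the conversion step never touches parsing or serialization, and one payload carries many records.

```python
import csv
import io
import json


def read_csv(text):
    """Stand-in 'record reader': parse a CSV payload into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def write_json(records):
    """Stand-in 'record writer': serialize all records as one JSON array."""
    return json.dumps(records)


def convert(payload, reader, writer, transform=lambda record: record):
    """Apply transform to every record; format handling stays in reader/writer."""
    return writer([transform(record) for record in reader(payload)])
```

Swapping `read_csv` or `write_json` for another format leaves `convert` and the per-record logic unchanged.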
22. CaptionImage
● Python 3.10+
● Hugging Face
● Salesforce/blip-image-captioning-large
● Generate Captions for Images
● Adds captions to FlowFile Attributes
● Does not require downloading or copying
your images
https://github.com/tspannhw/FLaNK-python-processors
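A rough sketch of the captioning step, using the Hugging Face `transformers` BLIP classes with the model named on the slide. This is not the code from the linked repository; the helper names and the `image.caption` attribute key are assumptions made for illustration. It needs `transformers`, `torch` and `Pillow` installed, and the model weights are fetched on first use.

```python
def caption_image(image_path, model_name="Salesforce/blip-image-captioning-large"):
    """Return a generated caption for a local image file."""
    # Imported lazily because these are large, optional dependencies.
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_name)
    model = BlipForConditionalGeneration.from_pretrained(model_name)
    image = Image.open(image_path).convert("RGB")
    inputs = processor(image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=50)
    return processor.decode(output_ids[0], skip_special_tokens=True)


def caption_attributes(caption, existing=None):
    """Merge a caption into a dict of string attributes (hypothetical key name)."""
    attributes = dict(existing or {})
    attributes["image.caption"] = caption.strip()
    return attributes
```

Usage would look like `caption_attributes(caption_image("photo.jpg"), {"filename": "photo.jpg"})`, where `photo.jpg` is a placeholder path.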
23. RESNetImageClassification
● Python 3.10+
● Hugging Face
● Transformers
● PyTorch
● Datasets
● microsoft/resnet-50
● Adds classification label to FlowFile
Attributes
● Does not require downloading or copying
your images
https://github.com/tspannhw/FLaNK-python-processors
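A rough sketch of the classification step with the `microsoft/resnet-50` checkpoint named on the slide. It is not the code from the linked repository; the helper names `classify_image` and `top_label` are assumptions for illustration, and it needs `transformers`, `torch` and `Pillow` installed.

```python
def top_label(scores, id2label):
    """Return the label whose score is highest."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return id2label[best]


def classify_image(image_path, model_name="microsoft/resnet-50"):
    """Return the top predicted label for a local image file."""
    # Imported lazily because these are large, optional dependencies.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, ResNetForImageClassification

    processor = AutoImageProcessor.from_pretrained(model_name)
    model = ResNetForImageClassification.from_pretrained(model_name)
    image = Image.open(image_path).convert("RGB")
    inputs = processor(image, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits
    return top_label(logits[0].tolist(), model.config.id2label)
```

The returned label string is what a processor would then write to a FlowFile attribute.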
24. Address To Lat/Long
● Python 3.10+
● geopy Library
● Nominatim
● OpenStreetMap (OSM)
● openstreetmap.org/copyright
● Returns as attributes and JSON file
● Works with partial addresses
● Categorizes location
● Bounding Box
https://github.com/tspannhw/FLaNKAI-Boston
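A rough sketch of the address lookup with `geopy` and Nominatim. It is not the code from the linked repository: the helper names, the attribute keys and the `user_agent` value are assumptions for illustration. Nominatim expects a user agent that identifies your application, and the fields in the raw response can vary, so the helper only copies what it finds.

```python
def location_attributes(latitude, longitude, raw):
    """Flatten a geocoding result into string attributes (hypothetical keys)."""
    attributes = {"latitude": str(latitude), "longitude": str(longitude)}
    if "boundingbox" in raw:
        attributes["boundingbox"] = ",".join(str(v) for v in raw["boundingbox"])
    # Depending on the response format the category field is "category" or "class".
    category = raw.get("category") or raw.get("class")
    if category:
        attributes["category"] = str(category)
    if raw.get("type"):
        attributes["type"] = str(raw["type"])
    return attributes


def geocode_address(address, user_agent="example-geocoder"):
    """Look up a full or partial address; return {} when nothing matches."""
    # Imported lazily so the pure helper above works without geopy installed.
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    location = geolocator.geocode(address)
    if location is None:
        return {}
    return location_attributes(location.latitude, location.longitude, location.raw)
```

The public Nominatim service is rate limited, so a flow that geocodes many addresses should throttle or cache its requests.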
27. Open Source Edition
● Apache NiFi in
Docker
● Runs in Docker
● Try new features
quickly
● Develop applications
locally
● Docker NiFi
○ docker run --name nifi -p 8443:8443 -d -e SINGLE_USER_CREDENTIALS_USERNAME=admin -e SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUghvvgEvjnaLjFEB apache/nifi:latest
● Licensed under the Apache License 2.0
● Unsupported
https://hub.docker.com/r/apache/nifi
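Rather than reusing a password copied from a slide, the same `docker run` command can be assembled with a freshly generated one. The sketch below is only a convenience wrapper: the function name `nifi_docker_command` is invented here, while the environment variable names and image tag are the ones shown in the command above. NiFi's single-user login requires a password of at least 12 characters.

```python
import secrets
import shlex


def nifi_docker_command(username="admin", password=None, image="apache/nifi:latest"):
    """Build the docker run argument list for a single-user NiFi container."""
    if password is None:
        password = secrets.token_urlsafe(24)  # 32 URL-safe characters
    if len(password) < 12:
        raise ValueError("NiFi single-user passwords must be at least 12 characters")
    return [
        "docker", "run", "--name", "nifi", "-p", "8443:8443", "-d",
        "-e", f"SINGLE_USER_CREDENTIALS_USERNAME={username}",
        "-e", f"SINGLE_USER_CREDENTIALS_PASSWORD={password}",
        image,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command; run it yourself or pass the list to subprocess.run.
    print(shlex.join(nifi_docker_command()))
```

Returning a list instead of a string keeps the arguments safe to hand to `subprocess.run` without shell quoting issues.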
28. Free Data and AI Event
● King of Prussia
● Princeton
● New York
● Virtual
https://www.snowflake.com/events/data-for-breakfast/