Introduction
SoftBank Corp. (“SoftBank”) has integrated Cisco Foundation AI’s Foundation-sec-1.1-8B-Instruct model into its Security Operations Center (SOC) triaging workflow, enabling full automation of suspicious software detection, dynamic policy verification, and corresponding actions. The Foundation-sec-1.1-8B-Instruct model plays a crucial role by categorizing software names into 17 different categories for policy enforcement, effectively enabling end-to-end workflow automation.
In this blog, we explain how the Foundation-sec-1.1-8B-Instruct model fits into SoftBank’s triaging process and how we achieve high accuracy in software categorization.
The Automated Triaging Workflow
Figure 1: Suspicious file detection workflow at SoftBank.
Suspicious software detection is a common use case in security operations. At SoftBank, software categories are defined based on capabilities and security risks. Once a category is determined, and depending on the network where the software is detected, relevant company policies are applied and appropriate actions are taken.
Previously, analysts performed file categorization, policy verification, and response actions manually, a time-consuming and labor-intensive process. To let analysts focus on higher-priority investigations, SoftBank decided to automate the workflow using automation frameworks and large language models (LLMs).
Automation frameworks streamlined policy checks and response actions. However, automating software categorization was challenging because of the vast number of possible software products, their overlapping functionalities, and organization-specific categorization rules. As a result, categorization was the final piece needed to automate this workflow in support of human analysts.
Foundation AI Model for Categorization
To solve the categorization challenge, SoftBank chose LLMs for their general knowledge of software and their ability to follow instructions. Due to data privacy requirements, cloud-based LLMs were not an option. Foundation-sec-1.1-8B-Instruct stood out as an open-source model that can be deployed on-premises. Its compact size reduces operational costs, and its security-specific pre-training allows it to outperform similar general-purpose open-source models on security tasks.
For categorization, the model receives a software name as input and selects one of 17 output categories. The main challenge lies in overlapping category definitions and software with multiple functionalities. Additionally, to ensure smooth workflow integration, the model’s output must be strictly formatted as the category name only.
Output Optimization
To address these challenges, the Cisco Foundation AI team collaborated closely with SoftBank on prompt tuning to ensure stable and accurate model outputs.
Optimization 1: Output Formatting
First, few-shot examples were appended to the end of the prompt to guide the model toward the correct output format. The last part of the prompt was formatted as follows:
# Examples
Input: SOFTWARE_1
Output: CAT_001
Input: SOFTWARE_2
Output: CAT_005
Input: SOFTWARE_3
Output: CAT_011
# Now it is your turn:
Input: <INPUT NAME>
Output:
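Below is a minimal Python sketch showing how this few-shot tail could be rendered for each new detection. It assumes the system prompt, with its output rules and category definitions, is sent separately. The names `build_prompt_tail` and `FEW_SHOT_EXAMPLES` are hypothetical. The example pairs are the anonymized placeholders from the template above, not real data.

```python
from typing import Sequence, Tuple

# Placeholder examples copied from the template; a real deployment would use
# its own (anonymized) software names and category names.
FEW_SHOT_EXAMPLES: Tuple[Tuple[str, str], ...] = (
    ("SOFTWARE_1", "CAT_001"),
    ("SOFTWARE_2", "CAT_005"),
    ("SOFTWARE_3", "CAT_011"),
)


def build_prompt_tail(
    software_name: str,
    examples: Sequence[Tuple[str, str]] = FEW_SHOT_EXAMPLES,
) -> str:
    """Render the few-shot section, ending with an open 'Output:' slot."""
    lines = ["# Examples"]
    for name, category in examples:
        lines.append(f"Input: {name}")
        lines.append(f"Output: {category}")
    lines.append("# Now it is your turn:")
    lines.append(f"Input: {software_name}")
    # Ending on "Output:" nudges the model to emit only the category name.
    lines.append("Output:")
    return "\n".join(lines)
```

The prompt ends on a bare `Output:` line, mirroring the examples. This means the most likely continuation is a single category name, which keeps downstream parsing simple.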
These few-shot examples, combined with a system prompt that defines the output rules, guide the model to return a valid category for each input. We also integrated output validation into the workflow: if the model fails to return a valid category name, inference is re-run until a valid output is obtained. Together, prompt engineering and output validation yield stable, well-formatted categorization results.
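The validate-and-retry step can be sketched as follows. `generate` stands for a hypothetical wrapper around the locally hosted model. `VALID_CATEGORIES` is only the subset of category names mentioned in this post; the real deployment has 17.

The post describes re-running inference until the output is valid. This sketch departs from that by adding a `max_attempts` cap (an assumption) so that a persistently malformed answer cannot loop forever. In that case it returns `None`, and the caller can route the case elsewhere.

```python
from typing import Callable, FrozenSet, Optional

# Hypothetical subset of the 17 category names, taken from this post.
VALID_CATEGORIES: FrozenSet[str] = frozenset({
    "File Transfer",
    "File Sharing",
    "Forbidden Internet Service",
    "Packet Capture",
    "Vulnerability Scanning",
    "Server Service",
    "Undetermined",
})


def classify_with_validation(
    software_name: str,
    generate: Callable[[str], str],
    valid_categories: FrozenSet[str] = VALID_CATEGORIES,
    max_attempts: int = 3,
) -> Optional[str]:
    """Re-run the model until it returns an exact category name.

    `generate` (hypothetical) sends the full prompt for `software_name`
    to the model and returns its raw text output.
    Returns None if no valid category appears within `max_attempts` tries.
    """
    for _ in range(max_attempts):
        candidate = generate(software_name).strip()
        # Only an exact category name passes; "Category: X" or prose fails.
        if candidate in valid_categories:
            return candidate
    return None
```

A retry only helps if the model can answer differently on the next call. This sketch therefore assumes sampling with a nonzero temperature. Under greedy decoding, a re-run would most likely reproduce the same invalid output.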
Optimization 2: Category Description
Next, we incorporated categorization rules—based on analyst logic and historical data—into the prompt to clarify the scope of each category. However, some overlap naturally occurs between categories.
For example, “File Transfer,” “File Sharing,” and “Forbidden Internet Service” are governed by different rules. Cloud storage software such as OneDrive should be categorized as “Forbidden Internet Service,” but the model often misclassified it as “File Sharing” because of its sharing functionality. Similar ambiguities exist between “Packet Capture” and “Vulnerability Scanning,” and between “Server Service” and “File Transfer.” To improve model performance, we identified these common misclassifications and added descriptive guidance to help the model distinguish between the categories in each pair.
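One straightforward way to surface confusable pairs like these is to count disagreements between analyst labels and model outputs on a labeled test set. The post does not say how SoftBank identified the pairs, so this is only an illustrative sketch. The function name `top_confusions` and the toy data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def top_confusions(
    labels: Sequence[str],
    predictions: Sequence[str],
    n: int = 5,
) -> List[Tuple[Tuple[str, str], int]]:
    """Return the n most frequent (ground truth, prediction) disagreements."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    pairs = Counter(
        (truth, pred)
        for truth, pred in zip(labels, predictions)
        if truth != pred  # only count mistakes
    )
    return pairs.most_common(n)


# Toy data for illustration only; these are not SoftBank results.
labels = ["Forbidden Internet Service", "Forbidden Internet Service",
          "Packet Capture", "File Transfer"]
predictions = ["File Sharing", "File Sharing",
               "Vulnerability Scanning", "File Transfer"]
print(top_confusions(labels, predictions))
# [(('Forbidden Internet Service', 'File Sharing'), 2), (('Packet Capture', 'Vulnerability Scanning'), 1)]
```

The most frequent pairs are natural candidates for the kind of descriptive guidance described above.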
For instance, we added the following reasoning logic for the “Packet Capture” and “Vulnerability Scanning” categories:
Confirmation for Ambiguous Cases (Evaluate in order):
1. Does it output vulnerability reports or CVE information? → Yes: Vulnerability Scanning / No: Proceed to next.
2. Is the primary purpose packet interception, recording, or visualization? → Yes: Packet Capture / No: Proceed to next.
3. Is the primary purpose network monitoring or bandwidth monitoring? → Yes: Packet Capture / No: Proceed to next.
4. Is the primary purpose discovering or diagnosing vulnerabilities in the target? → Yes: Vulnerability Scanning / No: CAT_001.
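In the deployment, these rules are natural-language guidance inside the prompt, and the LLM applies them. The Python sketch below only makes the evaluation order explicit. The `ToolTraits` fields are hypothetical stand-ins for the answers to the four questions, and the final `CAT_001` fallback is copied from the rule as written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolTraits:
    """Hypothetical yes/no answers to the four questions above."""
    reports_vulnerabilities_or_cves: bool  # question 1
    primary_packet_capture: bool           # question 2: intercept/record/visualize
    primary_network_monitoring: bool       # question 3: network/bandwidth monitoring
    primary_vulnerability_discovery: bool  # question 4


def disambiguate(traits: ToolTraits) -> str:
    """Apply the questions strictly in order; the first 'Yes' wins."""
    if traits.reports_vulnerabilities_or_cves:
        return "Vulnerability Scanning"
    if traits.primary_packet_capture:
        return "Packet Capture"
    if traits.primary_network_monitoring:
        return "Packet Capture"
    if traits.primary_vulnerability_discovery:
        return "Vulnerability Scanning"
    return "CAT_001"  # fallback category exactly as stated in the rule
```

Order matters here. A tool that both reports CVEs and captures packets resolves to "Vulnerability Scanning", because question 1 is checked before question 2.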
Throughout this process, we kept the prompt concise to avoid confusion and ensure reliable categorization.
Optimization 3: Preprocessing and Postprocessing
The 17th category, “Undetermined,” is designed to capture software that does not fit into any of the other 16 categories. During testing, we observed that the model often forced a category onto software that should have been marked as “Undetermined.” In production, these misclassifications produce false positives. Because “Undetermined” does not trigger any specific rules, such software should pass through without action, but a forced category applies that category’s policy rules instead.
While prompt tuning reduced many of these instances, some organization-specific cases remained in which benign files were incorrectly flagged as potentially sensitive. To mitigate this, we implemented whitelisting as a preprocessing step and added postprocessing to further filter out false positives.
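A minimal sketch of how an allowlist check and postprocessing rules can wrap the model call is shown below. The post does not publish SoftBank's allowlist or filtering rules, so `ALLOWLIST`, `downgrade_internal_tools`, and the `corp-` prefix are all hypothetical. Returning `None` for allowlisted software ("no further action") is also an assumption. The example rule maps matches to "Undetermined" because, as noted above, that category triggers no policy rules.

```python
from typing import Callable, FrozenSet, Optional, Sequence

# Hypothetical organization-specific allowlist of known-benign software.
ALLOWLIST: FrozenSet[str] = frozenset({"internal-backup-agent", "corp-vpn-client"})

# A postprocessing rule takes (normalized name, category) and returns a category.
PostRule = Callable[[str, str], str]


def downgrade_internal_tools(key: str, category: str) -> str:
    """Hypothetical rule: in-house tools are never auto-actioned."""
    if key.startswith("corp-") and category != "Undetermined":
        return "Undetermined"
    return category


def triage_category(
    software_name: str,
    classify: Callable[[str], str],
    allowlist: FrozenSet[str] = ALLOWLIST,
    post_rules: Sequence[PostRule] = (),
) -> Optional[str]:
    """Preprocess, classify with the model, then postprocess.

    Returns None when the software is allowlisted (no model call, no action).
    """
    key = software_name.strip().lower()
    # Preprocessing: allowlisted software never reaches the model.
    if key in allowlist:
        return None
    category = classify(software_name)
    # Postprocessing: each rule may override the model's category.
    for rule in post_rules:
        category = rule(key, category)
    return category
```

Putting the allowlist before the model call also saves an inference for software that is already known to be benign.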
Categorization Results
Testing was conducted on a curated dataset of historical detections with human-annotated categories. To avoid overfitting the prompt to these past detections, we expanded the dataset with common software names and manually verified their ground-truth labels.
Using these 17 categories, the Foundation-sec-1.1-8B-Instruct model achieved 80.75% accuracy, which is comparable to the performance of cloud-based LLMs on the same task. When combined with our rule-based system and the new pre/post-processing steps, the overall workflow accuracy reached 90%, making it highly effective for daily operations.
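The sketch below shows how model-only and workflow-level accuracy might be computed side by side on the same labeled set. It assumes both figures are exact-match accuracy against the ground-truth categories; the post does not define the workflow-level metric precisely. The toy data is illustrative and does not reproduce the reported numbers.

```python
from typing import Sequence


def accuracy(labels: Sequence[str], predictions: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the ground-truth label."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("need equal-length, non-empty label and prediction lists")
    correct = sum(truth == pred for truth, pred in zip(labels, predictions))
    return correct / len(labels)


# Toy example (not SoftBank data): the model alone misses one item,
# and the surrounding workflow corrects it.
labels = ["File Sharing", "Packet Capture", "Undetermined", "File Transfer"]
model_only = ["File Sharing", "Packet Capture", "Server Service", "File Transfer"]
workflow = ["File Sharing", "Packet Capture", "Undetermined", "File Transfer"]
print(f"model: {accuracy(labels, model_only):.2%}, workflow: {accuracy(labels, workflow):.2%}")
# model: 75.00%, workflow: 100.00%
```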
Conclusions
SoftBank’s adoption of the Cisco Foundation AI model demonstrates that, while LLMs are often used for summarization and analysis, they can also handle categorization tasks effectively without resource-intensive retraining or fine-tuning. It also shows that by carefully identifying which workflow tasks truly require generative AI, organizations can achieve their automation goals while reducing computational demands and improving reliability compared to workflows that rely entirely on LLMs.
Looking ahead, SoftBank plans to extend this approach beyond suspicious file detection to automate intrusion detection system (IDS) responses as well. Given that IDS automation will involve handling sensitive network and security-related information, the Foundation AI model’s data privacy and security features make it particularly well-suited for these future security operations workflows.
Customer Testimonials
“Through our joint PoV with Cisco, we confirmed that the Cisco Foundation AI model can help streamline an important step in our SOC triaging workflow: software categorization. Its on-premises deployment model meets our data privacy requirements, and the PoV demonstrated practical accuracy, including over 85% accuracy at the workflow-action level, with further improvement expected through preprocessing and policy-based controls. This approach can help our analysts reduce manual triage effort and allocate more attention to higher-priority security investigations.”
—Hajime Uematsu, Director, Security Verification Department, SoftBank Corp.
