# XML Generator CLI Tool
This Python CLI tool processes semicolon-separated CSV files to generate XML files with `<pack_content>` sections.
## Features
- Reads semicolon-separated CSV files (Excel format)
- Handles BOM (Byte Order Mark) characters automatically
- Groups individual CIS codes by SET CIS values
- Generates XML using a template file
- Supports column name override for CIS codes
- Provides dry-run mode for testing
- Proper XML escaping for special characters
- **NEW:** Set composition validation using dictionary rules
- **NEW:** Auto-generation of document ID and operation time
- **NEW:** Custom document parameters (ID, number, operation time)
- **NEW:** Validation-only mode for checking data integrity
- **NEW:** Comprehensive error and warning reporting
## Installation
```bash
pip install -r requirements.txt
```
## Usage
### Basic Usage
```bash
python xml_generator.py input.csv template.xml -o output.xml
```
### Options
- `--output/-o FILE`: Output XML file path (prints to stdout if not specified)
- `--cis-column/-c TEXT`: Column name for CIS codes (default: "Код")
- `--encoding/-e TEXT`: CSV file encoding (default: utf-8)
- `--dry-run`: Show what would be processed without generating output
- `--set-dict FILE`: Path to set dictionary CSV file for validation
- `--document-id TEXT`: Document ID to use in XML (auto-generated if not provided)
- `--document-number TEXT`: Document number to use in XML
- `--operation-time TEXT`: Operation time in ISO format (auto-generated if not provided)
- `--validate-only`: Only validate composition without generating XML
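The README does not say how the auto-generated values for `--document-id` and `--operation-time` are produced. As a minimal sketch, assuming a random UUID for the ID and the current local time with a UTC offset for the operation time (both function names are hypothetical, not taken from `xml_generator.py`):

```python
import uuid
from datetime import datetime, timedelta, timezone


def default_document_id() -> str:
    """Assumed fallback: a random UUID string used as the document ID."""
    return str(uuid.uuid4())


def default_operation_time(now=None) -> str:
    """Assumed fallback: an ISO 8601 timestamp with UTC offset, second precision."""
    if now is None:
        # Attach the local timezone so the offset appears in the output.
        now = datetime.now().astimezone()
    return now.replace(microsecond=0).isoformat()


# A fixed timestamp reproduces the format used in the examples of this README.
fixed = datetime(2024, 1, 15, 10, 30, tzinfo=timezone(timedelta(hours=3)))
print(default_operation_time(fixed))  # 2024-01-15T10:30:00+03:00
```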
### Examples
1. **Basic XML generation**:
```bash
python xml_generator.py set_distributed.csv sets_creation.xml -o output.xml
```
2. **With validation and custom parameters**:
```bash
python xml_generator.py --set-dict set_dict.csv --document-id "DOC_123" --document-number "NUM_456" --operation-time "2024-01-15T10:30:00+03:00" set_distributed.csv sets_creation.xml -o output.xml
```
3. **Validation only (no XML generation)**:
```bash
python xml_generator.py --validate-only --set-dict set_dict.csv set_distributed.csv sets_creation.xml
```
4. **Dry run with validation**:
```bash
python xml_generator.py --dry-run --set-dict set_dict.csv set_distributed.csv sets_creation.xml
```
5. **Override CIS column**:
```bash
python xml_generator.py --cis-column "MyColumn" input.csv template.xml -o output.xml
```
6. **Auto-generated parameters**:
```bash
python xml_generator.py --document-number "MY_DOC_001" set_distributed.csv sets_creation.xml -o output.xml
# Document ID and operation time will be auto-generated
```
## CSV File Format
### Distributed CSV File (set_distributed.csv)
The main CSV file should be semicolon-separated with columns:
- `SET CIS`: Pack codes that become `<pack_code>` elements
- `Код` (or specified column): Individual CIS codes that become `<cis>` elements
- `SET GTIN`: SET GTIN codes for validation (optional)
- `GTIN`: Individual GTIN codes for validation (optional)
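Reading this format can be sketched with the standard library alone. The example below assumes the `utf-8-sig` codec is used to drop a leading BOM and that rows with an empty `SET CIS` or CIS value are skipped; `group_cis_by_set` is a hypothetical helper name, not necessarily what the tool uses internally.

```python
import csv
import io
from collections import defaultdict


def group_cis_by_set(csv_file, cis_column="Код"):
    """Group individual CIS codes under their SET CIS value, keeping row order."""
    reader = csv.DictReader(csv_file, delimiter=";")
    groups = defaultdict(list)
    for row in reader:
        set_cis = (row.get("SET CIS") or "").strip()
        cis = (row.get(cis_column) or "").strip()
        if set_cis and cis:  # assumption: incomplete rows are ignored
            groups[set_cis].append(cis)
    return dict(groups)


# In-memory stand-in for an Excel export that starts with a BOM.
raw = "\ufeffSET CIS;Код\nSET1;A\nSET1;B\nSET2;C\n".encode("utf-8")
with io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig", newline="") as fh:
    groups = group_cis_by_set(fh)
print(groups)
```

For a real file, `open(path, encoding="utf-8-sig", newline="")` would take the place of the in-memory wrapper.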
### Set Dictionary CSV File (set_dict.csv)
For validation, provide a semicolon-separated dictionary file with:
- `GTIN SET`: SET GTIN codes (matches `SET GTIN` in distributed file)
- `GTIN ITEM`: Individual GTIN codes (matches `GTIN` in distributed file)
- `COUNT`: Expected count of each GTIN in the set
- `SET NAME`: Descriptive name of the set (optional)
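One plausible in-memory shape for this dictionary is a mapping from each SET GTIN to the expected count of every item GTIN. The sketch below assumes that shape; `load_set_dict` is a hypothetical name, and the GTIN values in the sample are made-up placeholders.

```python
import csv
import io
from collections import defaultdict


def load_set_dict(csv_file):
    """Return {set_gtin: {item_gtin: expected_count}} from a semicolon-separated file."""
    expected = defaultdict(dict)
    for row in csv.DictReader(csv_file, delimiter=";"):
        set_gtin = row["GTIN SET"].strip()
        item_gtin = row["GTIN ITEM"].strip()
        expected[set_gtin][item_gtin] = int(row["COUNT"])
    return dict(expected)


# Placeholder data; SET NAME is optional and not used by the lookup.
sample = (
    "GTIN SET;GTIN ITEM;COUNT;SET NAME\n"
    "SET_GTIN_1;ITEM_GTIN_A;2;Demo set\n"
    "SET_GTIN_1;ITEM_GTIN_B;1;Demo set\n"
)
set_dict = load_set_dict(io.StringIO(sample))
print(set_dict)
```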
## Validation Features
When using `--set-dict`, the tool validates that each SET CIS contains the correct composition:
- **Missing GTINs**: Checks if required GTIN items are missing from sets
- **Wrong counts**: Verifies that GTIN item counts match expectations
- **Extra GTINs**: Identifies unexpected GTIN items in sets
- **Validation modes**:
  - `--validate-only`: Only validate without generating XML
  - `--dry-run`: Show validation results and processing preview
  - Normal mode: Validate and generate XML (errors prevent generation)
### Validation Output
- ✅ **OK**: Set composition is valid
- ⚠️ **WARNING**: Set has unexpected items but required items are present
- ❌ **ERROR**: Set is missing required items or has wrong counts
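The three statuses above can be sketched as a comparison between the expected composition and the GTINs actually found in one set. This is an illustration of the rules as described here, not the tool's actual code; `check_set` and the message wording are assumptions.

```python
from collections import Counter


def check_set(expected, actual_gtins):
    """Classify one set as OK, WARNING, or ERROR.

    expected: {item_gtin: required_count} from the set dictionary
    actual_gtins: item GTINs found for this SET CIS in the distributed file
    """
    actual = Counter(actual_gtins)
    errors, warnings = [], []
    for gtin, required in expected.items():
        found = actual.get(gtin, 0)
        if found == 0:
            errors.append(f"missing GTIN {gtin}")
        elif found != required:
            errors.append(f"GTIN {gtin}: expected {required}, found {found}")
    for gtin in actual:
        if gtin not in expected:
            warnings.append(f"unexpected GTIN {gtin}")
    # Errors take precedence over warnings.
    status = "ERROR" if errors else "WARNING" if warnings else "OK"
    return status, errors, warnings


rules = {"GTIN_A": 2, "GTIN_B": 1}
print(check_set(rules, ["GTIN_A", "GTIN_A", "GTIN_B"])[0])
print(check_set(rules, ["GTIN_A", "GTIN_A", "GTIN_B", "GTIN_C"])[0])
print(check_set(rules, ["GTIN_A", "GTIN_B"])[0])
```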
## XML Template
The tool takes an XML template file, removes any `<pack_content>` sections already present in it, and inserts the generated sections immediately before the closing `</Document>` tag.
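A minimal sketch of that template step, assuming plain string processing with a regular expression rather than a full XML parser (the README does not state which approach the tool takes; `inject_pack_content` is a hypothetical name):

```python
import re

# Matches a whole <pack_content> section plus the whitespace before it.
PACK_RE = re.compile(r"\s*<pack_content>.*?</pack_content>", re.DOTALL)


def inject_pack_content(template: str, generated: str) -> str:
    """Drop existing <pack_content> sections, then insert new ones before </Document>."""
    stripped = PACK_RE.sub("", template)
    head, closing, tail = stripped.rpartition("</Document>")
    if not closing:
        raise ValueError("template has no </Document> closing tag")
    return f"{head.rstrip()}\n{generated}\n{closing}{tail}"


template = "<Document>\n<pack_content><cis>old</cis></pack_content>\n</Document>\n"
result = inject_pack_content(template, "<pack_content><cis>new</cis></pack_content>")
print(result)
```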
### Template Parameters
The tool can automatically replace these parameters in your template:
- `document_id="..."` - Replaced with `--document-id` value
- `document_number="..."` - Replaced with `--document-number` value
- `operation_date_time="..."` - Replaced with `--operation-time` value
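Replacing these attributes can be sketched as a targeted text substitution. The helper below is hypothetical and assumes each attribute appears once, in double quotes, as shown in the list above.

```python
import re
from xml.sax.saxutils import escape


def set_attribute(xml_text: str, name: str, value: str) -> str:
    """Replace the value of the first name="..." attribute found in xml_text."""
    pattern = re.compile(rf'\b{re.escape(name)}="[^"]*"')
    # Escape &, <, > and double quotes so the attribute stays well-formed.
    safe = escape(value, {'"': "&quot;"})
    # A function replacement avoids backslash interpretation in the value.
    return pattern.sub(lambda m: f'{name}="{safe}"', xml_text, count=1)


header = '<Document document_id="OLD" document_number="0" operation_date_time="OLD">'
header = set_attribute(header, "document_id", "DOC_123")
header = set_attribute(header, "document_number", "NUM_456")
header = set_attribute(header, "operation_date_time", "2024-01-15T10:30:00+03:00")
print(header)
```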
## Example Output
```xml
<pack_content>
<pack_code><![CDATA[0104639970975627215!rYq<zP+sPBY]]></pack_code>
<cis><![CDATA[0104639970975245215!lv).%ApB'P>]]></cis>
<cis><![CDATA[0104639970975306215"ZqzWoJbOFB4]]></cis>
<cis><![CDATA[0104639970975276215%fTQ*tVRESUU]]></cis>
</pack_content>
```
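A block like the one above can be built with a small helper that wraps each code in a CDATA section. The sketch below is an assumption about how such output might be produced, with hypothetical function names. It also handles the one sequence CDATA cannot contain, `]]>`, by splitting it across two sections, and checks the result by parsing it back.

```python
import xml.etree.ElementTree as ET


def cdata(text: str) -> str:
    """Wrap text in CDATA; a literal ']]>' is split across two CDATA sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def pack_content(pack_code: str, cis_codes) -> str:
    """Render one <pack_content> section for a pack code and its CIS codes."""
    lines = ["<pack_content>", f"<pack_code>{cdata(pack_code)}</pack_code>"]
    lines += [f"<cis>{cdata(code)}</cis>" for code in cis_codes]
    lines.append("</pack_content>")
    return "\n".join(lines)


# Made-up codes containing characters that would otherwise need escaping.
xml_block = pack_content("PACK<1", ["a&b", "x]]>y"])
print(xml_block)

# Round trip: a parser recovers the original strings unchanged.
parsed = ET.fromstring(xml_block)
recovered = [el.text for el in parsed.findall("cis")]
```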
## Requirements
- Python 3.6+
- click>=8.0.0