Fitter - new way for collect information from the API’s/Websites
Fitter CLI - small cli command which provide result from Fitter for test/debug/home usage
Fitter Lib - library which provide functional of fitter CLI as a library
go get github.com/PxyUp/fitter
package main
import (
"fmt"
"github.com/PxyUp/fitter/lib"
"github.com/PxyUp/fitter/pkg/config"
"log"
"net/http"
)
func main() {
res, err := lib.Parse(&config.Item{
ConnectorConfig: &config.ConnectorConfig{
ResponseType: config.Json,
Url: "https://random-data-api.com/api/appliance/random_appliance",
ServerConfig: &config.ServerConnectorConfig{
Method: http.MethodGet,
},
},
Model: &config.Model{
ObjectConfig: &config.ObjectConfig{
Fields: map[string]*config.Field{
"my_id": {
BaseField: &config.BaseField{
Type: config.Int,
Path: "id",
},
},
"generated_id": {
BaseField: &config.BaseField{
Generated: &config.GeneratedFieldConfig{
UUID: &config.UUIDGeneratedFieldConfig{},
},
},
},
"generated_array": {
ArrayConfig: &config.ArrayConfig{
RootPath: "@this|@keys",
ItemConfig: &config.ObjectConfig{
Field: &config.BaseField{
Type: config.String,
},
},
},
},
},
},
},
}, nil, nil, nil, nil)
if err != nil {
log.Fatal(err)
}
fmt.Println(res.ToJson())
}
Output:
{
"generated_array": ["id","uid","brand","equipment"],
"my_id": 6000,
"generated_id": "26b08b73-2f2e-444d-bcf2-dac77ac3130e"
}
Download latest version from the release page
or locally:
go run cmd/fitter/main.go --path=./examples/config_api.json
Download latest version from the release page
or locally:
go run cmd/cli/main.go --path=./examples/cli/config_cli.json
--input=\""124"\"
--input=124
--input='{"test": 5}'
./fitter_cli_${VERSION} --path=./examples/cli/config_cli.json --copy=true
Examples:
It is the way how you fetch the data
type ConnectorConfig struct {
ResponseType ParserType `json:"response_type" yaml:"response_type"`
Url string `json:"url" yaml:"url"`
Attempts uint32 `json:"attempts" yaml:"attempts"`
NullOnError bool `yaml:"null_on_error" json:"null_on_error"`
StaticConfig *StaticConnectorConfig `json:"static_config" yaml:"static_config"`
IntSequenceConfig *IntSequenceConnectorConfig `json:"int_sequence_config" yaml:"int_sequence_config"`
ServerConfig *ServerConnectorConfig `json:"server_config" yaml:"server_config"`
BrowserConfig *BrowserConnectorConfig `yaml:"browser_config" json:"browser_config"`
PluginConnectorConfig *PluginConnectorConfig `json:"plugin_connector_config" yaml:"plugin_connector_config"`
ReferenceConfig *ReferenceConnectorConfig `yaml:"reference_config" json:"reference_config"`
FileConfig *FileConnectorConfig `json:"file_config" yaml:"file_config"`
}
https://api.open-meteo.com/v1/forecast?latitude=}&longitude=}&hourly=temperature_2m&forecast_days=1
Config can be one of:
Example:
{
"response_type": "xpath",
"attempts": 3,
"url": "https://openweathermap.org/find?q={PL}",
"browser_config": {
"playwright": {
"timeout": 30,
"wait": 30,
"install": false,
"browser": "Chromium"
}
}
}
Connector can be defined via plugin system. For use that you need apply next flags to Fitter/Cli(location of the plugins):
... --plugins=./examples/plugin
–plugins - looking for all files with “.so” extension in provided folder(subdirs excluded)
type PluginConnectorConfig struct {
Name string `json:"name" yaml:"name"`
Config json.RawMessage `json:"config" yaml:"config"`
}
{
"name": "connector",
"config": {
"name": "Elon"
}
}
Build plugin
go build -buildmode=plugin -gcflags="all=-N -l" -o examples/plugin/connector.so examples/plugin/hardcoder/connector.go
Make sure you export Plugin variable which implements pl.ConnectorPlugin interface
Example for CLI:
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_plugin.json#L5
Plugin example:
package main
import (
"encoding/json"
"fmt"
"github.com/PxyUp/fitter/pkg/config"
"github.com/PxyUp/fitter/pkg/logger"
"github.com/PxyUp/fitter/pkg/builder"
pl "github.com/PxyUp/fitter/pkg/plugins/plugin"
)
var (
_ pl.ConnectorPlugin = &plugin{}
Plugin plugin
)
type plugin struct {
log logger.Logger
Name string `json:"name" yaml:"name"`
}
func (pl *plugin) Get(parsedValue builder.Interfacable, index *uint32, input builder.Interfacable) ([]byte, error) {
return []byte(fmt.Sprintf(`{"name": "%s"}`, pl.Name)), nil
}
func (pl *plugin) SetConfig(cfg *config.PluginConnectorConfig, logger logger.Logger) {
pl.log = logger
if cfg.Config != nil {
err := json.Unmarshal(cfg.Config, pl)
if err != nil {
pl.log.Errorw("cant unmarshal plugin configuration", "error", err.Error())
return
}
}
}
Connector which allow get prefetched data from references
type ReferenceConnectorConfig struct {
Name string `yaml:"name" json:"name"`
}
Example
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_ref.json#L66
Improved version of static connector which generate int sequence as result
type IntSequenceConnectorConfig struct {
Start int `json:"start" yaml:"start"`
End int `json:"end" yaml:"end"`
Step int `json:"step" yaml:"step"`
}
Example
{
"start": 0,
"end": 2
// Generate [0, 1]
}
Connector type which fetch data from provided file
type FileConnectorConfig struct {
Path string `yaml:"path" json:"path"`
UseFormatting bool `yaml:"use_formatting" json:"use_formatting"`
}
Connector type which fetch data from provided string
type StaticConnectorConfig struct {
Value string `json:"value" yaml:"value"`
Raw json.RawMessage `json:"raw" yaml:"raw"`
}
Example:
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_static_connector.json#L5
{
"value": "[1,2,3,4,5]"
}
Connector type which fetch data using golang http.Client(server side request like curl)
type ServerConnectorConfig struct {
Method string `json:"method" yaml:"method"`
Headers map[string]string `yaml:"headers" json:"headers"`
Timeout uint32 `yaml:"timeout" json:"timeout"`
JsonRawBody json.RawMessage `json:"json_raw_body" yaml:"json_raw_body"`
Body string `yaml:"body" json:"body"`
Proxy *ProxyConfig `yaml:"proxy" json:"proxy"`
}
Example:
{
"method": "GET",
"proxy": {
"server": "http://localhost:8080",
"username": "pyx"
}
}
Right now default timeout it is 10 sec.
type ProxyConfig struct {
// Proxy to be used for all requests. HTTP and SOCKS proxies are supported, for example
// `http://myproxy.com:3128` or `socks5://myproxy.com:3128`. Short form `myproxy.com:3128`
// is considered an HTTP proxy.
Server string `json:"server" yaml:"server"`
// Optional username to use if HTTP proxy requires authentication.
Username string `json:"username" yaml:"username"`
// Optional password to use if HTTP proxy requires authentication.
Password string `json:"password" yaml:"password"`
}
{
"server": "http://localhost:8080",
"username": "pyx"
}
Connector type which emulate fetching of data via browser
type BrowserConnectorConfig struct {
Chromium *ChromiumConfig `json:"chromium" yaml:"chromium"`
Docker *DockerConfig `json:"docker" yaml:"docker"`
Playwright *PlaywrightConfig `json:"playwright" yaml:"playwright"`
}
Config can be one of:
Example:
{
"docker": {
"wait": 10000,
"image": "docker.io/zenika/alpine-chrome:with-node",
"entry_point": "chromium-browser",
"purge": true
}
}
Use locally installed Chromium for fetch the data
type ChromiumConfig struct {
Path string `yaml:"path" json:"path"`
Timeout uint32 `yaml:"timeout" json:"timeout"`
Wait uint32 `yaml:"wait" json:"wait"`
Flags []string `yaml:"flags" json:"flags"`
}
Example:
{
"path": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
"wait": 10000
}
Use Docker for spin up container for fetch data
type DockerConfig struct {
Image string `yaml:"image" json:"image"`
EntryPoint string `json:"entry_point" yaml:"entry_point"`
Timeout uint32 `yaml:"timeout" json:"timeout"`
Wait uint32 `yaml:"wait" json:"wait"`
Flags []string `yaml:"flags" json:"flags"`
Purge bool `json:"purge" yaml:"purge"`
NoPull bool `yaml:"no_pull" json:"no_pull"`
PullTimeout uint32 `yaml:"pull_timeout" json:"pull_timeout"`
}
Docker default image: docker.io/zenika/alpine-chrome
Example:
{
"wait": 10000,
"image": "docker.io/zenika/alpine-chrome:with-node",
"entry_point": "chromium-browser",
"purge": true
}
Run browsers via playwright framework
type PlaywrightConfig struct {
Browser PlaywrightBrowser `json:"browser" yaml:"browser"`
Install bool `yaml:"install" json:"install"`
Timeout uint32 `yaml:"timeout" json:"timeout"`
Wait uint32 `yaml:"wait" json:"wait"`
TypeOfWait *playwright.WaitUntilState `json:"type_of_wait" yaml:"type_of_wait"`
PreRunScript string `json:"pre_run_script" yaml:"pre_run_script"`
Stealth bool `json:"stealth" yaml:"stealth"`
Proxy *ProxyConfig `yaml:"proxy" json:"proxy"`
}
Example
{
"timeout": 30,
"wait": 30,
"install": false,
"browser": "Chromium"
}
With model we define result of the scrapping
type Model struct {
ObjectConfig *ObjectConfig `yaml:"object_config" json:"object_config"`
ArrayConfig *ArrayConfig `json:"array_config" yaml:"array_config"`
BaseField *BaseField `json:"base_field" yaml:"base_field"`
IsArray bool `json:"is_array" yaml:"is_array"`
}
Config can be one of:
Example:
{
"object_config": {}
}
Configuration of the object and fields
type ObjectConfig struct {
Fields map[string]*Field `json:"fields" yaml:"fields"`
Field *BaseField `json:"field" yaml:"field"`
ArrayConfig *ArrayConfig `json:"array_config" yaml:"array_config"`
}
Config can be one of:
Example:
{
"fields": {
"title": {
"base_field": {
"type": "string",
"path": "type"
}
}
}
}
Configuration of the array and fields
type ArrayConfig struct {
RootPath string `json:"root_path" yaml:"root_path"`
Reverse bool `yaml:"reverse" json:"reverse"`
ItemConfig *ObjectConfig `json:"item_config" yaml:"item_config"`
LengthLimit uint32 `json:"length_limit" yaml:"length_limit"`
StaticConfig *StaticArrayConfig `json:"static_array" yaml:"static_array"`
}
Config can be one of:
Example:
{
"root_path": "#content dt.quote > a",
"item_config": {
"field": {
"type": "string"
}
}
}
Common of the field
type Field struct {
BaseField *BaseField `json:"base_field" yaml:"base_field"`
ObjectConfig *ObjectConfig `json:"object_config" yaml:"object_config"`
ArrayConfig *ArrayConfig `json:"array_config" yaml:"array_config"`
FirstOf []*Field `json:"first_of" yaml:"first_of"`
}
Config can be one of:
Example:
{
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
In case we want get some static information or generate new one
type BaseField struct {
Type FieldType `yaml:"type" json:"type"`
Path string `yaml:"path" json:"path"`
HTMLAttribute string `json:"html_attribute" yaml:"html_attribute"`
Generated *GeneratedFieldConfig `yaml:"generated" json:"generated"`
FirstOf []*BaseField `json:"first_of" yaml:"first_of"`
}
Important: by default “string” type trimmed and all special chars is replaced, if you need plain string use “raw_string”
Config can be one of or empty:
Examples
{
"generated": {
"uuid": {}
}
}
{
"type": "string",
"path": "text()"
}
Provide functionality of generating field on the flight
type GeneratedFieldConfig struct {
UUID *UUIDGeneratedFieldConfig `yaml:"uuid" json:"uuid"`
Static *StaticGeneratedFieldConfig `yaml:"static" json:"static"`
Formatted *FormattedFieldConfig `json:"formatted" yaml:"formatted"`
Plugin *PluginFieldConfig `yaml:"plugin" json:"plugin"`
Calculated *CalculatedConfig `yaml:"calculated" json:"calculated"`
File *FileFieldConfig `yaml:"file" json:"file"`
Model *ModelField `yaml:"model" json:"model"`
FileStorageField *FileStorageField `json:"file_storage" yaml:"file_storage"`
}
Config can be one of:
Examples:
{
"uuid": {}
}
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L58
{
"model": {
"type": "array",
"model": {
"array_config": {
"root_path": "#content dt.quote > a",
"item_config": {
"field": {
"type": "string"
}
}
}
},
"connector_config": {
"response_type": "HTML",
"attempts": 3,
"browser_config": {
"url": "http://www.quotationspage.com/random.php",
"chromium": {
"path": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
"wait": 10000
}
}
}
}
}
Generate random UUID V4 on the flight, can be used for generate uniq id
type UUIDGeneratedFieldConfig struct {
Regexp string `yaml:"regexp" json:"regexp"`
}
Generate static field
type StaticGeneratedFieldConfig struct {
Type FieldType `yaml:"type" json:"type"`
Value string `json:"value" yaml:"value"`
Raw json.RawMessage `json:"raw" yaml:"raw"`
}
Example
{
"type": "int",
"value": "65"
}
{
"type": "array",
"value": "[65,45]"
}
{
"type": "array",
"raw": [65,45]
}
Generate formatted field which will pass value from parent base field
type FormattedFieldConfig struct {
Template string `yaml:"template" json:"template"`
}
Example: https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L98
{
"template": "https://news.ycombinator.com/item?id={PL}"
}
Field can be used for store field result as local file
type FileStorageField struct {
Content string `json:"content" yaml:"content"`
Raw json.RawMessage `yaml:"raw" yaml:"raw"`
FileName string `json:"file_name" yaml:"file_name"`
Path string `json:"path" yaml:"path"`
Append bool `json:"append" yaml:"append"`
}
{
"content": "}, }\n",
"append": true,
"file_name": "}.csv",
"path": "/Users/pxyup/fitter/examples/cli/test/csv"
}
Field can be used for download file from server locally
type FileFieldConfig struct {
Config *ServerConnectorConfig `yaml:"config" json:"config"`
Url string `yaml:"url" json:"url"`
FileName string `json:"file_name" yaml:"file_name"`
Path string `json:"path" yaml:"path"`
}
Result of the field will be local file path as string
{
"url": "https://images.shcdn.de/resized/w680/p/dekostoff-gobelinstoff-panel-oriental-cat-46-x-46_P19-KP_2.jpg",
"path": "/Users/pxyup/fitter/bin",
"config": {
"method": "GET"
}
}
With propagated URL (inject of the parent value as a string)
{
"url": "https://picsum.photos{PL}",
"path": "/Users/pxyup/fitter/bin",
"config": {
"method": "GET"
}
}
Config example:
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_image.json
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_image_multiple.json
Field can generate different types depends from expression
type CalculatedConfig struct {
Type FieldType `yaml:"type" json:"type"`
Expression string `yaml:"expression" json:"expression"`
}
FNull - alias for builder.Nullvalue
FNil - alias for nil
isNull(value T) - function for check is value is FNull
fRes - it is raw(with proper type) result from the parsing base field
fIndex - it is index in parent array(only if parent was array field)
fResJson - it is JSON string representation of the raw result
fResRaw - result in bytes format
FNewLine - new line separator
{
"type": "bool",
"expression": "fRes > 500"
}
Field can be some external plugin for fitter
type PluginFieldConfig struct {
Name string `json:"name" yaml:"name"`
Config json.RawMessage `json:"config" yaml:"config"`
}
Field type which can be generated on the flight by news model and connector
type ModelField struct {
// Type of parsing
ConnectorConfig *ConnectorConfig `yaml:"connector_config" json:"connector_config"`
// Model of the response
Model *Model `yaml:"model" json:"model"`
Type FieldType `yaml:"type" json:"type"`
Path string `yaml:"path" json:"path"`
Expression string `yaml:"expression" json:"expression"`
}
Examples:
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L60
{
"type": "array",
"model": {
"array_config": {
"root_path": "#content dt.quote > a",
"item_config": {
"field": {
"type": "string"
}
}
}
}
}
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_weather.json#L37
{
"type": "string",
"path": "temp.temp",
"model": {
"object_config": {
"fields": {
"temp": {
"base_field": {
"type": "string",
"path": "//div[@id='forecast_list_ul']//td/b/a/@href",
"generated": {
"model": {
"type": "string",
"model": {
"object_config": {
"fields": {
"temp": {
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
}
}
},
"connector_config": {
"response_type": "HTML",
"attempts": 4,
"url": "https://openweathermap.org{PL}",
"browser_config": {
"playwright": {
"timeout": 30,
"wait": 30,
"install": false,
"browser": "FireFox",
"type_of_wait": "networkidle"
}
}
}
}
}
}
}
}
}
},
"connector_config": {
"response_type": "xpath",
"attempts": 3,
"url": "https://openweathermap.org/find?q={PL}",
"browser_config": {
"playwright": {
"timeout": 30,
"wait": 30,
"install": false,
"browser": "Chromium"
}
}
}
}
Provide static(fixed length) array generation
type StaticArrayConfig struct {
Items map[uint32]*Field `yaml:"items" json:"items"`
Length uint32 `yaml:"length" json:"length"`
}
Examples:
{
"0": {
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
}
{
"length": 4,
"0": {
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
}
{
"length": 4,
"2": {
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
}
Examples:
}" + "hello"}}}
Current time is: {PL} with token from TokenRef=} and TokenObjectRef=}
Current time is: {PL} with token from TokenRef=} and TokenObjectRef=}
TokenRef=} and TokenObjectRef=} Object=} {PL} Env=} {INDEX} {HUMAN_INDEX}
Special map which prefetched(before any processing) and can be user for connector or for placeholder
Can be used for:
type Reference struct {
*ModelField
Expire uint32 `yaml:"expire" json:"expire"`
}
For Fitter
type RefMap map[string]*Reference
type Config struct {
// Other Config Fields
Limits *Limits `yaml:"limits" json:"limits"`
References RefMap `json:"references" yaml:"references"`
}
For Fitter Cli
type RefMap map[string]*Reference
type CliItem struct {
// Other Config Fields
Limits *Limits `yaml:"limits" json:"limits"`
References RefMap `json:"references" yaml:"references"`
}
Example
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_ref.json#L2
{
"references": {
"TokenRef": {
"expire": 10,
"connector_config": {
"response_type": "json",
"static_config": {
"value": "\"plain token\""
}
},
"model": {
"base_field": {
"type": "string"
}
}
},
"TokenObjectRef": {
"connector_config": {
"response_type": "json",
"static_config": {
"value": "{\"token\":\"token from object\"}"
}
},
"model": {
"object_config": {
"fields": {
"token": {
"base_field": {
"type": "string",
"path": "token"
}
}
}
}
}
}
}
}
Provide limitation for prevent DDOS, big usage of memory
type Limits struct {
HostRequestLimiter HostRequestLimiter `yaml:"host_request_limiter" json:"host_request_limiter"`
ChromiumInstance uint32 `yaml:"chromium_instance" json:"chromium_instance"`
DockerContainers uint32 `yaml:"docker_containers" json:"docker_containers"`
PlaywrightInstance uint32 `yaml:"playwright_instance" json:"playwright_instance"`
}
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L2
{
"limits": {
"host_request_limiter": {
"hacker-news.firebaseio.com": 5
},
"chromium_instance": 3,
"docker_containers": 3,
"playwright_instance": 3
}
}