Eino: Document Loader Guide
Introduction
Document Loader loads documents from various sources (e.g., web URLs, local files) and converts them into a standard document format. It’s useful for scenarios such as:
- Loading web content from URLs
- Reading local documents like PDF or Word
Component Definition
Interface
Code:
eino/components/document/interface.go
type Loader interface {
Load(ctx context.Context, src Source, opts ...LoaderOption) ([]*schema.Document, error)
}
Load
- Purpose: load documents from a given source
- Params:
ctx: request context and callback managersrc: document source including URIopts: loader options
- Returns:
[]*schema.Document: loaded documentserror
Source
type Source struct {
URI string
}
Defines source information:
URI: a web URL or local file path
Document
type Document struct {
ID string
Content string
MetaData map[string]any
}
Standard document fields:
ID: unique identifierContent: document contentMetaData: source info, embeddings, scores, sub-indexes, and custom metadata
Options
LoaderOption represents loader options. There is no global common option; each implementation defines its own and wraps via WrapLoaderImplSpecificOptFn.
Usage
Standalone
Code:
eino-ext/components/document/loader/file/examples/fileloader
import (
"github.com/cloudwego/eino/components/document"
"github.com/cloudwego/eino-ext/components/document/loader/file"
)
loader, _ := file.NewFileLoader(ctx, &file.FileLoaderConfig{ UseNameAsID: true })
filePath := "../../testdata/test.md"
docs, _ := loader.Load(ctx, document.Source{ URI: filePath })
log.Printf("doc content: %v", docs[0].Content)
In Orchestration
// Chain
chain := compose.NewChain[string, []*schema.Document]()
chain.AppendLoader(loader)
runnable, _ := chain.Compile()
result, _ := runnable.Invoke(ctx, input)
// Graph
graph := compose.NewGraph[string, []*schema.Document]()
graph.AddLoaderNode("loader_node", loader)
Options and Callbacks
Callback Example
Code:
eino-ext/components/document/loader/file/examples/fileloader
import (
"github.com/cloudwego/eino/callbacks"
"github.com/cloudwego/eino/components/document"
"github.com/cloudwego/eino/compose"
"github.com/cloudwego/eino/schema"
callbacksHelper "github.com/cloudwego/eino/utils/callbacks"
"github.com/cloudwego/eino-ext/components/document/loader/file"
)
handler := &callbacksHelper.LoaderCallbackHandler{
OnStart: func(ctx context.Context, info *callbacks.RunInfo, input *document.LoaderCallbackInput) context.Context {
log.Printf("start loading docs...: %s\n", input.Source.URI)
return ctx
},
OnEnd: func(ctx context.Context, info *callbacks.RunInfo, output *document.LoaderCallbackOutput) context.Context {
log.Printf("complete loading docs, total: %d\n", len(output.Docs))
return ctx
},
}
helper := callbacksHelper.NewHandlerHelper().Loader(handler).Handler()
chain := compose.NewChain[document.Source, []*schema.Document]()
chain.AppendLoader(loader)
run, _ := chain.Compile(ctx)
outDocs, _ := run.Invoke(ctx, document.Source{ URI: filePath }, compose.WithCallbacks(helper))
log.Printf("doc content: %v", outDocs[0].Content)
Existing Implementations
- File Loader — local filesystem: Loader — local file
- Web Loader — HTTP/HTTPS: Loader — web url
- S3 Loader — S3-compatible storage: Loader — Amazon S3
Implementation Notes
Option Mechanism
type MyLoaderOptions struct {
Timeout time.Duration
RetryCount int
}
func WithTimeout(timeout time.Duration) document.LoaderOption {
return document.WrapLoaderImplSpecificOptFn(func(o *MyLoaderOptions) { o.Timeout = timeout })
}
func WithRetryCount(count int) document.LoaderOption {
return document.WrapLoaderImplSpecificOptFn(func(o *MyLoaderOptions) { o.RetryCount = count })
}
Callback Structures
Code:
eino/components/document/callback_extra_loader.go
type LoaderCallbackInput struct {
Source Source
Extra map[string]any
}
type LoaderCallbackOutput struct {
Source Source
Docs []*schema.Document
Extra map[string]any
}
Full Implementation Example
import (
"github.com/cloudwego/eino/callbacks"
"github.com/cloudwego/eino/components/document"
"github.com/cloudwego/eino/schema"
)
func NewCustomLoader(config *Config) (*CustomLoader, error) {
return &CustomLoader{
timeout: config.DefaultTimeout,
retryCount: config.DefaultRetryCount,
}, nil
}
type CustomLoader struct {
timeout time.Duration
retryCount int
}
type Config struct {
DefaultTimeout time.Duration
DefaultRetryCount int
}
func (l *CustomLoader) Load(ctx context.Context, src document.Source, opts ...document.LoaderOption) ([]*schema.Document, error) {
options := &customLoaderOptions{
Timeout: l.timeout,
RetryCount: l.retryCount,
}
options = document.GetLoaderImplSpecificOptions(options, opts...)
var err error
defer func() {
if err != nil {
callbacks.OnError(ctx, err)
}
}()
ctx = callbacks.OnStart(ctx, &document.LoaderCallbackInput{
Source: src,
})
docs, err := l.doLoad(ctx, src, options)
if err != nil {
return nil, err
}
ctx = callbacks.OnEnd(ctx, &document.LoaderCallbackOutput{
Source: src,
Docs: docs,
})
return docs, nil
}
func (l *CustomLoader) doLoad(ctx context.Context, src document.Source, opts *customLoaderOptions) ([]*schema.Document, error) {
return []*schema.Document{{
Content: "Hello World",
}}, nil
}
Notes
MetaDatais critical for storing document source and other metadata- Return meaningful errors on load failures for easier debugging
Other References
Last modified
December 12, 2025
: chore: update websocket docs (#1479) (967538e)