SQL Server Integration Services

From Wikipedia, the free encyclopedia

Jump to: navigation, search

SQL Server Integration Services (SSIS) is a component of Microsoft SQL Server 2005 and 2008. It replaces Data Transformation Services, which has been a feature of SQL Server since Version 7.0. Unlike DTS, which was included in all versions, SSIS is only available in the "Standard" and "Enterprise" editions.

Integration Services provides a platform to build data integration and workflow applications. The primary use for SSIS is data warehousing, as the product features a fast and flexible tool for data extraction, transformation, and loading (ETL). The tool may also be used to automate maintenance of SQL Server databases, update multidimensional cube data, and perform other functions.

Contents

[edit] Features

SSIS features a wizard that lets users create packages which moves data from a single data source to a destination with no transformations. This tool is called the Import/Export Wizard. The Wizard is appropriate for use to quickly move data into or out of SQL Server from or to a variety of sources, including text files and other SQL Server instances.

Developers tasked with creating or maintaining SSIS packages use a visual development tool based on Microsoft Visual Studio called the SQL Server Business Intelligence Development Studio (BIDS). It allows users to edit SSIS packages using a drag-and-drop user interface. A scripting environment in which to write programming code is also available in the tool. The package holds a variety of elements that define the workflow for the package. Upon package execution, the tool provides color-coded, real-time monitoring.

Connections 
A connection includes the information necessary to connect to a particular data source. Tasks can reference the connection by its name, allowing the details of the connection to be changed or configured at runtime.
Tasks 
A task is an atomic work unit that performs some action. There are a couple of dozen tasks that ship in the box, ranging from the file system task (which can copy or move files) to the data transformation task. The data transformation task actually copies data; it implements the ETL features of the product.
Precedence Constraints 
Tasks are linked by precedence constraints. The precedence constraint preceding a particular task must be met before that task executes. The runtime supports executing tasks in parallel if their precedence constraints so allow. Constraints may otherwise allow different paths of execution depending on the success or failure of other tasks. Together with the tasks, precedence constraints comprise the workflow of the package.
Event Handlers 
A workflow can be designed for a number of events in the different scopes where they might occur. In this way, tasks may be executed in response to happenings within the package —such as cleaning up after errors.
Variables 
Tasks may reference variables to store results, make decisions, or affect their configuration.

A package may be saved to a file or to a store with a hierarchical namespace within a SQL Server instance. In either case, the package content is persisted in XML.

Once completed, the designer also allows the user to start the package's execution. Once started, the package may be readily debugged or monitored.

[edit] Features of the data flow task

The Data Flow task, arguably the most important task in the product, features a different design surface than that of the workflow. Data flows are edited within the same design tool. The user can draw data sources, transforms, and data destinations connecting them together in order to achieve the transfer and transforms they desire.

Data sources can connect to any number of source types using OLEDB or ADO. They generally execute a SQL statement (including a stored procedure) to retrieve rows, though there are exceptions. Most notably, a flat file data source allows reading from text files, and an XML adapter can obtain source data from XML files. Similarly, data destinations write data to their target by executing a statement on its connection, or writing to a file.

Various transforms exist in the product, including a sort, aggregation, and lookup. A derived column transform evaluates an expression to compute the value for a new column. A script transform is available which allows more complicated transforms to be written in VB.NET or C# (C# support appeared in SQL 2008).

SSIS can have more than one transform in its operation. Transforms, data sources, and data destinations can have multiple inputs and outputs. Most have error outputs so that rows which would cause the transform to fail can be optionally redirected for further or alternate processing.

Any number of sources or destinations are supported in a data flow.

Transformations supported

SSIS provides the following built-in transformations:

  • Conditional Split
  • Multicast
  • Union-All, Merge, and Merge Join
  • Sort
  • Fuzzy Grouping
  • Lookup and Fuzzy Lookup
  • Percentage Sampling and Row Sampling
  • Copy/Map, Data Conversion, and Derived Column
  • Aggregation
  • Data Mining Model Training, Data Mining Query, Partition Processing, and Dimension Processing
  • Pivot and UnPivot
  • Slowly Changing Dimension
  • Script Component

[edit] Other included tools

Aside from the Import/Export Wizard and the designer, the product includes a few other notable tools.

DTEXEC executes a package from the command line wherever it may be stored. Before running the package, the tool may be instructed to apply configuration information, which will allow the same package to be reused with slightly different parameters, including different connection strings for its endpoints.


DTUTIL provides the ability to manage packages, again from the command prompt. The tool can copy or move a package from a file into the server store, or back out again. Among a few other sundry functions, it can be used to delete, rename, encrypt, or decrypt packages.

[edit] Extensibility and Programmability

Users may write code to define their own connection objects, log providers, transforms, and tasks.

SSIS features a programmable object model that allows developers to write their own hosts for package execution. Such a host can respond to events, start and stop packages, and so on. The object model also allows developers to create, store, and load packages, as well as create, destroy, and modify any of the contained objects.

It can be used on all versions of SQL Server 2005 except Express and Workgroup.

[edit] See Also


[edit] External links


Personal tools
Languages